Hi Sheep64!
Well, point by point:
Quote:
I am alarmed when people work at a blistering pace. I think that I'm either making hard for myself, the other person has missed a fundamental problem or they have vastly more talent than me. Given your apparent competence with instruction pipelines and micro-operations, I suspect that you have a particular aptitude for processor design which I lack.
I am a software engineer, but I have been interested in CPU development for years. I don't think I am missing a fundamental problem; at most I would have to spend a lot of time to include "JSR memory". I haven't completed testing and interrupts are entirely missing. In fact, I still need to assemble the whole core (fetch_unit.v, decode_unit.v, ucore.v, lsu.v).
Quote:
You are correct to discard 16 register design. Structured programs are 4-colorable graphs and, from the view of execution units, so is an unstructured program. A 4-colorable, 3-address system requires no more than six perfectly orthogonal registers and deliberate asymmetry may benefit instruction density. Therefore, you are strongly advised to consider an eight register design rather than the ad hoc addition of RegB and RegZ which has historical accuracy in 6516 and 65CE02 and remains useful in contemporary bytecode. Admittedly, my proposed extension to 65CE02 has an alternative register set and this is arguably an architecture extension with RegB and RegZ. However, invocation occurs in a common prefix with operand size such that all 65CE02 opcodes facilitate an otherwise trivial and pure RISC extension.
Historical accuracy helps to balance my decision to re-encode the whole instruction set. If I change too much, people will come and say "this is not related to 6502". Registers B and Z, plus the inclusion of SF and PC in the addressing space, make my design an 8-register one.
Quote:
I recently considered micro-ops for JIT flag handling within 8080 on 6502 simulation and 6502 on AVR simulation. The general consensus was that it was not a fun hobby project, too much work for too little gain or unworkable within 16 bit address-space. This may be true for software but you demonstrate that micro-ops are fun in FPGA.
Flags are hard when they are not implemented directly in hardware, and when they are implemented individually it is even worse, in hardware too. That is the reason for including the s flag in the opcode (s = 0 means don't save flags) and for the decision to have instructions with s = 1 write the whole flags register instead of individual flags.
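To make the s bit concrete, here is a minimal Python sketch of one way to read it: the ALU always computes a full set of flags, and s only decides whether that set replaces the flags register or is thrown away. The 6502-style bit positions, the 8-bit width and the names add8 and execute_add are assumptions for illustration, not taken from the actual core.

```python
# Assumed 6502-style flag positions; the real core may lay them out differently.
N, V, Z, C = 0x80, 0x40, 0x02, 0x01

def add8(a, b, carry_in=0):
    """8-bit add that returns (result, flags) with all flags computed together."""
    total = a + b + carry_in
    result = total & 0xFF
    flags = 0
    if result & 0x80:
        flags |= N
    if result == 0:
        flags |= Z
    if total > 0xFF:
        flags |= C
    # Signed overflow: operands share a sign that differs from the result's sign.
    if (~(a ^ b) & (a ^ result)) & 0x80:
        flags |= V
    return result, flags

def execute_add(a, b, old_flags, s):
    """Hypothetical ADC step: s = 1 replaces the whole flags register, s = 0 keeps it."""
    result, alu_flags = add8(a, b, old_flags & C)
    new_flags = alu_flags if s else old_flags
    return result, new_flags
```

With a single enable there is no per-flag write logic to decode: execute_add(0xFF, 0x01, 0x80, s=0) leaves the flags at 0x80, while the same call with s=1 replaces them with Z and C.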
Quote:
I am concerned that you are upscaling 6502 instructions to 20 bit micro-ops. barrym95838's 65m02 is a strict, regular superset of 6502 and only requires 15 bit or so opcodes. Across three micro-ops, it should be possible to obtain 6-12 bit representation. Regular instructions are grouped into eight operations across eight addressing modes. Therefore, if you split such instructions into micro-ops, I would expect a reduction of cases. If you primarily read, modify or write with one micro-op, I presume the encodings are skewed towards ALU function or addressing mode. In such case, it my be preferable to hold the micro-op in a multiplexed representation for the appropriate stage of the pipeline. Hopefully, this compacts FPGA layout and raises the maximum clock speed.
I cannot shrink them:
3 bits for source A
3 bits for source B
1 bit reserved
1 bit bypass using k16
4 bits dest (8 registers + WriteMAR_Width8, WriteMAR_Width16, Discard x 6)
4 bits alu function (ADD/ADC, SUB/SBC, INC, DEC, LSR/ROR, ASL/ROL, AND, ORA, EOR, LDA, EXT, BSW, NOP, NOP, NOP, NOP)
1 bit save result
1 bit carry mask
1 bit save flags
1 bit load
1 bit write
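The field list above can be sketched as a pack/unpack pair in Python. Only the field meanings and widths come from the list; the field names, the ordering and the choice to put the first field in the least significant bits are assumptions. Note that the widths as written add up to 21 bits; the sketch simply encodes them as listed and does not try to reconcile that with the 20 bits mentioned in the quote.

```python
# (name, width) pairs in the order listed above; names and bit order are assumed.
FIELDS = [
    ("src_a", 3),
    ("src_b", 3),
    ("reserved", 1),
    ("bypass_k16", 1),
    ("dest", 4),
    ("alu_fn", 4),
    ("save_result", 1),
    ("carry_mask", 1),
    ("save_flags", 1),
    ("load", 1),
    ("write", 1),
]

TOTAL_BITS = sum(width for _, width in FIELDS)

def pack(**values):
    """Pack named fields into one micro-op word; unspecified fields are 0."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    shift = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << shift
        shift += width
    return word

def unpack(word):
    """Split a micro-op word back into a dict of field values."""
    fields = {}
    shift = 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields
```

For example, unpack(pack(src_a=5, dest=9, alu_fn=3, write=1)) returns the same four values with every other field at 0, which makes it easy to check by hand whether any field could be narrowed.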
I will respond to the remaining considerations tomorrow.
EDIT: grammar