Here was my question: was the 6502 a "historically optimal" design? A lot of people think that it's impossible to have done better at the same size, regardless of what we've learned about how to design processors.
I believe in the idea of progress, so I wanted to question this notion. So I set out to design as modern a RISC-V-alike CPU as could be made to fit in the footprint of a 6502. I'm calling it the "RISCY-V02" project. (Pronounced "risky five oh two".)
Rules of the game:
- Roughly the same bus as a 6502
- Roughly the same transistor count as a CMOS 65C02 (I'm not going to try to design an NMOS chip, and no one can manufacture one anyway.)
- No extreme regressions on any of code density, interrupt latency, or performance. Making different tradeoffs is fine.
- A very simple target for a compiler, but it should still be possible to write by hand, like modern ARM Thumb2 code.
I spent a lot of time designing something, and mostly completed something really cool. I got bored with actually finishing things out: testing, bug fixing, etc. The basic trick is the RISC one: Replace things like BCD, microcode, decode PLAs, etc., with things like register files, pipelines, and barrel shifters.
My question is this: If I can actually finish the physical design, will I have succeeded in my goals? It seems to me that what I came up with outclasses the 6502 considerably, but it may just be a matter of taste on my part. It would help a lot to know how 6502 fans feel about the usability or lack thereof. I'll follow with a description of the instruction set for y'all to poke holes in.
One last thing: The *way* you program a modern RISC is dramatically different from how you program a CISC like the 6502. I may have to point out subtleties in how the ISA works. The point wasn't to make the most 6502-like processor; for that you want a 6502.
Here's what I came up with:
- LOL it's a 16-bit CPU.
- 8x 16-bit general purpose registers (including a 16-bit stack pointer)
- All instructions are 16 bits of code.
- Most instructions are two cycles. All arithmetic is 16-bit.
- Loads and stores take 3 cycles for bytes and 4 cycles for 16-bit words.
- 2 cycle variable bit shifts
- Easy/modern position independent code
- Backward branches are predicted taken; forward branches not taken. Predicted branches are 2 cycles; mispredicted, 4.
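To get a feel for what those cycle counts mean in practice, here is a rough Python sketch of the static branch prediction rule applied to a simple countdown loop. The function names and the loop shape (body, then ADDI to decrement, then a backward BNZ) are made up for illustration, and it assumes ADDI is one of the "most instructions" that take 2 cycles.

```python
# Cycle costs quoted in the list above.
ALU_CYCLES = 2            # "Most instructions are two cycles" (assumed to include ADDI)
BRANCH_PREDICTED = 2
BRANCH_MISPREDICTED = 4


def branch_cycles(offset, taken):
    """Static prediction: backward branches (negative offset) are predicted
    taken, forward branches are predicted not taken."""
    predicted_taken = offset < 0
    return BRANCH_PREDICTED if predicted_taken == taken else BRANCH_MISPREDICTED


def countdown_loop_cycles(iterations, body_alu_ops=0):
    """Estimated cycles for a hypothetical loop of the form:
         loop: <body_alu_ops ALU instructions>
               ADDI r, -1
               BNZ  r, loop
    """
    total = 0
    for i in range(iterations):
        total += (body_alu_ops + 1) * ALU_CYCLES  # body plus the ADDI decrement
        taken = i < iterations - 1                # the final pass falls through
        total += branch_cycles(-1, taken)         # backward branch
    return total
```

With an empty body, every pass costs 4 cycles except the last, which costs 6 because the fall-through is a mispredict. A forward branch that is usually skipped costs 2 cycles, so the usual advice applies: put the rare case on the taken side of a forward branch.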
The instructions come in broadly 3 encodings:
- 9-bit immediate:
  - LUI reg, imm9: Load Upper Immediate: Load the immediate into the high 9 bits of reg. Used to materialize 16-bit constants.
  - AUIPC reg, imm9: Add Upper Immediate to PC: Add the immediate to the high 9 bits of the PC and store the result in reg. Used in position-independent code and far jumps/calls w/ JALR.
  - JAL reg, imm9: Jump and Link: Jump to PC + imm, place previous PC in reg.
  - J reg, imm9: Jump: Jump to PC + imm.
  - BZ/BNZ reg, imm9: Compare reg to zero; branch to PC + imm if it is zero (BZ) or non-zero (BNZ).
- 7-bit immediate:
  - Loads and stores
    - Src/Dst is any register; base is any of the first 4 registers.
    - LB/LBU/LW dst, base, imm: Load signed byte/unsigned byte/word from base + imm. Signed bytes are sign extended; unsigned bytes are zero extended.
    - SB/SW src, base, imm: Store byte/word to base + imm.
  - Arithmetic
    - These modify a register in place:
      - LI (Load Immediate), ADDI, ANDI, ORI, XORI, SRI (Shift Right), SLI (Shift Left)
    - These place the result in register x3, but take an argument register:
      - SLT/SLTU: Set Less Than Signed/Unsigned. (Sets x3 to 1 if the argument register is less than the sign-extended immediate, otherwise sets it to zero.)
      - JALR: Jump and Link Register. Not technically arithmetic. Adds the sign-extended immediate to the given register, jumps to the result, and stores the current PC in x3.
      - JR: Adds the sign-extended immediate to the given register and jumps to the result.
      - XORIA: Alternative XOR for equality testing
- Reg Reg Reg instructions
  - These are all arithmetic; they take 3 registers as arguments. They perform arithmetic on two and place the result in a third. The arguments and destination can freely overlap; all three could be the same register, for example.
  - ADD, SUB, AND, OR, XOR, SLL (Shift Left Logical), SRL (Shift Right Logical), SRA (Shift Right Arithmetic), SLT (Set Less Than), SLTU (Set Less Than Unsigned)
  - Note that the shifts take the amount in a register, and still complete in 2 cycles.
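For anyone who wants to poke at the semantics, here is a small Python model of a few of the instructions above. It is a sketch under stated assumptions, not the actual hardware behaviour: it assumes LUI clears the low 7 bits, that ADDI sign-extends its 7-bit immediate (as RISC-V does with its own ADDI), and that register shifts use only the low 4 bits of the amount. The helper `split_constant` is a made-up name showing how an assembler might split a 16-bit constant into a LUI/ADDI pair under those assumptions.

```python
MASK16 = 0xFFFF


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def lui(imm9):
    """LUI: put imm9 in the high 9 bits (ASSUMPTION: low 7 bits become zero)."""
    return (imm9 & 0x1FF) << 7


def addi(reg, imm7):
    """ADDI: add a 7-bit immediate (ASSUMPTION: sign-extended), wrapping at 16 bits."""
    return (reg + sext(imm7, 7)) & MASK16


def split_constant(value):
    """Hypothetical assembler helper: return (imm9, imm7) such that
    addi(lui(imm9), imm7) == value."""
    value &= MASK16
    imm7 = value & 0x7F
    # If bit 6 of the low part is set, ADDI will subtract, so round the
    # upper part up by one to compensate.
    imm9 = ((value + 0x40) >> 7) & 0x1FF
    return imm9, imm7


def slt(a, b):
    """SLT: 1 if a < b when both are treated as signed 16-bit values."""
    return int(sext(a, 16) < sext(b, 16))


def sltu(a, b):
    """SLTU: 1 if a < b when both are treated as unsigned 16-bit values."""
    return int((a & MASK16) < (b & MASK16))


def srl(a, amount):
    """SRL: logical right shift (ASSUMPTION: only the low 4 bits of amount count)."""
    return (a & MASK16) >> (amount & 15)


def sra(a, amount):
    """SRA: arithmetic right shift; copies of the sign bit fill the top."""
    return (sext(a, 16) >> (amount & 15)) & MASK16
```

For example, `split_constant(0x12C5)` gives imm9 = 0x26 and imm7 = 0x45: the LUI produces 0x1300 and the ADDI then adds -59 to land on 0x12C5. If the real ADDI turns out not to sign-extend, the rounding step in `split_constant` goes away and the split is a plain bit-field cut.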