RISCY-V02

BigEd · Post by **BigEd** » Wed Mar 04, 2026 8:26 am

Note that NMI here is special, not like a 6502. In particular, you can't necessarily return from an NMI. It's meant for watchdog resets, power brownouts, and the like.

Edit: I see this as a valid and interesting design choice.

mysterymath · Post by **mysterymath** » Thu Mar 05, 2026 5:45 am

BigEd wrote:

Note that NMI here is special, not like a 6502. In particular, you can't necessarily return from an NMI. It's meant for watchdog resets, power brownouts, and the like.

Edit: I see this as a valid and interesting design choice.

Yeah, this was something that I ran into when examining how RISC-V does interrupts. NMIs are explicitly non-recoverable. I personally consider this to be, well, basically just *correct*, since disabling interrupts then provides true critical (uninterruptible) sections. Unrecoverable NMIs allow as much NMI-ey-ness as possible without breaking that: if they interrupt a critical section, it mostly doesn't matter, since the main program is basically toast. In some of my 6502 code, I had to do this awful hardware dance to check if I was interrupted by an NMI, then retry the critical section in a loop. Truly gross. Or worse, an *additional* hardware mechanism to mask "non-maskable interrupts". Madness!

But, this also made it quite a bit faster to enter or exit un-interruptible interrupt handlers! Like RISC-V, entering an interrupt handler (IRQ or NMI) copies PC (program counter) and ST (status) to EPC and EST (shadow regs in the CPU). NMI does the same thing, so it can clobber them, but it doesn't matter, and it still gives a sense of where the NMI happened (say, on an illegal MMIO write). But, since interrupts are disabled upon entry, if you never need to reenable interrupts, you can just RETI to restore PC and ST without ever touching memory! If you do want to reenable interrupts, you can pull out EPC and EST into registers in a couple of instructions. Since interrupts preserve all registers, that's all that's needed! (You can also stack them to free the registers; that's 9 extra cycles on top of the below for each of save and restore.)

  SRR   r1           ; 2 cycles
  EPCR  r2           ; 2 cycles
  CLI                ; 2 cycles — nested interrupts live from here
  ; ... body ...
  SEI                ; 2 cycles
  EPCW  r2           ; 2 cycles
  SRW   r1           ; 2 cycles
  RETI               ; 3 cycles
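
To make the reason for the save/restore concrete, here is a toy Python model of the shadow-register behaviour described above. It is only a sketch under assumptions: the `Cpu` class, the `I_FLAG` bit position, the vector address, and the mapping of SRR/SRW onto "save and restore the shadowed status" are all invented for illustration, since the exact semantics aren't spelled out in this thread.

```python
from dataclasses import dataclass

I_FLAG = 0x01  # hypothetical bit position for the interrupt-disable flag


@dataclass
class Cpu:
    pc: int = 0
    st: int = 0   # status; I_FLAG set means interrupts are disabled
    epc: int = 0  # shadow copy of PC taken on interrupt entry
    est: int = 0  # shadow copy of ST taken on interrupt entry

    def enter_interrupt(self, vector: int) -> None:
        """Copy PC/ST to the shadow registers and mask interrupts."""
        self.epc, self.est = self.pc, self.st
        self.st |= I_FLAG
        self.pc = vector

    def reti(self) -> None:
        """Restore PC and ST from the shadow registers; no memory access."""
        self.pc, self.st = self.epc, self.est


def run_nested(save_shadow: bool) -> int:
    """Take an IRQ at 0x1234, then a nested one; return the final PC."""
    cpu = Cpu(pc=0x1234)
    cpu.enter_interrupt(vector=0x0010)
    saved = (cpu.epc, cpu.est) if save_shadow else None  # like EPCR / SRR
    cpu.st &= ~I_FLAG                   # like CLI: nesting allowed from here
    cpu.pc = 0x0020                     # somewhere inside the handler body
    cpu.enter_interrupt(vector=0x0010)  # nested IRQ clobbers EPC and EST
    cpu.reti()                          # nested handler returns to 0x0020
    cpu.st |= I_FLAG                    # like SEI
    if saved is not None:
        cpu.epc, cpu.est = saved        # like EPCW / SRW
    cpu.reti()
    return cpu.pc


# Cycle counts taken from the listing above.
PROLOGUE_CYCLES = 2 + 2 + 2      # SRR, EPCR, CLI
EPILOGUE_CYCLES = 2 + 2 + 2 + 3  # SEI, EPCW, SRW, RETI
```

With the shadow registers saved, `run_nested(True)` ends up back at 0x1234; without the save, the outer RETI returns into the handler body instead, which is why the extra instructions are only needed when interrupts get reenabled.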

jgharston wrote:

ADD Add rd = rs1 + rs2
ADDI Add immediate rd += sext(imm8)

I don't like addressing modes being specified in the instruction, I always find it painful trying to read MASM code. It's much clearer specified in the operands.

Yeah, I waffled on the spelling of this. But they're not... uh, "combinatorially complete":
ADD r1, r2, r3 is totally valid, as is ADD r1, r1, 5, but ADD r1, r2, 5 isn't. You could spell it ADD r1, 5, but this doesn't make it very explicit that this is mutating r1. ADDI r1, 5 makes this clearer.
I'd love to hear more about the prior art, the names for the instructions are waaaay more mutable than silicon.
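
For anyone who wants the two forms side by side, here is a rough Python sketch of the semantics quoted above (`rd = rs1 + rs2` versus `rd += sext(imm8)`). The 16-bit register width and the helper names are assumptions made for the example, not something taken from the ISA doc.

```python
MASK16 = 0xFFFF  # assumption: 16-bit registers


def sext8(imm: int) -> int:
    """Sign-extend an 8-bit immediate field to a Python int."""
    imm &= 0xFF
    return imm - 0x100 if imm & 0x80 else imm


def add(regs: list[int], rd: int, rs1: int, rs2: int) -> None:
    """ADD rd, rs1, rs2: rd = rs1 + rs2 (three independent registers)."""
    regs[rd] = (regs[rs1] + regs[rs2]) & MASK16


def addi(regs: list[int], rd: int, imm8: int) -> None:
    """ADDI rd, imm8: rd += sext(imm8); rd is both source and destination."""
    regs[rd] = (regs[rd] + sext8(imm8)) & MASK16
```

Note that `addi` has no second register parameter at all, which is the asymmetry the separate mnemonic is meant to flag.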

gfoot · Post by **gfoot** » Tue Mar 24, 2026 11:25 pm

mysterymath wrote:

I'm still at it! I hadn't done much work on it in a while; dealing with Verilog is *very* tedious, and I couldn't get the tests working. Thankfully, Claude is rather good at Verilog, which has allowed me to try a lot of ideas very quickly, despite not having very much time for the project. I'm actually targeting TinyTapeout now, and I've been able to more-or-less fit my design in roughly the area that a 6502 Verilog core would take when synthesized with the same tools for TinyTapeout. (With some adjustments for not being able to actually use a register SRAM with this tooling.) I'm still trying to get it down in size for vanity's sake. But it passes the tests! I do want to write a CPU fuzzer, though. The deadline is fast approaching!

It's great to see the progress on this! What is the deadline?

I haven't had much time for this sort of thing recently but look forward to checking it out in more detail soon, especially the instruction set you settled on. Your posts last year inspired me to pivot one of my (very amateurish) hardware designs to use a RISC-V-like instruction set; our goals were similar in many ways and it was a nice chance to learn and experiment with RISC-V. If you're interested, there's a web-based assembler/simulator interface, and documentation with code alongside it on GitHub.

I got as far as designing the instruction set, control signals and microcode, implementing a command line assembler and simulator, writing a lot of library code and test programs, and about halfway through implementing it in VHDL (also a learning experience) before running out of steam.

I stuck more rigidly to RISC-V's principles, for better or worse - this had a fair number of drawbacks at this scale and you've probably made the right decision diverging from it more, e.g. it looks like you have a form of condition flag, and more asymmetric instructions. I particularly felt that RISC-V is lacking some obvious addressing modes that would make life a lot easier in a world with so few registers, like loading or storing at an address formed by adding two registers together. But for me, not knowing RISC-V at all before this, it was important to stick to it as closely as I could and not just write anything off before trying it out.

mysterymath · Post by **mysterymath** » Fri Mar 27, 2026 10:26 pm

gfoot wrote:

It's great to see the progress on this! What is the deadline?

For the shuttle I ended up on, May 11th: https://app.tinytapeout.com/shuttles/ttsky26a
Here's a link to me! https://app.tinytapeout.com/projects/3829

gfoot wrote:

I stuck more rigidly to RISC-V's principles, for better or worse - this had a fair number of drawbacks at this scale and you've probably made the right decision diverging from it more, e.g. it looks like you have a form of condition flag, and more asymmetric instructions. I particularly felt that RISC-V is lacking some obvious addressing modes that would make life a lot easier in a world with so few registers, like loading or storing at an address formed by adding two registers together. But for me, not knowing RISC-V at all before this, it was important to stick to it as closely as I could and not just write anything off before trying it out.

I skimmed through the ISA doc, and this actually looks much like an older version of RISCY-V02. I'm very surprised you were able to fit so many 3-operand instructions. Then again, I was completely unwilling to part with the property that RISC-V has that it's possible to use every 32-bit (in our case, 16-bit) absolute *and* relative address using only one AUIPC/LUI and one regular immediate instruction. That meant that the upper immediate and lower immediate needed to sum to at least 16 bits, which severely constrained the design. Still, it's interesting that this isn't actually a *necessary* property; there are a lot of possible tradeoffs in this space!
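
To show what that property buys, here is a small Python sketch of the usual hi/lo split, scaled down to 16 bits. The field widths are an assumption for illustration (an upper immediate that fills the top `16 - lo_bits` bits plus a signed `lo_bits`-wide lower immediate); the real RISCY-V02 split isn't stated in this thread, and `split_hi_lo` / `rebuild` are made-up names.

```python
ADDR_BITS = 16
ADDR_MASK = (1 << ADDR_BITS) - 1


def split_hi_lo(addr: int, lo_bits: int = 8) -> tuple[int, int]:
    """Split addr so that (hi << lo_bits) + lo == addr (mod 2**16).

    lo is a signed lo_bits-wide immediate, so hi is rounded up whenever
    the low part would otherwise not fit as a non-negative value.
    """
    half = 1 << (lo_bits - 1)
    biased = (addr + half) & ADDR_MASK
    hi = biased >> lo_bits                        # upper-immediate field
    lo = (biased & ((1 << lo_bits) - 1)) - half   # in [-half, half - 1]
    return hi, lo


def rebuild(hi: int, lo: int, lo_bits: int = 8) -> int:
    """What an LUI-style load followed by an add-immediate would compute."""
    return ((hi << lo_bits) + lo) & ADDR_MASK
```

The same split works for the PC-relative case by feeding in `(target - pc) & 0xFFFF` instead of an absolute address. If the two fields together cover fewer than 16 bits, some addresses need a third instruction, which is the tradeoff being discussed.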

gfoot · Post by **gfoot** » Sat Mar 28, 2026 1:02 am

mysterymath wrote:

For the shuttle I ended up on, May 11th: https://app.tinytapeout.com/shuttles/ttsky26a
Here's a link to me! https://app.tinytapeout.com/projects/3829

Wow, what a neat concept TinyTapeout is! Good luck with it!

Quote:

I skimmed through the ISA doc, and this actually looks much like an older version of RISCY-V02. I'm very surprised you were able to fit so many 3-operand instructions.

Do you mean the three-register instructions, with two register operands and one destination register? They actually weren't too bad - instructions with immediates were much more painful in general, especially loads and stores. You can see the encoding definition here, at the top of the file - one of the columns is the number of 65536ths of the instruction space that each instruction occupies: https://raw.githubusercontent.com/gfoot ... /encode.py
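
As a back-of-envelope illustration of why immediates hurt more than register fields, here is a short Python sketch that counts how many of the 65536 possible 16-bit encodings one instruction consumes. The 3-bit register fields (eight registers) and the 8-bit immediate are hypothetical widths picked for the example, not values read from encode.py.

```python
from fractions import Fraction

INSTR_BITS = 16


def slots(*field_bits: int) -> int:
    """Encodings consumed by one instruction with the given operand field widths."""
    return 1 << sum(field_bits)


def share(*field_bits: int) -> Fraction:
    """Fraction of the whole 16-bit instruction space the instruction occupies."""
    return Fraction(slots(*field_bits), 1 << INSTR_BITS)


# Hypothetical widths: 3-bit register numbers, 8-bit immediate.
REG_REG_REG = slots(3, 3, 3)   # three register fields
REG_REG_IMM8 = slots(3, 3, 8)  # two registers plus an 8-bit immediate
```

Under those assumed widths a reg-reg-reg instruction takes 512 of the 65536 slots, while a two-register load or store with an 8-bit offset takes 16384, a full quarter of the space.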

One trick with the reg-reg-reg instructions is to constrain the operand order in commutative instructions, in order to pack two different instructions into a single opcode. e.g. AND rd, rs1, rs2 has the same opcode as ADD rd, rs1, rs2, but the distinction between them is that for AND it is required that rs1 is greater than rs2. The assembler swaps the operands if necessary to ensure this is the case. In some cases I also had special cases for where rs1 and rs2 are the same register, to implement unary operators within the same encoding space.
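
Here is a minimal Python sketch of that operand-ordering trick, for illustration only. It assumes ADD takes the `rs1 <= rs2` half of the shared opcode and simply rejects `AND rd, rs, rs`; how the real encoding treats the equal-register case may well differ, and the function names are invented.

```python
def encode(op: str, rd: int, rs1: int, rs2: int) -> tuple[int, int, int]:
    """Return the (rd, rs1, rs2) fields for ADD or AND under one shared opcode."""
    low, high = sorted((rs1, rs2))
    if op == "ADD":
        return rd, low, high       # ADD is stored with rs1 <= rs2
    if op == "AND":
        if low == high:
            raise ValueError("AND rd, rs, rs has no slot in this sketch")
        return rd, high, low       # AND is stored with rs1 > rs2
    raise ValueError(f"unknown op {op!r}")


def decode(rd: int, rs1: int, rs2: int) -> tuple[str, int, int, int]:
    """Recover the operation from the relative order of the source fields."""
    op = "AND" if rs1 > rs2 else "ADD"
    return op, rd, rs1, rs2
```

Because both operations are commutative, swapping the sources in the assembler never changes the result, so the ordering bit comes for free.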

Quote:

Then again, I was completely unwilling to part with the property that RISC-V has that it's possible to use every 32-bit (in our case, 16-bit) absolute *and* relative address using only one AUIPC/LUI and one regular immediate instruction. That meant that the upper immediate and lower immediate needed to sum to at least 16 bits, which severely constrained the design. Still, it's interesting that this isn't actually a *necessary* property; there are a lot of possible tradeoffs in this space!

Yes, I decided to relax that in my case, as absolute address operations and long jumps seemed relatively rare. I aligned my I/O space (so far it's only two bytes...) such that it is addressable with just LUI and LBU/SB, and some OS variables are also stored close to zero to slightly improve interrupt response. My J/JAL instructions have quite a generous range, since they don't need to specify a register, but a longer jump may require three instructions (e.g. AUIPC, ADDI, JR). There are lots of compromises and I don't think it's clear whether they're good or not until you've written quite a lot of code!

I can see more ways that diverging from RISC-V would reduce these problem cases, but I'm really interested in exactly where you ended up with your ISA so I'll try to check that out in more detail this weekend.
