Using a 16-bit bus wouldn't actually make the processor any faster without also making it bigger. A plain-jane 16-bit implementation would still be quite straightforward, though, if one merely wants a very small processor rather than a 65c02-sized one. That raises the question of misaligned accesses, but it's not a big deal: the processor can simply trap on a misaligned access and emulate it in a trap handler.
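A minimal sketch of the misaligned-access idea in C, assuming a byte-addressed, little-endian memory behind a 16-bit bus that only supports aligned word reads. The names `mem_words`, `bus_read_word`, `read_byte`, `emulate_misaligned_load`, and `load_word` are hypothetical, and the "trap" is modeled as an ordinary function call rather than a real exception:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory: 64 KiB viewed as 32 Ki 16-bit words (little-endian). */
static uint16_t mem_words[32768];

/* The only access the bus supports: an aligned 16-bit read. */
static uint16_t bus_read_word(uint16_t addr) {
    return mem_words[addr >> 1];
}

/* Extract one byte from the aligned word that contains it. */
static uint8_t read_byte(uint16_t addr) {
    uint16_t w = bus_read_word((uint16_t)(addr & 0xFFFEu));
    return (addr & 1u) ? (uint8_t)(w >> 8) : (uint8_t)(w & 0xFFu);
}

/* What the misaligned-load trap handler would do: two aligned reads,
   then stitch the bytes together. */
static uint16_t emulate_misaligned_load(uint16_t addr) {
    assert(addr & 1u); /* only reached for odd addresses */
    uint8_t lo = read_byte(addr);
    uint8_t hi = read_byte((uint16_t)(addr + 1u)); /* wraps at 0xFFFF */
    return (uint16_t)(lo | (hi << 8));
}

/* A 16-bit load as the CPU would see it. */
static uint16_t load_word(uint16_t addr) {
    if ((addr & 1u) == 0)
        return bus_read_word(addr);       /* aligned: one bus cycle */
    return emulate_misaligned_load(addr); /* misaligned: "trap" path */
}
```

With `mem_words[0] = 0x2211` and `mem_words[1] = 0x4433`, an aligned `load_word(0)` returns `0x2211`, while the misaligned `load_word(1)` goes through the handler and returns `0x3322`. Misaligned stores could presumably be handled the same way with a read-modify-write of both words; the cost is only paid when software actually misaligns.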
I left this out for brevity, but the ISA also uses a corner of the instruction space for interrupt handling: control/status registers (CSRs) and the instructions that operate on them. I didn't actually finish this part, so I have much less confidence that it works or fits as is.
The interrupt instructions are:
- BRK (hardware trap)
- RETI (return from interrupt)
- CSRR (read CSR to reg)
- CSRW (write reg to CSR)
- SEI (manipulate CSRs to enable/disable interrupts)
- STP (stop!)
- WAI (wait for interrupt)
The CSRs are:
- IE: whether interrupts are enabled.
- PIE: whether interrupts were enabled before the current interrupt.
- EPC: the PC to return to from the interrupt.
- CAUSE: whether the trap came from BRK or an interrupt (NMI has a separate vector).
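One way these CSRs could interact with BRK, RETI, SEI, CSRR and CSRW, sketched in C. This is an assumption modeled on the RISC-V MIE/MPIE scheme (trap entry copies IE into PIE and clears IE; RETI copies PIE back and jumps to EPC), not a description of the unfinished ISA; the vector addresses, CSR numbers, and all function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector addresses; the ISA's real values are not specified. */
#define VEC_TRAP 0xFFFEu /* shared by BRK and maskable interrupts */
#define VEC_NMI  0xFFFAu /* NMI has its own vector */

enum cause { CAUSE_BRK = 0, CAUSE_IRQ = 1 };
enum csr { CSR_IE, CSR_PIE, CSR_EPC, CSR_CAUSE }; /* hypothetical numbering */

struct cpu {
    uint16_t pc;
    bool ie, pie;
    uint16_t epc, cause;
};

/* Common trap entry: remember IE, disable interrupts, save PC, vector. */
static void enter_trap(struct cpu *c, uint16_t resume_pc, uint16_t vector) {
    assert(vector == VEC_TRAP || vector == VEC_NMI);
    c->pie = c->ie;
    c->ie = false;
    c->epc = resume_pc;
    c->pc = vector;
}

/* BRK: next_pc is where execution should resume after RETI. */
static void do_brk(struct cpu *c, uint16_t next_pc) {
    c->cause = CAUSE_BRK;
    enter_trap(c, next_pc, VEC_TRAP);
}

/* Maskable interrupt: only taken when IE is set. Returns whether taken. */
static bool raise_irq(struct cpu *c) {
    if (!c->ie)
        return false;
    c->cause = CAUSE_IRQ;
    enter_trap(c, c->pc, VEC_TRAP);
    return true;
}

/* NMI ignores IE and leaves CAUSE alone; the vector identifies it. */
static void raise_nmi(struct cpu *c) { enter_trap(c, c->pc, VEC_NMI); }

static void do_reti(struct cpu *c) {
    c->ie = c->pie;
    c->pc = c->epc;
}

static void do_sei(struct cpu *c, bool enable) { c->ie = enable; }

static uint16_t do_csrr(const struct cpu *c, enum csr n) {
    switch (n) {
    case CSR_IE:    return c->ie;
    case CSR_PIE:   return c->pie;
    case CSR_EPC:   return c->epc;
    case CSR_CAUSE: return c->cause;
    }
    return 0;
}

static void do_csrw(struct cpu *c, enum csr n, uint16_t v) {
    switch (n) {
    case CSR_IE:    c->ie = v != 0; break;
    case CSR_PIE:   c->pie = v != 0; break;
    case CSR_EPC:   c->epc = v; break;
    case CSR_CAUSE: c->cause = v; break;
    }
}
```

Keeping only one level of saved state (PIE and EPC) is what makes the scheme small, but in this sketch an NMI arriving inside a handler would overwrite EPC and PIE, so a handler that wants to survive that would have to save them with CSRR first.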