And for RISC: I do not understand how you could read and write on the same bus in one cycle. We do not have enough die area for two busses. At most there could be two sets of registers, each with its own bus. For example, the accumulator latches the output from the ALU, and everything else is on the bus. But then how could AX be used as source and target in every 8-bit CPU? There would need to be two, with a bridge to toggle between them. Two busses do not seem that much more complicated. This reduces code space.
I tried to make a small core with 6502-like instructions (Nano6502) that does all instructions in one cycle, except the first one. I hope to extend it over time to be more 6502- or 65816-like, but it is prioritized on speed, so it's a slow development.
Basically, I prefetch the data byte and have separate data and code space. Memory is also dual-port, so it can be read and written within the same cycle. The prefetch does happen half a cycle in advance, so strictly speaking that is pipelining, but it's only visible on the first instruction after reset.
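To make the dual-port point concrete, here is a rough Python model, not the actual Nano6502 logic. The names DualPortRam and inc_zp are made up for illustration, and it assumes the read port is combinational while the write port commits at the clock edge, so a read-modify-write on one address fits in a single cycle.

```python
class DualPortRam:
    """Toy dual-port RAM: one read port and one write port (hypothetical model)."""

    def __init__(self, size=256):
        self.cells = [0] * size
        self.cycles = 0
        self._pending = None  # write latched for the next clock edge

    def read(self, addr):
        # Read port: combinational, sees the contents from before this cycle's write.
        return self.cells[addr]

    def write(self, addr, value):
        # Write port: only latches the request; nothing changes until clock().
        self._pending = (addr, value & 0xFF)

    def clock(self):
        # Clock edge: commit the latched write (if any) and count one cycle.
        if self._pending is not None:
            addr, value = self._pending
            self.cells[addr] = value
            self._pending = None
        self.cycles += 1


def inc_zp(ram, addr):
    """INC on a zero-page address: read and write back within one cycle."""
    ram.write(addr, ram.read(addr) + 1)
    ram.clock()


ram = DualPortRam()
ram.cells[0x10] = 0xFF
inc_zp(ram, 0x10)  # wraps to 0x00 and costs exactly one clock() call
```

With a single-port memory the same INC would need at least one cycle for the read and another for the write-back.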
The nice thing about single cycle instructions is that zeropage becomes more like 256 registers...
Oh, and all data is fetched as 16 bits, so there are only two-byte instructions (at the moment). Hope to change that.
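As a sketch of what a 16-bit fetch with two-byte instructions could look like: the layout below (opcode in the low byte, operand in the high byte, program counter counting words) is my assumption, not necessarily how the core does it, and decode and fetch are hypothetical helper names. The opcode values are the standard 6502 ones for LDA immediate ($A9) and STA zero page ($85).

```python
def decode(word):
    """Split one 16-bit fetch into (opcode, operand); layout is an assumption."""
    word &= 0xFFFF
    return word & 0xFF, word >> 8


def fetch(code, pc):
    """Fetch one whole instruction per cycle; pc counts 16-bit words here."""
    opcode, operand = decode(code[pc])
    return opcode, operand, pc + 1


# Code space as a list of 16-bit words: LDA #$42 ; STA $10
code = [0x42A9, 0x1085]
opcode, operand, next_pc = fetch(code, 0)
```

Since every instruction is exactly one word, the fetch never has to look at the opcode to know how far to advance, which is part of what keeps it single-cycle.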