Arlet wrote:
Windfall wrote:
Yes. But it replaces the cycle that would load only the opcode instead. So there is no extra cost. Just the gain of no longer needing to load any of the argument bytes in the following cycles.
In the current design, the address (PC) is presented to the memory in cycle #0, the memory fetches the opcode, and the opcode is ready for decoding in cycle #1. The opcode isn't stored in flip-flops in the core; instead, the decoding logic looks straight at the data bus.
Okay, I see. You've shifted the paths a little by preprocessing the opcode while it's not even registered yet.
Arlet wrote:
With your proposal, the address is presented to the memory in cycle #0, the memory fetches a 32-bit word, which will be ready in cycle #1. The difference is that we can't feed the data straight into the decoder, because we first need to select the correct byte out of 8 possibilities.
No, 5. And we can make it 4, because we can arrange the opcode to always come out of PW. There is just one corner case to fix. If IR consumes the last byte of PW exactly (in the code example I posted earlier this happens when the fourth instruction is constructed), we replace PW and PWA anyway, even though the word read has not contributed.
See what happened here? The critical path may have become shorter here, since this replaces the memory access path with a combinatorial path involving only PW and the two least significant bits of the PC.
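To make the byte selection concrete, here is a rough behavioural sketch in Python (not HDL, and not code from either design). It only models picking the opcode out of PW using the two least significant bits of the PC. The function name `select_byte` and the little-endian lane order (lane 0 in bits 7..0) are my assumptions for illustration, and it does not model PWA or the word-crossing corner case.

```python
def select_byte(pw: int, pc: int) -> int:
    """Pick one byte out of the 32-bit prefetch word PW.

    Hypothetical helper: assumes lane 0 is the least significant
    byte of PW (little-endian lane order).
    """
    lane = pc & 0b11                  # two least significant bits of the PC
    return (pw >> (8 * lane)) & 0xFF  # 4-to-1 byte mux


# Example word holding the bytes A9 FD 00 60 in lanes 0..3.
pw = 0x6000FDA9
for pc in range(0x1000, 0x1004):
    print(f"PC={pc:04X} -> byte {select_byte(pw, pc):02X}")
```

In hardware this would amount to a 4-to-1 byte multiplexer in front of the decoder, which is the combinatorial path referred to above.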