ElEctric_EyE wrote:
Arlet briefly mentioned here about modifying his core to reduce the cycle time of the 2-cycle opcodes down to 1 cycle. These are opcodes like INX, TXA, PLA, etc. I didn't want to disturb that thread, so I ask him here: How would one go about this?
PLA is a different beast, because it requires a memory read. A first step would be to consider all instructions that don't need memory access, so no operands, and no stack usage. The shift/rotate accumulator instructions can be left out in the first step. That leaves: the clear/set flag instructions, register inc/dec, and register-register transfers.
Reducing these to 1 cycle requires updating the instruction decode so that these instructions go to the DECODE state instead of the REG state. So, instead of DECODE->REG->DECODE->REG, the sequence becomes DECODE->DECODE. In the current situation, when you do INX for instance, the ALU is fed with 'X' and '0', with CI=1. This is done in the REG cycle. The ALU takes one cycle to do the addition, so in the next (DECODE) cycle the ALU outputs X + 1, and this is stored back into the register file.
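As a rough illustration of the decode change, the candidate opcodes could be picked out with a small combinational decoder like the one below. This is only a sketch: the module name and the 'one_cycle' signal are made up for the example and are not part of the core. The opcode values are the standard 6502 encodings.

```verilog
// Sketch only: 'one_cycle_decode' and 'one_cycle' are hypothetical names.
module one_cycle_decode (
    input  wire [7:0] ir,        // opcode being decoded
    output reg        one_cycle  // 1 = candidate for DECODE->DECODE
);
    always @* begin
        case (ir)
            // CLC    SEC    CLI    SEI    CLV    CLD    SED
            8'h18, 8'h38, 8'h58, 8'h78, 8'hB8, 8'hD8, 8'hF8,
            // INX    INY    DEX    DEY
            8'hE8, 8'hC8, 8'hCA, 8'h88,
            // TAX    TXA    TAY    TYA    TSX    TXS
            8'hAA, 8'h8A, 8'hA8, 8'h98, 8'hBA, 8'h9A:
                one_cycle = 1'b1;
            default:
                one_cycle = 1'b0;
        endcase
    end
endmodule

// In the DECODE branch of the state machine, the idea would then be
// something along the lines of:
//     state <= one_cycle ? DECODE : REG;
```

The real decoder would more likely fold this into its existing opcode patterns rather than list every opcode, but the explicit list makes it easy to see which instructions are affected.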
If you want to save a cycle (without adding a pipeline stage), there should be a direct path from register file -> inc/dec -> register file, without the 1 cycle ALU delay. One option could be to add a simple incrementer, only capable of doing -1, +0, +1. The output from that incrementer should be written to the register file instead of the output from the ALU. Leaving out the BCD logic, the register file write currently looks like this:
Code:
AXYS[regsel] <= (state == JSR0) ? DIMUX : ADD;
Where 'ADD' is the ALU output. This should be changed to something like this:
Code:
AXYS[regsel] <= (state == JSR0) ? DIMUX :
                bypass          ? INCR  : ADD;
Where INCR is the output of the new incrementer, and 'bypass' is a register that is set to '1' for the new 1-cycle opcodes.
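A minimal sketch of what such an incrementer might look like is shown below. The 'inc' and 'dec' controls, the 'regfile' input name and the module name are assumptions made for the example. Only INCR corresponds to a name used above.

```verilog
// Sketch only: 'incr_bypass', 'regfile', 'inc' and 'dec' are hypothetical names.
module incr_bypass (
    input  wire [7:0] regfile,  // value read from the register file this cycle
    input  wire       inc,      // assumed: 1 for INX/INY
    input  wire       dec,      // assumed: 1 for DEX/DEY
    output wire [7:0] INCR      // regfile + 1, regfile - 1, or regfile + 0
);
    // inc and dec are assumed never to be set together. With both clear
    // the value passes through unchanged, which is the +0 case that
    // register-register transfers would use.
    assign INCR = inc ? regfile + 8'd1 :
                  dec ? regfile - 8'd1 :
                        regfile;
endmodule
```

Because this is purely combinational, the result is available in the same cycle the register is read, which is the point of bypassing the ALU.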
Of course, you also need to update the flags.
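For the N and Z flags, one possible approach is to derive them directly from the incrementer output in the same cycle. The sketch below assumes an RDY clock enable and a hypothetical 'load_nz' signal that is 0 for instructions that must not touch N and Z (TXS and the clear/set flag instructions). In a real design this would be merged with the existing flag update logic rather than living in its own module.

```verilog
// Sketch only: 'bypass_flags' and 'load_nz' are hypothetical names.
module bypass_flags (
    input  wire       clk,
    input  wire       RDY,      // assumed clock enable
    input  wire       bypass,   // 1 for the new 1-cycle opcodes
    input  wire       load_nz,  // assumed: 0 for TXS and flag set/clear
    input  wire [7:0] INCR,     // incrementer output
    output reg        N,
    output reg        Z
);
    always @(posedge clk)
        if (RDY && bypass && load_nz) begin
            N <= INCR[7];          // sign bit of the result
            Z <= (INCR == 8'd0);   // result is zero
        end
endmodule
```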