BigEd wrote:
I'm sure visual6502 is right. As a quick check, we know that
RTS takes 6 cycles, and that's what happens here.
So, we might ask what happens in each cycle?
Here's a quick simulation showing a bit more detail:
Cycle 1: fetch RTS - the CPU can't do anything else as it hasn't seen the instruction
Cycle 2: fetch possible operand, as it always does, and must, because it hasn't had time to decode the instruction. But it does send SP to the ALU, and also to the address bus
Cycle 3: SP to address bus, read the empty byte below the stack, not useful but it has to do something. ALU performs the increment for the next cycle
Cycle 4: SP+1 to address bus, reading PC-1 low byte, will be directed to ALU. ALU performs another increment
Cycle 5: SP+2 to address bus from the ALU, also updates the value of SP. ALU performs a NOP with PC-1 low byte, PC-1 high byte is read
Cycle 6: PC is updated with PC-1, new PC sent to address bus, byte before destination is read because something has to happen, PC will be incremented as usual
Cycle 1: fetch next opcode as per usual
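The cycle list above can be sketched as a small bus trace. This is only an illustration of the address sequence, not a gate-level model: `rts_bus_trace`, the dict-based `mem`, and the note strings are made-up names for this example, and unmapped addresses are assumed to read as 0.

```python
def rts_bus_trace(mem, pc, sp):
    """Return (trace, new_pc, new_sp) for one RTS.

    mem: dict of address -> byte (hypothetical memory model).
    pc:  address of the RTS opcode.
    sp:  8-bit stack pointer before the RTS.
    """
    trace = []

    def read(cycle, addr, note):
        # Every cycle puts some address on the bus, useful or not.
        trace.append((cycle, addr, note))
        return mem.get(addr, 0)

    read(1, pc, "fetch RTS opcode")
    read(2, (pc + 1) & 0xFFFF, "read next byte, discarded")
    read(3, 0x0100 | sp, "dummy stack read while SP is incremented")
    sp = (sp + 1) & 0xFF
    pcl = read(4, 0x0100 | sp, "pull low byte of PC-1")
    sp = (sp + 1) & 0xFF
    pch = read(5, 0x0100 | sp, "pull high byte of PC-1")
    pulled = (pch << 8) | pcl
    read(6, pulled, "dummy read at PC-1, then PC is incremented")
    new_pc = (pulled + 1) & 0xFFFF
    return trace, new_pc, sp


# Example: a JSR at $8000 left $8002 on the stack; the RTS sits at $9000.
mem = {0x01FE: 0x02, 0x01FF: 0x80}
trace, new_pc, new_sp = rts_bus_trace(mem, 0x9000, 0xFD)
for cycle, addr, note in trace:
    print(f"cycle {cycle}: ${addr:04X}  {note}")
```

The next opcode fetch would then go to `new_pc`, which is one past the address read in cycle 6.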
Looking back through history to the 6800, we see it's one cycle quicker on RTS because the PC value is read from the stack already incremented. But the JSR takes 8 or even 9 cycles, so overall the 6502 is an improvement.
Well, there are several instructions that take more cycles than they have useful bus accesses, so 6 cycles doesn't mean that it had to do all this. INY, for example, takes two cycles, but only the opcode fetch matters; the read in the second cycle is thrown away.
As you say, the ALU already gets SP in cycle 2. It is somewhat strange that they did not choose to increment SP right there. My guess is that they reuse the hardware, and as you said, there is no separate register storage, so it was probably done that way for improved density.
In cycle 3, SP gets to the address bus. In effect, it is using the address bus as a register to avoid the need for a separate one. Therefore it takes one extra cycle to add 1 to the address on the bus.
For cycle 6, the stored PC-1 is used as the new PC. I don't understand why they need to do this, because apparently the byte at PC-1 is read (it appears on the data bus). And why would JSR take longer if PC were stored instead of PC-1?
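To make the PC-1 convention concrete, here is a minimal sketch of the JSR/RTS round trip. It assumes the commonly documented 6502 order, where JSR pushes the return address before it fetches the high byte of the target, so the value it has on hand is the address of its own last byte. The function names `jsr` and `rts` and the dict-based `mem` are invented for this example.

```python
def jsr(mem, pc, sp):
    """Model JSR at address pc; return (target, new_sp)."""
    lo = mem[(pc + 1) & 0xFFFF]        # low byte of target is fetched first
    ret = (pc + 2) & 0xFFFF            # PC now points at the JSR's last byte
    mem[0x0100 | sp] = ret >> 8        # push high byte
    sp = (sp - 1) & 0xFF
    mem[0x0100 | sp] = ret & 0xFF      # push low byte
    sp = (sp - 1) & 0xFF
    hi = mem[ret]                      # only now fetch high byte of target
    return (hi << 8) | lo, sp


def rts(mem, sp):
    """Model RTS; return (new_pc, new_sp)."""
    sp = (sp + 1) & 0xFF
    lo = mem[0x0100 | sp]
    sp = (sp + 1) & 0xFF
    hi = mem[0x0100 | sp]
    # The pulled value is "next instruction - 1", so add 1 to resume.
    return (((hi << 8) | lo) + 1) & 0xFFFF, sp


# JSR $9000 located at $8000: bytes 20 00 90
mem = {0x8000: 0x20, 0x8001: 0x00, 0x8002: 0x90}
target, sp = jsr(mem, 0x8000, 0xFF)
resume, sp = rts(mem, sp)
print(f"target=${target:04X} resume=${resume:04X} sp=${sp:02X}")
```

In this model, storing PC instead of PC-1 would mean either incrementing the value before the push or waiting until the high byte has been fetched, which is one way to read the extra cycles.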