I've been asked for a clarification of this topic. (The topic begins
here.) But right now I want to backtrack and lay out the basics of how
dclxvi's 65816 six-cycle NEXT operates -- as I probably should have done in the first place!
Then in an additional post (below) I'll touch on my hardware mod and the problem it solves.
Attachment:
RTS threading 0 rev1.gif [ 10.27 KiB | Viewed 3523 times ]
In the lower part of this diagram there are three routines which I've named ROUTINE1, ROUTINE2, ROUTINE3. Of course they don't actually
do anything because they're mostly NOPs. But we will see how they can be invoked in sequence by
CALLER (at the top of the diagram). We want
CALLER to execute ROUTINE1, ROUTINE2, ROUTINE3 (etc) almost as if we had explicitly coded
JSR ROUTINE1,
JSR ROUTINE2 and so on.
But
CALLER has no JSR opcodes; it is just
a list of addresses -- a form of Direct Threaded Code (DTC). For ease of discussion I've located ROUTINE1 at address 1111, ROUTINE2 at 2222 and so on. If we assume the 65816 Stack Pointer has been set to
3FFF and an RTS has been executed then we'll see the process set in motion. (Some of you will immediately grasp this, but here are the exact details.)
The RTS pops the 16-bit value at ( SP+1 ) and increments SP by 2. The popped value goes to the PC, which is subsequently incremented by 1. In this case the 16-bit value at 4000 is 1110, so the PC ends up at 1111 -- the address of ROUTINE1. SP is left at 4001. Machine code is fetched and executed starting at 1111. When the work (represented by NOPs) is done the routine exits with an RTS.
Again RTS pops the 16-bit value at (SP+1) and increments SP by 2. The popped value goes to the PC, which is subsequently incremented by 1. In this case the 16-bit value at 4002 is 2221, so the PC ends up at 2222 -- the address of ROUTINE2. SP is left at 4003. Machine code is fetched and executed starting at 2222. When the work is done the routine exits with an RTS.
Again RTS pops (SP+1) and the pattern continues. SP is left at 4005 and machine code is fetched and executed starting at 3333. When the work is done ROUTINE3 exits with an RTS.
This demonstrates how the S register and RTS can be used as an Interpreter to "execute" a list of addresses. The advantage is that no JSR instructions are used -- only RTS. There is a substantial benefit in that the cost for each routine called is 2 bytes (not 3), and 6 cycles -- not
12 !
The S register in this setting acts as the "Interpretive Pointer," ie; the FORTH "program counter," and RTS is what vectors us to the NEXT routine to execute. FORTH implementations vary widely.
dclxvi's use of SP and RTS is radically unusual.