In the lower part of this diagram there are three routines, which I've named ROUTINE1, ROUTINE2 and ROUTINE3. Of course they don't actually do anything, because they're mostly NOPs. But we will see how they can be invoked in sequence by CALLER (at the top of the diagram). We want CALLER to execute ROUTINE1, ROUTINE2, ROUTINE3 (etc.) almost as if we had explicitly coded JSR ROUTINE1, JSR ROUTINE2 and so on.
But CALLER has no JSR opcodes; it is just a list of addresses -- a form of Direct Threaded Code (DTC). For ease of discussion I've located ROUTINE1 at address 1111, ROUTINE2 at 2222 and so on. If we assume the 65816 Stack Pointer has been set to 3FFF and an RTS has been executed then we'll see the process set in motion. (Some of you will immediately grasp this, but here are the exact details.)
- The RTS pops the 16-bit value at (SP+1) and increments SP by 2. The popped value goes to the PC, which is subsequently incremented by 1. In this case the 16-bit value at 4000 is 1110, so the PC ends up at 1111 -- the address of ROUTINE1. SP is left at 4001. Machine code is fetched and executed starting at 1111. When the work (represented by NOPs) is done the routine exits with an RTS.
- Again RTS pops the 16-bit value at (SP+1) and increments SP by 2. The popped value goes to the PC, which is subsequently incremented by 1. In this case the 16-bit value at 4002 is 2221, so the PC ends up at 2222 -- the address of ROUTINE2. SP is left at 4003. Machine code is fetched and executed starting at 2222. When the work is done the routine exits with an RTS.
- Again RTS pops the 16-bit value at (SP+1) and the pattern continues. SP is left at 4005 and machine code is fetched and executed starting at 3333. When the work is done ROUTINE3 exits with an RTS.
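The sequence above can be traced with a small simulation. This is only a sketch in Python, not 65816 code: it models nothing but the stack behaviour of RTS described in the steps, and it assumes the addresses in the text are hexadecimal and that the address list is stored low byte first. The names `build_memory`, `rts` and `run` are made up for this illustration.

```python
# Minimal model of RTS threading, using the addresses from the text (hex).
ROUTINES = {0x1111: "ROUTINE1", 0x2222: "ROUTINE2", 0x3333: "ROUTINE3"}

def build_memory(table_base, routine_addrs):
    """Store (address - 1) for each routine as a 16-bit word, low byte first."""
    memory = {}
    for i, addr in enumerate(routine_addrs):
        word = (addr - 1) & 0xFFFF
        memory[table_base + 2 * i] = word & 0xFF      # low byte
        memory[table_base + 2 * i + 1] = word >> 8    # high byte
    return memory

def rts(memory, sp):
    """Model RTS: pull 16 bits starting at SP+1, add 2 to SP, PC = value + 1."""
    lo = memory[(sp + 1) & 0xFFFF]
    hi = memory[(sp + 2) & 0xFFFF]
    pc = (((hi << 8) | lo) + 1) & 0xFFFF
    return pc, (sp + 2) & 0xFFFF

def run(memory, sp, count):
    """Execute `count` RTS instructions, recording where each one lands."""
    trace = []
    for _ in range(count):
        pc, sp = rts(memory, sp)
        # The routine body (NOPs) would run here, then end with its own RTS.
        trace.append((ROUTINES[pc], pc, sp))
    return trace

# CALLER: the address list at 4000, with SP preset to 3FFF.
memory = build_memory(0x4000, [0x1111, 0x2222, 0x3333])
trace = run(memory, 0x3FFF, 3)
for name, pc, sp in trace:
    print(f"{name}: PC={pc:04X} SP={sp:04X}")
```

Each printed line should match one step of the walkthrough: the PC lands on the routine's address while SP advances by 2 through the list.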
The S register in this setting acts as the "Interpretive Pointer," i.e., the FORTH "program counter," and RTS is what vectors us to the NEXT routine to execute. FORTH implementations vary widely, but dclxvi's use of SP and RTS is radically unusual.