Help! I've implemented and used many Forth-like STC systems over the decades, so I am very rusty on classic Forth. I kind of miss the idea of a thin inner interpreter and all the niceties that go with it (such as simplicity, the ability to decompile threaded code easily, etc.). I am trying to wrap my noodle around how a DTC inner interpreter may be implemented.
Assuming Y is DSP and X is IP, I would like to keep TOS in A. That doesn't leave many registers, so NEXT can be INX INX JMP (0,X), which is 5/10 (bytes/cycles). Primitives can inline NEXT or JMP to it at the end. So far so good.
EXIT is basically PLX followed by NEXT, 6/15.
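To make the register roles concrete, here is a rough sketch of NEXT, EXIT and one primitive. It assumes 16-bit A and index registers, a data stack that grows downward with Y pointing at the second item (TOS stays in A), and a thread that JMP (0,X) can reach. The stack layout and the generic assembler syntax are my assumptions, not settled design.

```
; NEXT written out (pre-increment form): 5 bytes, 10 cycles
        INX             ; 1/2
        INX             ; 1/2  X now points at the next cell of the thread
        JMP (0,X)       ; 3/6  jump to the code address stored in that cell

; EXIT: 6 bytes, 15 cycles including its inlined NEXT
EXIT:   PLX             ; 1/5  restore the caller's IP from the return stack
        INX
        INX
        JMP (0,X)

; DUP as a sample primitive (assumed layout: Y points at the second stack item)
DUP:    DEY
        DEY             ; make room for one cell below the current items
        STA $0000,Y     ; store a copy of TOS there; A still holds TOS
        INX             ; inlined NEXT
        INX
        JMP (0,X)
```

Because NEXT increments before it jumps, X always points at the cell currently being executed, so the IP saved by a caller can be restored with a plain PLX and then stepped by the NEXT that follows.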
High level words work fine if I inline the entry code as a thought experiment:
Code:
PHX ;1/4 save IP
LDX #HERE ;3/3 set new IP (immediate: address of the thread)
JMP (0,X) ;3/6
HERE:
dw DUP
..
dw EXIT
That seems likely to work, but that's 7 bytes and 13 cycles. I am not too crazy about hardcoding a literal address, but getting the processor PC any other way is even worse, and we have no registers to keep W (and probably wouldn't want to as it's bigger and longer). Not that I would do this for real, of course, because DOCOL.
Now, DOCOL will need to swap the (return address-1) at the top of the stack with X (IP). Since I have no spare registers, this turns ugly enough for me to give up on the whole idea.
The best I've come up with is to start all high-level words with PHX JSR DOCOL (4/10), and have DOCOL pull the (return address-1) into X and increment it. That leaves the old IP on the return stack, and avoids the ugly swapping.
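Spelled out, the PHX JSR DOCOL scheme might look like the sketch below. SQUARE and STAR are hypothetical names used only for illustration, and the cycle counts assume 16-bit index registers.

```
; Header of every high-level word: 4 bytes, 10 cycles
SQUARE: PHX             ; 1/4  save the caller's IP on the return stack
        JSR DOCOL       ; 3/6  pushes the address of its own last byte (thread - 1)
        dw DUP          ; the thread starts right after the JSR
        dw STAR         ; hypothetical multiply primitive
        dw EXIT

; Shared entry code: 5 bytes, 13 cycles
DOCOL:  PLX             ; 1/5  X = thread - 1 (the address JSR pushed)
        INX             ; 1/2  X = address of the first cell
        JMP (0,X)       ; 3/6  run the first cell; later cells go through NEXT
```

After the PLX, the caller's IP is on top of the return stack again, which is exactly what EXIT expects. Note that, if I have counted right, the DOCOL body adds 13 cycles on top of the 10 spent in the header, so the full cost of entering a high-level word is closer to 23 cycles.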
Notes:
* I would have preferred to use X for DSP, but there is no JMP (0,Y).
* I thought about DP as DSP, but incrementing and decrementing it seems too painful.
* If the worst comes to worst, I could give up on A, especially if the win in DOCOL is big enough to compensate for slightly bigger/slower datastack manipulation.
* Overhead: NEXT 5/10, high-level entry 4/10, EXIT 6/15 (including a NEXT). Not too bad: a minimum of 20 cycles vs. 12 for JSR/RTS.
Any suggestions? Am I missing something obvious? Thanks in advance.