BruceRMcF wrote:
... but it is also true that a 65816 version would be constantly swapping back and forth between eight bit and sixteen bit data mode (though the indexes could stay in 16bit mode). ...
For a version located in RAM/ROM at or above $8000 and with a maximum of 64 primitives (so no need to shift), I think it might be something like:
Code:
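EXIT:   PLY             ; pop the saved thread pointer
NEXT:   INY             ; step to the next byte of the thread
NEXT0:  SEP #$20        ; 8-bit accumulator for the byte fetch
        LDA $0000,Y     ; fetch token, or high byte of a word address
        BMI ENTER       ; bit 7 set: address of a high-level word
        STA OPCODE      ; bit 7 clear: one-byte primitive token
        REP #$20        ; back to 16-bit accumulator
        JMP (OPCODE)    ; dispatch to the primitive
ENTER:  XBA             ; move the high byte into B
        INY
        LDA $0000,Y     ; fetch the low byte of the address
        PHY             ; save the thread pointer on the return stack
        REP #$20
        TAY             ; Y = start of the called word's thread
        BRA NEXT0       ; execute its first byte without the INY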
With no code field and all primitives one byte, it would tend to be very compact. And it has the advantage of all bit-threaded code: you do ENTER directly, INSTEAD of the indirect call (in ITC) or direct call (in DTC) ... where in most models you do the call and THAT does the ENTER operation.
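To make the compactness concrete, here is a rough sketch of how two colon definitions might be laid out for the interpreter above. The token values, the word names and the ca65-style .byte directive with the < and > byte operators are all my assumptions for illustration; the only things taken from the code are that a primitive is one byte with bit 7 clear, and that a call is the word's address stored high byte first (ENTER does XBA on the first byte it fetched).
Code:
; hypothetical token values: even, below $80
T_EXIT  = $00
T_DUP   = $02
T_STAR  = $04

; : SQUARE DUP * ;      3 bytes of thread, no code field
SQUARE: .byte T_DUP, T_STAR, T_EXIT

; : QUAD SQUARE SQUARE ;   5 bytes of thread
QUAD:   .byte >SQUARE, <SQUARE  ; high byte first, bit 7 set -> ENTER
        .byte >SQUARE, <SQUARE
        .byte T_EXIT
Since ENTER pushes Y while it points at the low byte of the call, EXIT's PLY followed by the INY at NEXT resumes at the byte after the call.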
But it loses part of the advantage of the 65816 by doing its opcode fetching a byte at a time, and it adds 6 clock cycles of overhead (SEP and REP are 3 cycles each) in setting then resetting the 8-bit/16-bit mode flag.
It does rely on an X-indexed data stack, so it gets to use the "LDA (ST,X) : STA ST,X" for @ that the 65816 gives you with a direct page, X-indexed data stack. And you can put ST at $F8 for direct access to the top four cells of the stack relative to X ... ST,X; ST+2,X; ST+4,X and ST+6,X. With Forth's efficient use of the return stack, you can leave the initial direct page at page 0 and the stack page at page 1, for convenience, while reserving 248 bytes of the direct page for other uses.
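As a sketch of what primitives on that stack might look like: the labels FETCH and PLUS are hypothetical, and I am assuming the stack grows toward lower addresses (so a drop is two INXs), that primitives are entered with the 16-bit accumulator (the REP #$20 before the dispatch) and that each one ends by jumping back to NEXT.
Code:
FETCH:  LDA (ST,X)      ; read the cell whose address is on top of the stack
        STA ST,X        ; replace that address with the value fetched
        JMP NEXT

PLUS:   CLC
        LDA ST,X        ; top of stack
        ADC ST+2,X      ; plus second on stack
        STA ST+2,X      ; sum replaces second on stack
        INX             ; drop the old top cell
        INX
        JMP NEXT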