An optimized version of the UM/MOD fix can be found here (along with the UM* fix):
Whoops, wrong link. Here is the correct one:
http://6502org.wikidot.com/errata-software-figforth
Though in this case, I'd use Y as the data stack pointer, and reference the data stack in direct-page using absolute addresses.
<snip>
I think it's only one cycle longer than DP,X mode, which shouldn't be much of an issue, I'd think.
Only writes are one cycle longer (STA abs,Y takes 5 cycles vs. 4 for STA zp,X). For non-writing instructions (assuming no page boundary crossings), abs,Y and zp,X are both 4 cycles. However, unlike zp,X, abs,Y is not available with INC, DEC, STZ, or the shifts and rotates, so words like 1+ and 2* (CELLS) would have to go through the accumulator and would cost several more cycles.
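To put rough numbers on that, here is a sketch of 1+ both ways. The labels ONEPLUS_X and ONEPLUS_Y are made up for the comparison, the abs,Y version assumes a 65C02 (for INC A) and a data stack based at address 0, and the cycle counts assume no page crossings:

```asm
; 1+ with X as the data stack pointer (zp,X)
ONEPLUS_X INC 0,X   ; 6 cycles
 BNE :1             ; 3 if taken (the usual case)
 INC 1,X            ; 6, only when the low byte wraps to 0
:1 JMP NEXT

; 1+ with Y as the data stack pointer (abs,Y)
ONEPLUS_Y LDA 0,Y   ; 4 cycles (assembles as abs,Y)
 INC A              ; 2 (65C02 only)
 STA 0,Y            ; 5 (STA leaves the Z flag from INC A intact)
 BNE :2             ; 3 if taken (the usual case)
 LDA 1,Y            ; 4
 INC A              ; 2
 STA 1,Y            ; 5
:2 JMP NEXT
```

On the usual path (no carry into the high byte) that is 9 cycles before the JMP NEXT for zp,X and 14 for abs,Y.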
I would (blindly) guess that most words change only 1 or 2 cells, so that would be 2 or 4 cycles (one per byte per cell), for the STAs.
The only difficulty would come from when you need to invoke @ and !, I think, as here you'd need to tweak the Y register and a reserved direct-page location to serve as your pointer.
Well, for C@ (and C!) you could use self-modifying code, if the data stack were still on the zero page:
Code:
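; C@ (data stack in zero page, indexed by Y; LDA (zp) needs a 65C02)
 STY :1+1    ; patch the operand of the LDA below with the stack index
:1 LDA (0)   ; fetch the byte at the address held in the top stack cell
 STA 0,Y     ; low byte of the result (assembles as abs,Y)
 LDA #0
 STA 1,Y     ; high byte of the result is 0
 JMP NEXT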
but it might be better to split the data stack into a high-byte stack and a low-byte stack, since only C@ and C! (and maybe COUNT, if it were implemented as a primitive) seem to benefit from keeping the high and low bytes together. That way DROP would be only 1 INY rather than 2 INXs, so you'd get back a couple of the cycles lost elsewhere to STA abs,Y vs. STA zp,X. Likewise for pushing onto the stack with DUP and co. The split would seem to benefit a lot more words.
However, abs,Y is of course 1 byte longer than zp,X, and there are 2 of those for every byte saved by using 1 INY instead of 2 INXs. Plus there's the additional space (and, as you mentioned, time) needed to copy to the zero page so that it can be used as a pointer. So primitives will take up more space, for what may on average be only a slight speed improvement at best. Also, in the 17-cycle NEXT, Y can be overwritten without penalty (see below), whereas in the 12-cycle NEXT both index registers are used and must be preserved. I seem to have underestimated the 12-cycle NEXT, but I'm still not sure I see a decisive advantage there.
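For a concrete sense of the trade-off, here is a sketch of DROP and DUP in both layouts. LO and HI are hypothetical base addresses for the low-byte and high-byte stacks, the _X and _Y labels are made up for the comparison, the stacks are assumed to grow downward, and the counts (which exclude the JMP NEXT) assume no page crossings:

```asm
; combined stack in zero page, X as the pointer
DROP_X INX          ; 2 cycles, 1 byte
 INX                ; 2 cycles, 1 byte
 JMP NEXT           ; DROP_X: 4 cycles, 2 bytes

DUP_X DEX           ; 2 cycles, 1 byte
 DEX                ; 2 cycles, 1 byte
 LDA 2,X            ; 4 cycles, 2 bytes
 STA 0,X            ; 4 cycles, 2 bytes
 LDA 3,X            ; 4 cycles, 2 bytes
 STA 1,X            ; 4 cycles, 2 bytes
 JMP NEXT           ; DUP_X: 20 cycles, 10 bytes

; split stack, Y as the pointer
DROP_Y INY          ; 2 cycles, 1 byte
 JMP NEXT           ; DROP_Y: 2 cycles, 1 byte

DUP_Y DEY           ; 2 cycles, 1 byte
 LDA LO+1,Y         ; 4 cycles, 3 bytes
 STA LO,Y           ; 5 cycles, 3 bytes
 LDA HI+1,Y         ; 4 cycles, 3 bytes
 STA HI,Y           ; 5 cycles, 3 bytes
 JMP NEXT           ; DUP_Y: 20 cycles, 13 bytes
```

In this sketch DROP gets 2 cycles faster and 1 byte shorter, while DUP stays at 20 cycles but grows by 3 bytes.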
Anyway, shifting gears back to the 17-cycle NEXT, is there a benefit to requiring primitives to preserve Y in order to save the 2-cycle LDY #1? LDA #imm takes the same 2 cycles as TYA. In other words, why not this:
Code:
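NEXT LDA #2      ; default: advance by one cell (2 bytes)
NEXT1 CLC        ; alternate entry: A already holds the number of bytes to advance
 ADC :1+1        ; add to the low byte of the JMP operand below
 STA :1+1
 BCS :2          ; carry out of the low byte: go bump the high byte
:1 JMP ($FFFF)   ; operand is patched at run time
:2 INC :1+2      ; this doesn't affect the carry so the BCS always branches
 BCS :1          ; or JMP to ensure it always takes 3 cycles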
You could then JMP to NEXT1 (after loading the accumulator appropriately, of course) for words that need to advance more than two bytes, e.g.: