So one of the things that will make you want to bang your head on the table about Forth is that between FIG and F83, they changed the way loops will work. These days, if you create a loop such as 0 0 DO, it will go through the complete word size instead of just quitting. Formally, the loop is complete when the boundry between limit-1 and limit is crossed, or something to that effect. Seriously, why?
Whatever. Mike had pointed out that there is a clever way to code this using the Overflow Flag (V) on the 65m32 (see
http://forum.6502.org/viewtopic.php?f=9&t=2026). I've started to create versions for Tali Forth (which up till now used the FIG Forth loops) on the 65c02. In case anybody has the same problem, these are the code snippets for (DO), I, J, and (LOOP). I'll post (+LOOP) when I get around to writing it. Note these routines only have had minimal testing so far, so I'm not sure about the edge cases.
First,
(DO)Code:
l_pdo: bra a_pdo
.byte CO+NC+$04
.word l_i ; link to I
.word z_pdo
.byte "(DO)"
.scope
a_pdo: ; first step: create fudge factor (FUFA) by subtracting the limit
; from $8000, the number that will trip the overflow flag
sec
lda #$00
sbc 3,x ; LSB of limit
sta 3,x ; save FUFA for later use
lda #$80
sbc 4,x ; MSB of limit
sta 4,x ; save FUFA for later use
pha ; FUFA replaces limit on R stack
lda 3,x ; LSB of limit
pha
; second step: index is FUFA plus original index
clc
lda 1,x ; LSB of original index
adc 3,x ; add LSB of FUFA
sta 1,x
lda 2,x ; MSB of orginal index
adc 4,x ; add MSB of FUFA
pha
lda 1,x ; LSB of index
pha
; we've saved the FUFA on the NOS of the R stack, so we can
; use it later
inx
inx
inx
inx
z_pdo: rts
Then
I and J:
Code:
l_i: bra a_i
.byte NC+CO+$01
.word l_j ; link to J
.word z_i
.byte "I"
.scope
a_i: dex
dex
; get the fudged index off of the top of the stack. it's
; easier to do math on the stack directly than to pop and
; push stuff around
stx TMPX
tsx
sec
lda $0101,x ; LSB
sbc $0103,x
sta TMPCNT
lda $0102,x ; MSB
sbc $0104,x
ldx TMPX
sta 2,x ; MSB of de-fudged index
lda TMPCNT
sta 1,x ; LSB of de-fudged index
z_i: rts ; should be never reached, because NC
The code for
J is basically the same except that the addresses on the stack are each four bytes down. Finally,
(LOOP):
Code:
l_ploop: bra a_ploop
.byte NC+CO+$06
.word l_abs ; link to ABS
; .word l_pploop ; link to PPLOOP # TODO change
.word z_ploop
.byte "(LOOP)"
.scope
a_ploop: ; TOP of the Return Stack has the index. We manipulate the
; 65c02 stack in place.
stx TMPX
tsx
clc
lda $0101,x ; LSB
adc #$01
sta $0101,x
clv ; we check the V flag on MSB
lda $0102,x ; MSB
adc #$00 ; we only care about the carry
sta $0102,x
ldx TMPX
bvs _hack+3 ; skip over JMP instruction
_hack: ; This is why this routine must be natively compiled: We
; compile the opcode for JMP here without an address to
; go to, which is added by the next address by LOOP.
.byte $4C
z_ploop: rts ; never reached
The acutal DO and LOOP words are high-level:
Code:
: DO POSTPONE (DO) HERE ; IMMEDIATE COMPILE-ONLY
: LOOP POSTPONE (LOOP) , POSTPONE UNLOOP ; IMMEDIATE COMPILE-ONLY
If you want to know why 32 bits make life easier, compare how long this code is to what Mike wrote
.