Dr Jefyll wrote:
... AFAIK the only downside to having the P-stack pointer in S is that X's slow push/pop performance ends up hindering the R-stack instead. But the R-stack carries less activity than the P-stack, particularly on inner loops where performance gains are more meaningful. So, IMO psp in S is the best choice for anyone writing a new Forth specific to the '816 (or M65C02A).
Once again, I say let the code do the talking:
Code:
; m65c02a, x=PSP                ; m65c02a, s=PSP                ; 65m32, x=PSP, y=IP, a=TOS
ENTER:                          ENTER:                          ENTER:
    lda IP+1                        dex                             pdy #0,u
    pha                             dex                             jmp NEXT
    lda IP                          lda IP+1
    pha                             sta 1,x
    lda W+1                         lda IP
    sta IP+1                        sta 0,x
    lda W                           lda W+1
    sta IP                          sta IP+1
    jmp NEXT                        lda W
                                    sta IP
                                    jmp NEXT
EXIT:                           EXIT:                           EXIT:
    PRIMITIVE                       PRIMITIVE                       PRIMITIVE
    pla                             lda 0,x                         ply
    sta IP                          sta IP                          jmp NEXT
    pla                             lda 1,x
    sta IP+1                        sta IP+1
    jmp NEXT                        inx
                                    inx
                                    jmp NEXT
drop: ; ( x -- )                drop: ; ( x -- )                drop: ; ( x -- )
    PRIMITIVE                       PRIMITIVE                       PRIMITIVE
    inx                             pla                             lda 0,x+
    inx                             pla                             jmp NEXT
    jmp NEXT                        jmp NEXT
nip: ; ( x1 x2 -- x2 )          nip: ; ( x1 x2 -- x2 )          nip: ; ( x1 x2 -- x2 )
    PRIMITIVE                       PRIMITIVE                       PRIMITIVE
    lda 0,x                         lda 1,s                         inx
    sta 2,x                         sta 3,s                         jmp NEXT
    lda 1,x                         lda 2,s
    sta 3,x                         sta 4,s
    bra drop+2                      bra drop+2
dup: ; ( x -- x x )             dup: ; ( x -- x x )             dup: ; ( x -- x x )
    PRIMITIVE                       PRIMITIVE                       PRIMITIVE
    dex                             lda 2,s                         sta 0,-x
    dex                             pha                             jmp NEXT
    lda 2,x                         lda 2,s
    sta 0,x                         pha
    lda 3,x                         jmp NEXT
    sta 1,x
    jmp NEXT
swap: ; ( x1 x2 -- x2 x1 )      swap: ; ( x1 x2 -- x2 x1 )      swap: ; ( x1 x2 -- x2 x1 )
    PRIMITIVE                       PRIMITIVE                       PRIMITIVE
    lda 0,x                         pla                             exa 0,x
    pha                             sta N                           jmp NEXT
    lda 1,x                         pla
    pha                             tay
    lda 2,x                         lda 2,s
    sta 0,x                         pha
    lda 3,x                         lda 2,s
    sta 1,x                         pha
    pla                             tya
    sta 3,x                         sta 4,s
    pla                             lda N
    sta 2,x                         sta 3,s
    jmp NEXT                        jmp NEXT
fetch: ; ( addr -- x )          fetch: ; ( addr -- x )          fetch: ; ( addr -- x )
    PRIMITIVE                       PRIMITIVE                       PRIMITIVE
    lda (0,x)                       ldy #0                          lda 0,a
    pha                             lda (1,s),y                     jmp NEXT
    inc 0,x                         pha
    bne fetch2                      iny
    inc 1,x                         lda (2,s),y
fetch2: lda (0,x)                   sta 3,s
    sta 1,x                         pla
    pla                             sta 1,s
    sta 0,x                         jmp NEXT
    jmp NEXT
plus: ; ( n1 n2 -- n3 )         plus: ; ( n1 n2 -- n3 )         plus: ; ( n1 n2 -- n3 )
    PRIMITIVE                       PRIMITIVE                       PRIMITIVE
    lda 0,x                         pla                             add 0,x+
    clc                             clc                             jmp NEXT
    adc 2,x                         adc 2,s
    sta 2,x                         sta 2,s
    lda 1,x                         pla
    adc 3,x                         adc 2,s
    sta 3,x                         sta 2,s
    bra drop+2                      jmp NEXT
store: ; ( x addr -- )          store: ; ( x addr -- )          store: ; ( x addr -- )
    lda 2,x                         PRIMITIVE                       PRIMITIVE
    sta (0,x)                       ldy #0                          ldb 0,x+
    inc 0,x                         lda 3,s                         stb 0,a
    bne store2                      sta (1,s),y                     bra drop+1
    inc 1,x                         iny
store2: lda 3,x                     lda 4,s
    sta (0,x)                       sta (1,s),y
    inx                             pla
    inx                             pla
    bra drop+2                      bra drop+2
Just as Garth asserted that the choice of PSP register seems insignificant for the '816, it appears to have negligible effect for Michael's m65c02a as well, unless I'm missing some opportunities to optimize. BTW, I included the equivalent words from my 65m32 Forth to show how much shorter (in source form, at least) the primitives can be with cell-width registers, TOS and IP in registers, and auto-increment/decrement.
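For anyone who wants a rough number to go with that impression, here is a small Python tally of the listing. It is only a sketch: the counts are static, source-level instruction counts that I tallied by hand from the three columns (PRIMITIVE lines and labels excluded), and the names COUNTS, VARIANTS, totals and report are mine, purely for illustration. It says nothing about bytes or cycles, and words that branch into drop execute more instructions than their own listing shows.

```python
# Static instruction counts per word, tallied by hand from the listing.
# Assumptions: PRIMITIVE macro lines and labels are not counted, and a
# "bra drop+N" counts as one instruction even though it runs on into drop.
VARIANTS = ("m65c02a x=PSP", "m65c02a s=PSP", "65m32")

COUNTS = {
    "ENTER": (9, 11, 2),
    "EXIT":  (5, 7, 2),
    "drop":  (3, 3, 2),
    "nip":   (5, 5, 2),
    "dup":   (7, 5, 2),
    "swap":  (13, 13, 2),
    "fetch": (10, 9, 2),
    "plus":  (8, 8, 2),
    "store": (10, 9, 3),
}


def totals(counts):
    """Sum each column of the table, one total per variant."""
    return tuple(sum(column) for column in zip(*counts.values()))


def report(counts=COUNTS, variants=VARIANTS):
    """Return the table as printable lines, one row per word plus a total."""
    lines = ["{:<8}".format("word") + "".join("{:>16}".format(v) for v in variants)]
    for word, row in counts.items():
        lines.append("{:<8}".format(word) + "".join("{:>16}".format(n) for n in row))
    lines.append("{:<8}".format("total") + "".join("{:>16}".format(n) for n in totals(counts)))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

By this crude measure the two m65c02a columns come out level at 70 instructions each: s=PSP saves a little on dup, fetch and store and gives it back on ENTER and EXIT. The 65m32 column totals 19.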
[Edit: fixed code snippet for 65m32's ENTER]
[Edit: fixed code snippets for m65c02a's ENTER]