Dr Jefyll wrote:
Now I'm wondering if you simply made a typo when you said, "I don't want to give up the direct page indexed Y modes." I think the modes used in the snippet above are called direct page indirect indexed.
By "modes" I meant normal, indirect and indirect long indexed. I listed them all earlier in my post and figured (wrongly) that using "modes" would make it clear that I meant them all.
Dr Jefyll wrote:
Also, I don't see why this code wouldn't work if D is being used as the data stack pointer. Wouldn't you want F2 and W1 to be items on the data stack? You could do something like this...
Code:
lda [tos],y ; compare dictionary entry with word in work buffer
cmp (nos),y ; matched so far?
... where tos is a named value meaning 0, and nos likewise is 4. Am I missing something still?
You're right. The code works fine if F2 and W1 are on the stack. The catch with D as the data stack pointer is that F2 and W1 have to be on the data stack; there is no option for them to be static. No problem, right? Just put everything on the stack.
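To make the trade-off concrete, here is a rough side-by-side sketch. The first pair assumes the earlier snippet used F2 and W1 as fixed direct page locations, with the same long and short indirect modes as the quoted version; that pairing is an assumption. The second pair is the quoted version, where the same operand bytes are offsets from a moving D. Since lda and cmp have no absolute indirect indexed Y mode, a static outside the current direct page cannot be reached this way once D moves.
Code:
; X as data stack pointer, D fixed: F2 and W1 are statics in direct page (assumed layout)
lda [F2],y ; F2 holds a 24-bit pointer at a fixed direct page address
cmp (W1),y ; W1 holds a 16-bit pointer at a fixed direct page address
; D as data stack pointer: the same modes now address stack cells
lda [tos],y ; tos = 0, the pointer stored at D+0
cmp (nos),y ; nos = 4, the pointer stored at D+4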
Unfortunately, this runs counter to a basic design I've used, dating back to my first Forth, which was token threaded. To keep the size small, since my SBC at the time had limited ROM, I factored code as much as possible, and to keep things reasonably fast I bypassed error checking and stack effects for these factors. For example, to compile, I have (shown here is my 6502 TTC code):
Code:
; , ( x -- )
xt_comma:
jsr underflow_1
lda TOS,x
ldy TOS+1,x
jsr comma
inx ; clear x from stack
inx
jmp NEXT
; compile the cell in A (lsb) and Y (msb)
comma:
jsr c_comma ; store lsb TOS item at DSP
tya
jsr c_comma ; store msb TOS+1 item at DSP+1
rts
The factor comma is called 15 times in my TTC Forth (the 65816 version is called 18 times in my current STC Forth). Primitive words that call it as an intermediate step avoid pushing the value in the accumulator onto the stack only to drop it again, and they skip error checking, which isn't needed for internal usage. It has a cost though. The word itself has to call the factor, incurring the cost of a subroutine call and return. But speed wasn't a concern in my TTC Forth.
Maybe blindly, I've kept this design philosophy going forward. Generally, if a primitive word uses another non-trivial word as an intermediate, it calls a factor of that word's essential code, bypassing any stack effects and error checking. Parameters are passed via registers when possible, but at times I use statics when I need to pass more data. F2 and W1 are statics used in a factor of FIND. That factor is called 4 times in my TTC Forth and 5 times in my STC Forth.
I suppose I should reevaluate my design choice, given that I no longer have a memory constraint. However, while using the stack more effectively with the 65816's added address modes is appealing, the added cost of adjusting the data stack pointer gives me pause. I have roughly 140 words (probably fewer) that adjust the stack size. Using D as a data stack pointer would add about 560 cycles overall, or about 4 extra cycles per word. I'm guessing that putting intermediate values on the stack as well would increase that many times over, resulting in a generally less efficient Forth. I don't incur that cost using X as a data stack pointer, but as you said, doing so may cost me in the long access words. Given that I haven't coded my long access words, or even considered much how I'll use them, I'll leave this design choice to another day.
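As a rough sketch of where a figure like 560 cycles over 140 words (4 cycles each) could come from, compare dropping one cell under each scheme. This assumes 16-bit cells, a 16-bit accumulator, and a tdc / inc a / tcd sequence for adjusting D. Those are assumptions for illustration and not necessarily the exact sequences used here.
Code:
; X as data stack pointer: drop one 16-bit cell
inx ; 2 cycles
inx ; 2 cycles, 4 total
; D as data stack pointer: drop one 16-bit cell
tdc ; 2 cycles, copy D into the 16-bit accumulator
inc a ; 2 cycles
inc a ; 2 cycles
tcd ; 2 cycles, 8 total, and the accumulator is clobbered
Under those assumptions the difference is 4 cycles per adjustment, and any word that keeps a live value in the accumulator would pay more to save and restore it.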