There is one thing I forgot about: RP@ (RPAT) and RP! (RPSTO) fetch and store the actual address in the return stack pointer (the S register), or rather its lower 16 bits. Hence the TSX-INX in the former and the DEX-TXS in the latter (since the top of the 65xx stack is 1,S rather than 0,S). However, on the 65org16, the stack is on "page" 1 rather than page 0. Thus the first ! in ACTIVATE (which activates a new task and stores on the return stack in preparation) should be 1 L! instead. I have corrected this and uploaded it at the link above. However, there's probably still something else wrong, since ACTIVATE isn't used unless you type it yourself, and obviously it's crashing before you can type anything.
Most likely, the (cooperative) multitasker isn't quite right. That's the part I modified (in ITC) and hence the most likely thing to fail. PAUSE is the word which switches tasks. Until you create and activate more tasks (and of course you can't do that until you can start typing) there's only one task, so it will switch back to itself and keep going. The switching part is a bit tricky and I screwed this up more than once trying to get it to work.
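To make the "switches back to itself" behaviour concrete, here is a rough Python model of a round-robin cooperative switcher. It is only a sketch of the general idea, not the actual PAUSE code: the Task class, link() and pause() are made-up names, and the real thing works on the user area and the two stacks rather than on objects.

```python
# Hypothetical model of a round-robin cooperative multitasker.
# Task, link and pause are illustrative names, not taken from the listing.

class Task:
    def __init__(self, name):
        self.name = name
        self.follower = self   # a lone task links back to itself
        self.awake = True


def link(tasks):
    """Join tasks into a ring: each task's follower is the next one."""
    for i, task in enumerate(tasks):
        task.follower = tasks[(i + 1) % len(tasks)]


def pause(current):
    """Return the next awake task after `current`.

    Assumes at least one task in the ring is awake, otherwise this
    loops forever (just as a real round-robin switcher would).
    """
    task = current.follower
    while not task.awake:
        task = task.follower
    return task
```

With a single task, pause(main) simply returns main, which is why the system should behave as if there were no multitasker at all until more tasks are created and activated.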
To test this, in PAUSE (PAUS), insert .byte EXITT after JLIST. This, in effect, disables the multitasker since it will never switch. Then, you can see if you're able to type things like
Code:
1 2 + .
and get the expected result (3 and an ok prompt). You could also put a breakpoint (or output something) in your INPUT routine to see if you got there.
The fact that you got the startup message is good news since a lot of things have to work right for that to happen.
Forth doesn't interpret an "empty program". The code that waits for a keypress, displays it, acts on it, and so on is itself a Forth program.
Assuming the $FF00 (end of memory address) in the ETIB .equ is unchanged:
- $FEB0-$FEFF is the input buffer (i.e. >= ETIB )
- $FE70-$FEAF is the forth data stack (i.e. < ESP)
- $FE30-$FE6F is the forth return stack (i.e. < ERP)
- $FE2A-$FE2F is the user area (i.e. < EUP)
The input buffer is where characters are stored, like pretty much every other line-oriented input buffer ever.
The X register is used to index the forth data stack; it should always be between $FE71 and $FEB0 inclusive (and will usually be closer to $FEB0, since the stack builds downward in memory).
The S register (65org16 stack pointer) is used to index the forth return stack and should always be between $FE30 and $FE6F (remember 1,S is the top of the stack), and usually near $FE6F since the 65xx hardware stack builds downward.
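If you log X and S while debugging, the ranges above can be checked mechanically. The following Python sketch derives the boundaries from the $FF00 end-of-memory address. The region sizes and the numeric values of ETIB, ESP, ERP and EUP are inferred from the address ranges listed above, not copied from the source file, so treat them as assumptions.

```python
# Region sizes are inferred from the address ranges in the post;
# END_OF_MEMORY is the $FF00 that is assumed to be unchanged.
END_OF_MEMORY = 0xFF00
TIB_SIZE = 0x50      # input buffer,  $FEB0-$FEFF
DSTACK_SIZE = 0x40   # data stack,    $FE70-$FEAF
RSTACK_SIZE = 0x40   # return stack,  $FE30-$FE6F

ETIB = END_OF_MEMORY - TIB_SIZE   # $FEB0 (assumed value)
ESP = ETIB                        # $FEB0 (assumed value)
ERP = ESP - DSTACK_SIZE           # $FE70 (assumed value)
EUP = ERP - RSTACK_SIZE           # $FE30 (assumed value)


def x_in_range(x):
    """X (data stack pointer) should be $FE71..$FEB0 inclusive."""
    return ERP + 1 <= x <= ESP


def s_in_range(s):
    """S (return stack pointer) should be $FE30..$FE6F inclusive."""
    return EUP <= s <= ERP - 1


def data_stack_depth(x):
    """Cells on the data stack; X points at the top cell (0,X)."""
    return ESP - x


def return_stack_depth(s):
    """Cells on the return stack; the top cell is at 1,S."""
    return (ERP - 1) - s
```

Both depth functions return 0 for the empty-stack values X = $FEB0 and S = $FE6F.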
The user area is:
- $FE2F ($FE30-1) FOLLOWER should be the address of STATUS ($FE2E)
- $FE2E ($FE30-2) STATUS should be the address of UWAKE
- $FE2D ($FE30-3) TOS is the saved value of data stack pointer, not written until you switch tasks with PAUSE
- $FE2C ($FE30-4) TID should be the address of SUP1
- $FE2B ($FE30-5) TF is not written until an exception is thrown (e.g. trying to execute a non-existent word)
- $FE2A ($FE30-6) U1 is unused
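The table above can be turned into a small checker for a memory dump. This is a Python sketch under a few assumptions: EUP = $FE30 is inferred from the addresses above, check_single_task is a made-up helper, and the addresses of UWAKE and SUP1 depend on your assembly, so they are passed in as parameters rather than hard-coded.

```python
EUP = 0xFE30  # inferred from the table: the user area sits just below EUP

# Name -> address, in the order listed (EUP-1 down to EUP-6).
USER_AREA = {
    name: EUP - offset
    for offset, name in enumerate(
        ["FOLLOWER", "STATUS", "TOS", "TID", "TF", "U1"], start=1
    )
}


def check_single_task(mem, uwake, sup1):
    """Return a list of mismatches for the single-task startup state.

    mem   -- dict mapping address -> 16-bit value (e.g. from a dump)
    uwake -- address of UWAKE in your build (not known here)
    sup1  -- address of SUP1 in your build (not known here)
    TOS, TF and U1 are not checked since they are unwritten or unused.
    """
    expected = {
        "FOLLOWER": USER_AREA["STATUS"],
        "STATUS": uwake,
        "TID": sup1,
    }
    problems = []
    for name, want in expected.items():
        addr = USER_AREA[name]
        got = mem.get(addr)
        if got != want:
            shown = "nothing" if got is None else f"${got:04X}"
            problems.append(
                f"{name} at ${addr:04X}: got {shown}, expected ${want:04X}"
            )
    return problems
```

An empty list means the three fields that are set up at startup look sane.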
The self-modifying code isn't as bad as it looks (it's unlikely the problem is there). A self-modifying LDY abs is being used as a replacement for the non-existent LDY (zp) instruction. The self-modifying JMP is a replacement for the common 65xx PHA-RTS trick, but with JMP we don't have to adjust the address by 1. BYE exits forth (and thus restores the stack pointer) so the immediate data of LDX is self-modified.
Cells on the forth data stack (and the forth return stack) are 16 bits wide. Traditionally, ITC forth looks like this:
Code:
.word OVER
.word PLUS
.word SWAP
I went down a slightly more unorthodox path. Most forth programs/applications will easily fit in 64k, so the upper 16 bits of every address will be the same. Hence we can use:
Code:
.byte OVER
.byte PLUS
.byte SWAP
since "bytes" are 16 bits wide. This is why indirect addressing isn't used anywhere -- addresses are 16 bits wide, rather than 32 bits wide (except L! and L@ which access the entire 32-bit address space, and hence use 32-bit addresses).
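As a rough illustration of how a thread of 16-bit cells gets executed, here is a toy Python model of an ITC inner interpreter. Everything in it is hypothetical: the addresses are made up, HIGH_WORD stands in for the fixed upper 16 bits, and run() only imitates what NEXT does (fetch the cell at IP, fetch the code address that cell points to, jump there).

```python
HIGH_WORD = 0x0000  # hypothetical fixed upper 16 bits of every address


def full_address(cell):
    """Combine a 16-bit cell with the fixed upper half."""
    return (HIGH_WORD << 16) | (cell & 0xFFFF)


# Stand-ins for the machine code of three primitives.
def over(stack):
    stack.append(stack[-2])


def plus(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) & 0xFFFF)   # cells are 16 bits wide


def swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]


# Made-up code field addresses, one 16-bit "byte" each.
OVER, PLUS, SWAP = 0x0300, 0x0301, 0x0302
machine_code = {0x0400: over, 0x0410: plus, 0x0420: swap}
memory = {
    OVER: 0x0400, PLUS: 0x0410, SWAP: 0x0420,    # code fields
    0x0500: OVER, 0x0501: PLUS, 0x0502: SWAP,    # the thread
}


def run(ip, count, stack):
    """Execute `count` cells of the thread starting at `ip`."""
    for _ in range(count):
        w = memory[full_address(ip)]         # cell at IP
        ip = (ip + 1) & 0xFFFF               # one cell = one address
        code = memory[full_address(w)]       # contents of the code field
        machine_code[full_address(code)](stack)
    return stack
```

Running run(0x0500, 3, [1, 2]) executes OVER PLUS SWAP and leaves [3, 1], and the thread itself occupies only three 16-bit cells.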
IMO, the most helpful things to output for debugging are the data stack pointer (the X register), the top of the data stack, the return stack pointer (the S register), the top of the return stack, the IP variable, and the user area.
Code:
DEBUG .byte DBUG1
DBUG1 jsr outcrlf
txa ;data stack pointer
jsr outhex
jsr outspace
lda 0,x ;top of data stack
jsr outhex
jsr outspace
txa
tsx ;return stack pointer
pha
txa
jsr outhex
jsr outspace
pla ;restore x reg
tax
pla ;top of return stack
pha ;put it back
jsr outhex
jsr outspace
lda IP ;IP
jsr outhex
jsr outspace
ldy #5 ;user area
DBUG2 lda EUP-6,Y ;EUP-1 first, down to EUP-6
jsr outhex
jsr outspace ;keep the six values apart
dey
bpl DBUG2
brk ;or use JMP NEXT1
where outhex outputs the 16-bit value in the accumulator, outcrlf outputs a newline, and outspace outputs a space.
If you insert
Code:
.byte DEBUG
before the J 1,QUIT (in COLD), you should see that the X register is $FEB0 and the S register is $FE6F (both data and return stacks should be empty), IP is the address of J 1,QUIT, and the tops of the data and return stacks are "don't care". If you move the .byte DEBUG into PAUSE so I can see each step (i.e. after the JLIST, then after J 0,RPAT, etc.), that should give me enough information to determine where things are going wrong.
You could also move the output stuff into the NEXT routine. That will spew a lot of data (probably way too much), so you might want the ability to turn it on and off, e.g.
Code:
DEBUG_FLAG:
.byte $FFFF
DEBUG_OUT_ON:
.byte DOO1
DOO1 LSR DEBUG_FLAG
JMP NEXT1
NEXT1:
BIT DEBUG_FLAG
BMI NEXTSKIP
; output debug info here (you could JSR to a routine which outputs
; debug info but remember that the S register will be off by 2 then)
NEXTSKIP:
; the real NEXT1 work continues here
Then you'd just insert .byte DEBUG_OUT_ON after, say, the JLIST in PAUSE. You'd get the startup message (without a bunch of debug info for something that already works), and then it would start spewing debug info at what is most likely the problem spot.
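In case the flag logic looks backwards, here is a tiny Python model of it. It assumes that on the 65org16 BIT sets N from bit 15 of the 16-bit operand and that LSR shifts a zero into bit 15. The function names are made up for illustration.

```python
DEBUG_FLAG_INITIAL = 0xFFFF   # bit 15 set: debug output is off


def lsr16(value):
    """Model LSR on a 16-bit memory cell: bit 15 becomes 0."""
    return (value & 0xFFFF) >> 1


def debug_enabled(flag):
    """Model BIT DEBUG_FLAG / BMI NEXTSKIP.

    N is taken from bit 15; when it is set the BMI skips the
    debug output, so output happens only when bit 15 is clear.
    """
    return (flag & 0x8000) == 0
```

Starting from $FFFF, one LSR gives $7FFF, so output is on from then on, and running DEBUG_OUT_ON again keeps bit 15 clear, so it stays on.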
Also, one thing I did was put the debugging code at $0800 to keep the addresses from moving around much. You might want to do this with any extra initialization code rather than putting it at the beginning (and thus shifting all the addresses around). Alternatively, you could insert it after HERE0. One nice thing about having debug code at $0800 is that its location was the same no matter what I changed or corrected.
Note: all the code snippets in this post are untested.