Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 1:51 am
by IamRob
Thanks to Leepivonka for the code to get me started.
I believe I have successfully moved the IP pointer into the Y-reg and gotten rid of the (W) word pointer at the same time.
I am listing just the words that are affected by the word pointer, to see if someone can confirm that I have wrapped my head around the code properly.
Clit dw *+2
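lda $0000,y
and #$00ff ; keep only the low byte (assumes a 16-bit accumulator)
bra Push+1 ; enter Push after its first INY, so the one-byte literal is skipped once
Lit lda $0000,y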
Push iny ; skip past the number
iny
dex
dex
Put sta $0,x ; store number on the data stack
Next lda $0000,y ; get the next CFA address
iny
iny
Exec phx ; x-reg is still the data stack
tax
jsr ($0000,x) ; should only jump to CFA addresses
plx
rts
Execute dw *+2
lda 0,X
inx ; release the Kraken, or is that pop the stack?
inx
bra Exec
Colon dw DoCol
dw Qexec
dw SCSP
dw CURR
dw AT
dw Context
dw Store
dw Create
dw PSCode
DoCol phy ; gets pulled by ;S
lda $0000,y
pha ; gets pulled by the RTS in NEXT
inc
inc
tay
bra Next
Constant dw DoCol
dw Create
dw Smudge
dw Comma
dw PSCode
Docon lda $0000,y
bra Push
Variable dw DoCol
dw Zero
dw Comma
dw PSCode
DoVar tya
bra Push
User dw DoCol
dw Constant
dw PSCode ; skip the value after User when Compiling
DoUser tdc ; Users Area was set to be the direct page or zero-page
sta N
lda $0000,y ; this reads the value after User
and #$1f
asl ; Max User Variable is $3E bytes from the start of the User Area
adc N ; Users Area
bra Push
SemiS dw *+2
ply
bra Next
Does dw DoCol
dw RFrom
dw Latest
dw PFA
dw Store
dw PSCode
DoDoes phy ; gets pulled by ;S
lda $0000,y
pha ; gets pulled by the RTS in Next
clc
adc #4
pha
iny
iny
lda $0000,y
tay
pla
brl Push
Warm ldx #TOS
ldy #Abort+2
brl Next
RpStore dw *+2
lda VR0
tcs
brl Next
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 3:49 am
by barrym95838
Exec phx ; x-reg is still the data stack
tax
jsr ($0000,x) ; should only jump to CFA addresses
plx
rts
I don't think you can use X or the return stack like that, because you won't be able to pass data to the called word without corrupting your threading system. Maybe something along the lines of
Exec dea
pha ; push IP-1
rts ; jump IP
??? It's no longer clear to me what your threading model is, so there may be an indirection error in my suggestion. I'm most comfortable with DTC, but I think you're still ITC, so
??? We talked about it a bit here:
viewtopic.php?f=9&t=5843
; dtc next, ip in y
next lda 0,y
dea
pha
iny
iny
rts
; itc next, ip in y
next lda 0,y
sta w
iny
iny
jmp (w)
There are certain advantages to offsetting ip by one or two, but it requires careful consideration (this is all off-the-cuff and may be completely off-the-mark ...)
[Edit: I apologize to anyone who read the early versions of this post, because my brain was definitely somewhere else ... I hope I got my examples a bit closer to correct ...]
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 8:11 am
by IamRob
barrym95838 wrote:
Exec phx ; x-reg is still the data stack
tax
jsr ($0000,x) ; should only jump to CFA addresses
plx
rts
I don't think you can use X or the return stack like that, because you won't be able to pass data to the called word without corrupting your threading system.
I am still playing with these schmancy new instructions, but I believe I still have access to the DATA stack's X-reg, and thus the DATA stack, without affecting the RETURN stack with something like this:
LDA 3,S ; this skips the return address put on by the JSR ($0000,X)
TAX
LDA $00,X
My first goal was to remove the need for 2 zero-page variables: IP and W.
But I haven't followed a DO..LOOP through yet to see how much juggling of data I would have to do between the DATA stack and RETURN stack.
Hmmm! Maybe at the cost of just one byte in each word primitive to recover the DATA stack's X-reg, I can make NEXT do this:
Then every primitive would need to have a PLX somewhere in its code before accessing the DATA stack.
This has the advantage of making the X-reg free for use until the DATA stack needs to be accessed.
This may save code in some primitives, but is it worth having a PLX in every primitive?
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 9:11 am
by GARTHWILSON
IamRob wrote:
Exec phx ; x-reg is still the data stack
tax
jsr ($0000,x) ; should only jump to CFA addresses
plx
rts
I don't think you can use X or the return stack like that, because you won't be able to pass data to the called word without corrupting your threading system.
I am still playing with these schmancy new instructions, but I believe I still have access to the DATA stack's X-reg, and thus the DATA stack, without affecting the RETURN stack with something like this:
LDA 3,S ; this skips the return address put on by the JSR ($0000,X)
TAX
LDA $00,X
I think the problem you're still going to run into is that if one word passes parameters to another, which in turn passes them to another, etc., but other words skip a couple of levels of that chain, the word that actually handles them won't know how many return addresses to jump over. This is a mess in any of the languages that don't have a separate data stack. I discuss it in the 6502 stacks treatise, starting on page 6, "Parameter-passing methods." The way Forth does it with a separate data stack makes all the re-arranging and mirroring unnecessary, and parameter-passing is implicit.
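To illustrate the point about implicit parameter passing, here is a minimal sketch of a "+" primitive. It assumes a 16-bit accumulator and a data stack in direct page indexed by X; PLUS and NEXT are placeholder labels, not taken from any listing in this thread. However many return addresses are sitting on the hardware stack, the operands are always found at 0,X and 2,X:

```asm
; Sketch only, under the assumptions stated above.
PLUS    LDA 0,X         ; top cell of the data stack
        CLC
        ADC 2,X         ; add the cell beneath it
        INX
        INX             ; drop one cell
        STA 0,X         ; the sum is the new top of stack
        JMP NEXT        ; back to the inner interpreter (placeholder label)
```

Nothing in the primitive needs to know how deeply nested the caller is, which is why no return addresses have to be skipped.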
IamRob wrote:
My first goal was to remove the need for 2 zero-page variables: IP and W
I put them in self-modifying code, so the variables are the operands of the instructions that use them. The 1234's are just placeholders that get constantly overwritten. This eliminates a level of indirection. IP is one address after preIP, and W is one address after preW.
ROM_IMAGE_OF_NEXT: ; This gets copied to zero page before running.
; clocks:
preIP: LDA 1234 ; 5 Get cell pointed to by instruction pointer. (Code &
STA W ; 4 IP together eliminates a level of indirection.) Put
; that in the word pointer (which points to a CFA).
LDA IP ; 4 Contents must be kept anyway. Then increment the
INA2 ; 4 instruction pointer so it will be ready for next
STA IP ; 4 one, either to come to next or be saved to return
; to after a secondary call. Faster than INC_zp 2X.
preW: JMP (1234) ; 5 Finally, jump to the code pointed to by the word
; pointer. (code & W together here eliminates a JMP,
;------------------- ; an advantage of having NEXT in direct-page RAM.)
There's also a part to do interrupt service in high-level Forth with no overhead. In fact, the version of NEXT that's used for going into an ISR is faster than the normal NEXT above is for going into the next word in line without an interrupt.
If you don't need interrupts, or at least don't need instant service like I do in my uses, check out Bruce Clark's single-instruction, 6-clock 65816 NEXT in DTC Forth, at viewtopic.php?t=586, and his 2-instruction 65816 NEXT in ITC Forth at viewtopic.php?t=584 .
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 10:39 am
by IamRob
GARTHWILSON wrote:
I think the problem you're still going to run into is that if one word passes parameters to another, which in turn passes them to another, etc., but other words skip a couple of levels of that chain, the word that actually handles them won't know how many return addresses to jump over.
I must be thinking about this wrong, but I only ever have to skip the last return address once, because the PHX before the JSR ($0000,X) always points to the bottom of the DATA stack. So any math, SWAPs, ROTs, PICKs or ROLLs are still only done from the bottom of the stack as it was when the last PHX was made, and none of these functions should affect the RETURN stack or the previous values of the X-reg for the DATA stack.
But having said that, I see now that once the routine starts backing out of the chain and the saved values start getting pulled off the return stack, the last X-reg that was pushed may not match the first X-reg that was pushed, or any of the in-between X-regs, which could then put the DATA stack pointer out of order. Unless we use another register to pull the X-reg, so the last value of X is maintained through each pull, like so:
PHX
TAX
LDA 1,S
JSR ($0000,X)
PLA ; so the value of the X-reg is retained through each pull in the chain
RTS
but this still means that every primitive will have to start with a
I can see that the return addresses will start to stack up.
So maybe something like this instead?
NEXT LDA $0000,Y ; get the next CFA address
INY
INY
PHX
TAX
PLA ; this temporarily loads the X-reg off the stack and each primitive will start with TAX to restore the DATA pointer
JMP ($0000,X) ; I woke up with a start this morning realizing my mistake and changed to this before anyone saw the old code
No more stacking up the stack with return addresses or X-reg pushes, which should mean one can go into as many levels of a chain as needed.
If no one can wrap their head around this, all you have to remember is that even the old method of NEXT always calls a primitive. It has to, otherwise it would crash.
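A sketch of how a colon-definition entry and ;S might look under this JMP-based NEXT. It assumes NEXT leaves the word's CFA address in X, the DATA stack pointer in A, and IP in Y, with 16-bit registers throughout. DoCol and SemiS reuse the label names from the first post, but the bodies here are a guess, not IamRob's code:

```asm
; Sketch only, under the assumptions stated above.
DoCol   phy             ; save the caller's IP on the RETURN stack
        txy             ; X still holds this word's CFA address (from NEXT's TAX)
        iny
        iny             ; Y = CFA+2 = first cell of the word's thread
        tax             ; A holds the DATA stack pointer (from NEXT's PLA)
        brl Next

SemiS   dw *+2          ; CFA points at the machine code that follows
        tax             ; restore the DATA stack pointer
        ply             ; recover the caller's IP
        brl Next
```

Here X briefly does the job of W, so no separate word pointer is stored anywhere, and only one cell per nesting level lands on the RETURN stack.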
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 11:26 am
by IamRob
So far, at last count, I have eliminated 2 zero-page variables and saved 30 bytes, which should also give a fair speed increase.
But each word definition that contains a primitive must start with TAX, and the basic FigForth system has about 75 of them, for a net cost of 45 extra bytes.
And at a quick glance through the definitions, very little else has to be changed. I think it's a fair tradeoff.
One more quick search shows that only 9 word definitions require a PHY/PLY combo to allow the Y-reg to be used within the primitive. That adds another 18 bytes on top of the 45 extra bytes for the TAXs, for a grand total of 63 bytes added to the 65816 Forth system.
I haven't done any cycle counting, but removing the indirection through the zero-page addresses should add up to a significant cycle count reduction in the most-used routine in the Forth system.
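As a rough illustration of where those extra bytes go, here are two primitives written for the JMP-based NEXT above, assuming a 16-bit accumulator and index registers and a DATA stack in direct page. Dup and Swap are hypothetical labels chosen for the sketch, and Swap borrows Y only to show the PHY/PLY pattern:

```asm
; Sketch only, under the assumptions stated above.
Dup     dw *+2          ; CFA points at the machine code that follows
        tax             ; 1 byte: restore the DATA stack pointer NEXT left in A
        lda $00,x       ; copy the top cell
        dex
        dex
        sta $00,x       ; push the copy
        brl Next

Swap    dw *+2
        tax             ; restore the DATA stack pointer
        phy             ; 1 byte: Y is IP, so save it before borrowing Y
        lda $00,x       ; A = top cell
        ldy $02,x       ; Y = second cell
        sta $02,x
        sty $00,x
        ply             ; 1 byte: IP back in Y
        brl Next
```

Every primitive pays the one-byte TAX, and only those that need Y as a scratch register pay the additional two bytes for PHY/PLY.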
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 6:30 pm
by barrym95838
You're supporting your theory with actual practice, and I applaud you for that. Keep up the good fight, and don't forget to give DTC and STC at least a sincere glance.
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 6:44 pm
by IamRob
I believe I have the bugs and stack worked out.
I'd like to compare cycles for just one pass through the loop. Can someone who is good at counting cycles add up the cycles for the methods listed below? Thanks.
This new fandangled method
Next lda $0000,y ; get the next CFA address
iny
iny
phx ; x-reg is still the data stack
tax
pla ; every primitive needs to start with TAX
jmp ($0000,x) ; should jump to the indirect address at the CFA address
This loop should include the cycle count for the TAX in each primitive, once each time through this loop.
versus this way
NEXT LDY IP
LDA $0000,Y
INY
INY
STY IP
STA W
JMP W-1
W JMP (0000) ; W is self modified and has to be included in the cycle count, once each time thru the loop
and the old 6502 way
NEXT LDY #1
LDA (IP),Y ; indirect ptr
STA W+1 ; $F2
DEY
LDA (IP),Y
STA W ; $F1
CLC
LDA IP
ADC #2
STA IP
BCS L56
L54 JMP W-1 ; $F0 ; Y-reg always exits with zero
L56 INC IP+1
BCS L54
W JMP (0000) ; self-modified that needs to be counted once each time through the loop
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 7:27 pm
by barrym95838
I can't commit to a clear favorite, but if it's ITC speed you seek, then you should also consider setting aside a handful of ZP bytes and placing a self-modifying NEXT there:
YNEXT LDY #$C0DE ; IP is the immediate operand
NEXT LDA $0000,Y
INY
INY
; STY NEXT-2
STA W+1
W JMP ($C0DE) ; W+1 is the operand
I could be on the wrong track, but I would try to keep IP in Y as much as possible, and only PHY/PLY or STY/LDY when truly necessary. Or maybe just forget about IP in Y and try:
NEXT LDA $C0DE ; IP is the operand
INC NEXT+1
INC NEXT+1
STA W+1
W JMP ($C0DE) ; W+1 is the operand
If you put W in Y, then A suddenly becomes a possibility for TOS:
BUMP INC NEXT+1
INC NEXT+1
NEXT LDY $C0DE ; IP is the operand
INC NEXT+1
INC NEXT+1
STY W+1
W JMP ($C0DE) ; W+1 is the operand
...
doLIT DEX
DEX
STA 0,X
LDA (NEXT+1)
JMP BUMP
doFETCH TAY
LDA 00,Y
JMP NEXT
doSTORE TAY
LDA 0,X
STA 00,Y
INX
INX
BRA doDROP
doSWAP TAY
LDA 0,X
STY 0,X
JMP NEXT
doMINUS EOR #$FFFF
INA
doPLUS CLC
ADC 0,X
BRA doNIP
doAND AND 0,X
BRA doNIP
doOR ORA 0,X
BRA doNIP
doXOR EOR 0,X
BRA doNIP
doDUP DEX
DEX
STA 0,X
JMP NEXT
doTOR PHA
doDROP LDA 0,X
doNIP INX
INX
JMP NEXT
doRFROM DEX
DEX
STA 0,X
PLA
JMP NEXT
...
(profoundly untested ...)
Re: Using the Y-reg as the IP ptr
Posted: Sun Dec 26, 2021 9:25 pm
by IamRob
Thanks for the examples, Mike.
I have been looking at your posts on other threads and take your comments with great interest. I know I am late to the party: this was all discussed back in '16 and '17, and it has all been tried before. Thank you for your patience.
I like the idea of W being in the Y-reg and thanks again for examples to get me started.
I want to stick with FigForth and ITC for now, and hope to hear from others doing DTC and STC.
Re: Using the Y-reg as the IP ptr
Posted: Tue Dec 28, 2021 6:34 pm
by IamRob
MAN! I love programming on the 65816 when you can do stuff like this:
3+ INA
2+ INA
1+ INA
JMP NEXT
3- DEA
2- DEA
1- DEA
JMP NEXT
Re: Using the Y-reg as the IP ptr
Posted: Tue Dec 28, 2021 8:02 pm
by GARTHWILSON
I sense that people don't believe me when I say that programming the '816 is easier than programming the '02. Things that had to be secondaries on the '02 easily become primitives on the '816, both shorter and quicker to write, and of course much faster.
BTW, your INC 0,X, INC 0,X (16 cycles) for 2+ will be slightly faster if you do LDA 0,X, INA, INA, STA 0,X (15 cycles). For 3+, your 24 cycles drops to only 16 cycles if you do LDA 0,X, CLC, ADC #3, STA 0,X.
Re: Using the Y-reg as the IP ptr
Posted: Tue Dec 28, 2021 9:02 pm
by barrym95838
The '816 is definitely more powerful than the '02, but so are a lot of things. My personal favorite in the real world for ease of assembly programming is the 'c02, because that's the way my brain works best and because the only mode bit is D, and the 'c02 does a decent job of handling it by itself.
It's a shame that they didn't make a nine or ten bit version, but that's just the weirdo in me talking ...
Re: Using the Y-reg as the IP ptr
Posted: Tue Dec 28, 2021 9:34 pm
by IamRob
GARTHWILSON wrote:
I sense that people don't believe me when I say that programming the '816 is easier than programming the '02. Things that had to be secondaries on the '02 easily become primitives on the '816, both shorter and quicker to write, and of course much faster.
BTW, your INC 0,X, INC 0,X (16 cycles) for 2+ will be slightly faster if you do LDA 0,X, INA, INA, STA 0,X (15 cycles). For 3+, your 24 cycles drops to only 16 cycles if you do LDA 0,X, CLC, ADC #3, STA 0,X.
Darn, you read my post before I changed it.
I took Mike's idea and stored the BOS value in the accumulator, so it is just INA for all three.
Re: Using the Y-reg as the IP ptr
Posted: Tue Dec 28, 2021 9:46 pm
by IamRob
barrym95838 wrote:
If you put W in Y, then A suddenly becomes a possibility for TOS:
BUMP INC NEXT+1
INC NEXT+1
NEXT LDY $C0DE ; IP is the operand
INC NEXT+1
INC NEXT+1
STY W+1
W JMP ($C0DE) ; W+1 is the operand
...
Mike, I can't even begin to thank you enough for this code snippet. I made just one enhancement to it, and the compact code I am getting right now is blowing my mind. I will let you play with this for a bit, but here is a hint: I have created over 100 primitives so far out of the approx. 230 in FigForth, and have still used less than 512 bytes. This does not include the headers though, as programming this way really opens the door to headerless programming, too. I am so stoked.
BUMP INC NEXT+1
INC NEXT+1
NEXT LDY $C0DE ; IP is the operand
INC NEXT+1
INC NEXT+1
STY W+1
TAY
W JMP ($C0DE) ; W+1 is the operand
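One possible reading of the added TAY (an assumption, since the post leaves it as a hint): every primitive is now entered with Y holding a copy of TOS from A, so primitives that used to begin with TAY could drop it. Reusing barrym95838's label names from his earlier listing, a sketch with a 16-bit accumulator and index registers assumed:

```asm
; Sketch only: entry condition assumed to be A = Y = TOS.
doFETCH LDA $0000,Y     ; Y already holds the address, so no TAY is needed
        JMP NEXT
doSWAP  LDA 0,X         ; A = second item, which becomes the new TOS
        STY 0,X         ; old TOS (still in Y) becomes the second item
        JMP NEXT
doSTORE LDA 0,X         ; value to store
        STA $0000,Y     ; Y = destination address (old TOS)
        INX
        INX
        BRA doDROP      ; reload TOS from the stack, as in the original listing
```

Each of these comes out one byte shorter than the version that starts with TAY, at the cost of a 2-cycle TAY on every pass through NEXT.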