Using the Y-reg as the IP ptr

IamRob
Posts: 357
Joined: 26 Apr 2020

Post by IamRob »

Thanks to Leepivonka for the code to get me started.

I believe I have successfully moved the IP into the Y-reg and gotten rid of the word pointer (W) at the same time.

I am listing just the words that are affected by the word pointer, to see if someone can confirm that I have wrapped my head around the code properly.


Clit      dw  *+2
          lda $0000,y
          and #$ff
          bra Push+1      ; Push+1 is the second INY, so only one byte is skipped

Lit       lda $0000,y

Push      iny             ; skip past the number
          iny

          dex
          dex

Put       sta $0,x        ; store number on the data stack

Next      lda $0000,y     ; get the next CFA address
          iny
          iny

Exec      phx             ; x-reg is still the data stack
          tax
          jsr ($0000,x)   ; should only jump to CFA addresses
          plx
          rts

Execute   dw  *+2
          lda 0,x
          inx             ; release the Kraken, or is that pop the stack?
          inx
          bra Exec

Colon     dw  DoCol
          dw  Qexec
          dw  SCSP
          dw  CURR
          dw  AT
          dw  Context
          dw  Store
          dw  Create
          dw  PSCode

DoCol     phy             ; gets pulled by ;S
          lda $0000,y
          pha             ; gets pulled by the RTS in Next
          inc
          inc
          tay
          bra Next

Constant  dw  DoCol
          dw  Create
          dw  Smudge
          dw  Comma
          dw  PSCode

Docon     lda $0000,y
          bra Push

Variable  dw  DoCol
          dw  Zero
          dw  Comma
          dw  PSCode

DoVar     tya
          bra Push

User      dw  DoCol
          dw  Constant
          dw  PSCode      ; skip the value after User when compiling

DoUser    tdc             ; User Area was set to be the direct page or zero-page
          sta N
          lda $0000,y     ; this reads the value after User
          and #$1f
          asl             ; max User Variable is $3E bytes from the start of the User Area
          adc N           ; User Area
          bra Push

SemiS     dw  *+2
          ply
          bra Next

Does      dw  DoCol
          dw  RFrom
          dw  Latest
          dw  PFA
          dw  Store
          dw  PSCode

DoDoes    phy             ; gets pulled by ;S
          lda $0000,y
          pha             ; gets pulled by the RTS in Next

          clc
          adc #4
          pha

          iny
          iny

          lda $0000,y
          tay

          pla
          brl Push

Warm      ldx #TOS
          ldy #Abort+2
          brl Next

RpStore   dw  *+2
          lda VR0
          tcs
          brl Next
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Post by barrym95838 »

Exec  phx ; x-reg is still the data stack
      tax
      jsr ($0000,x) ; should only jump to CFA addresses
      plx
      rts
I don't think you can use X or the return stack like that, because you won't be able to pass data to the called word without corrupting your threading system. Maybe something along the lines of

Exec  dea
      pha  ; push IP-1
      rts  ; jump IP
??? It's no longer clear to me what your threading model is, so there may be an indirection error in my suggestion. I'm most comfortable with DTC, but I think you're still ITC, so

Exec  sta  w
      jmp (w)
??? We talked about it a bit here:

viewtopic.php?f=9&t=5843

; dtc next, ip in y
next  lda  0,y
      dea
      pha
      iny
      iny
      rts

; itc next, ip in y
next  lda  0,y
      sta w
      iny
      iny
      jmp (w)
There are certain advantages to offsetting ip by one or two, but it requires careful consideration (this is all off-the-cuff and may be completely off-the-mark ...)
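
One way to read the concern about the original Exec wrapper is that the PLX after the JSR undoes whatever the called word did to the data-stack pointer. The following is a minimal Python sketch of that effect, not 65816 code. It assumes a toy machine in which `x` is the data-stack pointer, the return stack is a Python list, and the stack grows downward in 2-byte cells; the names `exec_wrapped` and `prim_dup` and all addresses are hypothetical.

```python
# Toy model of:  Exec  phx / tax / jsr ($0000,x) / plx / rts
# Assumptions (not from the thread): 16-bit cells, data stack grows downward,
# the return stack is a Python list, addresses are made up for illustration.

def prim_dup(machine):
    """DUP-like primitive: recovers the saved X (as LDA 3,S / TAX would), then pushes."""
    x = machine["rstack"][-2]          # saved X sits under the JSR return address
    tos = machine["mem"][x]
    x -= 2                             # DEX / DEX
    machine["mem"][x] = tos            # the duplicate is written to memory
    machine["x"] = x                   # the primitive's idea of the new stack pointer

def exec_wrapped(machine, cfa, primitive):
    machine["rstack"].append(machine["x"])    # PHX
    machine["x"] = cfa                        # TAX: X now holds the CFA
    machine["rstack"].append("return-addr")   # JSR pushes a return address
    primitive(machine)
    machine["rstack"].pop()                   # RTS at the end of the primitive
    machine["x"] = machine["rstack"].pop()    # PLX restores the OLD stack pointer

machine = {"x": 0x00FE, "mem": {0x00FE: 42}, "rstack": []}
exec_wrapped(machine, cfa=0x1234, primitive=prim_dup)

final_x = machine["x"]
print(hex(final_x))              # the pointer is back where it started
print(machine["mem"][0x00FC])    # the pushed copy exists but is no longer on the stack
```

In this model the duplicate is stored, but the stack pointer never moves, so any word that changes the stack depth loses its effect once the wrapper returns.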

[Edit: I apologize to anyone who read the early versions of this post, because my brain was definitely somewhere else ... I hope I got my examples a bit closer to correct ...]
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)

Post by IamRob »

barrym95838 wrote:

Exec  phx ; x-reg is still the data stack
      tax
      jsr ($0000,x) ; should only jump to CFA addresses
      plx
      rts
I don't think you can use X or the return stack like that, because you won't be able to pass data to the called word without corrupting your threading system.
I am still playing with these schmansy new instructions, but I believe I still have access to the DATA stack's X-reg, and thus the DATA stack, without affecting the RETURN stack, with something like this:

  LDA 3,S    ; this skips the return address put on by the JSR ($0000,X)
  TAX
  LDA $00,X
My first goal was to remove the need for 2 zero-page variables: IP and W

But I haven't followed a DO..LOOP through yet to see how much juggling of data I would have to do between the DATA stack and RETURN stack.

Hmmm! Maybe, at the cost of just one byte in each primitive to recover the DATA stack's X-reg, I can make NEXT do this:

  PHX
  TAX
  JMP ($0000,X)
Then every primitive would need a PLX somewhere in its code before accessing the DATA stack.
This has the advantage of leaving the X-reg free for other uses until the DATA stack needs to be accessed.
This may save code in some primitives, but is it worth having a PLX in every primitive?
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California

Post by GARTHWILSON »

IamRob wrote:
barrym95838 wrote:

Exec  phx ; x-reg is still the data stack
      tax
      jsr ($0000,x) ; should only jump to CFA addresses
      plx
      rts
I don't think you can use X or the return stack like that, because you won't be able to pass data to the called word without corrupting your threading system.
I am still playing with these schmansy new instructions, but I believe I still have access to the DATA stack's X-reg, and thus the DATA stack, without affecting the RETURN stack, with something like this:

  LDA 3,S    ; this skips the return address put on by the JSR ($0000,X)
  TAX
  LDA $00,X
I think the problem you're still going to run into is that if one word passes parameters to another, which in turn passes them to another, etc., but other words skip a couple of levels of that chain, the word that actually handles them won't know how many return addresses to jump over. This is a mess in any of the languages that don't have a separate data stack. I discuss it in the 6502 stacks treatise, starting on page 6, "Parameter-passing methods." The way Forth does it with a separate data stack makes all the re-arranging and mirroring unnecessary, and parameter-passing is implicit.
Quote:
My first goal was to remove the need for 2 zero-page variables: IP and W
I put them in self-modifying code, so the variables are the operands of the instructions that use them. The 1234's are just placeholders that get constantly overwritten. This eliminates a level of indirection. IP is one address after preIP, and W is one address after preW.

ROM_IMAGE_OF_NEXT:      ; This gets copied to zero page before running.
                      ; clocks:
preIP:  LDA   1234      ; 5  Get cell pointed to by instruction pointer. (Code &
        STA   W         ; 4  IP together eliminates a level of indirection.) Put
                        ;    that in the word pointer (which points to a CFA).
        LDA   IP        ; 4  Contents must be kept anyway.  Then increment the
        INA2            ; 4  instruction pointer so it will be ready for next
        STA   IP        ; 4  one, either to come to next or be saved to return
                        ;    to after a secondary call. Faster than INC_zp 2X.
preW:   JMP   (1234)    ; 5  Finally, jump to the code pointed to by the word
                        ;    pointer.  (code & W together here eliminates a JMP,
 ;-------------------   ;    an advantage of having NEXT in direct-page RAM.)

There's also a part to do interrupt service in high-level Forth with no overhead. In fact, the version of NEXT that's used for going into an ISR is faster than the normal NEXT above is for going into the next word in line without an interrupt.

If you don't need interrupts, or at least don't need instant service like I do in my uses, check out Bruce Clark's single-instruction, 6-clock 65816 NEXT in DTC Forth, at viewtopic.php?t=586, and his 2-instruction 65816 NEXT in ITC Forth at viewtopic.php?t=584 .
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

Post by IamRob »

GARTHWILSON wrote:
I think the problem you're still going to run into is that if one word passes parameters to another, which in turn passes them to another, etc., but other words skip a couple of levels of that chain, the word that actually handles them won't know how many return addresses to jump over.

I must be thinking about this wrong, but I only ever have to skip the last return address once, because the PHX before the JSR ($0000,X) always points to the bottom of the DATA stack. So any math, SWAPs, ROTs, PICKs, or ROLLs are still done only from the bottom of the stack as it was when the last PHX was made, and none of these functions should affect the RETURN stack or the previously pushed values of the X-reg.

But having said that, I see now that once the routine starts backing out of the chain and the saved values get pulled off the return stack, the last X-reg that was pushed may not match the first X-reg that was pushed, or any of the in-between ones, which could put the DATA stack pointer out of order. Unless we pull the saved X-reg into another register, so that the last value of X is maintained through each pull, like so:

  PHX
  TAX
  LDA 1,S
  JSR ($0000,X)
  PLA    ; so the value of the X-reg is retained through each pull in the chain
  RTS
but this still means that every primitive will have to start with a

 TAX 
I can see that the return addresses will start to stack up.

So maybe something like this instead?

NEXT    LDA $0000,Y     ; get the next CFA address
        INY
        INY

        PHX
        TAX
        PLA             ; pull the saved DATA stack pointer into A; each primitive starts with TAX to restore it
        JMP ($0000,X)   ; I woke up with a start this morning realizing my mistake and changed to this before anyone saw the old code
No more stacking up the stack with return addresses or X-reg pushes, which should mean one can go into as many levels of a chain as needed.

If no one can wrap their head around this, all you have to remember is that even the old method of NEXT always ends up calling a primitive. It has to; otherwise it would crash.
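
The NEXT above can be walked through with a small Python model. This is only a sketch under stated assumptions: memory is a dictionary of 16-bit cells, native code is stood in for by Python functions looked up in a hypothetical `code` table, only primitives are modelled (no colon definitions), and every address and name (`CPU`, `next_`, `lit`, `plus`, `bye`) is made up for illustration. Each primitive begins with the equivalent of TAX, as described in the post.

```python
# Sketch of an ITC inner interpreter with IP in Y and the data-stack
# pointer handed to each primitive in A. All addresses are hypothetical.
mem = {}     # address -> 16-bit cell
code = {}    # machine-code address -> Python function standing in for native code

class CPU:
    def __init__(self):
        self.a = 0
        self.x = 0x00FE      # data-stack pointer (stack grows downward)
        self.y = 0           # IP
        self.rstack = []
        self.running = True

def next_(cpu):
    cpu.a = mem[cpu.y]           # LDA $0000,Y   fetch the next CFA
    cpu.y += 2                   # INY / INY
    cpu.rstack.append(cpu.x)     # PHX
    cpu.x = cpu.a                # TAX           X = CFA
    cpu.a = cpu.rstack.pop()     # PLA           A = data-stack pointer
    code[mem[cpu.x]](cpu)        # JMP ($0000,X)

def lit(cpu):
    cpu.x = cpu.a                # TAX: restore the data-stack pointer
    cpu.x -= 2
    mem[cpu.x] = mem[cpu.y]      # push the inline literal
    cpu.y += 2                   # skip past it

def plus(cpu):
    cpu.x = cpu.a                # TAX
    mem[cpu.x + 2] = (mem[cpu.x] + mem[cpu.x + 2]) & 0xFFFF
    cpu.x += 2

def bye(cpu):
    cpu.x = cpu.a                # TAX
    cpu.running = False

# Code fields: each CFA cell holds the address of the word's machine code.
LIT, PLUS, BYE = 0x1000, 0x1002, 0x1004
mem.update({LIT: 0x2000, PLUS: 0x2010, BYE: 0x2020})
code.update({0x2000: lit, 0x2010: plus, 0x2020: bye})

# Thread equivalent to:  2 3 +
thread = [LIT, 2, LIT, 3, PLUS, BYE]
for i, cell in enumerate(thread):
    mem[0x3000 + 2 * i] = cell

cpu = CPU()
cpu.y = 0x3000
while cpu.running:               # a Python call returns, so a loop stands in for the jump back to NEXT
    next_(cpu)

result = mem[cpu.x]
print(result, hex(cpu.x), cpu.rstack)
```

In this model the return stack is empty again after every pass through NEXT, which matches the claim that nothing accumulates there for primitives.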
Last edited by IamRob on Sun Dec 26, 2021 6:02 pm, edited 6 times in total.

Post by IamRob »

So far, at last count, I have eliminated 2 zero-page variables and saved 30 bytes, which should also give a fair speed increase.

But each primitive must start with TAX, and the basic FigForth system has about 75 primitives, so that is 75 bytes added against 30 saved, for a net of 45 bytes extra.

At a quick glance through the definitions, very little else has to be changed. I think it's a fair tradeoff.

One more quick search shows that only 9 word definitions require a PHY/PLY combo to allow the Y-reg to be used within the primitive. That adds another 18 bytes to the 45 extra bytes for the TAXs, for a grand total of 63 bytes added to the 65816 Forth system.

I haven't done any cycle counting, but removing the indirection through the zero-page addresses should add up to a significant cycle count reduction in the most-used routine in the Forth system.

Post by barrym95838 »

You're supporting your theory with actual practice, and I applaud you for that. Keep up the good fight, and don't forget to give DTC and STC at least a sincere glance.

Post by IamRob »

I believe I have the bugs and stack worked out.

I would like to compare cycles for a single pass through the loop. Can someone who is good at counting cycles add them up for the methods listed below? Thanks.

This newfangled method:

Next	lda $0000,y	; get the next CFA address
	iny
	iny

	phx	; x-reg is still the data stack
	tax
	pla	; every primitive needs to start with TAX
	jmp ($0000,x)	; should jump to the indirect address at the CFA address
The count should include the TAX at the start of each primitive, once per pass through the loop.
Versus this way:

NEXT    LDY IP
        LDA $0000,Y
        INY
        INY
        STY IP
        STA W
        JMP W-1

W       JMP (0000)      ; W is self-modified and has to be included in the cycle count, once each time through the loop
and the old 6502 way

NEXT	LDY #1
	LDA (IP),Y	; indirect ptr
	STA W+1	; $F2
	DEY
	LDA (IP),Y
	STA W	; $F1
	CLC
	LDA IP
	ADC #2
	STA IP
	BCS L56
L54	JMP W-1	; $F0 ; Y-reg always exits with zero
L56	INC IP+1
	BCS L54

W     JMP (0000)   ; self-modified that needs to be counted once each time through the loop 
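
As a rough starting point for the comparison, here is a Python tally of the three listings. The per-instruction cycle counts are assumptions, not measurements: for the two 65816 versions they assume native mode with 16-bit accumulator and index registers (m=0, x=0), IP and W in the direct page, and the direct page aligned to a page boundary; for the 6502 version they assume an NMOS part, no page crossing, and the BCS not taken. They reflect one reading of the WDC and MOS timing tables and should be checked against the datasheets before anyone relies on them.

```python
# Cycle tally for one pass through each NEXT, under the assumptions stated above.
NEW_816 = [
    ("LDA $0000,Y", 6),         # 16-bit A plus 16-bit index register
    ("INY", 2),
    ("INY", 2),
    ("PHX", 4),
    ("TAX", 2),
    ("PLA", 5),
    ("JMP ($0000,X)", 6),
    ("TAX (in the primitive)", 2),
]

OLD_816 = [
    ("LDY IP", 4),
    ("LDA $0000,Y", 6),
    ("INY", 2),
    ("INY", 2),
    ("STY IP", 4),
    ("STA W", 4),
    ("JMP W-1", 3),
    ("JMP (W)", 5),             # the self-modified indirect jump
]

OLD_6502 = [
    ("LDY #1", 2),
    ("LDA (IP),Y", 5),
    ("STA W+1", 3),
    ("DEY", 2),
    ("LDA (IP),Y", 5),
    ("STA W", 3),
    ("CLC", 2),
    ("LDA IP", 3),
    ("ADC #2", 2),
    ("STA IP", 3),
    ("BCS L56 (not taken)", 2),
    ("JMP W-1", 3),
    ("JMP (W)", 5),
]

def total(listing):
    """Sum the assumed cycle counts of one listing."""
    return sum(cycles for _, cycles in listing)

for name, listing in (("new 65816", NEW_816),
                      ("old 65816", OLD_816),
                      ("old 6502", OLD_6502)):
    print(f"{name}: {total(listing)} cycles")
```

Under these assumptions the two 65816 versions come out within a cycle of each other, so the result depends heavily on the assumed counts and is worth re-checking.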

Post by barrym95838 »

I can't commit to a clear favorite, but if it's ITC speed you seek, then you should also consider setting aside a handful of ZP bytes and placing a self-modifying NEXT there:

YNEXT LDY #$C0DE ; IP is the immediate operand
NEXT  LDA $0000,Y
      INY
      INY
;     STY NEXT-2
      STA W+1
W     JMP ($C0DE) ; W+1 is the operand
I could be on the wrong track, but I would try to keep IP in Y as much as possible, and only PHY/PLY or STY/LDY when truly necessary. Or maybe just forget about IP in Y and try:

NEXT  LDA $C0DE ; IP is the operand
      INC NEXT+1
      INC NEXT+1
      STA W+1
W     JMP ($C0DE) ; W+1 is the operand
If you put W in Y, then A suddenly becomes a possibility for TOS:

BUMP    INC NEXT+1
        INC NEXT+1
NEXT    LDY $C0DE ; IP is the operand
        INC NEXT+1
        INC NEXT+1
        STY W+1
W       JMP ($C0DE) ; W+1 is the operand
...
doLIT   DEX
        DEX
        STA 0,X
        LDA (NEXT+1)
        JMP BUMP
doFETCH TAY
        LDA 00,Y
        JMP NEXT
doSTORE TAY
        LDA 0,X
        STA 00,Y
        INX
        INX
        BRA doDROP
doSWAP  TAY
        LDA 0,X
        STY 0,X
        JMP NEXT
doMINUS EOR #$FFFF
        INA
doPLUS  CLC
        ADC 0,X
        BRA doNIP
doAND   AND 0,X
        BRA doNIP
doOR    ORA 0,X
        BRA doNIP
doXOR   EOR 0,X
        BRA doNIP
doDUP   DEX
        DEX
        STA 0,X
        JMP NEXT
doTOR   PHA
doDROP  LDA 0,X
doNIP   INX
        INX
        JMP NEXT
doRFROM DEX
        DEX
        STA 0,X
        PLA
        JMP NEXT
...
(profoundly untested ...)
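
The idea of keeping TOS in A can be modelled in a few lines of Python. This is a sketch only: the class name `CachedStack` and its methods are hypothetical, the in-memory part of the stack is a Python list rather than 0,X cells, and cells are assumed to be 16 bits wide. The methods mirror a handful of the primitives listed above.

```python
class CachedStack:
    """Data stack whose top item lives in a register (A) instead of memory."""

    def __init__(self):
        self.a = 0       # TOS register
        self.mem = []    # rest of the stack; the last element is next-on-stack

    def lit(self, n):    # doLIT: spill the old TOS, then load the literal
        self.mem.append(self.a)
        self.a = n & 0xFFFF

    def dup(self):       # doDUP: only a store, TOS stays in A
        self.mem.append(self.a)

    def drop(self):      # doDROP: reload A from memory
        self.a = self.mem.pop()

    def plus(self):      # doPLUS: ADC 0,X then doNIP
        self.a = (self.a + self.mem.pop()) & 0xFFFF

    def minus(self):     # doMINUS: negate TOS, then fall into doPLUS
        self.a = (-self.a) & 0xFFFF
        self.plus()

    def swap(self):      # doSWAP: exchange A with the top memory cell
        self.a, self.mem[-1] = self.mem[-1], self.a

s = CachedStack()
s.lit(7)
s.lit(3)
s.minus()                # 7 3 -
difference = s.a
s.dup()
s.plus()                 # DUP +
doubled = s.a
print(difference, doubled)
```

Binary operators touch memory once instead of twice in this arrangement, which is where the shorter primitives come from.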
Last edited by barrym95838 on Mon Dec 27, 2021 1:22 am, edited 2 times in total.

Post by IamRob »

Thanks for the examples, Mike.

I have been looking at your posts on other threads and take your comments with great interest. I know I am late to the party, this has all been discussed back in '16 and '17, and it all has been tried before. Thank you for your patience.

I like the idea of W being in the Y-reg and thanks again for examples to get me started.

I want to stick with FigForth and ITC for now, and hope to hear from others doing DTC and STC.

Post by IamRob »

MAN! I love programming on the 65816 when you can do stuff like this:

3+      INA
2+      INA
1+      INA
        JMP NEXT

3-      DEA
2-      DEA
1-      DEA
        JMP NEXT

Post by GARTHWILSON »

I sense that people don't believe me when I say that programming the '816 is easier than programming the '02. Things that had to be secondaries on the '02 easily become primitives on the '816, both shorter and quicker to write, and of course much faster.

BTW, your INC 0,X, INC 0,X (16 cycles) for 2+ will be slightly faster if you do LDA 0,X, INA, INA, STA 0,X (15 cycles). For 3+, your 24 cycles drops to only 16 cycles if you do LDA 0,X, CLC, ADC #3, STA 0,X.

Post by barrym95838 »

The '816 is definitely more powerful than the '02, but so are a lot of things. My personal favorite in the real world for ease of assembly programming is the 'c02, because that's the way my brain works best and because the only mode bit is D, and the 'c02 does a decent job of handling it by itself.

It's a shame that they didn't make a nine or ten bit version, but that's just the weirdo in me talking ...

Post by IamRob »

GARTHWILSON wrote:
I sense that people don't believe me when I say that programming the '816 is easier than programming the '02. Things that had to be secondaries on the '02 easily become primitives on the '816, both shorter and quicker to write, and of course much faster.

BTW, your INC 0,X, INC 0,X (16 cycles) for 2+ will be slightly faster if you do LDA 0,X, INA, INA, STA 0,X (15 cycles). For 3+, your 24 cycles drops to only 16 cycles if you do LDA 0,X, CLC, ADC #3, STA 0,X.
Darn, you read my post before I changed it.
I took Mike's idea and kept the BOS value in the Accumulator.

So it is just INA for all three.

Post by IamRob »

barrym95838 wrote:
If you put W in Y, then A suddenly becomes a possibility for TOS:
BUMP    INC NEXT+1
        INC NEXT+1
NEXT    LDY $C0DE ; IP is the operand
        INC NEXT+1
        INC NEXT+1
        STY W+1
W       JMP ($C0DE) ; W+1 is the operand
...
Mike, I can't even begin to thank you enough for this code snippet. I made just one enhancement to it, and the compact code I am getting right now is blowing my mind. I will let you play with this for a bit, but here is a hint: I have created over 100 primitives so far out of the approximately 230 in FigForth, and I am still under 512 bytes. This does not include the headers, though, as programming this way also opens the door to headerless programming. I am so stoked.

BUMP    INC NEXT+1
        INC NEXT+1
NEXT    LDY $C0DE ; IP is the operand
        INC NEXT+1
        INC NEXT+1
        STY W+1
        TAY
W       JMP ($C0DE) ; W+1 is the operand