65c02 ANSI-compatible Forth looping using the V flag

scotws
Posts: 576
Joined: 07 Jan 2013
Location: Just outside Berlin, Germany

Post by scotws »

One of the things about Forth that will make you want to bang your head on the table is that between FIG and F83, they changed the way loops work. These days, if you create a loop such as 0 0 DO, it will run through the complete range of a cell (65536 iterations with 16-bit cells) instead of just quitting. Formally, the loop is complete when the boundary between limit-1 and limit is crossed, or something to that effect. Seriously, why?
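To make the boundary rule concrete, here is a minimal Python sketch (Python only because it is easy to run; `ans_loop_count` and `MASK` are names invented for this illustration, and 16-bit cells are assumed). It counts how often the loop body runs when the loop ends on crossing the boundary between limit-1 and limit.

```python
MASK = 0xFFFF  # assumption: 16-bit cells


def ans_loop_count(limit, start, step=1):
    """Count body executions of `limit start DO ... step +LOOP`."""
    if step == 0:
        raise ValueError("a zero step never crosses the boundary")
    index = start & MASK
    count = 0
    while True:
        count += 1                       # the body runs before the test
        d = (index - limit) & MASK       # unsigned distance past the limit
        # Going up, the boundary is crossed when d wraps past $FFFF;
        # going down, when d drops below zero.
        crossed = d + step > MASK if step > 0 else d + step < 0
        if crossed:
            return count
        index = (index + step) & MASK


print(ans_loop_count(10, 0))   # 10 0 DO ... LOOP
print(ans_loop_count(0, 0))    # 0 0 DO ... LOOP
```

Under these assumptions `10 0 DO` runs the body 10 times and `0 0 DO` runs it 65536 times, which is the "complete word size" behaviour described above.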

Whatever. Mike pointed out that there is a clever way to code this using the Overflow Flag (V) on the 65m32 (see viewtopic.php?f=9&t=2026). I've started to create versions for Tali Forth (which up till now used the FIG Forth loops) on the 65c02. In case anybody has the same problem, these are the code snippets for (DO), I, J, and (LOOP). I'll post (+LOOP) when I get around to writing it. Note that these routines have had only minimal testing so far, so I'm not sure about the edge cases.

First, (DO)

Code:

l_pdo:          bra a_pdo
                .byte CO+NC+$04 
                .word l_i    ; link to I
                .word z_pdo
                .byte "(DO)"
.scope
a_pdo:          ; first step: create fudge factor (FUFA) by subtracting the limit
                ; from $8000, the number that will trip the overflow flag
                sec
                lda #$00
                sbc 3,x         ; LSB of limit
                sta 3,x         ; save FUFA for later use
                lda #$80
                sbc 4,x         ; MSB of limit
                sta 4,x         ; save FUFA for later use
                pha             ; FUFA replaces limit on R stack
                lda 3,x         ; LSB of FUFA (limit was overwritten above)
                pha

                ; second step: index is FUFA plus original index
                clc
                lda 1,x         ; LSB of original index
                adc 3,x         ; add LSB of FUFA
                sta 1,x
                lda 2,x         ; MSB of original index
                adc 4,x         ; add MSB of FUFA
                pha
                lda 1,x         ; LSB of index
                pha

                ; we've saved the FUFA on the NOS of the R stack, so we can
                ; use it later
                inx
                inx
                inx
                inx

z_pdo:          rts
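The arithmetic in (DO) can be checked with a small Python model. This is only a sketch of the two steps in the routine, assuming 16-bit cells; `do_setup` is a made-up name and not part of Tali Forth.

```python
MASK = 0xFFFF  # assumption: 16-bit cells


def do_setup(limit, index):
    """Return (fufa, fudged_index), the two cells (DO) pushes."""
    fufa = (0x8000 - limit) & MASK     # first step: $8000 minus the limit
    fudged = (index + fufa) & MASK     # second step: index plus FUFA
    return fufa, fudged


fufa, fudged = do_setup(10, 0)
print(hex(fufa), hex(fudged))
```

With a limit of 10 and a start index of 0, both values come out as $7FF6, so ten increments later the fudged index steps from $7FFF to $8000, which is the signed overflow that sets V.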
Then I and J:

Code:

l_i:            bra a_i
                .byte NC+CO+$01 
                .word l_j    ; link to J
                .word z_i
                .byte "I"
.scope
a_i:            dex
                dex

                ; get the fudged index off of the top of the stack. it's
                ; easier to do math on the stack directly than to pop and
                ; push stuff around
                stx TMPX
                tsx

                sec
                lda $0101,x     ; LSB
                sbc $0103,x
                sta TMPCNT

                lda $0102,x     ; MSB
                sbc $0104,x

                ldx TMPX

                sta 2,x         ; MSB of de-fudged index
                lda TMPCNT
                sta 1,x         ; LSB of de-fudged index

z_i:            rts             ; never reached, because I is natively compiled (NC)
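What I computes is simply the fudged index minus the FUFA. A rough Python model follows, with the Return Stack as a list whose end is the top; `push_loop`, `word_i` and `word_j` are hypothetical helper names, and 16-bit cells are assumed.

```python
MASK = 0xFFFF  # assumption: 16-bit cells


def push_loop(rstack, limit, index):
    """Model of (DO): push FUFA, then the fudged index on top of it."""
    fufa = (0x8000 - limit) & MASK
    rstack.append(fufa)
    rstack.append((index + fufa) & MASK)


def word_i(rstack):
    """Innermost loop index: top cell minus the cell below it."""
    return (rstack[-1] - rstack[-2]) & MASK


def word_j(rstack):
    """Next outer loop index: same subtraction, two cells further down."""
    return (rstack[-3] - rstack[-4]) & MASK


rs = []
push_loop(rs, 10, 2)     # outer loop: limit 10, index 2
push_loop(rs, 100, 7)    # inner loop: limit 100, index 7
print(word_i(rs), word_j(rs))
```

Two cells further down in this model corresponds to four bytes further down on the 65c02 stack.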
The code for J is basically the same except that the addresses on the stack are each four bytes down. Finally, (LOOP):

Code:

l_ploop:        bra a_ploop
                .byte NC+CO+$06 
                .word l_abs     ; link to ABS
;               .word l_pploop    ; link to PPLOOP # TODO change
                .word z_ploop
                .byte "(LOOP)"
.scope
a_ploop:        ; TOP of the Return Stack has the index. We manipulate the
                ; 65c02 stack in place.
                stx TMPX
                tsx

                clc
                lda $0101,x     ; LSB
                adc #$01
                sta $0101,x

                clv             ; we check the V flag on MSB
                lda $0102,x     ; MSB
                adc #$00        ; we only care about the carry        
                sta $0102,x

                ldx TMPX

                bvs _hack+3     ; skip over JMP instruction

_hack:          ; This is why this routine must be natively compiled: We 
                ; compile the opcode for JMP here without an address to 
                ; go to. LOOP adds the address right after it. 
                .byte $4C

z_ploop:        rts             ; never reached 
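For anyone who wants to check the flag logic without a 65c02 at hand, here is a sketch in Python. It models a binary-mode 8-bit ADC and applies it to the two bytes of the fudged index the way (LOOP) does; the function names are invented for this example.

```python
def adc(a, operand, carry_in):
    """Binary-mode 8-bit ADC: return (result, carry_out, overflow)."""
    total = a + operand + carry_in
    result = total & 0xFF
    carry_out = total >> 8
    # V is set when both inputs have the same sign and the result differs.
    overflow = bool((a ^ result) & (operand ^ result) & 0x80)
    return result, carry_out, overflow


def loop_step(fudged):
    """Add 1 to the 16-bit fudged index; return (new_index, V)."""
    lsb, carry, _ = adc(fudged & 0xFF, 0x01, 0)   # clc / adc #$01
    msb, _, v = adc(fudged >> 8, 0x00, carry)     # adc #$00, carry only
    return (msb << 8) | lsb, v


def count_iterations(limit, start):
    """Run the loop until V is set and count the passes through the body."""
    fudged = (start + 0x8000 - limit) & 0xFFFF
    count = 0
    while True:
        count += 1
        fudged, v = loop_step(fudged)
        if v:
            return count


print(loop_step(0x7FFF))
print(count_iterations(10, 0))
```

In this model ADC derives V from its operands and result on every call, so the state of the flag before the addition never enters into it.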
The actual DO and LOOP words are high-level:

Code:

: DO POSTPONE (DO) HERE ; IMMEDIATE COMPILE-ONLY
: LOOP POSTPONE (LOOP) , POSTPONE UNLOOP ; IMMEDIATE COMPILE-ONLY
If you want to know why 32 bits make life easier, compare how long this code is to what Mike wrote :-) .

Post by scotws »

And heeeere's +LOOP:

Code:

l_pploop:       bra a_pploop
                .byte NC+CO+$07 
                .word l_abs     ; link to ABS
                .word z_pploop
                .byte "(+LOOP)"
.scope
a_pploop:       clc
                pla             ; LSB of index
                adc 1,x         ; LSB of step
                tay             ; temporary storage of LSB

                clv
                pla             ; MSB of index
                adc 2,x         ; MSB of step
                pha             ; put MSB of index back on stack

                tya             ; put LSB of index back on stack
                pha

                inx             ; dump step from TOS 
                inx

                bvs _hack+3     ; skip over JMP instruction

_hack:          ; This is why this routine must be natively compiled: We 
                ; compile the opcode for JMP here without an address to 
                ; go to. +LOOP adds the address right after it.
                .byte $4C

z_pploop:       rts             ; never reached
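The same trick carries over to arbitrary steps, including negative ones. The following Python sketch (hypothetical names, 16-bit cells assumed) adds the step to the fudged index, derives V the way a 16-bit signed addition would, and lists the values I would return on each pass.

```python
MASK = 0xFFFF  # assumption: 16-bit cells


def add16_v(a, b):
    """16-bit add; V is what the MSB ADC leaves after the LSB carry."""
    a &= MASK
    b &= MASK
    result = (a + b) & MASK
    v = bool((a ^ result) & (b ^ result) & 0x8000)
    return result, v


def plus_loop_indices(limit, start, step):
    """Indices seen by the body of `limit start DO ... step +LOOP`."""
    if step & MASK == 0:
        raise ValueError("a zero step never sets V")
    fufa = (0x8000 - limit) & MASK
    fudged = (start + fufa) & MASK
    seen = []
    while True:
        seen.append((fudged - fufa) & MASK)   # what I would return
        fudged, v = add16_v(fudged, step)
        if v:
            return seen


print(plus_loop_indices(10, 0, 3))
print(plus_loop_indices(0, 10, -1))
```

With these inputs the first call visits 0, 3, 6, 9 and the second counts down from 10 to 0 inclusive.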
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Post by barrym95838 »

Looks pretty tidy, Scot! My only nit to pick is your unnecessary use of CLV before ADC ... at least I'm pretty sure that it's unnecessary! You are progressing more quickly than I am in learning Forth ... I have no clue what your high-level words are doing (at least how they're doing it).

Mike

Post by scotws »

Thanks! When I find the time, I'll see if the variant using direct manipulation of the Stack Pointer as in (LOOP) or the TAY version as in (+LOOP) is better; I'm tending towards the latter. To be honest, I'm not sure about the CLV, but wanted to be safe.

As for the high-level commands -- I went crazy trying to understand POSTPONE at first. It doesn't help that it does different things based on if the command is IMMEDIATE or normal. What happens (and I am almost completely sure of this) is that at run-time, the POSTPONE causes the code of (DO) to be copied into the new word even though it is actually IMMEDIATE. Then HERE is executed because of the IMMEDIATE, leaving the current address cell on the Data Stack. So that is DO.

When it is time for LOOP, the code of (LOOP) is again copied into the new word and not executed because of POSTPONE, while the , ("comma") causes the address that DO put on the Data Stack to be saved right after the assembled (LOOP) code. POSTPONE UNLOOP just adds the code to drop the top two entries (four bytes) off the Return Stack -- the Index and Limit of the loop.

The part that took me so long to understand is that (LOOP) just puts the opcode of the JMP instruction -- but not the operand! -- into place. The address is provided by the HERE in DO in combination with the , ("comma") in LOOP. It only really clicked when I DUMPed the compiled word and went through a hard copy of the machine code with a pencil. But don't tell that to anybody.
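Here is a rough Python model of what ends up in memory, in case it saves somebody the pencil. Everything in it is a stand-in: the `Compiler` class, the origin address and the byte strings are assumptions made for the illustration, and only the trailing $4C and the cell appended by comma mirror the description above.

```python
ORIGIN = 0x0300                          # hypothetical start of the new word

# Stand-in machine code, NOT the real routines: only the lengths and the
# trailing JMP opcode ($4C) matter here.
PDO_CODE = bytes([0xEA] * 4)             # pretend (DO) body (NOPs)
PLOOP_CODE = bytes([0xEA, 0xEA, 0x4C])   # pretend (LOOP) ending in JMP opcode
UNLOOP_CODE = bytes([0x68] * 4)          # four PLA: drop index and FUFA


class Compiler:
    def __init__(self):
        self.mem = bytearray()           # the word being compiled
        self.dstack = []                 # Data Stack at compile time

    def here(self):
        return ORIGIN + len(self.mem)

    def native(self, code):
        self.mem += code                 # copy the code inline

    def comma(self):
        addr = self.dstack.pop()
        self.mem += bytes([addr & 0xFF, addr >> 8])   # little-endian cell

    def do(self):                        # : DO POSTPONE (DO) HERE ;
        self.native(PDO_CODE)
        self.dstack.append(self.here())

    def loop(self):                      # : LOOP POSTPONE (LOOP) , POSTPONE UNLOOP ;
        self.native(PLOOP_CODE)
        self.comma()
        self.native(UNLOOP_CODE)


c = Compiler()
c.do()
c.native(bytes([0xEA, 0xEA]))            # loop body: two NOPs
c.loop()
jmp_at = c.mem.index(0x4C)
target = c.mem[jmp_at + 1] | (c.mem[jmp_at + 2] << 8)
print(hex(ORIGIN + jmp_at), hex(target))
```

The JMP operand is the address that HERE left behind, which is the first byte after the copied (DO) code, so the jump lands at the start of the loop body.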

The drawback is that I don't see a way to easily reuse the code of say (LOOP) for (+LOOP) because a BRA wouldn't have the right target. So there is a certain amount of duplication here. Once I get everything coded -- I'm still missing a bunch of very basic stuff like LEAVE -- I'll go back and try to figure a way around this. It seems silly to have so much code twice.
Brad R
Posts: 93
Joined: 07 Jan 2014

Post by Brad R »

scotws wrote:
I went crazy trying to understand POSTPONE at first. It doesn't help that it does different things based on if the command is IMMEDIATE or normal. What happens (and I am almost completely sure of this) is that at run-time, the POSTPONE causes the code of (DO) to be copied into the new word even though it is actually IMMEDIATE. Then HERE is executed because of the IMMEDIATE, leaving the current address cell on the Data Stack. So that is DO.
Your understanding is correct. DO is executed during compilation because it is an IMMEDIATE word. Inside DO, POSTPONE causes (DO) to be compiled into the new word. Then HERE is executed, leaving an address on the stack.

If you're familiar with fig-Forth, here is one way to think of POSTPONE:
POSTPONE, if applied to an IMMEDIATE word, does the action of [COMPILE].
POSTPONE, if applied to a non-IMMEDIATE word, does the action of COMPILE.

So, on an indirect-threaded Forth,
POSTPONE DO will cause the cfa of DO to be compiled into this word, even though DO is IMMEDIATE. (Strictly speaking, the function of POSTPONE is to cause this word to perform the compile-time action of DO.)
POSTPONE (DO) will cause the cfa of (DO) to be compiled into the new word being defined.

Since POSTPONE is performing a compilation function, any word that uses POSTPONE should be an IMMEDIATE word (executed during compilation).
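The two cases can be modelled in a few lines of Python. This is a toy sketch, not how any particular Forth stores its words: `Word`, `State`, `postpone` and `run_immediate` are invented names, and a compiled definition is just a list of word names.

```python
class Word:
    def __init__(self, name, action=None, immediate=False):
        self.name = name
        self.action = action          # what the word does when executed
        self.immediate = immediate


class State:
    def __init__(self):
        self.compiled = []            # body of the definition being built
        self.dstack = []              # data stack at compile time


def postpone(word):
    """Return the step POSTPONE adds to the word currently being defined."""
    if word.immediate:
        return word.action            # like fig-Forth [COMPILE]

    def compile_word(state):          # like fig-Forth COMPILE
        state.compiled.append(word.name)
    return compile_word


def here(state):
    state.dstack.append(len(state.compiled))


def run_immediate(steps, state):
    for step in steps:
        step(state)


PDO = Word("(DO)")                    # non-immediate run-time word
DO_STEPS = [postpone(PDO), here]      # : DO POSTPONE (DO) HERE ; IMMEDIATE

s = State()
s.compiled.append("SWAP")             # something compiled before DO
run_immediate(DO_STEPS, s)            # the compiler meets DO and executes it
print(s.compiled, s.dstack)
```

Running DO at compile time appends (DO) to the definition under construction and leaves the position after it on the stack, matching the description above.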
Because there are never enough Forth implementations: http://www.camelforth.com
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Post by BigDumbDinosaur »

scotws wrote:
To be honest, I'm not sure about the CLV, but wanted to be safe.
In this case it isn't necessary.
x86?  We ain't got no x86.  We don't NEED no stinking x86!

Post by barrym95838 »

scotws wrote:
... The drawback is that I don't see a way to easily reuse the code of say (LOOP) for (+LOOP) because a BRA wouldn't have the right target. So there is a certain amount of duplication here. Once I get everything coded -- I'm still missing a bunch of very basic stuff like LEAVE -- I'll go back and try to figure a way around this. It seems silly to have so much code twice.
I'm not sure that my idea would work well on the 'c02 (actually I'm not even sure that it'll work well on the 'm32 yet), but I found that the easiest way to avoid repetition in the primitives for (LOOP) and (+LOOP) was to code (LOOP) as (1 +LOOP). Pushing a 1 on the dstack isn't an expensive operation on the 'm32, and I've been concentrating on code density over speed anyway. Am I missing something important with respect to their immediate counterparts, and the way in which they make use of these run-time versions?

Mike

[I think that your (+LOOP) looks tidier than your (LOOP), all other things being equal ...]

Post by scotws »

barrym95838 wrote:
(...) but I found that the easiest way to avoid repetition in the primitives for (LOOP) and (+LOOP) was to code (LOOP) as (1 +LOOP).
Ah. I, well, uh ... wait ...

Code:

: LOOP POSTPONE 1 POSTPONE (+LOOP) , POSTPONE UNLOOP ; IMMEDIATE COMPILE-ONLY
Works perfectly, and we can ditch (LOOP) completely. Now I feel like an idiot, but the code is better :-) . Thanks!
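A quick way to convince yourself that nothing is lost is to compare the two run-time behaviours for every possible fudged index. The Python below is a sketch under the usual assumptions (16-bit cells, invented function names); it is a model of the arithmetic, not of the actual routines.

```python
MASK = 0xFFFF  # assumption: 16-bit cells


def loop_step(fudged):
    """(LOOP): add 1; signed overflow happens only going $7FFF -> $8000."""
    return (fudged + 1) & MASK, fudged == 0x7FFF


def plus_loop_step(fudged, step):
    """(+LOOP): add an arbitrary step and derive V from the signs."""
    b = step & MASK
    result = (fudged + b) & MASK
    v = bool((fudged ^ result) & (b ^ result) & 0x8000)
    return result, v


def same_everywhere():
    """True if (LOOP) and 1 (+LOOP) agree for all 65536 index values."""
    return all(loop_step(f) == plus_loop_step(f, 1) for f in range(0x10000))


print(same_everywhere())
```

The price of the shared primitive is the extra push of 1 on the Data Stack on every pass.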

Post by barrym95838 »

Sometimes a student can help another student more easily than the teacher can, but I hesitate to claim this small victory until a professor wanders by and gives our idea a "thumbs-up". It looks like a classic trade-off of compactness vs. speed, and I almost always lean toward the former when push comes to shove.

Mike

[I am really enjoying your company on this little Forth "adventure of discovery", and I selfishly encourage you to keep us up to date on your journey.]
Last edited by barrym95838 on Wed Nov 12, 2014 4:56 pm, edited 1 time in total.

Post by Brad R »

That should certainly work, provided that you have defined 1 as a Forth word (e.g. a Forth CONSTANT). If 1 is normally compiled as a numeric literal, you'd have to do something like POSTPONE LIT 1 , (where LIT is the run-time code for numeric literals).

1 is so commonly used, though, that most Forths do define it as a constant.

Post by scotws »

It actually looks like this:

Code:

; ONE ( -- 1 ) ("1")
; Commonly used number, hard-coded for speed

l_one:          bra a_one
                .byte NC+$01 
                .word l_zero    ; link to ZERO ("0")
                .word z_one
                .byte "1"

a_one:          dex
                dex
                lda #$01
                sta 1,x         ; LSB
                stz 2,x         ; MSB

z_one:          rts
I've included 0 and 2 as well.

Post by barrym95838 »

I have noticed that your PSP (somewhat unusually) points to the next open stack byte, like the RSP (6502-style). Was this intentional, or did it just happen that way?

The 'm32 does it with a

Code:

        sta ,-x
        lda #1
because a holds TOS and x points directly at NOS, very similarly to the way a 6809 would likely do it (with D and U).

Mike

Post by scotws »

I did that so I don't get confused about which stack has what structure, so yes, the criterion was "the same as the system stack". Fairly early on, I made this ASCII drawing of the Data Stack:

Code:

          $00  +---------------+  <-- SPMAX
               |           ... |
               +-             -+
               |               |  $FE,x
               +-   (Empty)   -+
               |               |  $FF,x
               +-             -+
               |               |  <-- Stack Pointer (X Register)
               +===============+
               |              L|  $1,x
               +-    Cell     -+
               |              M|  $2,x
               +---------------+
               |              L|  $3,x
               +-    Cell     -+
          $7F  |              M|  $4,x   <-- SP0
               +===============+
As a style rule, I try to move the Stack Pointer before I access the data so I don't have code like STA $FF,X . If I remember correctly, FIG Forth does that at various times, and I found it confusing at first.
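In Python terms the convention looks roughly like this. The `DataStack` class is a made-up model of the drawing, with X starting at SP0 = $7F and always pointing at the next free byte; it is a sketch, not Tali Forth code.

```python
class DataStack:
    SP0 = 0x7F                       # X for an empty stack, as in the drawing

    def __init__(self):
        self.zp = bytearray(256)     # zero page
        self.x = self.SP0            # X register: next free byte

    def push(self, value):
        self.x = (self.x - 2) & 0xFF             # dex / dex first
        self.zp[self.x + 1] = value & 0xFF       # sta 1,x  (LSB)
        self.zp[self.x + 2] = (value >> 8) & 0xFF  # sta 2,x  (MSB)

    def pop(self):
        value = self.zp[self.x + 1] | (self.zp[self.x + 2] << 8)
        self.x = (self.x + 2) & 0xFF             # inx / inx afterwards
        return value

    def depth(self):
        return (self.SP0 - self.x) // 2


ds = DataStack()
ds.push(0x1234)
print(hex(ds.x), hex(ds.zp[0x7E]), hex(ds.zp[0x7F]))
```

Because the pointer moves before the store, every access uses the offsets 1 to 4 and never $FF,X.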
Dr Jefyll
Posts: 3525
Joined: 11 Dec 2009
Location: Ontario, Canada

Post by Dr Jefyll »

scotws wrote:
As a style rule, I try to move the Stack Pointer before I access the data so I don't have code like STA $FF,X . If I remember correctly, FIG Forth does that at various times, and I found it confusing at first.
It's true that in a few places 6502 Fig is coded to look at the cell that is "topper" than top of stack. :D That trick can save some bytes and cycles, and in the original context it's 100% reliable -- for example it's not like the hardware stack, which might get borrowed and written to at any time due to an interrupt. AFAIK the only pitfall is for an '816 in Native Mode, which won't "wrap" the $FF,X access within page zero as intended but will go into page 1 instead. Your decision to always move the Stack Pointer (X, that is) first eliminates even that unlikely issue.
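The difference can be written down as two effective-address calculations. This Python sketch uses invented function names and assumes an 8-bit X and, for the '816, a Direct Page register of zero unless stated otherwise.

```python
def ea_zp_x_6502(zp_addr, x):
    """65(c)02 zero page,X: the sum wraps within page zero."""
    return (zp_addr + x) & 0xFF


def ea_dp_x_816_native(dp_offset, x, d=0x0000):
    """'816 Native Mode direct page,X: no page wrap, stays in bank 0."""
    return (d + dp_offset + x) & 0xFFFF


x = 0x7D                                  # Data Stack pointer after one push
print(hex(ea_zp_x_6502(0xFF, x)))         # STA $FF,X on a 6502
print(hex(ea_dp_x_816_native(0xFF, x)))   # same instruction, Native Mode
```

On the 6502 the access lands one byte below X in page zero, while in Native Mode the same instruction reaches into page 1.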

As for having X indicate the first byte free (rather than the first byte used), that wouldn't be my choice but it does offer consistency with the quirky 6502 stack. (I'd rather deal with one quirky stack than two.) Just a reminder -- make sure your policy is duly accounted for in the word SP@ if you have it.

Brad R wrote:
If you're familiar with fig-Forth, here is one way to think of POSTPONE:
POSTPONE, if applied to an IMMEDIATE word, does the action of [COMPILE].
POSTPONE, if applied to a non-IMMEDIATE word, does the action of COMPILE
Thanks, Brad. Fig is a good point of reference, and I appreciate the clarification. :)

- Jeff
[Edit: "access within page zero as intended but will go into page 1 instead"]
Last edited by Dr Jefyll on Fri Nov 14, 2014 5:55 am, edited 1 time in total.
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

Post by BigDumbDinosaur »

Dr Jefyll wrote:
AFAIK the only pitfall is for an '816 in Native Mode, which won't "wrap" the $FF,X access within page 1 as intended but will go into page 2 instead.
You mean page zero and page 1?