6502.org • Is this the optimal 6502 NEXT? - Page 2

Page 2 of 5

Posted: Wed Aug 25, 2010 7:45 pm

by chitselb

The project thus far: http://github.com/chitselb/pettil

The good news is that NEXT / ENTER / EXIT / EXECUTE are running, and NEXT is a mere 17 cycles.

The bad news is that it lacks a README, and the ability to compile for that matter. Feel free to throw down and email your diffs to me at gmail (hint: It's 'chitselb' there too)

For development work, I'm using Linux Mint 9 running:
vice 2.1.dfsg-3 The Versatile Commodore Emulator
xa65 2.3.5-1 cross-assembler and utility suite for 65xx/65816 processors

Charlie

Posted: Thu Aug 26, 2010 2:07 am

by dclxvi

some fool who sucks at cut and paste wrote:

An optimized version of the UM/MOD fix can be found here (along with the UM* fix):

Whoops, wrong link. Here is the correct link.

http://6502org.wikidot.com/errata-software-figforth

kc5tja wrote:

Though in this case, I'd use Y as the data stack pointer, and reference the data stack in direct-page using absolute addresses.

<snip>

I think it's only one cycle longer than DP,X mode, which shouldn't be much of an issue, I'd think.

Only writes are one cycle longer (5 vs. 4). For non-writing instructions (assuming no page boundary crossings) abs,Y and zp,X are both 4 cycles. However, unlike zp,X, abs,Y is not available with INC, DEC, STZ, and shifts, so words like 1+ and 2* (CELLS) would be several more cycles.

I would (blindly) guess that most words change only 1 or 2 cells, so that would be 2 or 4 cycles (one per byte per cell), for the STAs.

kc5tja wrote:

The only difficulty would come from when you need to invoke @ and !, I think, as here you'd need to tweak the Y register and a reserved direct-page location to serve as your pointer.

Well, for C@ (and C!) you could use self-modifying code, if the data stack were still on the zero page:

Code: Select all

; C@
   STY :1+1
:1 LDA (0)
   STA 0,Y
   LDA #0
   STA 1,Y
   JMP NEXT

but it might be better to split the data stack into high byte stack and a low byte stack, since only C@ and C! (and maybe COUNT if it were implemented as primitive) would seem to have any benefit from keeping the high and low bytes together. That way DROP would only be 1 INY rather than 2 INXs, so you'd get a couple of cycles back that you lost elsewhere from the STA abs,Y vs. STA zp,X. Likewise for pushing onto the stack with DUP and co. The split would seem to benefit a lot more words.

However, abs,Y is of course 1 byte longer than zp,X, and there are 2 of those for every 1 byte savings from 1 INY vs. 2 INXs. Plus there's the additional space (and as you mentioned, time) of copying to the zero page so that it can be used as a pointer. So primitives will take up more space, for what may be on average only a slight speed improvement at best. Also, in the 17 cycle NEXT, Y can be overwritten, without penalty (see below), whereas in the 12 cycle NEXT both index registers are used and must be preserved. I seem to have underestimated the 12-cycle NEXT, but I'm still not sure I see a decisive advantage there.

Anyway, shifting gears back to the 17 cycle NEXT, is there a benefit to requiring primitives to preserve Y to save the 2 cycle LDY #1? LDA #imm takes the same 2 cycles as TYA. In other words, why not this:

Code: Select all

NEXT  LDA #2
NEXT1 CLC
      ADC :1+1
      STA :1+1
      BCS :2
:1    JMP ($FFFF)
:2    INC :1+2   ; this doesn't affect the carry so the BCS always branches
      BCS :1     ; or JMP to ensure it always takes 3 cycles

You could then JMP to NEXT1 (after loading the accumulator appropriately, of course) for words that need to advance more than two bytes, e.g.:

Code: Select all

DW LIT,0
DW ZBRANCH,label

Posted: Sat Aug 28, 2010 5:21 pm

by chitselb

GARTHWILSON wrote:

Also, take a look at my article on servicing interrupts in high-level Forth with no overhead, as the method involves NEXT.

I saw that too, very nice work. And I also saw that stuff about the fig UM* bug. I'm not sure I need an interrupt handler just yet, but I'm glad your stuff is there because at some future time I will probably wish to introduce preemptive (involuntary) multitasking.

Quote:

It would take some convincing for me to believe that splitting the stack and having to keep moving things in and out of a fixed-address TOS pair of ZP bytes would be more efficient than keeping byte pairs together in every cell and doing indexing and doing the INX or DEX twice to add or drop stack cells.

What works to convince me is objective facts. I'm humble that way. To be perfectly honest, I'm not sure having two-byte TOS and 32-byte STACKL & STACKH regions on zero page will actually clock out faster than the simpler, traditional, figForth-like double-DEX STACK on zero page approach. So if this approach clocks out faster (in the aggregate) than, say, figForth, I'm happy. This leads to the question, what does "better in the aggregate" mean in this context? To me it means that an average mix of code runs faster.

A side-by-side example

Code: Select all

+
PETTIL                         figForth
pluslfa .word $adde            L632  .BYTE $81,$AB
        .byt (plus-*-1)|bit7         .WORD L619     ; link to V-ADJ
        .asc "+"|bit7          PLUS  .WORD *+2
plus    lda tos                      CLC
        adc stackl,x                 LDA 0,X
        sta tos                      ADC 2,X
        lda tos+1                    STA 2,X
        adc stackh,x                 LDA 1,X
        sta tos+1                    ADC 3,X
        inx                          STA 3,X
        jmp next                     INX
                                     INX
                                     JMP NEXT

PETTIL guarantees to primitives that NEXT will always exit with Carry clear and Y=1, saves two clock cycles here. Still, I get 25 cycles for PETTIL '+' vs. 33 cycles for figForth '+'. Not bad.

Unrelated to any of that, I'm thinking about the EDITOR. Of course I'll include the "Starting Forth" editor because most Forth programmers have seen it if they've ever read that book, and who hasn't, right? But I really liked the volksForth64 full screen editor too.

I am envisioning something that looks as much as possible like the ROM screen editor, including the 40/80-column stuff. There's no Control key on original PET or PET 2001-N keyboards, but RVS could do the trick and it's even positioned correctly. For example RVS-S could mean "flush the current block" and RVS-W could mean "toggle to the shadow block" and RVS-RVS could mean "act like I just typed a RVS". Another thing the volksForth editor has is a clear visual indication that you're in the editor (it changes the border color). For PETTIL I could flash the screen when the user presses RVS.

Case sensitive? I think not. A programmer writing a card game could define the shifted A,S,Z,X PETSCII card suit characters as Forth words, etc...

Speeding up Dictionary searches: figForth searches the entire dictionary for a blank-delimited string and then throws it to NUMBER. My approach to this is to break the dictionary up into 16 threads instead of one. A very fast and simple hashing scheme (XOR all the nybbles of the text portion of the word together) will give a hashkey between 0 and 15. A table of 16 words will point to the most recently linked LFA for each of those 16 hashkey values. Two new words RETHREAD and UNTHREAD will toggle every LFA between being threaded into this scheme (RETHREAD) and being set to the constant $DE $AD (UNTHREAD). Otherwise, implementing FORGET would be lots more difficult. The dictionary will be unthreaded at startup and COLD will invoke RETHREAD to get things working.

I'm hosting it all on github.com http://github.com/chitselb/pettil

Posted: Sat Aug 28, 2010 10:18 pm

by bogax

chitselb wrote:

PETTIL guarantees to primitives that NEXT will always exit with Carry clear and Y=1, saves two clock cycles here. Still, I get 25 cycles for PETTIL '+' vs. 33 cycles for figForth '+'. Not bad.

I don't understand the utility of NEXT clearing the carry and setting Y
It seems to me you would always be taking the time whether it's
needed or not and it would likely only save you a few bytes here and
there (?)

Posted: Sat Aug 28, 2010 10:46 pm

by GARTHWILSON

Quote:

because at some future time I will probably wish to introduce preemptive (involuntary) multitasking.

The 6502 isn't very good at multitasking, but doing it in Forth with non-preemptive multitasking is easy, and would give a lot less overhead than preemptive. With a real-time system, I think the delays from preemptive multitasking's task switches would be prohibitive. With a good interrupt system though, I have found I don't really need multitasking. I'm not doing the kinds of things we do on the desktop computers, but I have had background programs running on the workbench computer while at the same time it interprets, compiles, or assembles source code coming in over the RS-232 from the PC, and there are several kinds of interrupts hitting all the time and getting serviced to make all this happen.

Quote:

PETTIL guarantees to primitives that NEXT will always exit with Carry clear and Y=1, saves two clock cycles here.

FIG-Forth's NEXT leaves Y=0 and then some of the words take advantage of that so they don't have to zero it first. I had considered making NEXT leave C unaffected to make it easier to do multi-precision arithmetic in secondaries.

Quote:

Speeding up Dictionary searches:

We discussed this here. The idea there I think I like most is cacheing search results, but I still have not implemented it.

Posted: Sun Aug 29, 2010 1:15 am

by chitselb

bogax wrote:

chitselb wrote:

PETTIL guarantees to primitives that NEXT will always exit with Carry clear and Y=1, saves two clock cycles here. Still, I get 25 cycles for PETTIL '+' vs. 33 cycles for figForth '+'. Not bad.

I don't understand the utility of NEXT clearing the carry and setting Y
It seems to me you would always be taking the time whether it's
needed or not and it would likely only save you a few bytes here and
there (?)

Here's my NEXT:

Code: Select all

putya  sty tos+1         ; YA to the TOS register
puta   sta tos           ; your mom
next1  ldy #1            ; entry to reset Y = 1
next   tya         ;[2]  ; entry for most purposes 
nexta  sec         ;[2]  ; adds an arbitrary amount to IP
       adc ip      ;[3]
       sta ip      ;[3]
       bcs nextp   ;[2]
ip = *+1                 ; dummy value $CAFE
nexto  jmp ($cafe) ;[5]  ; outtie!  doesn't alter IP, Y or C
nextp  inc ip+1   ;[17]  ; next page
nexty  ldy #1            ; like nexto, also fixes CLC & Y
nextx  clc               ; just fix CLC, then nexto
       bcc nexto       ; bra

My presumption here is that NEXT gets executed immediately prior to any other word. NEXT itself needs to set Y to be 1 (or 0) anyway to do the indirect two-byte lookup. As long as we're in there, we may as well establish a contract with primitives. The alternative is to leave with C = clear except when we're flipping IP across a page boundary.

The unnecessary three bytes are the instructions at 'nexty' and 'nextx'. For the price of 2 clock cycles and 1 byte, (which is a price I only pay only whenever the IP crosses a page boundary), I get to know what the C flag is at the start of every primitive. The cost (2 clocks/2 bytes for ldy #1) and logic apply similarly to Y. By carefully choosing which entry point into NEXT at the end of a primitive, I shave even more clocks here and there too. For me, coding on the 6502 is a logic game, a place to entertain myself by optimizing. A little ridiculous, a lot of fun.

Charlie

Posted: Sun Aug 29, 2010 1:37 am

by chitselb

Consider this revision. It does away with the "Y=1 C=clear after NEXT" contract, and also the handy put stuff that stores the YA register pair to TOS. It still runs in 17 cycles. For clocks, it's a wash. The only win here is I free up some bytes of zero page. And I can use the Y register for something else. And it sort of simplifies things vs. the byzantine NEXT routine in the previous message.

The cost of this move is now every primitive that puts a value to TOS will have to handle that itself before calling NEXT. Every primitive that wants to use the Y register or the C flag will have to initialize them.

Code: Select all

minimum wage NEXT

next   lda #2      ;[2]  ; entry point for just about everything
nexta  clc         ;[2]  ; adds an arbitrary amount (passed in A) to IP
       adc ip      ;[3]
       sta ip      ;[3]
       bcs nextp   ;[2]
ip = *+1                 ; dummy value $CAFE
nexto  jmp ($cafe) ;[5]  ; outtie!
nextp  inc ip+1   ;[17]  ; next page
       bcs nexto         ; bra

But wait a minute! Maybe you're onto something here. Now I can use Y for something else, besides always walking around with a 1 in there. What would I use it for?

Posted: Mon Aug 30, 2010 3:19 am

by chitselb

I've had a day to think about it. It's not a good design, or at least it doesn't seem very Forth-like to me, to have a contract between the Forth virtual machine and primitives as far as the 6502 register contents. So before one or many of you try to convince me of the folly of my ways, here's what I've come up with after a day of soul searching, away from the keyboard. Same 17 clocks. It only affects the S and Z flags, and doesn't affect *any* other registers, so primitives can use A and Y to pass values between themselves. This is it, as fast, uncomplicated and minimal as I could make it.

The compiler's duty is to never break a word in half at a page boundary because of the JMP ($xxFF) bug. It will do this by inserting (or not) a NOP $EA *after* the JMP ENTER at the beginning of a colon definition.

Code: Select all

new and improved NEXT and zero page usage

stackl  = $00
stackh  = $3a
n       = $74
up      = $7c
tos     = $7e
0080 next    inc ip+1
0082         inc ip+1
0084         beq nextp
ip      = *+1
0086 nexto   jmp ($cafe)
0089 nextp   inc ip+1
008b         bne nexto

Posted: Mon Aug 30, 2010 3:40 am

by kc5tja

Register contracts already exist: for example, on x86 direct-threaded Forth implementations, it's very often the case that (E)SI holds the program counter, while (E)BX holds the top of stack.

That being said, I'm actually caught by surprise by this implementation. It's small, quick, and easy to understand.

Although, the code as listed is buggy:

Code: Select all

. . .
inc ip+1  ; should be inc ip
inc ip+1  ; should be inc ip
. . .

Posted: Mon Aug 30, 2010 5:14 am

by bogax

chitselb wrote:

The compiler's duty is to never break a word in half at a page boundary because of the JMP ($xxFF) bug. It will do this by inserting (or not) a NOP $EA *after* the JMP ENTER at the beginning of a colon definition.

So since you're doing that any way why don't you create a NEXTPAGE
primitive and get the compiler to insert a NEXTPAGE and lose the page
increment in NEXT?

Posted: Mon Aug 30, 2010 7:18 am

by chitselb

bogax wrote:

So since you're doing that any way why don't you create a NEXTPAGE
primitive and get the compiler to insert a NEXTPAGE and lose the page
increment in NEXT?

Where "that" is a word which here means...?

chitselb wrote:

The compiler's duty is to never break a word in half at a page boundary because of the JMP ($xxFF) bug. It will do this by inserting (or not) a NOP $EA *after* the JMP ENTER at the beginning of a colon definition.

I was wrong. I meant to say *before* instead of *after*

Here's my code for ENTER: Please notice particularly the bad ASCII art in the comment

Code: Select all

;--------------------------------------------------------------
;
;       ENTER
;
; IP -> [-RP]
; [IP] -> W              ; W points to the CFA of this secondary
; W+sizeof(CFA) -> IP
; NEXT
;
; ENTER is headerless, aka DOCOLON in some Forths
;
; Consider how these three secondaries cross a page boundary
;
; : foo   ( n1 -- n1 n2 n1 )  dup 2+ over ;
;                       f8  f9  fa  fb  fc  fd  fe  ff| 00  01
; +---+---+---+---+---+---+---+---+---+---+---+---+---|---+---+
; | 3 |   FOO     |jmp enter  | dup   |twoplus| over  | exit  |
; |<----NFA------>|<--CFA---->|<-----------PFA--------|------>+
;
; : bar   ( n1-- n2 ) 42 swap - ;
;                       f8  f9  fa  fb  fc  fd  fe  ff| 00  01  02  03 
; +---+---+---+---+---+---+---+---+---+---+---+---+---|---+---+---+---+
; | 3 |   BAR     |nop|jmp enter  |clit   |42 | swap  | minus | exit  |
; |<----NFA------>|<----CFA------>|<----------PFA-----|-------------->+
;
; : baz   ( -- ) ." pong" ;
;                       f8  f9  fa  fb  fc  fd  fe  ff| 00  01  02
; +---+---+---+---+---+---+---+---+---+---+---+---+---|---+---+---+
; | 3 |   BAZ     |jmp enter  | (.")  | 4 |  "pon"    |"g"| exit  |
; |<----NFA------>|<--CFA---->|<----------PFA---------|---------->+
;
; There are two ways for a primitive to get here
;
;(a)    NOP
;       JMP ENTER
;
;(b)    JMP ENTER
;
; The NOP is added by the compiler, as necessary, to align words within
; colon definitions on a page boundary.  More often than not, NOP won't be
; present because more often than not, secondaries don't cross a page.
;
; Since the opcodes for NOP and JMP are $EA and $4C respectively, we
; use the LDA instruction and check the N flag.
; 
; The NOP comes *before* the JMP for a couple of reasons.
; 1) The Y register already has a zero in it after copying (IP) -> W
; 2) If the NOP is stored after the JMP, there's no way to distinguish
;    between a NOP and a CFA with a low-order byte of $EA
;
; IP needs to point at the PFA of the secondary we're entering, so knowing
; whether a secondary's CFA takes up 3 or 4 bytes is pretty important.
;
enter           lda ip+1        ;[3]
                pha             ;[3]
                lda ip          ;[3]
                pha             ;[3]    ; IP -> [-RP]
                ldy #0          ;[2]
                clc             ;[2]    ; assume JMP
                lda (ip),y      ;[5]
                bpl l021        ;[2|3]  ; JMP or NOP?
                sec             ;[2|0]  ; guess we were wrong.  It's NOP
l021            adc #3          ;[2]    ; A = <(W+3) (maybe 4)
                pha             ;[3]
                tya             ;[2]
                iny             ;[2]
                adc (ip),y      ;[5]    ; A = >(W+3) (maybe 4)
                sta ip+1        ;[3]
                pla             ;[4]
                sta ip          ;[3]
                jmp nexto       ;[3]

With FOO, there is no issue with splitting a word at a page boundary crossing.

For BAR, when the word is being compiled into the dictionary, when it gets to MINUS the compiler will notice that DP = $xxFF. So it slides (CMOVE>) everything (from the first byte of the CFA of BAR to the byte at $xxFE) upward one byte. Then it stores $EA at the first byte of BAR's CFA. Finally it compiles in the MINUS at $xx00 of the following page.

For BAZ, the IP is incremented (and the page boundary crossed) by (.") so again there is no issue.

I'm not thrilled with this ENTER routine. It seems like a pretty expensive way of doing things. If anybody has a better idea, I'm all ears. Still, I'm willing to pay the price if it keeps NEXT at 17 cycles. I'm also unclear what you mean, exactly how invoking a NEXTPAGE primitive fits in with this version of ENTER. Can you please clarify, perhaps with better ascii art than what I came up with?

Posted: Mon Aug 30, 2010 7:55 am

by bogax

chitselb wrote:

With FOO, there is no issue with splitting a word at a page boundary crossing.

For BAR, when the word is being compiled into the dictionary, when it gets to MINUS the compiler will notice that DP = $xxFF. So it slides (CMOVE>) everything (from the first byte of the CFA of BAR to the byte at $xxFE) upward one byte. Then it stores $EA at the first byte of BAR's CFA. Finally it compiles in the MINUS at $xx00 of the following page.

The page incrementing get's moved to a primitive so NEXT becomes:

Code: Select all

 inc ip
 inc ip
ip  = *+1
 jmp(xxxx)

NEXTPAGE would be something like:

Code: Select all

 inc ip+1
 jmp NEXT

then the compiler inserts NEXTPAGE where it's needed

Code: Select all

; : foo   ( n1 -- n1 n2 n1 )  dup 2+ over ; 
;                       f8  f9  fa  fb  fc  fd  fe  ff| 00  01  02  03
; +---+---+---+---+---+---+---+---+---+---+---+---+---|---+---+---+---+ 
; | 3 |   FOO     |jmp enter  | dup   |twoplus|NEXTPAG| over  | exit  | 
; |<----NFA------>|<--CFA---->|<-----------PFA--------|-------------->+ 
; 
; : bar   ( n1-- n2 ) 42 swap - ; 
;                       f8  f9  fa  fb  fc  fd  fe  ff| 00  01  02  03  04  05 
; +---+---+---+---+---+---+---+---+---+---+---+---+---|---+---+---+---+---+---+ 
; | 3 |   BAR     |nop|jmp enter  |clit   |42 |NEXTPAG| swap  | minus | exit  | 
; |<----NFA------>|<----CFA------>|<----------PFA-----|-------------->+---+---+

If the jmp enter happens to end on a page boundary with no room for
NEXTPAGE :

Code: Select all

;                       fd  fe  ff| 00  01  02  03  04| 05  06  07  08  09 
; +---+---+---+---+---+---+---+---+---+---+---+---+---|---+---+---+---+---+
; | 3 |   BAR     |nop|nop|jmp enter  |clit   |42 | swap  | minus | exit  | 
; |<----NFA------>|<----CFA------>|<----------PFA-----|------------------>+

you could jsr enter and do something like:

Code: Select all

ENTER
 pla             ;(4)
 sta ip_lo_temp  ;(3)
 pla             ;(4)
 sta ip_hi_temp  ;(3)
 lda ip+1        ;(3)
 pha             ;(3)
 lda ip          ;(3)
 pha             ;(3)
 lda ip_hi_temp  ;(3) 
 sta ip+1        ;(3)
 lda ip_lo_temp  ;(3)
 sta ip          ;(3)
 jmp NEXT2       ;(3)


NEXT
 inc ip
NEXT2
 inc ip
ip = *+1
 jmp(xxxx)

or:

Code: Select all

ENTER
 pla             ;(4)
 tay             ;(2)
 pla             ;(4)
 sta ip_hi_temp  ;(3)
 lda ip+1        ;(3)
 pha             ;(3)
 lda ip          ;(3)
 pha             ;(3)
 lda ip_hi_temp  ;(3) 
 sta ip+1        ;(3)
 iny             ;(2)
 sty ip          ;(3)
 jmp NEXTTO      ;(3)

NEXT 
 inc ip 
 inc ip 
ip = *+1 
NEXTTO 
 jmp(xxxx)

Edit: this assumes that words don't cross page boundaries.

Posted: Mon Aug 30, 2010 2:31 pm

by chitselb

What about this?
CODE NEXTPAGE2 IP 1+ INC, NEXT JMP, END-CODE
CODE NEXTPAGE3 IP INC, NEXPAGE2 JMP, END-CODE

Then [COMPILE] changes from
: [COMPILE] ( -- ) ?COMP ' , ;
to
: [COMPILE] ( -- ) ?COMP
here $00ff and $00fe = if ' nextpage2 , then
here $00ff and $00fd = if ' nextpage3 , $ea c, then
' , ;

and we do something similar with COMPILE

Posted: Mon Aug 30, 2010 3:31 pm

by chitselb

Code: Select all

;               fb  fc  fd  fe  ff| 00  01  02  03  04| 05  06  07
; +---+---+---+---+---+---+---+---|---+---+---+---+---|---+---+---+
; | 3 |   BAR     | jmp enter | cl|it |42 | swap  | minus | exit  |
; |<----NFA------>|<---CFA--->|<--|-----------PFA---------------->+
                                  ^ yikes!

In the (broken) case above, the JMP ENTER instruction ends at $xxFE leaving no room for the odd-aligned pagebreak3 handler. There would have to be an alternate entry point to ENTER, let's call it ENTERP. After the ":" word creates the name field, it checks for HERE = $xxFC and if so, it pads the CFA with an extra NOP byte after JMP ENTERP. Otherwise it just encloses JMP ENTER.

Code: Select all

;               fb  fc  fd  fe  ff| 00  01  02  03  04| 05  06  07  08
; +---+---+---+---+---+---+---+---|---+---+---+---+---|---+---+---+---+
; | 3 |   BAR     | jmp enter |nop| clit  |42 | swap  | minus | exit  |
; |<----NFA------>|<----CFA------>|<----------PFA-------------------->+

ENTERP would be

Code: Select all

enterp          sec                     ; special case CFA = $xxFC
                bcs enter+1
enter           clc
                lda ip+1        ;[3]
                pha             ;[3]
                lda ip          ;[3]
                pha             ;[3]    ; IP -> [-RP]
                ldy #0          ;[2]
                lda (ip),y      ;[5]
l021            adc #3          ;[2]    ; A = <(W+3+C)
                pha             ;[3]
                tya             ;[2]
                iny             ;[2]
                adc (ip),y      ;[5]    ; A = >(W+3+C)
                sta ip+1        ;[3]
                pla             ;[4]
                sta ip          ;[3]
                jmp nexto       ;[3]

Posted: Mon Aug 30, 2010 3:55 pm

by chitselb

wups. fixed:

Code: Select all

entern          inc ip+1                ; special case CFA = $xxFD, $xxFE, $xxFF
                bne enter
enterp          inc ip+1
                sec                     ; special case CFA = $xxFC
                bcs enter+1
enter           clc
                lda ip+1        ;[3]
                pha             ;[3]
                lda ip          ;[3]
                pha             ;[3]    ; IP -> [-RP]
                ldy #0          ;[2]
                lda (ip),y      ;[5]
l021            adc #3          ;[2]    ; A = <(W+3+C)
                pha             ;[3]
                tya             ;[2]
                iny             ;[2]
                adc (ip),y      ;[5]    ; A = >(W+3+C)
                sta ip+1        ;[3]
                pla             ;[4]
                sta ip          ;[3]
                jmp nexto       ;[3]