Is this the optimal 6502 NEXT?

bogax · Post by **bogax** » Mon Aug 30, 2010 8:13 pm

chitselb wrote:

What about this?
CODE NEXTPAGE2 IP 1+ INC, NEXT JMP, END-CODE
CODE NEXTPAGE3 IP INC, NEXTPAGE2 JMP, END-CODE

Then [COMPILE] changes from
: [COMPILE] ( -- ) ?COMP ' , ;
to
: [COMPILE] ( -- ) ?COMP
here $00ff and $00fe = if ' nextpage2 , then
here $00ff and $00fd = if ' nextpage3 , $ea c, then
' , ;

and we do something similar with COMPILE
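To make the quoted check concrete, here is a rough Python model of what the modified [COMPILE] would lay down. It is only a sketch: `compile_word` and the `("cell", ...)` / `("byte", ...)` tuples are made-up names for illustration, and each cell is assumed to be a 2-byte address as compiled by `,`.

```python
# Hypothetical model of the compile-time page check quoted above.
# ("cell", name) stands for a 2-byte address laid down by ","
# ("byte", value) stands for a single byte laid down by "c,"

def compile_word(here, word):
    """Return (emitted items, new HERE) for compiling one word at HERE."""
    emitted = []
    low = here & 0x00FF
    if low == 0xFE:
        # NEXTPAGE2 fills $xxFE-$xxFF, so the word starts on the next page.
        emitted.append(("cell", "nextpage2"))
        here += 2
    elif low == 0xFD:
        # NEXTPAGE3 fills $xxFD-$xxFE and a NOP byte ($EA) pads $xxFF.
        emitted.append(("cell", "nextpage3"))
        emitted.append(("byte", 0xEA))
        here += 3
    emitted.append(("cell", word))
    return emitted, here + 2
```

The quoted Forth uses two separate IFs; the `elif` here behaves the same way, because after the first case HERE is already at $xx00.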

Understand, I'm not advocating anything.

But I would point out that if you jsr enter you only ever need
to insert one nop and the adjustment to the address you're calling
gets done (mostly) in hardware. You're guaranteed to have to make
an adjustment (so no branching in enter), but if you pass it through y
that only costs 2 cycles (plus 3 cycles for jsr versus jmp, total 5).

On the other hand, if a pass through NEXT and NEXTPAGE costs
23 cycles, you can afford to insert 11 nops and still save a cycle
at the cost of some bytes of code (the nops), or two nops and
save 19 cycles at no cost in bytes (assuming the 5 cycles for
jsr versus jmp costs nothing, i.e. saves you as much or more in
fiddling with the address as it costs, and it looks to me like that
would be true even if you didn't use y to adjust the address).
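The arithmetic above can be checked in a few lines of Python. The 23-cycle figure is the one given in the post; the 2-cycle NOP, 6-cycle JSR and 3-cycle absolute JMP are standard 6502 timings. The function name is only for illustration.

```python
NOP_CYCLES = 2       # a 6502 NOP takes 2 cycles
NEXTPAGE_PASS = 23   # figure from the post: one pass through NEXT and NEXTPAGE
JSR_CYCLES = 6       # JSR absolute
JMP_CYCLES = 3       # JMP absolute
VIA_Y_CYCLES = 2     # the 2 cycles quoted for passing the address through Y

def cycles_saved(nops):
    """Cycles saved by padding with NOPs instead of running NEXTPAGE."""
    return NEXTPAGE_PASS - nops * NOP_CYCLES

jsr_overhead = (JSR_CYCLES - JMP_CYCLES) + VIA_Y_CYCLES

print(cycles_saved(11))  # 1
print(cycles_saved(2))   # 19
print(jsr_overhead)      # 5
```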

bogax · Post by **bogax** » Tue Aug 31, 2010 9:22 pm

bogax wrote:

The page incrementing gets moved to a primitive, so NEXT becomes:

Code:

 inc ip
 inc ip
ip  = *+1
 jmp (xxxx)

NEXTPAGE would be something like:

Code:

 inc ip+1
 jmp NEXT

I did forget (and therefore forgot to mention) that this assumes that
words don't cross page boundaries.

chitselb · Post by **chitselb** » Wed Sep 01, 2010 3:36 am

Thanks, bogax! Going to JSR ENTER shaved at least 10 cycles from ENTER. It's still a pig at 41 clocks, but the ugly has to go somewhere. It assembles but I haven't walked through each and every scenario in the debugger yet.

This experience with cross-development in an emulator is amazing! I don't have to reset the machine and reload everything from tape, or try to make the assembler and the Forth co-resident, or save to tape every time I want to do something dangerous, or deal with editing source code in a 1000 character window without a browser, etc...

I'm pretty sure I have it. I pushed it to http://github.com/chitselb/pettil and the tiddlywiki file there has a big chart illustrating various page boundary crossing scenarios (with color!). I also added this ENTER routine to the assembly source:

Secondaries have a code field that *ends* with JSR ENTER. In the CFA the return address + 1 is used to access the PFA, so putting the NOP after JSR would break things.

Within the secondary, PAGE2 only gets inserted at $xxFE-$xxFF, and PAGE3 only gets inserted at $xxFD-$xxFE, followed by a NOP.

Code:

page3 inc ip  ; fall through
page2 inc ip+1
      jmp next

Code:

;       ENTER
;
;           IP    -> -[RP]
;           W     -> IP
;           NEXT
; the JSR version
enter           pla             ;[4]
                tay             ;[2]
                pla             ;[4]
                sta n+1         ;[3]
                lda ip+1        ;[3]
                pha             ;[3]
                lda ip          ;[3]
                pha             ;[3]
                lda n+1         ;[3]
                sta ip+1        ;[3]
                iny             ;[2]
                beq l1          ;[2]
l2              sty ip          ;[3]
                jmp nexto       ;[3]
l1              inc ip+1        ;[5] low byte wrapped, so bump the page
                bne l2          ;[3] taken unless ip+1 wrapped to zero
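For anyone following along without an assembler handy, here is a rough Python model of the address fix-up in the JSR version of ENTER. It is a sketch only: `jsr` and `enter` are illustrative names, the list stands in for the hardware stack, and the model stops where the routine jumps on to NEXT.

```python
def jsr(stack, jsr_addr):
    """Push what JSR pushes: the address of its own last byte, high byte first."""
    ret = (jsr_addr + 2) & 0xFFFF
    stack.append(ret >> 8)
    stack.append(ret & 0xFF)

def enter(stack, ip):
    """Return (new_ip, stack) after ENTER; ip is the caller's 16-bit IP."""
    y = stack.pop()              # pla / tay      low byte of return address
    hi = stack.pop()             # pla / sta n+1  high byte of return address
    stack.append(ip >> 8)        # lda ip+1 / pha
    stack.append(ip & 0xFF)      # lda ip / pha
    y = (y + 1) & 0xFF           # iny            return address + 1 = PFA
    if y == 0:                   # beq l1
        hi = (hi + 1) & 0xFF     # inc ip+1       carry into the page byte
    return (hi << 8) | y, stack

stack = []
jsr(stack, 0x20FD)               # JSR occupies $20FD-$20FF
new_ip, stack = enter(stack, 0x1234)
print(hex(new_ip))               # 0x2100
```

The example puts the JSR in the last three bytes of a page, which is the case that needs the `l1` branch.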

bogax · Post by **bogax** » Wed Sep 01, 2010 6:11 pm

bogax wrote:

But I would point out that if you jsr enter you only ever need
to insert one nop and the adjustment to the address you're calling
gets done (mostly) in hardware.

I misspoke there.
You still need room for the page increment, so you might need
to insert two nops.

chitselb wrote:

I'm pretty sure I have it. I pushed it to http://github.com/chitselb/pettil and the tiddlywiki file there has a big chart illustrating various page boundary crossing scenarios (with color!). I also added this ENTER routine to the assembly source:

Secondaries have a code field that *ends* with JSR ENTER. In the CFA the return address + 1 is used to access the PFA, so putting the NOP after JSR would break things.

Within the secondary, PAGE2 only gets inserted at $xxFE-$xxFF, and PAGE3 only gets inserted at $xxFD-$xxFE, followed by a NOP.

Code:

page3 inc ip  ; fall through
page2 inc ip+1
      jmp next

Code:

;       ENTER
;
;           IP    -> -[RP]
;           W     -> IP
;           NEXT
; the JSR version
enter           pla             ;[4]
                tay             ;[2]
                pla             ;[4]
                sta n+1         ;[3]
                lda ip+1        ;[3]
                pha             ;[3]
                lda ip          ;[3]
                pha             ;[3]
                lda n+1         ;[3]
                sta ip+1        ;[3]
                iny             ;[2]
                beq l1          ;[2]
l2              sty ip          ;[3]
                jmp nexto       ;[3]
l1              inc ip+1        ;[5] low byte wrapped, so bump the page
                bne l2          ;[3] taken unless ip+1 wrapped to zero

I still think you'd be better off inserting an extra nop rather than
making ENTER deal with page crossings (push it into the compiler).

It occurred to me that you ought to be able to build the page increment
into NEXT and call it explicitly as a primitive:

Code:

PAGE3
 inc ip
PAGE2
 inc ip+1
NEXT
 inc ip
 inc ip
ip = *+1
 jmp (xxxx)

It would save a jmp, but it's probably not worth spending zero page on.
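A small Python model shows why the fall-through works. It assumes, as in the listing, that NEXT only ever does two 8-bit increments of the low byte of IP (no carry into the high byte), so the page words have to supply the carry. The function names are illustrative.

```python
def next_(ip):
    """NEXT: inc ip twice; the low byte wraps and the high byte is untouched."""
    return (ip & 0xFF00) | ((ip + 2) & 0x00FF)

def page2(ip):
    """PAGE2: inc ip+1, then fall into NEXT."""
    return next_((ip + 0x0100) & 0xFFFF)

def page3(ip):
    """PAGE3: inc ip (low byte only), then fall into PAGE2."""
    return page2((ip & 0xFF00) | ((ip + 1) & 0x00FF))

print(hex(next_(0x12FE)))  # 0x1200, wrong page without help
print(hex(page2(0x12FE)))  # 0x1300
print(hex(page3(0x12FD)))  # 0x1300
```

With IP at $xxFE (PAGE2) or $xxFD (PAGE3), both paths leave IP at the first cell of the following page.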

chitselb · Post by **chitselb** » Wed Sep 01, 2010 7:55 pm

Inner interpreter works and it's checked in! Next comes RETHREAD, UNTHREAD, COLD, WARM, USER variables, and FIND

chitselb · Post by **chitselb** » Sat Sep 25, 2010 5:59 am

So far I've thrown in a Bloom filter (to almost always eliminate searching the dictionary where numeric literals are concerned), a Pearson hash (to balance out the division of the core vocabulary into 16 linked-list strands) and today I put Woz's Sweet-16 in there. I modified it to use the BRK instruction instead of a JSR call to save bytes when I'm going between Sweet-16 and 6502 modes. I'm very impressed, and hoping to save a lot of memory on not-so-time-critical routines with a lot of 16-bit pointer stuff. Zero page was getting pretty tight, so I just overlapped TOS, UP, and the 8-byte N area with the Sweet-16 registers. Synergy!
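For readers who haven't met a Pearson hash, the strand selection could look roughly like this in Python. This is a sketch under assumptions: the permutation table is a placeholder (a real one would normally be a shuffled table of 0..255, and the one actually used here is not shown in this post), and taking the low 4 bits to pick one of 16 strands is a guess at the mapping.

```python
# Placeholder permutation of 0..255 (an affine map with an odd multiplier is a
# bijection mod 256). This is an assumption, not the table used in the Forth.
TABLE = [(i * 167 + 13) & 0xFF for i in range(256)]

def pearson8(name: bytes) -> int:
    """8-bit Pearson hash: one table lookup per input byte."""
    h = 0
    for b in name:
        h = TABLE[h ^ b]
    return h

def strand(name: bytes) -> int:
    """Pick one of 16 dictionary strands from the low nibble of the hash."""
    return pearson8(name) & 0x0F

for word in (b"DUP", b"SWAP", b"DROP"):
    print(word.decode(), strand(word))
```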

GARTHWILSON · Post by **GARTHWILSON** » Sat Dec 25, 2010 8:47 pm

dclxvi wrote:

GARTHWILSON wrote:

Make sure you also fix the bugs in UM/MOD and UM* which is called U* in FIG-Forth.

An optimized version of the UM/MOD fix can be found here (along with the UM* fix):

http://6502.org/tutorials/65c02opcodes.html

That link seems to need correction, Bruce.

dclxvi · Post by **dclxvi** » Wed Jan 05, 2011 2:54 am

GARTHWILSON wrote:

That link seems to need correction, Bruce.

I saw that and posted the right link a couple posts later. Unfortunately the wrong link's on one page, and the right link's on another. I've been more in the habit of posting replies than editing posts. People seem to be editing posts a lot more often than they ever used to, so perhaps I am simply behind the curve here.

BigEd · Post by **BigEd** » Wed Jan 05, 2011 7:13 am

dclxvi wrote:

I've been more in the habit of posting replies than editing posts. People seem to be editing posts a lot more often than they ever used to, so perhaps I am simply behind the curve here.

I've sometimes done both: edit the post to fix an error (tomorrow, people will find that post by search) and post a reply mentioning the update (today, people will see that reply).

But I think it's best to add a footnote to the edited post.

And I prefer only to see small edits: add an 'oops - see later post' if I need to make a major self-correction. (I don't like to break the existing conversation, and I'm mindful of how the forum will look to a new visitor a few months or years down the line.)

GARTHWILSON · Post by **GARTHWILSON** » Wed Jan 05, 2011 7:21 am

I've edited my posts sometimes even years later, usually because I realize I skipped a word or had a typo or something like that, and I want it to be as clear as originally intended. A few might have been for updating a URL that was correct at the time I originally posted but the web page got moved. I know people do look at these even years later, because I'm one of them.

ElEctric_EyE · Post by **ElEctric_EyE** » Wed Jan 05, 2011 5:34 pm

I am one of those heavy editors. For one, I am a horrible writer. My thoughts are very jumbled, and difficult to get into a coherent straight thought sometimes...

Other times, like in my PWA thread, I will frequently edit my last post whenever I change plans or ideas (which is quite often recently), so it doesn't always show up on top of the other threads. I do reread a few earlier posts, though, to make sure it follows a coherent thought.

BigEd · Post by **BigEd** » Wed Jan 05, 2011 5:37 pm

Do be careful with that: your updates can easily be missed if someone saw your first effort and then you added a lot of text - that happened to me the other day with one of your updates. Of course, most readers aren't keeping up with new posts quite as obsessively as I sometimes do...

Cheers
Ed

ElEctric_EyE · Post by **ElEctric_EyE** » Thu Jan 06, 2011 10:31 pm

Noted.

I'll stop that practice after today.

Edit: You weren't harsh BigEd, I see how it can lead to problems. LOL! If I do have to though, I will date stamp it!

BigEd · Post by **BigEd** » Thu Jan 06, 2011 10:58 pm

Ta - sorry if that seemed a bit harsh!

chitselb · Post by **chitselb** » Fri Mar 21, 2014 6:34 am

I'm using a design where the compiler:

- realigns at page boundaries to work around the jmp ($xxff) bug
- inserts a call to the word "page" when it compiles definitions (where they cross page boundaries)

All page does when it executes is 'inc ip+1', to cross the page. That unburdens NEXT considerably, at the expense of compiler complexity.
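For reference, the jmp ($xxff) bug is the NMOS 6502 behaviour where an indirect JMP does not carry into the high byte when fetching the second byte of the pointer. A small Python model (the function name and the dict used as memory are just for illustration):

```python
def jmp_indirect_nmos(mem, ptr):
    """Target of JMP (ptr) on an NMOS 6502: the pointer's high byte is
    fetched from the same page, so ptr = $xxFF wraps around to $xx00."""
    lo = mem[ptr]
    hi = mem[(ptr & 0xFF00) | ((ptr + 1) & 0x00FF)]
    return (hi << 8) | lo

mem = {0x12FF: 0x34, 0x1300: 0x56, 0x1200: 0x78}
print(hex(jmp_indirect_nmos(mem, 0x12FF)))  # 0x7834, not 0x5634
```

A cell that sits at $xxFE-$xxFF is still fine; only a cell starting at $xxFF is affected, which is why the compiler has to realign.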

The old NEXT used 15 cycles:

Code:

next inc ip ; 5
     inc ip ; 5
ip = * + 1
     jmp ($0000) ; 5

This new and improved NEXT eats only 13 cycles, a substantial improvement!

Code:

next lda ip ; 3
     adc #2 ; 2 (ADC adds carry too, so C must be clear here)
     sta ip ; 3
ip = * + 1
     jmp ($0000) ; 5
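A quick Python check of the cycle sums, plus the one caveat with the ADC version: ADC always adds the carry flag, so the 13-cycle figure only holds if carry is known to be clear when NEXT is entered. Adding a CLC (2 cycles) would bring it back to 15. The helper below is an illustrative model of binary-mode ADC, not code from the Forth itself.

```python
def adc_imm(a, operand, carry):
    """8-bit binary-mode ADC: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

OLD_NEXT = 5 + 5 + 5       # inc ip / inc ip / jmp (ip)
NEW_NEXT = 3 + 2 + 3 + 5   # lda ip / adc #2 / sta ip / jmp (ip)
CLC_CYCLES = 2

print(OLD_NEXT, NEW_NEXT)      # 15 13
print(adc_imm(0x10, 2, 0))     # (18, 0)  carry clear: ip advances by 2
print(adc_imm(0x10, 2, 1))     # (19, 0)  carry set: ip advances by 3
```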

Is this the optimal 6502 NEXT?

That Forth for the PET I've been working on