6502.org Forum

All times are UTC

Posted: Thu Sep 09, 2010 1:03 am
This was the only place I could think of to ( confess | brag about ) what I have done. Here's my 2ROT primitive, which rotates the top three stack entries from A B C to B C A. All values are four-byte doubles. The data stack is indexed by X and split into a low-byte region (stackl) and an equally sized high-byte region (stackh). The top of stack is special and gets its own two bytes of zero-page storage.

Waddya think, sirs?

Code:
;--------------------------------------------------------------
;
;   2ROT   ( hi3 lo3 hi2 lo2 hi1 lo1 -- hi2 lo2 hi1 lo1 hi3 lo3 )
;
tworotlfa   .word $adde
      .byt (tworot-*-1)|bit7
      .asc "2RO","T"|bit7
tworot   dex
         lda tos+1
         sta stackh,x
         lda tos
         sta stackl,x ; enclose TOS into the indexed split stack
         lda #%001101
         sta tos      ; LSR-loop flags
         lda #$ca     ; dex
         sta l057     ; unnecessary except in case of NMI (improbable on a PET)
l056     ldy 4,x      ; pass through 4x
         lda 2,x      ; on each pass juggle three bytes
         sta 4,x      ; pass 1 = 0,stackl
         lda 0,x      ; pass 2 = 1,stackl
         sta 2,x      ; pass 3 = 0,stackh
         sty 0,x      ; pass 4 = 1,stackh
         lda l057
         eor #($e8^$ca)   ; toggle dex <-> inx
         sta l057
l057     dex
         lsr tos
         bcs l056
         txa
         eor #<(stackl^stackh)
         tax
         lsr tos
         bcs l056
         jmp pop


Note that stackl = $00 and stackh = $30 here. (I have about 18 more bytes available for stack, but this makes it easier to debug in xpet.) The bare 0,x / 2,x / 4,x operands in the loop rely on stackl being $00; otherwise I'd have to begin with:
Code:
tworot dex
 lda tos+1
 sta stackh,x
 lda tos
 sta stackl,x ; enclose TOS into the indexed split stack
 stx tos+1    ; save X (tos+1 is free once TOS has been pushed)
 clc
 txa
 adc #<stackl
 tax

and at the very end:
Code:
 ldx tos+1    ; restore X
 jmp pop


Posted: Thu Sep 09, 2010 6:49 am
I'm afraid I didn't try very hard to follow your code. If you commented it better (explaining what you're doing), I might.

It looks awfully convoluted.

Assuming I understand what you're trying to achieve, my approach would be something like this:

Code:


 ; assumes the stack grows upward and x indexes the cell just under tos

 ; save current stack position; it's where we want to end up
 stx temp

 ; move lo1 bytes from tos to (next position on) stack
 lda tos
 sta stackl+1,x
 lda tos+1
 sta stackh+1,x

 ; move lo3 bytes to tos
 lda stackl-3,x
 sta tos
 lda stackh-3,x
 sta tos+1

 ; move hi3 bytes to stack
 lda stackl-4,x
 sta stackl+2,x
 lda stackh-4,x
 sta stackh+2,x

 ; set x to point at hi3
 dex
 dex
 dex
 dex

 ; bump everything down 2 positions: five cells, x = temp-4 .. temp
LOOP
 lda stackl+2,x
 sta stackl,x
 lda stackh+2,x
 sta stackh,x
 inx
 cpx temp
 bcc LOOP   ; x < temp: keep going
 beq LOOP   ; x = temp: one more pass brings the staged hi3 down
 dex        ; x is now temp+1; step back to temp



I'm not sure I got that all correct; I just sort of banged it out for the purpose of illustration ;)
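
One way to check a shuffle like this before assembling it is to model the cells in Python. The sketch below follows the same plan (spill TOS, stage hi3 above it, then slide everything down two cells). It treats each stack position as one 16-bit cell and assumes, as the listing does, that the stack grows toward higher indices. The function name and the list representation are illustrative only.

```python
def tworot_shift(stack, x, tos):
    """Stage-and-shift 2ROT on a stack that grows upward.

    stack[x] is the cell just below TOS; stack[x-1] .. stack[x-4] are
    the deeper cells. Needs two free cells above x. Returns the new TOS.
    """
    stack[x + 1] = tos                 # spill lo1
    new_tos = stack[x - 3]             # lo3 becomes the new TOS
    stack[x + 2] = stack[x - 4]        # stage hi3 above everything
    # slide cells x-2 .. x+2 down two places: five moves in all,
    # the last one bringing the staged hi3 into stack[x]
    for i in range(x - 4, x + 1):
        stack[i] = stack[i + 2]
    return new_tos


# Example: cells named after the stack comment (hi3 lo3 hi2 lo2 hi1 lo1)
stack = [None] * 16
x = 8
stack[4:9] = ["hi3", "lo3", "hi2", "lo2", "hi1"]
tos = tworot_shift(stack, x, "lo1")
print(stack[4:9], tos)
```

Counting the moves this way is a handy cross-check on the loop bounds in the assembly.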

What exactly does POP do?

I can see the utility of a subroutine that pops the top of the stack (the position currently pointed to by x) into the tos* :P but if it's something you can only jump to at the tail (i.e. it's not a subroutine), it seems to me you'd have to get stuff out of the way first anyway, so it's likely to be redundant. (Like I said, I didn't try very hard to follow your code :) )


* Maybe it would be less confusing to call the tos the accumulator or something ;)


Posted: Thu Sep 09, 2010 8:05 am
The thing I was proud of here was using LSR and a series of bits to manage a pair of nested loops. The inner loop executes twice on each pass through the outer loop, which also executes twice. A single byte, with flags shifted off the end one at a time, controls everything, and it kind of reminded me of a Turing machine. I stored the control bits in TOS not because it's part of the stack, but because TOS became an available zero-page scratch location once its contents had been pushed onto the real stack. The POP at the end will overwrite TOS, and that's okay. The only reason the POP is there is to balance the initial push, leaving the same number of bytes (12) on the stack as we came in with.
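
The control flow described here can be modeled outside the 6502. The following Python sketch is only an illustration of the first listing's loop structure: `run_flags` and the (region, offset) labels are made-up names, the `step` variable stands in for the self-modified opcode at l057, and the region swap stands in for the EOR on X.

```python
def run_flags(flags=0b001101):
    """Model the LSR/BCS control flow of the first 2ROT listing.

    Each LSR shifts the low bit of the flag byte into carry; BCS
    loops while that bit was 1. Returns the (region, offset) visited
    by each pass, in order.
    """
    passes = []
    region = "stackl"      # which half of the split stack X points into
    offset = 0             # 0 or 1, stepped by the toggled INX/DEX
    step = -1              # models the opcode at l057: -1 = DEX, +1 = INX
    while True:
        passes.append((region, offset))   # one three-byte rotation
        step = -step                      # EOR #($e8^$ca) flips DEX <-> INX
        offset += step
        carry = flags & 1                 # LSR tos
        flags >>= 1
        if carry:                         # BCS l056 (inner loop)
            continue
        # inner loop done: swap regions (the EOR on X in the original)
        region = "stackh" if region == "stackl" else "stackl"
        carry = flags & 1                 # second LSR tos
        flags >>= 1
        if not carry:                     # fall through to JMP pop
            return passes


for number, where in enumerate(run_flags(), start=1):
    print("pass", number, where)
```

Reading the flag byte from the low bit upward, the bits 1, 0, 1, 1, 0, 0 are consumed in the order inner, inner, outer, inner, inner, outer.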

Ultimately, this approach didn't save me any memory vs. just juggling the stack around or I would have left it in. I didn't really care about the clock cycles for this primitive, because I can't recall a single occasion when I've ever used 2ROT or done much with double precision in a Forth program. But I thought the LSR ... BCS thing was pretty nifty. Self-modifying code is usually a bad thing (like where I toggle L057 between being an INX and DEX instruction on alternating iterations through the loop) and my inner jury is still out on whether using EOR to toggle the X register between the STACKL and the STACKH area was ugly or slick.
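
On the EOR question, one property worth checking is that EOR only behaves like adding the offset while X and the offset have no set bits in common. The Python check below is a sketch that assumes stackl = $00 and stackh = $30 as stated in the first post, and assumes X may take any value from $00 to $2f (the thread does not say what range X actually covers).

```python
STACKL = 0x00
STACKH = 0x30
TOGGLE = (STACKL ^ STACKH) & 0xFF   # the operand of EOR #<(stackl^stackh)


def eor_matches_add(x):
    """True when X EOR TOGGLE lands on the same byte as X + (stackh - stackl)."""
    return (x ^ TOGGLE) == ((x + (STACKH - STACKL)) & 0xFF)


# assumed range of X: every index inside the low-byte region
safe = [x for x in range(0x30) if eor_matches_add(x)]
unsafe = [x for x in range(0x30) if not eor_matches_add(x)]

print("EOR acts like ADD for X in", [hex(x) for x in safe])
print("first mismatch:", hex(unsafe[0]), "->", hex(unsafe[0] ^ TOGGLE))
```

Under those assumptions the toggle matches an add only while bits 4 and 5 of X are clear. A stackh value with no bits in common with any valid X (for example $40, assuming X stays below $40) would keep the trick exact over the whole range.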

Here's 2ROT now, same size, probably 1/2 the clocks, and I'll still never use it.
Code:
tworot   ldy stackh+4,x   ; rotate cells x, x+2, x+4 (hi1, hi2, hi3): high bytes
         lda stackh+2,x
         sta stackh+4,x
         lda stackh,x
         sta stackh+2,x
         sty stackh,x
         ldy stackl+4,x   ; same three cells: low bytes
         lda stackl+2,x
         sta stackl+4,x
         lda stackl,x
         sta stackl+2,x
         sty stackl,x
         ldy stackh+3,x   ; rotate TOS, x+1, x+3 (lo1, lo2, lo3): high bytes
         lda stackh+1,x
         sta stackh+3,x
         lda tos+1
         sta stackh+1,x
         sty tos+1
         ldy stackl+3,x   ; same three cells: low bytes
         lda stackl+1,x
         sta stackl+3,x
         lda tos
         sta stackl+1,x
         sty tos
         jmp next
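
A Python model of the unrolled listing above can be used to check its stack effect. This is a sketch, not part of the original program: the names `tworot` and `cells` and the list-based memory are illustrative. As in the assembly, X indexes the cell just below TOS and deeper cells sit at higher indices.

```python
def tworot(stackl, stackh, x, tos):
    """Model of the unrolled 2ROT.

    stackl/stackh: lists holding the low and high bytes of the split stack
    x:             index of the cell just below TOS
    tos:           [low, high] bytes of the top-of-stack cell
    """
    # cells x, x+2, x+4 rotate among themselves (ldy/lda/sta/lda/sta/sty)
    for half in (stackh, stackl):
        half[x + 4], half[x + 2], half[x] = half[x + 2], half[x], half[x + 4]
    # cells x+1, x+3 and TOS rotate among themselves
    for half, i in ((stackh, 1), (stackl, 0)):
        half[x + 3], half[x + 1], tos[i] = half[x + 1], tos[i], half[x + 3]


def cells(stackl, stackh, x, tos, n=5):
    """Return TOS followed by the n cells below it as 16-bit values."""
    top = tos[0] | (tos[1] << 8)
    return [top] + [stackl[x + i] | (stackh[x + i] << 8) for i in range(n)]


stackl = [0] * 48
stackh = [0] * 48
x = 10
below = [0x1111, 0x2000, 0x2222, 0x3000, 0x3333]   # hi1 lo2 hi2 lo3 hi3
for i, value in enumerate(below):
    stackl[x + i] = value & 0xFF
    stackh[x + i] = value >> 8
tos = [0x00, 0x10]                                  # lo1 = $1000
tworot(stackl, stackh, x, tos)
print([hex(v) for v in cells(stackl, stackh, x, tos)])
```

Reading the result from TOS downward should give lo3, hi3, lo1, hi1, lo2, hi2, which matches the stack comment on the original listing.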



But back to your question, "What is POP?" Okay... on the PET, the kernel uses three zero page addresses from $8D - $8F for the 1/60th of a second 24-hour jiffy clock, and $90 - $FF for whatever. I'm just going to leave everything from $8D to the top of zero page alone, and content myself with the 141 bytes I get.

Code:
; zero page usage
stackl   = $00      ; low bytes:  $00..$2f with stackh at $30 (48 bytes)
stackh   = $30      ; high bytes: $30..$5f (48 bytes)
bos   = stackh-stackl   ; includes TOS
up   = $76      ; user area pointer
n   = $78      ; scratch space
w   = $7e      ; w overlaps n
tos   = $80      ; top of stack
zi   = $82      ; innermost DO LOOP counter/limit

next   = $86
;0086 next   inc ip
next1   = $88
;0088 next1   inc ip
nexto   = $8a
;008a nexto   jmp ($cafe)
ip   = $8b      ; the operand of the jmp at nexto

I do a few nontraditional things here, like having a NEXT that ignores moving across page boundaries. Instead I leave that up to the compiler to insert a call to PAGE at $xxFD or $xxFE. There's more too, like padding at compile time and skipping to the next page at runtime (when necessary) after a literal or string if it leaves me at $xxFF. This hardware is a real NMOS 6502 complete with the JMP ($xxFF) bug. The key benefit of this approach is getting NEXT down to 15 clocks.
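
Two behaviors this paragraph relies on can be shown with a small Python model. It is a sketch with hypothetical function names: `inc ip` only bumps the low byte of IP, and an NMOS 6502 `JMP (addr)` with addr at $xxFF fetches the high byte of the target from $xx00 rather than from the next page.

```python
def inc_ip_low(ip):
    """INC on the low byte of a 16-bit pointer: wraps within the page."""
    return (ip & 0xFF00) | ((ip + 1) & 0x00FF)


def next_ip(ip):
    """The NEXT from the listing: two INCs of the low byte only."""
    return inc_ip_low(inc_ip_low(ip))


def jmp_indirect_nmos(mem, ptr):
    """Target of JMP (ptr) on an NMOS 6502.

    The low byte comes from ptr; the high byte comes from ptr+1 with
    the page (high byte of the pointer address) left unchanged.
    """
    lo = mem[ptr]
    hi = mem[(ptr & 0xFF00) | ((ptr + 1) & 0x00FF)]
    return lo | (hi << 8)


mem = bytearray(0x10000)
mem[0x12FF] = 0x34      # low byte of the intended target $5634
mem[0x1300] = 0x56      # high byte, on the next page
mem[0x1200] = 0x9A      # the byte the NMOS part reads instead
print(hex(jmp_indirect_nmos(mem, 0x12FF)))
print(hex(next_ip(0x12FE)))
```

This is why a cell must never start at $xxFF and why something has to step IP over the page boundary explicitly.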

It is typical to have an 8-byte region called "N" on the zero page for primitive scratch space. It's also typical to have the parameter stack in zero page and index it with the zp,X addressing mode.

I haven't seen a Forth that puts the innermost DO LOOP index and limit on the zero page (ZI), but this one does. Usually DO LOOP counters and limits live on the return stack. Also, for (I hope) an overall speed boost, I keep the topmost value on the parameter stack in a separate 2-byte area and split the rest of the stack into low-byte and high-byte areas. Instead of a two-byte pointer to code stored at the code field address (CFA), there is always actual executable machine code. That saves one level of indirection and is called DTC (direct-threaded code) vs. the traditional ITC (indirect-threaded code).

The first word in the dictionary is (LIT), used for pushing 2-byte values that were embedded in the dictionary onto the stack. Down at the bottom of (LIT) we have a few useful ways to exit primitives: PUSHN, PUSH, and PUT. Both PUSH and PUT assume you've loaded Y:A (high byte in Y, low byte in A) with the two bytes you want to drop on the stack.
Code:
pushn        ldy n+1
        lda n
push        sta n
        dex
        lda tos+1
        sta stackh,x
        lda tos
        sta stackl,x
        lda n
put        sty tos+1
        sta tos
        jmp next

A few definitions later, we run into (DO), which is the business end of setting up a DO LOOP. It also has a couple of useful ways to exit a primitive, POPTWO and POP. I just have to make sure to slide the next cell off the indexed stack into TOS when I POP, and to move the old TOS onto the real stack when I PUSH (PUT simply overwrites TOS).
Code:
poptwo        inx
pop        ldy stackh,x
        lda stackl,x
        inx
        jmp put

To recap, POPTWO, POP, PUSHN, PUSH, PUT, NEXT and NEXTO are all ways to get out of a primitive and move the IP (instruction pointer) to the next word.


Posted: Thu Sep 09, 2010 10:25 am
chitselb wrote:

But I thought the LSR ... BCS thing was pretty nifty.


It could be, but I don't think it is in this case.

chitselb wrote:
Self-modifying code is usually a bad thing (like where I toggle L057 between being an INX and DEX instruction on alternating iterations through the loop)


I think you can get by without it:

Code:

tworotlfa   .word $adde
      .byt (tworot-*-1)|bit7
      .asc "2RO","T"|bit7
tworot   dex
         lda tos+1
         sta stackh,x
         lda tos
         sta stackl,x ; enclose TOS into the indexed split stack
         lda #%001101
         sta tos      ; LSR-loop flags

loop     ldy 4,x      ; pass through 4x
         lda 2,x      ; on each pass juggle three bytes
         sta 4,x      ; pass 1 = 0,stackl
         lda 0,x      ; pass 2 = 1,stackl
         sta 2,x      ; pass 3 = 0,stackh
         sty 0,x      ; pass 4 = 1,stackh
         inx
         lsr tos
         bcs loop
         dex
         dex
         txa
         eor #<(stackl^stackh)
         tax
         lsr tos
         bcs loop
         jmp pop



chitselb wrote:

I do a few nontraditional things here, like having a NEXT that ignores moving across page boundaries. Instead I leave that up to the compiler to insert a call to PAGE at $xxFD or $xxFE. There's more too, like padding at compile time and skipping to the next page at runtime (when necessary) after a literal or string if it leaves me at $xxFF. This hardware is a real NMOS 6502 complete with the JMP ($xxFF) bug. The key benefit of this approach is getting NEXT down to 15 clocks.


I was thinking about that.

I wonder how likely it would be that a word would cross two page boundaries. My guess is that it wouldn't be too likely.

In that case I think I'd blow a couple of bytes in zp, build the page increment into NEXT, and call it explicitly:

Code:

NEXTPAGE
 inc ip+1
NEXT
 inc ip
 inc ip
 jmp (xxxx)



Then, if the code field is close enough to the page boundary, pad it by prepending NOPs so that the code takes you across the page boundary; if not, insert a NEXTPAGE and prepend a NOP if necessary to align. Then have a workaround for any subsequent page crossings that are unaligned, and hope you never need it.

Code:

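UA_NEXTPAGE
 ldy #$01
 lda (ip),y   ; low byte of the target, at ip+1
 sta V
 dey
 sty ip       ; ip low byte = 0
 inc ip+1     ; ip now points at the start of the next page
 lda (ip),y   ; high byte of the target, first byte of the new page
 sta V+1
 jmp (V)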
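
A Python model of what the UA_NEXTPAGE listing does to memory and IP, as a sketch. The function name is illustrative, and the example assumes IP is $xxFE on entry so that the target's two bytes straddle the page boundary; the post does not state the entry value.

```python
def ua_nextpage(mem, ip):
    """Model of UA_NEXTPAGE: fetch a 2-byte target that straddles a page.

    Returns (new_ip, target).
    """
    lo = mem[(ip + 1) & 0xFFFF]               # ldy #$01 / lda (ip),y / sta V
    ip = ((ip & 0xFF00) + 0x0100) & 0xFFFF    # dey / sty ip / inc ip+1
    hi = mem[ip]                              # lda (ip),y with Y = 0 / sta V+1
    return ip, lo | (hi << 8)                 # jmp (V)


mem = bytearray(0x10000)
mem[0x12FF] = 0x34   # low byte of the target, last byte of page $12
mem[0x1300] = 0x56   # high byte, first byte of page $13
new_ip, target = ua_nextpage(mem, 0x12FE)     # entry IP of $12FE is an assumption
print(hex(new_ip), hex(target))
```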



And get the compiler to skip over the prepended NOPs.

Of course, it would complicate the compiler.


Oh, and I take back what I said about pop. I totally missed the fact that you were using y :oops:

