I am filled with a mixture of pride and shame

Posted: Thu Sep 09, 2010 1:03 am
by chitselb
And this was the only place I could think of to ( confess | brag about ) what I have done. Here's my 2ROT primitive, which rotates the stack from ABC -> BCA. All values are four-byte doubles. The data stack is indexed by X and divided between several bytes of stackl and an equally sized stackh region. Top of stack is special and gets its own 2-byte zero page storage.

Waddya think, sirs?

Code:

;--------------------------------------------------------------
;
;	2ROT   ( hi3 lo3 hi2 lo2 hi1 lo1 -- hi2 lo2 hi1 lo1 hi3 lo3 )
;
tworotlfa	.word $adde
		.byt (tworot-*-1)|bit7
		.asc "2RO","T"|bit7
tworot	dex
			lda tos+1
			sta stackh,x
			lda tos
			sta stackl,x ; enclose TOS into the indexed split stack
			lda #%001101
			sta tos		; LSR-loop flags
			lda #$ca	  ; dex
			sta l057	  ; unnecessary except in case of NMI (improbable on a PET)
l056	  ldy 4,x		; pass through 4x
			lda 2,x		; on each pass juggle three bytes
			sta 4,x		; pass 1 = 0,stackl
			lda 0,x		; pass 2 = 1,stackl
			sta 2,x		; pass 3 = 0,stackh
			sty 0,x		; pass 4 = 1,stackh
			lda l057
			eor #($e8^$ca)	; toggle dex <-> inx
			sta l057
l057	  dex
			lsr tos
			bcs l056
			txa
			eor #<(stackl^stackh)
			tax
			lsr tos
			bcs l056
			jmp pop
It should also be noted that stackl = $00 and stackh = $30 (I have about 18 more bytes available for stack, but this makes it easier to debug in xpet). Otherwise I'd have to begin with:

Code:

tworot dex
 lda tos+1
 sta stackh,x
 lda tos
 sta stackl,x ; enclose TOS into the indexed split stack
 stx tos+1
 clc
 txa
 adc #<stackl
 tax

and at the very end
 ldx tos+1
 jmp pop

Re: I am filled with a mixture of pride and shame

Posted: Thu Sep 09, 2010 6:49 am
by bogax
chitselb wrote:

Code:

;--------------------------------------------------------------
;
;	2ROT   ( hi3 lo3 hi2 lo2 hi1 lo1 -- hi2 lo2 hi1 lo1 hi3 lo3 )
;
tworotlfa	.word $adde
		.byt (tworot-*-1)|bit7
		.asc "2RO","T"|bit7
tworot	dex
			lda tos+1
			sta stackh,x
			lda tos
			sta stackl,x ; enclose TOS into the indexed split stack
			lda #%001101
			sta tos		; LSR-loop flags
			lda #$ca	  ; dex
			sta l057	  ; unnecessary except in case of NMI (improbable on a PET)
l056	  ldy 4,x		; pass through 4x
			lda 2,x		; on each pass juggle three bytes
			sta 4,x		; pass 1 = 0,stackl
			lda 0,x		; pass 2 = 1,stackl
			sta 2,x		; pass 3 = 0,stackh
			sty 0,x		; pass 4 = 1,stackh
			lda l057
			eor #($e8^$ca)	; toggle dex <-> inx
			sta l057
l057	  dex
			lsr tos
			bcs l056
			txa
			eor #<(stackl^stackh)
			tax
			lsr tos
			bcs l056
			jmp pop

I'm afraid I didn't try very hard to follow your code. If you commented
it better (like explaining what you're doing), I might.

It looks awfully convoluted.

Assuming I understand what you're trying to achieve,
my approach would be something like this:

Code:



 ;save current stack position it's where we want to end up
 stx temp

 ; move lo1 bytes from tos to (next position on) stack
 lda tos
 sta stackl+1,x
 lda tos+1
 sta stackh+1,x

 ; move lo3 bytes to tos 
 lda stackl-3,x
 sta tos
 lda stackh-3,x
 sta tos+1

 ; move hi3 bytes to stack 
 lda stackl-4,x
 sta stackl+2,x
 lda stackh-4,x
 sta stackh+2,x

 ; set x to point at hi3
 dex
 dex
 dex
 dex

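 ; bump everything down 2 positions
 ; (five cells, from hi3's old slot up to and including temp)
LOOP
 lda stackl+2,x
 sta stackl,x
 lda stackh+2,x
 sta stackh,x
 inx
 cpx temp
 bcc LOOP   ; x < temp: keep going
 beq LOOP   ; x = temp: one last pass so the cell at temp gets hi3
 dex        ; x ran one past temp, step back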

Not sure I got that all correct, I just sort of banged it out for the
purpose of illustration ;) .

What exactly does POP do?

I can see the utility of a subroutine that popped the top of the stack
(the position currently pointed to by x) to the tos* :P but it seems
to me you'll have to get stuff out of the way first anyway if it's
something you can only jump to at the tail (i.e. if it's not a subroutine)
and so is likely to be redundant. (Like I said, I didn't try very hard to follow your code :) )


* maybe it would be less confusing to call the tos the accumulator or something ;)

Posted: Thu Sep 09, 2010 8:05 am
by chitselb
The thing I was proud of here was using LSR and a series of bits to manage a pair of nested loops. The inner loop executes twice on each pass through the outer loop, which also executes twice. It uses a single byte and shifts flags off the end to control everything, and it kind of reminded me of a Turing Machine. I stored the control bits in TOS not because it's part of the stack, but because TOS suddenly became an available zero page scratch location after pushing its contents onto the real stack. The POP at the end will overwrite TOS, and that's okay. The only reason the POP is there is to leave the same number of bytes (12) on the stack as we came in with (12).
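
The flag-byte control flow described above can be replayed in a few lines of Python. This is only a rough model of the loop skeleton in the first listing, not of the byte juggling itself: `lsr` and `trace_passes` are names invented for the sketch, `region` stands in for the EOR on X, and `offset` stands in for the self-modified INX/DEX at l057.

```python
def lsr(value):
    """Model 6502 LSR on one byte: return (shifted value, carry out)."""
    return value >> 1, value & 1


def trace_passes(flags=0b001101):
    """Replay the loop skeleton and record what each pass works on.

    Returns a list of (region, offset) pairs, one per pass through l056.
    """
    passes = []
    region = "stackl"   # flipped by the EOR on X in the outer loop
    offset = 0          # moved by the instruction at l057
    step = -1           # l057 starts out as DEX
    while True:
        passes.append((region, offset))   # loop body at l056
        step = -step                      # EOR toggles DEX <-> INX
        offset += step                    # l057 executes
        flags, carry = lsr(flags)         # inner LSR / BCS
        if carry:
            continue
        region = "stackh" if region == "stackl" else "stackl"
        flags, carry = lsr(flags)         # outer LSR / BCS
        if not carry:
            return passes


if __name__ == "__main__":
    for number, (region, offset) in enumerate(trace_passes(), start=1):
        print(f"pass {number} = {offset},{region}")
```

Under these assumptions the four passes come out in the same order as the "pass 1 = 0,stackl" comments in the listing, and the flag byte is fully shifted out by the time control reaches the final JMP.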

Ultimately, this approach didn't save me any memory vs. just juggling the stack around or I would have left it in. I didn't really care about the clock cycles for this primitive, because I can't recall a single occasion when I've ever used 2ROT or done much with double precision in a Forth program. But I thought the LSR ... BCS thing was pretty nifty. Self-modifying code is usually a bad thing (like where I toggle L057 between being an INX and DEX instruction on alternating iterations through the loop) and my inner jury is still out on whether using EOR to toggle the X register between the STACKL and the STACKH area was ugly or slick.
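
On the EOR question, one thing worth checking is when flipping bits in X really is the same as adding the distance between the two regions. The sketch below assumes stackl = $00 and stackh = $30 as stated earlier in the thread, and assumes X could hold any index from $00 to $2f, which the thread does not confirm. `eor_toggle` and `safe_indices` are hypothetical helper names.

```python
STACKL = 0x00
STACKH = 0x30
MASK = (STACKL ^ STACKH) & 0xFF   # operand of EOR #<(stackl^stackh)


def eor_toggle(x):
    """Value left in X by TXA / EOR #MASK / TAX."""
    return (x ^ MASK) & 0xFF


def safe_indices(depth=0x30):
    """Indices for which the EOR lands exactly STACKH-STACKL bytes higher."""
    distance = STACKH - STACKL
    return [x for x in range(depth) if eor_toggle(x) == x + distance]


if __name__ == "__main__":
    ok = safe_indices()
    print(f"EOR equals ADD for X in ${ok[0]:02x}..${ok[-1]:02x}")
```

The toggle always undoes itself when applied twice, but it only matches an addition while X has no bits in common with the mask. If X can really climb above $0f, the trick would need a mask that lies entirely outside the index bits, for example a power-of-two region size.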

Here's 2ROT now, same size, probably 1/2 the clocks, and I'll still never use it.

Code:

tworot		ldy stackh+4,x
		lda stackh+2,x
		sta stackh+4,x
		lda stackh,x
		sta stackh+2,x
		sty stackh,x
		ldy stackl+4,x
		lda stackl+2,x
		sta stackl+4,x
		lda stackl,x
		sta stackl+2,x
		sty stackl,x
		ldy stackh+3,x
		lda stackh+1,x
		sta stackh+3,x
		lda tos+1
		sta stackh+1,x
		sty tos+1
		ldy stackl+3,x
		lda stackl+1,x
		sta stackl+3,x
		lda tos
		sta stackl+1,x
		sty tos
		jmp next
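
As a sanity check on the stack comment, here is a small Python model of the unrolled listing. It is only a sketch: `two_rot` and `demo` are names made up for the illustration, plain lists stand in for the zero page regions, and the cell layout (TOS holds lo1, index x holds hi1, and so on up to hi3 at x+4) is inferred from the listing rather than stated in the thread.

```python
def two_rot(stackl, stackh, x, tos):
    """Apply the four three-way byte rotations of the unrolled listing.

    stackl/stackh are lists of bytes (edited in place), x is the stack
    index, tos is a (low, high) tuple. Returns the new (low, high) tos.
    """
    tos_lo, tos_hi = tos
    for region in (stackh, stackl):
        # x+4 <- x+2 <- x <- old x+4
        region[x + 4], region[x + 2], region[x] = (
            region[x + 2], region[x], region[x + 4])
    # x+3 <- x+1 <- tos <- old x+3, high bytes then low bytes
    stackh[x + 3], stackh[x + 1], tos_hi = stackh[x + 1], tos_hi, stackh[x + 3]
    stackl[x + 3], stackl[x + 1], tos_lo = stackl[x + 1], tos_lo, stackl[x + 3]
    return tos_lo, tos_hi


def demo():
    """Run two_rot on six labelled cells and return them deepest first."""
    names = ["hi3", "lo3", "hi2", "lo2", "hi1", "lo1"]
    # Distinct high and low bytes so a misplaced byte would show up.
    values = {name: 0x1020 + i * 0x0101 for i, name in enumerate(names)}
    by_value = {value: name for name, value in values.items()}
    below_tos = ["hi1", "lo2", "hi2", "lo3", "hi3"]   # cells x .. x+4
    stackl = [values[n] & 0xFF for n in below_tos]
    stackh = [values[n] >> 8 for n in below_tos]
    tos = (values["lo1"] & 0xFF, values["lo1"] >> 8)
    tos_lo, tos_hi = two_rot(stackl, stackh, 0, tos)
    cells = [by_value[(hi << 8) | lo] for lo, hi in zip(stackl, stackh)]
    return list(reversed(cells)) + [by_value[(tos_hi << 8) | tos_lo]]


if __name__ == "__main__":
    print(" ".join(demo()))
```

With that layout the model ends up with hi2 lo2 hi1 lo1 hi3 lo3, deepest cell first, which agrees with the right-hand side of the 2ROT stack comment.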

But back to your question, "What is POP?" Okay... on the PET, the kernel uses three zero page addresses from $8D - $8F for the 1/60th of a second 24-hour jiffy clock, and $90 - $FF for whatever. I'm just going to leave everything from $8D to the top of zero page alone, and content myself with the 141 bytes I get.

Code:

; zero page usage
stackl	= $00		; stackl = $00..$2f (48 bytes)
stackh	= $30		; stackh = $30..$5f (48 bytes)
bos	= stackh-stackl	; includes TOS
up	= $76		; user area pointer
n	= $78		; scratch space
w	= $7e		; w overlaps n
tos	= $80		; top of stack
zi	= $82		; innermost DO LOOP counter/limit

next	= $86
;0086 next	inc ip
next1	= $88
;0088 next1	inc ip
nexto	= $8a
;008a nexto	jmp ($cafe)
ip	= $8b		; operand of the jmp at nexto
I do a few nontraditional things here, like having a NEXT that ignores moving across page boundaries. Instead I leave that up to the compiler to insert a call to PAGE at $xxFD or $xxFE. There's more too, like padding at compile time and skipping to the next page at runtime (when necessary) after a literal or string if it leaves me at $xxFF. This hardware is a real NMOS 6502 complete with the JMP ($xxFF) bug. The key benefit of this approach is getting NEXT down to 15 clocks.
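
Two details in that paragraph are easy to model: the page-local NEXT and the NMOS indirect-jump quirk. The Python below is a sketch under stated assumptions, not the actual kernel: `next_ip`, `jmp_indirect_nmos` and `NEXT_CYCLES` are made-up names, memory is a plain dict, and the cycle figures are the standard documented ones for INC zero page (5) and JMP indirect (5).

```python
# Cycle counts for the three instructions of NEXT as listed:
#   inc ip / inc ip / jmp (ip)
NEXT_CYCLES = {"inc zp": 5, "inc zp (second)": 5, "jmp (ind)": 5}


def next_ip(ip):
    """NEXT as described: bump only the low byte of IP by two.

    There is no carry into the high byte, so IP wraps within its page.
    """
    return (ip & 0xFF00) | ((ip + 2) & 0x00FF)


def jmp_indirect_nmos(memory, pointer):
    """Target of JMP (pointer) on an NMOS 6502.

    The high byte is fetched without carrying into the pointer's page,
    so a pointer at $xxFF takes its high byte from $xx00.
    """
    lo = memory[pointer]
    hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    return (memory[hi_addr] << 8) | lo


if __name__ == "__main__":
    print("NEXT costs", sum(NEXT_CYCLES.values()), "clocks")
    memory = {0x12FF: 0x34, 0x1200: 0x56, 0x1300: 0x78}
    print(f"JMP ($12FF) goes to ${jmp_indirect_nmos(memory, 0x12FF):04x}")
```

This shows why a cell may not sit at $xxFF and why something like PAGE has to step in near the end of a page: left alone, IP would wrap back to the start of the same page.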

It is typical to have an 8-byte region called "N" on the zero page for primitive scratch space. It's also typical to have the parameter stack in zero page and index it with zp,X addressing mode.

I haven't seen a Forth that puts the innermost DO-LOOP index and limit on the zero page (ZI) but this one does. Usually DO LOOP counters and limits live on the return stack. Also for (I hope) an overall speed boost, I keep the topmost value on the parameter stack in a separate 2-byte area and split the rest of the stack into low-byte and high-byte areas. Instead of having a two-byte pointer to code stored at the code field address (CFA), there is always actual executable machine code. That saves one level of indirection and is called DTC (direct-threaded code) vs. the traditional ITC (indirect-threaded code).

The first word in the dictionary is (LIT), used for pushing 2-byte values that were embedded in the dictionary onto the stack. Down at the bottom of (LIT) we have a few useful ways to exit primitives, PUSHN, PUSH, and PUT. Both PUSH and PUT assume you've loaded Y:A with the two bytes you want to drop on the stack.

Code:

pushn        ldy n+1
        lda n
push        sta n
        dex
        lda tos+1
        sta stackh,x
        lda tos
        sta stackl,x
        lda n
put        sty tos+1
        sta tos
        jmp next
A few definitions later, we run into (DO) which is the business end of setting up a DO LOOP. It's also got a couple useful ways to exit a primitive, POPTWO, and POP. I just have to make sure to slide the last thing off the bottom into TOS when I POP and also be sure to move TOS onto the real stack when I PUSH or PUT.

Code:

poptwo        inx
pop        ldy stackh,x
        lda stackl,x
        inx
        jmp put
To recap, POPTWO, POP, PUSHN, PUSH, PUT, NEXT and NEXTO are all ways to get out of a primitive and move the IP (instruction pointer) to the next word.
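
The data movement behind PUT, PUSH, POP and POPTWO can be summarised with a toy Python model. It is a sketch only: the `SplitStack` class is hypothetical, the jump to NEXT is left out, and starting X just past the end of the region for an empty stack is an assumption that the thread does not spell out.

```python
class SplitStack:
    """Toy model: TOS cached separately, the rest split into low/high bytes."""

    def __init__(self, size=0x30):
        self.stackl = [0] * size
        self.stackh = [0] * size
        self.x = size      # assumption: empty stack leaves X past the end
        self.tos = 0

    def put(self, value):
        """PUT: overwrite TOS with a 16-bit value (Y:A in the listing)."""
        self.tos = value & 0xFFFF

    def push(self, value):
        """PUSH: spill the old TOS onto the split stack, then PUT."""
        self.x -= 1
        self.stackh[self.x] = self.tos >> 8
        self.stackl[self.x] = self.tos & 0xFF
        self.put(value)

    def pop(self):
        """POP: drop TOS by reloading it from the cell at X."""
        value = (self.stackh[self.x] << 8) | self.stackl[self.x]
        self.x += 1
        self.put(value)

    def poptwo(self):
        """POPTWO: skip one cell, then POP."""
        self.x += 1
        self.pop()


if __name__ == "__main__":
    stack = SplitStack()
    stack.put(0x1111)
    stack.push(0x2222)
    stack.push(0x3333)
    stack.pop()
    print(f"TOS after push, push, pop: ${stack.tos:04x}")
```

The point of the model is the asymmetry mentioned above: PUSH has to move the old TOS out to the indexed stack before storing the new value, while POP has to slide the next cell into TOS.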

Posted: Thu Sep 09, 2010 10:25 am
by bogax
chitselb wrote:

But I thought the LSR ... BCS thing was pretty nifty.

It could be, but I don't think it is in this case.

chitselb wrote:

Self-modifying code is usually a bad thing (like where I toggle L057 between being an INX and DEX instruction on alternating iterations through the loop)

I think you can get by without it:

Code:


tworotlfa   .word $adde 
      .byt (tworot-*-1)|bit7 
      .asc "2RO","T"|bit7 
tworot   dex 
         lda tos+1 
         sta stackh,x 
         lda tos 
         sta stackl,x ; enclose TOS into the indexed split stack
         lda #%001101 
         sta tos      ; LSR-loop flags 

loop     ldy 4,x      ; pass through 4x 
         lda 2,x      ; on each pass juggle three bytes 
         sta 4,x      ; pass 1 = 0,stackl 
         lda 0,x      ; pass 2 = 1,stackl 
         sta 2,x      ; pass 3 = 0,stackh 
         sty 0,x      ; pass 4 = 1,stackh 
         inx 
         lsr tos 
         bcs loop 
         dex 
         dex
         txa 
         eor #<(stackl^stackh) 
         tax 
         lsr tos 
         bcs loop
         jmp pop

chitselb wrote:

I do a few nontraditional things here, like having a NEXT that ignores moving across page boundaries. Instead I leave that up to the compiler to insert a call to PAGE at $xxFD or $xxFE. There's more too, like padding at compile time and skipping to the next page at runtime (when necessary) after a literal or string if it leaves me at $xxFF. This hardware is a real NMOS 6502 complete with the JMP ($xxFF) bug. The key benefit of this approach is getting NEXT down to 15 clocks.

I was thinking about that.

I wonder how likely it would be that a word would cross two page
boundaries. My guess is that it wouldn't be too likely.

In that case I think I'd blow a couple bytes in zp and build the page
increment into NEXT and call it explicitly

Code:


NEXTPAGE
 inc ip+1
NEXT
 inc ip
 inc ip
 jmp (xxxx)

Then if the code field is close enough to the page boundary, pad it by
prepending NOPs so that the code takes you across the page boundary;
if not, insert a NEXTPAGE and prepend a NOP if necessary to align.
Then have a workaround for any subsequent page crossings that are
unaligned and hope you never need it.

Code:


UA_NEXTPAGE
 ldy #$01
 lda (ip),y 
 sta V
 dey
 sty ip
 inc ip+1
 lda (ip),y
 sta V+1
 jmp (V)

And get the compiler to skip over the prepended NOPs.

Of course, it would complicate the compiler


Oh, and I take back what I said about pop. I totally missed
the fact that you were using y :oops: