6502.org • View topic - Idea. Extension the stack.

All times are UTC

Idea. Extension the stack.

Page 2 of 2

[ 20 posts ]

GARTHWILSON

Post subject: Re: Idea. Extension the stack.

Posted: Sun Jul 12, 2015 8:47 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California

Alienthe wrote:

The instruction TSX is a clear indication that you are about to access the stack directly. So why not use TSX as a mode switch to map the stack temporarily into $100 - $1ff? You could then use TXS to return to the mode where stack lives in its own separate area. Using a separate area for stack should allow for much faster instructions that use the stack.

Two issues remain. First of all, how do you handle interrupts? Secondly, how do you handle a stack pointer that is much wider than 8 bits and thus cannot fit into the X register? I am sure some thinking can overcome these issues, and my hunch is that this can be worthwhile too.

The short piece I gave earlier as an illustration came from this example of the unsigned, mixed-precision multiply done in a somewhat Forth-like way but with parameters on the hardware stack:

Code:

UM_STAR: LDA #0                  ; Unsigned, mixed-precision (16-bit by 16-bit input, 32-bit output)
         PHA                     ; multiply.  Add a variable byte to the stack, initializing it as 0.

         TSX                     ; Now 101,X holds that new variable, 102,X and 103,X hold the return
         LSR $107,X              ; address, and 104,X to 107,X holds the inputs and later the outputs.
         ROR $106,X

         FOR_Y  16, DOWN_TO, 0   ; Loop 16x.  The DEY, BNE in NEXT_Y below will drop through on 0.
             IF_CARRY_SET
                 CLC
                 PHA             ; Note that the PHA (and PLA below) doesn't affect the indexing.
                    LDA $101,X
                    ADC $104,X
                    STA $101,X
                 PLA
                 ADC $105,X
             END_IF

             ROR
             ROR $101,X
             ROR $107,X
             ROR $106,X
         NEXT_Y

         STA $105,X
         PLA                     ; Retrieve the variable byte we added at the top, cleaning up the stack.
         STA $104,X              ; Again note that the PLA changed S but not X, so the 104 is still 104.
         RTS
 ;------------------

The parameters are accessed after the TSX, but they were initially put on the stack (often by PHA, PHX, and/or PHY) as inputs before the subroutine was called, and the answers will be accessed after the RTS, meaning you wouldn't want to switch to another part of memory for the stack when the TSX is encountered. Note that there are PHA's and PLA's in the code where the stack-relative addressing is still being done using the X value that came from the TSX. It would also be rather cumbersome to have to disable interrupts every time a change of stack memory is made, and of course it would be bad for interrupt latency. There is no TXS at the end, and in fact X never changed after the TSX. For local variables (including for recursive routines, ie, routines that call themselves over and over until a criterion is met to start unwinding and exit), you would make room for them on the stack before the TSX, and then you wouldn't need to modify X after the TSX, nor have a TXS. Actually in this example we did add one byte of local variable on the hardware stack. If the number of output bytes on the stack does not match the number of input bytes, you can adjust the stack depth in the subroutine itself and then move the return address also in the subroutine, but it's less cumbersome to do the adjustment in the routine that calls it, since that eliminates the need to move the return address.
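
As a rough sketch of the calling side, assuming the byte layout read off the routine above (one 16-bit input at 104,X and 105,X, the other at 106,X and 107,X, each low byte first), a caller might look something like the following. The labels MULTIPLIER, MULTIPLICAND (16-bit constants) and PRODUCT (four bytes of RAM) are made up for the illustration, and the #< and #> low-byte and high-byte operators are an assumption about the assembler's syntax.

Code:

         LDA  #>MULTIPLIER     ; Pushed first, so it ends up deepest (107,X in UM_STAR).
         PHA
         LDA  #<MULTIPLIER     ; This one becomes 106,X.
         PHA
         LDA  #>MULTIPLICAND   ; 105,X
         PHA
         LDA  #<MULTIPLICAND   ; 104,X
         PHA

         JSR  UM_STAR          ; Four bytes in, four bytes out, so no depth adjustment is needed.

         PLA                   ; Was 104,X: product byte 2.
         STA  PRODUCT+2
         PLA                   ; Was 105,X: product byte 3 (most significant).
         STA  PRODUCT+3
         PLA                   ; Was 106,X: product byte 0 (least significant).
         STA  PRODUCT
         PLA                   ; Was 107,X: product byte 1.
         STA  PRODUCT+1

If a routine handed back fewer bytes than it was given, the caller would just add extra PLA's at this point to drop the leftovers, which is the caller-side adjustment described above.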

I'm sure there's a way to extend the stack (Jeff is a genius at that kind of thing). I'm just trying to make sure that various programming needs and aspects are not shut out in the process.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

grzeg

Post subject: Re: Idea. Extension the stack.

Posted: Tue Jul 14, 2015 7:13 am

Joined: Fri Jan 17, 2014 6:39 pm
Posts: 47
Location: Poland

I think it would be easier to do this by placing the stack in zero-page I/O, for example at address $00.
Then the instruction STA $00 would write data to the stack, and LDA $00 would read it.

barrym95838

Post subject: Re: Idea. Extension the stack.

Posted: Wed Jul 15, 2015 1:23 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA

That's an interesting idea, grzeg. So, pha would be pha or sta $00, phx would be stx $00, and php would be ... uhm ... just php I guess ...

If your hardware maintained a small 0-page stack window, from say $00 to $1f, then you could read, write and manipulate items deeper in the stack very efficiently. Only accesses to $00 would grow or shrink the stack ... if you wanted to just read top-of-stack, you would have to ldy $00 : sty $00 ... something like inc $00 should work to increment TOS too, because of R-M-W, true?

Mike B.

Alienthe

Post subject: Re: Idea. Extension the stack.

Posted: Mon Jul 20, 2015 6:40 pm

Joined: Mon Apr 16, 2012 8:45 pm
Posts: 60

To follow up on that idea, you could have the window always aligned so that 00 is TOS (which is what you stated) and then 01 is next etc., no matter what the stack pointer was (which I am not sure you indicated). That means massaging stacked values as in the example earlier could be done on zero page without indexing or even without looking up what the stack pointer was.

How deep does one normally have to probe the stack? The example extends to $107 but I don't know if there are other typical use cases that go deeper.

GARTHWILSON

Post subject: Re: Idea. Extension the stack.

Posted: Mon Jul 20, 2015 8:31 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California

Alienthe wrote:

The problem with that is that it locks out the possibility of using names for local variables, and more bugs can result from not counting pushes and pulls that might be mixed in. Suppose for example you get into a routine that needs four bytes of input and output, passed through the stack, and three bytes of independent local variables. The routine might start with:

Code:

         PHA          ; Add three more bytes to the stack.
         PHA          ; They will get used below.  (Remember to
         PHA          ; pull them off the stack at the end.) 

length:  SETL  $101   ; Assign names to the three bytes of
width:   SETL  $102   ; local variables created above.  Each
height:  SETL  $103   ; variable is one byte in this case.

weight:  SETL  $104   ; Now assign names to the ones passed on the stack.
density: SETL  $106   ; weight gets 2 bytes, and density and speed each get
speed:   SETL  $107   ; one.  These could have additional names for data
                      ; sent back to the calling routine in the same bytes.

SETL in the C32 assembler is "SET Label," like EQU in most assemblers but you can change the value assigned to a label as many times as you wish. I believe Kowalski's assembler uses .= or .SET .

Now suppose you change the depth of the stack, for example by using PHP and PLP for temporary storage of the status:

Code:

        TSX
        LDA  density, X       ; Access variable "density".
        <do_stuff>
        PHP                   ; Now the stack is temporarily one byte
            <do_stuff>        ; deeper; but since we don't do TSX again,
            LDA  density, X   ; "density" is still at 106,X, even though
            <do_stuff>        ; it's no longer the 6th item on the stack.
        PLP
        <more_code>
        ADC  density, X       ; Access "density" yet again.
        <etc. etc.>
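
For contrast, here is a minimal sketch of what happens when the offsets follow the current stack depth instead, for instance if another TSX is done after the PHP. It reuses the density label (SETL $106) from above and is only an illustration, not code from a real program.

Code:

        TSX
        LDA  density, X       ; $106,X: the byte we want.
        PHP                   ; S goes down by one.
        TSX                   ; X now follows the new, deeper S,
        LDA  density+1, X     ; so the same byte now needs $107,X.  A plain
                              ; "density, X" here would fetch the byte that
                              ; used to be at 105,X instead.
        PLP

Every push or pull between accesses would need that kind of bookkeeping, which is where the bugs would creep in.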

Code:

SUB_TOT:   LOCAL  3   ; Make 3-byte local variable SUB_TOT.
PRESSURE2: LOCAL  1   ; Make 1-byte local variable PRESSURE2.
FLOW2:     LOCAL  2   ; Make 2-byte local variable FLOW2.


subroutine_label:
           TSX
           <continue with the program for the process>

           DESTROY_LOCALS   ; Get local variables off the stack
           RTS              ; at the end before exiting.
 ;----------------

Inside the local environment, ie, in the subroutine that carries out the process and comes right after the set of locals definitions, locals will be referred to with absolute indexed addressing, like LDA FLOW2,X where X's contents came from the TSX.

Since a label can be assigned new values as many times as you wish (with SETL or .= or similar), and since you put the relevant locals assignments right before the subroutines that need them, names can be re-used, and the right stack offset value will be used for each subroutine. So for example we could have another routine that has the following locals in the same source code file, and there will be no conflict between FLOW2 below and FLOW2 above.

Code:

FLOW1:  LOCAL  2
FLOW2:  LOCAL  2
FLOW3:  LOCAL  2

subroutine_label:
        TSX
        <followed by the code that uses these local variables>

Each subroutine will use the right "FLOW2" local variable, even if one subroutine calls the other. In fact, a subroutine using local variables can be recursive, meaning it can even call itself, over and over, until a condition is met to stop the heavy nesting and unwind itself.

If you need a lot of local variable space, using a lot of PHA's will of course not be as efficient as:

Code:

        TSX
        TXA
        SEC
        SBC  #$18
        TAX
        TXS

In this case, putting $18 (24 in decimal) bytes on the stack takes 12 clocks instead of 72, and 7 bytes instead of 24, so it's 6 times as fast and about 3.4 times as memory-efficient. The break-even point is at 4 bytes of variables for speed, and 7 bytes for program memory. (Be careful that you don't depend on uninitialized variables though.) This is another thing that should not be forfeited by any hardware tricks to extend the stack, particularly since situations where the greater stack space would be desirable are the same ones where you might want to allot such large portions of local variable space.
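
For anyone who wants to check the arithmetic, here is the same sequence with the standard clock and byte counts written in, followed by a mirror-image sequence for giving the space back at the end. The second half is only a sketch, not taken from the code above, and both halves assume decimal mode is off and that S is back at the same depth when the release runs.

Code:

        TSX              ; 2 clocks, 1 byte
        TXA              ; 2 clocks, 1 byte
        SEC              ; 2 clocks, 1 byte
        SBC  #$18        ; 2 clocks, 2 bytes
        TAX              ; 2 clocks, 1 byte
        TXS              ; 2 clocks, 1 byte.  Total: 12 clocks, 7 bytes.
                         ; 24 PHA's at 3 clocks and 1 byte each: 72 clocks, 24 bytes.
                         ; X = S here, so the new bytes are at 101,X to 118,X.

        <code that uses the locals>

        TSX              ; Release the $18 bytes again before the RTS.
        TXA
        CLC
        ADC  #$18
        TAX
        TXS

Since N PHA's cost 3N clocks and N bytes, the two methods tie at 3N = 12 (N = 4) for speed and at N = 7 for program memory.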

Quote:

How deep does one normally have to probe the stack? The example extends to $107 but I don't know if there are other typical use cases that go deeper.

I don't remember ever seeing anything in actual code that was more than $109 or $10A. The example was from a simple multiplication routine; but other applications with lots of local variables could conceivably get quite a bit more complex.
