Reentrant Programming: Small versus Big Software Stacks
Reentrant Programming: Small versus Big Software Stacks
I'm noodling around with some ideas around stacks and parameter passing to allow reentrant programming. So I built two separate stack modules to see how many instructions are required for basic operations. I tried to make the interfaces similar enough that I could switch between them and judge their relative merits.
The first is the X register stack I've seen in several Forths (including Tali Forth): https://github.com/Martin-H1/Lisp65/blo ... /stack.asm
The second is a 16 bit stack pointer in page zero that grows downward from ram top: https://github.com/Martin-H1/Lisp65/blo ... gstack.asm
It's pretty clear that the Forth style stack is leaner, faster, and more suited for fast light calls. For example I could see it used as the calling standard for a math library, a heap, or I/O. The problem is that it doesn't support that much depth, consumes a register, and half of page zero. That makes interfacing it with reused code a challenge at times.
The second stack seems way too big and slow for operations like push, drop, swap. But on the plus side supports large allocations, stack depth, and a light register and page zero footprint. That allows interfacing with reused code.
Not sure where I am going with this, other than I am curious if anyone has better ideas?
The first is the X register stack I've seen in several Forths (including Tali Forth): https://github.com/Martin-H1/Lisp65/blo ... /stack.asm
The second is a 16 bit stack pointer in page zero that grows downward from ram top: https://github.com/Martin-H1/Lisp65/blo ... gstack.asm
It's pretty clear that the Forth style stack is leaner, faster, and more suited for fast light calls. For example I could see it used as the calling standard for a math library, a heap, or I/O. The problem is that it doesn't support that much depth, consumes a register, and half of page zero. That makes interfacing it with reused code a challenge at times.
The second stack seems way too big and slow for operations like push, drop, swap. But on the plus side supports large allocations, stack depth, and a light register and page zero footprint. That allows interfacing with reused code.
Not sure where I am going with this, other than I am curious if anyone has better ideas?
- Alarm Siren
- Posts: 363
- Joined: 25 Oct 2016
Re: Reentrant Programming: Small versus Big Software Stacks
To save him some typing, I will first point you to Garth's excellent Stacks Treatise.
As to your own points, yea, I feel there's no single answer. Different solutions are better for different situations, it all depends how you intend to use them.
There is however a third method you may not have considered, where you can use the Actual Stack for parameter passing - even on the 6502. You just have to use the TSX / TXS instructions and the a, x addressing mode with a base offset.
E.g. see this macro library I wrote. This is basically very similar to your first option, but without permanently tying up the X register - it can still be used for other purposes when you're not accessing the stack.
As to your own points, yea, I feel there's no single answer. Different solutions are better for different situations, it all depends how you intend to use them.
There is however a third method you may not have considered, where you can use the Actual Stack for parameter passing - even on the 6502. You just have to use the TSX / TXS instructions and the a, x addressing mode with a base offset.
E.g. see this macro library I wrote. This is basically very similar to your first option, but without permanently tying up the X register - it can still be used for other purposes when you're not accessing the stack.
Want to design a PCB for your project? I strongly recommend KiCad. Its free, its multiplatform, and its easy to learn!
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.
Re: Reentrant Programming: Small versus Big Software Stacks
If you want to stick with a 6502 instead of a 65816, you might consider adding extra hardware to do the job. For example, you could map half of zero page to an unused section of RAM and hook up an up/down counter to drive address lines when that half of zero page is accessed. That way you could pulse the counter and have a new half zero page to work with for every function. You can pass parameters on the stack or just copy them to directly where you will need them in the mapped zero page if you aren't copying from another mapped zero page. I suppose you might also be able to set up a part of the memory map to point to the current half zero page plus one so you can copy from the current page to the next one directly saving cycles at the expense of more hardware.
This is all overly complicated but I think it will be faster than just about anything else you could do on the 6502.
EDIT: Alarm Siren, neat macros! Do you need to index by X for the STA in PTA?
This is all overly complicated but I think it will be faster than just about anything else you could do on the 6502.
EDIT: Alarm Siren, neat macros! Do you need to index by X for the STA in PTA?
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
You could use a hybrid technique using the CPU stack for arguments and locals but copying/restoring from RAM as it gets too full/empty.
I think BBC BASIC uses this approach when executing deeply nested procedures. Top active procedures are on the CPU stack for fast access while the stack image for parent precedures is copied to a software stack that grows down from below the screen memory (HIMEM).
I think BBC BASIC uses this approach when executing deeply nested procedures. Top active procedures are on the CPU stack for fast access while the stack image for parent precedures is copied to a software stack that grows down from below the screen memory (HIMEM).
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
Re: Reentrant Programming: Small versus Big Software Stacks
@Druzyek, thanks for the reply, my project is a rewrite of a small Lisp interpreter from C to 6502 assembly, while making it generally reusable. So new hardware isn't something I want to consider. I tried using cc65 for the initial port, but between the cc65 overheard and lisp overheard, there was nothing left for programs.
@Alarm Siren, thanks for the links, I read through them and there was lots of useful information.
@BitWise, I think I will probably end up using a hybrid technique. The low level program machinery (e.g. I/O, memory management, and math) can use a small stack, as there's no recursion, few arguments, and call depths are shallow. It's parsing and evaluation where deep recursion can come into play. So I think a large stack specifically for those functions will be needed.
@Alarm Siren, thanks for the links, I read through them and there was lots of useful information.
@BitWise, I think I will probably end up using a hybrid technique. The low level program machinery (e.g. I/O, memory management, and math) can use a small stack, as there's no recursion, few arguments, and call depths are shallow. It's parsing and evaluation where deep recursion can come into play. So I think a large stack specifically for those functions will be needed.
- Alarm Siren
- Posts: 363
- Joined: 25 Oct 2016
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
EDIT: Alarm Siren, neat macros!
Druzyek wrote:
Do you need to index by X for the STA in PTA?
Apart from checking they compile, I've not had anything complex enough yet to test them in. Still being mucking around trying to get my UART working.
I'll update the file now.
EDIT: Updated.
Want to design a PCB for your project? I strongly recommend KiCad. Its free, its multiplatform, and its easy to learn!
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.
Re: Reentrant Programming: Small versus Big Software Stacks
BitWise wrote:
I think BBC BASIC uses this approach when executing deeply nested procedures. Top active procedures are on the CPU stack for fast access while the stack image for parent precedures is copied to a software stack that grows down from below the screen memory (HIMEM).
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
[Edit: I think this was in response to something in the macro library Alarm Siren linked to. My comment seems pretty out of place without having said this before.]
In my macros, I like to use prepositions to clarify; so you have for example
There's no doubt about the direction, ie, which one is the source and which is the destination, and it's more English-like, and it takes no extra bytes or execution time in the final machine code. "to" and "from" are EQUates that the macro uses to conditionally assemble the copying to be one direction or the other. Such parameters can also be put in the parameter list to just improve readability, without actually using them in the macro.
In my macros, I like to use prepositions to clarify; so you have for example
Code: Select all
COPY VAR1, to, VAR2 ; Copy a single byte.
COPY2 VAR3, to, VAR4 ; Copy a pair of bytes.
COPY VAR1, from, FOOBAR ; Copy the other direction.http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
BitWise wrote:
I think BBC BASIC uses this approach when executing deeply nested procedures. Top active procedures are on the CPU stack for fast access while the stack image for parent precedures is copied to a software stack that grows down from below the screen memory (HIMEM).
The disassembly here https://acorn.huininga.nl/pub/docs/sour ... BASIC2.asm
has the stack push at $B1A7 and the pull at $B236.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
The disassembly here https://acorn.huininga.nl/pub/docs/sour ... BASIC2.asm
has the stack push at $B1A7 and the pull at $B236.
has the stack push at $B1A7 and the pull at $B236.
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
E.g. see this macro library I wrote. This is basically very similar to your first option, but without permanently tying up the X register - it can still be used for other purposes when you're not accessing the stack.
Code: Select all
PKA MACRO mOffset
TSX
LDA HWSTACK_BASE+mOffset+1, X
ENDMCode: Select all
PKA X1
ADC #5
PHA
PKA X2 ;Wrong address!- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
You can save X on the stack too, but then take that into account when formulating the number to add the index to. It not always necessary to re-do the TSX just because you pushed or pulled something. (This is all covered in the 6502 stacks treatise.) The macro will probably need to be modified.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
Martin_H wrote:
It's pretty clear that the Forth style stack is leaner, faster, and more suited for fast light calls. For example I could see it used as the calling standard for a math library, a heap, or I/O. The problem is that it doesn't support that much depth, consumes a register, and half of page zero. That makes interfacing it with reused code a challenge at times.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- Alarm Siren
- Posts: 363
- Joined: 25 Oct 2016
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
Quote:
E.g. see this macro library I wrote. This is basically very similar to your first option, but without permanently tying up the X register - it can still be used for other purposes when you're not accessing the stack.
Code: Select all
PKA MACRO mOffset
TSX
LDA HWSTACK_BASE+mOffset+1, X
ENDMCode: Select all
PKA X1
ADC #5
PHA
PKA X2 ;Wrong address!Code: Select all
PHN LocalSize
PKA LocalSize+X1
ADC #5
PTA LocalSize-3 ; Lets assume LocalSize is big enough to allow this
PKA LocalSize+X2 ; Now it's correct!A thought I've just had, would be modify the ENTER and LEAVE macros thusly:
Code: Select all
ENTER MACRO mLocalSize
PHA
PHX
PHY
PHN mLocalSize
ENDM
LEAVE MACRO mLocalSize
PLN mLocalSize
PLY
PLX
PLA
ENDMWant to design a PCB for your project? I strongly recommend KiCad. Its free, its multiplatform, and its easy to learn!
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
You can save X on the stack too, but then take that into account when formulating the number to add the index to.
Quote:
Also, the point of the macros is that you're not permanently turning X into an extra stack pointer, so it is expected that X may get used for other purposes in between stack accesses, so logically X will need to be reloaded.
I was curious about maximizing performance, so I tested out a few different ways on a simple function. When A='a', X=3, and Y=4 the function prints "aaabbbcccddd." A returns with the count of characters printed. Here are the cycle counts for the ways I tried, even though I would not say they are generalizable in any way. I imagine doing some of this with macros but they are expanded so you can see what I'm thinking. Could I change anything to make it faster?
I think doing everything in zero page could be the fastest. The fourth way without pushing anything only works for top level functions. I don't think it is possible with macros but a more sophisticated program could automatically rearrange zero page usage so as few bytes as possible have to be pushed.
1. Hardware stack: 322
2. Pushing zero page with loop: 324
3. Pushing zero page with unrolled loop: 277
4. Zero page with no pushing: 229
5. Zero page stack: 323
Code: Select all
LDA #'A' ;Character to start printing with
LDX #3 ;How many times to repeat each character
LDY #4 ;How many sets of characters
JSR func1. Hardware stack
Code: Select all
func:
PHA ;Copy of character
PHX ;Copy of number to print (3)
TSX ;Make room for one byte to hold return value (characters printed)
;Alarm Siren's macro chooses faster of two subtraction methods
;TXA
;SEC
;SBC #1
DEX
;TAX
TXS
;Save copy of calculated stack offset (replaces TXS)
STX zp_sp_copy
;Zero one byte variable which returns total characters printed
STZ HWSTACK_BASE+0+1,X
loop1:
;Load value passed in through X (number of characters to print = 3)
LDA HWSTACK_BASE+1+1,X
;Save a copy to load into X counter
PHA
;Load the character to print
LDA HWSTACK_BASE+2+1,X
;Number of characters to print = 3
PLX
loop2:
putc
DEX
BNE loop2
;Reload X with calculated stack offset instead of TXS
LDX zp_sp_copy
;Count of characters printed
LDA HWSTACK_BASE+0+1,X
CLC
;Add 3 to total
ADC HWSTACK_BASE+1+1,X
STA HWSTACK_BASE+0+1,X
;Advance to the next character
INC HWSTACK_BASE+2+1,X
;Print five sets
DEY
BNE loop1
;Get character count before X and SP are changed
LDY HWSTACK_BASE+0+1,X
;Empty stack (character to print, characters to print, and total printed)
TSX
TXA
CLC
ADC #3
TAX
TXS
;Character count to A
TYA
RTSCode: Select all
;**********2. Pushing zero page with loop**********
func:
;Copy of A (character to print)
STA zp_temp
;Copy of X (characters to print = 3)
STX zp_temp2
LDX #0
;Push zero page bytes so we can use them
loop3:
LDA R0,X
PHA
INX
CPX #3
BNE loop3
;Store A (character to print) in R2
LDA zp_temp
STA R2
;X = characters to print = 3
LDA zp_temp2
STA R1
;Count of total characters printed
STZ R0
;**********3. Pushing zero page with unrolled loop**********
func:
STA zp_temp
LDA R0
PHA
LDA R1
PHA
LDA R2
PHA
LDA zp_temp
STA R2
STX R1
STZ R0
;**********4. No pushing**********
func:
STA R2
STX R1
STZ R0
;**********Body for 2, 3, and 4**********
loop4:
;X = characters to print = 3
LDX R1
;A = character to print
LDA R2
loop5:
putc
DEX
BNE loop5
;Add characters printed (3) to total in R0
LDA R0
CLC
ADC R1
STA R0
;Advance to the next charcter
INC R2
DEY
BNE loop4
;**********Ending for 2**********
LDX #2
;Restore the three bytes we used
loop6:
PLA
STA R0,X
DEX
BPL loop6
;Character count is return value
LDA R0
RTS
;**********Ending for 3**********
PLA
STA R2
PLA
STA R1
PLA
STA R0
;Not needed but macros would not optimize it out
LDA R0
RTS
;**********Ending for 4**********
LDA R0
RTSCode: Select all
func5:
;Character to print
PHA
;Pointer into zero page stack
LDA zp_ptr
;Increase by 3 for A copy, X copy, and character count
CLC
ADC #3
STA zp_ptr
;Push copy of characters to print (3)
PHX
;X points to zero page stack
TAX
;Characters to print (3)
PLA
STA 1,X
;Character to print
PLA
STA 2,X
;Zero count of total characters printed
STZ 0,X
loop34:
;Count of characters to print (3)
LDA 1,X
;Necessary since we cant do LDX 1,X
PHA
;Character to print
LDA 2,X
;X is now 3, count of characters to print
PLX
loop35:
putc
DEX
BNE loop35
;Pointer to stack in zero page
LDX zp_ptr
;Count of total characters printed
LDA 0,X
CLC
;Add three to total (printed three characters
ADC 1,X
STA 0,X
;Advance to next character
INC 2,X
DEY
BNE loop34
;Free up the three bytes we took on the stack
LDA zp_ptr
SEC
SBC #3
STA zp_ptr
;Returns total characters printed in A
LDA R0
RTS