Reentrant Programming: Small versus Big Software Stacks
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Reentrant Programming: Small versus Big Software Stacks
Martin_H wrote:
The second is a 16 bit stack pointer in page zero that grows downward from ram top: https://github.com/Martin-H1/Lisp65/blo ... gstack.asm
Mike B.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
Quote:
You can save X on the stack too, but then take that into account when formulating the number to add the index to.
- If you want to access that output data that's on the stack before pulling it off, perhaps even in arbitrary order, remember that X is still valid from the subroutine using it for indexing into the stack, as shown below. If you've done TSX again, adjust your indexed numbers to account for the fact that the return address is no longer there, and, in the case below, that there was a PLA after the TSX, so that after the subroutine return the outputs are at 101,X to 104,X. If you have not done TSX again, the old indexed numbers will still be valid.
The example above calls the UM_STAR subroutine below which takes Bruce Clark's improvement on my commented bug fix on the UM* multiplication in fig-Forth, and it modifies it for I/O on the hardware stack. The structures are per my structure macro article and source code linked there. They assemble exactly the same thing you would by hand, but make the conditions and branches in the source code clearer.
(Note: In many situations, it will be advantageous to give names to the items on the stack, using EQUates, so it's clearer what 101,X is, what 102,X is, what 103,X is, and so on. (But don't forget to still put the ,X after them.) More on that later.)
Code: Select all
UM_STAR: LDA  #0        ; Unsigned, mixed-precision (16-bit by 16-bit input, 32-bit output)
         PHA            ; multiply.  Add a variable byte to the stack, initializing it as 0.
         TSX            ; Now 101,X holds that new variable, 102,X and 103,X hold the return
         LSR  $107,X    ; address, and 104,X to 107,X holds the inputs and later the outputs.
         ROR  $106,X

         FOR_Y  16, DOWN_TO, 0   ; Loop 16x.  The DEY, BNE in NEXT_Y below will drop through on 0.
            IF_CARRY_SET
               CLC
               PHA               ; Note that the PHA (and PLA below) doesn't affect the indexing.
               LDA  $101,X
               ADC  $104,X
               STA  $101,X
               PLA
               ADC  $105,X
            END_IF
            ROR
            ROR  $101,X
            ROR  $107,X
            ROR  $106,X
         NEXT_Y

         STA  $105,X
         PLA            ; Retrieve the variable byte we added at the top, cleaning up the stack.
         STA  $104,X    ; Again note that the PLA changed S but not X, so the 104 is still 104.
         RTS
 ;------------------
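A quick way to sanity-check the shift-add arithmetic that UM_STAR performs is to model it in a higher-level language. This Python sketch (my own addition, not from the original post) mirrors the 16-iteration structure: test the bit about to be shifted out of the multiplier, conditionally add the multiplicand into the upper half, then shift the whole 32-bit accumulator right, as UM_STAR's chain of RORs does:

```python
def um_star(a, b):
    """Model of an unsigned 16-bit x 16-bit -> 32-bit shift-add multiply.

    The multiplier starts in the low half of a 32-bit accumulator; each
    of the 16 iterations tests the bit about to be shifted out (the
    carry that IF_CARRY_SET checks), adds the multiplicand into the
    high half if it was 1, then shifts the whole accumulator right.
    """
    acc = b & 0xFFFF                     # multiplier in the low 16 bits
    for _ in range(16):
        if acc & 1:                      # bit shifted out would be 1: add
            acc += (a & 0xFFFF) << 16    # multiplicand into the high half
        acc >>= 1                        # the ROR chain
    return acc                           # full 32-bit product

print(hex(um_star(0xFFFF, 0xFFFF)))      # 0xfffe0001, the largest product
```

This checks only the arithmetic, of course; the byte-at-a-time carries that the 6502 version juggles on the hardware stack are the part the assembly code is really about.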
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
I haven't thought about how to have a macro do that.
Quote:
Note that the PHA (and PLA below) doesn't affect the indexing
Would you tend to use X indexing for everything until you absolutely need the extra speed? What do you like about doing it this way over zero page with no X index?
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
Something I liked about the 2500AD assembler I used at work from 1986 to about 1990 was that it allowed you to say in a macro definition, in essence, "if there's a fourth macro parameter, do this with it," or "if there's a fifth one, do the following with it." The C32 assembler I use now requires you to use the same number of parameters in the macro invocation that the macro was written to use, although I suppose you could pad unused ones with 0's or whatever value would guarantee you're not using it (although that doesn't help readability).
I have never used a macro that I myself did not write though. They're very quick to write, assuming you know what you want. They can also have any amount of conditional assembly you could want. As much as I've used and written about them, I think there's still a lot more that macros could be used to do that I haven't thought of yet, or at least haven't thought of how to do, like parsing a string. Of course, it would depend on the particular assembler's macro capabilities, how powerful its macro language is. (Most assemblers are pretty similar in this respect though.) This is all part of extending the power and capabilities of assembly language. I don't think we're anywhere near having maxed it out yet. We just need to get more creative and resourceful and share our developments to build upon.
Much of what's been talked about in the preceding posts was about passing parameters on the hardware stack, C-style. Sometimes that will be appropriate; but it does come with penalties, as I describe in the 6502 stacks treatise. Virtually all of my own parameter-passing by stacks goes on in the ZP data stack, using X as the stack pointer. This is in Forth though, so I very seldom need X for anything else. When I do, I save and restore it. As they say, you can get 95% of max performance with only 5% of your code in assembly, provided it's the right 5%, the critical 5%. In Forth it's very easy to dip into assembly when you need the performance or the tightest control of the machine. In those cases, I have to remember to save and restore X if I use it for something besides the data-stack pointer.
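As a rough sketch of the ZP-data-stack arrangement described above, here is a Python model (mine, for illustration; the region boundary and cell layout are assumptions): X serves as the stack pointer into zero page, cells are two bytes, and the stack grows downward.

```python
# Toy model of a zero-page data stack, as used in 6502 Forth: X is the
# stack pointer into zero page, cells are 16 bits, and the stack grows
# downward. The empty-stack value 0x80 is illustrative only.
ZP = bytearray(256)      # zero page
X = 0x80                 # stack pointer; empty stack grows down from here

def push(value):
    global X
    X -= 2                            # DEX DEX on the '02
    ZP[X] = value & 0xFF              # low byte at 0,X
    ZP[X + 1] = (value >> 8) & 0xFF   # high byte at 1,X

def pop():
    global X
    value = ZP[X] | (ZP[X + 1] << 8)
    X += 2                            # INX INX (this is all DROP does)
    return value

push(0x1234)             # a 16-bit cell goes on with two byte stores
value = pop()            # and comes back off; X returns to 0x80
```

Because X normally stays parked as the stack pointer, "passing" a parameter is just leaving it pushed; the callee addresses it directly as 0,X and 1,X.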
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
The C32 assembler I use now requires you to use the same number of parameters in the macro invocation that the macro was written to use, although I suppose you could pad unused ones with 0's or whatever value would guarantee you're not using it (although that doesn't help readability).
Quote:
This is all part of extending the power and capabilities of assembly language. I don't think we're anywhere near having maxed it out yet. We just need to get more creative and resourceful and share our developments to build upon.
Code: Select all
;Assuming functions use zero page addresses $00-$07
main:
;This macro should jump to func1 without pushing anything
macro_jsr func1
func1:
;This macro should assign a-d to addresses $00-$03
macro_var a, b, c, d
;Code using a, b, c, and d
...
IF something
;This macro should jump to func2 without pushing anything
macro_jsr func2
ENDIF
rts
func2:
;This macro should assign a-d to addresses $04-$07 since $00-$03 are taken
macro_var a, b, c, d
;Code using a, b, c, and d
...
;This macro should push $00-$03 since the first call to it is already using them
;Note that the first call to func1 did not push anything!
macro_jsr func1
rts

Another problem is where to put the memory to reduce pushes. Imagine if you assigned $00-$07 like this:
func1: $00-$03
func2: $04-$07
func3: ??? (4 bytes)
If the calling order is func1->func2->func3, it doesn't matter since it will have to push anyway.
With this order: func2->func1 and func2->func3 , func3 should use $00-$03.
With this order: func1->func2 and func1->func3 , func3 should use $04-$07.
Another useful feature would be basic optimizing like this:
Code: Select all
main:
ldx #25
lda #VALUE
macro_jsr func
ldx #35
...
func:
clc
adc #4
IF A, EQUALS, 9
ina
rts
ELSE
dex
stx PERIPHERAL1
lda #0
ENDIF
Code: Select all
lda #10
ldx #35
Code: Select all
ldx #24
stx PERIPHERAL1
lda #0
ldx #35

I can deal without most of what C has to offer but it would be nice to have just enough abstraction to do those kinds of optimizations. I think if you 1.) treated normal labels as function names, 2.) only used temporary labels within functions, and 3.) only jumped to functions using their names, rather than hard coded addresses, you could do quite a lot. I think you would also never have to compromise performance and you would only increase program size if you chose speed over size. I suspect you would need a lot more than macros though.
Quote:
This is in Forth though, so I very seldom need X for anything else.
- Alarm Siren
- Posts: 363
- Joined: 25 Oct 2016
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
I swear no one is paying me to shill for the ca65 assembler, I am just really impressed. You can supply fewer than the number of arguments the macro was written to use and compile conditionally based on whether an argument is blank or how many arguments it receives: http://www.cc65.org/doc/ca65-12.html
Code: Select all
ARGCHK OFF
COPYB MACRO mSrc, mDest, mDest2
LDA mSrc
STA mDest
IFMA 2 ; does mDest2 exist?
STA mDest2
ENDIF
ENDM

Want to design a PCB for your project? I strongly recommend KiCad. It's free, it's multiplatform, and it's easy to learn!
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
Quote:
The C32 assembler I use now requires you to use the same number of parameters in the macro invocation that the macro was written to use, although I suppose you could pad unused ones with 0's or whatever value would guarantee you're not using it (although that doesn't help readability).
Quote:
This is all part of extending the power and capabilities of assembly language. I don't think we're anywhere near having maxed it out yet. We just need to get more creative and resourceful and share our developments to build upon.
Code: Select all
;Assuming functions use zero page addresses $00-$07
main:
;This macro should jump to func1 without pushing anything
macro_jsr func1
func1:
;This macro should assign a-d to addresses $00-$03
macro_var a, b, c, d
;Code using a, b, c, and d
...
IF something
;This macro should jump to func2 without pushing anything
macro_jsr func2
ENDIF
rts
func2:
;This macro should assign a-d to addresses $04-$07 since $00-$03 are taken
macro_var a, b, c, d
;Code using a, b, c, and d
...
;This macro should push $00-$03 since the first call to it is already using them
;Note that the first call to func1 did not push anything!
macro_jsr func1
rts

Another problem is where to put the memory to reduce pushes. Imagine if you assigned $00-$07 like this:
func1: $00-$03
func2: $04-$07
func3: ??? (4 bytes)
If the calling order is func1->func2->func3, it doesn't matter since it will have to push anyway.
With this order: func2->func1 and func2->func3 , func3 should use $00-$03.
With this order: func1->func2 and func1->func3 , func3 should use $04-$07.
One or more of the variables a, b, c, and d may not need to exist as variables at all if they were just calculated by recent operations and won't be needed ever again after main and/or func1 and/or func2 is done with them. They take space on the data stack only while they're needed, then they cease to exist. They never had a hard address, so the problem of trying to figure out where to put them (like $00-03 versus $04-07) is gone. If you pass them on the return (page-1) stack instead, C-style, there's extra overhead. [Some of the following is just copied from the stacks treatise.] Consider the problem where one routine passes parameters to another, and now there's a subroutine-return address on top putting the target data farther down the stack than expected, so you get incorrect results.
The separate data stack in ZP avoids these problems, because the subroutine return addresses don't go on it. ZP addressing itself is also more efficient than absolute addressing of course, and there are more addressing modes for ZP. Parameter-passing becomes implicit. Samuel Falvo (kc5tja on this forum) has an excellent post about this, the second post in the topic "Stack Frames?" Key benefits he mentions are:
- not having to re-mirror the input parameters on the stack all the time. This results in the fastest possible execution while still using stack frames.
- reducing the impact of recursion (the 6502 stacks treatise covers recursion in chapter 15)
- it's easy to return multiple values on the data stack too (even varying numbers of values, if it's advantageous enough)
- it results in more compact code, and even permits better factoring because you don't have to constantly push and pop duplicate arguments all the time when calling sub-subroutines
If a high-level language passes parameters through the hardware stack, you could have an instruction like:
Code: Select all
CALL REPLINE (A(4), M(), "Limit=", B, 600, T2)

and all those parameters have to be specifically dealt with in the call. Several steps must be carried out at the call and return times, beyond just the call and return. However, with a separate data stack, input parameters that have been derived earlier can be left on the stack, just waiting there, without interfering with anything, until REPLINE is called, not storing them elsewhere, and not having to re-load them to put them in a stack frame now. Subroutine outputs can also be read from the stack and dealt with whenever you get around to it, again without having to store them right away as part of the return process. The resulting simpler, more efficient instruction might then be simply,
Code: Select all
REPLINE

or in the case of 6502 assembly language,
Code: Select all
JSR REPLINE

func1 can call func2 which can call func3 etc., and maybe func1 calls func3 directly after func2 is done running, and there is absolutely no overhead to passing the parameters. This is the way Forth does it, although you can do the same kind of thing in assembly.
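The REPLINE pattern just described can be sketched in Python with an explicit data stack (entirely my own illustration; the routine names echo the CALL example's parameters but are otherwise made up): each routine leaves its result on the stack, and REPLINE consumes its inputs directly, with no marshalling at the call site.

```python
# Hypothetical sketch: routines communicate only through a shared data
# stack, so the call itself carries no argument list or stack frame.
data_stack = []

def calc_limit():
    data_stack.append(600)      # derived earlier; just left waiting

def calc_b():
    data_stack.append(3)

def calc_t2():
    data_stack.append(4)

def repline():
    # Inputs are already in place; pop them off in reverse order.
    t2 = data_stack.pop()
    b = data_stack.pop()
    limit = data_stack.pop()
    return limit + b * t2

calc_limit()
calc_b()
calc_t2()
result = repline()              # the bare call -- nothing passed explicitly
```

The point of the sketch is that nothing between the calc_* routines and repline has to copy, store, or re-load the three values; they sit on the stack until consumed.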
In the following:
Quote:
Another useful feature would be basic optimizing like this:
If #VALUE is 5 then it should automatically reduce to this, since it is both smaller and faster than what it replaces:
On the other hand if #VALUE is 7 (and it's your only call to func) it could become this:
The trick is, I think, having the assembler spot that #VALUE is a constant that always leads to the same result and computing it for you. The strength is you only have to write the function once and it still works when A is loaded with something not constant, in which case the function should stay intact.
Code: Select all
main:
ldx #25
lda #VALUE
macro_jsr func
ldx #35
...
func:
clc
adc #4
IF A, EQUALS, 9
ina
rts
ELSE
dex
stx PERIPHERAL1
lda #0
ENDIF
Code: Select all
lda #10
ldx #35
Code: Select all
ldx #24
stx PERIPHERAL1
lda #0
ldx #35

the accumulator being tested in the IF line doesn't exist at assembly time, so the assembler would have to somehow examine the preceding instructions, ones that are not in the macro definition or invocation. No assembler that I know of can do this. A possible workaround is to have a macro replace the LDA#: besides laying down the LDA# instruction, it stores what you loaded, and the func macro (if that's what it is) can examine this to use in its conditional assembly. Even then, it might be risky, because what if another intervening instruction alters A, so it's no longer what the LDA# put in it?
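The macro-layer workaround described above can be sketched in Python (my own toy, not a real assembler; all names are hypothetical): a replacement for LDA# records the constant it emitted, and a later "smart call" pseudo-op folds the branch when A is known.

```python
# Toy model of the workaround: the macro layer tracks what a LDA # put
# in A, so func_macro can do the constant folding at "assembly" time.
known_a = None        # value of A if known at assembly time, else None
output = []           # the "assembled" instruction stream

def lda_imm(value):
    global known_a
    known_a = value                   # remember the constant
    output.append(f"LDA #{value}")

def lda_mem(addr):
    global known_a
    known_a = None                    # A is no longer a known constant
    output.append(f"LDA {addr}")

def func_macro():
    # Emulates the example's func: CLC / ADC #4 / IF A==9 ... ELSE ...
    if known_a is None:
        output.append("JSR func")     # unknown A: keep the real routine
    elif (known_a + 4) & 0xFF == 9:
        output.append("LDA #10")      # ADC #4 and the INA folded together
    else:
        output.append("DEX")
        output.append("STX PERIPHERAL1")
        output.append("LDA #0")

lda_imm(5)       # VALUE known to be 5 ...
func_macro()     # ... so this emits just LDA #10
```

It also shows the risk Garth mentions: any load the layer doesn't model (here lda_mem) has to invalidate the tracked value, or the folding is wrong.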
Quote:
I can deal without most of what C has to offer but it would be nice to have just enough abstraction to do those kinds of optimizations. I think if you 1.) treated normal labels as function names,
Quote:
2.) only used temporary labels within functions,
Quote:
and 3.) only jumped to functions using their names, rather than hard coded addresses, you could do quite a lot. I think you would also never have to compromise performance and you would only increase program size if you chose speed over size. I suspect you would need a lot more than macros though.
Every routine will have an address, although part of the assembler's job is to hide it. If you want the routines relocatable, well, the '02 can do it—sorta—but it's very poorly suited for that. Chapters 12 and 13 deal with that. The 65816 does much better.
Interesting. Out of curiosity, do you have an idea of the speed trade-off versus doing the same job in assembly?
Quote:
This is in Forth though, so I very seldom need X for anything else.
Depending on the characteristics one is after, there are several ways to run Forth, so there could be a wide range of answers. I won't write them all up here as it would get too long; but I'll just give a few tidbits:
- Data on the stacks is normally handled in cells of a standard size. For the 6502, that's 16 bits, i.e., two bytes, even if you only need 8 bits and you ignore the other 8. The 16 is because the '02 has a 16-bit address bus, and cells often contain addresses. If you want 32-, 48-, or 64-bit double-, triple-, or quad-precision numbers, you can combine cells. (You virtually never need more than double, and most operations are on single cells.) Strings normally use one byte per character; but if you put just one character in a stack cell, it will take two bytes, and the high byte is zeroed and/or ignored.
- In this sense, the 8-bit 6502 has to work harder than it would to handle just 8 bits at a time, so there's some inefficiency that is accepted in order to avoid the programming confusion and bugs you'd get if you tried to have mixed-length cells. (The treatise discusses this too.) It definitely works out well for development though. The 65816 with its 16-bit accumulator in particular is much more efficient at it, and my '816 Forth runs two to three times as fast as my '02 Forth at a given clock rate.
- There is always some overhead to get from one routine/function/word to the next. In the case of subroutine-threaded code, or STC, that's a JSR-RTS pair; but if something takes no more bytes than the JSR instruction itself, you might as well straightline it and speed it up. For example, the word DROP drops a cell from the data stack by using INX INX, which is only two bytes compared to a JSR's three, and much faster than JSR, INX, INX, RTS, so it's obviously worth straightlining. Even if something is a little longer, you might decide to optimize for speed instead of program size. Even with direct-threaded code (DTC) and indirect-threaded code (ITC) Forth, there are surprisingly low-overhead ways to get from one routine to the next in a list. (See Bruce Clark's ideas here and here.) Token threading is one of two other ways, but I don't know of any implementations of it in '02 Forth. Each of the 255 most-used routines is given a byte value so it can be referred to by a single byte, making the program more compact, but I'm sure the address look-up time slows things down.
- In any case, if you need greater speed for something, you can dip into assembly language at any time; so you're not really giving up performance when it matters most.
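As a rough, non-6502 illustration of how threaded code chains words together, here is a toy inner interpreter in Python (entirely my own sketch; real Forth threading happens at the machine level, and none of these names come from the post):

```python
# Toy model of threaded code: a "word" is either a Python function (a
# primitive, standing in for machine code) or a list of other words (a
# colon definition); the inner interpreter just walks the list.
stack = []

def lit(n):                   # build a primitive that pushes a literal
    return lambda: stack.append(n)

def dup():  stack.append(stack[-1])
def plus(): stack.append(stack.pop() + stack.pop())
def drop(): stack.pop()       # on the '02, STC just straightlines INX INX

def execute(word):
    if callable(word):
        word()                # primitive: runs directly
    else:
        for w in word:        # threaded list: step to each entry in turn
            execute(w)

double = [dup, plus]          # a colon definition: ( n -- 2n )
execute([lit(21), double])    # leaves 42 on the stack
```

The per-step dispatch in execute() is the analogue of the NEXT overhead discussed above; straightlining a short word like DROP is the equivalent of inlining its body instead of putting it in a list.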
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
Assuming I understand correctly what you want to do (which may not be a safe assumption), you can get around these problems by using a data stack, separate from the return stack, as discussed in the 6502 stacks treatise, particularly chapters 4 through 6. Although I'm a macro junkie, this might eliminate the complication that macros are meant to hide or automate, because the matter of passing parameters becomes implicit, rather than explicit. (I hope you're not disappointed.)
Imagine a function that uses four temporary variables in zero page with values calculated in the function (ie not arguments passed in). Here is an example of a fragment of an X indexed loop.
Code: Select all
LDA var1 ;3 cycles
STA peripheral ;3 cycles
STX peripheral ;4 cycles (if peripheral not in zero page)
LDA var2 ;3 cycles
STA peripheral ;4 cycles
LDA (var3,X) ;6 cycles
STA peripheral ;4 cycles
STA var4 ;3 cycles

If you store your variables in zero page, this fragment takes 30 cycles. Here is how I think you would do it with a data stack in zero page (please let me know if there is a faster way to do it).
Code: Select all
LDX data_sp ;3 cycles if in zero page, or ? cycles if ? (on stack?)
LDA var1,X ;4 cycles
STA peripheral ;4 cycles
LDX x_copy ;3 cycles if in zero page
STX peripheral ;4 cycles
LDX data_sp ;3 cycles
LDA var2, X ;4 cycles
STA peripheral ;4 cycles
LDA var3, X ;4 cycles
STA indirection_temp ;3 cycles, assuming in zero page
LDX x_copy ;3 cycles
LDA (indirection_temp,X) ;6 cycles
STA peripheral ;4 cycles
LDX data_sp ;3 cycles
STA var4, X ;4 cycles

This takes 56 cycles (or can you speed it up?), which is almost twice as slow! Of course, I am deliberately trying to create the worst case scenario to prove a point, which is that unindexed zero page is faster when you are working with intermediate values.
Quote:
One or more of the variables a, b, c, and d may not need to exist as variables at all if they were just calculated by recent operations and won't be needed ever again after main and/or func1 and/or func2 is done with them. They take space on the data stack only while they're needed, then they cease to exist.
Quote:
They never had a hard address, so the problem of trying to figure out where to put them (like $00-03 versus $04-07) is gone.
Quote:
However, with a separate data stack, input parameters that have been derived earlier can be left on the stack, just waiting there, without interfering with anything, until REPLINE is called, not storing them elsewhere, and not having to re-load them to put them in a stack frame now.
Code: Select all
x=Calculation1();
y=Calculation2();
z=Calculation3();
func(x,y,z);
func(z,x,y);

1.) leave the calculated x, y, and z values on the stack without storing them somewhere else (yet)
2.) not make a copy of them for the first call to func
3.) make a copy of them on the stack in a different order for the second call to func
Quote:
the accumulator being tested in the IF line doesn't exist at assembly time, so the assembler would have to somehow examine the preceding instructions, ones that are not in the macro definition or invocation. No assembler that I know of can do this.
Do you know of an assembler that will output source after macro expansion but before assembling?
EDIT: Whoops, missed part of your post:
Quote:
Quote:
and 3.) only jumped to functions using their names, rather than hard coded addresses, you could do quite a lot. I think you would also never have to compromise performance and you would only increase program size if you chose speed over size. I suspect you would need a lot more than macros though.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
Quote:
Assuming I understand correctly what you want to do (which may not be a safe assumption), you can get around these problems by using a data stack, separate from the return stack, as discussed in the 6502 stacks treatise, particularly chapters 4 through 6. Although I'm a macro junkie, this might eliminate the complication that macros are meant to hide or automate, because the matter of passing parameters becomes implicit, rather than explicit. (I hope you're not disappointed.)
If something can be done both ways, the stack method often will be slower. Its benefit comes partly from being able to be re-entrant (including recursive) (the title of this topic is "Reentrant Programming [...]"), and from reducing the number of variables needed, variables which may even run you out of ZP space otherwise. The translation of your first five lines below brings no memory penalty and almost no speed penalty. If you're doing things by stacks, your X will be the stack pointer almost full time, and you don't keep swapping it out to use for something else like your LDX/STX data_sp / x_copy. It may take a little adjustment period to accustom oneself to doing things differently. The stacks method for your last three lines won't look favorable, at least not up front.
I shortened your comments a bit below so hopefully when I added more, they won't exceed anyone's window width:
Quote:
Imagine a function that uses four temporary variables in zero page with values calculated in the function (ie not arguments passed in). Here is an example of a fragment of an X indexed loop.
Code: Select all
LDA var1 ;3 cycles
STA peripheral ;3 cycles
STX peripheral ;4 cycles (if peripheral not in zero page)
LDA var2 ;3 cycles
STA peripheral ;4 cycles
LDA (var3,X) ;6 cycles
STA peripheral ;4 cycles
STA var4 ;3 cycles

First I'll comment on the first five lines:
Quote:
If you store your variables in zero page, this fragment takes 30 cycles. Here is how I think you would do it with a data stack in zero page (please let me know if there is a faster way to do it).
Code: Select all
x LDX data_sp ;3 cy if in ZP, or ? cycles if ? (on stack?) X is a nearly full-time stack pointer,
LDA var1,X ;4 cy so you probably don't need to re-load it.
STA peripheral ;4 cy
x LDX x_copy ;3 cy if in ZP You'd probably also have the desired value on the stack, so it's just LDA ZP,X.
STX peripheral ;4 cy You could re-use A anyway, because you don't need its contents from above again.
x LDX data_sp ;3 cy Then this line would be unnecessary, because you didn't overwrite X above.
LDA var2, X ;4 cy
STA peripheral ;4 cy So far, we have 5 lines and 13 bytes (just as before), and 20 versus 17 cycles.

The indexing for the next part is particularly inefficient for the '02 because indirection_temp is two bytes and you have to handle the low and high bytes separately, meaning it's even worse than you wrote. (It may be alleviated a bit if the high byte can remain fixed.) The '816 is a lot more efficient at handling 16 bits at once. Instead of using more ragtag variables, which are hard to tame so one routine won't accidentally overwrite data still needed by another pending routine, a ZP space of about 8 bytes which we call N works well along with the stacks. N is always finished being used by the time you get to the end of a routine; it's never used to pass parameters; and its use is never interrupted by subroutines. If the rules are followed, it's always safe. Anyway, I can't offer much of an improvement on the part replacing your three lines starting with LDA (var3,X).
Quote:
This takes 56 cycles (or can you speed it up?), which is almost twice as slow! Of course, I am deliberately trying to create the worst case scenario to prove a point, which is that unindexed zero page is faster when you are working with intermediate values.
You can however do an unlimited number of levels of indirection. With the '816 with A in 16-bit mode (for shorter, clearer code), it's just a series of
Code: Select all
LDA (0,X)
STA 0,X
LDA (0,X)
STA 0,X
<etc.>

and you can insert an indexing anywhere, with something like CLC, ADC, STA 0,X.
Quote:
If you are using arguments passed to the function, which might already be on the stack, as you do with the Forth style stack you mentioned, I see why it is faster to leave them on the data stack and not make a new copy for every function call. However, every time the function rereads one of those stack arguments with X indexing, it incurs a one-cycle penalty. If you access it enough times, it will be faster to allocate space for it in zero page and make a copy.
True; but if avoiding the stack means you have to do more processing elsewhere to find a suitable place to put the data where it won't step on other data you still need, that will take extra time. If you further need it re-entrant (again the title of the topic), you're sunk. The ZP data stack solves these problems as well as ones that may have no solution at all outside of stacks. You can mix methods when needed though.
Quote:
Quote:
One or more of the variables a, b, c, and d may not need to exist as variables at all if they were just calculated by recent operations and won't be needed ever again after main and/or func1 and/or func2 is done with them. They take space on the data stack only while they're needed, then they cease to exist.
I've used the same memory locations for multiple things on the PIC16 where RAM is so limited and stack operations are rather prohibitive, and you sure have to be careful to make sure two different things don't need it at the same time when there are flurries of subroutines calling each other in different orders and directions. It's very bug-prone. I documented heavily, and still wound up with some real head-scratchers.
Quote:
Quote:
However, with a separate data stack, input parameters that have been derived earlier can be left on the stack, just waiting there, without interfering with anything, until REPLINE is called, not storing them elsewhere, and not having to re-load them to put them in a stack frame now.
Code: Select all
x=Calculation1();
y=Calculation2();
z=Calculation3();
func(x,y,z);
func(z,x,y);1.) leave the calculated x, y, and z values on the stack without storing them somewhere else (yet)
2.) not make a copy of them for the first call to func
3.) make a copy of them on the stack in a different order for the second call to func
Code: Select all
JSR <calc_x> ; Leaves x on the data stack. (Not necessarily a subroutine.)
JSR <calc_y> ; Leaves y on the data stack, on top of x. "
JSR <calc_z> ; Leaves z on the data stack, on top of y. "
JSR 3DUP ; DUPlicate the top 3 stack items. (See notes below.)
JSR func ; I assume your func takes 3 stack items, and returns none.
JSR minusROT ; ROTate z back behind x & y. (ROT goes the other direction.)
 JSR func ; Second call to func.
If you wanted func to use the stack data but not do the customary deletion of the cells it uses when it's done with them, you could omit the 3DUP.
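(In case it's not clear what 3DUP and minusROT do internally, here's a simplified sketch with one-byte cells and X as the ZP data-stack pointer. Real Forth-style cells are 16 bits, so each move would be doubled; this is illustration only, not code from an actual library.)
Code: Select all
3DUP:   DEX
        DEX
        DEX             ; make room for three more one-byte cells
        LDA  5,X        ; old third-from-top (x)
        STA  2,X
        LDA  4,X        ; old second-from-top (y)
        STA  1,X
        LDA  3,X        ; old top (z)
        STA  0,X        ; stack is now x y z x y z, with z on top at 0,X
        RTS

minusROT:               ; ( x y z -- z x y ): tuck the top item under the other two
        LDA  0,X        ; save z
        PHA
        LDA  1,X        ; y moves to the top
        STA  0,X
        LDA  2,X        ; x moves to second
        STA  1,X
        PLA
        STA  2,X        ; z lands third from the top
        RTS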
Quote:
Quote:
the accumulator being tested in the IF line doesn't exist at assembly time, so the assembler would have to somehow examine the preceding instructions, ones that are not in the macro definition or invocation. No assembler that I know of can do this.
The assembler would also have to see if any intervening called subroutines affect it, and if not, if any of the ones they call affect it, etc., however many levels deep it might go. If any of those have not been encountered yet in the assembly because they're further down in the source code, later passes might produce phase errors, and it may take a lot of passes to resolve them. (That part in itself is not necessarily a problem.) If there's a way to do it though, it sounds attractive. You could do similar things with watching the carry flag for example so it knows if it can omit the CLC before an ADC or the SEC before an SBC, or more rarely, the decimal-mode flag. Maybe more.
Quote:
Do you know of an assembler that will output source after macro expansion but before assembling?
No, but enso started a topic "A sensible macro engine" where he proposes and explores the idea of a macro pre-processor that could then be used with various assemblers including ones with no macro capability. (This and other links are in my article on forming program-structure macros in assembly language through macros.)
Quote:
EDIT: Woops, missed part of your post:
Every routine will have an address, although part of the assembler's job is to hide it. If you want the routines relocatable, well, the '02 can do it—sorta—but it's very poorly suited for that. Chapters 12 and 13 deal with that. The 65816 does much better.
I meant if you wanted to try to get a program to optimize like in my example, you would have to follow a few rules of abstraction. There is no way to tell from a JMP instruction if it is entering a new function or just going back to the head of a loop in the same function. You would have to know that (I think) to optimize well.
Quote:
Quote:
and 3.) only jumped to functions using their names, rather than hard coded addresses, you could do quite a lot. I think you would also never have to compromise performance and you would only increase program size if you chose speed over size. I suspect you would need a lot more than macros though.
My article on using macros to form program structures shows this kind of thing where going back to the head of a loop does not require a label. It can be like BEGIN...UNTIL, BEGIN...WHILE...REPEAT, FOR_X...NEXT_X, etc., using indentation to make the structure clear, and not needing labels.
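For example, a simple countdown loop needs no label at all (the loop body here is only a placeholder):
Code: Select all
        FOR_X 10, DOWN_TO, 0   ; loop 10x, like the FOR_Y example earlier
            JSR  do_work       ; hypothetical loop body
        NEXT_X                 ; assembles the DEX, BNE that branches back
It assembles exactly what you'd write by hand, but the structure is visible at a glance.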
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Reentrant Programming: Small versus Big Software Stacks
barrym95838 wrote:
You might end up encountering some difficulties assembling those stz (_sp),y instructions
Mike B.
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
If something can be done both ways, the stack method often will be slower. Its benefit comes partly from being able to be re-entrant (including recursive) (the title of this topic is "Reentrant Programming [...]"), and reducing the number of variables needed, variables which may even run you out of ZP space otherwise.
Code: Select all
main:
LDA #10
JSR count_down ;Recurse down to 1
...
count_down:
LDX $00 ;Save a copy of $00
PHX
STA $00 ;Do something useful with $00
STA screen
DEA ;Count down (from 10)
BEQ _zero
jsr count_down ;Recurse
_zero:
LDA $00 ;Do more useful stuff with $00
STA peripheral
PLX ;Restore calling function's $00
STX $00
RTS
Quote:
If you're doing things by stacks, your X will be the stack pointer almost full time, and you don't keep swapping it out to use for something else like your LDX/STX data_sp / x-copy.
Quote:
LDX x_copy ;3 cy if in ZP
You'd probably also have the desired value on the stack, so it's just LDA ZP,X.
Quote:
Fortunately it's also not very common.
Quote:
True; but if avoiding the stack means you have to do more processing elsewhere to find a suitable place to put the data where it won't step on other data you still need, that will take extra time.
Quote:
If you further need it re-entrant (again the title of the topic), you're sunk.
Quote:
It's very bug-prone. I documented heavily, and still wound up with some real head-scratchers.
Quote:
The assembler would also have to see if any intervening called subroutines affect it, and if not, if any of the ones they call affect it, etc., however many levels deep it might go.
Quote:
No, but enso started a topic "A sensible macro engine" where he proposes and explores the idea of a macro pre-processor that could then be used with various assemblers including ones with no macro capability.
Quote:
My article on using macros to form program structures shows this kind of thing where going back to the head of a loop does not require a label. It can be like BEGIN...UNTIL, BEGIN...WHILE...REPEAT, FOR_X...NEXT_X, etc., using indentation to make the structure clear, and not needing labels.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
Quote:
If something can be done both ways, the stack method often will be slower. Its benefit comes partly from being able to be re-entrant (including recursive) (the title of this topic is "Reentrant Programming [...]"), and reducing the number of variables needed, variables which may even run you out of ZP space otherwise.
Code: Select all
main:
LDA #10
JSR count_down ;Recurse down to 1
...
count_down:
LDX $00 ;Save a copy of $00
PHX
STA $00 ;Do something useful with $00
STA screen
DEA ;Count down (from 10)
BEQ _zero
jsr count_down ;Recurse
_zero:
LDA $00 ;Do more useful stuff with $00
STA peripheral
PLX ;Restore calling function's $00
STX $00
RTS
Quote:
Quote:
If you're doing things by stacks, your X will be the stack pointer almost full time, and you don't keep swapping it out to use for something else like your LDX/STX data_sp / x-copy.
No one method will be best for all situations. The point is that when you add tools to your toolbox, you open up more possibilities of what you can do.
Quote:
Quote:
True; but if avoiding the stack means you have to do more processing elsewhere to find a suitable place to put the data where it won't step on other data you still need, that will take extra time.
A way you could avoid some use of stacks is to allocate buffers. These have their own advantages and disadvantages. It's just another tool. One advantage is that they can be bigger than a one-page stack allows—even tens of KB. Another is that they don't have to be a last-on-first-off kind of thing, IOW, you can delete (or even resize) allocated buffers that are not at the end. A disadvantage is that it's generally slower, especially if you have to do some memory-moving to prevent fragmentation.

An example buffer usage would be to hold a symbol table during assembly on the 6502 computer itself. The table remains for the duration of the assembly process, then gets deleted, possibly after being stored on mass storage. The speed of the assembler is not as important as getting the flexibility you need. The resulting assembled code can be fast; but the onboard assembler that translated the assembly language into machine language doesn't need to be that fast.

I recently wrote a set of words to allocate, resize, and delete buffers which would never fragment memory. I was a bit disappointed at how much code it took to do it. The code itself could sure benefit in clarity from a method I have in mind to implement named local variables (including local arrays, which ANS Forth doesn't provide for) in Forth, which again use a buffer. Chicken and egg, but it should be doable. Although the existing code needed almost no debugging to get it going, I still have never had this degree of stack gymnastics to deal with. The local variables would clear the air. I may also re-write the material in assembly someday.
Quote:
Quote:
My article on using macros to form program structures shows this kind of thing where going back to the head of a loop does not require a label. It can be like BEGIN...UNTIL, BEGIN...WHILE...REPEAT, FOR_X...NEXT_X, etc., using indentation to make the structure clear, and not needing labels.
Code: Select all
CMP #14
IF_EQ
<actions>
<actions>
<actions>
ELSE_
<actions>
<actions>
<actions>
END_IF
and it would assemble exactly what you would do by hand, but it's more clear. It's also nestable. It will reduce bugs too, because you can spot them more easily before you even try assembling.
OTOH if you're talking about extending into using macros for forming nestable data structures and hiding some of the stack-type details, I'm interested, and if I ever catch up, I'll implement that too, and then expand the stacks treatise to cover it.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Reentrant Programming: Small versus Big Software Stacks
Quote:
What you're suggesting is the first point after the heading "A couple of locals methods:" in the Local variables & environments chapter of the 6502 stacks treatise. I recommend going through the whole treatise.
Quote:
Quote:
Too bad there is no LDA Address, (zp, X) to replace LDA Address, X
Quote:
You have to leave room for varying conditions that may come up during run time. There may be varying sizes of data which cannot be anticipated at assembly time, or combinations of branches based on conditions that also cannot be anticipated at assembly time.
Quote:
In any case, you're never locked into any set of macros, as you can always modify them or make your own.
Quote:
It will reduce bugs too, because you can spot them more easily before you even try assembling.
Quote:
OTOH if you're talking about extending into using macros for forming nestable data structures and hiding some of the stack-type details, I'm interested, and if I ever catch up, I'll implement that too, and then expand the stacks treatise to cover it. 
Re: Reentrant Programming: Small versus Big Software Stacks
Druzyek wrote:
Exactly. This is why I think a program to figure out what goes where would be a great help.
I don't think it would be a real stretch to write a "high level assembler" that has Garth-like iteration and control structures with higher level routine management, and the compiler can do some of these optimizations of allocating zero page work variables based on graph and data flow analysis.
Re: Reentrant Programming: Small versus Big Software Stacks
whartung wrote:
Druzyek wrote:
Exactly. This is why I think a program to figure out what goes where would be a great help.
I don't think it would be a real stretch to write a "high level assembler" that has Garth-like iteration and control structures with higher level routine management, and the compiler can do some of these optimizations of allocating zero page work variables based on graph and data flow analysis.
Dave...