Alienthe wrote:
To follow up on that idea, you could have the window always aligned so that 00 is TOS (which is what you stated) and then 01 is next etc. no matter what the stack pointer was (which I am not sure you indicated). That means massaging stacked values as in the example earlier could be done on zero page without indexing or even without looking up what the stack pointer was.
The problem with that is that it locks out the possibility of using names for local variables, and more bugs can result from miscounting the pushes and pulls that might be mixed in. Suppose for example you get into a routine that needs four bytes of input and output, passed through the stack, and three bytes of independent local variables. The routine might start with:
Code:
PHA ; Add three more bytes to the stack.
PHA ; They will get used below. (Remember to
PHA ; pull them off the stack at the end.)
length: SETL $101 ; Assign names to the three bytes of
width: SETL $102 ; local variables created above. Each
height: SETL $103 ; variable is one byte in this case.
weight: SETL $104 ; Now assign names to the ones passed on the stack.
density: SETL $106 ; weight gets 2 bytes, and density and speed each get
speed: SETL $107 ; one. These could have additional names for data
; sent back to the calling routine in the same bytes.
SETL in the C32 assembler is "SET Label," like EQU in most assemblers, except that you can change the value assigned to a label as many times as you wish. I believe Kowalski's assembler uses .= or .SET for this.
Now suppose you change the depth of the stack, for example by using PHP and PLP for temporary storage of the status:
Code:
TSX
LDA density, X ; Access variable "density".
<do_stuff>
PHP ; Now the stack is temporarily one byte
<do_stuff> ; deeper; but since we don't do TSX again,
LDA density, X ; "density" is still at 106,X, even though
<do_stuff> ; it's no longer the 6th item on the stack.
PLP
<more_code>
ADC density, X ; Access "density" yet again.
<etc. etc.>
Inside the local environment, i.e., in the subroutine that carries out the process and comes right after the set of locals definitions, locals will be referred to with absolute indexed addressing, like LDA FLOW2,X where X's contents came from the TSX.
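To make the stale-X point concrete, here is a rough Python model of the page-1 stack. It is not 6502 code, and the class, method names, byte values and push order are all made up for illustration; only the offsets ($104 to $107) come from the example above. It reads "density" through the X captured by the first TSX, then shows what a window that always tracks TOS would see after the PHP.

```python
class Stack6502:
    """Toy model of the 6502 stack in page 1 ($0100-$01FF). Hypothetical helper."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0xFF   # stack pointer (empty stack)
        self.x = 0x00   # X index register

    def push(self, value):
        # PHA/PHP: store at $0100+S, then decrement S.
        self.mem[0x100 + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self):
        # PLA/PLP: increment S, then read $0100+S.
        self.s = (self.s + 1) & 0xFF
        return self.mem[0x100 + self.s]

    def tsx(self):
        # TSX: copy S into X.
        self.x = self.s

    def lda_abs_x(self, base):
        # LDA base,X (absolute indexed).
        return self.mem[(base + self.x) & 0xFFFF]


DENSITY = 0x106  # same value as "density: SETL $106" in the post


def demo():
    cpu = Stack6502()
    # Assumed push order so the layout matches the post:
    # speed ($107), density ($106), weight ($105, $104), then 3 locals.
    for byte in (0x07, 0x2A, 0x12, 0x34, 0x00, 0x00, 0x00):
        cpu.push(byte)

    cpu.tsx()                          # frame base captured once
    before = cpu.lda_abs_x(DENSITY)
    cpu.push(0x30)                     # stands in for PHP
    during = cpu.lda_abs_x(DENSITY)    # X is stale on purpose, still hits density

    cpu.tsx()                          # what a TOS-relative window would see
    shifted = cpu.lda_abs_x(DENSITY)   # now one byte off: a weight byte

    cpu.pull()                         # stands in for PLP
    cpu.tsx()
    after = cpu.lda_abs_x(DENSITY)
    return before, during, shifted, after


if __name__ == "__main__":
    print([hex(v) for v in demo()])
```

In this model the reads through the original X all return the density byte, while the read taken after re-doing TSX at the deeper stack level lands on one of the weight bytes. That is the miscounting hazard with a window that is always aligned to TOS.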
Since a label can be assigned new values as many times as you wish (with SETL or .= or similar), and since you put the relevant locals assignments right before the subroutines that need them, names can be re-used, and the right stack offset value will be used for each subroutine. So for example we could have another routine that has the following locals in the same source code file, and there will be no conflict between FLOW2 below and FLOW2 above.
Code:
FLOW1: LOCAL 2
FLOW2: LOCAL 2
FLOW3: LOCAL 2
subroutine_label:
TSX
<followed by the code that uses these local variables>
Each subroutine will use the right "FLOW2" local variable, even if one subroutine calls the other. In fact, a subroutine using local variables can be recursive, meaning it can even call itself, over and over, until a condition is met to stop the heavy nesting and unwind itself.
If you need a lot of local variable space, using a lot of PHA's will of course not be as efficient as:
Code:
TSX
TXA
SEC
SBC #$18
TAX
TXS
In this case, putting $18 (24 in decimal) bytes on the stack takes 12 clocks instead of 72, and 7 bytes instead of 24, so it's 6 times as fast and 3.5 times as memory-efficient. The break-even point is at 4 bytes of variables for speed, and 7 bytes for program memory. (Be careful that you don't depend on uninitialized variables though.) This is another thing that should not be forfeited by any hardware tricks to extend the stack, particularly since situations where the greater stack space would be desirable are the same ones where you might want to allot such large portions of local variable space.
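The numbers above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic, assuming the usual 6502 timings (PHA at 3 clocks and 1 byte; TSX, TXA, SEC, SBC #immediate, TAX and TXS at 2 clocks each, with SBC #immediate taking 2 bytes and the others 1) and assuming decimal mode is off. The function names are hypothetical.

```python
PHA_CYCLES, PHA_BYTES = 3, 1

# TSX, TXA, SEC, SBC #imm, TAX, TXS: six instructions at 2 clocks each.
SEQ_CYCLES = 6 * 2
# Five 1-byte instructions plus the 2-byte SBC #imm.
SEQ_BYTES = 5 * 1 + 2


def pha_cost(n):
    """Clocks and bytes needed to reserve n stack bytes with n PHAs."""
    return n * PHA_CYCLES, n * PHA_BYTES


def break_even(per_pha, fixed):
    """Smallest n at which n PHAs cost at least as much as the fixed sequence."""
    n = 1
    while n * per_pha < fixed:
        n += 1
    return n


if __name__ == "__main__":
    cycles, size = pha_cost(0x18)
    print("24 PHAs:", cycles, "clocks,", size, "bytes")
    print("sequence:", SEQ_CYCLES, "clocks,", SEQ_BYTES, "bytes")
    print("speed ratio:", cycles / SEQ_CYCLES)
    print("size ratio:", round(size / SEQ_BYTES, 2))
    print("break-even (speed):", break_even(PHA_CYCLES, SEQ_CYCLES))
    print("break-even (size):", break_even(PHA_BYTES, SEQ_BYTES))
```

Unlike the PHAs, the fixed sequence costs the same no matter how many bytes are reserved, which is why the gap keeps widening as the local variable space grows.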
Quote:
How deep does one normally have to probe the stack? The example extends to $107 but I don't know if there are other typical use cases that go deeper.
I don't remember ever seeing anything in actual code that was more than $109 or $10A. The example was from a simple multiplication routine; but other applications with lots of local variables could conceivably get quite a bit more complex.