GARTHWILSON wrote:
The 6502's native stack is in page 1, ie, from $0100-$01FF. I say "native" for lack of a better term since you can have loads of stacks.
The proper term for this is the hardware stack, since this is the stack implemented by the microprocessor's hardware itself. Other forms of stacks are implemented in software.
Also, sometimes it helps to know the order of operations. The S register is used the same way a text cursor on the display is used. The cursor tells where the next character to be typed will go. Note that the cursor is always to the immediate right of the last typed character; it never points at the last character, unless you type a cursor movement key to place it explicitly.
Likewise, the S register works the exact same way. It always points just one byte beyond the last pushed byte (one address lower, since the stack grows downward) -- in other words, it determines where the next byte to be pushed will go. Note that in most other microprocessors, the stack pointer is used somewhat differently: it always points at the last pushed byte or word, depending on whether the CPU deals with bytes or whole words.
Thus, to push something onto the stack, the microprocessor executes the following steps (note: this is pseudocode -- many "instructions" that I'm using here do not physically exist on the 6502!):
Code:
how_PHA_works: (analogous to typing a character on the screen)
sta $0100,s
dec s
how_PLA_works: (analogous to backspacing over the last typed character)
inc s
lda $0100,s
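For anyone who wants to poke at this without a 6502 handy, here is a rough Python sketch of the same two operations. The class and method names (Stack6502, pha, pla) are made up for illustration, and the starting value of $FF for S is an assumption (it is whatever your reset code loads with TXS).

```python
# Minimal model of the 6502 hardware stack in page 1 ($0100-$01FF).
# Stack6502, pha and pla are hypothetical names for this sketch only.

class Stack6502:
    def __init__(self, s=0xFF):          # s=0xFF is an assumed initial value
        self.mem = bytearray(0x10000)    # 64 KiB address space
        self.s = s                       # 8-bit stack pointer

    def pha(self, a):
        # Store at the free slot S points to, then move S down one byte.
        self.mem[0x0100 + self.s] = a & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pla(self):
        # Move S back up first, then read the byte it now points at.
        self.s = (self.s + 1) & 0xFF
        return self.mem[0x0100 + self.s]


cpu = Stack6502()
cpu.pha(0x42)                            # lands at $01FF
cpu.pha(0x99)                            # lands at $01FE
print(hex(cpu.s))                        # 0xfd: the next free slot
print(hex(cpu.pla()), hex(cpu.pla()))    # 0x99 0x42: last in, first out
```

Note how S never rests on a byte that holds live data, just as the cursor never rests on a typed character.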
Remember that S is an 8-bit register, just like X and Y are in the 6502; hence, after 256 pushes or pops, the stack pointer will automatically wrap around. The bad news is that you're limited to at most 128 levels of nesting for subroutines, since each return address takes two bytes. The good news is you'll never have the stack run over into other memory like you sometimes do with other CPUs (it overwrites its own oldest entries instead). Realistically speaking, however, well designed software generally won't even approach 8 levels of subroutine nesting in practice -- this is good because there's usually more data you want to store on the stack, which eats into that theoretical limit pretty quickly. IIRC, Garth made a measurement of stack utilization once for one of his projects, and it came to something like only 48 bytes or so, including interrupt handling overhead. Therefore, the only time you have to worry about the stack in practice is when you're writing a recursive function. But even here, you can always use a software-implemented stack to overcome that limitation.
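The wraparound and the 128-level limit are easy to see in a quick simulation. This is only a sketch: the push helper is hypothetical, S is assumed to start at $FF, and each "call" just pushes its own level number twice in place of a real return address.

```python
# Sketch of 8-bit stack-pointer wraparound. Each subroutine call (JSR)
# pushes a 2-byte return address, so page 1 holds at most 128 of them.

STACK_BASE = 0x0100

def push(mem, s, value):
    """Push one byte; return the new S, wrapped to 8 bits."""
    mem[STACK_BASE + s] = value & 0xFF
    return (s - 1) & 0xFF

mem = bytearray(0x10000)
s = 0xFF                         # assumed starting value

# Simulate 128 nested calls, tagging both bytes with the call number.
for level in range(128):
    s = push(mem, s, level)
    s = push(mem, s, level)

print(hex(s))                    # 0xff: S has wrapped all the way around
print(mem[0x01FF])               # 0: the first call's bytes are still intact

# A 129th call lands on top of the first call's return address.
s = push(mem, s, 128)
s = push(mem, s, 128)
print(mem[0x01FF], mem[0x01FE])  # 128 128: the first return address is gone
print(mem[0x00FF], mem[0x0200])  # 0 0: nothing outside page 1 was touched
```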
Note also that the stack grows downward in memory. This is done more to optimize the operation of the hardware -- it actually takes fewer transistors to implement a stack this way than it would for a stack growing upward in memory. Part of the reason is that addresses stored on the stack match the layout of addresses stored everywhere else; hence, when computing an effective address by popping an address off the stack (which the CPU must do when executing an RTS instruction, for example), it can read it in the normal low-byte-first, high-byte-next order.
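To make the byte-order point concrete, here is a small Python sketch of what JSR and RTS do with the return address. The function names are hypothetical and S is assumed to start at $FF. The minus 1 and plus 1 are there because the 6502 stacks the address of the last byte of the JSR instruction, and RTS adds one after pulling it.

```python
# Sketch: pushing a 16-bit address high byte first onto a downward-growing
# stack leaves it in memory low byte first, like any other 6502 address.
# jsr_push and rts_pull are hypothetical names for this illustration.

STACK_BASE = 0x0100

def jsr_push(mem, s, return_addr):
    stored = (return_addr - 1) & 0xFFFF     # JSR stacks return address - 1
    mem[STACK_BASE + s] = stored >> 8       # high byte goes in first
    s = (s - 1) & 0xFF
    mem[STACK_BASE + s] = stored & 0xFF     # low byte ends up just below it
    s = (s - 1) & 0xFF
    return s

def rts_pull(mem, s):
    s = (s + 1) & 0xFF
    lo = mem[STACK_BASE + s]                # low byte comes out first
    s = (s + 1) & 0xFF
    hi = mem[STACK_BASE + s]
    return s, (((hi << 8) | lo) + 1) & 0xFFFF

mem = bytearray(0x10000)
s = jsr_push(mem, 0xFF, 0xC123)             # assumed S = $FF before the call
print(hex(mem[0x01FE]), hex(mem[0x01FF]))   # 0x22 0xc1: low byte at lower address
s, pc = rts_pull(mem, s)
print(hex(pc))                              # 0xc123
```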
The mechanism for pushing and popping is the same for other registers.
Again, contrast this with how the x86 or 680x0 series works:
Code:
how_PUSH_AX_works: (8086; 16-bit code; 32-bit works the same way)
sub sp,2
mov [sp],ax
how_POP_AX_works: (8086; 16-bit code; 32-bit works the same way)
mov ax,[sp]
add sp,2
how_PUSH_AX_works_286: ( for 80286 or later processors )
mov [sp-2],ax
sub sp,2
how_POP_AX_works_286: ( for 80286 or later processors )
add sp,2
mov ax,[sp-2]
how_-(A7)_works: (680x0)
subq.l #4,a7
move.l reg,(a7)
how_(A7)+_works: (680x0)
move.l (a7),reg
addq.l #4,a7
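If it helps to see the two conventions side by side, here is a hedged Python sketch. Both classes are hypothetical and store one item per slot for simplicity (the x86 and 680x0 pseudocode above moves 2 or 4 bytes at a time), and wraparound is left out. The only thing it tries to show is where the stack pointer rests relative to the last pushed item.

```python
# Two stack-pointer conventions, one item per slot for simplicity.
# EmptyDescending and FullDescending are hypothetical names.

class EmptyDescending:            # 6502 style: SP marks the next free slot
    def __init__(self, size):
        self.mem = [None] * size
        self.sp = size - 1

    def push(self, value):
        self.mem[self.sp] = value  # store first...
        self.sp -= 1               # ...then move down

    def pop(self):
        self.sp += 1               # move up first...
        return self.mem[self.sp]   # ...then read


class FullDescending:             # x86 / 680x0 style: SP marks the last item
    def __init__(self, size):
        self.mem = [None] * size
        self.sp = size

    def push(self, value):
        self.sp -= 1               # move down first...
        self.mem[self.sp] = value  # ...then store

    def pop(self):
        value = self.mem[self.sp]  # read first...
        self.sp += 1               # ...then move up
        return value


a, b = EmptyDescending(8), FullDescending(8)
for stack in (a, b):
    stack.push("x")
    stack.push("y")

print(a.sp, a.mem[a.sp])          # 5 None: SP sits on an unused slot
print(b.sp, b.mem[b.sp])          # 6 y: SP sits on the last pushed item
print(a.pop(), b.pop())           # y y: both behave the same from outside
```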