GARTHWILSON wrote:
Then Bruce suggested that since the numbers on the high-precision stack won't be addresses needing ZP for the indirect addressing modes, this stack could be kept anywhere in RAM; and that furthermore, the various bytes of a "cell" would not have to be kept together, so indexing can be made easier with, for example, TOSbyte1,X, TOSbyte2,X, TOSbyte3,X, etc., where the value in X is always 0 for TOS (top of stack), 1 for the next "cell", etc. That idea gets even better when considering a complex stack where the doubles really eat up memory (16 bytes each!)
I'd like to add:
1. If the data stack pointer is the X register (i.e. the usual 6502 implementation), it will often be more convenient to use LDY H_STK_PTR and access the "high-precision" stack with H_STK_0,Y and H_STK_1,Y etc. than to use LDX and H_STK_0,X etc., since you won't have to save and restore X in the former case. Of course, abs,Y addressing isn't available for all instructions (e.g. ASL).
2. The "high-precision" stack pointer can be decremented with a DEC H_STK_PTR or incremented with an INC H_STK_PTR. Decrementing or incrementing the data stack pointer traditionally takes 4 cycles (a pair of DEXs or a pair of INXs), while an INC zp takes only 5 cycles and an INC abs takes only 6 cycles, so the performance hit is small.
3. For many instructions, abs,Y (or abs,X) takes the same number of cycles as zp,X, so as long as the "high-precision" stack is placed in memory where abs,Y won't cross a page boundary, the performance hit is again small. (STA is one exception. STA zp,X takes 4 cycles, but STA abs,Y takes 5 cycles.)
4. There are LDX abs,Y and LDY abs,X instructions but no corresponding STX abs,Y and STY abs,X instructions, so you'll have to use a TXA STA sequence or a TYA STA sequence instead, which adds 2 cycles.
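To make the points above concrete, here is a rough sketch of two words working on a split-byte "high-precision" stack through Y. Everything specific in it is my assumption, not something fixed by the discussion: 32-bit cells, a stack that grows downward, at most 32 entries, the addresses chosen for H_STK_PTR and the four byte planes, and the word names H_PLUS and U_TO_H. The syntax is generic and may need adjusting for your assembler.

```asm
; Assumed layout: four byte planes of 32 entries each, all inside
; page $04, so abs,Y never crosses a page boundary.
H_STK_PTR = $F0          ; assumed zero-page index of the current TOS
H_STK_0   = $0400        ; byte 0 (least significant) of every cell
H_STK_1   = $0420        ; byte 1
H_STK_2   = $0440        ; byte 2
H_STK_3   = $0460        ; byte 3 (most significant)

; H_PLUS  ( H: h1 h2 -- h1+h2 )
; Adds TOS to NOS and drops TOS. X (the data stack pointer) is
; never touched, so there is nothing to save or restore.
H_PLUS:
        LDY H_STK_PTR        ; Y = index of TOS, NOS is at Y+1
        CLC
        LDA H_STK_0+1,Y      ; NOS byte 0 (LDA abs,Y: 4 cycles)
        ADC H_STK_0,Y        ; + TOS byte 0 (ADC abs,Y: 4 cycles)
        STA H_STK_0+1,Y      ; STA abs,Y: 5 cycles
        LDA H_STK_1+1,Y
        ADC H_STK_1,Y
        STA H_STK_1+1,Y
        LDA H_STK_2+1,Y
        ADC H_STK_2,Y
        STA H_STK_2+1,Y
        LDA H_STK_3+1,Y
        ADC H_STK_3,Y
        STA H_STK_3+1,Y
        INC H_STK_PTR        ; drop TOS (INC zp: 5 cycles)
        RTS

; U_TO_H  ( u -- ) ( H: -- hu )
; Moves an unsigned 16-bit value from the data stack (low byte at
; 0,X and high byte at 1,X) to the high-precision stack,
; zero-extending it to 32 bits.
U_TO_H:
        DEC H_STK_PTR        ; make room for a new TOS
        LDY H_STK_PTR
        LDA 0,X              ; low byte of the data stack cell
        STA H_STK_0,Y
        LDA 1,X              ; high byte of the data stack cell
        STA H_STK_1,Y
        LDA #0
        STA H_STK_2,Y        ; upper two bytes are zero
        STA H_STK_3,Y
        INX                  ; drop the cell from the data stack
        INX
        RTS
```

Note that U_TO_H uses X and Y at the same time, one for each stack, which is where the LDY approach pays off. There is no bounds checking in either word; that is left out to keep the sketch short.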