Quote:
You can save X on the stack too, but then take that into account when formulating the number to add the index to.
Would you do that by having the macros keep an assembly-time counter that is added to the hard-coded base address, with every push and pull replaced by macros that increase or decrease that counter? Adjusting the base by hand seems error-prone, especially if you go back and add a push or pull and then have to redo all the offsets after it.
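As a rough sketch of that idea, assuming ca65-style syntax (.set for a reassignable assembly-time symbol): the names depth, PUSH_A, PULL_A and LDA_SLOT are made up for illustration and are not Alarm Siren's macros. The counter follows source order only, so it would not track pushes made inside loops or on only one side of a branch.
Code:
HWSTACK_BASE = $0100
;Assembly-time count of bytes pushed so far in the current function
depth .set 0

.macro PUSH_A
PHA
depth .set depth + 1
.endmacro

.macro PULL_A
PLA
depth .set depth - 1
.endmacro

;Load the byte from the slot-th push (1 = first push); clobbers X
.macro LDA_SLOT slot
TSX
LDA HWSTACK_BASE + 1 + depth - (slot),X
.endmacro

;Example: slot 1 stays valid even if another push is added later
PUSH_A ;slot 1
PUSH_A ;slot 2
LDA_SLOT 1 ;assembles as TSX / LDA $0102,X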
Quote:
Also, the point of the macros is that you're not permanently turning X into an extra stack pointer, so it is expected that X may get used for other purposes in between stack accesses, so logically X will need to be reloaded.
That makes sense. If it helps you, I think it is good that you have them, even if you sacrifice a little speed for ease of use.
I was curious about maximizing performance, so I tested a few different approaches on a simple function. When A='A', X=3, and Y=4, the function prints "AAABBBCCCDDD" and returns with the count of characters printed in A. Here are the cycle counts for the approaches I tried, though I would not say they generalize in any way. I imagine doing some of this with macros, but they are expanded here so you can see what I'm thinking. Could I change anything to make it faster?
I think doing everything in zero page could be the fastest. The fourth way, without pushing anything, only works for top-level functions. I don't think it is possible with macros, but a more sophisticated program could automatically rearrange zero page usage so that as few bytes as possible have to be pushed.
1. Hardware stack: 322
2. Pushing zero page with loop: 324
3. Pushing zero page with unrolled loop: 277
4. Zero page with no pushing: 229
5. Zero page stack: 323
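To illustrate why the fourth way is limited to top-level functions, here is a hypothetical caller (the label outer is made up) that keeps its own value in R2 and loses it across the call:
Code:
outer:
LDA #'Q'
STA R2 ;outer's own use of R2
LDA #'A'
LDX #3
LDY #4
JSR func ;variant 4 stores to R0-R2 without saving them
LDA R2 ;now 'E' (left there by func), not 'Q'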
Code:
LDA #'A' ;Character to start printing with
LDX #3 ;How many times to repeat each character
LDY #4 ;How many sets of characters
JSR func
Output = AAABBBCCCDDD
1. Hardware stack
Code:
func:
PHA ;Copy of character
PHX ;Copy of number to print (3)
TSX ;Make room for one byte to hold return value (characters printed)
;Alarm Siren's macro chooses faster of two subtraction methods
;TXA
;SEC
;SBC #1
DEX
;TAX
TXS
;Save copy of calculated stack offset so X can be reloaded later without TSX
STX zp_sp_copy
;Zero one byte variable which returns total characters printed
STZ HWSTACK_BASE+0+1,X
loop1:
;Load value passed in through X (number of characters to print = 3)
LDA HWSTACK_BASE+1+1,X
;Save a copy to load into X counter
PHA
;Load the character to print
LDA HWSTACK_BASE+2+1,X
;Number of characters to print = 3
PLX
loop2:
putc
DEX
BNE loop2
;Reload X with calculated stack offset instead of TSX
LDX zp_sp_copy
;Count of characters printed
LDA HWSTACK_BASE+0+1,X
CLC
;Add 3 to total
ADC HWSTACK_BASE+1+1,X
STA HWSTACK_BASE+0+1,X
;Advance to the next character
INC HWSTACK_BASE+2+1,X
;Print four sets (Y = 4)
DEY
BNE loop1
;Get character count before X and SP are changed
LDY HWSTACK_BASE+0+1,X
;Empty stack (character to print, characters to print, and total printed)
TSX
TXA
CLC
ADC #3
TAX
TXS
;Character count to A
TYA
RTS
2, 3, 4. Zero page
Code:
;**********2. Pushing zero page with loop**********
func:
;Copy of A (character to print)
STA zp_temp
;Copy of X (characters to print = 3)
STX zp_temp2
LDX #0
;Push zero page bytes so we can use them
loop3:
LDA R0,X
PHA
INX
CPX #3
BNE loop3
;Store A (character to print) in R2
LDA zp_temp
STA R2
;X = characters to print = 3
LDA zp_temp2
STA R1
;Count of total characters printed
STZ R0
;**********3. Pushing zero page with unrolled loop**********
func:
STA zp_temp
LDA R0
PHA
LDA R1
PHA
LDA R2
PHA
LDA zp_temp
STA R2
STX R1
STZ R0
;**********4. No pushing**********
func:
STA R2
STX R1
STZ R0
;**********Body for 2, 3, and 4**********
loop4:
;X = characters to print = 3
LDX R1
;A = character to print
LDA R2
loop5:
putc
DEX
BNE loop5
;Add characters printed (3) to total in R0
LDA R0
CLC
ADC R1
STA R0
;Advance to the next character
INC R2
DEY
BNE loop4
;**********Ending for 2**********
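;Save character count in Y (Y is 0 and free here) before R0 is restored
LDY R0
LDX #2
;Restore the three bytes we used
loop6:
PLA
STA R0,X
DEX
BPL loop6
;Character count is return value (LDA R0 here would return the restored R0)
;LDY/TYA adds 2 cycles over the LDA R0 in the timed version
TYA
RTS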
;**********Ending for 3**********
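;Save character count in Y (Y is 0 and free here) before R0 is restored
LDY R0
PLA
STA R2
PLA
STA R1
PLA
STA R0
;Character count is return value (A holds the restored R0 at this point)
;LDY/TYA adds 2 cycles over the LDA R0 in the timed version
TYA
RTS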
;**********Ending for 4**********
LDA R0
RTS
5. Zero page stack
Code:
func5:
;Character to print
PHA
;Pointer into zero page stack
LDA zp_ptr
;Increase by 3 for A copy, X copy, and character count
CLC
ADC #3
STA zp_ptr
;Push copy of characters to print (3)
PHX
;X points to zero page stack
TAX
;Characters to print (3)
PLA
STA 1,X
;Character to print
PLA
STA 2,X
;Zero count of total characters printed
STZ 0,X
loop34:
;Count of characters to print (3)
LDA 1,X
;Necessary since we can't do LDX 1,X
PHA
;Character to print
LDA 2,X
;X is now 3, count of characters to print
PLX
loop35:
putc
DEX
BNE loop35
;Pointer to stack in zero page
LDX zp_ptr
;Count of total characters printed
LDA 0,X
CLC
;Add three to total (printed three characters)
ADC 1,X
STA 0,X
;Advance to next character
INC 2,X
DEY
BNE loop34
;Free up the three bytes we took on the stack
LDA zp_ptr
SEC
SBC #3
STA zp_ptr
;Returns total characters printed in A (X still holds the frame base from loop34)
LDA 0,X
RTS