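Another option is self-modifying code. It is faster because the indirect addressing modes cost extra cycles. LDA (zp),Y takes 5 cycles (6 if it crosses a page) and STA (zp),Y takes 6. By comparison, LDA abs,X takes 4 (5 on a page cross) and STA abs,X takes 5. The catch is that the code has to be in RAM. If you are running from ROM, the routine must first be copied to RAM. EhBASIC does exactly that for its get current/next byte routine.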
Code:
; page 0 initialisation table from $BC
; increment and scan memory
LAB_2CEE
        INC Bpntrl      ; increment BASIC execute pointer low byte
        BNE LAB_2CF4    ; branch if no carry
                        ; else
        INC Bpntrh      ; increment BASIC execute pointer high byte
; page 0 initialisation table from $C2
; scan memory
LAB_2CF4
        LDA $FFFF       ; get byte to scan (addr set by call routine)
        CMP #TK_ELSE    ; compare with the token for ELSE
        BEQ LAB_2D05    ; exit if ELSE, not numeric, carry set
        CMP #':'        ; compare with ":"
        BCS LAB_2D05    ; exit if >= ":", not numeric, carry set
        CMP #' '        ; compare with " "
        BEQ LAB_2CEE    ; if " " go do next
        SEC             ; set carry for SBC
        SBC #'0'        ; subtract "0"
        SEC             ; set carry for SBC
        SBC #$D0        ; subtract -"0"
                        ; clear carry if byte = "0"-"9"
LAB_2D05
        RTS
During initialisation, the code above is copied to page zero at these addresses:
Code:
LAB_IGBY = $BC ; get next BASIC byte subroutine
LAB_GBYT = $C2 ; get current BASIC byte subroutine
Bpntrl = $C3 ; BASIC execute (get byte) pointer low byte
Bpntrh = Bpntrl+1 ; BASIC execute (get byte) pointer high byte
; = $D7 ; end of get BASIC char subroutine
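Here the operand of the LDA $FFFF at LAB_GBYT is itself a zero-page pointer (Bpntrl/Bpntrh). The byte it points to can therefore be read either with LDA (Bpntrl),Y (with Y = 0) or by calling LAB_GBYT, which also skips spaces and returns with carry clear if the byte is a digit. Calling LAB_IGBY increments the pointer first and then reads the byte.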
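To make the double use of Bpntrl/Bpntrh concrete, here is a small Python model of the routine. It is a behavioural sketch, not EhBASIC code. It keeps a flat 64 KB byte array and stores the pointer in the two operand bytes at $C3/$C4, so "setting the pointer" is literally patching the LDA operand. It also mimics the compare/subtract sequence, including the SEC / SBC #'0' / SEC / SBC #$D0 trick that leaves A unchanged and clears carry only for "0".."9". The TK_ELSE value used here is a hypothetical placeholder. The real token number comes from EhBASIC's token table.

```python
from typing import Tuple

# Hypothetical token value for illustration only; EhBASIC defines the real TK_ELSE.
TK_ELSE = 0xAA

LAB_GBYT = 0xC2        # the LDA $FFFF opcode sits here
BPNTRL = 0xC3          # LDA operand low byte = BASIC execute pointer low byte
BPNTRH = BPNTRL + 1    # LDA operand high byte = BASIC execute pointer high byte


class ChrgetModel:
    """Behavioural model of LAB_IGBY / LAB_GBYT; calls return (A, carry)."""

    def __init__(self) -> None:
        self.mem = bytearray(0x10000)   # flat 64 KB address space
        self.mem[LAB_GBYT] = 0xAD       # LDA absolute opcode

    def set_pointer(self, addr: int) -> None:
        # Writing the pointer *is* patching the LDA operand.
        self.mem[BPNTRL] = addr & 0xFF
        self.mem[BPNTRH] = (addr >> 8) & 0xFF

    def pointer(self) -> int:
        return self.mem[BPNTRL] | (self.mem[BPNTRH] << 8)

    def _increment(self) -> None:
        # INC Bpntrl / BNE LAB_2CF4 / INC Bpntrh
        self.mem[BPNTRL] = (self.mem[BPNTRL] + 1) & 0xFF
        if self.mem[BPNTRL] == 0:
            self.mem[BPNTRH] = (self.mem[BPNTRH] + 1) & 0xFF

    def gbyt(self) -> Tuple[int, bool]:
        # LAB_GBYT: read the byte the LDA operand points at, skipping spaces
        while True:
            a = self.mem[self.pointer()]   # LDA $FFFF (operand = pointer)
            if a == TK_ELSE:
                return a, True             # CMP equal leaves carry set
            if a >= ord(":"):
                return a, True             # BCS: not numeric, carry set
            if a != ord(" "):
                break
            self._increment()              # space: BEQ LAB_2CEE, try next byte
        # SEC / SBC #'0': subtract $30 modulo 256
        t = (a - 0x30) & 0xFF
        # SEC / SBC #$D0: carry = no borrow, i.e. t >= $D0
        carry = t >= 0xD0
        return (t - 0xD0) & 0xFF, carry    # A ends up equal to the original byte

    def igby(self) -> Tuple[int, bool]:
        # LAB_IGBY: increment the pointer, then fall into LAB_GBYT
        self._increment()
        return self.gbyt()


demo = ChrgetModel()
demo.mem[0x2000:0x2004] = b" 12:"
demo.set_pointer(0x2000)
print(demo.gbyt())   # space skipped, "1" returned with carry clear
print(demo.igby())   # "2", carry clear
print(demo.igby())   # ":", carry set (not numeric)
```

The second SBC works because $30 + $D0 = $100. Subtracting both leaves A as it was modulo 256. The borrow from the second subtraction is what separates digits (no wrap in the first SBC, so the value stays below $D0) from anything below "0".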
Another example, this time in non-zero-page RAM, from my SPI/I2C speed tests:
Code:
        ldy #hi(buf1)       ;reset self modified address
        sty rd_pio1_hi
        ldx #lo(buf1)       ;0 in this case
rd_pio1
        lda spi_165
rd_pio1_hi = *+2            ;high byte of the STA operand below
        sta $1000,x         ;self modified address
        inx
        bne rd_pio1
        iny
        sty rd_pio1_hi
        cpy #(buf1>>8)+$10  ;16 pages - 4k
        bne rd_pio1
;.....
        ldy #hi(buf1)       ;reset self modified address
        sty wt_pio1_hi
        ldx #lo(buf1)       ;0 in this case
wt_pio1
wt_pio1_hi = *+2            ;high byte of the LDA operand below
        lda $1000,x         ;self modified address
        sta spi_595
        inx
        bne wt_pio1
        iny
        sty wt_pio1_hi
        cpy #(buf1>>8)+$10  ;16 pages - 4k
        bne wt_pio1
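A rough cycle count shows what the self-modified address buys in these two loops. The Python sketch below adds up NMOS 6502 cycle counts for the per-byte inner loop of each example. It compares them against a hypothetical variant that uses an indirect (zp),Y pointer instead. The sketch assumes that spi_165 and spi_595 are absolute (non-zero-page) addresses and that the BNE stays within one page. It also ignores the once-per-page outer-loop overhead.

```python
# NMOS 6502 cycle counts for the instructions used in the inner loops.
# buf1 is page aligned, so neither abs,X nor (zp),Y ever pays the page-cross cycle.
CYCLES = {
    "lda abs": 4,       # assumed: spi_165 is an absolute address
    "lda abs,x": 4,
    "lda (zp),y": 5,
    "sta abs": 4,       # assumed: spi_595 is an absolute address
    "sta abs,x": 5,
    "sta (zp),y": 6,
    "inx": 2,
    "iny": 2,
    "bne taken": 3,     # assumed: branch target in the same page
}

# The "indirect" loops are hypothetical alternatives, not code from the post.
LOOPS = {
    "read, self-modified": ["lda abs", "sta abs,x", "inx", "bne taken"],
    "read, indirect": ["lda abs", "sta (zp),y", "iny", "bne taken"],
    "write, self-modified": ["lda abs,x", "sta abs", "inx", "bne taken"],
    "write, indirect": ["lda (zp),y", "sta abs", "iny", "bne taken"],
}

BUFFER_BYTES = 16 * 256  # 16 pages = 4 KB


def per_byte(loop: str) -> int:
    """Cycles for one pass of the inner loop."""
    return sum(CYCLES[op] for op in LOOPS[loop])


for name in LOOPS:
    cycles = per_byte(name)
    print(f"{name:22s} {cycles:2d} cycles/byte, {cycles * BUFFER_BYTES} per 4 KB")
```

Under these assumptions, the self-modified version saves one cycle per byte in each direction, or 4096 cycles per 4 KB buffer. It also leaves Y free as the page counter instead of tying up a zero-page pointer.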