I've started looking at what changes I can make to the Commodore 64's BASIC to take advantage of the new instructions in my 65020 design (which I'm working on again, slowly).
Profiling a simple test program showed that a huge amount of time is spent in CHRGET / CHRGOT. This routine always struck me as overcomplicated. Surely you don't always need to know if the character you've fetched is a digit - why go through that lengthy test every time, only to ignore the carry flag when you return?
If you have an LDA (zp) instruction without indexing, it's possible to replace many (but not all) calls to CHRGOT with an inlined LDA (TXTPTR). There's no need to test for space and loop, because CHRGOT can never be called on a space.
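For reference, here's roughly what the stock routine looks like next to the inlined version. The listing is the standard C64 one as I remember it (it gets copied to zero page at $0073, and TXTPTR is the operand of the LDA); the QNUM and CHRRTS labels are just the names used here, and LDA (TXTPTR) assumes an unindexed zero-page indirect mode like the 65C02's.
Code:
CHRGET  INC TXTPTR      ; advance the text pointer, low byte
        BNE CHRGOT
        INC TXTPTR+1    ; carry into the high byte
CHRGOT  LDA $EA60       ; operand bytes are TXTPTR itself, rewritten as BASIC runs
QNUM    CMP #':'        ; ':' or above: return with carry set
        BCS CHRRTS
        CMP #' '        ; skip spaces
        BEQ CHRGET
        SEC
        SBC #'0'
        SEC
        SBC #$D0        ; A is restored; carry ends up clear only for '0'..'9'
CHRRTS  RTS

; inlined replacement, for a caller that only needs the byte itself:
        LDA (TXTPTR)    ; sets N and Z from the byte, leaves carry alone
Any caller that branches on carry (digit), or expects Z to be set for ':' as well as for the end-of-line zero, still needs the real CHRGOT, which is why not every call can be replaced.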
So I tried that on one instance, and ... well, it does work, and it does make a difference. But in the process I've discovered a latent bug. The program would crash with a "NEXT WITHOUT FOR" error, unless I followed the LDA (TXTPTR) with a NOP.
Here's the code responsible:
Code:
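DOPRE1  LDA OPTAB+2,Y
        PHA
        LDA OPTAB+1,Y
        PHA
        JSR PUSHF1      ; pushes (address of next instruction) - 1
        LDA OPMASK
        JMP LPOPER
SNERR5  JMP SNERR
PUSHF1  LDA FACSGN
        LDX OPTAB,Y
PUSHF   TAY             ; keep A safe while the return address is pulled
        PLA             ; low byte of the pushed return address
        STA INDEX1
        INC INDEX1      ; +1 on the low byte only
        PLA             ; high byte of the pushed return address
        STA INDEX1+1
        TYA
        PHA             ; push the saved A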
The JSR PUSHF1 pushes the address of (one byte before) the next instruction. PUSHF then pops this address, increments it, and stores it in INDEX1. Later on there's a JMP (INDEX1). Can you see the problem?
It's only incrementing the low byte of INDEX1. My changes placed the instruction after JSR PUSHF1 at the start of a page, so the address pushed to the stack ended in $FF. Increment just the low byte of that and it wraps to $00 with no carry into the high byte: a pushed address of $xxFF becomes $xx00 instead of $(xx+1)00, and JMP (INDEX1) takes you 256 bytes short of where you should be.
It's not a problem in the Commodore 64, because that instruction is normally nowhere near the start of a page. I just got very, very unlucky.
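One way to make it robust, as a sketch only: do the increment after both bytes are stored and carry into the high byte. PUSHF2 is a label made up for this, and the four extra bytes shift everything that follows, so it only suits a rebuilt ROM, not a patch in place.
Code:
PUSHF   TAY             ; keep A safe while the return address is pulled
        PLA
        STA INDEX1      ; low byte of the pushed return address
        PLA
        STA INDEX1+1    ; high byte of the pushed return address
        INC INDEX1      ; +1 on the low byte
        BNE PUSHF2      ; no wrap, high byte is already right
        INC INDEX1+1    ; low byte wrapped to $00, so carry into the high byte
PUSHF2  TYA
        PHA             ; push the saved A, as before
With that in place it no longer matters where the instruction after JSR PUSHF1 happens to land.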