kc5tja wrote:
Is it possible to statically analyze the code being translated to see if $FF appears as a direct-page address anywhere?
Um, the issue here is not:
LDA ($FF),Y
which is extremely rare to nonexistent (I can't think of a single program with an LDA ($FF),Y instruction), but zp wrap around, as in this excerpt from
http://6502.org/source/floats/wozfp1.txt:
Code:
1FB0 A2 FD LDX =$FD INDEX FOR 3-BYTE CONDITIONAL MOVE
1FB2 68 DIV3 PLA PULL A BYTE OF DIFFERENCE OFF STACK
1FB3 90 02 BCC DIV4 IF MANT2<E THEN DONT RESTORE MANT2
1FB5 95 08 STA M2+3,X
1FB7 E8 DIV4 INX NEXT LESS SIGNIF BYTE
1FB8 D0 F8 BNE DIV3 LOOP UNTIL DONE
M2 (the zp address) is $05, so STA M2+3,X assembles to 95 08. Nothing in that operand hints that wrap around will occur, yet with X running from $FD to $FF the stores land at $05 through $07. Similar code starts at $F4C5 in
http://6502.org/source/floats/wozfp3.txt. Likewise, in 64-bit or 128-bit division routines, where the comparison runs high byte to low byte and the subtraction runs low byte to high byte, X (when used for indexing) will be incremented for one operation and decremented for the other, whether the bytes are stored in memory low byte first or high byte first.
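To make the wrap around in the DIV3 loop concrete, here is a minimal Python sketch of how a zp,X effective address is formed. The function name zp_indexed is my own invention for illustration, and M2 = $05 is taken from the listing discussed above; this is a model of the address arithmetic only, not an emulator.

```python
def zp_indexed(operand: int, x: int) -> int:
    """Effective address of a 6502 zp,X access.

    The operand and X are added as 8-bit values and the carry is
    discarded, so the result always stays inside the zero page.
    """
    return (operand + x) & 0xFF


M2 = 0x05  # zero-page address of M2 in wozfp1.txt, per the post above

# X takes the values $FD, $FE, $FF before INX brings it to zero.
addresses = [zp_indexed(M2 + 3, x) for x in (0xFD, 0xFE, 0xFF)]
print([f"${a:02X}" for a in addresses])
```

A translator that keeps the carry would store to $0105 through $0107 instead, which is the failure a static scan for $FF operands would not catch.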
A much less common case would be an ($FE,X) operand (likely near a (0,X) operand), which wraps around unless X is zero.
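The ($FE,X) case can be sketched the same way. The helper below (a hypothetical name, for illustration only) returns the two zero-page locations from which the pointer's low and high bytes are read; both are computed with 8-bit arithmetic.

```python
def indexed_indirect_pointer(operand: int, x: int) -> tuple[int, int]:
    """Zero-page locations holding the pointer for a (zp,X) access.

    Returns (address of low byte, address of high byte). Both the
    indexing and the step to the high byte wrap within the zero page.
    """
    lo = (operand + x) & 0xFF
    hi = (lo + 1) & 0xFF
    return lo, hi


for x in (0, 1, 2):
    lo, hi = indexed_indirect_pointer(0xFE, x)
    print(f"X={x}: pointer bytes at ${lo:02X} and ${hi:02X}")
```

With X = 0 the pointer sits at $FE/$FF with no wrap; with X = 1 it straddles $FF and $00; with X = 2 the indexing itself has already wrapped to $00/$01.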
kc5tja wrote:
BTW, I think the 65816 does away completely with the wrap-around, at least in native-mode.
Yes. The 65816 has no page wrap around at all in native mode, but there is bank wrap around: with D = 0 and X = $FFFF, LDA 1,X (B5 01) reads address $000000 (i.e. bank 0). (Absolute indexed addressing can cross a bank boundary, though: with the data bank register at 0 and X = $FFFF, LDA $0001,X (BD 01 00) reads address $010000, i.e. bank 1.) In emulation mode, there is stack, zp, etc. page wrap around, of course.
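A rough Python model of the two native-mode cases, assuming 16-bit index registers, a direct page register D of 0, and a data bank register of 0 (the function names are mine, not anything from a real tool):

```python
def direct_page_x(d: int, offset: int, x: int) -> int:
    """Native-mode 'dp,X' effective address.

    D + offset + X is truncated to 16 bits and the access is always
    in bank 0, so the sum wraps at the top of the bank.
    """
    return (d + offset + x) & 0xFFFF


def absolute_x(dbr: int, addr: int, x: int) -> int:
    """Native-mode 'abs,X' effective address.

    The index is added to the full 24-bit address formed from the
    data bank register and the operand, so it can carry into the
    next bank.
    """
    return ((dbr << 16) + addr + x) & 0xFFFFFF


print(f"LDA 1,X     -> ${direct_page_x(0, 0x01, 0xFFFF):06X}")
print(f"LDA $0001,X -> ${absolute_x(0, 0x0001, 0xFFFF):06X}")
```

The first line models the bank wrap around (result $000000), the second the crossing into bank 1 (result $010000).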
kc5tja wrote:
The practice of using RTS is much faster, and is also re-entrant. Using JMP (abs) requires a global variable, which might cause problems, particularly when/if an interrupt handler happens to do an indirect jump and trashes your pointer.
In addition to what has already been mentioned, another reason why
LDA TABLE_HI,X
PHA
LDA TABLE_LO,X
PHA
RTS
is/was used (and yes, it's quite common) is that the code is shorter (less space) than
LDA TABLE_HI,X
STA DEST+1
LDA TABLE_LO,X
STA DEST
JMP (DEST)
which was once a very important factor. However, when DEST is on the zero page, the latter takes one fewer cycle on the NMOS 6502 (both take the same number of cycles on the 65C02). Both forms have been used: Apple II Integer Basic used the latter with DEST on the zero page, while Sweet 16 (in the repository) uses the former, and both had the same author. Also, JMP (LABEL) has a bug on the NMOS 6502 when LABEL is $XXFF (the high byte of the target is fetched from $XX00 rather than from the next page), which has resulted in the use of JMP (abs) being discouraged.
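For anyone who wants to check the arithmetic, here is a small Python tally of the two sequences, followed by a model of the JMP (abs) pointer fetch. The byte and cycle figures are the standard ones for these addressing modes, assuming the table reads do not cross a page boundary and DEST is on the zero page; the names COSTS, tally and jmp_indirect_pointer are made up for this sketch.

```python
# (bytes, NMOS 6502 cycles, 65C02 cycles) per instruction
COSTS = {
    "LDA abs,X": (3, 4, 4),  # no page crossing assumed
    "PHA":       (1, 3, 3),
    "RTS":       (1, 6, 6),
    "STA zp":    (2, 3, 3),
    "JMP (abs)": (3, 5, 6),  # one cycle longer on the 65C02
}

RTS_FORM = ["LDA abs,X", "PHA", "LDA abs,X", "PHA", "RTS"]
JMP_FORM = ["LDA abs,X", "STA zp", "LDA abs,X", "STA zp", "JMP (abs)"]


def tally(sequence):
    """Total (bytes, NMOS cycles, 65C02 cycles) for a sequence."""
    return tuple(sum(COSTS[op][i] for op in sequence) for i in range(3))


def jmp_indirect_pointer(label: int, nmos: bool) -> tuple[int, int]:
    """Addresses read for the low and high byte of JMP (label)."""
    if nmos:
        # NMOS bug: only the low byte of the pointer address increments.
        hi = (label & 0xFF00) | ((label + 1) & 0x00FF)
    else:
        hi = (label + 1) & 0xFFFF
    return label, hi


print("RTS form:", tally(RTS_FORM))
print("JMP form:", tally(JMP_FORM))
print([f"${a:04X}" for a in jmp_indirect_pointer(0x12FF, nmos=True)])
```

Under those assumptions the RTS form comes to 9 bytes and 20 cycles on either processor, and the JMP form to 13 bytes and 19 cycles on the NMOS part or 20 on the 65C02. The last line shows the NMOS fetch for a pointer at $12FF reading its high byte from $1200.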
The 65C02 has a JMP (abs,X) instruction which largely eliminates the need for RTS-style jump tables, but a lot of software is/was written so as not to exclude the NMOS 6502 whenever feasible.