This is a 65816 coding-efficiency question about the working code at the bottom of this post. My 65816 code assumes a 16-bit wide accumulator and 16-bit index registers, and I switch to 8-bit wide registers only as needed. While porting the accumulator print function from the 6502, I found two ways to do it:
1) Maintain a 16-bit wide accumulator and, in the @print_nybble function, extend the three 8-bit constants: $0f to $000f, $90 to $9990, and $40 to $9940.
2) Retain the 8-bit constants but switch to an 8-bit wide accumulator.
The synchronous serial I/O is the bottleneck, so the difference isn't significant for this code. Still, loading 16-bit constants and ANDing and ADCing them into a 16-bit accumulator when only the low 8 bits matter seems wasteful. The 65816 documentation says to add one cycle to these instructions when the accumulator is 16 bits wide. With three immediate-operand instructions in @print_nybble, that's three extra cycles per call: six cycles with an 8-bit accumulator versus nine with a 16-bit accumulator. f_printa calls @print_nybble twice, so the 8-bit version saves six cycles per byte printed. But that saving is offset by the six cycles required for the SEP and REP instructions.
Am I doing this accounting correctly? If so, it seems that switching to an 8-bit wide accumulator is worth it inside a tight loop of some sort, but not otherwise.
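Here is the tally I have in mind, written as comments against the three immediate-operand instructions in @print_nybble. The figures are the base counts from the 65816 opcode table (2 cycles for an immediate operand, plus 1 when m=0), and I'm assuming no other penalties apply, so treat it as a sketch rather than a measurement.

```asm
; Cycles for the immediate-operand instructions in @print_nybble
;
;                        m=1 (8-bit A)   m=0 (16-bit A)
    and #LOWNIB        ;      2               3
    adc #$9990         ;      2               3    (#$90 when m=1)
    adc #$9940         ;      2               3    (#$40 when m=1)
;                           ----            ----
; per @print_nybble call      6               9
; per f_printa (two calls)   12              18    -> 6 cycles saved with m=1
;
; Cost of switching once per f_printa:
;   sep #$20    3 cycles  (set m=1)
;   rep #$20    3 cycles  (set m=0)
;               6 cycles total, which cancels the 6 cycles saved above
```

This only counts the immediate-operand instructions. PHA and PLA are also one cycle shorter each with an 8-bit accumulator, so the real balance depends on where the SEP and REP are placed.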
Code:
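; f_printa - prints the lower eight bits of the accumulator in hex to the console.
; Inputs:
;   A - byte to print
; Outputs:
;   A - retained
PUBLIC f_printa
    pha                 ; save A for the caller
    pha                 ; save A again for the low nybble
    lsr
    lsr
    lsr
    lsr                 ; move the high nybble into bits 0-3
    jsr @print_nybble
    pla                 ; recover A for the low nybble
    jsr @print_nybble
    pla                 ; restore A
    rts
@print_nybble:
    and #LOWNIB         ; mask to the low nybble ($000f)
    sed
    clc
    adc #$9990          ; Low byte becomes $90-$99 (carry clear) or $00-$05 (carry set)
    adc #$9940          ; Low byte becomes $30-$39 or $41-$46
    cld
    jmp f_putch         ; tail call: f_putch's rts returns to our caller
ENDPUBLIC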
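; f_printc - prints C as a 16-bit hex number to the console.
; Inputs:
;   C - number
; Outputs:
;   C - preserved
PUBLIC f_printc
    pha                 ; save C for the caller
    pha                 ; save C again for the low byte
    xba                 ; swap bytes so the high byte is printed first
    jsr f_printa
    pla                 ; recover C; its low byte is printed next
    jsr f_printa
    pla                 ; restore C
    rts
ENDPUBLIC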
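For comparison, option 2 might look something like the sketch below. It is untested and rests on assumptions that may not hold for my actual code: that f_putch only looks at the low byte of A, that it is safe to call with m=1, and that it returns with m unchanged. The names f_printa8 and @print_nybble8 are made up here just to keep the sketch apart from the working routine.

```asm
; Sketch of option 2: f_printa with an 8-bit accumulator (hypothetical, untested).
; ASSUMPTIONS: f_putch uses only the low byte of A, works with m=1,
; and returns with m still set.
; If the assembler does not track SEP/REP itself, it must be told the
; accumulator width wherever it changes (in ca65: .a8 / .a16, or .smart).
PUBLIC f_printa8            ; hypothetical name
    pha                     ; save the full 16-bit C for the caller
    sep #$20                ; m=1: 8-bit accumulator from here on
    pha                     ; save the low byte for the second nybble
    lsr
    lsr
    lsr
    lsr                     ; high nybble now in bits 0-3
    jsr @print_nybble8
    pla                     ; recover the low byte (8-bit pull)
    jsr @print_nybble8
    rep #$20                ; m=0: back to a 16-bit accumulator
    pla                     ; restore the full 16-bit C
    rts
@print_nybble8:             ; entered with m=1
    and #$0f                ; keep only the low nybble
    sed
    clc
    adc #$90                ; $90-$99 (carry clear) or $00-$05 (carry set)
    adc #$40                ; $30-$39 or $41-$46
    cld
    jmp f_putch             ; tail call, still with m=1
ENDPUBLIC
```

The SEP/REP pair is paid once per byte printed rather than once per nybble, which is the placement the six-versus-six accounting above assumes. If f_putch instead requires a 16-bit accumulator on entry, the switch would have to move inside @print_nybble8 and be paid twice per byte, which would cost more than it saves.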