This thread totally nerd-sniped me. I decided to go for time efficiency and implement a solution similar to Chromatix's idea of a decision-based algorithm that picks one of 8 cases (2 of which are invalid).
I was tempted to do some self-modifying code magic (
opcodes of "STY/STA/STX zp" and "LDY/LDA/LDX zp" are dangerously close to each other, and they all differ only with 2 lowest bits! ) - but I didn't have guts to make any use of that...
Here's what I ended up with.
NOTE: This is not tested. I did some very basic sanity checks with sim65 in order to count clock cycles. But this code should give you a general idea.
Code:
; O(1) time sort of three values from zero page
; 48 cycles in most cases (excluding RTS); 33 cycles if input is already sorted
; - Requires 65C02 for JMP (abs,X)
; - Not very memory-efficient
; - Requires extra cycle if `table` crosses page boundary
; - swap2..swap7 can be optimized even further by not using reorder macro (which sometimes does redundant writes)
; - zp0..zp2 can be located anywhere
; ZP addresses of inputs
zp0 = 253
zp1 = 254
zp2 = 255
.macro reorder aa, bb, cc
LDA zp0
; Y is already loaded with zp1
LDX zp2
STA aa
STY bb
STX cc
.endmacro
sort:
LDA #0
LDY zp0
CPY zp1
ROL ; 1 if zp0 >= zp1
CPY zp2
ROL ; 1 if zp0 >= zp2
LDY zp1
CPY zp2
ROL ; 1 if zp1 >= zp2
ASL
TAX
JMP (table, x) ; (jsr, rts)
table:
.word swap0 ; zp0 < zp1, zp1 < zp2, zp0 < zp2
.word 0 ; zp0 < zp1, zp1 < zp2, zp0 >= zp2 - not possible
.word swap2 ; zp0 < zp1, zp1 >= zp2, zp0 < zp2
.word swap3 ; zp0 < zp1, zp1 >= zp2, zp0 >= zp2
.word swap4 ; zp0 >= zp1, zp1 < zp2, zp0 < zp2
.word swap5 ; zp0 >= zp1, zp1 < zp2, zp0 >= zp2
.word 0 ; zp0 >= zp1, zp1 >= zp2, zp0 < zp2 - not possible
.word swap7 ; zp0 >= zp1, zp1 >= zp2, zp0 >= zp2
swap0:
; Already sorted (-15 cycles)
RTS
swap2:
reorder zp0, zp2, zp1
RTS
swap3:
reorder zp1, zp2, zp0
RTS
swap4:
reorder zp1, zp0, zp2
RTS
swap5:
reorder zp2, zp0, zp1
RTS
swap7:
reorder zp2, zp1, zp0
RTS
EDIT: Turns out `reorder` macro can be easily improved:
EDIT 2: This updated macro has a bug. See TobyLobster's next post with a fix.
Code:
.macro reorder aa, bb, cc
.if aa != zp0 ; save 6 cycles if already sorted
LDA zp0
STA aa
.endif
.if bb != zp1 ; save 3 cycles if already sorted
; Y is already loaded with zp1
STY bb
.endif
.if cc != zp2 ; save 6 cycles if already sorted
LDA zp2
STA cc
.endif
.endmacro
This saves 6 cycles if `zp0` or `zp2` are already sorted, and 3 cycles if `zp1` is already sorted - up to 15 saved cycles total.
EDIT 3: Here's an updated table & functions with correct equations (thanks TobyLobster!)
Code:
table:
.word swap0 ; zp0 < zp1, zp0 < zp2, zp1 < zp2
.word swap1 ; zp0 < zp1, zp0 < zp2, zp1 >= zp2
.word 0 ; zp0 < zp1, zp0 >= zp2, zp1 < zp2 - not possible
.word swap3 ; zp0 < zp1, zp0 >= zp2, zp1 >= zp2
.word swap4 ; zp0 >= zp1, zp0 < zp2, zp1 < zp2
.word 0 ; zp0 >= zp1, zp0 < zp2, zp1 >= zp2 - not possible
.word swap6 ; zp0 >= zp1, zp0 >= zp2, zp1 < zp2
.word swap7 ; zp0 >= zp1, zp0 >= zp2, zp1 >= zp2
swap0:
reorder zp0, zp1, zp2; no-op
RTS
swap1:
reorder zp0, zp2, zp1
RTS
swap3:
reorder zp1, zp2, zp0
RTS
swap4:
reorder zp1, zp0, zp2
RTS
swap6:
reorder zp2, zp0, zp1
RTS
swap7:
reorder zp2, zp1, zp0
RTS