Mini-challenge: Sort three bytes

Programming the 6502 microprocessor and its relatives in assembly and other languages.
TobyLobster
Posts: 37
Joined: 17 Jun 2021

Mini-challenge: Sort three bytes

Post by TobyLobster »

Anyone up for a fun little challenge?

Suppose there are three bytes in zero page: z0, z1, z2.
The result should be the same three bytes sorted in place from smallest to largest: z0 <= z1 <= z2.

Two categories: smallest code, and fastest code!
Chromatix
Posts: 1462
Joined: 21 May 2018

Re: Mini-challenge: Sort three bytes

Post by Chromatix »

Let's start with a straightforward approach, then:

Code: Select all

SortThree:
  LDA z0
  LDX z1
  LDY z2

  CMP z1
  BCS :+
  STX z0
  TAX
  LDA z0

: CMP z2
  BCS :+
  STA z2
  STY z0
  TYA

: CPX z2
  BCS :+
  STX z2
  STY z1
  RTS

: STX z1
  RTS
The fastest approach on average is probably a decision tree, loading all the values into registers (as above), then taking the three comparisons in sequence to produce eight leaves, and just perform up to three stores in each leaf. That would be a lot bigger than the above, however. The tricky part would be making the above smaller.
gfoot
Posts: 871
Joined: 09 Jul 2021

Re: Mini-challenge: Sort three bytes

Post by gfoot »

My first thought was to CMP pairs and ROL the carry bits into the bottom of the accumulator, then figure out what to do. But that's probably slow!

Based on Chromatix's code, I made some changes that make it a bit shorter, I haven't thought through the speed implications though:

Code: Select all

SortThree:
  LDA z0
  LDX z1

  CMP z1
  BCC :+
  STX z0
  TAX

: CPX z2
  BCC :+

  LDA z2
  STX z2
  TAX

  CPX z0
  BCS :+

  LDX z0
  STA z0

: STX z1
  RTS
Edit: And here's a first pass speed-optimised version - just writing out all the comparisons and swaps that are required, without attempting to share any code between branches. It may be possible to reduce the time taken by changing the order of comparisons in some cases. By my count the worst case here is 33 cycles until the RTS, not including it.

Code: Select all

SortThree:
    LDA z0                 ; 33
    CMP z1                 ; 30
    BCC z0_lt_z1           ; 27 / 25

    ; z1 < z0
    CMP z2                 ; 25
    BCC z1_lt_z0_lt_z2     ; 22 / 12

    ; z1 < z0 and z2 < z0
    LDX z2                 ; 20
    STA z2                 ; 17
    CPX z1                 ; 14
    BCC z2_lt_z1_lt_z0     ; 11 / 6

    ; z1 < z2 < z0
    LDA z1                 ; 9
    STA z0                 ; 6
    STX z1                 ; 3
    RTS

z2_lt_z1_lt_z0:
    STX z0                 ; 3
    RTS

z1_lt_z0_lt_z2:
    LDX z1                 ; 9
    STX z0                 ; 6
    STA z1                 ; 3
    RTS

z0_lt_z1:
    CMP z2                 ; 22
    BCS z2_lt_z0_lt_z1     ; 19 / 18

    ; z0 < z1 and z0 < z2
    LDA z1                 ; 17
    CMP z2                 ; 14
    BCC return             ; 11 / 3

    LDX z2                 ; 9
    STX z1                 ; 6
    STA z2                 ; 3
return:
    RTS

z2_lt_z0_lt_z1:
    LDX z1                 ; 15
    LDY z2                 ; 12
    STY z0                 ; 9
    STA z1                 ; 6
    STX z2                 ; 3
    RTS
TobyLobster
Posts: 37
Joined: 17 Jun 2021

Re: Mini-challenge: Sort three bytes

Post by TobyLobster »

This is Chromatix version but with fixes added:

Code: Select all

; 38 bytes, 44.61 cycles average
SortThree
  LDA z0
  LDX z1
  LDY z2

  CMP z1
  BCS +         ; branch if z0 >= z1
  STX z0
  STA z1    ; TOBY: added
  TAX
  LDA z0

+ CMP z2
  BCS +         ; branch if z0 >= z2
  STA z2
  STY z0
  TYA
  LDY z2    ; TOBY: added

+ CPX z2
  BCS +         ; branch if z1 >= z2
  STX z2
  STY z1
+           ; TOBY: added
  RTS

 ;STX z1   ; TOBY: removed
 ; RTS
This is 38 bytes, 44.61 cycles average.

gfoot: your versions may not be correct without these fixes.
gfoot
Posts: 871
Joined: 09 Jul 2021

Re: Mini-challenge: Sort three bytes

Post by gfoot »

TobyLobster wrote:
gfoot: your versions may not be correct without these fixes.
I tested my first one in Owlet: https://bbcmic.ro/#%7B%22v%22%3A1%2C%22 ... C%2C%22%7D

But didn't test the second one.

Edit: enhanced test framework that measures and times each situation - applied to my second (speed) one: https://bbcmic.ro/#%7B%22v%22%3A1%2C%22 ... PROC%22%7D
IamRob
Posts: 357
Joined: 26 Apr 2020

Re: Mini-challenge: Sort three bytes

Post by IamRob »

gfoot wrote:
My first thought was to CMP pairs and ROL the carry bits into the bottom of the accumulator, then figure out what to do. But that's probably slow!

Based on Chromatix's code, I made some changes that make it a bit shorter, I haven't thought through the speed implications though:

Code: Select all

SortThree:
  LDA z0
  LDX z1

  CMP z1
  BCC :+
  STX z0
  TAX

: CPX z2
  BCC :+

  LDA z2
  STX z2
  TAX

  CPX z0
  BCS :+

  LDX z0
  STA z0

: STX z1
  RTS

The BCC :+ could go straight to the RTS to save a couple cycles. The only difference between this one and the optimized speed version that I can see is the two TAX's for a total of 4 cycles of savings thru the routine.

This routine has the added advantage that it preserves the Y-reg. Which means it can be used to seed the 3 bytes from a table or input register of some kind.

Very nice.
TobyLobster
Posts: 37
Joined: 17 Jun 2021

Re: Mini-challenge: Sort three bytes

Post by TobyLobster »

gfoot wrote:
TobyLobster wrote:
gfoot: your versions may not be correct without these fixes.
I tested my first one in Owlet
My mistake: I've run it though my own tests here and your code is correct. I have C code that runs through all possible input values doing the cycle counting.
I make your first routine 31 bytes long and ~39 cycles average, the second is 63 bytes long ~34 cycles average (including the final RTS, or ~28 without).
Good stuff.
TobyLobster
Posts: 37
Joined: 17 Jun 2021

Re: Mini-challenge: Sort three bytes

Post by TobyLobster »

IamRob wrote:
The BCC :+ could go straight to the RTS to save a couple cycles.
I think the final ": STX z1" seems to be needed, e.g. when [z0,z1,z2] = [1,2,0].
TobyLobster
Posts: 37
Joined: 17 Jun 2021

Re: Mini-challenge: Sort three bytes

Post by TobyLobster »

I can do a smaller 29 byte version (which is much slower though) using a different approach, a simple sorting network:

Code: Select all

; 29 bytes, 99.35 cycles average
.sort3
    ldx #0
    ldy #1
    jsr .swap
    iny
    jsr .swap
    inx
.swap
    lda z0,y
    cmp z0,x
    bcs .done
    pha
    lda z0,x
    sta z0,y
    pla
    sta z0,x
.done
    rts
It assumes z0,z1,z2 are adjacent in memory.
gfoot
Posts: 871
Joined: 09 Jul 2021

Re: Mini-challenge: Sort three bytes

Post by gfoot »

Oh that's very neat.

I got sidetracked making my VIA-based timing code as accurate as possible, timing down to the individual cycle even though the VIAs only run at half the CPU clock speed :)
gfoot
Posts: 871
Joined: 09 Jul 2021

Re: Mini-challenge: Sort three bytes

Post by gfoot »

Riffing on yours tobylobster - 22 bytes, 111 cycles on average including the RTS:

Code: Select all

.SortThree
    jsr s3
.s3
    ldx #z0
    jsr swap
    inx
.swap
    lda 0,X
    cmp 1,X
    bcc done
    ldy 1,X
    sty 0,X
    sta 1,X
.done
    rts
teamtempest
Posts: 443
Joined: 08 Nov 2009
Location: Minnesota
Contact:

Re: Mini-challenge: Sort three bytes

Post by teamtempest »

Not the world's fastest code, but only 20 bytes. Untested and no time estimate.

Code: Select all

swap:
	ldx #1

next:
	lda z1,x
	cmp z0,x
	bcs noswap
	ldy z0,x
	sta z0,x
	sty z1,x
	bcc swap

noswap:
	dex
	beq next
	rts

; worst case: z0 > z1 > z2
; 1: z1 > z2; swapped
; 2: z1 < z2; no swap; z0 > z1; swapped
: 3: z1 < z2; no swap; z0 < z1; no swap 	
Incidentally, I compared the way I did in case zx == zy, to prevent unnecessary swaps of equal values.
gfoot
Posts: 871
Joined: 09 Jul 2021

Re: Mini-challenge: Sort three bytes

Post by gfoot »

teamtempest, nice, I just did roughly the same thing - it was 20 bytes, 90 cycles.

Edit: but this is even better, 18 bytes and 84 cycles - note the entry point is not at the start:

Code: Select all

.swap
    ldy z0,X
    sty z1,X
    sta z0,X
.SortThree
    ldx #1
.cmp
    lda z1,X
    cmp z0,X
    bcc swap
.noswap
    dex
    bpl cmp
    rts
TobyLobster
Posts: 37
Joined: 17 Jun 2021

Re: Mini-challenge: Sort three bytes

Post by TobyLobster »

Ooh I didn't think it could get that small. Very nice.
User avatar
and3rson
Posts: 163
Joined: 17 Feb 2023
Location: Lviv, Ukraine
Contact:

Re: Mini-challenge: Sort three bytes

Post by and3rson »

This thread totally nerd-sniped me. I decided to go for time efficiency and implement a solution similar to Chromatix's idea of a decision-based algorithm that picks one of 8 cases (2 of which are invalid).
I was tempted to do some self-modifying code magic (opcodes of "STY/STA/STX zp" and "LDY/LDA/LDX zp" are dangerously close to each other, and they all differ only with 2 lowest bits! :lol:) - but I didn't have guts to make any use of that...

Here's what I ended up with.
NOTE: This is not tested. I did some very basic sanity checks with sim65 in order to count clock cycles. But this code should give you a general idea.

Code: Select all

; O(1) time sort of three values from zero page
; 48 cycles in most cases (excluding RTS); 33 cycles if input is already sorted
; - Requires 65C02 for JMP (abs,X)
; - Not very memory-efficient
; - Requires extra cycle if `table` crosses page boundary
; - swap2..swap7 can be optimized even further by not using reorder macro (which sometimes does redundant writes)
; - zp0..zp2 can be located anywhere

; ZP addresses of inputs
zp0 = 253
zp1 = 254
zp2 = 255

.macro  reorder aa, bb, cc
        LDA zp0
        ; Y is already loaded with zp1
        LDX zp2
        STA aa
        STY bb
        STX cc
.endmacro

sort:
        LDA #0
        LDY zp0
        CPY zp1
        ROL             ; 1 if zp0 >= zp1
        CPY zp2
        ROL             ; 1 if zp0 >= zp2
        LDY zp1
        CPY zp2
        ROL             ; 1 if zp1 >= zp2
        ASL

        TAX
        JMP (table, x)  ; (jsr, rts)

table:
        .word swap0  ; zp0 <  zp1, zp1 <  zp2, zp0 <  zp2
        .word 0      ; zp0 <  zp1, zp1 <  zp2, zp0 >= zp2 - not possible
        .word swap2  ; zp0 <  zp1, zp1 >= zp2, zp0 <  zp2
        .word swap3  ; zp0 <  zp1, zp1 >= zp2, zp0 >= zp2
        .word swap4  ; zp0 >= zp1, zp1 <  zp2, zp0 <  zp2
        .word swap5  ; zp0 >= zp1, zp1 <  zp2, zp0 >= zp2
        .word 0      ; zp0 >= zp1, zp1 >= zp2, zp0 <  zp2 - not possible
        .word swap7  ; zp0 >= zp1, zp1 >= zp2, zp0 >= zp2

swap0:
        ; Already sorted (-15 cycles)
        RTS

swap2:
        reorder zp0, zp2, zp1
        RTS

swap3:
        reorder zp1, zp2, zp0
        RTS

swap4:
        reorder zp1, zp0, zp2
        RTS

swap5:
        reorder zp2, zp0, zp1
        RTS

swap7:
        reorder zp2, zp1, zp0
        RTS
EDIT: Turns out `reorder` macro can be easily improved:
EDIT 2: This updated macro has a bug. See TobyLobster's next post with a fix.

Code: Select all

.macro  reorder aa, bb, cc
    .if aa != zp0  ; save 6 cycles if already sorted
        LDA zp0
        STA aa
    .endif
    .if bb != zp1  ; save 3 cycles if already sorted
        ; Y is already loaded with zp1
        STY bb
    .endif
    .if cc != zp2  ; save 6 cycles if already sorted
        LDA zp2
        STA cc
    .endif
.endmacro
This saves 6 cycles if `zp0` or `zp2` are already sorted, and 3 cycles if `zp1` is already sorted - up to 15 saved cycles total.

EDIT 3: Here's an updated table & functions with correct equations (thanks TobyLobster!)

Code: Select all

table:
        .word swap0  ; zp0 <  zp1, zp0 <  zp2, zp1 <  zp2
        .word swap1  ; zp0 <  zp1, zp0 <  zp2, zp1 >= zp2
        .word 0      ; zp0 <  zp1, zp0 >= zp2, zp1 <  zp2 - not possible
        .word swap3  ; zp0 <  zp1, zp0 >= zp2, zp1 >= zp2
        .word swap4  ; zp0 >= zp1, zp0 <  zp2, zp1 <  zp2
        .word 0      ; zp0 >= zp1, zp0 <  zp2, zp1 >= zp2 - not possible
        .word swap6  ; zp0 >= zp1, zp0 >= zp2, zp1 <  zp2
        .word swap7  ; zp0 >= zp1, zp0 >= zp2, zp1 >= zp2

swap0:
        reorder zp0, zp1, zp2; no-op
        RTS

swap1:
        reorder zp0, zp2, zp1
        RTS

swap3:
        reorder zp1, zp2, zp0
        RTS

swap4:
        reorder zp1, zp0, zp2
        RTS

swap6:
        reorder zp2, zp0, zp1
        RTS

swap7:
        reorder zp2, zp1, zp0
        RTS
Last edited by and3rson on Tue Oct 10, 2023 9:04 pm, edited 6 times in total.
/Andrew

deck65 - 6502 slab with screen and keyboard | ПК-88 - SBC based on KM1810VM88 (Ukrainian i8088 clone) | leo80 - simple Z80 SBC
nice65 - 6502 assembly linter | My parts, footprints & 3D models for KiCad/FreeCAD
Post Reply