Optimization competition: An efficient calling convention

johnwbyrd
Posts: 89
Joined: 01 May 2017

Re: Optimization competition: An efficient calling conventio

Post by johnwbyrd »

BigDumbDinosaur wrote:
johnwbyrd wrote:
For that reason, a 65816-only solution, that may or may not use the Z accumulator, is also of interest.
There is no "Z accumulator" in the 65C816. The 65C816's registers are .A, .B, .X, .Y, DB, DP, PB, PC and SP.
D'oh! I was confusing the 65816 and the obscure 65CE02, which apparently had a Z register.

Post by johnwbyrd »

barrym95838 wrote:
I certainly don't know if I'm misunderstanding or oversimplifying, but my attempt which does what you seem to describe would be:

Code:

; add two 16-bit numbers in zp:
; x and y point to the addends, a points to the sum
; a and x are modified, y is preserved
If I count correctly (big if), 57 cycles between jmp main and brk, one cycle faster than BillG's fastest NMOS attempt.
At first glance, this code seems to me to behave as per the specifications -- let's see if any more cycles can be shaved off by others...

Post by johnwbyrd »

BillG wrote:

Code:

        org     0

P3      db      0       ; Pointer to third operand
SumLow  db      0

Num1    dw      1234    ; First number
Num2    dw      5678    ; Second number
Num3    dw      0       ; Sum

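;
; (Y) + (X) -> (A)
; Y and X point to the 16-bit addends in zero page; A points to the sum
;
Add
        sta     P3      ; Save pointer to sum

        clc

        lda     0,Y     ; Add low bytes
        adc     0,X
        sta     SumLow  ; Hold low byte of sum while X is still needed

        lda     1,Y     ; Add high bytes with carry
        adc     1,X
        ldx     P3      ; Point to sum
        sta     1,X     ; Store high byte of sum
        lda     SumLow
        sta     0,X     ; Store low byte of sum

        rts

;
Start
        ldy     #Num1   ; Load pointer to first number

        ldx     #Num2   ; Load pointer to second number

        lda     #Num3   ; Load pointer to sum

        jsr     Add     ; Add them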

        end     Start
It's fascinating that stashing the low byte of the sum in a temporary zero page location (SumLow), and storing it last, seems to produce a faster result.
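For anyone who wants to check the arithmetic, here is a rough cycle tally of the routine quoted above, written in Python rather than assembly so that it can be run mechanically. It is only a sketch under stated assumptions: base NMOS 6502 timings, no page-crossing penalties, and `lda 0,Y` assembling as absolute,Y (LDA has no zero-page,Y mode). The table and list names are made up for illustration. It counts only the instructions shown, so it need not match totals quoted elsewhere in the thread.

```python
# Base NMOS 6502 cycle counts for the addressing modes used in the quoted
# routine. Assumption: no page-crossing penalties, since all operands sit
# low in zero page.
CYCLES = {
    ("lda", "imm"): 2, ("ldx", "imm"): 2, ("ldy", "imm"): 2,
    ("lda", "zp"): 3, ("ldx", "zp"): 3, ("sta", "zp"): 3,
    ("adc", "zp,x"): 4, ("sta", "zp,x"): 4,
    ("lda", "abs,y"): 4,  # LDA has no zero-page,Y mode
    ("clc", "imp"): 2, ("jsr", "abs"): 6, ("rts", "imp"): 6,
}

# Caller: load the three pointers, then call the routine.
CALLER = [("ldy", "imm"), ("ldx", "imm"), ("lda", "imm"), ("jsr", "abs")]

# Body of Add, in source order.
ADD = [
    ("sta", "zp"),      # sta P3
    ("clc", "imp"),     # clc
    ("lda", "abs,y"),   # lda 0,Y
    ("adc", "zp,x"),    # adc 0,X
    ("sta", "zp"),      # sta SumLow
    ("lda", "abs,y"),   # lda 1,Y
    ("adc", "zp,x"),    # adc 1,X
    ("ldx", "zp"),      # ldx P3
    ("sta", "zp,x"),    # sta 1,X
    ("lda", "zp"),      # lda SumLow
    ("sta", "zp,x"),    # sta 0,X
    ("rts", "imp"),     # rts
]


def total_cycles(instructions):
    """Sum the base cycle counts for a list of (mnemonic, mode) pairs."""
    return sum(CYCLES[instr] for instr in instructions)


caller_cycles = total_cycles(CALLER)
add_cycles = total_cycles(ADD)
print(caller_cycles, add_cycles, caller_cycles + add_cycles)
```

Under those assumptions the tally comes to 12 cycles for the caller and 44 for the routine. The SumLow round trip costs 6 cycles (a zero-page store and load), which is the price of keeping X free as an addend pointer until the high bytes have been added.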
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Post by BigDumbDinosaur »

johnwbyrd wrote:
BigDumbDinosaur wrote:
johnwbyrd wrote:
For that reason, a 65816-only solution, that may or may not use the Z accumulator, is also of interest.
There is no "Z accumulator" in the 65C816. The 65C816's registers are .A, .B, .X, .Y, DB, DP, PB, PC and SP.
D'oh! I was confusing the 65816 and the obscure 65CE02, which apparently had a Z register.
The Z register in the 65CE02 has no analog in the 65C816.

Post by johnwbyrd »

Any more suggestions? The existing entries seem optimal superficially, but I still hold out hope.

Lest any of you think that this is a purely academic competition, let me quietly tease some screenshots of my current work.

Note the line number 7773, which is my attempt at a l33t spelling of LLVM.
Attachments:
Hello, LLVM for 6502 for Apple II (hello-apple2.png)
Hello, LLVM for 6502 for VIC 20
Hello, LLVM for 6502 for C64
Last edited by johnwbyrd on Sat Jul 18, 2020 12:22 am, edited 3 times in total.
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Post by BillG »

Are you working on a virtual machine or compiler backend infrastructure?

Post by johnwbyrd »

BillG wrote:
Are you working on a virtual machine or compiler backend infrastructure?
Yes to both. The first is a by-product of the second. Code optimized for speed would be emitted as ordinary 6502 assembly. Code optimized for size would instead be emitted as an LLVM IR-like bytecode, to be executed by a runtime interpreter, and that is where this calling convention would apply.

Post by johnwbyrd »

Going once...

Post by johnwbyrd »

Going twice...

Post by johnwbyrd »

The optimization competition is now over!

I award Top Honors to BillG for embracing the problem so completely, and for eventually finding the fastest solution. :D

However, an important Inspirational Award goes to barrym95838, for finding the creative breakthrough that enabled the fastest solution to exist. 8)

Congratulations everyone, and thank you so much for playing!


Post by BillG »

I would like to thank the Academy...

Actually, I thank barrym95838 as well for the competitive spirit and out-of-the-box thinking which made the final result possible.