6502.org Forum

All times are UTC




Posted by ilmenit on Sun Jan 02, 2022 11:30 pm
Hi,

I'm looking for a way to shorten (in size, not cycles) a procedure that multiplies an 8-bit value by 16 into a 16-bit result (the result address does not overlap the value address). So far I have it down to 21 ($15) bytes. Any ideas how to shorten it further?

Code:
   org $80

val .byte 121    
result .word 0

   org $2000
   
start:
   jsr mul4
   jsr mul3
   jsr mul2
   jsr mul1
   
   jmp *
   
.proc mul1
   lda val
   
   // nibble swap - 8 bytes
        ASL 
        ADC  #$80
        ROL 
        ASL 
        ADC  #$80
        ROL 
       
        pha
        and #%00001111
        sta result+1
        pla
        and #%11110000
        sta result
   rts
.endp
.print "* MUL1: ", .len mul1   

.proc mul2
   lda val
   
   pha
   :4 lsr
   sta result+1
   pla
   
   clc
   :4 asl
   sta result
          
   rts
.endp
.print "* MUL2: ", .len mul2   ; was ".len mul1", so the MUL2 size in the output below is really mul1's
   
.proc mul3
   lda val
   sta result
   
   ldy #4
loop:
   ; ASL16
   LDA result
   ASL
   STA result
   LDA result+1
   ROL
   STA result+1
   ; dec loop
   dey
   bne loop   
   rts
.endp

.print "* MUL3: ", .len mul3

.proc mul4
   lda val
   sta result
   
        lda #$00
        ldx #$08
        clc
m0      bcc m1
        clc
        adc #16
m1      ror
        ror result
        dex
        bpl m0
        ldx result
        sta result+1
   rts
.endp

.print "* MUL4: ", .len mul4
   
   run start   ;Define run address


* MUL1: $0015
* MUL2: $0015
* MUL3: $0014
* MUL4: $0019


Last edited by ilmenit on Mon Jan 03, 2022 12:01 am, edited 1 time in total.

Posted by John West on Sun Jan 02, 2022 11:59 pm
Code:
   lda val
   rol a
   rol a
   rol a
   rol a
   tay
   and #15
   sta result
   tya
   rol a
   and #240
   sta result+1
   rts

If everything is in zero page, I think that's 18 bytes. You can of course replace the TAY/TYA with PHA/PLA without changing the size.
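
As a quick way to check byte counts like this one, here is a minimal Python sketch that tallies instruction sizes by addressing mode. The table and the `routine_size` helper are illustrative names of my own, not part of any assembler; the sizes themselves are the standard 6502 ones (1 byte for implied/accumulator, 2 for immediate, zero page and relative, 3 for absolute).

```python
# Instruction sizes on the 6502, by addressing mode (opcode byte included).
SIZE = {"implied": 1, "accumulator": 1, "immediate": 2,
        "zeropage": 2, "absolute": 3, "relative": 2}

# The rotate-and-mask routine above, one (mnemonic, mode) pair per instruction.
# "mem" marks operands whose size depends on where val/result are placed.
ROUTINE = [
    ("lda", "mem"),
    ("rol", "accumulator"), ("rol", "accumulator"),
    ("rol", "accumulator"), ("rol", "accumulator"),
    ("tay", "implied"),
    ("and", "immediate"),
    ("sta", "mem"),
    ("tya", "implied"),
    ("rol", "accumulator"),
    ("and", "immediate"),
    ("sta", "mem"),
    ("rts", "implied"),
]

def routine_size(routine, mem_mode):
    """Total bytes when every "mem" operand uses the given addressing mode."""
    return sum(SIZE[mem_mode if mode == "mem" else mode] for _, mode in routine)

print(routine_size(ROUTINE, "zeropage"))  # 18
print(routine_size(ROUTINE, "absolute"))  # 21
```

Each of the three memory operands costs one extra byte outside zero page, which is where the difference of 3 comes from.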


Posted by barrym95838 on Mon Jan 03, 2022 12:05 am
If I understand the problem correctly, my proposed solution is 17 bytes if your variables are in ZP, and 21 bytes if not:
Code:
mul:                ; result16 = factor8 * 16
    lda #0
    sta result+1
    ldx #4
    lda value
mul2:
    asl
    rol result+1
    dex
    bne mul2
    sta result
    rts
[Edit: One byte smaller and many cycles slower than John's code]
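
For anyone who wants to check the arithmetic without an emulator, here is a rough Python model of the loop above. It is only a sketch: `mul16_shift_loop` is a made-up name, and it models just the accumulator, the carry flag and the high byte, not the real flag behaviour of a 6502.

```python
def mul16_shift_loop(value):
    """Model of the loop above: four 16-bit left shifts of (result+1 : A)."""
    a = value & 0xFF                      # lda value
    hi = 0                                # lda #0 / sta result+1
    for _ in range(4):                    # ldx #4 ... dex / bne mul2
        carry = a >> 7                    # asl: bit 7 of A goes to carry
        a = (a << 1) & 0xFF
        hi = ((hi << 1) | carry) & 0xFF   # rol result+1: carry enters bit 0
    return hi, a                          # A is stored to result (low byte)

hi, lo = mul16_shift_loop(121)
print(f"${hi:02X}{lo:02X}")  # $0790, i.e. 121 * 16 = 1936
```

Clearing result+1 first matters: without it, whatever was left in the high byte would be shifted into the result as well.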

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted by John West on Mon Jan 03, 2022 12:09 am
The 65C02 has STZ, so if you're not running on old hardware you could take 2 more bytes off that.


Posted by ilmenit on Mon Jan 03, 2022 12:12 am
John West wrote:
If everything is in zero page, I think that's 18 bytes. You can of course replace the TAY/TYA with PHA/PLA without changing the size.

Are you sure the code is correct? With either 4 or 5 ROLs it's giving a wrong result.


Posted by barrym95838 on Mon Jan 03, 2022 12:16 am
I think John might have accidentally switched his AND #15 and AND #240 instructions.

Posted by John West on Mon Jan 03, 2022 12:17 am
ilmenit wrote:
Are you sure the code is correct? With either 4 or 5 ROLs it's giving a wrong result.

The code I tested was right, but that was a hurried re-write after my first go, and I didn't edit the post correctly. Swap the AND #15 and AND #240 and it should work better.
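
To illustrate the swapped version, here is a small Python model of the rotate-and-mask routine with AND #240 applied before the store to result and AND #15 before the store to result+1. The helper names `rol` and `mul16_rotate` are my own, and the model only tracks A and the carry flag, so treat it as a sketch rather than a full emulation.

```python
def rol(a, carry):
    """6502 ROL A: rotate left through carry. Returns (new_a, new_carry)."""
    return ((a << 1) | carry) & 0xFF, a >> 7

def mul16_rotate(value, carry_in):
    """Rotate-and-mask multiply by 16, with the two AND masks swapped."""
    a, c = value & 0xFF, carry_in         # lda val (carry is whatever it was)
    for _ in range(4):                    # rol a, four times
        a, c = rol(a, c)
    saved = a                             # tay
    lo = a & 240                          # and #240 / sta result
    a, c = rol(saved, c)                  # tya / rol a
    hi = a & 15                           # and #15 / sta result+1
    return hi, lo

# The incoming carry only ever lands in bits that get masked off.
for carry_in in (0, 1):
    hi, lo = mul16_rotate(121, carry_in)
    print(f"${hi:02X}{lo:02X}")  # $0790 both times
```

This also suggests why no CLC is needed: the stray carry bit ends up in the low nibble after four rotates and in bit 4 after the fifth, and both positions are cleared by the masks.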


Posted by ilmenit on Mon Jan 03, 2022 12:18 am
barrym95838 wrote:
If I understand the problem correctly, my proposed solution is 17 bytes if your variables are in ZP, and 21 bytes if not

that's the shortest one, indeed. Great job, thank you!


Posted by ilmenit on Mon Jan 03, 2022 12:20 am
John West wrote:
ilmenit wrote:
Are you sure the code is correct? With either 4 or 5 ROLs it's giving a wrong result.

The code I tested was right, but that was a hurried re-write after my first go, and I didn't edit the post correctly. Swap the AND #15 and AND #240 and it should work better.

Correct, thanks!


Posted by barrym95838 on Mon Jan 03, 2022 12:43 am
ilmenit wrote:
that's the shortest one, indeed. Great job, thank you!

Hold on there, buddy!
Code:
mul:                ; result16 = factor8 * 16
    lda value
    sta result
    lda #$10
mul2:
    asl result
    rol
    bcc mul2
    sta result+1
    rts
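
The trick here appears to be that the $10 loaded into A doubles as the loop counter: its single set bit reaches the carry after exactly four ROLs, so no index register is needed. With zero-page variables that comes to 14 bytes by my count. A minimal Python model, with `mul16_sentinel` as a hypothetical name and only A, carry and the low byte modelled:

```python
def mul16_sentinel(value):
    """Model of the sentinel-bit loop: returns (high, low, iterations)."""
    lo = value & 0xFF                     # lda value / sta result
    a = 0x10                              # lda #$10: bit 4 is the sentinel
    iterations = 0
    while True:
        carry = lo >> 7                   # asl result: bit 7 goes to carry
        lo = (lo << 1) & 0xFF
        carry_out = a >> 7                # rol: old bit 7 of A becomes carry
        a = ((a << 1) | carry) & 0xFF
        iterations += 1
        if carry_out:                     # bcc mul2 falls through when C = 1
            break
    return a, lo, iterations              # sta result+1 stores A

hi, lo, n = mul16_sentinel(121)
print(f"${hi:02X}{lo:02X} after {n} iterations")  # $0790 after 4 iterations
```

When the sentinel bit falls out, the four bits shifted in behind it are exactly the high nibble of the original value, so A already holds the finished high byte.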

Posted by IamRob on Mon Jan 03, 2022 5:20 am
It is not just a matter of having the shortest code. You also have to ask yourself what you want to do with the result. If you want to use the result later, then zero-page locations are better, but you have to reload them into a register down the road. If you want the result right away, then this one might be better, as you can get the result immediately.

This one doesn't use any zero-page locations but uses all the registers.

Code:
ldx #value
ldy #$10
]lp txa
    asl
    tax
    tya
    rol
    tay
bcc ]lp
    rts


Posted by barrym95838 on Mon Jan 03, 2022 5:34 am
IamRob wrote:
It is not just a matter of having the shortest code.
(he says as he posts the shortest code yet :) )
Quote:
You also have to ask yourself what you want to do with the result.
Excellent point.

Posted by ilmenit on Mon Jan 03, 2022 7:51 pm
barrym95838 wrote:
ilmenit wrote:
that's the shortest one, indeed. Great job, thank you!

Hold on there, buddy!
Code:
mul:                ; result16 = factor8 * 16
    lda value
    sta result
    lda #$10
mul2:
    asl result
    rol
    bcc mul2
    sta result+1
    rts

Wow, you are a 6502 wizard! :-)
As the others pointed out, in sizecoding like this everything depends on what you do next and which registers, flags or memory need to be preserved.

Btw, if anyone is interested in size optimization, there is an ongoing competition in different categories at https://lovebyte.party/. Last year's competition was full of great entries on different CPUs.

