@TobyLobster
You might find this interesting
https://github.com/halifaxgeorge-bot/Emu6502/blob/master/bench/arith/umult8/plot.png
Still need to double-check the cycle counts to see if they are the same as yours. You wanna hear something funny? I wrote it because I couldn't compile yours...
There's ...
Search found 26 matches
- Fri Mar 06, 2026 9:14 pm
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
- Thu Aug 08, 2024 10:00 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
I just remembered a note I had made. Since the carries are tracked in X, you can eliminate all the CLCs and just do sbc id,x at the end, where id is an identity table (id[x] = x). In this case, I save 14 CLCs, or 28 cycles, but add 4, for a net saving of 24 cycles. These savings apply to two inner columns of 14 adds and more, until a break even ...
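For anyone who wants to check the bookkeeping, here is a rough Python model of a column of ADCs with no CLC in between. It is only a sketch under my own assumptions: one CLC before the first add, X incremented once per carry out, and the helper names (`chained_sum`, `true_total`) are made up for illustration. The post is truncated, so how the real routine folds the final carry into the sbc is not shown here.

```python
def chained_sum(values):
    """Add bytes like a run of ADCs with a single CLC at the start (assumed)."""
    a, carry, x = values[0], 0, 0
    for b in values[1:]:
        s = a + b + carry      # ADC with whatever carry the last add left
        carry = s >> 8
        a = s & 0xFF
        x += carry             # X counts carries out (assumed: bcc skip / inx)
    return a, x, carry


def true_total(a, x, carry):
    """Recover the exact sum from the chained result.

    Every counted carry is worth 256, but each one except an unconsumed
    final carry was also added as +1 to the next ADC:
    total = a + 256*x - (x - carry) = a + 255*x + carry.
    """
    return a + 255 * x + carry


# Cycle bookkeeping from the post: 14 CLCs removed, one 4-cycle sbc id,x added.
CLC_CYCLES = 2
SBC_ABS_X_CYCLES = 4
net_saving = 14 * CLC_CYCLES - SBC_ABS_X_CYCLES   # 28 - 4 = 24
```

So the over-count is exactly the number of carries that were fed forward, which is why one subtraction of X at the end can stand in for all the CLCs.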
- Thu Aug 08, 2024 8:50 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
>Yeah, I would certainly be very interested to see a full 32x32 version.
It will be close. Three 16x16 multiplies in the best case are already 561 cycles, then there is a lot of adding, say another 160, so I estimate over 720 cycles.
There is already a 32x32=64 in <750 cycles. It uses the sqr tables but with a ...
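As a sanity check on the "three 16x16 multiplies" idea, here is a small Python sketch of a Karatsuba-style split. This is my reading of the post, not something it spells out: `mul16` is a hypothetical stand-in for a 16x16=32 routine, and the subtractive form is used so that every operand still fits in 16 bits.

```python
def mul16(a, b):
    """Stand-in for a 16x16=32 multiply routine (hypothetical helper)."""
    assert 0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF
    return a * b


def mul32_three(x, y):
    """32x32=64 multiply using only three 16x16 multiplies."""
    x0, x1 = x & 0xFFFF, x >> 16
    y0, y1 = y & 0xFFFF, y >> 16
    low = mul16(x0, y0)
    high = mul16(x1, y1)
    # (x1 - x0) * (y0 - y1) = x1*y0 + x0*y1 - high - low,
    # so the two cross products come from one multiply plus adds.
    dx, dy = x1 - x0, y0 - y1
    cross = mul16(abs(dx), abs(dy))
    if (dx < 0) != (dy < 0):
        cross = -cross
    mid = low + high + cross           # equals x1*y0 + x0*y1
    return low + (mid << 16) + (high << 32)


# Estimate from the post: 561 cycles of multiplies plus roughly 160 of adds.
estimate = 561 + 160                   # 721
```

The sign handling and the extra adds and subtracts are where the "lot of adding" goes, which is why the estimate lands so close to the existing sub-750 routine.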
It will be close. Three 16x16 multiplies in the best case is already 561 cycles, then a lot of adding, say another 160, estimating over 720 cycles.
There is already a 32x32=64 in <750 cycles. It uses the sqr tables but with a ...
- Sun Feb 25, 2024 4:57 am
- Forum: Programming
- Topic: Old Guy with Special Case Need to Divide by Three
- Replies: 43
- Views: 20235
Re: Old Guy with Special Case Need to Divide by Three
oh right, sorry...
- Sun Feb 25, 2024 12:51 am
- Forum: Programming
- Topic: Old Guy with Special Case Need to Divide by Three
- Replies: 43
- Views: 20235
Re: Old Guy with Special Case Need to Divide by Three
https://www.nesdev.org/wiki/Divide_by_3
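The listing in this post can be traced with a small Python model of the accumulator and carry flag. This is only a sketch: `div3_model` is a name I made up, and it models nothing beyond A, the carry, and `temp`. I have hand-traced a few inputs (0, 1, 2, 3, 6, 9, 255) and they come out right, but I am not claiming the routine is exact for every byte.

```python
def div3_model(value):
    """Trace the routine: sta temp, lsr, lsr, then 4 x (adc temp, ror, lsr)."""
    temp = a = value & 0xFF
    carry = 0
    # lsr / lsr: shift right twice, carry takes the bit shifted out
    for _ in range(2):
        carry, a = a & 1, a >> 1
    for _ in range(4):
        # adc temp: 9-bit sum, bit 8 goes to carry
        total = a + temp + carry
        carry, a = total >> 8, total & 0xFF
        # ror: the carry from the add rotates back in as bit 7
        carry, a = a & 1, (a >> 1) | (carry << 7)
        # lsr
        carry, a = a & 1, a >> 1
    return a
```

To look for inputs where it disagrees with integer division, run `[n for n in range(256) if div3_model(n) != n // 3]`.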
Code:
sta temp
lsr
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
rts
- Sun Feb 25, 2024 12:44 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
I believe that is the fastest possible 8-bit signed multiply. I would point out a few things, though. The total size is wrong. I break it down like this:
;zp 0
;data 2044
;data total 2044
;code 35
;code+data 2079
and the table generation is slightly wrong. lda $x0FF,X where X=$ff can only ...
- Sun Feb 18, 2024 10:02 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
Great! However, I would suggest
x0 = p_sqr_lo1 ; multiplier, 2 bytes
x1 = p_sqr_lo2
as it seems more logical to me (yes, I did lead you down the wrong path with my mistake).
I've also done a first pass of a signed 16x16.
The fastest, of course, is to use signed magnitude.
As for 2's ...
- Tue Feb 13, 2024 11:29 pm
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
trick:
x1 = p_sqr_lo
umult16
; set multiplier as x1
lda x1
;sta p_sqr_lo
sta p_sqr_hi
eor #$ff
sta p_neg_sqr_lo
sta p_neg_sqr_hi
; set multiplier as x0
; *x1 is no longer preserved
This easily saves 3 cycles and 1 byte of zero page.
I could also accept Y0 in Y, but that can place restrictions on the ...
- Tue Feb 13, 2024 7:54 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
I've properly published this now at https://codebase64.org/doku.php?id=base:fastest_multiplication_2023
Note, there are some changes. My post above was missing an important comment (at first?); in any case, your version is missing it:
; set multiplicand as y0
ldy y0
;x1y0l = low(x1*y0)
;x1y0h ...
- Mon Feb 12, 2024 6:34 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
It could go faster yet, maybe saving another 20 cycles. I look forward to trying the idea.
- Mon Feb 12, 2024 6:28 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26482
Re: Comparing 6502 multiply routines
Here you go, turns out I lost it :(
* = $c000
p_sqr_lo = $8b
p_sqr_hi = $8d
p_neg_sqr_lo = $a3
p_neg_sqr_hi = $a5
x0 = $fb
x1 = $fc
y0 = $fd
y1 = $fe
z0 = $8004
z1 = $8005
z2 = $8006
z3 = $8007
umult16:
;unsigned 16x16 mult, 188.1 cycle version
;111 (code) + 2044 (data) = 2155 bytes
;inputs ...
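For readers new to the square-table approach, here is a minimal Python model of the quarter-square identity a*b = f(a+b) - f(|a-b|), with f(n) = floor(n*n/4). It is a sketch, not the routine above: the names are mine, and the real code avoids the subtraction by using negated tables, which this model does not reproduce. Four 511-byte tables (lo and hi, plus their negated twins) would account for the 2044 data bytes quoted, though that layout is my inference.

```python
# f(n) = n*n // 4 for n = 0..510, since a + b can reach 510 for two bytes.
QUARTER_SQR = [n * n // 4 for n in range(511)]
SQR_LO = [v & 0xFF for v in QUARTER_SQR]
SQR_HI = [v >> 8 for v in QUARTER_SQR]


def umult8_model(a, b):
    """8x8=16 multiply from two table lookups and a 16-bit subtract."""
    s, d = a + b, abs(a - b)
    lo = SQR_LO[s] - SQR_LO[d]
    borrow = 1 if lo < 0 else 0
    hi = SQR_HI[s] - SQR_HI[d] - borrow
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def umult16_model(x, y):
    """16x16=32 multiply assembled from four 8x8 partial products."""
    x0, x1 = x & 0xFF, x >> 8
    y0, y1 = y & 0xFF, y >> 8
    return (umult8_model(x0, y0)
            + ((umult8_model(x1, y0) + umult8_model(x0, y1)) << 8)
            + (umult8_model(x1, y1) << 16))
```

The identity is exact because a+b and a-b always have the same parity, so the two floor operations drop the same fraction.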
- Wed Feb 07, 2024 6:37 am
- Forum: Programming
- Topic: Worlds Worst Videocard BadApple Demo. I want more FPS!
- Replies: 58
- Views: 31090
Re: Worlds Worst Videocard BadApple Demo. I want more FPS!
I just wanted to mention, it's not hard to get a desired frame rate. You have to realize that it's not necessary to update the entire set of pixels. Just pick the bigger blocks and cut the ones you don't have time to display. This basic principle is used in all temporal video codecs. We can't really ...
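To make the idea concrete, here is a rough Python sketch of picking which blocks to redraw. Everything in it is an assumption for illustration: frames are lists of rows of pixel values, "bigger" is taken to mean "more changed pixels", and `budget` is simply the number of blocks you have time to draw per frame.

```python
def pick_blocks(prev, cur, block, budget):
    """Return (row, col) origins of the `budget` most-changed blocks."""
    height, width = len(cur), len(cur[0])
    scored = []
    for r in range(0, height, block):
        for c in range(0, width, block):
            # Count pixels that differ between the two frames in this block.
            diff = sum(
                cur[y][x] != prev[y][x]
                for y in range(r, min(r + block, height))
                for x in range(c, min(c + block, width))
            )
            if diff:
                scored.append((diff, r, c))
    # Largest change first; unchanged blocks were never added.
    scored.sort(key=lambda item: -item[0])
    return [(r, c) for _, r, c in scored[:budget]]
```

On the real hardware the scoring would be done offline when encoding the video, so the player only has to copy the blocks that made the cut.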
- Wed Feb 07, 2024 6:23 am
- Forum: Programming
- Topic: How to manage identical execution pathes
- Replies: 11
- Views: 7122
Re: How to manage identical execution pathes
Hi,
Your original post didn't mention which CPU you are using; now I realize it's a 65C02. I'm not familiar with this version. I see there are some 1-cycle instructions (though they merge before an interrupt can occur).
I wouldn't be able to work well on the PS/2 reading without the cycles memorized, nor it is ...
- Sun Jan 28, 2024 5:41 am
- Forum: Programming
- Topic: How to manage identical execution pathes
- Replies: 11
- Views: 7122
Re: How to manage identical execution pathes
Another idea: you can avoid branches with a better algorithm, and it seems your PS/2 decoding logic is a prime candidate for that. For example, instead of checking whether a certain bit is set, you can look up values in a table, which takes constant time.
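One place this fits is the PS/2 parity check. A frame carries 8 data bits plus an odd-parity bit, so the expected parity bit can come from a 256-entry table instead of a bit-counting loop full of branches. This is a sketch with made-up names, and Python says nothing about timing; on the 6502 the point is that an indexed load costs the same for every data byte as long as the table does not straddle a page.

```python
# Expected odd-parity bit for each data byte: 1 when the byte has an
# even number of set bits, so that data plus parity has an odd count.
ODD_PARITY_BIT = [1 - (bin(n).count("1") & 1) for n in range(256)]


def ps2_frame_ok(data, parity_bit):
    """Check a received data byte against its parity bit with one lookup."""
    return ODD_PARITY_BIT[data & 0xFF] == parity_bit
```

For example, `ps2_frame_ok(0x00, 1)` is true and `ps2_frame_ok(0x01, 1)` is false.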
- Sun Jan 28, 2024 5:33 am
- Forum: Programming
- Topic: How to manage identical execution pathes
- Replies: 11
- Views: 7122
Re: How to manage identical execution pathes
I see. There are no 1-cycle instructions, even illegal ones. However, when the branch-not-taken path needs an extra cycle to balance the two paths, a trick is to either load a value from a table across a page boundary, or load a value from an absolute address rather than zero page, converting a 3 cycle instruction ...
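Here is the cycle accounting for the absolute-versus-zero-page variant, as a tiny Python sketch. It assumes an NMOS 6502, a branch that does not cross a page, and a made-up scenario where both paths do one LDA after the branch; the names are only for illustration.

```python
# Standard 6502 timings (branch target assumed to be in the same page).
CYCLES = {
    "branch_not_taken": 2,
    "branch_taken": 3,
    "lda_zp": 3,     # lda $10
    "lda_abs": 4,    # lda $0010, same location forced to absolute addressing
}


def path_cycles(branch_taken, load_mode):
    """Cycles for the branch plus the load that follows on that path."""
    branch = CYCLES["branch_taken"] if branch_taken else CYCLES["branch_not_taken"]
    return branch + CYCLES[load_mode]


taken = path_cycles(True, "lda_zp")          # 3 + 3 = 6
fall_through = path_cycles(False, "lda_zp")  # 2 + 3 = 5, one cycle short
balanced = path_cycles(False, "lda_abs")     # 2 + 4 = 6, now matches
```

Most assemblers need a hint to keep the absolute form of a zero-page address, so check the listing to make sure the 3-byte opcode was actually emitted.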