Search found 26 matches

by repose
Fri Mar 06, 2026 9:14 pm
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

@TobyLobster
You might find this interesting
https://github.com/halifaxgeorge-bot/Emu6502/blob/master/bench/arith/umult8/plot.png

Still need to double-check the cycle counts to see if they are the same as yours. You wanna hear something funny? I wrote it because I couldn't compile yours...
There's ...
by repose
Thu Aug 08, 2024 10:00 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

I just remembered a note I had. Since the carries are tracked in X, you can eliminate all the CLCs and just sbc id,x at the end, where id(x)=x. In this case, I save 14 CLCs (28 cycles) but add 4, still saving 24 cycles. These savings apply to two inner columns of 14 adds and more, until a break even ...
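To see why the CLCs can go, here is a minimal Python model of one column of adds, assuming X is bumped after every ADC that sets carry and the carry is never cleared. The function names and the final correction are my own illustration; the snippet above is cut off before it shows how the real routine sets up the carry for the `sbc id,x`.

```python
def column_sum_without_clc(values):
    """Add a column of bytes the way ADC would if CLC were never executed.

    Returns (a, x, c): the 8-bit accumulator, the number of carries
    counted in X, and the carry flag left by the last add.
    """
    a, x, c = 0, 0, 0
    for v in values:
        t = a + v + c      # ADC picks up the stale carry from the previous add
        a = t & 0xFF
        c = t >> 8         # new carry out
        x += c             # models: bcc skip / inx
    return a, x, c


def corrected_total(values):
    """Undo the leaked carries.

    Every carry except an unconsumed final one was added into the low
    byte exactly once, so (x - c) has to come back off the total.
    """
    a, x, c = column_sum_without_clc(values)
    return 256 * x + a - (x - c)


column = [0xFF, 0x80, 0x7F, 0x01, 0xFE, 0x10, 0xF0,
          0x55, 0xAA, 0xC3, 0x3C, 0x99, 0x66, 0xE7]   # 14 adds
assert corrected_total(column) == sum(column)

# Cycle bookkeeping from the post: 14 CLCs at 2 cycles each,
# minus one 4-cycle SBC abs,X.
print(14 * 2 - 4)
```

The point of the model is only that the error in the low byte is a function of the carry count, which is why a single table-indexed subtract can remove it.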
by repose
Thu Aug 08, 2024 8:50 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

>Yeah, I would certainly be very interested to see a full 32x32 version.
It will be close. Three 16x16 multiplies in the best case is already 561 cycles, then a lot of adding, say another 160, estimating over 720 cycles.
There is already a 32x32=64 in <750 cycles. It uses the sqr tables but with a ...
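The post doesn't say how the three 16x16 multiplies would be combined. One common way is a Karatsuba-style split, sketched below in Python as an assumption rather than the actual plan. `mul16` is a stand-in for whatever 16x16=32 routine is used, and the subtractive middle term is chosen so both operands stay within 16 bits.

```python
MASK16 = 0xFFFF


def mul16(a, b):
    """Stand-in for an unsigned 16x16=32 multiply routine."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32_three_products(x, y):
    """Unsigned 32x32=64 multiply built from three 16x16 products."""
    x1, x0 = x >> 16, x & MASK16
    y1, y0 = y >> 16, y & MASK16

    low = mul16(x0, y0)
    high = mul16(x1, y1)

    # Middle product on the differences; track the sign separately.
    dx, dy = x1 - x0, y1 - y0
    mid = mul16(abs(dx), abs(dy))
    if (dx < 0) != (dy < 0):
        mid = -mid

    # x1*y0 + x0*y1 = high + low - (x1 - x0)*(y1 - y0)
    cross = high + low - mid

    return (high << 32) + (cross << 16) + low


assert mul32_three_products(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF * 0xFFFFFFFF

# The estimate from the post: three multiplies plus the additions.
print(561 + 160)
```

The extra sign handling and the wide additions for `cross` are where the "lot of adding" in the estimate comes from.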
by repose
Sun Feb 25, 2024 4:57 am
Forum: Programming
Topic: Old Guy with Special Case Need to Divide by Three
Replies: 43
Views: 20235

Re: Old Guy with Special Case Need to Divide by Three

oh right, sorry...
by repose
Sun Feb 25, 2024 12:51 am
Forum: Programming
Topic: Old Guy with Special Case Need to Divide by Three
Replies: 43
Views: 20235

Re: Old Guy with Special Case Need to Divide by Three

https://www.nesdev.org/wiki/Divide_by_3

Code:

 sta temp
 lsr
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 rts
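
For anyone who wants to try the listing without an assembler, here is a small Python model that steps through the same instruction sequence with an 8-bit accumulator and a carry flag. The helper names are mine. The last two lines list any inputs where the result differs from n // 3, since I'm not claiming here that the sequence is exact for every byte value.

```python
def div3(n):
    """Step through the listing above: A is 8 bits, c is the carry flag."""
    temp = n & 0xFF          # sta temp
    a, c = temp, 0

    def lsr():
        nonlocal a, c
        c = a & 1            # bit 0 falls into carry
        a >>= 1

    def ror():
        nonlocal a, c
        new_c = a & 1
        a = (c << 7) | (a >> 1)   # old carry rotates into bit 7
        c = new_c

    def adc_temp():
        nonlocal a, c
        t = a + temp + c
        a = t & 0xFF
        c = t >> 8

    lsr()
    lsr()
    for _ in range(4):       # four rounds of adc temp / ror / lsr
        adc_temp()
        ror()
        lsr()
    return a


print([div3(n) for n in (0, 3, 6, 255)])

mismatches = [n for n in range(256) if div3(n) != n // 3]
print(mismatches)
```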
by repose
Sun Feb 25, 2024 12:44 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

I believe that is the fastest possible 8-bit signed multiply. I would point out a few things, though. The total size is wrong; I break it down like this:
;zp 0
;data 2044
;data total 2044
;code 35
;code+data 2079

and the table generation is slightly wrong. lda $x0FF,X where X=$ff can only ...
by repose
Sun Feb 18, 2024 10:02 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

Great! However, I would suggest
x0 = p_sqr_lo1 ; multiplier, 2 bytes
x1 = p_sqr_lo2
as it seems more logical to me (yes, I did lead you down the wrong path with my mistake).

I've also done a first pass of a signed 16x16.

The fastest of course, is to use signed magnitude.

As for 2's ...
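
The two's complement part is cut off above, so the following is not necessarily the approach the post goes on to describe. It is a minimal Python sketch of two standard options: multiplying sign-magnitude values directly, and fixing up an unsigned 16x16 product for two's complement inputs by subtracting correction terms from the high half. All function names are illustrative.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def smult16_sign_magnitude(x_sign, x_mag, y_sign, y_mag):
    """Sign-magnitude: multiply the magnitudes, XOR the sign bits."""
    return x_sign ^ y_sign, x_mag * y_mag


def smult16_via_unsigned(xu, yu):
    """xu, yu are 16-bit patterns; returns the 32-bit pattern of the signed product."""
    product = xu * yu            # stand-in for the unsigned 16x16 routine
    if xu & 0x8000:              # x negative: unsigned result is 65536*y too big
        product -= yu << 16
    if yu & 0x8000:              # y negative: unsigned result is 65536*x too big
        product -= xu << 16
    return product & 0xFFFFFFFF


# -2 * 3, with -2 written as the 16-bit pattern $FFFE
print(to_signed(smult16_via_unsigned(0xFFFE, 0x0003), 32))
```

The sign-magnitude version needs no fix-up at all, which fits the remark that it is the fastest.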
by repose
Tue Feb 13, 2024 11:29 pm
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

trick:

x1 = p_sqr_lo

umult16
; set multiplier as x1
lda x1
;sta p_sqr_lo
sta p_sqr_hi
eor #$ff
sta p_neg_sqr_lo
sta p_neg_sqr_hi

; set multiplier as x0
; *x1 is no longer preserved

To easily save 3 cycles and 1 byte zp.

I could also accept Y0 in Y, but that can place restrictions on the ...
by repose
Tue Feb 13, 2024 7:54 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

I've properly published this now at https://codebase64.org/doku.php?id=base:fastest_multiplication_2023

Note, there are some changes. My post above was missing an important comment (at first?); in any case your version is missing it:
; set multiplicand as y0
ldy y0

;x1y0l = low(x1*y0)
;x1y0h ...
by repose
Mon Feb 12, 2024 6:34 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

It could go faster yet - maybe save another 20 cycles, look forward to trying the idea.
by repose
Mon Feb 12, 2024 6:28 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26482

Re: Comparing 6502 multiply routines

Here you go, turns out I lost it :(

* = $c000

p_sqr_lo = $8b
p_sqr_hi = $8d
p_neg_sqr_lo = $a3
p_neg_sqr_hi = $a5
x0 = $fb
x1 = $fc
y0 = $fd
y1 = $fe
z0 = $8004
z1 = $8005
z2 = $8006
z3 = $8007

umult16:
;unsigned 16x16 mult, 188.1 cycle version
;111 (code) + 2044 (data) = 2155 bytes
;inputs ...
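
The listing is cut off here, so as background only: the `sqr` table names suggest the usual quarter-square method, where a*b = f(a+b) - f(a-b) with f(n) = floor(n*n/4). Below is a Python sketch of that identity and of building a 16x16 product from four 8x8 partial products. It is a model of the general technique under that assumption, not a transcription of the routine; the real code splits the tables into low and high bytes and indexes them through the zero-page pointers.

```python
# f(n) = floor(n*n / 4) for n = 0..510, enough for the sum of two bytes.
SQR = [n * n // 4 for n in range(511)]


def umult8(a, b):
    """Unsigned 8x8=16 multiply from two table lookups.

    a+b and a-b always have the same parity, so the rounding in the
    two lookups cancels and the difference is exactly a*b.
    """
    return SQR[a + b] - SQR[abs(a - b)]


def umult16(x, y):
    """Unsigned 16x16=32 multiply from four 8x8 partial products."""
    x1, x0 = x >> 8, x & 0xFF
    y1, y0 = y >> 8, y & 0xFF
    return (umult8(x0, y0)
            + ((umult8(x1, y0) + umult8(x0, y1)) << 8)
            + (umult8(x1, y1) << 16))


assert all(umult8(a, b) == a * b for a in range(256) for b in range(256))
assert umult16(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF
```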
by repose
Wed Feb 07, 2024 6:37 am
Forum: Programming
Topic: Worlds Worst Videocard BadApple Demo. I want more FPS!
Replies: 58
Views: 31090

Re: Worlds Worst Videocard BadApple Demo. I want more FPS!

I just wanted to mention, it's not hard to get a desired frame-rate. You have to realize that it's not necessary to update the entire set of pixels. Just pick the bigger blocks and cut the ones you don't have time to display. This basic principle is used in all temporal video codecs. We can't really ...
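A rough sketch of that idea in Python, assuming each frame comes with a list of changed blocks and a fixed cycle budget. The names `pick_blocks`, `cycle_budget` and `cycles_per_pixel` are made up for illustration and the cost model is deliberately simplistic.

```python
def pick_blocks(changed_blocks, cycle_budget, cycles_per_pixel):
    """Choose which changed blocks to draw this frame.

    changed_blocks is a list of (block_id, pixel_count). Blocks are
    considered largest first; any block that no longer fits in the
    remaining budget is dropped for this frame.
    """
    chosen, spent = [], 0
    by_size = sorted(changed_blocks, key=lambda block: block[1], reverse=True)
    for block_id, pixels in by_size:
        cost = pixels * cycles_per_pixel
        if spent + cost <= cycle_budget:
            chosen.append(block_id)
            spent += cost
    return chosen, spent


frame = [("sky", 10), ("face", 40), ("hand", 25)]
print(pick_blocks(frame, cycle_budget=60, cycles_per_pixel=1))
```

Because the budget is fixed, the time per frame is bounded no matter how much of the picture changed, which is what gives the steady frame-rate.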
by repose
Wed Feb 07, 2024 6:23 am
Forum: Programming
Topic: How to manage identical execution pathes
Replies: 11
Views: 7122

Re: How to manage identical execution pathes

Hi,
It wasn't mentioned in your original post which CPU, now I realize it's 65C02. I'm not familiar with this version. I see there are some 1-cycle commands (though, they merge before an interrupt can occur).

I wouldn't be able to work well on the PS/2 reading without the cycles memorized, nor it is ...
by repose
Sun Jan 28, 2024 5:41 am
Forum: Programming
Topic: How to manage identical execution pathes
Replies: 11
Views: 7122

Re: How to manage identical execution pathes

Another idea: you can avoid branches with a better algorithm, and it seems your PS/2 decoding logic is a prime suspect for that. For example, instead of checking if a certain bit is set, you can look up values in a table, which takes a constant time.
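As an illustration of the table idea, here is a minimal Python sketch using the PS/2 parity check. The original decoding code isn't shown in this thread, so the parity check is only a candidate I picked, and `ODD_PARITY` and `frame_ok` are hypothetical names. PS/2 frames carry an odd parity bit, so the expected bit for each data byte can be precomputed once.

```python
# Expected odd-parity bit for every possible data byte, built once up front.
# The parity bit is 1 when the byte has an even number of 1 bits.
ODD_PARITY = [1 - bin(byte).count("1") % 2 for byte in range(256)]


def frame_ok(data_byte, parity_bit):
    """One indexed lookup and one compare, with no per-bit branching."""
    return ODD_PARITY[data_byte] == parity_bit


print(frame_ok(0x1C, 0), frame_ok(0x1C, 1))
```

On the 6502 the lookup would be a single indexed load from a 256-byte table, which costs the same number of cycles for every data byte as long as the table does not straddle a page.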
by repose
Sun Jan 28, 2024 5:33 am
Forum: Programming
Topic: How to manage identical execution pathes
Replies: 11
Views: 7122

Re: How to manage identical execution pathes

I see. There are no 1-cycle instructions, even illegal ones. However, a trick to balance the paths by adding an extra cycle to the branch-not-taken side is to either load a value from a table across a page boundary, or load a value from an absolute address rather than zero page, converting a 3 cycle instruction ...
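
To make the cycle arithmetic concrete, here is a small Python sketch using the standard NMOS 6502 timings for LDA and for branches. The addresses and the two example paths are made up for illustration.

```python
LDA_ZP = 3             # lda $nn
LDA_ABS = 4            # lda $nnnn
BRANCH_NOT_TAKEN = 2
BRANCH_TAKEN = 3       # taken, target in the same page


def lda_abs_x_cycles(base, x):
    """lda base,x takes 4 cycles, plus 1 when base+x crosses a page."""
    crossed = (base & 0xFF00) != ((base + x) & 0xFF00)
    return 4 + (1 if crossed else 0)


# The taken path costs one cycle more at the branch, so the not-taken
# path makes it up by using the absolute form of the same load.
taken_path = BRANCH_TAKEN + LDA_ZP
not_taken_path = BRANCH_NOT_TAKEN + LDA_ABS
assert taken_path == not_taken_path

# Same effect with an indexed load placed so the access crosses a page.
print(lda_abs_x_cycles(0x1000, 0x20), lda_abs_x_cycles(0x10F0, 0x20))
```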