6502.org • View topic - Comparing 6502 multiply routines

View unanswered posts | View active topics

Board index » 6502.org Users Forum » Programming

All times are UTC

Comparing 6502 multiply routines

Page 4 of 4

[ 48 posts ]

Go to page Previous 1, 2, 3, 4

Previous topic | Next topic

Author

Message

repose

Post subject: Re: Comparing 6502 multiply routines

Posted: Thu Aug 08, 2024 8:50 am

Joined: Mon Feb 20, 2012 1:46 pm
Posts: 25
Location: America

>Yeah, I would certainly be very interested to see a full 32x32 version.
It will be close. Three 16x16 multiplies in the best case is already 561 cycles, then a lot of adding, say another 160, estimating over 720 cycles.
There is already a 32x32=64 in <750 cycles. It uses the sqr tables but with a variation on the inner columns where most of the adding of partial products is done in the column. In that case, optimized adding is most important. I believe it comes down to
txa:adc #k3:ldx #0:ldy y3:adc (sqr),y:bcc s1:inx:clc s1...sta z+3.
This looks strange, but remember the sqr method subtracts two table look ups. Some adjustments have to be made, but I add the 2's complement so that I add them all in a row. There's some adjustments which creates constants kn. I have to fudge the sqr tables. There's up to 14 adds in a row in the middle columns. There's a total of 64 adds which are averaging <11 cycles each, so the bulk of the routine is <704 cycles.
I haven't published that one yet.
You could also try the sqr formula with larger inputs, or the Knuth method.
Given x=x1*256+x0, y=y1*256+y0
y1 y0
*x1 x0

Calculation Range
k=(y1-y0)*(x1-x0) +-FE01
l=y1*x1 0-FE01
m=y0*x0 0-FE01
n=l+m-k 0-1FC02
l+m 0-1FC02
l-k or m-k +-FE01

x*y=l*65536+n*256+m (for 8 bit Xn/Yn)

Example (16x16 bits):
y1 y0 ->$20 10
*x1 x0 40 30

k=(20-10)*(40-30)=100
l=20*40=800
m=10*30=300
n=800+300-100=a00

llll0000 -> 08000000
nnnnn00 00a0000
mmmm 0300
----------------
2010*4030 = 080a0300

x*y=l*4294967296+n*65536+m (for 16 bit Xn/Yn)
---
In the situation of multiplying negative numbers via shift and add, indeed you can save time by checking if there's a lot of 1 bits in the multiplier then eor #$ff, and fix up later.

Top

repose

Post subject: Re: Comparing 6502 multiply routines

Posted: Thu Aug 08, 2024 10:00 am

Joined: Mon Feb 20, 2012 1:46 pm
Posts: 25
Location: America

I just realized a note I had. Since the carries are noted in X, you can eliminate all the CLC and just sbc id,x at the end, where id(x)=x. In this case, I save 14xCLC or 28 cycles but add 4, still saving 24 cycles. These savings apply to two inner columns of 14 adds and more, until a break even amount of adds is reached in the outer columns.
Put another way, the adds average to about 9.5 cycles each (instead of 11). That makes the bulk of the routine in the 608 cycle range.

Top

BigEd

Post subject: Re: Comparing 6502 multiply routines

Posted: Thu Aug 08, 2024 10:25 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England

Nice optimisations!

Top

Page 4 of 4

[ 48 posts ]

Go to page Previous 1, 2, 3, 4

Board index » 6502.org Users Forum » Programming

All times are UTC

Who is online

Users browsing this forum: Google [Bot] and 3 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum