6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 12:01 am

All times are UTC




Post new topic Reply to topic  [ 48 posts ]  Go to page Previous  1, 2, 3, 4
Author Message
PostPosted: Thu Aug 08, 2024 8:50 am 
Offline

Joined: Mon Feb 20, 2012 1:46 pm
Posts: 25
Location: America
>Yeah, I would certainly be very interested to see a full 32x32 version.
It will be close. Three 16x16 multiplies in the best case is already 561 cycles, then a lot of adding, say another 160, estimating over 720 cycles.
There is already a 32x32=64 in <750 cycles. It uses the sqr tables but with a variation on the inner columns where most of the adding of partial products is done in the column. In that case, optimized adding is most important. I believe it comes down to
txa:adc #k3:ldx #0:ldy y3:adc (sqr),y:bcc s1:inx:clc s1...sta z+3.
This looks strange, but remember the sqr method subtracts two table look ups. Some adjustments have to be made, but I add the 2's complement so that I add them all in a row. There's some adjustments which creates constants kn. I have to fudge the sqr tables. There's up to 14 adds in a row in the middle columns. There's a total of 64 adds which are averaging <11 cycles each, so the bulk of the routine is <704 cycles.
I haven't published that one yet.
You could also try the sqr formula with larger inputs, or the Knuth method.
Given x=x1*256+x0, y=y1*256+y0
y1 y0
*x1 x0

Calculation Range
k=(y1-y0)*(x1-x0) +-FE01
l=y1*x1 0-FE01
m=y0*x0 0-FE01
n=l+m-k 0-1FC02
l+m 0-1FC02
l-k or m-k +-FE01

x*y=l*65536+n*256+m (for 8 bit Xn/Yn)

Example (16x16 bits):
y1 y0 ->$20 10
*x1 x0 40 30

k=(20-10)*(40-30)=100
l=20*40=800
m=10*30=300
n=800+300-100=a00

llll0000 -> 08000000
nnnnn00 00a0000
mmmm 0300
----------------
2010*4030 = 080a0300


x*y=l*4294967296+n*65536+m (for 16 bit Xn/Yn)
---
In the situation of multiplying negative numbers via shift and add, indeed you can save time by checking if there's a lot of 1 bits in the multiplier then eor #$ff, and fix up later.


Top
 Profile  
Reply with quote  
PostPosted: Thu Aug 08, 2024 10:00 am 
Offline

Joined: Mon Feb 20, 2012 1:46 pm
Posts: 25
Location: America
I just realized a note I had. Since the carries are noted in X, you can eliminate all the CLC and just sbc id,x at the end, where id(x)=x. In this case, I save 14xCLC or 28 cycles but add 4, still saving 24 cycles. These savings apply to two inner columns of 14 adds and more, until a break even amount of adds is reached in the outer columns.
Put another way, the adds average to about 9.5 cycles each (instead of 11). That makes the bulk of the routine in the 608 cycle range.


Top
 Profile  
Reply with quote  
PostPosted: Thu Aug 08, 2024 10:25 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Nice optimisations!


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 48 posts ]  Go to page Previous  1, 2, 3, 4

All times are UTC


Who is online

Users browsing this forum: Google [Bot] and 19 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: