TobyLobster wrote:
I have exhaustively tested and compared the performance and memory use of over 60 integer multiplication routines from the internet. (EDIT: Now over 120 routines)
I also explore the algorithms and techniques behind them:
https://github.com/TobyLobster/multiply_testThere was more to say on this subject than I thought when I started, so it was interesting to explore. If there are other good or interesting multiply routines out there let me know!
Hi,
I was pleased to find your comparison, and I must say thrilled to see that I am the world record holder
I have improved my published version; the fastest unsigned 16x16=32 is now 194.1 cycles. I'll publish it on codebase when I can. I've also updated my previous entry. I estimate that using a 128k table (C64 REU) would take 134 cycles.
I could easily make versions to claim the top spot of signed, partial results etc.
I'm also working on x^2 (about 160), 32x32=64 (about 800), and 7x8 (about 30). I'm putting all these into an integrated library.
There's many more ways of multiplying, including trig identities, knuth, karatsuba, nybble, residue, table, tree, jmp, and convolutions.
Could you add integer division and floating point as well?
formula based mults
1 - sin(a)*cos(b)=(sin(a+b)+sin(a-b))/2
2 - cos(a)*cos(b)=(cos(a+b)+cos(a-b))/2
3 - sin(a)*sin(b)=(cos(a-b)-cos(a+b))/2
4 - a*b=exp(log(a)+log(b))
5 - a*b = [(a + b)² - (a - b)²]/4
Knuth mult
k=(a-b)*(c-d)
l=a*c
m=b*d
n=l+m-k
x*y=l*256+n*16+m (for 8 bit a/b/c/d)
Residue uses modulo numbers, which can reduce the multiplies to trade for conversions. Convolution reduces multiplies for a large number of additions. Nybble trades for a large number of tables and additions. Tree selects code fragments for each bit of the multiplier. Preprocessing can speed up any shift and add. Stats can optimize for specific ranges.