drogon's algo describes a 32x32=32 multiply, while the OP is looking for a 32x32=64 one. So you need to compute all the multiplications:
first round W*PQRS
second round V*PQRS
third round U*PQRS
fourth round T*PQRS
and add them shifted correctly
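A rough sketch of those four rounds in Python, in case it helps. This is not drogon's code; the name mul32x32_64 is made up for illustration. Each round multiplies the whole of PQRS by one byte of TUVW and is shifted left by one more byte than the round before:

```python
def mul32x32_64(pqrs, tuvw):
    """Long multiplication of two 32-bit values, one round per byte of tuvw."""
    result = 0
    for rnd in range(4):                      # round 0 = W, 1 = V, 2 = U, 3 = T
        digit = (tuvw >> (8 * rnd)) & 0xFF    # pick out one byte of TUVW
        partial = digit * pqrs                # up to 40 bits wide
        result += partial << (8 * rnd)        # shift by one byte per round
    return result                             # always fits in 64 bits


print(hex(mul32x32_64(0x12345678, 0x9ABCDEF0)))
```

On a 6502 each "digit * pqrs" would itself be built from four 8x8 multiplies; Python's big integers hide that step here.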
so you just split the input numbers into "x" sized chunks, multiply each part of the first number with each part of the second number, and then add them together somehow.
but could you be more specific about how you shift them "correctly" before adding them to the final result? does the amount you shift depend on the width of the MUL operation or the input values?
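For what it's worth, a small sketch that suggests the answer: the shift depends only on the chunk width and on which chunk positions are being multiplied, not on the input values. The function and parameter names (mul_chunks, width, chunks) are hypothetical, just for illustration:

```python
def mul_chunks(a, b, width=8, chunks=4):
    """Multiply by splitting both inputs into `chunks` pieces of `width` bits."""
    mask = (1 << width) - 1
    result = 0
    for i in range(chunks):                   # chunk i of a, lowest first
        for j in range(chunks):               # chunk j of b, lowest first
            part = ((a >> (width * i)) & mask) * ((b >> (width * j)) & mask)
            # The shift comes only from the positions i and j and the
            # chunk width, never from the values being multiplied.
            result += part << (width * (i + j))
    return result


# Same product whether the inputs are cut into 4 bytes or 2 words.
print(mul_chunks(0x12345678, 0x9ABCDEF0) ==
      mul_chunks(0x12345678, 0x9ABCDEF0, width=16, chunks=2))
```

So with an 8x8 multiplier the partial product of byte i and byte j lands i + j bytes up from the bottom of the result.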
from what i can see in drogon's code, each round adds (without carry) to the final result and also each round gets shifted by one additional byte in the result. so each addition to the result overlaps by 1 byte...
except the last operation writes outside of the regW variable by 1 byte (16 bit STA to regW+3 writes to regW+3 and regW+4), is that intentional or am i misreading the code?
regW here is 64 bits wide, but that code is "optimised" for a 32x32 multiply to give a 32-bit result, so it's truncating the last (overflow/carry) byte.
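To illustrate why the truncation is safe, here is a hedged Python sketch (not the 6502 code itself; mul32x32_32 is a made-up name). Any partial product whose byte positions add up to 4 or more lands entirely above bit 31, so it can be skipped when only the low 32 bits are wanted:

```python
def mul32x32_32(a, b):
    """Low 32 bits of a*b, skipping partial products that cannot affect them."""
    result = 0
    for i in range(4):                        # byte i of a
        for j in range(4 - i):                # only i + j <= 3 lands below bit 32
            part = ((a >> (8 * i)) & 0xFF) * ((b >> (8 * j)) & 0xFF)
            result += part << (8 * (i + j))
    return result & 0xFFFFFFFF                # drop whatever spilled past byte 3


print(hex(mul32x32_32(0x12345678, 0x9ABCDEF0)))
```

That is 10 of the 16 byte multiplies, which is presumably where the saving in the 32-bit version comes from.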
The way it works is the same as my school-days long multiplication. There it was Hundreds, Tens and Units; here it's, er, bytes (as I have a "hardware" 8x8 lookup table multiplier).
; 32-32 multiply of
;     P Q R S
;   x T U V W
PQRS and TUVW are 32-bit hex numbers - with each 'digit' being $00 through $FF, so it might be: $12345678 where P = $12, Q = $34, R = $56 and S = $78. (ie. each letter is a byte)
Here I start with 'regW' being all zeros. Multiply P Q R S with W and put the result into regW, so
regW[0] = W * S
regW[1] = W * R + carry
regW[2] = W * Q + carry
regW[3] = W * P + carry
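The first round above might look like this in Python. It is only a sketch: first_round is a hypothetical name, and "carry" is read here as the high byte of the previous 8x8 product (plus anything the addition overflowed), which is an assumption about the original macro:

```python
def first_round(p, q, r, s, w):
    """regW[0..4] = W * PQRS, one byte at a time, least significant first."""
    reg_w = [0] * 8                           # 64-bit result, byte 0 is lowest
    carry = 0
    for k, digit in enumerate((s, r, q, p)):  # S first, then R, Q, P
        t = w * digit + carry                 # 8x8 product plus carry, fits in 16 bits
        reg_w[k] = t & 0xFF                   # low byte stays in this column
        carry = t >> 8                        # high byte carries into the next column
    reg_w[4] = carry                          # only matters for the 64-bit result
    return reg_w


# $12345678 times W = $F0
print([hex(x) for x in first_round(0x12, 0x34, 0x56, 0x78, 0xF0)])
```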
For the next line, er, move up a place, like putting a zero down under the S,W column in decimal long multiplication; here that means starting one byte to the left.
regW[1] = regW[1] + V * S
regW[2] = regW[2] + V * R + carry
regW[3] = regW[3] + V * Q + carry
At this point, in that code, I abandon the final step as it doesn't contribute to a 32-bit result, but if I needed the 64-bit result, I'd do the last stage:
regW[4] = regW[4] + V * P + carry (noting that the add would be redundant, but I use a macro and I zero'd regW to start with)
And so on.
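Putting all four rounds together for the full 64-bit result, as a hedged Python sketch rather than the actual 6502 routine (mul_bytes is a made-up name). The only multiply used is 8x8, standing in for the lookup-table multiplier:

```python
def mul_bytes(a, b):
    """64-bit product of two 32-bit numbers using only 8x8 multiplies."""
    x = a.to_bytes(4, "little")               # S, R, Q, P
    y = b.to_bytes(4, "little")               # W, V, U, T
    reg_w = [0] * 8                           # zeroed before the first round
    for j in range(4):                        # one round per byte of TUVW
        carry = 0
        for i in range(4):
            # existing column + 8x8 product + carry never exceeds 0xFFFF
            t = reg_w[i + j] + x[i] * y[j] + carry
            reg_w[i + j] = t & 0xFF
            carry = t >> 8
        reg_w[j + 4] = carry                  # top byte of this round, still zero until now
    return int.from_bytes(bytes(reg_w), "little")


print(hex(mul_bytes(0x12345678, 0x9ABCDEF0)))
```

Each round starts one byte further up (index i + j), which is the "putting a zero down" step from long multiplication.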
This:
https://www.mathsisfun.com/numbers/mult ... -long.html has a sort of animated version of decimal long multiplication; rather than decimal digits 0-9 we're using byte 'digits' 0-255 ($00-$FF).
I think your diagram is on its side.
-Gordon