Somehow I missed this when it was posted.
I have a couple of comments.
GARTHWILSON wrote:
Miles J. wrote:
BigEd wrote:
A simple MUL can be dropped in, I think. Division seems to be difficult, in the sense that it's a multi-cycle operation and there's no drop-in hardware for it, so that's less likely unless someone pops up. A division-step instruction might be more likely.
Okay, no problem. For now I will use a little subroutine for division when needed. MUL would be more useful anyway, I guess (e.g. for calculating the pixel address inside a window with arbitrary size).
Don't forget you can use the big math tables at
http://wilsonminesco.com/16bitMathTables/index.html, if you can spare the I/O for them, or load them into RAM. One of the math tables I provide is for inverting, so to divide, you can multiply by the inverse. The input number is 16 bits, and the inverse is 32-- not that you have to use all 32, but it lets you get 16-bit resolution and accuracy across the entire range.
For fast multiplication, you can speed it up with the multiplication table which goes to 255x255, or perhaps better, the table of squares which has 16-bit input and 32-bit output, and consider that:
(a+b)² = a² + b² + 2ab
so if you solve for a*b, the multiplication becomes:
ab = ( (a+b)² - a² - b² ) / 2
meaning it is reduced to an addition, three squarings (from the table), two subtractions, and a right shift.
These two particular tables are unsigned.
The quarter-square multiply (usually sourced from cbmhacking, don't remember which one)
and attributed to George Taylor (who attributed the idea to some one in Australia)
uses AB=((A+B)²-(A-B)²)/4
The really clever bit is that the A-B (as an addition) and the division by 4 are built into
the tables.
The A+B and the A-B are done by indirect indexing.
Once B and -B are set up in zp it's just a 16 bit subtraction (for an 8*8 multiply)
(so if B is constant the overhead of setting up B can be amortized over several multiplications)
For division I've been considering a modified Goldschmidt's algorithm.
roughly:
Code:
r=n*1/n (8 bit reciprocal from a table)
p=(r-1)² (presumably the squaring could be sped up with any
quarter-square mutiply squares table you happen to have handy)
e=2-r
r=e
iterate on
(
e=e*p (e is an error term)
r=r+e
)
end up by scaling r by the 8 bit reciprocal from the table
r=r*1/n = reciprocal n
n*1/n ends up being close to 1
so (r-1)² has leading zeros you don't have to multiply
(8 bits of reciprocal may not be enough though)
e is an increasingly small error term lots of leading zeros there
but it would involve scaling