... I see the 6309 has a divide instruction, and that it takes a minimum of 25 clocks, so it takes more clocks to do just a divide than the '816 with tables takes to look up a trig or log function as I showed.
(It's worth noting that there are many ways to calculate log and trig, Taylor series being only one and not especially efficient. CORDIC, I think, doesn't use division. So, best not to assume that division is the limiting factor: it's slower than multiplication in every case I've seen, and so the people who design the algorithms for other calculations don't overuse it.)
I really would like to understand CORDIC, but in all my searching, none of the explanations I've found for CORDIC have been clear. I asked a friend who's a math teacher with a bachelor's degree in math, but he couldn't help me either. What I do find is that it still takes a step, with a small table lookup, for each bit or digit as you advance toward an answer; IOW, it won't be nearly as fast as a table lookup. One source said that the 8087 through 80486 used CORDIC, but that the Pentium went back to polynomial evaluation, infamous bug in the first iteration notwithstanding.
For a sufficiently large table, a table will always win, I think. Your own tables might be at the limit of convenience, and they are of course excellent if they fit what's needed, but they use a lot of memory and deliver a limited precision. Any application which needs more precision will need a different approach.
It seems that CORDIC can be thought of as a binary search. So, the tables needed to support CORDIC are very small - about the size of the number of bits you want in the result, not the size of the number of values you expect to compute. The number of operations needed is about the size of the number of bits, and the operations are just shifts and adds, which on our favourite platform are much cheaper than the multiplications needed by polynomial methods.
There's an HP Journal issue which explains the pseudo-division used in the early HP calculators. It's not CORDIC. Oh, but from 1968, we see that the HP-9100 did use CORDIC, if I understand correctly: http://www.hpl.hp.com/hpjournal/pdfs/Is ... df#page=15
I recommend also Ken Shirriff's writeup of the algorithms used in Sinclair's Scientific - it wasn't accurate, but for other reasons: http://righto.com/sinclair
I'm not sure why the Basics I'm aware of use polynomials rather than CORDIC or similar. Is there a good reason?
(Once you have fast multiplication, CORDIC doesn't look so good, which I think is why it isn't used in modern (huge) microprocessors.)
The Wikipedia entry for CORDIC has a fairly clear explanation of how it works under "Modes of Operation". The idea is to start with the point (1,0) and rotate it to (cos(a), sin(a)). Doing that in one step requires knowing what sin(a) and cos(a) are, so it's done in multiple smaller steps.
A single step is done by replacing (x,y) with (x*cos(a)-y*sin(a), x*sin(a)+y*cos(a)). That's just a rotation by angle a around the origin. If we have a table of sin and cos for our chosen angles, and a fast multiplier, we can stop there. By rotating clockwise and anticlockwise by ever-decreasing angles, we will eventually get close enough to the target angle, and we'll have (approximately) its sin and cos.
But CORDIC was invented in the days when digital multipliers were not fast, so we don't stop there. At each step, we divide the rotation through by cos(a), replacing (x,y) with (x - y*tan(a), x*tan(a) + y), and remember to multiply by cos(a) at the end to undo the scaling. Now, cos(-a) = cos(a), so we don't have to worry about the direction of the step when calculating the correction. The step angles are always the same, so we can calculate the product of the corrections in advance and apply them all at once. And because multiplication commutes, you can do the correction at the start instead of at the end.
So instead of starting with (1,0), we start with (c,0), where c is the product of cos(a_n) over all the steps we're going to do. The next improvement is to choose the step angles so that tan(a_n) is a power of two (tan(a_n) = 2^-n), so we can replace the multiplications x*tan(a) and y*tan(a) with shifts.
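To make the above concrete, here is a minimal fixed-point sketch in Python. The constants (16 fractional bits, 16 iterations) are my own choices for illustration, not anything from the thread:

```python
import math

FRAC = 16   # fixed-point fractional bits
N = 16      # iterations: roughly one bit of result per step

# Small table: the step angles a_n with tan(a_n) = 2^-n, scaled to fixed point
ANGLES = [int(round(math.atan(2.0 ** -n) * (1 << FRAC))) for n in range(N)]

# c = product of cos(a_n): start at (c, 0) so no correction is needed at the end
C = 1.0
for n in range(N):
    C *= math.cos(math.atan(2.0 ** -n))

def cordic_sin_cos(angle):
    """Rotation mode: drive the residual angle z toward zero with shifts and adds."""
    z = int(round(angle * (1 << FRAC)))
    x, y = int(round(C * (1 << FRAC))), 0
    for n in range(N):
        if z >= 0:                      # rotate anticlockwise
            x, y, z = x - (y >> n), y + (x >> n), z - ANGLES[n]
        else:                           # rotate clockwise
            x, y, z = x + (y >> n), y - (x >> n), z + ANGLES[n]
    return y / (1 << FRAC), x / (1 << FRAC)   # (sin, cos)

s, c = cordic_sin_cos(0.5)
```

Note that the table is just N entries of atan(2^-n) — the size of the number of result bits, not the size of the input domain — and the inner loop is nothing but shifts and adds.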
If you've got lots of memory, a lookup table will be faster. But that does need a lot of memory if you want high accuracy, doubling with each bit of the input. CORDIC's advantage is that it works with a small table of angles and no multiply. It's designed for low hardware resources. There are still places where it makes sense, but not as many as there were 40 or 50 years ago.
If you've got fast multiplies, polynomial-based methods will be better. And while division is generally slower than multiplication, these days it's not that slow. The last time I looked, people were using rational polynomial methods - but that was about 15 years ago.
When I needed fast sin and cos on the PS2 (with fast multiplies), I went with a table-based range reduction step that reduced the angle to only a few degrees, where you can use a much shorter polynomial. It was accurate, fast, and used very little memory.
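Not the PS2 code, of course, but a Python sketch of that shape of solution; the table size of 256 entries is an arbitrary choice here:

```python
import math

# Split the angle into a coarse part (table lookup) and a small residual b,
# then use sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b) with short polynomials
# for the residual: |b| <= pi/256, so two terms each are plenty.
STEPS = 256
SIN_T = [math.sin(2 * math.pi * i / STEPS) for i in range(STEPS)]
COS_T = [math.cos(2 * math.pi * i / STEPS) for i in range(STEPS)]

def fast_sin(a):
    t = a / (2 * math.pi) * STEPS
    i = int(round(t)) % STEPS                     # nearest table entry
    b = (t - round(t)) * (2 * math.pi / STEPS)    # residual, |b| <= pi/256
    sb = b - b * b * b / 6.0                      # short polynomial for sin(b)
    cb = 1.0 - b * b / 2.0                        # short polynomial for cos(b)
    return SIN_T[i] * cb + COS_T[i] * sb

err = max(abs(fast_sin(k * 0.01) - math.sin(k * 0.01)) for k in range(700))
```

Because the residual is so small, the truncated polynomials contribute less error than single-precision rounding would.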
Table 1. Characteristics of the CORDIC II, model B computer.

  Clock frequency                       198.4 kHz
  Number representation                 Binary, two's complement
  Precision                             30 bits, including sign
  Computation                           Bit-serial
  Time for addition                     156.25 µs
  Time for multiplication or division   5 ms
  Time for CORDIC operation             5 ms
  Performance                           6400 additions per second; 200 CORDIC
                                        operations, multiplications, or
                                        divisions per second
What jumps out is that the log and trig operations come out as the same cost as multiplication or division - that's a huge step forward.
I really don't understand this interest in the CORDIC algorithm, even for little CPUs like the 6502 and its family. If you look at a serious math library, you can see that some functions are computed using Newton's method (square root, cube root) and many others using rational approximation (rational rather than polynomial, because even with good argument reduction the interval cannot be made small enough for a short polynomial), sometimes mixed with lookup tables.
And rational approximation can be used for any function, elliptic functions too, at arbitrary precision, and independently of how the basic floating-point operations are implemented (look at the Remez algorithm for computing the coefficients).
In my implementation for the 65C816 (precision: 128 bits), sin/cos of an arbitrary argument can be computed in 70-80 ms at a 4 MHz clock, with 34 significant digits (relative error < 1e-34).
The code can be used at any precision (32 bits, 64 bits, 80 bits); you just need to adapt the four basic operations, the binary exponent, and the mantissa rounding.
I applaud your diligence in writing such a library, Marco!
But it still seems to me that CORDIC must have some advantage - it was, after all, invented and used by people who knew about Taylor series and Newton's method.
Hi BigEd,
my opinion is: the CORDIC algorithm was good before the birth of the 8-bit microprocessor, when it was easy to implement a BCD shifter and adder in hardware. But even an 8-bit micro has a binary shifter and adder, so it's easy to implement fixed-point multiply and divide, and that's all we need to implement a basic floating-point package. It's true that good rational approximations of any function, with a specified error, have been known for a long time, but before true microprocessors they were not easy to implement in software, so pocket calculators were based on CORDIC.
Pocket calculators, in the past, were based on BCD because of its good behaviour in financial applications, because it's easy to display decimal digits, and because, in the end, a full BCD adder is easy to implement (the shifter too: 4 bits shifted at a time).
The onerous part of binary numbers is the conversion to decimal (scaling by powers of ten), but today that's easy on any little microprocessor. I think CORDIC is good for implementing the algorithms in hardware, while rational approximation is best for implementing them in software.
The best arbitrary-precision math libraries use just binary shift, add, and two's-complement integer operations to implement the basic floating-point operations (even when a hardware FPU and/or integer multiply/divide is available in the CPU), and rational functions for the approximations.
Thanks for the additional links. Although I've seen the explanations at Wikipedia for example, I'm sure I've been hurrying through them too much, always feeling like I have too much to do to really spend the time. Although I could just use it as a black box and drop the numbers in the chute and turn the crank, I want to understand the insides, and I probably need to just turn off everything and lie on the floor with a quadrille pad and pencil and a big eraser and draw circles and triangles and so on and derive it for myself—unless granati or anyone else convinces us that there's really no point (which he very well might).
Memory is cheap today for the 16-bit look-up tables, but as someone said, the price goes up quickly for 24-bit or higher. My 2MB of mostly 16-bit tables turns into half a gigabyte for 24-bit, which in DRAM is still just a few bucks today but it's not as simple or fast to read as SRAM is. One possibility to get more bits is to have 65,536 cells of 24 bits each and then go back to interpolation between those to get more than 16 bits' precision. It wouldn't be as accurate as having a pre-computed answer for each of 16,777,216 (ie, 2^24) inputs, but it should give perhaps six more bits, depending on the function and what part of the range you're on. To improve the speed of the interpolation, the slope between each pair of points could also be pre-computed and stored in an accompanying table. These are just musings though. I don't plan to go to that extent.
I found many years ago that certain functions can be interpolated surprisingly accurately with a small table. Base-2 log (which can then be easily converted into any other base) only needs a 9-entry (ie, 2^3+1) table to get a maximum error that's like having the input off by 0.2%, and IIRC, the error is cut in four each time you double the size of the table after that. The 0.2% is way more than accurate enough for decibel calculations for audio work.
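A quick Python check of that kind of scheme (using my own 9-entry table of log2(1 + i/8), not Garth's actual data):

```python
import math

# 9-entry table: log2 of 1 + i/8 for i = 0..8
TABLE = [math.log2(1.0 + i / 8.0) for i in range(9)]

def log2_interp(x):
    m, e = math.frexp(x)        # x = m * 2^e with m in [0.5, 1)
    m, e = m * 2.0, e - 1       # renormalize so m is in [1, 2)
    t = (m - 1.0) * 8.0         # position within the table
    i = min(int(t), 7)
    return e + TABLE[i] + (t - i) * (TABLE[i + 1] - TABLE[i])

# worst-case linear-interpolation error over one octave
err = max(abs(log2_interp(1.0 + k / 1000.0) - math.log2(1.0 + k / 1000.0))
          for k in range(1000))
```

The worst-case error works out to about 0.0028 in the result, which is what you'd get from an input off by roughly 0.2% (2^0.0028 ≈ 1.002), consistent with the figure above.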
ATAN for up to 45° can be interpolated from a 17-entry (ie, 2^4+1) table, getting a maximum error of about .02°. The 16-bit table entries are slightly tweaked which sacrifices a little accuracy on them to get it back between them, reducing the maximum error. Worst-case resolution (which is near 0) is about .007°.
SIN & COS can be calculated to 16 bits' accuracy using the first four terms of the infinite series, with coefficients tweaked a bit to compensate for the fact that you're using only four. Doing this and then doing a lot of spot-checking, what I have found is that they're usually right on, and when there's an error, it's never more than 1 LSB. I used a series of multiplications and divisions for this, but I think I could eliminate the divisions. The way to do that is that you multiply a 16-bit number by a scaled inverse (which you can do if you're dividing by a constant so you can calculate that inverse at compile time) to get a 32-bit output, and then throw away the low 16 bits of the result, keeping the high 16 as your answer (or make it 24 and 48 bits, or whatever size you need).
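The divide-elimination trick in the last sentence, sketched with Python integers; the divisor 120 is just an example (5!, one of the series denominators):

```python
D = 120                           # example constant divisor, known at "compile time"
RECIP = round((1 << 16) / D)      # scaled inverse of the constant

def div_by_120(n):
    # 16x16 -> 32-bit multiply; keeping only the high 16 bits is the divide
    return (n * RECIP) >> 16

# check against a true divide over the whole 16-bit input range
worst = max(abs(div_by_120(n) - n // D) for n in range(1 << 16))
```

Over the full 16-bit range the result can differ from a true divide by 1 in the last place, which matches the "never more than 1 LSB" observation; add guard bits to the reciprocal if that matters.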
Garth makes a good point. If you have a specific application and you know just how accurate your results need to be, you can custom-tailor something that is far more economical than a general-purpose solution.
Mike B.
Last edited by barrym95838 on Thu Dec 19, 2019 1:42 am, edited 1 time in total.
Absolutely! I just checked the source of Elite - the 3d space-trading game - and there's a 32-element lookup table to help with arctan, and another for sine. Because 8 bits is enough in that case, and both time and space are tight. Audio, on the other hand, is one of the more demanding applications.
But of course it's general-purpose calculations which tend to be done at 24 bits or better precision, where tables alone become unwieldy. As Garth says, you can interpolate - and that doesn't have to be linear - and indeed you don't necessarily need evenly spaced elements in the table.
My personal interests lead me to algorithms rather than tables.
CORDIC is pseudo-division. It may be a binary search, but not necessarily; it is, however, successive approximation.
The ideas explained here can be extended to implement other elementary functions such as sin(x) or arctan(x); the resulting algorithms are similar to CORDIC methods, descriptions of which can be found in many places.
The drawback I can see is that standard CORDIC uses fixed-point, not floating-point, arithmetic. I'd be interested in seeing an 8-bit floating-point implementation of CORDIC.
In the '80s, would CORDIC have used more space in a BASIC ROM?
I'd posted here a while back. Since then I've taken on a related project. It isn't for the 6502 but still interesting.
In my searches for code that ran on the I4004, I came across some code written by one of Gary Kildall's students while at the NPS in Monterey, CA. Finding any code related to the I4004 has been really hard. It was what is called a Maneuver Board. These are used on ships to determine the "Closest Point of Approach". This is, of course, important if you don't want to run into some other ship. It involves determining sin() and cos() of relative positions if everything keeps going on its current vector.
The code used a base-10 CORDIC.
One assumes one needs to use base 2 on a computer, but each additional digit then needs more steps: with base 2, you need about 4 steps to equal one BCD step. On a computer, it might make more sense to use something like base 2^8 instead. I believe that increases the table size, but that doesn't seem to be as much of a problem anymore.
Back to the I4004 project. I spent a couple of months debugging the code. It was a PDF of what looked like code typed on an ASR33. It had poor drum registration, such that the number 0 and the letter C often looked the same. Other characters missing their right quarter were not as much of an issue. P and F were tough, but in context I was able to figure them out; P isn't a hex digit.
I've been building a working Maneuver Board calculator with a real I4004. I'm in the middle of the board layout right now.
Dwight
The big feature of CORDIC is that it can be efficiently implemented using a table lookup and adders. This means that on low-cost hardware lacking hardware multiply or divide, you can do trigonometry roughly as fast as multiply or divide operations, which will themselves be slow for lack of dedicated hardware. I note that hardware multiply was uncommon in microprocessors until at least the late 1980s; some CPUs (e.g. the 6809) had instructions to do a multiply or divide faster than a software loop, but still did it iteratively using an adder and shifter.
In a BASIC ROM, you need to have subroutines implementing multiply and divide anyway. So from a code size perspective, it costs very little to make use of them. Using a Taylor series (which has a regular structure, so you can loop its generator code) might well end up as smaller code (albeit slower) than implementing CORDIC, and still results in acceptable accuracy if you stick to a reasonable input domain.
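A Python sketch of what "loop its generator code" means for sin: each term comes from the previous one with one multiply and one divide, so no table of coefficients is needed (range reduction to a modest |x| is assumed):

```python
import math

def sin_taylor(x, terms=6):
    """Taylor series for sin via its term recurrence; assumes |x| already reduced."""
    term = x
    total = x
    for k in range(1, terms):
        # each new term is the previous one times -x^2 / (2k * (2k+1))
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total
```

In a BASIC interpreter the loop body would call the ROM's existing floating-point multiply and divide routines, so the whole thing costs only a few dozen bytes of code.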