All times are UTC
PostPosted: Sat Mar 02, 2019 8:23 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
spiff wrote:
Stumbled into this topic from a web search on Cordic.

BigEd wrote:
Clock frequency 198.4 KHz
Precision 30 bits, including sign
Time for CORDIC operation 5 ms

granati wrote:
In my implementation for 65C816 (precision: 128 bits) a sin/cos with arbitrary argument can be computed in 70/80 ms at 4MHz clock

Cordic: 992 cycles, 33 cycles/bit
granati: 280K cycles, 2190 cycles/bit

Welcome, spiff! I think it's difficult to compare these two cases, because a 128-bit result demands at least 128 bits of precision in each of the steps, and on an 8/16-bit machine each step is going to take quite a few cycles.

The usual approach is polynomials: rational polynomials (or continued fractions) in the faster implementations, and Chebyshev or some other tweaked Taylor-like series in the slower implementations. A pure Taylor series is, I think, not the wisest tactic - it just happens to be the simplest approach which most people come across first. The derivation of the 'best' coefficients is rather deep, as far as I can tell. We had a bit of a look into this over here.


PostPosted: Sat Mar 02, 2019 9:03 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Quote:
The code used a base 10 cordic.
I see one assumes one needs to use base 2 on a computer, but then more steps are needed to gain each additional digit: with base 2, you need about four steps to equal one BCD step.
I'm working on a calculator project and just finished a BCD CORDIC. This site shows how the HP-35 did it, and some folks on the HP forum showed how to implement CORDIC for sine and cosine without having to derive it from an identity involving atan like the first page shows. I ended up saving a lot of cycles this way, since division by 10 is easy in BCD. Shifting one bit to divide by two is slow in BCD, since you still have to correct an 8 shifted into the MSB of every nibble down to a 5. Also, because you reuse every table value ten times, you end up with a smaller table than with division by two.
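To make the decimal-shift idea concrete, here is a minimal Python sketch of base-10 CORDIC in rotation mode. Floats stand in for the calculator's BCD registers, and the function name and structure are mine, not the HP-35's actual microcode:

```python
import math

def cordic10_sincos(theta, digits=8):
    # Base-10 CORDIC, rotation mode: at each decimal position k we
    # rotate by atan(10^-k) as many times as the remaining angle
    # allows, so each table entry is reused up to several times and
    # the "shift" is a divide by 10, which is trivial in BCD.
    x, y = 1.0, 0.0
    scale = 1.0                       # accumulated sqrt(1 + t^2) factors
    for k in range(digits):
        t = 10.0 ** -k
        a = math.atan(t)              # one table entry per decimal position
        while theta >= a:             # use it while the angle still fits
            theta -= a
            x, y = x - y * t, y + x * t
            scale *= math.sqrt(1.0 + t * t)
    return y / scale, x / scale       # (sin, cos); residual ~ 10^-digits
```

Each decimal position reuses the same atan(10^-k) table entry for however many rotations that digit needs, which is why the table stays so much smaller than a binary one.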

Quote:
I've been building a working Maneuver Board calculator with a real I4004. I'm in the middle of the board layout right now.
Cool! Please keep us updated.


PostPosted: Mon Mar 18, 2019 11:58 am 

Joined: Thu Dec 17, 2015 2:52 pm
Posts: 4
Chromatix wrote:
In a BASIC ROM, you need to have subroutines implementing multiply and divide anyway. So from a code size perspective, it costs very little to make use of them. Using a Taylor series (which has a regular structure, so you can loop its generator code) might well end up as smaller code (albeit slower) than implementing CORDIC, and still results in acceptable accuracy if you stick to a reasonable input domain.

True. Code size was probably a more important priority than execution speed back then.

BigEd wrote:
The usual approach is polynomials: rational polynomials (or continued fractions) in the faster implementations, and Chebyshev or some other tweaked Taylor-like series in the slower implementations. A pure Taylor series is, I think, not the wisest tactic - it just happens to be the simplest approach which most people come across first. The derivation of the 'best' coefficients is rather deep, as far as I can tell. We had a bit of a look into this over here.

Thanks for sharing your insights. I know you have spent some time looking at different implementations.

I will try to look further into this and do some experiments of my own...
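Chromatix's point about the Taylor series' regular structure is easy to see in a sketch: each term comes from the previous one with a single multiply and divide, so the generator is just a short loop. Python here for clarity; a 6502 version would loop the same recurrence over its float routines:

```python
def taylor_sin(x, terms=8):
    # sin(x) from its Taylor series, using the term recurrence
    #   term_{n+1} = -term_n * x^2 / ((2n) * (2n+1)),
    # so each extra term costs one multiply and one divide.
    term = x
    total = x
    for n in range(1, terms):
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total
```

As the thread notes, for a fixed input domain a minimax (Chebyshev-style) fit beats these raw coefficients for the same number of terms; the loop above just shows why the pure Taylor version is the smallest code.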


PostPosted: Mon Mar 18, 2019 3:36 pm 

Joined: Mon Sep 17, 2018 2:39 am
Posts: 136
Hi!

spiff wrote:

Chromatix wrote:
In a BASIC ROM, you need to have subroutines implementing multiply and divide anyway. So from a code size perspective, it costs very little to make use of them. Using a Taylor series (which has a regular structure, so you can loop its generator code) might well end up as smaller code (albeit slower) than implementing CORDIC, and still results in acceptable accuracy if you stick to a reasonable input domain.

True. Code size was probably a more important priority than execution speed back then.

BigEd wrote:
The usual approach is polynomials: rational polynomials (or continued fractions) in the faster implementations, and Chebyshev or some other tweaked Taylor-like series in the slower implementations. A pure Taylor series is, I think, not the wisest tactic - it just happens to be the simplest approach which most people come across first. The derivation of the 'best' coefficients is rather deep, as far as I can tell. We had a bit of a look into this over here.

Thanks for sharing your insights. I know you have spent some time looking at different implementations.

I will try to look further into this and do some experiments of my own...


For deriving the coefficients of rational/polynomial approximations, I found that http://sollya.gforge.inria.fr/ is very good. It lets you specify the target floating-point format as a number of bits, so you can use it to generate polynomials appropriate for your specific implementation.

Using it, I derived the best polynomials for my SIN / COS implementation in FastBasic, see https://github.com/dmsc/fastbasic/blob/ ... os.asm#L38 . To save code space, I also fixed the last coefficient to exactly PI/2; this works because I'm calculating SIN( PI/2 * y ). It is very fast, using only 6 multiplications, and produces relative error < 1.23*10^-8.
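As an illustration of the shape of such an implementation (not dmsc's actual sollya-derived coefficients — plain Taylor terms are used as placeholders here, so the error is around 4*10^-6 rather than 1.23*10^-8), a degree-9 odd polynomial for SIN( PI/2 * y ) costs exactly 6 multiplications when evaluated Horner-style:

```python
import math

# Placeholder Taylor coefficients for sin(pi/2 * y); a minimax fit
# from sollya would differ slightly and be more accurate.
C1 = math.pi / 2                     # last coefficient fixed to exactly pi/2
C3 = -C1 ** 3 / 6
C5 = C1 ** 5 / 120
C7 = -C1 ** 7 / 5040
C9 = C1 ** 9 / 362880

def sin_poly(y):
    # sin(pi/2 * y) for y in [-1, 1]: one multiply for y*y,
    # four in the Horner chain, one final multiply by y.
    z = y * y
    return ((((C9 * z + C7) * z + C5) * z + C3) * z + C1) * y
```

Fixing C1 to exactly PI/2 means the stored constant doubles as the argument-scaling factor, which is the code-space trick described above.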


PostPosted: Mon Mar 18, 2019 9:21 pm 

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
A few years ago I built a hobby robot that was programmed in Forth. My Forth implementation was 16-bit integer only, and I needed trig functions for inverse kinematics. I used brads (binary angle measure) with binary scaling to implement pseudo floating point for both CORDIC and lookup tables. CORDIC requires multiple iterations (16 or so) to converge to a reasonable approximation, while a table lookup is an O(1) operation. So I ended up using lookup tables, with a binary search of the tables for the inverse trig functions, as it was much, much faster.

You only need one quadrant of sine samples to build all the trig functions, and linear interpolation between samples lets you further compress the table size. I've been meaning to get this code back into shape for another project, as I am now using a different version of Forth.
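A Python sketch of the scheme, with my own names and scaling (assuming 16-bit brads where a full turn is 65536, a 65-entry quarter-wave table in Q15 fixed point, and a binary search for the inverse):

```python
import math

# One quadrant of sine samples in Q15 fixed point: 64 steps plus
# the endpoint, one sample every 256 brads (full turn = 65536 brads).
TABLE = [round(32767 * math.sin((math.pi / 2) * i / 64)) for i in range(65)]

def isin(brads):
    # Fold the full circle into the first-quadrant table by symmetry,
    # then linearly interpolate between adjacent samples.
    a = brads & 0xFFFF
    quad, frac = a >> 14, a & 0x3FFF
    if quad & 1:                       # quadrants 1 and 3 run backwards
        frac = 0x3FFF - frac
    i, r = divmod(frac, 256)           # table slot, interpolation remainder
    s = TABLE[i] + (TABLE[i + 1] - TABLE[i]) * r // 256
    return -s if quad >= 2 else s      # lower half of the circle is negative

def iasin(v):
    # Inverse sine (0 <= v <= 32767) by binary search of the
    # monotone first quadrant, as described in the post.
    lo, hi = 0, 16384
    while lo < hi:
        mid = (lo + hi) // 2
        if isin(mid) < v:
            lo = mid + 1
        else:
            hi = mid
    return lo                          # answer in brads
```

Because the first quadrant is monotone, about 14 probes of `isin` recover the inverse, versus roughly 16 CORDIC iterations each involving shifts and adds of wide operands.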

