All times are UTC




Posted: Mon Oct 01, 2012 11:02 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
I have posted a video on YouTube where I demo my 10MHz accelerator card (using the 65816).
I had planned to show this demo at the Classic Computing exhibition last weekend, but due to time constraints I could not finish it in time.

http://www.youtube.com/watch?v=ar45QJisxSg&feature=plcp

The accelerator card is not really new; the info has been on my web page for some time. Only the demo is new.
Here is the web page: http://www.6502.org/users/andre/adv65/pet816/index.html

André

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Posted: Mon Oct 01, 2012 11:37 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
That's pretty cool. It's more impressive since it's running in BASIC rather than assembler or Forth.

_________________
Michael A.


Posted: Tue Oct 02, 2012 12:13 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
That is cool. Math and graphics, I like it. 6502, or 65Orgxx controlled (I want to grow old with my mind active in these areas)... Black and white to keep it simple at first. I imagine something similar in a video project we are now tackling, sped up not only by frequency, but also by parallel computing.

Andre, you are one of my inspirations here, if I may be so bold.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Tue Oct 02, 2012 12:46 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Quote:
I imagine something similar in a video project we are now tackling, sped up not only by frequency, but also by parallel computing.

and with the big look-up tables for trig functions hundreds of times as fast.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Oct 02, 2012 1:17 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
GARTHWILSON wrote:
Quote:
I imagine something similar in a video project we are now tackling, sped up not only by frequency, but also by parallel computing.

and with the big look-up tables for trig functions hundreds of times as fast.

Exactly! Your hard work will not go to waste, Garth. Plus, I will need some pointers on how to implement your table within a 65Orgxx core when the time comes. Hopefully sooner rather than later now.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Tue Oct 02, 2012 2:37 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Quote:
plus I will need some pointers, on how to implement your table within a 65Orgxx core, when the time comes.

I think all of it except the smaller bit-reversal tables for FFTs can probably be used as-is on a 16-bit bus too. If you do it with EPROMs, the programmer software could be set to separate high byte from low byte of each cell to put in two separate 8-bit EPROMs that get read simultaneously to provide the whole 16-bit cell in a single read. For those smaller bit-reversal tables for FFTs, you would just have to AND-out the byte you don't want and in every other case shift the desired byte into the 8 lowest bits of a register. I anticipate that the bit-reversal tables will get very little use compared to the other ones though.
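The byte split and the byte selection described above can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and the packing order for byte-wide tables (even-numbered byte in the low half of the cell) is an assumption, not something the tables themselves dictate.

```python
def split_table(cells):
    """Split 16-bit cells into two byte images, one per 8-bit EPROM."""
    low = bytes(c & 0xFF for c in cells)           # low byte of each cell
    high = bytes((c >> 8) & 0xFF for c in cells)   # high byte of each cell
    return low, high


def read_cell(low, high, index):
    """Model both EPROMs being read at once to give the whole 16-bit cell."""
    return (high[index] << 8) | low[index]


def read_packed_byte(cells, byte_index):
    """Fetch one byte from a byte-wide table packed two bytes per cell.

    Assumed packing: even byte_index in the low half, odd in the high half.
    """
    cell = cells[byte_index >> 1]
    if byte_index & 1:
        return cell >> 8      # shift the desired byte into the low 8 bits
    return cell & 0xFF        # AND-out the byte that is not wanted
```

With this layout only every other access needs the shift, which matches the "in every other case" wording above.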

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Oct 02, 2012 5:04 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
Exactly! Your hard work will not go wasted Garth, plus I will need some pointers, on how to implement your table within a 65Orgxx core, when the time comes. Hopefully sooner than later now.

Even faster would be to build some math hardware directly in the FPGA.


Posted: Tue Oct 02, 2012 5:18 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
That would take an awful lot of logic, since it takes a lot of multiplications and divisions to get a single trig or log function.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Oct 02, 2012 5:26 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Not necessarily. For sine/cosine functions, you could use the CORDIC algorithm, for instance. You could also use other tricks geared towards the hardware. Multiplication is not a problem, since there are dedicated 18-bit hardware multipliers.


Posted: Tue Oct 02, 2012 6:02 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8513
Location: Midwestern USA
fachat wrote:
I have posted a video on youtube where I demo my 10MHz accelerator card (using the 65816).

I must say it motors right along. Imagine trying to do that without the '816 card. Good job, André.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Tue Oct 02, 2012 6:44 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Arlet wrote:
Not necessarily. For sine/cosine functions, you could use CORDIC algorithm, for instance. You could also use other tricks, geared towards the hardware. Multiplication is not a problem, since there are dedicated 18-bit hardware multipliers.

I need to learn CORDIC, but the example in Figure 7 is a section of one that runs at 52MSPS in a Xilinx XC4013E-2, which does not seem as fast as you could probably go with tables in RAM with similar hardware. It's 14-bit whereas my tables are 16-bit, so there's not much difference there. It says it's 5-iteration. How would its accuracy compare? The tables are accurate to all 16 bits.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Oct 02, 2012 3:58 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Figure 3 has a 16-bit serial implementation that calculates 16 bits in 8 iterations. Each iteration is one clock cycle. Note that the 4000 series FPGA is ancient stuff. Modern FPGAs have more resources and run a lot faster. But even assuming an easy 100 MHz clock, that means you can calculate a 16-bit sine or cosine in 80 ns. The fastest table lookup is 23 instruction cycles, but you'll have to add a few cycles to wait for the slow external memory.

I don't know about the precision, but it would be easy to do the calculations with a few extra bits, and round off at the end.
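To make the "few extra bits" idea concrete, here is a minimal software sketch of rotation-mode CORDIC in fixed point. It is the plain textbook form that resolves roughly one bit per iteration, not the 8-iteration design from the figure, and the names `cordic_sin_cos`, `iterations` and `frac_bits` are made up for this example. The 20 fractional bits stand in for 16 result bits plus 4 guard bits.

```python
import math


def cordic_sin_cos(angle, iterations=16, frac_bits=20):
    """Return (sin, cos) of angle in radians, for angle in [-pi/2, pi/2].

    Works on integers with frac_bits fractional bits, using only
    shifts and adds inside the loop, as hardware would.
    """
    scale = 1 << frac_bits
    # Precomputed rotation angles atan(2^-i), in fixed point.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # The micro-rotations stretch the vector by a constant gain;
    # start from 1/gain so the result comes out at unit length.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(scale / gain)
    y = 0
    z = round(angle * scale)  # remaining angle still to rotate through
    for i in range(iterations):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y / scale, x / scale
```

A final hardware step would round the 20-bit values down to 16 bits; here the results are simply returned as floats so they can be compared against `math.sin` and `math.cos`.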


Posted: Tue Oct 02, 2012 4:49 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Quote:
that means you can calculate 16 bits sine or cosine in 80 ns. The fastest table look up is 23 instruction cycles, but you'll have to add a few cycles to wait for the slow external memory.

The 6 instructions' 23 clocks is for a 65816 with 8-bit bus. If you make a processor with a 16-bit bus in a fast FPGA and put the tables in RAM, the limitation will be the 8ns (or whatever speed) RAM, and there won't be any shifting or bank-number increment. So it could be done in a single instruction if such an instruction is implemented.

Edit: Without adding another post, I will add that tables can be for any function, and the ones I have supplied are not just trig but also log and antilog in three scales, squares, square root, and inverses (since division is such a huge job, and you can just multiply by the inverse to speed it up), and the bit-reversal tables for FFTs. Other tables could be formed.
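As a rough illustration of multiplying by a tabulated inverse instead of dividing: the sketch below assumes a table of 16-bit scaled reciprocals for divisors 1 to 255. That format, and the names `recip_table` and `divide_by_table`, are assumptions made for the example and not the layout of the supplied tables.

```python
RECIP_BITS = 16

# recip_table[n] holds round(2^16 / n); entry 0 is an unused placeholder.
recip_table = [0] + [round((1 << RECIP_BITS) / n) for n in range(1, 256)]


def divide_by_table(a, n):
    """Approximate a // n with one table lookup, one multiply and one shift."""
    return (a * recip_table[n]) >> RECIP_BITS
```

Because each table entry is rounded, the result can be off by one from the true quotient for 16-bit dividends, so code that needs exact integer division would still need a correction step.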

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Oct 02, 2012 5:57 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Yes, if you dedicate a large chunk of fast SRAM, and add a special instruction, you could save a few cycles. Given that in typical cases, you'd use a lot more cycles actually doing something useful with the results, it would be extremely hard to justify wasting that expensive fast SRAM.


Posted: Wed Oct 03, 2012 8:58 pm

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
There are other ways to generate sines,
especially if you have fast multiplication.

I'm not sure how applicable they are here.
When I've looked at the various sine algorithms,
I've usually been looking at them with the
idea of leveraging a table of squares.
That is, assuming you have a table of squares
for a fast quarter-square multiplication routine,
what else can you do with it?
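For anyone who hasn't seen it, quarter-square multiplication rests on the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4). A minimal sketch for 8-bit unsigned operands follows; the table size and the names are chosen just for this example.

```python
# floor(n^2 / 4) for every possible a + b with 8-bit a and b (0..510).
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def qs_multiply(a, b):
    """Multiply two 8-bit unsigned values with two lookups and a subtract."""
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

The result is exact, because a+b and a-b are either both even or both odd, so the fractions dropped by the two floors cancel.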

I had formed the impression that a sine table
with quadratic interpolation is generally faster
and less resource-intensive than CORDIC
(but I never really compared the two myself).

You can do a quadratic approximation with
table lookups and simple addition, similar to
the simple square root.
You could probably fit that into pipelined bit-serial adders.
I think the tables could be smaller, but the adders
would probably have to be wider (it's been a while,
but I think I figured 24-bit addition to get
16-bit accuracy).
(When I was doing it, I was shooting for 14-bit accuracy
for 256 values over 90 degrees for a table generator.)

Or you can apply your multipliers to the usual
quadratic expression.

There's also series expansion:

Sin(z) = z - (z^3/3!) + (z^5/5!) - (z^7/7!) ...

I think you need about four terms to get 16 bits
(actually more like 20 bits)
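A quick way to check the term count is to sum the series and compare it with a library sine. The sketch below does that in floating point; `taylor_sin` and `max_error` are names invented for the example. By my arithmetic the first omitted term, z^9/9!, is about 3e-7 at z = pi/4, so the 20-bit figure holds if the argument is first reduced to at most 45 degrees, while over the full 0 to 90 degrees four terms give only about 12 to 13 bits.

```python
import math


def taylor_sin(z, terms=4):
    """Sum the first `terms` terms of z - z^3/3! + z^5/5! - z^7/7! + ..."""
    total = 0.0
    term = z
    for k in range(terms):
        total += term
        # Next term: multiply by -z^2 / ((2k+2)(2k+3)).
        term *= -z * z / ((2 * k + 2) * (2 * k + 3))
    return total


def max_error(upper, terms=4, steps=1000):
    """Largest absolute error against math.sin on [0, upper]."""
    return max(
        abs(taylor_sin(upper * i / steps, terms) - math.sin(upper * i / steps))
        for i in range(steps + 1)
    )
```

Calling `max_error(math.pi / 4)` and `max_error(math.pi / 2)` shows the difference between the two ranges directly.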

