6502.org Forum  Projects  Code  Documents  Tools  Forum
All times are UTC




Posted: Sat Apr 20, 2013 4:23 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I recently posted a note on the progression of speeds of various 6502 implementations, telling a story which runs from the 1MHz original NMOS chips to today's FPGA cores which promise 100MHz operation.

I got myself a bit confused in the process, and realise that the topic of the fastest FPGA cores needs a bit more space and some more facts.

As far as I'm aware, we have two cores which promise speeds over, say, 50MHz: Arlet's core (and the 65Org16 derivatives) and Michael's core (a 65C02 workalike). If anyone knows of additional cores running over 50MHz, please let me know and I'll edit this head post.

There are two immediate caveats: firstly, that the different FPGA chips have different intrinsic speed. We've seen that a Spartan 6 might reach speeds about double that of a Spartan 3, for example. Secondly, that single-cycle memory access is limited to on-chip memory, and the cheapest and non-BGA FPGAs don't usually have as much as 64kbytes of block RAM - so off-chip memory is likely to be necessary for many projects.

I don't think we've yet seen a cache on FPGA, so those designs which use off-chip RAM are probably going to hit multi-cycle access times for all RAM access. In principle we can easily use on-chip RAM for zero-page and stack, but I don't think anyone here has yet done that in a design which has off-chip RAM too. (Please correct me if I'm wrong) [But note, Michael does report that additional multiplexing and decode for mixed on and off chip RAM cost a little speed - so something has been tried.]

As I understand it, Michael's core originally had a target clock speed of 100MHz (and offers some cycle-count reductions). But notably, the core runs at this speed only when using distributed RAM, of which there is usually less than block RAM; with block RAM the core needs a wait state, effectively halving the performance. However, all the builds and timing analysis reported for this core are for the Spartan 3 (the higher speed grade, -5), so we could expect greater things from Spartan 6. The github repo presently focusses on the problem of an FPGA using external memory, not on the fastest core in a self-contained system, and is still a work in progress. More recent releases seem to be slower implementations than the earlier ones: I think the latest reports show the core running at 64MHz with a 4-cycle memory access, making it an effective 16MHz and allowing the use of relatively slow external memory. I'm not aware of any reports on the forum of this core running at speed or in a system; I will update if any come to light.
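
To make the arithmetic above concrete, here is a minimal Python sketch of the "effective speed" figures, assuming the simplest possible model in which every memory access costs the same fixed number of clock cycles. The function name is made up for illustration and is not taken from any of the cores.

```python
def effective_mhz(clock_mhz, cycles_per_access):
    """Effective access rate in MHz when every memory access takes
    the same number of clock cycles (a simplifying assumption)."""
    if cycles_per_access < 1:
        raise ValueError("an access takes at least one cycle")
    return clock_mhz / cycles_per_access

# 64MHz core with a 4-cycle memory access, as in the latest reports
print(effective_mhz(64, 4))
# 100MHz target with one wait state (2 cycles per access) on block RAM
print(effective_mhz(100, 2))
```

A real system would mix access types, so this is only an upper-level estimate, not a benchmark.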

On the other hand, Arlet's core, at least in the 65Org16 flavour, is reported as running at 100MHz - on Spartan 6 I believe - and has been connected to fast 6ns (167MHz) external SDRAM. (Does that mean single-cycle access?) I'd welcome clarification from Arlet or EEye on what's been seen on boards: there have been impressively long threads but it's difficult to distill it to a performance summary. I think I'm right in saying that there are two board designs?

(For my own part, I've reported 50MHz with an Arlet-type core, on Spartan 3. I have a Spartan 6 board but haven't made use of it yet)

I'll update this post as information comes to light, so it can serve as a reference.

Code:
Core:   Clock:  FPGA type:       Memory access:  Memory type/speed:  Relevant threads or posts:
Arlet    45MHz  Spartan 3        ??              SRAM                http://forum.6502.org/viewtopic.php?f=10&t=2644
M65C02   74MHz  Spartan 3A (-4)  4 cycles        Async SRAM          http://forum.6502.org/viewtopic.php?t=2163&start=132
bc6502  117MHz  Spartan 6 (-3)   ??              ??                  http://forum.6502.org/viewtopic.php?t=2500&start=42


Last edited by BigEd on Sun Aug 25, 2013 7:50 pm, edited 1 time in total.

Posted: Sat Apr 20, 2013 6:23 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
With 10 ns SRAM, you can do a write in 1 cycle, and a read in 2 cycles (using 100 MHz clock). With SDRAM it's a bit more complicated. After you activate a row, you can do random access within that row in 2 or 3 cycles (depends on CAS delay setting). Switching to a different row takes a few cycles more. The size of the row depends on the SDRAM, but is typically something like 512 words, and you can keep 4 arbitrary rows activated. Once every 64 us or so, you need to interrupt for a refresh, which takes a few cycles.
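
The SDRAM behaviour described here can be sketched as a small Python cost model. This is only an illustration under stated assumptions: the address split (column, then bank, then row) and the 4-cycle row-switch penalty are invented values standing in for "a few cycles more", and refresh is ignored entirely.

```python
WORDS_PER_ROW = 512      # typical row size mentioned above
NUM_BANKS = 4            # four rows can stay activated, one per bank
CAS_CYCLES = 3           # 2 or 3, depending on the CAS delay setting
ROW_SWITCH_PENALTY = 4   # assumed value for "a few cycles more"

def split_address(addr):
    """Hypothetical mapping: low bits select the column, the next
    two bits the bank, and the remaining bits the row."""
    column = addr % WORDS_PER_ROW
    bank = (addr // WORDS_PER_ROW) % NUM_BANKS
    row = addr // (WORDS_PER_ROW * NUM_BANKS)
    return bank, row, column

def read_cycles(addresses):
    """Total cycles for a sequence of reads, ignoring refresh."""
    open_rows = {}  # bank -> currently activated row
    total = 0
    for addr in addresses:
        bank, row, _ = split_address(addr)
        if open_rows.get(bank) != row:
            total += ROW_SWITCH_PENALTY  # activate (or switch) the row
            open_rows[bank] = row
        total += CAS_CYCLES
    return total

# 16 reads within one row versus 16 reads that each hit a new row
sequential = read_cycles(range(0, 16))
scattered = read_cycles([i * WORDS_PER_ROW * NUM_BANKS for i in range(16)])
print(sequential, scattered)
```

With these assumed numbers the scattered pattern costs roughly twice as much as the sequential one, which is the access-pattern dependence described above.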


Posted: Sat Apr 20, 2013 6:35 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Also, the Devboard had the SDRAM. The Parallel Video Board has the Synchronous RAM, which is similar to the FPGA blockRAM. Eventually, I would like to experiment with the 65Org16 running a program from this RAM, and see whether it really acts exactly like a blockRAM. I believe Arlet's core has an issue when using external asynchronous RAM.

Although the synchronous RAM is significantly more expensive than the SDRAM, about 10x per part IIRC, I built the PVB with top performance in mind.

_________________
65Org16: https://github.com/ElEctric-EyE/verilog-6502


Posted: Sat Apr 20, 2013 6:45 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
To attach my (or any) core to external memory, you need some sort of memory controller. One thing you need to pay attention to is back-to-back write cycles. Most SRAM datasheets are not really explicit about what happens if you leave WE asserted for multiple cycles, while changing address/data signals. There are various solutions: (1) insert a dummy (or read) cycle between two write cycles, (2) use a DDR flipflop to only assert WE for half a cycle if SRAM timing allows it, or (3) just do it anyway. The first solution works pretty well for the 6502, since in most cases, the core doesn't generate back-to-back write cycles anyway. I'm thinking that solution #3 should work too.
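
Solution (1) can be sketched in Python as a pass over a list of bus operations. This is a behavioural illustration only, not code from any memory controller; the tuple format and the function name are assumptions.

```python
def insert_dummy_cycles(bus_ops):
    """Return a new list of bus operations in which no two writes
    are adjacent: an 'idle' cycle is placed between them so WE can
    be released. Each op is a ('read'|'write', address) tuple."""
    scheduled = []
    for op in bus_ops:
        if scheduled and scheduled[-1][0] == "write" and op[0] == "write":
            scheduled.append(("idle", None))
        scheduled.append(op)
    return scheduled

# Pushing a return address (e.g. during JSR on a 6502) writes two
# stack bytes in consecutive cycles.
ops = [("read", 0x8000), ("write", 0x01FF), ("write", 0x01FE), ("read", 0x8001)]
for op in insert_dummy_cycles(ops):
    print(op)
```

Because back-to-back writes are rare, the extra cycle costs little on average, which matches the point made above.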


Posted: Sat Apr 20, 2013 6:47 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Arlet: I think you've hooked up your core with such an SDRAM controller, with success (using your core which somehow avoids back-to-back writes?). What speed did you run at and with what RAM IC? What was the instruction rate (or similar measure of the average effect of the multi-cycle accesses)?

EEye: thanks for confirming the two boards. I see a BOM for the Dev Board here ($5 16Mx16 SDRAM 167MHz MT48LC16M16A2P-6A:D) and for the Video Board here ($60 2Mx18 SyncRAM 133MHz CY7C1463AV33-133AXC and $160 4Mx18 SyncRAM 5.5ns GS8640Z18GT-300I). What speed do you run the core at, and how many cycles does a memory access take, on these two systems?

Cheers
Ed


Posted: Sat Apr 20, 2013 7:12 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Because the SDRAM is synchronous, it supports back-to-back write cycles. I've run SDRAM at 100 MHz (I've used a few different Micron parts, but they should all work fine). I never ran code from SDRAM, just data, so I can't say what the effective instruction rate was. For data, writes are fast because you can make a mini write buffer for 1 byte that'll accept the data in 1 cycle. Reads depend on the access pattern like I mentioned before.

By the way, my core only generates back-to-back writes when pushing an address on the stack, so a fourth solution would be to put the stack in internal memory. This would be smart anyway, since the stack is frequently used. The same goes for zero page.
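
The decode needed for this fourth solution is small. As a rough Python model (the function name is hypothetical, and a standard 8-bit 6502 memory map is assumed, with zero page at $0000-$00FF and the stack at $0100-$01FF):

```python
def is_internal(addr):
    """True when a 16-bit address falls in zero page or the stack
    page, i.e. $0000-$01FF, where address bits 15..9 are all zero."""
    return (addr & 0xFFFF) >> 9 == 0

def select_memory(addr):
    """Route an access to on-chip or off-chip memory."""
    return "internal" if is_internal(addr) else "external"

for addr in (0x0000, 0x00FF, 0x01FF, 0x0200, 0x8000):
    print(hex(addr), select_memory(addr))
```

In hardware this amounts to checking that the top seven address bits are zero, though any added decode and multiplexing can still cost some clock speed.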


Posted: Sat Apr 20, 2013 7:25 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Very good point about on-chip stack! But I'd like only to catalogue solutions we have source for, and simulations or preferably real circuits.

Of course I'd got confused about SDRAM being synchronous and therefore having a sampled write signal.

I see the point about the write buffer - is that part of your design? Is that the source you linked to in viewtopic.php?p=14571#p14571 ?

Cheers
Ed


Posted: Sat Apr 20, 2013 7:27 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...By the way, my core only generates back-to-back writes when pushing an address on the stack, so a fourth solution would be to put the stack in internal memory. This would be smart anyway, since the stack is frequently used. The same goes for zero page.

Thanks for clearing that up! Because I would never even consider putting zero page and stack off the FPGA. To me it should be considered something that should be integrated within the cpu. And for a '6502-like' system, which the 65Org16 is a part of, 1K of zero page and 1K of stack memory is plenty. Even so, on the 144-pin Spartan 6 it is capable of much more. Right now I use this setup on the video board, and ISE reports that only 18% of 16 bit (actually 18) blockRAM is being used.

@Ed, I've run the 65Org16.b at 100MHz on the devboard and the video boards. On the video boards I've found and corrected errors in the core, but this has not effectively reduced the speed. Also, I anticipate that soon I will have a working internal bus structure and it will run @100MHz. On the devboard I was working mostly with schematic entry; the video board is 100% Verilog.

Posted: Sat Apr 20, 2013 7:55 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hi EEye,
to be clear then, your Video Board uses the expensive "SyncRAM", clocked SRAMs which give a 2 cycle read and a 1 cycle write access. The Dev Board uses the inexpensive "SDRAM" (CL=3), which results in multi-cycle access (faster for consecutive reads or writes within a row).

I note that Arlet sketched some cycle count possibilities here. (But I'd still like to know what has been built, rather than what might be buildable)


Posted: Sat Apr 20, 2013 8:16 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
..(But I'd still like to know what has been built, rather than what might be buildable)

I built 2 Dev Boards using the SDRAM and sent 1 to Arlet. He was using it for this thread about sprites. I believe he was using an 8-bit system he designed around his original core. I think he too had it running at up to 100MHz, if not more.

Posted: Sat Apr 20, 2013 8:25 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Thanks - great work on the Dev Board, by the way, it really has made 65Org16 a reality.


Posted: Sat Apr 20, 2013 8:49 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
The cooperative effort between myself and Arlet was a joy for me, seeing the board come alive so quickly. He surely was the catalyst.

Let me just say though, and then I will stop about my projects as there is others' work to be considered in this thread, that the video boards I'm working on now are pretty much just an extension of the dev boards. I started to realize with the devboard that so much was on it (USB to serial, video, keyboard, etc.) that all the controlling hardware would in the end seriously limit the overall speed. The PVBs are just the first part: 1 Spartan 6 will be dedicated to video, 1 on another board to audio, and 1 on yet another board to USB to serial + Keyboard & Control. In this way I can maximize the speed of each board by dedicating it to 1 or 2 functions.

Posted: Sun Apr 21, 2013 6:10 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
BigEd wrote:
Very good point about on-chip stack! But I'd like only to catalogue solutions we have source for, and simulations or preferably real circuits.

We have several projects with on-chip memories. It's a very simple process to attach a memory to the internal bus.
Quote:
I see the point about the write buffer - is that part of your design? Is that the source you linked to in viewtopic.php?p=14571#p14571 ?

No, that's just the module that interfaces directly to SDRAM. There's an additional module that sits on the 6502 bus and interfaces to this one. I posted that source before, but here it is again for easy reference. The 'write buffer' is a big name, for what is basically only one line of code:
Code:
        sdram_wr_data <= { DO, DO };

In this module, I have my original 8 bit core hooked up to 16 bit memory, so the 8 bits are just repeated. Normally, you'd use 8 bit wide memory, or 16 bit CPU.
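
For readers less used to Verilog concatenation, the effect of `{ DO, DO }` can be modelled in Python as below. The function name is made up for illustration; only the duplication itself comes from the line of code above.

```python
def replicate_byte(do):
    """Model of { DO, DO }: the same 8-bit value is placed on both
    halves of a 16-bit data bus."""
    do &= 0xFF                 # DO is 8 bits wide
    return (do << 8) | do

print(hex(replicate_byte(0xA5)))
```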


Posted: Sun Apr 21, 2013 9:05 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Thanks for the link to sdram_if.

So, I think you've said that you have run a core at 100MHz with on-chip code and off-chip data. Have you also run with on-chip stack and zero page, and with off-chip code? I agree that in principle these things are easy to set up, but of course there's the potential to lose some speed in the extra decoding. That's why it seems worthwhile finding out what's actually run: to make a distinction between an implementation and a design idea.

Your trivial write buffer - does that make some assumption that the buffer will drain before another write comes along? In general, a write buffer needs to be able to refuse further writes, if it is already full.

Cheers
Ed


Posted: Sun Apr 21, 2013 9:36 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Yes, I had a setup running at 100 MHz (regular 8 bit core), and using the posted SDRAM + IF code. Although it was only used for data, the same design also supports running code from external memory. All it requires is filling the SDRAM with code, and jumping to it. There's no difference in decoding required. The only problem is that the 6502 core does some extra dummy read cycles, which is rather wasteful given that it takes a few cycles to fetch data that will end up being discarded.

As for the write buffer, the state machine in the SDRAM IF only handles a write when it's in the IDLE state. It then copies the data, and moves to the WRITE1 state. In the WRITE1 state, RDY is only deasserted when the CPU attempts to do another SDRAM access. If you only use the SDRAM for data, the 6502 usually doesn't do that, so the write can be handled in parallel. If the CPU does attempt to access SDRAM, the CPU is stopped using the RDY signal until SDRAM is ready again.
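
The handshake described here can be sketched as a toy Python model. Only the IDLE and WRITE1 state names and the RDY behaviour come from the description; the class name, the 3-cycle write duration, and the restriction to writes (reads are left out) are assumptions made for illustration.

```python
WRITE_BUSY_CYCLES = 3  # assumed time for the SDRAM write to complete

class SdramWriteIf:
    """Toy model of the one-entry write buffer (writes only)."""

    def __init__(self):
        self.state = "IDLE"
        self.busy_left = 0
        self.latched = None

    def cycle(self, sdram_write=False, data=None):
        """Advance one clock. Returns RDY (True = CPU may proceed)."""
        if self.state == "IDLE":
            if sdram_write:
                self.latched = data          # copy the data in one cycle
                self.state = "WRITE1"
                self.busy_left = WRITE_BUSY_CYCLES
            return True
        # WRITE1: the write proceeds in the background
        self.busy_left -= 1
        if self.busy_left == 0:
            self.state = "IDLE"
        return not sdram_write               # stall only on a new access

iface = SdramWriteIf()
trace = [iface.cycle(True, 0x42),   # write accepted immediately
         iface.cycle(),             # CPU busy elsewhere, no stall
         iface.cycle(True, 0x43),   # second write too early: stalled
         iface.cycle(True, 0x43),   # still stalled
         iface.cycle(True, 0x43)]   # accepted once back in IDLE
print(trace)
```

The buffer never drops a write: a second access is simply held off with RDY until the interface has returned to IDLE.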

