All times are UTC

Posted: Sun Apr 24, 2016 6:27 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
This could be interesting to students of high performance CPU implementation, and eventually to people wanting a fast 6502 core. It's a CPU core for the MEGA65 project, which needs to be able to work very much like a C64, a C65, or a fast 6502. (So it's cycle accurate when it needs to be, and supports the undocumented opcodes when it needs to.)

"GS4502B - An attempt to create a high-performance 4502 and 6502 compatible CPU"
"Experimental pipelined 4502 CPU design"
https://github.com/gardners/gs4502b#readme

Related blog posts at and near
http://c65gs.blogspot.co.uk/2016/04/pla ... or-re.html
Quote:
I have started implementing an all new CPU, that will be much smaller, will meet timing closure, and will generally be simpler and easier to understand, and therefore to debug.

This will in fact be the 3rd or 4th CPU design for the MEGA65, depending on how you count things, and will also incorporate what I have learnt through that process, and also some other modern CPU features that I have been reading up on. The net result is that the new CPU should be quite a lot faster than the current design


Quote:
The gs4502b will, at this stage, be a pipelined, triple-core, out-of-order instruction retirement, register-renaming processor with parallel instruction pre-fetch buffer and self-modifying code hazard avoidance.

We also plan for it to run at 192MHz.


I'm not sure what FPGA it might run on, but it's a large one, apparently. Possibly an Artix-7 as found for example on a $300 Nexys 4 board.

See also
https://groups.google.com/forum/#!forum ... evelopment
http://mega65.org

Edit: fixup title - thanks Jeff!


Last edited by BigEd on Wed Aug 03, 2016 2:09 pm, edited 1 time in total.

Posted: Mon Apr 25, 2016 2:58 am
Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Looks like an interesting project. And, as you say, relevant to students of high-performance CPUs.

Right now, until memory becomes as fast as the CPU, the CPU really needs a large number of registers (e.g. 32) so that it isn't moving data between registers and memory all the time. This is difficult to add to the 6502/65816 instruction set without losing code density.

They've set a good goal for the performance. Using a faster part would help a lot.

I'm kinda curious to know what the 4502 is like. Is it instruction set compatible? Does it have extra registers?

_________________
http://www.finitron.ca


Posted: Mon Apr 25, 2016 5:15 am
Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
A Z register, possibly. The only reference I found (with any info at all) was something by Bill Herd on Hackaday somewhere.


Posted: Mon Apr 25, 2016 7:27 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
On the topic of register count, I think the idea with this design is for all of the 64k base memory map to be in block RAM and therefore to be single-cycle access. I think there is some kind of expanded memory too, which is off chip and therefore will be multi-cycle. But, I think I might have seen mention of a cache, which will help.

On the 4502 - it seems to be a coined term for the core found in the C65's 4510, which is elsewhere described as compatible with the 65CE02. I found pointers to a couple of related threads on the lemon64 forum at
https://news.ycombinator.com/item?id=9451228


Posted: Mon Apr 25, 2016 7:35 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
The original Commodore 65CE02 data sheet is on this site at http://6502.org/documents/datasheets/mo ... 02_mpu.pdf . The http://www.commodore.ca/manuals/funet/c ... 65ce02.txt page is very nice though.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Apr 25, 2016 8:57 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Rob Finch wrote:
Right now, until memory becomes as fast as the CPU, the CPU really needs a large number of registers (e.g. 32) so that it isn't moving data between registers and memory all the time. This is difficult to add to the 6502/65816 instruction set without losing code density.

BTW, I do mostly agree with this! Although it's really as much a question of balance as of performance: a 25MHz machine with an excellent architecture might be outperformed by something less sophisticated running at 150MHz.

The ARM designers, as you may know, felt that making best use of memory bandwidth was very important. They had no space on the (original) chip for caches. So they have a register file, and relatively complex instructions, and move-multiple instructions, to make the most of memory bandwidth.


Posted: Fri May 20, 2016 9:50 pm
Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
This sounds amazing! Not so long ago I came to this board and claimed that such a design of a pipelined 6502 was theoretically possible, and someone called me an idiot and a retard, telling me this was completely dumb. Now I'm glad I'm not the only one who had this crazy idea - and the guy's probably more competent than I am to implement it.


Posted: Fri May 20, 2016 11:58 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Rob Finch wrote:
...Right now, until memory becomes as fast as the cpu...

What kind of memory?
Check out synchronous RAMs. They are a very fast, slightly expensive type of clocked static RAM.

I used ones (2Mx18) that reached into 450MHz.

_________________
65Org16: https://github.com/ElEctric-EyE/verilog-6502


Posted: Fri Jun 03, 2016 5:20 am
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
ElEctric_EyE wrote:
Rob Finch wrote:
...Right now, until memory becomes as fast as the cpu...

What kind of memory?
Check out synchronous RAMs. They are a very fast, slightly expensive type of clocked static RAM.

I used ones (2Mx18) that reached into 450MHz.


The problem with synchronous RAMs is that they assume you're talking to a cache (which is basically what BRAMs are inside an FPGA). You cannot really perform random accesses on synchronous memories (DRAM or SRAM) without incurring the usual access time overheads. SRAMs are faster than DRAMs, but you're still going to pay the "row access" latency unique to that device. The 6502 instruction set was designed when memory was at least as fast as the CPU, if not faster. So, it'll ultimately be bottlenecked by the synchronous interface if you attempt to run it without a local instruction and/or data cache.

Folks may remember me talking about having aspirations to research these more advanced ideas in my own Kestrel, back when "Kestrel-2" was still on paper. It's been a long time indeed.


Posted: Sat Jul 30, 2016 3:58 am
Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
kc5tja wrote:
ElEctric_EyE wrote:
Rob Finch wrote:
...Right now, until memory becomes as fast as the cpu...

What kind of memory?
Check out synchronous RAMs. They are a very fast, slightly expensive type of clocked static RAM.

I used ones (2Mx18) that reached into 450MHz.


The problem with synchronous RAMs is that they assume you're talking to a cache (which is basically what BRAMs are inside an FPGA). You cannot really perform random accesses on synchronous memories (DRAM or SRAM) without incurring the usual access time overheads. SRAMs are faster than DRAMs, but you're still going to pay the "row access" latency unique to that device. The 6502 instruction set was designed when memory was at least as fast as the CPU, if not faster. So, it'll ultimately be bottlenecked by the synchronous interface if you attempt to run it without a local instruction and/or data cache.

Folks may remember me talking about having aspirations to research these more advanced ideas in my own Kestrel, back when "Kestrel-2" was still on paper. It's been a long time indeed.


Just to be sure I understand, the 8ns SRAM from Alliance would require that the 6502 runs on a maximum of 125MHz?


Posted: Sat Jul 30, 2016 7:53 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
kakemoms wrote:
Just to be sure I understand, the 8ns SRAM from Alliance would require that the 6502 runs on a maximum of 125MHz?

Even a 1MHz 6502 could use it, but if you had a 125MHz 6502, 8ns SRAM would not be nearly fast enough for it to run at 125MHz. Here's an excerpt from the address-decoding page of the 6502 primer, about 30% of the way down the page:

      Note that 100ns memory is not fast enough for 10MHz on a 6502! It's only a slight oversimplification to say that the 6502 basically does a memory access in half a cycle, meaning 50ns @ 10MHz, 500ns @ 1MHz, etc., and some of that time will be taken up by glue logic, set-up times, etc., leaving less than you might think for the memory itself. In fact, the Apple II did two memory accesses per cycle, two million per second at 1MHz, with the video accessing the memory during the first half of Φ2, and the processor during the second half, interleaving, so both could access the same memory at the same time at full speed, with no conflicts. Anyway, speed is not just the inverse of the access time.

      To expand on the scenario above, consider 100ns memory (let's say it's ROM, so we can leave Φ2 out of it) and a 10MHz 6502. One period at 10MHz is 100ns; but from there you have to subtract the specified address setup time (tADS, 30ns for a 14MHz 6502) and the read data setup time (tDSR, 10ns for a 14MHz 6502) and probably some address-decoding logic time, let's say 10ns but it will depend on your circuit and how fast your logic is, leaving you with about 50ns for the ROM at 10MHz. If you're running it at 3.3V, the specs say you need to take off another 15ns, leaving you with ROM that can dish up the data in 35ns @3.3V. That's if you want to be sure the product will always work. It's nice to know that parts are usually faster than the guaranteed worst case; but for production, you can't assume they always will, because at any time the suppliers could give you slower parts that are still within spec and they won't work at your speed and it won't be any fault of theirs!

      Each part's data sheet will have the timing diagrams and timing specifications. Don't ignore them!
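
The arithmetic in the excerpt above can be sketched as a small budget calculation. The figures (30ns tADS, 10ns tDSR, 10ns of decoding, 15ns extra at 3.3V) are the ones quoted in the excerpt; the function and parameter names are made up for illustration, and this simplified model is no substitute for the timing diagrams in each part's data sheet.

```python
def memory_time_budget_ns(clock_mhz, t_ads_ns, t_dsr_ns, decode_ns,
                          low_voltage_penalty_ns=0.0):
    """Time left for the memory itself in one clock period (simplified model).

    Starts from the full clock period and subtracts the address setup time,
    the read data setup time, the address-decoding delay, and an optional
    extra allowance for low-voltage operation.
    """
    period_ns = 1000.0 / clock_mhz
    return period_ns - t_ads_ns - t_dsr_ns - decode_ns - low_voltage_penalty_ns


# Numbers taken from the primer excerpt: 10MHz clock, 14MHz-rated 6502.
budget_5v = memory_time_budget_ns(10, t_ads_ns=30, t_dsr_ns=10, decode_ns=10)
budget_3v3 = memory_time_budget_ns(10, t_ads_ns=30, t_dsr_ns=10, decode_ns=10,
                                   low_voltage_penalty_ns=15)

print(budget_5v)   # 50.0 ns left for the ROM
print(budget_3v3)  # 35.0 ns left for the ROM at 3.3V
```

So the 100ns part in the example misses the budget by a factor of two or more, which is the excerpt's point that speed is not just the inverse of the access time.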

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Jul 31, 2016 3:26 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
It's true that the bus between CPU and RAM, and any logic in there, or in the address decoding, will consume time which must be added to the RAM's access time to get some idea of how fast the CPU can go.

But note that a 100MHz 6502, in our world, is going to be an FPGA design. So its internal timing might be more favourable: the access time of the RAM might be much closer to the cycle time of the CPU. The address decoding might be on the FPGA too. There's still inherently some pin delay, for the addresses leaving and the data coming back in from the RAM.

But, again, it's possible that an FPGA-based 6502 core would have an internal memory buffer or cache: the core might run faster than the memory in most cycles. Or it might have some on-chip memory which runs at full speed and it might access the off-chip RAM slower than that.
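
To put rough numbers on that trade-off, here is a minimal sketch of the average access rate when some accesses are single-cycle (on-chip or cached) and the rest pay wait states. The 100MHz clock, the 75% fast fraction, and the single wait state are purely hypothetical inputs, not measurements of any real core.

```python
def effective_mhz(core_mhz, fast_fraction, wait_states):
    """Average memory accesses per microsecond for a simple two-speed model.

    fast_fraction of accesses complete in one cycle; the remainder take
    1 + wait_states cycles each.
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    avg_cycles = fast_fraction * 1 + (1.0 - fast_fraction) * (1 + wait_states)
    return core_mhz / avg_cycles


# Hypothetical case: 100MHz core, 75% of accesses on-chip, 1 wait state off-chip.
mostly_on_chip = effective_mhz(100, fast_fraction=0.75, wait_states=1)
all_on_chip = effective_mhz(100, fast_fraction=1.0, wait_states=1)

print(mostly_on_chip)  # 80.0
print(all_on_chip)     # 100.0
```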


Posted: Mon Aug 01, 2016 8:51 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
kakemoms wrote:
Just to be sure I understand, the 8ns SRAM from Alliance would require that the 6502 runs on a maximum of 125MHz?

Just checked the datasheet of the AS7C34096A-8TIN 512K X 8 BIT HIGH SPEED CMOS SRAM and indeed it does support access time and cycle time of 8ns. So the 6502 could go at most 125MHz for single-cycle access. I'm not sure we have a 125MHz 6502 core, for inexpensive FPGAs, but we might be close. There's a reasonable chance that you'd lose a half or one nanosecond... at these speeds it's hard to get all the details to line up and get full speed and reliable operation. But with an 8ns part you should probably be confident you could run at 100MHz.
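
As a quick check on those figures, here is a sketch that converts a RAM cycle time plus some lost margin into a clock ceiling. It assumes the whole clock period is available for one access (the single-cycle case above); the function name and the margin values are only illustrative.

```python
def max_clock_mhz(sram_cycle_ns, margin_ns=0.0):
    """Highest clock at which one RAM cycle plus margin fits in one period."""
    return 1000.0 / (sram_cycle_ns + margin_ns)


ideal = max_clock_mhz(8)           # no margin lost at all
lose_one_ns = max_clock_mhz(8, 1)  # 1ns lost to pins, routing, etc.
lose_two_ns = max_clock_mhz(8, 2)  # 2ns of margin

print(ideal)                  # 125.0
print(round(lose_one_ns, 1))  # 111.1
print(lose_two_ns)            # 100.0
```

In other words, running at 100MHz leaves 2ns of slack per cycle on an 8ns part.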

But as I tried to say in my previous reply, there are other options: on-chip accesses at full speed and off-chip accesses with a wait state is straightforward, and some kind of on-chip buffer or cache should be possible but is less straightforward.

Using synchronous memory is probably going to make things easier, but most 6502 HDL cores are designed for asynchronous operation, so there might be a bit of work in that, depending on which core you choose.


Posted: Tue Apr 04, 2017 5:41 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
An interesting experimental result: an 8080 emulation running on the 48MHz super 6502 runs at a healthy 12MHz equivalent:
http://c65gs.blogspot.co.uk/2017/04/emu ... ega65.html

Quote:
it turns out that the 45GS10 is quite friendly for emulating foreign hardware. Specifically, the combination of a large address space, together with the ZP-indirect 32-bit addressing mode and the JMP ($nnnn,X) jump-table instruction means that an emulator can operate in its own address space, separate from the emulated system, and use ZP registers as emulated registers, with the ZP-indirect 32-bit mode allowing dereferencing of those emulated registers


(Although I note LGB's disclaimer that this is an early result on a simple benchmark and might not be the true final result)
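
For anyone unfamiliar with the jump-table style of emulator the quote describes, here is a minimal sketch of the idea in Python rather than 45GS10 assembly. It is not the emulator from the blog post: the register layout, function names, and the handful of 8080 opcodes covered (NOP, MVI A, INR A, HLT) are chosen just for illustration, and flag handling is omitted.

```python
# Guest (emulated 8080) memory is a separate array from the emulator's own
# state, standing in for the "own address space" idea in the quote.
guest_mem = bytearray(65536)

# Hypothetical layout: emulated registers live in a small "zero page" array.
ZP_A, ZP_PC_LO, ZP_PC_HI = 0, 1, 2
zp = bytearray(3)


def get_pc():
    return zp[ZP_PC_LO] | (zp[ZP_PC_HI] << 8)


def set_pc(value):
    zp[ZP_PC_LO] = value & 0xFF
    zp[ZP_PC_HI] = (value >> 8) & 0xFF


def fetch():
    """Read the byte at the emulated PC, then advance the PC."""
    pc = get_pc()
    byte = guest_mem[pc]
    set_pc(pc + 1)
    return byte


# Each handler returns True to keep running, False to stop.
def op_nop():
    return True


def op_mvi_a():
    zp[ZP_A] = fetch()  # MVI A,d8: load immediate byte into A
    return True


def op_inr_a():
    zp[ZP_A] = (zp[ZP_A] + 1) & 0xFF  # INR A (flags not modelled here)
    return True


def op_hlt():
    return False


def op_unimplemented():
    raise NotImplementedError("opcode not covered by this sketch")


# The jump table: one entry per opcode, indexed directly by the opcode byte,
# in the same spirit as dispatching through JMP ($nnnn,X).
dispatch = [op_unimplemented] * 256
dispatch[0x00] = op_nop
dispatch[0x3E] = op_mvi_a
dispatch[0x3C] = op_inr_a
dispatch[0x76] = op_hlt


def run(program, max_steps=1000):
    """Load program at guest address 0 and run until HLT or max_steps."""
    guest_mem[:len(program)] = program
    set_pc(0)
    for _ in range(max_steps):
        if not dispatch[fetch()]():
            break
    return zp[ZP_A]


# MVI A,0x41 ; INR A ; NOP ; HLT
result = run(bytes([0x3E, 0x41, 0x3C, 0x00, 0x76]))
print(hex(result))  # 0x42
```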


Posted: Wed Apr 05, 2017 3:16 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Bregalad wrote:
This sounds amazing! Not so long ago I came to this board and claimed that such a design of a pipelined 6502 was theoretically possible, and someone called me an idiot and a retard, telling me this was completely dumb. Now I'm glad I'm not the only one who had this crazy idea - and the guy's probably more competent than I am to implement it.

That doesn't sound like this place at all, Bregalad! I have never seen a discussion here dip to the level of petty insults. Are you sure that you're not mistaken?

Mike B.

P.S. Ahh ... a forum member privately pointed me to the incident to which you seem to refer. Toshi's bedside manner may leave something to be desired, but he always tries to make valid points, and I'm willing to bet that he's correct most of the time.

