memory subsystem for fast 6502
If anyone was thinking of a 6502 running faster than their memory - perhaps an FPGA CPU - they might be interested in Robert Finch's document on a simple pipelined interface:
http://www.birdcomputer.ca/Documents/in ... c_ram.html
It includes Verilog, which might also be a useful example for anyone getting started in the language.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
I haven't put a 6502 on an FPGA yet: richarde has possession of our OHO modules, and the next intended purpose for those is to make some kind of logic analyser so we can debug our async 65816 upgrade board. As (paid) work calms down a bit we should have a chance to make progress on that.
However, bound has got a T65 design working. I've a feeling one or two other board members have 6502s working in FPGA too - it can be done!
I do hope to do this some time - hopefully this year - and would also like to have a go at extended instruction sets, however minimal. I'm sure I'll follow an incremental approach, if I ever get started.
Pipelined memory access comes in handy when your CPU implements a pipeline. If it does not, you'll just be inserting wait states in the CPU until the previous request comes back.
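To put rough numbers on that, here is a minimal Python sketch. It assumes an idealised memory with a fixed latency in cycles that, when pipelined, accepts one new request per cycle. The function names and the one-request-per-cycle assumption are mine for illustration; they are not taken from Robert Finch's document.

```python
def cycles_blocking(n_accesses: int, latency: int) -> int:
    """CPU stalls on every request until it comes back (wait states)."""
    return n_accesses * latency


def cycles_pipelined(n_accesses: int, latency: int) -> int:
    """One request issued per cycle; results stream back after the first latency."""
    if n_accesses == 0:
        return 0
    return latency + (n_accesses - 1)


if __name__ == "__main__":
    # Compare both models for a 3-cycle memory.
    for n in (1, 4, 16):
        print(n, cycles_blocking(n, 3), cycles_pipelined(n, 3))
```

For a single access the two models cost the same, which is the point above: without a CPU that can keep several requests in flight, the pipelined interface buys nothing.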
See guys? You all mocked me for advocating pipelines and other high throughput features for 6502 enhancements, particularly caches. Remember that synchronous RAMs are expressly designed for caches, and their interfaces work well with pipelined memory access. Used for single cell access, SDRAMs are no faster than their asynchronous counterparts, and sometimes even slower.
EDIT: Some felt that the above paragraph was much too cutting or confrontational. This wasn't my intention -- if you read it as I did, it would actually sound like something you'd hear on a sitcom. Once again, I fall victim to the lack of emotive power of the written word. Had I expressed these exact same words in person, it would have come across much more jovially.
Last edited by kc5tja on Sun Mar 07, 2010 9:49 pm, edited 1 time in total.
I think there are plenty of intermediate possibilities.
For example, a simple two-byte buffer would allow a 6502-on-FPGA to get some speedup from fast 16-bit wide memory. (For an initial access to an odd address it would be no help, but for accesses to an even address followed by the next address it would be an advantage.)
For me, this is all about solving interesting problems: the absolute performance of a homebrew CPU is relatively unimportant. There might well be more satisfaction from building a 1MHz TTL system compared to a 20MHz FPGA system.
Building a (working!) cache on FPGA would be great, but a two-byte buffer is a simpler first step. One might think of it as a fully associative cache with a single 2-byte line.
(I suppose bytewide i/o operations could somehow be handled by on-FPGA address decoding.)
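As a rough illustration of the two-byte buffer, here is a small Python model of a single-line read buffer in front of a 16-bit wide memory. The class name, the fetch counter, and the read-only behaviour are assumptions made for the sketch; writes and i/o regions are not modelled.

```python
class TwoByteBuffer:
    """Single-line, 2-byte read buffer in front of a 16-bit wide memory."""

    def __init__(self, memory: bytes):
        self.memory = memory   # backing store; length assumed even
        self.tag = None        # word address (addr >> 1) currently held
        self.line = (0, 0)     # (even byte, odd byte)
        self.fetches = 0       # number of 16-bit memory reads performed

    def read(self, addr: int) -> int:
        word = addr >> 1
        if word != self.tag:   # miss: fetch the aligned 16-bit word
            base = word << 1
            self.line = (self.memory[base], self.memory[base + 1])
            self.tag = word
            self.fetches += 1
        return self.line[addr & 1]


if __name__ == "__main__":
    buf = TwoByteBuffer(bytes(range(16)))
    for addr in range(8):      # sequential run starting at an even address
        buf.read(addr)
    print(buf.fetches)         # 8 byte reads served by 4 word fetches
```

A run starting on an odd address pays one extra fetch for the first byte, which matches the caveat in the parenthesis above.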
BigEd wrote:
For example, a simple two-byte buffer would allow a 6502-on-FPGA to get some speedup from fast 16-bit wide memory. (For an initial access to an odd address it would be no help, but for accesses to an even address followed by the next address it would be an advantage.)
But it's still a cache.
My point still stands; I never said anything about how complex one makes the cache.
Quote:
For me, this is all about solving interesting problems: the absolute performance of a homebrew CPU is relatively unimportant.
For instance, I can pick up a PC-compatible memory stick for pennies compared to a comparable amount of asynchronous memory in discrete packages.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
The idea of an internal high speed cache for the 6502 is intriguing when designing on an FPGA platform. Most of the SRAM options in the Spartan 3 schematic library are synchronous (about half are dual port). Today I started scratching out notes and blocks for my hopefully 6502-compatible 8 bit CPU. I am starting with the ALU capabilities, and I think the cache idea kc5tja mentioned would be very useful, going hand in hand with the ALU.
I would like to keep the data bus at 8 bits so I can use an existing assembly editor, but make the address bus 32 bits (4 GiB, about 4.29 GB). Keep the same 8 bit add, sub, shift left & right, and magnitude compare, and add an 8 bit multiply instruction (or instructions) with a 16 bit result.
I know I am getting WAY ahead of myself, but I pose these ideas to try and grasp the possibilities from more experienced folks, and potential problems I might face with this kind of design before I start actually designing. I would like to keep all the existing 65C02 commands.
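For the 8 bit multiply with 16 bit result, a shift-and-add loop is one common way to picture what the hardware would do. The Python sketch below is only a behavioural model; the function name and the choice to return the product as a (low byte, high byte) pair are assumptions, since nothing above says where the two result bytes would land.

```python
def mul8x8(a: int, b: int) -> tuple[int, int]:
    """Unsigned 8x8 multiply; returns (low byte, high byte) of the 16-bit product."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    product = 0
    for bit in range(8):
        if (b >> bit) & 1:        # add the shifted multiplicand for each set bit
            product += a << bit
    return product & 0xFF, product >> 8


if __name__ == "__main__":
    lo, hi = mul8x8(0xFF, 0xFF)   # largest case: 255 * 255 = 0xFE01
    print(hex(hi), hex(lo))
```

The largest product, 255 * 255 = 0xFE01, still fits in 16 bits, so two 8 bit result bytes are always enough.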
I haven't decided on a memory access strategy yet, but my FPGA plans include a 64k CPU address space, but an MMU to expand to 1M, maybe 16M or even 256M. 65536 memory blocks with 4k each - in continuation of my CS/A computer where I use the 74LS610 as MMU.
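A 74LS610-style mapping like this can be sketched in a few lines of Python. This is an illustrative model only: the class and method names, the identity mapping at reset, and the `block_bits` parameter are assumptions. The top 4 bits of the 16-bit CPU address select one of 16 mapping registers, and the register contents replace those 4 bits in the physical address.

```python
PAGE_BITS = 12                 # 4k blocks
PAGE_SIZE = 1 << PAGE_BITS


class Mmu:
    """16 mapping registers, one per 4k page of the 64k CPU address space."""

    def __init__(self, block_bits: int = 16):
        # 12 bits -> 4096 blocks (16M); 16 bits -> 65536 blocks (256M)
        self.block_mask = (1 << block_bits) - 1
        self.regs = list(range(16))   # identity map at reset (assumption)

    def map_page(self, cpu_page: int, block: int) -> None:
        self.regs[cpu_page & 0xF] = block & self.block_mask

    def translate(self, cpu_addr: int) -> int:
        cpu_addr &= 0xFFFF
        block = self.regs[cpu_addr >> PAGE_BITS]
        return (block << PAGE_BITS) | (cpu_addr & (PAGE_SIZE - 1))


if __name__ == "__main__":
    mmu = Mmu()
    mmu.map_page(1, 0xFFFF)            # CPU page $1xxx -> last 4k block
    print(hex(mmu.translate(0x1234)))
```

With 12-bit registers, as on the 74LS610, the same scheme reaches 16M; widening the registers to 16 bits gives the 65536 blocks of 4k mentioned above.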
I'm also planning to keep all the current opcodes, but include 2, maybe 3 extra 16-bit registers (U,V,W) with extra opcodes for stuff like
Code:
LDU $1234
LDA U,X
using a 16 bit ALU to reduce invalid memory cycles.
Additionally a prefix opcode for "special features" like Blitter (e.g. transfer U bytes from address V to address W, adding X to V and Y to W as offsets, with U,V,W,X,Y as index registers - it would even be interruptible), vector operations (like 16 bit CRC e.g. for a TCP/IP packet), 16 bit arithmetic (U*V, etc)
All still in the idea stage, but I'd like to reserve the name "65020" for it now :-)
Quote:
I know I am getting WAY ahead of myself, but I pose these ideas to try and grasp the possibilities from more experienced folks, and potential problems I might face with this kind of design before I start actually designing. I would like to keep all the existing 65C02 commands.
Hm, same here :-)
André
-
heronfisher
- Posts: 2
- Joined: 12 Mar 2010
- Location: london
fachat wrote:
Additionally a prefix opcode for "special features" like Blitter (e.g. transfer U bytes from address V to address W, adding X to V and Y to W as offsets, with U,V,W,X,Y as index registers - it would even be interruptable)
(The 80286 fixed this bug, thankfully.)
Quote:
All still in the idea stage, but I'd like to reserve the name "65020" for it now 
(I rather like the name 65002 myself.)
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
fachat wrote:
...All still in the idea stage, but I'd like to reserve the name "65020" for it now
...
André
And aside from the name calling, and staying true to the topic: if designing a 6502 in FPGA, it might be possible to put all 64K of SRAM onboard. I've never designed with synchronous RAM, but there are many synchronous, even dual port, RAM configurations in the Spartan 3 library.
Edit:And speaking of the 65xx/68xxx architecture, if you pull up a 68020 datasheet, you'll see virtual machine/supervisory concepts dated back to 1992. (A superiority I had to point out compared to the 8086 evolution, maybe a year or two have passed since Intel has posed a virtual machine concept)
Quote:
ElEctric_EyE wrote:
fachat wrote:
...All still in the idea stage, but I'd like to reserve the name "65020" for it now :-)...
André
So when I'm going from an 8-bit ALU to a 16-bit ALU, I thought the "20" would be justified. But maybe you're right, this first step should probably be 65002 or 65012.
Quote:
Edit:And speaking of the 65xx/68xxx architecture, if you pull up a 68020 datasheet, you'll see virtual machine/supervisory concepts dated back to 1992. (A superiority I had to point out compared to the 8086 evolution, maybe a year or two have passed since Intel has posed a virtual machine concept)
André
ElEctric_EyE wrote:
Edit:And speaking of the 65xx/68xxx architecture, if you pull up a 68020 datasheet, you'll see virtual machine/supervisory concepts dated back to 1992. (A superiority I had to point out compared to the 8086 evolution, maybe a year or two have passed since Intel has posed a virtual machine concept)
The 68000 had a nearly completely virtualizable instruction set. Only the MOVE.W SR,Dn instruction violated the virtualization capability; the 68010 (not the 020!) solved this one problem in its entirety by making that instruction privileged, thus fixing the virtualization problem once and for all. This would put the 680x0's virtualization features as far back as the early 80s -- 1982ish, IIRC.
With regard to supervisor modes, though, the Intel architecture introduced this along with Protected Mode's appearance in the 80286 processor, back as far as 1982. Protected mode introduced a Multics-like 4-ring security model. Ring 3 is the lowest privilege level, while ring 0 is the highest. Hence, your applications ran in ring 3, while the OS kernel ran in ring 0. Hence the origin of the phrase, "This code runs in ring 0."
Like the 68000's MOVE SR, though, the 80286's SMSW instruction was unprivileged, granting read-access to the protected mode bit, and thus undermining its ability to institute a true virtual machine. The 80286 also had a hardware bug, wherein once you engaged protected mode, you stayed in protected mode forever. The only work-around was to physically reset the CPU. This is why running DOS applications in OS/2 1.0 was so damn slow -- every time that thread took control, the CPU had to undergo a complete reset sequence!
The 80386 was the first processor to fix the real/protected-mode bug, but true support for virtualization didn't appear until the Pentium-IV series of processors! (Some say that the 80286 was a rushed beta of the 80386 to try and fend off the 68000's success; however, I question this. It IS known that the 80x86 architecture inherited many of its protected-mode features and descriptor tables from the i432 architecture, though.)
ElEctric_EyE wrote:
The idea of an internal high speed cache for the 6502 is intriguing when designing on an FPGA platform. Most of the SRAM options from the Spartan 3 schematic library are synchronous, (about half are dual port).
- XAPP204 - Using Block RAM for High Performance Read/Write CAMs
- XAPP463 - Using Block RAM in Spartan-3 Generation
We're working on a tube+T65+rom+RAM design now, but it's still in debug. Once that's working, it might be an interesting platform to try for some caching.