memory subsystem for fast 6502

Post by **BigEd** » Fri Mar 05, 2010 7:07 pm

If anyone was thinking of a 6502 running faster than their memory - perhaps an FPGA CPU - they might be interested in Robert Finch's document on a simple pipelined interface:
http://www.birdcomputer.ca/Documents/in ... c_ram.html

It includes Verilog, which might also be a useful example to anyone getting started in the language.

Post by **ElEctric_EyE** » Sat Mar 06, 2010 9:44 pm

You're way ahead of the power curve. Have you guys made a 6502 in an FPGA yet?

Post by **BigEd** » Sun Mar 07, 2010 4:37 pm

I haven't put a 6502 on an FPGA yet: richarde has possession of our OHO modules, and the next intended purpose for those is to make some kind of logic analyser so we can debug our async 65816 upgrade board. As (paid) work calms down a bit we should have a chance to make progress on that.

However, bound has got a T65 design working. I've a feeling one or two other board members have 6502s working in FPGA too - it can be done!

I do hope to do this some time - hopefully this year - and would also like to have a go at extended instruction sets, however minimal. I'm sure I'll follow an incremental approach, if I ever get started.

Post by **kc5tja** » Sun Mar 07, 2010 5:06 pm

Pipelined memory access comes in handy when your CPU implements a pipeline. If it does not, you'll just be inserting wait states in the CPU until the previous request comes back.

See guys? You all mocked me for advocating pipelines and other high throughput features for 6502 enhancements, particularly caches. Remember that synchronous RAMs are expressly designed for caches, and their interfaces work well with pipelined memory access. Used for single cell access, SDRAMs are no faster than their asynchronous counterparts, and sometimes even slower.
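A back-of-the-envelope model makes the point about pipelining concrete. Assume, hypothetically, a synchronous memory with a fixed latency of `latency` clock cycles that can accept a new request every cycle. A CPU that waits for each result before issuing the next access pays the full latency every time; a pipelined CPU pays it only once per burst. The function names are made up for this sketch, and it ignores refresh, bank conflicts and everything else a real SDRAM controller has to deal with.

```python
def cycles_blocking(n_requests: int, latency: int) -> int:
    """Non-pipelined CPU: each access waits out the full latency (wait states)."""
    return n_requests * latency


def cycles_pipelined(n_requests: int, latency: int) -> int:
    """Pipelined CPU: one new access issued per cycle, so the latencies overlap."""
    if n_requests == 0:
        return 0
    # The first result arrives after `latency` cycles; the rest follow one per cycle.
    return latency + (n_requests - 1)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, cycles_blocking(n, 3), cycles_pipelined(n, 3))
```

With a single access both take the same time; the synchronous part only pays off once several requests are in flight, which is the argument for pairing it with a pipeline or a cache.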

EDIT: Some felt that the above paragraph was much too cutting or confrontational. This wasn't my intention -- if you read it as I did, it would actually sound like something you'd hear on a sitcom. Once again, I fall victim to the lack of emotive power of the written word. Had I expressed these exact same words in person, it would have come across much more jovially.

Post by **BigEd** » Sun Mar 07, 2010 5:28 pm

I think there are plenty of intermediate possibilities.

For example, a simple two-byte buffer would allow a 6502-on-FPGA to get some speedup from fast 16-bit wide memory. (For an initial access to an odd address it would be no help, but for accesses to an even address followed by the next address it would be an advantage.)

For me, this is all about solving interesting problems: the absolute performance of a homebrew CPU is relatively unimportant. There might well be more satisfaction in building a 1MHz TTL system than a 20MHz FPGA system.

Building a (working!) cache on FPGA would be great, but a two-byte buffer is a simpler first step. One might think of it as a fully associative cache with a single 2-byte line.

(I suppose bytewide i/o operations could somehow be handled by on-FPGA address decoding.)
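To make the two-byte buffer idea concrete, here is a minimal Python model (a sketch, not anyone's actual design) of a byte-wide CPU reading through a small cache in front of 16-bit-wide memory. The class name `WordLineCache` and the `fetches` counter are hypothetical. With `num_lines=1` it is exactly the single 2-byte line described above; with only one line, "direct mapped" and "fully associative" amount to the same thing, since there is only one place a word can go. Larger `num_lines` values turn it into an ordinary direct-mapped cache.

```python
class WordLineCache:
    """Direct-mapped cache of 2-byte lines in front of 16-bit-wide memory."""

    def __init__(self, memory: bytes, num_lines: int = 1):
        self.memory = memory              # backing store, assumed even length
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # word address held by each line
        self.lines = [b"\x00\x00"] * num_lines
        self.fetches = 0                  # number of 16-bit memory reads

    def read(self, addr: int) -> int:
        word = addr >> 1                  # which aligned 16-bit word
        index = word % self.num_lines     # direct-mapped line selection
        if self.tags[index] != word:      # miss: fetch both bytes at once
            self.fetches += 1
            self.tags[index] = word
            self.lines[index] = self.memory[2 * word : 2 * word + 2]
        return self.lines[index][addr & 1]  # pick the even or odd byte


if __name__ == "__main__":
    mem = bytes(range(256))
    even = WordLineCache(mem)
    for a in (0x10, 0x11):                # even address, then the next byte
        even.read(a)
    odd = WordLineCache(mem)
    for a in (0x11, 0x12):                # odd address, then the next byte
        odd.read(a)
    print(even.fetches, odd.fetches)
```

Starting at an even address, two sequential bytes cost one 16-bit fetch; starting at an odd address they cost two, which matches the caveat about odd initial accesses.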

Post by **kc5tja** » Sun Mar 07, 2010 5:50 pm

BigEd wrote:

For example, a simple two-byte buffer would allow a 6502-on-FPGA to get some speedup from fast 16-bit wide memory. (For an initial access to an odd address it would be no help, but for accesses to an even address followed by the next address it would be an advantage.)

OK, so you just described a single-line, two-byte, direct-mapped cache.

But it's still a cache.

My point still stands; I never said anything about how complex one makes the cache.

Quote:

For me, this is all about solving interesting problems: the absolute performance of a homebrew CPU is relatively unimportant.

It is, for me, of the utmost importance that the homebrew CPU works with, not against, the peripherals to which it's attached. Asynchronous RAMs of all kinds are a dying breed these days. They won't likely disappear off the face of the Earth anytime soon, but between dwindling stocks, exotic packaging requirements, and reduced sales volumes, their prices can be expected to skyrocket far above those of other technologies, like synchronous memories.

For instance, I can pick up a PC-compatible memory stick for pennies compared with the same amount of asynchronous memory in discrete packages.

Post by **ElEctric_EyE** » Mon Mar 15, 2010 4:57 pm

The idea of an internal high-speed cache for the 6502 is intriguing when designing on an FPGA platform. Most of the SRAM options in the Spartan-3 schematic library are synchronous (about half are dual-port). Today I started scratching out notes and block diagrams for my (hopefully 6502-compatible) 8-bit CPU. I am starting with the ALU capabilities, and I think the cache idea kc5tja mentioned would be very useful going hand in hand with the ALU.

I would like to keep the data bus at 8 bits so I can use an existing assembly editor, but make the address bus 32 bits (4 GiB, about 4.3 GB). I'd keep the same 8-bit add, subtract, shift left and right, and magnitude compare, and add an 8-bit multiply with a 16-bit result (as one or more instructions).

I know I am getting WAY ahead of myself, but I pose these ideas to try and grasp the possibilities from more experienced folks, and potential problems I might face with this kind of design before I start actually designing. I would like to keep all the existing 65C02 commands.
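For reference, an 8-bit multiply with a 16-bit result can be built as a shift-and-add loop, one bit of the multiplier per step. That is how 6502 software multiplies today, and how a small sequential hardware multiplier would work, one step per clock; an FPGA design could instead use the Spartan-3's dedicated 18x18 multipliers and get the product at once. Which registers would receive the high and low bytes is an open design choice, so this sketch (function name hypothetical) just returns them as a (high, low) pair.

```python
from typing import Tuple


def mul8x8(a: int, b: int) -> Tuple[int, int]:
    """Unsigned 8x8 -> 16-bit shift-and-add multiply; returns (high, low) bytes."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be 8-bit unsigned values")
    product = 0
    for bit in range(8):              # one step per multiplier bit
        if b & (1 << bit):
            product += a << bit       # add the shifted multiplicand
    return (product >> 8) & 0xFF, product & 0xFF


if __name__ == "__main__":
    print(mul8x8(200, 100))   # 20000 = 0x4E20, so high 0x4E, low 0x20
```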

Post by **fachat** » Mon Mar 15, 2010 10:19 pm

I haven't decided on a memory access strategy yet, but my FPGA plans include a 64k CPU address space with an MMU to expand it to 1M, maybe 16M or even 256M: 65536 memory blocks of 4k each, continuing my CS/A computer, where I use the 74LS610 as the MMU.
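The arithmetic behind such an MMU is simple: with 4k blocks, the top 4 bits of a 16-bit CPU address select one of 16 map entries, and the bottom 12 bits pass straight through. The width of each map entry sets the ceiling: 8-bit entries give 256 × 4k = 1M, 12-bit entries (like the 74LS610's 12-bit map registers) give 16M, and 16-bit entries give 65536 × 4k = 256M. A minimal sketch; the class name and the identity mapping at reset are assumptions, not part of the plan described above.

```python
class Mmu:
    """16 map entries, 4 KiB pages: translates a 16-bit CPU address to a physical one."""

    PAGE_BITS = 12  # 4 KiB pages

    def __init__(self, block_bits: int = 16):
        self.block_bits = block_bits       # width of each map entry
        self.table = list(range(16))       # assumed identity mapping at reset

    def set_page(self, cpu_page: int, block: int) -> None:
        if not (0 <= cpu_page < 16 and 0 <= block < (1 << self.block_bits)):
            raise ValueError("page or block number out of range")
        self.table[cpu_page] = block

    def translate(self, addr: int) -> int:
        page = (addr >> self.PAGE_BITS) & 0xF   # top 4 bits select the entry
        offset = addr & 0xFFF                   # low 12 bits pass through
        return (self.table[page] << self.PAGE_BITS) | offset

    def physical_size(self) -> int:
        return (1 << self.block_bits) << self.PAGE_BITS


if __name__ == "__main__":
    mmu = Mmu()
    mmu.set_page(0xC, 0x1234)
    print(hex(mmu.translate(0xC123)))   # 0x1234123
```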

I'm also planning to keep all the current opcodes, but include 2, maybe 3 extra 16-bit registers (U,V,W) with extra opcodes for stuff like

    LDU $1234
    LDA U,X

using a 16 bit ALU to reduce invalid memory cycles.
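One reading of "invalid memory cycles" (an interpretation, not necessarily what was meant): on the NMOS 6502, an indexed load like `LDA $12F0,X` adds X to the low address byte with the 8-bit ALU first. If that addition carries into the next page, the CPU has already put a wrong address on the bus and performs a dummy read there, then spends an extra cycle fixing up the high byte. A 16-bit address adder removes both the extra cycle and the bogus read. The sketch below models only that effect for absolute,X loads; the function name and the `alu16` flag are hypothetical.

```python
from typing import Optional, Tuple


def abs_x_load(base: int, x: int, alu16: bool) -> Tuple[int, int, Optional[int]]:
    """Model LDA abs,X: return (cycles, effective address, dummy-read address or None)."""
    effective = (base + x) & 0xFFFF
    low_sum = (base & 0xFF) + x
    crosses_page = low_sum > 0xFF
    if alu16 or not crosses_page:
        return 4, effective, None          # address is right on the first try
    # 8-bit ALU: high byte not yet fixed, so the first read hits the wrong page
    dummy = (base & 0xFF00) | (low_sum & 0xFF)
    return 5, effective, dummy


if __name__ == "__main__":
    print(abs_x_load(0x12F0, 0x20, alu16=False))  # 5 cycles, EA 0x1310, dummy read at 0x1210
    print(abs_x_load(0x12F0, 0x20, alu16=True))   # 4 cycles, EA 0x1310, no dummy read
```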

Additionally, a prefix opcode for "special features" like a blitter (e.g. transfer U bytes from address V to address W, adding X to V and Y to W as offsets, with U, V, W, X, Y as index registers; it would even be interruptible), vector operations (like a 16-bit CRC, e.g. for a TCP/IP packet), and 16-bit arithmetic (U*V, etc.).
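What makes a blitter like that interruptible is keeping all of its progress in the registers: after each byte, U counts down and V and W step forward, so an interrupt can be taken between any two transfers and re-executing the instruction simply carries on. Below is a minimal model. It assumes X and Y are added to V and W after every byte (i.e. they act as strides, which would also allow copying a column of a bitmap); that is one reading of the description, and the `Regs` and `blit` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Regs:
    U: int  # bytes remaining
    V: int  # source address
    W: int  # destination address
    X: int  # added to V after each byte (assumed stride)
    Y: int  # added to W after each byte (assumed stride)


def blit(mem: bytearray, r: Regs, max_steps: Optional[int] = None) -> bool:
    """Copy bytes until U reaches 0; return True if finished, False if interrupted.

    `max_steps` stands in for an interrupt arriving after that many transfers.
    """
    steps = 0
    while r.U != 0:
        if max_steps is not None and steps == max_steps:
            return False                   # all resume state is already in r
        mem[r.W] = mem[r.V]
        r.V = (r.V + r.X) & 0xFFFF
        r.W = (r.W + r.Y) & 0xFFFF
        r.U -= 1
        steps += 1
    return True


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x1000:0x1008] = bytes(range(1, 9))
    regs = Regs(U=8, V=0x1000, W=0x2000, X=1, Y=1)
    print(blit(mem, regs, max_steps=3), regs.U)          # interrupted with 5 bytes left
    print(blit(mem, regs), bytes(mem[0x2000:0x2008]))    # resumed and finished
```

Restarting only works if the CPU returns to the start of the whole instruction, prefix included, which is exactly the concern raised in the reply to this post.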

All still in the idea stage, but I'd like to reserve the name "65020" for it now :-)

Quote:

I know I am getting WAY ahead of myself, but I pose these ideas to try and grasp the possibilities from more experienced folks, and potential problems I might face with this kind of design before I start actually designing. I would like to keep all the existing 65C02 commands.

Hm, same here :-)

André

Post by **heronfisher** » Tue Mar 16, 2010 10:37 am

My goal is to beat the T65 project in LE usage. Speed and 100% compatibility with the original 6502 (e.g. the strange SO pin and V-flag feature, or the original hardware reset vectors) are not important to me, but code compiled with cc65 (www.cc65.org) must work.

Post by **kc5tja** » Thu Mar 18, 2010 5:22 am

fachat wrote:

Additionally, a prefix opcode for "special features" like Blitter (e.g. transfer U bytes from address V to address W, adding X to V and Y to W as offsets, with U,V,W,X,Y as index registers - it would even be interruptible)

If you're going to make it interruptible, make sure the CPU pushes the address of the prefix byte, not that of the secondary opcode.

The 8086 had a bug involving some of its segment override prefix bytes. If you used a string instruction with a segment override of any kind and an interrupt occurred, you would likely trash memory upon returning from the interrupt handler, because the CPU went back to using the default segment registers. This happened because the CPU pushed the address of the string instruction, and not that of the prefix bytes ahead of it.

(The 80286 fixed this bug, thankfully.)
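The failure mode is easy to reproduce in a toy model. The sketch below uses a made-up two-byte instruction (a prefix that selects an alternate source segment, followed by a string-copy opcode); it is not the real 8086 encoding, and all names are hypothetical. An interrupt arrives halfway through the copy, and the only difference between the two runs is which address is pushed as the return point.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: address 0 holds a prefix, address 1 the string-copy opcode.
PROGRAM = {0: "PREFIX_ALT_SEGMENT", 1: "COPY"}


def execute(segs: Dict[str, List[int]], regs: Dict[str, int], pc: int,
            interrupt_after: Optional[int] = None) -> Tuple[bool, int, int]:
    """Decode from pc and run the copy; return (finished, prefix_pc, opcode_pc)."""
    start, source = pc, "default"
    while PROGRAM[pc].startswith("PREFIX"):
        source = "alt"                     # prefix overrides the source segment
        pc += 1
    copied = 0
    while regs["count"]:
        if interrupt_after is not None and copied == interrupt_after:
            return False, start, pc        # interrupted between transfers
        segs["dest"][regs["di"]] = segs[source][regs["si"]]
        regs["si"] += 1
        regs["di"] += 1
        regs["count"] -= 1
        copied += 1
    return True, start, pc


def run_with_interrupt(push_prefix_address: bool) -> List[int]:
    segs = {"default": [0, 0, 0, 0], "alt": [1, 2, 3, 4], "dest": [0, 0, 0, 0]}
    regs = {"count": 4, "si": 0, "di": 0}
    _, prefix_pc, opcode_pc = execute(segs, regs, 0, interrupt_after=2)
    resume = prefix_pc if push_prefix_address else opcode_pc
    execute(segs, regs, resume)            # "return from interrupt"
    return segs["dest"]


if __name__ == "__main__":
    print(run_with_interrupt(True))    # [1, 2, 3, 4]: override survives
    print(run_with_interrupt(False))   # [1, 2, 0, 0]: rest copied from the wrong segment
```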

Quote:

All still in the idea stage, but I'd like to reserve the name "65020" for it now

Actually, I came up with the 65020 name years ago, but that's OK -- you can have it.

(I rather like the name 65002 myself.)

Post by **John West** » Thu Mar 18, 2010 10:43 am

kc5tja wrote:

Actually, I came up with the 65020 name years ago, but that's OK -- you can have it.

I did too. First to implement gets the name, I say. I've been failing to do anything with my design for over 20 years now, so it's not likely to be me.

Post by **ElEctric_EyE** » Mon Mar 22, 2010 6:35 am

fachat wrote:

...All still in the idea stage, but I'd like to reserve the name "65020" for it now

...
André

Not bowing down to the 68xxx family here, but the 65020 sounds a lot like the 68020, a 32-bit CPU (mmmm, 1984, nice). Maybe a true 16-bit version of the 6502 should be the 65002?... Which raises the question: what is the actual origin of the digits 6-5-0-2?

And aside from the name calling, and staying true to the topic: if designing a 6502 in an FPGA, it might be possible to put all 64K of SRAM onboard. I've never designed with synchronous RAM, but there are many synchronous, even dual-port, RAM configurations in the Spartan-3 library.
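A quick sanity check on fitting all 64K on chip: each Spartan-3 block RAM holds 18 Kbit, of which 16 Kbit are data (the rest are parity bits), so in a byte-wide 2K x 8 configuration one block stores 2 KB. The helper below (name hypothetical) just does that division; whether a particular part has enough blocks is something to look up in the Spartan-3 data sheet.

```python
BRAM_DATA_BITS = 16 * 1024  # data bits per Spartan-3 block RAM, parity excluded


def brams_needed(n_bytes: int, data_bits_per_bram: int = BRAM_DATA_BITS) -> int:
    """Number of block RAMs needed to hold n_bytes of byte-wide memory."""
    total_bits = n_bytes * 8
    return -(-total_bits // data_bits_per_bram)   # ceiling division


if __name__ == "__main__":
    print(brams_needed(64 * 1024))   # 32 blocks for the full 64K
```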

Edit: And speaking of the 65xx/68xxx architectures, if you pull up a 68020 datasheet, you'll see virtual machine/supervisory concepts dating back to 1992. (A superiority I had to point out compared with the 8086 line: it's only in the last year or two that Intel has offered a virtual machine concept.)

Post by **fachat** » Mon Mar 22, 2010 9:33 am

Quote:

ElEctric_EyE wrote:

fachat wrote:

...All still in the idea stage, but I'd like to reserve the name "65020" for it now :-)...
André

Not bowing down to the 68xxx family here, but the 65020 sounds a lot like the 68020, a 32-bit CPU (mmmm, 1984, nice). Maybe a true 16-bit version of the 6502 should be the 65002?... Which raises the question: what is the actual origin of the digits 6-5-0-2?

Well, the 68000 was internally 16bit (16bit ALU for example), just with 32bit registers. You could see that with the opcode timing for example (IIRC). Only the 68020 changed that to a true 32bit CPU.

So since I'm going from an 8-bit ALU to a 16-bit ALU, I thought the "20" would be justified. But maybe you're right; this first step should probably be the 65002, or the 65012.

Quote:

Edit: And speaking of the 65xx/68xxx architectures, if you pull up a 68020 datasheet, you'll see virtual machine/supervisory concepts dating back to 1992. (A superiority I had to point out compared with the 8086 line: it's only in the last year or two that Intel has offered a virtual machine concept.)

Besides, I plan on including an MMU and other virtual machine concepts in that one (to enable e.g. simple multiprocessor support), so another point for the "20".

André

Post by **kc5tja** » Mon Mar 22, 2010 3:15 pm

ElEctric_EyE wrote:

Edit: And speaking of the 65xx/68xxx architectures, if you pull up a 68020 datasheet, you'll see virtual machine/supervisory concepts dating back to 1992. (A superiority I had to point out compared with the 8086 line: it's only in the last year or two that Intel has offered a virtual machine concept.)

I must disagree with this, having used both architectures extensively, both professionally and while growing up.

The 68000 had a nearly completely virtualizable instruction set. Only the MOVE SR instruction (e.g. MOVE.W SR,Dn) violated the virtualization requirements; the 68010 (not the 020!) made that instruction privileged, fixing the virtualization problem once and for all. That puts the 680x0's virtualization features as far back as the early 80s -- 1982ish, IIRC.

With regard to supervisor modes, though, the Intel architecture introduced them along with Protected Mode in the 80286 processor, back in 1982. Protected mode introduced a Multics-like 4-ring security model: ring 3 is the lowest privilege level, while ring 0 is the highest. Your applications ran in ring 3, while the OS kernel ran in ring 0; hence the phrase, "This code runs in ring 0."

Like the 68000's MOVE SR, though, the SMSW instruction was unprivileged, granting read access to the protected-mode bit and thus undermining the CPU's ability to implement a true virtual machine. The 80286 also had a hardware bug: once you engaged protected mode, you stayed in protected mode forever. The only workaround was to physically reset the CPU. This is why running DOS applications in OS/2 1.0 was so damn slow -- every time that thread took control, the CPU had to undergo a complete reset sequence!

The 80386 was the first processor to fix the real/protected-mode bug (some say that the 80286 was a rushed beta of the 80386 to try and fend off the 68000's success; however, I question this. It IS known that the 80x86 architecture inherited many of its protected-mode features and descriptor tables from the i432 architecture, though), but true support for virtualization didn't appear until the Pentium-IV series of processors!

Post by **BigEd** » Mon Jun 14, 2010 11:42 am

ElEctric_EyE wrote:

The idea of an internal high speed cache for the 6502 is intriguing when designing on an FPGA platform. Most of the SRAM options from the Spartan 3 schematic library are synchronous, (about half are dual port).

Xilinx have a few application notes about using block ram to implement caches:

XAPP204 - Using Block RAM for High Performance Read/Write CAMs
XAPP463 - Using Block RAM in Spartan-3 Generation

and there's also the Content-Addressable Memory v6.1 Datasheet, but possibly that's something you have to pay for.

We're working on a Tube+T65+ROM+RAM design now, but it's still in debug. Once that's working, it might be an interesting platform to try for some caching.
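For anyone wondering how block RAM can act as a CAM at all, the usual trick (and, as far as I understand it, the idea XAPP204 builds on; check the note itself for the details) is to turn the lookup around: address the RAM with the key, and store in each word a bit vector saying which entries hold that key. A lookup is then a single RAM read, and the resulting match vector is exactly what a fully associative cache needs for its tag check. This Python model is only a sketch: the class name is hypothetical, and it keeps a shadow copy of each entry's key so an old key can be erased, which real hardware would have to arrange some other way.

```python
class RamCam:
    """CAM modelled as a RAM indexed by key; each word is a match vector."""

    def __init__(self, key_bits: int = 8, entries: int = 16):
        self.entries = entries
        self.ram = [0] * (1 << key_bits)   # one match-vector word per possible key
        self.stored = [None] * entries     # shadow copy: key held by each entry

    def write(self, entry: int, key: int) -> None:
        old = self.stored[entry]
        if old is not None:
            self.ram[old] &= ~(1 << entry)   # erase this entry's old key
        self.ram[key] |= 1 << entry          # mark the entry as holding `key`
        self.stored[entry] = key

    def lookup(self, key: int) -> int:
        """Return the match vector for `key`; 0 means a miss."""
        return self.ram[key]


if __name__ == "__main__":
    cam = RamCam()
    cam.write(3, 0x42)
    cam.write(5, 0x42)
    print(bin(cam.lookup(0x42)))   # bits 3 and 5 set: both entries match
```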