6502.org
http://forum.6502.org/

CPLD 6502
http://forum.6502.org/viewtopic.php?f=10&t=5418
Page 1 of 3

Author:  Arlet [ Mon Dec 31, 2018 1:42 pm ]
Post subject:  CPLD 6502

Hello,

After a long pause, I decided to get back into 6502 hacking, and implement an idea I've been toying with for a few years: using multiple small CPLDs to implement a 6502.

My CPLD of choice was the Xilinx XC9572XL in the 44-pin TQFP package. My original plan was to use 5 or 6 of them, but somewhat to my own surprise, I was able to fit the design into 4. It's a very tight fit, especially for the control logic and the ALU. At first, it seemed completely hopeless, watching the tools allocate big chunks of resources for the simplest expressions, but with a lot of experimenting and reading of the fitter reports, I gradually gained an understanding of how to write the code so it would match the capabilities of the CPLDs and the tools.

A few years ago I tried something similar, but noticed that the CPLD was a very poor fit for bigger adders (mostly because there's no fast carry chain, and also because the AND-OR structure is not good for XOR operations), and gave up on the idea. But then a while ago, I was going through the datasheets for another project, and I noticed that nice dedicated XOR gate in each macrocell. I spent a few days going over different ways to turn that XOR into the centerpiece of the ALU.
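To illustrate why a per-macrocell XOR helps with adders, here is a minimal Python sketch. It is not taken from the project's Verilog; the function names are made up for this example, and it assumes the macrocell XOR combines a sum-of-products on one input with a single product term on the other. In pure AND-OR form the sum bit a ^ b ^ c needs four 3-literal product terms, while with the XOR gate two product terms plus the carry term are enough.

```python
def sop_xor3(a, b, c):
    # Pure AND-OR form of a ^ b ^ c: four product terms of three literals each.
    return (a & ~b & ~c | ~a & b & ~c | ~a & ~b & c | a & b & c) & 1


def macrocell_sum(a, b, c):
    # Assumed macrocell structure: a two-term sum-of-products (a XOR b)
    # on one XOR input, and the carry as a single product term on the other.
    sop = (a & ~b | ~a & b) & 1
    return sop ^ c


def ripple_add(x, y, carry_in=0, width=8):
    # Bit-serial model of an adder built from macrocell_sum.
    # The carry still ripples, since there is no fast carry chain.
    result = 0
    carry = carry_in
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        result |= macrocell_sum(a, b, carry) << i
        carry = (a & b) | (a & carry) | (b & carry)
    return result, carry
```

For example, ripple_add(0xFF, 0x01) wraps around to 0 with the carry set, as an 8-bit adder should.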

I had several reasons for picking this particular type of CPLD. It's fairly easy to solder, even by hand; I had previous experience with it; and it has just enough resources to make this possible, but not so many as to make it simple. The result is a very nice puzzle that has kept me busy for a while.

Everything builds with standard settings, optimized for speed, on ISE 14.7, except for a couple of KEEP attributes at strategic places. Automatic FSM extraction needs to be turned off, because it doesn't respect the ordering of overlapping casez clauses, which introduces bugs in the control logic. I did run it with automatic FSM extraction once, copied the state encoding, and then turned it back off. I must say, the tools are pretty amazing at optimizing smallish logic functions, but tend to be clueless about more complex decisions, such as when to allocate another macrocell for a subexpression. I always check the fitter reports for extra variables (recognizable by their random-looking names with dollar signs). If there are any, I try to rewrite my code to avoid them. The code generation for the built-in "+" expression isn't very good (except for adding/subtracting simple constants), so it's best avoided. Finally, optimizing for size sometimes makes the implementation bigger. I recommend always optimizing for speed, because you get more control over the result.
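The point about overlapping casez clauses can be illustrated with a small Python model of first-match-wins decoding. This is only a sketch: the bit patterns and labels below are hypothetical and are not taken from the project's control logic.

```python
def casez_match(value, clauses, width=8):
    # Model of a Verilog casez: clauses are tried in order and the first
    # pattern that matches wins. '?' (or 'z') marks a don't-care bit.
    bits = format(value, f"0{width}b")
    for pattern, label in clauses:
        if all(p in "?z" or p == b for p, b in zip(pattern, bits)):
            return label
    return "default"


# Hypothetical overlapping clauses: the specific pattern must come first.
CLAUSES = [
    ("10001000", "special"),
    ("1???1000", "generic"),
]
```

With this ordering, casez_match(0x88, CLAUSES) selects the "special" clause. If the two clauses are swapped, the same opcode falls into "generic", which is the kind of behaviour change that appears when a tool ignores clause order.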

I made a small board containing the 4 CPLDs, plus an extra CPLD for UART/SPI, some SRAM and flash, and 6 LED displays on a 74HC595 chain. My goal was to have it run at 10 MHz. It's currently running stable at 12 MHz. I've added a wait state for the flash, mainly to test the RDY logic, but also because flash is rather slow. I briefly tried it at 24 MHz, but it crashed. I haven't tried to find the maximum speed, nor have I done any analysis of the longest path. [Edit: I've added the source code for the extra CPLD as well. Even though it's not strictly part of the project, it can still be useful.]

Bootstrapping was done by writing a simple UART loader and hard-coding it into the I/O CPLD (which is connected to the full address and data bus). The UART loader reads 256 bytes over the UART, writes them to memory, and jumps to the first instruction. From there, a secondary loader took more data from the UART and wrote it to flash.
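The first-stage loader behaviour described above can be modelled in a few lines of Python. This is a sketch only: the load address 0x0200 is an assumption (the post does not say where the 256 bytes land), and the function name is made up for illustration.

```python
def first_stage_load(uart_bytes, memory, base=0x0200, count=256):
    # Read `count` bytes from the UART stream and store them in memory
    # starting at `base` (assumed address, not stated in the post).
    stream = iter(uart_bytes)
    for offset in range(count):
        memory[base + offset] = next(stream) & 0xFF
    # The loader then jumps to the first byte it stored.
    return base


memory = bytearray(65536)           # 64 KiB address space
payload = bytes(range(256))         # stand-in for the secondary loader
entry_point = first_stage_load(payload, memory)
```

The returned entry_point is where execution would continue, which in the real system is the start of the secondary loader that programs the flash.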

The source code still needs to be cleaned up a bit, but I've made a repository on GitHub: https://github.com/Arlet/cpld-6502

(By the way, if anybody is working in Verilog, I highly recommend the Verilator project: https://www.veripool.org/wiki/verilator With all output disabled, it simulates the entire design for 100 million cycles in 21 seconds on my desktop PC. That's more than enough to run Klaus Dormann's verification program.)

Attachments:
CPLD-6502.JPG

Author:  cbmeeks [ Mon Dec 31, 2018 1:46 pm ]
Post subject:  Re: CPLD 6502

Very interesting project. I like how you labeled the different parts of the virtual CPU.

Author:  Arlet [ Mon Dec 31, 2018 2:27 pm ]
Post subject:  Re: CPLD 6502

Resource usage for each of the 4 modules:

ABL module:

Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          11/18       28/54       52/90       9/ 9*
FB2          15/18       41/54       80/90       9/ 9*
FB3          17/18       39/54       86/90       9/ 9*
FB4          14/18       42/54       79/90       7/ 7*
             -----       -----       -----      -----
             57/72      150/216     297/360     34/34


ABH module (still has a bit of room)
Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          16/18       33/54       85/90       9/ 9*
FB2           8/18       32/54       72/90       9/ 9*
FB3           5/18       26/54       22/90       8/ 9
FB4          13/18       33/54       85/90       7/ 7*
             -----       -----       -----      -----
             42/72      124/216     264/360     33/34


ALU module (only one macrocell left!)
Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          18/18*      35/54       49/90       8/ 9
FB2          18/18*      43/54       74/90       9/ 9*
FB3          18/18*      46/54       86/90       9/ 9*
FB4          17/18       36/54       42/90       7/ 7*
             -----       -----       -----      -----
             71/72      160/216     251/360     33/34


CTL module (lots of wide-input functions. There still appear to be free macrocells, but the function blocks they are in have only a few product terms left. Most of the free product terms are in FB1, but that block has only 2 free macrocells.)
Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          16/18       31/54       69/90       9/ 9*
FB2          12/18       44/54       77/90       9/ 9*
FB3          11/18       22/54       88/90       8/ 9
FB4          14/18       19/54       82/90       7/ 7*
             -----       -----       -----      -----
             53/72      116/216     316/360     33/34
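As a quick cross-check of the four fitter summaries, the macrocell totals can be turned into utilization figures. The numbers below are copied from the tables above; the helper names are made up for this example.

```python
MACROCELLS_PER_DEVICE = 72  # each XC9572XL has 4 function blocks of 18

# Macrocells used per device, from the fitter summary lines above.
USED = {"ABL": 57, "ABH": 42, "ALU": 71, "CTL": 53}


def free_macrocells(name):
    # Macrocells still unused in the given device.
    return MACROCELLS_PER_DEVICE - USED[name]


def utilization_percent(name):
    # Share of the device's macrocells that are in use.
    return 100.0 * USED[name] / MACROCELLS_PER_DEVICE


total_used = sum(USED.values())
total_available = MACROCELLS_PER_DEVICE * len(USED)

for name in USED:
    print(f"{name}: {free_macrocells(name)} free, "
          f"{utilization_percent(name):.1f}% used")
print(f"all four devices: {total_used}/{total_available} macrocells")
```

Note that free macrocells alone do not mean free capacity: as the CTL comment points out, a macrocell is of little use when its function block has run out of product terms.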

Author:  Arlet [ Mon Dec 31, 2018 4:35 pm ]
Post subject:  Re: CPLD 6502

Forgot to mention: the core is not cycle-exact. It removes a couple of cycles in order to simplify the control logic (and who needs dummy cycles anyway?)

  • JSR takes 5 cycles.
  • RTS takes 4 cycles.
  • Simple implied instructions such as INX and ROL A take 1 cycle.
  • ZP,X takes 3 cycles, same as ZP.
  • (ZP,X) takes 5 cycles.
  • PLA/PLP/PLX/PLY take 3 cycles.
  • No penalty for page boundary crossing on any instruction.
  • INC ZP takes 4 cycles, INC ABS takes 5, and INC ABS,X also takes 5 (the same applies to DEC and shift/rotate).

There is still a useless cycle in PHA/PLA where the instruction fetch is repeated. It is possible to remove those cycles, but at the cost of a considerable increase in control logic complexity.
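To put the savings in perspective, the cycle counts listed above can be compared with the standard NMOS 6502 timings. The sketch below is an illustration only: the table and function names are made up, LDA is used as a representative instruction for the ZP,X and (ZP,X) addressing modes (an assumption), and the standard counts are the usual NMOS figures without page-crossing penalties.

```python
# opcode: (standard NMOS 6502 cycles, cycles reported for this core)
CYCLES = {
    "JSR": (6, 5),
    "RTS": (6, 4),
    "INX": (2, 1),
    "LDA zp,X": (4, 3),
    "LDA (zp,X)": (6, 5),
    "PLA": (4, 3),
    "INC zp": (5, 4),
    "INC abs": (6, 5),
    "INC abs,X": (7, 5),
}


def total_cycles(sequence, column):
    # column 0 = standard 6502, column 1 = this core
    return sum(CYCLES[op][column] for op in sequence)


def speedup(sequence):
    # Ratio of standard cycles to this core's cycles at the same clock.
    return total_cycles(sequence, 0) / total_cycles(sequence, 1)


pair = ["JSR", "RTS"]
print(total_cycles(pair, 0), total_cycles(pair, 1), speedup(pair))
```

A JSR/RTS pair drops from 12 cycles to 9, so a call and return run about a third faster at the same clock rate.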

Author:  Dr Jefyll [ Mon Dec 31, 2018 4:54 pm ]
Post subject:  Re: CPLD 6502

Whoa, fun project! :D And I share your feelings about the dummy cycles -- who needs 'em! Speeding up a JSR/RTS pair by 33% is something I can appreciate.

Same for the other speedups. ZP,X and (ZP,X) are very commonly used in Forth, so it's fun to contemplate that. And of course implied instructions such as INX and ROL A are ubiquitous in all contexts, not just Forth.

Great work! Thanks for posting!

-- Jeff

Author:  Drass [ Mon Dec 31, 2018 5:20 pm ]
Post subject:  Re: CPLD 6502

Great to see this Arlet. Welcome back!

I agree with Jeff, looks like a lot of fun — especially working to a tight fit!

Cheers.

Author:  BigEd [ Mon Dec 31, 2018 7:44 pm ]
Post subject:  Re: CPLD 6502

Wonderful project Arlet! I see the readme on github gives some more details of how you partitioned the design - thanks for that.
https://github.com/Arlet/cpld-6502#readme

Author:  Arlet [ Mon Dec 31, 2018 8:06 pm ]
Post subject:  Re: CPLD 6502

I just added the schematic to GitHub: https://github.com/Arlet/cpld-6502/blob ... matics.pdf

Author:  Arlet [ Mon Dec 31, 2018 8:22 pm ]
Post subject:  Re: CPLD 6502

Here's a simplified block diagram showing most of the interconnections.

Attachments:
cpld-6502-block.png

Author:  GaBuZoMeu [ Mon Dec 31, 2018 8:52 pm ]
Post subject:  Re: CPLD 6502

Very nice project, indeed :)

Do you intend to implement some or all of the 65C02 opcodes as well?

I have a problem with your link above - I get "404" :(

There is most likely a typo in your block diagram: ABL should serve AB[7:0] and ABH should serve AB[15:8], I assume :)


Regards,
Arne

Author:  Arlet [ Mon Dec 31, 2018 9:00 pm ]
Post subject:  Re: CPLD 6502

Thanks. Fixed the link and the diagram.

I have PHX/PHY/PLX/PLY as well as BRA implemented. I tried to add INC/DEC A, but realized that the ALU doesn't have controls to perform that operation. Maybe I can still add the STZ and the (ZP) instructions.

Author:  BigEd [ Mon Dec 31, 2018 9:14 pm ]
Post subject:  Re: CPLD 6502

Very handy diagram - thanks!

Author:  Chromatix [ Tue Jan 01, 2019 12:04 am ]
Post subject:  Re: CPLD 6502

I wonder whether adding a signal to explicitly indicate dummy cycles is feasible in your design. The 65816 does this by holding both VDA and VPA low.

Author:  ElEctric_EyE [ Tue Jan 01, 2019 3:22 am ]
Post subject:  Re: CPLD 6502

Great things are possible with ISE14.7! Welcome back... ;)
I think 2019 is going to be a great year!

Author:  Arlet [ Tue Jan 01, 2019 6:19 am ]
Post subject:  Re: CPLD 6502

Chromatix wrote:
I wonder whether adding a signal to explicitly indicate dummy cycles is feasible in your design. The 65816 does this by holding both VDA and VPA low.


The logic would be simple enough. Finding a spare pin on the CTL part would be a bit more of a challenge. Figuring out a way to remove the remaining dummy cycles may be easier than freeing up an I/O pin.

All times are UTC