6502.org Forum
 Post subject: CPLD 6502
PostPosted: Mon Dec 31, 2018 1:42 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Hello,

After a long pause, I decided to get back into 6502 hacking, and implement an idea I've been toying with for a few years: using multiple small CPLDs to implement a 6502.

My CPLD of choice was the Xilinx XC9572XL in a 44-pin TQFP package. My original plan was to use 5 or 6 of them, but somewhat to my own surprise, I was able to fit the design into 4. It's a very tight fit, especially for the control logic and the ALU. At first, it seemed completely hopeless watching the tools allocate big chunks of resources for the simplest expressions, but with a lot of experimenting and reading the fitter reports, I gradually gained an understanding of how to write the code so it would match the capabilities of the CPLDs and the tools.

A few years ago I tried something similar, but noticed that the CPLD was a very poor fit for bigger adders (mostly because there's no fast carry chain, and also because the AND-OR structure is not good for XOR operations), and had given up on the idea. But then a while ago, I was going through the datasheets for another project, and I noticed that nice dedicated XOR port in each macrocell. I spent a few days going over different ways to turn that XOR into the centerpiece of the ALU.
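To illustrate why a dedicated XOR helps with adders, here is a minimal Python sketch. It assumes a simplified macrocell model in which the output is the OR of a few product terms, XORed with one extra product term. The function names are hypothetical, and this is not the ALU from the repository, only the general idea.

```python
def macrocell(or_terms, xor_term):
    """Simplified model: (OR of product terms) XOR (one extra product term)."""
    return int(any(or_terms)) ^ int(bool(xor_term))


def add_bit(a, b, cin):
    """One adder bit built from two modelled macrocells."""
    # Sum: (a XOR b) written as two product terms, then XORed with carry-in.
    s = macrocell([a and not b, b and not a], cin)
    # Carry-out: plain sum of products, XOR input tied to 0.
    cout = macrocell([a and b, a and cin, b and cin], 0)
    return s, cout


def add8(x, y, cin=0):
    """Ripple the carry through 8 modelled bits; returns (sum, carry_out)."""
    result, carry = 0, cin
    for i in range(8):
        s, carry = add_bit((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

Without the XOR, the sum bit alone needs four product terms of three inputs each. With it, two product terms plus the XOR input are enough in this model.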

I had several reasons for picking this particular type of CPLD. It's fairly easy to solder, even by hand, I had previous experience with it, and it also has just enough resources to make this possible, but not too many to make it simple, resulting in a very nice puzzle that has kept me busy for a while.

Everything builds with standard settings, optimized for speed, on ISE 14.7, except for a couple of KEEP attributes at strategic places. Automatic FSM extraction needs to be turned off, because it doesn't respect the ordering of overlapping casez clauses, which introduces bugs in the control logic. I did run it with automatic FSM extraction once, copied the state encoding, and then turned it back off. I must say, the tools are pretty amazing at optimizing smallish logic functions, but tend to be clueless about more complex decisions, such as when to allocate another macrocell for a subexpression. I always check the fitter reports for extra variables (recognizable by their random-looking names with dollar signs). If there are any, I try to rewrite my code to avoid them. The code generation for the built-in "+" expression isn't very good either (except for adding/subtracting simple constants), so it's best avoided. Finally, optimizing for size sometimes makes the implementation bigger. I recommend always optimizing for speed, because you get more control over the result.
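The point about clause ordering can be shown with a small Python model of casez-style matching, where the first matching clause wins and '?' marks a don't-care bit. The patterns and action names are made up for illustration and are not taken from the actual control logic.

```python
def casez_match(pattern, value, width=8):
    """True if value matches pattern; '?' in the pattern is a don't-care bit."""
    bits = format(value, f"0{width}b")
    return all(p in ("?", b) for p, b in zip(pattern, bits))


def decode(opcode, clauses):
    """Return the action of the first matching clause (priority order)."""
    for pattern, action in clauses:
        if casez_match(pattern, opcode):
            return action
    return "default"


# Hypothetical overlapping clauses: the specific one must come first.
CLAUSES = [
    ("00000000", "special"),
    ("???00000", "group"),
]

print(decode(0x00, CLAUSES))                  # special
print(decode(0x00, list(reversed(CLAUSES))))  # group
```

Any tool step that treats overlapping clauses as unordered can turn the first result into the second, which is the kind of bug described above.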

I made a small board containing the 4 CPLDs, plus an extra CPLD for UART/SPI, some SRAM and flash, and 6 LED displays on a 74HC595 chain. My goal was to have it run at 10 MHz. It's currently running stably at 12 MHz. I've added a wait state for the flash, mainly to test the RDY logic, but also because the flash is rather slow. I briefly tried it at 24 MHz, but it crashed. I haven't tried to find the maximum speed, nor have I done any analysis of the longest path. [Edit: I've added the source code for the extra CPLD as well. Even though it's not strictly part of the project, it can still be useful.]
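As a rough illustration of what a wait state buys at these clock rates, the sketch below computes the length of a bus cycle with and without one. It assumes a wait state stretches the access by exactly one clock period; the helper name is hypothetical, and the real access window is shorter than the full cycle because of setup and propagation delays.

```python
def cycle_ns(clock_hz, wait_states=0):
    """Length of one bus access in nanoseconds, wait states included."""
    return (1 + wait_states) * 1e9 / clock_hz


for mhz in (10, 12, 24):
    hz = mhz * 1_000_000
    print(f"{mhz} MHz: {cycle_ns(hz):.1f} ns, "
          f"with one wait state {cycle_ns(hz, 1):.1f} ns")
```

At 12 MHz this gives about 83 ns per cycle, or about 167 ns for a flash access with one wait state.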

Bootstrapping was done by writing a simple UART loader and hard-coding it into the I/O CPLD (which is connected to the full address and data bus). The UART loader reads 256 bytes over UART, writes them to memory, and jumps to the first instruction. From there, a secondary loader took more data from the UART and wrote it to flash.
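The first-stage loader's behaviour can be modelled in a few lines of Python. This is only a sketch of the sequence described above: the load address is a made-up parameter (the post doesn't say where the 256 bytes go), and the real loader is hard-coded logic in the I/O CPLD, not software.

```python
def first_stage_loader(uart_bytes, memory, load_addr):
    """Copy 256 bytes from a UART byte stream into memory.

    Returns the address to jump to, i.e. the first loaded byte.
    """
    stream = iter(uart_bytes)
    for offset in range(256):
        memory[load_addr + offset] = next(stream)
    return load_addr


memory = bytearray(65536)
payload = bytes(range(256))          # stand-in for the secondary loader
pc = first_stage_loader(payload, memory, load_addr=0x0200)  # hypothetical address
print(hex(pc))
```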

The source code still needs to be cleaned up a bit, but I've made a repository on GitHub: https://github.com/Arlet/cpld-6502

(By the way, if anybody is working in Verilog, I highly recommend the Verilator project: https://www.veripool.org/wiki/verilator If I disable all output, it simulates the entire design for 100 million cycles in 21 seconds on my desktop PC. That's more than enough to run Klaus Dormann's verification program.)
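To put the simulation figure in perspective, this short calculation converts it to cycles per second and compares it with the 12 MHz the board runs at. All numbers come from the post; only the variable names are mine.

```python
SIM_CYCLES = 100_000_000   # simulated cycles
SIM_SECONDS = 21           # wall-clock time on the desktop PC
BOARD_HZ = 12_000_000      # clock rate of the real board

sim_hz = SIM_CYCLES / SIM_SECONDS
slowdown = BOARD_HZ / sim_hz

print(f"{sim_hz / 1e6:.2f} M cycles/s, {slowdown:.2f}x slower than the board")
```

That works out to roughly 4.8 million simulated cycles per second, so the simulation is only about 2.5 times slower than the hardware.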


Attachments:
CPLD-6502.JPG


Last edited by Arlet on Mon Dec 31, 2018 8:45 pm, edited 11 times in total.
 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 1:46 pm 
Joined: Wed Aug 17, 2005 12:07 am
Posts: 1250
Location: Soddy-Daisy, TN USA
Very interesting project. I like how you labeled the different parts of the virtual CPU.

_________________
Cat; the other white meat.


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 2:27 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Resource usage for each of the 4 modules:

ABL module:

Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          11/18       28/54       52/90       9/ 9*
FB2          15/18       41/54       80/90       9/ 9*
FB3          17/18       39/54       86/90       9/ 9*
FB4          14/18       42/54       79/90       7/ 7*
             -----       -----       -----      -----
             57/72      150/216     297/360     34/34


ABH module (still has a bit of room)
Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          16/18       33/54       85/90       9/ 9*
FB2           8/18       32/54       72/90       9/ 9*
FB3           5/18       26/54       22/90       8/ 9
FB4          13/18       33/54       85/90       7/ 7*
             -----       -----       -----      -----
             42/72      124/216     264/360     33/34


ALU module (only one macrocell left!)
Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          18/18*      35/54       49/90       8/ 9
FB2          18/18*      43/54       74/90       9/ 9*
FB3          18/18*      46/54       86/90       9/ 9*
FB4          17/18       36/54       42/90       7/ 7*
             -----       -----       -----      -----
             71/72      160/216     251/360     33/34


CTL module (lots of wide-input functions; there still appear to be free macrocells, but they only have a few product terms left. Most of the free product terms are in FB1, but that one only has 2 free macrocells).
Code:
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          16/18       31/54       69/90       9/ 9*
FB2          12/18       44/54       77/90       9/ 9*
FB3          11/18       22/54       88/90       8/ 9
FB4          14/18       19/54       82/90       7/ 7*
             -----       -----       -----      -----
             53/72      116/216     316/360     33/34
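
The totals from the four fitter reports above can be turned into utilization percentages with a short script. The numbers are copied from the tables; the dictionary layout and function name are just for illustration.

```python
# (used, total) pairs taken from the summary row of each fitter report.
TOTALS = {
    "ABL": {"mcells": (57, 72), "pterms": (297, 360)},
    "ABH": {"mcells": (42, 72), "pterms": (264, 360)},
    "ALU": {"mcells": (71, 72), "pterms": (251, 360)},
    "CTL": {"mcells": (53, 72), "pterms": (316, 360)},
}


def utilization(used, total):
    """Percentage of a resource that is in use."""
    return 100.0 * used / total


for name, res in TOTALS.items():
    mc = utilization(*res["mcells"])
    pt = utilization(*res["pterms"])
    print(f"{name}: macrocells {mc:.1f}%, product terms {pt:.1f}%")
```

The ALU is limited by macrocells (about 98.6% used), while CTL is the tightest on product terms (about 87.8% used).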


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 4:35 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Forgot to mention: the core is not cycle exact, but rather removes a couple of cycles in order to simplify the control logic (and who needs dummy cycles anyway?)

  • JSR takes 5 cycles.
  • RTS takes 4 cycles.
  • Simple implied instructions such as INX and ROL A take 1 cycle.
  • ZP,X takes 3 cycles, same as ZP.
  • (ZP,X) takes 5 cycles.
  • PLA/PLP/PLX/PLY take 3 cycles.
  • No penalty for page boundary crossing on any instruction.
  • INC ZP takes 4 cycles, INC ABS takes 5, and INC ABS,X also takes 5 (the same holds for DEC and shift/rotate).

There is still a useless cycle in PHA/PLA where the instruction fetch is repeated. It is possible to remove it, but at the cost of a considerable increase in control logic complexity.
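For a sense of scale, the sketch below compares the cycle counts listed above with the usual documented counts for a stock NMOS 6502. The stock numbers are my addition, not from this post, and LDA is picked arbitrarily as the example instruction for the ZP,X and (ZP,X) modes.

```python
# instruction: (stock NMOS 6502 cycles, cycles in this core)
CYCLES = {
    "JSR":        (6, 5),
    "RTS":        (6, 4),
    "INX":        (2, 1),
    "LDA zp,X":   (4, 3),
    "LDA (zp,X)": (6, 5),
    "PLA":        (4, 3),
    "INC zp":     (5, 4),
    "INC abs":    (6, 5),
    "INC abs,X":  (7, 5),
}


def speedup(stock, core):
    """Fractional speedup at the same clock rate (0.33 means 33% faster)."""
    return stock / core - 1


for name, (stock, core) in CYCLES.items():
    print(f"{name:11s} {stock} -> {core}  (+{speedup(stock, core):.0%})")

# A JSR/RTS pair: 12 cycles on a stock part, 9 here.
pair_stock = CYCLES["JSR"][0] + CYCLES["RTS"][0]
pair_core = CYCLES["JSR"][1] + CYCLES["RTS"][1]
pair_speedup = speedup(pair_stock, pair_core)
print(f"JSR+RTS     {pair_stock} -> {pair_core}  (+{pair_speedup:.0%})")
```

A JSR/RTS pair drops from 12 cycles to 9, which is a speedup of about 33% at the same clock.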


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 4:54 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Whoa, fun project! :D And I share your feelings about the dummy cycles -- who needs 'em! Speeding up a JSR/RTS pair by 33% is something I can appreciate.

Same for the other speedups. ZP, X and (ZP,X) are very commonly used in Forth, so it's fun to contemplate that. And of course implied instructions such as INX and ROL A are ubiquitous in all contexts, not just Forth.

Great work! Thanks for posting!

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 5:20 pm 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Great to see this Arlet. Welcome back!

I agree with Jeff, looks like a lot of fun — especially working to a tight fit!

Cheers.

_________________
C74-6502 Website: https://c74project.com


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 7:44 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Wonderful project Arlet! I see the readme on github gives some more details of how you partitioned the design - thanks for that.
https://github.com/Arlet/cpld-6502#readme


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 8:06 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I just added the schematic to GitHub. https://github.com/Arlet/cpld-6502/blob ... matics.pdf


Last edited by Arlet on Mon Dec 31, 2018 8:56 pm, edited 2 times in total.

 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 8:22 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Here's a simplified block diagram showing most of the interconnections.


Attachments:
cpld-6502-block.png


Last edited by Arlet on Mon Dec 31, 2018 8:55 pm, edited 1 time in total.
 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 8:52 pm 
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Very nice project, indeed :)

Do you intend to implement some or all of the 65C02 opcodes as well?

I have a problem with your link above - I get "404" :(

There is most likely a typo in your block diagram: ABL will serve for AB[7:0] and ABH for AB[15:8] I assume :)


Regards,
Arne


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 9:00 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Thanks. Fixed the link and the diagram.

I have PHX/PHY/PLX/PLY as well as BRA implemented. I tried to add INC/DEC A, but realized that the ALU doesn't have controls to perform that operation. Maybe I can still add the STZ and the (ZP) instructions.


 Post subject: Re: CPLD 6502
PostPosted: Mon Dec 31, 2018 9:14 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Very handy diagram - thanks!


 Post subject: Re: CPLD 6502
PostPosted: Tue Jan 01, 2019 12:04 am 
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
I wonder whether adding a signal to explicitly indicate dummy cycles is feasible in your design. The 65816 does this by holding both VDA and VPA low.
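
For reference, the 65816 encodes the bus cycle type on its VDA and VPA pins as in the sketch below. The four combinations follow the W65C816S datasheet; the function name is made up, and nothing here implies the CPLD core has such outputs.

```python
def bus_cycle_type(vda, vpa):
    """Decode the 65816 VDA/VPA pin pair into a bus cycle type."""
    return {
        (0, 0): "internal operation (dummy cycle)",
        (0, 1): "valid program address",
        (1, 0): "valid data address",
        (1, 1): "opcode fetch",
    }[(vda, vpa)]


print(bus_cycle_type(0, 0))
```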


 Post subject: Re: CPLD 6502
PostPosted: Tue Jan 01, 2019 3:22 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Great things are possible with ISE14.7! Welcome back... ;)
I think 2019 is going to be a great year!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: CPLD 6502
PostPosted: Tue Jan 01, 2019 6:19 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Chromatix wrote:
I wonder whether adding a signal to explicitly indicate dummy cycles is feasible in your design. The 65816 does this by holding both VDA and VPA low.


The logic would be simple enough. Finding a spare pin on the CTL part would be a bit more of a challenge. Figuring out a way to remove the remaining dummy cycles may be easier than freeing up an I/O pin.

