New Verilog 6502 core

Andrew Holme
Posts: 4
Joined: 08 Nov 2016

Post by Andrew Holme »

Hello All,

I am new to 6502.org and would like to introduce myself. I am a lifelong electronics and software hobbyist and a professional electronics engineer. I recently auto-generated a Verilog 6502 core from the Visual 6502 transistor-level net list. Documentation and full source code can be found here:

http://www.aholme.co.uk/6502/Main.htm

You may also be interested in my other hobby projects:

http://www.aholme.co.uk/Projects.htm
MichaelM
Posts: 761
Joined: 23 Apr 2012
Location: Huntsville, AL

Re: New Verilog 6502 core

Post by MichaelM »

Welcome. Cool implementation. Also liked the wide range of projects you list on your website.
Michael A.
Dr Jefyll
Posts: 3526
Joined: 11 Dec 2009
Location: Ontario, Canada

Re: New Verilog 6502 core

Post by Dr Jefyll »

Hi, Andrew -- nice to see your name turn up here. :)
Andrew Holme wrote:
I recently auto-generated a Verilog 6502 core from the Visual 6502 transistor-level net list.
Verilog generated from a transistor net-list? I must be confused -- could've sworn it was supposed to be done the other way round! :wink:

cheers,
Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: New Verilog 6502 core

Post by BigEd »

Welcome, Andrew, and congratulations! I think this might be the first successful effort to reduce the actual 6502 down to gates and to model them in an HDL using two-state logic - as you've found, the transparent latches, the bidirectional pass gates and the use of charge storage all make it quite a challenge.

The result is not only cycle-accurate but phase-accurate, and should be expected to perform some (but not all) of the undocumented opcodes.

I note that you clock the FPGA at 160MHz to produce a 10MHz 6502 - it seems that you've found empirically that 8 passes through the giant logic cloud, per phase, is enough to get the right answer. (In an email, you've said that sometimes logic evaluation takes a little longer, so the work spreads into the following phase, but we both agree that the real 6502 can and probably does do the same thing, built as it is with many transparent latches.)
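
The 160MHz to 10MHz relationship can be sketched as a simple phase counter. This is only an illustration, not code from Andrew's core: the module name phase_gen, the PASSES parameter and the port names are all assumptions made for the example.

```verilog
// Hypothetical sketch: derive a two-phase 6502 clock from a fast FPGA clock.
// With PASSES = 8, each phase lasts 8 fast clocks, so a 160 MHz input gives
// a 10 MHz 6502 cycle (16 fast clocks per cycle). PASSES must be >= 2.
module phase_gen #(
    parameter PASSES = 8              // fast clocks per 6502 phase (assumed)
) (
    input  wire fast_clk,             // e.g. 160 MHz
    input  wire reset,
    output reg  phi0,                 // toggles once every PASSES fast clocks
    output wire phase_end             // high on the last fast clock of a phase
);
    reg [$clog2(PASSES)-1:0] count;

    assign phase_end = (count == PASSES - 1);

    always @(posedge fast_clk) begin
        if (reset) begin
            count <= 0;
            phi0  <= 1'b0;
        end else if (phase_end) begin
            count <= 0;
            phi0  <= ~phi0;           // start the next phase
        end else begin
            count <= count + 1'b1;
        end
    end
endmodule

// Small testbench: counts fast clocks between rising edges of phi0.
module phase_gen_tb;
    reg fast_clk = 0, reset = 1;
    wire phi0, phase_end;
    integer ticks = 0;

    phase_gen #(.PASSES(8)) dut (
        .fast_clk(fast_clk), .reset(reset),
        .phi0(phi0), .phase_end(phase_end));

    always #1 fast_clk = ~fast_clk;
    always @(posedge fast_clk) ticks = ticks + 1;
    always @(posedge phi0) begin
        $display("phi0 rose after %0d fast clocks", ticks);
        ticks = 0;
    end

    initial begin
        #4 reset = 0;
        #200 $finish;
    end
endmodule
```

After the first rising edge (which includes the reset period), each report should show 16 fast clocks per 6502 cycle, i.e. 8 per phase.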

The program (or algorithm) you found (or implemented), called Sub-Gemini, to traverse a netlist and find the common primitives, is also very interesting!

There's still one thing which puzzles me. While it is clear that you need to introduce over-sampling clocked storage elements to model dynamic state or charge storage, it isn't clear to me that you need to clock the output of every logic element. But I think perhaps you do? So, for example, the ripple carry through the ALU, that will take 8 clock cycles to traverse 8 logic gates? I would have expected you would have condensed pure combinatorial clouds into a single clocked evaluation.
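
To make the ripple-carry question concrete, here is a minimal sketch of an 8-bit adder in which every carry node is registered on the fast clock. It is a hypothetical illustration, not taken from the generated core; the module and signal names are invented for the example.

```verilog
// Hypothetical illustration: each carry node is a flop, so a carry can only
// advance one bit position per fast clock.
module ripple_registered (
    input  wire       clk,
    input  wire [7:0] a, b,
    input  wire       cin,
    output wire [7:0] sum,
    output wire       cout
);
    reg  [8:1] carry = 8'b0;          // one register per carry node
    wire [8:0] c = {carry, cin};      // c[0] is the unregistered carry-in

    integer k;
    always @(posedge clk)
        for (k = 0; k < 8; k = k + 1)
            // each stage sees only the previous *registered* carry
            carry[k+1] <= (a[k] & b[k]) | ((a[k] ^ b[k]) & c[k]);

    assign sum  = a ^ b ^ c[7:0];
    assign cout = c[8];
endmodule

// Worst case: $FF + $00 + carry-in, so the carry must ripple through all 8 bits.
module ripple_registered_tb;
    reg clk = 0;
    integer n;
    wire [7:0] sum;
    wire       cout;

    ripple_registered dut (
        .clk(clk), .a(8'hFF), .b(8'h00), .cin(1'b1),
        .sum(sum), .cout(cout));

    always #5 clk = ~clk;

    initial begin
        for (n = 1; n <= 9; n = n + 1) begin
            @(posedge clk); #1;
            $display("clock %0d: sum=%h cout=%b", n, sum, cout);
        end
        $finish;
    end
endmodule
```

In this toy case the carry-out only becomes 1 on the eighth clock, which is the behaviour the question describes. A purely combinatorial version would settle within a single clock, at the cost of a longer path between registers.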
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Re: New Verilog 6502 core

Post by Arlet »

Nice job. Do you have plans to optimize the design further?
Andrew Holme
Posts: 4
Joined: 08 Nov 2016

Re: New Verilog 6502 core

Post by Andrew Holme »

Arlet, thank you. Are you Arlet Ottens, the author of https://github.com/Arlet/verilog-6502 ? I tried your core with my Pool demo. It dropped in very easily. I managed to close timing on it at 50 MHz, no problem, in a Spartan 3. It is very economical on FPGA resources.

Ed, as you probably know, the Spartan 3 FPGA has 4-input LUTs; and the fastest designs are those in which each register DATA input is a function of 4 or fewer signals. If combinatorial expressions depend on more than 4 terms, the tool chain has to use more LUTs, there are more routing and logic delays between the flops, and the maximum achievable FPGA clock rate might be reduced.

Increasing the depth of combinatorial logic between registers in my design would indeed allow the FPGA-to-6502 clock ratio to be closer to 1:1; however, it could make timing closure more challenging, so that the ultimate 6502 clock rate achieved might not change.

If you look in the file chip_6502.v, you will see this wrapper module for the `included combinatorial logic cloud:

module LOGIC (
    input  [`NUM_NODES-1:0] i,
    output [`NUM_NODES-1:0] o);
    `include "logic.inc" // this file contains combinatorial assign statements
endmodule

There is only one instance of this module, the outputs from which are registered and fed back to its inputs. I have experimented with a cascade of two and a cascade of four instances in-between the registers. The FPGA clock had to be reduced. The 6502 clock ended up about the same; and the cost of duplicating the entire combinatorial cloud was a doubling (or quadrupling) of FPGA fabric resources consumed!
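
The registered feedback loop and the cascade experiment can be sketched roughly as follows. This is a toy under stated assumptions, not the contents of chip_6502.v: the 4-node LOGIC body stands in for the real `include "logic.inc"`, and cloud_loop, DEPTH, stage and nodes are hypothetical names.

```verilog
`define NUM_NODES 4

// Toy stand-in for the real cloud. In the actual project the body comes from
// `include "logic.inc"`; here each node simply copies its lower neighbour.
module LOGIC (
    input  [`NUM_NODES-1:0] i,
    output [`NUM_NODES-1:0] o);
    assign o[0]   = 1'b1;
    assign o[3:1] = i[2:0];
endmodule

// DEPTH copies of the cloud in series between one set of registers.
// DEPTH = 1 is the single-instance arrangement described above.
module cloud_loop #(parameter DEPTH = 1) (
    input  wire                  clk,
    output reg  [`NUM_NODES-1:0] nodes
);
    wire [`NUM_NODES-1:0] stage [0:DEPTH];
    assign stage[0] = nodes;                  // registered state feeds the cloud

    genvar g;
    generate
        for (g = 0; g < DEPTH; g = g + 1) begin : cascade
            LOGIC u (.i(stage[g]), .o(stage[g+1]));
        end
    endgenerate

    initial nodes = 0;
    always @(posedge clk) nodes <= stage[DEPTH];  // register and feed back
endmodule

// Compare how many clocks each arrangement needs to settle.
module cloud_loop_tb;
    reg clk = 0;
    wire [`NUM_NODES-1:0] n1, n2;
    integer t;

    cloud_loop #(.DEPTH(1)) one (.clk(clk), .nodes(n1));
    cloud_loop #(.DEPTH(2)) two (.clk(clk), .nodes(n2));

    always #5 clk = ~clk;

    initial begin
        for (t = 1; t <= 4; t = t + 1) begin
            @(posedge clk); #1;
            $display("clock %0d: depth1=%b depth2=%b", t, n1, n2);
        end
        $finish;
    end
endmodule
```

In this toy, the depth-2 cascade settles to 1111 in two clocks where the single instance needs four, but it instantiates the whole cloud twice and doubles the logic depth between flops. That is the same trade-off described above: fewer passes per phase, more fabric, and a slower achievable FPGA clock.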

So, my sledge-hammer attack of simply duplicating the entire cloud did not work. I would need to be more selective about which unnecessary registers I removed. Perhaps an algorithm could be devised that understood logic delays and made decisions about register removal, or it could be done by hand. It is an interesting problem. I agree there is potential here to improve performance.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: New Verilog 6502 core

Post by BigEd »

Thanks Andrew - that does explain and justify your approach! It's tempting to think that some coalescing of critical pairs of logic gates would be an improvement, but it does feel like one of those combinatorial challenges.

What 6502 programs have you run by way of verifying that the 16:1 ratio is good and the core is working well?
Andrew Holme
Posts: 4
Joined: 08 Nov 2016

Re: New Verilog 6502 core

Post by Andrew Holme »

Ed, I have run the AllSuiteA test to completion in simulation; and I have run the Klaus Dormann test in simulation; however, it reached the size limit of the waveform capture file and halted prematurely after some hours. I need to re-run it with maybe just the program counter being logged. I have also run various small test code fragments targeted at exercising specific operations or instructions. On real hardware, I have only ever run my pool demo. The source for that is on my web page. It uses interrupts.
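
Logging only the program counter could look something like the monitor below. It is a rough sketch, not part of the published source: the module name pc_logger and the port names phi2, sync and ab are assumptions, and it relies only on the fact that the 6502 SYNC signal is high during an opcode fetch.

```verilog
// Hypothetical simulation-only monitor: print the address of each opcode
// fetch instead of capturing every node in a waveform file.
module pc_logger (
    input wire        phi2,   // assumed name for the 6502 clock
    input wire        sync,   // high while an opcode is being fetched
    input wire [15:0] ab      // address bus
);
    // Sample at the end of phi2, when address and SYNC are stable.
    always @(negedge phi2)
        if (sync)
            $display("%0t: PC=%h", $time, ab);
endmodule

// Testbench with faked bus activity, just to show the monitor in use.
module pc_logger_tb;
    reg        phi2 = 0;
    reg        sync = 0;
    reg [15:0] ab   = 16'h0000;

    pc_logger mon (.phi2(phi2), .sync(sync), .ab(ab));

    always #5 phi2 = ~phi2;

    initial begin
        @(posedge phi2) sync = 1; ab = 16'h0400;   // opcode fetch
        @(posedge phi2) sync = 0; ab = 16'h0401;   // operand byte, not logged
        @(posedge phi2) sync = 1; ab = 16'h0402;   // next opcode fetch
        @(negedge phi2) #1 $finish;
    end
endmodule
```

Only the two cycles with sync high are printed, so the log grows by one short line per instruction rather than one record per node per clock.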
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Re: New Verilog 6502 core

Post by Arlet »

Quote:
Arlet, thank you. Are you Arlet Ottens, the author of https://github.com/Arlet/verilog-6502 ?
Yes, that's me.
Andrew Holme
Posts: 4
Joined: 08 Nov 2016

Re: New Verilog 6502 core

Post by Andrew Holme »

I tried the Klaus Dormann test again in the Xilinx ISIM simulator, this time without a waveform database, but it was still taking too long, so I ran it on the Spartan 3E Starter Kit and I am happy to say that it passes.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: New Verilog 6502 core

Post by BigEd »

That's great! Thanks for reporting back.
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Re: New Verilog 6502 core

Post by Arlet »

In case you still want to run some simulations, I can recommend the verilator project. I was playing with it a while ago, and could run my 6502 core at about 3 MHz on my desktop PC (no waveforms).
kakemoms
Posts: 349
Joined: 02 Mar 2016

Re: New Verilog 6502 core

Post by kakemoms »

I tried Arlet's 6502 core in Lattice Diamond, which has a simulator included. It reports it to be able to run at 45MHz on a MachXO, so quite fast. In comparison, the commercial WDC 65C02 core runs at 75MHz, but it's 2.5 times bigger.

Do you know if anyone has tried to pipeline a 6502-like core?
Rob Finch
Posts: 465
Joined: 29 Dec 2002
Location: Canada

Re: New Verilog 6502 core

Post by Rob Finch »

A few people have tried pipelining the 6502 with limited success. It’s actually very difficult to improve on the 6502’s timing without changing it a whole lot. The 6502 uses the bus very efficiently and there are already very few, if any, “dead” cycles to remove. In order to pipeline the ’02 further, one would have to change the bus structure so that multiple bytes could be read at once for both instructions and data.
There is a commercially used 6502 clone that has more pipelining to it. I believe they got some of the two-cycle operations down to a single cycle. I forget what the part number is though.

Congrats on getting the CPU to work. From a transistor list!
GARTHWILSON
Forum Moderator
Posts: 8774
Joined: 30 Aug 2002
Location: Southern California

Re: New Verilog 6502 core

Post by GARTHWILSON »

kakemoms wrote:
Do you know if anyone has tried to pipeline a 6502-like core?
Manili recently started the topic "Pipelined 6502", and there is some good discussion there. He eventually concluded, however, "When I started the project, I thought this is going to be a useful project but now I think I'm going to ruin the 6502 performance with this kind of pipeline :( !!! "

Michael Barry is doing a nice job of designing his 65m32, which is a 32-bitter that merges most operands with the op code so it all gets fetched together in one cycle.
Quote:
There is a commercially used 6502 clone that has more pipelining to it. I believe they got some of the two cycle operations down to a single cycle. I forget what the part number is though.
That might be the 65CE02: http://6502.org/documents/datasheets/mo ... 02_mpu.pdf
Besides making a big improvement in the instruction set and registers, they got rid of almost all the dead bus cycles, and there are 31 op codes that execute in one cycle.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?