
All times are UTC




 Post subject: New Verilog 6502 core
PostPosted: Tue Nov 08, 2016 10:46 pm 

Joined: Tue Nov 08, 2016 10:32 pm
Posts: 4
Hello All,

I am new to 6502.org and would like to introduce myself. I am a lifelong electronics and software hobbyist and a professional electronics engineer. I recently auto-generated a Verilog 6502 core from the Visual 6502 transistor-level net list. Documentation and full source code can be found here:

http://www.aholme.co.uk/6502/Main.htm

You may also be interested in my other hobby projects:

http://www.aholme.co.uk/Projects.htm


PostPosted: Tue Nov 08, 2016 11:31 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Welcome. Cool implementation. Also liked the wide range of projects you list on your website.

_________________
Michael A.


PostPosted: Wed Nov 09, 2016 3:22 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3342
Location: Ontario, Canada
Hi, Andrew -- nice to see your name turn up here. :)

Andrew Holme wrote:
I recently auto-generated a Verilog 6502 core from the Visual 6502 transistor-level net list.
Verilog generated from a transistor net-list? I must be confused -- could've sworn it was supposed to be done the other way round! :wink:

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Wed Nov 09, 2016 9:59 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
Welcome, Andrew, and congratulations! I think this might be the first successful effort to reduce the actual 6502 down to gates and to model them in an HDL using two-state logic - as you've found, the transparent latches, the bidirectional pass gates and the use of charge storage all make it quite a challenge.

The result is not only cycle-accurate but phase-accurate, and should be expected to perform some (but not all) of the undocumented opcodes.

I note that you clock the FPGA at 160MHz to produce a 10MHz 6502 - it seems that you've found empirically that 8 passes through the giant logic cloud, per phase, is enough to get the right answer. (In an email, you've said that sometimes logic evaluation takes a little longer, so the work spreads into the following phase, but we both agree that the real 6502 can and probably does do the same thing, built as it is with many transparent latches.)

The program (or algorithm) you found (or implemented), called Sub-Gemini, to traverse a netlist and find the common primitives, is also very interesting!

There's still one thing which puzzles me. While it is clear that you need to introduce over-sampling clocked storage elements to model dynamic state or charge storage, it isn't clear to me that you need to clock the output of every logic element. But I think perhaps you do? So, for example, the ripple carry through the ALU, that will take 8 clock cycles to traverse 8 logic gates? I would have expected you would have condensed pure combinatorial clouds into a single clocked evaluation.
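To make the question concrete, here is a toy Python model, not taken from the generated Verilog, in which every carry node of an 8-bit ripple chain is a register that sees only the previous pass's values. The name `settle_passes` and the generate/propagate formulation are assumptions chosen for illustration; the real 6502 ALU netlist is more involved.

```python
def settle_passes(a, b, width=8):
    """Count synchronous update passes until a registered carry chain is stable."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # carry generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # carry propagate
    carry = [0] * (width + 1)   # carry[0] is the carry-in, held at 0
    passes = 0
    while True:
        # Every carry node is clocked, so it is computed from last pass's values.
        nxt = [0] + [g[i] | (p[i] & carry[i]) for i in range(width)]
        if nxt == carry:
            return passes       # nothing changed: the chain has settled
        carry = nxt
        passes += 1


print(settle_passes(0xFF, 0x01))  # worst case: carry ripples through all 8 stages
print(settle_passes(0x01, 0x01))  # carry stops after the first stage
```

In this model the worst-case operands need one pass per carry stage, while most operand pairs settle sooner.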


PostPosted: Wed Nov 09, 2016 12:57 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Nice job. Do you have plans to optimize the design further?


PostPosted: Wed Nov 09, 2016 7:45 pm

Joined: Tue Nov 08, 2016 10:32 pm
Posts: 4
Arlet, thank you. Are you Arlet Ottens, the author of https://github.com/Arlet/verilog-6502 ? I tried your core with my Pool demo. It dropped in very easily. I managed to close timing on it at 50 MHz, no problem, in a Spartan 3. It is very economical on FPGA resources.

Ed, as you probably know, the Spartan 3 FPGA has 4-input LUTs; and the fastest designs are those in which each register DATA input is a function of 4 or fewer signals. If combinatorial expressions depend on more than 4 terms, the tool chain has to use more LUTs, there are more routing and logic delays between the flops, and the maximum achievable FPGA clock rate might be reduced.
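A rough way to see the cost of wide expressions is to count LUTs and logic levels for a simple reduction. The Python sketch below is an idealised estimate for an n-input AND/OR built from k-input LUTs; the function name `lut_estimate` is hypothetical, and a real tool chain's mapping of arbitrary logic will differ.

```python
def lut_estimate(n_inputs, k=4):
    """Idealised (LUT count, logic levels) for an n-input AND/OR reduction."""
    if n_inputs <= 1:
        return 0, 0
    # Each k-input LUT replaces k signals with one, removing k - 1 signals.
    luts = -(-(n_inputs - 1) // (k - 1))   # ceiling division
    levels, reach = 0, 1
    while reach < n_inputs:                # smallest levels with k**levels >= n
        reach *= k
        levels += 1
    return luts, levels


for n in (4, 5, 16, 17):
    print(n, lut_estimate(n))
```

Going from 4 to 5 inputs already adds a second LUT and a second logic level between flops, which is the effect described above.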

Increasing the depth of combinatorial logic between registers in my design would indeed allow the FPGA-to-6502 clock ratio to be lower; however, it could make timing closure more challenging, such that there was no change in the ultimate 6502 clock rate achieved.

If you look in the file chip_6502.v, you will see this wrapper module for the `included combinatorial logic cloud:

module LOGIC (
    input  [`NUM_NODES-1:0] i,
    output [`NUM_NODES-1:0] o);
    `include "logic.inc" // this file contains combinatorial assign statements
endmodule

There is only one instance of this module, the outputs from which are registered and fed back to its inputs. I have experimented with a cascade of two and a cascade of four instances in-between the registers. The FPGA clock had to be reduced. The 6502 clock ended up about the same; and the cost of duplicating the entire combinatorial cloud was a doubling (or quadrupling) of FPGA fabric resources consumed!

So, my sledge-hammer attack of simply duplicating the entire cloud did not work. I would need to be more selective about which unnecessary registers I removed. Perhaps an algorithm could be devised that understood logic delays and made decisions about register removal, or it could be done by hand. It is an interesting problem. I agree there is potential here to improve performance.
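The trade-off can be put into numbers with a small Python sketch. The 160 MHz clock and 8 passes per phase are the figures quoted earlier in this thread; the 80 MHz value for a two-deep cascade is purely hypothetical, since no reduced clock figure is given here, and `cpu_clock_mhz` is an illustrative name.

```python
def cpu_clock_mhz(fpga_mhz, passes_per_phase, cascade=1):
    """6502 clock when `cascade` copies of the cloud sit between registers."""
    # Each FPGA clock now performs `cascade` passes, so fewer clocks per phase.
    clocks_per_phase = -(-passes_per_phase // cascade)   # ceiling division
    return fpga_mhz / (2 * clocks_per_phase)             # two phases per cycle


print(cpu_clock_mhz(160, 8))             # single cloud, figures from this thread
print(cpu_clock_mhz(80, 8, cascade=2))   # hypothetical: FPGA clock halved
```

If the FPGA clock falls in proportion to the cascade depth, the 6502 clock is unchanged while the fabric cost doubles, which matches the outcome described above.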


PostPosted: Wed Nov 09, 2016 7:55 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
Thanks Andrew - that does explain and justify your approach! It's tempting to think that some coalescing of critical pairs of logic gates would be an improvement, but it does feel like one of those combinatorial challenges.

What 6502 programs have you run by way of verifying that the 16:1 ratio is good and the core is working well?


PostPosted: Wed Nov 09, 2016 8:53 pm

Joined: Tue Nov 08, 2016 10:32 pm
Posts: 4
Ed, I have run the AllSuiteA test to completion in simulation; and I have run the Klaus Dormann test in simulation; however, it reached the size limit of the waveform capture file and halted prematurely after some hours. I need to re-run it with maybe just the program counter being logged. I have also run various small test code fragments targeted at exercising specific operations or instructions. On real hardware, I have only ever run my pool demo. The source for that is on my web page. It uses interrupts.
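Logging only the program counter keeps the capture small and is still enough to tell where a test run ends up, because the Klaus Dormann suite, as far as I understand it, signals both failure and success by looping at a single address. A minimal Python sketch of the post-processing, assuming a list of opcode-fetch addresses has been logged (the name `find_trap` and the `repeats` threshold are hypothetical):

```python
def find_trap(pc_trace, repeats=3):
    """Return the address at which the PC stops advancing, or None."""
    run = 0
    prev = None
    for pc in pc_trace:
        # Count consecutive opcode fetches from the same address.
        run = run + 1 if pc == prev else 1
        if run >= repeats:
            return pc
        prev = pc
    return None


trace = [0x0400, 0x0402, 0x0405, 0x0405, 0x0405]
trap = find_trap(trace)
print(hex(trap) if trap is not None else "no trap found")
```

The reported address then has to be compared against the listing file to decide whether it is the success loop or a failing test case.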


PostPosted: Wed Nov 09, 2016 9:02 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
Arlet, thank you. Are you Arlet Ottens, the author of https://github.com/Arlet/verilog-6502 ?

Yes, that's me.


PostPosted: Thu Nov 10, 2016 6:47 pm

Joined: Tue Nov 08, 2016 10:32 pm
Posts: 4
I tried the Klaus Dormann test again in the Xilinx ISIM simulator, this time without a waveform database, but it was still taking too long, so I ran it on the Spartan 3E Starter Kit and I am happy to say that it passes.


PostPosted: Thu Nov 10, 2016 6:51 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
That's great! Thanks for reporting back.


PostPosted: Thu Nov 10, 2016 6:58 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
In case you still want to run some simulations, I can recommend the Verilator project. I was playing with it a while ago, and could run my 6502 core at about 3 MHz on my desktop PC (no waveforms).


PostPosted: Sat Jan 07, 2017 4:12 pm

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
I tried Arlet's 6502 core in Lattice Diamond, which has a simulator included. It reports that the core can run at 45MHz on a MachXO, so quite fast. In comparison, the commercial WDC 65C02 core runs at 75MHz, but it's 2.5 times bigger.

Do you know if anyone has tried to pipeline a 6502-like core?


PostPosted: Sun Jan 08, 2017 7:21 pm

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 449
Location: Canada
A few people have tried pipelining the 6502 with limited success. It’s actually very difficult to improve on the 6502’s timing without changing it a whole lot. The 6502 uses the bus very efficiently and there are already very few, if any, “dead” cycles to remove. In order to pipeline the ’02 further one would have to change the bus structure so that multiple bytes could be read at once for both instructions and data.
There is a commercially used 6502 clone that has more pipelining to it. I believe they got some of the two cycle operations down to a single cycle. I forget what the part number is though.

Congrats on getting the CPU to work. From a transistor list!

_________________
http://www.finitron.ca


PostPosted: Sun Jan 08, 2017 9:05 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8422
Location: Southern California
kakemoms wrote:
Do you know if anyone has tried to pipeline a 6502-like core?

Manili recently started the topic "Pipelined 6502", and there is some good discussion there. He eventually concluded, however, "When I started the project, I thought this is going to be a useful project but now I think I'm going to ruin the 6502 performance with this kind of pipeline :( !!! "

Michael Barry is doing a nice job of designing his 65m32, which is a 32-bitter that merges most operands with the op code so it all gets fetched together in one cycle.

Quote:
There is a commercially used 6502 clone that has more pipelining to it. I believe they got some of the two cycle operations down to a single cycle. I forget what the part number is though.

That might be the 65CE02: http://6502.org/documents/datasheets/mo ... 02_mpu.pdf
Besides making a big improvement in the instruction set and registers, they got rid of almost all the dead bus cycles, and there are 31 op codes that execute in one cycle.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

