6502-Core Comparisons: Fitting a Xilinx Spartan 2 XC2S200
Hi Ruud
Another point about how many LUTs your design might use: the 6502 has an ALU and an incrementor for the PC. All the address indexing and stack pointer adjustment is done by the ALU. In NMOS, muxing is quite cheap and tristate multi-master busses are too. In FPGA, a bus mux might be free if the mux is simple and just in front of a register, or it might take up dedicated slices if not. Also, it would be easy to use many adders: for address indexes, for stack adjustment, even for inc/dec operations - this would increase the slice count, although it might be a tactic for a faster machine. It might also be easier to write, to debug, and to modify.
Cheers
Ed
Hi EE
Have you got Arlet's slice count right? The LUT count and slice count are generally somewhat related.
If you think it's worth the effort, you could perhaps check the synthesis report for these cores: it may well tell us the number of add/subtracts or something else interesting.
I've got an old report here for a T65 build which says:
Summary:
    inferred 1 Counter(s).
    inferred 145 D-type flip-flop(s).
    inferred 10 Adder/Subtractor(s).
    inferred 2 Comparator(s).
    inferred 56 Multiplexer(s).
and then later we see
HDL Synthesis Report
Macro Statistics
# ROMs : 2
    4x13-bit ROM : 1
    4x2-bit ROM : 1
# Adders/Subtractors : 21
    16-bit adder : 2
    16-bit subtractor : 1
    5-bit subtractor : 2
    6-bit adder : 2
    6-bit subtractor : 2
    7-bit adder : 3
    7-bit subtractor : 1
    8-bit adder : 3
    8-bit addsub : 1
    8-bit subtractor : 1
    9-bit adder : 3
# Counters : 2
    26-bit up counter : 1
    3-bit up counter : 1
# Registers : 93
    1-bit register : 82
    2-bit register : 2
    3-bit register : 1
    4-bit register : 1
    8-bit register : 7
# Comparators : 4
    3-bit comparator equal : 1
    3-bit comparator not equal : 1
    5-bit comparator greater : 2
# Multiplexers : 40
    1-bit 4-to-1 multiplexer : 24
    1-bit 8-to-1 multiplexer : 1
    2-bit 32-to-1 multiplexer : 1
    2-bit 8-to-1 multiplexer : 5
    24-bit 4-to-1 multiplexer : 1
    3-bit 4-to-1 multiplexer : 6
    8-bit 6-to-1 multiplexer : 1
    8-bit 8-to-1 multiplexer : 1
Not sure what's the best way of digesting that - or if it's worthwhile - but if, say, Ruud was interested in comparing his implementation with another, he might look at those tables.
(The synth report is probably found in a file *.syr)
Cheers
Ed
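One possible way of digesting a Macro Statistics table is to total the bit widths per macro kind, so that two cores can be compared on a single number such as "adder bits". The following is only a minimal Python sketch: it assumes the "N-bit kind : count" line format seen in the T65 report above, and the names LINE, total_bits and SAMPLE are made up for illustration and are not part of any Xilinx tool.

```python
import re
from collections import defaultdict

# Matches detail lines such as "16-bit adder : 2" or "8-bit 6-to-1 multiplexer : 1".
# Section headers ("# Adders/Subtractors : 21") and ROM lines ("4x13-bit ROM : 1")
# do not match and are skipped.
LINE = re.compile(r"^\s*(\d+)-bit\s+(.+?)\s*:\s*(\d+)\s*$")

def total_bits(report_text):
    """Return {kind: total bits} summed over all 'N-bit kind : count' lines."""
    totals = defaultdict(int)
    for line in report_text.splitlines():
        m = LINE.match(line)
        if m:
            width, kind, count = int(m.group(1)), m.group(2), int(m.group(3))
            totals[kind] += width * count
    return dict(totals)

# The adder/subtractor section of the T65 report quoted in this thread.
SAMPLE = """\
# Adders/Subtractors : 21
    16-bit adder : 2
    16-bit subtractor : 1
    5-bit subtractor : 2
    6-bit adder : 2
    6-bit subtractor : 2
    7-bit adder : 3
    7-bit subtractor : 1
    8-bit adder : 3
    8-bit addsub : 1
    8-bit subtractor : 1
    9-bit adder : 3
"""

print(total_bits(SAMPLE))
```

For the section above this comes to 116 adder bits, 53 subtractor bits and 8 addsub bits. Whether total bits track slice usage closely is a separate question; the tally only makes the tables easier to compare.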
Hello Ed,
BigEd wrote:
(The synth report is probably found in a file *.syr)
No *.syr found. I'm using Xilinx ISE Webpack. What are you using?
___
/ __|__
/ / |_/ Groetjes, Ruud
\ \__|_\
\___| URL: www.baltissen.org
Same software. Is it hidden in a subdir? If you're using the GUI, maybe there's a way to request or suppress a synthesis report.
In my case, I ran using the command line, with something like
xst -ifn T65.xst -intstyle xflow -ofn ./T65.syr
Maybe the report doesn't normally have that suffix; maybe it's encoded in my scripts (which I've inherited). Sorry. It should be there somewhere. If you have a *.xst file perhaps it specifies the report with '-ofn'?
Ed
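If the report is hiding in a subdirectory, a quick search of the project tree may turn it up. This is a small Python sketch, not an ISE feature: it just walks a directory for files ending in .syr and lists any '-ofn' lines found in *.xst scripts. The function names are invented here, and the idea that the report name might show up in a *.xst script is only the guess made above.

```python
from pathlib import Path

def find_reports(project_dir, suffix=".syr"):
    """Return every file under project_dir whose name ends with suffix."""
    return sorted(p for p in Path(project_dir).rglob("*" + suffix) if p.is_file())

def xst_scripts_with_ofn(project_dir):
    """Return (path, line) pairs for lines containing an '-ofn' token in *.xst scripts."""
    hits = []
    for script in sorted(Path(project_dir).rglob("*.xst")):
        for line in script.read_text(errors="replace").splitlines():
            if "-ofn" in line.split():
                hits.append((script, line.strip()))
    return hits

# Example: search the current directory and everything below it.
for path in find_reports("."):
    print("report:", path)
for script, line in xst_scripts_with_ofn("."):
    print(f"{script}: {line}")
```

If nothing is found with the .syr suffix, calling find_reports with a different suffix is a cheap next step.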
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
BigEd wrote:
Hi EE
have you got Arlet's slice count right? The LUT count and slice count are generally somewhat related....
BigEd wrote:
... I've got an old report here for a T65 build which says: ...
I, too, am looking for more comparative information that can be extracted from the reports, especially some info relating to max speed. I won't be able to devote time to it until next week...
Ruud wrote:
No *.syr found.
ElEctric_EyE wrote:
BigEd wrote:
Hi EE
have you got Arlet's slice count right? The LUT count and slice count are generally somewhat related....
On the topic of looking at more detail of the complexity of a design, I see in your jpg files there's a table of blue links to reports at the bottom of the page, the first of which is 'Synthesis Report' - hopefully that's the same as the *.syr file I use.
Cheers
Ed
I've updated my tabulation (running it in parallel with EE's, hope that's not a problem) and I see that Arlet's is the only design which uses LUTs as 16-bit RAMs - this could be a reason why the slice count is by far the smallest.
The Xilinx documentation contains descriptions of some particular coding styles which will cause the tool to apply particular implementations.
Another point or two on density:
- How you encode control information will affect the complexity of the decode: one-hot, two-hot, binary, Gray code, etc.
- Whether you decode all bits of an encoding, or just the bits which distinguish interesting cases, will make a difference. (You'll get an effect like the 6502's undocumented behaviour of illegal opcodes.)
Ed
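The partial-decode point can be illustrated with a toy decoder. This is a simplified Python sketch and not how any of the cores in this thread is written. It assumes the common aaabbbcc view of 6502 opcodes, where cc=01 selects the accumulator group and cc=10 the shift/X-register group, and it ignores the addressing-mode bits (bbb) and the many exceptions in the real opcode map. Each group's enable tests only one bit of cc, so the undefined cc=11 column enables both groups at once.

```python
# Operation selected by bits 7..5 (aaa) in each documented opcode group.
GROUP_01 = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]  # cc = 01
GROUP_10 = ["ASL", "ROL", "LSR", "ROR", "STX", "LDX", "DEC", "INC"]  # cc = 10

def partial_decode(opcode):
    """Return the operations enabled when each group tests only one bit of cc."""
    aaa = (opcode >> 5) & 0b111
    ops = []
    if opcode & 0b10:   # bit 1 alone enables the cc=10 group
        ops.append(GROUP_10[aaa])
    if opcode & 0b01:   # bit 0 alone enables the cc=01 group
        ops.append(GROUP_01[aaa])
    return ops

for opcode in (0xA5, 0xA6, 0xA7, 0x07):
    print(f"${opcode:02X}: {partial_decode(opcode)}")
```

In this model $A5 decodes to LDA and $A6 to LDX, while the undefined $A7 enables both LDX and LDA and $07 enables both ASL and ORA. A full decode of both cc bits would cost more logic but would let the cc=11 column be treated as a separate case.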
ElEctric_EyE
BigEd,
I see where I've copied the wrong info regarding Arlet's core... My apologies.
Integrated your table into the original post!
There is one piece of data missing. In case there is an update to a Core, we need to add a column titled "Last Core Updated Here" or something similar.
The data I posted was on all Cores available as of 10-26-10.
-EyE
-
fpgaarcade
- Posts: 9
- Joined: 11 Nov 2010
Hi,
My name is Mike and I took over the T65 core a few years ago.
I've just got CVS access to opencores, so I will push back some changes.
The latest version is at www.fpgaarcade.com in the library.
You'll also find a very elegant table-based 6502 core there as well; my 68000 core in development uses a similar technique.
I would agree the T65 is hard to follow, but it sort of evolved. It is highly accurate, tested against a real chip and "finished", so you shouldn't need to do much work on it.
The size is bigger than it could be, but FPGAs are so big now that it has not been an issue. The T80 core has an option for some LUT RAMs which make it a bit smaller.
I'm working with the visual6502 netlist and back converting this to VHDL now. I will optimise this and see how small/fast we can get it.
Best,
Mike
ElEctric_EyE
Mike,
Have you done any speed comparisons, or tried to push the max speed of the T65 core on any Xilinx FPGAs in a real-world test?
I've done comparisons here as far as resources used, and pinouts, but I would've liked to do a "relative" speed comparison as well...
I do believe I used your core, from www.fpgaarcade.com, for comparison here.
What happened to Daniel Wallner? I guess I should edit the original post here!
Hello Mike,
I mentioned before that T65 was difficult to understand, certainly for a newbie. So I made my own core: http://www.baltissen.org/vhdl/RB65.vhd. The original used twice the resources T65 needed in my FPGA. I now need about 85% of my original.
fpgaarcade wrote:
I would agree the T65 is hard to follow, but it sort of evolved.
By studying your code I am also starting to understand what you mean by "evolved". I have been thinking about "evolving" my design as well, but for the moment I am keeping it as it is because this has one big advantage: I can easily add new opcodes or change the old ones.
Quote:
but FPGAs are so big now it has not been an issue.
This is a good reason to evolve my core only when needed.
Quote:
The T80 core ...
I don't know if you are familiar with the 1541Ultimate-II, http://1541ultimate.net/content/index.php. At the moment it can be used as a 1541 drive, but one of the ideas I have is to turn it into a CP/M computer like the Z80 Second Processor for the Acorn Atom, http://en.wikipedia.org/wiki/BBC_Micro_expansion_units. I could write my own code or, with your permission of course, use yours. In that case I have to find out how it can be fitted into my design.
The design is no secret: Z80, RAM, ROM and a PIO-like interface. This "PIO" communicates with a 6522-like interface that can be seen by the Commodore 64.
One question: T80 is (AFAIK) cycle exact. The Z80 needs 3 or 4 cycles for every byte it processes. But I don't need this cycle-exactness. Isn't it possible to tweak T80 in such a way that it only needs one, like the 6502 does?
The C64 provides an 8 MHz dot clock, which would mean we would be dealing with at least a 24 MHz equivalent machine. And hopefully one of the onboard crystals, 24 or 50 MHz, can be used instead, resulting in a 72 or 150 MHz machine!
Many thanks in advance!
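The arithmetic behind those "equivalent machine" figures can be written out as a small Python sketch. It assumes three clock cycles per memory access on a standard Z80 (the lower of the "3 or 4" quoted above) and a hypothetical core that completes one access per clock; the function name is invented for illustration.

```python
def equivalent_z80_mhz(core_clock_mhz, cycles_per_access=3):
    """Clock a standard Z80 would need to match a one-access-per-clock core."""
    return core_clock_mhz * cycles_per_access

for clock in (8, 24, 50):
    print(clock, "MHz core is roughly a", equivalent_z80_mhz(clock), "MHz Z80")
```

With the 3-cycle assumption the three clocks map to 24, 72 and 150 MHz; using 4 cycles per access instead would scale each figure by 4/3.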
fpgaarcade
Ruud,
The T80 can certainly be sped up, but not easily in its current state.
It's designed to reflect the (guessed) underlying hardware, so all through the MCode section you will see MCycle used to tell it what to do on each cycle.
It would be better to branch the code, flatten the microcode and then pipeline it to get a much faster design. Not a trivial task, however...
Best,
Mike
Hello Mike,
fpgaarcade wrote:
It would be better to branch the code, ...
In that case I think I'll start from scratch and use the same structure as used with RB65. But that will take some time as I'm not that familiar with the Z80 anymore.