Posted: Fri Oct 29, 2010 6:00 am
by BigEd
Hi Ruud
Another point about how many LUTs your design might use: the 6502 has an ALU and an incrementor for the PC. All the address indexing and stack pointer adjustment is done by the ALU. In NMOS, muxing is quite cheap and tristate multi-master busses are too. In FPGA, a bus mux might be free if the mux is simple and just in front of a register, or it might take up dedicated slices if not. Also, it would be easy to use many adders: for address indexes, for stack adjustment, even for inc/dec operations - this would increase the slice count, although it might be a tactic for a faster machine. It might also be easier to write, to debug, and to modify.

Cheers
Ed

Posted: Sat Oct 30, 2010 12:37 pm
by BigEd
Hi EE
have you got Arlet's slice count right? The LUT count and slice count are generally somewhat related.

If you think it's worth the effort, you could perhaps check the synthesis report for these cores: it may well tell us number of add/subtracts or something else interesting.

I've got an old report here for a T65 build which says:

 Summary:
     inferred   1 Counter(s).
     inferred 145 D-type flip-flop(s).
     inferred  10 Adder/Subtractor(s).
     inferred   2 Comparator(s).
     inferred  56 Multiplexer(s).
and then later we see

HDL Synthesis Report

Macro Statistics
# ROMs                                                 : 2
 4x13-bit ROM                                          : 1
 4x2-bit ROM                                           : 1
# Adders/Subtractors                                   : 21
 16-bit adder                                          : 2
 16-bit subtractor                                     : 1
 5-bit subtractor                                      : 2
 6-bit adder                                           : 2
 6-bit subtractor                                      : 2
 7-bit adder                                           : 3
 7-bit subtractor                                      : 1
 8-bit adder                                           : 3
 8-bit addsub                                          : 1
 8-bit subtractor                                      : 1
 9-bit adder                                           : 3
# Counters                                             : 2
 26-bit up counter                                     : 1
 3-bit up counter                                      : 1
# Registers                                            : 93
 1-bit register                                        : 82
 2-bit register                                        : 2
 3-bit register                                        : 1
 4-bit register                                        : 1
 8-bit register                                        : 7
# Comparators                                          : 4
 3-bit comparator equal                                : 1
 3-bit comparator not equal                            : 1
 5-bit comparator greater                              : 2
# Multiplexers                                         : 40
 1-bit 4-to-1 multiplexer                              : 24
 1-bit 8-to-1 multiplexer                              : 1
 2-bit 32-to-1 multiplexer                             : 1
 2-bit 8-to-1 multiplexer                              : 5
 24-bit 4-to-1 multiplexer                             : 1
 3-bit 4-to-1 multiplexer                              : 6
 8-bit 6-to-1 multiplexer                              : 1
 8-bit 8-to-1 multiplexer                              : 1
I'm not sure of the best way of digesting that, or whether it's worthwhile, but if, say, Ruud were interested in comparing his implementation with another, he might look at those tables.

(The synth report is probably found in a file *.syr)
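
One possible way of digesting those tables is a small script. This is only a minimal sketch: the function names `parse_macro_stats` and `total_bits` are made up for illustration, the rows are pasted from the T65 report above, and it assumes the "name : count" layout shown there. Other report versions may be laid out differently.

```python
import re

# Adder/subtractor rows copied from the T65 "Macro Statistics" table above.
REPORT = """\
# Adders/Subtractors                                   : 21
 16-bit adder                                          : 2
 16-bit subtractor                                     : 1
 5-bit subtractor                                      : 2
 6-bit adder                                           : 2
 6-bit subtractor                                      : 2
 7-bit adder                                           : 3
 7-bit subtractor                                      : 1
 8-bit adder                                           : 3
 8-bit addsub                                          : 1
 8-bit subtractor                                      : 1
 9-bit adder                                           : 3
"""

# Optional leading '#', then a name, a colon and an integer count.
ROW = re.compile(r"^(#?)\s*(.+?)\s*:\s*(\d+)\s*$")

def parse_macro_stats(text):
    """Return {category: {"total": n, "items": {name: count}}}."""
    stats, current = {}, None
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip titles and blank lines
        hash_mark, name, count = m.group(1), m.group(2), int(m.group(3))
        if hash_mark:                  # "# Category : total" header row
            current = stats.setdefault(name, {"total": count, "items": {}})
        elif current is not None:      # "N-bit thing : count" detail row
            current["items"][name] = count
    return stats

def total_bits(items):
    """Sum width * count for rows named like '16-bit adder'."""
    return sum(int(name.split("-bit")[0]) * n for name, n in items.items())

stats = parse_macro_stats(REPORT)
adders = stats["Adders/Subtractors"]
print(adders["total"], sum(adders["items"].values()), total_bits(adders["items"]))
```

Comparing the total adder width between two cores is a crude measure, but it might give a quick first impression of how much arithmetic each design infers.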

Cheers
Ed

Posted: Sat Oct 30, 2010 7:57 pm
by Ruud
Hallo Ed,
BigEd wrote:
(The synth report is probably found in a file *.syr)
No *.syr found. I'm using Xilinx ISE Webpack. What are you using?

Posted: Sat Oct 30, 2010 8:15 pm
by BigEd
Same software. Is it hidden in a subdir? If you're using the GUI, maybe there's a way to request or suppress a synthesis report.

In my case, I ran using the command line, with something like

xst -ifn T65.xst -intstyle xflow -ofn ./T65.syr
Maybe the report doesn't normally have that suffix, maybe it's encoded in my scripts (which I've inherited). Sorry. It should be there somewhere. If you have a *.xst file perhaps it specifies the report with '-ofn'?
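
If the report is hiding in a subdirectory, a recursive search may turn it up. A minimal sketch, assuming the report really does end in .syr (which, as said, may just be how my scripts name it); `find_synthesis_reports` is a made-up helper name, not part of ISE.

```python
from pathlib import Path

def find_synthesis_reports(project_dir, pattern="*.syr"):
    """Return every file under project_dir whose name matches pattern."""
    return sorted(p for p in Path(project_dir).rglob(pattern) if p.is_file())

# Example: list candidate reports under the current project directory.
# for report in find_synthesis_reports("."):
#     print(report)
```

If nothing turns up, the same helper can be called with a different pattern to look for whatever suffix the GUI uses.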

Ed

Posted: Sun Oct 31, 2010 1:13 am
by ElEctric_EyE
BigEd wrote:
Hi EE
have you got Arlet's slice count right? The LUT count and slice count are generally somewhat related....
Thank you for double checking. I see what you're talking about. I re-ran Arlet's 2 files and came up with the same result. He does mention on his site that BCD mode does not work. Maybe this accounts for the discrepancy?
BigEd wrote:
... I've got an old report here for a T65 build which says:

What version(s) of ISE?

I, too, am looking for more comparative information that can be extracted from the reports, especially some info relating to max speed. I won't be able to devote time to it until next week...

Posted: Sun Oct 31, 2010 7:33 am
by Ruud
Ruud wrote:
No *.syr found.
Stupid me. I started rewriting RB65, haven't used ISE for almost two weeks and cleaned the directory. For details about the rewriting see André's 65k thread and later on my own RB65 one.

Posted: Sun Oct 31, 2010 11:41 am
by BigEd
ElEctric_EyE wrote:
BigEd wrote:
Hi EE
have you got Arlet's slice count right? The LUT count and slice count are generally somewhat related....
Thank you for double checking. I see what you're talking about. I re-ran Arlet's 2 files and came up with the same result. He does mention on his site that BCD mode does not work. Maybe this accounts for the discrepancy?
Hi EE - your summary.jpg for Arlet shows 276 occupied slices, but your table says 465. Is your summary.jpg out of date or have you copied the wrong number?

On the topic of looking at more detail of the complexity of a design, I see in your jpg files there's a table of blue links to reports at the bottom of the page, the first of which is 'Synthesis Report' - hopefully that's the same as the *.syr file I use.

Cheers
Ed

Posted: Sun Oct 31, 2010 12:05 pm
by BigEd
I've updated my tabulation (running it in parallel with EE's, hope that's not a problem) and I see that Arlet's is the only design which uses LUTs as 16-bit RAMs - this could be a reason why the slice count is by far the smallest.

The Xilinx documentation contains descriptions of some particular coding styles which will cause the tool to apply particular implementations.

Another point or two on density:
  • How you encode control information will affect the complexity of the decode: one-hot, two-hot, binary, Gray code, etc.
  • Whether you decode all bits of an encoding, or just the bits which distinguish interesting cases, will make a difference. (You'll get an effect like the 6502's undocumented behaviour of illegal opcodes.)
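
To make the two points concrete, here is a minimal sketch in Python rather than HDL. The 8-bit encoding is a toy one invented for illustration (low two bits 01 select a documented group); it is not the 6502's actual decode logic, and the helper names are hypothetical.

```python
def binary(n):
    """Binary encoding: state n is simply the number n."""
    return n

def one_hot(n):
    """One-hot encoding: state n sets only bit n."""
    return 1 << n

def gray(n):
    """Gray code: consecutive states differ in exactly one bit."""
    return n ^ (n >> 1)

def decode_set(mask, value, width=8):
    """All codes whose bits selected by mask equal value."""
    return {op for op in range(1 << width) if (op & mask) == value}

# Full decode of the toy group: check both low bits (must be 01).
full = decode_set(0b00000011, 0b01)
# Partial decode: check only bit 0, which is cheaper but less selective.
partial = decode_set(0b00000001, 0b1)
# Codes that trigger the group without being documented members of it.
extra = partial - full

for n in range(4):
    print(n, format(binary(n), "02b"), format(one_hot(n), "04b"), format(gray(n), "02b"))
print(len(full), len(partial), len(extra))
```

In this toy case the partial decode also fires for every code whose low bits are 11, so those codes pick up the group's behaviour without anyone having designed it.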
Cheers
Ed

Posted: Sun Oct 31, 2010 11:33 pm
by ElEctric_EyE
BigEd,

I see where I've copied the wrong info regarding Arlet's core... My apologies.

Integrated your table into the original post! :D

There is one piece of data missing. In case there is an update to a Core, we need to add a column titled "Last Core Updated Here" or something similar.

The data I posted was on all Cores available as of 10-26-10.


-EyE

Posted: Fri Nov 12, 2010 8:41 pm
by fpgaarcade
Hi,

My name is Mike and I took over the T65 core a few years ago.
I've just got CVS access to opencores, so I will push back some changes.

The latest version is at www.fpgaarcade.com in the library.

You'll also find a very elegant table-based 6502 core there as well; my 68000 core in development uses a similar technique.

I would agree the T65 is hard to follow, but it sort of evolved. It is highly accurate, tested against a real chip and "finished" so you shouldn't need to do much work on it :)

The size is bigger than it could be, but FPGAs are so big now that it has not been an issue. The T80 core has an option for some LUT RAMs which makes it a bit smaller.

I'm working with the visual6502 netlist and back converting this to VHDL now. I will optimise this and see how small/fast we can get it.

Best,
Mike

Posted: Fri Nov 12, 2010 10:51 pm
by ElEctric_EyE
Mike,

Have you done any speed comparisons, or tried to push max speed for the T65 core on any Xilinx FPGAs in a real-world test?
I've done comparisons here as far as resources used, and pinouts, but I would've liked to do a "relative" speed comparison as well...

I do believe I used your core, from www.fpgaarcade.com, for comparison here.

What happened to Daniel Wallner? I guess I should edit the original post here!

Posted: Fri Nov 12, 2010 11:33 pm
by fpgaarcade
Hi,
No, I haven't really pushed it.
I did achieve 40MHz with no problem in a Virtex4 - I am sure it would go quite a bit faster on modern FPGAs.

I don't know what happened to Daniel, we were in regular contact, then a few years ago I never heard from him again.

Best
Mike

Posted: Sat Nov 13, 2010 3:46 pm
by Ruud
Hallo Mike,
fpgaarcade wrote:
I would agree the T65 is hard to follow, but it sort of evolved.
I mentioned before that T65 was difficult to understand, certainly for a newbie. So I made my own core: http://www.baltissen.org/vhdl/RB65.vhd. The original used twice the resources T65 needed in my FPGA. I now need about 85% of my original.
By studying your code I am also starting to understand what you mean by "evolved". I have been thinking about "evolving" my design as well, but for the moment I am keeping it as it is, as this has one big advantage: I can easily add new opcodes or change the old ones.
Quote:
but FPGAs are so big now it has not been an issue.
This is a good reason to evolve my core only when needed.
Quote:
The T80 core ...
I don't know if you are familiar with the 1541Ultimate-II http://1541ultimate.net/content/index.php. At the moment it can be used as a 1541 drive, but one of the ideas I have is to turn it into a CP/M computer like the Z80 Second Processor for the Acorn BBC Micro, http://en.wikipedia.org/wiki/BBC_Micro_expansion_units. I could write my own code or, with your permission of course, use yours. In that case I have to find out how it can be fitted into my design.
The design is no secret: Z80, RAM, ROM and a PIO-like interface. This "PIO" communicates with a 6522-like interface that can be seen by the Commodore 64.

One question: T80 is (AFAIK) cycle exact. The Z80 needs 3 or 4 cycles for every byte it processes. But I don't need this cycle-exactness. Isn't it possible to tweak T80 in such a way that it only needs one, like the 6502 does?
The C64 provides an 8 MHz colour clock, which would mean we are dealing with at least a 24 MHz equivalent machine. And hopefully one of the onboard crystals, 24 or 50 MHz, can be used instead, resulting in a 72 or 150 MHz machine!
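
The arithmetic behind those numbers, as a minimal sketch: it assumes a tweaked core that handles one byte per clock, compared with a Z80 needing at least 3 cycles per byte. The function name `equivalent_mhz` is just for illustration.

```python
def equivalent_mhz(core_clock_mhz, cycles_per_byte=3):
    """Clock a standard Z80 would need to match a one-cycle-per-byte core."""
    return core_clock_mhz * cycles_per_byte

for clock in (8, 24, 50):
    print(clock, "MHz core ->", equivalent_mhz(clock), "MHz equivalent")
```

With 4 cycles per byte instead of 3 the figures would be higher still, hence "at least".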

Many thanks in advance!

Posted: Sun Nov 14, 2010 7:13 pm
by fpgaarcade
Ruud,

The T80 can certainly be sped up, but not easily in its current state.
It's designed to reflect the (guessed) underlying hardware so all through the MCode section you will see MCycle used to tell it what to do on each cycle.

It would be better to branch the code, flatten the microcode and then pipeline it to get a much faster design. Not a trivial task, however...

Best,
Mike

Posted: Sun Nov 14, 2010 9:21 pm
by Ruud
Hallo Mike,
fpgaarcade wrote:
It would be better to branch the code, ...
In that case I think I'll start from scratch and use the same structure as used with RB65. But that will take some time as I'm not that familiar with the Z80 anymore.