6502-Core Comparisons: Fitting a Xilinx Spartan 2 XC2S200

Arlet · Post by **Arlet** » Fri Nov 04, 2011 6:19 pm

BigEd wrote:

Single cycle instructions? I always thought that would mean major rework - be interesting to see it though! And we know the 65CE02 managed it. In an FPGA fabric, an incrementer local to the register might well be efficient. It would presumably simplify push and pop type operations.

I didn't say it wouldn't involve major rework.

For instance, if you have two consecutive INX instructions, the first INX writes the new X register value at the same time that the second INX already needs it. So, to avoid using the old value, the design needs an extra MUX to choose between the current value of X and the new value that it is about to get.

Also, all these extra muxes may mean the maximum clock rate will drop, which could make the end result slower for typical programs. The problem is that you won't find out until you've already done most of the work.
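The consecutive-INX hazard described above can be sketched as a small software model. This is only an illustration, not code from any of the cores discussed: the function name `run_inx_pair`, the `use_bypass` flag and the one-step write-back delay are all assumptions made for the example.

```python
def run_inx_pair(x_initial, use_bypass):
    """Model two back-to-back INX instructions in an overlapped pipeline.

    Assumption for this sketch: an instruction's result is written to the
    X register one step late, i.e. after the next instruction has already
    read its operand.
    """
    x_reg = x_initial   # architectural X register
    pending = None      # result computed but not yet written back

    for _ in range(2):  # two consecutive INX instructions
        # Operand read. The bypass mux picks the in-flight result when
        # one exists; otherwise the register value is used.
        if use_bypass and pending is not None:
            operand = pending
        else:
            operand = x_reg

        # Write-back of the previous instruction lands only now.
        if pending is not None:
            x_reg = pending

        pending = (operand + 1) & 0xFF  # 8-bit increment

    # Drain the pipeline: commit the last result.
    x_reg = pending
    return x_reg


print(run_inx_pair(5, use_bypass=True))   # second INX sees the forwarded value
print(run_inx_pair(5, use_bypass=False))  # second INX reads the stale X
```

With the bypass enabled, X ends up incremented twice; without it, the second INX starts from the old value and one increment is lost. In hardware the `if use_bypass` selection is the extra MUX, and it sits in the operand path, which is why it can cost clock rate.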

BigEd · Post by **BigEd** » Sat Nov 05, 2011 5:22 pm

While I had the ISE session open, I had a quick run or two with spartan 6. The LUT counts are not comparable because the LUTs are different, but the smallest spartan6 is only 25% full, compared to the smallest spartan3 being 50% full.

Speeds for 65Org16, using RAMs, unconstrained post-synth:

                 synthed           routed
spartan3     78MHz  12.794ns   87MHz  11.476ns
spartan6-2   86MHz  11.585ns   68MHz  14.753ns
spartan6-3  117MHz   8.516ns  102MHz   9.762ns

The -3 grade is £12 rather than £11 for the LX9 size. The LX4 is only £8 but I see it only in -2 grade at digikey. The actual speed of a system might well have other limits of course.

Cheers
Ed

(*) Edit: added in post-routed speeds, because it looks like the post-routing speed for spartan 6 is rather worse than the post-synth, unlike for spartan3, and more so for the -2 grade.
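As a quick cross-check of the table above, each MHz figure is just 1000 divided by the reported period in ns. The sketch below recomputes the frequencies from the periods and shows how much each device gains or loses between synthesis and routing; the helper name `mhz_from_period_ns` is made up for the example.

```python
def mhz_from_period_ns(period_ns):
    """Convert a minimum clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# (post-synth period, post-route period) in ns, copied from the table above
periods_ns = {
    "spartan3":   (12.794, 11.476),
    "spartan6-2": (11.585, 14.753),
    "spartan6-3": (8.516, 9.762),
}

for device, (synth_ns, routed_ns) in periods_ns.items():
    synth_mhz = mhz_from_period_ns(synth_ns)
    routed_mhz = mhz_from_period_ns(routed_ns)
    change = 100.0 * (routed_mhz - synth_mhz) / synth_mhz
    print(f"{device:11s} synth {synth_mhz:6.1f} MHz  "
          f"routed {routed_mhz:6.1f} MHz  change {change:+.1f}%")
```

The sign of the change column makes the point of the edit note visible at a glance: spartan3 improves after routing, while both spartan6 grades get slower, the -2 grade more so.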

ElEctric_EyE · Post by **ElEctric_EyE** » Sat Nov 05, 2011 7:01 pm

Digi-Key is a good source for Xilinx. Avnet-Express is even better. For other ICs too, like SDRAMs in low quantity...
Also, you guys in Europe can get some Xilinx ICs (e.g. higher density Virtex) that we can't get here in the States for some reason.

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Nov 22, 2011 10:05 pm

This is a reminder for me to re-install ISE10.1 and add a 6502 core made by Thomas Skibo (and thankfully shared) for comparison, using the same Spartan 2 XC2S200, amongst the other cores.

BigEd · Post by **BigEd** » Tue Nov 22, 2011 10:19 pm

Not sure if this is asking too much, but if you already have a very modern version installed, would it be better to rerun everything through a new version?

Edit: ah, I see that newer ISE versions do not support spartan 2. So that would mean targeting spartan 3, which means giving up on 5V.

Cheers
Ed

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Nov 22, 2011 10:34 pm

Hi Ed, I sort of agree with you.
What I can do is finish off this thread with Thomas' 6502 Core and re-title it as geared towards the 5V input tolerant Spartan 2... (most of us know the 3.3V TTL output voltage levels are 5V CMOS compatible as well), but oh well...
Then I could start a new thread, since I have all the original 6502 Cores stored away in a safe place, based maybe on the 3.3V I/O Spartan 6?

BigEd · Post by **BigEd** » Wed Nov 23, 2011 6:16 am

Hi EE
yes, that would work. You could add a pointer to the new thread, of course.

If it turns out that newer versions give much the same results then it's not so important.

The other thing about moving forward is that spartan 2 isn't produced any more.

Cheers
Ed

ElEctric_EyE · Post by **ElEctric_EyE** » Wed Nov 23, 2011 4:11 pm

A few days ago I got the email from DigiKey stating that the Spartan 2 and a couple of Xilinx CPLDs won't be made anymore. I hate when that happens...

Ok, I can take this up in my spare time when I am away from my current project, but it will take some time... Any volunteers?
Initial comparison of a core shows a difference that I think is worth the effort of a new thread, especially since the Spartan 2 is EOL, but a new device to be fitted would have to be chosen...

IMO it would be a race amongst:
1) The ~$6 Spartan 3 XC3S50 100-pin QFP
2) The ~$28 Spartan 3E XC3S500E 100-pin QFP
3) The ~$18 Spartan 6 144-pin QFP

BigEd · Post by **BigEd** » Wed Nov 23, 2011 6:30 pm

I think the cheaper options have merit. Is there anything to recommend the 3E?

kc5tja · Post by **kc5tja** » Wed Nov 23, 2011 8:47 pm

The 3E series is exceptionally popular on starter kits, like my Nexys2 board. They seem to offer reasonable capacity. The Kestrel-2 consumes only 8% of the XC3S1200E's fabric and 68% of its block RAM resources (for a total memory complement of 32KiB addressable from the CPU). Conceivably, if I reworked the J1A and MGIA units to draw their data from external RAM, I could pack close to eight of these things on the FPGA (remembering that logic will need to be invested to form a cache controller for each instantiated Kestrel).

Additionally, the 3E series appears to be quite reasonably priced. A 400-pin BGA version of the chip costs around $58 from Digikey, which sounds pricy for a single chip, but considering the density of logic you can throw on it, and thus the PCB savings it provides, I'd be willing to bet you'll actually save money with it.

BigEd · Post by **BigEd** » Wed Nov 23, 2011 8:56 pm

Which certainly makes it an excellent chip, but for the purposes of showing off the various cores, would you agree that targeting one or both of the smallest/cheapest spartan3 and spartan6 would be enough?

And if only one, should it be the cheapest, or the most modern (fastest)?

Actually, I might well run this series of quick synths myself, and run both targets - should be fairly quick. The main problem is getting confused and running the wrong experiment.

Cheers
Ed

kc5tja · Post by **kc5tja** » Wed Nov 23, 2011 8:59 pm

If your goal is to push the MHz boundary, I'd go with the latest, greatest chip architecture that is offered. Virtex 5 or Spartan 6, etc.

If your goal is to measure average performances on what is essentially commodity hardware, I'd use the Spartan 3 line.

I suppose the law of benchmarks applies here -- they mean nothing without some kind of additional context.

BigEd · Post by **BigEd** » Wed Nov 23, 2011 9:06 pm

Indeed!

There are a couple of issues with trying to benchmark the performance of these cores: firstly that writing real timing constraints takes a bit of time and care, and secondly that (in my experience) the post-synth and post-routing speeds reported for spartan6 were much more divergent than for spartan3. So the cheap and easy approach of reporting post-synth speeds from an unconstrained design might be more or less meaningless.

So, probably just tabulating the complexity - and of course tabulating the availability and the URLs - is the main point of the exercise. Which means that updating to a new ISE might not show us much. Only one way to find out!

In more concrete news, I tried to fit Arlet's core (the smallest) to CoolrunnerII, and it failed to fit the largest CPLD: 577 macrocells needed versus 512 available. Not missing by an enormous margin, but missing.

Cheers
Ed

ElEctric_EyE · Post by **ElEctric_EyE** » Wed Nov 23, 2011 11:47 pm

BigEd wrote:

...In more concrete news, I tried to fit Arlet's core (the smallest) to CoolrunnerII, and it failed to fit the largest CPLD: 577 macrocells needed versus 512 available. Not missing by an enormous margin, but missing.

Cheers
Ed

Ed, have you tried the XPLA3 family of Coolrunner to fit Arlet's core? These are 5V input tolerant (IIRC), for those still interested in 5V I/O without voltage level translators...

BigEd · Post by **BigEd** » Thu Nov 24, 2011 5:09 am

I hadn't tried it, but I have now! Again, it has 512 macrocells, although of a different architecture, and again, not quite enough:

Quote:

Design requires at least 609 macrocells, exceeds device limit 512.
Design contains 1748 unique product terms, exceeds device limit 1536.

CoolrunnerII family (pdf), XPLA3 family (pdf)

It's possible that coding/microarchitecting a CPU specifically for the device would improve the fitting, but would also be extra work and might well make the code less readable and more difficult to maintain.
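To put the fitter numbers quoted in this thread in proportion, here is a rough calculation of how far each resource overshoots the device limit. The figures are the ones reported above; the helper name `overshoot_percent` is made up for the example.

```python
def overshoot_percent(required, available):
    """Return how far a requirement exceeds the device limit, in percent."""
    return 100.0 * (required - available) / available


# (resource, required, device limit) as reported by the fitter in this thread
reports = [
    ("CoolrunnerII macrocells", 577, 512),
    ("XPLA3 macrocells", 609, 512),
    ("XPLA3 product terms", 1748, 1536),
]

for resource, required, available in reports:
    over = overshoot_percent(required, available)
    print(f"{resource}: {required} needed vs {available} available, "
          f"{over:.1f}% over")
```

All three come out somewhere between roughly 12% and 19% over, which gives an idea of how much a device-specific rewrite would need to save before the core fits.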

Cheers
Ed