Single cycle instructions? I always thought that would mean major rework - be interesting to see it though! And we know the 65CE02 managed it. In an FPGA fabric, an incrementer local to the register might well be efficient. It would presumably simplify push and pop type operations.
I didn't say it wouldn't involve major rework.
For instance, if you have two consecutive INX instructions, the first INX writes the new X register value at the same time that the second INX already needs it. So, to avoid using the old value, it needs an extra mux to choose between the current value of X and the new value that it is about to get.
Also, all these extra muxes may mean the maximum clock rate will drop, which could make the end result slower for typical programs. The problem is that you won't find out until you've already done most of the work.
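To make the back-to-back INX case above concrete, here is a rough sketch in Python rather than HDL. It is a toy two-stage model with made-up names (run, pending, forwarding), not taken from any of the cores in this thread: the previous instruction's result is written to X in the same cycle that the next instruction reads its operand, and the "forwarding" flag stands in for the extra mux.

```python
# Toy model of overlapped INX instructions. All names are hypothetical and
# this is only an illustration of the hazard, not code from any real core.

def run(program, x0, forwarding):
    """Run a list of 'INX'/'NOP' ops and return the final 8-bit X value."""
    x = x0            # architectural X register
    pending = None    # result computed last cycle, written to X this cycle
    for op in program + ["NOP"]:      # trailing NOP drains the last write
        # Operand read for the current instruction.
        if forwarding and pending is not None:
            operand = pending         # bypass mux picks the in-flight value
        else:
            operand = x               # plain register read, possibly stale
        # Write-back of the previous instruction, "at the same time".
        if pending is not None:
            x = pending
        # Execute the current instruction.
        pending = (operand + 1) & 0xFF if op == "INX" else None
    return x


if __name__ == "__main__":
    print(run(["INX", "INX"], 5, forwarding=True))
    print(run(["INX", "INX"], 5, forwarding=False))
```

With the mux, two INX instructions starting from X=5 end at 7. Without it, the second INX reads the stale 5 and the result is 6. In real hardware the cost is the extra mux level in front of the incrementer, which is where the clock rate concern comes from.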
While I had the ISE session open, I had a quick run or two with spartan 6. The LUT counts are not comparable because the LUTs are different, but the smallest spartan6 is only 25% full, compared to the smallest spartan3 being 50% full.
Speeds for 65Org16, using RAMs, unconstrained post-synth:
The -3 grade is £12 rather than £11 for the LX9 size. The LX4 is only £8 but I see it only in -2 grade at digikey. The actual speed of a system might well have other limits of course.
Cheers
Ed
(*) Edit: added in post-routed speeds, because it looks like the post-routing speed for spartan 6 is rather worse than the post-synth, unlike for spartan3, and more so for the -2 grade.
Digi-Key is a good source for Xilinx. Avnet-Express is even better. For other ICs too, like SDRAMs in low quantity...
Also, you guys in Europe can get some Xilinx ICs (e.g. higher density Virtex) that we can't get here in the States for some reason.
This is a reminder for me to re-install ISE10.1 and add a 6502 core made by Thomas Skibo (and thankfully shared) for comparison, using the same Spartan 2 XC2S200, amongst the other cores.
Not sure if this is asking too much, but if you already have a very modern version installed, would it be better to rerun everything through a new version?
Edit: ah, I see that newer ISE versions do not support Spartan 2. So that would mean targeting Spartan 3, which means giving up on 5V.
Hi Ed, I sort of agree with you.
What I can do, is finish off this thread with Thomas' 6502 Core and re-title this thread as geared towards the 5V input tolerant Spartan 2... (most of us know the 3.3V TTL output voltage levels are 5V CMOS compatible as well), but oh well...
Then I could start a new thread, since I have all the original 6502 Cores stored away in a safe place, based maybe on the 3.3V I/O Spartan 6?
I got the email from DigiKey stating the Spartan 2 and a couple of Xilinx CPLDs won't be made anymore a few days ago. I hate when that happens...
Ok, I can take this up in my spare time when I am away from my current project, but it will take some time... Any volunteers?
Initial comparison of a core shows a difference that I think is worth the effort of a new thread, especially since the Spartan 2 is EOL, but a new device to be fitted would have to be chosen...
IMO it would be a race amongst:
1) The ~$6 Spartan 3 XC3S50 100-pin QFP
2) The ~$28 Spartan 3E XC3S500E 100-pin QFP
3) The ~$18 Spartan 6 144-pin QFP
The 3E series is exceptionally popular on starter kits, like my Nexys2 board. They seem to offer reasonable capacity. The Kestrel-2 consumes only 8% of the XC3S1200E's fabric and 68% of its block RAM resources (for a total memory complement of 32KiB addressable from the CPU). Conceivably, if I reworked the J1A and MGIA units to draw their data from external RAM, I could pack close to eight of these things on the FPGA (remembering that logic will need to be invested to form a cache controller for each instantiated Kestrel).
Additionally, the 3E series appears to be quite reasonably priced. A 400-pin BGA version of the chip costs around $58 from Digikey, which sounds pricy for a single chip, but considering the density of logic you can throw on it, and thus the PCB savings it provides, I'd be willing to bet you'll actually save money with it.
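As a back-of-envelope check on the "close to eight" figure, the Kestrel-2 numbers quoted above can be turned into a quick calculation. This is only a sketch: it assumes block RAM stops being the limit once data comes from external RAM, that fabric use scales linearly per copy, and the function names are invented for illustration.

```python
# Rough capacity arithmetic from the quoted 8% fabric figure.
# Assumptions (not from the original post): linear scaling per copy, and
# block RAM no longer limiting once data is fetched from external RAM.

def max_copies(fabric_per_core_pct, overhead_per_core_pct=0.0):
    """How many cores fit if each needs core + overhead percent of fabric."""
    return int(100 // (fabric_per_core_pct + overhead_per_core_pct))


def overhead_budget(copies, fabric_per_core_pct):
    """Percent of fabric left per copy for extras such as a cache controller."""
    return 100 / copies - fabric_per_core_pct


if __name__ == "__main__":
    print(max_copies(8))            # fabric-only ceiling, no overhead
    print(overhead_budget(8, 8))    # per-copy budget if eight are packed in
```

On these assumptions the fabric alone would allow twelve copies, and packing eight leaves about 4.5% of the fabric per copy for the cache controller and any other glue.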
Which certainly makes it an excellent chip, but for the purposes of showing off the various cores, would you agree that targeting one or both of the smallest/cheapest spartan3 and spartan6 would be enough?
And if only one, should it be the cheapest, or the most modern (fastest)?
Actually, I might well run this series of quick synths myself, and run both targets - should be fairly quick. The main problem is getting confused and running the wrong experiment.
There are a couple of issues with trying to benchmark the performance of these cores: firstly that writing real timing constraints takes a bit of time and care, and secondly that (in my experience) the post-synth and post-routing speeds reported for spartan6 were much more divergent than for spartan3. So the cheap and easy approach of reporting post-synth speeds from an unconstrained design might be more or less meaningless.
So, probably just tabulating the complexity - and of course tabulating the availability and the URLs - is the main point of the exercise. Which means that updating to a new ISE might not show us much. Only one way to find out!
In more concrete news, I tried to fit Arlet's core (the smallest) to CoolrunnerII, and it failed to fit the largest CPLD: 577 macrocells needed versus 512 available. Not missing by an enormous margin, but missing.
Cheers
Ed
Ed, Have you tried the XPLA3 Family of Coolrunner to fit Arlet's Core? These are 5V input tolerant (IIRC) for those still interested in 5V I/O without voltage level translators...
It's possible that coding/microarchitecting a CPU specifically for the device would improve the fitting, but would also be extra work and might well make the code less readable and more difficult to maintain.