Yet another (unnamed) 65C02 core

Windfall
Posts: 229
Joined: 27 Nov 2011
Location: Amsterdam, Netherlands

Re: Yet another (unnamed) 65C02 core

Post by Windfall »

BigEd wrote:
Any idea if careful HDL (re)coding could improve fmax?
That has been the main focus of recent optimizations (cycle counts lowered only a bit, here and there).

In particular, at least for Stratix V (compiles for other FPGAs may very well react differently), separating the write enables from the data for instructions that write A/X/Y/P reduced multiplexer complexity quite a bit and gained roughly 30 MHz. The current critical path reported by the timing analyzer looks like the end of the line (i.e. it cannot be optimized further).
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Post by BigEd »

There's always a brick wall eventually! What does the critical path look like - is it more or less expected?
Post by Windfall »

BigEd wrote:
There's always a brick wall eventually! What does the critical path look like - is it more or less expected?
Kind of, yes. It always requires quite a bit of lateral thinking to interpret these paths, but it looks like it's the +X/+Y bypass (which may also conspire with the additional +1 needed to address the top byte of a 2-byte ABS,X or ZPG,X read). And this is all on top of an incoming instruction read. Removing the bypass is almost certainly the only possible relief there, but all X and Y changing instructions would incur an extra cycle, and it is very doubtful that the combination would pay off. Another way would be to drop from 2 to 1 byte for ABS,X and ZPG,X (of which the second byte isn't used very often), but I don't think that that would pay off either.
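
To make the trade-off concrete, here is a toy Python model of an index register bypass. It is only a sketch under stated assumptions, not the core's actual logic: the function name, its parameters, and the idea of charging the penalty to the indexed access are all hypothetical. The post describes the simpler alternative of giving every X/Y-changing instruction an extra cycle.

```python
def indexed_address(base, reg_value, pending_value=None, bypass=True):
    """Toy model: return (address, extra_cycles) for a base+index access.

    reg_value     -- value currently stored in the index register (X or Y)
    pending_value -- value a preceding instruction is still writing to that
                     register, or None if no write is in flight (hypothetical)
    bypass        -- True: forward the pending value straight into the adder
                     False: wait one cycle for the register write to land
    """
    if pending_value is None:
        # No write in flight: the stored register value is current.
        return (base + reg_value) & 0xFFFF, 0
    if bypass:
        # Forward the in-flight value: no stall, but a longer logic path.
        return (base + pending_value) & 0xFFFF, 0
    # No bypass: shorter logic path, at the price of one extra cycle.
    return (base + pending_value) & 0xFFFF, 1


# Example: an index register update immediately followed by an indexed read.
print(indexed_address(0x1234, 0x00, pending_value=0x10, bypass=True))
print(indexed_address(0x1234, 0x00, pending_value=0x10, bypass=False))
```

Both calls yield the same address; only the cycle count differs. That is the bargain under discussion: the bypass saves a cycle but puts the forwarding multiplexer and the adder in series within one clock period.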
Post by Windfall »

200 MHz now. But that must truly be the (far) end of the line. 420 MHz benchmark. Take that, emulator guys (just kidding, 370 is damn respectable, and the underlying hardware is far more practical).
Post by BigEd »

So, the IPC (instructions per clock) is about double the 6502's? That's a good stake in the ground for what can be done.

200MHz is a nice round number too.
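
The "about double" estimate follows from the two figures quoted in the thread. A minimal sketch, assuming the 420 MHz benchmark figure means "as fast as an original 6502 clocked at 420 MHz would run the same code" (the function name is made up for illustration):

```python
def work_per_clock_ratio(benchmark_mhz, clock_mhz):
    """Ratio of work done per clock versus an original 6502.

    benchmark_mhz -- equivalent 6502 clock reported by the benchmark
    clock_mhz     -- actual clock of the core under test
    """
    return benchmark_mhz / clock_mhz


ratio = work_per_clock_ratio(420, 200)
print(f"{ratio:.2f}x the work per clock of an original 6502")  # 2.10x ...
```

Since the same instructions are executed either way, the same ratio applies to instructions per clock.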
Post by Windfall »

BigEd wrote:
So, the IPC (instructions per clock) is about double the 6502's? That's a good stake in the ground for what can be done.
It probably is. At least with this specific / impractical (for new designs) / FPGA bound architecture. But a good pipelined architecture, whose only problems would be inter-instruction dependencies and pipeline flushes, could probably get close to 1 cycle per instruction. Or beyond, if you go really crazy with the memory bandwidth and execution units. A fun puzzle for sure.
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Post by BillG »

Windfall wrote:
200 MHz now. But that must truly be the (far) end of the line. 420 MHz benchmark. Take that, emulator guys (just kidding, 370 is damn respectable, and the underlying hardware is far more practical).
As a reference point, my 6502 debugging simulator cranks out about 230 MHz while running on a netbook containing a 1 GHz AMD C60 processor. Today's fastest machines should do about four times better for around 1 GHz.
Post by BigEd »

(I suspect John is referring to PiTubeDirect, which runs on the super-cheap Raspberry Pi, also an approx 1GHz machine, but with the 6502 emulation written in super-tight ARM code, mostly by dp11 here. On a Pi Zero the latest code runs at 290MHz, on a less-cheap 1.5GHz Pi 4 it runs at 370MHz, approx. It's not a clock-for-clock accurate kind of emulator, it's an as-fast-as-you-can kind.)
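
One way to compare the emulator figures quoted so far is host clock cycles spent per emulated 6502 cycle. The sketch below uses only the numbers given in this thread, with the approximate host clocks taken at face value; the dictionary and function names are illustrative.

```python
# (host clock in MHz, emulated 6502 speed in MHz), as quoted in the thread
results = {
    "debugging simulator, 1 GHz AMD C60": (1000, 230),
    "PiTubeDirect, Pi Zero (approx 1 GHz)": (1000, 290),
    "PiTubeDirect, Pi 4 (1.5 GHz)": (1500, 370),
}


def host_cycles_per_emulated_cycle(host_mhz, emulated_mhz):
    """Host clock cycles consumed for each emulated 6502 clock cycle."""
    return host_mhz / emulated_mhz


for name, (host_mhz, emulated_mhz) in results.items():
    cost = host_cycles_per_emulated_cycle(host_mhz, emulated_mhz)
    print(f"{name}: {cost:.2f} host cycles per 6502 cycle")
```

All three land at roughly 3.5 to 4.5 host cycles per emulated cycle, which is why the emulated speed tracks the host clock so closely.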
Post by Windfall »

BigEd wrote:
(I suspect John is referring to PiTubeDirect, which runs on the super-cheap Raspberry Pi, also an approx 1GHz machine, but with the 6502 emulation written in super-tight ARM code, mostly by dp11 here.)
Sort of, but the 370 is for a 1.5 GHz RPi 4 (as I understand it; I have a 'PiTubeDirect' but haven't tried it on my RPi 4 yet). It is really quite amazing how well ARM can emulate the 6502. Not exactly a coincidence, of course, considering that the ARM instruction set was inspired by the 6502.
Post by Windfall »

It is also quite interesting to see, right here, that emulation on dedicated hardware can compete on speed with implementations on flexible hardware (FPGAs). The price you pay for having all the flexibility, in terms of coins and performance, is pretty high. It will be interesting to see if this changes, especially since clock speed on contemporary processors has been flattening out.
Post by Windfall »

Another interesting tidbit is that a reduced RAM version of the core, using 2.0 instead of 3.5 Mb (both resulting in 64 KB of usable memory), is slower by only 5% (benchmarked). This involves moving absolute operand reads (+0, +X, +Y) from the first to the second execution cycle, and then multiplexing all absolute and indirect addresses into one (so only one true dual ported RAM block is used for all six addressing modes). Of course, the speed reduction depends ultimately on the frequency with which absolute addressing is used in code, but on average, an extra cycle for all instructions using absolute addressing seems relatively inconsequential for speed.
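
The arithmetic behind those figures can be sketched as follows. Two assumptions are mine rather than the post's: "Mb" is read as 2^20 bits, and the whole RAM budget is treated as full 64 KB copies of memory. The slowdown model (one extra cycle for a given fraction of instructions) and the baseline of 2 cycles per instruction are likewise hypothetical illustrations, not measured properties of the core.

```python
BITS_PER_64KB = 64 * 1024 * 8  # 524288 bits, i.e. 0.5 Mb


def ram_copies(total_mbit):
    """Number of full 64 KB images that fit in total_mbit (2**20-bit units)."""
    return total_mbit * 2**20 / BITS_PER_64KB


def relative_speed(base_cpi, absolute_fraction):
    """Speed relative to baseline when absolute_fraction of all
    instructions each take one extra cycle."""
    return base_cpi / (base_cpi + absolute_fraction)


def fraction_for_slowdown(base_cpi, slowdown):
    """Fraction of instructions that must take one extra cycle to
    produce the given slowdown (0.05 means 5% slower)."""
    return base_cpi * (1.0 / (1.0 - slowdown) - 1.0)


print(ram_copies(3.5), ram_copies(2.0))  # 7.0 4.0

# Hypothetical baseline of 2 cycles per instruction:
f = fraction_for_slowdown(2.0, 0.05)
print(f"{f:.3f} of instructions using absolute addressing")  # 0.105 ...
```

Under these assumptions the memory budget drops from seven 64 KB images to four (a ratio of about 0.57), and a 5% slowdown corresponds to roughly one instruction in ten paying the extra cycle.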
Post by BigEd »

Is that almost saying that you've managed to halve the size and lose only 5% performance? Not a bad tradeoff, I'd say!
Post by Windfall »

This core is now part of my 'soft' Acorn 6502 Second Processor for hardware development boards. If you have an Acorn BBC and one of the supported development boards, you may want to have a look here:

http://www.zeridajh.org/hardware/soft65 ... /index.htm
Post by Windfall »

I've now put the source for this 65C02 core on my website. See https://www.zeridajh.org/articles/me_various_sources/ under 'Verilog HDL'.
Post by BigEd »

Thanks for sharing your sources!