And yet another 65C02 core was born. Yes, don't look away.
Let me first say that making a 6502 core small, efficient, cycle-accurate, or any combination thereof, is laudable. I adore efficiency myself. But what about environments that are already resource-rich, like, say, a Stratix V FPGA with nearly a million LEs and 54 Mb of RAM? It seems a waste to have a 2000-cell core sit in its little 0.2% corner of such a chip... So, exclusively in the context of my 'soft' Second Processors for Acorn BBCs (google it), which run on a number of hardware development boards (both cheap and expensive...), I set out to make something 'ridiculously wasteful but fast'. If you've got it, use it.
The core relies mainly on many copies of main memory, used to a) read whole instructions in one go and b) speculatively read all possible memory operands. With that, it executes most 65C02 instructions in 1 cycle (immediate or direct addressing) or 2 cycles (indirect addressing). Control flow instructions take a few cycles more than that, but I haven't optimized those yet. The current end result is a 65C02 core that executes instructions in roughly half the cycles of a regular core.
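To make the replicated-memory idea a bit more concrete, here is a rough behavioural sketch in Python. It is a hypothetical model, not the actual core: each element of `copies` stands in for one physical copy of main memory with its own read port, so all reads within one step are independent of each other. The functions `make_copies`, `step1`, `step2` and `lda`, and the grouping into two steps (matching the 1- and 2-cycle figures above), are my own illustration; how the real core pipelines instruction fetch is not described here. Only a handful of LDA opcodes are decoded.

```python
"""Behavioural sketch of speculative operand fetch from replicated memory.

Hypothetical model, not the real core: every list element is one copy of
main memory with its own read port, so the reads inside one step could all
be issued in parallel.
"""

MEM_SIZE = 0x10000


def make_copies(image: bytes, n: int) -> list[bytearray]:
    """Return n identical copies of a 64 KiB memory image."""
    base = bytearray(MEM_SIZE)
    base[: len(image)] = image
    return [bytearray(base) for _ in range(n)]


def step1(copies: list[bytearray], pc: int, x: int, y: int) -> dict[str, int]:
    """Step 1: instruction bytes plus every direct-mode operand candidate."""
    opcode = copies[0][pc]
    op1 = copies[1][(pc + 1) & 0xFFFF]
    op2 = copies[2][(pc + 2) & 0xFFFF]
    absolute = op1 | (op2 << 8)
    zpx = (op1 + x) & 0xFF                      # zero-page index wraps
    return {
        "opcode": opcode,
        "imm": op1,                             # no memory read needed
        "zp": copies[3][op1],
        "zp,x": copies[4][zpx],
        "zp,y": copies[5][(op1 + y) & 0xFF],
        "abs": copies[6][absolute],
        "abs,x": copies[7][(absolute + x) & 0xFFFF],
        "abs,y": copies[8][(absolute + y) & 0xFFFF],
        # High pointer bytes for indirect modes (low bytes are zp / zp,x).
        "ptr_hi": copies[9][(op1 + 1) & 0xFF],
        "ptr_x_hi": copies[10][(zpx + 1) & 0xFF],
    }


def step2(copies: list[bytearray], s1: dict[str, int], y: int) -> dict[str, int]:
    """Step 2, indirect modes only: dereference every candidate pointer."""
    ptr = s1["zp"] | (s1["ptr_hi"] << 8)        # (zp) and (zp),Y
    ptr_x = s1["zp,x"] | (s1["ptr_x_hi"] << 8)  # (zp,X)
    return {
        "(zp)": copies[0][ptr],
        "(zp),y": copies[1][(ptr + y) & 0xFFFF],
        "(zp,x)": copies[2][ptr_x],
    }


# Opcode -> addressing mode for LDA (0xB2 is the 65C02-only (zp) form).
LDA_MODES = {0xA9: "imm", 0xA5: "zp", 0xB5: "zp,x", 0xAD: "abs",
             0xBD: "abs,x", 0xB9: "abs,y", 0xB2: "(zp)",
             0xB1: "(zp),y", 0xA1: "(zp,x)"}


def lda(copies: list[bytearray], pc: int, x: int, y: int) -> tuple[int, int]:
    """Return (loaded value, steps used) by selecting the speculated operand."""
    s1 = step1(copies, pc, x, y)
    mode = LDA_MODES[s1["opcode"]]
    if mode in s1:
        return s1[mode], 1                      # direct modes: one step
    return step2(copies, s1, y)[mode], 2        # indirect modes: two steps
```

Usage: `lda(make_copies(image, 11), pc, x, y)`. All candidates are read every time and the decoded addressing mode merely picks one, which is exactly the 'wasteful but fast' trade-off. A corollary of the model is that any store would have to be written to every copy to keep them consistent.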
There's quite a bit of Fmax optimization left to do (I've only just started on that), but on a 5SGXEABN2F45C2 Stratix V (yes, I know it is ridiculously high-end, but it allows me to get close to actual limits), it currently runs at 170 MHz.
Proof, if only to myself, that radically reducing the cycle count is a viable concept.
Don't hold your breath for a public release (yet). It's something of a 'lab' thing at present.