Welcome!
Quote:
1) Remove the dummy fetches in instructions that don't need it (such as ROL, ROR memory instructions, etc..)
The Commodore 65CE02 from 20+ years ago eliminated almost all the dead bus cycles, even having over 30 op codes that took only one clock instead of the normal minimum of two; so without re-writing code to take advantage of its new instructions, it still gave a speed-up of about 25%. It was only 10MHz though, whereas the current-production 65c02's are conservatively spec'ed for at least 14MHz and usually top out at 25MHz if the supporting parts can keep up.
The next step of course is to re-write the code to take advantage of the new instructions, or go with the 65816. I have a post on the huge difference you can get if you're constantly using 16-bit numbers (as in a higher-level language) at viewtopic.php?f=9&t=1505&p=9705#p9705. There's an example shown there where the '816 does in two instructions what the 6502 takes ten to do.
There was the 65GZ032 project, with its own Yahoo forum, which aimed at a modern 32-bit processor that could still run old 6502 code. It had a ton of registers, deep pipelining, branch prediction, onboard cache, etc., and ended up with something that had little resemblance to the 6502, but, after a lot of progress and even some working hardware, it still fizzled out before it was done. I kind of lost interest when they went in directions that abandoned the 6502 flavor (outside of the 6502 emulation mode).
We were discussing an all-32-bit 65-family processor (the 65Org32), but as Arlet pointed out, the problem is a shortage of time and motivation to do the hard work. ElEctric_EyE here is working toward the 65Org32 in steps, first to do a 16-bit NMOS 6502 equivalent. He's working on a video-chip project at the moment though.
A standard part of the program-structure words in Forth is DO...LOOP, with 16-bit (if you implement it on 6502) loop counter, index, and limit which are normally kept on the hardware stack in page 1. I did an equivalent 32-bit set of words (DO, ?DO, LOOP, +LOOP, I, BOUNDS, LEAVE, ?LEAVE, UNLOOP) for 6502, and the number of instructions it took was incredible. DO, which sets up the loop, took about 30 instructions (not cycles, but instructions), and LOOP, which increments the loop counter and compares it to the limit to see if it's time to exit the loop, took about 44 instructions (again, not cycles). With a 65Org32, it would be trivial, like doing a loop on 6502 with an 8-bit counter: not even a half-dozen instructions total (plus whatever you actually do in the loop).
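To give a feel for why the 32-bit loop words balloon on an 8-bit machine, here is a rough C sketch, not my actual Forth code. It keeps the loop index as four separate bytes and bumps and compares it one byte at a time, next to the same loop with a native 32-bit counter. The names (idx32, count_bytewise, count_native) are made up for the illustration, and the exit test is a plain equality check, which is simpler than what a real LOOP has to do.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical 32-bit loop index held as four bytes, low byte first,
   the way an 8-bit CPU has to store it. */
typedef struct { uint8_t b[4]; } idx32;

/* Split a 32-bit value into its four bytes. */
idx32 idx32_from(uint32_t v) {
    idx32 x;
    for (int i = 0; i < 4; i++)
        x.b[i] = (uint8_t)(v >> (8 * i));
    return x;
}

/* Add 1, one byte at a time, carrying into the next byte only when
   the current one wraps to zero (like a chain of INC / BNE). */
void idx32_inc(idx32 *x) {
    for (int i = 0; i < 4; i++) {
        x->b[i]++;
        if (x->b[i] != 0)
            break;              /* no carry out of this byte */
    }
}

/* Compare all four bytes; every byte needs its own load and compare. */
bool idx32_equal(const idx32 *a, const idx32 *b) {
    for (int i = 0; i < 4; i++)
        if (a->b[i] != b->b[i])
            return false;
    return true;
}

/* Count iterations from start up to (not including) limit, byte-wise. */
uint32_t count_bytewise(uint32_t start, uint32_t limit) {
    idx32 i = idx32_from(start), lim = idx32_from(limit);
    uint32_t n = 0;
    while (!idx32_equal(&i, &lim)) {
        n++;
        idx32_inc(&i);
    }
    return n;
}

/* The same loop when the machine word is 32 bits wide. */
uint32_t count_native(uint32_t start, uint32_t limit) {
    uint32_t n = 0;
    for (uint32_t i = start; i != limit; i++)
        n++;
    return n;
}
```

Both functions return the same count; the point is only that every step in the byte-wise version turns into several instructions on the 6502, where a 32-bit-wide processor would do each one in a single operation.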
Additionally, of course, there would be multiply and divide instructions that would replace the long routines the 6502 requires, shown at viewtopic.php?f=9&t=689 and http://6502.org/source/integers/ummodfix/ummodfix.htm.
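For anyone who hasn't looked inside such a routine, here is a C sketch of the general shift-and-subtract technique for dividing a 32-bit unsigned number by a 16-bit one, giving a 16-bit quotient and remainder. It is not a transcription of the linked code; the function name um_mod and its return convention are assumptions made for the example. On the 6502, each of the 16 passes through the loop becomes a string of 8-bit shifts, compares, and subtracts.

```c
#include <stdint.h>

/* Unsigned 32-bit / 16-bit division by shift-and-subtract.
   Stores a 16-bit quotient and remainder and returns 1 on success.
   Returns 0 if the divisor is zero or the quotient would not fit
   in 16 bits (both cases show up as high half >= divisor). */
int um_mod(uint32_t dividend, uint16_t divisor,
           uint16_t *quot, uint16_t *rem) {
    uint16_t hi = (uint16_t)(dividend >> 16);  /* becomes the remainder */
    uint16_t lo = (uint16_t)dividend;          /* becomes the quotient  */

    if (hi >= divisor)
        return 0;

    for (int i = 0; i < 16; i++) {
        /* Shift the 32-bit pair left one bit, remembering the bit
           that falls off the top of the high half. */
        unsigned carry = hi >> 15;
        hi = (uint16_t)((hi << 1) | (lo >> 15));
        lo = (uint16_t)(lo << 1);

        /* If the shifted-out bit was set, the true value is at least
           65536 and so is larger than any 16-bit divisor. */
        if (carry || hi >= divisor) {
            hi = (uint16_t)(hi - divisor);
            lo |= 1;                           /* record a quotient bit */
        }
    }
    *quot = lo;
    *rem = hi;
    return 1;
}
```

For example, dividing 100000 by 7 this way gives a quotient of 14285 and a remainder of 5. A hardware divide instruction would replace the whole loop.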
There are other things that can be done to get better performance even with old technology though, like the 16-bit look-up tables for accurately getting math functions, hundreds of times as fast as actually having to calculate them. These tables take a lot of memory, but the cost and size of memory have come way, way down to where it's somewhat practical now.
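As a small illustration of the table idea (not my actual tables), here is a C sketch with a 16-bit input and a 16-bit output. I picked a scaled square root, floor(256*sqrt(x)), as a stand-in function only because it can be built with pure integer code; the names isqrt32, sqrt_table, sqrt_table_init, and fast_sqrt_scaled are invented for the example. The table costs 128KB, and after it is filled once, each result is a single indexed fetch instead of a bit-by-bit calculation.

```c
#include <stdint.h>

/* Slow path: bit-by-bit integer square root of a 32-bit value. */
uint16_t isqrt32(uint32_t v) {
    uint32_t root = 0;
    uint32_t bit = (uint32_t)1 << 30;   /* highest power of 4 in range */

    while (bit > v)
        bit >>= 2;
    while (bit != 0) {
        if (v >= root + bit) {
            v -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)root;
}

/* One 16-bit entry for every possible 16-bit input: 65536 * 2 bytes. */
static uint16_t sqrt_table[65536];

/* Fill the table once using the slow routine. On a real system the
   table would more likely be precomputed and loaded from storage. */
void sqrt_table_init(void) {
    for (uint32_t x = 0; x < 65536; x++)
        sqrt_table[x] = isqrt32(x << 16);   /* floor(256 * sqrt(x)) */
}

/* Fast path: the whole "calculation" is one table fetch. */
uint16_t fast_sqrt_scaled(uint16_t x) {
    return sqrt_table[x];
}
```

After calling sqrt_table_init(), fast_sqrt_scaled(4) returns 512, which is 2.0 in this 8.8 fixed-point scaling. The same layout works for any function of one 16-bit argument.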
Although Arlet is not wrong about throwing the whole thing out and going with a newer processor, my point is that huge, dramatic improvements in performance could still be gained with a true 65-family processor, and some of those can be had even with existing, off-the-shelf current production 65c02's and 65816's.