Welcome. Most of the bus cycles are already used, so there's probably not much more that could be done. The 65CE02 eliminated almost all the dead bus cycles, so 31 op codes took only one cycle each, and others that had a dead bus cycle here and there got those eliminated too. It also added other powerful features. Unfortunately the 65CE02 is not being made today like the 65c02 and 65816 are, and it only reached 10MHz, unlike the 65c02, which is conservatively rated for 14MHz and usually tops out at 25MHz for individual processors, and over 200MHz for those at the heart of custom microcontrollers for dedicated purposes in industrial, automotive, and other applications.

For languages other than assembly language, where you're constantly dealing with 16-bit quantities, the 65816 will perform much better than the '02. My '816 Forth runs two to three times as fast as my '02 Forth at a given clock speed.

For the '02 however, in the programmable logic section of this forum you'll see that a few different members have done ones in FPGAs that run at dozens of MHz, and a couple of the enthusiasts are (very slowly) working on 32-bit versions. Mike's 65m32 merges operands of up to 24 bits with the instruction so that it all gets read in one cycle.