Of course, I stole your basic J-K flop wait-state design.
I've thought about what you have suggested. Scabbing and bodging V1.3 is greatly complicated for me by the use of SOIC packages. I really can't see well enough to work with that stuff—I had to get someone to solder the SOIC stuff for me. So I'm unlikely to do any hardware hacking on the unit. When I designed V1.3, I concluded that 16 MHz would likely be the ceiling if I were lucky. I'm happy with it, as it is stable and plenty fast.
Competing for time is my locomotive, which is in the process of receiving its final electrical system, and is also awaiting final body prep and livery. Progress in that area has been slow for a variety of reasons, not the least of which are my ongoing cardiac calamities.
I've got something else cooking 65816-wise on which I'd rather devote my hobby computing time. This new design builds on the knowledge gleaned from V1.2 and V1.3. In the process, the chip count will be greatly reduced, resulting in more timing headroom and, I hope, a return to the 20 MHz that was achieved in V1.2.
Incidentally, for anyone reading this who plans to build a 65C02 unit, the clock-stretching circuit that Jeff devised with the 163 synchronous counter would be ideal for wait-stating a slow ROM or I/O device.
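For a rough feel of how many wait states a slow device might need, here is a back-of-the-envelope sketch in Python. It is not Jeff's circuit or anything from V1.3; the function name `wait_states` and the example access times are made up for illustration, and it makes the idealized assumption that the whole clock period is available for the access (a real design loses some of each cycle to address setup, decoding and data setup, so treat the result as a lower bound).

```python
def wait_states(access_ns: int, clock_mhz: int) -> int:
    """Idealized estimate of the wait states a slow device needs.

    Assumes (hypothetically) that the entire clock period counts toward
    the device's access time. Real buses lose time to address setup,
    glue-logic delay and data setup, so the true figure may be higher.
    """
    # Cycles needed = ceil(access_ns / period_ns), with period_ns = 1000 / clock_mhz.
    # Integer arithmetic avoids floating-point rounding surprises.
    cycles = -(-(access_ns * clock_mhz) // 1000)
    # The first cycle is the normal bus cycle; any extras are wait states.
    return max(cycles - 1, 0)


if __name__ == "__main__":
    # Example figures only: a 150 ns ROM and a 70 ns I/O device.
    for mhz in (16, 20):
        for access in (150, 70):
            print(f"{access} ns device at {mhz} MHz: "
                  f"{wait_states(access, mhz)} wait state(s)")
```

The number that comes out is what you would load the counter with, or how many flop stages you would need, depending on which style of clock stretcher you build.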