This gives more reason to use the EconOscillator chip I mentioned earlier, since the easiest way to handle this is with 2 more flops, as you show in your design. (I don't have a flop on the output of the current clock generation, because it only goes up to 33 MHz, and halving that to 16.5 MHz would not let me try to get to 20 MHz.) The part mentioned earlier can go much faster, so a halving flop (that can also be controlled by the wsg) is not an issue with it.
Ultimately, the aforementioned clock chip seems to have 2 clock outputs that can be switched instantaneously, without glitches. Because of this, it may be more worthwhile to forget the wait states and go with two separate clocks, a fast one and a slow one, programmed into the chip and selected by address decoding.
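To make the two-clock idea concrete, here is a rough behavioral sketch in Python of what the address decode would do. It is not real hardware logic. Only the 20 MHz target comes from my posts above; the 10 MHz slow clock and the `SLOW_REGIONS` address range are made-up placeholders, and the function names are purely illustrative.

```python
# Behavioral model of "pick a clock by address decoding".
# ASSUMPTIONS: SLOW_HZ and SLOW_REGIONS are hypothetical placeholders,
# not values from any real memory map. FAST_HZ is the 20 MHz target.

FAST_HZ = 20_000_000          # fast clock for RAM/ROM accesses
SLOW_HZ = 10_000_000          # hypothetical slow clock for slow devices

# Hypothetical address ranges (inclusive) that need the slow clock,
# e.g. an I/O page holding peripheral chips.
SLOW_REGIONS = [(0x00C000, 0x00CFFF)]


def select_clock(addr: int) -> int:
    """Return the clock frequency in Hz used for a bus cycle at addr."""
    for lo, hi in SLOW_REGIONS:
        if lo <= addr <= hi:
            return SLOW_HZ
    return FAST_HZ


def cycle_time_ns(addr: int) -> float:
    """Return the length of one bus cycle at addr, in nanoseconds."""
    return 1e9 / select_clock(addr)


if __name__ == "__main__":
    for a in (0x008000, 0x00C010):
        print(f"${a:06X}: {select_clock(a) / 1e6:.1f} MHz, "
              f"{cycle_time_ns(a):.1f} ns per cycle")
```

In hardware this would just be the decoder output driving the oscillator's select input. The point of the model is only that every access to a slow region stretches the cycle by switching clocks instead of inserting wait states.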
EDIT: So, reading further into your thread has led me to a few thoughts.
A: I may not have the 816 part down fine, because I'm qualifying the latch with !RDY and PHI2 as described earlier in the thread.
B: Using the 2 separate clocks seems the only way to fix this, but I'll still need the "global clock" output for timers on the 65C22s and such.
Using RDY on the 816 appears, after reading your thread, to be nearly impossible to do without weird edge cases. The nice thing about switching to that other chip is that it saves me a few logic chips, some board space, and some routing. I'm going to read over its datasheet again and see about implementing it.
Design -> check -> all good -> actually no -> rinse and repeat