I looked at the link you posted about clock stretching. I think that's the way to go: it's faster than running at a truly different frequency, since it gives more phi2 high time.
My only worry is that I'm planning on using whatever mechanism I come up with to control the whole system clock, so that I can test out new frequencies (without having to switch out cans) and, of course, move to lower frequencies to talk to slower devices (EEPROM and probably an RTC). To do this, I was going to get a single fast can (say, 40 MHz) and divide it down to a usable frequency for the computer, starting out at maybe 2-4 MHz and working my way up as I gained confidence the computer could handle it. I'm not sure how to apply your circuit to this scenario, because I believe that chaining them together will just give me a longer phi2 high time with shorter and shorter low pulses, which may not give the devices enough time to react to a low pulse.
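To put rough numbers on the divide-down idea, here is a quick sketch of the arithmetic. It assumes the divider produces a 50% duty cycle clock (which in practice means an even divisor), and the names `OSC_HZ` and `divided_clock` and the list of divisors are just made up for illustration. It says nothing about what the stretching circuit itself does to the high and low times.

```python
# Rough arithmetic for dividing a single fast oscillator can down to a CPU clock.
# Assumption: the divider output has a 50% duty cycle, so phi2 high == phi2 low.

OSC_HZ = 40_000_000  # the 40 MHz can mentioned above


def divided_clock(divisor, osc_hz=OSC_HZ):
    """Return (frequency in Hz, half-period in ns) for a divide-by-N clock."""
    freq = osc_hz / divisor
    half_ns = 1e9 / freq / 2  # full period in ns, split evenly into high and low
    return freq, half_ns


# Illustrative divisors only: 2 MHz and 4 MHz to start, then faster steps.
for divisor in (20, 10, 8, 4):
    freq, half_ns = divided_clock(divisor)
    print(f"/{divisor}: {freq / 1e6:.1f} MHz, phi2 high = low = {half_ns:.1f} ns")
```

The half-period figure is the one to compare against the slow devices' timing requirements when deciding how far the clock has to drop (or how long phi2 has to be held) for them.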
I've still been looking around, but honestly I think I'm going to have to do this in discrete components. I've not found a suitable setup yet, except maybe the one I mentioned earlier.