I'm surprised that a DIP could operate at all at that speed. That's good to hear. I take it that since the RAM seems to be the only thing the processor is connected to, there's no address-decoding logic to cause propagation delays. Still, the 15ns memory might be a bottleneck because of the processor's required read data setup time.
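To put the setup-time concern in numbers, here is a rough sketch of a read-timing budget. Every figure in it is a placeholder I made up for illustration (the address-valid delay and the setup time in particular), so substitute the values from your processor's and RAM's data sheets. The function names are hypothetical too.

```python
def read_margin_ns(period_ns, addr_delay_ns, access_ns, setup_ns):
    """Slack left in one read cycle, in ns (negative means the read fails).

    Simplified model (an assumption): the address becomes valid
    addr_delay_ns into the cycle, the RAM needs access_ns to respond,
    and the processor needs the data stable setup_ns before cycle end.
    """
    return period_ns - addr_delay_ns - access_ns - setup_ns


def max_clock_mhz(addr_delay_ns, access_ns, setup_ns):
    """Highest clock at which the margin above is still zero or better."""
    return 1000.0 / (addr_delay_ns + access_ns + setup_ns)


if __name__ == "__main__":
    # Placeholder numbers only: 15 ns RAM, 10 ns address delay, 10 ns setup.
    print(read_margin_ns(40.0, 10.0, 15.0, 10.0))
    print(max_clock_mhz(10.0, 15.0, 10.0))
```

If the real delays are referenced to a clock edge in mid-cycle rather than to the whole period, the budget shrinks accordingly, which is one reason the duty cycle matters.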
I've seen Peltier devices down to less than 1/4" square. It's not necessary to have a fan, but the hot side of it will do better with a heat sink on it. Another possibility would be to use a Peltier device large enough to cover the entire board, and use a compressible, thermally-conductive material between the board and the Peltier. That way the same device cools all the ICs, even if they're not all the same height off the board. Bergquist's "Gap Pad" seen at
www.bergquistcompany.com is such a material. We use it in a couple of our products to transfer heat from a linear regulator to the product's actual aluminum case for heat sinking. You can get info on Peltier device manufacturers at
http://www.peltier-info.com/manufacturers.html
I suppose there are several ways the duty cycle could be varied. One would be to use digital delay line ICs along with other logic, giving different delays to rising and falling edges of the clock source. Another might work something like PWM, starting with a triangle wave and varying the threshold to a comparator. Although the concept for the latter is simple, making it work well at such speeds might be a bit of a challenge. Having an accurate triangle wave shape would not be imperative though.
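As a back-of-the-envelope check on both ideas, the sketch below computes the duty cycle you would get from each. It assumes ideal parts: a delay line that shifts each edge by exactly the stated amount, and a triangle wave with straight ramps feeding a comparator with no delay or hysteresis. The function names are mine, not from any library.

```python
def duty_from_edge_delays(period_ns, rise_delay_ns, fall_delay_ns, input_duty=0.5):
    """Duty cycle after delaying rising and falling edges by different amounts.

    Delaying the falling edge more than the rising edge stretches the
    high time by the difference between the two delays.
    """
    high_ns = input_duty * period_ns + fall_delay_ns - rise_delay_ns
    if not 0.0 < high_ns < period_ns:
        raise ValueError("delays leave no valid high or low time")
    return high_ns / period_ns


def duty_from_triangle_threshold(threshold, v_min, v_max):
    """Fraction of each cycle an ideal triangle wave spends above threshold.

    With straight ramps, the time above the threshold falls linearly as
    the threshold rises from v_min to v_max.
    """
    if v_max <= v_min:
        raise ValueError("v_max must exceed v_min")
    fraction = (v_max - threshold) / (v_max - v_min)
    return min(1.0, max(0.0, fraction))  # clamp outside the wave's range


if __name__ == "__main__":
    # 20 MHz clock (50 ns period), falling edge delayed 5 ns more than rising.
    print(duty_from_edge_delays(50.0, 0.0, 5.0))
    # Triangle swinging 0 to 4 V with the comparator threshold at 1 V.
    print(duty_from_triangle_threshold(1.0, 0.0, 4.0))
```

A distorted triangle only makes the threshold-to-duty relationship nonlinear; the threshold can still be swept until the duty cycle is where you want it.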
A programmable delay line I kept an ad for is the Maxim DS1023S, which gives 256 steps, with step size as small as 1/4 ns.
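To get a feel for what 256 steps of 1/4 ns buys you, here is a small sketch that converts between a step code and the added delay. It models only the programmable part as code times step size; the part's fixed minimum delay and its actual programming interface are not modeled, so check the data sheet for those. The function names and the `base_delay_ns` parameter are hypothetical.

```python
def delay_for_code(code, step_ns=0.25, steps=256, base_delay_ns=0.0):
    """Delay in ns for a given step code (0 .. steps - 1).

    base_delay_ns stands in for the device's fixed delay, which this
    sketch leaves at zero because the real figure is not assumed here.
    """
    if not 0 <= code < steps:
        raise ValueError("code out of range")
    return base_delay_ns + code * step_ns


def code_for_delay(target_ns, step_ns=0.25, steps=256, base_delay_ns=0.0):
    """Nearest step code for a wanted delay, clamped to the valid range."""
    code = round((target_ns - base_delay_ns) / step_ns)
    return min(steps - 1, max(0, code))


if __name__ == "__main__":
    print(delay_for_code(255))   # programmable span at the 1/4 ns step size
    print(code_for_delay(5.0))   # code that gives about 5 ns of added delay
```

At the 1/4 ns step size the programmable span works out to a little under 64 ns, which is more than a full clock period at the speeds being discussed.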
For both the duty cycle and the clock advance or delay to the various parts of the circuit, experimentation will of course be necessary to find the optimum settings, since this is beyond the scope of manufacturer-supplied info.