First a note in regard to troubleshooting. Even a simple circuit adds complexity. For the sake of a smoother project launch, novices in particular are well advised to omit the RDY/stretching and simply use a slower CPU clock. Alternatively, it doesn't take much to plan ahead for the troubleshooting scenario. That just means designing the project so it includes some simple means -- jumpers, for example -- by which a slower CPU clock can be substituted and also the RDY/stretching circuit can be disabled.
As for the wiring that's involved, at the heart of our RDY or Clock-Stretch circuit there'll be a counter or pulse generator of some kind — one that implements a delay. Its output will either serve as the CPU clock, or it will act to pull down the RDY input of the CPU while the latter is supplied with a full-time, uninterrupted clock from elsewhere (a crystal oscillator, for example). Either method is sufficient to cause the CPU to momentarily delay its progress... and we don't need a lot of resources to get the job done. Measured in terms of jelly-bean logic, a highly workable circuit can be had for one IC or less!
- Immediately below is my minimal RDY scheme -- just a single JK flipflop, providing a single wait state. (A two-wait-state variation is discussed here.)
- A little further below you'll find my minimal Clock Stretching circuit -- just a single '163 counter IC.
- Later in this thread I present some Clock Stretching circuits specifically intended for overclockers -- folks who like to push the CPU beyond its rated maximum frequency.
Broadly speaking, even a full speed device needs three things, timing-wise. There needs to be at least a certain minimum period during which the device is fully enabled for the read or write... and by "fully enabled" I mean the condition in which all the necessary chip select(s) are active and also any necessary control inputs such as /WE are simultaneously active. We'll call that the enabled period. And, we need the address presented to the device to be valid and stable for at least a certain minimum period before the enabled period (this is the address setup time) and after the enabled period (this is the address hold time). The address hold time is often quite easy to satisfy, but right now it isn't my goal to closely examine details. (For details you can refer to the device data sheet. And you may even wish to examine the specs pertaining to when the data bus is valid.) Right now I just want to underscore the point that there's more to consider than simply causing the CPU to momentarily delay its progress.
65xx designs usually use PHI2 to define the "fully enabled" period that paces the actual data transfers. A RAM, for example, may be supplied with /WE and /OE signals that are qualified by PHI2. And if the design includes peripherals from the 65xx family, PHI2 will be supplied directly to them. But these arrangements aren't well suited for a slow peripheral trying to keep up with a fast CPU whose progress has been momentarily delayed.
In all three cases illustrated below, the CPU cycle has been extended to double its usual duration using my simple JK circuit, and three different glue logic schemes are used to pace the data transfer to/from the slow device. Bearing in mind the three timing requirements mentioned earlier, notice (highlighted in pink) the fully enabled period and, surrounding it (in green) the address setup time and the address hold time.
- (a) is the preferred scheme, especially if the slow device is writable (e.g., not a ROM). Compared with an unstretched cycle, all three of the timing requirements mentioned earlier have been extended. And BTW, in regard to implementation, you may find it advantageous to replace the inverter and the two NANDs with a 74_1G19, or with half of a 74_139.
- (b) is included only for reference. It shows what would happen if PHI2 were used to pace the transfer to/from the slow device. Although perhaps not guaranteed to fail, it's a scheme that raises some perplexing questions.
- (c) may be (marginally) tolerable for a ROM or other read-only device. Clearly, it uses the fewest resources. But it's unacceptable for writable devices because its lack of address setup time is likely to result in "rogue writes" to unintended addresses within the device. And, even with a read-only device, there's some risk that scheme (c) may inject noise into the power supply. This is due to bus contention with other devices if the slow device fails to go tri-state quickly enough after being read.
- Many schemes for Clock Stretching degrade the accuracy of timers in the system, such as the timers in a 6522 VIA. However, my stretcher circuits offer an optional auxiliary output that persists with the usual, non-stretched waveform, allowing this problem to be avoided. (With RDY there's no stretching and thus no timekeeping problem.)
- The RDY pin grants one or more entire CPU cycles, whereas Clock Stretching can allow finer control. Its increments can be equivalent to fractions of a CPU cycle, and sometimes that will allow you to come closer to delivering only as much delay as is required. Clock stretching also sidesteps certain CPU-specific issues, as follows.
- (for NMOS): 6502 and 6510 etc only "listen" to RDY during read cycles. If you wanna coax an NMOS 65xx to grant extra access time during a write, clock stretching is the only option.
- (for '816): Assuming you want more than 64K then you'll need a Bank Address latch, and that latch will require an enable signal. If you're using RDY then the logic to generate that signal becomes slightly more complex, because it's no longer acceptable to simply drive the latch enable with an inverted version of Phi2. I put the details in a footnote, below.
- (WDC '816 and 'C02): The RDY pin on WDC CPUs is bidirectional, and thus there's a risk of overcurrent resulting from contention. This occurs if external logic is trying to drive RDY high while at the same time the CPU is driving RDY low because a WAI instruction has been executed, either deliberately or as the result of a crash. Prospective mitigations are discussed in the text accompanying the circuit linked to above. But contention isn't a threat with clock stretching, because no external gate needs to drive RDY. (The pin will require a pullup resistor, though. Only with non-WDC processors can you safely tie RDY directly to VCC.)
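To put rough numbers on the granularity point above, here's a small Python sketch. It's arithmetic only, not a model of any particular circuit. The function names, the 100 ns cycle and the 120 ns requirement are all hypothetical, and the half-cycle step is an assumption that happens to match a stretcher clocked at twice the CPU frequency.

```python
import math

def rdy_delay_ns(extra_needed_ns, cycle_ns):
    """Delay granted via RDY: always a whole number of CPU cycles."""
    return math.ceil(extra_needed_ns / cycle_ns) * cycle_ns

def stretch_delay_ns(extra_needed_ns, cycle_ns, steps_per_cycle=2):
    """Delay granted by clock stretching, rounded up to the stretcher's step.

    steps_per_cycle=2 assumes half-cycle increments (hypothetical default).
    """
    step_ns = cycle_ns / steps_per_cycle
    return math.ceil(extra_needed_ns / step_ns) * step_ns

# Hypothetical example: 10 MHz CPU (100 ns cycle), device needs 120 ns extra.
print("RDY:    ", rdy_delay_ns(120, 100), "ns")
print("Stretch:", stretch_delay_ns(120, 100), "ns")
```

With these made-up figures, RDY has to grant two full cycles while half-cycle stretching gets away with one and a half.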
Shown below is a clock-stretch circuit built around the wondrous and ever entertaining 74xx163 4-bit synchronous counter!
Note: with this circuit, as with the JK circuits, it's imperative that /SELECT becomes valid somewhat before the rise of Phi2.
PHI2_S is the stretchable clock, and it's what generally serves as Phi2 for your system. PHI2_VIA would only be used to drive the Phi2 pin on your 6522 VIAs.
The red asterisks pertain to debugging strategy. To bypass the circuit, sever the connections where indicated; then temporarily connect PHI2_S and PHI2_VIA together and drive them from an oscillator which is slow enough that clock stretching isn't required. If you're doing a PCB you may want to provide jumpers, or exposed traces you can easily cut, as a means to sever the connections. Or, if the '163 is in a socket, you can simply remove it!
2XCLK runs at twice the nominal CPU frequency, and its duty cycle isn't especially critical. Every rising edge of 2XCLK causes the '163 to either count or load, with the loads occurring at the end of every state 0.
If /SELECT is inactive (i.e., high), the value loaded will be $F, and if /SELECT is low the value loaded will be determined by how you've connected the counter's P3-P0 inputs:
- Loading $D, $B or $9 will result in extra time equivalent to 1, 2 or 3 normal CPU cycles, respectively. PHI2_VIA will continue, unaffected.
- Loading $E, $C, $A or $8 will result in extra time equivalent to 0.5, 1.5, 2.5 or 3.5 normal CPU cycles, respectively.
These fractional-cycle delays *do* minimally impact VIA timing. In all cases the VIA is slowed by 0.5 cycle.
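The reload arithmetic above can be checked with a small behavioral model. This is only a sketch in Python, not a gate-level simulation. It assumes the counter is reloaded at the end of state 0 and otherwise counts up on every 2XCLK rising edge, and that PHI2_S is taken from Q3. The function names are made up for illustration.

```python
def states_for_cycle(reload_value):
    """Counter states for one CPU cycle: from the loaded value up through state 0.

    One list entry per 2XCLK period. Only $8..$F are modeled.
    """
    if not 0x8 <= reload_value <= 0xF:
        raise ValueError("reload value must be in $8..$F")
    state = reload_value
    states = [state]
    while state != 0:
        state = (state + 1) & 0xF  # 4-bit counter wraps from $F to $0
        states.append(state)
    return states

def extra_cpu_cycles(reload_value):
    """Extra time, in normal CPU cycles, compared with the unstretched $F reload."""
    periods = len(states_for_cycle(reload_value))
    return (periods - 2) / 2  # an unstretched cycle is 2 periods ($F, $0)

def phi2_s_levels(reload_value):
    """Q3 level in each 2XCLK period (assumed here to be PHI2_S)."""
    return [(s >> 3) & 1 for s in states_for_cycle(reload_value)]

for value in range(0xF, 0x7, -1):
    print(f"${value:X}: +{extra_cpu_cycles(value)} cycles, Q3 = {phi2_s_levels(value)}")
```

In this model the extra time works out to ($F minus the reload value) divided by two, which is why the odd reload values give whole cycles and the even ones land on the half cycles.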
It's possible to generate an enable signal that's padded exactly as you wish, but to do that you'll probably need an extra gate or two. Otherwise, you can examine the existing '163 waveforms to see if one of them is close enough.
In the example below, Q3 still cues the counter reload as before, but the stretchable clock PHI2_S is now instead taken from Q2 of the '163. And, the reload value ($B) is such that stretching causes Q2 to reload with a 0 not a 1, thus granting one extra 2XCLK period of Clock-low time. But there's still no extra address hold time (as compared with an unstretched cycle).
Different counter connections offer different waveform possibilities. If the reload were $A or $9 then the extra clock-low time would increase even further. Other variations involve taking PHI2_S from Q1 of the '163 and choosing a reload value of $C or $D... Or, taking PHI2_S from Q0 and choosing a reload value of $E. But in some ways a shift register is more tractable for concocting different enable waveform possibilities, and it's a shift register (not a counter) that's at the heart of my circuits downthread.
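To compare the tap/reload combinations mentioned above, a short Python sketch can list the level of any one '163 output across a stretched cycle. It's a behavioral approximation under a stated assumption: the cycle is taken to start at state 0 (where the reload is cued) and then run from the reload value up to $F. The function names are hypothetical.

```python
def tap_waveform(reload_value, tap_bit):
    """Level of output Q<tap_bit> in each 2XCLK period of one cycle.

    Assumed state sequence: state 0, then reload_value .. $F.
    """
    if not 0x8 <= reload_value <= 0xF or tap_bit not in (0, 1, 2, 3):
        raise ValueError("expected reload $8..$F and tap bit 0..3")
    states = [0] + list(range(reload_value, 0x10))
    return [(s >> tap_bit) & 1 for s in states]

def low_high(reload_value, tap_bit):
    """Total (low, high) time in 2XCLK periods; the runs need not be contiguous."""
    wave = tap_waveform(reload_value, tap_bit)
    return wave.count(0), wave.count(1)

# Combinations named in the text: (tap bit, reload value)
for tap, reload in [(2, 0xF), (2, 0xB), (2, 0xA), (2, 0x9), (1, 0xC), (1, 0xD), (0, 0xE)]:
    print(f"Q{tap}, reload ${reload:X}: {tap_waveform(reload, tap)}")
```

For Q2 with a $B reload the model gives two low periods followed by four high ones, versus one low and one high when unstretched, which is the one extra 2XCLK period of clock-low time described above.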
-- Jeff
footnote: In 2015, forum member cr1901 resolved a distressing ambiguity in the '816 data sheet by testing an actual '816 and reporting, "The 65816 does NOT drive the bank address while RDY is low." In other words, the alternating pattern of Bank-Address, Data that normally occupies the '816's data bus during the Phi2-low, Phi2-high periods of every cycle is temporarily suspended. Because no Bank Address is present, it's necessary to also suspend the enable signal sent to the Bank Address latch; otherwise it would latch bits that are intended as data.
Much later, BDD unearthed a 1994 WDC datasheet that suggests appropriate logic to deal with the matter (first image, below). Compared with a more modern suggestion that doesn't involve RDY (second image), the 1994 circuit drives LE with a NOR rather than an inverter. Also there are two inverters which arguably improve matters by counterbalancing the delay of the NOR.