Clock Stretching and creating wait-states via the CPU's RDY pin are the two available alternatives when one's goal is to grant extra access time for a slow I/O device or memory. Less commonly, they are used to momentarily freeze program execution and thus synchronize with some external event. In this thread I'd like to talk about the pros and cons that one might consider when choosing between RDY and Clock Stretching. Also I created a few very simple circuits that seem worth sharing. (Notably, my clock-stretching circuits include options that allow VIA timers to continue proper time-keeping.)
First a note in regard to troubleshooting. Even a simple circuit adds complexity. For the sake of a smoother project launch, novices in particular are well advised to omit the RDY/stretching and simply use a slower CPU clock. Alternatively, it doesn't take much to plan ahead for the troubleshooting scenario. That just means designing the project so it includes some simple means -- jumpers, for example -- by which a slower CPU clock can be substituted and also the RDY/stretching circuit can be disabled.
As for the wiring that's involved, at the heart of our RDY or Clock-Stretch circuit there'll be a counter or pulse generator of some kind -- one that implements a delay. Its output will either serve as the CPU clock, or it will act to pull down the RDY input of the CPU while the latter is supplied with a full-time, uninterrupted clock from elsewhere (a crystal oscillator, for example). Either method is sufficient to cause the CPU to momentarily delay its progress... and we don't need a lot of resources to get the job done. Measured in terms of jelly-bean logic, a highly workable circuit can be had for one IC or less!
- Immediately below is my minimal RDY scheme -- just a single JK flipflop, providing a single wait state. (A two-wait-state variation is discussed here.)
- A little further below you'll find my minimal Clock Stretching circuit -- just a single '163 counter IC.
- Later in this thread I present some Clock Stretching circuits specifically intended for overclockers -- folks who like to push the CPU beyond its rated maximum frequency.
Attachment: simple wait-state generator.gif
File comment: note the "from /CS" input needs to become valid before the rise of Phi2
A moment ago I mentioned "causing the CPU to momentarily delay its progress," and that may be sufficient for a simple case involving an (EP)ROM. But it's important to realize that we may need the circuit to do more than just delay the CPU. We might prefer or actually require that it also generates a signal that can be used to pace the slow-motion data transfer -- the read or write -- to/from the slow device. I'll briefly explain before moving on.
Broadly speaking, even a full speed device needs three things, timing-wise. There needs to be at least a certain minimum period during which the device is fully enabled for the read or write... and by "fully enabled" I mean the condition in which all the necessary chip select(s) are active and also any necessary control inputs such as /WE are simultaneously active. We'll call that the enabled period. And, we need the address presented to the device to be valid and stable for at least a certain minimum period before the enabled period (this is the address setup time) and after the enabled period (this is the address hold time). The address hold time is often quite easy to satisfy, but right now it isn't my goal to closely examine details. (For details you can refer to the device data sheet. And you may even wish to examine the specs pertaining to when the data bus is valid.) Right now I just want to underscore the point that there's more to consider than simply causing the CPU to momentarily delay its progress.
65xx designs usually use PHI2 to define the "fully enabled time" that paces the actual data transfers. A RAM for example may be supplied with /WE and /OE signals that are qualified by PHI2. And if the design includes peripherals from the 65xx family, PHI2 will be supplied directly to them. But these arrangements aren't well suited for a slow peripheral trying to keep up with a fast CPU whose progress has been momentarily delayed.
In all three cases illustrated below, the CPU cycle has been extended to double its usual duration using my simple JK circuit, and three different glue logic schemes are used to pace the data transfer to/from the slow device. Bearing in mind the three timing requirements mentioned earlier, notice (highlighted in pink) the fully enabled period and, surrounding it (in green) the address setup time and the address hold time.
- (a) is the preferred scheme, especially if the slow device is writeable (e.g., not a ROM). Compared with an unstretched cycle, all three of the timing requirements mentioned earlier have been extended. And BTW, in regard to implementation, you may find it advantageous to replace the inverter and the two NANDs with a 74_1G19, or with half of a 74_139.
- (b) is included only for reference. It shows what would happen if PHI2 were used to pace the transfer to/from the slow device. Although perhaps not guaranteed to fail, it's a scheme that raises some perplexing questions.
- (c) may be (marginally) tolerable for a ROM or other read-only device. Clearly, it uses the fewest resources. But it's unacceptable for writable devices because its lack of address setup time is likely to result in "rogue writes" to unintended addresses within the device. And, even with a read-only device, there's some risk that scheme (c) may inject noise into the power supply. This is due to bus contention with other devices if the slow device fails to go tri-state quickly enough after being read.
Attachment: glue for controlling enable timing.png
Regarding the choice between RDY and Clock Stretching, here's what I came up with as a list of points to consider:
- Many schemes for Clock Stretching degrade the accuracy of timers in the system, such as the timers in a 6522 VIA. However, my stretcher circuits offer an optional auxiliary output that persists with the usual, non-stretched waveform, allowing this problem to be avoided. (With RDY there's no stretching and thus no timekeeping problem.)
- The RDY pin grants one or more entire CPU cycles, whereas Clock Stretching can allow finer control. Its increments can be equivalent to fractions of a CPU cycle, and sometimes that will allow you to come closer to delivering only as much delay as is required. Clock stretching also sidesteps certain CPU-specific issues, as follows.
- (for NMOS): 6502 and 6510 etc only "listen" to RDY during read cycles. If you wanna coax an NMOS 65xx to grant extra access time during a write, clock stretching is the only option.
- (for '816): Assuming you want more than 64K then you'll need a Bank Address latch, and that latch will require an enable signal. If you're using RDY then the logic to generate that signal becomes slightly more complex, because it's no longer acceptable to simply drive the latch enable with an inverted version of Phi2. I put the details in a footnote, below.
- (WDC '816 and 'C02): The RDY pin on WDC CPUs is bidirectional, and thus there's a risk of overcurrent resulting from contention. This occurs if external logic is trying to drive RDY high while at the same time the CPU is driving RDY low because a WAI instruction has been executed, either deliberately or as the result of a crash. Prospective mitigations are discussed in the text accompanying the circuit linked to above. But contention isn't a threat with clock stretching, because no external gate needs to drive RDY. (The pin will require a pullup resistor, though. Only with non-WDC processors can you safely tie RDY directly to VCC.)
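To put some numbers on the "finer control" point, here's a minimal Python sketch. The figures are made up for illustration (a 100 ns CPU cycle and a device that needs 120 ns of extra access time), and the stretcher is assumed to work in half-cycle increments, as it would if it were clocked at twice the CPU frequency. `granted_delay_ns` is a hypothetical helper name.

```python
def granted_delay_ns(extra_needed_ns, increment_ns):
    """Smallest multiple of increment_ns that covers extra_needed_ns."""
    steps = -(-extra_needed_ns // increment_ns)  # integer ceiling division
    return steps * increment_ns


CYCLE_NS = 100        # hypothetical CPU cycle time (10 MHz)
EXTRA_NEEDED_NS = 120  # hypothetical extra access time the device needs

# RDY can only grant whole CPU cycles.
rdy_ns = granted_delay_ns(EXTRA_NEEDED_NS, CYCLE_NS)

# Assumed: the clock stretcher grants time in half-cycle steps.
stretch_ns = granted_delay_ns(EXTRA_NEEDED_NS, CYCLE_NS // 2)

if __name__ == "__main__":
    print(f"RDY wait-states grant {rdy_ns} ns")       # 200 ns (2 cycles)
    print(f"Clock stretching grants {stretch_ns} ns")  # 150 ns (3 half-cycles)
```

With these assumed numbers the stretcher wastes 30 ns per access where RDY wastes 80 ns. Whether the difference matters depends on how often the slow device is accessed.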
Shown below is a clock-stretch circuit built around the wondrous and ever entertaining 74xx163 4-bit synchronous counter! This chip even lets us include a non-stretched output, courteously allowing your VIA timers to remain punctual!
Note: with this circuit, as with the JK circuits, it's imperative that /SELECT becomes valid somewhat before the rise of Phi2. This might turn out to be the limiting factor for maximum clock speed, given the prop delays from the address decoder that typically drives /SELECT. A faster decoder will help, but a more powerful solution is to choose one of my overclockers' solutions later in the thread. These -- like RDY -- don't require /SELECT to become valid so early; it's sufficient that /SELECT is valid somewhat before the fall of Phi2.
Attachment: '163-based cycle stretcher.png
Some notes re: the '163 circuit:
PHI2_S is the stretchable clock, and it's what generally serves as Phi2 for your system. PHI2_VIA would only be used to drive the Phi2 pin on your 6522 VIAs.
The red asterisks pertain to debugging strategy. To bypass the circuit, sever the connections where indicated; then temporarily connect PHI2_S and PHI2_VIA together and drive them from an oscillator which is slow enough that clock stretching isn't required. If you're doing a PCB you may want to provide jumpers, or exposed traces you can easily cut, as a means to sever the connections. Or, if the '163 is in a socket, you can simply remove it!
2XCLK runs at twice the nominal CPU frequency, and its duty cycle isn't especially critical. Every rising edge of 2XCLK causes the '163 to either count or load, with the loads occurring at the end of every state 0.
If /SELECT is inactive (i.e., high), the value loaded will be $F, and if /SELECT is low the value loaded will be determined by how you've connected the counter's P3-P0 inputs:
- Loading $D, $B or $9 will result in extra time equivalent to 1, 2 or 3 normal CPU cycles, respectively. PHI2_VIA will continue, unaffected.
- Loading $E, $C, $A or $8 will result in extra time equivalent to 0.5, 1.5, 2.5, or 3.5 normal CPU cycles, respectively. These fractional-cycle delays *do* minimally impact VIA timing. In all cases the VIA is slowed by 0.5 cycle.
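The load values above can be sanity-checked with a small behavioral model. This is only a sketch in Python, not a description of the schematic: it assumes the counter loads whenever Q3 is low (which matches "loads occurring at the end of every state 0"), that PHI2_S is taken from Q3, and that PHI2_VIA is taken from Q0. The function names are hypothetical.

```python
def stretched_cycle(load_value):
    """Counter states visited during one access, one entry per 2XCLK period.

    Starts in state $0 (Q3 low, so the next 2XCLK edge loads) and stops
    when the counter wraps back to $0.
    """
    if not 0x8 <= load_value <= 0xF:
        raise ValueError("load value must have Q3 high ($8..$F)")
    states = [0x0]
    state = load_value                 # load at the end of state $0
    while state != 0x0:
        states.append(state)
        state = (state + 1) & 0xF      # Q3 is high, so the '163 counts
    return states


def extra_cpu_cycles(load_value):
    """Extra time versus an unstretched cycle ($0 then $F), in CPU cycles."""
    return (len(stretched_cycle(load_value)) - 2) / 2


def via_clock_undisturbed(load_value):
    """True if Q0 (assumed PHI2_VIA) keeps toggling on every 2XCLK period."""
    q0 = [s & 1 for s in stretched_cycle(load_value)]
    return all(a != b for a, b in zip(q0, q0[1:]))


if __name__ == "__main__":
    for value in range(0xF, 0x7, -1):
        print(f"${value:X}: +{extra_cpu_cycles(value)} cycles, "
              f"VIA clock undisturbed: {via_clock_undisturbed(value)}")
```

Under these assumptions the model agrees with the list: the odd values $D, $B and $9 give 1, 2 and 3 extra cycles and leave Q0 toggling, while the even values give the half-cycle amounts and hold Q0 low for one extra 2XCLK period.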
If you have multiple slow devices, some slower than others, you might wanna customize each device's delay value. For certain combinations of values this won't even require any additional components. To allow any combination of values you could hook up a diode matrix or something similar.
Attachment: '163 cycle stretcher variation.png
As shown in the '163 circuit above, the PHI2_S signal has extended duration of its high time (thus delaying the CPU). But PHI2_S is not ideal for qualifying /RD or /WR to a slow device. Although the fully-enabled period would be extended, there'd be no extra address setup time or address hold time (just the same amount of each that'd occur with an unstretched cycle).
It's possible to generate an enable signal that's padded exactly as you wish, but to do that you'll probably need an extra gate or two. Otherwise, you can examine the existing '163 waveforms to see if one of them is close enough.
In the example below, Q3 still cues the counter reload as before, but the stretchable clock PHI2_S is now instead taken from Q2 of the '163. And, the reload value ($B) is such that stretching causes Q2 to reload with a 0 not a 1, thus granting one extra 2XCLK period of Clock-low time. But there's still no extra address hold time (as compared with an unstretched cycle).
Different counter connections offer different waveform possibilities. If the reload were $A or $9 then the extra clock-low time increases even further. Other variations involve taking PHI2_S from Q1 of the '163 and choosing a reload value of $C or $D... Or, taking PHI2_S from Q0 and choosing a reload value of $E. But in some ways a shift register is more tractable for concocting different enable waveform possibilities, and it's a shift register (not a counter) that's at the heart of my circuits downthread.
Attachment: cycle stretcher extra detail.png
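For anyone who wants to explore these tap-and-reload combinations without drawing out each waveform by hand, here's a minimal Python sketch. It assumes, as in the text, that Q3 cues the reload (the counter loads whenever Q3 is low), and it simply reports how many 2XCLK periods a chosen Q output spends low and high during one access. The function names are hypothetical.

```python
def stretched_cycle(load_value):
    """Counter states for one access, one per 2XCLK period, starting at $0."""
    if not 0x8 <= load_value <= 0xF:
        raise ValueError("load value must have Q3 high ($8..$F)")
    states = [0x0]
    state = load_value                 # loaded at the end of state $0
    while state != 0x0:
        states.append(state)
        state = (state + 1) & 0xF      # Q3 high: count up until wrap to $0
    return states


def tap_wave(load_value, bit):
    """Logic level of output Q<bit> during each 2XCLK period of the access."""
    return [(s >> bit) & 1 for s in stretched_cycle(load_value)]


def low_high_periods(load_value, bit):
    """(low, high) durations of Q<bit>, counted in 2XCLK periods."""
    wave = tap_wave(load_value, bit)
    return wave.count(0), wave.count(1)


if __name__ == "__main__":
    # Tap/reload pairs mentioned in the text, plus the unstretched case ($F).
    for bit, value in [(2, 0xF), (2, 0xB), (2, 0xA), (2, 0x9),
                       (1, 0xD), (1, 0xC), (0, 0xE)]:
        low, high = low_high_periods(value, bit)
        print(f"Q{bit}, reload ${value:X}: low {low}, high {high}")
```

With Q2 and a reload of $B the model gives two periods low and four high, versus one and one for an unstretched cycle, which is the "one extra 2XCLK period of Clock-low time" described above. Other tap/reload pairs not listed in the text may produce more than one pulse per access, so check the wave itself with `tap_wave` before relying on the counts.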
-- Jeff
footnote: In 2015, forum member cr1901 resolved a distressing ambiguity in the '816 data sheet by testing an actual '816 and reporting, "The 65816 does NOT drive the bank address while RDY is low." In other words, the alternating pattern of Bank-Address, Data that normally occupies the '816's data bus during the Phi2-low, Phi2-high periods of every cycle is temporarily suspended. Because no Bank Address is present, it's necessary to also suspend the enable signal sent to the Bank Address latch; otherwise it would latch bits that are intended as data.
Much later, BDD unearthed a 1994 WDC datasheet that suggests appropriate logic to deal with the matter (first image, below). Compared with a more modern suggestion that doesn't involve RDY (second image), the 1994 circuit drives LE with a NOR rather than an inverter. Also there are two inverters which arguably improve matters by counterbalancing the delay of the NOR.
Attachment: 1994 suggestion for bank-address latch.png
File comment: Using RDY.
Attachment: Bank Address Latching Circuit.png
File comment: Not using RDY. BTW, the odd diamond symbol seemingly just indicates the bus is bidirectional.