Generating Wait-States with Clock Trickery
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Generating Wait-States with Clock Trickery
whartung wrote:
Well, the W65C265S does this, but also, potentially, a bunch of other stuff.
A real timing quagmire. If I were going the microcontroller route it would not be with a 65C265.
Last edited by BigDumbDinosaur on Sun Oct 24, 2021 7:37 pm, edited 1 time in total.
x86? We ain't got no x86. We don't NEED no stinking x86!
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Generating Wait-States with Clock Trickery: From Theory to P
Well, here it is nearly two years since anything was posted to this topic. With COVID-19 keeping most of us at home and out of work (or working less), there are opportunities to monkey with our computing gadgets. So it is with moi.
To date, I hadn't really had an opportunity to test the theory I presented earlier in this topic. However, with all this downtime on my hands, plus the next iteration of the POC series being on the drawing board, I decided it was time to build a gadget to prove or disprove my ideas. Here's the schematic for it:
This is a simple circuit whose sole purpose in life is to demonstrate a practical way to stretch the clock for slow hardware. To assist in connecting it to the logic analyzer, J2 is an 8-pin header for picking off signals that are of interest.
This contraption generates three clock signal outputs, one of which is the "global" clock (GCLK), which runs at half the speed of the oscillator. Two out-of-phase clocks, Ø1 and Ø2, are derived from one section of the 74AC74 flip-flop (U3). Ø1 and Ø2 would be needed in a 65C816 system to drive the MPU, as well as gate a bank latch and data bus transceiver. Ø2 is in phase with GCLK and Ø1 is 180° out of phase with Ø2.
That little gadget at the left end of the schematic is a push button. When pressed it will momentarily pull the AC109's K input low, which will trigger the wait-state by pulling the PRE input of U3b low for one entire GCLK cycle. While PRE is low Ø2 will be continuously high and Ø1 will be continuously low, causing the high phase of that clock period to double in length. At the end of the GCLK cycle, PRE will be driven high and U3b will return to normal operation.
Here is an annotated logic analyzer trace of this circuit in action:
Testing was done with a 2 MHz oscillator, which results in the three output clocks running at 1 MHz. The annotations should be self-explanatory.
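As a rough behavioural sketch of the description above (not a model of the actual schematic, which isn't reproduced here), the following Python generates the three outputs one oscillator period at a time. It assumes each wait-state holds Ø2 high for one extra GCLK cycle, i.e. two oscillator periods, so that Ø2 falls back into step with GCLK afterwards. The function names and the `extra_high` parameter are made up for illustration.

```python
def simulate_clocks(n_cycles, wait_cycles=(), extra_high=2):
    """Return (gclk, phi2, phi1) samples, one per oscillator period.

    n_cycles    -- number of Ø2 (CPU) cycles to generate
    wait_cycles -- indices of the Ø2 cycles that get one wait-state
    extra_high  -- oscillator periods added to Ø2's high phase per
                   wait-state (assumption: 2, i.e. one GCLK cycle)
    """
    samples = []
    t = 0  # oscillator periods elapsed so far
    for cycle in range(n_cycles):
        high_len = 1 + (extra_high if cycle in wait_cycles else 0)
        for phi2 in [0] + [1] * high_len:
            gclk = t % 2      # GCLK free-runs at half the oscillator rate
            phi1 = 1 - phi2   # Ø1 is the complement of Ø2
            samples.append((gclk, phi2, phi1))
            t += 1
    return samples


def as_waveform(samples, index):
    """Render one signal as a crude text waveform ('_' low, '#' high)."""
    return "".join("#" if s[index] else "_" for s in samples)


if __name__ == "__main__":
    trace = simulate_clocks(4, wait_cycles={1})
    for name, idx in (("GCLK", 0), ("PHI2", 1), ("PHI1", 2)):
        print(f"{name:5}{as_waveform(trace, idx)}")
```

With the 2 MHz test oscillator, each character in the printed waveform stands for 500 ns. The logic analyzer trace remains the authority on the real edge timing.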
In a practical application, the K input to U2a (/WSE on the annotated trace) would be connected into the glue logic in such a way that it would be pulled low when a wait-state is needed. Due to the manner in which this circuit operates, K must be pulled low before the rise of Ø2. In simple terms, chip selects of devices that are to be wait-stated must not be qualified by Ø2, else this circuit will not work.
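To illustrate the point about chip selects, here is a minimal sketch of a decoder that derives /WSE purely from the address, with Ø2 playing no part. The address range and the names are hypothetical, chosen only for the example.

```python
# Hypothetical memory map: one slow I/O device occupying a single page.
SLOW_IO_BASE = 0xD000
SLOW_IO_SIZE = 0x0100


def slow_device_selected(addr):
    """Address-only decode; deliberately NOT qualified by Ø2."""
    return SLOW_IO_BASE <= addr < SLOW_IO_BASE + SLOW_IO_SIZE


def wse_n(addr):
    """Active-low wait-state request: 0 asks for a wait-state."""
    return 0 if slow_device_selected(addr) else 1
```

Because the select depends only on the address, it can settle while Ø2 is still low, which is what lets the request reach K before Ø2 rises.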
This circuit can be used with the NMOS 6502, as well as the 65C02 and 65C816, although it is targeted to the requirements of an '816 system with extended RAM (Jeff's clock stretcher using a synchronous counter is a simpler choice for a 65(c)02 system or a 65C816 without extended RAM—it's a one-chip solution that I used in POC V1.2). Note that only the high phase of Ø2 is stretched. This should suffice for most applications.
In a system with one or more 65C22s, GCLK should be used to drive the 65C22s' Ø2 inputs. Driving them from Ø2 will result in the timers gradually slowing down with each wait-state. Also, the synchronous shift registers, if used, will "malfunction" if Ø2 is stretched. If you are using WDC's 65C21, 65C22 or 65C51 there is no need to wait-state any of them, as they are rated for 14 MHz operation.
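A back-of-the-envelope sketch of why the timers drift: assuming each wait-state lengthens one Ø2 cycle by one GCLK period (my reading of the description above), a timer clocked from Ø2 needs more real time to count the same number of cycles. The function name and the figures are illustrative only.

```python
def timer_elapsed_us(count, gclk_hz, wait_states=0):
    """Real time, in microseconds, for a timer to count `count` clocks.

    wait_states -- wait-states occurring while the timer runs; pass 0
                   to model a timer clocked from the unstretched GCLK.
    Assumes each wait-state adds exactly one GCLK period.
    """
    gclk_period_us = 1_000_000 / gclk_hz
    return (count + wait_states) * gclk_period_us
```

For example, at 1 MHz a 10,000-count timer that sees 250 wait-states would take 10,250 µs under this model instead of 10,000 µs, whereas the same timer clocked from GCLK is unaffected.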
Last edited by BigDumbDinosaur on Sun Oct 24, 2021 7:45 pm, edited 1 time in total.
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Generating Wait-States with Clock Trickery
This circuit has been put into practical application in POC V1.3.
- fredericsegard
- Posts: 47
- Joined: 05 Aug 2020
- Location: Montreal, QC, Canada
- Contact:
Re: Generating Wait-States with Clock Trickery
Hi BDD,
I implemented your circuit to try it out, except for a minor change, as you can see in the attached diagram. I replaced the 74AC109 with a 74AC112 (because that's what I had on hand), which is functionally the same, except the K input is inverted. There's also a 74HC193 4-bit counter that serves as a clock speed selector, which I have bypassed during debugging (wiring pin 5 of the 20MHz clock directly to pins 3 and 11 of the 74AC74).
The problem I have is this. When I hardwire WSEN low, I get the expected result of 10MHz, with a 50ns LOW and 50ns HIGH period. When I set WSEN high, I get 5MHz, but with a duty cycle of 175ns HIGH, and 25ns LOW. According to your timing diagram, this shouldn't be, it should be 50%/50%. I double-checked my wiring, but all seems according to the schematic.
Any thoughts?
BigDumbDinosaur wrote:
This circuit has been put into practical application in POC V1.3.
Re: Generating Wait-States with Clock Trickery
fredericsegard wrote:
I replaced the 74AC109 with a 74AC112 (because that's what I had on hand), which is functionally the same, except the K input is inverted.
That's not the only difference. The clock on a '109 triggers on the rising edge, whereas that on a '112 is falling-edge triggered.
-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Generating Wait-States with Clock Trickery
fredericsegard wrote:
I implemented your circuit to try it out. Except for a minor change, as you can see in the attached diagram. I replaced the 74AC109 with a 74AC112 (because that's what I had on hand), which is functionally the same, except the K input is inverted.
Dr Jefyll wrote:
That's not the only difference. The clock on a '109 triggers on the rising edge, whereas that on a '112 is falling-edge triggered.
As Jeff noted, the AC112 is clocked on the falling edge. That characteristic would explain what you are observing.
Both Digi-Key and Mouser stock the 74AC109 in single-piece quantities, in DIP and SOIC.
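To put a number on the edge difference, here is a small sketch assuming an ideal 50% duty-cycle clock whose first rising edge is at t = 0. It lists when a rising-edge part and a falling-edge part would each act on that clock. The helper name is made up.

```python
def edge_times_ns(freq_hz, n_edges, falling=False):
    """Times (ns) of the first n active edges of an ideal square wave.

    falling=False -- rising-edge part (like the '109)
    falling=True  -- falling-edge part (like the '112)
    """
    period_ns = 1e9 / freq_hz
    offset_ns = period_ns / 2 if falling else 0.0
    return [offset_ns + i * period_ns for i in range(n_edges)]
```

With the 20 MHz oscillator mentioned above, the two kinds of edge are 25 ns apart, which is the same size as the 25 ns low time reported. Whether that is the whole story depends on which signal clocks the flip-flop in the actual circuit.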
Re: Generating Wait-States with Clock Trickery
So what I gathered from this thread is:
Pulling RDY low is functionally identical to holding the CPU's PHI2 pin high, so you can say that it stretches the high phase of the CPU's internal clock until the next falling edge when RDY is high again.
The problem with that is that it all happens inside the CPU, so any decoding logic that also uses PHI2 (RD/WR signals for memory, I/O, and the bank address latch) will continue to run as normal, since the clock is unaffected by RDY.
I ran into this same issue when doing the wait-state generation for my 65C02 SBC, since I also wanted to run at 20MHz.
My solution is very similar to your own, BDD, but instead of using two D flip-flops I use only one for the wait-state generation, and then OR the negated RDY output with PHI2 to create the creatively named PHI2_D, which automatically gets its high phase stretched whenever the RDY output is low. The RDY output goes to the CPU's RDY pin and PHI2_D is used exclusively for the decoding logic. As mentioned before, though, RDY is just stretching the internal CPU clock, so you could throw that output away, pull the RDY pin high with a resistor, and use PHI2_D for both the decoding logic and the CPU.
The timing diagram should look something like this: But I'm using this circuit in a CPLD, so I don't know how different the timings would be with discrete logic ICs. For example, the propagation delay of the OR gate could be an issue at 30MHz and up.
Also, please excuse my hand-drawn diagrams. I don't know what fancy programs you guys use to make those circuits and timing diagrams, so I just use GIMP.
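The PHI2_D idea can be sketched in a few lines of Python. This models only the logic equation (PHI2 OR NOT RDY), with one sample per half cycle and no propagation delays; the sample RDY timing is illustrative rather than taken from the diagram.

```python
def phi2_d(phi2, rdy):
    """PHI2 OR (NOT RDY): stays high for as long as RDY is low."""
    return phi2 | (rdy ^ 1)


# One sample per half cycle of PHI2. RDY is low from the high phase of
# the second cycle through the following low phase (illustrative timing).
PHI2 = [0, 1, 0, 1, 0, 1, 0, 1]
RDY = [1, 1, 1, 0, 0, 1, 1, 1]
PHI2_D = [phi2_d(p, r) for p, r in zip(PHI2, RDY)]
```

The low half cycle that falls while RDY is low gets bridged, so the decoding logic sees a single long high phase in place of two short ones.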
- akohlbecker
- Posts: 282
- Joined: 24 Jul 2021
- Contact:
Re: Generating Wait-States with Clock Trickery
Proxy, I go into the RDY pin in great detail in my video series, starting with Episode 5 and Episode 6 where I talk about what it is in the context of the 65C816 bus demultiplexing, and continuing in Episode 18 and Episode 19 where I use it to generate Read and Write pulses. I specifically pay close attention to the timing delays it generates. You might find this interesting!
I have yet to use it for wait states, but at this point, given that my glue logic has been designed with this pin in mind, it will be pretty easy to do. Mostly connect the slow I/O chip select to RDY_IN with a flip flop or a counter. I just have to make sure that the address decoding + the wait state counting don't delay the signal past the rising edge of the clock.
Here is how I handle the RDY pin, turning it from a bi-directional pin into two unidirectional pins, RDY_IN and /WAI. I'm also making the input go through a flip-flop, so the logic can stabilize before the CPU reads RDY and stay consistent throughout the cycle.
Here is how I generate my clock phases, taking into account the registered RDY input. I've had to generate a few phases to satisfy the hold times on reads and on writes. All phases are OR-ed with RDY similarly to what you're doing.
And finally, here is how those phases are used for read-write pulses.
This is in discrete logic and so relatively slow. I could scale it up to probably 6MHz in spec, more with overclocking. At the moment my address decoding is the limiting factor. But if you're on a CPLD I suspect it can scale quite a bit more.
All in all, I think this is a good base to have in the glue logic to be ready (!) for wait states!
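As a sketch of the "register RDY" idea, the following models a D flip-flop that samples an asynchronous RDY_IN only on a clock edge, so the value passed on to the CPU cannot change mid-cycle. Sampling on the rising edge is an assumption made for this example, and the class and function names are hypothetical; the actual schematic may differ.

```python
class DFlipFlop:
    """Edge-triggered D flip-flop (rising edge assumed for this sketch)."""

    def __init__(self, q=1):
        self.q = q
        self._last_clk = 0

    def step(self, clk, d):
        """Advance one sample; latch d only on a 0 -> 1 clock transition."""
        if clk and not self._last_clk:
            self.q = d
        self._last_clk = clk
        return self.q


def register_rdy(clk_samples, rdy_in_samples):
    """Return the registered RDY seen by the CPU for each sample."""
    ff = DFlipFlop(q=1)
    return [ff.step(c, d) for c, d in zip(clk_samples, rdy_in_samples)]
```

A change on RDY_IN that arrives in the middle of a cycle only shows up at the output on the next sampling edge, which is the "stay consistent throughout the cycle" property described above.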
Re: Generating Wait-States with Clock Trickery
akohlbecker wrote:
I'm also making the input go through a flip flop, so the logic can stabilize before the CPU reads RDY and stay consistent throughout the cycle
My own approach would be for the decoder output to directly (ie, w/o any flipflop) cause the CPU's RDY pin to be pulled low. And if you pull the CPU's RDY pin low then you're assured that the address won't change when the cycle ends, and hence the decoder's output will remain stable.
-- Jeff
- akohlbecker
- Posts: 282
- Joined: 24 Jul 2021
- Contact:
Re: Generating Wait-States with Clock Trickery
Dr Jefyll wrote:
akohlbecker wrote:
I'm also making the input go through a flip flop, so the logic can stabilize before the CPU reads RDY and stay consistent throughout the cycle
My own approach would be for the decoder output to directly (ie, w/o any flipflop) cause the CPU's RDY pin to be pulled low. And if you pull the CPU's RDY pin low then you're assured that the address won't change when the cycle ends, and hence the decoder's output will remain stable.
-- Jeff
You're right that wait states are synchronized with the clock and probably don't need this safety, provided your signal propagates outside of the CPU window. However, other use cases can trigger RDY at any moment, like DMA. Ben Eater's VGA card, for example, pauses the CPU while the display is active and it is reading from memory, independently from the main CPU clock. Thinking about RDY and BE from the perspective of using this card and making it safe, has been an interesting challenge for me!
Re: Generating Wait-States with Clock Trickery
akohlbecker wrote:
There is a window around the falling edge of the clock where the CPU is reading RDY. It can happen that RDY goes low during this window and whether the CPU actually registers it is then undefined.
In cases where RDY goes high or goes low during the danger zone, one might suppose that the CPU will either immediately recognize the new level or instead simply recognize it on the subsequent cycle... IOW, a tidy result in either case. But there's a third possible outcome. When tPCS or tPCH is violated the result can be messy, resulting in a program crash.
drogon wrote:
I'm wondering if anyone is successfully using Rdy to halt a 65C02?
I tried it some time back on my Ruby boards and had mixed results - in that it worked very well - I was pulling Rdy low, then BE low, then fiddling with the RAM from another processor (an ATmega), then releasing BE then taking Rdy high ... At that point, sometimes the 65C02 would crash - it's as if it was reading a random instruction at that point.
Perhaps tPCS/tPCH violations cause metastability issues within the CPU; I don't know. But another (more likely?) explanation is that RDY is a signal that gets internally routed to multiple sections of the CPU. For example, PC needs to be told not to increment, and updates to X, Y and A need to be postponed. It's easy to imagine a jumbled outcome if (for example) PC did get the message in time but some other part of the CPU didn't!
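The "danger zone" can be expressed as a simple check. This is only a sketch: it treats the window as running from tPCS before the falling clock edge to tPCH after it, and the real figures have to come from the datasheet for the part in question. The function name is made up.

```python
def rdy_change_is_safe(t_change_ns, t_fall_ns, t_pcs_ns, t_pch_ns):
    """True if an RDY transition stays clear of the sampling window.

    t_change_ns -- when RDY changes level
    t_fall_ns   -- when the clock's falling edge occurs
    t_pcs_ns    -- setup time required before the falling edge (tPCS)
    t_pch_ns    -- hold time required after the falling edge (tPCH)
    """
    window_start = t_fall_ns - t_pcs_ns
    window_end = t_fall_ns + t_pch_ns
    return not (window_start < t_change_ns < window_end)
```

Anything that can move RDY at an arbitrary moment, rather than in step with the clock, needs external logic to guarantee this check always passes.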
-- Jeff
Last edited by Dr Jefyll on Sat May 14, 2022 6:17 pm, edited 1 time in total.
Re: Generating Wait-States with Clock Trickery
Dr Jefyll wrote:
Perhaps tPCS/tPCH violations cause metastability issues within the CPU; I don't know. But another (more likely?) explanation is that RDY is a signal that gets internally routed to multiple sections of the CPU. For example, PC needs to be told not to increment, and updates to X, Y and A need to be postponed. It's easy to imagine a jumbled outcome if (for example) PC did get the message in time but some other part of the CPU didn't!
I guess adding yet another latch to help guard against metastability was a straw that would break the camel's back, and it's more cost effective to require external latching for the minority of cases where this matters (given that many 6502 uses never touch RDY).
- akohlbecker
- Posts: 282
- Joined: 24 Jul 2021
- Contact:
Re: Generating Wait-States with Clock Trickery
Good points all around! 
Re: Generating Wait-States with Clock Trickery
gfoot wrote:
it's more cost effective to require external latching for the minority of cases where this matters (given that many 6502 uses never touch RDY).
-- Jeff
- fredericsegard
- Posts: 47
- Joined: 05 Aug 2020
- Location: Montreal, QC, Canada
- Contact:
Re: Generating Wait-States with Clock Trickery
I admit I don't fully understand what has been posted since my last post, but I do get the general gist of it.
Understandably, timing is everything. I think I was looking for a simple solution to a rather complex problem. I definitely will need to do more homework on this.
I want to thank all that chimed in on this thread.