
All times are UTC




Posted: Mon Feb 18, 2019 5:22 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Clock Stretching and creating wait-states via the CPU's RDY pin are the two available alternatives when one's goal is to grant extra access time for a slow I/O device or memory. Less commonly, they are used to momentarily freeze program execution and thus synchronize with some external event. In this thread I'd like to talk about the pros and cons that one might consider when choosing between RDY and Clock Stretching. Also I created a few very simple circuits that seem worth sharing. (Notably, my clock-stretching circuits include options that allow VIA timers to continue proper time-keeping. :!: )

First a note in regard to troubleshooting. Even a simple circuit adds complexity. For the sake of a smoother project launch, novices in particular are well advised to omit the RDY/stretching and simply use a slower CPU clock. Alternatively, it doesn't take much to plan ahead for the troubleshooting scenario. That just means designing the project so it includes some simple means -- jumpers, for example -- by which a slower CPU clock can be substituted and also the RDY/stretching circuit can be disabled.

As for the wiring that's involved, at the heart of our RDY or Clock-Stretch circuit there'll be a counter or pulse generator of some kind — one that implements a delay. Its output will either serve as the CPU clock, or it will act to pull down the RDY input of the CPU while the latter is supplied with a full-time, uninterrupted clock from elsewhere (a crystal oscillator, for example). Either method is sufficient to cause the CPU to momentarily delay its progress... and we don't need a lot of resources to get the job done. Measured in terms of jelly-bean logic, a highly workable circuit can be had for one IC or less!

  • Immediately below is my minimal RDY scheme -- just a single JK flipflop, providing a single wait state. (A two-wait-state variation is discussed here.)
  • A little further below you'll find my minimal Clock Stretching circuit -- just a single '163 counter IC.
  • Later in this thread I present some Clock Stretching circuits specifically intended for overclockers -- folks who like to push the CPU beyond its rated maximum frequency.

Attachment:
File comment: note the "from /CS" input needs to become valid before the rise of Phi2
simple wait-state generator.gif


A moment ago I mentioned "causing the CPU to momentarily delay its progress," and that may be sufficient for a simple case involving an (EP)ROM. But it's important to realize that we may need the circuit to do more than just delay the CPU. We might prefer or actually require that it also generates a signal that can be used to pace the slow-motion data transfer -- the read or write -- to/from the slow device. I'll briefly explain before moving on.

Broadly speaking, even a full speed device needs three things, timing-wise. There needs to be at least a certain minimum period during which the device is fully enabled for the read or write... and by "fully enabled" I mean the condition in which all the necessary chip select(s) are active and also any necessary control inputs such as /WE are simultaneously active. We'll call that the enabled period. And, we need the address presented to the device to be valid and stable for at least a certain minimum period before the enabled period (this is the address setup time) and after the enabled period (this is the address hold time). The address hold time is often quite easy to satisfy, but right now it isn't my goal to closely examine details. (For details you can refer to the device data sheet. And you may even wish to examine the specs pertaining to when the data bus is valid.) Right now I just want to underscore the point that there's more to consider than simply causing the CPU to momentarily delay its progress.
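
Those three requirements can be written down as simple arithmetic on edge times. A minimal Python sketch follows; the class, its field names, and all the numbers are invented for illustration and don't come from any data sheet.

```python
from dataclasses import dataclass


@dataclass
class AccessTiming:
    """Edge times in ns, measured from any common reference point."""
    address_valid: float   # address becomes stable
    enable_start: float    # chip select(s) and control inputs all active
    enable_end: float      # first of those signals goes inactive
    address_change: float  # address stops being valid

    def enabled_period(self) -> float:
        return self.enable_end - self.enable_start

    def address_setup(self) -> float:
        return self.enable_start - self.address_valid

    def address_hold(self) -> float:
        return self.address_change - self.enable_end


def meets_requirements(t: AccessTiming, min_enabled: float,
                       min_setup: float, min_hold: float) -> bool:
    """True only if all three minimums are satisfied at once."""
    return (t.enabled_period() >= min_enabled
            and t.address_setup() >= min_setup
            and t.address_hold() >= min_hold)


# Hypothetical device needing 150 ns enabled, 20 ns setup, 5 ns hold.
normal = AccessTiming(address_valid=30, enable_start=50,
                      enable_end=100, address_change=110)
doubled = AccessTiming(address_valid=30, enable_start=100,
                       enable_end=250, address_change=260)
print(meets_requirements(normal, 150, 20, 5))
print(meets_requirements(doubled, 150, 20, 5))
```

With these made-up figures the normal cycle fails only on the enabled period, which is the usual reason for wanting a wait state or a stretched clock in the first place.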

65xx designs usually use PHI2 to define the "fully enabled time" that paces the actual data transfers. A RAM for example may be supplied with /WE and /OE signals that are qualified by PHI2. And if the design includes peripherals from the 65xx family, PHI2 will be supplied directly to them. But these arrangements aren't well suited for a slow peripheral trying to keep up with a fast CPU whose progress has been momentarily delayed.

In all three cases illustrated below, the CPU cycle has been extended to double its usual duration using my simple JK circuit, and three different glue logic schemes are used to pace the data transfer to/from the slow device. Bearing in mind the three timing requirements mentioned earlier, notice (highlighted in pink) the fully enabled period and, surrounding it (in green) the address setup time and the address hold time.

  1. is the preferred scheme, especially if the slow device is writeable (eg, not a ROM). Compared with an unstretched cycle, all three of the timing requirements mentioned earlier have been extended. And BTW, in regard to implementation, you may find it advantageous to replace the inverter and the two NANDs with a 74_1G19, or with half of a 74_139. :wink:
  2. is included only for reference. It shows what would happen if PHI2 were used to pace the transfer to/from the slow device. Although perhaps not guaranteed to fail, it's a scheme that raises some perplexing questions.
  3. may be (marginally) tolerable for a ROM or other read-only device. Clearly, it uses the fewest resources. But it's unacceptable for writable devices because its lack of address setup time is likely to result in "rogue writes" to unintended addresses within the device. And, even with a read-only device, there's some risk that scheme 3 may inject noise into the power supply. This is due to bus contention with other devices if the slow device fails to go tri-state quickly enough after being read.

Attachment:
glue for controlling enable timing.png


Regarding the choice between RDY and Clock Stretching, here's what I came up with as a list of points to consider:

  • Many schemes for Clock Stretching degrade the accuracy of timers in the system, such as the timers in a 6522 VIA. However, my stretcher circuits offer an optional auxiliary output that persists with the usual, non-stretched waveform, allowing this problem to be avoided. (With RDY there's no stretching and thus no timekeeping problem.)

  • The RDY pin grants one or more entire CPU cycles, whereas Clock Stretching can allow finer control. Its increments can be equivalent to fractions of a CPU cycle, and sometimes that will allow you to come closer to delivering only as much delay as is required. Clock stretching also sidesteps certain CPU-specific issues, as follows.

  • (for NMOS): 6502 and 6510 etc only "listen" to RDY during read cycles. If you wanna coax an NMOS 65xx to grant extra access time during a write, clock stretching is the only option.

  • (for '816): Assuming you want more than 64K then you'll need a Bank Address latch, and that latch will require an enable signal. If you're using RDY then the logic to generate that signal becomes slightly more complex, because it's no longer acceptable to simply drive the latch enable with an inverted version of Phi2. I put the details in a footnote, below.

  • (WDC '816 and 'C02): The RDY pin on WDC CPUs is bidirectional, and thus there's a risk of overcurrent resulting from contention. This occurs if external logic is trying to drive RDY high while at the same time the CPU is driving RDY low because a WAI instruction has been executed, either deliberately or as the result of a crash. Prospective mitigations are discussed in the text accompanying the circuit linked to above. But contention isn't a threat with clock stretching, because no external gate needs to drive RDY. (The pin will require a pullup resistor, though. Only with non-WDC processors can you safely tie RDY directly to VCC.)


Shown below is a clock-stretch circuit built around the wondrous and ever entertaining 74xx163 4-bit synchronous counter! :) This chip even lets us include a non-stretched output, courteously allowing your VIA timers to remain punctual!

Note: with this circuit, as with the JK circuits, it's imperative that /SELECT becomes valid somewhat before the rise of Phi2. :!: This might turn out to be the limiting factor for maximum clock speed, given the prop delays from the address decoder that typically drives /SELECT. A faster decoder will help, but a more powerful solution is to choose one of my overclockers' solutions later in the thread. These -- like RDY -- don't require /SELECT to become valid so early; it's sufficient that /SELECT is valid somewhat before the fall of Phi2.
Attachment:
'163-based cycle stretcher.png
Some notes re: the '163 circuit:

PHI2_S is the stretchable clock, and it's what generally serves as Phi2 for your system. PHI2_VIA would only be used to drive the Phi2 pin on your 6522 VIAs.

The red asterisks pertain to debugging strategy. To bypass the circuit, sever the connections where indicated; then temporarily connect PHI2_S and PHI2_VIA together and drive them from an oscillator which is slow enough that clock stretching isn't required. If you're doing a PCB you may want to provide jumpers, or exposed traces you can easily cut, as a means to sever the connections. Or, if the '163 is in a socket, you can simply remove it!

2XCLK runs at twice the nominal CPU frequency, and its duty cycle isn't especially critical. Every rising edge of 2XCLK causes the '163 to either count or load, with the loads occurring at the end of every state 0.

If /SELECT is inactive (i.e., high), the value loaded will be $F, and if /SELECT is low the value loaded will be determined by how you've connected the counter's P3-P0 inputs:

  • Loading $D, $B or $9 will result in extra time equivalent to 1, 2 or 3 normal CPU cycles, respectively. PHI2_VIA will continue, unaffected.

  • Loading $E, $C, $A or $8 will result in extra time equivalent to 0.5, 1.5, 2.5 or 3.5 normal CPU cycles, respectively. :!: These fractional-cycle delays *do* minimally impact VIA timing. In all cases the VIA is slowed by 0.5 cycle.
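
The arithmetic behind those load values can be checked with a small model. The Python sketch below assumes only what the notes above state: the '163 advances once per rising edge of 2XCLK, it loads at the end of state 0, and a load of $F gives a normal, unstretched cycle. The function names are made up for illustration.

```python
def ticks_per_cpu_cycle(load_value: int) -> int:
    """Count 2XCLK periods from one state 0 to the next state 0."""
    if not 0x8 <= load_value <= 0xF:
        raise ValueError("this sketch only covers load values $8..$F")
    state = load_value  # the counter leaves state 0 by loading
    ticks = 1           # the 2XCLK period spent in state 0
    while state != 0:
        state = (state + 1) & 0xF  # count up, wrapping from $F to 0
        ticks += 1
    return ticks


def extra_cpu_cycles(load_value: int) -> float:
    """Extra time, in normal CPU cycles, compared with loading $F."""
    normal = ticks_per_cpu_cycle(0xF)
    return (ticks_per_cpu_cycle(load_value) - normal) / normal


for value in range(0xF, 0x7, -1):
    print(f"${value:X}: {extra_cpu_cycles(value)} extra cycle(s)")
```

In this model each step down in load value adds one 2XCLK period, which is why the odd values give whole extra cycles and the even values give the half-cycle amounts.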

If you have multiple slow devices, some slower than others, you might wanna customize each device's delay value. For certain combinations of values this won't even require any additional components. To allow any combination of values you could hook up a diode matrix or something similar.
Attachment:
'163 cycle stretcher variation.png


As shown in the '163 circuit above, the PHI2_S signal has extended duration of its high time (thus delaying the CPU). But PHI2_S is not ideal for qualifying /RD or /WR to a slow device. Although the fully-enabled period would be extended, there'd be no extra address setup time or address hold time (just the same amount of each that'd occur with an unstretched cycle).

It's possible to generate an enable signal that's padded exactly as you wish, but to do that you'll probably need an extra gate or two. Otherwise, you can examine the existing '163 waveforms to see if one of them is close enough.

In the example below, Q3 still cues the counter reload as before, but the stretchable clock PHI2_S is now instead taken from Q2 of the '163. And, the reload value ($B) is such that stretching causes Q2 to reload with a 0 not a 1, thus granting one extra 2XCLK period of Clock-low time. But there's still no extra address hold time (as compared with an unstretched cycle).

Different counter connections offer different waveform possibilities. If the reload were $A or $9 then the extra clock-low time increases even further. Other variations involve taking PHI2_S from Q1 of the '163 and choosing a reload value of $C or $D... Or, taking PHI2_S from Q0 and choosing a reload value of $E. But in some ways a shift register is more tractable for concocting different enable waveform possibilities, and it's a shift register (not a counter) that's at the heart of my circuits downthread.

Attachment:
cycle stretcher extra detail.png
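
The tap-and-reload variations described above can be compared by modelling the counter and reading off one output bit. This Python sketch assumes, as inferred from the text rather than taken from the schematic, that the counter loads in state 0, loads $F on an unstretched cycle, and counts up once per 2XCLK edge. The helper names are hypothetical.

```python
def counter_states(reload_value: int) -> list[int]:
    """States visited during one CPU cycle, starting with state 0."""
    states = [0]
    state = reload_value & 0xF
    while state != 0:
        states.append(state)
        state = (state + 1) & 0xF  # count up, wrapping from $F to 0
    return states


def low_high_ticks(tap_bit: int, reload_value: int) -> tuple[int, int]:
    """Number of 2XCLK periods that output Q<tap_bit> is low and high."""
    levels = [(s >> tap_bit) & 1 for s in counter_states(reload_value)]
    return levels.count(0), levels.count(1)


# Unstretched reference first, then the combinations named in the post.
for tap, reload in [(2, 0xF), (2, 0xB), (2, 0xA), (2, 0x9),
                    (1, 0xD), (1, 0xC), (0, 0xE)]:
    low, high = low_high_ticks(tap, reload)
    print(f"Q{tap}, reload ${reload:X}: low {low}, high {high} (2XCLK periods)")
```

Under these assumptions, taking the clock from Q2 with a reload of $B gives two periods low and four high, against one low and one high when unstretched, which matches the one extra period of clock-low time mentioned above.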

-- Jeff

footnote: In 2015, forum member cr1901 resolved a distressing ambiguity in the '816 data sheet by testing an actual '816 and reporting, "The 65816 does NOT drive the bank address while RDY is low." In other words, the alternating pattern of Bank-Address, Data that normally occupies the 816's data bus during the Phi2-low, Phi2-high periods of every cycle is temporarily suspended. Because no Bank Address is present, it's necessary to also suspend the enable signal sent to the Bank Address latch; otherwise it would latch bits that are intended as data.

Much later, BDD unearthed a 1994 WDC datasheet that suggests appropriate logic to deal with the matter (first image, below). Compared with a more modern suggestion that doesn't involve RDY (second image), the 1994 circuit drives LE with a NOR rather than an inverter. Also there are two inverters which arguably improve matters by counterbalancing the delay of the NOR.

Attachment:
File comment: Using RDY.
1994 suggestion for bank-address latch.png
Attachment:
File comment: Not using RDY. BTW, the odd diamond symbol seemingly just indicates the bus is bidirectional.
Bank Address Latching Circuit.png

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Mon Dec 04, 2023 1:35 am, edited 7 times in total.

Posted: Mon Feb 18, 2019 5:36 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
Very nice!!! :D Bookmarked.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Feb 18, 2019 6:51 am

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Great!


Posted: Mon Feb 18, 2019 8:14 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Great post Jeff! And a new thread, and a good subject. Just what we need.


Posted: Mon Feb 18, 2019 9:05 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
No booboos to report (yet)? Well, that's something to be thankful for! :P But I'm sure I'll be editing that post again, for clarity if nothing else.

The '163 really is a Swiss Army knife! It occurs to me that the signal marked in red (below) could be renamed /HALT -- because if you pull it low the counter will complete whatever sequence is in progress, then endlessly load 0000 and the CPU will stop indefinitely.

Also, I redrew the timing diagram to show an additional signal (marked in green). That stretched (but narrow duty cycle) waveform on the TC output seems like it would surely be useful for something... but I haven't yet figured out what! :lol:


Attachments:
Notes on '163-based cycle stretcher.png

Posted: Mon Feb 18, 2019 9:14 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
Dr Jefyll wrote:
seems like it would surely be useful for something... but I haven't yet figured out what! :lol:

You will! :lol:

Posted: Wed Apr 08, 2020 2:17 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
I implemented Jeff's circuit with POC V1.2 and have posted results.

————————————————————
Edit: I linked back to the test results. Thanks to Ed for also doing so.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Thu Apr 09, 2020 3:44 pm, edited 2 times in total.

Posted: Wed Apr 08, 2020 8:12 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
(BDD's results posted here.)

Jeff, in your scheme you run the CPU at full speed except for some cycles where you slow down to access slow devices. You explain how to take off a constant full speed clock for a timer-counter. That's all good.

But it might be worth also noting the case we see in Acorn's machines: again, the CPU runs at full speed except for some cycles where it slows down to access slow devices... but in this case one of the slow devices is the timer-counter, which needs a constant low speed clock. This is, I think, a more difficult problem: not only must the CPU be slowed, but it might also have to wait for the appropriate phase of the free-running low speed clock.

I mention it as a curiosity, not as a criticism (or a challenge... well maybe it is a challenge.)


Posted: Thu Oct 26, 2023 12:27 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
I wanted to discuss these ideas more, and especially BigEd's thought about supporting synchronization with a slow clock, as that's pretty important in some of my circuits.

BDD's POC v1.3 and v1.4 used the following circuit which is I think Jeff's second one posted above:
Attachment:
File comment: Clock stretching circuit from BDD's POC v1.4
clockstretch_pocv14.png


This works well, but it requires knowing whether or not the current cycle should be stretched within one tick of PHI2 falling. That puts a constraint on overall clock speed, given glue logic delays, because the 65C816 appears to be slow at presenting the bank address, which is typically used as part of the decision over whether or not to stretch the cycle.

I thought maybe the following modification might resolve that:
Attachment:
File comment: Possible change to allow later stretching decision
clockstretch_pocv14_gf.png


So we always assume the cycle will be stretched, but cut it short if it turns out it's not meant to be stretched. However in order to not stretch the low phase of the clock during fast cycles, we need to constrain the stretching to the high phase, which as noted in this thread may mean slow devices don't have enough address setup time, and cause other complications.

It got me thinking though about whether we can support a consistent slow clock as BigEd said above, to allow the CPU to run faster than some I/O devices that themselves require such a clock. I don't have a neat solution for that yet but have been thinking along the following lines:
Attachment:
File comment: Possible way to provide a slow but synchronous I/O clock
clockstretch_2x163_slowclk.png


The general mechanism is the same as in my previous circuit, but the value loaded into the '163 U5 is such that PHI2 will fall one CLK cycle after SLOWCLK. In addition, if SLOWCLK is high at the time PHI2 goes high, PHI2 waits an additional SLOWCLK cycle, ensuring that there's a full high half-cycle of SLOWCLK. Unfortunately all of this means that SLOWCLK can only be half of the speed of PHI2 anyway, as the counters otherwise don't have enough bits.

Here's an illustration of how PHI2 would respond to clock stretching, showing the four possible values that could get loaded into the counter and how they correspond with the phases of the slow clock and U6 counter:
Attachment:
File comment: Various possible PHI2 responses
clockstretch_2x163_slowclk_waveforms.png


There are a lot of missing pieces and rough edges, for example SLOWCLK needs to be delayed by one CLK cycle, to match PHI2 (or PHI2 needs to end a cycle earlier); chip selects and write-enable signals need to be gated to the second high phase of SLOWCLK in cases where SLOWCLK was high initially; and there's no guarantee of a full low half-cycle of SLOWCLK with chip selects and valid address signals active, so some devices might be unhappy. For my own system I also really need this slow clock to be a lot slower - maybe 4x or 8x slower than PHI2 - so would need more counter bits.

So overall I think this slow clock idea shows some promise, and some of these issues can be easily resolved with a few flipflops, but it's not quite there yet.


Posted: Thu Oct 26, 2023 4:37 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Yikes! Lots to talk about here. :)

Firstly, one hopefully simple point to dispatch, George. Published timing specs suggest the Bank Address bits become valid at the same time as address lines A15-A0, so can you elaborate please regarding your remark about the 65C816 appearing to be slow at presenting the bank address? Perhaps you simply mean there'll be some non-trivial added delay if a '573 or similar latch is used. (And I'll note in passing that the latch delay can be reduced to the sub-nanosecond level by using the FET Bus Switch approach -- more on the FET switch approach in When is a Latch not a Latch? (Capturing the '816 Bank Addr).)

gfoot wrote:
knowing whether or not the current cycle should be stretched within one tick of PHI2 falling [...] puts a constraint on overall clock speed, given glue logic delays
Yes, it would definitely be better if the clock-stretch request were allowed to arrive fairly late! And on my back burner I have a project that's been successful in satisfying this. :!:

Unlike my '163 circuit, which requires the clock-stretch request to become valid shortly before Phi2 rises, the circuit below merely requires the clock-stretch request to become valid shortly before Phi2 falls, making the circuit easier to work with... especially when over-clocking the CPU! :twisted:

At this time I can't present the full, proper writeup I planned. But your post and the mention of clock speed and glue logic delays prompt me to provide a somewhat rushed preview! :oops:

The circuit was tested by using it to generate Phi2 for an '816 executing a hardwired NOP ($EA). In a real system, /STRCH (the stretch-request signal) would probably come from an address decoder, but for my test rig I simply used the 816's VPA output to supply the stretch request. For each NOP executed, there'll be a non-stretched cycle and a stretched cycle.

After consulting my records I have edited this post, and can now supply additional detail. The main circuit operates as intended, supplying CPU_PHI2 for use by the CPU and LAZY_PHI2 for use by the logic that qualifies /RD and /WR to the slow device. At present, the optional J-K flipflop isn't performing to spec but in a subsequent post this problem is corrected. This optional J-K can be included to generate VIA_PHI2, a uniform, full speed, non-stretched Phi2 to protect the time-keeping if there's a full-speed VIA in the system.
Attachment:
74AC323-based clock stretcher waveforms.png
Attachment:
'323-based cycle stretcher 06c_.png
It's a little odd how I've used the shift register. So, referring to the diagram below, allow me to point out that the cycle completes when a "1" arrives at IO0. Before this occurs, the shift reg dictates either a short or a long time delay -- either one or eight of the 2X clock periods -- according to which direction it's shifting... and the direction depends on /STRCH, the circuit's stretch-request input. BTW, note that the shift-reg always gets cleared at the time Phi2 rises; in fact, that's what causes Phi2 to rise.
Attachment:
explanation.png
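
As a rough behavioral model of the short-or-long delay just described, here is a Python sketch. It assumes an 8-bit register that has just been cleared and a logic 1 fed to whichever serial input is in use, so the 1 either lands on IO0 at the first 2X clock edge or enters at the far end and takes eight edges to arrive. Which shift direction /STRCH selects, and the function name, are assumptions made for illustration; the schematic is the authority.

```python
def clock_periods_until_done(stretch_requested: bool) -> int:
    """Count 2X clock edges until a 1 reaches IO0 of a cleared register."""
    reg = [0] * 8  # reg[0] models IO0, reg[7] models IO7
    periods = 0
    while reg[0] == 0:
        if stretch_requested:
            # Assumed long path: the 1 enters at IO7 and moves toward IO0.
            reg = reg[1:] + [1]
        else:
            # Assumed short path: the 1 is shifted straight into IO0.
            reg = [1] + reg[:-1]
        periods += 1
    return periods


print(clock_periods_until_done(False))  # 1
print(clock_periods_until_done(True))   # 8
```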


Quote:
BigEd's thought about supporting synchronization with a slow clock, as that's pretty important in some of my circuits.
Indeed this is a valuable consideration, but it's a different challenge. To be clear, you're talking about momentarily synchronizing the CPU to a slow clock that's also specified as being uniform because it's used for timekeeping by a VIA or UART. In contrast, my goal ATM is to accommodate an LCD or other peripheral that's slow but not used for timekeeping. And I also want to accommodate VIAs used for timekeeping, but they'd be modern, WDC devices, capable of operating at full speed. I tend to think it'd be better to create a separate thread, partly to avoid potential confusion, and also because it may be a different audience who's interested.

Edit: Thanks, George, for creating the thread Simple circuits for clock stretching to match a slow clock.

I'm pleased with the low parts count of the basic clock-stretch circuit above -- just one 20-pin IC and 3 inverters! :mrgreen: And the LAZY_PHI2 it generates grants the peripheral device a great deal more setup, dwell and hold time compared with full speed operation. Comments welcome.

And I have other circuits in the works; for example it'd be good to offer even longer cycle times, and likewise good to offer a granular selection so you can trim the total amount of stretch and minimize the impact on CPU throughput. Stay tuned!

-- Jeff



Last edited by Dr Jefyll on Sun Nov 05, 2023 5:35 pm, edited 4 times in total.

Posted: Thu Oct 26, 2023 5:38 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Regarding the 65C816's bank address delay, I think I misunderstood the extent of the problem - I thought BDD had observed 30ns delays from the falling edge of PHI2, but looking at the logic analyser captures he posted, it looks like the bank address is there after about 10ns, which is not so bad - still, given the need to then decode that and decide whether to stretch or not, it is still a... stretch! But perhaps no worse than for the 65C02 with the address bus, it seems they have similar delays from the falling edge of PHI2.

Dr Jefyll wrote:
At this time I can't present the full, proper writeup I planned. But your post and the mention of clock speed and glue logic delays prompt me to provide a somewhat rushed preview! :oops:
Cool, sorry to push it out into the open before it was planned!

Quote:
Quote:
BigEd's thought about supporting synchronization with a slow clock, as that's pretty important in some of my circuits.
Indeed this is a valuable consideration, but it's a different challenge than the full-speed VIA just mentioned. I tend to think it'd be better to create a separate thread (linked to this one, of course), partly because it's a different audience who'll be interested in synchronization of the CPU with an independent, slow clock. Also, it will make for clearer discussion if we separate that challenge from that of accommodating a full-speed VIA.

Sure, I'm happy for it to move to another thread. I have expanded on it and got something that ought to work, though it is of course no longer as simple as you were aiming for!


Posted: Thu Oct 26, 2023 6:06 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 287
Location: South Africa
gfoot wrote:
Regarding the 65C816's bank address delay, I think I misunderstood the extent of the problem - I thought BDD had observed 30ns delays from the falling edge of PHI2, but looking at the logic analyser captures he posted, it looks like the bank address is there after about 10ns, which is not so bad - still, given the need to then decode that and decide whether to stretch or not, it is still a... stretch!
Yup, I made it about 11ns from PHI2 falling to the address out (the bank address seemed to take fractionally longer but still under 13ns). However I was using a 200MHz scope so nanosecond timings need some interpretation. Still, as the '816 seems to struggle to reach 40MHz (everything measured at 5V), that's a pretty good indication it needs slightly longer than 10ns (12.5ns) for half a clock cycle.
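
For anyone checking the 12.5ns figure, it is just half of the clock period at 40MHz. A tiny Python sketch of that arithmetic, with a made-up function name:

```python
def half_cycle_ns(clock_mhz: float) -> float:
    """Duration of one clock phase in ns, assuming a 50% duty cycle."""
    period_ns = 1000.0 / clock_mhz
    return period_ns / 2.0


print(half_cycle_ns(40))  # 12.5
```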

Back on the subject of clock stretching and simple minimalist chip count circuits. I managed to design up one that is 15 discrete ICs and is absolutely awesome! In theory.

I mean it doesn't work, probably because that's a stupid number of chips for a clock stretching circuit. But is working really a useful benchmark? I think not.

Anyway I'll post it here if I ever get around to cleaning it up.


Posted: Thu Oct 26, 2023 7:53 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
I started posting this about six hours ago, only to be called away.  It’s likely stale content by now, but I’ll throw it out there to confuse things.  :D

gfoot wrote:
This works well but requires knowing whether or not the current cycle should be stretched within one tick of PHI2 falling, which puts a constraint on overall clock speed, given glue logic delays as the 65C816 appears to be slow at presenting the bank address which is typically used as part of the decision over whether or not to stretch the cycle.

I thought maybe the following modification might resolve that...

It may be because I'm up earlier than normal (for me) and haven't ingested enough coffee, but I'm not seeing a gain here. Even though the HC00 is being gated by Ø2, the 163's /CLR (labeled in your drawing as /MR) timing is relative to the oscillator's signal, which means the HC00 really needs to settle in no more than TSU nanoseconds before the oscillator goes high. So, it seems to me the same timing limitation exists, just at a different input.

In any case, the HC00 likely needs to be faster logic to avoid having it introduce too much lag in responding to /WSE, which in my POC V1.3 unit, doesn't resolve until late during Ø2 low.

Incidentally, POC V1.3 doesn't use the 163. It, instead, has an AC74 and AC109 clock generator. V1.4 does have the 163. Since V1.4 doesn't perform as well as V1.3, I may look at bodging the former to try out your idea. The one impediment is V1.4 is entirely SMT (excepting one item), which is not something I can readily work with anymore.

Posted: Thu Oct 26, 2023 9:22 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
BigDumbDinosaur wrote:
It may be because I'm up earlier than normal (for me) and haven't ingested enough coffee, but I'm not seeing a gain here. Even though the HC00 is being gated by Ø2, the 163's /CLR (labeled in your drawing as /MR) timing is relative to the oscillator's signal, which means the HC00 really needs to settle in no more than TSU nanoseconds before the oscillator goes high. So, it seems to me the same timing limitation exists, just at a different input.

The /MR signal is intended to take effect only at the end of the (unstretched) PHI2-high phase - not at the start of it - so the idea is that there should be an extra clock cycle (half a PHI2 cycle) of time for the bank/address decoding and NAND logic to take place.
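As a rough illustration of the point above, the decision window can be expressed in oscillator ticks, taking the oscillator to run at twice the PHI2 rate. This is only a sketch: the function name `decode_window_ns` is made up for this example, and it ignores the 163's setup time and any gate delays, which would have to be subtracted from the result.

```python
def decode_window_ns(phi2_mhz: float, osc_ticks: int) -> float:
    """Time (ns) between PHI2 falling and the oscillator edge that samples /MR.

    Assumes the oscillator runs at 2 x PHI2, so one oscillator tick is
    half a PHI2 cycle. Setup time and gate delays are NOT subtracted.
    """
    osc_period_ns = 1000.0 / (2.0 * phi2_mhz)
    return osc_ticks * osc_period_ns


if __name__ == "__main__":
    for mhz in (10, 16, 20):
        original = decode_window_ns(mhz, 1)   # decision needed within one tick
        modified = decode_window_ns(mhz, 2)   # decision deferred by one more tick
        print(f"{mhz:>3} MHz: original {original:.2f} ns, modified {modified:.2f} ns")
```

At 20 MHz this gives 25 ns for the one-tick case and 50 ns when the decision is deferred to the end of the PHI2-high phase.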

Quote:
In any case, the HC00 likely needs to be faster logic to avoid introducing too much lag in responding to /WSE, which, in my POC V1.3 unit, doesn't resolve until late during Ø2 low.
Yes, I didn't deliberately pick the HC part; that's just what KiCad gave me. You can use something faster, but it's probably less important given the extra cycle of time that's available.

Quote:
Incidentally, POC V1.3 doesn't use the 163. Instead, it has an AC74 and AC109 clock generator. V1.4 does have the 163. Since V1.4 doesn't perform as well as V1.3, I may look at bodging the former to try out your idea. The one impediment is that V1.4 is entirely SMT (excepting one item), which is not something I can readily work with anymore.

Ah, I misunderstood which board you'd used this in, sorry. It'd be interesting to hear if it works, but it might be too awkward to bodge. Maybe you could feed PHI2 into the GAL and so gate /WSE with PHI2 right there, but the connection at the 163 will still need cut traces and bodge wires.


PostPosted: Thu Oct 26, 2023 9:34 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
gfoot wrote:
Regarding the 65C816’s bank address delay, I think I misunderstood the extent of the problem - I thought BDD had observed 30ns delays from the falling edge of PHI2, but looking at the logic analyser captures he posted, it looks like the bank address is there after about 10ns, which is not so bad - still, given the need to then decode that and decide whether to stretch or not, it is still a... stretch!  But perhaps no worse than for the 65C02 with the address bus, it seems they have similar delays from the falling edge of PHI2.

I’ve tried out several 65C816s in POC V1.3, including one with a Sanyo 0.8µ core, and the elapsed time from the fall of Ø2 to the stabilization of the bank bits seems to be remarkably consistent, right around 12ns, as reported by the logic analyzer (LA).  In a capture I did at 12.5 MHz of a RAM read (which would not be wait-stated), VDA was asserted at 10ns and A16 appeared on D0 at 12ns.  A0-A15 had already appeared at the rise of VDA, although it’s likely there is a small lag there as well.  It would appear that 2ns lag from VDA high to A16 on D0 is also very consistent.

Before continuing, I should note that my logic analyzer has a maximum sample rate of 500 million/second, which equates to an effective resolution of 2ns.  This sample limit occasionally produces some anomalies in the captures, such as a symmetric 20 MHz clock appearing to be asymmetric, since the 25ns half-cycle time has to be resolved to an even number of time units for display purposes.
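The asymmetry effect can be reproduced with a small model. This is a sketch under simplifying assumptions that are not from the post: the analyzer's sample grid is taken to be aligned with the first clock edge, and each edge is displayed at the first sample at or after the true edge. The function names are hypothetical.

```python
import math

SAMPLE_PERIOD_NS = 2  # 500 million samples/second -> one sample every 2 ns


def displayed_edges(half_cycle_ns, n_edges, sample_ns=SAMPLE_PERIOD_NS):
    """Return the times (ns) at which successive clock edges appear on screen.

    Assumption: an edge shows up at the first sample taken at or after it.
    """
    return [math.ceil(i * half_cycle_ns / sample_ns) * sample_ns
            for i in range(n_edges)]


def displayed_widths(half_cycle_ns, n_edges, sample_ns=SAMPLE_PERIOD_NS):
    """Return the apparent widths (ns) of the half-cycles between those edges."""
    edges = displayed_edges(half_cycle_ns, n_edges, sample_ns)
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


if __name__ == "__main__":
    # 20 MHz: true half-cycle is 25 ns, which is not a multiple of 2 ns.
    print("20 MHz  :", displayed_widths(25, 5))
    # 12.5 MHz: true half-cycle is 40 ns, which fits the 2 ns grid exactly.
    print("12.5 MHz:", displayed_widths(40, 5))
```

With these assumptions the 25ns half-cycles show up as alternating 26ns and 24ns, while the 40ns half-cycles of a 12.5 MHz clock are displayed correctly.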

V1.3’s glue logic can’t make any wait-state decisions until A16 is present, since the decoding rules prevent ROM and I/O hardware, both of which are wait-stated, from appearing in the extended banks.  It’s a race between the gates that generate chip selects based on address bus state and the tPLH/tPHL timing of the bank latch.  tPLH/tPHL ranges from 2ns to 11ns, so the likely worst-case will result in /WSE being asserted 21ns after the fall of Ø2.

Since a half-cycle at 20 MHz is 25ns, that only leaves 4ns for whatever is watching /WSE to react.  The 74AC109 I used in V1.3 as the wait-state timer can’t meet that requirement, which is why clock-stretching in that machine goes south somewhere above 16 MHz.  The 163 used in V1.4 to generate the Ø2 clock likewise can’t react that quickly, and has the double whammy of having its input timings relative to the oscillator, which, of course, is running at Ø2 × 2.  V1.4 barely manages 15 MHz.  The 109, on the other hand, only has to keep up with Ø2.
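The arithmetic behind that 4ns figure is easy to tabulate for other clock rates. The sketch below simply subtracts the 21ns worst-case /WSE delay quoted above from the Ø2 half-cycle. The names are made up for illustration, and the result says nothing about how much time a particular 74AC109 or 163 actually needs; that has to come from the datasheets.

```python
WSE_WORST_CASE_NS = 21.0  # worst-case /WSE assertion after the fall of Ø2, per the post


def half_cycle_ns(phi2_mhz: float) -> float:
    """Duration of one Ø2 half-cycle in nanoseconds."""
    return 500.0 / phi2_mhz


def reaction_window_ns(phi2_mhz: float,
                       wse_delay_ns: float = WSE_WORST_CASE_NS) -> float:
    """Time left for the wait-state logic to react to /WSE before Ø2 rises."""
    return half_cycle_ns(phi2_mhz) - wse_delay_ns


if __name__ == "__main__":
    for mhz in (12.5, 15, 16, 20):
        print(f"{mhz:>5} MHz: half-cycle {half_cycle_ns(mhz):6.2f} ns, "
              f"window {reaction_window_ns(mhz):6.2f} ns")
```

At 16 MHz the window is 10.25ns, versus 4ns at 20 MHz, which is consistent with stretching becoming unreliable somewhere between those two speeds.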

The problem, as I understand it, devolves to a matter of how to widen that 4ns window.  So far, it seems intractable with discrete logic.  I may be wrong on this—I’m eagerly awaiting Jeff to jump in with a rebuttal and a circuit example :D, but I have studied this at length and have yet to solve the problem.

A likely path to widening the 4ns window is to eliminate the timing flop as a discrete part and synthesize it in a CPLD that also provides the rest of the decoding glue logic.  It would produce the /WSE signal to control clock stretching.  A 7ns CPLD should be able to increase the timing window to at least 8ns, worst-case, assuming the use of buried logic and no pin nodes (not to be confused with the PINNODE statement in CUPL, which when used with a CPLD, doesn’t involve actual pins).

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!

