6502.org Forum  Projects  Code  Documents  Tools  Forum

All times are UTC




PostPosted: Sun Apr 25, 2021 11:29 am 

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
Clock stretching is an ongoing topic on the 6502 forum. The state of the art is jfoucher's variant which I believe is an extension of designs primarily devised by BigDumbDinosaur and Dr. Jefyll. I described a very small variant which provides +1, +2 or +4 wait states.

Having wilfully ignored 74x161 functionality and chosen to implement 4017 style daisy-chains, it occurs to me that correct use of the 74x161 inhibit signals can extend clock stretching beyond five cycles. Specifically, two 74AC161 chips may be ganged to provide +1, +2, +4, +8, +16, +32, +64 and +128 wait states. Of these, six inputs retain even phase suitable for use with video circuitry. Of particular interest are the +32, +64 and +128 wait states. These would allow processors exceeding 30MHz to retain interoperability with 1MHz chips and cards.
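As a rough check on the 1MHz interoperability claim, here is a small Python sketch of the effective access rate for each wait state option. It assumes that each wait state adds one full processor clock period to an access, which may not match the real circuit if stretching is counted in phases of a faster input clock. The name `effective_mhz` is only illustrative.

```python
def effective_mhz(cpu_mhz: float, wait_states: int) -> float:
    """Effective access rate when one access takes 1 + wait_states clock periods.

    Assumption: a wait state is one whole processor clock period.
    """
    return cpu_mhz / (1 + wait_states)


if __name__ == "__main__":
    for ws in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"+{ws:<3} wait states at 30 MHz -> {effective_mhz(30.0, ws):.3f} MHz")
```

Under that assumption, +16 is still too fast for a 1MHz part at 30MHz, while +32 and above bring the access rate under 1MHz.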

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


PostPosted: Wed Apr 28, 2021 11:51 pm 

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Nice! I so totally want to build a 30MHz 65C02 computer...

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


PostPosted: Thu Apr 29, 2021 5:30 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
enso wrote:
Nice! I so totally want to build a 30MHz 65C02 computer...

...and I want to go on a date with Jeri Ryan. :D

Dunno about 30 MHz, but 25 MHz might be attainable if your glue logic and RAM are up to it. If you extrapolate the Fmax vs. Vdd curve published in the 65C02 data sheet (page 24), it appears it will intersect 25 MHz at a hair over 5 volts. I've got POC V1.2 (65C816) running at 20 MHz using 74AC logic (no PLD) and it is rock-solid. So I have no reason to doubt the 65C02 can run faster. You'd have to wait-state the C02 for ROM and I/O accesses, but that's no big deal.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Thu Apr 29, 2021 10:48 am 

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
Before encountering jfoucher's variant of the clock stretching circuit, I had considered a 2, 3 and N cycle system for my abandoned four nybble opcode processor architecture. My design only used states five, six and seven of a 4 bit counter. I was suitably impressed with jfoucher exploiting the wrap-around of the counter to load a three bit count. To out-do this, I asked "Is it possible to count more than three bits?" to which the answer is "Yes. Use the schematic in the data-sheet, as highly recommended by Dr Jefyll."

If that isn't clear enough, I enclose a diagram. There are two tricky parts to this process. The first tricky part is that the input clock is twice the speed of the output clock. Therefore, all clock stretching is measured in phases rather than cycles, where two phases equal one cycle. The unwary may be off by a factor of two. jfoucher has little regard for the one phase delay because it makes the raw, unstretched, half speed output signal incompatible with 6522, 6845 and other system choices. That said, I've noted that, in a minority of cases, it is possible to leave a processor out of phase and nudge it back into phase, as required, using a spare inhibit input. The second tricky part is the handling of the two inhibit signals, which differs between the first two units of ganged four bit counter chips. This may or may not be related to the carry acceleration problems encountered by Charles Babbage. Just follow the data-sheet. It works.
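To make the phases versus cycles distinction concrete, here is a minimal Python sketch. It assumes that one phase lasts exactly one tick of the input clock, so an unstretched output cycle is two ticks. The function names are made up for illustration and are not taken from the schematic.

```python
def output_period_ns(input_mhz: float, extra_phases: int) -> float:
    """Output cycle time when an unstretched cycle is two input-clock ticks.

    Assumption: one phase equals one input-clock tick.
    """
    tick_ns = 1000.0 / input_mhz
    return (2 + extra_phases) * tick_ns


def stays_in_phase(extra_phases: int) -> bool:
    """An even number of extra phases keeps the output aligned to whole cycles."""
    return extra_phases % 2 == 0


if __name__ == "__main__":
    for phases in (0, 1, 2, 4):
        print(phases, round(output_period_ns(60.0, phases), 2), stays_in_phase(phases))
```

With a 60MHz input, +2 phases is one extra cycle, not two. This is the factor of two that is easy to get wrong.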

BigDumbDinosaur on Thu 29 Apr 2021 wrote:
enso on Wed 28 Apr 2021 wrote:
Nice! I so totally want to build a 30MHz 65C02 computer...

...and I want to go on a date with Jeri Ryan. :D


One of these is more likely than the other. Admittedly, I'm more of a Terry Farrell kinda sheep.

More pertinently, my plan for a 30MHz system was to connect every output of the top level address decode to a different section of the clock stretching inputs. This would make clock cycle counting rather elastic but it has the advantage that a spare strobe of address decode may be connected to one of the unused clock stretching inputs to implement a low power pause.


Attachments:
clock-stretch-multi-chip0-0-1.pdf [147.19 KiB]
clock-stretch-multi-chip0-0-0.odg [9.61 KiB]

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!
PostPosted: Thu Apr 29, 2021 1:28 pm 

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1117
Location: Albuquerque NM USA
enso wrote:
Nice! I so totally want to build a 30MHz 65C02 computer...


How about a 29.5 MHz 65C02? It'll give you a nice standard baud rate of 230400. https://www.retrobrewcomputers.org/doku ... asmo:crc65
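The baud rate arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes that "29.5 MHz" means the common 29.4912 MHz crystal and that the serial clock is a 16x oversampled UART. Neither detail is stated in the post.

```python
CRYSTAL_HZ = 29_491_200  # assumed: the common 29.4912 MHz crystal
BAUD = 230_400
OVERSAMPLE = 16          # assumed: typical 16x UART clocking

# Integer divisor from crystal to the 16x baud clock, plus any leftover error.
divisor, remainder = divmod(CRYSTAL_HZ, BAUD * OVERSAMPLE)

if __name__ == "__main__":
    print(f"divisor = {divisor}, remainder = {remainder}")
```

A zero remainder means the baud rate is exact under these assumptions.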

The key is to make the design simple and the PC board small. Reduce the component count and the associated load capacitance and propagation delay.
Bill


PostPosted: Sun May 09, 2021 12:10 pm 

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
I have a minor extension to my previous circuit. I don't know if it has been done before but I have solved 6502 dual core, arbitrary scale clock stretching using the minimum number of discrete chips. An intermediate version of this circuit used an inverter to ensure that one counter was always inhibited and one counter was always active. I eliminated the inverter by stealing one bit of one counter. I was concerned that formal analysis of states may require consideration of 2^2N (or 2^8N) initial states. Thankfully, it reduces to four major cases.

In normal operation, one counter always has top bits which are 01 or 10 and this counter is cross-wired such that one counter is inhibited. When the circuit is powered up, state 00 or 11 may be encountered. State 00 causes both counters to be inhibited. Thankfully, cross-wiring causes one counter to be re-loaded on the next cycle and this is sufficient to return to a good state. If the initial state is 11, both counters are uninhibited. One or both counters may re-load before the system settles into good states.

Functionality of /LOAD differs wildly across the two counter registers. Where /LOAD is tied to inhibit, the register may repeatedly load junk data before good data is obtained and the counter released. Whereas, for the opposing counter, /LOAD is triggered for one cycle before returning to a hold state.


Attachments:
clock-stretch-multi-chip0-0-6.pdf [263.31 KiB]
clock-stretch-multi-chip0-0-5.odg [18.38 KiB]

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!
PostPosted: Sun May 16, 2021 3:09 pm 

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
I was quite pleased with last week's advance on clock stretching. However, while ruminating in the Sheep Pen, I realized that I've not described much more than a baud rate generator. I am therefore determined to be more original.

The story so far:-


I have been influenced by recent discussion about software programmable clock speed and yet another multi core topology discussion where each core has a dedicated task in a multi-media system. This led to references to previous discussion but omitted any consideration of how to clock or boot a grid of 6502 processors.

Clock Stretching Octo Core And Beyond

One pair of cores may use a slightly asymmetric counting circuit to obtain square waves of opposite polarity and relatively little signal skew. Two pairs of cores may share the counting circuit via AND gates. Consider the typical case where RAM is faster than ROM. When two cores in the same phase access separate banks of RAM, they may do so at the fastest speed. However, if either core (or both cores) access ROM, everything is paused to allow access at a lower speed. This arrangement allows each pair of processors to share one memory-space while also allowing synchronous access to a (typically) smaller pool of memory shared between corresponding pairs of processors.
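The sharing rule can be sketched in Python as a plain AND of per-core signals. This assumes each core presents a "ready" line that is 1 when it needs no stretching and 0 when it is accessing slow memory. The signal polarity and the name `shared_ready` are my assumptions, not details from the schematic.

```python
def shared_ready(core_ready: list[int]) -> int:
    """AND the per-core ready lines; any core on slow memory pauses all of them.

    Assumption: 1 means "no stretch needed", 0 means "stretch requested".
    """
    return int(all(core_ready))


if __name__ == "__main__":
    print(shared_ready([1, 1]))  # both cores in fast RAM
    print(shared_ready([1, 0]))  # one core in ROM
```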

It is possible to scale this process to a larger cluster of synchronous cores. However, as the number of cores increases, the likelihood of accessing slow memory also increases. We also have the problem that a deep tree of AND gates (and the physical distance of signalling) is likely to restrict the technique to eight cores or fewer. Why eight cores? A trivial variation of the circuit allows a latch to set the minimum duration for clock stretching. This may be used to test system stability or operate at low power. When the latch is reset at power-up, outputs AND all clock stretching signals. In this case, all delays are continuously asserted. If the mask is modified, a subset of genuine delay signals may be ignored but a minimum cycle time remains in place. Overall, the maximum number of cores in a cluster is heavily influenced by the fan-in of delay signals, which may include a register to define maximum speed. The eight core limit is derived from two tiers of three input AND gates, of which eight inputs are cores and the ninth input is a software defined speed limit.
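The fan-in arithmetic behind the eight core limit is short enough to write out. This Python sketch assumes a full tree of identical AND gates and one input reserved for the speed limit register. The names are illustrative.

```python
def and_tree_inputs(inputs_per_gate: int, tiers: int) -> int:
    """Number of leaf inputs in a full tree of identical AND gates."""
    return inputs_per_gate ** tiers


TOTAL_INPUTS = and_tree_inputs(3, 2)  # two tiers of three-input AND
RESERVED = 1                          # software defined speed limit
MAX_CORES = TOTAL_INPUTS - RESERVED

if __name__ == "__main__":
    print(TOTAL_INPUTS, MAX_CORES)
```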

Other communication techniques may be used between clusters, such as dual port RAM. This is akin to long distance electrical distribution in which synchronous regions of alternating current are joined with DC ties.

Clock Stretching Beyond 30MHz

It is possible to implement two tiers of clock stretching such that a preliminary tier begins clock stretching before address decode occurs in full. Analysis requires categorization of glue logic chips according to number of inputs, complexity and the consequent propagation delay. In particular, address decode of three or more bits falls into a secondary tier. These are the weights I use when designing but they may not be accurate:-

  • 74HC logic gates with two inputs have 7ns propagation.
  • 74HC139 with three inputs has 12ns propagation.
  • 74HC138 with six inputs has 28ns propagation.
  • 74HC688 has more than 30ns propagation.
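For a feel of why the slower parts fall into a secondary tier, this Python sketch compares the rule-of-thumb delays above against a timing budget. The budget of half a clock cycle is an assumption of mine, purely for illustration. The 74HC688 figure is entered as 30ns, which is only a lower bound.

```python
# Rule-of-thumb propagation delays from the list above, in nanoseconds.
DELAY_NS = {
    "74HC two-input gate": 7,
    "74HC139": 12,
    "74HC138": 28,
    "74HC688": 30,  # stated as "more than 30ns", so this is a lower bound
}


def cycle_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds."""
    return 1000.0 / clock_mhz


def fits(part: str, clock_mhz: float, fraction: float = 0.5) -> bool:
    """True if the part's delay fits within `fraction` of one clock cycle.

    Assumption: the 0.5 default (half a cycle) is an illustrative budget.
    """
    return DELAY_NS[part] <= cycle_ns(clock_mhz) * fraction


if __name__ == "__main__":
    for part in DELAY_NS:
        print(f"{part}: fits at 30 MHz = {fits(part, 30.0)}")
```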

Casual use of the 74x688 has fallen out of favor due to excessive propagation delay (except in video generation, where it should also be retired). Likewise, it should be apparent that any address decode for a system exceeding 30MHz should increasingly look like the Garth Wilson Special, where only the top two bits of the address-space are significant. In the preferred embodiment, A14 AND A15 is LOW when the address is 0-48KB and A14 NAND A15 is LOW when the address is 48-64KB. This may be run in parallel with dual core bank switching. The result is broadly compatible with several popular memory maps which use similar arrangements to reduce circuitry and increase speed.
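The two-bit decode can be modelled in a few lines of Python. This is a sketch of the logic only, with no timing, and the function name `decode_top_bits` is made up for the example.

```python
def decode_top_bits(addr: int) -> tuple[int, int]:
    """Return (A14 AND A15, A14 NAND A15) for a 16 bit address.

    The first value is LOW (0) for 0-48KB; the second is LOW (0) for 48-64KB.
    """
    a14 = (addr >> 14) & 1
    a15 = (addr >> 15) & 1
    and_out = a14 & a15
    nand_out = 1 - and_out
    return and_out, nand_out


if __name__ == "__main__":
    for addr in (0x0000, 0x4000, 0x8000, 0xBFFF, 0xC000, 0xFFFF):
        print(f"{addr:04X}", decode_top_bits(addr))
```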

After the first phase of clock stretching has been established, address decode may ripple through two or more tiers of 74x138 and selectively request further delay to the secondary tier of the clock stretching circuit. For example, 48-50KB may be slow user defined I/O and 56-64KB may be slow boot ROM. In general, any irregular map of clock stretching may be defined.
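An irregular map of this kind can be written as a simple range table. In this Python sketch, the two address ranges are the examples given above. The table layout and the name `secondary_stretch` are hypothetical, and no particular wait state counts are implied.

```python
from typing import Optional

KB = 1024

# (start, end, label); end is exclusive. Ranges are the examples from the post.
SLOW_REGIONS = [
    (48 * KB, 50 * KB, "slow I/O"),
    (56 * KB, 64 * KB, "slow boot ROM"),
]


def secondary_stretch(addr: int) -> Optional[str]:
    """Return the slow region an address falls in, or None if no extra delay."""
    for start, end, label in SLOW_REGIONS:
        if start <= addr < end:
            return label
    return None


if __name__ == "__main__":
    for addr in (0x1000, 0xC000, 0xC800, 0xE000, 0xFFFF):
        print(f"{addr:04X}", secondary_stretch(addr))
```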

In the minimal case, two tier, dual core clock stretching requires four 74HC161 chips. However, this arrangement may be restricted to two tiers of +2 wait state only, unless video phase and symmetric access to the slowest I/O are sacrificed. If this is acceptable then six inputs are available for clock stretching. It may be preferable to have two fast counters and four slow counters. The fast tier is restricted to +2 operation while the slow tier obtains a 3x multiplier chosen at the end of the first fast cycle. In this arrangement, phase count is always odd. Specifically, the product of phases is (1 + 2^F)(1 + 2^S) where 2^F and 2^S are never one.
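The claim that the phase count is always odd follows from both factors being odd when F and S are at least 1. A quick Python enumeration shows it for small values. The range of exponents tried here is arbitrary.

```python
from itertools import product


def phase_count(f: int, s: int) -> int:
    """Product of phases (1 + 2^F)(1 + 2^S) for exponents F, S >= 1."""
    if f < 1 or s < 1:
        raise ValueError("2^F and 2^S must never be one, so F and S must be >= 1")
    return (1 + 2 ** f) * (1 + 2 ** s)


if __name__ == "__main__":
    for f, s in product(range(1, 4), repeat=2):
        print(f, s, phase_count(f, s))
```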

Unfortunately, each tier of clock stretching incurs a minimum of two ticks per cycle. Therefore, a two tier clock stretching circuit requires an oscillator which runs at four times the fastest processor clock. For example, a 120MHz oscillator requires a 120MHz tier of counters and a 60MHz tier of counters to obtain a maximum processor clock of 30MHz (a minimum cycle time of 33.3ns). It is otherwise suitable to approximate the cycle time of each section of a memory map with 33.3ns granularity or better.
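The factor of four can be sketched in Python as a chain of divide-by-two stages, assuming each tier halves the clock it receives when no stretching is requested. The function names are illustrative.

```python
def tier_rates_mhz(oscillator_mhz: float, tiers: int) -> list[float]:
    """Clock rate seen at the oscillator and after each tier.

    Assumption: every tier costs a minimum of two ticks, i.e. divides by two.
    """
    return [oscillator_mhz / (2 ** i) for i in range(tiers + 1)]


def period_ns(mhz: float) -> float:
    """Clock period in nanoseconds."""
    return 1000.0 / mhz


if __name__ == "__main__":
    rates = tier_rates_mhz(120.0, 2)
    print(rates, round(period_ns(rates[-1]), 1))
```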

Square Wave Video Generation

I wondered if it is possible to make counters with more than two phases. I have also considered audio and visual applications for square waves with arbitrary duty and duration. Indeed, it is possible to generate horizontal sync, left porch, playfield address, right porch and corresponding vertical signals in a manner which may be hard-wired or software configurable, without the use of 74x688 comparators. Indeed, I believe that it is possible to arrange a ring of four registers where one or three self-inhibit. I also believe that 1920*1080*60FPS display requires a 16 bit shift register running at 160MHz and 12 74HC161 chips running at 10MHz or less. I already have a ring of 2*2 counters running at 25MHz.
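The relationship between the 160MHz shift register and the 10MHz counters is simple division, as this Python sketch shows. It also computes the raw active pixel rate for 1920*1080 at 60FPS, ignoring blanking intervals, which push the real pixel clock higher. The function names are made up for the example.

```python
def word_rate_mhz(shift_clock_mhz: float, word_bits: int) -> float:
    """Rate at which a shift register of word_bits must be reloaded."""
    return shift_clock_mhz / word_bits


def active_pixel_rate_hz(width: int, height: int, fps: int) -> int:
    """Visible pixels per second, with no allowance for blanking."""
    return width * height * fps


if __name__ == "__main__":
    print(word_rate_mhz(160.0, 16))
    print(active_pixel_rate_hz(1920, 1080, 60))
```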

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


PostPosted: Sun May 16, 2021 8:13 pm 

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
If you are using Xilinx FPGAs, you can use LUTs as up-to-16 or 32-bit shift registers. A slice fits two (four in newer FPGAs); if you pick mutually-prime lengths and load each with a single 1 bit, they form a pulse delay multiplier of sorts. A single slice has enough circuitry to detect a coincident pulse. For instance, 9- and 11-bit shift registers will produce one coincident pulse every 99 clocks, etc.

A mechanical equivalent is two gears with mutually prime numbers of teeth. A complete cycle is the product of the tooth counts.
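The coincidence behaviour is easy to simulate in software. This Python sketch models each shift register as a ring with a single 1 bit, tracked by its position, and counts clocks until both bits return to the start together. It is a behavioural model only and says nothing about how the LUTs are configured.

```python
def coincidence_period(len_a: int, len_b: int) -> int:
    """Clocks until two rings, each holding a single 1 bit, line up again."""
    pos_a = pos_b = 0
    ticks = 0
    while True:
        ticks += 1
        pos_a = (pos_a + 1) % len_a  # rotate ring A by one bit
        pos_b = (pos_b + 1) % len_b  # rotate ring B by one bit
        if pos_a == 0 and pos_b == 0:
            return ticks


if __name__ == "__main__":
    print(coincidence_period(9, 11))
    print(coincidence_period(4, 6))
```

When the lengths share a common factor, as with 4 and 6, the period is the least common multiple rather than the product. That is why mutually prime lengths are chosen.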

These can be easily combined to, for instance, generate VGA timing with 3 slices.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut

