6502.org Forum
All times are UTC

Posted: Sat Oct 28, 2023 11:52 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Over in DrJeffyl's thread "RDY vs CLOCK STRETCHING. Includes 2 very simple circuits", I posted a potential circuit for supporting clock stretching to match a slower, derived clock, and he asked me to start a separate thread for it as it's a different goal. So I'll elaborate on that topic here and share my circuit. I'm also interested if anyone has other such circuits to share, or examples of where this is useful.

One of Jeff's circuits used a '163 counter to perform the stretching, and also provided a regular, unstretched clock output for driving VIAs. In some situations, however, you may want this consistent clock to be slower than the normal CPU clock, and then you'd want the stretching to bring the CPU clock into sync with the slower clock on certain cycles. For example, we may be using parts like 6522 VIAs that require a consistent clock signal on PHI2 but don't support the speed we're running the CPU at. Or, as in Paganini's recent case, we may be struggling to get all of the address decoding done early enough before the normal rising edge of PHI2; if some of the devices are 6522 VIAs, the options are limited. This approach should support both cases.

So the goal is to run PHI2 at a high frequency, e.g. 16MHz, and also generate a slower clock from the same source (let's call it SLOWCLK) at, for example, half that frequency: 8MHz. When the CPU performs an operation on a slow device, we need to bring the two clocks into sync. In particular we want PHI2's falling edge to coincide with SLOWCLK's, but it's also beneficial for SLOWCLK to have been low for a healthy amount of time first, so that slow devices have plenty of time to react to being selected and to the state of the address bus, and so that fine-grained address decoding has plenty of time to complete.

In normal operation then, PHI2 and SLOWCLK will be ticking along, but from time to time PHI2 will be held high to synchronise them:
Code:
    WSE:0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0
   PHI2: 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
Here WSE is an input signal that we change in between clock ticks - it means wait-state enable (terminology taken from BDD's POC v1.4) and is an output of address decoding. It's generally low, indicating no need to stretch the clock, and PHI2 runs at its usual rate; but if WSE goes high before the usual falling edge of PHI2 then PHI2 is held high and the current cycle is extended to include a full low phase of SLOWCLK followed by a high phase.

Now we can connect SLOWCLK to e.g. the PHI2 input of a VIA. But we also need to make sure we activate the VIA's chip-select inputs at the right times. We can't do that purely based on the addresses coming from the CPU because that could overlap with the wrong SLOWCLK cycle. So we also need to generate a SLOWCS signal to gate these chip-selects with, like so:
Code:
    WSE:0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0
   PHI2: 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0
We can connect SLOWCS to the VIA's active-high select line, or just include it in our regular address decoding process, and the VIA will only be activated during the correct slow clock cycle.
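As a small illustration of the gating, this Python sketch copies the SLOWCLK and SLOWCS rows from the diagram and ANDs SLOWCS with a hypothetical VIA address-decode signal (`via_addr_hit`, made up for this example) that becomes valid a couple of ticks early, while the previous SLOWCLK high phase is still running. Ungated, the VIA would see the tail of the wrong slow cycle; gated, it sees exactly one SLOWCLK low phase followed by one high phase.

```python
# SLOWCLK and SLOWCS rows copied from the diagram above (one entry per CLK tick).
SLOWCLK = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
SLOWCS = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical decode output: the VIA's address is on the bus from tick 9,
# i.e. while the previous SLOWCLK high phase (ticks 9-10) is still running.
via_addr_hit = [0] * 9 + [1] * 7 + [0] * 4

# Ungated, the VIA would also be selected during ticks 9-10 (wrong slow cycle).
ungated = [clk for clk, hit in zip(SLOWCLK, via_addr_hit) if hit]

# Gated by SLOWCS, it is selected only for one full low-then-high SLOWCLK cycle.
via_cs = [hit & cs for hit, cs in zip(via_addr_hit, SLOWCS)]
gated = [clk for clk, cs in zip(SLOWCLK, via_cs) if cs]

print("ungated sees SLOWCLK:", ungated)
print("gated sees SLOWCLK:  ", gated)
```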

So that's the goal. This would be pretty easy to do in a PLD, and lots of people happily and successfully use PLDs for this sort of thing, but others have their own reasons for avoiding PLDs in clock generation, and the aim here is to cater for them, whatever the reason.

So here's the circuit I've designed to do it, extended from DrJeffyl's circuits:
[Attachment: clockstretch_2x163_slowclk2.png. Simple clock stretching to match a derived, half-rate clock signal]
Note that in this circuit WSE has been inverted - so the signal feeding into the NAND gate is high if the cycle should be a fast cycle, and low if it should be slow.

As with DrJeffyl's circuit, the input clock oscillator is double the frequency that PHI2 generally ends up being (e.g. CLK=32MHz, PHI2=16MHz); but here we also have SLOWCLK which is half of the regular PHI2 frequency (e.g. SLOWCLK=8MHz). I have some other circuits for greater divisors, in case you need the slow clock to be even slower, but I'll post those separately.

The main change in this circuit, compared with DrJeffyl's 163-based one, is that counter U2, which generates PHI2, has its low bits loaded from counter U1, which generates SLOWCLK. These bits can be read as a negative number giving how many clock ticks remain before the falling edge of SLOWCLK. For example, if both were set, then (imagining the higher bits are all set too) that would represent -1 in two's complement, meaning one more tick is needed before SLOWCLK falls, which is indeed the case because on the next tick U1's low bits will roll over to 00. However, as U2 loads from U1 on the same clock edge that U1 increments, U2 is going to lag a tick behind U1. To remedy that, the relevant output of U1 is passed through a separate D flip-flop (U4A) to form SLOWCLK before being used in the rest of the circuit, bringing it back into sync.

Whenever PHI2 is low, U2 will be reloaded with its high bit set, its next lower bit clear, and the lowest two bits taken from U1. U2 then ticks 1-4 more times until a falling edge of SLOWCLK, then 4 more times, all with its top output bit (PHI2) high, and finally wraps around to zero, causing PHI2 to fall in sync with another SLOWCLK falling edge at least 4 CLK ticks later. These 4 guaranteed ticks ensure that there will definitely be a low and then a high phase of SLOWCLK in between.
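The arithmetic in the two paragraphs above can be checked with a small Python sketch (the function names are hypothetical, chosen for illustration). It reads U1's low two bits as a negative number, forms the value U2 is loaded with, and confirms that PHI2's high time is always the 1-4 ticks to the next SLOWCLK fall plus the 4 guaranteed ticks. Read as 4-bit two's complement, the load values run from -8 to -5.

```python
def ticks_to_slowclk_fall(low):
    """Ticks until U1's low two bits roll over to 00 (and SLOWCLK falls).

    Reading the bits as a negative number with the higher bits imagined set:
    0b11 -> -1, 0b10 -> -2, 0b01 -> -3, 0b00 -> -4. The magnitude is the
    answer.
    """
    as_negative = (low & 0b11) - 4
    return -as_negative


def u2_load(low):
    """U2's load value (D3=1, D2=0, D1..D0 = low) and what it implies."""
    value = 0b1000 | (low & 0b11)
    as_signed = value - 16          # 4-bit two's complement: 8..11 -> -8..-5
    phi2_high_ticks = 16 - value    # ticks until U2 wraps and PHI2 (Q3) falls
    return value, as_signed, phi2_high_ticks


for low in range(4):
    value, signed, high = u2_load(low)
    # PHI2 stays high for the 1-4 ticks to the next SLOWCLK fall, plus 4 more.
    assert high == ticks_to_slowclk_fall(low) + 4
    print(f"low={low:02b}  load={value:04b} ({signed})  PHI2 high {high} ticks")
```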

U2's second-highest bit is initialised clear, but will get set after the next falling edge of SLOWCLK, and then stay set until PHI2 falls. This is exactly the signal we need for SLOWCS - it is high during the specific low-then-high phases of SLOWCLK, and low at all other times.

The final extension to DrJeffyl's circuit is the input to U2's /MR pin. If this is high, then U2 will tick through a stretched cycle as described above; but if it is low then U2 will get cleared to zero on the next CLK tick, causing PHI2 to go low. We use this to curtail the stretched cycles if PHI2 is high but the incoming WSE signal does not request stretching. This means that unstretched cycles only have one CLK tick in the high state before going low again.

Low phases of PHI2 are always just a single tick because U2's /PE pin is driven by PHI2, so when PHI2 is low we always load U2 with its top bit set on the next cycle.
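To make the interplay of /PE, /MR and the U1-to-U2 load easier to follow, here is a rough cycle-level Python model of the circuit as described above. It is only a sketch, not the Arduino test program: it assumes both '163s have their count enables tied high, /PE is driven directly by PHI2, /MR comes from NAND(PHI2, /WSE), and U2 loads 1, 0, U1.Q1, U1.Q0. The `simulate` function and its parameters are hypothetical. Starting with U1 at 1 and PHI2 low, it reproduces the PHI2, SLOWCLK and SLOWCS rows of the "slow cycle with delay 0" test output; starting with U2 at 8 (PHI2 high) gives the out-of-phase cases.

```python
# Rough cycle-level model of the 2x'163 clock stretcher (SLOWCLK = PHI2/2).
# Assumptions (a sketch, not taken from the actual netlist):
#   - both '163s have CEP/CET tied high, so they count on every CLK edge
#   - U2 /PE = PHI2, so U2 loads whenever PHI2 is low
#   - U2 /MR = NAND(PHI2, /WSE), a synchronous clear that beats the load
#   - U2 loads D3..D0 = 1, 0, U1.Q1, U1.Q0
#   - SLOWCLK is U1.Q1 re-registered by the U4A flip-flop


def simulate(wse, u1=1, u2=0):
    """Clock the model once per entry of `wse` (WSE active-high).

    `u1` and `u2` are the counter states before the first CLK edge.
    Returns the post-edge PHI2, SLOWCLK and SLOWCS values, one per edge.
    """
    trace = {"PHI2": [], "SLOWCLK": [], "SLOWCS": []}
    for w in wse:
        phi2 = (u2 >> 3) & 1
        mr_n = not (phi2 and not w)      # low only if PHI2=1 and WSE=0
        if not mr_n:
            next_u2 = 0                  # curtail: PHI2 falls on this edge
        elif not phi2:
            next_u2 = 0b1000 | (u1 & 0b11)  # /PE low: load 1,0,Q1,Q0
        else:
            next_u2 = (u2 + 1) & 0xF     # stretched cycle: keep counting
        slowclk = (u1 >> 1) & 1          # U4A samples U1.Q1 before it moves
        u1 = (u1 + 1) & 0xF              # U1 free-runs
        u2 = next_u2
        trace["PHI2"].append((u2 >> 3) & 1)
        trace["SLOWCLK"].append(slowclk)
        trace["SLOWCS"].append((u2 >> 2) & 1)
    return trace


# "Testing slow cycle with delay 0": WSE high for 8 ticks, then low.
result = simulate([1] * 8 + [0] * 4)
for name in ("PHI2", "SLOWCLK", "SLOWCS"):
    print(f"{name:>7}:", " ".join(str(v) for v in result[name]))
```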

I hope this explanation makes sense - there's quite a lot going on in a rather small circuit. I have built this on a breadboard for testing, and wired it up to an Arduino Mega to run a full test suite of all the various phases that PHI2 and SLOWCLK could be in. I'll post the output of that test program below, but here are some waveform diagrams to illustrate:
[Attachment: clockstretch_2x163_slowclk_waveforms.png. Waveforms for critical signals in the circuit, when WSE is asserted near a rising edge of PHI2 at various points during SLOWCLK]
This shows the critical signals in the circuit. It includes four separate PHI2 lines, covering the various phase relationships. In each case, assume WSE was raised around the rising edge of PHI2 in one of these lines; it could occur before or after that edge without affecting the result. In the first PHI2 trace, the rising edge coincides with a falling edge of SLOWCLK, so we load U2 with the value -8 and in fact wait for two full cycles of SLOWCLK before continuing. This ensures we get a full low phase of SLOWCLK with SLOWCS high, even if WSE was late arriving. The next trace down shows the response when PHI2's rising edge is one CLK tick later; in this case U2 gets loaded with -7, and so on. The lower traces in this diagram show the states of U2's outputs for the PHI2(-8) case in particular.

Some of these phases are actually going to be quite rare: they'd only occur on initial startup, because once PHI2 has been synchronised with SLOWCLK, it only changes phase by a multiple of two CLK cycles at a time. But it's still important to handle those cases, otherwise it would never get in sync, so my Arduino test program has a way to force them to happen and test the result. The Arduino code also tests WSE rising or falling on both sides of the leading edge of PHI2, to make sure both behave the same way.

I have designed this circuit into a simple 6502-based computer to test it out further, but not built it yet:
[Attachment: clockstretch_2x163_6502basedcomputer.png. Full 6502-based computer schematic based on this clock stretching technique]
I'm expecting this to be able to run into the 30+MHz range, as the CPU/RAM core is the same as in my fast PDIP design and there aren't too many I/O devices to bog things down. The address decoding is also quite coarse but it should be fine to tighten that up and add more RAM in the upper half of the address space. It will make WSE more complex to determine, but it doesn't really matter much if that is slow, so long as it's ready by the end of the usual PHI2-high phase.

Here's the output of the test program, showing the critical signal states in each test case, which might illustrate how these things work together. Note that in this output WSE is active-high again, not inverted.
Code:
Testing normal fast cycles
    WSE:0 0 0 0 0 0 0 0
   PHI2: 1 0 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 0 0 0 0 0

Testing slow cycle with delay 0
    WSE:1 1 1 1 1 1 1 1 0 0 0 0
   PHI2: 1 1 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing slow cycle with delay 1
    WSE:0 1 1 1 1 1 1 1 0 0 0 0
   PHI2: 1 1 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing slow cycle with delay 2
    WSE:0 0 1 1 1 1 1 1 0 0 0 0
   PHI2: 1 0 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing slow cycle with delay 3
    WSE:0 0 0 1 1 1 1 1 0 0 0 0
   PHI2: 1 0 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing two slow cycles in a row
    WSE:1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   PHI2: 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0

Testing delayed fall of SLOW after slow cycle
    WSE:1 1 1 1 1 1 1 1 1 0 0 0
   PHI2: 1 1 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing putting PHI2 and SLOWCLK out of phase
    WSE:0 0 0 0
   PHI2: 0 1 0 1
SLOWCLK: 0 1 1 0
 SLOWCS: 0 0 0 0

Testing slow cycle from out-of-phase with delay 0
    WSE:1 1 1 1 1 1 1 1 0 0 0 0
   PHI2: 1 1 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing slow cycle from out-of-phase with delay 1
    WSE:0 1 1 1 1 1 1 1 0 0 0 0
   PHI2: 0 1 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing slow cycle from out-of-phase with delay 2
    WSE:0 0 1 1 1 1 1 1 0 0 0 0
   PHI2: 0 1 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 0

Testing slow cycle from out-of-phase with delay 3
    WSE:0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0
   PHI2: 0 1 0 1 1 1 1 1 1 1 1 0 1 0 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0

Testing two slow cycles in a row from out-of-phase
    WSE:0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   PHI2: 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
SLOWCLK: 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
 SLOWCS: 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0


Posted: Sat Oct 28, 2023 11:53 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
It's possible that you may want or need the slow clock to be slower than half the speed of PHI2. For example, in my Fast PDIP computer I typically run the CPU at about 32MHz, and the I/O module at 8MHz. If I were to use the circuit above, the I/O clock would be 16MHz, which is technically too fast for most EEPROMs. So I really want to divide the clock by more than 2 - perhaps 4, or even 8, to get an I/O speed of 8MHz or 4MHz.

As far as I can see, doing so requires more ICs, because U2 in the circuit above has all its bits in use. The extensible way to do this seems to be simply adding another counter IC, like so:
[Attachment: clockstretch_3x163_slowclk.png. Clock stretching circuit for SLOWCLK = PHI2/4 or 8 or ...]
This leads to SLOWCLK=PHI2/4, as desired. All I did here was add another counter IC, chained to the existing one (U17, which was called U2 in the earlier circuit), moved PHI2 to the high bit of the new counter, and moved SLOWCLK and SLOWCS down one bit. You could move them down another bit to get SLOWCLK=PHI2/8, and I think it would still work.

Here are the waveforms you get from that, for all eight interesting phase possibilities - again note that half of these are rarely going to occur in practice, but still need to be supported for initialisation:
[Attachment: clockstretch_3x163_slowclk_waveforms.png]
Now it feels like the new counter IC is woefully underutilised. If divide-by-4 is all that's required, I think we can actually replace that counter IC with the second half of the D flip-flop we're already using in the circuit, along with a few NAND gates - so depending on the rest of the circuit, this may be a more compact arrangement:
[Attachment: clockstretch_2x163_2x74_slowclk.png]
Neither of these two has been tested, though, so take them with a pinch of salt! I wanted to share them anyway as possible extensions to the first circuit above. If I make a synchronous version of my fast PDIP computer, it'll probably use one of these, unless somebody comes up with something better!


Posted: Sun Oct 29, 2023 5:46 pm

Joined: Fri Mar 18, 2022 6:33 pm
Posts: 491
This is great stuff George! I wonder if a generalized solution is possible. I mean, a kind of "synchronization" circuit, which takes an arbitrary number of fixed-frequency input clocks and generates a variable Ø2 output that syncs with one of the input clocks based on an address decoding scheme.

E.g., in Blue August I have some 15ns SRAM, 70ns flash ROM, VIAs, and a 4MHz ACIA. The RAM can keep up with a 30MHz CPU clock (at least, I think so. I haven't tried.) The ROM can go about 10MHz or so (actually faster for my 70ns Atmel EEPROM), the VIAs are only limited by the speed of address decoding (since it has to be done by the end of Ø1), and the ACIA can probably go a little faster than 4MHz.

My initial system concept seems very conservative by these numbers: 16MHz RAM speed, 8MHz ROM / VIA speed, 4MHz ACIA speed - but I couldn't figure out a way to get a 3-speed CPU clock out of the clock stretcher.

_________________
"The key is not to let the hardware sense any fear." - Radical Brad


Posted: Sun Oct 29, 2023 7:46 pm

Joined: Fri Feb 17, 2023 11:59 pm
Posts: 163
Location: Lviv, Ukraine
This is really awesome, I've been thinking of ways to interface SID (which requires a stable 1 MHz clock signal) with a faster (8 MHz) CPU clock. Simple clock stretching that I implemented with GAL to address a slower LCD (viewtopic.php?f=4&t=7593) was not an option for SID since the latter wants the clock to remain stable without any stretching. Your approach is really cool, I'm going to try it myself!

EDIT: I've implemented your circuit in Digital to do some tests, and looks like it's working as expected!


Attachment: gfoot_slow_clock.zip

_________________
/Andrew

deck65 - 6502 slab with screen and keyboard | ПК-88 - SBC based on KM1810VM88 (Ukrainian i8088 clone) | leo80 - simple Z80 SBC
nice65 - 6502 assembly linter | My parts, footprints & 3D models for KiCad/FreeCAD

Posted: Tue Oct 31, 2023 12:43 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Paganini wrote:
This is great stuff George! I wonder if a generalized solution is possible. I mean, a kind of "synchronization" circuit, which takes an arbitrary number of fixed-frequency input clocks and generates a variable Ø2 output that syncs with one of the input clocks based on an address decoding scheme.

E.g., in Blue August I have some 15ns SRAM, 70ns flash ROM, VIAs, and a 4MHz ACIA. The RAM can keep up with a 30MHz CPU clock (at least, I think so. I haven't tried.) The ROM can go about 10MHz or so (actually faster for my 70ns Atmel EEPROM), the VIAs are only limited by the speed of address decoding (since it has to be done by the end of Ø1), and the ACIA can probably go a little faster than 4MHz.

My initial system concept seems very conservative by these numbers: 16MHz RAM speed, 8MHz ROM / VIA speed, 4MHz ACIA speed - but I couldn't figure out a way to get a 3-speed CPU clock out of the clock stretcher.

In general I think I'd just run all the peripherals at the speed of the slowest one, unless it is *really* slow, in which case you should probably put it on the far side of a VIA. One consideration with slowing the CPU down to match very slow peripherals is that interrupts that occur in the meantime will be delayed until the end of the cycle, so it will increase the latency of interrupt responses.

That said, it would be fairly easy to extend the circuit to produce additional derived clocks and to sync to them as well. If you compare the first circuit with the ones in my second post, which used a quarter-speed slow clock, the differences are fairly minor. I think you could easily make it produce both slow clocks, along with a separate SLOWCS signal for each; then it's just some minor changes to the way the second counter is loaded, to make it depend on how much stretching is needed:
[Attachment: clockstretch_2x163_175_slowclk2_slowclk4.png]

Here SLOWCLK2 is half the frequency of PHI2 and SLOWCLK4 is a quarter of the frequency of PHI2. WSE2 is active-high enabling stretching to match SLOWCLK2; WSE4 is active-high enabling stretching to match SLOWCLK4; and you shouldn't set both high together. In this circuit, however, these signals need to be valid significantly before the rising edge of PHI2, as they are used to load the counter. So the decoding again needs to be very fast.
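Assuming each stretched cycle follows the same rule as the first circuit (wait for the next falling edge of the selected slow clock, then one full period of it), PHI2's high time can be sketched as below. This is a behavioural model only, not a description of how the '175-based circuit is wired; the function and constant names are hypothetical, and with PHI2 = CLK/2 the periods of SLOWCLK2 and SLOWCLK4 are taken as 4 and 8 CLK ticks.

```python
SLOWCLK2_PERIOD = 4   # CLK ticks per SLOWCLK2 cycle (PHI2/2, with PHI2 = CLK/2)
SLOWCLK4_PERIOD = 8   # CLK ticks per SLOWCLK4 cycle (PHI2/4)


def stretched_high_ticks(ticks_to_fall, period):
    """PHI2 high time for a cycle synced to a slow clock of `period` ticks.

    ticks_to_fall is the distance from PHI2's rising edge to the next
    falling edge of that slow clock (1..period); a full slow cycle follows.
    """
    if not 1 <= ticks_to_fall <= period:
        raise ValueError("ticks_to_fall must be in 1..period")
    return ticks_to_fall + period


def phi2_high_ticks(wse2, wse4, ticks_to_fall2, ticks_to_fall4):
    """Select the stretch from WSE2/WSE4 (active-high, never both)."""
    if wse2 and wse4:
        raise ValueError("WSE2 and WSE4 must not both be high")
    if wse2:
        return stretched_high_ticks(ticks_to_fall2, SLOWCLK2_PERIOD)
    if wse4:
        return stretched_high_ticks(ticks_to_fall4, SLOWCLK4_PERIOD)
    return 1  # unstretched cycle: PHI2 high for a single CLK tick


# Best and worst cases for each kind of cycle:
for wse2, wse4 in ((0, 0), (1, 0), (0, 1)):
    best = phi2_high_ticks(wse2, wse4, 1, 1)
    worst = phi2_high_ticks(wse2, wse4, SLOWCLK2_PERIOD, SLOWCLK4_PERIOD)
    print(f"WSE2={wse2} WSE4={wse4}: PHI2 high for {best}..{worst} CLK ticks")
```

The 5..8 tick range for SLOWCLK2 lines up with the -5..-8 load values of the first circuit, and the eight possible SLOWCLK4 phases give 9..16 ticks.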

Another approach is to do something like what I did in my Fast PDIP system: set a D flip-flop if the cycle needs to be a slow cycle of any kind, hold PHI2 high, and then wait for another signal to say it's OK to proceed. That return signal could come from various other circuits with different delays, depending on the type of I/O operation. I still think this is of limited benefit given the complexity, but it can be done.


Posted: Tue Oct 31, 2023 12:45 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
and3rson wrote:
This is really awesome, I've been thinking of ways to interface SID (which requires a stable 1 MHz clock signal) with a faster (8 MHz) CPU clock. Simple clock stretching that I implemented with GAL to address a slower LCD (viewtopic.php?f=4&t=7593) was not an option for SID since the latter wants the clock to remain stable without any stretching. Your approach is really cool, I'm going to try it myself!

EDIT: I've implemented your circuit in Digital to do some tests, and looks like it's working as expected!

SID is an interesting example, I hope it works well. I have tested the full-computer circuit I posted above (with a minor error fixed), with a serial circuit also attached, and it seems to mostly work though needs some debugging.


Posted: Tue Oct 31, 2023 8:57 pm

Joined: Fri Feb 17, 2023 11:59 pm
Posts: 163
Location: Lviv, Ukraine
Paganini wrote:
I wonder if a generalized solution is possible. I mean, a kind of "synchronization" circuit, which takes an arbitrary number of fixed-frequency input clocks and generates a variable Ø2 output that syncs with one of the input clocks based on an address decoding scheme.

Actually, this really makes me want to try implementing gfoot's solution in a GAL, which would be a life-saver for my builds. I suck at understanding 74xx counters, but I still feel like a '22V10 should be able to handle this schematic, despite having only so many registers. I'll have to do some studying on the '163 and will post here if I have any luck with the GAL.

Posted: Wed Nov 01, 2023 12:23 am

Joined: Fri Feb 17, 2023 11:59 pm
Posts: 163
Location: Lviv, Ukraine
Okay, I think I've got it, and I'm honestly proud of myself to finally be able to do something useful with my GAL besides address decoding & clock dividers. :D
Of course, all credit still goes to gfoot! I merely converted his design into GAL equations.

Some notes:
- I've added /RES input to ensure all registers are in consistent state. It can be driven directly by your computer's reset circuit.
- First '163 Q-outputs are A2,A1,A0 (A3 is not used)
- Second '163 Q-outputs are SLOWCS,A2,A1,A0 (since A3 is directly tied to SLOWCS)
- Second '163 equations are duplicated so that the counter can be cleared with NAND(PHI2, /WSE), which is represented as /PHI2 + WSE as per De Morgan's theorem
- SLOWCLK division can be configured by changing the equation to use A1 instead (unfortunately, there's no way to use A3 since we're out of GAL outputs and thus we're not counting A3)
- SLOWCS can easily be made active-low by inverting its pin definition (line 5)
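The De Morgan step in the notes above can be checked exhaustively with a tiny Python truth table (a sketch; `nand` is just a helper defined here). WSE is taken as the active-high "stretch requested" signal, so the result, which drives the counter's active-low clear, is low only when PHI2 is high and no stretch is requested.

```python
from itertools import product


def nand(a, b):
    return int(not (a and b))


mr_n = {}
for phi2, wse in product((0, 1), repeat=2):
    lhs = nand(phi2, 1 - wse)         # NAND(PHI2, /WSE), as wired
    rhs = int((not phi2) or wse)      # /PHI2 + WSE, as in the note
    assert lhs == rhs
    mr_n[(phi2, wse)] = lhs

for (phi2, wse), value in sorted(mr_n.items()):
    print(f"PHI2={phi2} WSE={wse} -> clear_n={value}")
```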

Code:
GAL22V10
WSESync

CLK  /RES  /WSE   NC    NC       NC    NC    NC    NC    NC      NC    GND
/OE   A0    A1    A2    SLOWCLK /BSR   B0    B1    B2    SLOWCS  PHI2  VCC

; A2..A0 - 4-bit counter (1st '163, bit 3 is not used), 0 on reset

A0.R  = /A0 * /RES

A1.R  =  A0 * /A1 * /RES
      + /A0 *  A1 * /RES

A2.R  = /A2 *  A1 *  A0 * /RES
      +  A2 * /A1       * /RES
      +  A2 * /A0       * /RES

; SLOWCS,B2,B1,B0 - 4-bit counter (2nd '163), 0 on reset, SLOWCS is bit 3
; Set = /PHI2
; Reset = PHI2 & /WSE

BSR  =  PHI2 * /WSE

B0.R  =  A0 * /PHI2 * /RES
      + /B0 * /BSR  * /RES

B1.R  =  A1         * /PHI2 * /RES
      +  B0 * /B1   * /BSR  * /RES
      + /B0 *  B1   * /BSR  * /RES

B2.R  =  A2             * /PHI2 * /RES
      + /B2 *  B1 *  B0 * /BSR  * /RES
      +  B2 * /B1       * /BSR  * /RES
      +  B2 * /B0       * /BSR  * /RES

; Bit 3 of 2nd '163
SLOWCS.R = /SLOWCS *  B2 *  B1 *  B0 *  /BSR * /RES
         +  SLOWCS * /B2             *  /BSR * /RES
         +  SLOWCS * /B1             *  /BSR * /RES
         +  SLOWCS * /B0             *  /BSR * /RES

; SLOWCLK output, 0 on reset
SLOWCLK.R = A2 * /RES

; PHI2 output, 1 on reset
/PHI2.R =  B0 *  B1 *  B2 *  SLOWCS * /RES + PHI2 * /WSE * /RES

DESCRIPTION
Implementation of gfoot's wait-state clock synchronization circuit, with credits to BDD ("wait-state-enable") & Dr Jefyll (original ideas)
Original thread: http://forum.6502.org/viewtopic.php?f=4&t=7798


The '22V10 was barely enough to hold everything, but my hunch turned out to be correct: 10 registers is all we need for this task.
I've slightly simplified the schematic to be more GAL-friendly:
[Attachment: gfoot_schematic_simplified.png]


I haven't tested all the edge cases, so here's the basic test I did (my recently developed GAL solver turned out to be surprisingly handy here :lol:)
[Attachment: gfoot_schematic_test.png]

Code:
RES     __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

WSE     ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\____________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

A0      __/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__

A1      _____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾

A2      ___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾

BSR     ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾

B0      _____/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾

B1      ___________/‾‾\________/‾‾\________/‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\_____/‾‾‾‾‾\________/‾‾\________/‾‾\________/‾‾\________/‾‾\________/‾‾\_____

B2      _________________/‾‾\__/‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\______________/‾‾\__/‾‾\______________/‾‾\__/‾‾\______________/‾‾

SLOWCLK ______________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾

SLOWCS  __________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_________________________________________________________________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾


EDIT: FYI - the above equations have not been tested with a real GAL, and I have no idea whether they are truly doing what they should.
EDIT 2: Fixed some copy-paste typos in GAL assembly.
EDIT 3: I totally forgot to actually load B2..B0 with A2..A0... So the above code is totally wrong. I'll post updated version in the morning. There's at least one register that may be saved.
EDIT 4: Updated the post and the schematic (added a few signal delays); I think it's now behaving exactly as it should.

Posted: Wed Nov 01, 2023 11:50 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Nice! What's the ".R" syntax, I haven't seen that used for these devices in the past - are you using a different compiler to WinCUPL? Normally I'd use ".D" here as we want the outputs to be configured as D flipflops.

I think if you've directly translated the circuit, it's probably using a few more macrocells than it needs to. Things like delaying SLOWCLK are probably unnecessary in a PLD because we can instead arrange for the other signals to happen one cycle earlier.

To elaborate a bit more - the reason I delay SLOWCLK through an extra flipflop is that B is loaded synchronously from A, while A is reloaded with A+1. This means that generally B lags A by one tick. But in the PLD we can just as easily load B with A+1 in the first place, so then A and B will be exactly equal and there'll be no need to delay SLOWCLK.

I'd also note that once that's done, A0=B0 at (almost) all times (unless I'm missing something) so there's another possible saving there.
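The lag and its PLD fix can be seen with a short Python sketch (a hypothetical helper that compares only the low two bits that matter here). Loading B from A's pre-edge value, as the '163 pair does, leaves B one count behind; loading B with A+1 on the same edge makes them identical, so A0 = B0 for as long as B keeps counting rather than being cleared.

```python
def low_bits_after_load(load_a_plus_1, start=5, ticks=6):
    """Return (A, B) low-bit pairs for a few ticks after B is loaded from A.

    A free-runs. On the first edge B is loaded either with A's current
    (pre-edge) value, like the '163 pair, or with A+1, like the PLD version;
    both counters then increment together.
    """
    a = start
    b = (a + 1) & 0xF if load_a_plus_1 else a   # value B takes on the edge
    a = (a + 1) & 0xF                           # A increments on that edge
    pairs = []
    for _ in range(ticks):
        pairs.append((a & 0b11, b & 0b11))
        a, b = (a + 1) & 0xF, (b + 1) & 0xF
    return pairs


lagging = low_bits_after_load(False)
aligned = low_bits_after_load(True)
print("'163 pair:", lagging)
print("PLD load :", aligned)
```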

Oh and if you'd like, I can post the Arduino program I used for testing - you could use that to test the device when you've burned it. Though my test program is for the divide-by-two case and I think you implemented the divide-by-four one, so you'd need to make some updates.


Posted: Wed Nov 01, 2023 12:49 pm

Joined: Fri Feb 17, 2023 11:59 pm
Posts: 163
Location: Lviv, Ukraine
gfoot wrote:
Nice! What's the ".R" syntax, I haven't seen that used for these devices in the past - are you using a different compiler to WinCUPL? Normally I'd use ".D" here as we want the outputs to be configured as D flipflops.

Thanks! I'm using galasm: it's CLI-friendly and was the first GAL assembler I found while researching, and I also like its choice of operators (+ and *), so I've been using it ever since. .R stands for "register": it defines an equation that is evaluated and latched on CLK, so a D flip-flop in galasm is simply "Q.R = D". Luckily all your components use the same clock, so I didn't need something like a GAL20RA10 (which supports individual clocks for each register).

gfoot wrote:
I think if you've directly translated the circuit, it's probably using a few more macrocells than it needs to. Things like delaying SLOWCLK are probably unnecessary in a PLD because we can instead arrange for the other signals to happen one cycle earlier.

To elaborate a bit more - the reason I delay SLOWCLK through an extra flipflop is that B is loaded synchronously from A, while A is reloaded with A+1. This means that generally B lags A by one tick. But in the PLD we can just as easily load B with A+1 in the first place, so then A and B will be exactly equal and there'll be no need to delay SLOWCLK.

I'd also note that once that's done, A0=B0 at (almost) all times (unless I'm missing something) so there's another possible saving there.


Good point! I was actually wondering why you used a flip-flop there, since I blindly carbon-copied that part into my equations in a cargo-cult fashion. If we can save some registers, that means we could have more bits for clock division, which is great!

gfoot wrote:
Oh and if you'd like, I can post the Arduino program I used for testing - you could use that to test the device when you've burned it. Though my test program is for the divide-by-two case and I think you implemented the divide-by-four one, so you'd need to make some updates.


This would be very helpful - I'd love to see if my implementation behaves the same way as you intended. :)

Also, one more comment/question: I had to add several delays in Digital (e.g. 1st '163 Q2 -> delay -> top flip-flop); they are shown as rectangles with a wide "H"-like shape. I did this to ensure that the flip-flop doesn't latch D in the same clock cycle as the 1st '163, since (if I understand correctly) it should take an extra cycle for D2 to propagate into the flip-flop. Is my assumption correct? Is there any chance of a race condition here (e.g. if the top flip-flop latches D late due to clock line capacitance and thus records D2 one cycle early)?

_________________
/Andrew

deck65 - 6502 slab with screen and keyboard | ПК-88 - SBC based on KM1810VM88 (Ukrainian i8088 clone) | leo80 - simple Z80 SBC
nice65 - 6502 assembly linter | My parts, footprints & 3D models for KiCad/FreeCAD


Top
 Profile  
Reply with quote  
PostPosted: Wed Nov 01, 2023 2:07 pm 

Joined: Fri Feb 17, 2023 11:59 pm
Posts: 163
Location: Lviv, Ukraine
Update: it turns out you were right - with a PLD it's easy to get rid of the 2nd '163 entirely. Here's the new code:

Code:
GAL22V10
WSESync

CLK  /RES  /WSE   NC    NC       NC      NC      NC     NC    NC    NC    GND
/OE   A0    A1    A2    A3       SLOWCLK SLOWCS  SR     PHI2  NC    NC    VCC

; A3..A0 - 4-bit counter (1st '163, bit 3 is not used), 0 on reset
; A3 only counts while WSE is asserted and is held at 0 otherwise

A0.R  = /A0 * /RES

A1.R  =  A0 * /A1 * /RES
      + /A0 *  A1 * /RES

A2.R  = /A2 *  A1 *  A0 * /RES
      +  A2 * /A1       * /RES
      +  A2 * /A0       * /RES

A3.R  = /A3 *  A2 *  A1 *  A0 * /RES * WSE
      +  A3 * /A2             * /RES * WSE
      +  A3 * /A1             * /RES * WSE
      +  A3 * /A0             * /RES * WSE

; Used to hold PHI2 high if it's high & /WSE is asserted
SR   =  PHI2 * /WSE

SLOWCLK = A2
SLOWCS  = A3

; PHI2 output, 1 on reset
/PHI2.R = /A0   * /A1  * /A2 * /A3 * /SR * /RES
        +  PHI2 * /WSE * /RES

DESCRIPTION
Implementation of gfoot's wait-state clock synchronization circuit, with credits to BDD ("wait-state-enable") & Dr Jefyll (original ideas)
Original thread: http://forum.6502.org/viewtopic.php?f=4&t=7798


Notes:
- A3..A0 is a 4-bit counter. The main difference from the '163 is that A3 only counts while /WSE is asserted. Thus A3 & A2 together provide SLOWCS & SLOWCLK.
- SLOWCLK & SLOWCS are aliases for A2 & A3, added only for clarity. Using A2 & A3 directly saves two more pins, leaving 4 free outputs in total!
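The four product terms of the A3.R equation reduce to a gated toggle. A3 flips when A2..A0 are all 1, and it is forced to 0 whenever WSE is false. The short Python check below compares the equation as written against the compact form WSE AND (A3 XOR (A2 AND A1 AND A0)) for all 32 input combinations. It is a sketch with two assumptions: signals are plain 0/1 logical values (wse = 1 means the request is asserted), and the /RES factor is left out, since it just ANDs into every term. The compact form is also roughly what a discrete build would need: an XOR and an AND feeding a D flip-flop.

```python
from itertools import product


def inv(x: int) -> int:
    """The '/' prefix from the equations, for 0/1 values."""
    return 1 - x


def a3_as_written(a0: int, a1: int, a2: int, a3: int, wse: int) -> int:
    """Next A3 from the four product terms of A3.R (the /RES factor omitted)."""
    return (
        (inv(a3) & a2 & a1 & a0 & wse)
        | (a3 & inv(a2) & wse)
        | (a3 & inv(a1) & wse)
        | (a3 & inv(a0) & wse)
    )


def a3_gated_toggle(a0: int, a1: int, a2: int, a3: int, wse: int) -> int:
    """The same bit written as WSE AND (A3 XOR carry out of A2..A0)."""
    carry = a0 & a1 & a2
    return wse & (a3 ^ carry)


def equations_agree() -> bool:
    """Compare both forms over all 32 input combinations."""
    return all(
        a3_as_written(*bits) == a3_gated_toggle(*bits)
        for bits in product((0, 1), repeat=5)
    )


print(equations_agree())  # True
```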

Timing diagrams for 8 possible cases (/WSE asserted during different SLOWCLK phase times), generated by my beloved "ginger":
Code:
Case 1

/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________/‾‾‾‾‾‾‾‾

SLOWCLK ______________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Case 2

/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_________________________________________/‾‾‾‾‾‾‾‾

SLOWCLK ‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Case 3

/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________________________________/‾‾‾‾‾‾‾‾

SLOWCLK ‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Case 4

/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_____________________________/‾‾‾‾‾‾‾‾

SLOWCLK ‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Case 5

/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________/‾‾‾‾‾‾‾‾

SLOWCLK ‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Case 6

/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_________________________________________/‾‾‾‾‾‾‾‾

SLOWCLK ‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Case 7


/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________________________________/‾‾‾‾‾‾‾‾

SLOWCLK ‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Case 8

/RES    __/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_____________________________/‾‾‾‾‾‾‾‾

SLOWCLK ‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________/‾‾‾‾‾‾‾‾‾‾‾\___________

SLOWCS  __________________________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___________

PHI2    ‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾\__/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\__/‾‾\__/‾‾


Since there are more free registers now, one could extend the counter to 5 bits (by adding A4 and changing SLOWCS & SLOWCLK from A3 & A2 to A4 & A3) to get an even slower SLOWCLK.
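Here is a quick arithmetic sketch of what that extension buys. It assumes that PHI2 toggles on every CLK edge, so it runs at CLK/2, and that SLOWCLK is the top bit of the free-running part of the counter (A0..A2 now, A0..A3 after adding A4). The top bit of an n-bit free-running counter has a period of 2^n CLK ticks, so SLOWCLK runs at PHI2/2^(n-1). That gives PHI2/4 with the current design and PHI2/8 with the extra bit. The 32 MHz oscillator value is purely illustrative.

```python
def clock_rates(clk_hz: float, free_bits: int) -> tuple[float, float]:
    """Return (PHI2, SLOWCLK) in Hz while no wait state is in progress.

    Assumes PHI2 toggles with A0 (half the CLK rate) and SLOWCLK is the
    most significant bit of a free-running counter `free_bits` wide.
    """
    if free_bits < 1:
        raise ValueError("counter needs at least one bit")
    phi2_hz = clk_hz / 2
    slowclk_hz = clk_hz / (2 ** free_bits)
    return phi2_hz, slowclk_hz


# Example with a 32 MHz oscillator on CLK (illustrative value only).
for bits in (3, 4):
    phi2, slow = clock_rates(32e6, bits)
    print(f"{bits} free bits: PHI2 = {phi2 / 1e6:g} MHz, "
          f"SLOWCLK = {slow / 1e6:g} MHz (PHI2/{phi2 / slow:g})")
# 3 free bits: PHI2 = 16 MHz, SLOWCLK = 4 MHz (PHI2/4)
# 4 free bits: PHI2 = 16 MHz, SLOWCLK = 2 MHz (PHI2/8)
```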

PostPosted: Wed Nov 01, 2023 2:39 pm 

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Oh that's even better than I had in mind. The trick with A3 is neat. I wonder whether this can also lead to a simpler non-PLD circuit, not requiring two (or more) counters.

I don't think SR needs to be output either, it's only used internally and not registered, so it doesn't need a macrocell.

Probably the most useful thing to do with the extra pins is use them for address decoding outputs, possibly qualified /RD and /WR signals, etc. You could probably fit most of your glue logic in there; or take advantage of the unused pins by moving to a smaller device like 16V8 which might use less power.

One consideration that's come up a number of times here is that the 65C02 is specified to require almost rail-to-rail swing on its clock inputs (and some others) and some PLDs don't guarantee high enough highs, in particular. So that's one reason why some people prefer not to use them for clock generation, though in practic


PostPosted: Wed Nov 01, 2023 3:02 pm 

Joined: Fri Feb 17, 2023 11:59 pm
Posts: 163
Location: Lviv, Ukraine
gfoot wrote:
Oh that's even better than I had in mind. The trick with A3 is neat. I wonder whether this can also lead to a simpler non-PLD circuit, not requiring two (or more) counters.

I tried to do this with '163 in Digital, but I didn't find a way to "hijack" its Q3 output so that it's only counted when /WSE is asserted. With PLD it was easy, since the counter is simply 4 registers, so I had full control over when I reset A3. Would be cool if there was a way to do this - possibly with some /PE & D3 trickery that would keep Q0-Q2 unmodified but overwrite Q3 with zero when /WSE is low. Unfortunately I suck at 74xx counters!

gfoot wrote:
I don't think SR needs to be output either, it's only used internally and not registered, so it doesn't need a macrocell.

The main reason I defined it as a separate signal is that I needed NAND(PHI2, /WSE) in the PHI2.R equation. There's no way to use /(PHI2 * /WSE) as an inline term, and converting it DeMorgan-style turns the AND into an OR of two terms, which is hard to use in another equation. E.g. "A * B * (C + D)" has to be written as "A * B * C + A * B * D", which quickly becomes unreadable and sometimes requires too many terms. Still, I think it might be possible to do it and save one cell.
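To make the expansion concrete: by De Morgan, /SR = /(PHI2 * /WSE) = /PHI2 + WSE. Inlining SR therefore turns the first /PHI2.R product term into two terms: /A0*/A1*/A2*/A3*/PHI2*/RES + /A0*/A1*/A2*/A3*WSE*/RES. The short Python check below enumerates all 128 input combinations to confirm that the two forms match. Signals are treated as plain 0/1 values, and the function names are invented for illustration.

```python
from itertools import product


def inv(x: int) -> int:
    """The '/' prefix from the equations, for 0/1 values."""
    return 1 - x


def term_via_sr(a0, a1, a2, a3, phi2, wse, res) -> int:
    """First /PHI2.R product term, using the helper SR = PHI2 * /WSE."""
    sr = phi2 & inv(wse)
    return inv(a0) & inv(a1) & inv(a2) & inv(a3) & inv(sr) & inv(res)


def term_inlined(a0, a1, a2, a3, phi2, wse, res) -> int:
    """Same logic with SR inlined: /SR = /PHI2 + WSE, distributed out."""
    base = inv(a0) & inv(a1) & inv(a2) & inv(a3) & inv(res)
    return (base & inv(phi2)) | (base & wse)


def forms_agree() -> bool:
    """Exhaustively compare the two forms over all 128 input combinations."""
    return all(
        term_via_sr(*v) == term_inlined(*v)
        for v in product((0, 1), repeat=7)
    )


print(forms_agree())  # True
```

In the GAL, this costs one more product term in the PHI2 equation but frees the macrocell and pin that SR currently occupies.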

gfoot wrote:
Probably the most useful thing to do with the extra pins is use them for address decoding outputs, possibly qualified /RD and /WR signals, etc. You could probably fit most of your glue logic in there; or take advantage of the unused pins by moving to a smaller device like 16V8 which might use less power.

Definitely! I've been using my GALs for /RD & /WR qualification a lot, since the GALs in my designs almost always had some spare cells. In fact, in my Deck65 a single '16V8 handles clock stretching, R/W qualification & address decoding for the entire system. It's amazing what you can achieve with those small boys. And best of all - they still come in DIPs. I wish the ATF1504 were available in a DIP package...

gfoot wrote:
One consideration that's come up a number of times here is that the 65C02 is specified to require almost rail-to-rail swing on its clock inputs (and some others) and some PLDs don't guarantee high enough highs, in particular. So that's one reason why some people prefer not to use them for clock generation

Thanks for pointing that out - it has bitten me before: my 65C02 went berserk even when I just touched its PHI2 with tweezers or a probe.
For that reason, I always pass clock outputs from GAL/ATF PLDs through a pair of Schmitt-trigger inverters (e.g. 74HC14). Alternatively, the clock output can be inverted in the GAL pin definition section, so that a single inverter gate is enough.

PostPosted: Wed Nov 01, 2023 3:28 pm 

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
and3rson wrote:
I tried to do this with '163 in Digital, but I didn't find a way to "hijack" its Q3 output so that it's only counted when /WSE is asserted. With PLD it was easy, since the counter is simply 4 registers, so I had full control over when I reset A3. Would be cool if there was a way to do this - possibly with some /PE & D3 trickery that would keep Q0-Q2 unmodified but overwrite Q3 with zero when /WSE is low. Unfortunately I suck at 74xx counters!
Most likely you'd need A3 to actually be in an external D flipflop rather than in the counter. You could try to do the trickery you mentioned, but you also want A0-A2 to count up at the same time, which is obviously very complicated and the whole point of using the counter is not to have to do that!

Quote:
Main reason why I defined it as separate signal is that I needed to NAND(PHI2, /WSE) to use in PHI2.R equation, but there's no way to use /(PHI2 * /WSE) as an inline term, and converting it DeMorgan-style transforms AND into OR with two terms, which is hard to use in another equation. E.g. "A * B * (C + D)" needs to be specified as "A * B * C + A * B * D", which quickly becomes unreadable and sometimes requires too many terms. But I think it might be possible to do and save one extra cell.
In Cupl you can write the term just as you have done there, to define an intermediate value, but not bind it to a pin - then you can use it in expressions later, and it will get expanded out like a macro. Perhaps there's a syntax for that in your assembler too. Cupl does support inverting complex expressions, it just expands it into sums of products internally, so although you need to be aware of the cost, you don't actually need to write it out longhand.


PostPosted: Wed Nov 01, 2023 10:26 pm 

Joined: Fri Feb 17, 2023 11:59 pm
Posts: 163
Location: Lviv, Ukraine
gfoot wrote:
In Cupl you can write the term just as you have done there, to define an intermediate value, but not bind it to a pin - then you can use it in expressions later, and it will get expanded out like a macro. Perhaps there's a syntax for that in your assembler too. Cupl does support inverting complex expressions, it just expands it into sums of products internally, so although you need to be aware of the cost, you don't actually need to write it out longhand.

I just tried WinCUPL today. Unfortunately, I find it really hard to use (it forces me to use its own editor), and on top of that it doesn't render well under Wine (I'm on GNU/Linux). It's a shame, since according to the docs WinCUPL is a really powerful tool. Maybe I'll give it another try some time later.

Meanwhile, I've cleaned up the equations and finally understood how this entire thing is supposed to work (I've noticed that trying to understand what I'm doing greatly helps me achieve a positive result, haha!)

Code:
GAL22V10
WSESync

CLK  /WSE   NC    NC      NC      NC    NC    NC    NC    NC    NC    GND
/OE   A0    A1    SLOWCLK SLOWCS /PHI2  NC    NC    NC    NC    NC    VCC

; SLOWCS(A3),SLOWCLK(A2),A1,A0 - 4-bit counter (1st '163)
; A3 only counts while WSE is asserted and is held at 0 otherwise

A0.R  = /A0

A1.R  =  A0 * /A1
      + /A0 *  A1

SLOWCLK.R  = /SLOWCLK *  A1 *  A0
           +  SLOWCLK * /A1
           +  SLOWCLK * /A0

SLOWCS.R   = /SLOWCS *  SLOWCLK *  A1 *  A0  * WSE
           +  SLOWCS * /SLOWCLK              * WSE
           +  SLOWCS * /A1                   * WSE
           +  SLOWCS * /A0                   * WSE

; PHI2 output, in sync with A0 while /WSE is deasserted
/PHI2.R    = A0 * /WSE
           + A0 *  A1 *  SLOWCLK *  SLOWCS *  WSE

DESCRIPTION
Implementation of gfoot's wait-state clock synchronization circuit, with credits to BDD ("wait-state-enable") & Dr Jefyll (original ideas)
Original thread: http://forum.6502.org/viewtopic.php?f=4&t=7798


We're down to 5 registers in total - 4 bits for the counter and 1 bit for PHI2. This leaves 5 spare output pins on the '22V10 (or 3 on a '16V8), which is still enough for some additional R/W qualification as well as some address decoding!
I've tested my stuff using the following vector file with ginger: https://github.com/and3rson/ginger/blob ... e/slow.vec

Some notes:
- I've removed the reset signal, since we need to keep SLOWCLK & PHI2 rolling during reset in order to properly initialize CPU, VIA, etc.
- I've seen somewhere on our forums that GAL registers are initially set to 1, and that's what ginger also assumes. Thus I think that reset is not necessary in our case.
- PHI2 is now always in sync with A0 while /WSE is deasserted, which also made the equations simpler. None of the internal registers (4 counter bits & PHI2) lag behind each other, so no additional internal flip-flops are needed.
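For anyone who wants to poke at the cleaned-up equations without a GAL simulator, here is a minimal cycle-level model in Python. It rests on a few assumptions, not on ginger's documented semantics:
- Each signal is handled as its logical value, so `wse` is 1 while the /WSE pin is low.
- A line of the form `/PHI2.R = expr` is read as "PHI2 becomes NOT expr on the next CLK edge".
- All registers start at 0.
- The wait request is dropped on the first PHI2 falling edge, standing in for the CPU finishing its slow cycle.

Under those assumptions, the model shows the behaviour described in these notes and in the timing diagrams. With no wait request, PHI2 simply tracks A0. A stretched PHI2 always falls on the same CLK edge as SLOWCLK and SLOWCS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class State:
    """Register contents after a CLK edge (assumed to power up as all 0)."""
    a0: int = 0
    a1: int = 0
    slowclk: int = 0  # A2
    slowcs: int = 0   # A3
    phi2: int = 0


def step(s: State, wse: int) -> State:
    """Apply one CLK rising edge; wse is 1 while the wait request is asserted.

    Every next value is computed from the pre-edge state, like the GAL does.
    """
    a0, a1, a2, a3 = s.a0, s.a1, s.slowclk, s.slowcs
    return State(
        a0=1 - a0,                            # A0.R = /A0
        a1=a1 ^ a0,                           # A1.R
        slowclk=a2 ^ (a1 & a0),               # SLOWCLK.R
        slowcs=wse & (a3 ^ (a2 & a1 & a0)),   # SLOWCS.R, forced to 0 unless WSE
        # /PHI2.R = A0*/WSE + A0*A1*SLOWCLK*SLOWCS*WSE, read as PHI2 := NOT(...)
        phi2=1 - ((a0 & (1 - wse)) | (a0 & a1 & a2 & a3 & wse)),
    )


def run_wait_cycle(t0: Optional[int], ticks: int = 48) -> List[State]:
    """Assert WSE from tick t0 until PHI2 next falls; t0=None never asserts it.

    Dropping the request on PHI2's falling edge is an assumption standing in
    for the CPU moving on to its next bus cycle.
    """
    s = State()
    history = [s]
    waiting = False
    for t in range(ticks):
        if t == t0:
            waiting = True
        prev, s = s, step(s, 1 if waiting else 0)
        history.append(s)
        if waiting and prev.phi2 == 1 and s.phi2 == 0:
            waiting = False
    return history


def first_phi2_fall(history: List[State], after: int) -> int:
    """Index i (> after) of the first PHI2 1->0 step history[i-1] -> history[i]."""
    for i in range(after + 1, len(history)):
        if history[i - 1].phi2 == 1 and history[i].phi2 == 0:
            return i
    raise ValueError("PHI2 never fell")


def render(history: List[State]) -> None:
    """Print a crude waveform, one character per CLK tick."""
    for name in ("slowclk", "slowcs", "phi2"):
        wave = "".join("‾" if getattr(s, name) else "_" for s in history)
        print(f"{name.upper():<8}{wave}")


if __name__ == "__main__":
    render(run_wait_cycle(8))
```

Calling `render(run_wait_cycle(t0))` for different start ticks gives waveforms in the same spirit as the ginger output, at one character per CLK tick.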

The diagrams look nice, so now off to a real hardware test. :D
Code:
Case 1

/RES    ___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

SLOWCLK --_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

SLOWCS  --_______________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________________________

PHI2    --___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾


Case 2

/RES    ___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

SLOWCLK --_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

SLOWCS  --_______________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________________________

PHI2    --___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾


Case 3

/RES    ___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

SLOWCLK --_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

SLOWCS  --_______________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________________________

PHI2    --___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾


Case 4

/RES    ___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

/WSE    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾--

SLOWCLK --_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

SLOWCS  --_______________________________________________________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\_______________________________________________________________

PHI2    --___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾\___/‾‾‾
