I thought I'd post an update. The breadboard prototype is stable at 25.175MHz again, with 64K RAM and the 16V8 PLD doing the decoding that decides whether or not to stretch the clock. This was rather hairy, as I discussed a bit before. One big constraint has been the rise time of the CPU's high address lines - on the breadboard I've seen times as bad as 20ns, though I think all the chips concerned are TTL-compatible, so they don't need the signal to rise all the way. In any case this restricts the maximum clock speed - the addresses start to transition shortly after the CPU clock falls, and I need to know whether to stretch the clock before it falls again, with a margin to account for the response time of the OR gate that feeds the clock signal to the CPU. Adding the numbers up, this didn't sound too bad, but it wasn't working because my clock stretch circuit wasn't very well designed - it was originally meant to be a RDY control circuit until I repurposed it, and I think it suffers a bit from that.
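As a rough illustration of how that budget adds up, here is a minimal Python sketch. Only the 25.175MHz clock and the 20ns worst-case rise time come from the figures above; the PLD and OR gate delays are placeholder assumptions, not datasheet or measured values, and the function names are made up for the example.

```python
def period_ns(clock_hz):
    """Length of one full clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def stretch_slack_ns(clock_hz, addr_settle_ns, decode_ns, or_gate_ns):
    """Time left in one clock period after the address has settled, the
    decoder has responded, and the OR gate margin has been allowed for.

    Treating the whole period (falling edge to next falling edge) as the
    available window is a simplification of the real circuit.
    """
    return period_ns(clock_hz) - (addr_settle_ns + decode_ns + or_gate_ns)


if __name__ == "__main__":
    CLOCK_HZ = 25.175e6      # from the post
    ADDR_SETTLE_NS = 20.0    # worst rise time seen on the breadboard (from the post)
    DECODE_NS = 10.0         # ASSUMPTION: placeholder PLD propagation delay
    OR_GATE_NS = 5.0         # ASSUMPTION: placeholder OR gate delay

    print(f"period: {period_ns(CLOCK_HZ):.2f} ns")
    slack = stretch_slack_ns(CLOCK_HZ, ADDR_SETTLE_NS, DECODE_NS, OR_GATE_NS)
    print(f"slack:  {slack:.2f} ns")
```

With these placeholder delays only a few nanoseconds of slack remain in the roughly 39.7ns period, so the real margin depends heavily on the actual gate delays and on how far the address lines need to rise before the TTL-compatible inputs see them as valid.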
There are really two things the circuit needs to know - it needs to know when an I/O operation is starting (so it can set a flipflop to stretch the clock) and it needs to know when the operation is complete (so it can reset the flipflop). The first one needs to get done somewhat before the clock would normally fall; the second one is less sensitive, but needs to happen at some point while the unstretched clock (CLK) would be high, so that when it resets the flipflop the CPU clock doesn't immediately respond with a short low pulse. In theory this could just be triggered by CLK being high, though in practice it seems to need a slight delay.
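The short-low-pulse problem can be shown with a small behavioural model. This is only a sketch in Python, assuming the CPU clock is simply CLK ORed with the flipflop output (an assumption based on the description rather than the schematic), with time divided into arbitrary sample steps. The helper names are hypothetical.

```python
def make_stretch(n, start, stop):
    """Flipflop output: 1 from sample `start` up to (not including) `stop`."""
    return [1 if start <= i < stop else 0 for i in range(n)]


def cpu_clock(clk, stretch):
    """CPU clock modelled as the OR of the free-running CLK and the flipflop."""
    return [c | s for c, s in zip(clk, stretch)]


def low_pulse_widths(wave):
    """Widths (in samples) of every low run that has a high on both sides."""
    widths, run, seen_high = [], 0, False
    for level in wave:
        if level:
            if seen_high and run:
                widths.append(run)
            run, seen_high = 0, True
        elif seen_high:
            run += 1
    return widths


# CLK: 4 samples high, 4 samples low, repeated four times.
clk = ([1] * 4 + [0] * 4) * 4

# Flipflop set at sample 2, before CLK would fall at sample 4.
reset_while_high = make_stretch(len(clk), 2, 17)  # released while CLK is high
reset_while_low = make_stretch(len(clk), 2, 14)   # released while CLK is low

print(low_pulse_widths(cpu_clock(clk, reset_while_high)))
print(low_pulse_widths(cpu_clock(clk, reset_while_low)))
```

Releasing the flipflop while CLK is high leaves only full-width low phases on the CPU clock, whereas releasing it while CLK is low produces a shortened low pulse before the next rising edge.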
What I've ended up with is this:
Attachment:
schematic-64kpld-2023-09-10.png
No, I'm not very happy with it! It does work though, passing my RAM soak test and the Dormann tests consistently. I also tried applying heat to see if that destabilised things, using a hairdryer focused on the CPU, RAM, and PLD - it seemed fine though, at maybe 40C-45C according to my thermometer. The ICs were too hot for me to touch.
The changes here are that the OR gates are now HCT rather than AHCT, there's an AHCT inverter that's taken over the A15 inversion (as it's a bit faster than the PLD was for that), and I've added some extra delays to DCLK2 and DPHI2, which push them towards the falling edge of PHI2. In theory I don't think DCLK2 needs all these delays - it could be driven straight from DCLK - but this is the configuration I stress-tested.
So I am happy that the 64K version is working, but I'm not sure if I want to commit this particular nonsense to a PCB - at the very least I'll need to make these delays configurable with solder jumpers or something like that. It may be better to just take another pass at this bit of the circuit from scratch as I think there's a simpler way - perhaps starting the timing from DPHI2 which needs to rise a little before the falling edge of PHI2, so it may make sense for that delay to come from a gate - rather than chaining gates together to try to delay for nearly the duration of PHI2 being high.
A related thing I've been working at is making it easier to connect the logic analyser to this circuit. Because of the clock stretching, and the high frequency, it is tricky getting it connected in a way that means it samples the bus at the right time and has its setup/hold times accounted for. To make this all much easier I built a circuit to go between the two, that latches the data bus and holds its value for the logic analyser to sample later on. This means the logic analyser needs to be triggered somewhat later than usual, and in turn this means we also need to latch the clock-stretching signal (roughly equivalent to RDY) that the LA needs in order to know whether or not to ignore the current clock cycle. I probably ought to latch RESET as well, but that's only being used to prevent data capture during long resets, so not very important.
This is the circuit I've ended up with - again involving another long delay line of inverters:
Attachment:
schematic-logicanalyserhat-partial.png
CLK is my main circuit's CLK signal, which precedes PHI2 by at least an OR gate delay. So here, five inverters after the fall of PHI2, the internal clock line rises and clocks the register and D flipflop. Then one more inverter later the clock signal to the logic analyser falls, after which it samples the signals.
This particular configuration is what it took to work with my PCB prototype, at about 34MHz. It is not connected to the CPU's main data bus - it's connected through a further bus transceiver (74AHCT245) and I think that's why some of the extra gate delay was required here.
Again it feels like this delay needs to be adjusted for different circumstances and clock speeds. I am thinking of making this into a HAT (hardware-attached-on-top) for the logic analyser, to make it convenient and easy to connect, and maybe using jumpers to control the delays. Perhaps adding the possibility of even longer delays as well.
I also designed and prototyped an SD card interface module. I've usually done this through a 6522 in the past, but it can operate more quickly and with simpler code if the hardware has more direct support for it.
Attachment:
20230910_144710.jpg
This uses another ATF16V8 PLD (probably the B variant, 15ns) along with an input shift register (74HC595) and an output shift register (74HCT166). There's also a 74HCT139 decoder as I ran out of macrocells in the PLD and didn't want to swap it for a more power-hungry one. Like my TTL serial module, this provides a fairly standard CPU interface - one memory-mapped I/O port for reading or writing data, and a control port where writes enable/disable the SD card's chip select, and reads read a flag saying whether or not the module is ready to read/write another byte. It can also generate interrupts when it needs attention, but I've mostly used it with polling so far:
Code:
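sd_waitbyte:
    ; wait for busy flag low (BIT copies bit 7 of SD_STAT into N and leaves A alone)
    bit SD_STAT
    bmi sd_waitbyte     ; bit 7 still set, so keep polling
    rts
sd_readbyte:
    jsr sd_waitbyte
    lda SD_DATA         ; fetch the byte from the data port
    rts
sd_writebyte:
    jsr sd_waitbyte     ; A is preserved by the wait, so the byte to send survives
    sta SD_DATA         ; hand the byte to the data port
    rts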
For a PCB implementation of that I should probably put the card socket and level-shifters etc on my own board, rather than using an off-the-shelf 5V adaptor module.
I'm happy to find that I can bolt on things like this fairly ad-hoc without worrying about wiring quality and signal integrity, and without causing any stability issues for the rest of the system - it is very convenient and makes prototyping new things very simple. I will probably do my floppy disc interface next.
Finally, I've been considering how I want to add video output support - in particular, ways of doing it without it being constrained to the slow I/O bus. At the moment I'm hoping to make it work through a narrow command-driven interface using just a few write-only locations that are not in the I/O address space - maybe in zero page, for a little extra speed, or maybe just somewhere else in the RAM address space. Possibly an entire page, so that the address bus can do some work too, but I'm keen to avoid wiring a lot of signals through. Because these ports are write-only, the CPU doesn't need to wait and synchronise on a response - it does need to wait long enough before sending the next byte, but can use that time to get the next byte ready.
I want to make this work asynchronously this time, so that the CPU and I/O clock speeds are not tied in any way to the video pixel clock. The cost of synchronising access is kind of high though - with two D flipflops between the clock domains I think it will cost at least 2 cycles of the video system before it actually reads the data. By my current calculations, that will be equivalent to about a 2MHz communication rate with the video board, which is lower than I wanted. However, through the use of FIFOs (74HC40105) I believe I can improve this significantly, and then the CPU can queue up 16 bytes of data at a time before having to wait.
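To get a feel for how much a 16-deep FIFO buys, here is a minimal Python sketch of a burst of writes into a queue that drains at a fixed slower rate. The 16-entry depth is from the text above; the write and drain intervals are purely illustrative assumptions, and the model ignores synchroniser latency and any handshaking detail of the real FIFO chips.

```python
def bytes_before_stall(depth, write_ns, drain_ns):
    """Number of bytes a writer can push back-to-back before the FIFO is full.

    The writer offers one byte every `write_ns`, starting at t = 0; the reader
    removes one byte every `drain_ns`. Returns None if the reader keeps up
    (the FIFO never fills).
    """
    if write_ns >= drain_ns:
        return None
    k = 0
    # Fill level just before write number k: bytes written so far minus
    # bytes the reader has removed by time k * write_ns.
    while k - (k * write_ns) // drain_ns < depth:
        k += 1
    return k


if __name__ == "__main__":
    DEPTH = 16        # from the post
    WRITE_NS = 250    # ASSUMPTION: illustrative CPU write interval
    DRAIN_NS = 500    # ASSUMPTION: illustrative video-side read interval
    print(bytes_before_stall(DEPTH, WRITE_NS, DRAIN_NS))
```

In this toy model a writer running at twice the drain rate gets roughly twice the FIFO depth out before it has to wait, while a much faster writer gets little more than the 16 queued bytes.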
Edit - one more thing regarding heat stress testing - my PCB prototype runs at 34MHz using the DS1086Z programmable oscillator, but fails my memory test after a minute or so of hairdryer heat. At 32.768MHz using a standard oscillator it seems completely stable under hairdryer heat, for several minutes with a thermometer saying the air temperature was 50C. The heat was focused on the CPU module, blowing hot air across both sides of the board (the RAM and some glue logic are on the back).