You have a clock domain crossing problem due to the inclusion of slow ROM, slow I/O and fast video. A partial solution to this problem is to shadow ROM. Meanwhile, people who have implemented systems without video advocate clock stretching. You are rightfully wary of this technique. The trivial case is incompatible with a processor and video system running on opposite clock phases. The extended case requires additional hardware which may not be suitable for your design. It might be helpful to enumerate a subset of possible options:-
- Slow ROM: Anyone familiar with 1980s computer hardware will associate the magic figure of 150ns RAS DRAM with an operating frequency of 4MHz. Using contemporary glue logic, 150ns ROM might reach 5MHz. Budget 30ns for glue logic and CMOS loads, an extra 30ns if using a 74x688 for address decode, and 5ns per buffered parallel bus card. (Unbuffered cards count as multiple loads.) For your case, a 200ns budget is optimistic and may fail when all parallel card slots are occupied. Understandably, unshadowed slow ROM is not your preferred option. However, the speed of a parallel boot ROM plus the signal load limits the maximum operating frequency at boot.
- Shadow ROM: In this case, the initial task of the processor is to copy firmware to RAM. This requires either booting the processor in a slow mode or clock stretching. Either technique may or may not be compatible with the video system. Thankfully, this is not a problem because the video system may be initialized after firmware is shadowed.
- Boot with slow clock: Software-defined clock speed. The system boots at 4MHz or slower. Firmware is copied to RAM. The processor/video frequency is switched to a faster setting before video is initialized. I am unnerved by this option because clock switching may create signal glitches which can crash a system, and such Heisenbugs are difficult to replicate. It is also possible to switch to an absent clock and hang the system. Regardless, an idle system may be dropped to the minimum frequency to reduce energy consumption.
- Clock stretching: This covers a subset of clock domain crossing. I don't find clock domain crossing intuitive. I suspect you don't find it intuitive either. Thankfully, the most important aspect is to recognize the problem rather than solve it. In particular, it is not often acknowledged that there are multiple tiers of solution. It is relatively easy to solve this for a single core system with no video. It is slightly more difficult to solve this for a dual core system with no video. Solving this problem with video divides into a number of cases which include a split bus (possibly with multiple banks of RAM), disallowing access to slow peripherals during video display, "snow" on screen and video memory which is not in the processor's memory map.
- Split bus: Other systems may have an arrangement where video hardware has priority access to video RAM and the processor may concurrently access its own bank of RAM. The interleaved arrangement of 6502/6845 (and similar) makes this redundant. However, problems arise when the 6502 is required to depart from this interleaving. This can be solved with tri-state buffers between the processor and the video RAM and, unfortunately, clock stretching. This allows the video system to make multiple accesses to video RAM while the processor makes one access to slow ROM or slow I/O. Specifically, without a Chip Enable signal, video RAM ignores addresses directed to slow ROM or slow I/O. Meanwhile, the video system never displays transitory junk obtained from unwanted sources.
- Secondary bus: This covers a different subset of clock domain crossing. In this case, fast ROM and fast RAM are memory mapped but slow I/O is accessed via latches and buffers. Although timing glitches remain possible, especially with asynchronous clocks, there is no restriction on the frequency ratio between clock domains.
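The delay figures in the Slow ROM item can be added up to see how quickly the budget is consumed. This is a rough model, assuming the delays simply add and that the entire clock period is available for the memory access. A real 6502 loses part of each cycle to address and data setup, so the resulting ceiling is optimistic. The function names are illustrative only.

```python
# Rule-of-thumb delays quoted in the Slow ROM item, in nanoseconds.
GLUE_NS = 30        # glue logic and CMOS loads
DECODE_688_NS = 30  # extra when a 74x688 comparator decodes the address
PER_CARD_NS = 5     # per buffered parallel bus card


def access_path_ns(rom_ns, cards, uses_688=False):
    """Naive worst-case delay from address out to data valid: delays just add."""
    total = rom_ns + GLUE_NS + PER_CARD_NS * cards
    if uses_688:
        total += DECODE_688_NS
    return total


def ceiling_mhz(path_ns):
    """Clock ceiling if the entire period were usable (optimistic for a 6502)."""
    return 1000.0 / path_ns


for cards in (0, 4, 8):
    t = access_path_ns(150, cards, uses_688=True)
    print(f"{cards} cards: {t} ns, at most {ceiling_mhz(t):.2f} MHz")
```

With a 74x688 decoder, the sum for 150ns ROM reaches 210ns before any card is fitted, and eight buffered cards (a hypothetical slot count) bring it to 250ns, a naive ceiling of 4MHz.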
To handle the cases of boot ROM, slow I/O and consistent timer state, my personal preference would be fast ROM and a secondary bus. This requires minimal hardware and provides considerable flexibility. (I'm not averse to memory mapping a serial ROM or using a toy processor to copy serial boot ROM into fast RAM. It is merely a matter of cost and reliability.) You may be inclined to shadow slow ROM, allow a processor to set its own speed and provide an interrupt handler to access slow I/O between video display cycles. This requires the least hardware and any deficiencies can be fixed in software.
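The shadow-ROM route can be sketched as a toy model that only captures ordering: copy the firmware while the clock is slow, verify it, move execution to RAM, raise the clock and only then initialize video. Everything here is hypothetical (the 4MHz and 8MHz settings, the `Machine` class and its attributes), and it says nothing about the glitch-free clock switching the real hardware would need.

```python
class Machine:
    """Toy model of the boot sequence. All names and speeds are hypothetical."""

    SLOW_MHZ = 4  # boot speed the slow ROM is assumed to meet
    FAST_MHZ = 8  # hypothetical fast setting, too fast for the ROM

    def __init__(self, rom):
        self.rom = bytes(rom)
        self.ram = bytearray(len(self.rom))
        self.clock_mhz = self.SLOW_MHZ
        self.running_from = "rom"
        self.video_on = False

    def read_rom(self, addr):
        # Model the timing limit: ROM reads fail above the slow clock.
        if self.clock_mhz > self.SLOW_MHZ:
            raise RuntimeError("ROM read faster than its access time allows")
        return self.rom[addr]


def boot(m):
    # 1. Shadow: copy firmware to RAM while the clock is still slow.
    for addr in range(len(m.rom)):
        m.ram[addr] = m.read_rom(addr)
    # 2. Check the copy byte for byte before relying on it.
    for addr in range(len(m.rom)):
        if m.ram[addr] != m.read_rom(addr):
            raise RuntimeError(f"shadow mismatch at {addr:#06x}")
    # 3. Continue from RAM, then raise the clock; the ROM is no longer read.
    m.running_from = "ram"
    m.clock_mhz = m.FAST_MHZ
    # 4. Video is initialized last, once the fast clock is in place.
    m.video_on = True
    return m
```

Reading the ROM after step 3 raises an error in this model, which is the point: once shadowed, the slow part is out of the timing path.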
BigaEd on Tue 29 Dec 2020 wrote:
if the clock isn't regular, is there any provision for fixed-frequency timer/counters?
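The video hardware is in the fastest clock domain and, unlike the processor clock, its timing is regular. It is typical for video hardware to generate vertical blank interrupts or horizontal scan line interrupts, which can serve as a fixed-frequency timebase. Problem partially solved.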
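If the video hardware runs from its own fixed oscillator, its interrupt rate follows from the pixel clock and the total line and frame lengths, independent of any stretching of the processor clock. A minimal sketch, assuming standard VGA 640×480 timing (25.175MHz pixel clock, 800 pixel clocks per line, 525 lines per frame) purely as an example; the thread does not specify a video mode. `VblankClock` is a hypothetical name.

```python
from fractions import Fraction


def frame_rate_hz(pixel_clock_hz, h_total, v_total):
    """Frames per second from the pixel clock and total (visible + blanking) size."""
    return Fraction(pixel_clock_hz, h_total * v_total)


def line_rate_hz(pixel_clock_hz, h_total):
    """Scan lines per second, i.e. the horizontal interrupt rate."""
    return Fraction(pixel_clock_hz, h_total)


class VblankClock:
    """Counts vertical blank interrupts to measure time (hypothetical helper)."""

    def __init__(self, pixel_clock_hz, h_total, v_total):
        self.rate = frame_rate_hz(pixel_clock_hz, h_total, v_total)
        self.ticks = 0

    def on_vblank(self):
        # Would be called once per frame from the vertical blank interrupt handler.
        self.ticks += 1

    def elapsed_seconds(self):
        return self.ticks / self.rate


# Example: VGA 640x480, 25.175 MHz pixel clock, 800 x 525 total.
clock = VblankClock(25_175_000, 800, 525)
print(f"{float(clock.rate):.2f} Hz frame rate")
print(f"{float(line_rate_hz(25_175_000, 800)):.2f} Hz line rate")
```

With these figures the frame rate is about 59.94Hz and the line rate about 31.47kHz, so scan line interrupts give much finer granularity than vertical blank interrupts. A counter on a 6502 would also have to handle overflow, which this model ignores.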
jfoucher on Tue 5 Jan 2021 wrote:
the only parallel ROM I could find that was not outrageously expensive was 150ns, so why couldn't it be at the other end of the bus ?
My suggestion for slow ROM directly adjacent to the processor covers multiple scenarios. In the first set of scenarios, a surface mount slow ROM may be shadowed (or clock stretched). Due to similar pin-outs, a slow boot ROM may sit at the bottom of a DIP stack of fast ROM, possibly just one fast ROM in a ZIF socket. A slow boot ROM may also sit at the bottom of a fast DIP RAM stack. Although the timing of the shadowed boot ROM is not critical, this arrangement minimizes the horizontal and vertical distance to the furthest RAM while also minimizing board area. In the final scenario, slow ROM and slow I/O may be on one side of a split bus while fast RAM is more closely associated with the video hardware.
Only one ROM must be executable at boot and that is the base firmware. For an automatically configuring parallel card bus system, all device driver firmware may be supplied on serial ROM. Whether or not the base firmware is shadowed in RAM, the device drivers can be transferred over a clocked serial connection and executed from RAM. This may add a fraction of a second to initialization but it significantly eases bus timings while reducing cost. To further reduce cost, all shadowed firmware may be decompressed using one common algorithm.
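As an example of one common algorithm, here is a decoder for PackBits, a simple run-length scheme. It is swapped in purely for illustration: the thread does not name an algorithm, and an LZ-style scheme would usually compress code better. Its appeal for boot firmware is that the decoder is tiny, since one header byte selects either a literal run or a repeated byte.

```python
def unpackbits(data: bytes) -> bytes:
    """Decode a PackBits (run-length) stream."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:
            # Literal run: the next n + 1 bytes are copied unchanged.
            if i + n + 1 > len(data):
                raise ValueError("truncated literal run")
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:
            # Repeat run: the next byte is emitted 257 - n times (2 to 128).
            if i >= len(data):
                raise ValueError("truncated repeat run")
            out += bytes([data[i]]) * (257 - n)
            i += 1
        # n == 128 is a no-op and is skipped.
    return bytes(out)
```

For example, the 15 packed bytes `FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA` unpack to 24 bytes. Driver images could be stored packed on the serial ROM and expanded straight into RAM.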
Sheep64 on Mon 4 Jan 2021 wrote:
without space for video DRAM
There's a guy who makes a really compact 4MB static RAM module. It requires less space than DRAM and it doesn't require refresh cycles. Indeed, by using such a module, a separate bank of video RAM may only require about one square inch.
Sheep64 on Mon 4 Jan 2021 wrote:
FPGA for video is a hedged bet
I briefly investigated Lattice Semiconductor's iCE40 FPGAs and now I understand why people are so enthusiastic about them. They are cheaper than a 6502, faster than the official speed of the 6502, and work with fully open source tools which can be run from the command line or a Makefile. The largest variant currently has 7680 four-input LUTs. Arlet's popular soft core requires about 4000 six-input LUTs; because six-input LUTs do not map one-to-one onto four-input LUTs, this may or may not fit. This is probably not sufficient for a VIPER and VIC-II but it is definitely suitable for something like a Diablo engine and Atari ANTIC. Indeed, given the bus timings, there is an inherent compromise between processing and display. Therefore, creativity may be encouraged by deliberately avoiding a large, expensive FPGA.
cjs on Mon 4 Jan 2021 wrote:
I'm not sure that you're clear on the reasoning behind the 100×100 mm limit.
I briefly investigated board manufacture. Excluding introductory offers, excluding heavy metals which cannot be sold in Europe and excluding suppliers in countries where the natives murder each other in significant quantities or are otherwise likely to have disrupted supplies, the best I could find was USD115 for three credit-card-size boards. That might have been for four layers, including air mail delivery. It was about half price with lead and cheaper with introductory offers. I was surprised to see a Japanese company listed but I assumed that it was a scam, outsourced or an introductory price. Actually, they all looked like scammers: spammy companies which spend heavily on advertising, or intermediaries inflating the costs of lesser-known parties. It makes me much more inclined to use a local company which I met at a trade exhibition. Even if it costs more than USD115, this is vastly preferable to having my bank account emptied and not receiving circuit boards.