With the rash of "how do I add video output to my computer" threads recently, I've moved up one of my background ideas to be figured out sooner. The basic problem boils down to how to interface a cheap, ubiquitous RAM chip to both the CPU and the video output, such that a sufficiently high dot clock and colour space are available, and preferably without impeding CPU access to the framebuffer. Stealing cycles from the CPU to service the video output is also out of the question - it was a bad idea in 1980, and an even worse one today.
First some ground truths. A PAL-format 80-column terminal (640-pixel lines), refreshing up to 256 lines at 50Hz, requires a 16MHz dot clock. A cheap 55ns SRAM chip can in fact keep up with that, producing 8-bit pixels, but monitors which will accept such a signal are no longer ubiquitous, and it would be impossible for the CPU to access the framebuffer during the active part of each scanline. Instead, I'm going to design for a 48MHz dot clock, allowing a thoroughly SVGA-style 800x480 at up to 75Hz refresh, which is relatively easy to find a display for. Indeed one of my monitors is natively 1680x1050, so it should be able to pixel-double an 840x525 VGA signal and make it look pretty nice. There are also Display Parallel Interface (DPI) modules out there, such as Pimoroni's HyperPixel, which obviate the need to implement a DAC (though I think that's not very difficult anyway).
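As a quick sanity check on those dot clocks, here's a small Python sketch. The PAL figures (64us line period, 312 lines per non-interlaced field) are standard; the 1024x625 totals for the 48MHz mode are purely hypothetical numbers I picked to land on 75Hz, not a worked-out timing.

```python
def refresh_hz(dot_clock_hz, h_total, v_total):
    """Frame rate from the dot clock and the total (visible + blanking) size."""
    return dot_clock_hz / (h_total * v_total)

# PAL: a 64 us line at 16 MHz is 1024 dot clocks, of which 640 are visible.
pal_h_total = round(16e6 * 64e-6)
pal_rate = refresh_hz(16e6, pal_h_total, 312)   # non-interlaced field

# Hypothetical totals for 800x480 visible from a 48 MHz dot clock.
svga_rate = refresh_hz(48e6, 1024, 625)

print(pal_h_total, round(pal_rate, 2), svga_rate)
```

Shrinking the blanking totals raises the refresh rate, and lowering the dot clock reduces it, so there's some room to trade between the two.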
To make a 48MHz dot clock would require a 20ns RAM. Such things are available, but from a hobbyist perspective it seems wise to limit the number of signals running at such a high rate to the bare minimum. So instead I'm going to use a cheap 55ns RAM by way of a parallel-to-serial shift register, which reduces the data rate required to 6MB/s per bitplane. An advantage of this approach is that the RAM chip & shift register can simply be replicated to add more bitplanes and thus more colours, while a single instance is already enough to implement a plain text console. A second advantage is that the RAM is actually fast enough to service CPU accesses interleaved with display accesses.
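To illustrate what the shift register buys us, here's a minimal Python model of one bitplane's output path: one parallel load from RAM per eight dot clocks, then eight single-bit shifts. Sending the most significant bit first is my assumption for the sketch; the real pixel order just depends on how the data lines are wired to the register.

```python
def shift_out(byte):
    """Yield the 8 bits of one byte, leftmost pixel first (MSB-first assumed)."""
    for i in range(7, -1, -1):
        yield (byte >> i) & 1

def scanline_bits(row_bytes):
    """One parallel load per byte, then eight shifts at the dot clock."""
    for b in row_bytes:
        yield from shift_out(b)

bits = list(scanline_bits([0xA5, 0x0F]))
byte_rate = 48_000_000 // 8   # RAM fetches per second, per bitplane

print(bits, byte_rate)
```

Only the serial output runs at the full dot clock; the RAM itself sees one fetch every eight pixels.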
The chief downside is that the 74-series parallel-to-serial shift registers fast enough for the job are apparently only available in SOIC, not PDIP. But you might be able to get away with a 74HC version if you only want a 16MHz dot clock, and those *are* available in through-hole format. To reach 48MHz, though, I'm picking the Toshiba 74VHC165.
How much RAM do we need? Well, 800 pixels is 100 bytes per bitplane-row, and if we allow for 512 rows (to simplify addressing) that means 50KB per bitplane. That'll easily fit into a 128Kx8b SRAM chip, which is available in through-hole format for a very reasonable price at 55ns. The excess space allows for quite a lot of flexibility with addressing, and we can use that to implement hardware scrolling (both horizontal and vertical) very easily. That would be the responsibility of the scan timing generator, which we only need one of to control all the bitplanes at once.
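Here's one possible way the spare address space could be used, sketched in Python. The layout is an assumption on my part, not a settled design: a 9-bit row and an 8-bit byte column concatenated into the 17-bit SRAM address, so that scrolling is just an add-and-wrap on each field. The names `fb_address`, `scroll_x` and `scroll_y` are made up for the example, and horizontal scroll here is in whole bytes (8 pixels).

```python
ROW_BITS, COL_BITS = 9, 8   # 512 rows x 256 byte-columns = 128K addresses

def fb_address(row, col, scroll_y=0, scroll_x=0):
    """SRAM address for a screen row and byte column, with wrap-around scroll."""
    r = (row + scroll_y) & ((1 << ROW_BITS) - 1)
    c = (col + scroll_x) & ((1 << COL_BITS) - 1)
    return (r << COL_BITS) | c

bytes_per_row = 800 // 8                     # 100 bytes per bitplane-row
plane_kib = bytes_per_row * 512 / 1024       # 50 KB of pixels actually shown

print(hex(fb_address(511, 255)), fb_address(511, 0, scroll_y=1), plane_kib)
```

With this layout only 100 of each row's 256 byte columns are visible at once, which is where the horizontal scrolling headroom comes from.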
If we have only a single bitplane, we could get away with fixing the CPU clock to 6MHz (i.e. synchronous with the display RAM accesses), and accessing the RAM directly from the CPU. If it's a 6502, we'd need to include a banking mechanism to map 128KB of framebuffer onto a relatively small segment of CPU address space; with an '816 we can just reserve a couple of bank IDs for it. We'll also need to assign a few I/O registers to configure the scan timings and maybe some other things.
But if we want to run the CPU faster, or use more bitplanes (with 6 planes, we can have 64 colours without needing a CLUT), we need to pay more attention to the CPU interface and decouple it somehow. We might often want to paint many pixels in the same colour, but not necessarily in neat blocks of eight; this *can* be done via the CPU, but it's slow and cumbersome since each bitplane has to be read, modified through a masking operation, then written back. But adding some relatively simple hardware can speed this up considerably. Add a colour register, map all the bitplanes to the same CPU address space, and what the CPU then writes is a pixel mask to which the colour should be applied; the hardware takes care of the read-modify-write operation and synchronising it with the display access. Since all the bitplanes are mapped on top of each other, reads have to be disambiguated by a second selector register but are otherwise conventional.
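The masked write described above can be modelled in a few lines of Python. This is only a behavioural sketch: the function names and the convention that bit n of the colour register feeds plane n are my assumptions, and the real thing would be per-plane gating logic rather than a loop.

```python
def masked_colour_write(planes, addr, mask, colour):
    """Apply `colour` to the pixels selected by `mask` in every bitplane.

    planes: one bytearray per bitplane; bit n of colour goes to plane n.
    """
    for n, plane in enumerate(planes):
        old = plane[addr]                           # read phase
        fill = mask if (colour >> n) & 1 else 0     # this plane's colour bit
        plane[addr] = (old & ~mask & 0xFF) | fill   # modify and write back

def plane_read(planes, addr, select):
    """Conventional read, disambiguated by a plane selector register."""
    return planes[select][addr]

planes = [bytearray([0xFF]), bytearray([0x00]), bytearray([0xAA])]
masked_colour_write(planes, 0, mask=0x3C, colour=0b010)

print([hex(plane_read(planes, 0, n)) for n in range(3)])
```

The CPU performs a single write of the mask byte; the per-plane read-modify-write all happens behind it.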
To that end, we observe that the SRAM can be read in 3 pixel clocks at 48MHz (call each cycle 20ns, rounding down from 20.8ns), leaving 5 clocks for the CPU access. Three of those are occupied by reading the prior value of the pixels, leaving two to complete a write to the same location. Since the address remains the same, we only need the read-to-write turnaround and the write data setup time, which for the Alliance parts totals 45ns, against the roughly 41.7ns that two clocks at 48MHz provide. That is, yes, pushing it slightly, but there is always the option of reducing the refresh rate a little to bring the dot clock down and give a bit more timing margin. I think it's reasonable to assume, though, that the RAM will exceed its specified performance when run at more than its specified minimum voltage, which is 2.7V - even a 3.3V system can be expected to offer a noticeable margin here.
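To put numbers on that budget, here's a small Python sketch that splits the eight dot clocks per byte into the three slots described above and compares each with the datasheet figure quoted (55ns for a read, 45ns for the turnaround plus write setup). The slot names are just labels for the example.

```python
DOT_CLOCK_HZ = 48_000_000
CLOCK_NS = 1e9 / DOT_CLOCK_HZ          # about 20.83 ns per dot clock

slots = {"display read": 3, "cpu read": 3, "cpu write": 2}          # dot clocks
required_ns = {"display read": 55, "cpu read": 55, "cpu write": 45}

def budget(slots, required_ns, clock_ns):
    """Margin in ns for each slot; negative means the spec is not met."""
    return {name: n * clock_ns - required_ns[name] for name, n in slots.items()}

margins = budget(slots, required_ns, CLOCK_NS)

# Highest dot clock (MHz) at which every slot meets its specified time.
max_dot_mhz = min(n / required_ns[name] for name, n in slots.items()) * 1000

for name, m in margins.items():
    print(f"{name}: {m:+.2f} ns")
print(f"max in-spec dot clock: {max_dot_mhz:.2f} MHz")
```

On the specified numbers alone the write slot comes up a few nanoseconds short at 48MHz, so a fully in-spec design would need the dot clock a little lower, or rely on the extra margin from the higher supply voltage.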
This tight memory access cycle needs sequencing logic, to include a wait-state output to the CPU since the latter may be running at a completely different clock speed. An obvious way to do this is with a fast GAL chip, but I'll also look into sensible ways to do this with only discrete logic. To keep the timing sufficiently tight for 48MHz, I think some 74AHC parts will be involved, but these could be substituted with 74HC parts if run at 16MHz dot-clock. The sequencing logic can be implemented once across all bitplanes, with only the masking logic needing to be repeated.
The horizontal and vertical scans also need sequencing logic, for which Ben Eater's examples may be instructive. Only one set is needed for any number of bitplanes. I think 74HC parts will work for this, as they only need to count groups of 8 pixels at worst; this also means the horizontal scan only needs 8-bit counters and comparators, though the vertical scan will need 10 bits. I'm thinking of using twin 2-to-4 decoders to select the scan phases, as these can conveniently drive control signals and present different registers' contents to the comparators.
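A rough Python model of how I picture one scan axis working: a counter, a 2-bit phase number (what the 2-to-4 decoder would decode), and a comparator that watches whichever register the current phase selects. The end-of-phase values below are hypothetical, in units of 8 pixels, chosen only so that everything fits in 8 bits; they are not real monitor timings.

```python
PHASES = ("active", "front porch", "sync", "back porch")

def scan(end_counts, ticks):
    """Yield (counter, phase name) for each tick of one scan axis.

    end_counts[p] is the counter value at which phase p ends
    (hypothetical register contents).
    """
    counter, phase = 0, 0
    for _ in range(ticks):
        yield counter, PHASES[phase]
        if counter == end_counts[phase]:        # comparator hit on selected register
            counter = 0 if phase == 3 else counter + 1   # last phase resets counter
            phase = (phase + 1) & 3             # 2-bit phase counter wraps
        else:
            counter += 1

H_END = (99, 104, 120, 127)     # 100 active bytes, 128 byte-times per line
line = list(scan(H_END, 129))

print(line[99], line[100], line[127], line[128])
```

The vertical axis would be the same structure with a wider counter, clocked once per line instead of once per byte.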
Of course, now I need to convert the above into a workable schematic…