Another thread prompted me to consider the practicalities of using currently-available DRAM with current WDC CPUs, preferably at full speed.
For starters, if you're using a 6502, just use SRAM. At the memory sizes you typically install in such machines, it's cheap enough to not care about. So I'm going to assume the use of a 65816, which may also help avoid some of the performance tradeoff.
Going by Mouser prices, there is a potentially significant cost saving here if you intend to install a lot of RAM, to the tune of nearly €3.50 per 2MB, minus the extra cost of the logic needed to babysit the DRAM compared with the very simple job of driving SRAM. For a full 16MB array, that adds up to nearly €28, and you can get a fair amount of 74-series logic for that. The question is: how much 74-series logic do we need to do the necessary babysitting, and how fast can we go with it?
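For what it's worth, here's the trivial arithmetic behind that €28 figure (prices will obviously drift, so treat this as a rough sketch):

```python
# Back-of-the-envelope saving for a full DRAM array vs SRAM.
# The ~EUR 3.50 per 2MB figure is the Mouser comparison quoted above.
saving_per_2mb_eur = 3.50
array_mb = 16
saving_eur = saving_per_2mb_eur * array_mb / 2
print(f"~EUR {saving_eur:.2f} saved on a {array_mb}MB array")   # ~EUR 28.00
```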
I'm basing my initial analysis on the ISSI 50ns 1Mx16 EDO chips currently available from Mouser. These are 3.3V parts, so I'll be working to WDC's 8MHz 3.3V timings.
The basic 65xx bus contract is that addresses are valid at the end of Phi1 (when Phi2 goes high) and that data is valid in either direction at the end of Phi2 (when Phi2 falls). At 8MHz, the full cycle takes 125ns, so I'll round down and take each full phase as 60ns. But DRAM needs more than two distinct phases to operate, so we could also consider quarter-phases of 30ns each; we'll need a double-speed clock to regulate those.
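As a quick sanity check on those numbers (nothing here beyond the arithmetic already stated):

```python
# Phase budget at 8MHz, 65xx two-phase clocking.
cycle_ns = 1000 / 8          # 125.0ns per full Phi2 cycle
phase_ns = cycle_ns / 2      # 62.5ns per phase, rounded down to 60ns above
quarter_ns = phase_ns / 2    # 31.25ns, treated as 30ns quarter-phases
print(cycle_ns, phase_ns, quarter_ns)   # a 16MHz double-speed clock marks the quarter-phase edges
```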
At 3.3V, however, the address and data setup times are 40ns. That means we can't rely on addresses being valid and stable at the midpoint of Phi1, only at the end,
nor can we rely on write-data being valid and stable at the midpoint of Phi2. We need addresses and data valid before we can even start feeding them into the DRAM, because its control signals are edge-triggered.
So we must assert /RAS with the upper half of the address at the end of Phi1, /CAS (and /OE for read cycles) with the lower half of the address at the midpoint of Phi2, and /WE (for write cycles) at the end of Phi2 (a "late write" cycle in DRAM terms). We have to
deassert all control signals at the end of Phi2 on a read cycle to avoid conflicting with the bank address multiplexing, but at the midpoint of Phi1 on a write cycle.
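To keep the scheme straight, here is the same sequence written out as a list of clock edges and signal events; this is just a restatement of the paragraph above in my own shorthand, with "mid Phi1/Phi2" meaning the quarter-phase points from the double-speed clock.

```python
# Control-signal schedule per CPU cycle, restating the scheme above.
READ_CYCLE = [
    ("end of Phi1", "assert /RAS, row (upper) address already on the DRAM address bus"),
    ("mid Phi2",    "switch DRAM address bus to column (lower) address, assert /CAS and /OE"),
    ("end of Phi2", "CPU latches read data; deassert /RAS, /CAS, /OE"),
]
WRITE_CYCLE = [
    ("end of Phi1", "assert /RAS, row (upper) address already on the DRAM address bus"),
    ("mid Phi2",    "switch DRAM address bus to column (lower) address, assert /CAS"),
    ("end of Phi2", "assert /WE; DRAM latches write data (late write)"),
    ("mid Phi1",    "deassert /RAS, /CAS, /WE (in the following cycle)"),
]

for name, schedule in (("read", READ_CYCLE), ("write", WRITE_CYCLE)):
    print(f"--- {name} cycle ---")
    for edge, action in schedule:
        print(f"  {edge:<12} {action}")
```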
As far as I can tell, the timings all theoretically make sense with that scheme, for both reads and writes. The timing relationships need careful management in some places, particularly the switch on the DRAM address bus between /RAS and /CAS. We don't need to worry about delays while switching DRAM pages, because we can issue a full RAS-CAS sequence for every CPU cycle. But this still leaves open the question of how to handle refresh.
Side note: this is a 16-bit DRAM, but it has separate /CAS inputs for the high and low bytes, so you can treat it like a pair of 1Mx8 DRAMs stuck together, with the DQ lines commoned and the appropriate /CAS line selected by one of the address bits, perhaps A0. The DQ lines will remain high-impedance and impervious to writes if the corresponding /CAS line stays high.
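A sketch of that byte-lane select follows; the choice of A0 and the even-address-to-low-byte mapping are just for illustration, not a commitment.

```python
# Hypothetical byte-lane steering for the 1Mx16 part used as two 1Mx8 halves.
# True means "this /CAS line is driven low this cycle".
def cas_select(cas_time: bool, a0: int) -> tuple[bool, bool]:
    """Return (assert /CASL, assert /CASH) for an 8-bit CPU access."""
    if not cas_time:                 # outside the /CAS window: both lines stay high
        return (False, False)
    return (a0 == 0, a0 == 1)        # even address -> low byte, odd -> high byte
```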
Refresh is greatly simplified by the fact that these DRAM chips include an internal refresh row counter, which means we don't need to intervene on the address bus with our own row counter, only instruct the chip when to perform a refresh cycle (by taking /RAS low while /CAS is already low). Classic micros, whose DRAMs lacked this feature, often co-opted the CRTC as a refresh counter and designed the video memory access sequence to meet refresh requirements; we don't need to bother with that.
These DRAMs also support "hidden refresh", but I don't think we can use that if we want high performance. The cycle time of /RAS is limited by the sum of tRAS and tRP, which is 80ns for this device, but we need to perform two /RAS cycles during one CPU cycle if we want hidden refresh, and 160ns is much more than the 125ns we can even theoretically tolerate. So we need to find or make opportunities to perform a CBR (/CAS before /RAS) refresh in which no data transfer occurs.
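Spelling out the arithmetic that rules hidden refresh out:

```python
# Hidden refresh needs two back-to-back /RAS cycles within one CPU cycle.
t_ras_plus_trp_ns = 80     # minimum /RAS cycle time for this part (tRAS + tRP)
cpu_cycle_ns = 125         # one Phi2 cycle at 8MHz
needed_ns = 2 * t_ras_plus_trp_ns
print(f"{needed_ns}ns needed vs {cpu_cycle_ns}ns available -> hidden refresh doesn't fit")
```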
The 65816 obliges by flagging "internal operation" cycles, holding both VDA and VPA low; these are cycles where the CPU isn't doing any data transfer. These signals are valid at the end of Phi1, so we can hold /OE and /WE high for that cycle, assert /CAS at the end of Phi1, and assert /RAS at the midpoint of Phi2. We can also do the same in cycles where the CPU is accessing something other than DRAM - or, at least in theory, when it accesses a
different DRAM (though this will increase the complexity of the glue logic). The DRAM chip will ignore address lines and keep data lines high-impedance during a refresh cycle.
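As a sketch, the condition for slipping in one of these "free" CBR refreshes might look like this; dram_selected stands for a hypothetical address-decode output, not a 65816 pin.

```python
# When can a CBR refresh be issued without costing the CPU anything?
# Per the discussion above: on internal-operation cycles (VDA and VPA both low),
# or when the current access doesn't touch this DRAM at all.
def free_refresh_possible(vda: bool, vpa: bool, dram_selected: bool) -> bool:
    internal_operation = not vda and not vpa
    return internal_operation or not dram_selected

# In such a cycle the sequence flips: /OE and /WE stay high, /CAS is asserted at
# the end of Phi1, and /RAS at the midpoint of Phi2 (CAS-before-RAS refresh).
```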
However, we can't guarantee that the 65816 will use enough "internal operation" cycles or I/O device accesses to satisfy tREF for the DRAM; we need to cycle through all 1024 rows within 16ms, or 128,000 Phi2 cycles at 8MHz, or 125 cycles per row on average. To work around this, we can set up a 0-63 (refresh every 64 cycles) or 0-99 (refresh every 100 cycles) counter, incremented by Phi2, and reset when we can do a refresh with no performance penalty. If the counter overflows, it will assert its carry signal to trigger a forced refresh cycle. This is the same as an "internal operation" refresh cycle, except that RDY is also negated to keep the CPU out of the way. RDY is valid at the end of Phi2.
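Here's a small model of that counter-based scheduling (a sketch, not a circuit), using the 0-99 variant and assuming each forced refresh costs a single stalled cycle via RDY:

```python
# Refresh bookkeeping over one 16ms tREF window (128,000 Phi2 cycles at 8MHz).
LIMIT = 100   # force a refresh at least every 100 cycles (the budget is 125 per row)

def refresh_stats(cycles: int, free_cycle) -> tuple[int, int]:
    """free_cycle(n) -> True if cycle n allows a zero-cost CBR refresh.
    Returns (total refreshes issued, cycles lost to forced refreshes)."""
    counter = refreshes = stalls = 0
    for n in range(cycles):
        if free_cycle(n):
            refreshes += 1           # free refresh; reset the counter
            counter = 0
        elif counter + 1 >= LIMIT:
            refreshes += 1           # counter carry: forced refresh, RDY negated
            stalls += 1
            counter = 0
        else:
            counter += 1
    return refreshes, stalls

# Worst case: no free cycles at all -> 1280 refreshes (comfortably more than the
# 1024 rows required) and 1280 stalled cycles out of 128,000, i.e. roughly a 1% penalty.
print(refresh_stats(128_000, lambda n: False))
```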
So that, I think, establishes that we *can* use modern DRAM as a cost-saving measure, with a feasible amount of extra glue logic and very little performance penalty. I think I'll leave checking the 5V 14MHz options and figuring out exactly what that glue logic looks like for another day.