Earlier in this thread...
GARTHWILSON wrote:
full address decoding [...] would reduce the maximum clock speed, unless I resorted to programmable logic.
One way to preserve maximum clock speed, even without programmable logic, is to wait-state the I/O. The resultant performance trade-off can be minimized, as explained later.
In the case of a 'C02 system that lacks programmable logic, is clocked at maximum speed, and uses the wait-state solution, I/O in zero-page is still a clear win. An '816 system subject to all the same restrictions sees no performance benefit; however, both it and the 'C02 save program memory by using zero-page/direct-page addressing mode.
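To put concrete (if hedged) numbers on that saving, here's a single store to a VIA register, comparing zero-page/direct-page addressing against absolute addressing. The addresses $00 and $7F00 are hypothetical, the byte and cycle figures are the standard WDC ones, and on the '816 I'm assuming the direct-page register's low byte is zero.

Code:
    STA $00         ; zero-page / direct-page:  2 bytes, 3 cycles
    STA $7F00       ; absolute:                 3 bytes, 4 cycles

One byte and one cycle saved per access. Even when a wait state eats the cycle saving, the byte saving remains.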
Further improvement occurs if any of the restrictions is relaxed (a slower clock, and/or programmable logic becomes available), because wait states then become unnecessary. A 'C02 system will get 100% of the benefit of I/O in zero-page. Unfortunately the '816 doesn't have SMB RMB BBS and BBR instructions, but (like the 'C02) it will save one cycle and one byte for each occurrence of LDA STA BIT TSB TRB etc with zero-page/direct-page addressing.
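As a sketch of what SMB buys on the 'C02 (and what the '816 has to do instead), here's one way to set a single output bit in a VIA port register. The zero-page/direct-page address $00 is hypothetical and the figures are the standard WDC cycle counts; the nearest SMB-less equivalent I can think of uses TSB with a mask in the accumulator.

Code:
    ; 'C02 only: set bit 0, accumulator and flags untouched
    SMB0 $00        ; 2 bytes, 5 cycles

    ; 'C02 or '816: same effect without SMB
    LDA #$01        ; 2 bytes, 2 cycles
    TSB $00         ; 2 bytes, 5 cycles   -> 4 bytes, 7 cycles, A disturbed

BBS and BBR give a similar one-instruction shortcut for polling a bit, which the '816 likewise has to synthesize from a load, a mask and a branch.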
Attachment: zero-page decoder and wait-state generator.gif
The circuit above includes wait state logic and full decoding for a 6522 VIA mapped into zero page.
Read-Modify-Write instructions incur only one wait state, not three. This preserves the substantial performance advantage offered by SMB and RMB -- instructions custom tailored for fast I/O! The more general R-M-W instructions such as INC DEC TSB and TRB also incur only one wait state, as do simple reads and writes such as BIT LDA STA AND ORA etc.
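To put rough numbers on that (a sketch only, assuming the standard WDC cycle counts, the single wait state described above, and a hypothetical VIA register at zero-page address $00):

Code:
    SMB0 $00        ; 5 cycles + 1 wait state = 6 cycles
    INC  $00        ; 5 cycles + 1 wait state = 6 cycles
    LDA  $00        ; 3 cycles + 1 wait state = 4 cycles
    STA  $00        ; 3 cycles + 1 wait state = 4 cycles

Had all three data-access cycles of a read-modify-write been stretched, SMB0 and INC would cost 8 cycles apiece instead of 6.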
If I were building this I'd use 74LVC1G332's for the three 3-input OR gates at the top left. These guys come in a painfully tiny 6-pin surface-mount package but boast a 3-ns maximum propagation delay. The 74_138 decoder is available in a long list of high-speed families, including 74AC and 74BCT.
In the timing diagrams, cycle 1 is the wait state, when the zero-page address first appears. Because no data is transferred in cycle 1, we're freed from the usual stipulation that the VIA CS must be asserted before Phase 2 begins. Instead it's sufficient merely to bring the CPU RDY input low tPCS before Phase 2 ends.
Memory is not yet inhibited, so an unused memory access occurs in cycle 1. Since RDY gets pulled low in cycle 1, cycle 2 presents the same address all over again -- but this time accessing the VIA instead of memory. In cycle 2 there's no difficulty getting the VIA chip-select asserted before Phase 2 begins, because tADS (the CPU address delay) and the decoder propagation delay have already elapsed -- those signals are stable.
Next comes either the beginning of the next instruction (marked by SYNC), or else cycles 3 & 4 followed by the beginning of the next instruction (again marked by SYNC).
It's actually the SYNC signal that ends the VIA chip-select -- and here again we avoid the decoder propagation delay. The SYNC signal is passed to the VIA with just a single gate-delay (the 2-input OR). It's always this same path that ends the VIA chip select, regardless of whether the access is read-modify-write or just a simple read or write. That's why the two timing diagrams end with the identical sequence.
-- Jeff