The technology of the time did not allow the generation of fancy clock multipliers. I would venture to say that an examination of the Visual6502 project will show that there are no such clock generation circuits employed in the NMOS device. Furthermore, from the descriptions and circuit diagrams provided by those evaluating the actual chip implementation, I gather that latches and transfer gates are the predominant bi-stable elements and bus switching circuitry employed in the device itself.
If I remember correctly, the most staggering statistic I've gleaned from the Visual6502 project is that only about 3600 transistors were used in the implementation of the original microprocessor. This is a remarkably small number of transistors for such an elegant processor. At least two transistors are required to build a latch, and an edge-triggered FF requires at least a master and a slave latch plus an additional transistor or two for clock pulse steering and/or inverting to control the two latches.
Assuming that 6 transistors are used per FF, 3600 transistors could be used to construct at most 600 FFs. The various registers, A, X, Y, S, PCH, PCL, and P, require at least 6*(8*6+6) transistors, or 324 transistors. Additional FFs such as Memory Address registers, Data Input and Output registers, and other temporary working registers likely add another 8*6 FFs, or 6*8*6 = 288 transistors, for a total of 612 transistors used for D-style FFs. Various gates have to be constructed to pass the various registers around on one or two internal busses. Assuming 2 eight-bit busses and a total of around 112 register bits, an additional 224 transistors are used as bus drivers. At this point, 836 of 3600 transistors have been used for just registers and bus drivers. The remaining 2764 transistors would be required to implement the ALU, the sequencer, and any other random logic that may be required. The control ROM probably requires some for buffering, but none for the implementation.
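The arithmetic above is easy to re-run with different guesses. Here is a minimal Python sketch of the same budget; the constant and function names are mine, and every number is one of the assumptions stated in the paragraph above (including my recalled figure of 3600), not a measured count from the actual die.

```python
# Back-of-the-envelope transistor budget using the assumptions stated above.
TOTAL_TRANSISTORS = 3600     # recalled figure, not an exact count
TRANSISTORS_PER_FF = 6       # assumed cost of one edge-triggered FF
TRANSISTORS_PER_DRIVER = 1   # assumed: one pass transistor per bit per bus


def transistor_budget():
    # A, X, Y, S, PCH, PCL at 8 bits each, plus 6 bits for P.
    programmer_bits = 8 * 6 + 6
    # Assumed six 8-bit address, data, and temporary working registers.
    working_bits = 6 * 8
    # Rounded count of register bits that must reach the busses.
    driven_bits = 112
    busses = 2

    ff_transistors = TRANSISTORS_PER_FF * (programmer_bits + working_bits)
    driver_transistors = TRANSISTORS_PER_DRIVER * driven_bits * busses
    used = ff_transistors + driver_transistors
    return {
        "programmer_regs": TRANSISTORS_PER_FF * programmer_bits,
        "working_regs": TRANSISTORS_PER_FF * working_bits,
        "flip_flops": ff_transistors,
        "bus_drivers": driver_transistors,
        "used": used,
        "remaining": TOTAL_TRANSISTORS - used,
    }


if __name__ == "__main__":
    for name, count in transistor_budget().items():
        print(f"{name:16s} {count:5d}")
```

Changing TRANSISTORS_PER_FF to the cost of a simple latch shows how quickly the budget for the ALU and sequencer grows when latches replace full FFs.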
The point is that a two-phase clock, where one phase enables some latches and the other phase enables other latches, results in a more efficient implementation in terms of transistor count. Timing and layout of the circuits are certainly more critical, and transferring data from one latch to another is more difficult than it is with edge-triggered FFs. A non-overlapping two-phase clock, like that of the 6502, is essential to this type of operation.
Many of the fastest computers of that time (CDC 6600 and CDC 7600) were built with latches using pulsed enables. One very interesting control approach used by DEC in the PDP-6 was a delay line where the taps essentially determined the control transfer points in a sequential circuit; a form of microprogramming (according to DEC). A pulse was entered into a delay line and, at predetermined delays which matched the performance of the preceding logic circuits, a control pulse would be extracted and used to initiate the next operation of the sequential logic circuit. It's like using a series of '123 one-shots to march a computational function along to completion, with the pulse from the last one-shot latching the result into the final register/latch.
As our ability to integrate transistors has improved, we've adopted more stable and easier to control design methodologies. In the process, we've lost a significant amount of technology. Latches are deprecated as design elements, even for those problems where they are clearly superior. For example, an 8031 microcomputer with its multiplexed address and data bus would require extremely fast ROM/RAM if instead of a '373 octal latch, the least significant byte of the address was demultiplexed using a '374 octal register. The additional address setup provided by the latch allows the use of substantially lower speed memory devices.
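To put rough numbers on the '373 versus '374 comparison, here is a small Python sketch. The timing values are placeholders I made up for illustration, not figures from any 8031 or 74-series datasheet, and the function name is hypothetical. The only point is that the transparent latch gives the memory back the address setup interval that an edge-triggered register throws away.

```python
def memory_access_budget_ns(addr_to_data_ns, addr_setup_ns,
                            device_delay_ns, transparent):
    """Time left for the memory once the low address byte reaches its pins.

    All times are measured from the moment the CPU drives a valid
    address onto the multiplexed bus.
    """
    if transparent:
        # '373-style latch: open while the latch enable is active, so the
        # address flows through as soon as it is valid on the bus.
        address_at_memory_ns = device_delay_ns
    else:
        # '374-style register: the outputs do not change until the
        # clocking edge at the end of the address setup interval.
        address_at_memory_ns = addr_setup_ns + device_delay_ns
    return addr_to_data_ns - address_at_memory_ns


# Placeholder timings, NOT taken from any datasheet.
ADDR_TO_DATA_NS = 300   # address valid on bus to data required by CPU
ADDR_SETUP_NS = 40      # address valid on bus to end of address phase
DEVICE_DELAY_NS = 20    # propagation delay through latch or register

latch_budget = memory_access_budget_ns(
    ADDR_TO_DATA_NS, ADDR_SETUP_NS, DEVICE_DELAY_NS, transparent=True)
register_budget = memory_access_budget_ns(
    ADDR_TO_DATA_NS, ADDR_SETUP_NS, DEVICE_DELAY_NS, transparent=False)

print("latch:", latch_budget, "ns  register:", register_budget, "ns")
```

With real datasheet values substituted, the difference between the two budgets is always the address setup interval.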
I tend to think of multi-phase systems like the 6502 in terms of latches and transfer gates. When this approach is used, the first half of the cycle is the setup phase, i.e. the loading of the master latch in a FF, and the second half of the cycle is the storage phase, i.e. the transfer of the master latch to the slave latch in a FF. These two operations cannot be counted as separate; they are both required and as such form a single cycle. As separate, independent operations, they are incomplete.
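A toy behavioral model makes the point concrete. The Python sketch below is my own illustration, not a description of the actual 6502 circuits: the class and function names are hypothetical, and the two phases are simply called first and second. The stored value only changes after both halves of the cycle have run.

```python
class Latch:
    """Level-sensitive latch: follows d while enabled, holds otherwise."""

    def __init__(self, q=0):
        self.q = q

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


def run_cycles(inputs):
    """Drive a master/slave latch pair with a non-overlapping two-phase clock.

    Returns, for each cycle, the slave output after the first phase and
    after the second phase.
    """
    master, slave = Latch(), Latch()
    trace = []
    for d in inputs:
        # First phase: master is transparent and loads d; slave holds.
        master.update(d, enable=True)
        slave.update(master.q, enable=False)
        after_first = slave.q
        # Second phase: master holds; slave copies the master.
        master.update(d, enable=False)
        slave.update(master.q, enable=True)
        trace.append((after_first, slave.q))
    return trace


if __name__ == "__main__":
    for after_first, after_second in run_cycles([1, 0, 1]):
        print(after_first, after_second)
```

After the first phase the slave still shows the previous value; the new value is only visible once the second phase has completed, which is why the two halves count as one cycle.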
Using registers (FFs) is fine, and all of our FPGA/CPLD implementations have to use registers since the delays between latches are harder to control in an FPGA/CPLD. (It is virtually impossible to provide non-overlapping latch gating signals in a PLD for a large number of latches.) Invariably, our PLD-based 6502s use many more resources than Peddle and company used when they implemented the NMOS 6502. I attribute this primarily to the fact that we tend to use registers and multiplexers rather than latches and pass transistors.
I am always amazed at the elegance of the implementation of the early microprocessors. I find it very instructive, from a hardware design perspective, to study these old architectures. Their implementations are certainly nothing that a recent graduate will ever have been taught, and that is definitely a problem.
_________________ Michael A.