What a terrific thread this is turning out to be, Squonk! It's wonderful to see this survey take shape, and to follow the journey as it evolves. I'll continue to read with interest.
Regarding Elmore delays, they do seem to be a useful model here. It's worth noting that the actual delays I observed in my experiments were much longer than the estimates I derived from Elmore delay calculations (viewtopic.php?p=69813#p69633). Nevertheless, I suspect that the central insight (namely that RC delays along the FET-Switch carry chain multiply rather than add together) very much applies in this case.
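For anyone who wants to play with the numbers, here is a minimal Python sketch of the Elmore estimate for a chain of pass-FET stages, modelled as a plain RC ladder. The function names and the per-stage R and C values are placeholders I made up for illustration, not measurements, and (as noted above) real delays may well come out longer than this model predicts. With identical stages the estimate works out to R*C*N*(N+1)/2, so it grows roughly with the square of the chain length rather than linearly.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay estimate at the far end of an RC ladder.

    resistances[i] is the series resistance of stage i (e.g. FET on-resistance),
    capacitances[i] is the capacitance to ground at the node after stage i.
    Each node's capacitance is charged through ALL the resistance upstream of it.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance between the source and this node
        delay += upstream_r * c  # this node's contribution to the estimate
    return delay


def uniform_chain_delay(n_stages, r, c):
    """Elmore estimate for n identical stages; equals r*c*n*(n+1)/2."""
    return elmore_delay([r] * n_stages, [c] * n_stages)


if __name__ == "__main__":
    R_ON = 10.0       # ohms per switch -- hypothetical value
    C_NODE = 20e-12   # farads per node -- hypothetical value
    for n in (1, 2, 4, 8):
        naive = n * R_ON * C_NODE  # what you'd get if stage delays simply added
        est = uniform_chain_delay(n, R_ON, C_NODE)
        print(f"{n} stages: naive sum {naive:.3e} s, Elmore estimate {est:.3e} s")
```

The gap between the two columns widens quickly as the chain gets longer, which is the usual argument for buffering a long pass-gate chain every few stages.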
Quote:
The asymmetry you mention, less resistance to GND than VDD. ... Either might propagate faster from a pre-charged high state.
I like the creative thinking here! The C74-6502 "pre-charged" certain values to accelerate the ALU's Decimal Adjust logic. I don't yet see how this might be used in the FET carry chain, but it sure would be exciting to find out.
While I'm here, I did want to highlight one ALU design consideration that may not be immediately apparent: the question of when control signals are required to be available. This is a critical issue for pipelined ALU designs (like the one used in the C74-100, for example), but it may also impinge on non-pipelined CPUs with critical timing requirements.
In particular, some ALU designs require that certain control signals be available to the ALU at the same time as (or prior to) the input data. This is often the case for ALUs that treat control signals as input values to MUX "lookup tables", for example. The consequence is that whatever logic is required to resolve these control signals is pushed upstream in the datapath, either earlier in the cycle or, in a pipelined design, perhaps to a prior cycle.
Of course decoding and marshalling of control signals takes time, and unfortunately this processing often ends up on the critical path. In those situations, whatever speed gains the ALU itself delivers should be considered in light of any pre-processing that is required.
By contrast, other ALU designs merely require control signals to select one of several values computed by the ALU. This allows control signals to be resolved even as the ALU itself is working. This has the significant advantage of allowing the ALU to begin work as soon as data is available, rather than being held up waiting for control signals.
In the case of the pipelined ALU in the C74-100, this proved a decisive consideration. In that design, data is available to the ALU after just a single Clk-to-Data tpd. Control signals are resolved in parallel with the ALU, and are made available just in time to select an appropriate result to be latched at the end of the cycle.
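To make the trade-off between the two styles concrete, here is a rough Python sketch of the critical-path arithmetic. Everything in it is an assumption for illustration: the function names are mine, the delay figures are invented, and the model is deliberately simplistic (it is not a timing analysis of the C74-100 or any other real design).

```python
def control_before_alu(t_data, t_decode, t_alu):
    """ALU needs control signals along with the data (e.g. MUX 'lookup table' style).

    The ALU cannot start until BOTH the data and the decoded controls have arrived.
    """
    return max(t_data, t_decode) + t_alu


def control_selects_result(t_data, t_decode, t_alu, t_mux):
    """ALU computes several candidate results; controls only pick one at the end.

    Decoding overlaps with the ALU, so only the slower of the two paths counts,
    plus the final selection stage.
    """
    return max(t_data + t_alu, t_decode) + t_mux


if __name__ == "__main__":
    # Hypothetical delays in ns, chosen only to illustrate the comparison.
    T_DATA, T_DECODE, T_ALU, T_MUX = 5, 20, 30, 6
    print("controls needed up front:", control_before_alu(T_DATA, T_DECODE, T_ALU), "ns")
    print("controls select result:  ", control_selects_result(T_DATA, T_DECODE, T_ALU, T_MUX), "ns")
```

With numbers like these, the second arrangement wins even though it pays for an extra selection stage, because the decode time is hidden behind the ALU. If decoding is fast relative to data arrival, the advantage shrinks or disappears, so it is worth plugging in figures for the design at hand.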
Alright, I hope the preceding is helpful.
Congrats once again on a great thread.
Best,
Drass