BigEd wrote:
Looks like the NMOS 6502 puts the low byte of the operand through the ALU: it comes out in the right cycle to be used as the low address byte and to be loaded into the PCL.
Arlet wrote:
My core also holds the PCL byte in the ALU for a one cycle delay. The nice thing is that all the muxes are already there for other purposes, so it only needed proper control signals.
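To make the one-cycle delay concrete, here is a minimal Python sketch of JMP abs at the bus-cycle level. It assumes a simplified model: a dict stands in for memory, and a single `alu_hold` variable stands in for the ALU output hold register. The function and variable names are illustrative and are not taken from either core.

```python
def jmp_abs(memory, pc):
    """Model the three bus cycles of JMP abs; returns (new_pc, bus_trace)."""
    trace = []       # (cycle, address, data) for each bus read
    alu_hold = 0     # ALU output hold register (hypothetical model)

    # Cycle 1: fetch the opcode.
    trace.append((1, pc, memory[pc]))
    pc = (pc + 1) & 0xFFFF

    # Cycle 2: fetch the operand low byte and pass it through the ALU
    # unchanged (add zero); it is latched in the hold register at the clock edge.
    low = memory[pc]
    trace.append((2, pc, low))
    alu_hold = (low + 0) & 0xFF
    pc = (pc + 1) & 0xFFFF

    # Cycle 3: fetch the high byte. PCH loads straight from the data bus while
    # PCL loads from the hold register, so both halves arrive together.
    high = memory[pc]
    trace.append((3, pc, high))
    pc = (high << 8) | alu_hold
    return pc, trace
```

For example, `jmp_abs({0x0200: 0x4C, 0x0201: 0x34, 0x0202: 0x12}, 0x0200)` returns a new PC of `0x1234`. The point is that no dedicated latch is needed for the low byte: the ALU's existing output register provides the delay.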
Thanks for the comments, gents.
An adder output hold register is interesting to me, and I can see how it facilitates this operation. That said, I suspect I still don't fully appreciate the implications of this approach. I certainly see how latches in the data path can be beneficial, given that the ALU is likely the critical path in the processor. For example, I latch memory reads into the ALU input so that the memory read is performed in one cycle and the ALU operation in the next (during an ADC abs, for example). In all other respects, however, the ALU is a pass-through: data goes from the source register through the ALU and back to the target register in one cycle. The same is true for the INC16 circuit. Somehow this approach seems more intuitive to me, and the difference in propagation delay between going directly to the target register and going to an ALU output latch seems small. I was reassured to note that Dieter's X02 also seems to dispense with an ALU output hold register (and uses unidirectional buses as well, by the way).
Lots of different approaches here, so more study ahead for sure...
On that note, I was excited to discover some inefficiencies in my BCD logic when looking at Arlet's core. The key detail was that my half carry was fed directly from the low adder to the high adder, rather than being OR'ed with the BCD carry (i.e. a BCD half carry needs to be generated whenever the low nibble has been BCD-adjusted). I suspect my confusion stemmed from having split the low- and high-nibble BCD adjustment over two cycles (because it just took too long otherwise). In any event, my logic was more complex and slower.
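A small model helps show why the OR matters. The sketch below assumes valid packed-BCD inputs and computes only the result byte and carry; the NMOS 6502's decimal-mode N, V and Z behaviour is deliberately not modelled. The `fix_half_carry` flag is a hypothetical switch that selects between the corrected logic and the "raw carry only" version described above.

```python
def adc_decimal(a, b, carry_in, fix_half_carry=True):
    """Decimal-mode ADC on two packed-BCD bytes; returns (result, carry_out)."""
    # Low nibble: 4-bit binary add, then BCD correction.
    lo_raw = (a & 0x0F) + (b & 0x0F) + carry_in
    lo_cout = lo_raw >> 4                 # raw carry out of the low adder
    lo_sum = lo_raw & 0x0F
    lo_bcd_carry = int(lo_sum > 9)        # digit 10..15 also needs correcting
    if lo_cout or lo_bcd_carry:
        lo_sum = (lo_sum + 6) & 0x0F
    # The fix: the carry into the high nibble is the raw carry OR the BCD carry.
    half_carry = (lo_cout | lo_bcd_carry) if fix_half_carry else lo_cout

    # High nibble: same structure, producing the final carry.
    hi_raw = (a >> 4) + (b >> 4) + half_carry
    hi_cout = hi_raw >> 4
    hi_sum = hi_raw & 0x0F
    hi_bcd_carry = int(hi_sum > 9)
    if hi_cout or hi_bcd_carry:
        hi_sum = (hi_sum + 6) & 0x0F
    return (hi_sum << 4) | lo_sum, hi_cout | hi_bcd_carry
```

With the fix, `adc_decimal(0x09, 0x01, 0)` gives `(0x10, 0)`. With `fix_half_carry=False` the same inputs give `(0x00, 0)`: the low digit is adjusted to 0, but the decimal carry never reaches the high nibble because the raw low-adder carry was 0.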
The improved circuit still requires an extra cycle to complete, though... uhmm, hold on... could this be the pay-off for an ALU output hold register?! That is, it would enable better pipelining, spreading ALU operations over more than one cycle while still keeping to the NMOS 6502 cycle count. It's good motivation to continue exploring.
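As a rough illustration of that speculation, here is a hypothetical two-stage split of the decimal add, with an output hold register between the stages. The `AluHold` fields, the `stage1`/`stage2` names and the choice to cut at the nibble boundary are all assumptions for the sketch, not how any of the cores mentioned actually do it. It also does not model whether the second stage could overlap the following instruction's opcode fetch, which is what keeping the NMOS cycle count would depend on.

```python
from dataclasses import dataclass


@dataclass
class AluHold:
    """Hypothetical ALU output hold register contents between the two stages."""
    lo_digit: int    # adjusted low BCD digit
    half_carry: int  # raw low-adder carry OR low-digit BCD correction
    a_hi: int        # high nibble of operand A, carried forward
    b_hi: int        # high nibble of operand B, carried forward


def stage1(a, b, carry_in):
    """First cycle: low-nibble add plus BCD adjust, latched into the hold register."""
    raw = (a & 0x0F) + (b & 0x0F) + carry_in
    digit = raw & 0x0F
    adjust = raw > 0x0F or digit > 9
    if adjust:
        digit = (digit + 6) & 0x0F
    return AluHold(digit, int(adjust), a >> 4, b >> 4)


def stage2(hold):
    """Second cycle: high-nibble add plus BCD adjust, using the held half carry."""
    raw = hold.a_hi + hold.b_hi + hold.half_carry
    digit = raw & 0x0F
    adjust = raw > 0x0F or digit > 9
    if adjust:
        digit = (digit + 6) & 0x0F
    return (digit << 4) | hold.lo_digit, int(adjust)


def run_two_cycle_adc(a, b, carry_in):
    hold = stage1(a, b, carry_in)  # clock edge 1: latch the partial result
    return stage2(hold)            # clock edge 2: write the target register
```

Each stage now contains one 4-bit add, one compare and one add-6 correction rather than two of each in series, so the longest combinational path per cycle is roughly halved. The cost is a register's worth of state carried between cycles.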
By the way, I've also been spending some time looking at the state machine in Arlet's core. It's very nicely arranged and clearly illustrates the steps common to various instruction types. If I ever get the opportunity to revisit my original gate-level instruction decoder and sequencer, it will prove an invaluable guide. I need to read up on Verilog some more as well - such a powerful tool!