Arlet wrote:
For BCD operations, the idea is to store the output of the binary addition in A, and at the same time, store the BCD adjustment in M. An extra cycle follows where A+M is calculated and stored in A again.
I was thinking that the BCD adjustment term for each digit was either 0, +6 or -6, where -6 is equal to +10 in 4-bit arithmetic (modulo 16). That's how I've done it before in the FPGA/CPLD Verilog code.
While thinking about how to optimize the logic for these 3 different bit patterns (0000, 0110 and 1010), I realized I could also use the ALU as a subtractor, by inverting the B input and setting the carry input flag. Inverting 0110 gives 1001 (+9), and the carry in turns that into 1010 (-6). That way, I only need 0 or +6 as adjustment digits.
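As a rough illustration of the subtractor trick, here is a small behavioural model of an 8-bit adder with an optional inverter on the B input. The name `adder` and the `invert_b` parameter are hypothetical, chosen for this sketch; it is not the actual Verilog. It shows that feeding the +6 digit (0110) through the inverter with the carry in set behaves like adding 1010, i.e. subtracting 6.

```python
def adder(a: int, b: int, carry_in: int, invert_b: bool = False):
    """Behavioural 8-bit adder with an optional inverter on the B input.

    Returns (result, carry_out). With invert_b=True and carry_in=1 this
    computes a - b modulo 256, so the same adder doubles as a subtractor.
    """
    if invert_b:
        b = ~b & 0xFF          # invert all 8 bits of the B input
    total = a + b + carry_in
    return total & 0xFF, total >> 8


# One nibble: the inverted +6 digit (0110) is 1001 (+9); with the carry in
# it acts as 1010 (+10), which is the same as -6 modulo 16.
inverted_six = ~0b0110 & 0xF
assert inverted_six == 0b1001
assert (inverted_six + 1) % 16 == (-6) % 16

# Full byte: subtracting an adjustment of 06 from 1C gives 16 (no borrow).
result, carry_out = adder(0x1C, 0x06, carry_in=1, invert_b=True)
assert (result, carry_out) == (0x16, 1)
```

So the adjustment register only ever has to hold 0000 or 0110 per digit; the direction comes from the inverter and carry in.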
Another optimization is that I no longer split my 8-bit adder into two 4-bit pieces. For instance, suppose you add the BCD values 16 and 16. If you just add them in binary, you get 2C, and if you then add 6 to the lower nibble only (dropping its carry), you get 22, but the answer is 32. In order to fix that, I had split the adder into two 4-bit pieces, and generated a carry from the lower digit into the upper digit whenever the lower digit result was bigger than 9. So 16 + 16 would first result in 3C, and then the lower nibble would be adjusted, C + 6 = 2, resulting in the correct answer of 32.
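A behavioural sketch of the old split-adder approach, for addition only and with only the low-digit correction modelled to keep it short. The function names are hypothetical; the 16 + 16 example is the one from the post.

```python
def split_add_first_cycle(a: int, b: int) -> int:
    """Old scheme, cycle 1: two 4-bit adders with a decimal carry between them.

    The low-digit sum carries into the high digit whenever it exceeds 9.
    """
    low_sum = (a & 0x0F) + (b & 0x0F)
    decimal_carry = 1 if low_sum > 9 else 0
    high_sum = (a >> 4) + (b >> 4) + decimal_carry
    return ((high_sum & 0x0F) << 4) | (low_sum & 0x0F)


def adjust_low_nibble_only(value: int) -> int:
    """Add 6 to the low digit only; the nibble's own carry out is dropped."""
    return (value & 0xF0) | ((value + 6) & 0x0F)


temp = split_add_first_cycle(0x16, 0x16)      # 0x3C
fixed = adjust_low_nibble_only(temp)          # 0x32, the correct BCD sum

# Without the decimal carry, the plain binary sum 0x2C adjusted the same
# way gives 0x22, which is wrong.
wrong = adjust_low_nibble_only(0x16 + 0x16)   # 0x22
```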
Producing the decimal carry in the middle of the adder adds some gate delay to the ripple-carry chain, plus some extra chaos in the middle of an otherwise regular structure.
More or less by accident, I tried it without the decimal carry detection. You get 16 + 16 = 2C as the temporary result. But when you then add the +6 correction to the lower nibble and just let the carry propagate normally through the full 8-bit adder, you end up with 32. And it also works for subtraction, thanks to the +9+carry trick above. With the old +10 adjustment it would not work, because adding 1010 to the lower nibble produces a carry that would now ripple into the upper nibble.
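The subtraction case can be checked with a similar sketch, assuming the adjustment digits are only 0 or 6 and are subtracted in the second cycle through the inverted B input. The names `sub8` and `bcd_sbc`, and the rule used to pick the adjustment digits, are assumptions for illustration, not taken from the actual core. The last assertion shows why a +10 digit breaks once the adder is no longer split.

```python
def sub8(a: int, b: int, carry_in: int = 1):
    """Binary a - b on an 8-bit adder (a + ~b + carry_in); returns (result, carry_out)."""
    total = a + (~b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def bcd_sbc(a: int, b: int, carry_in: int = 1) -> int:
    """Packed-BCD a - b - borrow using one unsplit 8-bit adder.

    6502 convention: carry_in = 1 means no borrow. Only the result byte
    is returned; the carry flag is not modelled here.
    """
    borrow_in = 1 - carry_in
    # Cycle 1: binary subtraction into A, adjustment digits (0 or 6) into M.
    reg_a, _ = sub8(a, b, carry_in)
    low_borrow = (a & 0x0F) - (b & 0x0F) - borrow_in < 0
    # For valid BCD operands the decimal borrow equals the binary borrow.
    high_borrow = a - b - borrow_in < 0
    reg_m = (0x06 if low_borrow else 0) | (0x60 if high_borrow else 0)
    # Cycle 2: A <- A - M with B inverted and carry in set; the carry simply
    # ripples across the whole byte, with no decimal carry between digits.
    result, _ = sub8(reg_a, reg_m, carry_in=1)
    return result


# 32 - 16: cycle 1 leaves 1C in A and 06 in M; cycle 2 gives 16.
assert bcd_sbc(0x32, 0x16) == 0x16

# Addition works the same way: 16 + 16 = 2C in binary, plus 06 across the
# full 8-bit adder gives 32.
assert (0x2C + 0x06) & 0xFF == 0x32

# The old +10 (1010) digit on the unsplit adder: the low-nibble carry now
# ripples into the high digit, giving 26 instead of 16.
assert (0x1C + 0x0A) & 0xFF == 0x26
```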
Edit: there is a caveat. The carry out is slightly more complicated, because you can get a carry during the first addition or during the second. This is easily fixed by just OR-ing the second carry into the carry flag. And you can use the plain binary carry out of the adder for this, which should save a little bit of time.
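A sketch of the carry rule for decimal addition under the same two-cycle scheme: the carry flag is the OR of the binary carry out of cycle 1 and that of cycle 2. The names are hypothetical, and the test used to choose the adjustment digits (an x86 DAA-style check) is an assumption, not necessarily what the Verilog does. Only addition is modelled.

```python
def add8(a: int, b: int, carry_in: int = 0):
    """Binary 8-bit addition; returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def bcd_adc(a: int, b: int, carry_in: int = 0):
    """Packed-BCD a + b + carry_in on an unsplit adder; returns (result, carry_flag)."""
    # Cycle 1: binary addition into A, adjustment (0 or 6 per digit) into M.
    reg_a, carry1 = add8(a, b, carry_in)
    low_adjust = (a & 0x0F) + (b & 0x0F) + carry_in > 9
    high_adjust = a + b + carry_in > 0x99   # assumed DAA-style test
    reg_m = (0x06 if low_adjust else 0) | (0x60 if high_adjust else 0)
    # Cycle 2: A <- A + M; its binary carry out is OR-ed into the flag.
    result, carry2 = add8(reg_a, reg_m)
    return result, carry1 | carry2


# 50 + 50: cycle 1 gives A0 with no carry; the carry appears in cycle 2.
assert bcd_adc(0x50, 0x50) == (0x00, 1)
# 99 + 99: the carry appears in cycle 1 (binary 0x132, A = 0x32);
# cycle 2 then gives 98 with no further carry.
assert bcd_adc(0x99, 0x99) == (0x98, 1)
```

The two examples show each source of the carry. In this model the two carries are never both set for valid BCD inputs, so a plain OR is enough.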
Modified BCD code passes Klaus Dormann's test suite. The GitHub code has been updated.