A few months ago I presented a 16 x 16 Signed Multiplication routine for my M65C02A core and Bitwise provided one for the 65816. Initial testing provided good results, but Bitwise indicated that he wasn't getting the expected result for a product of -32768 x -32768.
Bitwise wrote:
I had a go for the 65C816. This works for normal values but not -32768 - Not sure where the extra bit P can be held.
Some time ago I had decided to implement an arithmetic right shift instruction for my M65C02A core. The intention was to use it to improve the time required for multiplications using the Booth algorithm. A previous attempt at an 8 x 8 signed multiplication routine for the 65C02 using a Booth algorithm required keeping a "guard" bit for the sign bit.
In my M65C02A core, I also implemented an instruction to reverse the bits of a register which allows the C flag / bit to function as the Booth "recoding" memory bit and the N flag to sense the state of the next bit in the multiplier. The multiplier is shifted left after reversing it which allows the C flag / bit to hold the previously shifted value and the N flag to test the state of the next multiplier bit. A simple SEC instruction at the beginning of the routine correctly initializes the Booth "recoding" memory bit.
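To make the reverse-then-shift-left idea concrete, here is a rough Python sketch of how the C and N flags end up holding the Booth bit pair. The function names (reverse16, booth_pairs) are made up for illustration and are not part of the M65C02A tools. The sketch also starts the memory bit at the textbook Booth value of 0, which is an assumption on my part; it does not try to model the SEC initialization or the actual flag polarity used in the routine.

```python
def reverse16(x):
    """Reverse the bit order of a 16-bit value (bit 0 <-> bit 15, etc.)."""
    return int(f"{x & 0xFFFF:016b}"[::-1], 2)


def booth_pairs(multiplier):
    """Return the (current_bit, previous_bit) pairs seen at each step.

    After reversing, the original bit 0 sits in bit 15, where an N-style
    flag can test it. Each left shift moves that bit into a C-style flag,
    so C remembers the bit that was just examined.
    """
    reg = reverse16(multiplier)
    c = 0  # assumed textbook starting value for the Booth memory bit
    pairs = []
    for _ in range(16):
        n = (reg >> 15) & 1        # "N flag": next multiplier bit to examine
        pairs.append((n, c))
        c = n                      # shift left: old bit 15 lands in "C"
        reg = (reg << 1) & 0xFFFF
    return pairs


if __name__ == "__main__":
    # (1, 0) means subtract the multiplicand, (0, 1) means add it,
    # (0, 0) and (1, 1) mean shift only.
    print(booth_pairs(0x0005)[:4])
```

The point is only that no separate storage is needed for the previous multiplier bit: the shift itself leaves it in the carry.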
Regardless, a guard bit is required for the sign bit of the product register. In a hardware implementation, an additional bit duplicating the sign bit is very easily included. Because of the number of additional instructions and memory cycles / locations required to implement a guard bit, I decided to use the overflow flag V to indicate those cases where the addition / subtraction of the multiplicand to the double length product register overflowed the representation. I restore the sign of the product during the arithmetic right shift as described below. In short, I think that the sign is determined from the N and V flags rather than the C and V flags:
MichaelM wrote:
BitWise:
Something you said regarding the -32768 * -32768 case made me go back and check my implementation. I found that my result was 0xC000_0000 instead of the correct 0x4000_0000.
In getting the bugs out of the M65C02A Python model and the _imul() source, I had removed a feature of the M65C02A-specific asr.w a instruction that enables the restoration of the correct sign bit when the preceding add/sub operation of the Booth multiplication operation generates an arithmetic overflow. When -32768 is added to -32768 in the upper product register, the result is 0x8000_0000. The V flag correctly indicates an overflow.
I had modified the asr.w a instruction to restore the sign by using the V flag and the N flag. If V is set, then the correct sign bit is the complement of the N flag. (If the V flag is not set, then the sign is the same as the N flag.) With this feature restored to the asr.w a instruction, _imul() does correctly compute the product of -32768 * -32768.
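For anyone who wants to experiment with the N / V sign rule without the core, here is a minimal Python sketch of a textbook 16 x 16 Booth multiply that takes the shifted-in sign from N xor V. This is not the _imul() source or the M65C02A Python model; the helper names (add16, sub16, booth_mul16, to_s32) and the fix_sign switch are hypothetical and exist only for this illustration.

```python
MASK16 = 0xFFFF


def add16(a, b):
    """16-bit add. Returns (result, N, V) like an ALU would report them."""
    r = (a + b) & MASK16
    n = r >> 15
    v = ((~(a ^ b) & (a ^ r)) >> 15) & 1   # same-sign operands, different-sign result
    return r, n, v


def sub16(a, b):
    """16-bit subtract a - b. Returns (result, N, V)."""
    r = (a - b) & MASK16
    n = r >> 15
    v = (((a ^ b) & (a ^ r)) >> 15) & 1    # different-sign operands, result sign != a
    return r, n, v


def booth_mul16(multiplicand, multiplier, fix_sign=True):
    """Radix-2 Booth multiply; returns the 32-bit product as an unsigned int.

    With fix_sign=True the arithmetic right shift takes its sign from
    N xor V, standing in for a guard bit. With fix_sign=False it simply
    copies N, which is the behaviour that loses the sign on overflow.
    """
    m = multiplicand & MASK16
    q = multiplier & MASK16
    hi, lo, prev = 0, 0, 0
    for i in range(16):
        cur = (q >> i) & 1
        n, v = hi >> 15, 0
        if cur == 1 and prev == 0:
            hi, n, v = sub16(hi, m)
        elif cur == 0 and prev == 1:
            hi, n, v = add16(hi, m)
        sign = (n ^ v) if fix_sign else n
        lo = (lo >> 1) | ((hi & 1) << 15)   # 32-bit arithmetic shift right
        hi = (hi >> 1) | (sign << 15)
        prev = cur
    return (hi << 16) | lo


def to_s32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x


if __name__ == "__main__":
    print(hex(booth_mul16(-32768, -32768)))                  # sign taken from N xor V
    print(hex(booth_mul16(-32768, -32768, fix_sign=False)))  # sign taken from N only
```

In this sketch the only add/sub for -32768 * -32768 is the final subtract, which leaves 0x8000 in the upper half with both N and V set. Shifting in N xor V = 0 gives 0x4000_0000, while shifting in N alone gives the 0xC000_0000 result mentioned above.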