BillG wrote:
Chromatix wrote:
Using a numerical trick:
Code:
LDA S1
EOR #$7F         ; A = 127 - S1: signed byte negated, in excess-127 form
CLC
ADC #(258 - 127) ; add combined constant 131; carry holds the ninth bit
STA W0           ; low byte of 258 - S1
LDA #0
ROL A            ; carry flag to high byte
STA W0+1
I make that 19 cycles, 14 bytes if arguments are in ZP.
Wow!
Arthur C. Clarke said, "Any sufficiently advanced technology is indistinguishable from magic." That certainly applies here.
I will have to study it in much more depth. It works with the few values I threw at it.
Thank you.
The comments do hint at how it works, but I could give a better explanation for those less used to juggling bits.
A signed byte in two's complement form, as I assume S1 is, has a total range of -128 to 127 inclusive. In the range -128 ($80) to -1 ($FF), the most-significant bit is set, while in the range 0 to 127 ($7F) it is cleared. Hence the most-significant bit functions as a sign bit. Within both ranges, the lower 7 bits increase monotonically as the value represented becomes less negative and more positive.
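These ranges are easy to check in a few lines of Python (a sketch of my own, not part of the routine):

```python
# Sketch: check the two's-complement ranges described above.
for s in range(-128, 128):
    b = s & 0xFF                    # the byte pattern, viewed as unsigned
    assert (b >> 7) == (1 if s < 0 else 0)   # MSB acts as the sign bit

# Within each range, the low 7 bits increase monotonically:
assert [s & 0x7F for s in range(-128, 0)] == list(range(128))
assert [s & 0x7F for s in range(0, 128)] == list(range(128))
```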
Two's complement is far from the only way to represent a signed byte. In the early days of computing there was one's complement (which had two representations of zero). But if you simply invert the sign bit of a two's complement value, you get a form known as "excess-128". Here, if you treat the byte as unsigned, it has a continuous range of 0 to 255: the value that used to be -128 is now 0, the value that used to be 0 is now 128, and so on. To get the true value back, you subtract 128. IEEE-754 single precision uses a closely related format called "excess-127" for the exponent field, in which the value 1.0 has a zero exponent stored as $7F. This is useful because, even on the 6502, unsigned integers are easier to deal with than signed ones.
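In Python terms, the conversion is just a sign-bit flip (a sketch; the name `to_excess_128` is my own):

```python
def to_excess_128(s):
    """Convert a signed byte (-128..127) to excess-128 by flipping the sign bit."""
    return (s & 0xFF) ^ 0x80        # result is unsigned, 0..255

assert to_excess_128(-128) == 0
assert to_excess_128(0) == 128
assert to_excess_128(127) == 255
# Subtracting the bias of 128 recovers the true value:
assert all(to_excess_128(s) - 128 == s for s in range(-128, 128))
```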
Putting that aside for a moment, the method to negate a two's complement value is to complement all the bits, then add 1. You can easily see that doing this to zero yields zero again: complementing gives $FF (-1), and adding 1 wraps back to $00. It is the complementing of the low bits that reverses their value sequence.
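As a sketch in Python (the helper name `negate_byte` is mine), masked to 8 bits so the wrap-around at zero is visible:

```python
def negate_byte(b):
    """Two's complement negation of a byte pattern 0..255."""
    return ((b ^ 0xFF) + 1) & 0xFF  # complement all bits, add 1, keep 8 bits

assert negate_byte(0x00) == 0x00    # zero -> $FF -> wraps back to zero
assert negate_byte(0x01) == 0xFF    # 1 -> -1
assert negate_byte(0x7F) == 0x81    # 127 -> -127
```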
To perform W0 = 258 - S1, we effectively need to add the constant 258 to a negated S1. If we combine the operations for negation and conversion to excess-128, we end up with complementing all except the sign bit, then adding 1. If we then added 258, we would need to immediately subtract 128 to obtain the true value. Combining the three additive operations (1 + 258 - 128 = 131) gives us W0 = (S1 ^ $7F) + 131, which is just two ALU operations, with the intermediate value effectively being (-S1) in excess-127 format. Since the final result is potentially 9 bits wide, we then only need to extract the carry bit into the high byte.
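The whole identity is easy to verify exhaustively; here is a Python sketch covering all 256 inputs:

```python
# Verify W0 = (S1 ^ $7F) + 131 == 258 - S1 for every signed byte.
for s1 in range(-128, 128):
    ex127 = (s1 & 0xFF) ^ 0x7F      # EOR #$7F: (-S1) in excess-127 form
    assert ex127 == 127 - s1
    w0 = ex127 + 131                # ADC #131: 9-bit result, 131..386
    assert w0 == 258 - s1
    lo, hi = w0 & 0xFF, w0 >> 8     # the two bytes the routine stores
    assert hi in (0, 1)             # the ninth bit is the 6502 carry flag
```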
The 6502 performs all of the required operations quite efficiently, such that the only thing I could really ask for is a dedicated instruction for clearing the accumulator, which would in practice be no faster and only one byte smaller. The only wrinkle is the idiomatic need to clear the carry before adding (2 cycles) and that ROL A also has a dead cycle. Aside from that, all 6502 cycles are either fetching program bytes or accessing operand memory. It's hard to see how another 8-bit CPU could improve significantly on that.
I see from BillG's examples that the 6800 benefits mostly from not having to clear the carry bit, and this makes the code slightly more compact. However, the time saved is then absorbed by slower write operations, with one dead cycle each. I can see that being a significant handicap in practical code. The 6809 has the same handicap, but gets to incur it only once with a 16-bit write operation involving both accumulators, so is slightly faster overall. The 8080 looks much slower until you remember that the Intel clock ticks four times per memory access, so normalised for memory accesses it is again comparable to the others. I have to say that the 8080 assembler is spectacularly opaque.