Comparisons and contrasts

BillG · Post by **BillG** » Mon Sep 21, 2020 8:15 am

Here are a couple of prototypes of the same for the 8080. Can this be improved?

 0100 3A 0015        [13] 00043         lda     S0              ; Get variable value
                          00044
 0103 67              [5] 00045         mov     H,A             ; Stash a copy
                          00046
 0104 2F              [4] 00047         cma                     ; Get one's complement
 0105 6F              [5] 00048         mov     L,A
                          00049
 0106 7C              [5] 00050         mov     A,H             ; Recover the variable value
                          00051
 0107 17              [4] 00052         ral                     ; Sign extend it
 0108 9F              [4] 00053         sbb     A
 0109 2F              [4] 00054         cma                     ; Get one's complement
 010A 67              [5] 00055         mov     H,A
                          00056
 010B 11 0103        [10] 00057         lxi     D,258+1         ; Constant plus one for 1's compl to 2's
 010E 19             [10] 00058         dad     D
                          00059 ;  0 := v W0 -> 1
 010F 22 000D        [16] 00060         shld    W0

85 cycles

Code: Select all

 0112 3A 0015        [13] 00063         lda     S0              ; Get variable value
                          00064
 0115 6F              [5] 00065         mov     L,A             ; Stash a copy
                          00066
 0116 17              [4] 00067         ral                     ; Sign extend it
 0117 9F              [4] 00068         sbb     A
 0118 67              [5] 00069         mov     H,A
                          00070
 0119 3E 02           [7] 00071         mvi     A,2             ; Subtract low bytes
 011B 95              [4] 00072         sub     L
 011C 6F              [5] 00073         mov     L,A
                          00074
 011D 3E 01           [7] 00075         mvi     A,1             ; Subtract high bytes
 011F 9C              [4] 00076         sbb     H
 0120 67              [5] 00077         mov     H,A
                          00078 ;  0 := v W0 -> 1
 0121 22 000D        [16] 00079         shld    W0

79 cycles

BillG · Post by **BillG** » Mon Sep 21, 2020 10:36 am

And for the 6809:

Code: Select all

                                  00031 * W0 := 258 - S1;
                                  00032
                                  00033 *   *  0 := v W0 -> 1
                                  00034 *   *  1 L r 2
                                  00035
                                  00036 *      *  2 L c 258 -> 3
                                  00037 *      *  3 - v S1
                                  00038
                                  00039
                                  00040 *  1 L r 2
                                  00041 *  2 L c 258 -> 3
                                  00042 *  3 - v S1
 0029 D6 16                   [4] 00043          ldb    S1        ; Get variable value
 002B 1D                      [2] 00044          sex
 002C 83 0102                 [4] 00045          subd   #258      ; Subtract the constant
 002F 43                      [2] 00046          coma             ; Negate the number
 0030 50                      [2] 00047          negb
 0031 82 FF                   [2] 00048          sbca   #$FF
                                  00049 *  0 := v W0 -> 1
 0033 DD 0D                   [5] 00050          std    W0

21 cycles

dmsc · Post by **dmsc** » Mon Sep 21, 2020 1:55 pm

Hi!

BillG wrote:

Today's challenge is to subtract a signed byte from a constant yielding a two-byte number.

W0 := 258 - S1;

This (untested) code is smaller and faster in the 6502:

Code: Select all

  lda #$02  ; 2
  sec       ; 2
  sbc S1    ; 3
  ldx #1    ; 2
  bcs ok    ; 2    / 3
  bit S1    ; 3
  bmi ok    ; 2 / 3
  dex       ; 2
ok:
  sta W0    ; 3
  stx W0+1  ; 3

Total of 18, 23 or 24 cycles, depending on S1, mean of 23.43 cycles.

As always, hand-written code in the 6502 can be a lot smaller.

BigEd · Post by **BigEd** » Mon Sep 21, 2020 2:14 pm

In this previous post about z80 vs 6502 there's a link to Z80, the 8 bit Number Cruncher which is quite interesting.

Chromatix · Post by **Chromatix** » Mon Sep 21, 2020 2:53 pm

Using a numerical trick:

Code: Select all

LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1

I make that 19 cycles, 14 bytes if arguments are in ZP.

dmsc · Post by **dmsc** » Mon Sep 21, 2020 3:39 pm

Hi!

Chromatix wrote:

Using a numerical trick:

Code: Select all

LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1

I make that 19 cycles, 14 bytes if arguments are in ZP.

Much better, and this is also applicable to the other 8 bit micros.

Have Fun!

BillG · Post by **BillG** » Tue Sep 22, 2020 3:49 am

dmsc wrote:

BillG wrote:

Today's challenge is to subtract a signed byte from a constant yielding a two-byte number.

W0 := 258 - S1;

This (untested) code is smaller and faster in the 6502:

Code: Select all

  lda #$02  ; 2
  sec       ; 2
  sbc S1    ; 3
  ldx #1    ; 2
  bcs ok    ; 2    / 3
  bit S1    ; 3
  bmi ok    ; 2 / 3
  dex       ; 2
ok:
  sta W0    ; 3
  stx W0+1  ; 3

Total of 18, 23 or 24 cycles, depending on S1, mean of 23.43 cycles.

As always, hand-written code in the 6502 can be a lot smaller.

It works with the few values I threw at it.

My compiler does not have a peephole optimizer yet; as such, the result must be in A:X.

Even with some T?? instructions, your code is no worse than mine and is much better in many cases.

Thank you.

BillG · Post by **BillG** » Tue Sep 22, 2020 3:53 am

Chromatix wrote:

Using a numerical trick:

Code: Select all

LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1

I make that 19 cycles, 14 bytes if arguments are in ZP.

Wow!

Arthur C. Clarke said, "Any sufficiently advanced technology is indistinguishable from magic." That certainly applies here.

I will have to study it in much more depth. It works with the few values I threw at it.

Thank you.

BillG · Post by **BillG** » Tue Sep 22, 2020 3:54 am

dmsc wrote:

Much better, and this is also applicable to the other 8 bit micros.

Indeed...

For the 6502:

Code: Select all

 0029 A5 16           [3] 00045          lda    S1
 002B 49 7F           [2] 00046          eor    #$7F
 002D 18              [2] 00047          clc
 002E 69 83           [2] 00048          adc    #258-127
 0030 AA              [2] 00049          tax
 0031 A9 00           [2] 00050          lda    #0
 0033 2A              [2] 00051          rol    A
 0034 85 0E           [3] 00052          sta    W0+1
 0036 86 0D           [3] 00053          stx    W0

21 cycles

For the 6800:

Code: Select all

 0029 4F              [2] 00043          clra
 002A D6 16           [3] 00044          ldab   S1
 002C C8 7F           [2] 00045          eorb   #$7F
 002E C9 83           [2] 00046          adcb   #258-127
 0030 49              [2] 00047          rola
 0031 97 0D           [4] 00048          staa   W0
 0033 D7 0E           [4] 00049          stab   W0+1

20 cycles

For the 6809:

Code: Select all

 0029 4F                      [2] 00043          clra
 002A D6 16                   [4] 00044          ldab   S1
 002C C8 7F                   [2] 00045          eorb   #$7F
 002E C9 83                   [2] 00046          adcb   #258-127
 0030 49                      [2] 00047          rola
 0031 DD 0D                   [5] 00048          std    W0

17 cycles

For the 8080:

Code: Select all

 0100 3A 0016        [13] 00043         lda     S1
 0103 EE 7F           [7] 00044         xri     7Fh
 0105 CE 83           [7] 00045         aci     258-127
 0107 6F              [5] 00046         mov     L,A
                          00047
 0108 3E 00           [7] 00048         mvi     A,0
 010A 17              [4] 00049         ral
 010B 67              [5] 00050         mov     H,A
                          00051
 010C 22 000D        [16] 00052         shld    W0

64 cycles

Chromatix · Post by **Chromatix** » Tue Sep 22, 2020 8:29 am

BillG wrote:

Chromatix wrote:

Using a numerical trick:

Code: Select all

LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1

I make that 19 cycles, 14 bytes if arguments are in ZP.

Wow!

Arthur C. Clarke said, "Any sufficiently advanced technology is indistinguishable from magic." That certainly applies here.

I will have to study it in much more depth. It works with the few values I threw at it.

Thank you.

The comments do hint at how it works, but I could give a better explanation for those less used to juggling bits.

A signed byte in two's complement form, as I assume S1 is, has a total range of -128 to 127 inclusive. In the range -128 ($80) to -1 ($FF), the most-significant bit is set, while in the range 0 to 127 ($7F) it is cleared. Hence the most-significant bit functions as a sign bit. Within both ranges, the lower 7 bits increase monotonically as the value represented becomes less negative and more positive.

Two's complement is far from the only way to represent a signed byte. In the early days of computing there was one's complement (which had two representations of zero). But if you simply invert the sign bit of two's complement, you get to a form known as "excess 128". Here, if you treat the byte as unsigned, it has a continuous range of 0 to 255, where the value that used to be -128 is now 0, and that which used to be 0 is now 128, and so on. To get to the true value you would subtract 128. IEEE-754 single precision uses a closely related format called "excess 127" for the exponent field, in which the value 1.0 has a zero exponent which is stored as $7F. This is useful because even on the 6502, unsigned integers are easier to deal with than signed ones.

Putting that aside for a moment, the method to negate a two's complement value is to complement all the bits, then add 1. You can easily see that doing this to zero yields zero again, via $FF (-1). It is the complementing of the low bits that reverses their value sequence.

To perform W0 = 258 - S1, we effectively need to add the constant 258 to a negated S1. If we combine the operations for negation and conversion to excess-128, we end up with complementing all except the sign bit, then adding 1. If we then added 258, we would need to immediately subtract 128 to obtain the true value. Combining the three latter additive operations gives us W0 = (S1 ^ $7F) + 131, which is just two ALU operations, with the intermediate value effectively being (-S1) in excess-127 format. Since the final result is potentially 9 bits wide, we then only need to extract the carry bit into the high byte.

The 6502 performs all of the required operations quite efficiently, such that the only thing I could really ask for is a dedicated instruction for clearing the accumulator, which would in practice be no faster and only one byte smaller. The only wrinkle is the idiomatic need to clear the carry before adding (2 cycles) and that ROL A also has a dead cycle. Aside from that, all 6502 cycles are either fetching program bytes or accessing operand memory. It's hard to see how another 8-bit CPU could improve significantly on that.

I see from BillG's examples that the 6800 benefits mostly from not having to clear the carry bit, and this makes the code slightly more compact. However, the time saved is then absorbed by slower write operations, with one dead cycle each. I can see that being a significant handicap in practical code. The 6809 has the same handicap, but gets to incur it only once with a 16-bit write operation involving both accumulators, so is slightly faster overall. The 8080 looks much slower until you remember that the Intel clock ticks four times per memory access, so normalised for memory accesses it is again comparable to the others. I have to say that the 8080 assembler is spectacularly opaque.

BillG · Post by **BillG** » Tue Sep 22, 2020 10:48 am

Chromatix wrote:

The 6502 performs all of the required operations quite efficiently, such that the only thing I could really ask for is a dedicated instruction for clearing the accumulator, which would in practice be no faster and only one byte smaller.

The bug in the ointment is that the 680x clr* instructions clear the carry flag. Notice that I put clra before the subtraction for the lower byte...

Chromatix wrote:

I see from BillG's examples that the 6800 benefits mostly from not having to clear the carry bit, and this makes the code slightly more compact. However, the time saved is then absorbed by slower write operations, with one dead cycle each. I can see that being a significant handicap in practical code.

I do not like that 6800 instructions contain dead cycles (the 6809 is worse), but they are somewhat offset by the two almost equal accumulators and the ability to efficiently offset the index register.

Chromatix wrote:

I have to say that the 8080 assembler is spectacularly opaque.

I think the 8080 mnemonics and operand format were designed to make the assembler simpler.

BillG · Post by **BillG** » Tue Sep 22, 2020 5:30 pm

Chromatix wrote:

To perform W0 = 258 - S1, we effectively need to add the constant 258 to a negated S1. If we combine the operations for negation and conversion to excess-128, we end up with complementing all except the sign bit, then adding 1. If we then added 258, we would need to immediately subtract 128 to obtain the true value. Combining the three latter additive operations gives us W0 = (S1 ^ $7F) + 131, which is just two ALU operations, with the intermediate value effectively being (-S1) in excess-127 format. Since the final result is potentially 9 bits wide, we then only need to extract the carry bit into the high byte.

I think I am getting it now. To subtract S1 from an arbitrary constant, I am doing this...

Code: Select all

LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #<Constant-127
TAX
LDA #>Constant-127
ADC #0      ; Add carry flag to high byte
STX  W0
STA  W0+1

Thanks again...

Chromatix · Post by **Chromatix** » Wed Sep 23, 2020 1:23 am

Yes, I think that should work. However, take note of when the arbitrary constant is itself less than 127, as the immediate operands and the final result could then be negative. Although then it would be only a one-byte value, so might not hit this code path?

BillG · Post by **BillG** » Wed Sep 23, 2020 2:25 am

In my limited testing, I tried constant = 0 and S1 = -1; that worked:

Code: Select all

                          00031 ; W0 := 0 - S1;
                          00032
                          00033 ;   ;  0 := v W0 -> 1
                          00034 ;   ;  1 L r 2
                          00035
                          00036 ;      ;  2 L c 0 -> 3
                          00037 ;      ;  3 - v S1
                          00038
                          00039
                          00040 ;  1 L r 2
                          00041 ;  2 L c 0 -> 3
                          00042 ;  3 - v S1
 0029 A5 16           [3] 00043          lda    S1
 002B 49 7F           [2] 00044          eor    #$7F
 002D 18              [2] 00045          clc
 002E 69 81           [2] 00046          adc    #<65409
 0030 AA              [2] 00047          tax
 0031 A9 FF           [2] 00048          lda    #>65409
 0033 69 00           [2] 00049          adc    #0
                          00050 ;  0 := v W0 -> 1
 0035 86 0D           [3] 00051          stx    W0
 0037 85 0E           [3] 00052          sta    W0+1

It appears to fail for constant < 0. More study is needed. Fortunately, that is something the compiler can easily detect and manage.

Thanks again.

BillG · Post by **BillG** » Wed Sep 23, 2020 5:39 am

I take part of that back. It does not fail until the constant is < -1 with S1 = -1. The first ADC instruction does not set the carry flag - that is the key.

Code: Select all

                          00031 ; W0 := -2 - S1;
                          00032
                          00033 ;   ;  0 := v W0 -> 1
                          00034 ;   ;  1 L r 2
                          00035
                          00036 ;      ;  2 L c -2 -> 3
                          00037 ;      ;  3 - v S1
                          00038
                          00039
                          00040 ;  1 L r 2
                          00041 ;  2 L c -2 -> 3
                          00042 ;  3 - v S1
 0029 A5 16           [3] 00043          lda    S1
 002B 49 7F           [2] 00044          eor    #$7F
 002D 18              [2] 00045          clc
 002E 69 7F           [2] 00046          adc    #<65407
 0030 AA              [2] 00047          tax
 0031 A9 FF           [2] 00048          lda    #>65407
 0033 69 00           [2] 00049          adc    #0
                          00050 ;  0 := v W0 -> 1
 0035 86 0D           [3] 00051          stx    W0
 0037 85 0E           [3] 00052          sta    W0+1

Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts

Re: Comparisons and contrasts