6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Tue Nov 12, 2024 12:10 am

All times are UTC




Post new topic Reply to topic  [ 163 posts ]  Go to page Previous  1, 2, 3, 4, 5 ... 11  Next
Author Message
PostPosted: Mon Sep 21, 2020 8:15 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Here are a couple of prototypes of the same for the 8080. Can this be improved?

Code:
 0100 3A 0015        [13] 00043         lda     S0              ; Get variable value
                          00044
 0103 67              [5] 00045         mov     H,A             ; Stash a copy
                          00046
 0104 2F              [4] 00047         cma                     ; Get one's complement
 0105 6F              [5] 00048         mov     L,A
                          00049
 0106 7C              [5] 00050         mov     A,H             ; Recover the variable value
                          00051
 0107 17              [4] 00052         ral                     ; Sign extend it
 0108 9F              [4] 00053         sbb     A
 0109 2F              [4] 00054         cma                     ; Get one's complement
 010A 67              [5] 00055         mov     H,A
                          00056
 010B 11 0103        [10] 00057         lxi     D,258+1         ; Constant plus one for 1's compl to 2's
 010E 19             [10] 00058         dad     D
                          00059 ;  0 := v W0 -> 1
 010F 22 000D        [16] 00060         shld    W0

85 cycles


Code:
 0112 3A 0015        [13] 00063         lda     S0              ; Get variable value
                          00064
 0115 6F              [5] 00065         mov     L,A             ; Stash a copy
                          00066
 0116 17              [4] 00067         ral                     ; Sign extend it
 0117 9F              [4] 00068         sbb     A
 0118 67              [5] 00069         mov     H,A
                          00070
 0119 3E 02           [7] 00071         mvi     A,2             ; Subtract low bytes
 011B 95              [4] 00072         sub     L
 011C 6F              [5] 00073         mov     L,A
                          00074
 011D 3E 01           [7] 00075         mvi     A,1             ; Subtract high bytes
 011F 9C              [4] 00076         sbb     H
 0120 67              [5] 00077         mov     H,A
                          00078 ;  0 := v W0 -> 1
 0121 22 000D        [16] 00079         shld    W0

79 cycles


Top
 Profile  
Reply with quote  
PostPosted: Mon Sep 21, 2020 10:36 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
And for the 6809:

Code:
                                  00031 * W0 := 258 - S1;
                                  00032
                                  00033 *   *  0 := v W0 -> 1
                                  00034 *   *  1 L r 2
                                  00035
                                  00036 *      *  2 L c 258 -> 3
                                  00037 *      *  3 - v S1
                                  00038
                                  00039
                                  00040 *  1 L r 2
                                  00041 *  2 L c 258 -> 3
                                  00042 *  3 - v S1
 0029 D6 16                   [4] 00043          ldb    S1        ; Get variable value
 002B 1D                      [2] 00044          sex
 002C 83 0102                 [4] 00045          subd   #258      ; Subtract the constant
 002F 43                      [2] 00046          coma             ; Negate the number
 0030 50                      [2] 00047          negb
 0031 82 FF                   [2] 00048          sbca   #$FF
                                  00049 *  0 := v W0 -> 1
 0033 DD 0D                   [5] 00050          std    W0

21 cycles


Top
 Profile  
Reply with quote  
PostPosted: Mon Sep 21, 2020 1:55 pm 
Offline

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138
Hi!

BillG wrote:
Today's challenge is to subtract a signed byte from a constant yielding a two-byte number.

W0 := 258 - S1;


This (untested) code is smaller and faster in the 6502:
Code:
  lda #$02  ; 2
  sec       ; 2
  sbc S1    ; 3
  ldx #1    ; 2
  bcs ok    ; 2    / 3
  bit S1    ; 3
  bmi ok    ; 2 / 3
  dex       ; 2
ok:
  sta W0    ; 3
  stx W0+1  ; 3


Total of 18, 23 or 24 cycles, depending on S1, mean of 23.43 cycles.

As always, hand-written code in the 6502 can be a lot smaller.


Top
 Profile  
Reply with quote  
PostPosted: Mon Sep 21, 2020 2:14 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
In this previous post about z80 vs 6502 there's a link to Z80, the 8 bit Number Cruncher which is quite interesting.


Top
 Profile  
Reply with quote  
PostPosted: Mon Sep 21, 2020 2:53 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Using a numerical trick:
Code:
LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1
I make that 19 cycles, 14 bytes if arguments are in ZP.


Top
 Profile  
Reply with quote  
PostPosted: Mon Sep 21, 2020 3:39 pm 
Offline

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138
Hi!

Chromatix wrote:
Using a numerical trick:
Code:
LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1
I make that 19 cycles, 14 bytes if arguments are in ZP.


Much better, and this is also applicable to the other 8 bit micros.

Have Fun!


Top
 Profile  
Reply with quote  
PostPosted: Tue Sep 22, 2020 3:49 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
dmsc wrote:
BillG wrote:
Today's challenge is to subtract a signed byte from a constant yielding a two-byte number.

W0 := 258 - S1;


This (untested) code is smaller and faster in the 6502:
Code:
  lda #$02  ; 2
  sec       ; 2
  sbc S1    ; 3
  ldx #1    ; 2
  bcs ok    ; 2    / 3
  bit S1    ; 3
  bmi ok    ; 2 / 3
  dex       ; 2
ok:
  sta W0    ; 3
  stx W0+1  ; 3


Total of 18, 23 or 24 cycles, depending on S1, mean of 23.43 cycles.

As always, hand-written code in the 6502 can be a lot smaller.


It works with the few values I threw at it.

My compiler does not have a peephole optimizer yet; as such, the result must be in A:X.

Even with some T?? instructions, your code is no worse than mine and is much better in many cases.

Thank you.


Top
 Profile  
Reply with quote  
PostPosted: Tue Sep 22, 2020 3:53 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Chromatix wrote:
Using a numerical trick:
Code:
LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1
I make that 19 cycles, 14 bytes if arguments are in ZP.


Wow!

Arthur C. Clarke said, "Any sufficiently advanced technology is indistinguishable from magic." That certainly applies here.

I will have to study it in much more depth. It works with the few values I threw at it.

Thank you.


Top
 Profile  
Reply with quote  
PostPosted: Tue Sep 22, 2020 3:54 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
dmsc wrote:
Much better, and this is also applicable to the other 8 bit micros.


Indeed...

For the 6502:

Code:
 0029 A5 16           [3] 00045          lda    S1
 002B 49 7F           [2] 00046          eor    #$7F
 002D 18              [2] 00047          clc
 002E 69 83           [2] 00048          adc    #258-127
 0030 AA              [2] 00049          tax
 0031 A9 00           [2] 00050          lda    #0
 0033 2A              [2] 00051          rol    A
 0034 85 0E           [3] 00052          sta    W0+1
 0036 86 0D           [3] 00053          stx    W0

21 cycles


For the 6800:

Code:
 0029 4F              [2] 00043          clra
 002A D6 16           [3] 00044          ldab   S1
 002C C8 7F           [2] 00045          eorb   #$7F
 002E C9 83           [2] 00046          adcb   #258-127
 0030 49              [2] 00047          rola
 0031 97 0D           [4] 00048          staa   W0
 0033 D7 0E           [4] 00049          stab   W0+1

20 cycles


For the 6809:

Code:
 0029 4F                      [2] 00043          clra
 002A D6 16                   [4] 00044          ldab   S1
 002C C8 7F                   [2] 00045          eorb   #$7F
 002E C9 83                   [2] 00046          adcb   #258-127
 0030 49                      [2] 00047          rola
 0031 DD 0D                   [5] 00048          std    W0

17 cycles


For the 8080:

Code:
 0100 3A 0016        [13] 00043         lda     S1
 0103 EE 7F           [7] 00044         xri     7Fh
 0105 CE 83           [7] 00045         aci     258-127
 0107 6F              [5] 00046         mov     L,A
                          00047
 0108 3E 00           [7] 00048         mvi     A,0
 010A 17              [4] 00049         ral
 010B 67              [5] 00050         mov     H,A
                          00051
 010C 22 000D        [16] 00052         shld    W0

64 cycles


Top
 Profile  
Reply with quote  
PostPosted: Tue Sep 22, 2020 8:29 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
BillG wrote:
Chromatix wrote:
Using a numerical trick:
Code:
LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #(258 - 127)
STA  W0
LDA #0
ROL A       ; carry flag to high byte
STA  W0+1
I make that 19 cycles, 14 bytes if arguments are in ZP.


Wow!

Arthur C. Clarke said, "Any sufficiently advanced technology is indistinguishable from magic." That certainly applies here.

I will have to study it in much more depth. It works with the few values I threw at it.

Thank you.

The comments do hint at how it works, but I could give a better explanation for those less used to juggling bits.

A signed byte in two's complement form, as I assume S1 is, has a total range of -128 to 127 inclusive. In the range -128 ($80) to -1 ($FF), the most-significant bit is set, while in the range 0 to 127 ($7F) it is cleared. Hence the most-significant bit functions as a sign bit. Within both ranges, the lower 7 bits increase monotonically as the value represented becomes less negative and more positive.

Two's complement is far from the only way to represent a signed byte. In the early days of computing there was one's complement (which had two representations of zero). But if you simply invert the sign bit of two's complement, you get to a form known as "excess 128". Here, if you treat the byte as unsigned, it has a continuous range of 0 to 255, where the value that used to be -128 is now 0, and that which used to be 0 is now 128, and so on. To get to the true value you would subtract 128. IEEE-754 single precision uses a closely related format called "excess 127" for the exponent field, in which the value 1.0 has a zero exponent which is stored as $7F. This is useful because even on the 6502, unsigned integers are easier to deal with than signed ones.

Putting that aside for a moment, the method to negate a two's complement value is to complement all the bits, then add 1. You can easily see that doing this to zero yields zero again, via $FF (-1). It is the complementing of the low bits that reverses their value sequence.

To perform W0 = 258 - S1, we effectively need to add the constant 258 to a negated S1. If we combine the operations for negation and conversion to excess-128, we end up with complementing all except the sign bit, then adding 1. If we then added 258, we would need to immediately subtract 128 to obtain the true value. Combining the three latter additive operations gives us W0 = (S1 ^ $7F) + 131, which is just two ALU operations, with the intermediate value effectively being (-S1) in excess-127 format. Since the final result is potentially 9 bits wide, we then only need to extract the carry bit into the high byte.

The 6502 performs all of the required operations quite efficiently, such that the only thing I could really ask for is a dedicated instruction for clearing the accumulator, which would in practice be no faster and only one byte smaller. The only wrinkle is the idiomatic need to clear the carry before adding (2 cycles) and that ROL A also has a dead cycle. Aside from that, all 6502 cycles are either fetching program bytes or accessing operand memory. It's hard to see how another 8-bit CPU could improve significantly on that.

I see from BillG's examples that the 6800 benefits mostly from not having to clear the carry bit, and this makes the code slightly more compact. However, the time saved is then absorbed by slower write operations, with one dead cycle each. I can see that being a significant handicap in practical code. The 6809 has the same handicap, but gets to incur it only once with a 16-bit write operation involving both accumulators, so is slightly faster overall. The 8080 looks much slower until you remember that the Intel clock ticks four times per memory access, so normalised for memory accesses it is again comparable to the others. I have to say that the 8080 assembler is spectacularly opaque.


Top
 Profile  
Reply with quote  
PostPosted: Tue Sep 22, 2020 10:48 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Chromatix wrote:
The 6502 performs all of the required operations quite efficiently, such that the only thing I could really ask for is a dedicated instruction for clearing the accumulator, which would in practice be no faster and only one byte smaller.


The bug in the ointment is that the 680x clr* instructions clear the carry flag. Notice that I put clra before the subtraction for the lower byte...

Chromatix wrote:
I see from BillG's examples that the 6800 benefits mostly from not having to clear the carry bit, and this makes the code slightly more compact. However, the time saved is then absorbed by slower write operations, with one dead cycle each. I can see that being a significant handicap in practical code.


I do not like that 6800 instructions contain dead cycles (the 6809 is worse), but they are somewhat offset by the two almost equal accumulators and the ability to efficiently offset the index register.

Chromatix wrote:
I have to say that the 8080 assembler is spectacularly opaque.


I think the 8080 mnemonics and operand format were designed to make the assembler simpler.


Top
 Profile  
Reply with quote  
PostPosted: Tue Sep 22, 2020 5:30 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Chromatix wrote:
To perform W0 = 258 - S1, we effectively need to add the constant 258 to a negated S1. If we combine the operations for negation and conversion to excess-128, we end up with complementing all except the sign bit, then adding 1. If we then added 258, we would need to immediately subtract 128 to obtain the true value. Combining the three latter additive operations gives us W0 = (S1 ^ $7F) + 131, which is just two ALU operations, with the intermediate value effectively being (-S1) in excess-127 format. Since the final result is potentially 9 bits wide, we then only need to extract the carry bit into the high byte.


I think I am getting it now. To subtract S1 from an arbitrary constant, I am doing this...

Code:
LDA  S1
EOR #$7F    ; convert signed byte to excess-127 and negate
CLC
ADC #<Constant-127
TAX
LDA #>Constant-127
ADC #0      ; Add carry flag to high byte
STX  W0
STA  W0+1


Thanks again...


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 23, 2020 1:23 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Yes, I think that should work. However, take note of when the arbitrary constant is itself less than 127, as the immediate operands and the final result could then be negative. Although then it would be only a one-byte value, so might not hit this code path?


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 23, 2020 2:25 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
In my limited testing, I tried constant = 0 and S1 = -1; that worked:

Code:
                          00031 ; W0 := 0 - S1;
                          00032
                          00033 ;   ;  0 := v W0 -> 1
                          00034 ;   ;  1 L r 2
                          00035
                          00036 ;      ;  2 L c 0 -> 3
                          00037 ;      ;  3 - v S1
                          00038
                          00039
                          00040 ;  1 L r 2
                          00041 ;  2 L c 0 -> 3
                          00042 ;  3 - v S1
 0029 A5 16           [3] 00043          lda    S1
 002B 49 7F           [2] 00044          eor    #$7F
 002D 18              [2] 00045          clc
 002E 69 81           [2] 00046          adc    #<65409
 0030 AA              [2] 00047          tax
 0031 A9 FF           [2] 00048          lda    #>65409
 0033 69 00           [2] 00049          adc    #0
                          00050 ;  0 := v W0 -> 1
 0035 86 0D           [3] 00051          stx    W0
 0037 85 0E           [3] 00052          sta    W0+1


It appears to fail for constant < 0. More study is needed. Fortunately, that is something the compiler can easily detect and manage.

Thanks again.


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 23, 2020 5:39 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
I take part of that back. It does not fail until the constant is < -1 with S1 = -1. The first ADC instruction does not set the carry flag - that is the key.

Code:
                          00031 ; W0 := -2 - S1;
                          00032
                          00033 ;   ;  0 := v W0 -> 1
                          00034 ;   ;  1 L r 2
                          00035
                          00036 ;      ;  2 L c -2 -> 3
                          00037 ;      ;  3 - v S1
                          00038
                          00039
                          00040 ;  1 L r 2
                          00041 ;  2 L c -2 -> 3
                          00042 ;  3 - v S1
 0029 A5 16           [3] 00043          lda    S1
 002B 49 7F           [2] 00044          eor    #$7F
 002D 18              [2] 00045          clc
 002E 69 7F           [2] 00046          adc    #<65407
 0030 AA              [2] 00047          tax
 0031 A9 FF           [2] 00048          lda    #>65407
 0033 69 00           [2] 00049          adc    #0
                          00050 ;  0 := v W0 -> 1
 0035 86 0D           [3] 00051          stx    W0
 0037 85 0E           [3] 00052          sta    W0+1


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 163 posts ]  Go to page Previous  1, 2, 3, 4, 5 ... 11  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 10 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: