6502 vs 6800
Posted: Mon Sep 21, 2020 7:07 pm
by litwr
Let's say we have a table of COUNT records with the following structure:
int8_t a
uint16_t b
uint16_t c
We need to compute a uint32_t sum using the following formula:
sum += 0, if a = 0 (A)
sum += 3b, if a > 0 (B)
sum += c, if a < 0 (C)
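As a cross-check for the assembly versions, the formula above can be sketched in C. This is only a reference model: the names `struct record` and `table_sum` are made up for illustration, and a C compiler may pad the struct, so it does not reproduce the 5-byte record layout the assembly code walks through.

```c
#include <stddef.h>
#include <stdint.h>

/* One table record; field names follow the post. The struct may contain
   padding in C, so it is not guaranteed to be 5 bytes as in the assembly. */
struct record {
    int8_t   a;
    uint16_t b;
    uint16_t c;
};

/* Reference model of the formula: nothing for a == 0, 3*b for a > 0,
   c for a < 0. The product is widened to 32 bits before it is added. */
uint32_t table_sum(const struct record *tab, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        if (tab[i].a > 0)
            sum += 3u * (uint32_t)tab[i].b;
        else if (tab[i].a < 0)
            sum += tab[i].c;
    }
    return sum;
}
```

Any assembly version can be checked against this model by feeding both the same table.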
The code for the 6502 could look like this:
Code:
;sum - 4 zp bytes
;tab_addr - 2 zp bytes
;cnt - 2 zp bytes
;t1 - 2 zp bytes
lda #0
sta sum
sta sum+1
sta sum+2
sta sum+3
lda #lo(tab)
sta tab_addr
lda #hi(tab)
sta tab_addr+1
lda #lo(65536-COUNT)
sta cnt
lda #hi(65536-COUNT)
sta cnt+1
loop
ldy #0
lda (tab_addr),y
beq next
bmi alt
ldy #2 ;offset of hi(b)
lda (tab_addr),y
sta t1
dey
lda (tab_addr),y
asl
tax
rol t1
lda #0
rol
sta t1+1
txa
adc (tab_addr),y
tax
iny
lda (tab_addr),y
adc t1
sta t1
bcc l1
inc t1+1
l1 txa
clc
adc sum
sta sum
lda t1
adc sum+1
sta sum+1
lda t1+1
adc sum+2
sta sum+2
bcc next
bcs l4
alt ldy #3 ;offset of lo(c)
lda (tab_addr),y
clc
adc sum
sta sum
iny
lda (tab_addr),y
adc sum+1
sta sum+1
bcc next
inc sum+2
bne next
l4
inc sum+3
next
lda tab_addr
clc
adc #5 ;size of the record
sta tab_addr
bcc l3
inc tab_addr+1
l3 inc cnt
bne loop
inc cnt+1
bne loop
This is not self-modifying code. The timing for the 6502 can be calculated with the formula COUNT*(28 + 3A + 103B + 39C).
What can the 6800 show?
Re: 6502 vs 6800
Posted: Tue Sep 22, 2020 10:52 am
by BillG
Code:
00001 *
00002 * Let's say we have a table of COUNT records with the following structure
00003 * int8_t a
00004 * uint16_t b
00005 * uint16_t c
00006 * We need to compute a uint32_t sum using the following formula
00007 * sum += 0, if a = 0 (A)
00008 * sum += 3b, if a > 0 (B)
00009 * sum += c, if a < 0 (C)
00010 *
00011 * Scoreboard: (assuming Count < 256, final sum < 64K)
00012 * 28 cycles for setup
00013 * 20*(Count-1) cycles for incrementing X
00014 *
00015 * 19 cycles for each a = 0
00016 * 69 cycles for each a > 0
00017 * 51 cycles for each a < 0
00018 *
00019 * 15 cycles to exit
00020 *
00021
00022 * Next 10
00023 * AddToSum + Next 42 = 18 + 10
00024
0000 00025 org 0
00026
0000 00027 Sum rmb 4
0004 00028 Cnt rmb 2
00029
0003 00030 Count equ 3
00031
0000 00032 A_ equ 0
0001 00033 B_ equ 1
0003 00034 C_ equ 3
00035
0100 00036 org $100
00037
0100 00038 DoIt
0100 CE 0000 [3] 00039 ldx #0 ; Clear the sum
0103 DF 00 [5] 00040 stx Sum
0105 DF 02 [5] 00041 stx Sum+2
00042
0107 CE FFFD [3] 00043 ldx #-Count ; Set to count up to 0
010A DF 04 [5] 00044 stx Cnt
00045
010C CE 0200 [3] 00046 ldx #Table ; Point to table
00047
010F 20 05 (0116) [4] 00048 bra First ; Skip incrementing of X
00049
0111 00050 Loop
0111 08 [4] 00051 inx ; Point to the next entry
0112 08 [4] 00052 inx
0113 08 [4] 00053 inx
0114 08 [4] 00054 inx
0115 08 [4] 00055 inx
00056
0116 00057 First
0116 A6 00 [5] 00058 ldaa A_,X ; What kind of entry?
0118 27 1C (0136) [4] 00059 beq Next ; a = 0
011A 2A 25 (0141) [4] 00060 bpl DoB ; a > 0
00061
011C 00062 DoC
011C A6 03 [5] 00063 ldaa C_,X ; Load c into A:B
011E E6 04 [5] 00064 ldab C_+1,X
00065
0120 00066 AddToSum
0120 DB 03 [3] 00067 addb Sum+3 ; Add A:B to Sum
0122 D7 03 [4] 00068 stab Sum+3
0124 99 02 [3] 00069 adca Sum+2
0126 97 02 [4] 00070 staa Sum+2
0128 24 0C (0136) [4] 00071 bcc Next ; Less than 64K
012A 96 01 [3] 00072 ldaa Sum+1
012C 89 00 [2] 00073 adca #0
012E 97 01 [4] 00074 staa Sum+1
0130 96 00 [3] 00075 ldaa Sum
0132 89 00 [2] 00076 adca #0
0134 97 00 [4] 00077 staa Sum
00078
0136 00079 Next
0136 7C 0005 [6] 00080 inc Cnt+1 ; More to do?
0139 26 D6 (0111) [4] 00081 bne Loop
013B 7C 0004 [6] 00082 inc Cnt
013E 26 D1 (0111) [4] 00083 bne Loop
00084
0140 39 [5] 00085 rts
00086
0141 00087 DoB
0141 E6 02 [5] 00088 ldab B_+1,X ; Load b into A:B
0143 A6 01 [5] 00089 ldaa B_,X
00090
0145 58 [2] 00091 lslb ; Multiply by 2
0146 49 [2] 00092 rola
00093
0147 EB 02 [5] 00094 addb B_+1,X ; Multiply by 3
0149 A9 01 [5] 00095 adca B_,X
00096
014B 20 D3 (0120) [4] 00097 bra AddToSum ; Go add and repeat
00098
0200 00099 org $200
00100
0200 00101 Table
0200 00 00102 fcb 0
0201 0000 00103 fdb 0
0203 0000 00104 fdb 0
00105
0205 01 00106 fcb 1
0206 0102 00107 fdb 258
0208 0000 00108 fdb 0
00109
020A FF 00110 fcb -1
020B 0000 00111 fdb 0
020D 01FF 00112 fdb 511
Re: 6502 vs 6800
Posted: Tue Sep 22, 2020 5:56 pm
by litwr
Thank you. Your code shows COUNT*(35+4A+50B+36C) timings, which is much better than the 6502 code for the +3b case; the second accumulator makes the difference. It confirms that the 6800 has an advantage over the 6502 for two-byte arithmetic.
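To put the two formulas side by side, here is a small C sketch that evaluates them per record kind. It rests on an assumption about how to read the formulas: A, B and C are taken as the fractions of records with a = 0, a > 0 and a < 0, so a single record of one kind costs the base term plus that kind's term. The function names are hypothetical.

```c
/* Per-record cycle estimates quoted in the thread, by record kind.
   Assumption: one record of a given kind costs base + that kind's term. */
enum kind { KIND_ZERO, KIND_POS, KIND_NEG };

/* 6502 estimate: COUNT*(28 + 3A + 103B + 39C) */
unsigned cycles_6502(enum kind k)
{
    static const unsigned extra[] = { 3, 103, 39 };
    return 28 + extra[k];
}

/* 6800 estimate: COUNT*(35 + 4A + 50B + 36C) */
unsigned cycles_6800(enum kind k)
{
    static const unsigned extra[] = { 4, 50, 36 };
    return 35 + extra[k];
}
```

Under that reading a record with a > 0 costs 131 cycles on the 6502 against 85 on the 6800, while the other two kinds come out slightly in the 6502's favour (31 against 39, and 67 against 71).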
Let's try strings. We have a null-terminated string and we need to uppercase it.
Code:
;pstr - 2 bytes on zp
lda #hi(string)
sta pstr+1
lda #lo(string)
sta pstr
ldy #0
loop
lda (pstr),y
beq exit
cmp #'z'+1
bcs next
cmp #'a'
bcc next
eor #$20
sta (pstr),y
next
iny
bne loop
inc pstr+1
bne loop
exit
The 6502 timing is LENGTH*(14.5 + 3Z + 7A + 14C) ticks, where Z counts the cases when a char is strictly above 'z', A the cases when a char is below 'a', and C the cases when a char is a lowercase letter. IMHO the 6800 needs about LENGTH*(19+4Z+10A+17C) ticks, which means it is about 30% slower. Do you agree, or can you show better 6800 timings?
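For reference, the behaviour of the uppercase loop can be sketched in C. This is a model of the task, not of the timing; the function name `upcase` is made up here and ASCII is assumed, as in the assembly.

```c
/* Uppercase a NUL-terminated string in place. Like the 6502 loop, it only
   touches bytes in 'a'..'z' and flips bit 5 (XOR 0x20). ASCII is assumed. */
void upcase(char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        if (ch >= 'a' && ch <= 'z')
            *s = (char)(ch ^ 0x20);
    }
}
```

The three branches of the loop body correspond to the three cases Z, A and C in the timing formula.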
Let's also implement the C function strcmp().
Code:
;pstr1 and pstr2 are 2 byte on zp
lda #hi(string1)
sta pstr1+1
lda #lo(string1)
sta pstr1
lda #hi(string2)
sta pstr2+1
lda #lo(string2)
sta pstr2
ldy #0
sec
loop
lda (pstr1),y
beq l1
sbc (pstr2),y
bne l2
iny
bne loop
inc pstr1+1
inc pstr2+1
bne loop
l1 sbc (pstr2),y
l2
It takes 20 cycles per char while the chars are equal. If the strings differ in the first char it takes 16 cycles. IMHO the 6800 must be much worse for this typical code.
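What the loop computes can be sketched in C as follows. The name `str_compare` is hypothetical, chosen to avoid clashing with the library function. One difference from the assembly: the 6502 version leaves an 8-bit difference in the accumulator together with the flags, while this sketch returns an int.

```c
/* Byte-wise compare in the style of the 6502 loop: walk both strings until
   the first one ends or the bytes differ, then return the difference of the
   two bytes at that position (0 means the strings are equal). */
int str_compare(const char *s1, const char *s2)
{
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;
    while (*p != 0 && *p == *q) {
        p++;
        q++;
    }
    return (int)*p - (int)*q;
}
```

Both pointers advance in lockstep, which is exactly the access pattern that is cheap with (zp),y on the 6502 and awkward with a single index register.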
Re: 6502 vs 6800
Posted: Wed Sep 23, 2020 12:35 am
by BillG
I do not have time to play right now.
Can we just agree that the 6800 is not 2 to 4 times slower than the 6502?
Re: 6502 vs 6800
Posted: Wed Sep 23, 2020 4:49 pm
by leepivonka
Here are some 6800 code samples for comparison.
Uppercase a string.
Keeping the single pointer in the X register works well.
Code:
ldx #string ; 3 bytes, 3 cycles
bra endchk ; 2 bytes, 4 cycles
do
cmpa #'z'+1 ; 2 bytes, 2 cycles
bcc next ; 2 bytes, 4 cycles
cmpa #'a' ; 2 bytes, 2 cycles
bcs next ; 2 bytes, 4 cycles
eora #$20 ; 2 bytes, 2 cycles
staa 0,x ; 2 bytes, 6 cycles
next
inx ; 1 byte, 4 cycles
endchk
ldaa 0,x ; 2 bytes, 5 cycles
bne do ; 2 bytes, 4 cycles
strcmp
Shuttling the 2 pointers in & out of the X register slows things down.
Code:
;pstr1 and pstr2 are 2 byte on zp
ldx #string1 ; 3 bytes, 3 cycles
stx pstr1 ; 2 bytes, 5 cycles
ldx #string2 ; 3 bytes, 3 cycles
stx pstr2 ; 2 bytes, 5 cycles
loop
ldx pstr1 ; 2 bytes, 4 cycles
ldaa 0,x ; 2 bytes, 5 cycles
beq l1 ; 2 bytes, 4 cycles
inx ; 1 byte, 4 cycles
stx pstr1 ; 2 bytes, 5 cycles
ldx pstr2 ; 2 bytes, 4 cycles
suba 0,x ; 2 bytes, 5 cycles
bne l2 ; 2 bytes, 4 cycles
inx ; 1 byte, 4 cycles
stx pstr2 ; 2 bytes, 5 cycles
bra loop ; 2 bytes, 4 cycles
l1 ldx pstr2 ; 2 bytes, 4 cycles
suba 0,x ; 2 bytes, 5 cycles
l2
Re: 6502 vs 6800
Posted: Thu Sep 24, 2020 7:18 pm
by litwr
BillG wrote: "Can we just agree that the 6800 is not 2 to 4 times slower than the 6502?"
It seems that the 6800 is ready to give in.

It is a claim from MOS technology that the 6501/6502 is 2-4 times faster than the 6800. So we need to have thorough testing to get a base for our own conclusions. You know that I estimate the general 6502/6800 performance ratio at close to 2:1. Of course I know that for two-byte math, especially signed, the 6800 can be even faster than the 6502, but such math is rather rare in 8-bit programming. The main problem of the 6800 is address register starvation, while the 6502 can use 128 zp pseudo-registers; this can give the 6502 a 2-3 times advantage for calculations that are not very simple. Other 6502 advantages: faster jumps, fast RMW on zp, faster counters (because of two index registers) - I am sure it is possible to add several more items to this list... Typical string operations show more than 2 times faster code for the 6502. The picture should be similar for one-byte arithmetic. So it seems quite plausible to me that the 6502 is generally about 2 times faster than the 6800.
I would be interested in comparing the performance of the 6800 and the 6502 on fast sorting algorithms. For the 6502 I can point to a Shell sort implementation -
https://codebase64.org/doku.php?id=base ... t_elements and a quick sort implementation -
https://codebase64.org/doku.php?id=base ... t_elements
We need to prepare several identical fill patterns (random, ordered, reversed, zeros, etc.). IMHO the 6502 should be at least 2 times faster than the 6800 with this quick sort.
leepivonka wrote: "Here are some 6800 code samples for comparison."
Thank you very much. We get LENGTH*(15+4Z+10A+18C) for the first case, which means that the 6800 is about 35% slower. For the second case we get 48 cycles on the 6800 where the 6502 uses only 20, so the 6502 is 2.4 times faster here. We can expect similar numbers for other typical string operations: strcpy, memcpy, strstr, ...
Re: 6502 vs 6800
Posted: Thu Sep 24, 2020 7:47 pm
by rwiker
Warning: I haven't done any 6800 programming since early 1986, and that was not by any means extensive.
If self-modifying code is acceptable, would something like the following work?
Code:
ldx #string1 ; 3 bytes, 3 cycles
stx m1+1 ; 2 bytes, 5 cycles
ldx #string2 ; 3 bytes, 3 cycles
stx m2+1 ; 2 bytes, 5 cycles
stx m3+1
ldx #0 ; 3 bytes, 3 cycles
loop
m1
ldaa 0,x ; 2 bytes, 5 cycles
beq l1 ; 2 bytes, 4 cycles
m2
suba 0,x ; 2 bytes, 5 cycles
bne l2 ; 2 bytes, 4 cycles
inx ; 1 byte, 4 cycles
bra loop ; 2 bytes, 4 cycles
l1
m3
suba 0,x ; 2 bytes, 5 cycles
l2
Re: 6502 vs 6800
Posted: Thu Sep 24, 2020 10:00 pm
by barrym95838
rwiker wrote: "...would something like the following work?"
No. Those STX instructions are going to overwrite your BNE and BEQ opcodes. You're trying to stuff 16-bits of address into an 8-bit operand.
On the 6800 the only "quick" way to deal with two randomly located strings is to disable IRQs, bring in the S register and hope that you don't get burned by an NMI. If the starting addresses of the strings are <256 from each other, then you can use a variation of your idea.
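The difference between the two indexed modes can be modelled in a few lines of C. This is just a sketch of the address arithmetic, with made-up function names: on the 6800 the 16-bit part lives in the X register and the instruction supplies an 8-bit offset, while on the 6502 the instruction supplies the 16-bit base and X is the 8-bit part.

```c
#include <stdint.h>

/* 6800 indexed mode: 16-bit X register plus an unsigned 8-bit offset taken
   from the instruction. The only 16-bit quantity is X itself. */
uint16_t ea_6800_indexed(uint16_t x, uint8_t offset)
{
    return (uint16_t)(x + offset);
}

/* 6502 absolute,X mode: 16-bit base address from the instruction plus the
   8-bit X register. Here the 16-bit quantity lives in the instruction. */
uint16_t ea_6502_absolute_x(uint16_t base, uint8_t x)
{
    return (uint16_t)(base + x);
}
```

Patching a string address into the instruction therefore works for the 6502's `lda $1000,x` but not for the 6800's `ldaa 0,x`, whose operand has room for one byte only.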
Re: 6502 vs 6800
Posted: Thu Sep 24, 2020 10:25 pm
by BillG
barrym95838 wrote: "On the 6800 the only "quick" way to deal with two randomly located strings is to disable IRQs, bring in the S register and hope that you don't get burned by an NMI."
That was my thought exactly.
Code:
00001 *
00002 * Convert zero terminated string to upper case.
00003 *
00004 * Scoreboard:
00005 *
00006 * 10 cycles to setup
00007 *
00008 * 16 cycles for each character < 'a'
00009 * 22 cycles for each character > 'z'
00010 * 36 cycles for each lower case character
00011 *
00012 * 21 cycles to exit
00013 *
00014
0000 00015 StkSav rmb 2
00016
0100 00017 org $100
00018
0100 9F 00 [5] 00019 sts StkSav ; Save stack pointer
00020
0102 0F [2] 00021 sei ; Disable interrupts
00022
0103 8E 01FF [3] 00023 lds #String-1 ; Point stack pointer to string
00024
0106 00025 Loop
0106 32 [4] 00026 pula ; Get a character
00027
0107 4D [2] 00028 tsta ; End of string?
0108 27 0E (0118) [4] 00029 beq Exit ; Yes
00030
010A 81 61 [2] 00031 cmpa #'a' ; Below lower case 'a'?
010C 25 F8 (0106) [4] 00032 blo Loop ; Yes, skip
00033
010E 81 7A [2] 00034 cmpa #'z' ; Above lower case 'z'?
0110 22 F4 (0106) [4] 00035 bhi Loop ; Yes, skip
00036
0112 88 20 [2] 00037 eora #$20 ; Fold to upper case
0114 36 [4] 00038 psha ; Replace in string
0115 31 [4] 00039 ins ; Go to next character
0116 20 EE (0106) [4] 00040 bra Loop ; Repeat
00041
0118 00042 Exit
0118 9E 00 [4] 00043 lds StkSav ; Recover stack pointer
00044
011A 0E [2] 00045 cli ; Enable interrupts
00046
011B 39 [5] 00047 rts
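Why the stack pointer starts at String-1, and why PSHA has to be followed by INS, may be easier to see in a small C model of the listing above. It is a sketch under these assumptions: `mem` is a flat memory image that covers the string, the function name is made up, and the 6800 stack behaves as documented (PUL increments SP and then reads, PSH writes and then decrements, INS adds 1).

```c
#include <stdint.h>

/* Model of the stack trick: SP walks the string instead of an index
   register. mem must be large enough to cover the whole string. */
void upcase_via_sp(uint8_t *mem, uint16_t string_addr)
{
    uint16_t sp = (uint16_t)(string_addr - 1);   /* lds #String-1 */
    for (;;) {
        uint8_t a = mem[++sp];                   /* pula */
        if (a == 0)                              /* tsta / beq Exit */
            break;
        if (a < 'a' || a > 'z')                  /* blo Loop / bhi Loop */
            continue;
        a ^= 0x20;                               /* eora #$20 */
        mem[sp--] = a;                           /* psha: overwrite the same byte */
        sp++;                                    /* ins: SP back on that byte */
    }
}
```

The model also shows the risk mentioned above: an interrupt during the loop would push its return state at SP, that is, into the string.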
Code:
00001 *
00002 * Compare two strings
00003 *
00004 * Scoreboard:
00005 *
00006 * 13 cycles to setup
00007 *
00008 * 23 cycles * lesser string length not counting terminators
00009 *
00010 * 10 cycles if second string is not shorter
00011 * 23 cycles if second string is shorter
00012 *
00013 * 16 cycles to exit
00014 *
00015
0000 00016 StkSav rmb 2
00017
0100 00018 org $100
00019
0100 9F 00 [5] 00020 sts StkSav ; Save stack pointer
00021
0102 0F [2] 00022 sei ; Disable interrupts
00023
0103 8E 01FF [3] 00024 lds #String1-1 ; Point stack pointer to string
00025
0106 CE 02FF [3] 00026 ldx #String2-1 ; Point X to other string
00027
0109 00028 Loop
0109 32 [4] 00029 pula ; Get a character
00030
010A 4D [2] 00031 tsta ; End of string?
010B 27 05 (0112) [4] 00032 beq Exit ; Yes
00033
010D 08 [4] 00034 inx
010E A1 00 [5] 00035 cmpa ,X ; Compare characters
0110 27 F7 (0109) [4] 00036 beq Loop
00037
0112 00038 Exit
0112 9E 00 [4] 00039 lds StkSav ; Recover stack pointer
00040
0114 0E [2] 00041 cli ; Enable interrupts
00042
0115 A1 00 [5] 00043 cmpa ,X ; Set flags for result
00044
0117 39 [5] 00045 rts
Re: 6502 vs 6800
Posted: Fri Sep 25, 2020 6:18 am
by rwiker
rwiker wrote: "...would something like the following work?"
barrym95838 wrote: "No. Those STX instructions are going to overwrite your BNE and BEQ opcodes. You're trying to stuff 16-bits of address into an 8-bit operand.
On the 6800 the only "quick" way to deal with two randomly located strings is to disable IRQs, bring in the S register and hope that you don't get burned by an NMI. If the starting addresses of the strings are <256 from each other, then you can use a variation of your idea."
Hah. I foolishly assumed that the indexed addressing mode would use a 16-bit base address and the index register. If I had paid closer attention to the cycle counts in the comments, I would have avoided that trap.
Re: 6502 vs 6800
Posted: Fri Sep 25, 2020 5:11 pm
by litwr
BillG wrote: "That was my thought exactly."
Thank you. But disabling interrupts for long operations is no good and even without interrupts the 6800 is still slower than the 6502. I know a similar trick for the Z80 that uses PUSH to fill memory, but for that task the Z80 is faster than the 6502. Such tricks are very good for processors like the 6809 or 68k, where the user stack and the interrupt stack are separate, but for the 6800 or Z80 this is not normal coding, it is an extreme measure.
Such tricks also can't help with a quick sort.
Let me propose one more typical algorithm, a string translation. We have a string, a translation table and we must get a result string.
Code:
lda #lo(string1)
sta pstr1
lda #hi(string1)
sta pstr1+1
lda #lo(string2)
sta pstr2
lda #hi(string2)
sta pstr2+1
lda #lo(trantab)
sta m+1
lda #hi(trantab)
sta m+2
ldy #0
loop lda (pstr1),y
beq exit
tax
m lda $1000,x
sta (pstr2),y
iny
bne loop
inc pstr1+1
inc pstr2+1
bne loop
exit sta (pstr2),y
It is 24 ticks for every char in a translated string. IMHO the 6800 requires maybe even 80 ticks for this task. A non-self-modifying version for the 6502 takes 33 clock cycles.
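The task itself is short when written in C. The sketch below is a reference model only; the name `translate` and the argument order are made up for illustration, and the caller is assumed to provide a destination buffer that is large enough.

```c
#include <stdint.h>

/* Translate src through a 256-entry table into dst. The terminator is
   copied without going through the table, as in the 6502 loop. */
void translate(uint8_t *dst, const uint8_t *src, const uint8_t table[256])
{
    while (*src != 0)
        *dst++ = table[*src++];
    *dst = 0;
}
```

Three addresses are live at once here (source, destination and table), which is what makes this case hard for a CPU with a single index register.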
Re: 6502 vs 6800
Posted: Sat Sep 26, 2020 12:53 am
by barrym95838
litwr wrote: "It is 24 ticks for every char in a translated string. IMHO the 6800 requires maybe even 80 ticks for this task. A non-self-modifying version for the 6502 takes 33 clock cycles."
Ooh, may I try?
Code:
ldx #string1 [3]
stx pstr1 [5]
ldx #string2 [3]
stx pstr2 [5]
loop:
ldx pstr1 [4]
ldab ,x [5]
beq exit [3/4]
inx [4]
stx pstr1 [5]
stab m+1 [4]
ldx #trantab [3]
m:
ldab ,x [5]
ldx pstr2 [4]
stab ,x [6]
inx [4]
stx pstr2 [5]
bra loop [4]
exit:
ldx pstr2 [4]
stab ,x [6]
I don't know for sure if I made any rookie coding mistakes or cycle counting mistakes, but it looks like the 6800 is 56 ticks per char.
[I must admit that the 6800 source looks tidier than the 6502's, even though its performance isn't exactly brilliant.]
[Edit: How about a little 6809 action?
Code:
ldu #string1
ldy #string2
loop:
ldab ,u+
beq exit
ldx #trantab
abx
ldab ,x
stab ,y+
bra loop
exit:
stab ,y
That's even tidier looking, but I don't know if it's valid or correct or how many ticks per char.]
Re: 6502 vs 6800
Posted: Sat Sep 26, 2020 1:18 am
by BillG
litwr wrote: "But disabling interrupts for long operations is no good and even without interrupts the 6800 is still slower than the 6502."
A SWTPC 6800 computer running FLEX only uses interrupts for background print spooling and that is only if a special timer option is installed. The worst case is the printer pausing while interrupts are disabled.
Many 8-bit computers do not make much use of interrupts. The Apple 2 Woz Disk 2 controller uses software timing to make its magic work. Most systems with a Western Digital 17xx disk controller use a tight loop in the driver software for data transfers; most disable interrupts while doing this.
Until mice became common, interrupts were mostly used for three things: keyboard input, receiving data from a serial port and video synchronization. Only the latter is time critical and programs doing that tend to finish processing and wait in a spin loop for the event.
litwr wrote: "Let me propose one more typical algorithm, a string translation. We have a string, a translation table and we must get a result string."
Now that is contrived, to use your word. We all agree that the 6800 is hampered by having only one index register, yet you insist on continuing to propose examples to prove that one point.
I have put forth examples of things a computer may be expected to do and you denigrate those, saying that is not used in 8-bit programming or only a compiler does that. Do you use an editor? 16-bit arithmetic is needed when dealing with the addresses of data in memory. Do you use an assembler? Plenty of 16-bit math in there.
Re: 6502 vs 6800
Posted: Sat Sep 26, 2020 5:27 am
by BillG
barrym95838 wrote: "[Edit: How about a little 6809 action?]"
Code:
0100 CE 0300 [3] 00007 ldu #String1
0103 108E 0400 [4] 00008 ldy #String2
0107 8E 0500 [3] 00009 ldx #Trantab
00010
010A 00011 2:
010A E6 C0 [6] 00012 ldab ,U+
010C 27 06 (0114) [3] 00013 beq 3f
010E E6 85 [5] 00014 ldab B,X
0110 E7 A0 [6] 00015 stab ,Y+
0112 20 F6 (010A) [3] 00016 bra 2b
00017
0114 00018 3:
0114 E7 A4 [4] 00019 stab ,Y
Because a zero translates to a zero, it gets even better...
Code:
0200 CE 0300 [3] 00023 ldu #String1
0203 108E 0400 [4] 00024 ldy #String2
0207 8E 0500 [3] 00025 ldx #Trantab
00026
020A 00027 2:
020A E6 C0 [6] 00028 ldab ,U+
020C E6 85 [5] 00029 ldab B,X
020E E7 A0 [6] 00030 stab ,Y+
0210 26 F8 (020A) [3] 00031 bne 2b
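The shorter loop corresponds to a do-while in C: translate first, store, then test the byte just stored. A sketch, with a made-up function name, under the assumption stated in the post that a zero translates to a zero. If some other byte also translated to zero the output would end early, and if table[0] were non-zero the loop would run past the end of the source.

```c
#include <stdint.h>

/* Variant without a separate end test: the terminator goes through the
   table like any other byte. Assumes table[0] == 0. */
void translate_zero_maps_to_zero(uint8_t *dst, const uint8_t *src,
                                 const uint8_t table[256])
{
    uint8_t b;
    do {
        b = table[*src++];
        *dst++ = b;
    } while (b != 0);
}
```

The saving is one test and one branch per character, plus the separate store of the terminator after the loop.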
Edit: Post in haste, correct in leisure...
Re: 6502 vs 6800
Posted: Sat Sep 26, 2020 6:42 am
by BigEd
BillG wrote: "Can we just agree that the 6800 is not 2 to 4 times slower than the 6502?"
litwr wrote: "It seems that the 6800 is ready to give in. ;)
It is a claim from MOS technology that the 6501/6502 is 2-4 times faster than the 6800. So we need to have thorough testing to get a base for our own conclusions."
Two important points here: first, that's a historical claim from a supplier, which is clearly a sales position and may bear no relation to reality. Second, performance is not a scalar quantity: every subroutine, every program, has its own measure. There is no base that could be broad enough, if anyone is seeking a definitive scalar measure.
What is interesting, of course, is seeing the same algorithm implemented for different machines. And we see readily that the application of ingenuity can make a big difference: a second attempt can be better than a first attempt. Which means there's a lot of uncertainty, because not all implementations will get the same inspired reworking.
So, let's not proceed as if we're seeking a definitive resolution: let's proceed as if we are exploring a territory.