BillG wrote:
On further examination, the 6800 version can be made substantially faster than the 6502 version by completely unrolling the loop but the code is almost twice as big. (The XSUB8 subroutine is also used elsewhere, so it cannot be eliminated in the process.) A small speed gain by a similar transformation of the 6502 version does not justify the massive increase in code size required.
I have to partially retract that statement. Somehow, I managed to forget that the 6800 has only one index register; the X register must be ping-ponged between I and J. Still, completely unrolling that loop makes the 6800 code about as fast as the 6502 version, but with a much bigger code size. The result would look something like this:
Code:
0100 DE 00 [4] 00006 ldx I
00007 *
0102 E6 00 [5] 00008 ldab 0,X
0104 DE 02 [4] 00009 ldx J
0106 A6 00 [5] 00010 ldaa 0,X
0108 E7 00 [6] 00011 stab 0,X
010A DE 00 [4] 00012 ldx I
010C A7 00 [6] 00013 staa 0,X
00014 *
010E E6 01 [5] 00015 ldab 1,X
0110 DE 02 [4] 00016 ldx J
0112 A6 01 [5] 00017 ldaa 1,X
0114 E7 01 [6] 00018 stab 1,X
0116 DE 00 [4] 00019 ldx I
0118 A7 01 [6] 00020 staa 1,X
etc.
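As a rough check on the cost of unrolling, the bracketed cycle counts and the addresses in the listing above can simply be added up. The C sketch below is only a tally, not a simulation; the function names and the `nbytes` parameter are made up for illustration, and it does not assume how many bytes the real exchange moves.

```c
#include <assert.h>

/* Cycle counts copied from the bracketed figures in the listing above. */
enum {
    LDX_DIRECT    = 4,  /* ldx I / ldx J          (2 bytes each) */
    LOAD_INDEXED  = 5,  /* ldaa n,X / ldab n,X    (2 bytes each) */
    STORE_INDEXED = 6   /* staa n,X / stab n,X    (2 bytes each) */
};

/* Hypothetical helper: cycles to exchange `nbytes` bytes with the fully
   unrolled 6800 code. The initial "ldx I" is paid once; every byte then
   costs ldab, ldx J, ldaa, stab, ldx I, staa. No trailing RTS is counted. */
static unsigned unrolled_exchange_cycles(unsigned nbytes)
{
    assert(nbytes > 0);
    unsigned per_byte = LOAD_INDEXED + LDX_DIRECT + LOAD_INDEXED
                      + STORE_INDEXED + LDX_DIRECT + STORE_INDEXED; /* 30 */
    return LDX_DIRECT + nbytes * per_byte;
}

/* Hypothetical helper: code size in bytes of the same unrolled sequence.
   Every instruction in the listing is two bytes long. */
static unsigned unrolled_exchange_bytes(unsigned nbytes)
{
    assert(nbytes > 0);
    return 2 + nbytes * 6 * 2;
}
```

For the two bytes actually shown in the listing, this gives 64 cycles and 26 bytes of code, so each additional byte exchanged costs 30 cycles and 12 bytes.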
There are two other service subroutines for the symbol table sorter.
CMPLBL compares two symbols, pointed to by the variables I and J. Note that the 6800 version works on copies of I and J held in a couple of temporary variables, rather than modifying I and J and then fixing them after the loop is finished; the same approach would have made the exchange subroutine better. Completely unrolling the loop is again the fastest implementation.
The original 6800 code:
Code:
.154E DE 20 [4] 02922 CMPLBL LDX I
.1550 DF 76 [5] 02923 STX XTEMP GET I INTO XTEMP
.1552 DE 22 [4] 02924 LDX J
.1554 DF 80 [5] 02925 STX XTEMP5 GET J INTO XTEMP5
.1556 C6 06 [2] 02926 LDAB #$06 SET COMPARE COUNT
.1558 DE 76 [4] 02927 CMPLB1 LDX XTEMP
.155A A6 00 [5] 02928 LDAA 0,X GET CHAR FROM I
.155C 08 [4] 02929 INX
.155D DF 76 [5] 02930 STX XTEMP
.155F DE 80 [4] 02931 LDX XTEMP5
.1561 A1 00 [5] 02932 CMPA 0,X COMPARE WITH J CHAR
.1563 26 06 (156B) [4] 02933 BNE CMPLB2 EXIT IF NOT EQUAL
.1565 08 [4] 02934 INX
.1566 DF 80 [5] 02935 STX XTEMP5
.1568 5A [2] 02936 DECB ELSE DECREMENT COUNT
.1569 26 ED (1558) [4] 02937 BNE CMPLB1 LOOP UNTIL DONE
.156B 39 [5] 02938 CMPLB2 RTS
The 6502 code:
Code:
.1981 04172 CMPLBL
. 04173 ; LDX I
. 04174 ; STX XTEMP ; GET I INTO XTEMP
. 04175 ; LDX J
. 04176 ; STX XTEMP5 ; GET J INTO XTEMP5
.1981 A0 00 [2] 04177 ldy #0 ; SET index
.1983 04178 CMPLB1 ;LDX XTEMP
.1983 B1 22 [5/6] 04179 lda (I),Y ; GET CHAR FROM I
. 04180 ; INX
. 04181 ; STX XTEMP
. 04182 ; LDX XTEMP5
.1985 D1 24 [5/6] 04183 cmp (J),Y ; COMPARE WITH J CHAR
.1987 D0 05 (198E) [2/3] 04184 bne CMPLB2 ; EXIT IF NOT EQUAL
. 04185 ; INX
. 04186 ; STX XTEMP5
. 04187
.1989 C8 [2] 04188 iny ; ELSE inCREMENT index
.198A C0 06 [2] 04189 cpy #6
.198C D0 F5 (1983) [2/3] 04190 bne CMPLB1 ; LOOP UNTIL DONE
. 04191
.198E 60 [6] 04192 CMPLB2 rts
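To put numbers on the two CMPLBL listings, the bracketed cycle counts can be totalled as a function of how many leading characters match. This C sketch is a tally under stated assumptions, not a measurement: it takes the lower of the 6502 [5/6] figures (no page crossing), counts the final RTS but not the caller's JSR, and the function names are invented for the example.

```c
#include <assert.h>

#define SYMBOL_LENGTH 6  /* compare count from "LDAB #$06" and "cpy #6" */

/* Cycles for the 6800 CMPLBL when the first `matching` characters are equal.
   matching == SYMBOL_LENGTH means the two symbols are identical. */
static unsigned cmplbl_cycles_6800(unsigned matching)
{
    assert(matching <= SYMBOL_LENGTH);
    const unsigned setup  = 4 + 5 + 4 + 5 + 2;          /* LDX, STX, LDX, STX, LDAB */
    const unsigned to_bne = 4 + 5 + 4 + 5 + 4 + 5 + 4;  /* CMPLB1 through BNE CMPLB2 */
    const unsigned rest   = 4 + 5 + 2 + 4;              /* INX, STX, DECB, BNE CMPLB1 */
    const unsigned rts    = 5;
    unsigned cycles = setup + matching * (to_bne + rest) + rts;
    if (matching < SYMBOL_LENGTH)
        cycles += to_bne;  /* the partial pass that finds the mismatch */
    return cycles;
}

/* Same tally for the 6502 CMPLBL, assuming no page crossings. */
static unsigned cmplbl_cycles_6502(unsigned matching)
{
    assert(matching <= SYMBOL_LENGTH);
    const unsigned setup = 2;                      /* ldy #0 */
    const unsigned full  = 5 + 5 + 2 + 2 + 2 + 3;  /* lda, cmp, bne (not taken),
                                                      iny, cpy, bne (taken) */
    const unsigned rts   = 6;
    if (matching == SYMBOL_LENGTH)
        return setup + SYMBOL_LENGTH * full - 1 + rts;  /* last bne falls through */
    return setup + matching * full + (5 + 5 + 3) + rts; /* bne CMPLB2 taken */
}
```

Under these assumptions, comparing two identical symbols costs 301 cycles on the 6800 against 121 on the 6502, and a mismatch in the very first character costs 56 against 21.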
PUSH pushes an address onto a software stack. This routine is an even contest which illustrates shortcomings in both processors. The 6800 does not provide easy access to the contents of the index register, so the value has to go through memory; this is corrected in the 6809. The 6502 does not provide an easy way to perform 16-bit arithmetic.
The original 6800 code:
Code:
.14B5 DF 80 [5] 02823 PUSH STX XTEMP5 PUT THE VALUE IN
.14B7 DE 28 [4] 02824 LDX SRSP THE X REGISTER ONTO THE
.14B9 96 80 [3] 02825 LDAA XTEMP5 SORT REQUEST STACK AND
.14BB A7 00 [6] 02826 STAA 0,X UPDATE THE SORT REQUEST
.14BD 08 [4] 02827 INX STACK POINTER
.14BE 96 81 [3] 02828 LDAA XTEMP5+1
.14C0 A7 00 [6] 02829 STAA 0,X
.14C2 08 [4] 02830 INX
.14C3 DF 28 [5] 02831 STX SRSP
.14C5 39 [5] 02832 RTS
The 6502 code:
Code:
.18AB 03977 PUSH
. 03978 ; STX XTEMP5 ; PUT THE VALUE IN
. 03979 ; LDX SRSP ; THE X REGISTER ONTO THE
. 03980 ; LDAA XTEMP5 ; SORT REQUEST STACK AND
. 03981
. 03982 ; Address in A:X
.18AB A0 01 [2] 03983 ldy #1
.18AD 91 2A [6] 03984 sta (SRSP),Y ; UPDATE THE SORT REQUEST
.18AF 88 [2] 03985 dey ; STACK POINTER
.18B0 8A [2] 03986 txa
.18B1 91 2A [6] 03987 sta (SRSP),Y
. 03988
.18B3 18 [2] 03989 clc ; Update "stack pointer"
.18B4 A5 2A [3] 03990 lda SRSP
.18B6 69 02 [2] 03991 adc #2
.18B8 85 2A [3] 03992 sta SRSP
.18BA A5 2B [3] 03993 lda SRSP+1
.18BC 69 00 [2] 03994 adc #0
.18BE 85 2B [3] 03995 sta SRSP+1
. 03996
.18C0 60 [6] 03997 rts
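The 16-bit pointer update at the end of the 6502 version is worth spelling out, since it is where that processor loses the time it gained on the stores. The C sketch below models it as two 8-bit additions linked by a carry, and also totals the bracketed cycle counts of both PUSH listings. The function names are hypothetical, and the tallies include the RTS but not the caller's JSR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 6502 pointer update. SRSP is two bytes in
   memory, low byte first, and is bumped by 2 one byte at a time. */
static void bump_pointer_by_2(uint8_t srsp[2])
{
    unsigned sum   = srsp[0] + 2u;          /* clc / lda SRSP / adc #2      */
    unsigned carry = sum >> 8;              /* carry out of the low byte    */
    srsp[0] = (uint8_t)sum;                 /* sta SRSP                     */
    srsp[1] = (uint8_t)(srsp[1] + carry);   /* lda SRSP+1 / adc #0 / sta    */
}

/* Totals of the bracketed cycle counts in the two PUSH listings. */
static unsigned push_cycles_6800(void)
{
    return 5 + 4 + 3 + 6 + 4 + 3 + 6 + 4 + 5 + 5;
}

static unsigned push_cycles_6502(void)
{
    return 2 + 6 + 2 + 2 + 6      /* store the two address bytes */
         + 2 + 3 + 2 + 3          /* clc and low-byte update     */
         + 3 + 2 + 3              /* high-byte update            */
         + 6;                     /* rts                         */
}
```

The totals come out to 45 cycles for the 6800 and 42 for the 6502, with 18 of the 6502's cycles spent on the pointer update alone. Going by the listing addresses, the 6800 routine is also the shorter of the two (17 bytes against 22).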
At this time, the symbol table sorter has been completely coded. Wish me luck as I begin to test it...