Comparisons and contrasts
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
You heard incorrectly. That's an illegal, disgusting and sadistic use of force. I said that they were inbred, not me.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Mike B. (about me) (learning how to github)
Re: Comparisons and contrasts
Thinking about the relative strengths of the 6502 always returns focus to the zero page.
Within the zero page, the 6502 indexing modes do behave like the 6800 index with displacement mode, with dramatic results. Taking liberties that the list need not be preserved after traversal, it is handled in parts which do fit in the zero page.
What is good for the goose is good for the gander and the gosling. The door is now open for the others to exploit their zero pages in the same manner.
Unfortunately, the 6800 cannot load the index register from a single byte. Its minor gain does not save it from last place.
Even though the 8080 architecture does not directly advantage the first 256 bytes of its memory space, single-byte addresses do present a huge opportunity in light of the clumsy addressing capabilities. Some tricky coding keeps the 8080 solidly in second place.
Within the zero page, the 6502 indexing modes do behave like the 6800 index with displacement mode, with dramatic results. Taking liberties that the list need not be preserved after traversal, it is handled in parts which do fit in the zero page.
Code: Select all
00004 ; 328611 ($503A3) cycles
00005 ; 256134 ($3E886) cycles after inlining subroutines
00006
0000 08 00007 Free fcb Heap
0001 4B 00008 J fcb 75 ; Inner loop runs 75 times
0002 0000 0000 00009 Total fdb 0,0
0006 0BB7 00010 I fdb 2999
00011
0008 00012 Heap:
00013
0200 00014 org $200
00015
00016 ;
00017 ; Allocate a node
00018 ;
0200 00019 Alloc:
0200 A5 00 [3] 00020 lda Free ; Allocate a node
0202 AA [2] 00021 tax ; Return it in X
00022
0203 18 [2] 00023 clc ; Point to free memory
0204 69 03 [2] 00024 adc #3
0206 85 00 [3] 00025 sta Free
00026
0208 60 [6] 00027 rts
00028
00029 ;
00030 ; Prepend a node to the list
00031 ;
0209 00032 Prepend:
0209 20 0200 [6] 00033 jsr Alloc ; Allocate a new node
00034
020C 94 00 [4] 00035 sty 0,X ; Point next of new node to Root
00036
020E 8A [2] 00037 txa ; This is now the first node
020F A8 [2] 00038 tay
00039
0210 A5 06 [3] 00040 lda I ; Store value in node
0212 95 01 [4] 00041 sta 1,X
0214 A5 07 [3] 00042 lda I+1
0216 95 02 [4] 00043 sta 2,X
00044
0218 60 [6] 00045 rts
00046
00047 ;
00048 ; Traverse the list and sum the values
00049 ;
0219 00050 NoCarry:
0219 B9 0000 [4/5] 00051 lda 0,Y ; Point to the next node
00052
021C F0 1A (0238) [2/3] 00053 beq Summed ; Until end of list
00054
021E A8 [2] 00055 tay
00056
021F 00057 Sum:
021F 18 [2] 00058 clc ; Add value of node to sum
0220 A5 02 [3] 00059 lda Total
0222 79 0001 [4/5] 00060 adc 1,Y
0225 85 02 [3] 00061 sta Total
0227 A5 03 [3] 00062 lda Total+1
0229 79 0002 [4/5] 00063 adc 2,Y
022C 85 03 [3] 00064 sta Total+1
022E 90 E9 (0219) [2/3] 00065 bcc NoCarry
00066
0230 E6 04 [5] 00067 inc Total+2
00068
0232 D0 E5 (0219) [2/3] 00069 bne NoCarry
00070
0234 E6 05 [5] 00071 inc Total+3
00072
0236 D0 E1 (0219) [2/3] 00073 bne NoCarry ; Always branches
00074
0238 00075 Summed:
0238 60 [6] 00076 rts
00077
00078 ;
00079 ;
00080 ;
0239 00081 RepeatOuterLoop:
0239 20 021F [6] 00082 jsr Sum ; Traverse the list and add them up
00083
023C A0 00 [2] 00084 ldy #0 ; Clear list
00085
023E A9 08 [2] 00086 lda #Heap ; Reset the heap
0240 85 00 [3] 00087 sta Free
00088
0242 A9 4B [2] 00089 lda #75 ; Run inner loop 75 times
0244 85 01 [3] 00090 sta J
00091
0246 D0 06 (024E) [2/3] 00092 bne InnerLoop ; Always branches
00093
0248 00094 NoBorrow:
0248 C6 06 [5] 00095 dec I
00096
024A C6 01 [5] 00097 dec J
024C F0 EB (0239) [2/3] 00098 beq RepeatOuterLoop
00099
024E 00100 Main:
024E 00101 InnerLoop:
024E 20 0209 [6] 00102 jsr Prepend
00103
0251 A5 06 [3] 00104 lda I ; Low byte of count zero?
0253 D0 F3 (0248) [2/3] 00105 bne NoBorrow ; No, no borrow
00106
0255 C6 07 [5] 00107 dec I+1 ; Borrow from upper byte
0257 10 EF (0248) [2/3] 00108 bpl NoBorrow ; Done if it goes negative
00109
0259 20 021F [6] 00110 jsr Sum ; Traverse the list and add them up
00111
025C 00 [7] 00112 brk
00113
024E 00114 end Main
Unfortunately, the 6800 cannot load the index register from a single byte. Its minor gain does not save it from last place.
Code: Select all
00004 * 366717 ($5987D) cycles
00005 * 288067 ($46543) cycles after inlining subroutines
00006
0000 000D 00007 Free fdb Heap
0002 0000 00008 Root fdb 0
0004 0000 00009 New fdb 0
0006 0000 0000 00010 Total fdb 0,0
000A 3C 00011 J fcb 60 ; Inner loop runs 60 times
000B 0BB7 00012 I fdb 2999
00013
000D 00014 Heap rmb 4*60
00015
00016 *
00017 * Allocate a node
00018 *
00FD 00019 Alloc
00FD D6 01 [3] 00020 ldab Free+1 ; Allocate a node
00FF D7 05 [4] 00021 stab New+1 ; Return it in New
00022
0101 CB 04 [2] 00023 addb #4 ; Point to free memory
0103 D7 01 [4] 00024 stab Free+1
00025
0105 39 [5] 00026 rts
00027
00028 *
00029 * Prepend a node to the list
00030 *
0106 00031 Prepend
0106 8D F5 (00FD) [8] 00032 bsr Alloc ; Allocate a new node
00033
0108 DE 04 [4] 00034 ldx New ; Point to the new node
00035
010A D6 03 [3] 00036 ldab Root+1 ; Point next of new node to Root
010C E7 01 [6] 00037 stab 1,X
010E 6F 00 [7] 00038 clr ,X
00039
0110 A7 03 [6] 00040 staa 3,X ; Store value in node
0112 D6 0B [3] 00041 ldab I
0114 E7 02 [6] 00042 stab 2,X
00043
0116 DF 02 [5] 00044 stx Root ; This is now the first node
00045
0118 39 [5] 00046 rts
00047
00048 *
00049 * Traverse the list and sum the values
00050 *
0119 00051 Sum
0119 DE 02 [4] 00052 ldx Root
011B 96 08 [3] 00053 ldaa Total+2
011D D6 09 [3] 00054 ldab Total+3
00055
011F 00056 SumLoop
011F EB 03 [5] 00057 addb 3,X ; Add value of node to sum
0121 A9 02 [5] 00058 adca 2,X
0123 24 08 (012D) [4] 00059 bcc NoCarry
00060
0125 7C 0007 [6] 00061 inc Total+1
00062
0128 26 03 (012D) [4] 00063 bne NoCarry
00064
012A 7C 0006 [6] 00065 inc Total
00066
012D 00067 NoCarry
012D EE 00 [6] 00068 ldx ,X ; Point to the next node
012F 26 EE (011F) [4] 00069 bne SumLoop ; Until end of list
00070
0131 97 08 [4] 00071 staa Total+2
0133 D7 09 [4] 00072 stab Total+3
00073
0135 39 [5] 00074 rts
00075
00076 *
00077 *
00078 *
0136 00079 RepeatOuterLoop
0136 97 0C [4] 00080 staa I+1 ; Save low byte of count
00081
0138 8D DF (0119) [8] 00082 bsr Sum ; Traverse the list and add them up
00083
013A DF 02 [5] 00084 stx Root ; Clear list
00085
013C 86 0D [2] 00086 ldaa #Heap ; Reset the heap
013E 97 01 [4] 00087 staa Free+1
00088
0140 86 3C [2] 00089 ldaa #60 ; Run inner loop 60 times
0142 97 0A [4] 00090 staa J
00091
0144 00092 Main
0144 00093 InnerLoop
0144 96 0C [3] 00094 ldaa I+1 ; Get low byte of node value
00095
0146 20 06 (014E) [4] 00096 bra IntoLoop
00097
0148 00098 NoBorrow
0148 4A [2] 00099 deca
00100
0149 7A 000A [6] 00101 dec J
014C 27 E8 (0136) [4] 00102 beq RepeatOuterLoop
00103
014E 00104 IntoLoop
014E 8D B6 (0106) [8] 00105 bsr Prepend
00106
0150 4D [2] 00107 tsta ; Low byte of count zero?
0151 26 F5 (0148) [4] 00108 bne NoBorrow ; No, no borrow
00109
0153 7A 000B [6] 00110 dec I ; Borrow from upper byte
0156 2A F0 (0148) [4] 00111 bpl NoBorrow ; Done if it goes negative
00112
0158 8D BF (0119) [8] 00113 bsr Sum ; Traverse the list and add them up
00114
015A 3F [12] 00115 swi
00116
0144 00117 end Main
Code: Select all
00004 ; 757274 (0B8E1Ah) cycles
00005 ; 594194 (091112h) cycles after inlining subroutines
00006
0000 0000 0000 00007 Total dw 0,0
0004 0BB7 00008 I dw 2999
00009
0006 00010 Heap ds 3*75
00011
00012 ;
00013 ; Allocate a node
00014 ;
00E7 00015 Alloc:
00E7 6B [5] 00016 mov L,E ; Allocate a node to return in HL
00017
00E8 1C [5] 00018 inr E ; Point to free memory
00E9 1C [5] 00019 inr E
00EA 1C [5] 00020 inr E
00021
00EB C9 [10] 00022 ret
00023
00024 ;
00025 ; Prepend a node to the list
00026 ;
00EC 00027 Prepend:
00EC CD 00E7 [17] 00028 call Alloc ; Allocate a new node
00029
00EF 72 [7] 00030 mov M,D ; Point next of new node to Root
00031
00F0 55 [5] 00032 mov D,L ; This is the new first node
00033
00F1 23 [5] 00034 inx H ; Store value in node
00F2 71 [7] 00035 mov M,C
00F3 23 [5] 00036 inx H
00F4 3A 0005 [13] 00037 lda I+1
00F7 77 [7] 00038 mov M,A
00039
00F8 C9 [10] 00040 ret
00041
00042 ;
00043 ; Traverse the list and sum the values
00044 ;
00F9 00045 Sum:
00F9 6A [5] 00046 mov L,D ; Point HL to first node
00FA EB [4] 00047 xchg
00FB 2A 0000 [16] 00048 lhld Total ; Load cumulative total
00FE EB [4] 00049 xchg
00050
00FF 00051 SumLoop:
00FF 23 [5] 00052 inx H ; Add value of node to sum
0100 4E [7] 00053 mov C,M
0101 23 [5] 00054 inx H
0102 46 [7] 00055 mov B,M
0103 EB [4] 00056 xchg
0104 09 [10] 00057 dad B
0105 EB [4] 00058 xchg
0106 D2 010E [10] 00059 jnc NoCarry
00060
0109 7D [5] 00061 mov A,L ; Carry to high high word
010A 2E 02 [7] 00062 mvi L,Total+2
010C 34 [10] 00063 inr M
010D 6F [5] 00064 mov L,A
00065
010E 00066 NoCarry:
010E 2B [5] 00067 dcx H ; Point to the next node
010F 2B [5] 00068 dcx H
0110 6E [7] 00069 mov L,M
00070
0111 7D [5] 00071 mov A,L
0112 B5 [4] 00072 ora L ; Until end of list
0113 C2 00FF [10] 00073 jnz SumLoop
00074
0116 EB [4] 00075 xchg
0117 22 0000 [16] 00076 shld Total
011A EB [4] 00077 xchg ; Leave 0 in HL
00078
011B C9 [10] 00079 ret
00080
00081 ;
00082 ;
00083 ;
011C 00084 Main:
011C 26 00 [7] 00085 mvi H,0
00086
011E 3A 0004 [13] 00087 lda I
0121 4F [5] 00088 mov C,A
00089
0122 C3 012A [10] 00090 jmp IntoOuterLoop
00091
0125 00092 RepeatOuterLoop:
0125 C5 [11] 00093 push B
00094
0126 CD 00F9 [17] 00095 call Sum ; Traverse the list and add them up
00096
0129 C1 [10] 00097 pop B
00098
012A 00099 IntoOuterLoop:
012A 06 4B [7] 00100 mvi B,75
00101
012C 11 0006 [10] 00102 lxi D,Heap ; Reset the heap and clear list
00103
012F C3 0137 [10] 00104 jmp IntoLoop
00105
0132 00106 NoBorrow:
0132 0D [5] 00107 dcr C
00108
0133 05 [5] 00109 dcr B
0134 CA 0125 [10] 00110 jz RepeatOuterLoop
00111
0137 00112 IntoLoop:
0137 CD 00EC [17] 00113 call Prepend
00114
013A 79 [5] 00115 mov A,C ; Low byte of count zero?
013B B1 [4] 00116 ora C
013C C2 0132 [10] 00117 jnz NoBorrow ; No, no borrow
00118
013F 2E 05 [7] 00119 mvi L,I+1 ; Borrow from upper byte
0141 35 [10] 00120 dcr M
0142 F2 0132 [10] 00121 jp NoBorrow ; Done if it goes negative
00122
0145 CD 00F9 [17] 00123 call Sum ; Traverse the list and add them up
00124
0148 76 [7] 00125 hlt
00126
011C 00127 end Main
Re: Comparisons and contrasts
A change in register usage narrows the gap but does not alter the standings.
Code: Select all
00007 * 360323 ($57F83) cycles
00008 * 281673 ($44C49) cycles after inlining subroutines
00009
0000 000D 00010 Free fdb Heap
0002 0000 00011 Root fdb 0
0004 0000 00012 New fdb 0
0006 0000 0000 00013 Total fdb 0,0
000A 3C 00014 J fcb 60 ; Inner loop runs 60 times
000B 0BB7 00015 I fdb 2999
00016
000D 00017 Heap rmb 4*60
00018
00019 *
00020 * Allocate a node
00021 *
00FD 00022 Alloc
00FD 97 05 [4] 00023 staa New+1 ; Return it in New
00024
00FF 8B 04 [2] 00025 adda #4 ; Point to free memory
00026
0101 39 [5] 00027 rts
00028
00029 *
00030 * Prepend a node to the list
00031 *
0102 00032 Prepend
0102 8D F9 (00FD) [8] 00033 bsr Alloc ; Allocate a new node
00034
0104 DE 04 [4] 00035 ldx New ; Point to the new node
00036
0106 E7 03 [6] 00037 stab 3,X ; Store value in node
0108 D6 0B [3] 00038 ldab I
010A E7 02 [6] 00039 stab 2,X
00040
010C D6 03 [3] 00041 ldab Root+1 ; Point next of new node to Root
010E E7 01 [6] 00042 stab 1,X
0110 6F 00 [7] 00043 clr ,X
00044
0112 DF 02 [5] 00045 stx Root ; This is now the first node
00046
0114 39 [5] 00047 rts
00048
00049 *
00050 * Traverse the list and sum the values
00051 *
0115 00052 Sum
0115 DE 02 [4] 00053 ldx Root
0117 96 08 [3] 00054 ldaa Total+2
0119 D6 09 [3] 00055 ldab Total+3
00056
011B 00057 SumLoop
011B EB 03 [5] 00058 addb 3,X ; Add value of node to sum
011D A9 02 [5] 00059 adca 2,X
011F 24 08 (0129) [4] 00060 bcc NoCarry
00061
0121 7C 0007 [6] 00062 inc Total+1
00063
0124 26 03 (0129) [4] 00064 bne NoCarry
00065
0126 7C 0006 [6] 00066 inc Total
00067
0129 00068 NoCarry
0129 EE 00 [6] 00069 ldx ,X ; Point to the next node
012B 26 EE (011B) [4] 00070 bne SumLoop ; Until end of list
00071
012D 97 08 [4] 00072 staa Total+2
012F D7 09 [4] 00073 stab Total+3
00074
0131 39 [5] 00075 rts
00076
00077 *
00078 *
00079 *
0132 00080 RepeatOuterLoop
0132 8D E1 (0115) [8] 00081 bsr Sum ; Traverse the list and add them up
00082
0134 DF 02 [5] 00083 stx Root ; Clear list
00084
0136 86 3C [2] 00085 ldaa #60 ; Run inner loop 60 times
0138 97 0A [4] 00086 staa J
00087
013A 00088 Main
013A 86 0D [2] 00089 ldaa #Heap ; Reset the heap
00090
013C D6 0C [3] 00091 ldab I+1 ; Load low byte of count
00092
013E 20 08 (0148) [4] 00093 bra IntoLoop
00094
0140 00095 NoBorrow
0140 5A [2] 00096 decb
0141 D7 0C [4] 00097 stab I+1
00098
0143 7A 000A [6] 00099 dec J
0146 27 EA (0132) [4] 00100 beq RepeatOuterLoop
00101
0148 00102 IntoLoop
0148 8D B8 (0102) [8] 00103 bsr Prepend
00104
014A D6 0C [3] 00105 ldab I+1 ; Low byte of count zero?
014C 26 F2 (0140) [4] 00106 bne NoBorrow ; No, no borrow
00107
014E 7A 000B [6] 00108 dec I ; Borrow from upper byte
0151 2A ED (0140) [4] 00109 bpl NoBorrow ; Done if it goes negative
00110
0153 8D C0 (0115) [8] 00111 bsr Sum ; Traverse the list and add them up
00112
0155 3F [12] 00113 swi
00114
013A 00115 end Main
Re: Comparisons and contrasts
In another thread, barrym95838 wrote:
To which I wrote:
To which BigEd wrote:
To which I wrote:
To which Dr Jefyll wrote:
Maybe Motorola moved on, but there's definitely some further-evolved 6809 DNA out there.
Back in the day, Hitachi out-6809ed Motorola by an embarrassing margin with their backward compatible 6309. "To the 6809 specifications, it adds higher clock rates, enhanced features, new instructions, and additional registers. Most of the new instructions were added to support the additional registers, as well as up to 32-bit math, hardware division, bit manipulations, and block transfers."
Also noteworthy are the MC9S12 and HCS12 series. These guys have "only" three stackpointer/index regs (U got turfed) but otherwise seem very 6809-like. Unfortunately this resemblance includes the 6809's very high prevalence of dead cycles, but on the plus side the MC9S12 and HCS12 are in current production by NXP and presumably clock a great deal faster.
-- Jeff
The Hitachi 6309 is to the 6809 as the 65816 is to the 6502. The addition of more instructions, registers and addressing modes along with a "native" mode. Even though it put its resources into the 68000 and not in further evolving the 6809, Motorola applied legal pressure to keep Hitachi from documenting the "improvements" they made. It took the hacker community to do that.
The evolution of the Motorola 8-bit line is more like
6800 --> 6801 --> HC11 --> HC12 --> HC16
|
+--> 6809
Unfortunately, when they went to the 6809, Motorola dropped many of the efficient inherent instructions in favor of generalized versions. Among those lost were TAB, TBA, ABA, SBA, CBA, INX, DEX, PSHA, PSHB, PULA, PULB. The generalized instructions were bigger and slower. Some dropped instructions were not replaced; the recommended workarounds were seriously inefficient in some cases.
One of the 6809 designers later wrote a master's thesis looking back on the design decisions:
https://www.cs.utexas.edu/ftp/techreports/tr81-206a.pdf
Interesting reading. He said that adding the direct page register was a mistake. I disagree; though I have not used that capability, it did not apparently cost much to implement.
To add to what I previously said:
I would also add lead to load into and cead to compare with the combined accumulator D. It would have been really nice to be able to do things like:
barrym95838 wrote:
The 6809 designers had P.I. code directly in mind when they designed it
BillG wrote:
I have written one 6809 program using PIC and it was no walk in the park, piece of cake or bed of roses.
That program is the resident part of a system extension to add command recall, aka history, to FLEX. The 6800 and 6502 versions use a relocation bitmap similar to how MOVCPM worked on an 8080 - assemble the code at two addresses a multiple of 256 bytes apart; the bytes which differed needed to be adjusted to relocate the code. But the code can only be placed at addresses differing from the original location by entire pages.
I thought that PIC would allow me to put the code anywhere and not waste up to a page of memory. It did, but one programming idiom was not easy to implement.
In absolute 6800 code, I can write:
to compare register X with the address of the end of a data structure.
In PIC 6809 code I cannot write:
I can write:
to load the address into a register, but I cannot easily compare it with another register. I had to push it onto the stack and compare it there:
which is substantially larger and slower.
If only Motorola had given me four "compare effective address" instructions: ceax, ceay, ceau and ceas...
That program is the resident part of a system extension to add command recall, aka history, to FLEX. The 6800 and 6502 versions use a relocation bitmap similar to how MOVCPM worked on an 8080 - assemble the code at two addresses a multiple of 256 bytes apart; the bytes which differed needed to be adjusted to relocate the code. But the code can only be placed at addresses differing from the original location by entire pages.
I thought that PIC would allow me to put the code anywhere and not waste up to a page of memory. It did, but one programming idiom was not easy to implement.
In absolute 6800 code, I can write:
Code: Select all
cpx #TheEnd
In PIC 6809 code I cannot write:
Code: Select all
cmpx #TheEnd,PCR
Code: Select all
leay TheEnd,PCR
Code: Select all
leay TheEnd,PCR
pshs Y
cmpx ,S++
If only Motorola had given me four "compare effective address" instructions: ceax, ceay, ceau and ceas...
BigEd wrote:
That doesn't seem so bad - you found an idiom which works and you could wrap it in a macro.
BillG wrote:
The initial intended target of the 6809 version of the program is a Peripheral Technologies PT69-5 single board computer.
It is based on a 6809 processor running at 2 MHz. Unlike the 6502, evolution of the 6809 stopped when Motorola moved on to bigger things, the 680x0 family. FLEX on a 6809 officially supports up to 56 KBytes of RAM. I specifically chose to target the PT69-5 because it has an additional 4 KBytes, much of which is otherwise unused; my code and the history buffer resides there so that the amount of memory available for program use remained the same.
Instead of that push and compare sequence in my last post, I ended up doing the load effective address and storing the result in a variable during program initialization then comparing pointers against that later - much smaller and faster. The point is that position independent code on the 6809 does not come free though it is easier than on most other architectures.
It is based on a 6809 processor running at 2 MHz. Unlike the 6502, evolution of the 6809 stopped when Motorola moved on to bigger things, the 680x0 family. FLEX on a 6809 officially supports up to 56 KBytes of RAM. I specifically chose to target the PT69-5 because it has an additional 4 KBytes, much of which is otherwise unused; my code and the history buffer resides there so that the amount of memory available for program use remained the same.
Instead of that push and compare sequence in my last post, I ended up doing the load effective address and storing the result in a variable during program initialization then comparing pointers against that later - much smaller and faster. The point is that position independent code on the 6809 does not come free though it is easier than on most other architectures.
Dr Jefyll wrote:
BillG wrote:
Unlike the 6502, evolution of the 6809 stopped when Motorola moved on to bigger things
Also noteworthy are the MC9S12 and HCS12 series. These guys have "only" three stackpointer/index regs (U got turfed) but otherwise seem very 6809-like. Unfortunately this resemblance includes the 6809's very high prevalence of dead cycles, but on the plus side the MC9S12 and HCS12 are in current production by NXP and presumably clock a great deal faster.
-- Jeff
The evolution of the Motorola 8-bit line is more like
6800 --> 6801 --> HC11 --> HC12 --> HC16
|
+--> 6809
Unfortunately, when they went to the 6809, Motorola dropped many of the efficient inherent instructions in favor of generalized versions. Among those lost were TAB, TBA, ABA, SBA, CBA, INX, DEX, PSHA, PSHB, PULA, PULB. The generalized instructions were bigger and slower. Some dropped instructions were not replaced; the recommended workarounds were seriously inefficient in some cases.
One of the 6809 designers later wrote a master's thesis looking back on the design decisions:
https://www.cs.utexas.edu/ftp/techreports/tr81-206a.pdf
Interesting reading. He said that adding the direct page register was a mistake. I disagree; though I have not used that capability, it did not apparently cost much to implement.
To add to what I previously said:
BillG wrote:
If only Motorola had given me four "compare effective address" instructions: ceax, ceay, ceau and ceas...
Code: Select all
cead 7,X
ceax D,Y
Re: Comparisons and contrasts
Quote:
One of the 6809 designers later wrote a master's thesis looking back on the design decisions
https://dl.acm.org/doi/pdf/10.1145/1500676.1500739
although it's not the same document. (But it does have a text layer so it's easier to copy and paste!)
Quote:
we wanted to prove that it was possible to produce an inexpensive microprocessor that was also easy to program. We felt that too many of the existing microprocessors were needlessly difficult to program. We suspected that the reason was not that it was impossible to make a microcomputer that was easy to program, but, rather, that the architects of the early microprocessors were generally more hardware oriented than software oriented.
Quote:
We wanted the architecture to efficiently support modern block-structured high-level languages. Features such as stack addressing were included for this purpose. We also wanted to better support assembly language with the ability to write recursive, reentrant programs and position-independent programs.
https://www.cs.utexas.edu/ftp/techreports/tr81-206a.pdf
https://www.cs.utexas.edu/ftp/techreports/tr81-206b.pdf
https://www.cs.utexas.edu/ftp/techreports/tr81-206c.pdf
Re: Comparisons and contrasts
Thanks. I did not realize that.
I have more reading to do...
I have more reading to do...
Re: Comparisons and contrasts
I quickly skimmed the last two documents. A more detailed read will have to wait for the weekend.
My initial impressions about his two sets of papers:
* His samples were too small. Granted the studies were done before Al Gore invented the Internet and the whole open source phenomenon arose. The choice of the 6839 floating point library ROM is going to be biased toward position independent code and away from extended and direct addressing. The monitor, as system software, will avoid the use of the direct page so that application programs can have it.
* He said that lbsr, leax, pshs were the top three most frequently appearing opcodes.
680x programmers are used to coding BSR to call a subroutine over JSR because it is both smaller and faster. On the 6800, if the assembler said that a branch was out of range, change it JMP. On the 6809, it is much easier to add 'L' to make it LBSR even though it is a cycle slower than JMP. Some programmers may have gotten into the habit of always coding "LBSR" instead of "BSR" or "JMP.". I do not believe as he does that a majority of programmers strove for position independent code.
Likewise, it is much easier to change an out of range short conditional branch to a long one than inserting and branching around an absolute jump as was the case on the 6800. Long conditional branches are good and not because they are position independent.
By getting rid of INX and DEX, what did he think was going to happen? He did not filter out "LEAX 1,X" and "LEAX -1,X" for special consideration.
PSHA and PULA were very commonly used on the 6800 with PSHB and PULB slightly less so. Those were moved from the 6800 "inherent" bucket to the 6809 pshs/puls one. With more than one index register, there is nowhere near the need for pushing and pulling X as was the case on the 6800.
* He said that the indexed addressing mode was more heavily used on the 6809 than the 6800.
Again, by getting rid of INX and DEX, usage was moved from the 6800 "inherent" bucket to the 6809 "indexed" one.
* He said that indirect addressing modes were seldom used.
His survey was done before the use of high-level programming languages, especially C, significantly displaced assembly language on 8-bit microprocessors. Things like "LDA [6,S]" are likely common in code generated by compilers.
Something like "LDA [6,S+]" is not very useful. It would be different if it was the address at "6,S" instead of the S register that was being incremented and that is not supported.
* The answer for ABA, SBA and CBA not being orthogonal is to add AAB, SAB and CAB. Those are very useful instructions to have. The 6809 has around thirty unused single-byte opcodes; I would have also kept the single-byte forms of INX, DEX, PSHA, PULA, PSHB, PULB as the 6801, HC11, etc. did.
My initial impressions about his two sets of papers:
* His samples were too small. Granted the studies were done before Al Gore invented the Internet and the whole open source phenomenon arose. The choice of the 6839 floating point library ROM is going to be biased toward position independent code and away from extended and direct addressing. The monitor, as system software, will avoid the use of the direct page so that application programs can have it.
* He said that lbsr, leax, pshs were the top three most frequently appearing opcodes.
680x programmers are used to coding BSR to call a subroutine over JSR because it is both smaller and faster. On the 6800, if the assembler said that a branch was out of range, change it JMP. On the 6809, it is much easier to add 'L' to make it LBSR even though it is a cycle slower than JMP. Some programmers may have gotten into the habit of always coding "LBSR" instead of "BSR" or "JMP.". I do not believe as he does that a majority of programmers strove for position independent code.
Likewise, it is much easier to change an out of range short conditional branch to a long one than inserting and branching around an absolute jump as was the case on the 6800. Long conditional branches are good and not because they are position independent.
By getting rid of INX and DEX, what did he think was going to happen? He did not filter out "LEAX 1,X" and "LEAX -1,X" for special consideration.
PSHA and PULA were very commonly used on the 6800 with PSHB and PULB slightly less so. Those were moved from the 6800 "inherent" bucket to the 6809 pshs/puls one. With more than one index register, there is nowhere near the need for pushing and pulling X as was the case on the 6800.
* He said that the indexed addressing mode was more heavily used on the 6809 than the 6800.
Again, by getting rid of INX and DEX, usage was moved from the 6800 "inherent" bucket to the 6809 "indexed" one.
* He said that indirect addressing modes were seldom used.
His survey was done before the use of high-level programming languages, especially C, significantly displaced assembly language on 8-bit microprocessors. Things like "LDA [6,S]" are likely common in code generated by compilers.
Something like "LDA [6,S+]" is not very useful. It would be different if it was the address at "6,S" instead of the S register that was being incremented and that is not supported.
* The answer for ABA, SBA and CBA not being orthogonal is to add AAB, SAB and CAB. Those are very useful instructions to have. The 6809 has around thirty unused single-byte opcodes; I would have also kept the single-byte forms of INX, DEX, PSHA, PULA, PSHB, PULB as the 6801, HC11, etc. did.
Re: Comparisons and contrasts
As I read about the limited amount of software available to him for analysis, I keep getting a feeling of "lost opportunity."
Southwest Technical Products (SWTPC) was less than a hundred miles down the road in San Antonio. The SWTPC 6800 computer was the most common computer built around the 6800 processor; it was mentioned quite often in magazines such as Popular Electronics.
SWTPC offered a good selection of software at very inexpensive prices including BASIC interpreters as well as a native text editor and assembler.
When the time came to do his "post mortem" analysis, SWTPC was selling a 6809 computer running FLEX and an even larger selection of software.
At that time, Motorola was making huge bucks selling processors to be used in vehicle engine controllers so they may have intentionally chosen to ignore "toy" computers.
Southwest Technical Products (SWTPC) was less than a hundred miles down the road in San Antonio. The SWTPC 6800 computer was the most common computer built around the 6800 processor; it was mentioned quite often in magazines such as Popular Electronics.
SWTPC offered a good selection of software at very inexpensive prices including BASIC interpreters as well as a native text editor and assembler.
When the time came to do his "post mortem" analysis, SWTPC was selling a 6809 computer running FLEX and an even larger selection of software.
At that time, Motorola was making huge bucks selling processors to be used in vehicle engine controllers so they may have intentionally chosen to ignore "toy" computers.
Re: Comparisons and contrasts
BillG wrote:
In another thread, barrym95838 wrote:
barrym95838 wrote:
The 6809 designers had P.I. code directly in mind when they designed it
https://archive.org/details/byte-magazi ... ew=theater
https://archive.org/details/byte-magazi ... ew=theater
One of the things which bug me about programming the 6809 is that the test memory instruction TST takes one more cycle than the instruction to load the byte into a register. The processor treats it like a read-modify-write instruction without the write.
Re: Comparisons and contrasts
BillG wrote:
the test memory instruction TST takes one more cycle than the instruction to load the byte into a register.
- LDA direct-page (and similar examples) 4 cycles instead of 3
LDA absolute (and similar examples) 5 cycles instead of 4
INC direct-page (and similar examples) 6 cycles instead of 5
INC absolute (and similar examples) 7 cycles instead of 6
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: Comparisons and contrasts
In section 6.1 of https://www.cs.utexas.edu/ftp/techreports/tr81-206b.pdf, he mentions that the 6801 was internally faster than the 6809 and the instructions which could have been made a cycle faster:
* load effective address (lea*)
* subroutine calls (jsr, bsr, lbsr)
* pushes and pulls
* direct and extended addressing modes
* possibly indexed mode
* an additional cycle off of 5-bit offset indexing (these two would have really helped the new forms of INX and DEX)
I suspect that they faced enormous pressure to ship the 6809 before the upcoming 68000 made it less relevant. The 68000 did not arrive in quantity until late 1980, so I wish they had taken the time to do that optimization.
I am still chapped that they did not base TST on the LDA or LDB logic instead of the INC or DEC.
* load effective address (lea*)
* subroutine calls (jsr, bsr, lbsr)
* pushes and pulls
* direct and extended addressing modes
* possibly indexed mode
* an additional cycle off of 5-bit offset indexing (these two would have really helped the new forms of INX and DEX)
I suspect that they faced enormous pressure to ship the 6809 before the upcoming 68000 made it less relevant. The 68000 did not arrive in quantity until late 1980, so I wish they had taken the time to do that optimization.
I am still chapped that they did not base TST on the LDA or LDB logic instead of the INC or DEC.
Re: Comparisons and contrasts
BillG wrote:
I suspect that they faced enormous pressure to ship the 6809 before the upcoming 68000 made it less relevant.
Hitachi, however, was highly successful in eliminating dead cycles (on the 6309, I mean).
-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: Comparisons and contrasts
There is an ambiguity in the 6809 I finally got around to testing on real hardware.
If I do this:
X obviously contains $FFFF.
But if I do this:
does X contain 0 or 1?
It would depend upon whether the load is done first or the increment.
I will reveal the answer later while you guess and discuss...
If I do this:
Code: Select all
ldx #0
leax ,-X
But if I do this:
Code: Select all
ldx #0
leax ,X+
It would depend upon whether the load is done first or the increment.
I will reveal the answer later while you guess and discuss...
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
I ran into that conundrum while designing my own 65m32a architecture (a decades-long work in progress), and decided that the increment should be after the potential read of the effective address but before the load (or store), resulting in X=1. I still don't know if that decision is going to bite me in the future.
[Edit: actually I don't believe I thought this particular case through, because it (LDX #,X+) is not allowed in 65m32a assembly. STX ,X+ is allowed, and would result in a one being stored in location zero.]
For the 65xx, here's a loosely related puzzle that almost certainly has a stable solution (disregarding an untimely interrupt):Where did we jsr? Was it $1234 or somewhere else? Maybe $0134?
[Edit: actually I don't believe I thought this particular case through, because it (LDX #,X+) is not allowed in 65m32a assembly. STX ,X+ is allowed, and would result in a one being stored in location zero.]
For the 65xx, here's a loosely related puzzle that almost certainly has a stable solution (disregarding an untimely interrupt):
Code: Select all
org $0100
ldx #5
txs
jsr $1234
brkGot a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Mike B. (about me) (learning how to github)
Re: Comparisons and contrasts
barrym95838 wrote:
I ran into that conundrum while designing my own 65m32a architecture (a decades-long work in progress), and decided that the increment should be after the potential read of the effective address but before the load (or store), resulting in X=1. I still don't know if that decision is going to bite me in the future.
[Edit: actually I don't believe I thought this particular case through, because it (LDX #,X+) is not allowed in 65m32a assembly. STX ,X+ is allowed, and would result in a one being stored in location zero.]
[Edit: actually I don't believe I thought this particular case through, because it (LDX #,X+) is not allowed in 65m32a assembly. STX ,X+ is allowed, and would result in a one being stored in location zero.]
barrym95838 wrote:
For the 65xx, here's a loosely related puzzle that almost certainly has a stable solution (disregarding an untimely interrupt):Where did we jsr? Was it $1234 or somewhere else? Maybe $0134?
Code: Select all
org $0100
ldx #5
txs
jsr $1234
brkMy simulator, which I in no way claim is authoritative, jumps to $1234 with the instruction changed to jsr $105.
As for
BillG wrote:
Code: Select all
ldx #0
leax ,X+
I had no idea going in which it was going to be. My guess at this point is that the indexing logic (which is agnostic as to the type of instruction being executed) loads the effective address into an internal register for use by the ALU, incrementing X at the end of the process. That internal register is copied to X in the final "write back" stage of the instruction.
It is interesting that the 68000 does not allow
Code: Select all
lea (A1)+,A1