Comparisons and contrasts
Re: Comparisons and contrasts
Mike, thanks for a very interesting thread.
While you specifically requested for 1970's 8-bitters, I want a quick mention of the (12-bit) PDP-8. As a child of the micro age, I have had no direct experience with these machines. Recently I went down a rabbit hole that introduced me to the PDP-8 architecture, and I was struck by the similarity to the 6502 (the other way around really).
The heavy reliance on the zero page makes it clear that the 6502 is a descendant of the PDP-8 architecture. Similarly, the concept of the 'current page' wound its way into the more modern 65C816.
To be even more accurate, the use of the zero page is found earlier in the CDC 160 designed my Mr. Cray himself around 1960.
While you specifically requested for 1970's 8-bitters, I want a quick mention of the (12-bit) PDP-8. As a child of the micro age, I have had no direct experience with these machines. Recently I went down a rabbit hole that introduced me to the PDP-8 architecture, and I was struck by the similarity to the 6502 (the other way around really).
The heavy reliance on the zero page makes it clear that the 6502 is a descendant of the PDP-8 architecture. Similarly, the concept of the 'current page' wound its way into the more modern 65C816.
To be even more accurate, the use of the zero page is found earlier in the CDC 160 designed my Mr. Cray himself around 1960.
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
It's okay (by me) to drift a bit. My 10th grade trigonometry teacher had a pdp-8/a, and I played around with DEC BASIC, but never found an assembler for it at the time. 29 years later, I found a *mulator and learned enough to compose a small program. My spare time seems to be getting more and more precious, so I had to kinda leave it at that ...
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Mike B. (about me) (learning how to github)
Re: Comparisons and contrasts
barrym95838 wrote:
It's okay (by me) to drift a bit. My 10th grade trigonometry teacher had a pdp-8/a, and I played around with DEC BASIC, but never found an assembler for it at the time. 29 years later, I found a *mulator and learned enough to compose a small program. My spare time seems to be getting more and more precious, so I had to kinda leave it at that ...
https://talk.dallasmakerspace.org/t/pro ... /18852/229
Someone had already submitted it to that site.
Re: Comparisons and contrasts
Hmmm... Let's see what you really think about topic drift: https://github.com/rwiker/50-shades-of-99-bottles
Re: Comparisons and contrasts
I have been working on a project using the CP/M version of Turbo Pascal, so I threw the fib and tarai benchmarks at it.
Good, fast, cheap. The saying is that you cannot have all three.
Turbo Pascal was not expensive and it compiled quickly, but the generated code is not so good. It takes about four times longer to run these programs as SDCC samples.
Code: Select all
function fib(n : integer)
: integer;
begin
if n < 2
then
fib := n
else
fib := fib(n - 1) + fib(n - 2)
end; { fib }
begin
writeln(fib(23))
end. { Test }
Code: Select all
2100 01074 L_2100:
2100 01 000C [10] 01075 ld BC,000Ch ; Push stack frame
2103 21 ECAE [10] 01076 ld HL,0ECAEh
2106 CD 0509 [17] 01077 call L_0509
01078
2109 FDE1 [14] 01079 pop IY ; Get return address
01080
210B E1 [10] 01081 pop HL ; Get n from stack
210C 22 ECB6 [16] 01082 ld (0ECB6h),HL ; Store in local variable
01083
210F FDE5 [15] 01084 push IY ; Put return address back
01085
2111 2A ECB6 [16] 01086 ld HL,(0ECB6h) ; if n < 2
2114 E5 [11] 01087 push HL
2115 21 0002 [10] 01088 ld HL,0002
2118 D1 [10] 01089 pop DE
2119 CD 06E1 [17] 01090 call L_06E1
211C CB45 [8] 01091 bit 0,L
211E CA 212A [10] 01092 jp Z,L_212A
01093
2121 2A ECB6 [16] 01094 ld HL,(0ECB6h) ; Execute fib := n
2124 22 ECB8 [16] 01095 ld (0ECB8h),HL
01096
2127 C3 2150 [10] 01097 jp L_2150
01098
212A 01099 L_212A:
212A 2A ECB6 [16] 01100 ld HL,(0ECB6h) ; Evaluate fib(n - 1)
212D E5 [11] 01101 push HL
212E 21 0001 [10] 01102 ld HL,0001
2131 D1 [10] 01103 pop DE
2132 EB [4] 01104 ex DE,HL
2133 B7 [4] 01105 or A
2134 ED52 [15] 01106 sbc HL,DE
2136 E5 [11] 01107 push HL
2137 CD 2100 [17] 01108 call L_2100
213A E5 [11] 01109 push HL
01110
213B 2A ECB6 [16] 01111 ld HL,(0ECB6h) ; Evaluate fib(n - 2)
213E E5 [11] 01112 push HL
213F 21 0002 [10] 01113 ld HL,0002
2142 D1 [10] 01114 pop DE
2143 EB [4] 01115 ex DE,HL
2144 B7 [4] 01116 or A
2145 ED52 [15] 01117 sbc HL,DE
2147 E5 [11] 01118 push HL
2148 CD 2100 [17] 01119 call L_2100
01120
214B D1 [10] 01121 pop DE ; Fib := fib(n - 1) and fib(n - 2)
214C 19 [11] 01122 add HL,DE
214D 22 ECB8 [16] 01123 ld (0ECB8h),HL
01124
2150 01125 L_2150:
2150 2A ECB8 [16] 01126 ld HL,(0ECB8h) ; Prepare to return fib
2153 D9 [4] 01127 exx ; Stash return value
01128
2154 01 000C [10] 01129 ld BC,000Ch ; Pop stack frame and return
2157 11 ECAE [10] 01130 ld DE,0ECAEh
215A C3 0523 [10] 01131 jp L_0523
Code: Select all
01136 ;
01137 ; From here
01138 ;
2160 21 0017 [10] 01139 ld HL,23 ; Evaluate fib(23)
2163 E5 [11] 01140 push HL
2164 CD 2100 [17] 01141 call L_2100
01142 ;
01143 ; to here takes 106,552,447 cycles
01144 ;
Code: Select all
function tarai(x : integer;
y : integer;
z : integer)
: integer;
begin
if x <= y
then
tarai := y
else
tarai := tarai(tarai(x - 1, y, z),
tarai(y - 1, z, x),
tarai(z - 1, x, y))
end; { tarai }
begin
writeln(tarai(10, 1, 0))
end. { Test }
Code: Select all
2100 01074 L_2100:
2100 01 0010 [10] 01075 ld BC,0010h ; Push stack frame
2103 21 ECAA [10] 01076 ld HL,0ECAAh
2106 CD 0509 [17] 01077 call L_0509
01078
2109 FDE1 [14] 01079 pop IY ; Get return address
01080
210B E1 [10] 01081 pop HL ; Get z from stack
210C 22 ECB2 [16] 01082 ld (0ECB2h),HL ; Store in local variable
01083
210F E1 [10] 01084 pop HL ; Get y from stack
2110 22 ECB4 [16] 01085 ld (0ECB4h),HL ; Store in local variable
01086
2113 E1 [10] 01087 pop HL ; Get x from stack
2114 22 ECB6 [16] 01088 ld (0ECB6h),HL ; Store in local variable
01089
2117 FDE5 [15] 01090 push IY ; Put return address back
01091
2119 2A ECB6 [16] 01092 ld HL,(0ECB6h) ; if x <= y
211C E5 [11] 01093 push HL
211D 2A ECB4 [16] 01094 ld HL,(0ECB4h)
2120 D1 [10] 01095 pop DE
2121 CD 06B9 [17] 01096 call L_06B9
2124 CB45 [8] 01097 bit 0,L
2126 CA 2132 [10] 01098 jp Z,L_2132
01099
2129 2A ECB4 [16] 01100 ld HL,(0ECB4h) ; tarai := y
212C 22 ECB8 [16] 01101 ld (0ECB8h),HL
01102
212F C3 2183 [10] 01103 jp L_2183
01104
2132 01105 L_2132:
2132 2A ECB6 [16] 01106 ld HL,(0ECB6h) ; Evaluate tarai(x - 1, y, z)
2135 E5 [11] 01107 push HL
2136 21 0001 [10] 01108 ld HL,0001
2139 D1 [10] 01109 pop DE
213A EB [4] 01110 ex DE,HL
213B B7 [4] 01111 or A
213C ED52 [15] 01112 sbc HL,DE
213E E5 [11] 01113 push HL
213F 2A ECB4 [16] 01114 ld HL,(0ECB4h)
2142 E5 [11] 01115 push HL
2143 2A ECB2 [16] 01116 ld HL,(0ECB2h)
2146 E5 [11] 01117 push HL
2147 CD 2100 [17] 01118 call L_2100
01119
214A E5 [11] 01120 push HL ; Save first value
01121
214B 2A ECB4 [16] 01122 ld HL,(0ECB4h) ; Evaluate tarai(y - 1, z, x)
214E E5 [11] 01123 push HL
214F 21 0001 [10] 01124 ld HL,0001
2152 D1 [10] 01125 pop DE
2153 EB [4] 01126 ex DE,HL
2154 B7 [4] 01127 or A
2155 ED52 [15] 01128 sbc HL,DE
2157 E5 [11] 01129 push HL
2158 2A ECB2 [16] 01130 ld HL,(0ECB2h)
215B E5 [11] 01131 push HL
215C 2A ECB6 [16] 01132 ld HL,(0ECB6h)
215F E5 [11] 01133 push HL
2160 CD 2100 [17] 01134 call L_2100
01135
2163 E5 [11] 01136 push HL ; Save second value
01137
2164 2A ECB2 [16] 01138 ld HL,(0ECB2h) ; Evaluate tarai(z - 1, x, y)
2167 E5 [11] 01139 push HL
2168 21 0001 [10] 01140 ld HL,0001
216B D1 [10] 01141 pop DE
216C EB [4] 01142 ex DE,HL
216D B7 [4] 01143 or A
216E ED52 [15] 01144 sbc HL,DE
2170 E5 [11] 01145 push HL
2171 2A ECB6 [16] 01146 ld HL,(0ECB6h)
2174 E5 [11] 01147 push HL
2175 2A ECB4 [16] 01148 ld HL,(0ECB4h)
2178 E5 [11] 01149 push HL
2179 CD 2100 [17] 01150 call L_2100
01151
217C E5 [11] 01152 push HL ; Evaluate tarai(1st, 2nd, 3rd)
217D CD 2100 [17] 01153 call L_2100
2180 22 ECB8 [16] 01154 ld (0ECB8h),HL
01155
2183 01156 L_2183:
2183 2A ECB8 [16] 01157 ld HL,(0ECB8h) ; Prepare to return fib
2186 D9 [4] 01158 exx ; Stash return value
01159
2187 01 0010 [10] 01160 ld BC,0010h ; Pop stack frame and return
218A 11 ECAA [10] 01161 ld DE,0ECAAh
218D C3 0523 [10] 01162 jp L_0523
Code: Select all
01167 ;
01168 ; From here
01169 ;
2193 21 000A [10] 01170 ld HL,000Ah ; Evaluate tarai(10, 1, 0)
2196 E5 [11] 01171 push HL
2197 21 0001 [10] 01172 ld HL,0001
219A E5 [11] 01173 push HL
219B 21 0000 [10] 01174 ld HL,0000
219E E5 [11] 01175 push HL
219F CD 2100 [17] 01176 call L_2100
01177 ;
01178 ; to here takes 1,117,869,614 cycles
01179 ;
Turbo Pascal was not expensive and it compiled quickly, but the generated code is not so good. It takes about four times longer to run these programs as SDCC samples.
Re: Comparisons and contrasts
Can 8080 code be smaller and faster than Z80 code?
Much has been said about how the new instructions added to the Z80 are often too slow to be useful, that these are more commonly used to create compact code rather than faster execution.
I have a subroutine for which a version written for the 8080 is both smaller and quicker. It is used by CP/M Turbo Pascal programs to release allocated memory back to the heap. The 8080 version is not only faster; it is substantially faster.
I did not originally set out to write a faster implementation but had to reach repeatedly into my bag of tricks to avoid running out of registers while trying to do a literal translation. The result ended up surprisingly more efficient.
The original Z80 code:
As rewritten for the 8080:
Much has been said about how the new instructions added to the Z80 are often too slow to be useful, that these are more commonly used to create compact code rather than faster execution.
I have a subroutine for which a version written for the 8080 is both smaller and quicker. It is used by CP/M Turbo Pascal programs to release allocated memory back to the heap. The 8080 version is not only faster; it is substantially faster.
I did not originally set out to write a faster implementation but had to reach repeatedly into my bag of tricks to avoid running out of registers while trying to do a literal translation. The result ended up surprisingly more efficient.
The original Z80 code:
Code: Select all
07241 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
07242 ;
07243 ; Dispose
07244 ;
07245 ; Input:
07246 ; HL = size of block to free
07247 ; Top of stack = the address of variable containing block address
07248 ;
1D7B 07249 L_1D7B:
1D7B EB [4] 07250 ex DE,HL ; Move size to DE
07251
1D7C E1 [10] 07252 pop HL ; Get return address
1D7D E3 [19] 07253 ex (SP),HL ; Swap for address of variable
07254
1D7E 7E [7] 07255 ld A,(HL) ; Get address of the block to free
1D7F 23 [5] 07256 inc HL
1D80 66 [7] 07257 ld H,(HL)
1D81 6F [4] 07258 ld L,A
07259
1D82 13 [5] 07260 inc DE ; Round size up to next multiple of 4
1D83 13 [5] 07261 inc DE
1D84 13 [5] 07262 inc DE
1D85 7B [4] 07263 ld A,E
1D86 E6FC [7] 07264 and 0FCh
1D88 5F [4] 07265 ld E,A
07266
1D89 EB [4] 07267 ex DE,HL ; Save "quantized" size
1D8A 22 00F0 [16] 07268 ld (L_00F0),HL ; DE -> block to free
07269
1D8D 2A 00DE [16] 07270 ld HL,(L_00DE) ; Point to the first free block
1D90 E5 [11] 07271 push HL
1D91 DDE1 [14] 07272 pop IX
07273
1D93 B7 [4] 07274 or A ; Is block to free lower?
1D94 ED52 [15] 07275 sbc HL,DE
1D96 3052 (1DEA) [7/12] 07276 jr NC,L_1DEA ; Yes
07277
1D98 07278 L_1D98:
1D98 DD6E00 [19] 07279 ld L,(IX+0) ; Get next address
1D9B DD6601 [19] 07280 ld H,(IX+1)
1D9E E5 [11] 07281 push HL
07282
1D9F B7 [4] 07283 or A ; Is this block still lower?
1DA0 ED52 [15] 07284 sbc HL,DE
1DA2 3004 (1DA8) [7/12] 07285 jr NC,L_1DA8 ; No
07286
1DA4 DDE1 [14] 07287 pop IX ; Point to next free block
07288
1DA6 18F0 (1D98) [12] 07289 jr L_1D98 ; Keep looking
07290
1DA8 07291 L_1DA8:
1DA8 E1 [10] 07292 pop HL ; Get block just above block to free
07293
1DA9 D5 [11] 07294 push DE ; Point to block to be freed
1DAA FDE1 [14] 07295 pop IY
07296
1DAC ED4B 00F0 [20] 07297 ld BC,(L_00F0) ; Retrieve block size
07298
1DB0 FD7102 [19] 07299 ld (IY+2),C ; Store size
1DB3 FD7003 [19] 07300 ld (IY+3),B
1DB6 FD7500 [19] 07301 ld (IY+0),L ; Store next
1DB9 FD7401 [19] 07302 ld (IY+1),H
07303
1DBC DD7300 [19] 07304 ld (IX+0),E ; Point here from previous block
1DBF DD7201 [19] 07305 ld (IX+1),D
07306
1DC2 DDE5 [15] 07307 push IX ; Point to previous block
1DC4 E1 [10] 07308 pop HL
1DC5 DD4E02 [19] 07309 ld C,(IX+2) ; Get its size
1DC8 DD4603 [19] 07310 ld B,(IX+3)
1DCB CD 1E05 [17] 07311 call L_1E05 ; Try to coalesce
1DCE 2809 (1DD9) [7/12] 07312 jr Z,L_1DD9 ; Combined?
07313
1DD0 DD5E00 [19] 07314 ld E,(IX+0) ; Point to next block
1DD3 DD5601 [19] 07315 ld D,(IX+1)
1DD6 D5 [11] 07316 push DE
1DD7 DDE1 [14] 07317 pop IX
07318
1DD9 07319 L_1DD9:
1DD9 DDE5 [15] 07320 push IX ; Set up to try combining with next block
1DDB E1 [10] 07321 pop HL
1DDC DD4E02 [19] 07322 ld C,(IX+2) ; Load size
1DDF DD4603 [19] 07323 ld B,(IX+3)
1DE2 DD5E00 [19] 07324 ld E,(IX+0) ; Load next
1DE5 DD5601 [19] 07325 ld D,(IX+1)
1DE8 181B (1E05) [12] 07326 jr L_1E05
07327
1DEA 07328 L_1DEA:
1DEA 2A 00DE [16] 07329 ld HL,(L_00DE) ; Freed block is now first free block
1DED ED53 00DE [20] 07330 ld (L_00DE),DE
1DF1 D5 [11] 07331 push DE
1DF2 DDE1 [14] 07332 pop IX
1DF4 DD7500 [19] 07333 ld (IX+0),L ; Store next
1DF7 DD7401 [19] 07334 ld (IX+1),H
1DFA ED4B 00F0 [20] 07335 ld BC,(L_00F0) ; Store size
1DFE DD7102 [19] 07336 ld (IX+2),C
1E01 DD7003 [19] 07337 ld (IX+3),B
1E04 EB [4] 07338 ex DE,HL
07339
1E05 07340 L_1E05:
1E05 09 [11] 07341 add HL,BC ; Are the blocks adjacent?
1E06 B7 [4] 07342 or A
1E07 ED52 [15] 07343 sbc HL,DE
1E09 C0 [5/11] 07344 ret NZ ; No
07345
1E0A D5 [11] 07346 push DE
1E0B FDE1 [14] 07347 pop IY
07348
1E0D 2A 00C4 [16] 07349 ld HL,(L_00C4) ; Is upper block at the top of the heap?
1E10 B7 [4] 07350 or A
1E11 ED52 [15] 07351 sbc HL,DE
1E13 281B (1E30) [7/12] 07352 jr Z,L_1E30 ; Yes
07353
1E15 FD7E00 [19] 07354 ld A,(IY+0) ; Transfer next to lower block
1E18 DD7700 [19] 07355 ld (IX+0),A
1E1B FD7E01 [19] 07356 ld A,(IY+1)
1E1E DD7701 [19] 07357 ld (IX+1),A
1E21 FD6E02 [19] 07358 ld L,(IY+2) ; Add the sizes
1E24 FD6603 [19] 07359 ld H,(IY+3)
1E27 09 [11] 07360 add HL,BC
1E28 DD7502 [19] 07361 ld (IX+2),L
1E2B DD7403 [19] 07362 ld (IX+3),H
1E2E AF [4] 07363 xor A ; Clear Z flag
07364
1E2F C9 [10] 07365 ret
07366
1E30 07367 L_1E30:
1E30 DDE5 [15] 07368 push IX ; Set new top of heap
1E32 E1 [10] 07369 pop HL
1E33 22 00C4 [16] 07370 ld (L_00C4),HL
07371
1E36 0604 [7] 07372 ld B,4 ; Zero the header
07373
1E38 07374 L_1E38:
1E38 3600 [10] 07375 ld (HL),0
1E3A 23 [5] 07376 inc HL
1E3B 10FB (1E38) [8/13] 07377 djnz L_1E38
07378
1E3D C9 [10] 07379 ret
As rewritten for the 8080:
Code: Select all
05810 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
05811 ;
05812 ; Dispose
05813 ;
05814 ; Input:
05815 ; HL = size of block to free
05816 ; Top of stack = the address of variable containing block address
05817 ;
14E9 05818 Rq1D7B:
14E9 EB [4] 05819 xchg ; Move size to DE
05820
14EA E1 [10] 05821 pop H ; Get return address
14EB E3 [18] 05822 xthl ; Swap for address of variable
05823
14EC 7E [7] 05824 mov A,M ; Get address of the block to free
14ED 23 [5] 05825 inx H
14EE 66 [7] 05826 mov H,M
14EF 6F [5] 05827 mov L,A
05828
14F0 13 [5] 05829 inx D ; Round size up to next multiple of 4
14F1 13 [5] 05830 inx D
14F2 13 [5] 05831 inx D
14F3 7B [5] 05832 mov A,E
14F4 E6 FC [7] 05833 ani 0FCh
14F6 5F [5] 05834 mov E,A
05835
14F7 EB [4] 05836 xchg ; Save "quantized" size
14F8 22 00F0 [16] 05837 shld Vq00F0 ; DE -> block to free
05838
14FB 2A 00DE [16] 05839 lhld Vq00DE ; Point to the first free block
05840
14FE 7D [5] 05841 mov A,L ; Is block to free lower?
14FF 93 [4] 05842 sub E
1500 7C [5] 05843 mov A,H
1501 9A [4] 05844 sbb D
1502 DA 151E [10] 05845 jc Lq1D98 ; No
05846
1505 2A 00F0 [16] 05847 lhld Vq00F0 ; Load size into BC
1508 44 [5] 05848 mov B,H
1509 4D [5] 05849 mov C,L
05850
150A 2A 00DE [16] 05851 lhld Vq00DE ; Freed block is now first free block
150D EB [4] 05852 xchg
150E 22 00DE [16] 05853 shld Vq00DE
05854
1511 73 [7] 05855 mov M,E ; Store next
1512 23 [5] 05856 inx H
1513 72 [7] 05857 mov M,D
05858
1514 23 [5] 05859 inx H ; Store size
1515 71 [7] 05860 mov M,C
1516 23 [5] 05861 inx H
1517 70 [7] 05862 mov M,B
1518 2B [5] 05863 dcx H
1519 2B [5] 05864 dcx H
151A 2B [5] 05865 dcx H
05866
151B C3 1554 [10] 05867 jmp Lq1E05 ; Try to coalesce with next block
05868
151E 05869 Lq1D98:
151E 44 [5] 05870 mov B,H ; Remember current block address
151F 4D [5] 05871 mov C,L
05872
1520 7E [7] 05873 mov A,M ; Get next address
1521 23 [5] 05874 inx H
1522 66 [7] 05875 mov H,M
1523 6F [5] 05876 mov L,A
05877
1524 93 [4] 05878 sub E ; Is this block still lower?
1525 7C [5] 05879 mov A,H
1526 9A [4] 05880 sbb D
1527 DA 151E [10] 05881 jc Lq1D98 ; Yes
05882
152A E5 [11] 05883 push H ; Save address of block above block to free
05884
152B 60 [5] 05885 mov H,B ; Point to block just below block to free
152C 69 [5] 05886 mov L,C
05887
152D 73 [7] 05888 mov M,E ; Point here from previous block
152E 23 [5] 05889 inx H
152F 72 [7] 05890 mov M,D
05891
1530 E1 [10] 05892 pop H ; Refresh address of block above block to free
05893
1531 EB [4] 05894 xchg ; Point to block to free
05895
1532 73 [7] 05896 mov M,E ; Store next
1533 23 [5] 05897 inx H
1534 72 [7] 05898 mov M,D
05899
1535 EB [4] 05900 xchg ; Retrieve block size
1536 2A 00F0 [16] 05901 lhld Vq00F0
1539 EB [4] 05902 xchg
05903
153A 23 [5] 05904 inx H ; Store size
153B 73 [7] 05905 mov M,E
153C 23 [5] 05906 inx H
153D 72 [7] 05907 mov M,D
05908
153E 60 [5] 05909 mov H,B ; Point to previous block
153F 69 [5] 05910 mov L,C
05911
1540 CD 154A [17] 05912 call Lq1DD9 ; Try to coalesce with block being freed
1543 CA 154A [10] 05913 jz Lq1DD9 ; Combined?
05914
1546 7E [7] 05915 mov A,M ; Point to next block
1547 23 [5] 05916 inx H
1548 66 [7] 05917 mov H,M
1549 6F [5] 05918 mov L,A
05919
154A 05920 Lq1DD9:
154A 23 [5] 05921 inx H ; Set up to try combining with the next block
154B 23 [5] 05922 inx H
154C 23 [5] 05923 inx H
154D 46 [7] 05924 mov B,M ; Load size
154E 2B [5] 05925 dcx H
154F 4E [7] 05926 mov C,M
1550 2B [5] 05927 dcx H ; Load next
1551 56 [7] 05928 mov D,M
1552 2B [5] 05929 dcx H
1553 5E [7] 05930 mov E,M
05931
1554 05932 Lq1E05:
1554 7D [5] 05933 mov A,L ; Are the blocks adjacent?
1555 81 [4] 05934 add C
1556 BB [4] 05935 cmp E
1557 C0 [5/11] 05936 rnz ; No - low bytes do not match
1558 7D [5] 05937 mov A,L
1559 81 [4] 05938 add C
155A 7C [5] 05939 mov A,H
155B 88 [4] 05940 adc B
155C BA [4] 05941 cmp D
155D C0 [5/11] 05942 rnz ; No - high bytes do not match
05943
155E E5 [11] 05944 push H ; Stash address of lower block
05945
155F 2A 00C4 [16] 05946 lhld Vq00C4 ; Is upper block at the top of the heap?
1562 7D [5] 05947 mov A,L
1563 BB [4] 05948 cmp E
1564 C2 156C [10] 05949 jnz Lq1E15 ; No
1567 7C [5] 05950 mov A,H
1568 BA [4] 05951 cmp D
1569 CA 1582 [10] 05952 jz Lq1E30 ; Yes
05953
156C 05954 Lq1E15:
156C E1 [10] 05955 pop H ; Recover address of lower block
05956
156D 1A [7] 05957 ldax D ; Transfer next to lower block
156E 77 [7] 05958 mov M,A
156F 23 [5] 05959 inx H
1570 13 [5] 05960 inx D
1571 1A [7] 05961 ldax D
1572 77 [7] 05962 mov M,A
1573 23 [5] 05963 inx H
1574 13 [5] 05964 inx D
05965
1575 1A [7] 05966 ldax D ; Add the sizes
1576 81 [4] 05967 add C
1577 77 [7] 05968 mov M,A
1578 13 [5] 05969 inx D
1579 23 [5] 05970 inx H
157A 1A [7] 05971 ldax D
157B 88 [4] 05972 adc B
157C 77 [7] 05973 mov M,A
05974
157D 2B [5] 05975 dcx H ; Point back to the base
157E 2B [5] 05976 dcx H
157F 2B [5] 05977 dcx H
05978
1580 AF [4] 05979 xra A ; Clear Z flag
05980
1581 C9 [10] 05981 ret
05982
1582 05983 Lq1E30:
1582 E1 [10] 05984 pop H ; Recover address of lower block
05985
1583 22 00C4 [16] 05986 shld Vq00C4 ; Set new top of heap
05987
1586 06 04 [7] 05988 mvi B,4 ; Zero the header
05989
1588 05990 Lq1E38:
1588 36 00 [10] 05991 mvi M,0
158A 23 [5] 05992 inx H
158B 05 [5] 05993 dcr B
158C C2 1588 [10] 05994 jnz Lq1E38
05995
158F C9 [10] 05996 ret
Re: Comparisons and contrasts
That's interesting... having done those optimisations in 8080 land, would any of the tactics apply in Z80 land?
Can you put your finger on what it is in the Z80 which lets it down in this case relative to the 8080?
Can you put your finger on what it is in the Z80 which lets it down in this case relative to the 8080?
-
kernelthread
- Posts: 166
- Joined: 23 Jun 2021
Re: Comparisons and contrasts
The Z80 is object code compatible with the 8080, so the smallest (in code size) Z80 implementation of any function can't be larger than the smallest implementation for the 8080.
I don't know how the cycle times compare. From BillG's long post above it appears that some instructions may be faster on the 8080 than the same instruction on the Z80.
I don't know how the cycle times compare. From BillG's long post above it appears that some instructions may be faster on the 8080 than the same instruction on the Z80.
Re: Comparisons and contrasts
BigEd wrote:
That's interesting... having done those optimisations in 8080 land, would any of the tactics apply in Z80 land?
Can you put your finger on what it is in the Z80 which lets it down in this case relative to the 8080?
Can you put your finger on what it is in the Z80 which lets it down in this case relative to the 8080?
The author of the original Borland code seemed to be overly fixated on using the Z80 instructions.
The prime example is using the register pair SBC instruction to perform comparisons. Lacking those, I used the atomic 8080 byte operations and they turned out to be faster.
The second is using the index registers. The Z80 version spends an inordinate number of cycles pushing and popping values between register pairs. Because this subroutine uses only low values for index offsets, the 8080 can get by with incrementing and decrementing the base address in a register. Couple that with the fact that the indexed addressing operation itself (fetching an additional byte for the offset, then adding it to the base) is also inefficient.
I did not show an assembly of the 8080 code with a Z80 assembler. A single byte register-to-register move on the Z80 is only four cycles as opposed to five on the 8080, so the difference running the 8080 code on a Z80 is even more striking.
Last edited by BillG on Thu Aug 05, 2021 1:31 pm, edited 1 time in total.
Re: Comparisons and contrasts
kernelthread wrote:
The Z80 is object code compatible with the 8080, so the smallest (in code size) Z80 implementation of any function can't be larger than the smallest implementation for the 8080.
kernelthread wrote:
I don't know how the cycle times compare. From BillG's long post above it appears that some instructions may be faster on the 8080 than the same instruction on the Z80.
The few exceptions include:
Single byte register-to-register move - five cycles on the 8080; four on the Z80
Register pair add - ten cycles on the 8080, eleven on the Z80
Re: Comparisons and contrasts
BillG wrote:
Single byte register-to-register move - five cycles on the 8080; four on the Z80
Register pair add - ten cycles on the 8080, eleven on the Z80
Register pair add - ten cycles on the 8080, eleven on the Z80
The Z80 is faster with
Code: Select all
LD r,r +1
DEC r +1
INC r +1
JP (HL) +2Code: Select all
ADD HL,r16 +1
INC r16 +1
DEC r16 +1
IN a +1
OUT a +1
LD SP,HL +1
EX (SP),HL +1Maybe you don't know but there is Turbo Pascal 4 compatible compiler for the 8080 - https://www.vcfed.org/forum/forum/genre ... s-compiler - I am sure it contains quite good coding.
Re: Comparisons and contrasts
litwr wrote:
The Z80 is faster with
The 8080 is faster with
Code: Select all
LD r,r +1
DEC r +1
INC r +1
JP (HL) +2Code: Select all
ADD HL,r16 +1
INC r16 +1
DEC r16 +1
IN a +1
OUT a +1
LD SP,HL +1
EX (SP),HL +1litwr wrote:
The most important 8080 advantage is the faster 16-bit ADD instruction. However if you try to make the best code, the Z80 is much faster than the 8080 and the Z80 is even slightly faster than the 8085.
litwr wrote:
Maybe you don't know but there is Turbo Pascal 4 compatible compiler for the 8080 - https://www.vcfed.org/forum/forum/genre ... s-compiler - I am sure it contains quite good coding.
https://www.vcfed.org/forum/forum/genre ... post934040
Re: Comparisons and contrasts
BillG wrote:
The Z80 is "much faster" only for some things. Pascal's dispose is not one of them.
Code: Select all
OR A
SBC HL,DL (19 ticks, 3 bytes)
Code: Select all
LD A,L
SUB E
LD A,H
SBC A,D (16 ticks, 4 bytes - 18 ticks on the 8080)
BillG wrote:
I read that thread but chose not to look further because it did not work with CP/M. Then I saw this and had to laugh...
https://www.vcfed.org/forum/forum/genre ... post934040
https://www.vcfed.org/forum/forum/genre ... post934040
Re: Comparisons and contrasts
I think it is worth to add some clarifications. I wrote that the Z80 may be much faster than the 8080 - I only meant a value about up to 30% larger. I agree that most of the Z80 instructions are useless to significantly boost the performance. However some instructions are good for slight speed increasing: SBC HL, ADC HL, additional shifts and rotates, additional loads/stores for 16-bit registers, ... The Z80 instructions also allow us to write more compact code. They also give us some features which imitating are very costly on the 8080.
BTW I published benchmark results for optimized sorting algos for the 6502, Z80, and 6809. So when the 6809 or Z80 have enough registers for an algorithm implementation they are good. The 6809 also demonstrates excellent code density.
BTW I published benchmark results for optimized sorting algos for the 6502, Z80, and 6809. So when the 6809 or Z80 have enough registers for an algorithm implementation they are good. The 6809 also demonstrates excellent code density.
Re: Comparisons and contrasts
This is the subroutine to convert a subscript to an offset into an array in LUCIDATA Pascal. It was disassembled from the 6800 run-time system code.
It is called from the handler of the instruction to convert subscript(s) into an offset into an array. L_073C is called for the leftmost subscript; L_074A is called for subsequent subscripts. On entry, register X points to the subscript.
P-code instructions are four bytes in length. The first byte identifies the instruction.
The 6800 instruction dispatcher copies the second, third and fourth bytes of the instruction to memory locations $DA, $DB and $DC respectively before passing control to the instruction handler. The 6502 dispatcher names these locations InstrMode, InstrParm1 and InstrParm2 respectively.
For the instruction to calculate the offset into an array from subscript(s), the second byte is the number of dimensions, the third byte is the size of an array element and the fourth byte identifies the first entry into the subscript range table for this array.
The word at $F2 points to the first entry of the subscript range table.
A subscript range entry for an array [3..7] would look like this:
0 ; High byte of lowest allowed subscript
3 ; Low byte of lowest allowed subscript
0 ; High byte of number of subscripts in the allowed range
5 ; Low byte of number of subscripts in the allowed range
For the 6502 implementation, all multi-byte entities except for those used indirect indexed addressing are stored in big-endian format.
The subscript(s) have been pushed onto the stack in left to right order; that is, the rightmost subscript is at the top of the stack.
The stack pointer points to the last byte of the top item on the stack.
The 6800 implementation uses the two bytes following the storage for the stack pointer pseudo-register (Reg_SP) for scratch space; the 6502 implementation will do the same. In this case, it points to the byte before one of the subscripts pushed onto the stack instead of passing the address in registers.
Any obvious ways to improve it? This code must remain NMOS compatible.
Code: Select all
01318 *
01319 ; Convert subscript to offset and check that it is within range
01320 *
073C 01321 L_073C
073C 96 DC [3] 01322 ldaa $DC ; Point to first subscript range table entry
073E 48 [2] 01323 asla
073F 48 [2] 01324 asla
0740 D6 F2 [3] 01325 ldab $F2
0742 9B F3 [3] 01326 adda $F3
0744 C9 00 [2] 01327 adcb #$00
0746 D7 C6 [4] 01328 stab $C6
0748 97 C7 [4] 01329 staa $C7
01330
074A 01331 L_074A
074A DF C4 [5] 01332 stx $C4 ; Point to subscript
074C A6 01 [5] 01333 ldaa 1,X
074E E6 02 [5] 01334 ldab 2,X
0750 DE C6 [4] 01335 ldx $C6
0752 E0 01 [5] 01336 subb 1,X ; Subtract beginning of range
0754 A2 00 [5] 01337 sbca ,X
0756 2D 0B (0763) [4] 01338 blt L_0763 ; Below lower bound
01339
0758 36 [4] 01340 psha
0759 37 [4] 01341 pshb
075A E0 03 [5] 01342 subb 3,X ; Does it exceed the upper bound?
075C A2 02 [5] 01343 sbca 2,X
075E 2E 03 (0763) [4] 01344 bgt L_0763 ; Yes !!! BUG!!! this should be bge instead of bgt
01345
0760 33 [4] 01346 pulb
0761 32 [4] 01347 pula
01348
0762 39 [5] 01349 rts
01350
0763 01351 L_0763
0763 CE 0769 [3] 01352 ldx #L_0769
0766 7E 0230 [3] 01353 jmp L_0230
01354
0769 2041525241592042 01355 L_0769 fcc ' ARRAY BOUNDS OR SUB-RANGE ERROR ***'
P-code instructions are four bytes in length. The first byte identifies the instruction.
The 6800 instruction dispatcher copies the second, third and fourth bytes of the instruction to memory locations $DA, $DB and $DC respectively before passing control to the instruction handler. The 6502 dispatcher names these locations InstrMode, InstrParm1 and InstrParm2 respectively.
For the instruction to calculate the offset into an array from subscript(s), the second byte is the number of dimensions, the third byte is the size of an array element and the fourth byte identifies the first entry into the subscript range table for this array.
The word at $F2 points to the first entry of the subscript range table.
A subscript range entry for an array [3..7] would look like this:
0 ; High byte of lowest allowed subscript
3 ; Low byte of lowest allowed subscript
0 ; High byte of number of subscripts in the allowed range
5 ; Low byte of number of subscripts in the allowed range
For the 6502 implementation, all multi-byte entities except for those used indirect indexed addressing are stored in big-endian format.
The subscript(s) have been pushed onto the stack in left to right order; that is, the rightmost subscript is at the top of the stack.
The stack pointer points to the last byte of the top item on the stack.
The 6800 implementation uses the two bytes following the storage for the stack pointer pseudo-register (Reg_SP) for scratch space; the 6502 implementation will do the same. In this case, it points to the byte before one of the subscripts pushed onto the stack instead of passing the address in registers.
Code: Select all
04212 *
04213 ; Convert subscript to offset and check that it is within range
04214 *
369E 04215 L_073C
369E A5 1A [3] 04216 lda InstrParm2 ; Point to first subscript range table entry
36A0 0A [2] 04217 asl A
36A1 0A [2] 04218 asl A
36A2 18 [2] 04219 clc
36A3 65 3E [3] 04220 adc L_00F2
36A5 85 14 [3] 04221 sta Tmp
36A7 A5 3F [3] 04222 lda L_00F2+1
36A9 69 00 [2] 04223 adc #0
36AB 85 15 [3] 04224 sta Tmp+1
04225
36AD 04226 L_074A
36AD A0 02 [2] 04227 ldy #2 ; Get subscript
36AF B1 08 [5/6] 04228 lda (Reg_SP+2),Y
36B1 38 [2] 04229 sec
36B2 88 [2] 04230 dey ; Y is 1
36B3 F1 14 [5/6] 04231 sbc (Tmp),Y ; Subtract beginning of range
36B5 AA [2] 04232 tax
36B6 B1 08 [5/6] 04233 lda (Reg_SP+2),Y
36B8 88 [2] 04234 dey ; Y is 0
36B9 F1 14 [5/6] 04235 sbc (Tmp),Y
04236 ; blt L_0763 ; Below lower bound
04237 ; branches if N, V are different
36BB 70 04 (36C1) [2/3] 04238 bvs L_074AVSet
36BD 30 1A (36D9) [2/3] 04239 bmi L_0763
36BF 10 02 (36C3) [2/3] 04240 bpl L_0758
04241
36C1 04242 L_074AVSet
36C1 10 16 (36D9) [2/3] 04243 bpl L_0763
04244
36C3 04245 L_0758
36C3 48 [3] 04246 pha ; Push high byte
36C4 48 [3] 04247 pha ; Save high byte again
04248
36C5 38 [2] 04249 sec ; Does it exceed the upper bound?
36C6 A0 03 [2] 04250 ldy #3
36C8 8A [2] 04251 txa
36C9 F1 14 [5/6] 04252 sbc (Tmp),Y
36CB 88 [2] 04253 dey ; Y is 2
36CC 68 [4] 04254 pla
36CD F1 14 [5/6] 04255 sbc (Tmp),Y
04256 ; bge L_0763 ; Yes
04257 ; branches if N, V both set or N, V both clear
36CF 50 04 (36D5) [2/3] 04258 bvc L_074AVClear
36D1 30 06 (36D9) [2/3] 04259 bmi L_0763
36D3 10 02 (36D7) [2/3] 04260 bpl L_0760
04261
36D5 04262 L_074AVClear
36D5 10 02 (36D9) [2/3] 04263 bpl L_0763
04264
36D7 04265 L_0760
36D7 68 [4] 04266 pla ; Recover high byte
04267
36D8 60 [6] 04268 rts
04269
36D9 04270 L_0763
36D9 A2 E0 [2] 04271 ldx #<L_0769
36DB A0 36 [2] 04272 ldy #>L_0769
36DD 4C 3068 [3] 04273 jmp Error
04274
36E0 2041525241592042 04275 L_0769 fcc ' ARRAY BOUNDS OR SUB-RANGE ERROR ***'