Comparisons and contrasts
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
BigEd wrote:
Wow, 30% smaller code is very impressive!
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Mike B. (about me) (learning how to github)
Re: Comparisons and contrasts
This article confirms my results which show that the 6803 is faster than the 6809.
So the cheapest Tandy computer is faster than more expensive one!
This sounds almost unrealistic because the 6809 has a very good code density. Indeed it may be true but it is very difficult to prove.
Quote:
Our benchmark tests prove that using an 8-bit Motorola 6803 microprocessor, the MC-10 performs arithmetic computations in Basic 10% faster than the Color Computer without sacrificing accuracy.
BigEd wrote:
Wow, 30% smaller code is very impressive!
Re: Comparisons and contrasts
barrym95838 wrote:
BigEd wrote:
Wow, 30% smaller code is very impressive!
While the S and Y registers are penalized with respect to loading/storing/comparing on the 6809, there is no penalty using them for addressing.
Other than implementing FORTH, there is little need for a second stack, so I do not miss that benefit of the U register. What I do miss is having a third general purpose 16-bit register.
Where the 6801/6803/68HC11/68HC12 gains over the 6809 are:
* Single byte forms of INX/DEX/INY/DEY
* Single byte forms of push and pull of A/B/P
* One or two-byte forms of TAB/TBA/ABA/SBA/CBA instead of sequences involving the stack
As for the 30% number, from the 68HC12 manual https://www.nxp.com/docs/en/data-sheet/HCS12COREUG.pdf
Quote:
The relative size for code for 68HC11 vs. code for HCS12 has also been tested by rewriting several small programs from scratch. In these cases, the HCS12 code is typically about 30% smaller. These savings are mostly due to improved indexed addressing. A C program compiled for the HCS12 is about 30% smaller than the same program compiled for the 68HC11. The saving are largely due to improved indexing.
Re: Comparisons and contrasts
This post gave me an idea out of left field: viewtopic.php?p=90284#p90284
By coding a conditional branch to *-1, something useful can be done if the $FF opcode cooperates.
Unfortunately, only the Z80 qualifies.
by coding
a reset 38h is done if the carry flag is set.
This is a savings of one byte per use and three cycles when the branch is not taken compared to
Relative conditional branches are available for C, NC, Z, NZ.
By coding a conditional branch to *-1, something useful can be done if the $FF opcode cooperates.
Unfortunately, only the Z80 qualifies.
by coding
Code: Select all
jr C,$-1
This is a savings of one byte per use and three cycles when the branch is not taken compared to
Code: Select all
jp C,38h
Re: Comparisons and contrasts
I have added Mandelbrot results for the 8080 and 68008. It seems the 68008 is very slow. IMHO Clive Sinclair should have chosen the 8088 for his QL and later replaced it with the NEC V20. It is sad that the 68516 was not available that time. The results also show that the proper Z80 code is much faster than the 8080 code.
I have checked this emulator more carefully and found out that it emulates the prefetch queue. So it is really a great emulator. Though, IMHO, it is anyway slightly faster (about 5%) for the PC 5150 than real hardware.
a reset 38h is done if the carry flag is set.
Such tricks are also possible for the 6502 branches it gives us the NMOS 6502 undocumented INCSBC abs,X instruction. 
Some processors (the PDP-11, ARM) allow us to use PC as an ordinary register. It is quite a challenge to write something like CMPB PC,R0 and make program slightly faster and smaller.
BigEd wrote:
Sarah Walker's PCem is cycle accurate, apparently.
https://retrocomputing.stackexchange.co ... 6-emulator
https://retrocomputing.stackexchange.co ... 6-emulator
BillG wrote:
Code: Select all
jr C,$-1
Some processors (the PDP-11, ARM) allow us to use PC as an ordinary register. It is quite a challenge to write something like CMPB PC,R0 and make program slightly faster and smaller.
Re: Comparisons and contrasts
litwr wrote:
I have added Mandelbrot results for the 8080 and 68008. It seems the 68008 is very slow. IMHO Clive Sinclair should have chosen the 8088 for his QL and later replaced it with the NEC V20. It is sad that the 68516 was not available that time. The results also show that the proper Z80 code is much faster than the 8080 code.
litwr wrote:
BigEd wrote:
Sarah Walker's PCem is cycle accurate, apparently.
https://retrocomputing.stackexchange.co ... 6-emulator
https://retrocomputing.stackexchange.co ... 6-emulator
Does it offer adjustments such as adding a wait state for video memory access?
litwr wrote:
BillG wrote:
Code: Select all
jr C,$-1
jr C,$-1 is significant because it makes error checking cheaper. Structured Exception Handling (SEH) had promised to solve that problem, but that has not proved out in practice.
Re: Comparisons and contrasts
BillG wrote:
A 5% error would make it a good, not great emulator.
Does it offer adjustments such as adding a wait state for video memory access?
Does it offer adjustments such as adding a wait state for video memory access?
Re: Comparisons and contrasts
Calculating dest = constant - signed byte? Z80:
So, z80 is quite straight forward, 16b. 6809.
14 bytes, again very straight-forward.
I'm far more experienced on the Z80 than the 6502 and I have enough familiarity with the 6809 to happily code on it. The big difference between how I think when I'm coding the Z80 vs 6502 vs 6809 is as follows.
The Z80 is best thought of as a hybrid 8-bit / 16-bit CPU. So, you have quite a lot of tools (instructions) available to do things conveniently and you flip between using the 8-bit tools and the 16-bit tools. The important thing though is to arrange the tools on your workbench (registers) so they don't get in each others' way. And this is done by maintaining a priority of preferences for the 8-bit vs 16-bit tools. For the 8-bit world you need 'a' the most; then all the other regs are a cache of your most useful temps, with 'b' as the lowest priority since that's used for 'djnz' loops.
For the 16-bit world you need 'hl' the most, then 'de' (because you can swap hl and de easily); then bc, which you also need for 16-bit loops.
Because the 16-bit world sits side-by-side with the 8-bit world, you tend to want to use regs in the opposite priority order to the 16-bit regs (so they don't clash). Then, in other priority order: use the stack as a simple LIFO (which will be easier if you factor your code), use alternate regs if you can decompose the problem into 2 to 4 independent sub-problems; finally, use IX and IY if you have to deal with some infrequent, but randomly accessed data structures (since they're really inefficient).
The 6502 is kinda like the other way around. The tools (instruction set) are less convenient, but you have efficient access to a small worktop (A, X, Y) and good, comprehensive access to a larger bench top (zero-page). Also, the 6502 is more like high-altitude programming vs Z80 jungle type programming: clear and austere vs messy, but plentiful. On the 6502 you have a much bigger set of temps; much easier access to random data via indexed and indirect modes; more challenges when it comes to 16-bit operations and again, I would prioritise X and Y for loop access. This means that problems get restructured accordingly.
I find the best way to think about the 6809 is as a nearly 16-bit, high-level language oriented CPU and you're the compiler. So, you have a straight-forward stack model and the stack is your friend. In addition, you have 3 index registers for some combination of data structures, pointers and loop counters. So, it's easy to code on a 6809, but the code is only going to be moderately optimised. Unlike the 6502 and Z80 where effort scales quite well with optimisation, a 6809 requires disproportionate effort for moderate optimisation, so it's far less likely to be worth it.
Anyway - that's just my personal take
Code: Select all
ld hl,(S1) ;don't care about h, contains garbage.
ld a,l
rla
sbc a,a ;=> a-a-0 or a-a-1 ie. 0 or 0xff.
ld h,a ;hl now sign extended.
ld de,258
or a ;clears carry.
sbc hl,de ;hl=258-S1
ld (dest),hl ;saved
Code: Select all
ldd #258
pshs d
ldb s1
sex
sub d,S++
std dest
I'm far more experienced on the Z80 than the 6502 and I have enough familiarity with the 6809 to happily code on it. The big difference between how I think when I'm coding the Z80 vs 6502 vs 6809 is as follows.
The Z80 is best thought of as a hybrid 8-bit / 16-bit CPU. So, you have quite a lot of tools (instructions) available to do things conveniently and you flip between using the 8-bit tools and the 16-bit tools. The important thing though is to arrange the tools on your workbench (registers) so they don't get in each others' way. And this is done by maintaining a priority of preferences for the 8-bit vs 16-bit tools. For the 8-bit world you need 'a' the most; then all the other regs are a cache of your most useful temps, with 'b' as the lowest priority since that's used for 'djnz' loops.
For the 16-bit world you need 'hl' the most, then 'de' (because you can swap hl and de easily); then bc, which you also need for 16-bit loops.
Because the 16-bit world sits side-by-side with the 8-bit world, you tend to want to use regs in the opposite priority order to the 16-bit regs (so they don't clash). Then, in other priority order: use the stack as a simple LIFO (which will be easier if you factor your code), use alternate regs if you can decompose the problem into 2 to 4 independent sub-problems; finally, use IX and IY if you have to deal with some infrequent, but randomly accessed data structures (since they're really inefficient).
The 6502 is kinda like the other way around. The tools (instruction set) are less convenient, but you have efficient access to a small worktop (A, X, Y) and good, comprehensive access to a larger bench top (zero-page). Also, the 6502 is more like high-altitude programming vs Z80 jungle type programming: clear and austere vs messy, but plentiful. On the 6502 you have a much bigger set of temps; much easier access to random data via indexed and indirect modes; more challenges when it comes to 16-bit operations and again, I would prioritise X and Y for loop access. This means that problems get restructured accordingly.
I find the best way to think about the 6809 is as a nearly 16-bit, high-level language oriented CPU and you're the compiler. So, you have a straight-forward stack model and the stack is your friend. In addition, you have 3 index registers for some combination of data structures, pointers and loop counters. So, it's easy to code on a 6809, but the code is only going to be moderately optimised. Unlike the 6502 and Z80 where effort scales quite well with optimisation, a 6809 requires disproportionate effort for moderate optimisation, so it's far less likely to be worth it.
Anyway - that's just my personal take
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
Thank you for sharing, Snial. I have sincere respect for those who can write good code for the (z)8xxx (the HL style) and 6xxx (the ,X ,Y style) by shifting a gear in their brain. My brain doesn't have that gear, and (z)8xxx code inevitably winds up giving me a headache if I study it too long. I would try to re-code your example for the 65xx (definitely using the 6809 code as the source), but I have to get to work. Maybe tonight ...
[Edit: ... or during a break in the action at work.
6502:19 bytes.
If you don't need to save the effective address for later, you can let the 6502 calculate it on-the-fly:8 bytes.
(I think you're calculating S1-const instead of const-S1, but I've been wrong before ...)
]
[Edit: ... or during a break in the action at work.
6502:
Code: Select all
; dest=const-S1
lda #<const
sec
sbc S1
sta dest
lda #>const
sbc #0
sta dest+1
bit S1
bpl label
inc dest+1 ; adjust for S1<0
label:
...
If you don't need to save the effective address for later, you can let the 6502 calculate it on-the-fly:
Code: Select all
lda S1 ; signed 8-bit displacement
eor #$7f
tay
lda const-$7f,y ; access target (const-S1)
(I think you're calculating S1-const instead of const-S1, but I've been wrong before ...)
]
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Mike B. (about me) (learning how to github)
Re: Comparisons and contrasts
Some time ago, I posted implementations of binary to decimal conversion using the Double Dabble algorithm for several microprocessors:
https://talk.dallasmakerspace.org/t/pro ... /18852/323
https://en.wikipedia.org/wiki/Double_dabble
Rudla posted code to convert a binary number to packed BCD:
viewtopic.php?f=2&t=6579
I recognized that his code bore a definite resemblance to Double Dabble, but it was clearly not the same. Studying his implementation allowed me to step back and see the forest from the trees of the details of the Double Dabble algorithm. Improvements from a straight interpretation of Double Dabble include:
* all of that testing of and adding to individual BCD digits is merely a way to perform a "decimal adjustment" when the hardware does not directly support it. Hardware supporting decimal adjustment can combine the shifting and fiddling by adding to itself (doubling) with the decimal arithmetic mode built into the 6502 or a DAA decimal adjust instruction following the addition on other processors.
* shifting only one byte of the number being converted at a time instead of shifting the entire number. The savings become much more significant for larger numbers.
* growing the result as digits are added instead of needlessly shifting an ocean of zeroes in a large pre-cleared buffer.
I improveed on Rudla's implementation by starting with no bytes of result instead of one byte containing zero, saving up to seven shifts of the byte. I also did a minor tweak improving the innermost loop.
Using what I learned, I rewrote my conversion code from scratch, making it faster and simpler in the process.
My design is based on maximum reuse of memory and is intended to produce a printable character string instead of just a packed BCD number. Because the maximum number to be converted is eight bytes, the buffer is placed in the zero-page for maximum efficiency. The number to be converted is copied to the low end of the output buffer with the least-significant byte at the lowest address. The converted BCD digits grow downward from the end of the buffer and do not overlap the number until after its bytes are already consumed by the shifting. Because the largest number to be converted is 64 bits, it was reasonable to place the buffer in the zero page.
While I will not claim that these are the fastest routines to convert 32 and 64 bit integers to ASCII decimal strings, I am confident they are among the best.
Someone posted code to do decimal conversion to this forum including the time to convert $FFFFFFFF, but I cannot find it now to compare.
The 6502 version:
The scorecard:
The addressing modes and the three byte-sized registers worked well for this implementation.
The 6800 version:
The scorecard:
The 6800 did not fare so well with only one index register and relatively slow memory access instructions.
The added capabilities of the 6809 do not appear to be much better.
The 8080 version:
The scorecard:
Two sets of new numbers are reported. One for the initial version and another for after some intense code golf managed to eliminate variables in memory.
I actually wrote the 8080 version first, then moved on to the 6800 and then the 6502. Only then did I invest the effort to optimize this one.
https://talk.dallasmakerspace.org/t/pro ... /18852/323
https://en.wikipedia.org/wiki/Double_dabble
Rudla posted code to convert a binary number to packed BCD:
viewtopic.php?f=2&t=6579
I recognized that his code bore a definite resemblance to Double Dabble, but it was clearly not the same. Studying his implementation allowed me to step back and see the forest from the trees of the details of the Double Dabble algorithm. Improvements from a straight interpretation of Double Dabble include:
* all of that testing of and adding to individual BCD digits is merely a way to perform a "decimal adjustment" when the hardware does not directly support it. Hardware supporting decimal adjustment can combine the shifting and fiddling by adding to itself (doubling) with the decimal arithmetic mode built into the 6502 or a DAA decimal adjust instruction following the addition on other processors.
* shifting only one byte of the number being converted at a time instead of shifting the entire number. The savings become much more significant for larger numbers.
* growing the result as digits are added instead of needlessly shifting an ocean of zeroes in a large pre-cleared buffer.
I improveed on Rudla's implementation by starting with no bytes of result instead of one byte containing zero, saving up to seven shifts of the byte. I also did a minor tweak improving the innermost loop.
Using what I learned, I rewrote my conversion code from scratch, making it faster and simpler in the process.
My design is based on maximum reuse of memory and is intended to produce a printable character string instead of just a packed BCD number. Because the maximum number to be converted is eight bytes, the buffer is placed in the zero-page for maximum efficiency. The number to be converted is copied to the low end of the output buffer with the least-significant byte at the lowest address. The converted BCD digits grow downward from the end of the buffer and do not overlap the number until after its bytes are already consumed by the shifting. Because the largest number to be converted is 64 bits, it was reasonable to place the buffer in the zero page.
While I will not claim that these are the fastest routines to convert 32 and 64 bit integers to ASCII decimal strings, I am confident they are among the best.
Someone posted code to do decimal conversion to this forum including the time to convert $FFFFFFFF, but I cannot find it now to compare.
The 6502 version:
Code: Select all
. 00191 ; Input:
. 00192 ; The absolute value of the number to convert is stored
. 00193 ; starting at OutBuf+2 in little-endian form
. 00194 ; A = number of bytes in the number
. 00195 ; Dabble is the label of last byte of the buffer
. 00196 ; OutBuf+0 = 0 if number is negative, other if negative
. 00197 ;
. 00198 ; Output:
. 00199 ; A:X contains the address of the resultant string
. 00200 ;
. 00201 ; Uses:
. 00202 ; Byt0 = number of bytes to convert
. 00203 ; Byt1 = number of packed BCD bytes in result
. 00204 ; Byt2 = bits of number being shifted
Code: Select all
.206E 00225 Format:
.206E 85 12 [3] 00226 sta Byt0 ; Remember number of bytes to convert
.2070 AA [2] 00227 tax ; Use it as index into the number
. 00228
.2071 A9 00 [2] 00229 lda #0 ; Start with no bytes of BCD digits
.2073 85 13 [3] 00230 sta Byt1
. 00231
.2075 F8 [2] 00232 sed ; Switch to decimal mode
. 00233
.2076 00234 SkipLeadingZero:
.2076 B5 19 [4] 00235 lda OutBuf+1,X ; Get next highest byte of the number
.2078 D0 07 (2081) [2/3] 00236 bne NotLeadingZero
. 00237
.207A CA [2] 00238 dex ; Skip a leading zero
. 00239
.207B D0 F9 (2076) [2/3] 00240 bne SkipLeadingZero ; Check the next one if more
. 00241
.207D F0 2F (20AE) [2/3] 00242 beq BCD2ASCII ; All bytes are zero
. 00243
.207F 00244 ByteLoop:
.207F B5 19 [4] 00245 lda OutBuf+1,X ; Get next highest byte of the number
. 00246
.2081 00247 NotLeadingZero:
.2081 85 14 [3] 00248 sta Byt2 ; Set it for shifting
. 00249
.2083 86 12 [3] 00250 stx Byt0 ; Remember the address of the last one shifted
. 00251
.2085 38 [2] 00252 sec ; Shift in extra bit for eight trips through loop
. 00253
.2086 B0 12 (209A) [2/3] 00254 bcs EnterBitLoop ; Always branches
. 00255
.2088 00256 BitLoop:
.2088 A2 2D [2] 00257 ldx #Dabble ; Point to the BCD digits
. 00258
.208A A4 13 [3] 00259 ldy Byt1 ; Load count of number of digit bytes
.208C F0 0A (2098) [2/3] 00260 beq SkipDigitLoop ; Skip loop if none
. 00261
.208E 00262 DigitLoop:
.208E B5 00 [4] 00263 lda 0,X ; Double pair of BCD digits with carry
.2090 75 00 [4] 00264 adc 0,X
.2092 95 00 [4] 00265 sta 0,X
. 00266
.2094 CA [2] 00267 dex ; Point to next pair of BCD digits
. 00268
.2095 00269 IntoDigitLoop:
.2095 88 [2] 00270 dey ; More BCD digits?
.2096 D0 F6 (208E) [2/3] 00271 bne DigitLoop ; Yes
. 00272
.2098 00273 SkipDigitLoop:
.2098 B0 0B (20A5) [2/3] 00274 bcs Extend ; Carry out of accumulated digits?
. 00275
.209A 00276 EnterBitLoop:
.209A 26 14 [5] 00277 rol Byt2 ; Transfer upper bit to carry flag
.209C D0 EA (2088) [2/3] 00278 bne BitLoop ; Repeat if more
. 00279
.209E A6 12 [3] 00280 ldx Byt0 ; Point back to the number to convert
. 00281
.20A0 CA [2] 00282 dex ; Any more bytes of the number to convert?
.20A1 D0 DC (207F) [2/3] 00283 bne ByteLoop ; Repeat if more
. 00284
.20A3 F0 09 (20AE) [2/3] 00285 beq BCD2ASCII ; Always branches to unpacking stage
. 00286
.20A5 00287 Extend:
.20A5 E6 13 [5] 00288 inc Byt1 ; Add a new byte to the result
.20A7 A9 01 [2] 00289 lda #1
.20A9 95 00 [4] 00290 sta 0,X
. 00291
.20AB 18 [2] 00292 clc
.20AC 90 EC (209A) [2/3] 00293 bcc EnterBitLoop ; Always branches
. 00294
.20AE 00295 BCD2ASCII:
.20AE D8 [2] 00296 cld ; Return to binary mode
. 00297
.20AF A5 13 [3] 00298 lda Byt1 ; Any converted digits?
.20B1 F0 42 (20F5) [2/3] 00299 beq NoDigits ; No
. 00300
.20B3 38 [2] 00301 sec
.20B4 A9 2E [2] 00302 lda #Dabble+1 ; Point to most significant BCD pair
.20B6 E5 13 [3] 00303 sbc Byt1
.20B8 A8 [2] 00304 tay
. 00305
.20B9 A2 1A [2] 00306 ldx #<OutBuf+2 ; Point to result buffer
. 00307
.20BB B9 0000 [4/5] 00308 lda 0,Y ; Get most significant BCD pair
. 00309
.20BE 29 F0 [2] 00310 and #$F0 ; Is the upper one zero?
.20C0 F0 0E (20D0) [2/3] 00311 beq UpperZero
.20C2 D0 03 (20C7) [2/3] 00312 bne ConvertBoth
. 00313
.20C4 00314 ASCIILoop:
.20C4 B9 0000 [4/5] 00315 lda 0,Y ; Get next most significant BCD pair
. 00316
.20C7 00317 ConvertBoth:
.20C7 4A [2] 00318 lsr A ; Convert upper one to ASCII
.20C8 4A [2] 00319 lsr A
.20C9 4A [2] 00320 lsr A
.20CA 4A [2] 00321 lsr A
.20CB 09 30 [2] 00322 ora #'0'
.20CD 95 00 [4] 00323 sta 0,X
.20CF E8 [2] 00324 inx
. 00325
.20D0 00326 UpperZero:
.20D0 B9 0000 [4/5] 00327 lda 0,Y
.20D3 C8 [2] 00328 iny
.20D4 29 0F [2] 00329 and #$F ; And then the lower one
.20D6 09 30 [2] 00330 ora #'0'
.20D8 95 00 [4] 00331 sta 0,X
.20DA E8 [2] 00332 inx
. 00333
.20DB C6 13 [5] 00334 dec Byt1 ; More digits?
.20DD D0 E5 (20C4) [2/3] 00335 bne ASCIILoop ; Yes
. 00336
.20DF 8A [2] 00337 txa ; Store string length
.20E0 38 [2] 00338 sec
.20E1 E9 1A [2] 00339 sbc #OutBuf+2
. 00340
.20E3 A6 18 [3] 00341 ldx OutBuf ; Is the number negative?
.20E5 F0 14 (20FB) [2/3] 00342 beq NumberNotNegative
. 00343
.20E7 18 [2] 00344 clc ; Increment and store length
.20E8 69 01 [2] 00345 adc #1
.20EA 85 18 [3] 00346 sta OutBuf
. 00347
.20EC A9 2D [2] 00348 lda #'-' ; Prepend minus sign
.20EE 85 19 [3] 00349 sta OutBuf+1
. 00350
.20F0 A9 00 [2] 00351 lda #>OutBuf
.20F2 A2 18 [2] 00352 ldx #<OutBuf
. 00353
.20F4 60 [6] 00354 rts
. 00355
.20F5 00356 NoDigits:
.20F5 A9 30 [2] 00357 lda #'0' ; Emit single '0'
.20F7 85 1A [3] 00358 sta OutBuf+2
. 00359
.20F9 A9 01 [2] 00360 lda #1 ; String length one
. 00361
.20FB 00362 NumberNotNegative:
.20FB 85 19 [3] 00363 sta OutBuf+1
. 00364
.20FD A9 00 [2] 00365 lda #>OutBuf+1
.20FF A2 19 [2] 00366 ldx #<OutBuf+1
. 00367
.2101 60 [6] 00368 rts
Code: Select all
00001 ; Converting $FFFFFFFF
00002 ; old $2BC9 11209 cycles for call FormatLongword
00003 ; new $0A72 2674 cycles
00004 ;
00005 ; Converting $00000001
00006 ; old $2A96 10902 cycles for call FormatLongword
00007 ; new $015B 347 cycles
00008 ;
00009 ; Converting $00000000
00010 ; old $2A98 10904 cycles for call FormatLongword
00011 ; new $0075 117 cycles
The 6800 version:
Code: Select all
. 00486 * Input:
. 00487 * The absolute value of the number to convert is stored
. 00488 * starting at OutBuf+2 in little-endian form
. 00489 * A = number of bytes in the number
. 00490 * Dabble is the label of last byte of the buffer
. 00491 * OutBuf+0 = 0 if number is negative, other if negative
. 00492 *
. 00493 * Output:
. 00494 * X contains the address of the resultant string
. 00495 *
. 00496 * Uses:
. 00497 * Byt0 = number of bytes to convert
. 00498 * Byt1 = number of packed BCD bytes in result
. 00499 * Byt2 = bits of number being shifted
. 00500 * Ptr0
. 00501 * Ptr1
Code: Select all
.01F5 00522 Format
.01F5 97 10 [4] 00523 staa Byt0 ; Remember number of bytes to convert
. 00524
.01F7 8B 17 [2] 00525 adda #<OutBuf+1 ; Point to most significant byte of number
.01F9 97 0B [4] 00526 staa Ptr0+1
.01FB 86 00 [2] 00527 ldaa #>OutBuf+1
.01FD 89 00 [2] 00528 adca #0
.01FF 97 0A [4] 00529 staa Ptr0
.0201 DE 0A [4] 00530 ldx Ptr0
. 00531
.0203 7F 0011 [6] 00532 clr Byt1 ; Start with no bytes of BCD digits
. 00533
.0206 00534 SkipLeadingZero
.0206 E6 00 [5] 00535 ldab ,X ; Get next highest byte of the number
.0208 26 0D (0217) [4] 00536 bne NotLeadingZero
. 00537
.020A 09 [4] 00538 dex ; Skip a leading zero
. 00539
.020B 7A 0010 [6] 00540 dec Byt0
.020E 26 F6 (0206) [4] 00541 bne SkipLeadingZero ; Check the next one if more
. 00542
.0210 20 32 (0244) [4] 00543 bra BCD2ASCII ; All bytes are zero
. 00544
.0212 00545 ByteLoop
.0212 DE 0A [4] 00546 ldx Ptr0 ; Point back to the number to convert
.0214 09 [4] 00547 dex ; Point to next highest byte
. 00548
.0215 E6 00 [5] 00549 ldab ,X ; Get next highest byte of the number
. 00550
.0217 00551 NotLeadingZero
.0217 D7 12 [4] 00552 stab Byt2 ; Set it for shifting
. 00553
.0219 DF 0A [5] 00554 stx Ptr0 ; Remember the address of the last one shifted
. 00555
.021B 0D [2] 00556 sec ; Shift in extra bit for eight trips through loop
. 00557
.021C 20 1C (023A) [4] 00558 bra EnterBitLoop
. 00559
.021E 00560 BitLoop
.021E CE 002B [3] 00561 ldx #Dabble ; Point to the BCD digits
. 00562
.0221 D6 11 [3] 00563 ldab Byt1 ; Load count of number of digit bytes
.0223 27 0B (0230) [4] 00564 beq SkipDigitLoop ; Skip loop if none
. 00565
.0225 00566 DigitLoop
.0225 A6 00 [5] 00567 ldaa ,X ; Double pair of BCD digits with carry
.0227 A9 00 [5] 00568 adca ,X
.0229 19 [2] 00569 daa
.022A A7 00 [6] 00570 staa ,X
. 00571
.022C 09 [4] 00572 dex ; Point to next pair of BCD digits
. 00573
.022D 00574 IntoDigitLoop
.022D 5A [2] 00575 decb ; More BCD digits?
.022E 26 F5 (0225) [4] 00576 bne DigitLoop ; Yes
. 00577
.0230 00578 SkipDigitLoop
.0230 24 08 (023A) [4] 00579 bcc EnterBitLoop ; Carry out of accumulated digits?
. 00580
.0232 7C 0011 [6] 00581 inc Byt1 ; Add a new byte to the result
.0235 86 01 [2] 00582 ldaa #1
.0237 A7 00 [6] 00583 staa ,X
. 00584
.0239 0C [2] 00585 clc
. 00586
.023A 00587 EnterBitLoop
.023A 79 0012 [6] 00588 rol Byt2 ; Transfer upper bit to carry flag
.023D 26 DF (021E) [4] 00589 bne BitLoop ; Repeat if more
. 00590
.023F 7A 0010 [6] 00591 dec Byt0 ; Any more bytes of the number to convert?
.0242 26 CE (0212) [4] 00592 bne ByteLoop ; Repeat if more
. 00593
.0244 00594 BCD2ASCII
.0244 96 11 [3] 00595 ldaa Byt1 ; Any converted digits?
.0246 27 57 (029F) [4] 00596 beq NoDigits ; No
. 00597
.0248 86 2C [2] 00598 ldaa #<Dabble+1 ; Point to most significant BCD pair
.024A 90 11 [3] 00599 suba Byt1
.024C 97 0B [4] 00600 staa Ptr0+1
.024E 86 00 [2] 00601 ldaa #>Dabble+1
.0250 82 00 [2] 00602 sbca #0
.0252 97 0A [4] 00603 staa Ptr0
. 00604
.0254 CE 0018 [3] 00605 ldx #OutBuf+2 ; Point to result buffer
.0257 DF 0C [5] 00606 stx Ptr1
. 00607
.0259 DE 0A [4] 00608 ldx Ptr0 ; Get most significant BCD pair
.025B A6 00 [5] 00609 ldaa ,X
.025D 08 [4] 00610 inx
.025E DF 0A [5] 00611 stx Ptr0
.0260 DE 0C [4] 00612 ldx Ptr1
. 00613
.0262 16 [2] 00614 tab ; Clone it
. 00615
.0263 85 F0 [2] 00616 bita #$F0 ; Is the upper one zero?
.0265 27 17 (027E) [4] 00617 beq ConvertLowerOnly
.0267 20 0C (0275) [4] 00618 bra ConvertBoth
. 00619
.0269 00620 ASCIILoop
.0269 DF 0C [5] 00621 stx Ptr1
.026B DE 0A [4] 00622 ldx Ptr0 ; Get next most significant BCD pair
.026D A6 00 [5] 00623 ldaa ,X
.026F 08 [4] 00624 inx
.0270 DF 0A [5] 00625 stx Ptr0
.0272 DE 0C [4] 00626 ldx Ptr1
. 00627
.0274 16 [2] 00628 tab ; Clone it
. 00629
.0275 00630 ConvertBoth
.0275 44 [2] 00631 lsra ; Convert upper one to ASCII
.0276 44 [2] 00632 lsra
.0277 44 [2] 00633 lsra
.0278 44 [2] 00634 lsra
.0279 8A 30 [2] 00635 oraa #'0'
.027B A7 00 [6] 00636 staa ,X
.027D 08 [4] 00637 inx
. 00638
.027E 00639 ConvertLowerOnly
.027E C4 0F [2] 00640 andb #$F ; And then the lower one
.0280 CA 30 [2] 00641 orab #'0'
.0282 E7 00 [6] 00642 stab ,X
.0284 08 [4] 00643 inx
. 00644
.0285 7A 0011 [6] 00645 dec Byt1 ; More BCD pairs?
.0288 26 DF (0269) [4] 00646 bne ASCIILoop
. 00647
.028A DF 0C [5] 00648 stx Ptr1 ; Determine string length
.028C 96 0D [3] 00649 ldaa Ptr1+1
.028E 80 18 [2] 00650 suba #<OutBuf+2
. 00651
.0290 D6 16 [3] 00652 ldab OutBuf ; Is the number negative?
.0292 27 11 (02A5) [4] 00653 beq NumberNotNegative
. 00654
.0294 4C [2] 00655 inca ; Increment and store length byte
.0295 97 16 [4] 00656 staa OutBuf
. 00657
.0297 C6 2D [2] 00658 ldab #'-' ; Prepend minus sign
.0299 D7 17 [4] 00659 stab OutBuf+1
. 00660
.029B CE 0016 [3] 00661 ldx #OutBuf ; Point to the result
. 00662
.029E 39 [5] 00663 rts
. 00664
.029F 00665 NoDigits
.029F 86 30 [2] 00666 ldaa #'0' ; Emit single '0'
.02A1 97 18 [4] 00667 staa OutBuf+2
. 00668
.02A3 86 01 [2] 00669 ldaa #1 ; String length one
. 00670
.02A5 00671 NumberNotNegative
.02A5 97 17 [4] 00672 staa OutBuf+1
. 00673
.02A7 CE 0017 [3] 00674 ldx #OutBuf+1
. 00675
.02AA 39 [5] 00676 rts
Code: Select all
00001 * Converting $FFFFFFFF
00002 * old $38F0 14576 cycles for call FormatLongword
00003 * new $0FED 4077 cycles
00004 *
00005 * Converting $00000001
00006 * old $37B3 14259 cycles for call FormatLongword
00007 * new $01F2 498 cycles
00008 *
00009 * Converting $00000000
00010 * old $37B0 14256 cycles for call FormatLongword
00011 * new $00B9 185 cycles
The added capabilities of the 6809 do not appear to be much better.
The 8080 version:
Code: Select all
. 00127 ; Input:
. 00128 ; The absolute value of the number to convert is stored
. 00129 ; starting at OutBuf+2 in little-endian form
. 00130 ; A = number of bytes in the number
. 00131 ; B = 0 if number is positive, other if negative
. 00132 ; Dabble is the label of last byte of the buffer
. 00133 ;
. 00134 ; Output:
. 00135 ; HL contains the address of the resultant string
Code: Select all
.0173 00156 Format:
.0173 4F [5] 00157 mov C,A ; Remember number of bytes to convert
.0174 5F [5] 00158 mov E,A
. 00159
.0175 78 [5] 00160 mov A,B ; Remember negativity
.0176 32 0333 [13] 00161 sta OutBuf
. 00162
.0179 AF [4] 00163 xra A ; Start with no bytes of BCD digits
.017A 47 [5] 00164 mov B,A
. 00165
.017B 21 0334 [10] 00166 lxi H,OutBuf+1 ; Point DE to most significant byte of number
.017E 57 [5] 00167 mov D,A
.017F 19 [10] 00168 dad D
.0180 EB [4] 00169 xchg
. 00170
.0181 00171 SkipLeadingZero:
.0181 1A [7] 00172 ldax D ; Get next highest byte of the number
.0182 B7 [4] 00173 ora A
.0183 C2 018F [10] 00174 jnz NotLeadingZero
. 00175
.0186 1B [5] 00176 dcx D ; Skip a leading zero
. 00177
.0187 0D [5] 00178 dcr C
.0188 C2 0181 [10] 00179 jnz SkipLeadingZero ; Check the next one if more
. 00180
.018B C3 01BC [10] 00181 jmp BIN2ASCII ; All bytes are zero
. 00182
.018E 00183 ByteLoop:
.018E 1A [7] 00184 ldax D ; Get next highest byte of the number
. 00185
.018F 00186 NotLeadingZero:
.018F 1B [5] 00187 dcx D ; Point to next lower byte
. 00188
.0190 D5 [11] 00189 push D ; Stash pointer to next byte
. 00190
.0191 5F [5] 00191 mov E,A ; Set it for shifting
. 00192
.0192 37 [4] 00193 stc ; Shift in extra bit for eight times thru loop
. 00194
.0193 C3 01AF [10] 00195 jmp EnterBitLoop
. 00196
.0196 00197 BitLoop:
.0196 21 0348 [10] 00198 lxi H,Dabble ; Point to the BCD digits
. 00199
.0199 04 [5] 00200 inr B ; Check count of number of digit bytes
.019A 05 [5] 00201 dcr B ; Test for zero but preserve carry flag
.019B CA 01A8 [10] 00202 jz SkipDigitLoop ; Skip loop if none
. 00203
.019E 50 [5] 00204 mov D,B
. 00205
.019F 00206 DigitLoop:
.019F 7E [7] 00207 mov A,M ; Double pair of BCD digits with carry
.01A0 8E [7] 00208 adc M
.01A1 27 [4] 00209 daa
.01A2 77 [7] 00210 mov M,A
. 00211
.01A3 2B [5] 00212 dcx H ; Point to next pair of BCD digits
. 00213
.01A4 00214 IntoDigitLoop:
.01A4 15 [5] 00215 dcr D ; More BCD digits?
.01A5 C2 019F [10] 00216 jnz DigitLoop ; Yes
. 00217
.01A8 00218 SkipDigitLoop:
.01A8 D2 01AF [10] 00219 jnc EnterBitLoop ; Carry out of accumulated digits?
. 00220
.01AB 04 [5] 00221 inr B ; Add new byte to the result
.01AC 36 01 [10] 00222 mvi M,1
. 00223
.01AE B7 [4] 00224 ora A ; Clear carry flag
. 00225
.01AF 00226 EnterBitLoop:
.01AF 7B [5] 00227 mov A,E
.01B0 17 [4] 00228 ral ; Transfer upper bit to carry flag
.01B1 5F [5] 00229 mov E,A
. 00230
.01B2 1C [5] 00231 inr E
.01B3 1D [5] 00232 dcr E
.01B4 C2 0196 [10] 00233 jnz BitLoop ; Repeat if more
. 00234
.01B7 D1 [10] 00235 pop D ; Recover pointer to next byte
. 00236
.01B8 0D [5] 00237 dcr C ; Any more bytes of the number to convert?
.01B9 C2 018E [10] 00238 jnz ByteLoop ; Repeat if more
. 00239
.01BC 00240 BIN2ASCII:
.01BC 21 0335 [10] 00241 lxi H,OutBuf+2 ; Point to result buffer
. 00242
.01BF 78 [5] 00243 mov A,B ; Any converted digits?
.01C0 B7 [4] 00244 ora A
.01C1 CA 01F4 [10] 00245 jz NoDigits ; No
. 00246
.01C4 11 0349 [10] 00247 lxi D,Dabble+1 ; Point to most significant BCD pair
.01C7 7B [5] 00248 mov A,E
.01C8 90 [4] 00249 sub B
.01C9 5F [5] 00250 mov E,A
.01CA 7A [5] 00251 mov A,D
.01CB DE 00 [7] 00252 sbi 0
.01CD 57 [5] 00253 mov D,A
. 00254
.01CE 1A [7] 00255 ldax D ; Get most significant BCD pair
. 00256
.01CF E6 F0 [7] 00257 ani 0F0h ; Is the upper one zero?
.01D1 CA 01DF [10] 00258 jz ConvertLowerOnly
. 00259
.01D4 00260 ASCIILoop:
.01D4 1A [7] 00261 ldax D ; Get next most significant BCD pair
.01D5 0F [4] 00262 rrc ; Convert upper one to ASCII
.01D6 0F [4] 00263 rrc
.01D7 0F [4] 00264 rrc
.01D8 0F [4] 00265 rrc
.01D9 E6 0F [7] 00266 ani 0Fh
.01DB F6 30 [7] 00267 ori '0'
.01DD 77 [7] 00268 mov M,A
.01DE 23 [5] 00269 inx H
. 00270
.01DF 00271 ConvertLowerOnly:
.01DF 1A [7] 00272 ldax D ; And then the lower one
.01E0 13 [5] 00273 inx D
.01E1 E6 0F [7] 00274 ani 0Fh
.01E3 F6 30 [7] 00275 ori '0'
.01E5 77 [7] 00276 mov M,A
.01E6 23 [5] 00277 inx H
. 00278
.01E7 05 [5] 00279 dcr B
.01E8 C2 01D4 [10] 00280 jnz ASCIILoop
. 00281
.01EB 7D [5] 00282 mov A,L ; Store string length
.01EC DE 35 [7] 00283 sbi OutBuf+2 and 0FFh
.01EE 32 0334 [13] 00284 sta OutBuf+1
. 00285
.01F1 C3 01F9 [10] 00286 jmp FinishFormat
. 00287
.01F4 00288 NoDigits:
.01F4 36 30 [10] 00289 mvi M,'0' ; Emit single '0'
. 00290
.01F6 2B [5] 00291 dcx H ; String length one
.01F7 36 01 [10] 00292 mvi M,1
. 00293
.01F9 00294 FinishFormat:
.01F9 21 0334 [10] 00295 lxi H,OutBuf+1 ; Point to the result
. 00296
.01FC 3A 0333 [13] 00297 lda OutBuf ; Was number negative?
.01FF B7 [4] 00298 ora A
.0200 CA 0209 [10] 00299 jz NumberNotNegative ; No
. 00300
.0203 7E [7] 00301 mov A,M ; Get and increment length byte
.0204 3C [5] 00302 inr A
. 00303
.0205 36 2D [10] 00304 mvi M,'-' ; Prepend minus sign
. 00305
.0207 2B [5] 00306 dcx H ; Point to the new result
. 00307
.0208 77 [7] 00308 mov M,A ; Store new length
. 00309
.0209 00310 NumberNotNegative:
.0209 C9 [10] 00311 ret
Code: Select all
00001 ; Converting $FFFFFFFF
00002 ; old $7F53 32595 cycles for call FormatLongword
00003 ; new $2359 9049 cycles
00004 ; new $1F18 7960 cycles
00005 ;
00006 ; Converting $00000001
00007 ; old $7CEE 31982 cycles for call FormatLongword
00008 ; new $055A 1370 cycles
00009 ; new $04C5 1221 cycles
00010 ;
00011 ; Converting $00000000
00012 ; old $7D05 32005 cycles for call FormatLongword
00013 ; new $01E0 480 cycles
00014 ; new $01A2 418 cycles
I actually wrote the 8080 version first, then moved on to the 6800 and then the 6502. Only then did I invest the effort to optimize this one.
Re: Comparisons and contrasts
I see you continuing a discussion started near the bottom of the first page of this thread.
The use of sbc HL,DE provides only a slight speed advantage over a bunch of atomic 8080 instructions doing the same thing but it does save on code size.
Wait a minute. That subtracts 258 from s1...
I think you meant
The use of the stack on the 6809 is not very fast. A temporary variable is often faster.
This brings up one of my favorite programming jokes: The Motorola 6809, where SEX is sometimes followed by STD.
That is the hidden gem of the 8080 and Z80 world. I wish they also provided HL <-> BC
Snial wrote:
Calculating dest = constant - signed byte? Z80:
So, z80 is quite straight forward, 16b.
Code: Select all
ld hl,(S1) ;don't care about h, contains garbage.
ld a,l
rla
sbc a,a ;=> a-a-0 or a-a-1 ie. 0 or 0xff.
ld h,a ;hl now sign extended.
ld de,258
or a ;clears carry.
sbc hl,de ;hl=258-S1
ld (dest),hl ;saved
Snial wrote:
6809.
14 bytes, again very straight-forward.
Code: Select all
ldd #258
pshs d
ldb s1
sex
sub d,S++
std dest
I think you meant
Code: Select all
ldb s1
sex
pshs d
ldd #258
sub d,S++
std dest
This brings up one of my favorite programming jokes: The Motorola 6809, where SEX is sometimes followed by STD.
Snial wrote:
(because you can swap hl and de easily)
Re: Comparisons and contrasts
> Two sets of new numbers are reported. One for the initial version and another for after some intense code golf managed to eliminate variables in memory.
Great result and thanks for the detailed explanation too.
Great result and thanks for the detailed explanation too.
Re: Comparisons and contrasts
I came across some benchmarks to compare Millfork with other compilers:
https://github.com/KarolS/millfork-benc ... aster/6502
The linked list one caught my attention. It creates a list of 3000 nodes then traverses it to add up the values.
My implementions are somewhat like the one provided in C except that I do not use his timing mechanism and the list creation loop counts down instead of up. My intent is not to compare with the compilers but between several different processors.
https://github.com/KarolS/millfork-benc ... nkedlist.c
I am slowly relenting to the prospect that a loop ending with an unconditional branch back to the top often performs better by transforming it to enter in the middle. The result looks unstructured, but it is not very different.
The first one is for the 6800. I have found that it is easiest to adopt 6800 code to other processors instead of the other way around.
The dual accumulators and 16-bit index register with displacement is especially well suited for this challenge.
Though inlining the subroutines provided a substantial improvement, I am showing the code before that final transformation for clarity.
The 6502 had trouble keeping up with the others. Am I missing something?
The 8080 was difficult but benefited from abundant registers. Remember that a memory cycle of the 8080 is three machine cycles; it is one on the others, so 8080 machines usually ran at a two to three times faster clock rate.
The 6809 ran away with this challenge. The additional registers and flexible addressing modes gave it a major advantage.
https://github.com/KarolS/millfork-benc ... aster/6502
The linked list one caught my attention. It creates a list of 3000 nodes then traverses it to add up the values.
My implementions are somewhat like the one provided in C except that I do not use his timing mechanism and the list creation loop counts down instead of up. My intent is not to compare with the compilers but between several different processors.
https://github.com/KarolS/millfork-benc ... nkedlist.c
I am slowly relenting to the prospect that a loop ending with an unconditional branch back to the top often performs better by transforming it to enter in the middle. The result looks unstructured, but it is not very different.
The first one is for the 6800. I have found that it is easiest to adopt 6800 code to other processors instead of the other way around.
The dual accumulators and 16-bit index register with displacement is especially well suited for this challenge.
Though inlining the subroutines provided a substantial improvement, I am showing the code before that final transformation for clarity.
Code: Select all
00001 * 378834 ($5C7D2) cycles
00002 * 300821 ($49715) cycles after inlining subroutines
00003
0000 005F 00004 Free fdb Heap
0002 0000 00005 Root fdb 0
0004 0000 00006 New fdb 0
0006 0000 0000 00007 Total fdb 0,0
000A 0BB7 00008 I fdb 2999
00009
00010 *
00011 * Allocate a node
00012 *
000C 00013 Alloc
000C D6 01 [3] 00014 ldab Free+1 ; Allocate a node
000E D7 05 [4] 00015 stab New+1 ; Return it in New
00016
0010 CB 04 [2] 00017 addb #4 ; Point to free memory
0012 D7 01 [4] 00018 stab Free+1
0014 D6 00 [3] 00019 ldab Free
0016 D7 04 [4] 00020 stab New
0018 C9 00 [2] 00021 adcb #0
001A D7 00 [4] 00022 stab Free
00023
001C 39 [5] 00024 rts
00025
00026 *
00027 * Prepend a node to the list
00028 *
001D 00029 Prepend
001D 8D ED (000C) [8] 00030 bsr Alloc ; Allocate a new node
00031
001F DE 04 [4] 00032 ldx New ; Point to the new node
00033
0021 D6 02 [3] 00034 ldab Root ; Point next of new node to Root
0023 E7 00 [6] 00035 stab ,X
0025 D6 03 [3] 00036 ldab Root+1
0027 E7 01 [6] 00037 stab 1,X
00038
0029 A7 03 [6] 00039 staa 3,X ; Store value in node
002B D6 0A [3] 00040 ldab I
002D E7 02 [6] 00041 stab 2,X
00042
002F DF 02 [5] 00043 stx Root ; This is now the first node
00044
0031 39 [5] 00045 rts
00046
00047 *
00048 * Traverse the list and sum the values
00049 *
0032 00050 Sum
0032 DE 02 [4] 00051 ldx Root
0034 4F [2] 00052 clra
0035 5F [2] 00053 clrb
00054
0036 00055 SumLoop
0036 EB 03 [5] 00056 addb 3,X ; Add value of node to sum
0038 A9 02 [5] 00057 adca 2,X
003A 24 08 (0044) [4] 00058 bcc NoCarry
00059
003C 7C 0007 [6] 00060 inc Total+1
00061
003F 26 03 (0044) [4] 00062 bne NoCarry
00063
0041 7C 0006 [6] 00064 inc Total
00065
0044 00066 NoCarry
0044 EE 00 [6] 00067 ldx ,X ; Point to the next node
0046 26 EE (0036) [4] 00068 bne SumLoop ; Until end of list
00069
0048 97 08 [4] 00070 staa Total+2
004A D7 09 [4] 00071 stab Total+3
00072
004C 39 [5] 00073 rts
00074
00075 *
00076 *
00077 *
004D 00078 Main
004D 96 0B [3] 00079 ldaa I+1 ; Get low byte of node value
00080
004F 20 01 (0052) [4] 00081 bra IntoLoop
00082
0051 00083 Loop
0051 4A [2] 00084 deca
00085
0052 00086 IntoLoop
0052 8D C9 (001D) [8] 00087 bsr Prepend
00088
0054 4D [2] 00089 tsta ; Low byte of count zero?
0055 26 FA (0051) [4] 00090 bne Loop ; No, no borrow
00091
0057 7A 000A [6] 00092 dec I ; Borrow from upper byte
005A 2A F5 (0051) [4] 00093 bpl Loop ; Done if it goes negative
00094
005C 8D D4 (0032) [8] 00095 bsr Sum ; Traverse the list and add them up
00096
005E 3F [12] 00097 swi
00098
005F 00099 Heap
00100
004D 00101 end Main
The 6502 had trouble keeping up with the others. Am I missing something?
Code: Select all
00001 ; 519567 ($7ED8F) cycles
00002 ; 447558 ($6D446) after inlining subroutines
00003
0000 026F 00004 Free fdb Heap
0002 0000 00005 Root fdb 0
0004 0000 00006 New fdb 0
0006 0000 0000 00007 Total fdb 0,0
000A 0BB7 00008 I fdb 2999
00009
0200 00010 org $200
00011
00012 ;
00013 ; Allocate a node
00014 ;
0200 00015 Alloc:
0200 A5 00 [3] 00016 lda Free ; Allocate a node
0202 85 04 [3] 00017 sta New ; Return it in New
00018
0204 18 [2] 00019 clc ; Point to free memory
0205 69 04 [2] 00020 adc #4
0207 85 00 [3] 00021 sta Free
0209 A5 01 [3] 00022 lda Free+1
020B 85 05 [3] 00023 sta New+1
020D 69 00 [2] 00024 adc #0
020F 85 01 [3] 00025 sta Free+1
00026
0211 60 [6] 00027 rts
00028
00029 ;
00030 ; Prepend a node to the list
00031 ;
0212 00032 Prepend:
0212 20 0200 [6] 00033 jsr Alloc ; Allocate a new node
00034
0215 A0 00 [2] 00035 ldy #0 ; Point to the new node
00036
0217 A5 02 [3] 00037 lda Root ; Point next of new node to Root
0219 91 04 [6] 00038 sta (New),Y
021B C8 [2] 00039 iny
021C A5 03 [3] 00040 lda Root+1
021E 91 04 [6] 00041 sta (New),Y
0220 C8 [2] 00042 iny
00043
0221 A5 0A [3] 00044 lda I ; Store value in node
0223 91 04 [6] 00045 sta (New),Y
0225 C8 [2] 00046 iny
0226 A5 0B [3] 00047 lda I+1
0228 91 04 [6] 00048 sta (New),Y
00049
022A A5 04 [3] 00050 lda New ; This is now the first node
022C 85 02 [3] 00051 sta Root
022E A5 05 [3] 00052 lda New+1
0230 85 03 [3] 00053 sta Root+1
00054
0232 60 [6] 00055 rts
00056
00057 ;
00058 ; Traverse the list and sum the values
00059 ;
0233 00060 NoCarry:
0233 A0 00 [2] 00061 ldy #0 ; Point to the next node
0235 B1 02 [5/6] 00062 lda (Root),Y
0237 AA [2] 00063 tax
0238 C8 [2] 00064 iny
0239 B1 02 [5/6] 00065 lda (Root),Y
023B 85 03 [3] 00066 sta Root+1
023D 86 02 [3] 00067 stx Root
00068
023F 05 02 [3] 00069 ora Root
0241 F0 1A (025D) [2/3] 00070 beq Summed ; Until end of list
00071
0243 00072 Sum:
0243 18 [2] 00073 clc ; Add value of node to sum
0244 A0 02 [2] 00074 ldy #2
0246 A5 06 [3] 00075 lda Total
0248 71 02 [5/6] 00076 adc (Root),Y
024A 85 06 [3] 00077 sta Total
024C C8 [2] 00078 iny
024D A5 07 [3] 00079 lda Total+1
024F 71 02 [5/6] 00080 adc (Root),Y
0251 85 07 [3] 00081 sta Total+1
0253 90 DE (0233) [2/3] 00082 bcc NoCarry
00083
0255 E6 08 [5] 00084 inc Total+2
00085
0257 D0 DA (0233) [2/3] 00086 bne NoCarry
00087
0259 E6 09 [5] 00088 inc Total+3
00089
025B D0 D6 (0233) [2/3] 00090 bne NoCarry ; Always branches
00091
025D 00092 Summed:
025D 60 [6] 00093 rts
00094
00095 ;
00096 ;
00097 ;
025E 00098 NoBorrow:
025E C6 0A [5] 00099 dec I
00100
0260 00101 Main:
0260 20 0212 [6] 00102 jsr Prepend
00103
0263 A5 0A [3] 00104 lda I ; Low byte of count zero?
0265 D0 F7 (025E) [2/3] 00105 bne NoBorrow ; No, no borrow
00106
0267 C6 0B [5] 00107 dec I+1 ; Borrow from upper byte
0269 10 F3 (025E) [2/3] 00108 bpl NoBorrow ; Done if it goes negative
00109
026B 20 0243 [6] 00110 jsr Sum ; Traverse the list and add them up
00111
026E 00 [7] 00112 brk
00113
026F 00114 Heap:
00115
0260 00116 end Main
The 8080 was difficult but benefited from abundant registers. Remember that a memory cycle of the 8080 is three machine cycles; it is one on the others, so 8080 machines usually ran at a two to three times faster clock rate.
Code: Select all
00001 ; 991228 (0F1FFCh) cycles
00002 ; 829201 (0CA711h) cycles after inlining subroutines
00003
0100 00004 org 100h
00005
00006 ;
00007 ; Allocate a node
00008 ;
0100 00009 Alloc:
0100 2A 0163 [16] 00010 lhld Free ; Allocate a node
00011
0103 11 0004 [10] 00012 lxi D,4 ; Point to free memory
0106 EB [4] 00013 xchg ; Keep New in DE
0107 19 [10] 00014 dad D
0108 22 0163 [16] 00015 shld Free
00016
010B EB [4] 00017 xchg ; Return New in HL
00018
010C C9 [10] 00019 ret
00020
00021 ;
00022 ; Prepend a node to the list
00023 ;
010D 00024 Prepend:
010D CD 0100 [17] 00025 call Alloc ; Allocate a new node
00026
0110 3A 0165 [13] 00027 lda Root ; Point next of new node to Root
0113 77 [7] 00028 mov M,A
0114 3A 0166 [13] 00029 lda Root+1
0117 22 0165 [16] 00030 shld Root ; This is the new first node
011A 23 [5] 00031 inx H
011B 77 [7] 00032 mov M,A ; Finish pointer to previous Root
011C 23 [5] 00033 inx H
00034
011D 71 [7] 00035 mov M,C ; Store value in node
011E 23 [5] 00036 inx H
011F 70 [7] 00037 mov M,B
00038
0120 C9 [10] 00039 ret
00040
00041 ;
00042 ; Traverse the list and sum the values
00043 ;
0121 00044 Sum:
0121 2A 0165 [16] 00045 lhld Root
0124 11 0000 [10] 00046 lxi D,0
00047
0127 00048 SumLoop:
0127 23 [5] 00049 inx H ; Add value of node to sum
0128 23 [5] 00050 inx H
0129 4E [7] 00051 mov C,M
012A 23 [5] 00052 inx H
012B 46 [7] 00053 mov B,M
012C EB [4] 00054 xchg
012D 09 [10] 00055 dad B
012E EB [4] 00056 xchg
012F D2 013B [10] 00057 jnc NoCarry
00058
0132 E5 [11] 00059 push H ; Carry to high word
0133 2A 016B [16] 00060 lhld Total+2
0136 23 [5] 00061 inx H
0137 22 016B [16] 00062 shld Total+2
013A E1 [10] 00063 pop H
00064
013B 00065 NoCarry:
013B 2B [5] 00066 dcx H ; Point to the next node
013C 2B [5] 00067 dcx H
013D 7E [7] 00068 mov A,M
013E 2B [5] 00069 dcx H
013F 6E [7] 00070 mov L,M
0140 67 [5] 00071 mov H,A
00072
0141 B5 [4] 00073 ora L ; Until end of list
0142 C2 0127 [10] 00074 jnz SumLoop
00075
0145 EB [4] 00076 xchg
0146 22 0169 [16] 00077 shld Total
00078
0149 C9 [10] 00079 ret
00080
00081 ;
00082 ;
00083 ;
014A 00084 Main:
014A 2A 016D [16] 00085 lhld I ; Get count
014D 4D [5] 00086 mov C,L
014E 44 [5] 00087 mov B,H
00088
014F C3 0153 [10] 00089 jmp IntoLoop
00090
0152 00091 NoBorrow:
0152 0D [5] 00092 dcr C
00093
0153 00094 IntoLoop:
0153 CD 010D [17] 00095 call Prepend
00096
0156 0D [5] 00097 dcr C ; Low byte of count zero?
0157 0C [5] 00098 inr C
0158 C2 0152 [10] 00099 jnz NoBorrow ; No, no borrow
00100
015B 05 [5] 00101 dcr B ; Borrow from upper byte
015C F2 0152 [10] 00102 jp NoBorrow ; Done if it goes negative
00103
015F CD 0121 [17] 00104 call Sum ; Traverse the list and add them up
00105
0162 76 [7] 00106 hlt
00107
0163 016F 00108 Free dw Heap
0165 0000 00109 Root dw 0
0167 0000 00110 New dw 0
0169 0000 0000 00111 Total dw 0,0
016D 0BB7 00112 I dw 2999
00113
016F 00114 Heap
00115
014A 00116 end Main
The 6809 ran away with this challenge. The additional registers and flexible addressing modes gave it a major advantage.
Code: Select all
00001 * 363754 ($58CEA) cycles original 6800 source code
00002 * 243378 ($3B6B2) cycles
00003 * 171350 ($29D56) cycles after inlining subroutines
00004
0000 0044 00005 Free fdb Heap
0002 0000 00006 Root fdb 0
0004 0000 00007 New fdb 0
0006 0000 0000 00008 Total fdb 0,0
000A 0BB7 00009 I fdb 2999
00010
00011 *
00012 * Allocate a node
00013 *
000C 00014 Alloc
000C 30 C4 [4] 00015 leax ,U ; Allocate a node
00016
000E 33 44 [5] 00017 leau 4,U ; Point to free memory
00018
0010 39 [5] 00019 rts
00020
00021 *
00022 * Prepend a node to the list
00023 *
0011 00024 Prepend
0011 8D F9 (000C) [7] 00025 bsr Alloc ; Allocate a new node
00026
0013 DC 02 [5] 00027 ldd Root ; Point next of new node to Root
0015 ED 84 [5] 00028 std ,X
00029
0017 10AF 02 [7] 00030 sty 2,X ; Store value in node
00031
001A 9F 02 [5] 00032 stx Root ; This is now the first node
00033
001C 39 [5] 00034 rts
00035
00036 *
00037 * Traverse the list and sum the values
00038 *
001D 00039 Sum
001D 9E 02 [5] 00040 ldx Root
001F CC 0000 [3] 00041 ldd #0 ; Clear total
0022 1F 03 [6] 00042 tfr D,U
00043
0024 00044 SumLoop
0024 E3 02 [7] 00045 addd 2,X ; Add value of node to sum
0026 24 02 (002A) [3] 00046 bcc NoCarry
00047
0028 33 41 [5] 00048 leau 1,U
00049
002A 00050 NoCarry
002A AE 84 [5] 00051 ldx ,X ; Point to the next node
002C 26 F6 (0024) [3] 00052 bne SumLoop ; Until end of list
00053
002E DD 08 [5] 00054 std Total+2
0030 DF 06 [5] 00055 stu Total
00056
0032 39 [5] 00057 rts
00058
00059 *
00060 *
00061 *
0033 00062 Main
0033 109E 0A [6] 00063 ldy I ; Get node value
0036 CE 0044 [3] 00064 ldu #Heap ; Point to free memory
00065
0039 00066 Loop
0039 8D D6 (0011) [7] 00067 bsr Prepend
00068
003B 31 3F [5] 00069 leay -1,Y
00070
003D 26 FA (0039) [3] 00071 bne Loop
00072
003F 8D D0 (0011) [7] 00073 bsr Prepend ; And once for 0 just to make it fair
00074
0041 8D DA (001D) [7] 00075 bsr Sum ; Traverse the list and add them up
00076
0043 3F [19] 00077 swi
00078
0044 00079 Heap
00080
0033 00081 end Main
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
BillG wrote:
I am slowly relenting to the prospect that a loop ending with an unconditional branch back to the top often performs better by transforming it to enter in the middle. The result looks unstructured, but it is not very different.
Quote:
The 6502 had trouble keeping up with the others. Am I missing something?
Quote:
The 6809 ran away with this challenge. The additional registers and flexible addressing modes gave it a major advantage.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Mike B. (about me) (learning how to github)
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Comparisons and contrasts
barrym95838 wrote:
I don't have time to look right now, because I'm trying to catch some feral cats and get them "fixed" and released before they make more inbred feral kittens.
I hear a shotgun works well in such situations.
x86? We ain't got no x86. We don't NEED no stinking x86!