Comparisons and contrasts
Re: Comparisons and contrasts
On further reflection, I am thinking that the upper bound check can be done with an unsigned test.
Concur?
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
I might be completely wrong, but doesn't PASCAL allow signed indices?
If you're just comparing a 16-bit value to another, you can almost always replace SEC SBC with CMP for the low byte, signed or unsigned.
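To make the point above concrete, here is a small Python model of the two 6502 sequences. It is only a sketch: `cmp8`, `sbc8` and the two `ge16_*` helpers are hypothetical names, only the carry flag is modelled (binary mode, unsigned operands), and the instruction mnemonics in the comments describe the assumed sequences rather than anyone's actual code.

```python
def cmp8(a, m):
    """Model of 6502 CMP: carry out is 1 when a >= m (unsigned); A is unchanged."""
    return 1 if a >= m else 0


def sbc8(a, m, carry):
    """Model of 6502 SBC in binary mode: returns (8-bit result, carry out)."""
    diff = a - m - (1 - carry)          # a clear carry means "borrow one"
    return diff & 0xFF, 1 if diff >= 0 else 0


def ge16_with_sbc(x, y):
    """x >= y (unsigned 16-bit) using SEC / SBC on the low byte."""
    _, carry = sbc8(x & 0xFF, y & 0xFF, 1)    # SEC ; LDA x_lo ; SBC y_lo
    _, carry = sbc8(x >> 8, y >> 8, carry)    # LDA x_hi ; SBC y_hi
    return carry == 1


def ge16_with_cmp(x, y):
    """Same comparison, with CMP replacing SEC / SBC on the low byte."""
    carry = cmp8(x & 0xFF, y & 0xFF)          # LDA x_lo ; CMP y_lo
    _, carry = sbc8(x >> 8, y >> 8, carry)    # LDA x_hi ; SBC y_hi
    return carry == 1
```

Both helpers feed the same carry into the high-byte SBC, so the final carry agrees whenever the low-byte difference itself is not needed. For a signed compare the overflow flag would also have to be examined after the high-byte SBC, which this model leaves out.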
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Re: Comparisons and contrasts
barrym95838 wrote:
I might be completely wrong, but doesn't PASCAL allow signed indices?
Consider this:
Code: Select all
Offset := Subscript - LowerBound
if Offset < 0 { This must be a signed comparison, that is, -32768 - LowerBound must be reported as an error }
then
    report OutOfBounds error
{ Offset is now an unsigned quantity }
barrym95838 wrote:
If you're just comparing a 16-bit value to another, you can almost always replace SEC SBC with CMP for the low byte, signed or unsigned.
Code: Select all
subtract NumberInRange from Offset
if the difference >= 0
then
    report OutOfBounds error
Code: Select all
compare upper byte of Offset with upper byte of NumberInRange
if <
then
    proceed
else if >
then
    report error
else
    compare lower byte of Offset with lower byte of NumberInRange
    if <
    then
        proceed
    else
        report error
For the upper bound, I'll have to analyze the two methods.
Edit: Oh, I see what you are saying. Since the difference is not kept, do a compare of the low bytes instead of a subtraction...
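For anyone following along, here is a rough Python sketch of the unsigned-test idea raised at the top of the thread. It is not the code from either post: it uses the common trick of folding both bounds into a single unsigned comparison of the wrapped 16-bit offset. The function names are hypothetical, and it assumes 16-bit signed subscripts and bounds with LowerBound <= UpperBound.

```python
MASK16 = 0xFFFF


def in_bounds_reference(subscript, lower, upper):
    """Plain two-sided signed check, used here only as a reference."""
    return lower <= subscript <= upper


def in_bounds_unsigned(subscript, lower, upper):
    """Range check done with one unsigned comparison.

    offset is Subscript - LowerBound truncated to 16 bits, as a 16-bit
    subtraction would leave it; count is NumberInRange.
    """
    offset = (subscript - lower) & MASK16   # wraps around like 16-bit hardware
    count = upper - lower + 1               # assumed to be at most 65536
    return offset < count                   # unsigned comparison
```

A subscript below LowerBound wraps around to a large unsigned offset, so the same comparison that catches the upper bound also rejects it.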
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
BillG wrote:
Oh, I see what you are saying. Since the difference is not kept, do a compare of the low bytes instead of a subtraction...
Re: Comparisons and contrasts
I have run some tests on the 6502, Z80, K1801VM1, 8088, and 68000, all of which perform the same Mandelbrot calculations. Results are here. It seems that for intensive 16-bit calculations the 6502 shows rather mediocre results. However, getting well-optimized code for the Z80 is a very long and expensive process.
Re: Comparisons and contrasts
The linked page says:
Quote:
Emulators were used to get these results: BK2010 v0.5 for the BK0010, GID v3.10 for the BK0011M, plus4emu v1.2.10 for the Commodore+4, ep128emu v2.0.11 for the Amstrad CPC, mess 0.229 for the BBC Master, FS-UAE 3.0.5 for the Amiga 500, pce-ibmpc version 20140222-4b05f0c for the IBM PC 5160 EGA. The emulators are quite accurate for timings, the only exception is the emulator for the IBM PC which appears to be about 25% faster than real hardware. So all the speed results for the IBM PC may be degraded by this 25% – the degraded ER result is shown in parentheses.
I am not familiar with pce-ibmpc. Did you measure the time using a stopwatch or does the emulator report the number of CPU cycles used?
The fastest code in an emulator as measured with a stopwatch is not likely to be the fastest on actual hardware. It is not unlike trying to optimize code for a BASIC compiler as opposed to for one of the many interpreters.
Code optimization on x86 processors is notoriously difficult. If you have ever read The Zen of Assembly Language by Michael Abrash, you will know what I am talking about.
For instance, your code does not make use of the STOSW instruction which is especially advantageous on the 8088/8086 processor but not so much on modern members of the family.
Re: Comparisons and contrasts
For me, it's a major difficulty with any kind of comparison using hand-coded assembly - you need to be able to marshal the same level of expertise with each micro. Not insurmountable, but otherwise you may leave unrealised improvements for some which distort the rankings.
Re: Comparisons and contrasts
BillG wrote:
The linked page says:
I am not familiar with pce-ibmpc. Did you measure the time using a stopwatch or does the emulator report the number of CPU cycles used?
The fastest code in an emulator as measured with a stopwatch is not likely to be the fastest on actual hardware. It is not unlike trying to optimize code for a BASIC compiler as opposed to for one of the many interpreters.
Code optimization on x86 processors is notoriously difficult. If you have ever read The Zen of Assembly Language by Michael Abrash, you will know what I am talking about.
For instance, your code does not make use of the STOSW instruction which is especially advantageous on the 8088/8086 processor but not so much on modern members of the family.
Code: Select all
1$: mov sqr(r1), r3 ; r3 = y^2
add r0, r1 ; r1 = x+y
mov sqr(r0), r0 ; r0 = x^2
add r3, r0 ; r0 = x^2+y^2
cmp r0, r6 ; if r0 >= 4.0 then
bge 2$ ; overflow
mov sqr(r1), r1 ; r1 = (x+y)^2
sub r0, r1 ; r1 = (x+y)^2-x^2-y^2 = 2*x*y
add r5, r1 ; r1 = 2*x*y+b, updated y
sub r3, r0 ; r0 = x^2
sub r3, r0 ; r0 = x^2-y^2
add r4, r0 ; r0 = x^2-y^2+a, updated x
sob r2, 1$ ; to next iteration
Code: Select all
while (x*x + y*y ≤ 2*2 AND iteration < max_iteration) do
    xtemp := x*x - y*y + x0
    y := 2*x*y + y0
    x := xtemp
    iteration := iteration + 1
All the programs simply print their timings; this is stated on the linked page with the results. Those emulators are capable of running tricky code for games and demos, so they are quite accurate. The only exception is the IBM PC emulator, because timings were almost never a compatibility issue for this computer: there were too many varieties of it, using different processors (the 8088, 8086, V20, V30, 80286, ...) at different clocks. So emulators for the IBM PC are usually faster than the original machines. IMHO the emulators just don't simulate the instruction queue delays. BTW, if anybody knows the best IBM PC emulator, please let me know.
It is possible to get just the machine cycle count for some processors (the Z80 and 6502), but that requires different tools, and IMHO it would not change the results. That approach also misses the fine resulting pictures.
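As a side note on the inner loop quoted above: it never multiplies x by y directly, but gets 2*x*y from three squares, since (x+y)^2 - x^2 - y^2 = 2*x*y. The Python sketch below mirrors that ordering of operations. It is only an illustration: `escape_iterations` is a hypothetical name, plain multiplication stands in for what looks like a table-of-squares lookup (`sqr(...)`) in the assembly, and starting from x = y = 0 is an assumption, since the setup code is not shown.

```python
def escape_iterations(a, b, max_iter=256, limit=4):
    """Count iterations of z -> z^2 + c for c = a + b*i, using squares only.

    The loop escapes when x^2 + y^2 >= limit, like the cmp/bge pair in the
    assembly listing.
    """
    x, y = 0, 0                      # assumed starting point
    for iteration in range(max_iter):
        y2 = y * y                   # r3 = y^2
        s = x + y                    # r1 = x + y
        x2 = x * x                   # r0 = x^2
        r = x2 + y2                  # r0 = x^2 + y^2
        if r >= limit:
            return iteration
        y = s * s - r + b            # (x+y)^2 - x^2 - y^2 + b = 2*x*y + b
        x = x2 - y2 + a              # x^2 - y^2 + a
    return max_iter
```

With a lookup table, each square costs one indexed load, which is presumably why the loop is arranged this way on processors without a fast multiply.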
BigEd wrote:
For me, it's a major difficulty with any kind of comparison using hand-coded assembly - you need to be able to marshal the same level of expertise with each micro. Not insurmountable, but otherwise you may leave unrealised improvements for some which distort the rankings.
It is interesting that, in per-MHz efficiency, the 6502 beats the 8088 even on 16-bit calculations.
Last edited by litwr on Thu Dec 23, 2021 7:28 pm, edited 2 times in total.
Re: Comparisons and contrasts
Sarah Walker's PCem is cycle accurate, apparently.
https://retrocomputing.stackexchange.co ... 6-emulator
Re: Comparisons and contrasts
BigEd wrote:
Sarah Walker's PCem is cycle accurate, apparently.
https://retrocomputing.stackexchange.co ... 6-emulator
Re: Comparisons and contrasts
I have ported the pi-spigot algorithm to the 6803. This CPU was used very rarely; I know of only the Tandy TRS-80 MC-10 and its French clones. The results show that this processor is slightly faster than the 6809! It seems Motorola followed DEC's path of making processor instructions more complex and slower. The 6803 has faster instructions than the 6809, but the 6809 has more registers, instructions, and addressing modes. However, some instructions, like LSRD or ASLD, exist only on the 6803. If the 6803 also had ROLD, it would greatly speed up the division procedure for this processor.
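To illustrate why a 16-bit rotate would matter for division, here is a Python sketch of a conventional shift-and-subtract (restoring) division. It is an assumption that the division procedure mentioned above has this general shape; the actual routine is not shown in the post. `asld` models the existing 16-bit shift, while `rold` models the hypothetical ROLD, which on a real 6803 would presumably have to be done as two 8-bit rotates.

```python
MASK16 = 0xFFFF


def asld(value):
    """Model of a 16-bit shift left: returns (result, carry out)."""
    return (value << 1) & MASK16, (value >> 15) & 1


def rold(value, carry_in):
    """Model of a hypothetical 16-bit rotate left through carry."""
    return ((value << 1) | carry_in) & MASK16, (value >> 15) & 1


def divide16(dividend, divisor):
    """Unsigned 16-bit restoring division: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = dividend & MASK16     # dividend bits leave at the top, quotient bits enter at the bottom
    remainder = 0
    for _ in range(16):
        quotient, carry = asld(quotient)        # next dividend bit goes into carry
        remainder, _ = rold(remainder, carry)   # carry is rotated into the remainder
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1                       # record a 1 bit in the quotient
    return quotient, remainder
```

The rotate sits inside the loop and runs once per quotient bit, so its cost is multiplied by 16 for every division.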
It would also be interesting to guess how the 6502 might have evolved if MOS Technology had been regularly upgrading it. It is possible that they could have chosen the same path that Motorola did for the 6803. That would have meant using a 16-bit accumulator.
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
I'm not surprising anyone here, but the ability to run legacy binaries was the constraint chosen for the '802 and '816, and they're full of 16-bit stuff behind those annoying mode bits. x86 followed a similar upgrade path, but threw much, much more money for R&D into the mix.
Re: Comparisons and contrasts
barrym95838 wrote:
I'm not surprising anyone here, but the ability to run legacy binaries was the constraint chosen for the '802 and '816, and they're full of 16-bit stuff behind those annoying mode bits. x86 followed a similar upgrade path, but threw much, much more money for R&D into the mix.
I have found out that the 6800 family consists of families of binary incompatible processors. I could count up to 5 families:
1) 6800/6802/6808, 6801/6803 (backward compatible with the 6800);
2) 6800, 68HC08;
3) 6800, 68HC11;
4) 6804, 6805;
5) 6809, 6309 (backward compatible with the CMOS 6809).
This is not a complete list. I am not able to finish it, too many controllers are there. The 6502 family is a set of more compatible processors. But it also contains varieties:
1) NMOS 6502 (undocumented instructions);
2) CMOS 6502 (almost 100% compatible with the NMOS 6502 without undocumented instructions);
3) 6509 (almost 100% compatible with the NMOS 6502);
4) 65CE02 (backward compatible with the CMOS 6502);
5) HuC6280 (backward compatible with the CMOS 6502?);
6) DTV (backward compatible with the CMOS 6502);
7) WDC65C02 (backward compatible with the CMOS 6502);
8) 65816 (backward compatible with the CMOS 6502).
It is interesting that the almost universal assembler VASM supports all (?) 8-bit 6502 varieties but still lacks support for the 65816.
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Comparisons and contrasts
litwr wrote:
I have found out that the 6800 family consists of families of binary incompatible processors. I could count up to 5 families:
1) 6800/6802/6808, 6801/6803 (backward compatible with the 6800);
2) 6800, 68HC08;
3) 6800, 68HC11;
4) 6804, 6805;
5) 6809, 6309 (backward compatible with the CMOS 6809).
This is not a complete list. I am not able to finish it, too many controllers are there.
Quote:
The 68HC12 adds to and replaces a small number of 68HC11 instructions with new forms that are closer to the 6809 processor. More significantly it changes the instruction encodings to be far more dense and adds many 6809 like indexing features, some with even more flexibility. The net result is that code sizes are typically 30% smaller.
Re: Comparisons and contrasts
Wow, 30% smaller code is very impressive!