Re: Comparisons and contrasts
Posted: Fri Sep 23, 2022 11:12 am
by BillG
For the 65xx, here's a loosely related puzzle that almost certainly has a stable solution (disregarding an untimely interrupt):
Code: Select all
org $0100
ldx #5
txs
jsr $1234
brk
Where did we jsr? Was it $1234 or somewhere else? Maybe $0134?
Interesting. Someone with single stepping capability will have to try it on real hardware.
My simulator, which I in no way claim is authoritative, jumps to $1234 with the instruction changed to jsr $105.
If actual hardware works as my simulator does, that may actually be useful.
If the code is written:
Code: Select all
org $0100
ldx #5
txs
jsr $1234
brk
jsr somewhere else
control is passed to $1234 the first time through. Subsequent times, the code at $105 looks like
Re: Comparisons and contrasts
Posted: Fri Sep 23, 2022 11:56 am
by Dr Jefyll
For the 65xx, here's a loosely related puzzle that almost certainly has a stable solution (disregarding an untimely interrupt):
Code: Select all
org $0100
ldx #5
txs
jsr $1234
brk
Where did we jsr? Was it $1234 or somewhere else? Maybe $0134?
Good one, Mike! And $0134 is correct.
That's because the cycle which fetches ADL of the JSR destination occurs before the return address gets pushed to stack, and the cycle which fetches ADH of the JSR destination occurs after.
-- Jeff
Re: Comparisons and contrasts
Posted: Fri Sep 23, 2022 12:13 pm
by John West
Simulators, unless you've been very careful with what operations happen on which cycles, are more likely to be misleading in this case.
On the NMOS 6502 and 65C02, JSR works like this:
- Fetch opcode
- Fetch target address low byte
- (internal operation)
- Write PC high byte
- Write PC low byte
- Fetch target address high byte
The low byte of the target address will be fetched correctly, but by the time it fetches the high byte it has been overwritten by the high byte of PC. The target address will be (if I haven't messed up) $0134.
Instructions that push to the stack (like PHA and JSR) write the value first, then decrement S. Instructions that pull (like PLA and RTS) increment S first, then read the value.
The 65816 works differently, if its datasheet is correct. It fetches the full target address before writing the old PC, so it should go to $1234.
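The cycle order listed above can be checked with a small model. What follows is a minimal Python sketch, not a cycle-exact emulator: the function name `run_jsr` and the `early_adh` flag are invented for illustration, and the 65816 ordering is taken from the datasheet description above rather than verified on hardware.

```python
def run_jsr(early_adh):
    """Model the bus cycles of JSR abs for the puzzle program.

    early_adh=False: ADH is fetched after the pushes (6502/65C02 order).
    early_adh=True:  ADH is fetched before the pushes (65816 order as
                     described from the datasheet; an assumption here).
    Returns (target address, the three bytes now at $0103-$0105).
    """
    mem = bytearray(0x10000)
    # org $0100: ldx #5 / txs / jsr $1234 / brk
    mem[0x0100:0x0107] = bytes([0xA2, 0x05, 0x9A, 0x20, 0x34, 0x12, 0x00])
    s = 0x05                      # stack pointer after ldx #5 / txs
    pc = 0x0103                   # address of the JSR opcode

    pc += 1                       # fetch opcode
    adl = mem[pc]                 # fetch target address low byte
    pc += 1
    adh = None
    if early_adh:
        adh = mem[pc]             # 65816 order: high byte fetched now

    # Pushes write the value first, then decrement S.
    mem[0x0100 + s] = pc >> 8     # write PC high byte
    s = (s - 1) & 0xFF
    mem[0x0100 + s] = pc & 0xFF   # write PC low byte
    s = (s - 1) & 0xFF

    if not early_adh:
        adh = mem[pc]             # 6502 order: high byte fetched last
    return (adh << 8) | adl, bytes(mem[0x0103:0x0106])
```

In this model `run_jsr(False)` gives a target of $0134 and `run_jsr(True)` gives $1234. In both cases the bytes left at $0103 are 20 05 01, that is, jsr $0105.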
Re: Comparisons and contrasts
Posted: Fri Sep 23, 2022 12:26 pm
by BigEd
Re: Comparisons and contrasts
Posted: Fri Sep 23, 2022 1:11 pm
by Dr Jefyll
The 65816 works differently, if its datasheet is correct. It fetches the full target address before writing the old PC, so it should go to $1234.
Whoa, well spotted, John! And the datasheet seems adamant on this point, as there's a specific note included: "different order from N6502"
Also I notice the Absolute Indexed Indirect mode uses a different order once again, with the stack push cycles occurring consecutively and very early indeed.
-- Jeff
Re: Comparisons and contrasts
Posted: Fri Sep 23, 2022 1:57 pm
by BillG
Are there any other race condition traps in the 6502 instruction set?
It may be interesting to determine how the various processors deal with the jsr/call instruction being overwritten by the return address.
For starters, the x86 has a prefetch queue and will be immune. No need to test it.
The 9900 does not have a hardware stack; it uses a "branch and link" method for invoking a subroutine. It has a number of other land mines because its registers, aka the workspace, may be positioned anywhere in the address space by loading the Workspace Pointer.
The AVR is Harvard architecture and also immune - separate data and program memory.
I cannot single step the 6809 but will try to come up with test code to catch the possibilities.
I seem to recall the documentation for the 6800 and the 8080 going into detail with cycle by cycle descriptions of instructions. I'll go read the fine manuals...
Re: Comparisons and contrasts
Posted: Fri Sep 23, 2022 9:03 pm
by BillG
For the 6809, I wrote:
Then I sprinkled landing pads at the likely places to display a message: $1234, $0107, $0134 and $1207.
It jsred to $1234.
The 8080 Family User's Manual says that the target address is fetched before the return address is pushed.
Re: Comparisons and contrasts
Posted: Sun Sep 25, 2022 1:13 pm
by BillG
Simulators, unless you've been very careful with what operations happen on which cycles, are more likely to be misleading in this case.
On the NMOS 6502 and 65C02, JSR works like this:
- Fetch opcode
- Fetch target address low byte
- (internal operation)
- Write PC high byte
- Write PC low byte
- Fetch target address high byte
The low byte of the target address will be fetched correctly, but by the time it fetches the high byte it has been overwritten by the high byte of PC. The target address will be (if I haven't messed up) $0134.
I just modified mine to properly handle this.
Which is the chicken and which is the egg?
The 6502 pushes the return address - 1 with rts correcting it later. Does jsr work this way to save cycles or transistors or both?
Re: Comparisons and contrasts
Posted: Sun Sep 25, 2022 3:49 pm
by barrym95838
The 6502 pushes the return address - 1 with rts correcting it later. Does jsr work this way to save cycles or transistors or both?
I'm not really a hardware guy, but my thinking is that it was done this way to keep the internal state machine as small as reasonably possible. If jmp abs takes only three cycles, then jsr abs should theoretically need only five, and rts only three. Theoretically ...
Re: Comparisons and contrasts
Posted: Wed Oct 05, 2022 8:18 pm
by BigEd
The 6502 pushes the return address - 1 with rts correcting it later. Does jsr work this way to save cycles or transistors or both?
I believe it's to avoid the need for a temporary register. At the time of pushing PC, the value of PC is the one we observe. In order to push the actual return address, JSR would need to be able to read both opcode bytes before doing the stack pushes - and there's nowhere on-chip for those bytes to be stored. As it is, the stack register gets briefly reused to store the first operand, which is possible because the stack value needs to take a round trip through the ALU to be decremented.
You can see this in visual6502:
http://visual6502.org/JSSim/expert.html ... 5&steps=20
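The "return address - 1" convention can be illustrated in a few lines of Python. This is only a sketch with hypothetical helper names (`jsr`, `rts`); it models the values pushed and pulled, not the cycle timing or the internal register reuse described above.

```python
def jsr(pc_of_opcode, stack):
    """Push what PC holds at push time: the address of the JSR's last byte."""
    pc = pc_of_opcode + 2          # PC points at the high operand byte
    stack.append(pc >> 8)          # high byte is pushed first
    stack.append(pc & 0xFF)        # then the low byte


def rts(stack):
    """Pull the stored address and add one to reach the next instruction."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF


stack = []
jsr(0x0200, stack)    # the JSR occupies $0200-$0202, so $0202 is pushed
resume = rts(stack)   # execution resumes at $0203
```

The increment lives in `rts`, so `jsr` can push PC exactly as it stands without first reading the second operand byte.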
Re: Comparisons and contrasts
Posted: Wed Jan 04, 2023 6:29 pm
by litwr
I have gotten several more results for my Mandelbrot project: the 65816, TMS9995, R800, etc. were added. The 65816 shows approximately 2 times better performance than the 6502 at the same frequency. My results also show that the 68020 was only ≈30% slower than the ARM on 16-bit calculations.
This benchmark generates pictures, and one could say that they show Xmas firs in horizontal and other positions.

Re: Comparisons and contrasts
Posted: Thu Jan 05, 2023 6:13 am
by BillG
Mike Douglas just posted the following to the FLEX Users Group mailing list:
I have created two versions of the "Byte Sieve" benchmark to run on the SWTPC 6800 and one to run on an 8080 machine (Altair 8800). This benchmark was first published in the Sep 1981 issue of BYTE magazine. The test was designed to allow comparison between different languages/compilers on a given machine, and to compare between machines given the same language/compiler.
For the first 6800 version, I took the C language version of the benchmark and pretended I was a mediocre compiler and generated a 6800 assembly source file. For the second 6800 version, I optimized the code as a good assembly language programmer would do. Here are the results on a SWTPC 6800 computer (1 MHz):
"C" version: 28.25 seconds
Optimized assembly version: 11.85 seconds
As expected, a well written assembly language version is substantially faster than a non-optimized compiled program.
Through a wide variety of tests in the past, I have found the 8080 at 2 MHz and the 6800 at 1 MHz to be closely on par with each other. However, for this Sieve benchmark, which uses a lot of array/indexed operations, the extra 16 bit registers of the 8080 clearly provide an advantage:
Optimized assembly version: 8.18 seconds
You can find the 6800 versions of the Sieve benchmark here:
https://deramp.com/downloads/swtpc/soft ... Benchmark/
The 8080 version is here:
https://deramp.com/downloads/altair/sof ... ellaneous/
Mike D
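For readers who have not seen the benchmark, its core loop looks roughly like the following. This is a Python sketch from memory of the commonly reproduced form of the sieve, not Mike's 6800 or 8080 code. The function name `byte_sieve`, the array size of 8190 and the single pass (the benchmark as usually run repeats the pass for timing) are assumptions here.

```python
def byte_sieve(size=8190):
    """One pass of the sieve over the odd numbers 3, 5, ..., 2*size + 3."""
    flags = [True] * (size + 1)   # flags[i] stands for the odd number 2*i + 3
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            k = i + prime         # index of 3 * prime
            while k <= size:
                flags[k] = False  # strike out the odd multiples of prime
                k += prime
            count += 1
    return count
```

Nearly all of the work is in the inner loop that steps through `flags` with a 16-bit index, which is the kind of array/indexed operation the post credits for the 8080's advantage.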
Re: Comparisons and contrasts
Posted: Fri Jan 20, 2023 1:27 am
by BillG
Previously, I wrote:
680x programmers are used to coding BSR to call a subroutine over JSR because it is both smaller and faster. On the 6800, if the assembler said that a branch was out of range, change it to JMP. On the 6809, it is much easier to add 'L' to make it LBSR even though it is a cycle slower than JMP. Some programmers may have gotten into the habit of always coding "LBSR" instead of "BSR" or "JMP". I do not believe as he does that a majority of programmers strove for position independent code.
Likewise, it is much easier to change an out of range short conditional branch to a long one than inserting and branching around an absolute jump as was the case on the 6800. Long conditional branches are good and not because they are position independent.
Motorola claimed the 6809 was "source code" compatible with the 6800. That is not entirely true.
There are some obscure differences in the way a few instructions affected the condition codes. Motorola documented those.
What I did not go into with the above quote and Motorola never mentioned in their documentation is that some 6800 instructions grew in size when assembled for the 6809. This may cause marginal relative branch instructions to go "out of range." Maybe the Motorola assembler automatically replaced the branch with a long branch or a jump and conditional branches with the long form or a jump and the opposite branch. Or maybe not. The FLEX 6809 assembler does not and generates an out of range error; the source code must be manually adjusted.
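The "marginal branch" problem is simple arithmetic. The sketch below, in Python with a hypothetical function name, assumes the usual 680x short-branch encoding: a two-byte instruction whose signed 8-bit offset is relative to the address of the following instruction. The one-byte growth is an illustrative figure, not taken from a specific instruction.

```python
def short_branch_offset(branch_addr, target_addr):
    """Return the 8-bit relative offset, or None if the target is out of range."""
    offset = target_addr - (branch_addr + 2)   # relative to the next instruction
    return offset if -128 <= offset <= 127 else None


# A forward branch that only just fits (offset 127) ...
fits = short_branch_offset(0x1000, 0x1081)
# ... no longer fits once an instruction in between grows by one byte.
too_far = short_branch_offset(0x1000, 0x1082)
```

With these addresses `fits` is 127 and `too_far` is `None`, which is the situation where an assembler has to either report an error or substitute a longer form.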