Comparisons and contrasts

Let's talk about anything related to the 6502 microprocessor.
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: Comparisons and contrasts

Post by BillG »

BillG wrote:
barrym95838 wrote:
For the 65xx, here's a loosely related puzzle that almost certainly has a stable solution (disregarding an untimely interrupt):

    org $0100
    ldx #5
    txs
    jsr $1234
    brk
Where did we jsr? Was it $1234 or somewhere else? Maybe $0134?
Interesting. Someone with single stepping capability will have to try it on real hardware.

My simulator, which I in no way claim is authoritative, jumps to $1234 and leaves the instruction in memory changed to jsr $105.
If actual hardware works as my simulator does, that may actually be useful.

If the code is written:

    org $0100
    ldx #5
    txs
    jsr $1234
    brk
    jsr somewhere else    
control is passed to $1234 the first time through. Subsequent times, the code at $105 looks like

    ora (0,X)
    jsr somewhere else
Dr Jefyll
Posts: 3526
Joined: 11 Dec 2009
Location: Ontario, Canada

Re: Comparisons and contrasts

Post by Dr Jefyll »

barrym95838 wrote:
For the 65xx, here's a loosely related puzzle that almost certainly has a stable solution (disregarding an untimely interrupt):

    org $0100
    ldx #5
    txs
    jsr $1234
    brk
Where did we jsr? Was it $1234 or somewhere else? Maybe $0134?
Good one, Mike! And $0134 is correct. :shock:

That's because the cycle which fetches ADL of the JSR destination occurs before the return address gets pushed to stack, and the cycle which fetches ADH of the JSR destination occurs after.

-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
John West
Posts: 383
Joined: 03 Sep 2002

Re: Comparisons and contrasts

Post by John West »

Simulators are likely to be misleading in this case unless they have been written with great care about which operations happen on which cycles.

On the NMOS 6502 and 65C02, JSR works like this:
  1. Fetch opcode
  2. Fetch target address low byte
  3. (internal operation)
  4. Write PC high byte
  5. Write PC low byte
  6. Fetch target address high byte
The low byte of the target address will be fetched correctly, but by the time the CPU fetches the high byte, that byte has been overwritten by the high byte of PC. The target address will be (if I haven't messed up) $0134.

Instructions that push to the stack (like PHA and JSR) write the value first, then decrement S. Instructions that pull (like PLA and RTS) increment S first, then read the value.

The 65816 works differently, if its datasheet is correct. It fetches the full target address before writing the old PC, so it should go to $1234.
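The two orderings described above can be tried out with a small Python sketch. This is not an emulator: it models only the JSR bus sequence, the helper names (`run_jsr`, `puzzle_memory`) are made up for illustration, and the "fetch then push" branch reflects the order described above for the 65816 datasheet, not a test on real silicon.

```python
# Sketch of the JSR bus ordering only; everything else about the CPU is ignored.

def run_jsr(mem, pc, s, nmos_order=True):
    """Execute one JSR abs at pc with stack pointer s; return the target address."""
    def push(value):
        nonlocal s
        mem[0x0100 + s] = value      # write the value first...
        s = (s - 1) & 0xFF           # ...then decrement S

    assert mem[pc] == 0x20           # JSR abs opcode
    pc += 1
    adl = mem[pc]                    # fetch target low byte
    pc += 1                          # PC now points at the high-byte operand
    if nmos_order:
        push(pc >> 8)                # write PC high byte
        push(pc & 0xFF)              # write PC low byte
        adh = mem[pc]                # fetch target high byte (may be clobbered)
    else:
        adh = mem[pc]                # fetch target high byte before any push
        push(pc >> 8)
        push(pc & 0xFF)
    return (adh << 8) | adl


def puzzle_memory():
    """Fresh 64K memory holding the puzzle: ldx #5 / txs / jsr $1234 / brk."""
    mem = bytearray(0x10000)
    mem[0x0100:0x0107] = bytes([0xA2, 0x05, 0x9A, 0x20, 0x34, 0x12, 0x00])
    return mem


# The JSR opcode sits at $0103 and S is 5 after ldx #5 / txs.
nmos_target = run_jsr(puzzle_memory(), pc=0x0103, s=0x05, nmos_order=True)
fetch_first_target = run_jsr(puzzle_memory(), pc=0x0103, s=0x05, nmos_order=False)

print(f"push before ADH fetch: ${nmos_target:04X}")
print(f"ADH fetch before push: ${fetch_first_target:04X}")
```

With S = 5 the first push lands on $0105, which is exactly where the high byte of the operand lives, so the only thing that decides the outcome is whether that byte is read before or after the push.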
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: Comparisons and contrasts

Post by BigEd »

Nice puzzle. Here's visual6502 confirming:
http://visual6502.org/JSSim/expert.html ... 0&d=4c0001
Dr Jefyll
Posts: 3526
Joined: 11 Dec 2009
Location: Ontario, Canada

Re: Comparisons and contrasts

Post by Dr Jefyll »

John West wrote:
The 65816 works differently, if its datasheet is correct. It fetches the full target address before writing the old PC, so it should go to $1234.
Whoa, well spotted, John! And the datasheet seems adamant on this point, as there's a specific note included: "different order from N6502"

Also I notice the Absolute Indexed Indirect mode uses a different order once again, with the stack push cycles occurring consecutively and very early indeed.

-- Jeff
Attachment: 65816 JSR cycles.png
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: Comparisons and contrasts

Post by BillG »

Are there any other race condition traps in the 6502 instruction set?

It may be interesting to determine how the various processors deal with the jsr/call instruction being overwritten by the return address.

For starters, the x86 has a prefetch queue and will be immune. No need to test it.

The 9900 does not have a hardware stack; it uses a "branch and link" method for invoking a subroutine. It has a number of other land mines because its registers, aka the workspace, may be positioned anywhere in the address space by loading the Workspace Pointer.

The AVR is Harvard architecture and also immune - separate data and program memory.

I cannot single step the 6809 but will try to come up with test code to catch the possibilities.

I seem to recall the documentation for the 6800 and the 8080 going into detail with cycle by cycle descriptions of instructions. I'll go read the fine manuals...
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: Comparisons and contrasts

Post by BillG »

For the 6809, I wrote:

    org     $100

    lds     #$107
    jsr     $1234
Then I sprinkled landing pads at the likely places to display a message: $1234, $0107, $0134 and $1207.

It jsred to $1234.

The 8080 Family User's Manual says that the target address is fetched before the return address is pushed.
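For anyone wondering where the four 6809 landing pads come from, here is a rough Python sketch. It assumes lds #imm16 assembles to 4 bytes and jsr extended to 3 bytes, so the return address is $0107 and, with S = $107, the two pushed bytes land on the jsr operand at $105/$106. The function name `landing_pads` is made up for illustration.

```python
from itertools import product

ORG = 0x0100
LDS_IMM_LEN = 4   # assumption: lds #imm16 is a 2-byte opcode plus a 2-byte operand
JSR_EXT_LEN = 3   # assumption: jsr extended is a 1-byte opcode plus a 2-byte address

target = 0x1234
return_addr = ORG + LDS_IMM_LEN + JSR_EXT_LEN   # address of the instruction after the jsr


def landing_pads(target, return_addr):
    """Every address reachable if each target byte is read either before
    or after the matching return-address byte overwrites it."""
    highs = (target >> 8, return_addr >> 8)
    lows = (target & 0xFF, return_addr & 0xFF)
    return sorted({(hi << 8) | lo for hi, lo in product(highs, lows)})


pads = landing_pads(target, return_addr)
print([f"${a:04X}" for a in pads])
```

Each operand byte has two possible values, hence four pads. Landing on $1234 means both bytes were read before anything was pushed.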
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: Comparisons and contrasts

Post by BillG »

John West wrote:
Simulators, unless you've been very careful with what operations happen on which cycles, are more likely to be misleading in this case.

On the NMOS 6502 and 65C02, JSR works like this:
  1. Fetch opcode
  2. Fetch target address low byte
  3. (internal operation)
  4. Write PC high byte
  5. Write PC low byte
  6. Fetch target address high byte
The low byte of the target address will be fetched correctly, but by the time the CPU fetches the high byte, that byte has been overwritten by the high byte of PC. The target address will be (if I haven't messed up) $0134.
I just modified mine to properly handle this.

Which is the chicken and which is the egg?

The 6502 pushes the return address - 1 with rts correcting it later. Does jsr work this way to save cycles or transistors or both?
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: Comparisons and contrasts

Post by barrym95838 »

BillG wrote:
The 6502 pushes the return address - 1 with rts correcting it later. Does jsr work this way to save cycles or transistors or both?
I'm not really a hardware guy, but my thinking is that it was done this way to keep the internal state machine as small as reasonably possible. If jmp abs takes only three cycles, then jsr abs should theoretically need only five, and rts only three. Theoretically ...
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: Comparisons and contrasts

Post by BigEd »

BillG wrote:
The 6502 pushes the return address - 1 with rts correcting it later. Does jsr work this way to save cycles or transistors or both?
I believe it's to avoid the need for a temporary register. At the time of pushing PC, the value of PC is the one we observe. In order to push the actual return address, JSR would need to be able to read both operand bytes before doing the stack pushes - and there's nowhere on-chip for those bytes to be stored. As it is, the stack register gets briefly reused to store the first operand byte, which is possible because the stack value needs to take a round trip through the ALU to be decremented.

You can see this in visual6502:
http://visual6502.org/JSSim/expert.html ... 5&steps=20
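The "return address minus 1" bookkeeping can be illustrated with a simplified Python model. The addresses ($C000 for the JSR, S = $FF) are arbitrary choices for the example, and the model reads the whole target up front because the stack does not overlap the code here, so the cycle order makes no difference.

```python
mem = bytearray(0x10000)
mem[0xC000:0xC003] = bytes([0x20, 0x34, 0x12])   # hypothetical: jsr $1234 at $C000


def jsr(pc, s):
    """pc is the address of the JSR opcode. Returns (target, new_s)."""
    target = mem[pc + 1] | (mem[pc + 2] << 8)
    pushed = pc + 2                      # PC as it stands when pushed: last byte of the JSR
    mem[0x0100 + s] = pushed >> 8        # push high byte (write, then decrement)
    s = (s - 1) & 0xFF
    mem[0x0100 + s] = pushed & 0xFF      # push low byte
    s = (s - 1) & 0xFF
    return target, s


def rts(s):
    """Returns (new_pc, new_s). Pulls increment S first, then read."""
    s = (s + 1) & 0xFF
    lo = mem[0x0100 + s]
    s = (s + 1) & 0xFF
    hi = mem[0x0100 + s]
    return (((hi << 8) | lo) + 1) & 0xFFFF, s   # RTS adds the missing 1


target, s_after_jsr = jsr(0xC000, 0xFF)
on_stack = mem[0x01FE] | (mem[0x01FF] << 8)
resume, s_after_rts = rts(s_after_jsr)

print(f"target ${target:04X}, stacked ${on_stack:04X}, resume at ${resume:04X}")
```

The stacked value points at the last byte of the JSR itself, and execution resumes one byte later, at the instruction following it.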
litwr
Posts: 188
Joined: 09 Jul 2016

Re: Comparisons and contrasts

Post by litwr »

I have gotten several more results for my Mandelbrot project: the 65816, TMS9995, R800, etc. were added. The 65816 shows approximately twice the performance of the 6502 at the same frequency. My results also show that the 68020 was only ≈30% slower than the ARM on 16-bit calculations.
This benchmark generates pictures, and one might say that they show us Xmas firs in horizontal and other positions. :D
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: Comparisons and contrasts

Post by BillG »

Mike Douglas just posted the following to the FLEX Users Group mailing list:

I have created two versions of the "Byte Sieve" benchmark to run on the SWTPC 6800 and one to run on an 8080 machine (Altair 8800). This benchmark was first published in the Sep 1981 issue of BYTE magazine. The test was designed to allow comparison between different languages/compilers on a given machine, and to compare between machines given the same language/compiler.

For the first 6800 version, I took the C language version of the benchmark and pretended I was a mediocre compiler and generated a 6800 assembly source file. For the second 6800 version, I optimized the code as a good assembly language programmer would do. Here are the results on a SWTPC 6800 computer (1 MHz):

"C" version: 28.25 seconds
Optimized assembly version: 11.85 seconds

As expected, a well written assembly language version is substantially faster than a non-optimized compiled program.

Through a wide variety of tests in the past, I have found the 8080 at 2 MHz and the 6800 at 1 MHz to be closely on par with each other. However, for this Sieve benchmark, which uses a lot of array/indexed operations, the extra 16 bit registers of the 8080 clearly provide an advantage:

Optimized assembly version: 8.18 seconds
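For readers who have not seen the benchmark, here is a Python sketch of the sieve loop as it is commonly circulated. It is not Mike's 6800 or 8080 source; the array size of 8190 and the function name `sieve` are assumptions taken from the usual form of the program. It mainly shows why the inner loop is dominated by indexed array writes.

```python
SIZE = 8190   # assumption: flag array size used in the commonly circulated version


def sieve(size=SIZE):
    """Count the primes found by one pass of the sieve.
    flags[i] stands for the odd number 2*i + 3."""
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            k = i + prime            # index of the next odd multiple of prime
            while k <= size:
                flags[k] = False     # indexed store: the hot spot of the benchmark
                k += prime
            count += 1
    return count


print(sieve())
```

The inner `while` loop is just a 16-bit add, a compare and an indexed store, which fits the observation above that spare 16-bit registers help here.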

You can find the 6800 versions of the Sieve benchmark here:
https://deramp.com/downloads/swtpc/soft ... Benchmark/

The 8080 version is here:
https://deramp.com/downloads/altair/sof ... ellaneous/

Mike D
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: Comparisons and contrasts

Post by BillG »

Previously, I wrote:
BillG wrote:
680x programmers are used to coding BSR rather than JSR to call a subroutine because it is both smaller and faster. On the 6800, if the assembler said that a branch was out of range, you changed it to JMP. On the 6809, it is much easier to add 'L' to make it LBSR, even though it is a cycle slower than JMP. Some programmers may have gotten into the habit of always coding "LBSR" instead of "BSR" or "JMP". I do not believe, as he does, that a majority of programmers strove for position-independent code.

Likewise, it is much easier to change an out-of-range short conditional branch to a long one than to insert an absolute jump and branch around it, as was necessary on the 6800. Long conditional branches are good, and not because they are position independent.
Motorola claimed the 6809 was "source code" compatible with the 6800. That is not entirely true.

There are some obscure differences in the way a few instructions affected the condition codes. Motorola documented those.

What I did not go into with the above quote and Motorola never mentioned in their documentation is that some 6800 instructions grew in size when assembled for the 6809. This may cause marginal relative branch instructions to go "out of range." Maybe the Motorola assembler automatically replaced the branch with a long branch or a jump and conditional branches with the long form or a jump and the opposite branch. Or maybe not. The FLEX 6809 assembler does not and generates an out of range error; the source code must be manually adjusted.
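To make the "marginal branch" problem concrete, here is a small Python sketch. The addresses are invented for the example and `short_branch_fits` is a hypothetical helper, not part of any assembler. It relies only on the fact that a short branch is 2 bytes with a signed 8-bit offset measured from the address of the following instruction.

```python
def short_branch_fits(branch_addr, target_addr, insn_len=2):
    """True if an 8-bit relative branch at branch_addr can reach target_addr."""
    offset = target_addr - (branch_addr + insn_len)
    return -128 <= offset <= 127


BRANCH = 0x1000                      # hypothetical address of the branch
TARGET_6800 = BRANCH + 2 + 127       # target at the very edge of forward range

# Assembled for the 6800 the branch just reaches its target...
fits_6800 = short_branch_fits(BRANCH, TARGET_6800)

# ...but if one instruction between the branch and the target grows by a
# single byte when assembled for the 6809, the target moves out of range.
fits_6809 = short_branch_fits(BRANCH, TARGET_6800 + 1)

print(fits_6800, fits_6809)
```

An assembler that does not promote the branch automatically has to report the second case as an error.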