6502 redundant, missed, and suggested features

Post by **Bregalad** » Fri Aug 19, 2016 7:54 pm

Quote:

Not to start an argument, but both of you gentlemen are suffering from a bit of code myopia.

If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:

Indeed, I was just talking about the original 6502, and whether the JMP () wrap-around bug/feature was really "a bug". I know squat about the enhancements, so I cannot comment.

Post by **BigDumbDinosaur** » Fri Aug 19, 2016 10:03 pm

Bregalad wrote:

Quote:

Not to start an argument, but both of you gentlemen are suffering from a bit of code myopia.

If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:

Indeed, I was just talking about the original 6502, and whether the JMP () wrap-around bug/feature was really "a bug".

JMP (<addr>) itself is not defective. It becomes defective in the NMOS parts if the jump vector address (<addr>) straddles a page boundary.
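To make the page-boundary case concrete, here is a small Python model of where each CPU family fetches the high byte of the JMP (<addr>) vector. It is a sketch, not cycle-accurate; the function name and the example addresses are illustrative assumptions.

```python
def jmp_indirect_target(mem: bytearray, ptr: int, cmos: bool) -> int:
    """Return the address that JMP (ptr) loads into the program counter."""
    lo = mem[ptr]
    if cmos:
        # 65C02: the whole 16-bit pointer is incremented.
        hi_addr = (ptr + 1) & 0xFFFF
    else:
        # NMOS 6502: only the low byte is incremented, so $xxFF wraps to $xx00.
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0xFF)
    return (mem[hi_addr] << 8) | lo


mem = bytearray(0x10000)
mem[0x30FF], mem[0x3100], mem[0x3000] = 0x80, 0x50, 0x40   # vector straddles $30FF/$3100
print(hex(jmp_indirect_target(mem, 0x30FF, cmos=False)))   # 0x4080 (high byte from $3000)
print(hex(jmp_indirect_target(mem, 0x30FF, cmos=True)))    # 0x5080 (high byte from $3100)
```

A vector that does not sit at $xxFF, such as one at $3010, gives the same target on both families.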

Quote:

I know squat about the enhancements so I cannot comment.

Time to take a look at the CMOS version.

Post by **BigEd** » Sat Aug 20, 2016 7:00 am

It's important to note that litwr's aim is clearly to write the highest performance code and to write code for both the 6502 and 65C02. I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch, and I can't see a faster way of doing it, if every cycle counts. It's a small piece of self-modifying code.

The nice thing is that the 256 byte table needed for the 6502 and the one needed for the 65C02 overlap, so that a 257 byte table will, I think, work portably on either CPU.

Code:

   lda INDEX
   and #1
   bne l_odd

   sta m_even+1
m_even
   jmp (even_table)
l_odd
   sta m_odd+1
m_odd
   jmp (odd_table)

align 256
even_table .word ...   ;256 bytes
odd_table .word ...    ;256 bytes
.byte ... ; 1 extra byte for 65C02

I hope I have that right. It's a bit tricky to think about.
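A quick way to convince oneself that the overlapping tables work is a small Python model. This is a sketch under stated assumptions: the table and handler addresses and the helper names are hypothetical, and it models the intended behaviour, where the whole index is stored into the low byte of the JMP operand. Entry i starts at table + i, so the odd entries sit at odd offsets, and index 255's vector is the only one that straddles a page boundary.

```python
EVEN_TABLE = 0x1000  # hypothetical page-aligned address of even_table
ODD_TABLE = 0x1100   # odd_table on the following page


def jmp_indirect_target(mem, ptr, cmos):
    """Address loaded into the PC by JMP (ptr) on NMOS (cmos=False) or 65C02."""
    if cmos:
        hi_addr = (ptr + 1) & 0xFFFF                    # full 16-bit increment
    else:
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0xFF)   # low byte wraps in-page
    return (mem[hi_addr] << 8) | mem[ptr]


def build_tables(handlers):
    """Lay out 256 handler addresses: entry i starts at table + i."""
    mem = bytearray(0x10000)
    for i, addr in enumerate(handlers):
        base = EVEN_TABLE if i % 2 == 0 else ODD_TABLE
        mem[base + i] = addr & 0xFF       # low byte
        mem[base + i + 1] = addr >> 8     # high byte (ODD_TABLE + 256 for i = 255)
    # NMOS reads index 255's high byte from ODD_TABLE + 0 instead, so duplicate it.
    mem[ODD_TABLE] = handlers[255] >> 8
    return mem


def dispatch(mem, index, cmos):
    """Model the patched JMP: the operand's low byte is the whole index."""
    table = EVEN_TABLE if index % 2 == 0 else ODD_TABLE
    return jmp_indirect_target(mem, table + index, cmos)


handlers = [0x8000 + 3 * i for i in range(256)]   # made-up handler addresses
mem = build_tables(handlers)
assert all(dispatch(mem, i, cmos) == handlers[i]
           for i in range(256) for cmos in (False, True))
```

Clearing either copy of index 255's high byte (ODD_TABLE + 0 or ODD_TABLE + 256) breaks that entry on exactly one of the two CPU families, which is why the 257th byte is needed for a portable table.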

When Dave and I were doing the upgrade of Arlet's core from 6502 to 65C02, we noticed that he uses a very nice trick for the indirect JMP. His core, like the 6502, has only a byte-wide ALU, and on the face of it, a 16-bit address must be computed and incremented to fetch the two bytes of the destination address. With the byte-wide ALU, that would mean taking an extra cycle if the page boundary is crossed. But the PC is capable of incrementing the full 16 bit value in one cycle - so what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP. It's a bit like doing half a JMP to update the PC and then another half a JMP to load the destination address. The PC does the incrementing, ignoring the page boundary issue. It sort of works like this:

Code:

JMP ghostly_jmp ; actually a JMP (far)

; ... lots more code, possibly ...

ghostly_jmp:
    .byte $4C ; JMP destination
far:
    .word destination

Here's the clue, in the comments:

Code:

    JMP0   = 6'd22, // JMP     - fetch PCL and hold
    JMP1   = 6'd23, // JMP     - fetch PCH
    JMPI0  = 6'd24, // JMP IND - fetch LSB and send to ALU for delay (+0)
    JMPI1  = 6'd25, // JMP IND - fetch MSB, proceed with JMP0 state

(The decimal numbers there are just arbitrary state labels.)

We kept that mechanism and added three more states to perform the indexed indirect JMP:

Code:

    JMPIX0 = 6'd51, // JMP (,X)- fetch LSB and send to ALU (+X)
    JMPIX1 = 6'd52, // JMP (,X)- fetch MSB and send to ALU (+Carry)
    JMPIX2 = 6'd53; // JMP (,X)- Wait for ALU (only if needed)

and here's the optional extra cycle:

Code:

        JMPIX0  : state <= JMPIX1;
        JMPIX1  : state <= CO ? JMPIX2 : JMP0;
        JMPIX2  : state <= JMP0;

As is usually the case, Dave had a burst of productivity and gets the credit for the implementation!
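The transitions above can be mimicked in a few lines of Python to show when the optional cycle appears. This is a sketch that assumes CO is the carry out of adding X to the operand's low byte, and that JMP0 is always followed by JMP1, as the state comments suggest; the function name is hypothetical and the code is not taken from the core itself.

```python
def jmp_abs_x_states(operand_lo: int, x: int) -> list:
    """States visited after the opcode fetch for JMP (abs,X) in the modified core."""
    states = ["JMPIX0", "JMPIX1"]
    carry_out = operand_lo + x > 0xFF   # assumed meaning of CO
    if carry_out:
        states.append("JMPIX2")         # extra cycle to propagate the carry
    states += ["JMP0", "JMP1"]          # then fetch the target like a plain JMP
    return states


print(jmp_abs_x_states(0x10, 0x04))   # no carry: 4 states
print(jmp_abs_x_states(0xFE, 0x04))   # carry: 5 states, including JMPIX2
```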

Post by **Arlet** » Sat Aug 20, 2016 7:32 am

Quote:

When Dave and I were doing the upgrade of Arlet's core from 6502 to 65C02, we noticed that he uses a very nice trick for the indirect JMP. His core, like the 6502, has only a byte-wide ALU, and on the face of it, a 16-bit address must be computed and incremented to fetch the two bytes of the destination address. With the byte-wide ALU, that would mean taking an extra cycle if the page boundary is crossed. But the PC is capable of incrementing the full 16 bit value in one cycle - so what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP

The core does something very similar with vector fetches (interrupt/reset/BRK) by pretending there's a JMP just before the vector.

Post by **Dr Jefyll** » Sat Aug 20, 2016 1:36 pm

BigEd wrote:

I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch

I didn't grasp it either. Now I've had a second look, and yes there's a tricky bit in the latter portion, which is somewhat distracting. What's perhaps easy to overlook is a typo/error in the first portion:

Code:

   lda INDEX
   and #1

Testing bit 0 of the index has destroyed bits 7-1 of the index! Edit: this has already been noted. I'm guilty of not having thoroughly read all the preceding posts.

Klaus2m5 wrote:

Another "lda INDEX" is required, or you use only the first location of the tables because of the "and #1".

We need to test all 8 bits of the index if we're to have a 256-way branch. One way would be to make changes as follows:

Code:

   lda INDEX
   ror a       ; was: and #1
   bcs l_odd   ; was: bne

   rol a       ; added
   sta m_even+1
m_even
   jmp (even_table)

l_odd
   rol a       ; added
   sta m_odd+1
m_odd
   jmp (odd_table)

align 256
even_table .word ...   ;256 bytes
odd_table .word ...    ;256 bytes

The code will run as intended on an NMOS CPU, but a CMOS CPU will fail, which, I now see, is the point litwr was making: for index 255 the odd_table vector sits at $xxFF and straddles the page boundary. The fix, as Ed suggested, is to add a 257th byte to the odd table. (Note that bit 0 of the index never gets masked off. This affects the odd_table references only.)

But there's another solution. If the rol instructions I added are changed to asl, the problem goes away, and the code will work on NMOS or CMOS. Does that seem right? Did I miss anything?
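Jeff's two variants can be compared with a small Python model of the accumulator and carry. It is a sketch with hypothetical function names; with asl, the odd path's operand is the index with bit 0 cleared, so each odd_table entry would have to start at odd_table + (index - 1), that is, both tables would share the same layout.

```python
def odd_even_operand(index: int, restore: str, carry_in: int = 0):
    """Return (branch_taken, operand_low_byte) for ror / bcs / (rol or asl)."""
    a = (carry_in << 7) | (index >> 1)    # ror a: old carry into bit 7
    c = index & 1                         # bit 0 into carry; bcs l_odd if set
    if restore == "rol":
        a = ((a << 1) | c) & 0xFF         # rol a: restores the original index
    elif restore == "asl":
        a = (a << 1) & 0xFF               # asl a: index with bit 0 cleared
    else:
        raise ValueError(restore)
    return bool(c), a


def straddling(restore: str) -> list:
    """Indices whose vector would start at $xxFF, i.e. straddle a page boundary."""
    return sorted({i for i in range(256) for cin in (0, 1)
                   if odd_even_operand(i, restore, cin)[1] == 0xFF})


print(straddling("rol"))   # [255]
print(straddling("asl"))   # []
```

Because the asl version never produces an odd operand, no vector ever starts at $xxFF, so the NMOS/CMOS difference cannot arise.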

Quote:

what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP

FWIW, and in a wonkier context, my KK Computer uses the same trick, but implements it with external logic, since KK has an actual, physical 'C02. The goal is a single KK instruction that performs the double-indirect NEXT used by ITC Forth. The external logic fakes a $4C and forces IP low, then IP high, onto the bus as its operand; then it fakes a $6C and forces W low, then W high, onto the bus as its operand.

-- Jeff

Post by **BigEd** » Sat Aug 20, 2016 1:41 pm

Dr Jefyll wrote:

BigEd wrote:

I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch

I didn't grasp it either. Now I've had a second look, and yes there's a tricky bit in the latter portion, which is somewhat distracting. What's perhaps easy to overlook is a typo/error in the first portion:

Code:

   lda INDEX
   and #1

Testing bit 0 of the index has destroyed bits 7-1 of the index!

I did notice that, but forgot it. You'll notice that in my failed first attempt to redo the example I tried to restore A (but then got confused about using X and the indexed indirect jump. It was the middle of the night!)

I admit, I had the thought that BIT #1 would be the fix for AND #1 but now realise, sadly, we don't get that on NMOS. Having a constant one in a zp location would do the trick, I think?

Code:

bit one      ; 'one' is a zero-page byte holding $01: Z set if bit 0 of A is 0, A unchanged
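For anyone unfamiliar with BIT, here is a minimal Python model of the flags set by its memory-operand forms, showing why a zero-page byte holding $01 (the `one` location in the snippet) tests bit 0 of A without changing A. The function name is an illustrative assumption; the 65C02's BIT #imm form differs in that it sets only Z.

```python
def bit_flags(a: int, m: int) -> dict:
    """Flags set by BIT with a memory operand; A itself is not modified."""
    return {
        "Z": (a & m) == 0,     # Z is set when A AND M is zero
        "N": bool(m & 0x80),   # N is copied from bit 7 of the memory byte
        "V": bool(m & 0x40),   # V is copied from bit 6 of the memory byte
    }


ONE = 0x01                     # contents of the zero-page location `one`
print(bit_flags(0x37, ONE))    # Z False: index is odd, so BNE would branch
print(bit_flags(0x36, ONE))    # Z True: index is even
```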

Post by **BigEd** » Sat Aug 20, 2016 2:46 pm

With some prompting from Klaus, I think I've finally understood what he's done in his own revisit of this 256-way branch code - and it does seem right, of course. He picks off the top bit, instead of the bottom bit, and that's a better plan. See his post at
viewtopic.php?f=2&t=4220&start=15#p46849

Edit: typo!

Post by **Bregalad** » Sat Aug 20, 2016 3:20 pm

So, did MOS document this bug/feature in the original datasheet/documentation of their processor? If yes, it's a feature; if no, it's a bug. I think the ROR bug in the earliest 6502s is clearly regarded as a "bug", because it was an actual mistake in the processor and was never intended to be there. If JMP () wrap-around is the same, then it's a bug. But if the manufacturer's documentation clearly states that JMP () will wrap around, it's a feature.

Quote:

JMP (<addr>) itself is not defective. It becomes defective in the NMOS parts if the jump vector address (<addr>) straddles a page boundary.

What's even the point of replying to me to state the obvious? Of course I say "JMP () wrap-around" as shorthand for "JMP () wrap-around when the low byte of the vector address is $FF", to keep my point short and concise. My efforts to do so are now ruined, thank you.

Post by **BigEd** » Sat Aug 20, 2016 3:37 pm

Good idea. I checked the datasheets (MOS, Rockwell) for their description of what they call Absolute Indexed addressing. They say that the final byte fetched is the next byte, with no mention of page wraparound or of pages, unlike their descriptions of the other modes. So this is confirmation that JMP (abs,x) was always intended to work the way it eventually did work, on the C02 parts.

Post by **Bregalad** » Sat Aug 20, 2016 3:52 pm

In the MOS document you linked to, page 6, under "Absolute indirect" (and NOT absolute indexed!), it says:

Quote:

The next memory location contains the high order byte of the effective address which is loaded into the sixteen bits of the program counter.

This makes it very clear it is not supposed to wrap around, and if this document is considered a reference, the JMP () wrap-around is a bug. Now the question is whether it is acceptable to use this document as a reference for the original 6502.

I'll also add that wrap-around within zero-page addressing is clearly documented, since the document states that it always addresses page zero. So nobody should expect e.g. lda $ff,X to load from page one if X is nonzero and the zero-page variant of the instruction is used.
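The difference between the zero-page and absolute indexed forms comes down to where the carry from the addition goes. A minimal Python sketch of the effective-address calculation (function names are illustrative, and cycle counts are ignored):

```python
def zp_indexed(zp: int, x: int) -> int:
    """Effective address for zero-page indexed mode, e.g. LDA $FF,X."""
    return (zp + x) & 0xFF          # carry is discarded: stays in page zero


def abs_indexed(addr: int, x: int) -> int:
    """Effective address for absolute indexed mode, e.g. LDA $00FF,X."""
    return (addr + x) & 0xFFFF      # carry propagates into the high byte


print(hex(zp_indexed(0xFF, 0x01)))   # 0x0 -> $0000, still page zero
print(hex(abs_indexed(0xFF, 0x01)))  # 0x100 -> $0100, page one
```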

I'm also surprised they mention 65 kilobytes of memory... until I remembered the KB vs KiB difference. I didn't expect this memory-inflated-using-the-wrong-base bull**** to be present back in the day already. It makes especially little sense in this context.
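For reference, the arithmetic behind the two figures for a 16-bit address space:

```python
address_space = 2 ** 16        # a 16-bit address bus reaches 65536 bytes
print(address_space / 1000)    # 65.536 -> "65 kilobytes" with decimal kB
print(address_space / 1024)    # 64.0   -> 64 KiB
```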

Post by **BigEd** » Sat Aug 20, 2016 4:08 pm

Quote:

This makes it very clear it is not supposed to wrap around

I think we may be in violent agreement!

I'm sure the 65 kilobytes is some faint-hearted person being unable to reconcile '64k' with 65536 - shall we suppose a marketing person?

Post by **Arlet** » Sat Aug 20, 2016 4:21 pm

On the other hand, the MOS MCS6500 Microcomputer Family Programming Manual (January 1976) contains a cycle description (page 141) that shows the wrap around as documented:

Post by **Dr Jefyll** » Sat Aug 20, 2016 4:45 pm

Arlet wrote:

the MOS MCS6500 Microcomputer Family Programming Manual (January 1976) [...] shows the wrap around as documented:

And yet, section 9.8.1 (page 141) of the same manual documents the opposite.

"In the JMP Indirect instruction, the second and third bytes of the instruction represent the indirect low and high bytes respectively of the memory location containing ADL. Once ADL is fetched, the program counter is incremented with the next location containing ADH."

Post by **BigEd** » Sat Aug 20, 2016 4:49 pm

Yes, many datasheets are either wrong or inconsistent - and therefore wrong! I'd take the table that Arlet notes as an extremely subtle admission of the bug! It seems pretty unwise to try to document internal operations in such a detailed fashion. Even if everything is correct, it makes it harder to make subsequent improvements. Having undocumented behaviour can be a feature.

I think everyone here who's spoken up, with the exception of litwr himself, regards that wrap-around as a bug. (But I'm not sure if it's even interesting to keep arguing about what we call something. Much more interesting to talk about how things work and how to program them.)

Post by **White Flame** » Sat Aug 20, 2016 5:05 pm

A while ago I compiled a bunch of fast dispatch ideas on codebase64. The full 256-way dispatch I use follows. I don't quite understand why even/odd comes into play, given that if the dispatch table is page-aligned, you'll never be JMPing through $xxFF anyway.

Code:

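  ; On entry: A = dispatch index 0-255
  asl              ; A = low byte of index*2, C = bit 7 of index
  bcs :++          ; index 128-255: use the second page of the table

  ; Dispatch 0-127
  sta :+ +1        ; patch the low byte of the JMP operand
: jmp (table)

  ; Dispatch 128-255
: sta :+ +1
: jmp (table + $0100)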

.align 256 
table: .word handler0, handler1, ..., handler127, handler128, ...

[Edit: Sorry, yeah, this has been covered. Flipped through the thread too fast; I don't visit often enough.]
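White Flame's observation can be checked with a short Python model of the asl/bcs dispatch. The TABLE address and the function name are hypothetical; the model only tracks which vector address the patched JMP reads.

```python
TABLE = 0x2000   # hypothetical page-aligned address of `table`


def dispatch_vector(index: int) -> int:
    """Address of the vector that the asl/bcs dispatch jumps through."""
    a = (index << 1) & 0xFF                  # asl: low byte of index * 2
    carry = index >> 7                       # asl: old bit 7 lands in carry
    page = TABLE + (0x100 if carry else 0)   # bcs :++ selects jmp (table + $0100)
    return page + a                          # sta :+ +1 patched the operand's low byte


assert all(dispatch_vector(i) == TABLE + 2 * i for i in range(256))
assert all(dispatch_vector(i) & 0xFF != 0xFF for i in range(256))   # never $xxFF
```

Since every operand low byte is even, no vector starts at $xxFF, so NMOS and CMOS parts behave identically.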
