
All times are UTC




PostPosted: Fri Aug 19, 2016 7:54 pm 

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Quote:
Not to start an argument, but both of you gentlemen are suffering from a bit of code myopia. :lol:

If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:

Indeed, I was just talking about the original 6502, and whether the jmp () wrap bug/feature was really "a bug". I know squat about the enhancements so I cannot comment.


PostPosted: Fri Aug 19, 2016 10:03 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8507
Location: Midwestern USA
Bregalad wrote:
Quote:
Not to start an argument, but both of you gentlemen are suffering from a bit of code myopia. :lol:

If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:

Indeed, I was just talking about the original 6502, and whether the jmp () wrap bug/feature was really "a bug".

JMP (<addr>) itself is not defective. It becomes defective in the NMOS parts if the jump vector address (<addr>) straddles a page boundary.
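A minimal Python sketch of that difference, not taken from any post in the thread. The function name `indirect_jmp_target` and the addresses are made up for illustration; the only assumption is the commonly described behaviour: the NMOS part increments just the low byte of the vector address, the CMOS part increments all 16 bits.

```python
def indirect_jmp_target(mem, ptr, cmos):
    """Return the 16-bit target of JMP (ptr) on a modelled NMOS or CMOS part."""
    lo = mem[ptr]
    if cmos:
        # 65C02: full 16-bit increment of the vector address
        hi_addr = (ptr + 1) & 0xFFFF
    else:
        # NMOS 6502: only the low byte increments, the page stays the same
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    return (mem[hi_addr] << 8) | lo

mem = bytearray(0x10000)
mem[0x02FF] = 0x34   # low byte of the vector, last byte of page $02
mem[0x0300] = 0x12   # high byte the programmer intended
mem[0x0200] = 0x56   # byte the NMOS part reads instead

nmos_target = indirect_jmp_target(mem, 0x02FF, cmos=False)
cmos_target = indirect_jmp_target(mem, 0x02FF, cmos=True)
print(hex(nmos_target), hex(cmos_target))
```

When the vector does not end on $xxFF, both branches compute the same address, which is why the instruction is fine in the ordinary case.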

Quote:
I know squat about the enhancements so I cannot comment.

Time to take a look at the CMOS version. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sat Aug 20, 2016 7:00 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
It's important to note that litwr's aim is clearly to write the highest performance code and to write code for both the 6502 and 65C02. I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch, and I can't see a faster way of doing it, if every cycle counts. It's a small piece of self-modifying code.

The nice thing is that the 256 byte table needed for the 6502 and the one needed for the 65C02 overlap, so that a 257 byte table will, I think, work portably on either CPU.
Code:
   lda INDEX
   and #1
   bne l_odd

   sta m_even+1
m_even
   jmp (even_table)
l_odd
   sta m_odd+1
m_odd
   jmp (odd_table)

align 256
even_table .word ...   ;256 bytes
odd_table .word ...    ;256 bytes
.byte ... ; 1 extra byte for 65C02

I hope I have that right. It's a bit tricky to think about.

When Dave and I were doing the upgrade of Arlet's core from 6502 to 65C02, we noticed that he uses a very nice trick for the indirect JMP. His core, like the 6502, has only a byte-wide ALU, and on the face of it, a 16-bit address must be computed and incremented to fetch the two bytes of the destination address. With the byte-wide ALU, that would mean taking an extra cycle if the page boundary is crossed. But the PC is capable of incrementing the full 16 bit value in one cycle - so what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP. It's a bit like doing half a JMP to update the PC and then another half a JMP to load the destination address. The PC does the incrementing, ignoring the page boundary issue. It sort of works like this:

Code:
JMP ghostly_jmp ; actually a JMP (far)

; ... lots more code, possibly ...

ghostly_jmp:
    .byte $4C ; JMP destination
far:
    .word destination


Here's the clue, in the comments:
Code:
    JMP0   = 6'd22, // JMP     - fetch PCL and hold
    JMP1   = 6'd23, // JMP     - fetch PCH
    JMPI0  = 6'd24, // JMP IND - fetch LSB and send to ALU for delay (+0)
    JMPI1  = 6'd25, // JMP IND - fetch MSB, proceed with JMP0 state

(The decimal numbers there are just arbitrary state labels.)

We kept that mechanism and added three more states to perform the indexed indirect JMP:
Code:
    JMPIX0 = 6'd51, // JMP (,X)- fetch LSB and send to ALU (+X)
    JMPIX1 = 6'd52, // JMP (,X)- fetch MSB and send to ALU (+Carry)
    JMPIX2 = 6'd53; // JMP (,X)- Wait for ALU (only if needed)

and here's the optional extra cycle:
Code:
        JMPIX0  : state <= JMPIX1;
        JMPIX1  : state <= CO ? JMPIX2 : JMP0;
        JMPIX2  : state <= JMP0;
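The three transitions can be modelled in a few lines of Python. This is only a sketch: the state names come from the Verilog above, but `next_state` and `states_visited` are hypothetical helpers, and it assumes `CO` is the carry out of the low byte + X addition, which is how the comments read.

```python
def next_state(state, carry_out):
    """Next-state function for the three JMP (abs,X) states shown above."""
    if state == "JMPIX0":
        return "JMPIX1"
    if state == "JMPIX1":
        # Extra cycle only when the low-byte addition carried
        return "JMPIX2" if carry_out else "JMP0"
    if state == "JMPIX2":
        return "JMP0"
    raise ValueError(f"not a JMPIX state: {state}")

def states_visited(carry_out):
    """List the states from JMPIX0 until control reaches the shared JMP0 state."""
    state = "JMPIX0"
    trace = [state]
    while state != "JMP0":
        state = next_state(state, carry_out)
        trace.append(state)
    return trace

print(states_visited(carry_out=False))
print(states_visited(carry_out=True))
```

In this model the carry case visits one more state than the no-carry case, which corresponds to the optional extra cycle.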

As is usually the case, Dave had a burst of productivity and gets the credit for the implementation!


PostPosted: Sat Aug 20, 2016 7:32 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
When Dave and I were doing the upgrade of Arlet's core from 6502 to 65C02, we noticed that he uses a very nice trick for the indirect JMP. His core, like the 6502, has only a byte-wide ALU, and on the face of it, a 16-bit address must be computed and incremented to fetch the two bytes of the destination address. With the byte-wide ALU, that would mean taking an extra cycle if the page boundary is crossed. But the PC is capable of incrementing the full 16 bit value in one cycle - so what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP

The core does something very similar with vector fetches (interrupt/reset/BRK) by pretending there's a JMP just before the vector.


PostPosted: Sat Aug 20, 2016 1:36 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigEd wrote:
I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch
I didn't grasp it either. Now I've had a second look, and yes there's a tricky bit in the latter portion, which is somewhat distracting. What's perhaps easy to overlook is a typo/error in the first portion:
Code:
   lda INDEX
   and #1
Testing bit 0 of the index has destroyed bits 7-1 of the index! Edit: this has already been noted. I'm guilty of not having thoroughly read all the preceding posts.
Klaus2m5 wrote:
Another "lda INDEX" is required, or you use only the first location of the tables because of the "and #1".


We need to test all 8 bits of the index if we're to have a 256-way branch. One way would be to make changes as follows:
Code:
   lda INDEX
   ror a       ; was: and #1 (bit 0 of the index now sits in carry)
   bcs l_odd   ; was: bne

   rol a       ; added: restores the full index in A
   sta m_even+1
m_even
   jmp (even_table)

l_odd
   rol a       ; added: restores the full index in A
   sta m_odd+1
m_odd
   jmp (odd_table)

align 256
even_table .word ...   ;256 bytes
odd_table .word ...    ;256 bytes
The code will run as intended on an NMOS CPU, but a CMOS CPU will fail -- which, I now see, is the point litwr was making. The fix, as Ed suggested, is to add a 257th byte to the odd table. (Note that bit 0 of the index never gets masked off. This affects the odd_table references only.)

But there's another solution. If the rol's I added get changed to asl's then the problem goes away -- the code will work on NMOS or CMOS. Does that seem right? Did I miss anything? :roll:
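One way to check the rol versus asl question is to model the pointer arithmetic in Python. This is a sketch under stated assumptions: the table address $4000 and all function names are invented here, and the two CPUs are assumed to differ only in how the high byte of the vector is addressed (same page on NMOS, full 16-bit increment on CMOS).

```python
def ror(a, carry):
    """Rotate right through carry: returns (new A, new carry)."""
    return ((carry << 7) | (a >> 1)) & 0xFF, a & 1

def rol(a, carry):
    """Rotate left through carry: returns (new A, new carry)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1

def asl(a, carry):
    """Shift left, zero into bit 0 (incoming carry is ignored)."""
    return (a << 1) & 0xFF, (a >> 7) & 1

def high_byte_address(ptr, cmos):
    """Address the high byte of the vector at ptr is fetched from."""
    if cmos:
        return (ptr + 1) & 0xFFFF
    return (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)

EVEN_TABLE = 0x4000               # hypothetical page-aligned address
ODD_TABLE = EVEN_TABLE + 0x100

def vector_pointer(index, restore):
    """Operand patched into the jmp () for a given index."""
    a, carry = ror(index, 0)      # carry now holds bit 0 of the index
    table = ODD_TABLE if carry else EVEN_TABLE   # bcs l_odd
    a, carry = restore(a, carry)  # rol a or asl a
    return table + a

def cpu_dependent_indexes(restore):
    """Indexes whose vector fetch differs between the NMOS and CMOS models."""
    return [i for i in range(256)
            if high_byte_address(vector_pointer(i, restore), cmos=False)
            != high_byte_address(vector_pointer(i, restore), cmos=True)]

print(cpu_dependent_indexes(rol))
print(cpu_dependent_indexes(asl))
```

In this model only index 255 behaves differently with rol, and no index does with asl. Note that with asl the odd entries sit at even offsets in odd_table, so the table layout is not the same as in the rol version.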

Quote:
what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP
FWIW -- and in a wonkier context -- my KK Computer uses the same trick, but using external logic since KK has an actual, physical 'C02. The goal is a single KK instruction that performs double-indirect NEXT, as used by ITC Forth. The external logic fakes a $4C, forces IP low then IP high on the bus as its operand; then it fakes a $6C and forces W low then W high on the bus as its operand.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Sat Aug 20, 2016 2:15 pm, edited 2 times in total.

PostPosted: Sat Aug 20, 2016 1:41 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Dr Jefyll wrote:
BigEd wrote:
I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch
I didn't grasp it either. Now I've had a second look, and yes there's a tricky bit in the latter portion, which is somewhat distracting. What's perhaps easy to overlook is a typo/error in the first portion:
Code:
   lda INDEX
   and #1
Testing bit 0 of the index has destroyed bits 7-1 of the index!

I did notice that, but forgot it. You'll notice that in my failed first attempt to redo the example I tried to restore A (but then got confused about using X and the indexed indirect jump. It was the middle of the night!)

I admit, I had the thought that BIT #1 would be the fix for AND #1 but now realise, sadly, we don't get that on NMOS. Having a constant one in a zp location would do the trick, I think?
Code:
bit one
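A small Python model of why BIT against a constant helps where AND does not. The names `and_immediate`, `bit_memory` and `ONE` are illustrative, and the model tracks only A and the Z flag (BIT also copies bits 7 and 6 of the memory operand into N and V, which is ignored here).

```python
def and_immediate(a, operand):
    """AND #operand: A is overwritten with the result. Returns (A, Z)."""
    result = a & operand
    return result, result == 0

def bit_memory(a, memory_byte):
    """BIT mem: Z reflects A AND mem, but A is left untouched. Returns (A, Z)."""
    return a, (a & memory_byte) == 0

ONE = 0x01     # stands in for a zero-page location holding the constant 1
index = 0xA7   # an odd index

print(and_immediate(index, 0x01))   # A is reduced to bit 0
print(bit_memory(index, ONE))       # A still holds the full index
```

Both forms leave Z clear for an odd index, so a following bne would branch the same way; only the BIT form keeps the index in A for the sta that patches the jump.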


PostPosted: Sat Aug 20, 2016 2:46 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
With some prompting from Klaus, I think I've finally understood what he's done in his own revisit of this 256-way branch code - and it does seem right, of course. He picks off the top bit, instead of the bottom bit, and that's a better plan. See his post at
viewtopic.php?f=2&t=4220&start=15#p46849

Edit: typo!


Last edited by BigEd on Sat Aug 20, 2016 4:43 pm, edited 1 time in total.

PostPosted: Sat Aug 20, 2016 3:20 pm 

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
So, did MOS document this bug/feature in their original datasheet/documentation of their processor? If yes, it's a feature; if no, it's a bug. I think the ROR bug that was present in older 6502s is clearly referred to as a "bug", because it was an actual mistake in the processor and was never intended to be there. If JMP () wrap-around is the same, then it's a bug. But if it is clearly stated that JMP () will wrap around in the manufacturer documentation, it's a feature.

Quote:
JMP (<addr>) itself is not defective. It becomes defective in the NMOS parts if the jump vector address (<addr>) straddles a page boundary.

What's even the point of replying to me to state the obvious? Of course I say "jmp () wrap-around" to make it shorter than saying "jmp () wrap-around when the lower byte is $ff", in order to keep my point short and concise. My efforts to do so are now ruined, thank you.


PostPosted: Sat Aug 20, 2016 3:37 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Good idea. I checked the datasheets (MOS, Rockwell) for their description of what they call Absolute Indexed addressing. They say that the final byte fetched is the next byte - no mention of page wraparound or of pages, unlike their descriptions of the other modes. So this is confirmation that JMP (abs,x) was always intended to work the way it eventually did work, on the C02 parts.


PostPosted: Sat Aug 20, 2016 3:52 pm 

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
In the MOS document you linked to, page 6 under "Absolute indirect" (and NOT absolute indexed!) it says:
Quote:
The next memory location contains the high order byte of the effective address which is loaded into the sixteen bits of the program counter.

This makes it very clear it is not supposed to wrap around, and if this document is considered as a reference, the jmp () wrap-around is a bug. Now the question is whether it is acceptable to use these as an original 6502 reference.

I'll also add that wrap-around within zero-page addressing is clearly documented, since it's mentioned that it always addresses page zero. So nobody should expect e.g. lda $ff,X to load from page one if X is nonzero and the ZP variant of the instruction is used.
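The zero-page case comes down to one line of address arithmetic. A short Python sketch, with function names invented for illustration, assuming the zero-page form keeps only the low 8 bits of base + X while the absolute form keeps all 16:

```python
def zp_indexed_address(base, x):
    """Effective address of lda base,X when the zero-page opcode is used."""
    return (base + x) & 0xFF      # the sum never leaves page zero

def abs_indexed_address(base, x):
    """Effective address of lda base,X when the absolute opcode is used."""
    return (base + x) & 0xFFFF    # the carry propagates into the high byte

print(hex(zp_indexed_address(0xFF, 1)))      # wraps back to the start of page zero
print(hex(abs_indexed_address(0x00FF, 1)))   # crosses into page one
```

So the same source line can reach two different addresses depending on which opcode the assembler picks.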

I'm also surprised they mention 65 kilobytes of memory... until I remembered the KB vs KiB difference. I didn't expect this memory-inflated-using-the-wrong-base bull**** to be already present back in the day. It makes especially no sense in this context.


PostPosted: Sat Aug 20, 2016 4:08 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
> This makes it very clear it is not supposed to wrap around
I think we may be in violent agreement!

I'm sure the 65 kilobytes is some faint-hearted person being unable to reconcile '64k' with 65536 - shall we suppose a marketing person?


Last edited by BigEd on Sat Aug 20, 2016 4:43 pm, edited 1 time in total.

PostPosted: Sat Aug 20, 2016 4:21 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
On the other hand, the MOS MCS6500 Microcomputer Family Programming Manual (January 1976) contains a cycle description (page 141) that shows the wrap around as documented:


Attachment: mos-6500.png
PostPosted: Sat Aug 20, 2016 4:45 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Arlet wrote:
the MOS MCS6500 Microcomputer Family Programming Manual (January 1976) [...] shows the wrap around as documented:
And yet, section 9.8.1 (page 141) of the same manual documents the opposite.
    "In the JMP Indirect instruction, the second and third bytes of the instruction represent the indirect low and high bytes respectively of the memory location containing ADL. Once ADL is fetched, the program counter is incremented with the next location containing ADH."



PostPosted: Sat Aug 20, 2016 4:49 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Yes, many datasheets are either wrong or inconsistent - and therefore wrong! I'd take the table that Arlet notes as an extremely subtle admission of the bug! It seems pretty unwise to try to document internal operations in such a detailed fashion. Even if everything is correct, it makes it harder to make subsequent improvements. Having undocumented behaviour can be a feature.

I think everyone here who's spoken up, with the exception of litwr himself, regards that wrap-around as a bug. (But I'm not sure if it's even interesting to keep arguing about what we call something. Much more interesting to talk about how things work and how to program them.)


PostPosted: Sat Aug 20, 2016 5:05 pm 

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
A while ago I compiled a bunch of fast dispatch ideas to codebase64. The full 256-way dispatch I use follows. I don't quite understand why even/odd comes into play, given that if the dispatch table is page-aligned, you'll never be JMPing through $xxFF anyway.

Code:
  asl
  bcs :++

  ; Dispatch 0-127
  sta :+ +1
: jmp (table)

  ; Dispatch 128-255
: sta :+ +1
: jmp (table + $0100)

.align 256
table: .word handler0, handler1, ..., handler127, handler128, ...
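The point about never jumping through $xxFF can be checked with a quick Python model of the pointer this dispatch builds. It is a sketch only: `dispatch_pointer` and the table address $6000 are hypothetical, and it assumes A holds the index on entry and the table is page-aligned.

```python
TABLE = 0x6000   # hypothetical page-aligned table address

def dispatch_pointer(index, table=TABLE):
    """Vector address used by the asl / bcs dispatch for a given index."""
    a = (index << 1) & 0xFF          # asl: low 8 bits of index * 2
    carry = (index >> 7) & 1         # bit 7 falls into carry
    base = table + 0x100 if carry else table   # bcs selects the second page
    return base + a

pointers = [dispatch_pointer(i) for i in range(256)]

# Every index lands on its own 2-byte slot ...
print(all(p == TABLE + 2 * i for i, p in enumerate(pointers)))
# ... and no vector starts on the last byte of a page.
print(all((p & 0xFF) != 0xFF for p in pointers))
```

Because asl always leaves bit 0 of A clear, the low byte of the pointer is even, so the NMOS page-wrap case cannot be reached in this model.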


[Edit: Sorry, yeah, this has been covered. Flipped through the thread too fast; I don't visit often enough :? ]

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Last edited by White Flame on Sat Aug 20, 2016 5:48 pm, edited 1 time in total.
