6502 redundant, missed, and suggested features
Re: 6502 redundant, missed, and suggested features
Quote:
Not to start an argument, but both of you gentlemen are suffering from a bit of code myopia. 
If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:
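The listing that followed this sentence did not survive in this copy of the thread. As a stand-in, and explicitly not the original code, here is a minimal Python model of the address arithmetic that JMP (<addr>,X) performs. The memory image, the table address $C000 and the handler addresses are all made up for illustration, and X is assumed to hold the routine number doubled, because each table entry occupies two bytes.

```python
# Hypothetical model of 65C02 JMP (addr,X); not the listing from the post.

def jmp_abs_indexed_indirect(mem, table, x):
    """Return the target of JMP (table,X).

    The vector is read little-endian from table+X and table+X+1,
    using full 16-bit address arithmetic.
    """
    ptr = (table + x) & 0xFFFF
    lo = mem[ptr]
    hi = mem[(ptr + 1) & 0xFFFF]
    return lo | (hi << 8)


def build_table(mem, table, handlers):
    """Store each handler address as two bytes, low byte first."""
    for i, addr in enumerate(handlers):
        mem[(table + 2 * i) & 0xFFFF] = addr & 0xFF
        mem[(table + 2 * i + 1) & 0xFFFF] = addr >> 8


mem = bytearray(0x10000)               # 64 KiB of zeroed memory
TABLE = 0xC000                         # assumed table location
HANDLERS = [0x8000, 0x8123, 0x9ABC]    # assumed routine addresses
build_table(mem, TABLE, HANDLERS)

routine = 2                            # which routine to run
x = routine * 2                        # the doubled index that would sit in X
target = jmp_abs_indexed_indirect(mem, TABLE, x)
print(hex(target))
```

With routine = 2 the model reads the vector at $C004/$C005 and yields $9ABC. Nothing in the instruction stream has to be patched, which is the point being made about the 65C02 and 65C816.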
- BigDumbDinosaur
Re: 6502 redundant, missed, and suggested features
Bregalad wrote:
Quote:
Not to start an argument, but both of you gentlemen are suffering from a bit of code myopia. 
If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:
Quote:
I know squat about the enhancements so I cannot comment.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: 6502 redundant, missed, and suggested features
It's important to note that litwr's aim is clearly to write the highest performance code and to write code for both the 6502 and 65C02. I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch, and I can't see a faster way of doing it, if every cycle counts. It's a small piece of self-modifying code.
The nice thing is that the 256-byte table needed for the 6502 and the one needed for the 65C02 overlap, so that a 257-byte table will, I think, work portably on either CPU.
Code: Select all
    lda INDEX
    and #1
    bne l_odd
    sta m_even+1
m_even
    jmp (even_table)
l_odd
    sta m_odd+1
m_odd
    jmp (odd_table)
    align 256
even_table .word ... ;256 bytes
odd_table .word ... ;256 bytes
    .byte ... ; 1 extra byte for 65C02
I hope I have that right. It's a bit tricky to think about.
When Dave and I were doing the upgrade of Arlet's core from 6502 to 65C02, we noticed that he uses a very nice trick for the indirect JMP. His core, like the 6502, has only a byte-wide ALU, and on the face of it, a 16-bit address must be computed and incremented to fetch the two bytes of the destination address. With the byte-wide ALU, that would mean taking an extra cycle if the page boundary is crossed. But the PC is capable of incrementing the full 16-bit value in one cycle - so what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP. It's a bit like doing half a JMP to update the PC and then another half a JMP to load the destination address. The PC does the incrementing, ignoring the page boundary issue. It sort of works like this:
Code: Select all
    JMP ghostly_jmp ; actually a JMP (far)
    ; ... lots more code, possibly ...
ghostly_jmp:
    .byte $4C ; JMP destination
far:
    .word destination
Here's the clue, in the comments:
Code: Select all
JMP0 = 6'd22, // JMP - fetch PCL and hold
JMP1 = 6'd23, // JMP - fetch PCH
JMPI0 = 6'd24, // JMP IND - fetch LSB and send to ALU for delay (+0)
JMPI1 = 6'd25, // JMP IND - fetch MSB, proceed with JMP0 state
(The decimal numbers there are just arbitrary state labels.)
We kept that mechanism and added three more states to perform the indexed indirect JMP:
Code: Select all
JMPIX0 = 6'd51, // JMP (,X)- fetch LSB and send to ALU (+X)
JMPIX1 = 6'd52, // JMP (,X)- fetch MSB and send to ALU (+Carry)
JMPIX2 = 6'd53; // JMP (,X)- Wait for ALU (only if needed)
and here's the optional extra cycle:
Code: Select all
JMPIX0 : state <= JMPIX1;
JMPIX1 : state <= CO ? JMPIX2 : JMP0;
JMPIX2 : state <= JMP0;
As is usually the case, Dave had a burst of productivity and gets the credit for the implementation!
Re: 6502 redundant, missed, and suggested features
Quote:
When Dave and I were doing the upgrade of Arlet's core from 6502 to 65C02, we noticed that he uses a very nice trick for the indirect JMP. His core, like the 6502, has only a byte-wide ALU, and on the face of it, a 16-bit address must be computed and incremented to fetch the two bytes of the destination address. With the byte-wide ALU, that would mean taking an extra cycle if the page boundary is crossed. But the PC is capable of incrementing the full 16 bit value in one cycle - so what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP
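One way to read the state names quoted in this thread for the "pretend there's a JMP" trick is as a cycle-by-cycle sequence. The Python sketch below walks that sequence for JMP (abs) and JMP (abs,X). It is a model built only from the comments on the state names, not a translation of the core's Verilog. Two things are assumptions: that JMP0 is always followed by JMP1, and that CO is the carry out of adding X to the low operand byte.

```python
# Hypothetical model of the state sequence; the names come from the thread,
# the transition logic around them is an assumption.

def jmp_states(indexed, operand_lo=0x00, x=0x00):
    """Return the list of states visited after the opcode fetch."""
    if not indexed:
        # JMP (abs): fetch the two pointer bytes, then reuse the plain
        # JMP states to fetch the destination through the PC.
        return ["JMPI0", "JMPI1", "JMP0", "JMP1"]

    states = ["JMPIX0", "JMPIX1"]
    carry_out = (operand_lo + x) > 0xFF   # assumed meaning of CO
    if carry_out:
        states.append("JMPIX2")           # optional extra cycle
    states += ["JMP0", "JMP1"]
    return states


print(jmp_states(indexed=False))
print(jmp_states(indexed=True, operand_lo=0x10, x=0x20))
print(jmp_states(indexed=True, operand_lo=0xF0, x=0x20))
```

Under these assumptions the indexed form costs one state more than the plain indirect form, and a second extra state only when the low-byte addition carries.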
Re: 6502 redundant, missed, and suggested features
BigEd wrote:
I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch
Code: Select all
lda INDEX
and #1
Klaus2m5 wrote:
Another "lda INDEX" is required, or you use only the first location of the tables because of the "and #1".
Code: Select all
    lda INDEX
    ror a            ; was: and #1
    bcs l_odd        ; was: bne
    rol a            ; added
    sta m_even+1
m_even
    jmp (even_table)
l_odd
    rol a            ; added
    sta m_odd+1
m_odd
    jmp (odd_table)
    align 256
even_table .word ... ;256 bytes
odd_table .word ... ;256 bytes
But there's another solution. If the rol's I added get changed to asl's then the problem goes away -- the code will work on NMOS or CMOS. Does that seem right? Did I miss anything?
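The difference between the ROL and ASL variants can be checked with a small Python model of the three shift instructions. This is a sketch of the arithmetic only. The helper names are hypothetical, and the carry going into the ROR is treated as an unknown input because the fragment does not set it.

```python
# Hypothetical helpers modelling 8-bit ROR, ROL and ASL on the accumulator.

def ror(a, carry):
    """Rotate right through carry; returns (result, new_carry)."""
    return ((carry << 7) | (a >> 1)) & 0xFF, a & 1


def rol(a, carry):
    """Rotate left through carry; returns (result, new_carry)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def asl(a):
    """Shift left, zero into bit 0; returns (result, new_carry)."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def operand_low_byte(index, use_asl, carry_in=0):
    """Return (table name, byte stored into the JMP operand's low byte)."""
    a, c = ror(index, carry_in)                   # bit 0 of INDEX goes to carry
    table = "odd_table" if c else "even_table"    # bcs l_odd
    if use_asl:
        a, _ = asl(a)                             # bit 0 comes back as 0
    else:
        a, _ = rol(a, c)                          # bit 0 is restored
    return table, a


print(operand_low_byte(0xFF, use_asl=False))
print(operand_low_byte(0xFF, use_asl=True))
```

In this model the ROL variant stores INDEX unchanged, so INDEX = $FF puts the pointer at $xxFF, the one offset where NMOS and CMOS parts disagree. The ASL variant stores INDEX with bit 0 cleared, so every pointer sits at an even offset and never reaches $xxFF. The stale carry rotated in by ROR is shifted back out in both cases.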
Quote:
what the core does is to pretend that there's a JMP just before the two bytes, and proceed as if fetching the two operand bytes of a JMP
-- Jeff
Last edited by Dr Jefyll on Sat Aug 20, 2016 2:15 pm, edited 2 times in total.
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: 6502 redundant, missed, and suggested features
Dr Jefyll wrote:
BigEd wrote:
I did fail to grasp his code fragment, but now I see that it is a very fast 256-way branch
Code: Select all
lda INDEX
and #1
I admit, I had the thought that BIT #1 would be the fix for AND #1 but now realise, sadly, we don't get that on NMOS. Having a constant one in a zp location would do the trick, I think?
Code: Select all
bit one
Re: 6502 redundant, missed, and suggested features
With some prompting from Klaus, I think I've finally understood what he's done in his own revisit of this 256-way branch code - and it does seem right, of course. He picks off the top bit, instead of the bottom bit, and that's a better plan. See his post at
viewtopic.php?f=2&t=4220&start=15#p46849
Edit: typo!
Last edited by BigEd on Sat Aug 20, 2016 4:43 pm, edited 1 time in total.
Re: 6502 redundant, missed, and suggested features
So, did MOS document this bug/feature in the original datasheet or documentation for the processor? If yes, it's a feature; if no, it's a bug. I think the ROR bug in older 6502s is clearly a "bug", because it was an actual mistake in the processor and was never intended to be there. If JMP () wrap-around is the same, then it's a bug. But if the manufacturer's documentation clearly states that JMP () will wrap around, it's a feature.
Quote:
JMP (<addr>) itself is not defective. It becomes defective in the NMOS parts if the jump vector address (<addr>) straddles a page boundary.
What's even the point of replying to me to state the obvious? Of course I say "jmp () wrap-around" to make it shorter than saying "jmp () wrap-around when the lower byte is $ff", in order to keep my point short and concise. My efforts to do so are now ruined, thank you.
Re: 6502 redundant, missed, and suggested features
Good idea. I checked the datasheets (MOS, Rockwell) for their description of what they call Absolute Indexed addressing. They say that the final byte fetched is the next byte - no mention of page wraparound or of pages, unlike their descriptions of the other modes. So this is confirmation that JMP (abs,x) was always intended to work the way it eventually did work, on the C02 parts.
Re: 6502 redundant, missed, and suggested features
In the MOS document you linked to, page 6 under "Absolute indirect" (and NOT absolute indexed!) it says:
Quote:
The next memory location contains the high order byte of the effective address which is loaded into the sixteen bits of the program counter.
This makes it very clear it is not supposed to wrap around, and if this document is considered as a reference, the jmp () wrap-around is a bug. Now the question is whether it is acceptable to use these as an original 6502 reference.
I'll also add that wrap-around within zero-page addressing is clearly documented, since it's mentioned that it always addresses page zero. So nobody should expect e.g. lda $ff,X to load from page one if X is nonzero and the ZP variant of the instruction is used.
I'm also surprised they mention 65 kilobytes of memory... until I remembered the KB vs KiB difference. I didn't expect this memory-inflated-using-the-wrong-base bull**** to be already present back in the day. It makes especially no sense in this context.
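The zero-page remark about lda $ff,X can be made concrete with a two-line address model. This is a minimal Python sketch of the effective-address arithmetic only, and the function names are hypothetical.

```python
# Hypothetical helpers: effective address of zp,X versus abs,X operands.

def zp_indexed_address(zp, x):
    """Zero-page,X: the 8-bit sum wraps and stays in page zero."""
    return (zp + x) & 0xFF


def abs_indexed_address(base, x):
    """Absolute,X: full 16-bit addition, so the sum can leave the page."""
    return (base + x) & 0xFFFF


print(hex(zp_indexed_address(0xFF, 0x01)))     # zero-page form of lda $ff,X
print(hex(abs_indexed_address(0x00FF, 0x01)))  # absolute form, lda $00ff,X
```

With X = 1 the zero-page form lands on $00, while the absolute form of the same instruction reaches $0100 in page one.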
Re: 6502 redundant, missed, and suggested features
> This makes it very clear it is not supposed to wrap around
I think we may be in violent agreement!
I'm sure the 65 kilobytes is some faint-hearted person being unable to reconcile '64k' with 65536 - shall we suppose a marketing person?
Last edited by BigEd on Sat Aug 20, 2016 4:43 pm, edited 1 time in total.
Re: 6502 redundant, missed, and suggested features
On the other hand, the MOS MCS6500 Microcomputer Family Programming Manual (January 1976) contains a cycle description (page 141) that shows the wrap around as documented:
Re: 6502 redundant, missed, and suggested features
Arlet wrote:
the MOS MCS6500 Microcomputer Family Programming Manual (January 1976) [...] shows the wrap around as documented:
- "In the JMP Indirect instruction, the second and third bytes of the instruction represent the indirect low and high bytes respectively of the memory location containing ADL. Once ADL is fetched, the program counter is incremented with the next location containing ADH."
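For readers who want the disputed behaviour spelled out, here is a minimal Python model of the two ways the high byte of the vector can be fetched. It is a sketch of the commonly described NMOS and CMOS behaviour, not something taken from either manual. The memory contents and the pointer $12FF are made up for illustration.

```python
# Hypothetical model of JMP (ptr) on NMOS and CMOS parts.

def jmp_indirect_nmos(mem, ptr):
    """NMOS 6502: only the low byte of the pointer is incremented,
    so the high byte of the vector comes from the same page."""
    lo = mem[ptr]
    hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    return lo | (mem[hi_addr] << 8)


def jmp_indirect_cmos(mem, ptr):
    """65C02: the pointer is incremented as a full 16-bit value."""
    lo = mem[ptr]
    hi = mem[(ptr + 1) & 0xFFFF]
    return lo | (hi << 8)


mem = bytearray(0x10000)
mem[0x12FF] = 0x34    # low byte of the vector
mem[0x1300] = 0x56    # high byte in the following location
mem[0x1200] = 0x78    # byte at the start of the same page

print(hex(jmp_indirect_nmos(mem, 0x12FF)))
print(hex(jmp_indirect_cmos(mem, 0x12FF)))
```

In this model the NMOS function returns $7834 and the CMOS one $5634. For any pointer whose low byte is not $FF the two functions agree.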
Re: 6502 redundant, missed, and suggested features
Yes, many datasheets are either wrong or inconsistent - and therefore wrong! I'd take the table that Arlet notes as an extremely subtle admission of the bug! It seems pretty unwise to try to document internal operations in such a detailed fashion. Even if everything is correct, it makes it harder to make subsequent improvements. Having undocumented behaviour can be a feature.
I think everyone here who's spoken up, with the exception of litwr himself, regards that wrap-around as a bug. (But I'm not sure if it's even interesting to keep arguing about what we call something. Much more interesting to talk about how things work and how to program them.)
- White Flame
Re: 6502 redundant, missed, and suggested features
A while ago I compiled a bunch of fast dispatch ideas to codebase64. The full 256-way dispatch I use follows. I don't quite understand why even/odd comes into play, given that if the dispatch table is page-aligned, you'll never be JMPing through $xxFF anyway.
[Edit: Sorry, yeah, this has been covered. Flipped through the thread too fast; I don't visit often enough.]
Code: Select all
    asl
    bcs :++
; Dispatch 0-127
    sta :+ +1
:   jmp (table)
; Dispatch 128-255
:   sta :+ +1
:   jmp (table + $0100)
.align 256
table: .word handler0, handler1, ..., handler127, handler128, ...
Last edited by White Flame on Sat Aug 20, 2016 5:48 pm, edited 1 time in total.