PostPosted: Sun Aug 21, 2016 6:59 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Probably they didn't feel like adding too much detail to the text, when the description below shows exactly what's happening.

Or, the designers intended to use the PC, and get a full 16 bit increment for free, but ran into some problems and switched to ALU increment late in the design process.


PostPosted: Sun Aug 21, 2016 7:02 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10802
Location: England
My suspicion is that by the time that manual was written, some knowledge was lost.

The usual use case for JMP(abs) is surely a single JMP to indirect a vector. It's unlikely that the vector will be placed at the end of a page, so almost everyone will be unaware of the bug.

Edit: I see that indirect threaded code could possibly hit this, if it didn't take care.


PostPosted: Sun Aug 21, 2016 7:41 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10802
Location: England
Just for historical interest, I had a look to see what's the earliest mention of this bug, and the earliest I found is a note by Heinz J Schilling in 6502 User Notes newsletter, issue 15, June 1979.
http://archive.6502.org/publications/65 ... df#page=24
http://www.classiccmp.org/cini/pdf/KIM% ... df#page=24

It would be interesting to hear of an earlier mention.

Edit: I see that Bob Sander-Cederlof's newsletter "Apple Assembly Lines" trumpets this bug note in the first issue, October 1980. I'm thinking, therefore, that it was not yet common knowledge.
Quote:
There is an error in the JUMP INDIRECT instruction of ALL 6500 family CPU chips, no matter where they were made. This means the error is present in ALL APPLES. This fatal error occurs only when the low byte of the indirect pointer location happens to be $FF, as in JMP ($08FF). Normally, the processor should fetch the low-order address byte from location $08FF, increment the program counter to $0900, and then fetch the high-order address byte from $0900. Instead, the high-order byte of the program counter never gets incremented! The high-order address byte gets loaded from $0800 instead of $0900! For this reason, your program should NEVER include an instruction of the type JMP ($xxFF).

(Again, a confusion between the address bus and program counter!)
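
A minimal sketch in Python (a model, not 6502 code) of the fetch behaviour the quoted note describes. The function name `jmp_indirect_target` and the byte values placed in memory are made up for illustration; the only thing taken from the note is that the NMOS part increments just the low byte of the pointer address, so JMP ($08FF) reads its high byte from $0800 instead of $0900.

```python
def jmp_indirect_target(memory, pointer, nmos=True):
    """Return the address that JMP (pointer) would transfer control to."""
    lo = memory[pointer]
    if nmos:
        # NMOS: only the low byte of the pointer address is incremented,
        # so $xxFF wraps to $xx00 within the same page.
        hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        # 65C02: the carry propagates into the high byte.
        hi_addr = (pointer + 1) & 0xFFFF
    hi = memory[hi_addr]
    return (hi << 8) | lo

memory = bytearray(0x10000)
memory[0x08FF] = 0x34   # low byte of the intended target
memory[0x0900] = 0x12   # high byte the programmer meant to use
memory[0x0800] = 0x56   # byte the NMOS part actually reads

nmos_target = jmp_indirect_target(memory, 0x08FF, nmos=True)
cmos_target = jmp_indirect_target(memory, 0x08FF, nmos=False)
print(f"NMOS: ${nmos_target:04X}  CMOS: ${cmos_target:04X}")
```

For any pointer whose low byte is not $FF, both branches compute the same address, which is why the problem goes unnoticed unless a vector lands on the last byte of a page.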

Edit: downthread, BDD notes that Leventhal's 1979 book describes the misbehaviour, and it looks like the book was finalised no earlier than April 1979.


Last edited by BigEd on Fri Aug 26, 2016 6:40 am, edited 2 times in total.

PostPosted: Sun Aug 21, 2016 8:49 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8442
Location: Southern California
litwr wrote:
BigDumbDinosaur wrote:
If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:

This code requires a 16-bit X register, so it is impossible on the 65C02. It can be used on the 65816, but it requires the rarely used 16-bit index register mode.

It can be used with the index registers in 8-bit mode as well.  The Eyes & Liechty programming manual specifically shows it in both 8- and 16-bit mode, on page 382 of my old paper copy.  (The .pdf that was distributed until early last year didn't have the same page numbers.)

Quote:
GARTHWILSON wrote:
JMP (<addr>) is not a ZP instruction.

It is not any kind of 6502 absolute addressing either. JMP (addr) is a special addressing mode used by this instruction only.

The 65816 adds JSR (<addr>,X) as well.

Most of the discussion of the JMP (<addr>) bug has been focused on using a table, which requires a jury-rigged operation on the NMOS anyway, unlike the CMOS.  Whether using a table or not, you shouldn't have to put assembler directives before the address to prevent the page boundary straddling or add another byte at the beginning of the next page.  It should just work, and on the CMOS, it does.


Quote:
Almost all of its additional instructions have very little importance.  They may make code slightly faster and smaller, but only on a tiny scale.  I estimate less than 2% smaller size and less than 1% faster speed.

Having the better instruction set makes it easier to program too.  I can't think of much the CMOS can do that the NMOS can't do at all, but it's a pain on the NMOS when for example you need to save X without disturbing A, since you can't do PHX and PLX, or zero a memory location without disturbing A, since there's no STZ, or need an indirect addressing mode without disturbing X or Y. Going further, the '816 increases ease of programming beyond the 65c02.

Quote:
I want only BIT #imm, DEA, and INA from its instructions.  Even the useful BIT #imm can't set the N and V flags. :( Sometimes the 65c02 is even slower than the 6502. :( The major evil of the CMOS 6502 is the occupation of valuable opcodes by these unimportant instructions.  This halted the natural development of the 6502.  The 65816 and even the 4510 had to follow this heavy and bad inheritance. :( Instead of these "occupants", much more powerful instructions could have been placed: 16-bit arithmetic, POP XY, PUSH XY, work with two (or more) segment registers (which might give short and fast operations with a 20- or 24-bit address bus and relocation of code), a 16-bit accumulator, maybe another 16-bit accumulator, etc.

The '816 has most of these, plus instructions and addressing modes that are totally impractical to do on the '02 at all.  (Keeping with the topic title), PEA for example is a three-byte instruction that pushes a two-byte literal (its operand), which is typically an address but it can also be data, onto the stack, without affecting the processor registers.  One use of it is to pass data to a subroutine.  For a 6502 to synthesize it requires six instructions, and more if you need to save A.  It's a similar story for PEI.  PER requires 18 6502 instructions to synthesize (and more if you need to save A), 11 of those being in a subroutine.  BRL (Branch Relative Long) and a six-byte (two-instruction) BSR (Branch to SubRoutine, or Branch, Saving Return address) with a 16-bit relative address are valuable, especially in relocatable code.  So are the extra stack-addressing modes and the 16-bit stack pointer permitting much heavier use of the hardware stack for passing lots of parameters, as in C or recursive functions that may run the '02 out of stack space.  Many features of the '816 make it far more suitable for multitasking.

Quote:
I also have to note that there are tens of thousands (or even more) of ML programs for the NMOS 6502.

If everything has to be pulled down to the weakest version, what's the sense in making any improvements??  If you're writing software for something like the venerable C64, using C64 kernel entry points and hardware, then by all means, avoid the extra CMOS instructions and addressing modes.  It makes total sense.  There never was a CMOS 6510 anyway.  But for new builds, there's no sense in using the NMOS, and I will never go back to it.

BigEd wrote:
Just for historical interest, I had a look to see what's the earliest mention of this bug, and the earliest I found is a note by Heinz J Schilling in 6502 User Notes newsletter, issue 15, June 1979.

Thanks, Ed.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon Aug 22, 2016 1:02 am
Joined: Sat Dec 07, 2013 4:32 pm
Posts: 246
Location: The Kettle Moraine
For some time I've thought about a 65c02 upgrade board for the C64.

Someday, I will. It will break a lot of software, but for my purposes, I won't be too concerned.

If I get time this winter, I'd like to put a 65c02 in a PET and see what the consequences are. Unfortunately most of the PET software I have already doesn't work, so I might not find out.

I did have possession of a couple Apple //es that were dealer "enhanced." I don't have any recollection who the dealer was, but apparently they had forged ROMs. Visually they looked just like Apple ROMs should look, but they must have relied on the 65c02. Several times over the years I put known good NMOS 6502s in them and they both failed to boot. Unfortunately I think I sold both of those machines in recent years.


PostPosted: Mon Aug 22, 2016 5:11 am
Joined: Thu Mar 03, 2011 5:56 pm
Posts: 277
KC9UDX wrote:
I did have possession of a couple Apple //es that were dealer "enhanced." I don't have any recollection who the dealer was, but apparently they had forged ROMs. Visually they looked just like Apple ROMs should look, but they must have relied on the 65c02. Several times over the years I put known good NMOS 6502s in them and they both failed to boot. Unfortunately I think I sold both of those machines in recent years.


The Apple IIe enhanced is *exactly* an Apple IIe where the processor has been upgraded to a 65c02, and the ROMs rewritten to use the new opcodes (and in the process, freeing up enough space to add functionality - notably, the mini assembler).

See https://en.wikipedia.org/wiki/Apple_IIe#Enhanced_IIe.


PostPosted: Mon Aug 22, 2016 9:02 am
Joined: Sat Dec 07, 2013 4:32 pm
Posts: 246
Location: The Kettle Moraine
rwiker wrote:
KC9UDX wrote:
I did have possession of a couple Apple //es that were dealer "enhanced." I don't have any recollection who the dealer was, but apparently they had forged ROMs. Visually they looked just like Apple ROMs should look, but they must have relied on the 65c02. Several times over the years I put known good NMOS 6502s in them and they both failed to boot. Unfortunately I think I sold both of those machines in recent years.


The Apple IIe enhanced is *exactly* an Apple IIe where the processor has been upgraded to a 65c02, and the ROMs rewritten to use the new opcodes (and in the process, freeing up enough space to add functionality - notably, the mini assembler).

See https://en.wikipedia.org/wiki/Apple_IIe#Enhanced_IIe.

That's what I always believed until a few years ago.

If you have an Enhanced //e, you can take the 65c02 out and put in a 6502, and it will run all day long. Except for the two that I had, and maybe others, but I've not heard of anyone else with this situation.

At least this is what has been reported to me by other owners who were perplexed when I told them that the Enhanced ROM requires a 65c02.


PostPosted: Mon Aug 22, 2016 5:14 pm
Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
GARTHWILSON wrote:
It can be used with the index registers in 8-bit mode as well. The Eyes & Liechty programming manual specifically shows it in both 8- and 16-bit mode, on page 382 of my old paper copy. (The .pdf that was distributed until early last year didn't have the same page numbers.)

How can the code below be used with an 8-bit XR?
Code:
         lda #index            ;zero-based routine index
         asl a                 ;double it
         tax                   ;now absolute index
         jmp (table,x)         ;goto routine

With an 8-bit index it can address only 128 entries instead of 256...
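
A rough Python model of the arithmetic behind that 128-entry limit, assuming 8-bit A and X registers. The helper names `table_offset_8bit` and `reachable_entries` are hypothetical; the model only mirrors the LDA / ASL A / TAX sequence from the code above.

```python
def table_offset_8bit(index):
    """Model LDA #index / ASL A / TAX with 8-bit registers."""
    # ASL A shifts bit 7 out into the carry, so only 8 bits survive in X.
    return (index << 1) & 0xFF

def reachable_entries():
    """Count the distinct table offsets X can hold after the doubling."""
    return len({table_offset_8bit(i) for i in range(256)})

print(table_offset_8bit(127))   # last entry that is still addressed correctly
print(table_offset_8bit(128))   # wraps around and aliases entry 0
print(reachable_entries())
```

Each entry is two bytes, so an 8-bit X can step through at most 256 bytes of table, which is 128 addresses.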
BigEd wrote:
I see that it's inconvenient to have to account for the difference between the 02 and the C02. But, I think it may still be possible to use a 257 byte table which will suit both CPUs?

It is exactly the current code for the 6502. One man spent hours trying to find out why this program, which works on a C64, does not work with a SuperCPU (65816) - http://www.lemon64.com/forum/viewtopic.php?t=58674&postdays=0&postorder=asc&start=19#top. So I had to add one byte for the 65C02 "feature". Of course,
Code:
        ldx divisor
        jmp (divjmp,X)

is better than the code for the NMOS 6502. It is 3 bytes shorter and 3 cycles faster. This discussion helped me to realize this. 8) So it is what I should use to prepare the specialized 65C02 version for the BBC Micro. It is the only advantage of the 65C02 usable in the spigot, but the byte division is not exactly in the main loop, so the speed advantage will be below about 0.1%.
GARTHWILSON wrote:
The '816 has most of these, plus instructions and addressing modes that are totally impractical to do on the '02 at all. (Keeping with the topic title), PEA for example is a three-byte instruction that pushes a two-byte literal (its operand), which is typically an address but it can also be data, onto the stack, without affecting the processor registers. One use of it is to pass data to a subroutine. For a 6502 to synthesize it requires six instructions, and more if you need to save A. It's a similar story for PEI. PER requires 18 6502 instructions to synthesize (and more if you need to save A), 11 of those being in a subroutine. BRL (Branch Relative Long) and a six-byte (two-instruction) BSR (Branch to SubRoutine, or Branch, Saving Return address) with a 16-bit relative address are valuable, especially in relocatable code. So are the extra stack-addressing modes and the 16-bit stack pointer permitting much heavier use of the hardware stack for passing lots of parameters, as in C or recursive functions that may run the '02 out of stack space. Many features of the '816 make it far more suitable for multitasking.

You wrote about the 65816 instructions. They are good and powerful indeed. I only think that they could be better and faster. I would bet on segment registers, for example. I would also bet on the Z register of the 4510 - it is much better than the plain (zp) mode.

GARTHWILSON wrote:
BigEd wrote:
Just for historical interest, I had a look to see what's the earliest mention of this bug, and the earliest I found is a note by Heinz J Schilling in 6502 User Notes newsletter, issue 15, June 1979.

Thanks, Ed.

6502 development was beheaded, so without "political cover" it was easily influenced by men who were not thinking primarily about that development but might have had other aims. :(

_________________
my blog about processors


PostPosted: Mon Aug 22, 2016 9:24 pm
Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
litwr wrote:
IMHO an LSR4 instruction would be worth mentioning too. The 6502 requires four LSRs to get the higher nibble. The lower nibble can be obtained with AND #15. So what's the purpose of SWN (swap nibbles)? It would be better to have the Z80's RLD, which allows a fast 4-bit shift of a sequence of bytes.

I just use tables for whatever function is missing. You can often combine the lookup with some operation (ORA, AND) to shave off cycles.

For example, a 4-bit by 4-bit multiply (x*y):
Code:
TXA
AND #$0F
ORA ShiftLtoH,Y
TAX
LDA MultTable,X


14 cycles. Two 256 byte tables. And it disregards the upper 4 bits (if they are present).

Without the shift table:
Code:
TXA
ASL
ASL
ASL
ASL
STA $zp1
TYA
AND #$0F
ORA $zp1
TAX
LDA MultTable,X


26 cycles. One 256 byte table. If you had a swap function you could shave off 6 cycles, but it would still be faster with the extra table.
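
The post does not list the table contents, so the following is only a guess at how the two tables could be built, written as a Python sketch. The layout is an assumption: `shift_l_to_h[y]` holds the low nibble of y moved into the high nibble, and `mult_table[i]` holds the product of the two nibbles of i. The function `mul4x4` is a hypothetical mirror of the first (14-cycle) instruction sequence.

```python
# Assumed layout of ShiftLtoH: low nibble of y moved to the high nibble.
shift_l_to_h = [(y << 4) & 0xF0 for y in range(256)]

# Assumed layout of MultTable: (low nibble of i) * (high nibble of i).
# The largest product is 15 * 15 = 225, so every entry fits in one byte.
mult_table = [(i & 0x0F) * (i >> 4) for i in range(256)]

def mul4x4(x, y):
    """Mirror of TXA / AND #$0F / ORA ShiftLtoH,Y / TAX / LDA MultTable,X."""
    index = (x & 0x0F) | shift_l_to_h[y & 0xFF]
    return mult_table[index]

print(mul4x4(5, 7))
```

With this layout the upper nibbles of both inputs are ignored, matching the remark that the routine disregards the upper 4 bits.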


PostPosted: Mon Aug 22, 2016 10:06 pm
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
BigEd wrote:
I see that it's inconvenient to have to account for the difference between the 02 and the C02. But, I think it may still be possible to use a 257 byte table which will suit both CPUs?

Depending on how crucial speed is, I would put the compatibility into the code, not into the data structure. If the index is always odd, then a simple DEX brings it back into 0-254 range with even numbers and there's no page overflow.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


PostPosted: Tue Aug 23, 2016 4:05 am
Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
Klaus2m5 wrote:
However, it is bold to tie the assumption of a bug in the 65c02 to the very special case that you are talking about. It is like calling the missing undocumented opcodes in the 65C02 a bug. I am sure that there are many more coders who have been bitten by the lack of the carry into the upper byte of the indirect address than there are coders facing the same problem as you.

Thanks again for working with my code. :) However, I don't see any "very special case" here - it is just ordinary code, natural for this situation. I can even assume that there are no coders at all who were "bitten by the lack of the carry into the upper byte of the indirect address". Could you show any practical example which shows the situation in reverse? IMHO the problem of JMP (xxFF) is completely contrived and artificial.

GARTHWILSON wrote:
If everything has to be pulled down to the weakest version, what's the sense in making any improvements?? If you're writing software for something like the venerable C64, using C64 kernel entry points and hardware, then by all means, avoid the extra CMOS instructions and addressing modes. It makes total sense. There never was a CMOS 6510 anyway. But for new builds, there's no sense in using the NMOS, and I will never go back to it.

My point is in fact that the writers of this enormous amount of software never complained about JMP (xxFF). Another point of mine is that the advance from the NMOS 6502 to CMOS was very small, and it even has several backward steps (the creation of the JMP (xxFF) incompatibility, the occupation of valuable opcode space by unimportant instructions, the slowdown of the redundant BCD mode, ...), so it can't be called an advance at all.
kakemoms wrote:
26 cycles. One 256 byte table. If you had a swap function you could shave off 6 cycles, but it would still be faster with the extra table.

I only wrote that LSR4 could almost always be used instead of SWN. SWN might be easier to implement, though...
White Flame wrote:
Depending on how crucial speed is, I would put the compatibility into the code, not into the data structure. If the index is always odd, then a simple DEX brings it back into 0-254 range with even numbers and there's no page overflow.

The fastest speed for different platforms is the aim of the spigot project. :D

_________________
my blog about processors


PostPosted: Tue Aug 23, 2016 5:03 am
Joined: Sat Dec 07, 2013 4:32 pm
Posts: 246
Location: The Kettle Moraine
Nobody ever complained, because everyone was aware of the behaviour, and acted accordingly.


PostPosted: Tue Aug 23, 2016 5:15 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8190
Location: Midwestern USA
litwr wrote:
IMHO The problem of JMP (xxFF) is completely contrived and artificial.

I was aware of the JMP ($xxFF) bug nearly 40 years ago and, in fact, tripped over it back then. It is a real bug and everyone who has professionally developed for the 6502 (not the CMOS hardware) knows it's a real bug. Using Commodore as an example, in page $03 are the kernel and BASIC indirect vectors. CBM page-aligned those vectors precisely because of JMP ($xxFF). They knew it was a bug back when the paint on the PET 2001 was still drying.

Methinks you are beating a dead horse. :)

Quote:
GARTHWILSON wrote:
If everything has to be pulled down to the weakest version, what's the sense in making any improvements?? If you're writing software for something like the venerable C64, using C64 kernel entry points and hardware, then by all means, avoid the extra CMOS instructions and addressing modes. It makes total sense. There never was a CMOS 6510 anyway. But for new builds, there's no sense in using the NMOS, and I will never go back to it.

My point is in fact that the writers of this enormous amount of software never complained about JMP (xxFF). Another point of mine is that the advance from the NMOS 6502 to CMOS was very small, and it even has several backward steps (the creation of the JMP (xxFF) incompatibility, the occupation of valuable opcode space by unimportant instructions, the slowdown of the redundant BCD mode, ...), so it can't be called an advance at all.

I'm sure Apple didn't agree with you about the 65C02 when they started using it in place of the NMOS part. In fact, they disagreed even more when the 65C816 found its way in the Apple ][gs.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue Aug 23, 2016 6:34 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8442
Location: Southern California
litwr wrote:
My point is in fact that the writers of this enormous amount of software never complained about JMP (xxFF).

On the contrary, it was a huge troubleshooting problem to a few early on, before it was documented.  After it was documented, people knew to take measures to keep the indirect address from straddling the page boundaries.  The CMOS version fixed all the NMOS bugs.

Quote:
Another point of mine is that the advance from the NMOS 6502 to CMOS was very small, and it even has several backward steps (the creation of the JMP (xxFF)

Just stop, please.  It has been made clear by several knowledgeable people here that NMOS had a big bug in JMP (xxFF).  It does it wrong!

Quote:
the occupation of valuable opcode space by unimportant instructions,

Even if they were unimportant (which I totally disagree with), what does the taking of "valuable" op code space matter?  The op code table was nowhere near full in the CMOS either.

Quote:
the slowdown of the redundant BCD mode,

again to fix an NMOS bug, which was that its flags were not valid after a decimal-mode operation!

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Tue Aug 23, 2016 7:17 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
litwr wrote:
Could you show any practical example which shows the situation in reverse?

Fair enough. First let's notice that in your application the 16-bit destination values accessed by JMP (abs) are stored together, adjacent to one another in a table. But in other circumstances a programmer may wish to store the destination values separately from one another, with other fields (of varying length) in between. IOW you might have a destination value, then one or more unrelated fields of arbitrary length, then another destination value, then some other unrelated fields, and so on. It's not uncommon. The dictionary used by Forth is an example of this.

When creating a mixed data structure like this it's desirable to freely allocate space exactly as needed. But if you allocate space exactly as needed then occasionally a destination value will straddle a page boundary. With NMOS 6502 this is a dangerous anomaly.

In the bad old days folks were forced to check for this condition every time before allocating space for a destination value. If the next available space happened to be at $xxFF then remedial action was required, such as sticking an extra, unused byte into the structure -- ie, wasting the byte at $xxFF -- to ensure proper alignment. Does that sound messy? It was!
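
The check described above can be sketched as follows. This is a Python model, not code from any particular Forth; the function name `allot_vector` and the example addresses are hypothetical, and the remedy shown (skipping one byte) is just the one mentioned in the paragraph above.

```python
def allot_vector(here):
    """Reserve two bytes for a destination value used by JMP (abs) on NMOS.

    `here` is the next free address in the structure. Returns a tuple of
    (address of the destination value, next free address afterwards).
    """
    if here & 0xFF == 0xFF:
        # The value would straddle a page boundary: waste the byte at $xxFF.
        here += 1
    return here, here + 2

for start in (0x0800, 0x08FE, 0x08FF):
    vector, next_free = allot_vector(start)
    print(f"free=${start:04X} -> vector=${vector:04X}, next free=${next_free:04X}")
```

A value placed at $08FE ends exactly at the page boundary and needs no padding; only a start address of $xxFF triggers the skipped byte.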

Now we have the 'C02, and no anomalies. JMP (abs) just works. :mrgreen:

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

