NMOS 6502 Reset bug

Dr Jefyll · Post by **Dr Jefyll** » Sun Sep 11, 2011 4:53 pm

In one of our other threads we found ourselves discussing a somewhat mysterious reset bug reported to exist in NMOS 6502s. I opened this new thread in order not to interfere with the original discussion, which pertains to a Microbot robot arm. You can read preliminary comments regarding the reset bug beginning with the 2nd-last post on this page.

I wrote:

The NMOS 6502 (unlike most CPUs, including 65C02) is prohibited from long periods with RST true, and therefore a one-shot timing circuit is usually provided.
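To show what the one-shot buys you, here is a minimal Python sketch of an idealized, non-retriggerable one-shot. It is an assumption-laden model, not any particular chip. Time is counted in abstract ticks, `width` is a hypothetical pulse length, and the circuit responds only to falling edges of the button input. However long the button is held, and however much it bounces, RST is asserted for exactly `width` ticks.

```python
def one_shot(button, width):
    """Idealized non-retriggerable one-shot (hypothetical model).

    button: per-tick input levels, True = released (high), False = pressed (low).
    Returns per-tick RST levels, True = high (released), False = asserted.
    """
    rst = []
    remaining = 0      # ticks of RST assertion still to deliver
    previous = True    # assume the button starts out released
    for level in button:
        falling_edge = previous and not level
        if falling_edge and remaining == 0:   # edges during a running pulse are ignored
            remaining = width
        if remaining > 0:
            rst.append(False)
            remaining -= 1
        else:
            rst.append(True)
        previous = level
    return rst


if __name__ == "__main__":
    held = [True] * 2 + [False] * 10 + [True] * 3   # button held down for 10 ticks
    print(one_shot(held, width=4).count(False))      # RST is asserted for only 4 ticks
```

In this model a bouncing switch also yields a single clean pulse, because edges that arrive while a pulse is already running are ignored.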

Later my curiosity was aroused when I learned that Data Sheets make no mention of the bug. (I myself learned about it only informally, thanks to references by Garth Wilson.)

Garth Wilson wrote:

The CMOS 6502's reset timing is not critical since it does not have the heating problem that the NMOS 6502 had whereby the uP could be destroyed if you kept the reset line true for too long. IIRC, the maximum recommended reset time on those was 50 or 100 ms.

Heating problem??! With luck, in this new thread we'll be able to get to the bottom of the matter. NMOS chips have less relevance nowadays perhaps, but there's an interesting puzzle here and probably a chance to learn.

-- Jeff

falcon5252 · Post by **falcon5252** » Sun Sep 11, 2011 6:08 pm

Excerpt from "The Rise of MOS Technology," concerning the cost of modifying a fab when you own the clean room:

"Using MOS' engineers and facilities, Commodore was able to produce prototype chips in days rather than months and at practically no cost. Other companies, like Atari, Apple, and Osborne, would have to spend tens of thousands of dollars and wait weeks or months for new chips. MOS was critical to Commodore's fast-paced style of business in the 1970's and early 80's. Commodore engineers could get a chip produced without so much as formal paperwork. When Bill Herd (the Plus/4 and C128 engineer) quit and went to work for a small chip design shop, he was shocked that he could not simply order up new prototype chips."

Also note that the pulse width of the Commodore C64 reset circuit is 500 ms, according to the text at the link below. Since Commodore owned MOS and had the inside info, it kind of verifies that the bug still existed, at least in the MOS 6502 (or actually the MOS 6510), when they were designing the C64.

Whether, and when, the bug was ever corrected in the NMOS 6502, either by MOS or by any of the cross-licensed manufacturers, is what we are trying to determine.

As I stated in the other thread, the robot arm I own has been running for 29 years in a school or university with a 2 MHz Rockwell R6502AP and a simple, non-pulsed reset circuit, without smoking the CPU. Let me add that the reset circuit is a 4.7k resistor charging a 47 µF cap. It's quite hard to calculate how long reset stays low, because the supply voltage is itself rising from 0 V on its way to 5 V, and reset toggles from low to high at the gate threshold of the 74LS14, which varies somewhat between chip brands and with temperature. As a very rough guess, I'll say reset is only being held low for about 100 to 125 ms, which is far less than Commodore's 500 ms reset pulse. AAAAAA, drain bramage!
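For a rough feel for these numbers, here is a minimal Python sketch of how long an RC network takes to charge up to a Schmitt-trigger threshold. It assumes the 5 V supply comes up instantly (the real supply ramps up, which would be expected to lengthen the delay), ignores the gate's input current, and uses thresholds of roughly 1.4 to 1.9 V as assumed, illustrative figures for a 74LS14; check the datasheet for the part actually fitted.

```python
import math


def time_to_threshold(r_ohms, c_farads, v_supply, v_threshold):
    """Seconds for an RC network charging from 0 V to reach v_threshold.

    Assumes an ideal step from 0 V to v_supply at t = 0 and no load on the
    capacitor: V(t) = v_supply * (1 - exp(-t / RC)), solved for t.
    """
    tau = r_ohms * c_farads
    return -tau * math.log(1.0 - v_threshold / v_supply)


if __name__ == "__main__":
    r, c, vcc = 4.7e3, 47e-6, 5.0            # component values from the post above
    print(f"RC = {r * c * 1000:.1f} ms")
    for vth in (1.4, 1.6, 1.9):             # assumed 74LS14 positive-going thresholds
        t = time_to_threshold(r, c, vcc, vth)
        print(f"Vth = {vth} V -> reset released after ~{t * 1000:.0f} ms")
```

Under these idealized assumptions the delay comes out at roughly 70 to 110 ms, the same ballpark as the 100 to 125 ms guess above.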

http://www.zimmers.net/anonftp/pub/cbm/ ... c64-05.gif

Dr Jefyll · Post by **Dr Jefyll** » Sun Sep 11, 2011 8:11 pm

Thanks for the excerpt, falcon5252. Is "The rise of Mostek" something we can read online?

Regarding the reset bug, the most provoking aspect of the puzzle is the overheating thing -- a hardware fault so different from "soft" errors (such as missing flag updates in Decimal Mode, etc.) Here is one possible explanation.

- at the silicon level, RESET is just an interrupt, much like NMI or IRQ. For the sake of economy, hardware is shared among these 3 functions as much as possible.

- RESET produces the same pattern of bus activity as IRQ and NMI. RESET actually results in stack access cycles as if pushing a return address. Perhaps the very earliest in-house 6502 prototypes actually did push a return address.

- But RST has no need to save a return address. In fact it may be considered undesirable that RST should write to memory.

- MOS opted to eliminate the push of the return address. The easiest way to make the change was to keep the stack access cycles, but simply let those cycles be reads instead of writes.

- Somehow they patched it so the R/W pin remains high during RST's dummy stack pushes. This satisfies the requirement that no write occur. But if the R/W pin is the only change they made then the system would experience bus contention. During the dummy push, does the CPU try to drive the external data bus (despite R/W being high)? Memory will be driving the data bus too, resulting in excess current flow -- and heating -- of CPU and memory ICs. Or, perhaps it's only internal buses within the 6502 that butt heads and draw excess current.

Notice that, with RST held low, the 6-cycle interrupt response sequence (including the dummy stack pushes) can be expected to repeat indefinitely. The amount of heating will be substantial, since the dummy pushes account for 50% of all bus activity (3 cycles out of every 6-cycle sequence; along with the return address the Status register is also "pushed").
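The hypothesis can be written down as a small bus-cycle model. This is a minimal Python sketch under explicit assumptions: the six-cycle breakdown below (one dummy read, three suppressed pushes, two vector fetches) is a hypothetical labeling chosen to match the "3 cycles out of every 6" figure, and `cpu_drives_data` encodes the suspected bug rather than anything measured. A cycle counts as contention when memory sees a read (R/W high) while the CPU's data-bus drivers are also enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusCycle:
    name: str
    rw_high: bool          # R/W pin high, so memory drives the data bus
    cpu_drives_data: bool  # CPU data-bus output drivers enabled (hypothesized)


def reset_sequence(buggy):
    """Hypothetical 6-cycle RESET sequence with dummy (read-only) stack pushes."""
    return [
        BusCycle("dummy read", True, False),
        BusCycle("push PCH (suppressed)", True, buggy),
        BusCycle("push PCL (suppressed)", True, buggy),
        BusCycle("push P (suppressed)", True, buggy),
        BusCycle("vector low", True, False),
        BusCycle("vector high", True, False),
    ]


def contention_cycles(cycles):
    """Names of cycles in which the CPU and memory drive the data bus at once."""
    return [c.name for c in cycles if c.rw_high and c.cpu_drives_data]


if __name__ == "__main__":
    for buggy in (True, False):
        seq = reset_sequence(buggy)
        hits = contention_cycles(seq)
        print(f"buggy={buggy}: {len(hits)}/{len(seq)} cycles in contention")
```

With RST held low and the sequence repeating, the buggy case fights over the bus half the time; if the drivers are properly disabled during the suppressed pushes, there is no contention at all.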

What would be nice is to run some sort of concrete test. Is anyone willing to try a hardware experiment using an actual 6502? Can the Visual6502 be coaxed to reveal the truth? All comments (and offers of assistance) are appreciated!

-- Jeff

falcon5252 · Post by **falcon5252** » Sun Sep 11, 2011 8:25 pm

The Rise of MOS Technology: some of the links cited in the story are dead, but it's still a good read.
http://www.commodore.ca/history/company ... nology.htm

External contention is easy to test: just probe the address and data lines with a scope or logic probe while reset is held low. Still, as long as the other chips on the board are in reset, they are also prohibited from bus access. But since the address and data lines on the NMOS part cannot be tri-stated, some signal has to be on the bus, probably all low or all high, at least on the address bus, because you really don't want the address decoders responding. The answer to that should be in at least one of the many datasheets. There should be no bus contention, except in some not-so-well-thought-out user-designed circuits.

As for the internal circuitry, I don't know enough about it to really comment, other than that it makes some sense: if you keep accessing the same register endlessly, the duty cycle on those transistors goes way up and could cause spot heating on the wafer, maybe enough to damage it.

Maybe an email to Western Design Center? Some of the original chip designers who left MOS might still be around and could settle this for us. Bill Mensch, the founder of WDC, came straight from MOS Technology with full ownership of all the 6502 hardware. (I stand corrected: MOS Technology, not Mostek. It's been a long time since I saw a chip with either label on it, and somehow my brain melded the two.)

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Sep 12, 2011 1:34 am

falcon5252 wrote:

Excerpt from "The rise of Mostek"

Mostek != MOS Technology. Two different companies.

GARTHWILSON · Post by **GARTHWILSON** » Mon Sep 12, 2011 3:08 am

I have not found any more info on this bug, but ioncannon's comment does support what I've been saying. I'm going on what I read in the 1980's, which I cannot find again. I have used only the CMOS version for the last 25 years, and there has been no reason to go back to the inferior NMOS one, unless I had been doing a lot with the C64, whose 6510 was not available in a CMOS version. I don't know offhand when they quit making the NMOS 6502. I don't want people to run into the reset hardware problem, but if it is indeed a potential problem, they ought to know about it.

I can't put much credibility into an argument that a company could easily make non-trivial changes to the masks, though, because to my knowledge the only bug that ever got fixed in NMOS was the ROR bug, which behaved so wrongly the first year that they didn't even list it in the instruction set. It was really messed up, in all address modes, and did not have a workaround. The worst of the many other bugs may have been JMP (xxFF), where the page did not increment before reading the second byte of the target address, and in all the years the NMOS 6502 was made by different companies, even that one never got fixed.
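The JMP (xxFF) bug is easy to show in a few lines. This minimal Python sketch computes the target of an indirect JMP from a pointer address: the NMOS behavior increments only the low byte of the pointer, so the high byte of the target is fetched from the start of the same page, whereas the 65C02 behavior reads the next byte across the page boundary. The `memory` dict is a hypothetical stand-in for the address space.

```python
def jmp_indirect_target(memory, pointer, nmos=True):
    """Return the address that JMP (pointer) transfers control to.

    memory: dict mapping 16-bit addresses to byte values (missing = 0).
    nmos=True models the NMOS 6502 page-wrap bug; nmos=False the 65C02 fix.
    """
    low = memory.get(pointer, 0)
    if nmos:
        # Only the low byte of the pointer is incremented; the page never changes.
        high_address = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        high_address = (pointer + 1) & 0xFFFF
    high = memory.get(high_address, 0)
    return (high << 8) | low


if __name__ == "__main__":
    memory = {0x10FF: 0x34, 0x1100: 0x12, 0x1000: 0x56}
    print(hex(jmp_indirect_target(memory, 0x10FF, nmos=True)))
    print(hex(jmp_indirect_target(memory, 0x10FF, nmos=False)))
```

With the pointer at $10FF, the NMOS model picks up the high byte of the target from $1000 instead of $1100, so control goes somewhere quite different.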

falcon5252 · Post by **falcon5252** » Mon Sep 12, 2011 3:38 am

Garth, the article "The Rise of MOS Technology" does say that once Commodore purchased MOS, all interest turned to production and R&D was discouraged. That's why most of their R&D department jumped ship, including Bill Mensch, one of the original designers of the 6502, who started WDC after leaving MOS. So the lack of bug fixes can probably be blamed on Commodore management rather than on an inability to make them. WDC went on to fix many bugs, but by that time the CMOS version was already in the works. And with the original design team gone and MOS now under Commodore, they just kept pumping out the original 6502 design. After all, they were still selling millions of chips, even with the bugs.

Dr Jefyll · Post by **Dr Jefyll** » Mon Sep 12, 2011 3:57 am

Garth Wilson wrote:

I have not found any more info on this bug, but ioncannon's comment does support what I've been saying. I'm going on what I read in the 1980's, which I cannot find again.

My proposed theory also supports what you've been saying, and my initial doubts/confusion have faded. Thank you for making us aware of the bug. As for whatever may've been written about it in the 1980s, additional info is nice. But if indeed I've hit on the correct explanation (see 3rd post in this thread) we'll be able to confirm the matter for ourselves fairly easily.

falcon5252 wrote:

There should be no bus contention, except in some not-so-well-thought-out user-designed circuits.

Hmmm, yes... the key word is "should" !

But here (I believe) is a case where the professionals have slipped up.

Normally, at any given time, all the drivers on a bus are tri-stated (disabled) except one. If more than one is enabled simultaneously, the result is a collision of sorts, i.e., bus contention. A potential short circuit exists, for example, if the pull-up transistor of one driver tries to pull a bus line high while the pull-down transistor of another driver tries to pull the line low. The result is a large drain on the supply, with current flowing from VCC through the pull-up to the bus line, then through the pull-down to ground.

Contention on an external bus is probably more serious since the current capability of the external-bus-driving transistors is greater than that for transistors driving internal buses. Also, correct me if I'm wrong, but with NMOS logic, don't they use weak, passive pull-up transistors for the internal logic? That would greatly reduce the current flow -- to safe values, in fact. (Mebbe I was wrong to suggest the bug could be the result of contention on internal buses. Contention on the external bus seems a more likely bet.)
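The effect of driver strength on contention current can be sketched with Ohm's law, treating each conducting transistor as a resistor. This minimal Python sketch uses hypothetical resistance values picked only to illustrate the contrast between a strong external-bus driver and a weak internal pull-up; they are not measured 6502 figures. The line count and duty cycle are parameters, so the same function can express something like "8 data lines fighting half the time".

```python
def contention_current(vcc, r_pullup, r_pulldown):
    """Current (A) from VCC to ground when one driver pulls up and another pulls down."""
    return vcc / (r_pullup + r_pulldown)


def average_power(vcc, r_pullup, r_pulldown, lines=1, duty=1.0):
    """Average power (W) wasted across `lines` bus lines in contention `duty` of the time."""
    return vcc * contention_current(vcc, r_pullup, r_pulldown) * lines * duty


if __name__ == "__main__":
    vcc = 5.0
    # Hypothetical on-resistances in ohms, for illustration only.
    strong_pullup, weak_pullup, pulldown = 100.0, 10_000.0, 50.0
    print(f"strong: {contention_current(vcc, strong_pullup, pulldown) * 1000:.1f} mA per line")
    print(f"weak:   {contention_current(vcc, weak_pullup, pulldown) * 1000:.2f} mA per line")
    print(f"8 strong lines, 50% duty: {average_power(vcc, strong_pullup, pulldown, 8, 0.5):.2f} W")
```

Even with made-up numbers, the large gap between the two cases shows why weak internal pull-ups would keep internal contention currents small, while strong pin drivers could, under these assumed values, waste a substantial fraction of a watt.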

-- Jeff

BigEd · Post by **BigEd** » Mon Sep 12, 2011 11:03 am

Dr Jefyll wrote:

In one of our other threads we found ourselves discussing a somewhat mysterious reset bug reported to exist in NMOS 6502s.

For ease of reference, I'm going to bring in some quotes from that thread, and from messages Garth referenced. I'll do that at the bottom of this post.

The idea of the bug is that NMOS parts, or early NMOS parts, suffered a heating problem or even suffered damage if RST was active for too long. The information has not been found in any datasheet. Garth recalls it as being true, and ioncannon believes he saw a heating problem this year (**)

Looking at Acorn's BBC Micro and Electron schematics, neither makes any effort to limit the length of RST. I have a BBC Micro here, with a Rockwell NMOS 6502 with datecode 8402. I held RST for well over ten seconds, with no heating problem and no evidence of ill effect. Of course, that's a late datecode.

GARTHWILSON wrote:

I can't put much credibility into an argument that a company could easily make non-trivial changes to the masks, though

I've read(*) that MOS had a competitive advantage in that they could make trivial changes to their masks, where other companies would have to re-do one or more masks. That's more expensive. It's not unusual to seek fixes which affect the smallest number of masks, then as now.

GARTHWILSON wrote:

Making new masks for a design change was very expensive. It would not make sense for them to fix this bug without fixing several others; but those were never fixed on the NMOS 6502.

Indeed, new masks are expensive, but a single mask is cheaper than a full mask set, and patching a mask is cheaper still. Fixing the ROR bug might have been possible as a mask patch, and so might this RST bug (if it was real), whereas other fixes might not have been possible. So I don't think you can argue they would fix all or nothing. They'd fix the things which were hurting most, according to the cost of each fix, as with all engineering.

falcon5252 makes the point that an in-house fab can change the process to improve yield, and indeed the parent company can design according to the process capability. That might well have helped Commodore once they'd acquired MOS, and MOS having design and process very close would have helped them get good yields and performance faster, compared to a hypothetical company where design and process didn't communicate.

Dr Jefyll wrote:

Contention on an external bus is probably more serious since the current capability of the external-bus-driving transistors is greater than that for transistors driving internal buses. Also, correct me if I'm wrong, but with NMOS logic, don't they use weak, passive pull-up transistors for the internal logic?

Right, sort of! There are several possible design techniques. Almost all the internal logic of 6502 uses depletion-mode pullups with the gate controlled by the output, which are always on, but a bit more strongly on when driving a high value. The larger internal drivers have the pullup's gate controlled by a logic signal, so there's a potential for a logic bug. But even those pullups are usually quite weak. The only strong pullups on the chip which could cause contention are the ones which drive the databus pins. It's not impossible that the databus could be driven by the 6502 even during a read cycle, if there were a logic bug on the chip - and such a bug would need fixing!

The suppression of the stack writes during reset is something I still haven't tracked down in visual6502. Bear in mind that visual6502 is a particular NMOS part: it has date code 8316 and is a Revision D. So, not an early part. (Although it's possible that Rockwell, say, had a second-source licence and never updated their mask copies other than to shrink them.)

Now, the references, for reference:

GARTHWILSON wrote:

At viewtopic.php?t=1759&start=4 , or the fifth post of the topic General Discussions-->Beginner in digital circuitry which is at viewtopic.php?t=1759 , after I mentioned the NMOS RST chip-heating bug, ioncannon wrote on Feb 1 of this year, "Ahh so it wasn't just me, whenever RST was grounded, the chip would heat up significantly." So far I have not found anything more on it though. My possibly inadequate search line was "6502 NMOS RST bug".

Making new masks for a design change was very expensive. It would not make sense for them to fix this bug without fixing several others; but those were never fixed on the NMOS 6502.

ioncannon wrote:

...

GARTHWILSON wrote:

All good answers from BDD, but I think he's mostly thinking about the CMOS 6502. I don't think the NMOS one had a Schmitt-trigger input for RST\, so it needed a clean edge with a quick rise time. I can't find anything about it in the Rockwell or Synertek data sheets right now though. I do know that the NMOS ones had a die-heating problem with leaving the RST line down more than a tenth of a second or so. 50ms was what a lot of RST circuits were made to deliver. You will need more than just a switch, since the switch will produce a lot of bouncing instead of a clean RST signal. If the last low time is at least a few clock cycles long, you might be in luck, if the rise time is fast enough. Use a CMOS one and you won't have to worry about it.

...

The NMOS 6502 had several bugs and quirks that got fixed in the CMOS one. ...

Ahh so it wasn't just me, whenever RST was grounded, the chip would heat up significantly. Should a 500 kHz clock be OK? I read it's due to DRAM data inside the CPU degrading if the chip is not clocked fast enough.

Cheers
Ed

ps. Edit: added footnote:
(*)

falcon5252 wrote:

The Rise of MOS Technology: some of the links cited in the story are dead, but it's still a good read.
http://www.commodore.ca/history/company ... nology.htm

Yes, it's a good read! It also contains the statement "MOS figured out a process to repair Masks as they are reduced", which echoes the similar statement on Wikipedia, attributed to a conversation with Bill Mensch:

Quote:

MOS had a secret weapon: the ability to "fix" its masks

(I recommend also "On the Edge" - excerpt here)

(**) would be nice to know the datecode and manufacturer of the CPU - ah, I note it's a Nintendo, which is NMOS and contains a presumed unlicensed mask copy of a 6502 with 5 datapath transistors removed as a single-mask hack. It has some yield fixes which visual6502's revD does not have, but they could be Nintendo's. As it's a Nintendo chip, the reset problem might not even be in the 6502 section.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Sep 12, 2011 3:47 pm

GARTHWILSON wrote:

I can't put much credibility into an argument that a company could easily make non-trivial changes to the masks, though, because to my knowledge the only bug that ever got fixed in NMOS was the ROR bug, which behaved so wrongly the first year that they didn't even list it in the instruction set. It was really messed up, in all address modes, and did not have a workaround. The worst of the many other bugs may have been JMP (xxFF), where the page did not increment before reading the second byte of the target address, and in all the years the NMOS 6502 was made by different companies, even that one never got fixed.

Also, there was the case where, if BRK was executed at the same time a hardware interrupt occurred, the BRK instruction was completely ignored. That could have been straightened out as well if MOS (aka CSG) or the other sources were game to do so. That they didn't was quite telling, in my opinion, and partially explains why Apple decided to build the ][e around the 'C02.

My take on this apparent reset errata is like that of considering what happened if you ran a Pontiac straight-eight engine at too high an RPM: academically interesting to historical trivia buffs, but irrelevant in the context of current technology. My not-so-humble opinion is that contemplating what might or might not happen to an NMOS 6502 when reset is held down for a long time is in the same realm as wondering if that Pontiac straight-eight will go or blow at 6000 RPM.

falcon5252 · Post by **falcon5252** » Mon Sep 12, 2011 5:40 pm

Nintendo: altering the CPU mask was probably done to discourage reverse engineering and cloning of their product. They might have even changed a couple of opcodes. The only other reason I can think of for changing it is that they needed to overclock it and that part of the circuit was preventing it. It's only speculation on my part, but I thought I'd throw it out here. I really can't imagine too many other reasons to have a CPU with a custom mask. I've opened up plenty of Nintendos in my life and never glanced at the crystal to see what clock they were running.

BigEd · Post by **BigEd** » Mon Sep 12, 2011 5:54 pm

Hi falcon,
The supposition is that Nintendo went to the trouble of changing the poly layout (to remove the 5 transistors) so they wouldn't be infringing the patent on BCD correction. They were not seeking to protect their own invention, but re-using someone else's.
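To illustrate what removing the BCD correction means in practice, here is a simplified Python sketch of ADC with and without decimal correction. It works under stated assumptions: it handles only valid BCD operands, returns just the result and carry (the N, V and Z flags are not modeled), and `has_bcd=False` is a hypothetical switch modeling a core like the 2A03's, where the D flag has no effect on the add.

```python
def adc(a, b, carry_in, decimal, has_bcd=True):
    """Return (result, carry_out) for ADC, simplified.

    Assumes valid BCD operands when decimal mode applies; N, V, Z are not modeled.
    has_bcd=False models a core with the decimal-correction logic removed,
    so the D flag is ignored and the add is always binary.
    """
    if decimal and has_bcd:
        low = (a & 0x0F) + (b & 0x0F) + carry_in
        if low > 9:
            low += 6                    # decimal-adjust the low digit
        high = (a >> 4) + (b >> 4) + (1 if low > 0x0F else 0)
        if high > 9:
            high += 6                   # decimal-adjust the high digit
        return ((high << 4) | (low & 0x0F)) & 0xFF, 1 if high > 0x0F else 0
    total = a + b + carry_in
    return total & 0xFF, 1 if total > 0xFF else 0


if __name__ == "__main__":
    print(adc(0x45, 0x38, 0, decimal=True))                  # with BCD correction
    print(adc(0x45, 0x38, 0, decimal=True, has_bcd=False))   # correction removed
```

So with D set, $45 + $38 gives $83 in this model when the correction logic is present and plain binary $7D when it is removed.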

If you look at the 2A03 chip photos, you'll see that the rest of the chip has a completely different design style: the 6502 is like a photographic insert.

(Note that the 2A03 did need its own unique custom mask set: there's no question of re-using 6502 masks, only of re-using the mask layout)

Cheers
Ed

falcon5252 · Post by **falcon5252** » Mon Sep 12, 2011 6:54 pm

Aah, to avoid cross-licensing. Missed that one.

GARTHWILSON · Post by **GARTHWILSON** » Mon Sep 12, 2011 8:46 pm

Quote:

I've read(*) that MOS had a competitive advantage in that they could make trivial changes to their masks, where other companies would have to re-do one or more masks. That's more expensive. It's not unusual to seek fixes which affect the smallest number of masks, then as now.

Depending on the nature of the change, I'm sure all (or nearly all) the masks in the "pile" would have to be changed. Still, some changes might be pretty simple, while fixing certain bugs might require opening up more space somewhere out in the middle, moving nearly everything, which almost means starting over, back when they were doing it by hand with huge films (which, BTW, is how I laid out my first PC boards, using Bishop Graphics products).

BigEd · Post by **BigEd** » Mon Sep 12, 2011 9:32 pm

Well, of course, a whole variety of changes would need multi-layer fixes - and that's one reason why bugs remain. But it's also true that one and two layer fixes are possible, surprisingly often - and the economics of the situation are a strong motivator! The ROM/PLA approach can be helpful here (many sites where a transistor can be usefully added or removed as a single rectangle of active), as is the possibility of using the depletion mask to knock out individual pull-downs (as was used to fix bugs in the z80.)
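The PLA point can be illustrated with a toy decode row. In this minimal Python sketch (a hypothetical, heavily simplified model, not the 6502's actual decode ROM, which also uses timing inputs), a row fires when the opcode bits it cares about match a pattern, and each cared-about bit stands for one transistor site. Removing a single transistor drops one bit from the care mask, so a one-rectangle change on one mask layer widens the set of opcodes the row matches.

```python
def row_matches(opcode, care_mask, pattern):
    """True if every opcode bit selected by care_mask equals the same bit of pattern."""
    return (opcode & care_mask) == (pattern & care_mask)


def opcodes_matched(care_mask, pattern):
    """All 8-bit opcodes that would activate this decode row."""
    return [op for op in range(256) if row_matches(op, care_mask, pattern)]


if __name__ == "__main__":
    pattern = 0x6A                                      # hypothetical row decoding one opcode
    original = opcodes_matched(0xFF, pattern)
    patched = opcodes_matched(0xFF & ~0x08, pattern)    # "remove" the bit-3 transistor
    print([hex(op) for op in original])
    print([hex(op) for op in patched])
```

Adding a transistor works the other way, narrowing the match; which direction fixes a given bug depends entirely on the logic involved.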

Cheers
Ed