PostPosted: Sat Aug 25, 2018 10:39 pm 

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
One additional piece of information from that SYNERTEK Data Book 1985: the propagation delay varies with process variations. The factor ranges from 0.67 (best case) through 1 (nominal) to 1.5 (worst case). But there is no clue whether, or by how much, these process variations differ across a single chip.

But thinking about how a chip is manufactured, I believe the variations across a single chip are small as there are hundreds of them on a single wafer.



Regards
Arne


PostPosted: Sun Aug 26, 2018 8:13 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
GaBuZoMeu wrote:
But thinking about how a chip is manufactured, I believe the variations across a single chip are small as there are hundreds of them on a single wafer.

Indeed - only rather recently, for very large and very fast chips on very small processes, has variation across the die needed to be accounted for. Back in the day, the biggest variation would be lot to lot, then wafer to wafer, and then site to site. (Probably this is still true, but we now add edge-to-edge variation across the die too.)


PostPosted: Fri Aug 31, 2018 4:35 am 

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
BigEd wrote:
GaBuZoMeu wrote:
But thinking about how a chip is manufactured, I believe the variations across a single chip are small as there are hundreds of them on a single wafer.

Indeed - only rather recently, for very large and very fast chips on very small processes, has variation across the die needed to be accounted for. Back in the day, the biggest variation would be lot to lot, then wafer to wafer, and then site to site. (Probably this is still true, but we now add edge-to-edge variation across the die too.)


This is probably tested in some way for all components, as they are usually supplied with a speed grade, the best grades being the most expensive. At least that is the case for FPGAs/CPLDs.

So that is another reason to do speed- and stress-testing: component cost. If you want to make hundreds, even a few dollars per unit will add up to considerable savings. Now, if you want to make thousands...

I know for a fact that many people who design for volume spend weeks or months trying to squeeze a particular design into a smaller component. If one can use a smaller Lattice, Xilinx or Intel FPGA, the savings will be considerable.

Now, if your task is to turn it into an ASIC, this becomes even more important, and testing, re-testing and analysis become ridiculously expensive and time-consuming. One of my friends works on implementing designs into layouts for Microchip (I think it was for one of the smaller AT chips), and his whole summer has been spent moving different parts around to improve stability and get pins and parts into their right locations.

If you move to larger components (like the larger ARMs), simulating the component itself becomes very difficult, and the simulation speed goes way down (maybe 20 Hz or something). The usual solution is to make simplified models of different parts to see if they are stable, but in the end the models have to be put into hundreds of specialized FPGAs (look at Mentor Graphics servers) to get a few MHz of "real" speed.

All in all, an ASIC producer can only simulate a very limited range of code and configurations, while an FPGA producer doesn't even know what logic will end up inside the component. Add up signal delays and give an estimate? Sure... but it has little to do with what you will actually experience when you run the core, component, board or system. Stress-testing is the only way.


PostPosted: Sat Sep 01, 2018 7:00 pm 

Joined: Thu May 14, 2015 9:20 pm
Posts: 155
Location: UK
When testing existing semiconductor ICs, especially when overclocking, or operating at supply voltage extremes or at temperature extremes, or a combination of these, it is well worthwhile having the chip's supply current continuously monitored.

When the supply voltage and the current are taken into account, you can tell whether the semiconductor is running hotter than it should or is in the normal range, far quicker and earlier than by waiting for the encapsulation to get warm or hot.
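
As a rough illustration of why this works, here is a small Python sketch: dissipation is just the product of the monitored voltage and current, and with an assumed junction-to-ambient thermal resistance (the figure below is made up - take the real one from the package datasheet) you get an estimate of die temperature long before the package feels warm.

Code:
# Rough die-temperature estimate from the monitored supply voltage and current.
# THETA_JA and T_AMBIENT are placeholder values for illustration only.

THETA_JA_C_PER_W = 60.0    # assumed junction-to-ambient thermal resistance (C/W)
T_AMBIENT_C = 25.0         # assumed ambient temperature (C)

def junction_temp(v_supply, i_supply, t_ambient=T_AMBIENT_C, theta_ja=THETA_JA_C_PER_W):
    """Estimate die temperature as Tj = Ta + (V * I) * theta_JA."""
    power_w = v_supply * i_supply
    return t_ambient + power_w * theta_ja

# Example: 5 V at 40 mA dissipates 0.2 W -> roughly 25 + 0.2 * 60 = 37 C.
print(junction_temp(5.0, 0.040))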

Mark


PostPosted: Thu Nov 01, 2018 3:08 am 

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
I'll soon be back to extend my 6502 and have been wondering if there is a good method to do error correction on instruction execution.

E.g. if an instruction fails/misses due to high-speed bit errors, it will crash the program (and halt the processor) in most instances. If the failed state could trigger an early fail flag (or the like), it would be possible to re-execute the instruction before the resulting error can trigger a crash.

Anyone familiar with such methods? I know one can do error checking on memory fetches with extra bits, but I was hoping for error checking on the actual core operation. The only way I can think of is to use two parallel cores... but to me that sounds inefficient.


PostPosted: Thu Nov 01, 2018 7:15 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
I searched for "error-detecting ALU" and got a few hits. But you'd need to instrument your ALU, and your bus traffic, and your address calculation hardware. Seems like a tall order! Even with a pair of cores, I'm not sure how you'd know that only one of them is going to fail. You'd need a safe margin on one - but how big?
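
One family of schemes that turns up under that search is residue checking: do the arithmetic on the full operands and, in parallel, on their small residues (say mod 3), and flag an error when the two disagree. A minimal Python sketch of the idea (not any particular paper's scheme, just an illustration):

Code:
# Mod-3 residue check on an 8-bit add. A single-bit flip in the result changes
# its value by a power of two, which is never a multiple of 3, so the residue
# check always catches it.

def add_with_residue_check(a, b):
    full = a + b
    result = full & 0xFF
    carry = full >> 8
    # result = a + b - 256*carry, and 256 mod 3 == 1, so:
    predicted_residue = (a % 3 + b % 3 - carry) % 3
    ok = (result % 3) == predicted_residue
    return result, ok

result, ok = add_with_residue_check(0x7F, 0x05)   # ok == True
corrupted = result ^ 0x10                         # simulate a single-bit upset
print(ok, corrupted % 3 == (0x7F % 3 + 0x05 % 3) % 3)   # True False

Doing the same for the bus traffic and the address calculations is what makes it such a tall order for a whole CPU.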


PostPosted: Thu Nov 01, 2018 7:50 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California
I was introduced to the idea 30 years ago. The man was talking about having three computers running in parallel, and if one disagreed with the other two, it was automatically taken offline. The example he gave was a factory where, if something crashes, you could ruin a million dollars' worth of perfume. I've worked on some space stuff, although it was for unmanned spacecraft where failure only risks a lot of money but no one's life, and there they never use multiple processors or microcontrollers or memories for any job in the spacecraft. The biggest risk is just software bugs. If you have multiple hardware sets running the same software bugs, guess what'll happen.
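
That three-in-parallel arrangement is usually called triple modular redundancy, and the voting itself is trivial - it's everything around it (and the shared software bugs) that's hard. A little Python sketch of the bitwise 2-out-of-3 vote, just for illustration:

Code:
# Bitwise 2-out-of-3 majority vote over three redundant results. A single
# disagreeing unit is simply outvoted; comparing each input against the voted
# result tells you which unit to take offline. It does nothing about a bug
# that all three copies of the software share.

def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

def disagreeing_units(a, b, c):
    voted = majority(a, b, c)
    return [name for name, value in (("A", a), ("B", b), ("C", c)) if value != voted]

print(bin(majority(0b1010, 0b1010, 0b0011)))        # 0b1010
print(disagreeing_units(0b1010, 0b1010, 0b0011))    # ['C']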

To prevent high-speed errors, just run the clock speed up gradually until you start observing problems (which will probably mean a crash), measure the frequency, and back it down for some margin. That's what I did with my workbench computer. It started having problems at just a hair over 7MHz, so I run it at 5MHz and have never had a problem.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Nov 01, 2018 11:47 am 

Joined: Thu May 14, 2015 9:20 pm
Posts: 155
Location: UK
In the industry where I work, for critical applications or for redundancy (or both) we run dual or triple MPU systems.

These take one of three forms depending on requirements. But all are run well within the manufacturers' specifications, so no overclocking. Reliability is very important.

The non-safety-critical systems typically have two independent MPU cards (each with on-board EPROM and SRAM) with a watchdog system. They both run the same software, and both MPU cards run all the time, but only one is able to control and write to the bus to the rest of the system. In the event of the current in-use MPU failing to toggle its watchdog, the other MPU will automatically be promoted to being the in-use MPU and the faulty MPU will lose control. We can also manually switch between them.
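
A toy model of that arrangement in Python (the names and the timeout are invented for illustration, not taken from the real system), just to show the promotion logic:

Code:
import time

# Toy model: both MPUs run all the time, but only the in-use one drives the
# bus. If it stops toggling its watchdog within the timeout, the supervisor
# promotes the standby MPU. The 0.5 s timeout is an arbitrary example value.

WATCHDOG_TIMEOUT_S = 0.5

class WatchdogSupervisor:
    def __init__(self):
        self.in_use = "A"
        self.last_kick = {"A": time.monotonic(), "B": time.monotonic()}

    def kick(self, mpu):
        """Each MPU calls this whenever it toggles its watchdog line."""
        self.last_kick[mpu] = time.monotonic()

    def poll(self):
        """Promote the standby MPU if the in-use one has gone quiet."""
        if time.monotonic() - self.last_kick[self.in_use] > WATCHDOG_TIMEOUT_S:
            self.in_use = "B" if self.in_use == "A" else "A"
        return self.in_use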

For the safety-critical systems, two MPU systems are present in a single cased module. Both MPUs are normally always online, processing all the data. If they disagree on the system outputs, a comparison system will blow the internal supply fuse, taking both off-line and putting all the outputs into a safe state. Note that both MPUs use the same software.

For the safety-critical systems where redundancy is important, three separate modules are used, but all are interlinked. Each module (which fits in a 19-inch rack) contains a single MPU, RAM, control circuits, custom network interfaces and power circuitry. On the front there is a slot for the EPROM module. Similar to the system described above, a comparison system is used, but it is far more complex. The principle is that in the event of a fault, the two 'good' modules will blow the internal fuse of the 'defective' module and take it off-line. The two remaining modules will then continue to work, but in a dual-configuration system only. Again, they all use the same software.

The software used in the safety-critical systems has been carefully designed. As it processes data predefined within a fixed, limited format, it is possible to use the same program in many systems, even though the real-world data is different. So each system contains EPROMs holding a table that the system uses to process the real-world data. The data in the table determines which logic is applied in response to the real-world inputs, in order to decide what (if any) response will be output back to the real world.
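
As a small Python sketch of that table-driven idea (the input and output patterns below are invented purely for illustration), the program stays generic and everything site-specific lives in the table, which in the real systems sits in the EPROMs:

Code:
# Generic engine: look the current input pattern up in the site-specific table
# and return the corresponding output; anything not listed falls back to a
# safe state. All the patterns below are made up for the example.

LOGIC_TABLE = {
    0b0001: 0b0100,
    0b0011: 0b0110,
    0b0111: 0b0000,
}
SAFE_OUTPUT = 0b0000

def respond(inputs):
    return LOGIC_TABLE.get(inputs, SAFE_OUTPUT)

print(bin(respond(0b0011)))   # 0b110
print(bin(respond(0b1111)))   # unknown pattern -> 0b0 (safe output)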

This approach enables simulation systems, running on far more capable hardware, to be used to test the logic of the data in the tables, with all 'real world' inputs and outputs also being simulated.

The downside is that these various systems are not particularly fast :(

Mark


PostPosted: Fri Nov 02, 2018 1:20 pm 

Joined: Fri Dec 12, 2008 10:40 pm
Posts: 1001
Location: Canada
GARTHWILSON wrote:
To prevent high-speed errors, just run the clock speed up gradually until you start observing problems (which will probably mean a crash), measure the frequency, and back it down for some margin. That's what I did with my workbench computer. It started having problems at just a hair over 7MHz, so I run it at 5MHz and have never had a problem.


I guess the question then becomes "What do you run to do that speed test?". There may be some specific code sequences that fail at X MHz, even though the system in general will seem to run safely at 1.2X MHz.

_________________
Bill


PostPosted: Tue Nov 06, 2018 9:08 pm 

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
BillO wrote:
I guess the question then becomes "What do you run to do that speed test?". There may be some specific code sequences that fail at X MHz, even though the system in general will seem to run safely at 1.2X MHz.


Exactly.

I am thinking about a mad monkey approach. But to predict its behavior, a simulator would be required.


PostPosted: Wed Nov 07, 2018 1:37 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California
Quote:
But to predict its behavior, a simulator would be required.

I don't think a simulator will be of any value, since the point is to see what a particular part or set of parts can do, when there are variations from one production lot to another, one wafer to another, even different parts of the same wafer, in addition to the usual matters of voltage and temperature. The infamous WDC 65C51 worked fine in simulation, but the simulator did not catch a race condition that produced a bug in actual units.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Nov 07, 2018 8:37 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
You do need some way to know whether or not your test program is misbehaving, and a simulator is one way to do that. Another is self-checking code (Klaus' test suite is self-checking and might be useful, but is not designed as a test for running at speed). Another is to compare two runs - an unstressed run and a speed-challenged one. You do need to be sure that your unstressed run has plenty of margin.
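
A sketch of the compare-two-runs idea in Python, assuming a hypothetical run_test() helper that executes the same routine on the target at a given clock and returns a checksum of its results (however your setup reports them):

Code:
# Compare a run at a comfortable clock against runs at increasing clocks and
# report the first clock whose results differ. run_test is a hypothetical
# callable supplied by whatever harness drives the actual hardware.

def find_failing_clock(run_test, clocks_mhz, reference_mhz=2.0):
    reference = run_test(reference_mhz)        # unstressed run, plenty of margin
    for clock in sorted(clocks_mhz):
        if run_test(clock) != reference:
            return clock                       # first mismatch against the reference run
    return None                                # no mismatch observed

# Example with a dummy harness that "fails" above 8 MHz:
fake = lambda mhz: 0xDEAD if mhz <= 8 else 0xBEEF
print(find_failing_clock(fake, [4, 6, 8, 10, 12]))   # 10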


PostPosted: Wed Nov 07, 2018 2:03 pm 

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
Yes, the simulation was meant more as a way to determine what the code would do (or should do). If you have a way to change the speed of the machine, a stressed/unstressed run with a comparison of the resulting behavior would probably be faster. Still, the code would have to be analysed to understand its intended behavior (since a mad monkey produces completely random code at all times).


PostPosted: Wed Nov 07, 2018 3:00 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
As an alternative to clocking a design ever-faster, it's worth considering instead running it at reduced voltage - that might be more convenient. (Another alternative is running it at higher temperature - inside a temperature-controlled oven, for example. But that's more difficult and probably more risky too.)

As a very rough indication, WDC rate their 65C02 at:
2MHz at 1.8V, 4MHz at 2.5V, 8MHz at 3V, 14MHz at 5V


PostPosted: Wed Nov 07, 2018 11:47 pm 

Joined: Thu May 14, 2015 9:20 pm
Posts: 155
Location: UK
But surely, before you start testing, you have to define what the objective is.

So if the objective is to run an MPU as fast as possible in a simple real-world application, the code that is being run should at the very least be the intended application code (assuming that the code already exists and is known to be reliable and stable at "normal" clock speeds).

If the objective is to test an MPU as thoroughly as possible by getting it to execute as many of the instructions in its instruction set as possible, then you need a program that does just that, and which only outputs a result "signal" when it has completed all the tests.

The more difficult question may be how to determine that it is the MPU that has failed, rather than the glue logic/support circuitry/RAM/ROM...

Mark

