
Re: Stress-testing a 6502 - suggestions?

Posted: Sat Aug 25, 2018 10:39 pm
by GaBuZoMeu
One additional piece of information from the SYNERTEK Data Book 1985: the propagation delay varies with process variations. The factor ranges from 0.67 (best case) through 1 (normal) to 1.5 (worst case). But there is no clue as to whether, and by how much, these process variations may differ across a single chip.

But thinking about how a chip is manufactured, I believe the variations across a single chip are small as there are hundreds of them on a single wafer.
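
As a rough worked example of what the data book factors quoted above could mean for clock speed: the sketch below assumes the maximum clock is inversely proportional to propagation delay, which is a simplification, and the 2 MHz nominal figure is only a placeholder, not a value from the data book.

```python
# Delay factors as quoted from the SYNERTEK Data Book 1985.
DELAY_FACTORS = {"best": 0.67, "normal": 1.0, "worst": 1.5}


def max_clock_mhz(nominal_mhz, delay_factor):
    """Assumed model: max clock scales as 1 / propagation delay."""
    return nominal_mhz / delay_factor


def clock_range(nominal_mhz):
    """Max clock for each process corner, given the nominal max clock."""
    return {
        corner: max_clock_mhz(nominal_mhz, factor)
        for corner, factor in DELAY_FACTORS.items()
    }
```

Under that assumption, a part that is nominally good for 2 MHz would span roughly 1.33 MHz (worst case) to 2.99 MHz (best case).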



Regards
Arne

Re: Stress-testing a 6502 - suggestions?

Posted: Sun Aug 26, 2018 8:13 am
by BigEd
GaBuZoMeu wrote:
But thinking about how a chip is manufactured, I believe the variations across a single chip are small as there are hundreds of them on a single wafer.
Indeed - only rather recently, for very large and very fast chips in very small processes does variation across the die need to be accounted for. Back in the day, the biggest variation would be lot to lot, then wafer to wafer, and then site to site. (Probably this is still true, but we add edge to edge too.)

Re: Stress-testing a 6502 - suggestions?

Posted: Fri Aug 31, 2018 4:35 am
by kakemoms
BigEd wrote:
GaBuZoMeu wrote:
But thinking about how a chip is manufactured, I believe the variations across a single chip are small as there are hundreds of them on a single wafer.
Indeed - only rather recently, for very large and very fast chips in very small processes does variation across the die need to be accounted for. Back in the day, the biggest variation would be lot to lot, then wafer to wafer, and then site to site. (Probably this is still true, but we add edge to edge too.)
Which is probably tested in some way for all components as they are usually supplied with a speed grade; the best being the most expensive. At least for FPGA/CPLDs.

So that is another reason to do speed- and stress-testing: component cost. If you want to make hundreds, even a few dollars per unit will add up to considerable savings. Now, if you want to make thousands...

I know for a fact that many people who design for volume spend weeks or months trying to squeeze a particular design into a smaller component. If one can use a smaller Lattice, Xilinx or Intel FPGA, the savings will be considerable.

Now, if your task is to make it into an ASIC, this becomes even more important, and testing, re-testing and analysis become ridiculously expensive and time-consuming. One of my friends works on implementing designs into layouts for Microchip (I think it was for one of the smaller AT chips), and his whole summer has been spent moving different parts around to improve stability and get pins and parts into their right locations. If you move to larger components (like the larger ARMs), the simulation (of the component) itself becomes very difficult, and the simulation speed goes way down (maybe 20Hz or something). The solution is then usually to make simplified models of different parts to see if they are stable, but in the end, the models have to be put into hundreds of specialized FPGAs (look at Mentor Graphics servers) to get a few MHz "real" speed.

All in all, it's very limited what kind of code and configuration an ASIC producer can simulate, while an FPGA producer doesn't even know what kind of logic will be within the component. Add up signal delays and give an estimate? Sure, but it has nothing to do with what you will really experience when you run the core, component, board or system. Stress-testing is the only way.

Re: Stress-testing a 6502 - suggestions?

Posted: Sat Sep 01, 2018 7:00 pm
by 1024MAK
When testing existing semiconductor ICs, especially when overclocking, or operating at supply voltage extremes or at temperature extremes, or a combination of these, it is well worthwhile having the chip's current continuously monitored.

When the supply voltage and the current are taken into account, you can then tell if the semiconductor is running at a greater temperature, or if it is in the normal range, far quicker and earlier than waiting for the encapsulation to get warm or hot.
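
A minimal sketch of that calculation, assuming the usual steady-state thermal model (die temperature = ambient + power x junction-to-ambient thermal resistance). The thermal resistance has to come from the package datasheet; the function and parameter names here are made up for illustration.

```python
def power_w(supply_v, current_a):
    """Power dissipated by the chip, in watts."""
    return supply_v * current_a


def junction_temp_c(ambient_c, supply_v, current_a, theta_ja_c_per_w):
    """Estimated die temperature in degrees C.

    theta_ja_c_per_w is the junction-to-ambient thermal resistance
    (degrees C per watt), an assumed input taken from the package datasheet.
    """
    return ambient_c + power_w(supply_v, current_a) * theta_ja_c_per_w
```

For example, 5 V at 100 mA is 0.5 W; with an assumed 60 degrees C/W package at 25 degrees C ambient, that gives an estimated die temperature of about 55 degrees C, long before the case feels warm.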

Mark

Re: Stress-testing a 6502 - suggestions?

Posted: Thu Nov 01, 2018 3:08 am
by kakemoms
I'll soon be back to extend my 6502 and have been wondering if there is a good method to do error correction on instruction execution.

E.g. if an instruction fails or is missed due to bit errors at high speed, it will crash the program (and halt the processor) in most instances. If the failed state can trigger an early fail flag (or the like), it would be possible to re-execute the instruction before the resulting error can trigger a crash.

Anyone familiar with such methods? I know one can do error checking on memory fetch with extra bits, but I was hoping for error checking on actual core performance. The only way I can think of is to use two parallel cores... but to me that sounds like an inefficient way.

Re: Stress-testing a 6502 - suggestions?

Posted: Thu Nov 01, 2018 7:15 am
by BigEd
I searched for "error-detecting ALU" and got a few hits. But you'd need to instrument your ALU, and your bus traffic, and your address calculation hardware. Seems like a tall order! Even with a pair of cores, I'm not sure how you'd know that only one of them is going to fail. You'd need a safe margin on one - but how big?

Re: Stress-testing a 6502 - suggestions?

Posted: Thu Nov 01, 2018 7:50 am
by GARTHWILSON
I was introduced to the idea 30 years ago. The man was talking about having three computers running in parallel, and if one disagreed with the other two, it was automatically taken offline. The example he gave was a factory where if something crashes, you could ruin a million dollars' worth of perfume. I've worked on some space stuff, although it was for unmanned spacecraft where failure only risks a lot of money but no one's life, and there they never do the multiple processors or microcontrollers or memory for any job in the spacecraft. The biggest risk is just software bugs. If you have multiple hardware sets running the same software bugs, guess what'll happen.

To prevent high-speed errors, just run the clock speed up gradually until you start observing problems (which will probably mean a crash), measure the frequency, and back it down for some margin. That's what I did with my workbench computer. It started having problems at just a hair over 7MHz, so I run it at 5MHz and have never had a problem.
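
The ramp-and-back-off procedure could be sketched like this. `passes_at` is a hypothetical callback that sets the clock, runs whatever test you trust, and returns True on a clean run; the 70% derating is an assumption that roughly matches the 7MHz to 5MHz example above.

```python
def first_failing_khz(passes_at, start_khz, step_khz, limit_khz):
    """Step the clock upward and return the first frequency that fails.

    Returns None if every step up to limit_khz passes.
    """
    for khz in range(start_khz, limit_khz + 1, step_khz):
        if not passes_at(khz):
            return khz
    return None


def derated_khz(failing_khz, derate_percent=70):
    """Back off from the failing frequency to leave some margin."""
    return failing_khz * derate_percent // 100
```

Frequencies are kept as integer kHz to avoid floating-point drift while stepping. The result is only as good as the test behind `passes_at`.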

Re: Stress-testing a 6502 - suggestions?

Posted: Thu Nov 01, 2018 11:47 am
by 1024MAK
In the industry where I work, for critical applications or for redundancy (or both) we run dual or triple MPU systems.

These take one of three forms depending on requirements. But all are run well within the manufacturer's specifications, so no overclocking, reliability being very important.

The non-safety critical systems typically have two independent MPU cards (each has on-board EPROM and SRAM) with a watchdog system. They both run the same software. Both MPU cards run all the time, but only one is able to control and write to the bus to the rest of the system. In the event of the current in-use MPU failing to toggle its watchdog, the other MPU will automatically be promoted to being the in-use MPU and the faulty MPU will lose control. We can also manually switch between them.
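
A toy model of that watchdog hand-over, purely as an illustration: the class name, the tick-based timeout and the rule that the standby unit must itself be healthy before it is promoted are all assumptions, not details of the real hardware.

```python
class FailoverPair:
    """Two units run the same software; only the in-use one drives the bus."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.in_use = 0                  # index of the unit controlling the bus
        self.ticks_since_kick = [0, 0]   # one watchdog counter per unit

    def kick(self, unit):
        """Called by a unit each time it toggles its watchdog."""
        self.ticks_since_kick[unit] = 0

    def tick(self):
        """Advance time one step and promote the standby unit if needed."""
        for unit in (0, 1):
            self.ticks_since_kick[unit] += 1
        standby = 1 - self.in_use
        in_use_timed_out = self.ticks_since_kick[self.in_use] > self.timeout_ticks
        standby_alive = self.ticks_since_kick[standby] <= self.timeout_ticks
        if in_use_timed_out and standby_alive:
            self.in_use = standby
        return self.in_use

    def switch(self):
        """Manual switch-over between the two units."""
        self.in_use = 1 - self.in_use
```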

For the safety critical systems, two MPU systems are present in a single cased module. Both MPUs are normally always online processing all the data. If either of them disagrees with the other on the system outputs, a comparison system will blow the internal supply fuse taking both off-line and putting all the outputs into a safe state. Note that both MPUs use the same software.

For the safety critical systems where redundancy is important, three separate modules are used, but all are interlinked. Each module (which fits in a 19 inch rack) contains a single MPU, RAM, control circuits, custom network interfaces and power circuitry. On the front there is a slot for the EPROM module. Similar to the system described above, a comparison system is used. But it is far more complex. The principle is that in the event of a fault, the two ‘good’ modules will blow the internal fuse of the ‘defective’ module and take it off-line. The two remaining modules will then continue to work, but in a dual configuration system only. Again, they all use the same software.
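
The comparison logic of the dual and triple arrangements can be modelled very roughly as below. This is a sketch only: it treats "blowing the fuse" as removing a module from a set, compares one output value per module per cycle, and all names are invented for the example.

```python
from collections import Counter


class RedundantGroup:
    """Modules that must agree on every output, or be taken off-line."""

    def __init__(self, n_modules=3):
        self.online = set(range(n_modules))
        self.safe_state = False

    def compare(self, outputs):
        """outputs maps module index -> output value for this cycle.

        Returns the agreed output, or None once the group is in its
        safe state (no strict majority among the on-line modules).
        """
        if self.safe_state:
            return None
        live = {m: outputs[m] for m in self.online}
        value, count = Counter(live.values()).most_common(1)[0]
        if count * 2 > len(live):
            # Strict majority: drop any dissenting module and carry on.
            self.online = {m for m, out in live.items() if out == value}
            return value
        # Two modules disagreeing (or three all different): shut down.
        self.online = set()
        self.safe_state = True
        return None
```

With three modules on-line, one dissenter is dropped and the group carries on as a pair; with only two on-line, any disagreement takes everything to the safe state.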

The software used in the safety critical systems has been carefully designed. As it is processing data predefined within a fixed limited format, it is possible to use the same program in many systems, even though the real world data is different. So each system contains EPROMs that contain a table that the system uses to process the real world data. The data in the table determines which logic is applied in response to the real world inputs, in order to decide what (if any) response will be output back to the real world.

This system enables simulation systems run on far more capable hardware to be used to test the logic of the data in the tables, with all ‘real world’ inputs and outputs also being simulated.
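
To illustrate the table-driven idea in the simplest possible form: the table format, the input and output names and the "all conditions must match" rule below are invented for the example, and the real systems are certainly more involved.

```python
# Hypothetical table: each output is asserted only when every listed input
# has the required state. In the systems described, such data sits in EPROM.
TABLE = {
    "output_a": {"input_1": True, "input_2": False},
    "output_b": {"input_2": True},
}


def evaluate(table, inputs):
    """Fixed program: the table, not the code, decides the logic."""
    return {
        output: all(inputs[name] == required
                    for name, required in conditions.items())
        for output, conditions in table.items()
    }
```

Because `evaluate` never changes, only the table needs checking for each installation, and the same function can be run on a desktop machine against simulated inputs.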

The downside is that these various systems are not particularly fast :(

Mark

Re: Stress-testing a 6502 - suggestions?

Posted: Fri Nov 02, 2018 1:20 pm
by BillO
GARTHWILSON wrote:
To prevent high-speed errors, just run the clock speed up gradually until you start observing problems (which will probably mean a crash), measure the frequency, and back it down for some margin. That's what I did with my workbench computer. It started having problems at just a hair over 7MHz, so I run it at 5MHz and have never had a problem.
I guess the question then becomes "What do you run to do that speed test?". There may be some specific code sequences that might fail at X MHz, but generally the system will run safely at 1.2X MHz.

Re: Stress-testing a 6502 - suggestions?

Posted: Tue Nov 06, 2018 9:08 pm
by kakemoms
BillO wrote:
I guess the question then becomes "What do you run to do that speed test?". There may be some specific code sequences that might fail at X MHz, but generally the system will run safely at 1.2X MHz.
Exactly.

I am thinking about a mad monkey approach. But to predict its behavior, a simulator would be required.

Re: Stress-testing a 6502 - suggestions?

Posted: Wed Nov 07, 2018 1:37 am
by GARTHWILSON
Quote:
But to predict its behavior, a simulator would be required.
I don't think a simulator will be of any value, since the point is to see what a particular part or set of parts can do, when there are variations from one production lot to another, one wafer to another, even different parts of the same wafer, in addition to the usual matters of voltage and temperature. The infamous WDC 65C51 worked fine in simulation, but the simulator did not catch a race condition that produced a bug in actual units.

Re: Stress-testing a 6502 - suggestions?

Posted: Wed Nov 07, 2018 8:37 am
by BigEd
You do need some way to know whether or not your test program is misbehaving, and a simulator is one way to do that. Another is self-checking code (Klaus' testsuite is self-checking and might be useful but is not designed as a test for running at speed.) Another is some way to compare two runs - an unstressed run and a speed-challenge. You do need to be sure that your unstressed run has plenty of margin.
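
The two-run comparison might look something like this on the host side, assuming you can capture some trace from each run (bus values, checksums, output bytes, whatever is practical). The trace format and the function name are assumptions.

```python
def first_divergence(reference, stressed):
    """Index of the first entry where the stressed run departs from the
    reference (unstressed) run, or None if the two traces are identical.

    A trace that stops early or runs long counts as diverging at the
    length of the shorter trace.
    """
    for i, (ref, got) in enumerate(zip(reference, stressed)):
        if ref != got:
            return i
    if len(reference) != len(stressed):
        return min(len(reference), len(stressed))
    return None
```

For instance, `first_divergence([1, 2, 3, 4], [1, 2, 7, 4])` returns 2. This only tells you where the runs differ; it still relies on the reference run being correct.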

Re: Stress-testing a 6502 - suggestions?

Posted: Wed Nov 07, 2018 2:03 pm
by kakemoms
Yes, the simulation was more as a way to determine what the code would do (or should do). If you have a way to change the speed of the machine, a stressed/unstressed run with comparison on resulting behavior would probably be faster. Still, the code would have to be analysed to understand its intended behavior (e.g. since a mad monkey would make completely random code at all times).
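
One way to make a mad monkey usable for stressed/unstressed comparison is to seed it, so the exact same random program can be replayed at both speeds. A minimal sketch, where the function name and the idea of loading the bytes straight into RAM are assumptions:

```python
import random


def monkey_program(seed, length):
    """Reproducible block of random bytes to load as code/data.

    The same seed always yields the same bytes, so an unstressed run and
    a stressed run can execute identical "monkey" code.
    """
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))
```

Purely random bytes will of course wander anywhere in memory, and on an NMOS 6502 some undocumented opcodes halt the processor, so the generated code would still need filtering or analysis before it is a useful test.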

Re: Stress-testing a 6502 - suggestions?

Posted: Wed Nov 07, 2018 3:00 pm
by BigEd
As an alternative to clocking a design ever-faster, it's worth considering instead running it at reduced voltage - that might be more convenient. (Another alternative is running it at higher temperature - inside a temperature-controlled oven, for example. But that's more difficult and probably more risky too.)

As a very rough indication, WDC rate their 65C02 at:
2MHz at 1.8V, 4MHz at 2.5V, 8MHz at 3V, 14MHz at 5V
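
Those four points can be turned into a simple, conservative lookup: take the highest rated point at or below the actual supply voltage, with no interpolation. The numbers are just the rough figures quoted above, not a substitute for the datasheet.

```python
from bisect import bisect_right

# (supply voltage in V, rated clock in MHz), from the rough figures above.
RATED_POINTS = [(1.8, 2), (2.5, 4), (3.0, 8), (5.0, 14)]


def rated_mhz(supply_v):
    """Rated clock for the highest listed voltage not above supply_v.

    Returns None below the lowest listed voltage.
    """
    volts = [v for v, _ in RATED_POINTS]
    i = bisect_right(volts, supply_v)
    return RATED_POINTS[i - 1][1] if i else None
```

So a 3.3V supply is treated as the 3V point (8MHz), and anything under 1.8V gives no rating at all.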

Re: Stress-testing a 6502 - suggestions?

Posted: Wed Nov 07, 2018 11:47 pm
by 1024MAK
But surely, before you start testing, you have to define what the objective is.

So if the objective is to run an MPU as fast as possible in a simple real world application, the code that is being run should at the very least be the intended application code (assuming that the code already exists and is known to be reliable and stable at "normal" clock speeds).

If the objective is to test an MPU as much as possible by getting it to execute as many of the instructions in its instruction set as possible, then you need a program that does just that, and which only outputs a result "signal" when it has completed all the tests.

The more difficult question may be how to determine that it is the MPU that has failed, compared to the glue logic/support circuitry/RAM/ROM...

Mark