Hi there,
I want to share an experience that leaves me somewhat puzzled.
Short summary: it took me about 2 weeks of hardware debugging to find out that of two chips with seemingly identical functionality, namely BSI BS62LV4006PIP55 and Alliance AS6C4008-55PCN, only the Alliance chip works in my computer; the other produces random errors. And I have no idea why...
Long story:
I want to resurrect my CaSpAer computer (aka CS/A65, http://www.6502.org/users/andre/csa/index.html ). To make sure I have a stable system, I have adapted the "8296 burnin" program from the Commodore PET into a repeatedly running RAM test. See
https://github.com/fachat/cbm-burnin-tests
What that test showed me was that after a few minutes random errors appeared in the RAM test for the VMEM (video memory, actually dRAM on the video card), but also in the static RAM ($0200-$8000). Before the tests I had assumed that my stability problems came from the video card with its dRAM (2x 41464), so I was primarily looking for problems there. However, when I saw the SRAM errors (using the BSI chip), I assumed some common problem.
What irritated me was that the errors did not seem to follow any address pattern, were of both the Read and the Write kind (a Read error is when the data is read correctly again after a first erroneous read, a Write error is when the data stays incorrect), and did not seem to depend on particular bits either. And those errors happened both in the SRAM test and in the VMEM (dRAM) test.
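To make the Read/Write distinction concrete, here is a minimal sketch in Python. It is not the actual burnin code (that is 6502 code in the repository linked above); `classify_error`, `FakeRam` and the test values are hypothetical names and data made up for illustration.

```python
def classify_error(read_byte, addr, expected):
    """Return 'ok', 'read' or 'write' for one tested location.

    read_byte is a hypothetical callable returning the byte stored at addr.
    """
    first = read_byte(addr)
    if first == expected:
        return "ok"
    # Re-read the same location: if the value is correct now, only the
    # first read was disturbed; otherwise the stored data itself is wrong.
    second = read_byte(addr)
    return "read" if second == expected else "write"


class FakeRam:
    """Simulated memory, purely for demonstration."""

    def __init__(self):
        # $0201 holds a wrong value; $0200 glitches on its first read only.
        self.cells = {0x0200: 0x55, 0x0201: 0x54, 0x0202: 0x55}
        self.glitch_once = {0x0200}

    def read(self, addr):
        if addr in self.glitch_once:
            self.glitch_once.discard(addr)
            return self.cells[addr] ^ 0x01  # one flipped bit, first read only
        return self.cells[addr]


ram = FakeRam()
results = {a: classify_error(ram.read, a, 0x55) for a in (0x0200, 0x0201, 0x0202)}
for addr, kind in results.items():
    print(f"${addr:04X}: {kind}")
```

In this simulation $0200 is reported as a Read error, $0201 as a Write error and $0202 as OK.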
So I looked into the bus drivers on the CPU board (changing between ALS and HCT did not change it). I looked into bus termination (5k6 to 5V, 3k3 to GND), which did not change it. I improved the backplane's capability to supply 5V, which did not change it either. Switching between 1MHz (40col) and 2MHz (80col) did not change anything.
I then really started focusing on the SRAM, because that was the easier signal path.
For the signal path see here:
https://flic.kr/p/2mxZNtj (Image: Signal paths)
Signals go from the CPU board to the BIOS board with the SRAM. Most signals are straightforward; only the higher address lines go through some more complex selection logic. I only drew a simple version with the ICs on it, to find the longest (slowest) signal path.
You can find the schematics here:
http://www.6502.org/users/andre/csa/cpu ... 0k_sch.png (CPU) and
http://www.6502.org/users/andre/csa/bio ... 0f-sch.png (BIOS)
The longest path is in the /CS line. However, when scoping the signal I found that, before qualifying it with Phi2, it changed only about 80ns after Phi2 falling (from the previous cycle), so at least about 150ns _before_ it would become active on /CS due to NANDing it with Phi2. So no timing problem here.
https://flic.kr/p/2mxZNwW (Image: Address select)
The yellow signal is Phi2, light blue is address line A0, violet is D0 (all from the bus), and dark blue is the RAM select line taken from the BIOS card directly before NANDing it with Phi2.
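The margin estimate for the select line can be written out as a small calculation. This is only a sketch: it assumes a 50% Phi2 duty cycle (which I did not state above) and uses the roughly 80ns select delay read off the scope; `select_margin_ns` is a hypothetical helper name.

```python
def select_margin_ns(clock_mhz, select_delay_ns, duty_high=0.5):
    """Time between the select line settling and Phi2 rising again.

    Assumes Phi2 is high for duty_high of each cycle (50% is an assumption).
    """
    period_ns = 1000.0 / clock_mhz
    phi2_low_ns = period_ns * (1.0 - duty_high)
    return phi2_low_ns - select_delay_ns


# Select line settles about 80ns after Phi2 falls (scope reading).
margins = {mhz: select_margin_ns(mhz, 80.0) for mhz in (1.0, 2.0)}
for mhz, margin in margins.items():
    print(f"{mhz:.0f} MHz: {margin:.0f} ns before Phi2 rises")
```

Under these assumptions the margin is 420ns at 1MHz and 170ns at 2MHz, which is in line with the "at least about 150ns" estimate.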
The next signals I looked at were /WE and /CS regarding Phi2 and Data bus.
https://flic.kr/p/2mxZNvo (Image: /CE and /WE timing)
The dark blue here is /WE, the others as described above. So /WE goes high about 16ns after Phi2 goes low, while the databus is still valid. /CS and /OE look the same.
Finally, as I was pretty desperate, I switched to another SRAM chip from a different supplier (Alliance). And, suddenly, all errors completely went away! Even the VMEM (dRAM) tests suddenly worked flawlessly. Those earlier VMEM errors, I assume, came about because the test itself was running in the (assumed more stable) SRAM...
https://flic.kr/p/2my8oYp (Image: Successful RAM test)
(Ignore the header line except for the number of cycles, which tells how often the test has run. The other errors are known and of no significance; what matters are the OKs on the RAM tests.)
I then checked two other chips of the same type from BSI, all had the same problem! I only have that one Alliance chip, but that works flawlessly.
So, I looked at the datasheets of these two chips, but could not find any significant difference (if I haven't overlooked anything).
Drive capability is the same (up to 1mA IIRC) for driving against termination resistors, but running with or without termination resistors on the bus did not change anything. Comparing the datasheet timing against the scope measurements, everything seems to be totally valid.
Both chips have the same supply current requirement of 10mA at 1MHz, which is reasonable. The chip even has extra supply lines directly from the bus connector and an extra 100nF cap soldered to it.
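As a rough DC sanity check of the termination (5k6 to 5V, 3k3 to GND) against the roughly 1mA drive figure, here is a small sketch. The 2.4V and 0.4V output levels are assumed TTL-style values, not taken from either datasheet, and the 1mA limit is only my "IIRC" number; `thevenin` and `drive_current_ma` are hypothetical helper names. It looks at a single terminated line at DC only.

```python
def thevenin(vcc, r_up, r_down):
    """Thevenin equivalent (volts, ohms) of a pull-up/pull-down terminator."""
    v_th = vcc * r_down / (r_up + r_down)
    r_th = r_up * r_down / (r_up + r_down)
    return v_th, r_th


def drive_current_ma(v_out, v_th, r_th):
    """DC current in mA needed to hold the line at v_out against the terminator."""
    return abs(v_out - v_th) / r_th * 1000.0


v_th, r_th = thevenin(5.0, 5600.0, 3300.0)
i_high = drive_current_ma(2.4, v_th, r_th)  # assumed minimum high level
i_low = drive_current_ma(0.4, v_th, r_th)   # assumed maximum low level

print(f"Thevenin: {v_th:.2f} V behind {r_th:.0f} ohm")
print(f"drive high: {i_high:.2f} mA, drive low: {i_low:.2f} mA")
```

With these assumed levels the terminator idles at about 1.85V behind about 2.1k, and holding the line high or low takes roughly 0.26mA and 0.70mA respectively, both below 1mA. That fits with the termination making no difference, but it says nothing about dynamic behaviour.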
So I am really out of ideas what could be the reason for this problem.
Do you have any ideas what I could check?
_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content:
http://6502.org/users/andre/