Ok, I've got a really odd issue

Posted: Mon Apr 28, 2025 3:10 am
by jmthompson
This is on my JRC-1 system. 8 MHz, 2 x 1 MB RAM, 512 KB ROM (currently on a memsim emulator)

Basically, I've got an issue where certain memory pages get corrupted in multiple banks.

1. The affected pages are the same pages used for the stack and direct page, just not in bank 0.
2. I can move the stack around and watch the page corruption move with it.
3. It only happens to banks on the first RAM chip (banks $00-$07). Banks $08-$0F are unaffected.
4. No other pages in these banks are corrupted.
5. The problem persists even if I swap the 16 MHz crystal for a 4 MHz one.
6. The problem does not move if I swap the RAM chips around.
7. The problem happens early in boot, before interrupts are even on. I've verified this by copying one of the affected pages to a safe spot during boot.

The corrupted bytes look to be related to the bank address. Sometimes the byte becomes the bank address, sometimes it's partial:

Code:

* 01/400.4ff
01/0400: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
01/0410: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
01/0420: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
01/0430: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
01/0440: 55 55 55 55 55 55 55 55 55 55 55 55 55 01 55 01 | UUUUUUUUUUUUU?U?
01/0450: 55 01 55 01 55 55 55 55 55 01 55 01 55 55 55 55 | U?U?UUUUU?U?UUUU
01/0460: 55 01 55 01 01 55 01 55 01 01 01 01 01 55 01 01 | U?U??U?U?????U??
01/0470: 01 55 01 01 01 45 01 41 55 01 01 01 01 55 01 01 | ?U???E?AU????U??
01/0480: 61 01 51 01 55 01 55 01 55 55 61 01 61 01 55 01 | a?Q?U?U?UUa?a?U?
01/0490: 55 01 55 01 01 01 01 55 61 55 01 55 01 55 01 55 | U?U????UaU?U?U?U
01/04A0: 01 55 01 01 55 55 55 55 01 55 55 01 55 55 55 55 | ?U??UUUU?UU?UUUU
01/04B0: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
01/04C0: 01 55 01 41 01 01 01 41 01 01 55 01 41 01 01 45 | ?U?A???A??U?A??E
01/04D0: 41 C1 01 01 01 C1 01 01 01 01 01 01 01 41 01 01 | AA???A???????A??
01/04E0: 01 01 01 41 01 41 01 01 01 01 E1 E1 01 01 41 01 | ???A?A????a???A?
01/04F0: 01 01 01 01 55 01 55 01 01 55 01 55 55 55 01 55 | ????U?U??U?UUU?U

Code:

* 04/400.4ff
04/0400: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
04/0410: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
04/0420: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
04/0430: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
04/0440: 55 55 55 55 55 55 55 55 55 55 55 55 55 04 55 55 | UUUUUUUUUUUUU?UU
04/0450: 55 55 55 55 55 55 55 55 55 55 04 55 55 55 55 55 | UUUUUUUUUU?UUUUU
04/0460: 55 55 04 55 04 55 04 55 04 55 55 04 04 55 04 04 | UU?U?U?U?UU??U??
04/0470: 55 04 04 04 04 04 55 55 04 04 04 55 04 04 04 04 | U?????UU???U????
04/0480: 04 04 04 04 04 04 04 04 55 04 04 04 54 04 04 64 | ????????U???Td?d
04/0490: 55 04 04 55 04 55 55 55 55 55 55 04 04 04 04 04 | U??U?UUUUUU?????
04/04A0: 55 55 55 55 04 55 55 55 04 55 55 04 04 55 04 55 | UUUU?UUU?UU??U?U
04/04B0: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 | UUUUUUUUUUUUUUUU
04/04C0: 04 54 04 55 55 55 55 55 54 44 55 55 04 55 04 84 | ?T?UUUUUTDUU?U??
04/04D0: 04 84 04 84 55 84 04 A4 84 A4 84 A4 04 04 84 04 | ????U??$?$?$????
04/04E0: 04 04 84 04 84 04 84 A4 64 04 A4 A4 04 04 04 84 | ???????$d?$?????
04/04F0: 04 84 04 55 24 C5 55 55 55 55 04 15 55 55 04 55 | ???U$EUUUU??UU?U
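
For anyone who wants to put numbers on dumps like the two above, here is a minimal Python sketch that sorts every byte into "still the fill value", "exactly the bank number", or "something else". It runs on a PC, not on the JRC-1. The function name `tally_dump` is made up for illustration, and the $55 fill value is an assumption based on how the dumps look.

```python
from collections import Counter

def tally_dump(text, fill=0x55):
    """Classify every byte in a monitor-style hex dump.

    Expects lines like '01/0440: 55 55 ... 01 | UUUU...'.
    The default fill value (0x55) is an assumption taken from the dumps.
    """
    counts = Counter()
    for line in text.splitlines():
        # Skip blank lines and the '* 01/400.4ff' command echo
        if ":" not in line or "/" not in line.split(":", 1)[0]:
            continue
        addr, rest = line.split(":", 1)
        bank = int(addr.split("/")[0], 16)
        hex_part = rest.split("|", 1)[0]   # drop the ASCII column
        for token in hex_part.split():
            value = int(token, 16)
            if value == fill:
                counts["fill"] += 1
            elif value == bank:
                counts["bank"] += 1
            else:
                counts["other"] += 1
    return counts

sample = """* 01/400.4ff
01/0440: 55 55 55 55 55 55 55 55 55 55 55 55 55 01 55 01 | UUUUUUUUUUUUU?U?
01/0470: 01 55 01 01 01 45 01 41 55 01 01 01 01 55 01 01 | ?U???E?AU????U??
"""
print(tally_dump(sample))
```

For the two sample lines, that gives 17 fill bytes, 13 bytes equal to the bank number, and 2 "other" bytes ($45 and $41).
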
The bank address logic is the usual setup with a '573 and a '245:
jrc-1-CPU.png
The memory section is mostly just parallel connections but I'm including it anyway:
jrc-1-Memory.png
The I/O decoding is done mostly in an ATF1504. Here's the Verilog code for it:

Code:

module jigl (
    input wire A4,
    input wire A5,
    input wire A8,
    input wire A9,
    input wire A10,
    input wire A11,
    input wire A12,
    input wire A13,
    input wire A14,
    input wire A15,
    input wire A16,
    input wire A17,
    input wire A18,
    input wire A19,
    input wire A20,
    input wire VDA,
    input wire VPA,
    input wire nRESET,
    output wire RESET,
    output wire nROMCS,
    output wire nRAM1CS,
    output wire nRAM2CS,
    output wire nVIASEL,
    output wire nSPISEL,
    output wire nUARTSEL,
    output wire nSLOTEN,
    output wire nSLOT1SEL,
    output wire nSLOT2SEL,
    output wire nSLOT3SEL
);

// True if current access is in bank 0
wire bank0 = ~A20 && ~A19 && ~A18 && ~A17 && ~A16;

// True if current access is in the LOWROM area ($00/F800 - $FFFF)
wire lowrom = bank0 && A15 && A14 && A13 && A12 && A11;

// True if the current access is in HIGHROM ($10/0000 - $13/FFFF)
wire highrom = A20 && ~A19 && ~A18;

// True if current access is in RAM
wire ram    = ~A20;
wire ram1 = ram && ~A19;
wire ram2 = ram && A19;

// True for accesses in the $00/F000 - $F7FF range
wire io = bank0 && A15 && A14 && A13 && A12 && ~A11 && (VDA || VPA);

// Internal I/O devices
wire intio = io && ~A10 && ~A9 && ~A8;
wire via   = intio && ~A5 && ~A4;
wire spi   = intio && ~A5 && A4;
wire uart  = intio && A5 && ~A4;

// Slots
wire slot1 = io && ~A10 && ~A9 && A8;
wire slot2 = io && ~A10 && A9 && ~A8;
wire slot3 = io && ~A10 && A9 && A8;
wire sloten = slot1 || slot2 || slot3;

// Assign active low output signals
assign RESET = ~nRESET;
assign nROMCS = ~(lowrom || highrom);
assign nRAM1CS = ~(ram1 && ~io && ~lowrom);
assign nRAM2CS = ~ram2;
assign nVIASEL = ~via;
assign nSPISEL = ~spi;
assign nUARTSEL = ~uart;
assign nSLOTEN = ~sloten;
assign nSLOT1SEL = ~slot1;
assign nSLOT2SEL = ~slot2;
assign nSLOT3SEL = ~slot3;

endmodule
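
As a sanity check on the equations, here is a rough Python transcription that lists which selects are asserted for a given 24-bit address. It is only a sketch: `decode` is a made-up name, nSLOTEN and RESET are left out, the VDA/VPA defaults are an assumption, and it models ideal logic with no propagation delay, so it cannot show a timing problem.

```python
def decode(addr, vda=True, vpa=True):
    """Return the set of asserted selects for a 24-bit address.

    Direct transcription of the CPLD equations; timing is not modeled.
    """
    def a(n):
        return bool((addr >> n) & 1)   # address bit n as a bool

    bank0   = not (a(20) or a(19) or a(18) or a(17) or a(16))
    top4    = a(15) and a(14) and a(13) and a(12)
    lowrom  = bank0 and top4 and a(11)
    highrom = a(20) and not a(19) and not a(18)
    ram1    = not a(20) and not a(19)
    ram2    = not a(20) and a(19)
    io      = bank0 and top4 and not a(11) and (vda or vpa)
    intio   = io and not a(10) and not a(9) and not a(8)

    selects = {
        "ROM":   lowrom or highrom,
        "RAM1":  ram1 and not io and not lowrom,
        "RAM2":  ram2,
        "VIA":   intio and not a(5) and not a(4),
        "SPI":   intio and not a(5) and a(4),
        "UART":  intio and a(5) and not a(4),
        "SLOT1": io and not a(10) and not a(9) and a(8),
        "SLOT2": io and not a(10) and a(9) and not a(8),
        "SLOT3": io and not a(10) and a(9) and a(8),
    }
    return {name for name, active in selects.items() if active}

for addr in (0x000400, 0x010400, 0x080400, 0x00F010, 0x00FFFC, 0x100000):
    print(f"${addr >> 16:02X}/{addr & 0xFFFF:04X}: {sorted(decode(addr))}")
```

One thing the model makes visible: because `io` includes `(VDA || VPA)`, a cycle in the $00/F000 - $F7FF range with both VDA and VPA low falls through to RAM1, e.g. `decode(0x00F010, vda=False, vpa=False)` returns `{"RAM1"}`.
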
There is a small additional bit of logic included in the on-board SPI65 (another ATF1504) that produces RDB and WRB signals by qualifying RWB with PHI2. This was done to save a chip and because the main glue logic 1504 was out of pins.

Everything else about this system is 100% stable for the last 3 years. The fact that it follows the stack and direct page around, and only affects those pages, suggests it's not some general problem with the circuit or the glue logic equations.
Best guess so far (and I'm grasping at straws here) is that some stack/dp instructions have a dead cycle in them that somehow trips up the glue logic. Otherwise I am stumped.

Re: Ok, I've got a really odd issue

Posted: Mon Apr 28, 2025 4:24 am
by BigDumbDinosaur
jmthompson wrote:
This is on my JRC-1 system. 8 MHz, 2 x 1 MB RAM, 512 KB ROM (currently on a memsim emulator)...certain memory pages get corrupted in multiple banks.

I’m having a little trouble following the Verilog for the CPLD.  What exactly is being qualified by VDA and VPA?  Also, what is the speed rating of the RAM?  What is generating your clocks?  Have you published a schematic somewhere that shows the entire enchilada?

BTW, you could free up some CPLD pins by using discrete logic to generate the resets.

Re: Ok, I've got a really odd issue

Posted: Mon Apr 28, 2025 5:26 am
by Dr Jefyll
jmthompson wrote:
Everything else about this system is 100% stable for the last 3 years.
Are you saying the problem developed recently? If so, was there any circumstance that might've brought it on?

Quote:
Sometimes the byte becomes the bank address
Very revealing, I'd say. It pretty much proves that the ram somehow had /WE and /CE both low (ie, it was writing, even if only briefly) during the 816's Phi2-low period; I can't think of any other explanation. This leads me to wonder about the other ATF1504 that produces RDB and WRB signals by qualifying RWB with PHI2. Can we see the Verilog for that?

BTW some photos of the project might be helpful (in addition to the info BDD requested). There may be a layout or other physical issue causing the erratic behavior.

-- Jeff

PS: In other words, I don't necessarily agree that "The fact that it follows the stack and direct page around, and only affects those pages, suggests it's not some general problem with the circuit or the glue logic equations."

Re: Ok, I've got a really odd issue

Posted: Mon Apr 28, 2025 1:44 pm
by Dr Jefyll
Moments ago I did a search and discovered that you talk about your JRC-1 here and here.

Re: Ok, I've got a really odd issue

Posted: Mon Apr 28, 2025 3:32 pm
by BigDumbDinosaur
Dr Jefyll wrote:
jmthompson wrote:
Everything else about this system is 100% stable for the last 3 years.
Are you saying the problem developed recently?  If so, was there any circumstance that might've brought it on?
Quote:
Sometimes the byte becomes the bank address
Very revealing, I'd say.  It pretty much proves that the ram somehow had /WE and /CE both low (ie, it was writing, even if only briefly) during the 816's Phi2-low period; I can't think of any other explanation.  This leads me to wonder about the other ATF1504 that produces RDB and WRB signals by qualifying RWB with PHI2.  Can we see the Verilog for that?

I’m thinking along the same lines as Jeff; there may be some sort of obscure timing problem or race condition in the CPLD that generates the qualified read/write signals.  It could be it’s been present all along, but is being provoked by something changing in the unit’s operating conditions.

Dr Jefyll wrote:
Moments ago I did a search and discovered that you talk about your JRC-1 here and here.

Regarding the “Area 73” website Jeff found (first link), I had no idea it existed.  I looked around on it, but couldn’t find any schematics of the JRC-1 unit.  I have a tough time reading pages that are formatted with a poorly-contrasting pastel text color, so I may have missed something.

As for the local topic (second link), it would be best to add to it instead of starting a new topic so the casual reader can more-easily connect the dots.  The search engine built into the forum’s software is not the best, so it’s entirely possible for a search to fail to find relevant posts.

Re: Ok, I've got a really odd issue

Posted: Mon Apr 28, 2025 5:08 pm
by jmthompson
BigDumbDinosaur wrote:
jmthompson wrote:
This is on my JRC-1 system. 8 MHz, 2 x 1 MB RAM, 512 KB ROM (currently on a memsim emulator)...certain memory pages get corrupted in multiple banks.
I’m having a little trouble following the Verilog for the CPLD.  What exactly is being qualified by VDA and VPA?  Also, what is the speed rating of the RAM?  What is generating your clocks?  Have you published a schematic somewhere that shows the entire enchilada?
I just added a PDF of the entire schematic here: https://area73.org/files/jrc-1/jrc-1.pdf.

The clock is the standard dual flip-flop that's posted all over the forums. It's on page 5 of the schematic. RAM speed is 55ns. The CPLDs are 7ns.

VDA and VPA are only used to qualify the I/O selects. Yes I could've just used VDA for that; I had originally wired both up in case I wanted the signal at a future time.
Quote:
BTW, you could free up some CPLD pins by using discrete logic to generate the resets.
I would have had to add a whole chip for one NOT gate. (The CPLD is just inverting /RESET into RESET for the UART.) The actual reset signal is generated by a DS1813.
Quote:
Regarding the “Area 73” website Jeff found (first link), I had no idea it existed. I looked around on it, but couldn’t find any schematics of the JRC-1 unit. I have a tough time reading pages that are formatted with a poorly-contrasting pastel text color, so I may have missed something.
Sorry about that, I'm not a WordPress guy and so I'm kind of at the whim of the theme developers. They're all like that, more or less.

Re: Ok, I've got a really odd issue

Posted: Mon Apr 28, 2025 5:38 pm
by jmthompson
Dr Jefyll wrote:
jmthompson wrote:
Everything else about this system is 100% stable for the last 3 years.
Are you saying the problem developed recently? If so, was there any circumstance that might've brought it on?
It is impossible to say 100% for sure, as until fairly recently the memory layout had the stack and DP in the $Exxx range, and bank $01 has nothing using that range yet, so I would not have noticed. But other than this, there have been zero issues, including the board running idle for months on end with no crashes or reboots.
Quote:
Quote:
Sometimes the byte becomes the bank address
Very revealing, I'd say. It pretty much proves that the ram somehow had /WE and /CE both low (ie, it was writing, even if only briefly) during the 816's Phi2-low period; I can't think of any other explanation.
See, I thought something like this, but here is the problem: nothing anywhere in my firmware writes to any banks beyond bank $01 right now. So, how is memory in other banks being corrupted, if nothing is generating write cycles in those banks? Not saying it's not the logic (I mean, it kinda HAS to be, somehow); I just can't find a fault that would only manifest on specific pages like this.
Quote:
This leads me to wonder about the other ATF1504 that produces RDB and WRB signals by qualifying RWB with PHI2. Can we see the Verilog for that?
Unfortunately I seem to have, er, lost the logic for the 65SPI. More accurately, I think at some point in the last 3 years I deleted it, not realizing I had actually added this one tiny customization to it (it's Daryl's 65SPIv2 VHDL code). From memory, though, it was just two lines of logic:

Code:

wire rd = ~(phi2 && rwb);
wire wr = ~(phi2 && ~rwb);
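
Taken at face value, those two equations never assert /WR while PHI2 is low. A throwaway Python truth table makes that explicit (the function name `strobes` is just for illustration). It models ideal logic only, so it says nothing about propagation delay through the CPLD or skew between PHI2 as the CPLD sees it and as the CPU sees it, which is where any marginal overlap would have to come from.

```python
def strobes(phi2, rwb):
    """Active-low /RD and /WR per the recalled equations (0 = asserted)."""
    rd_n = int(not (phi2 and rwb))
    wr_n = int(not (phi2 and not rwb))
    return rd_n, wr_n

for phi2 in (0, 1):
    for rwb in (0, 1):
        rd_n, wr_n = strobes(phi2, rwb)
        print(f"PHI2={phi2} RWB={rwb} -> /RD={rd_n} /WR={wr_n}")
```
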

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 2:25 am
by jmthompson
I've been doing more investigation today and narrowed this down a bit. What actually seems to be happening is that frequent writes to RAM seem to corrupt bytes in that same range in bank|1. In other words, heavy writes to even banks corrupt the same page in the next bank, but writes to odd banks don't seem to be causing issues. The reason it's manifesting primarily on the stack page is just because that's the one page that gets the most write activity.
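
To spell out the bank|1 relationship, here is a small Python sketch that maps a write address to the address this observation predicts would be hit. It only restates the pattern described above and takes it literally. The names `victim_of` and `fmt` are made up for illustration.

```python
def victim_of(addr):
    """Address the bank|1 observation predicts gets corrupted.

    Setting A16 maps an even bank onto the next odd bank. For an odd
    bank the result is the write address itself, so there is no
    separate victim.
    """
    return addr | 0x010000

def fmt(addr):
    """Format a 24-bit address as $bb/aaaa."""
    return f"${addr >> 16:02X}/{addr & 0xFFFF:04X}"

for write_addr in (0x0004F0, 0x0104F0, 0x0404F0, 0x0604F0):
    victim = victim_of(write_addr)
    note = "same address, no victim" if victim == write_addr else "victim"
    print(f"write {fmt(write_addr)} -> {fmt(victim)} ({note})")
```

Since the only difference between the written address and the predicted victim is A16, that one line is the obvious thing to watch on the scope.
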

I also poked around with my scope a bit. One thing I noticed so far is that the 100 ohm resistors on the clock lines (intended to suppress ringing) were distorting the clocks a bit too much. So, I bodged across the resistors, and the clocks look better now. A16 coming off of the '573 latch also seems to look fine.

Tomorrow I will try clipping the scope to PHI2 and /WR and see if they are ever active at the same time. If they are I'm going to have to do a bit of work to set myself back up to write ATF1504s...I haven't really done hardware for the past 3 years, so the JTAG hardware is put away and the software isn't even installed on my current workstation.

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 4:01 am
by BigDumbDinosaur
jmthompson wrote:
I also poked around with my scope a bit. One thing I noticed so far is that the 100 ohm resistors on the clock lines (intended to suppress ringing) were distorting the clocks a bit too much. So, I bodged across the resistors, and the clocks look better now. A16 coming off of the '573 latch also seems to look fine.

Can you photograph the Ø2 scope display with the resistors in-circuit?  It sounds as though something is abnormally loading the clock circuit if bodging out the resistors results in a cleaner signal.  What type of resistor are you using?

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 8:05 am
by BigEd
It feels like this is a timing marginality rather than a clock error. Most likely at the end of a write cycle, although just possibly at the beginning.

(That said, qualifying with phi2 is a logical operation which is intended to sort out what would otherwise be timing marginalities. But the exact phase of the phi2 relative to other signals will be meaningful.)

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 12:48 pm
by Dr Jefyll
Dr Jefyll wrote:
jmthompson wrote:
Sometimes the byte becomes the bank address
Very revealing, I'd say. It pretty much proves that the ram somehow had /WE and /CE both low (ie, it was writing, even if only briefly) during the 816's Phi2-low period; I can't think of any other explanation.
I just thought of another explanation. :oops:

It's normal for the CPU to drive the Bank Address onto the data bus during Phi2 low. Then during Phi2 high that is supposed to get replaced by either write data from the CPU or read data from memory. But if somehow the read/write data failed to get driven onto the data bus then the bus's capacitance would cause the Bank Address to linger. IOW you'd have the abnormal circumstance of the Bank Address appearing during Phi2 high.

At this moment I don't see how this observation could be relevant, but I decided to mention it anyway. My own suspicion remains focused on the theory that a glitch or marginal timing on /CE and/or /WE -- the signals that cue the RAM to write -- resulted in a problematic write during the time that the Bank Address is supposed to appear on the data bus. And it won't surprise me if some snooping with the 'scope results in a head-slap "aha!" moment.

-- Jeff

PS: a question for jmthompson. In the lead post you included some data dumps. During that experiment, did you have any code executing in Bank 1? Just checking. (It looks as if you simply filled Bank 1 with $55 and then left it alone.) From where did you have code executing?

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 3:47 pm
by jmthompson
Dr Jefyll wrote:
PS: a question for jmthompson. In the lead post you included some data dumps. During that experiment, did you have any code executing in Bank 1? Just checking. (It looks as if you simply filled Bank 1 with $55 and then left it alone.) From where did you have code executing?
All the code at the moment is in ROM, bank $10. Bank $01 is all data for the OS: variables, serial buffers, the heap, etc. The data only goes up to $1000 or so at the moment.

I have run some short hand-entered programs in other banks though, without incident.

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 4:21 pm
by jmthompson
BigDumbDinosaur wrote:
jmthompson wrote:
I also poked around with my scope a bit. One thing I noticed so far is that the 100 ohm resistors on the clock lines (intended to suppress ringing) were distorting the clocks a bit too much. So, I bodged across the resistors, and the clocks look better now. A16 coming off of the '573 latch also seems to look fine.

Can you photograph the Ø2 scope display with the resistors in-circuit?  It sounds as though something is abnormally loading the clock circuit if bodging out the resistors results in a cleaner signal.  What type of resistor are you using?
The resistors are some generic 1% tolerance carbon film resistors I bought ages ago.

With resistor in place:
SDS00002.jpg
With resistor bypassed:
SDS00001.jpg
Both PHI1 and PHI2 look like this. Both have very little connected to them: the 74AHCT573 and 74AHCT245 use PHI1, and PHI2 goes to the 65SPI and the VIA. Everything else is using the directed /RD and /WR.

Kinda surprised it was working with that signal; I know how picky the 65816 is about clocks, especially the rise time.

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 6:17 pm
by BigDumbDinosaur
jmthompson wrote:
The resistors are some generic 1% tolerance carbon film resistors I bought ages ago...

I’ve got carbon films in POC V1.3’s clock generator; everything works fine at 16 MHz.

Neither clock signal looks good.  The clock without the damping resistors looks horrid, to be frank, which means the resistors were actually helping.  Something is definitely not kosher; I think you have an abnormal loading problem somewhere, judging by the way the rise and fall noticeably slow as the signal approaches the respective terminal level.

That said, I agree with Ed that this may not be directly related to clocking; it’s more likely a subtle timing issue, although clock quality could ultimately be responsible.  Do you have access to a logic analyzer?  If so, looking closely at the behavior of the bus transceiver during the critical “turnaround” period when the clock goes high during a write cycle might be helpful.  During write-cycle turnaround, the bank bits persist for a short time before the 65C816 begins to emit data.  I’m grasping a bit at straws here, but everything is on the table until eliminated.

Something that might help with figuring out if Jeff’s idea is a possibility would be to bias the data bus to VCC on the A side of the data bus transceiver (U8).  3.3K is a good choice for this, and can be done with either discrete resistors or an array.  During Ø2 low, the data bus will be floating, since U8’s /CE input would be high, causing the transceiver to high-Z its data pins.  Absent a signal, the outward side would be subject to the whims of stray capacitance.

Getting back to Ø1 and Ø2, you need to figure out why you are seeing those slow edges on the signal.  First thing is to verify that your test setup is correct.  Your probe needs to be on the ×10 setting and correctly compensated for your scope.  The scope’s signal ground needs to be as close as you can get to the point where you are picking off your signal.  If these conditions are being met, then it is probably safe to say what you are seeing is reality.

Assuming the scope isn’t lying about clock quality, it is possible that although the 65C816 seems to be working okay, its timing is actually being messed with by the too-slow rise and fall of the clock.  WDC says in the data sheet that the clock’s rise and fall times should not exceed 5ns—a 74AC74 on five volts will easily achieve that.  I can’t offer any experience with what happens if the clock edges are slow; Ø1 and Ø2 on my POC V1.3 unit are sharply-defined when viewed on a 275 MHz scope with a compensated probe.

Since multiple parts of your circuit are dependent on the clock, it could be one of them is not liking what it is seeing, resulting in flaky operation.  Before doing anything else, I’d be figuring out what’s up with that clock signal.

Re: Ok, I've got a really odd issue

Posted: Tue Apr 29, 2025 6:21 pm
by barnacle
That looks like the CR product of your inline resistor and circuit load capacitance is (scratches head: 6MHz = 160ns, so the CR is about a fifth of that) enough to shift the timing of the clock - assuming it happens to trigger at half rail - by maybe 20ns rising and perhaps a touch more falling.

Effectively, you're probably shifting your clock signals by 20ns or so; it can't be healthy if it's qualifying other stuff around the (nominal) edges.

Neil
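
The back-of-envelope estimate above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: a single-pole RC driven by an ideal step, a switching threshold at exactly half rail, the 160ns period and "about a fifth" figures from the post, and the 100 ohm series resistor mentioned earlier in the thread.

```python
import math

period_ns = 160.0          # clock period figure used in the post
rc_ns = period_ns / 5      # "about a fifth of that"
r_ohms = 100.0             # series resistor value from earlier in the thread

# Time for an ideal RC edge to reach half rail: t = RC * ln(2)
t_half_ns = rc_ns * math.log(2)

# Load capacitance implied by that RC with a 100 ohm resistor
c_pf = rc_ns / r_ohms * 1000.0   # ns / ohm = nF, scaled to pF

print(f"RC      = {rc_ns:.0f} ns")
print(f"t(50%)  = {t_half_ns:.1f} ns")
print(f"C load  = {c_pf:.0f} pF")
```

That works out to roughly 22ns to reach half rail, in line with the "maybe 20ns" figure. If the RC estimate is right, the implied load of about 320 pF would be a lot for a handful of CMOS inputs, which fits the suspicion of abnormal loading or a measurement artifact.
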