All times are UTC




 Post subject: Re: POC VERSION TWO
Posted: Wed May 23, 2018 8:51 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
Rob Finch wrote:
Hi, a lot of info to try and absorb.

Sorry! :D I tend to overdo it when it comes to this stuff.

Quote:
I note the CPLD is very fast (< 10ns) and on the graphs the timing is synchronous with phi02.

The graphs are a little misleading in that regard. As you note, all timing is slaved to Ø2. The MPU itself is directly driven by the clock generator, so it sees Ø2 with zero lag. Ditto for the clock input on the CPLD. That said, all CPLD outputs are dependent on conditions produced by the MPU, which implies that the CPLD's outputs are going to lag the outputs of the MPU by at least the pin-to-pin propagation time, 10ns in the part I am using. So when the fall of Ø2 occurs and the CPLD outputs change in some way, you have to mentally add the 10ns prop time to that.

Unfortunately, Atmel's development software doesn't factor prop time into the simulation results, which makes it appear as though the CPLD outputs are changing exactly when the inputs change. What the simulator does do is demonstrate the results of whatever logic is going to be programmed into the CPLD. It just can't help in resolving timing issues.

Quote:
Could it be that hold times are just barely being met? Would it be possible to insert a few more ns of delay on the output of the CPLD? (Another clock cycle, depending on input clock?)

I'm not aware of any way to intentionally increase the CPLD's propagation time or keep the outputs in a certain state for an arbitrary amount of time, such as to assure that hold times are being met. As I said above, the CPLD's outputs already lag the MPU's by at least the pin-to-pin prop time. However, it is, I suppose, possible for a hold time to be marginal, something which might be provable by using the 25ns version of the CPLD. That should, in theory, cause all circuit actions dependent on the CPLD's outputs to lag the MPU by 25ns minimum. Continuing with that theory, the output hold times would be longer relative to the MPU's notion of time.
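
To put rough numbers on that reasoning, here is a minimal Python sketch. It models only the one idea stated above: an output derived from the fall of Ø2 cannot change sooner than the pin-to-pin propagation time. The function name and the 5ns hold requirement are hypothetical, chosen for illustration and not taken from any data sheet. Also bear in mind that data sheet propagation figures are normally maximums, so a real part may be faster and the result should be read as optimistic.

```python
def hold_margin_ns(t_pd_ns, t_hold_required_ns):
    """Hold margin for a signal the CPLD derives from the fall of Ø2.

    t_pd_ns: CPLD pin-to-pin propagation time (10 or 25 for the two speed grades).
    t_hold_required_ns: hold time the downstream device needs after Ø2 falls
                        (hypothetical figure; use the real one from its data sheet).
    """
    # The output is assumed to stay put for t_pd after Ø2 falls, so the margin
    # is whatever remains once the required hold time has been covered.
    return t_pd_ns - t_hold_required_ns


for grade in (10, 25):
    print(f"{grade}ns part: margin = {hold_margin_ns(grade, 5)}ns")
```

A negative result would mean the hold requirement is not being met under this simple model; a small positive result is the "marginal" case discussed above.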

One thing that can be controlled with the ATF1504AS is the output slew rate, either slow or fast, which is an option set in code. I have it set slow.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: Re: POC VERSION TWO
Posted: Wed May 23, 2018 9:31 pm
Joined: Fri Dec 12, 2008 10:40 pm
Posts: 1007
Location: Canada
BigDumbDinosaur wrote:
I'm not aware of any way to intentionally increase the CPLD's propagation time ...


You could use some of the unused I/O to add another ~10ns. For instance, take Phi2 in on pin3 and out on pin10 before passing it back externally to pin43. Then just add the extra line in your CUPL code to make pin10 follow pin3.

Might take a little surgery.

_________________
Bill


 Post subject: Re: POC VERSION TWO
Posted: Wed May 23, 2018 10:52 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
BillO wrote:
BigDumbDinosaur wrote:
I'm not aware of any way to intentionally increase the CPLD's propagation time ...


You could use some of the unused I/O to add another ~10ns. For instance, take Phi2 in on pin3 and out on pin10 before passing it back externally to pin43. Then just add the extra line in your CUPL code to make pin10 follow pin3.

Might take a little surgery.

Unfortunately, every I/O on the CPLD is spoken for, so that mod isn't going to be possible. :cry:

 Post subject: Re: POC VERSION TWO
Posted: Wed May 23, 2018 11:25 pm
Joined: Fri Dec 12, 2008 10:40 pm
Posts: 1007
Location: Canada
BigDumbDinosaur wrote:
BillO wrote:
BigDumbDinosaur wrote:
I'm not aware of any way to intentionally increase the CPLD's propagation time ...


You could use some of the unused I/O to add another ~10ns. For instance, take Phi2 in on pin3 and out on pin10 before passing it back externally to pin43. Then just add the extra line in your CUPL code to make pin10 follow pin3.

Might take a little surgery.

Unfortunately, every I/O on the CPLD is spoken for, so that mod isn't going to be possible. :cry:


My mistake. I thought you had pin3 and pin10 connected to power for stability, but I was looking at the TQFP package. I guess then you are using the PLCC variant.

 Post subject: Re: POC VERSION TWO
Posted: Thu May 24, 2018 12:26 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
BillO wrote:
My mistake. I thought you had pin3 and pin10 connected to power for stability, but I was looking at the TQFP package. I guess then you are using the PLCC variant.

Yep, PLCC44. Much as I'd like to use the TQFP package, I can't see well enough to work with stuff that small.

 Post subject: Re: POC VERSION TWO
Posted: Thu May 24, 2018 1:22 am
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
You could add a series resistor in the +5V supply to make it less reliable, but this may simply cause other errors to appear, not necessarily the one that is actually causing trouble.
You could use a blow dryer to heat up the board to check its tolerance to heat, or cool it down with coolant spray. Again, this will not necessarily cause the same bug to appear.

For a brief moment I thought about a flaky /RES signal - long enough to reset the CPLD but not long enough to disturb the 816. It would cause your MMU to be reset, which would most likely cause your OS to slip. But you use a DS1813 with a huge buffer cap - nearly impossible to get a false reset. (On the other hand, you could strap reset to Vcc with a low-value resistor and see whether that changes anything.)

At the moment I believe it is not a hardware problem (just a feeling, may be wrong). Perhaps some rare arrangement of internal states causes a glitch. Guus' idea of monitoring the regular behavior of the SW is a pretty good one. Instead of adding a display you could use one of your serial IOs to print "I am here and everything is fine" messages from various corners of your OS until it crashes or stops issuing messages. This sort of messaging is frequently used in safety-critical applications, where usually two or more identical subsystems report their states to some supervisor. If any regular message is missing, the supervisor causes the faulty subsystem to verify itself and then perhaps enter a failsafe state.
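
As a rough illustration of the supervisor side of such a scheme, here is a small host-side Python sketch (for example, running on a PC that listens to the serial port). The checkpoint names, timestamps and timeout are all made up; the only idea taken from the text above is that a checkpoint which stops reporting gets flagged.

```python
def find_silent_subsystems(last_seen, now, timeout_s):
    """Return the names of checkpoints whose last message is older than timeout_s."""
    return sorted(name for name, t in last_seen.items() if now - t > timeout_s)


# Hypothetical checkpoints, each stamped with the time (in seconds) of its last message.
last_seen = {"idle_loop": 99.99, "jiffy_isr": 100.00, "sio_rx": 97.50}
print(find_silent_subsystems(last_seen, now=100.0, timeout_s=1.0))
```

Knowing which checkpoint went quiet first narrows down where the firmware was when things went wrong.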


 Post subject: Re: POC VERSION TWO
Posted: Thu May 24, 2018 2:26 am
Joined: Fri Dec 12, 2008 10:40 pm
Posts: 1007
Location: Canada
BigDumbDinosaur wrote:
Yep, PLCC44. Much as I'd like to use the TQFP package, I can't see well enough to work with stuff that small.


Ditto.

 Post subject: Re: POC VERSION TWO
Posted: Thu May 24, 2018 4:16 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
GaBuZoMeu wrote:
At the moment I believe it is not a hardware problem (just a feeling, may be wrong).
I echo GaBuZoMeu's comment about it not being a hardware problem. (And the part about maybe being wrong, too! :P )

The unpredictability of the crashes reminds me of a case I had where there was a vulnerability to an interrupt arriving at the wrong time. It seemed utterly random. So, hardware and software can both produce sporadic behavior, and neither can be ruled out, IMO.

Are there multiple interrupt sources active when these crashes occur? I was gonna suggest taking one source at a time and, as a test, drastically increasing or decreasing its frequency to see if the crashes also increase/decrease.

On a different tack, would you be able to kludge together some way to manually switch to single-stepping the CPU clock? You might glean a valuable clue by examining the Program Counter after a failure has occurred. Just stop the clock and test the address lines using your logic probe. For instance: is the chip executing utter garbage, or is it in a fairly reasonable loop (but waiting for a condition that'll never come)? Hard to say what might turn up.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: POC VERSION TWO
Posted: Thu May 24, 2018 6:38 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
Dr Jefyll wrote:
GaBuZoMeu wrote:
At the moment I believe it is not a hardware problem (just a feeling, may be wrong).

I echo GaBuZoMeu's comment about it not being a hardware problem. (And the part about maybe being wrong, too! :P )

I'll come back to this below.

Quote:
Are there multiple interrupt sources active when these crashes occur?

In the minimalist firmware I'm running right now, there are three active interrupts: channel A transmitter empty, channel A receiver full and timer A underflow, all coming from the QUART. All other QUART interrupts are disabled, and there is no other hardware in the system that can generate an IRQ.

At power-on, the firmware does a memory test of the critical RAM areas (direct page and stack), configures the UART, sets up the driver tables, starts the UART timer to produce a 100 Hz jiffy IRQ and then outputs some text to serial channel A. I've got a dumb terminal connected to channel A, so I get a display on the screen.

Attachment: pocv2.0_sio_test.gif (POC V2.0 Serial I/O Test Display, ROM ID 20180514-00)

With those steps completed, the firmware just spins in a tight loop (HERE: BRA HERE) and does nothing else. I can see if the jiffy IRQ is running by observing /IRQ with one of my logic probes. The circuit appears to be high, but the pulse indicator flashes, indicating activity. This is the same behavior I see if I probe /IRQ on POC V1.1 when it's sitting idle awaiting input.

I also have another logic probe hooked up to monitor RDY. RDY spends most of its time low, since the spin loop is executing in ROM and each ROM access incurs a wait-state. That is evident with the pulse indicator flashing on the probe.

As I write this, the unit shows activity; it's been running for about 30 minutes since I updated the CPLD code. If and when it goes kaput, /IRQ will go low and the probe's pulse indicator will cease flashing. At the same time, RDY will go high and the pulse indicator on that probe will also cease flashing. A check of RWB and a couple of other signals generated by the MPU will indicate complete fatality. Just to eliminate the MPU as a possible suspect, I've swapped the '816 with the one in POC V1.1, with no effect.

Quote:
I was gonna suggest taking one source at a time and, as a test, drastically increasing or decreasing its frequency to see if the crashes also increase/decrease.

I can disable the receiver and transmitter interrupts and just let the timer IRQ continue, but at a much higher rate (up to about 921,000 IRQs per second). If it's an interrupt issue that should provoke it pretty quickly, one would think.

Quote:
On a different tack, would you be able to kludge together some way to manually switch to single-stepping the CPU clock? You might glean a valuable clue by examining the Program Counter after a failure has occurred. Just stop the clock and test the address lines using your logic probe. For instance: is the chip executing utter garbage, or is it in a fairly reasonable loop (but waiting for a condition that'll never come)? Hard to say what might turn up.

I'd have to build something. I have a single-stepper that plugs into the Ø2 oscillator socket, but that is push button operated. Each push of the button switches the clock from one phase to the other. Needless to say, I could be pushing that button for a long time before something happens. :shock:

GaBuZoMeu wrote:
At the moment I believe it is not a hardware problem (just a feeling, may be wrong).

I'd like to think that, but haven't been able to rule out hardware as the culprit. The firmware that I've written so far reuses the IRQ handler that I developed for POC V1.1, with changes to accommodate the greater number of serial I/O channels in POC V2. Also, some minor changes had to be made to the foreground code used to handle serial I/O. However, the core of it is unchanged, and POC V1.1 can run for weeks on end without any problems.

Quote:
For a brief moment I thought about a flaky /RES signal - long enough to reset the CPLD but not long enough to disturb the 816. It would cause your MMU to be reset, which would most likely cause your OS to slip. But you use a DS1813 with a huge buffer cap - nearly impossible to get a false reset. (On the other hand, you could strap reset to Vcc with a low-value resistor and see whether that changes anything.)

The reset circuit is solid—I observed it on the scope for both initial power-on and manual reset. The reset circuitry is identical to that used in POC V1.1, with which I have never had any trouble.

It's pretty puzzling at this point.

 Post subject: Re: POC VERSION TWO
Posted: Thu May 24, 2018 2:21 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigDumbDinosaur wrote:
It's pretty puzzling at this point.
Agreed. And thanks for the extra info. It's helpful to know what the system is actually doing. I'm gonna unpack that, just to list the details and maybe (hopefully!) tickle an insight. Before a failure the system will...

  • fetch and execute a BRA (repeats zillions of times, until... )
  • IRQ goes low and firmware decides which interrupt(s) it thinks need service
  • the corresponding service routine(s) are executed
  • an RTI occurs, which ought to take us back to the BRA loop (unless the stack's corrupted)

Only one link in the chain needs to break. Might there be, or appear to be, a spurious interrupt (i.e., not the timer/jiffy) from the UART? If the code that determines the interrupt source had a bug, what would happen if an inappropriate routine were called? Are there calls to RAM for any of that stuff?

The reason I ask about RAM is because that's where PC apparently points after a failure (as shown by RDY going and staying high).

Quote:
A check of RWB and a couple of other signals generated by the MPU will indicate complete fatality.
Just a reminder: WAI and STP are the only instructions that can cause your CPU to actually cease the fetching and executing of instructions. Otherwise it is surely fetching and executing something, whether it's data, a loop it can't escape, or code that's entirely irrelevant. The distinction between these different "fatalities" is meaningful.

That's why I'm curious where PC points after a failure. You may find PC points to (actually just past) a WAI or a STP instruction. Or, possibly something else. I admit this is fishing for clues. The results could be helpful or not.

 Post subject: Re: POC VERSION TWO
Posted: Thu May 24, 2018 3:30 pm
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
To get a more detailed picture of what "fatality" might be, you could use the NMI (manually triggered once the MPU is in the weeds) to get a dump of the current state (registers, flags, top of stack +/- 10 entries). As your serial ports might not work properly at that moment, I would dump this information into some unused RAM locations. There it can be checked after RESET.

Does your system issue some sort of message when there is a "spurious" IRQ? Do you have code (and messages) that responds to IRQs, BRKs and NMIs in the case of being in emulation mode? What would happen then? (I assume the POC is usually running in native mode.)

You could expand your idle loop with two checks: a) is the SP pointing to the regular place? b) is the state of the flags (especially M, X, and I) what it should be?
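
If a status byte saved by such a check (or by an NMI dump) is read back later, decoding it by hand is easy to get wrong. Here is a small host-side Python helper, offered only as a sketch. The bit layout is the 65C816's native-mode status register; the function name and the example value are made up for illustration.

```python
# Native-mode 65C816 status register, bit 7 down to bit 0.
FLAG_NAMES = "NVMXDIZC"


def decode_status(p):
    """Return the set of flag letters that are set in status byte p."""
    return {name for bit, name in zip(range(7, -1, -1), FLAG_NAMES) if p & (1 << bit)}


# Example value only: $34 has m, x and i set (8-bit registers, IRQs masked).
print(sorted(decode_status(0x34)))
```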


 Post subject: Re: POC VERSION TWO
Posted: Fri May 25, 2018 1:12 am
Joined: Wed Feb 12, 2014 1:39 am
Posts: 173
Location: Sweden
If it was hitting a WAI/STP instruction, it'd pull RDY low, wouldn't it? BDD's observation is that it stops with RDY high.


 Post subject: Re: POC VERSION TWO
Posted: Fri May 25, 2018 4:20 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
LIV2 wrote:
it'd pull RDY low wouldn't it?
Yes. My main focus was on pointing out that a crashed CPU usually keeps executing something. It's true I didn't thoroughly discuss the two exceptions, STP and WAI. (It occurred to me to wonder about BDD's CPLD logic, which sometimes actively pulls RDY high. But if a bug in that department resulted in a tug-of-war on RDY then I think there'd be some noticeably hot chips, or a loss of +5.)

@GaBuZoMeu, great suggestion about using NMI and stashing the saved info in RAM to be viewed after a reset! BTW and FWIW, the ABORTB input could be used instead (if for some reason NMI isn't convenient).

 Post subject: Re: POC VERSION TWO
Posted: Fri May 25, 2018 6:12 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
Dr Jefyll wrote:
...Before a failure the system will...

  • fetch and execute a BRA (repeats zillions of times, until... )
  • IRQ goes low and firmware decides which interrupt(s) it thinks need service
  • the corresponding service routine(s) are executed
  • an RTI occurs, which ought to take us back to the BRA loop (unless the stack's corrupted)

Only one link in the chain needs to break. Might there be, or appear to be, a spurious interrupt (i.e., not the timer/jiffy) from the UART?

The oscilloscope says there are no spurious IRQs. After the POST screen has been displayed and the firmware goes into the spin loop, the only activity I see on /IRQ is a series of very short duration negative pulses spaced exactly 10ms apart. As the jiffy IRQ rate is 100 Hz, that pulse spacing is correct.
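
For anyone who would rather automate that check than eyeball a scope, here is a minimal Python sketch. It assumes the falling edges of /IRQ have been captured as timestamps in milliseconds (from a logic analyzer export, say); the function name, the 0.5ms tolerance and the sample data are all hypothetical.

```python
def irregular_intervals(edge_times_ms, expected_ms=10.0, tolerance_ms=0.5):
    """Return (index, interval) for every gap between edges outside the tolerance."""
    bad = []
    for i in range(1, len(edge_times_ms)):
        interval = edge_times_ms[i] - edge_times_ms[i - 1]
        if abs(interval - expected_ms) > tolerance_ms:
            bad.append((i, interval))
    return bad


# Made-up capture: an extra pulse appears 3 ms after the third jiffy edge.
edges = [0.0, 10.0, 20.0, 23.0, 30.0]
print(irregular_intervals(edges))
```

An empty list corresponds to the clean 100 Hz pattern described above; anything else points at an interrupt that is not the jiffy timer.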

Quote:
If the code that determines the interrupt source had a bug, what would happen if an inappropriate routine were called? Are there calls to RAM for any of that stuff?

The entire IRQ handler (ISR) is in ROM but of course, accesses both direct page and stack space. There is no code in RAM, however.

At present, the ISR only knows about the three programmed QUART interrupts: channel A receiver full, channel A transmitter empty, and timer A underflow. Were the QUART to generate a spurious interrupt, the MPU would repeatedly execute the ISR in a futile effort to find and service the IRQ. That condition would result in /IRQ being continuously low, with no pulse activity, and RDY mostly low, but with pulse activity. Since IRQs are not unmasked until SR is pulled by the RTI instruction, repeated attempts to service a spurious IRQ will not underflow the stack.
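
That argument can be mimicked with a toy Python model. It is only a sketch of the reasoning, not of the real ISR: the source names are hypothetical, and one pass stands for a single interrupt entry followed by RTI.

```python
def run_isr_passes(pending, known, passes):
    """Model up to `passes` ISR entries; return (sources still pending, deepest nesting)."""
    pending = set(pending)
    depth = deepest = 0
    for _ in range(passes):
        if not pending:
            break                 # /IRQ is high again, back to the spin loop
        depth += 1                # interrupt entry: state pushed, IRQs masked
        deepest = max(deepest, depth)
        pending -= set(known)     # the ISR can only clear sources it knows about
        depth -= 1                # RTI: state pulled, IRQs unmasked again
    return pending, deepest


known = {"rx_full", "tx_empty", "timer"}
print(run_isr_passes({"timer"}, known, 1000))
print(run_isr_passes({"timer", "mystery"}, known, 1000))
```

In the second call the unknown source is still pending after 1000 passes, which corresponds to /IRQ staying low, while the nesting depth never exceeds one because each pass ends with RTI before the next interrupt is taken.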

Both direct page and the stack are located in a block of RAM that extends from $00D800 to $00DEFF, with the stack pointer initialized to $DEFF during the early stages of POST. The memory map for that range is as follows:

Code:
FIRMWARE WORKSPACE DEFINITIONS
;
000000   kerneldb =$00                  ;default data bank
00D800   kerneldp =d8ram                ;virtual direct page
00D900   workspac =kerneldp+s_rampag    ;start of vectors & tables
00DA00   siocfifo =workspac+s_rampag    ;start of SIO CFIFOs
00DEFF   hwstack  =hmubas-1             ;top of MPU stack

The space allocated to serial I/O (SIO) queues (CFIFOs) extends from $00DA00 to $00DDFF inclusive. That leaves $00DE00 to $00DEFF for the stack, so the stack is 256 bytes, more than enough to handle anything the firmware would require.
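
The arithmetic behind that map can be double-checked with a few lines of Python. This is a sketch under stated assumptions: the values of s_rampag ($100) and hmubas ($00DF00) do not appear in the listing and are inferred from the addresses it shows.

```python
S_RAMPAG = 0x100           # inferred: one page, from the $00D800 -> $00D900 step
D8RAM = 0x00D800           # value shown for kerneldp
HMUBAS = 0x00DF00          # inferred: hwstack = hmubas-1 = $00DEFF

kerneldp = D8RAM
workspac = kerneldp + S_RAMPAG
siocfifo = workspac + S_RAMPAG
hwstack = HMUBAS - 1

SIOCFIFO_END = 0x00DDFF    # last CFIFO byte, as stated in the text
stack_bottom = SIOCFIFO_END + 1
stack_size = hwstack - stack_bottom + 1
cfifo_size = SIOCFIFO_END - siocfifo + 1

print(f"workspac=${workspac:06X} siocfifo=${siocfifo:06X} hwstack=${hwstack:06X}")
print(f"CFIFO bytes={cfifo_size} stack bytes={stack_size}")
```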

Quote:
The reason I ask about RAM is because that's where PC apparently points after a failure (as shown by RDY going and staying high).

Same thing I'm thinking. An inadvertently executed SToP instruction would produce the conditions I described.

LIV2 wrote:
If it was hitting a WAI/STP instruction, it'd pull RDY low, wouldn't it? BDD's observation is that it stops with RDY high.

According to the 65C816 data sheet, executing STP has no effect on RDY. Therefore, if accidentally executing STP is in fact what is crashing the machine, /IRQ being continuously low and RDY being continuously high following the crash would make sense.

GaBuZoMeu wrote:
To get a more detailed picture of what "fatality" might be, you could use the NMI (manually triggered once the MPU is in the weeds) to get a dump of the current state (registers, flags, top of stack +/- 10 entries). As your serial ports might not work properly at that moment, I would dump this information into some unused RAM locations. There it can be checked after RESET.

Unfortunately, the unit is not yet to the point where something like that is practical.

Quote:
Does your system issue some sort of message when there is a "spurious" IRQ?

No.

Quote:
Do you have code (and messages) that responds to IRQs, BRKs and NMIs in the case of being in emulation mode? What would happen then? (I assume the POC is usually running in native mode.)

At reset, the 65C816 is put into native mode and it stays there. If the MPU somehow gets switched back to emulation mode and an interrupt hits, the machine will crash. There is no support for emulation mode in the firmware (also true of POC V1.1).

Quote:
You could expand your idle loop with two checks: a) is the SP pointing to the regular place? b) is the state of the flags (especially M, X, and I) what it should be?

Checking and/or correcting SP would be fairly easy. I can't easily observe the others, but once the MPU has entered the HERE: BRA HERE loop, m and x wouldn't matter.

Dr Jefyll wrote:
Yes. My main focus was on pointing out that a crashed CPU usually keeps executing something.

Just to amplify Jeff's comment, in the case of the 65C816 all opcodes are documented instructions. Therefore the '816 really can't be killed by executing any opcode, other than STP and WAI.

Quote:
It's true I didn't thoroughly discuss the two exceptions, STP and WAI. (It occurred to me to wonder about BDD's CPLD logic, which sometimes actively pulls RDY high. But if a bug in that department resulted in a tug-of-war on RDY then I think there'd be some noticeably hot chips, or a loss of +5.)

The CPLD is the biggest power-consumer in the unit and is slightly warm to the touch after a period of time. All the other major chips are barely warm to the touch. While that isn't a scientific test to determine if something is using too much juice, it's reasonable. Power at the input jack on the unit is 5.02 volts, which is what I have been seeing all along.

Now, here's where it gets curiouser and curiouser. I powered up the unit at 0638 ZULU time last night (May 24), right after I had reprogrammed the CPLD with my tidied-up code. As I write this, it is 0611 ZULU time on May 25; the unit has been running continuously since that startup and is still alive. The probe on /IRQ shows normal activity, as does the probe on RDY. In other words, everything appears to be copacetic right now. I'm wondering if reprogramming the CPLD accidentally fixed the damned thing. :shock: :shock:

 Post subject: Re: POC VERSION TWO
Posted: Fri May 25, 2018 8:26 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Or maybe the probes are damping down the spurious signals...!

