6502.org Forum

All times are UTC




Posted: Tue Jun 01, 2021 2:48 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
floobydust wrote:
That can't be fun to find out after the fact.... so, not being that familiar with the 65C816, dare I suggest using a normal memory move routine between banks? No idea if it would behave differently, but if it did work, then perhaps you've found a hardware timing anomaly with the bank logic.... I'm really guessing here.

I've done exactly what you suggested and, in fact, modified the firmware's M/L monitor F(ill) and T(ransfer) functions to use the classic load/store technique instead of MVx. Inter-bank copying and memory fill in either bank work without a hitch, although slower, of course. That suggests that reading and writing bank $01 RAM is reliable, at least at the rate possible with indirect long indexed addressing.

While looking at this (apparent) hardware bug, I observed that an MVx from bank $00 to bank $01 partially completes before the machine crashes¹, but the point at which failure occurs is random in nature, and the amount of data actually copied varies. There is something about the 816's behavior in writing to bank $01 with the MVx instructions that is causing the train wreck.

Based upon the fact I can use MVx to copy from bank $01 to bank $00, but not vice versa, it appears to only be a write issue. That theory is further bolstered by the fact that using MVx to make a copy entirely within bank $01 also fails in the same fashion (copying via load/store has no problem). What makes this somewhat baffling is the only real difference between copying with load/store code and copying with MVx is the latter can do it faster, at the rate of one byte per seven Ø2 cycles. A write with MVx should be no different than one with STA [<dp>],Y. Both complete in one cycle.

I have one other thing to try, and that is to write a test program that will use MVx to copy from bank $00 to bank $01, but prior to actually executing MVx, tells DUART #1 to temporarily stop the jiffy IRQ. If that fixes it then I may have discovered a hardware bug in the 816 itself (MVN and MVP are the only interruptible instructions in the 816's instruction set). If it doesn't fix it then I clearly have a timing problem somewhere in the glue logic that is not directly influenced by Ø2 rate.

One other interesting thing I discovered has to do with the size of the circular queues used for serial I/O (SIO). For some time, I have used a queue size of 64, with some Boolean bugaloo in the SIO driver queue indexing code to respect the 64 byte boundaries. This is a little more complicated in POC V1.2 and V1.3 due to having four SIO channels, each with two queues, of which only two of the eight queues fall on even page boundaries. That scheme works great with POC V1.2, but not as well with V1.3, whose SIO performance was mediocre, especially writing to the console.

In an effort to get to the bottom of it, I commented out the code that does the Boolean bugaloo and expanded the queues to 256 bytes each, which allows for the use of a simpler circular indexing arrangement. That was the rocket propellant needed to get into orbit. As bank $00 RAM consumption is less critical in this machine than in its predecessors (I have all of bank $01 for code and data), I think I will leave the code as is, even though 2KB is being eaten up by queues.
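The difference between the two indexing schemes can be sketched in Python. This is only an illustration, not the POC firmware: the function names are made up, and the 64-byte case assumes the wrap is done with a simple AND mask, which is a guess at what the original driver does.

```python
QUEUE_64 = 64    # old queue size
QUEUE_256 = 256  # new queue size

def next_index_64(index: int) -> int:
    """Advance an index in a 64-byte queue: increment, then mask to 0-63 (assumed scheme)."""
    return (index + 1) & (QUEUE_64 - 1)

def next_index_256(index: int) -> int:
    """Advance an index in a 256-byte queue; the mask models 8-bit register rollover."""
    return (index + 1) & 0xFF

# Four SIO channels, each with a receive and a transmit queue, at 256 bytes each.
total_queue_bytes = 4 * 2 * QUEUE_256
print(total_queue_bytes)  # 2048 bytes, i.e. 2KB
```

With a 256-byte queue and an 8-bit index register, the wrap in `next_index_256` costs nothing on real hardware, since the increment rolls over from $FF to $00 by itself.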

In passing, there may be firmware bugs that are behind all of this. The firmware was originally written for POC V1.0, which came to life in December 2009. Since then, the firmware has had more patches applied than a hobo's worn-out trousers. The MVx instructions diddle with the DB register, leaving it pointing at the destination bank. As no POC unit before V1.3 has had more than bank $00 RAM, I was not particularly careful with DB. So it could be that somewhere in the firmware DB is being inadvertently stepped on. As preservation of the exact MPU state by IRQ handlers is de rigueur if MVx is being used, my "shut off the IRQs" test might show the problem is nothing more than a stupid firmware bug.

——————————————————————————————————————————————
¹During the POST memory test, RAM contents are preserved, except in the real zero page, the emulation-mode stack ($000100) and the native mode stack ($00BF00). Hence I am able to examine RAM following a hard reset and see how far the copy got before the machine puked.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Tue Jun 01, 2021 8:09 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
BigDumbDinosaur wrote:
In passing, there may be firmware bugs that are behind all of this...As preservation of the exact MPU state by IRQ handlers is de rigueur if MVx is being used, my "shut off the IRQs" test might show the problem is nothing more than a stupid firmware bug.

Seek and ye shall find...

As I said, prior iterations of the firmware didn't pay much attention to DB and what was in it. It didn't matter with POC V1.0, V1.1 and V1.2, because the only address space was in bank $00. It does matter with V1.3.

After running the "shut off the IRQs" test, MVx worked as it should. That result led me to investigate the IRQ handler's front end to see if DB was being correctly handled. It was being preserved, but while examining the code, it became clear what was causing the trouble. As you read the following, keep in mind that the MVx instructions are interruptible.

The IRQ handler refers to some data tables as it is processing virtual QUART interrupts, data tables that are in ROM and therefore are in bank $00 (also, the serial I/O queues are in bank $00, right below the stack). As an MVx instruction executes, the destination bank is loaded into DB. There was no code in the IRQ handler's front end to set DB to bank $00 before further processing, which meant whatever was in DB at the time of an IRQ was the bank in which the IRQ handler would look for its run-time data. If the destination bank for an MVx instruction was bank $01 the IRQ handler would go to that bank looking for non-existent data. Lacking proper data tables, the part of the IRQ handler that processes virtual QUART interrupts was being derailed.
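A toy model of the failure, in Python, may make the mechanism easier to follow. It is a sketch under stated assumptions: the table address $D100 and the byte $42 are made up, unmapped locations are modeled as reading $00, and the fix is assumed to amount to forcing DB to $00 in the handler's front end.

```python
ROM_TABLE_ADDR = 0xD100  # hypothetical 16-bit address of a handler data table

# Sparse memory keyed by 24-bit address; the table exists only in bank $00.
memory = {(0x00 << 16) | ROM_TABLE_ADDR: 0x42}

def read_absolute(db: int, addr16: int) -> int:
    """Model a 16-bit absolute data read: the bank byte comes from DB."""
    return memory.get((db << 16) | addr16, 0x00)  # nothing there: reads as $00 here

def irq_handler(db_at_interrupt: int, fix_applied: bool) -> int:
    """Return the table byte the handler sees when it looks up its data."""
    db = 0x00 if fix_applied else db_at_interrupt  # assumed fix: force DB to bank $00
    return read_absolute(db, ROM_TABLE_ADDR)

print(hex(irq_handler(0x01, fix_applied=False)))  # interrupted MVx into bank $01: wrong data
print(hex(irq_handler(0x01, fix_applied=True)))   # DB forced to $00: table found
```

An interrupt that arrives while DB is still $00 sees the right table either way, which would explain why the crash only showed up when MVx was writing to bank $01.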

A couple of lines of code fixed the problem, and now anything using MVx instructions works right. I reverted the monitor F(ill) and T(ransfer) functions to the original code that uses MVN and MVP. I then used T(ransfer) to copy the firmware ROM, which is at $00D000-$00FFFF, to RAM starting at $01D000. There were no crashing sounds and the monitor prompt immediately came back. I followed that with the C(ompare) function to compare the two memory regions. They were identical.

Further testing with the F(ill) function showed that all was well—F(ill) uses MVN to do the grunt work. BTW, scribbling in the entire bank $01 space with F(ill) happens too quickly to be perceptible. As MVN processes a byte per seven Ø2 cycles and Ø2 is 16 MHz, it only takes about 29 milliseconds to fill the entire bank, assuming no interrupts occur. The jiffy IRQ rate is 100 Hz, so two or three IRQs probably sneaked in there while bank $01 was being whitewashed with nulls. :D
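The fill-time arithmetic can be checked with a few lines of Python. The constant names are mine; the figures are the ones quoted above, and the time spent servicing the IRQs themselves is ignored.

```python
BANK_BYTES = 0x10000      # one full 64KB bank
CYCLES_PER_BYTE = 7       # MVN moves one byte per seven Ø2 cycles
PHI2_HZ = 16_000_000      # Ø2 at 16 MHz
JIFFY_HZ = 100            # jiffy IRQ rate

fill_seconds = BANK_BYTES * CYCLES_PER_BYTE / PHI2_HZ
fill_ms = fill_seconds * 1000
jiffy_periods = fill_seconds * JIFFY_HZ

print(f"{fill_ms:.1f} ms, {jiffy_periods:.2f} jiffy periods")  # 28.7 ms, 2.87 jiffy periods
```

A fill spanning 2.87 jiffy periods will see either two or three IRQs, depending on where in the jiffy period it starts.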

The next thing to accomplish is to look at timing with an eye toward raising the Ø2 limit of 16 MHz.

Posted: Sat Jun 05, 2021 6:34 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
Well, POC V1.3 seems to be stable. Uptime is now approximately four days and everything seems to be functioning as it should. It seems strange being able to load a program into RAM beyond $00FFFF and run it. :shock: What am I going to do when I build something with much more RAM?

I need to hook up the logic analyzer to it and see if I can figure out why it won't run faster than 16 MHz—it completes the first stage POST memory check, but goes belly-up as soon as it tries to set up the DUARTs. Since the only change from POC V1.2's glue logic was the addition of the bank latching hardware, I have a feeling the answer is buried somewhere in there—or in my new-fangled stretchable clock generator.

Meanwhile, I decided to resume the postmortem on POC V1.2. So far, nothing is obvious.

Last edited by BigDumbDinosaur on Tue Jun 15, 2021 8:50 am, edited 1 time in total.

Posted: Tue Jun 08, 2021 3:37 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1378
I've been on a road trip... so haven't gotten around to the forum for a bit.

For 1.3, glad that you were able to troubleshoot it and that the problem was just some code that needed updating. As I'm not that familiar with the 65C816, I wouldn't have thought of it anyway, but your findings do explain the problem/resolution. Glad the new POC is running almost as expected, sans the maximum clock rate of 16MHz.

Is there an easy way to disable the bank latching? Just work only within Bank0 (likely a firmware change) and see if the DUARTs come alive... looking at the schematic (lightly once over) I didn't see anything that looked obvious for initializing the DUARTs. I guess another option is trying to disable the 74ACT540 and see if you can init a single DUART at a higher clock speed.

For 1.2, it well may be the SRAM has gone bad... replacing it is the only real troubleshooting option. Keep us updated.

_________________
Regards, KM
https://github.com/floobydust


Posted: Tue Jun 08, 2021 3:59 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
floobydust wrote:
Glad the new POC is running almost as expected, sans the maximum clock rate of 16MHz.

Looking at it from the optimistic point of view, 16 MHz is 2 MHz faster than the 65C816's official maximum rating. Also, for perspective, the best POC V1.1 could do was 14 MHz, and it was on the ragged edge due to no wait-stating—some ROMs wouldn't work.

I can recall when any computer that could run at 16 MHz represented a major financial investment.

Quote:
Is there an easy way to disable the bank latching?

No. The presence or absence of a particular signal that is the result of bank latching is an integral part of the glue logic. Without that signal being generated there'd be no ROM or I/O available.

Quote:
...looking at the schematic (lightly once over) I didn't see anything that looked obvious for initializing the DUARTs. I guess another option is trying to disable the 74ACT540 and see if you can init a single DUART at a higher clock speed.

I need to pay particular attention to the clock generator circuit. Stretching Ø2 high is how wait-stating is brought about. There might be a timing constraint in there that becomes apparent above 16 MHz. The machine is rock-solid at 16.

Quote:
For 1.2, it well may be the SRAM has gone bad... replacing it is the only real troubleshooting option. Keep us updated.

I'd have to get someone to R&R the SRAM for me. I can desolder the old one with my small heat gun but I won't be able to solder the new one unless my left eye suddenly starts seeing again. :shock:

Posted: Wed Jun 09, 2021 12:47 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
BigDumbDinosaur wrote:
Well, POC V1.3 seems to be stable. Uptime is now approximately four days and everything seems to be functioning as it should.

V1.3's uptime is now $000A248C seconds, which is 184 hours, 38 minutes and 36 seconds, or about 7.7 days.

Last edited by BigDumbDinosaur on Wed Jun 09, 2021 5:48 pm, edited 1 time in total.

Posted: Wed Jun 09, 2021 2:17 pm
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1938
Location: Sacramento, CA, USA
Is it just "awake", or are you doing something useful with your up-time, like mining bitcoins or something?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Wed Jun 09, 2021 5:48 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
barrym95838 wrote:
Is it just "awake", or are you doing something useful with your up-time, like mining bitcoins or something?

I'm using it to test my new-and-improved character string manipulation library. The new library is able to process strings that span banks, which the old one couldn't do. Naturally, a machine with more than 64K of RAM is needed to do testing. So V1.3 is doing something besides consuming electricity and keeping time. :D

Posted: Tue Jun 15, 2021 9:00 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
BigDumbDinosaur wrote:
BigDumbDinosaur wrote:
Well, POC V1.3 seems to be stable. Uptime is now approximately four days and everything seems to be functioning as it should.

V1.3's uptime is now $000A248C seconds, which is 184 hours, 38 minutes and 36 seconds, or about 7.7 days.

V1.3 continues to run at 16 MHz without any problems. I've been using it to complete testing on my new-and-improved string manipulation library and so far, nothing has acted up. Current uptime is $001280CD seconds, which is 336 hours, 50 minutes and 21 seconds—a hair over 14 days.
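For anyone who wants to check the hex-to-time conversion, here is a short Python sketch. The helper name `format_uptime` is mine and has nothing to do with how the POC firmware reports uptime.

```python
def format_uptime(seconds: int) -> str:
    """Split a seconds count into hours, minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins} min {secs} s"

uptime = 0x001280CD                 # 1,212,621 seconds
print(format_uptime(uptime))        # 336 h 50 min 21 s
print(f"{uptime / 86400:.2f} days") # 14.03 days
```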

Posted: Sat Jun 19, 2021 4:01 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1378
BigDumbDinosaur wrote:
I'd have to get someone to R&R the SRAM for me. I can desolder the old one with my small heat gun but I won't be able to solder the new one unless my left eye suddenly starts seeing again. :shock:


I've been doing quite a few road trips this year (and last)... but if you can remove the old SRAM chip, I'd be happy to solder a new one in place... as you may recall, I recently picked up a new Weller Pico station for doing very fine work.

Posted: Sat Jun 19, 2021 4:24 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
floobydust wrote:
I've been doing quite a few road trips this year (and last)...

Ah, yes! I remember all-too-well the "pleasure" of living out of a suitcase for weeks on end. :shock:

Quote:
...but if you can remove the old SRAM chip, I'd be happy to solder a new one in place... as you may recall, I recently picked up a new Weller Pico station for doing very fine work.

Thanks for the offer. I'll PM you about it.

Posted: Tue Jun 29, 2021 1:56 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
Some probing with the logic analyzer enabled me to determine why POC V1.3 is slower than POC V1.2 (16 MHz vs. 20 MHz). It is, as I suspected, due to the extra glue logic needed to deal with the banking, specifically the logic that keeps ROM and I/O from mirroring in bank $01.

If I try to run the system faster than 16 MHz, prop time through the glue logic results in the clock generator not getting the "stretch the clock" signal soon enough before Ø2 goes high—and the clock just keeps going like nothing happened. ROM and I/O can't respond in the 25ns available during Ø2 high at 20 MHz and the machine augers in.
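The 25ns figure follows directly from the clock period. A quick Python check, assuming a symmetric (50% duty cycle) Ø2 and no stretching, with a function name of my own choosing:

```python
def phi2_high_ns(freq_mhz: float) -> float:
    """Ø2 high time in nanoseconds, assuming a 50% duty cycle and no stretching."""
    period_ns = 1000.0 / freq_mhz
    return period_ns / 2.0

print(phi2_high_ns(20))  # 25.0
print(phi2_high_ns(16))  # 31.25
```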

Oh well! I'm fine with it, as I didn't even expect the unit to run at 16 MHz.

I had planned on building POC V1.4 with 512KB of RAM. However, with what I've discovered with V1.3 and the results of a detailed timing analysis of V1.4, whose glue logic is more complicated than V1.3's, I've concluded I've pushed the POC V1 series far enough. If I want to go faster with more RAM (which means more logic to generate the bank bits) I'm going to have to discontinue using discrete logic.

Posted: Tue Jun 29, 2021 8:08 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10834
Location: England
Could you sketch that glue logic please BDD? It would be interesting to see what the scale of the problem is. There's a fair chance that glue logic is often in the critical path - which would provide a link between the design goals and the system performance.


Posted: Wed Jun 30, 2021 4:44 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8230
Location: Midwestern USA
BigEd wrote:
Could you sketch that glue logic please BDD? It would be interesting to see what the scale of the problem is. There's a fair chance that glue logic is often in the critical path - which would provide a link between the design goals and the system performance.

Glue logic for which machine?

As I said above, I decided not to proceed with POC V1.4. V1.4 was supposed to be an expansion of V1.3 with 512KB of RAM. That in itself didn't pose any design problems with bank latching. However, the need to prevent ROM and I/O from mirroring in the extended banks increased the total gate count. My method of preventing mirroring in V1.3 was easy to implement and only required a single inverter, possible because only A16 is generated. The inverter generates a (positive logic) BNK0 signal that exposes I/O or ROM if the effective address is $00C000-$00C7FF or $00D000-$00FFFF, respectively.
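As a rough behavioral model of the V1.3 decode just described, here is a Python sketch. It models only the address ranges given above, not the actual gates or their timing, and treating everything outside those ranges as RAM is my assumption.

```python
def decode(address: int) -> str:
    """Classify a 17-bit address the way the V1.3 scheme is described."""
    a16 = (address >> 16) & 1
    bnk0 = not a16              # the single inverter: BNK0 is true only in bank $00
    offset = address & 0xFFFF
    if bnk0 and 0xC000 <= offset <= 0xC7FF:
        return "I/O"
    if bnk0 and 0xD000 <= offset <= 0xFFFF:
        return "ROM"
    return "RAM"                # assumption: everything else is treated as RAM

print(decode(0x00C000), decode(0x01C000))  # I/O RAM
print(decode(0x00D000), decode(0x01D000))  # ROM RAM
```

The second address in each pair shows the point of BNK0: the same 16-bit offset in bank $01 selects RAM instead of mirroring I/O or ROM.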

In V1.4, A16, A17 and A18 would be in the picture, making use of a single inverter not possible. The method I would have used would be to tie A16, A17 and A18 to a 3-input NOR, whose output would be BNK0. While such a device does exist in fast logic (74AVC), it is non-stock with the three distributors I use. An alternative source would have been a parts liquidator, all of whom have order minimums, usually 200 to 250 USD. :(

A "plan B" method would be to add an inverting D-latch and connect its three outputs corresponding to A16, A17 and A18 to a 3-input AND, which again would generate BNK0. The 3-input AND is a stock item with two of my three distributors, so that wouldn't have been a problem. However, yet another device would have to be added to the board and suddenly there were 19 discrete devices in the entire design, with 15 of them being SOIC or SOJ packages. That's a lot of soldering!
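The two methods are logically equivalent by De Morgan's theorem: a NOR of the three bank bits gives the same result as an AND of their inverted copies. A small Python check over all eight input combinations (function names are mine, for illustration only):

```python
from itertools import product

def bnk0_nor(a16: bool, a17: bool, a18: bool) -> bool:
    """First method: 3-input NOR of the bank address bits."""
    return not (a16 or a17 or a18)

def bnk0_and_inverted(a16: bool, a17: bool, a18: bool) -> bool:
    """Plan B: inverting latch outputs fed to a 3-input AND."""
    return (not a16) and (not a17) and (not a18)

all_match = all(
    bnk0_nor(*bits) == bnk0_and_inverted(*bits)
    for bits in product([False, True], repeat=3)
)
print(all_match)  # True
```

This says nothing about propagation delay, parts count or board space, which are the costs that differ between the two approaches.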

More complication was involved in generating wait-states, but only doing so if I/O or ROM were selected. This added even more to the circuit.

Given all that, I concluded that I had reached the practical limits of an all-discrete design and that it was time to move on.

Posted: Thu Jul 01, 2021 1:53 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3366
Location: Ontario, Canada
BigEd wrote:
Could you sketch that glue logic please BDD?
I'm guessing Ed means V1.3. There's a schematic in this post.

And I like the clock-stretch circuit you've got there, BDD. I've taken the liberty of editing away some detail to clarify its workings.

  • U16a (highlighted with color) will always invert its state on every oscillator cycle
  • U16b (immediately below) tries to copy what U16a is doing, but it gets messed with if /WSE (the "stretch the clock" signal) goes low :)

[Attachment: POC 1-3 mod.png]

BigDumbDinosaur wrote:
If I try to run the system faster than 16 MHz, prop time through the glue logic results in the clock generator not getting the "stretch the clock" signal soon enough before Ø2 goes high

Well, there are two possible solutions to investigate. If the glue logic that generates /WSE can't be made faster, maybe the /WSE timing can be accommodated as-is.

In order for a stretch to occur, /WSE needs to go low before the J-K flipflop U15b gets clocked. So, how about delaying the clocking of U15b? At the top of the diagram I've drawn two possible options. In both cases the goal is to add delay in the path that leads to U15b's clock input.

It's hard to predict how much advantage you'll gain -- it might be substantial or it might be negligible. As you continue to increase the delay on U15b's clock, at some point you'll run afoul of a different timing constraint further downstream. Myself, I'd wanna try an experiment. It only requires a couple of bodge wires (and a spare gate).

Your clock stretcher doesn't use actual Ø2 for its decision, and that could prove to be a very significant advantage. :!:

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

