I've done exactly what you suggested and, in fact, modified the firmware's M/L monitor F(ill) and T(ransfer) functions to use the classic load/store technique instead of MVx. Inter-bank copying and memory fill in either bank work without a hitch, although slower, of course. That suggests that reading and writing bank $01 RAM is reliable, at least at the rate possible with indirect long indexed addressing.
While looking at this (apparent) hardware bug, I observed that an MVx from bank $00 to bank $01 partially completes before the machine crashes¹, but the point at which failure occurs is random in nature, and the amount of data actually copied varies. There is something about the 816's behavior in writing to bank $01 with the MVx instructions that is causing the train wreck.
Based upon the fact that I can use MVx to copy from bank $01 to bank $00, but not vice versa, it appears to be only a write issue. That theory is further bolstered by the fact that using MVx to make a copy entirely within bank $01 also fails in the same fashion (copying via load/store has no problem). What makes this somewhat baffling is that the only real difference between copying with load/store code and copying with MVx is that the latter can do it faster, at the rate of one byte per seven Ø2 cycles. A write with MVx should be no different than one with STA [<dp>],Y. In both cases, the write itself completes in one cycle.
I have one other thing to try, and that is to write a test program that will use MVx to copy from bank $00 to bank $01, but prior to actually executing MVx, tells DUART #1 to temporarily stop the jiffy IRQ. If that fixes it then I may have discovered a hardware bug in the 816 itself (MVN and MVP are the only interruptible instructions in the 816's instruction set). If it doesn't fix it then I clearly have a timing problem somewhere in the glue logic that is not directly influenced by Ø2 rate.
One other interesting thing I discovered has to do with the size of the circular queues used for serial I/O (SIO). For some time, I have used a queue size of 64 bytes, with some Boolean boogaloo in the SIO driver's queue indexing code to respect the 64-byte boundaries. This is a little more complicated in POC V1.2 and V1.3 due to having four SIO channels, each with two queues, of which only two of the eight queues fall on even page boundaries. That scheme works great with POC V1.2, but not as well with V1.3, whose SIO performance was mediocre, especially when writing to the console.
In an effort to get to the bottom of it, I commented out the code that does the Boolean boogaloo and expanded the queues to 256 bytes each, which allows for the use of a simpler circular indexing arrangement. That was the rocket propellant needed to get into orbit. As bank $00 RAM consumption is less critical in this machine than in its predecessors (I have all of bank $01 for code and data), I think I will leave the code as is, even though 2KB is being eaten up by queues.
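For anyone wondering why the bigger queues simplify the indexing, here is a rough sketch in C rather than 816 assembly. It is not the actual SIO driver code: the names (q64_next, q256_next, queue256, q256_write, q256_read) are made up for illustration, and the 64-byte case is reduced to a plain mask, which ignores the extra work needed when a queue does not start on a page boundary. The point is only that an 8-bit index into a 256-byte queue wraps by itself, whereas a smaller queue needs an explicit wrap step on every access.

```c
#include <stdint.h>

#define Q64_SIZE 64u   /* old queue size; must be a power of two */

/* Advance an index in a 64-byte queue. The incremented value has to be
   masked so that 63 wraps around to 0. */
static uint8_t q64_next(uint8_t idx)
{
    return (uint8_t)((idx + 1u) & (Q64_SIZE - 1u));
}

/* Advance an index in a 256-byte queue. An 8-bit index wraps from
   255 to 0 on its own, so no masking is required. */
static uint8_t q256_next(uint8_t idx)
{
    return (uint8_t)(idx + 1u);
}

/* Hypothetical 256-byte circular queue with separate put/get indices. */
typedef struct {
    uint8_t buf[256];
    uint8_t put;   /* next slot to be written */
    uint8_t get;   /* next slot to be read    */
} queue256;

/* Store one datum. Returns 1 on success, 0 if the queue is full.
   One slot is always left unused so that "full" and "empty" differ. */
static int q256_write(queue256 *q, uint8_t datum)
{
    uint8_t next = q256_next(q->put);

    if (next == q->get) {
        return 0;              /* queue is full */
    }
    q->buf[q->put] = datum;
    q->put = next;
    return 1;
}

/* Fetch one datum. Returns 1 on success, 0 if the queue is empty. */
static int q256_read(queue256 *q, uint8_t *datum)
{
    if (q->get == q->put) {
        return 0;              /* queue is empty */
    }
    *datum = q->buf[q->get];
    q->get = q256_next(q->get);
    return 1;
}
```

With this layout a 256-byte queue holds at most 255 bytes, and eight such queues account for the 2KB mentioned above.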
In passing, there may be firmware bugs that are behind all of this. The firmware was originally written for POC V1.0, which came to life in December 2009. Since then, the firmware has had more patches applied than a hobo's worn-out trousers. The MVx instructions diddle with the DB register, leaving it pointing at the destination bank. As no POC unit before V1.3 has had more than bank $00 RAM, I was not particularly careful with DB. So it could be that somewhere in the firmware DB is being inadvertently stepped on. As preservation of the exact MPU state by IRQ handlers is de rigueur if MVx is being used, my "shut off the IRQs" test might show the problem is nothing more than a stupid firmware bug.
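To make the DB side effect concrete, here is a small C model of what MVN does to the registers. It is a sketch under assumptions, not a description of the firmware: regs816, mvn_model and copy_preserving_db are hypothetical names, memory is a flat array of 64KB banks, and interrupts and bus timing are not modeled at all. It shows the documented behavior that the 16-bit accumulator holds the byte count minus one and ends at $FFFF, that X and Y advance past the source and destination blocks, and that DB is left holding the destination bank unless the caller saves and restores it.

```c
#include <stdint.h>

/* Only the registers that MVN touches. */
typedef struct {
    uint16_t c;    /* 16-bit accumulator: byte count minus one */
    uint16_t x;    /* source offset within the source bank     */
    uint16_t y;    /* destination offset within the dest bank  */
    uint8_t  db;   /* data bank register                       */
} regs816;

/* Model of MVN: copy (c + 1) bytes from src_bank:x to dst_bank:y.
   mem is a flat array laid out as consecutive 64KB banks. */
static void mvn_model(regs816 *r, uint8_t *mem,
                      uint8_t src_bank, uint8_t dst_bank)
{
    r->db = dst_bank;                 /* side effect: DB = destination bank */
    do {
        uint32_t src = ((uint32_t)src_bank << 16) | r->x;
        uint32_t dst = ((uint32_t)dst_bank << 16) | r->y;

        mem[dst] = mem[src];
        r->x++;                       /* offsets wrap within their banks */
        r->y++;
    } while (r->c-- != 0);            /* exits with c == 0xFFFF */
}

/* Same copy, but DB is saved first and put back afterwards, which is
   roughly what bracketing the instruction with PHB and PLB does. */
static void copy_preserving_db(regs816 *r, uint8_t *mem,
                               uint8_t src_bank, uint8_t dst_bank)
{
    uint8_t saved_db = r->db;

    mvn_model(r, mem, src_bank, dst_bank);
    r->db = saved_db;
}
```

Code that runs after a bare mvn_model() call and uses absolute addressing would be looking at the destination bank, which is the sort of thing that goes unnoticed on a machine with RAM only in bank $00.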
——————————————————————————————————————————————
¹During the POST memory test, RAM contents are preserved, except in the real zero page, the emulation-mode stack ($000100) and the native mode stack ($00BF00). Hence I am able to examine RAM following a hard reset and see how far the copy got before the machine puked.