... the 53CF94’s FIFO empties and DREQ is de-asserted, but the MPU doesn’t know that and keeps reading, thus loading garbage.
Could you hack together a little hardware/software gizmo with an (interruptible) interrupt service routine to pause just your block moves until DREQ reasserts?
Dunno. I think in order for that to work, the ISR would have to spin in order to keep the block-copy in a suspended state—the restoration of PB and PC by executing RTI restarts the copy instruction following an interrupt. The problem with that is if an error occurs while the disk is seeking and it (the disk) changes the bus phase while the ISR is waiting on DREQ, the phase change will not be acted upon, even though it causes the host adapter to assert /IRQ, potentially resulting in deadlock.
Another approach I had considered was to build a “quasi-DMA controller” using a 65C816 with a stoppable Ø2 clock (the main Ø2 clock generator fed through logic), SRAM for workspace and a ROM to provide the code needed to make it work. In essence, the controller would be a specialized computer that has been optimized to perform rapid data transfers using MVN and/or MVP. Hardware trickery would have to be used to make either the source or destination address appear to be static, even though the 816 would keep emitting changing addresses. DREQ could be used to control the clock, stopping it on the high phase during the part of each transfer cycle when access to the 53CF94 would occur, but the FIFO is empty. I anticipate timing would be tricky and, of course, the controller would have to become the bus master and shut down the main 816 while DMA transfer is in progress. So far, it’s just a vague thought.
What is really needed is a true DMA controller that is bus-compatible with the 65C816, something that could be created in programmable logic. A fair amount of logic resources and I/O pins would be needed. Required registers would be 16-bit source address, 16-bit destination address, bi-directional 8-bit bank/data register, 16-bit counter register, 8-bit control register and 8-bit status register. I/O pins would include address, data and control bits.
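To make the register list above a bit more concrete, here is a rough behavioral sketch in Python, not a hardware design. The control and status bit assignments (CTRL_HOLD_SRC, STAT_DONE), the flat 64KB memory and the $C000 port address are all assumptions made up for illustration; the post only names the registers and their widths.

```python
from dataclasses import dataclass

CTRL_HOLD_SRC = 0x01  # hypothetical control bit: keep source address static
STAT_DONE = 0x80      # hypothetical status bit: transfer count exhausted

@dataclass
class DmaRegs:
    source: int = 0     # 16-bit source address
    dest: int = 0       # 16-bit destination address
    bank_data: int = 0  # 8-bit bank/data register (unused in this sketch)
    count: int = 0      # 16-bit byte counter
    control: int = 0    # 8-bit control register
    status: int = 0     # 8-bit status register

def dma_cycle(regs, mem, dreq):
    """Run one transfer cycle; stall (return False) while DREQ is low."""
    if regs.status & STAT_DONE or not dreq:
        return False
    mem[regs.dest] = mem[regs.source]
    if not regs.control & CTRL_HOLD_SRC:
        regs.source = (regs.source + 1) & 0xFFFF
    regs.dest = (regs.dest + 1) & 0xFFFF
    regs.count = (regs.count - 1) & 0xFFFF
    if regs.count == 0:
        regs.status |= STAT_DONE
    return True

# Demo: three bytes from a static "FIFO port", with DREQ dropping mid-transfer.
mem = bytearray(0x10000)
mem[0xC000] = 0x5A                      # hypothetical FIFO data port
regs = DmaRegs(source=0xC000, dest=0x2000, count=3, control=CTRL_HOLD_SRC)
dreq_trace = [True, True, False, False, True]
moved = sum(dma_cycle(regs, mem, d) for d in dreq_trace)
print(moved, hex(regs.dest), bool(regs.status & STAT_DONE))
```

The point of the sketch is only that a controller which samples DREQ every cycle can sit out an empty FIFO without losing its place, which is what MVN on its own cannot do.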
In the abstract, I know how such a device should work, but my programmable logic chops aren’t at the level needed to go from thoughts to working hardware. Something like an Atmel ATF1508 would have enough I/O pins, but I don’t know if it would have enough logic fabric.
Yeah, the special ISR I'm suggesting would do a spin-wait on DREQ, but it could also sniff out and abort on a seek error as well. You're on the bare metal, so you should have full control over all eventualities, so to speak.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
1: Is it safe to assume that once a block starts transferring, it is completely available in the disk controller's memory & nothing will slow the transfer down? So you could make sure that each block has started transferring, then finish that block with a MVN transfer.
It would depend on the disk geometry, the buffer size and the buffer caching algorithm that is used. Some of that is a hardware implementation detail that is usually not directly known. Furthermore, it would only be applicable to a block device, e.g., disk or CD/DVD, since all medium accesses are in units of a block (usually 512 bytes). I should note that block devices produced since the latter 1990s typically buffer a full track, not just a block, which makes random access within a small block range incur only one medium access.
The rub is if an operation involves multiple blocks and the range of blocks being read/written crosses cylinder boundaries, buffering will not prevent a momentary data flow interruption as the disk seeks. The 53CF94 SCSI controller has a 16-deep FIFO to act as a conduit between the SCSI bus and the host bus. At the speed V1.3 runs (16 MHz), the FIFO can be emptied in a few microseconds. If a seek occurs mid-transfer, the SCSI bus will momentarily go “silent,” the FIFO will empty and DREQ will be de-asserted, although the transfer hasn’t actually finished. It apparently is that condition that tripped up my experiment—and it was much worse when accessing the CD-ROM, which is painfully slow when compared to a hard disk.
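As a back-of-envelope check on "a few microseconds," the arithmetic can be sketched as below. It assumes the FIFO is full when the bus goes quiet and that it is being emptied by MVN, which costs seven Ø2 cycles per byte moved. The one extra cycle per byte for an I/O wait-state is my assumption about how long the stretch lasts, so treat the result as a ballpark only.

```python
FIFO_DEPTH = 16           # bytes, per the post
PHI2_HZ = 16_000_000      # V1.3 clock rate, per the post
MVN_CYCLES_PER_BYTE = 7   # 65C816 block-move cost per byte

def fifo_drain_time_us(depth=FIFO_DEPTH, clock_hz=PHI2_HZ,
                       cycles_per_byte=MVN_CYCLES_PER_BYTE,
                       wait_cycles_per_byte=0):
    """Return microseconds needed to move `depth` bytes out of the FIFO."""
    cycles = depth * (cycles_per_byte + wait_cycles_per_byte)
    return cycles / clock_hz * 1e6

no_wait = fifo_drain_time_us()
one_wait = fifo_drain_time_us(wait_cycles_per_byte=1)  # assumed stretch
print(f"{no_wait:.1f} us without wait-states, {one_wait:.1f} us with one")
```

Either way the FIFO runs dry in well under 10 µs, which is far shorter than any mechanical seek.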
Stream devices, such as tape, usually look somewhat like serial I/O to the SCSI bus, so any medium movement that momentarily interrupts the data stream may cause an almost-immediate stoppage at the host. Therefore, both the host adapter hardware and the SCSI driver have to be able to accommodate data flow interruptions as a routine event.
In a SCSI implementation that supports disconnect/reconnect (select-with-ATN has been used to initiate the transaction), the driver will know if there is going to be an interruption, as the target device will change the bus phase to message-in, send a disconnect message and then release control of the bus. The POC V1.3 SCSI driver doesn’t implement disconnect/reconnect due to limited ROM space: a body of code attached to the interrupt handler is needed to configure the host adapter to accept reselection when the SCSI device is ready to resume access. Also, the kernel in V1.3’s firmware is single-tasking, so disconnect/reconnect has no value; the MPU would have nothing else to do while waiting for reselection, other than service IRQs. Disconnect/reconnect is something that would be of value in a multitasking kernel, especially when working with a SCSI device that is mechanically slow, e.g., a tape drive.
Quote:
2: Have DREQ stall the CPU's access to the SCSI controller's data transfer page if data isn't ready yet. Additionally have a timeout that sets a "fail" status & frees the CPU.
See my above reply to Mike. Other than squatting in a spin loop inside the interrupt handler, the only practical way to stall an MVN or MVP instruction (which iterates N + 1 times, N being the byte count loaded into the 16-bit accumulator) is to actually stop the 816.¹ With both methods, SCSI IRQs will not be serviced (IRQs from other I/O subsystems could be processed via polling), which, as I described to Mike, would result in deadlock if the target SCSI device encounters an error and changes the bus phase in response. A bus-phase change causes an IRQ, which would not be recognized in hardware if the MPU is halted, requiring a polling function for that specific condition.
———————————————————
¹Anything that changes the value of PC while MVx is executing will “break” the instruction. If all registers are preserved and PC is later restored, MVx will restart as though nothing had happened.
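The restart behavior described in the footnote can be modeled in a few lines of Python. This is a simplified sketch, not a cycle-accurate emulation: memory is a flat 64KB array (banks are ignored), the register file is a plain dict, and run_mvn and its interrupt_after parameter are names made up for the illustration. The key point is that all of the instruction's progress lives in A, X and Y, so re-executing the opcode after RTI simply carries on.

```python
def mvn_step(regs, mem):
    """Move one byte as one MVN iteration does; True means 'repeat opcode'."""
    mem[regs["Y"]] = mem[regs["X"]]
    regs["X"] = (regs["X"] + 1) & 0xFFFF
    regs["Y"] = (regs["Y"] + 1) & 0xFFFF
    regs["A"] = (regs["A"] - 1) & 0xFFFF
    return regs["A"] != 0xFFFF

def run_mvn(regs, mem, interrupt_after=None):
    """Run MVN until done or until 'interrupted' after N bytes.

    Returns (bytes_moved, finished).
    """
    moved = 0
    while True:
        more = mvn_step(regs, mem)
        moved += 1
        if not more:
            return moved, True
        if interrupt_after is not None and moved == interrupt_after:
            return moved, False   # PC still points at the MVN opcode

mem = bytearray(0x10000)
mem[0x1000:0x1008] = b"SCSIDATA"
regs = {"A": 7, "X": 0x1000, "Y": 0x2000}   # A = byte count - 1

first, done1 = run_mvn(regs, mem, interrupt_after=3)  # IRQ hits mid-copy
rest, done2 = run_mvn(regs, mem)                      # RTI restarts MVN
print(first, rest, bytes(mem[0x2000:0x2008]))
```

With A loaded with 7 the copy moves eight bytes in total, which matches the N + 1 iteration count mentioned above.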
Yeah, the special ISR I'm suggesting would do a spin-wait on DREQ, but it could also sniff out and abort on a seek error as well. You're on the bare metal, so you should have full control over all eventualities, so to speak.
Yes, I do have bare-metal control, but things still have to be handled in a certain order, since SCSI bus operation is independent of host operation—the 53CF94 is almost like a little computer, and it and the target device work together once the initial operation is started. This autonomous behavior is what I had to get my head around in order to write a SCSI driver that could handle all possible conditions. Once a transaction has been initiated, POC V1.3 doesn’t control the SCSI bus, the target does. V1.3’s job devolves to schlepping bytes and responding to status changes reported by the CF94.
A characteristic of SCSI operation is bus phases (there are eight of them) can come in almost any order and phase changes can occur when least expected. Hence SCSI processing is very much IRQ-oriented and as you know, interrupt processing is something that has to be handled with care and alacrity. Anything anomalous could occur in a target device during a transfer, not just a seek error, causing the device to immediately change the bus phase to status-in to tell the host about it. The CF94, seeing the bus phase change, will autonomously buffer the status byte (usually “check condition”) sent by the target, assert DREQ, since the FIFO now has data, and interrupt the MPU.
Furthermore, after the status byte has been accepted by the CF94 and the latter has sent an ACKnowledge back to the target, the target may then go to either the message-in phase or the bus-free phase, either of which will result in yet another bus-phase-change interrupt. Complicating things, if the switch is to message-in, the CF94 will autonomously capture the message in the FIFO and keep DREQ asserted.
Since the MPU was stalled inside a spin loop and was not responding to IRQs while all this activity was transpiring, it won’t know what the heck is going on with phase changes and whatnot. Directly reading the CF94’s status will only report the condition associated with the most recent (unserviced) IRQ (the status register is only one-deep), which will confuse the driver and likely leave the SCSI bus in an indeterminate state with no further activity pending.
I know practically nothing about SCSI, but I do know that ISRs can be easily interrupted if they're designed that way. Instead of spin-waiting directly on an I/O port the special "transfer pause" ISR could just spin-wait on a "status byte" in RAM that would be updated by one or more higher-priority ISRs. Please forgive me if I'm spouting nonsense ... you clearly know far more about these subjects than I.
Due to concerns about stack smashing and a potential for out-of-order execution, I concluded that nesting interrupts, which is what I think you are suggesting, would not be safe. Instead, all interrupts are handled “serially,” with DUART receiver IRQs handled first. Hardware features in V1.3’s design greatly reduce register polling when an IRQ occurs, so the potential hits to performance tend to be minimized.¹
SCSI itself operates in “phases,” eight in all:
Bus-Free — all devices are quiescent and the bus floats.
Arbitration — one or more devices attempt to seize the bus.
Selection — the device that “won” arbitration (the “initiator”) selects another device (the “target”) with which to communicate.
Command — the initiator tells the target the operation to be performed.
Data — a data payload is transferred between initiator and target; the issued command indicates the direction of transfer.
Status — the target reports internal operating status to the initiator.
Message-in — the target has bus management information for the initiator.
Message-out — the initiator has bus management information for the target.
Only one phase is possible at any given instant. Typically, a sequence such as a read operation from a device would progress as follows:
bus-free → arbitration → selection → command → data → status → message-in → bus-free.
Each time the phase is switched, the 53CF94 controller will generate an IRQ. The CF94 has other IRQ sources, but the change-of-phase interrupt is foremost for execution routing. The CF94 has several status registers as well, one of which indicates the current phase with a binary-coded decimal value that the driver may use to route execution (think ON - GOTO).
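The ON-GOTO idea amounts to a table lookup on the phase code. Below is a minimal Python sketch of that routing. It assumes the phase code sits in the low three bits of the status value and that those bits mirror the SCSI MSG, C/D and I/O lines, which is how the information-transfer phases are encoded on the bus itself. The handler names are hypothetical. Note that the bus encoding distinguishes data-out from data-in, and that bus-free, arbitration and selection are not represented by these bits.

```python
PHASE_MASK = 0b111   # assumed position of MSG, C/D, I/O in the status byte

PHASES = {
    0b000: "data_out",
    0b001: "data_in",
    0b010: "command",
    0b011: "status",
    0b110: "message_out",
    0b111: "message_in",
}

def route(status):
    """Pick a handler name from a raw status value (ON-GOTO style)."""
    return PHASES.get(status & PHASE_MASK, "unexpected")

# A status value whose upper bits are set still routes on the low three bits.
print(route(0x13), route(0x87), route(0b100))
```

Having a catch-all entry matters here, since the two unused codes should never appear but a driver that falls off the end of its jump table is not one I would trust.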
With the exception of selection and arbitration, the current phase is determined by the target, not the initiator. That is, the target controls the bus following selection and therefore the initiator’s SCSI driver cannot make any assumptions about the order in which phases will be selected, or if any given phase will even be used during the transaction. It is this semi-random nature of SCSI’s operation that makes writing a bullet-proof driver a challenge, especially in assembly language.
For example, following selection, the target will usually switch to the command phase. However, a switch to the status phase might occur if the target currently has a problem with accepting commands. Or, following receipt of a command, the target might switch to the status phase due to it not being able to execute the command. Or, during the middle of the data phase, the target might experience a fatal hardware error and abruptly drop off the bus, resulting in an unexpected switch to the bus-free phase. Hence the initiator has to always be prepared for an unexpected phase change and the resulting interrupt, and must handle such changes in the correct order.
Accordingly, the SCSI driver I’ve developed is actually a set of modules, each of which handles a specific phase, along with a “dispatcher” to analyze why a phase change occurred and control execution accordingly. The selection of the appropriate module is indirectly made in the IRQ service routine by manipulating the stack values that were pushed when the interrupt was serviced. When the interrupt handler returns, foreground execution is directed by the dispatcher to the correct module. The IRQ handler also reads the status registers in the CF94 and writes those values to the stack so they will be returned in the MPU registers when foreground execution is resumed.
For these reasons, nested interrupts are a problem, in that out-of-order execution could happen if a phase change occurs while the previous phase change is still being serviced. By not nesting interrupts, it is assured that phase changes will be handled in the order in which they occur as soon as the IRQ handler can get to them.
Once I get to writing a multitasking kernel, I will revisit how SCSI interrupts are handled, especially since I will have justification for implementing disconnect/reconnect in the SCSI driver—it’s a feature that can aid system throughput when a SCSI device is temporarily busy with mechanical functions, e.g., a seek in a disk.
————————————————————
¹There are 12 possible interrupt sources in POC V1.3, eight of which are associated with serial I/O (SIO) processing. A bit field that updates in real time to indicate the interrupt status of the four SIO receivers and transmitters can be gotten in a single atomic read operation. A similar feature in the SCSI host adapter indicates if the 53CF94 is interrupting. These hardware features greatly reduce the necessity of polling individual chips and multiple registers in each chip each time an IRQ occurs.
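For anyone wondering what a single-read interrupt bit field buys, here is a small Python sketch of decoding one. The bit layout is purely an assumption for illustration (bits 0 to 3 for the receivers of channels A to D, bits 4 to 7 for the transmitters); the actual V1.3 assignment isn't given above. Receivers are listed first to match the "receiver IRQs handled first" policy.

```python
CHANNELS = "ABCD"

# Assumed layout: bit n = receiver n, bit n + 4 = transmitter n.
SERVICE_ORDER = (
    [("rx", ch, bit) for bit, ch in enumerate(CHANNELS)] +
    [("tx", ch, bit + 4) for bit, ch in enumerate(CHANNELS)]
)

def pending_sources(irq_status):
    """Decode one read of the status byte into a receivers-first work list."""
    return [(kind, ch) for kind, ch, bit in SERVICE_ORDER
            if irq_status & (1 << bit)]

# One read tells the handler everything that needs servicing.
print(pending_sources(0b0010_0101))
```

One read and a handful of bit tests replace a walk through every chip's own interrupt status register, which is where the time savings come from.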
Last edited by BigDumbDinosaur on Sun Jun 08, 2025 3:13 pm, edited 2 times in total.
... Another approach I had considered was to build a “quasi-DMA controller” using a 65C816 with a stoppable Ø2 clock (the main Ø2 clock generator fed through logic), SRAM for workspace and a ROM to provide the code needed to make it work. In essence, the controller would be a specialized computer that has been optimized to perform rapid data transfers using MVN and/or MVP. Hardware trickery would have to be used to make either the source or destination address appear to be static, even though the 816 would keep emitting changing addresses. DREQ could be used to control the clock, stopping it on the high phase during the part of each transfer cycle when access to the 53CF94 would occur, but the FIFO is empty. ...
It does seem like the larger pin count version of the ATF1504 could handle the trick of making source or destination address appear static ... you need to have 16 address pins on the CPU side and 16 address pins on the bus side, the data bus (bussed, not through the device) & the I/O device select (already have the address lines for the target I/O address) to load the address of the opcode, and a setting whether the load or store is frozen (which only needs one bit). Since the through address lines are combinatorial when not frozen, the output pins can latch the frozen address, and the input pins latch the target MVN/MVP opcode address. The first VDP=VDA=1 with an address different from the stored MVN/MVP opcode address stops the static address mode.
A while back, I bloviated a bit on POC V1.4, which wrapped up most of POC V1.3’s discrete glue logic into a 7.5ns 22V10 GAL. The goal was to build a unit that could potentially be clocked beyond 20 MHz. Unfortunately, V1.4’s stretchable-clock design proved not to be up to the task, and the unit pooped out above 16 MHz, same as did V1.3 (and for the same reason). POC V1.4 has joined the hall of shame, alongside a couple of other failed designs.
V1.4’s shortcomings led to further clock generation experiments, resulting in a new clock circuit design that checked out at 40 MHz. The question then became one of how to test the new-and-improved™ clock generator in a real-world application. I had considered butchering up V1.3 to do the testing, but was reluctant to risk wrecking the only specimen of that design. I concluded the way forward was to update V1.4 to V1.4.1.
V1.4.1 is mostly an evolutionary step; except for the clock generator, a larger address space and slightly different memory map, it is functionally similar to V1.3 and V1.4. Here are the schematic, PCB layout and GAL equations:
V1.3 and V1.4 have 128KB address space, of which 112KB is RAM, and 64KB of that is contiguous, extended RAM ($010000 - $01FFFF). V1.4.1 has 256KB address space, of which 242KB is RAM and 192KB of that is contiguous, extended RAM ($010000 - $03FFFF).
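The extended RAM figures follow directly from the address ranges, as the short Python check below shows. The "remaining RAM" number is just total RAM minus extended RAM, i.e., whatever RAM is mapped outside the extended range; the dictionary keys are names I made up for the sketch.

```python
def span_kb(first, last):
    """Size in KB of an inclusive address range."""
    return (last - first + 1) // 1024

BOARDS = {
    "V1.3/V1.4": {"total_ram_kb": 112, "extended": (0x010000, 0x01FFFF)},
    "V1.4.1":    {"total_ram_kb": 242, "extended": (0x010000, 0x03FFFF)},
}

def ram_split(name):
    """Return (extended_kb, remaining_kb) for one board."""
    board = BOARDS[name]
    ext = span_kb(*board["extended"])
    return ext, board["total_ram_kb"] - ext

for name in BOARDS:
    print(name, ram_split(name))
```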
Selectable Wait-State Duration
The clock generator can be configured to stretch Ø2 high for either one or two clock cycles. The latter may become necessary if I can get this thing to run in the 25 MHz range.
Discriminatory I/O Wait-Stating
In V1.3, any access to the I/O block ($00C000-$00CFFF) results in a wait-state, even when accessing a device that doesn’t require one. This behavior is due to limitations in V1.3’s discrete glue logic. V1.4.1’s logic discriminates between devices that must be wait-stated and those that can run at full speed.
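The difference between the two decoding schemes can be sketched as a pair of Python functions. The device windows below are hypothetical placeholders, not V1.4.1's real I/O map, and treating unmapped I/O addresses as wait-stated is my own conservative assumption. The real decision is made by GAL equations, of course.

```python
IO_FIRST, IO_LAST = 0x00C000, 0x00CFFF   # I/O block, per the post

# Hypothetical windows: (first, last, name, needs_wait_state)
DEVICES = [
    (0x00C000, 0x00C0FF, "slow_device", True),
    (0x00C100, 0x00C1FF, "fast_device", False),
]

def wait_state_v13(addr):
    """V1.3 style: every access inside the I/O block is wait-stated."""
    return IO_FIRST <= addr <= IO_LAST

def wait_state_v141(addr):
    """V1.4.1 style: only devices flagged as slow are wait-stated."""
    if not IO_FIRST <= addr <= IO_LAST:
        return False
    for first, last, _name, needs_wait in DEVICES:
        if first <= addr <= last:
            return needs_wait
    return True   # assumed default for unmapped I/O addresses

for addr in (0x00C000, 0x00C100, 0x002000):
    print(hex(addr), wait_state_v13(addr), wait_state_v141(addr))
```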
Better Data Bus Management
During testing on V1.3 with the logic analyzer, I determined that some data bus contention was going on during a read cycle when the 65C816 switches from emitting the bank bits to listening on the data bus, this despite the /RD (read data) strobe being qualified with Ø2. I added a transceiver to the data bus to keep it isolated until after the rise of Ø2.
New Expansion Port Arrangement
I devised some electrical and mechanical changes to the expansion port with an eye to being able to try out several different card designs, as well as to correct some deficiencies noted in the past.
New Socket
Past POC units used a DIP28 socket on the mainboard as the expansion port, an arrangement that has had less-than-sterling reliability. V1.4.1’s expansion port is a Samtec square-pin female header (P/N SSW-118-01-T-Q) whose 36 pins are arranged on two rows spaced 200 mils (5.08 mm) apart. Aside from offering a more-positive retention characteristic, this header makes it possible for a vertically-mounted expansion card to be self-supporting, something that wasn’t practical with the slot-style expansion connector on V1.4.
The expansion layout I devised also permits mezzanine-style card mounting, similar to what V1.3 has. As a greater force will be required to plug a card into the Samtec header than was required with the DIP socket, I added an extra mounting hole to the PCB near the expansion socket to prevent flexing when inserting a card.
Additional Signals
The new expansion port includes signals that would be needed to correctly interface 65xx-type I/O devices, e.g., a 65C22, to the expansion port. The extra signals are GCLK (global clock, which is a non-stretched signal in phase with Ø2) and RWB, the 65C816’s read/write signal. One of the chip-selects assigned to the expansion port, XIOA, is not wait-stated and is suitable for use with a WDC 65C21 or WDC 65C22 at Ø2 speeds up to 20 MHz.
More Robust Power Connections
There are three VCC and six ground connections, versus one VCC and three grounds with the DIP expansion socket. Having only one VCC connection proved to be a bit of a trouble-maker with the SCSI host adapter, as the SCSI bus may sink upwards of 500 mA during some stages of its protocol. During testing of the most-recent host adapter design, I noted a small amount of VCC sag was going on during SCSI activity and decided that it had to be corrected.
Flash ROM for Firmware
I have been using 27C256 UV-erasable EPROMs since time immemorial, mainly because I had a lot of them, as well as an EPROM eraser that could erase 20 at a time. With heavy usage, some have developed “stuck bits” and thus my supply has been slowly depleted. While I still have quite a few EPROMs, I finally decided to get with the times and switch to a 55ns 39SF010 flash ROM for the firmware. The flash is in a PLCC32 package, which of course affected the PCB layout.
RTC Mounted On Main Board
Like V1.4, V1.4.1 has the Maxim DS1511 RTC mounted on the main PCB. V1.3 and earlier mounted it on the SCSI host adapter because the RTC’s socket was pressed into service as the expansion socket.
MCE Indicator
The MCE (machine check exception) indicator is a red LED that is indirectly driven by the 65C816’s E output, which is high when the 816 is running in emulation mode. The 816 is immediately switched to native mode following reset, which means MCE should be dark during normal operation.
In the first stage of POST, which is prior to any I/O being established, a memory test is conducted on critical RAM areas. If a failure occurs during this test, there is no way to report it on the console, but it could be reported with the MCE indicator. Ergo the 816 will be returned to emulation mode, causing MCE to illuminate, after which the 816 will be halted with the STP instruction. I’ve been contemplating a way to make MCE flash when this sort of fault occurs.
There are no DIP packages in this design, other than the GAL. With a few exceptions, discrete devices are either SOIC or SOJ. As two latches are needed to generate the A16 and A17 signals required to support extended RAM, I decided to use a pair of 74LVC1G373s (SOT-23 packages), which have a D-to-Q Tpd rating of 4ns maximum on 5 volts. The GAL logic includes the state of A16 and A17 in decoding decisions, so the very rapid Tpd of these devices helps to maximize overall glue logic performance. All GAL logic is combinatorial, so the estimated total prop time through the logic will be 11.5ns, worst-case.
Testing has shown that the 0.6µ version of the 65C816 typically emits a valid address 12ns after the fall of Ø2. At 20 MHz, Ø2 low lasts 25ns, which assuming the 12ns number is consistent, leaves 13ns of Ø2-low time to generate a chip select and determine if a wait-state is required. The wait-state hardware (a J-K flop) uses a timing clock signal that slightly lags Ø2, so there is some leeway in how quickly the wait-state enable (/WSE in the schematic) must be asserted relative to Ø2. It was this aspect of the older design that faltered above 16 MHz.
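The timing budget above is easy to tabulate for other clock rates. The Python sketch below assumes a symmetrical (50 percent duty cycle) Ø2, takes the 12ns address-valid figure and the 11.5ns worst-case glue delay at face value, and ignores the extra leeway provided by the lagging wait-state clock, so a negative margin here is a warning sign rather than proof of failure.

```python
ADDR_VALID_NS = 12.0   # address valid after the fall of Ø2, per testing
GLUE_NS = 11.5         # worst-case latch + GAL propagation, per the post

def phi2_low_ns(clock_mhz):
    """Duration of Ø2 low, assuming a symmetrical clock."""
    return 1000.0 / clock_mhz / 2

def decode_window_ns(clock_mhz, addr_valid_ns=ADDR_VALID_NS):
    """Ø2-low time left after the address becomes valid."""
    return phi2_low_ns(clock_mhz) - addr_valid_ns

def margin_ns(clock_mhz, glue_ns=GLUE_NS, addr_valid_ns=ADDR_VALID_NS):
    """Slack remaining once the glue logic has settled."""
    return decode_window_ns(clock_mhz, addr_valid_ns) - glue_ns

for mhz in (16, 20, 25):
    print(f"{mhz} MHz: window {decode_window_ns(mhz):.2f} ns, "
          f"margin {margin_ns(mhz):.2f} ns")
```

At 20 MHz the window is the 13ns mentioned above and the slack works out to 1.5ns. At 25 MHz the simple model goes negative, which suggests the lagging wait-state clock would have to absorb the difference.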
So the next step after I have slept on this design for a few days will be to build it and see if it goes or blows.
Thanks for the updates BDD. Gosh, between the 'POC saga' and Barnacle's 'FAT quest' there's a lot of addictive reading for Christmas.
I'm going to watch this space for your results, as I also am pondering upon how to improve the glue logic speed of my 65816/pico2 contraption.
Thanks for the updates BDD. Gosh, between the 'POC saga' and Barnacle's 'FAT quest' there's a lot of addictive reading for Christmas.
Christmas dinner being what it usually is, the quest for “FAT” should be an easy one.
BTW, while making up the bill of materials for POC V1.4.1, I discovered I already have a lot of what I’m going to need. There are a couple of items that I have not used in the past that I will have to order. I usually buy multiples, since I figure I will use more in the future.
I am pleased if my prose can assist your post-lunch snooze in any way, Glenn!
BDD, I usually carefully check all my parts before I order the board, so I can discover whether the missing parts are unavailable before it's too late to change the circuit... I recall a discrete ALU I made a couple of years back that went through at least three PCB iterations because I couldn't get all the parts in the same family and package I initially wanted.
I am pleased if my prose can assist your post-lunch snooze in any way, Glenn!
It never fails! I eat dinner, take my after-dinner medicine (seven pills total) and five minutes later my wife is telling me to not nod off, just as I’m about to do a face-plant into my plate.
Quote:
BDD, I usually carefully check all my parts before I order the board, so I can discover whether the missing parts are unavailable before it's too late to change the circuit...
Same here. The one time I didn’t do that, I ended up scrapping freshly-made PCBs, modifying the circuit, fixing the PCB layout, and spending 2× the money for 1× the number of required PCBs.
PCBs have arrived, so it’s time to warm up the soldering iron!
V1.4.1’s SMT soldering has been completed, but the stresses and strains of daily living (plus wrestling with a TLS project on a client’s mail server) seem to be consuming too much time. Also, my wife, who underwent surgery in January, but did not recover very well, is needing a lot of care right now. So there’s that. I’ll eventually get to finishing V1.4.1, but not sure how soon.
I'm sorry to hear your wife's recovery isn't going well. I hope she improves.