POC Computer Version One

For discussing the 65xx hardware itself or electronics projects.
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

POC Computer Version One: POC V1.5

Post by BigDumbDinosaur »

——— POC V1.5 ———

In my never-ending quest to get my contraptions to run faster and do more, I’ve been mulling several ways of progressing.  One thing I’ve been wanting to do is build a unit with more than the 128KB of RAM that is in POC V1.3 and V1.4.  My immediate goal is to move up to 512KB, which can be done with a single SRAM, but with a little more complication in the glue logic.

In a 65C816 system with more than 64KB of RAM, a simplistic glue logic setup will likely result in ROM and I/O being mirrored in higher banks, which is usually undesirable.  In my designs, ROM and I/O appear in bank $00, starting at $00C000 (I/O), $00D000 (ROM), and ending at $00FFFF.  Lacking proper glue logic, those items would also appear at $01C000, $01D000, $02C000, $02D000, etc.  If mirroring is to be avoided, logic has to know when the bank address on A16-A23 is not $00.

In a 512KB system, bank $00 detection would be implemented with logic that performs the equivalent operation of BNK0 = !(A16 | A17 | A18), in which | represents logical OR.  BNK0 will be true only when the effective address is $00xxxx.  A triple-input NOR gate would have to be hooked up to the three address lines coming out of the transparent latch used to capture A16-A18, with the gate’s output fed back into the glue logic to tell it when the address is $00xxxx.  Simple enough...until one considers the timing.
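For illustration, the bank-0 detect can be modeled in a few lines of Python (function and signal names are mine, not from the schematic):

```python
def bnk0(addr: int) -> bool:
    """Model of BNK0 = !(A16 | A17 | A18): true only for bank $00.
    In a 512KB system, A16-A18 are the only decoded bank bits."""
    a16 = (addr >> 16) & 1
    a17 = (addr >> 17) & 1
    a18 = (addr >> 18) & 1
    return not (a16 or a17 or a18)

# Bank $00 addresses assert BNK0; higher banks do not.
assert bnk0(0x00C000)      # ROM/I/O region in bank $00
assert not bnk0(0x01C000)  # would be a ROM mirror without BNK0
```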

The problem is the A16-A18 address component doesn’t immediately appear during Ø2 low, as the latch adds its propagation delay to the delay that occurs from the fall of the clock to when the 816 emits a valid address (the tADS spec).  Added to that will be the prop delay of the above-mentioned NOR gate before the glue logic actually knows if the address is, or is not, $00xxxx.

My timing analysis indicates that the “it won’t work” Ø2 threshold with this arrangement is around 18 MHz, using discrete 74AC or 74AHC logic.  Allowing for variations in prop delay that are inevitable from one part to another, I projected that 16 MHz would be the practical Ø2 “ceiling,” something that has been borne out by real-world testing.  So I needed to find another way.

A while back, I had mentioned the possibility of building POC V1.5 with discrete logic that could support 512 KB of RAM.  I decided it would take too many gates (equating to soldering a lot of SOIC parts) and might not perform any better than V1.3, which is stable at 16 MHz.  Given that, I have concocted a substantially different design that consolidates glue logic and bank bits latching in two GALs, appropriately named GAL1 and GAL2.  :D

Here’s the memory map this new contraption will set up:

  • Code: Select all

    000000-00BFFF — base RAM (48 KB)
    00C000-00C37F — Input/output (1 KB)
    00C380-00C3FF — not decoded
    00C400-00CFFF — RAM (3 KB)
    00D000-00FFFF — ROM (12 KB)
    010000-07FFFF — extended RAM (448 KB)
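As a sanity check, the map above can be expressed as a simple host-side decoder (the region names are descriptive labels of mine, not signal names from the design):

```python
def region(addr: int) -> str:
    """Decode a 24-bit effective address per the POC V1.5 memory map."""
    if 0x000000 <= addr <= 0x00BFFF:
        return "base RAM"
    if 0x00C000 <= addr <= 0x00C37F:
        return "I/O"
    if 0x00C380 <= addr <= 0x00C3FF:
        return "not decoded"
    if 0x00C400 <= addr <= 0x00CFFF:
        return "RAM island"
    if 0x00D000 <= addr <= 0x00FFFF:
        return "ROM"
    if 0x010000 <= addr <= 0x07FFFF:
        return "extended RAM"
    return "unmapped"

assert region(0x00D000) == "ROM"
assert region(0x01C000) == "extended RAM"  # no ROM mirror in bank $01
```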


Although the “island” of RAM created at $00C400-$00CFFF will be globally accessible (there is no memory protection), my intended use for it will be to give the firmware and/or operating system a private storage area for its direct page, MPU stack and other needed work space.  By doing so, the entire 48 KB of base RAM will be available for user programs, with the firmware or OS transparently remapping direct page and the stack during system API calls.

The I/O block will be mapped as follows:

  • Code: Select all

    00C000 — serial I/O ports ‘A’ & ‘B’, timer ‘A’
    00C080 — serial I/O ports ‘C’ & ‘D’, timer ‘B’
    00C100 — serial I/O IRQ status*
    00C180 — real-time clock
    00C200 — expansion port chip select ‘A’*
    00C280 — expansion port chip select ‘B’
    00C300 — expansion port chip select ‘C’
    
    *Not wait-stated.


The above I/O map is like that of POCs V1.2, V1.3 and V1.4, except for the different addresses.

In order to make all this work, I’ve assigned the two GALs the following functions:

  • GAL1 contains the address decoding logic and generates corresponding chip selects.  It also tells the clock generator when to stretch Ø2 high for the purpose of wait-stating ROM and I/O accesses.

    GAL1’s inputs are A7-A15, the aforementioned BNK0 (a low-true signal emitted by GAL2), VDA and VPA.  This GAL uses only combinatorial logic, with no pins being used as feedback nodes.  Hence it will respond to all input combinations in no more than tPD nanoseconds, tPD being the advertised pin-to-pin prop time of the device.
     
  • GAL2 serves multiple purposes:
    • Latch A16-A18 during Ø2 low;
    • Produce fully-qualified /RD (read data) and /WD (write data) control signals;
    • Aggregate three separate IRQ inputs into a single IRQ output;
    • Generate the BNK0 signal needed by GAL1 to make its logic decisions.
    GAL2’s inputs are Ø1, D0-D2 (which are A16-A18 during Ø2 low), RWB, VDA, VPA, IRQA, IRQB and IRQC.  This GAL uses both combinatorial and registered logic, but should still perform at tPD due to no pin feedback nodes being used.


As GAL1 depends on an output from GAL2 (its !BNK0 signal), both devices need to be fast in order to support high Ø2 rates.  My choices for the GAL are Microchip’s ATF22V10C-7PX, basically the venerable 22V10, but with better tPD, or their ATF750C-7PX, the latter of which may be described as a 22V10 on steroids—the 750C has about 40 percent more gates than the 22V10.  The two types are electrically interchangeable, but require somewhat different programming methods when using registered logic due to, among other things, the 750C’s more-flexible clocking.

Both parts have a 7.5ns tPD rating which, in my implementation, should be achievable due to no pins being used as feedback nodes.  Hence the worst-case elapsed time from when the 816 emits an address to when a chip select has been generated will be 15ns, a level of performance that I cannot consistently achieve with the fastest equivalent discrete logic.  15ns of total prop delay should result in stable operation at 20 MHz.
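A back-of-the-envelope check of that budget (tPD figures from above; the assumption that the chip selects must settle within the Ø2 low half-cycle is mine):

```python
# Two cascaded GALs at 7.5ns tPD each must produce a chip select
# before the rise of the clock (assuming a symmetrical Ø2).
tpd_gal = 7.5                     # ns, ATF22V10C-7 / ATF750C-7 rating
chain = 2 * tpd_gal               # GAL2 (BNK0) feeding GAL1 (selects)
half_period_20mhz = 1e3 / 20 / 2  # 25ns low half-cycle at 20 MHz

assert chain == 15.0
assert chain < half_period_20mhz  # 15ns fits in the 25ns half-cycle
```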

V1.5 also incorporates my second-generation, stretchable Ø2 clock circuit that is capable of at least 40 MHz operation.  I’m shooting for a top speed of 20 MHz in V1.5, but may try to go faster if it works.

I should note that observations of a couple of ATF22V10Cs I have here indicate that these devices run about 25 percent faster than guaranteed.  While that isn’t something that should be relied upon (minor changes in voltage and/or temperature could slow things down), it might mean that I could get POC V1.5 running above 20 MHz.  The clock generator circuit is designed so an extended wait-state may be configured to handle above-20 MHz operation.

poc_v1.5.pdf
POC V1.5 Schematic
(338.42 KiB) Downloaded 105 times
logic.zip
Glue Logic Design Files
(3.92 KiB) Downloaded 94 times
atf22v10c.pdf
Microchip 22V10 GAL
(1.87 MiB) Downloaded 103 times
atf750c.pdf
Microchip ATF750C “Super GAL”
(491.27 KiB) Downloaded 103 times
x86?  We ain't got no x86.  We don't NEED no stinking x86!

Re: POC Computer Version One

Post by BigDumbDinosaur »

I’ve got a PCB layout done for POC V1.5. The board dimensions are 6-1/2" × 4", and it is in four layers.  I found that by positioning the SRAM and the data bus transceiver east-west, it was easier to make the connections without an excessive number of vias.  I also tinkered a bit with the schematic to add numerous test points that I can use with the logic probe, scope and/or logic analyzer to observe circuit operation.

poc_v1.5_sch.pdf
POC V1.5 Schematic
(348.28 KiB) Downloaded 106 times
POC V1.5 Printed Circuit Board
Last edited by BigDumbDinosaur on Sun Sep 15, 2024 7:25 pm, edited 1 time in total.
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: POC Computer Version One

Post by barnacle »

Nice board layout, BDD. I assume the inner layers are the power.

But so many pullups? Doesn't the GAL pull to the rail? Or is it an open collector output?

And I'm unsure about the db[0..7] outputs after the buffer from the processor... high bits of the address? I'm only vaguely familiar with the 816.

Neil

Re: POC Computer Version One

Post by BigDumbDinosaur »

barnacle wrote:
Nice board layout, BDD. I assume the inner layers are the power.

Thanks!

The inner layers are power and ground.

Quote:
But so many pullups? Doesn't the GAL pull to the rail? Or is it an open collector output?

A GAL has TTL-level outputs, not CMOS, with a guaranteed VOH of 2.4 volts, well below what constitutes a valid CMOS logic 1 in a 5 volt system.

In my post about the GAL test rig I built, I mentioned that I did some output voltage checks.  I discovered that while a GAL could drive its unloaded outputs a little past 4 volts, the voltage quickly deteriorated with loading.  Assuming that an unambiguous CMOS logic 1 occurs at 70 percent of VCC, that would be 3.5 volts in a 5 volt system.  A GAL’s outputs barely stay in that range under load, so there is a risk of noise-sensitivity.  Of course, “under load” when driving a CMOS input mostly means charging parasitic capacitance, since CMOS devices draw virtually no input current when the voltage is steady.  Hence the use of the pullups is mainly to assist the GAL in charging the parasitic capacitance.

Quote:
And I'm unsure about the db[0..7] outputs after the buffer from the processor... high bits of the address? I'm only vaguely familiar with the 816.

The 65C816 multiplexes the A16-A23 address bits on D0-D7 during Ø2 low.  A transparent latch, GAL2 in this design, is used to capture and latch those bits—actually, A16-A18—and drive them onto the corresponding inputs of the SRAM.  When Ø2 goes high, there is a small amount of overlap before the 816 stops emitting A16-A23 and starts treating the data bus as a data bus.  This overlap gives the latch enough time to close on the rise of the clock before the 816 “turns around” the bus.

However, the overlap period also creates a window of opportunity for bus contention.  During a read cycle, /RD will be asserted right after the rise of the clock and the selected device will start driving the data bus, possibly while the 816 is still emitting A16-A23.  The resulting contention may inject a lot of noise into the power and ground planes due to the momentarily-high current flow.  The transceiver, which will be in the high-Z state during Ø2 low, will remain in that state during the overlap period, thus closing the bus contention window.

The bus pullups prevent floating while the transceiver is in the high-Z state.  The transceiver also acts as a level converter, as both the SRAM and ROM have TTL-level outputs, whilst the 816’s inputs are CMOS.

Re: POC Computer Version One

Post by barnacle »

Gotcha, thanks.

POC Computer Version One — V1.5 Logic Woes

Post by BigDumbDinosaur »

Okay, it appears I’ve run into a design problem involving GAL2, which is, among other things, in charge of capturing and latching the A16-A18 component of the effective address.  Here’s the CUPL code for this GAL, with some lines excised for brevity:

Code: Select all

Name        gal2;
PartNo      C406210002;
Date        2024/06/21;
Revision    1.0.0;
Designer    BDD;
Company     BCS Technology Limited;
Assembly    POC V1.5;
Location    U4;
Device      v750c;     /* an enhanced 22V10 */ 

/*       SIGNAL     TYPE    FUNCTION                  */
/*====================================================*/
pin  1 = PHI2;   /* input   PHI2 clock                */
pin  6 = D0;     /* input   MPU bank/data             */
pin  7 = D1;     /* input   MPU bank/data             */
pin  8 = D2;     /* input   MPU bank/data             */

pin 16 = A16;    /* output  address bit               */
pin 17 = A17;    /* output  address bit               */
pin 18 = A18;    /* output  address bit               */
/*====================================================*/


/*** EXTENDED RAM LOGIC ***/

[A16..18].ck = PHI2;           /* clocks the “latches”  */
[A16..18].d  = [D0..2];        /* A16-A18 should be following D0-D2 while PHI2 is low */

The above code will compile without error, but testing in the GAL tester indicates that the latching isn’t working as intended, in that A16-A18 latching is not transparent.

If I set the D0-D2 pattern while PHI2 is low, the A16-A18 outputs should follow the D0-D2 inputs.  That doesn’t happen until PHI2 goes high, which means latching will fail to meet timing deadlines.  The GAL needs to act like a 74x373 or 74x573 D-type transparent latch.  Instead, it is acting like a clocked D-type flip-flop, i.e., one-half of a 74x74.
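The difference is easy to model on a host machine: a transparent latch passes D through while the gate is open, whereas the flip-flop the GAL actually synthesized only samples D on the clock edge (class and signal names are mine):

```python
class TransparentLatch:
    """74x373-style: Q follows D while the enable (Ø2 low here) is active."""
    def __init__(self):
        self.q = 0
    def step(self, d, enable):
        if enable:           # gate open: output follows input
            self.q = d
        return self.q        # gate closed: output holds

class DFlipFlop:
    """74x74-style: Q only updates on the rising clock edge."""
    def __init__(self):
        self.q = 0
        self.prev_clk = 0
    def step(self, d, clk):
        if clk and not self.prev_clk:  # rising edge
            self.q = d
        self.prev_clk = clk
        return self.q

latch, ff = TransparentLatch(), DFlipFlop()
# Ø2 low (latch enable active, FF clock low), D0-D2 pattern set:
assert latch.step(1, enable=True) == 1  # follows D immediately
assert ff.step(1, clk=0) == 0           # stuck until the clock edge
```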

I spent a fair amount of time studying the CUPL programmer’s guide and also searching on-line to find out how to do a transparent latch with a GAL.  So far, I’ve found nothing of value.  It seems I may have designed something that cannot be implemented in hardware.

Incidentally, I tested this using a 22V10 instead of the ATF750C.  The behavior is exactly the same.

Urk!

cupl_reference.pdf
CUPL Programmer’s Reference
(814.53 KiB) Downloaded 129 times
SamCoVT
Posts: 344
Joined: 13 May 2018

Re: POC Computer Version One — V1.5 Logic Woes

Post by SamCoVT »

BigDumbDinosaur wrote:
I spent a fair amount of time studying the CUPL programmer’s guide and also searching on-line to find out how to do a transparent latch with a GAL.  So far, I’ve found nothing of value.  It seems I may have designed something that cannot be implemented in hardware.
Do the extensions .L (transparent latch input) and .LE (Latch Enable) not work on that PLD? (Page 24 of the manual you linked to)

Quick testing shows the answer is no: CUPL doesn’t recognize .L on that part.  Changing to an ATF1500, you can compile using .L and .LE, which I think might do what you want.
John West
Posts: 383
Joined: 03 Sep 2002

Re: POC Computer Version One

Post by John West »

The answer is in the datasheets. ATF1500 outputs can be configured as latches. ATF750 and 22V10 outputs cannot. The ATF1500 datasheet makes a big enough deal about the latch feature that I can't imagine it wouldn't be mentioned in the others if they had it.

It's possible to roll your own latch with a multiplexer, feeding the output into one of the inputs. But that might not translate into a GAL - the multiplexing logic inevitably gets mixed with the input signal logic, and even if it's possible to avoid glitches (adding redundant terms to cover all paths from one state to another) I wouldn't trust a compiler to get it right.

Re: POC Computer Version One — V1.5 Logic Woes

Post by BigDumbDinosaur »

BigDumbDinosaur wrote:
Okay, it appears I’ve run into a design problem involving GAL2, which is, among other things, in charge of capturing and latching the A16-A18 component of the effective address...
A query to Microchip settled this.

Cutting to the chase, what I want to do cannot be done with an SPLD without eating up a lot of logic resources and, most critically, using pin nodes for logic feedback.  Doing the latter will effectively double the prop time when latching the bank bits, an arrangement that will not be able to meet the timing deadlines I have set.

Bottom line is I’ve deemed my dual GAL design unworkable, unless I want to run a relatively-slow Ø2 clock (which I don’t).

Re: POC Computer Version One — V1.5 Logic Woes

Post by BigDumbDinosaur »

BigDumbDinosaur wrote:
Bottom line is I’ve deemed my dual GAL design unworkable...
I’m respinning V1.5 to use an ATF1504AS CPLD in PLCC44.  While my design won’t strain the 1504 for logic resources (I’m using only 32 percent of the available macrocells), it does use all 32 uncommitted I/O pins.  One of these days, I need to work out how to use the QFP100 package, which offers 64 uncommitted pins in the 1504 (and 80 such pins in the 1508).

POC Computer Version One: V1.5 cont’d

Post by BigDumbDinosaur »

I’m busy hacking away at the design for my respun POC V1.5 and have had to solve a resource problem.  As I mentioned in the previous post, I’m using an ATF1504AS CPLD for glue logic.  I figured since I have a bunch of macrocells, flip-flops and other assorted toys at my disposal, why not get fancy with the design?  (CPLD logic attached.)

The only flaw with that thinking is I quickly used up all the available I/O pins.  That prompted me to move the generation of the /RD (read data) and /WD (write data) signals out of the CPLD and back to discrete hardware so I could use those CPLD pins for other functions.  That’s not as bad as it might sound, as a couple of NAND gates in a single SOIC-14 chip that takes up relatively little PCB space is all that is required to produce those signals.

Like its predecessors, starting with V1.2, V1.5 will have ROM in the $00D000-$00FFFF range.  Since I’ve got a lot more flexibility with doing the glue logic in a CPLD, I added the capability of causing a write to ROM address space to “bleed through” to RAM at the same address, that RAM below ROM being referred to as HIRAM.  With this feature, it will be possible to copy (“shadow”) ROM into HIRAM, switch out the ROM and run the firmware from the shadow copy, thus avoiding the frequent wait-states that occur with running from ROM.
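A crude host-side model of the bleed-through idea (class and flag names are mine): while ROM is mapped in, reads come from ROM but writes land in HIRAM, so a straight copy loop shadows the firmware; once ROM is switched out, reads come from the RAM copy, which is then write-protected.

```python
class HiRam:
    """Toy model of ROM bleed-through into shadow RAM (HIRAM)."""
    def __init__(self, rom):
        self.rom = rom
        self.ram = [0] * len(rom)
        self.rom_mapped = True

    def read(self, a):
        return self.rom[a] if self.rom_mapped else self.ram[a]

    def write(self, a, v):
        if self.rom_mapped:     # writes "bleed through" to HIRAM
            self.ram[a] = v
        # ROM mapped out: HIRAM is write-protected; write is ignored

mem = HiRam(rom=[0xEA, 0x60])      # a couple of firmware bytes
for a in range(2):                 # shadow copy (MVN-style loop)
    mem.write(a, mem.read(a))
mem.rom_mapped = False             # switch to the shadow copy
assert mem.read(0) == 0xEA         # firmware now runs from RAM
mem.write(0, 0xFF)                 # a misbehaving program...
assert mem.read(0) == 0xEA         # ...cannot scribble on the shadow
```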

Something to consider with such an arrangement is the danger of a misbehaving program accidentally scribbling on the shadowed firmware.  I needed to devise a way to write-protect HIRAM when ROM is mapped out.  This proved to be not as simple as it could be because the /RD and /WD logic was no longer in the CPLD, where it would have been a trivial exercise to arrange write protection.  However, by moving /RD and /WD out of the CPLD, I did free up two I/O pins, which I could use to control HIRAM access.

As my thinking evolved, I decided that when ROM is mapped in, HIRAM will be write-enabled.  Such an arrangement will support the copying of ROM into HIRAM—an excellent application for the 65C816’s MVN instruction.  When ROM is mapped out, HIRAM will be write-protected.¹  I would do this by using one of the now-free I/O pins to act as an input to control the HIRAM function, and use the other now-free I/O pin to tell the discrete read/write logic to disallow a write to HIRAM when ROM is mapped out.

All well and good, but the discrete read/write logic that I have used in earlier POC permutations blindly assumes that when Ø2 is high, it’s okay to read or write.  Here is the discrete read/write circuit I have used in older POC units:

Read/Write Generation

In order to implement write-protection, I modified the above circuit so I can prevent /WD from being asserted when the effective address is $00D000-$00FFFF and ROM is mapped out.  It’s a simple solution:

Read/Write Generation w/HIRAM Write Protection

Operation is similar to that of the first circuit, except if the /WDI (write data inhibit) signal is driven low, the /WD output can never go low despite the MPU being in a write cycle.  The CPLD logic will assert /WDI if the expression VDA || VPA is true, the effective address is $00D000-$00FFFF and ROM has been mapped out.  Using a -7 version of Microchip’s ATF1504 CPLD, I’d expect that /WDI would be asserted about 19 nanoseconds after the fall of Ø2.  With a 20 MHz clock, I would have about 18ns of timing headroom before the rise of the clock.  Adding to that the time required for the NAND gate to react to the high-going clock input, and I think I will have bulletproof HIRAM write protection.
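The /WDI qualification described above reduces to a short boolean expression; here it is modeled in Python with active-high signals for readability (the real signal is low-true, and the function name is mine):

```python
def wdi_asserted(addr, vda, vpa, rom_mapped):
    """Assert write-data-inhibit for a valid access to $00D000-$00FFFF
    when ROM is mapped out, i.e., when HIRAM must be write-protected."""
    valid_access = vda or vpa
    in_hiram = 0x00D000 <= addr <= 0x00FFFF
    return valid_access and in_hiram and not rom_mapped

assert wdi_asserted(0x00D123, vda=True, vpa=False, rom_mapped=False)
assert not wdi_asserted(0x00D123, vda=True, vpa=False, rom_mapped=True)
assert not wdi_asserted(0x00C000, vda=True, vpa=False, rom_mapped=False)
```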

There are probably other ways of accomplishing this, but it is, as I said, a simple solution.  I was hoping to keep the discrete logic to an absolute minimum, but unless I go to a larger CPLD package, e.g., PLCC84 or QFP100, I don’t have enough I/O pins to put all the logic into the CPLD.

cpld.txt
CPLD WinCUPL Logic
(10.6 KiB) Downloaded 74 times

————————————————————
¹I am also looking at the idea of somehow rigging up the write protection logic to toggle the 65C816’s ABORTB input.  Such an arrangement would trigger a processing exception if an attempt is made to write to HIRAM when ROM is mapped out.  The only fly in this ointment would be the tricky timing involved with using ABORTB.
————————————————————

POC Computer Version One: SCSI Revisited Again

Post by BigDumbDinosaur »

BigDumbDinosaur wrote:
...With a new SCSI driver not having the SEI - CLI pair in the data-in loop, the logic analyzer indicated that 1.375 µsecs elapsed between successive /DACK pulses, translating to a transfer rate of 710 KB/second, an 18 percent improvement...For now, 710 KB/second is about as fast as POC V1.3 is going to go during SCSI operations.

Now, 710KB/second mass-storage access isn’t bad for a homebrew computer reading and writing eight bits at a time.  However, that limit continues to bug me and I’ve been intermittently thinking of ways to improve it.  The ideal :?: solution would be a DMA controller running at bus speeds.  With POC V1.3 running at 16 MHz and a DMA controller reading and writing on alternating clock cycles, a theoretical transfer rate of 8 MB/second would be possible.  That’s faster than the raw SCSI bus speed in asynchronous mode.

Alas, I don’t have a DMA controller.  :cry:

While pondering this lamentable state of affairs, it occurred to me that a far-out solution would be to somehow take advantage of the fact that the 65C816 has those fast MVN and MVP instructions, which are able to copy data from anywhere to anywhere at the rate of one byte per seven clock cycles.  In POC V1.3, that translates to a theoretical speed of ~2.28 MB/second.

The nature of a SCSI transaction apparently makes MVN appropriate for this sort of thing...but it isn’t quite as simple as it seems.  Any SCSI transaction involves data being moved between core and the host adapter (HBA).  The core address is incremented with each byte transferred, whereas the HBA’s DMA port address is static.  That difference gives rise to the first of two problems in doing pretend DMA with MVN: each byte transferred by the MPU increments both the source and destination addresses, which are held in the X- and Y-registers, respectively.  On the face of it, that would seem to be a show-stopper.  However...

In all POC units built to date, I/O is decoded into pages, which means the HBA’s DMA port appears at 256 contiguous locations.  That being the case, MVN can be used in 256-byte “burst” transfers, since address incrementing would still allow the MPU to “see” the DMA port as long as the I/O page boundary isn’t crossed.  Once 256 bytes have been transferred, the register carrying the DMA port address must be set back to the start of the page for the next burst.  The number of burst transfers required would be INT(BYTES/256), a value that is readily calculated solely with arithmetic shifts.
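The burst arithmetic really is cheap; a host-side sketch of the split (helper name is mine):

```python
def split_transfer(nbytes: int):
    """Split a SCSI transfer into 256-byte MVN bursts plus a remainder,
    using only shift/mask operations, as a 65C816 would."""
    bursts = nbytes >> 8       # INT(BYTES/256) via arithmetic shift
    leftover = nbytes & 0xFF   # bytes left for a final short transfer
    return bursts, leftover

assert split_transfer(512) == (2, 0)    # one disk block: two full bursts
assert split_transfer(2048) == (8, 0)   # CD-ROM-style block
assert split_transfer(300) == (1, 44)   # odd-sized setup transfer
```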

Many SCSI transfers involve reading or writing a block from a mass storage device’s medium, typically 512 or 2048 bytes, numbers that are evenly divisible by 256—producing the best possible transfer rate with this scheme.  However, some transfers acquire status information from or write setup information to a device, a data quantity that is rarely evenly divisible by 256.  Hence some accommodation must be made in the setup functions to recognize when a 256-byte burst transfer may not be appropriate.

The method I am outlining obviously incurs some overhead with each 256-byte burst, but that overhead can be kept small enough to not significantly affect the overall transfer rate.  With a single block transfer from a disk (512 bytes), only one reset of the register carrying the HBA address is needed, as well as a refresh of the byte count in the accumulator.  The rest of the work will be done entirely in the MPU as MVN executes.  Which brings us to problem number two...

A few posts back, I had explained the DMA handshake used in POC’s SCSI driver.  Basically, the HBA hardware asserts its DREQ output when it is ready for access, and the MPU asserts the HBA’s /DACK input to read or write a byte.  If DREQ is false, the MPU will spin in a tight loop until DREQ goes true again...or until an interrupt re-vectors execution.

If I access the HBA with MVN, I won’t have a way to poll DREQ, since the MPU will run MVN to completion before moving on to the next instruction—only an interrupt will stop MVN in its tracks.  Not having the DREQ handshake creates the potential for a read operation to return garbage or a write operation to accidentally overflow the HBA’s FIFO.

In order to avoid such contretemps, I am banking on the fact that the SCSI subsystem hardware is significantly faster than the 65C816 and data flow will be continuous.  During testing of the second-design host adapter, I noted that DREQ was continuously asserted during a read operation until all bytes had been fetched from the HBA.  If that holds true for all cases, use of MVN should not mess up.

With the theorizing handled, the next step was to write a test framework:

xfr_size_calc.asm
Some Pretend DMA Using MVN
(3.51 KiB) Downloaded 51 times

The above code computes the number of 256-byte burst cycles required for a transfer, as well as the values for a less-than-256-byte transfer.  It also goes through the motions of a transfer from the HBA to core.  It was assembled and tested in the Kowalski simulator, and demonstrated that the concept will work—at least in simulation.

While testing, I noted the number of clock cycles needed to complete a 1KB transfer—which size corresponds to a logical block in my nascent 816NIX kernel— and computed execution times with the assumption of a 16 MHz Ø2 clock.  It took 464 µseconds to complete the transfer, which included the setup overhead.  That would theoretically result in a raw transfer rate of 2.20 MB/second, very close to the uninterrupted execution rate of MVN.

However, that number doesn’t account for I/O hardware wait-stating.  During a 1KB transfer, 1024 cycles will be consumed in wait-states, which at 16 MHz, is an extra 64 µseconds.  So the real time required to complete a transfer would be 528 µseconds, producing a likely transfer speed of 1.93 MB/second, which is still a not-insignificant improvement over 710 KB/second, a 2.72:1 speedup, in fact.
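Spelled out, the arithmetic looks like this (clock rate and cycle counts from the text; I'm treating MB as decimal megabytes here):

```python
f = 16e6               # Ø2 clock, POC V1.3
t_xfer = 464e-6        # simulated 1KB transfer time, no wait-states
t_wait = 1024 / f      # one wait-state cycle per I/O byte: 64 µs
t_total = t_xfer + t_wait

assert abs(t_total - 528e-6) < 1e-9   # 528 µs total
rate = 1024 / t_total                 # ≈1.94 million bytes/second
assert 1.9e6 < rate < 2.0e6
```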

Any IRQs that were to occur during a transfer would, of course, steal more cycles and further degrade the transfer rate, but to what extent would be difficult to estimate.  The one predictable IRQ source is the jiffy interrupt, which occurs at 10-millisecond intervals.  With a 1KB transfer completing in ~528 µseconds, there is a 1-in-19 chance a jiffy IRQ will hit during a transfer.  An interrupt source more likely to steal cycles would be the serial I/O subsystem, since SIO throughput tends to be “bursty” in nature.

In order to see if my harebrained idea will actually work on real hardware, I need to dust off the SCSI driver development environment I used to write the currently-implemented driver and modify it with this new-and-improved transfer method.  In other words, put it on the machine and see if it goes or blows.

First step will be to verify that the NMI and reset push buttons on POC V1.3 are properly functioning...I suspect I will be needing them.  :shock:

Re: POC Computer Version One: SCSI Revisited Again

Post by BigDumbDinosaur »

BigDumbDinosaur wrote:
Alas, I don’t have a DMA controller.  :cry:

While pondering this lamentable state of affairs, it occurred to me that a far-out solution would be to somehow take advantage of the fact that the 65C816 has those fast MVN and MVP instructions, which are able to copy data from anywhere to anywhere at the rate of one byte per seven clock cycles.  In POC V1.3, that translates to a theoretical speed of ~2.28 MB/second.
So much for that idea.  Every so often, garbage gets processed, especially on long reads.  I think what is happening is if the disk being read has to do a seek as data is flowing, there is a short interruption before the track buffer refills.  While that is happening, the 53CF94’s FIFO empties and DREQ is de-asserted, but the MPU doesn’t know that and keeps reading, thus loading garbage.

Oh well, it was an interesting exercise.
User avatar
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: POC Computer Version One: SCSI Revisited Again

Post by barrym95838 »

BigDumbDinosaur wrote:
... the 53CF94’s FIFO empties and DREQ is de-asserted, but the MPU doesn’t know that and keeps reading, thus loading garbage.
Could you hack together a little hardware/software gizmo with an (interruptible) interrupt service routine to pause just your block moves until DREQ reasserts?
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)
leepivonka
Posts: 167
Joined: 15 Apr 2016

Re: POC Computer Version One

Post by leepivonka »

Seeds of ideas:

1: Is it safe to assume that once a block starts transferring, it is completely available in the disk controller's memory & nothing will slow the transfer down? So you could make sure that each block has started transferring, then finish that block with a MVN transfer.

2: Have DREQ stall the CPU's access to the SCSI controller's data transfer page if data isn't ready yet. Additionally have a timeout that sets a "fail" status & frees the CPU.