
All times are UTC




PostPosted: Thu Sep 05, 2024 6:46 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
——— POC V1.5 ———

In my never-ending quest to get my contraptions to run faster and do more, I’ve been mulling several ways of progressing.  One thing I’ve been wanting to do is build a unit with more than the 128KB of RAM that is in POC V1.3 and V1.4.  My immediate goal is to move up to 512KB, which can be done with a single SRAM, but with a little more complication in the glue logic.

In a 65C816 system with more than 64KB of RAM, a simplistic glue logic setup will likely result in ROM and I/O being mirrored in higher banks, which is usually undesirable.  In my designs, ROM and I/O appear in bank $00, starting at $00C000 (I/O), $00D000 (ROM), and ending at $00FFFF.  Lacking proper glue logic, those items would also appear at $01C000, $01D000, $02C000, $02D000, etc.  If mirroring is to be avoided, logic has to know when A16-A23 is not $00.

In a 512KB system, bank $00 detection would be implemented with logic that performs the equivalent operation of BNK0 = !(A16 | A17 | A18), in which | represents logical OR.  BNK0 will be true only when the effective address is $00xxxx.  A triple-input NOR gate would have to be hooked up to the three address lines coming out of the transparent latch used to capture A16-A18, with the gate’s output fed back into the glue logic to tell it when the address is $00xxxx.  Simple enough...until one considers the timing.
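
As an illustration only, the bank-detection equation can be modelled in a few lines of Python. This is a behavioural sketch, not the glue logic itself: the function names are made up for the example, and the ROM window is taken from the $00D000-$00FFFF range mentioned above. It compares a decoder that looks only at A0-A15 with one that is also qualified by BNK0.

```python
# Behavioural sketch of bank-0 detection in a 512 KB (19-bit) address space.
# Function names are illustrative; they do not come from the design files.

def bnk0(addr: int) -> bool:
    """BNK0 = !(A16 | A17 | A18): true only for addresses $00xxxx."""
    a16 = (addr >> 16) & 1
    a17 = (addr >> 17) & 1
    a18 = (addr >> 18) & 1
    return not (a16 | a17 | a18)

def rom_selected(addr: int, qualify_with_bank: bool) -> bool:
    """ROM decode on A0-A15 only ($D000-$FFFF), optionally gated by BNK0."""
    in_rom_window = (addr & 0xFFFF) >= 0xD000
    if qualify_with_bank:
        return in_rom_window and bnk0(addr)
    return in_rom_window

# Show how the unqualified decoder mirrors ROM into banks $01 and $02.
for addr in (0x00D000, 0x01D000, 0x02D000):
    print(f"${addr:06X}  unqualified={rom_selected(addr, False)}  "
          f"qualified={rom_selected(addr, True)}")
```

With the bank qualification switched off, the ROM window answers in every bank; with it on, only the bank $00 address selects ROM.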

The problem is the A16-A18 address component doesn’t immediately appear during Ø2 low, as the latch adds its propagation delay to the delay that occurs from the fall of the clock to when the 816 emits a valid address (the tADS spec).  Added to that will be the prop delay of the above-mentioned NOR gate before the glue logic actually knows if the address is, or is not, $00xxxx.

My timing analysis indicates that the “it won’t work” Ø2 threshold with this arrangement is around 18 MHz, using discrete 74AC or 74AHC logic.  Allowing for variations in prop delay that are inevitable from one part to another, I projected that 16 MHz would be the practical Ø2 “ceiling,” something that has been borne out by real-world testing.  So I needed to find another way.

A while back, I had mentioned the possibility of building POC V1.5 with discrete logic that could support 512 KB of RAM.  I decided it would take too many gates (equating to soldering a lot of SOIC parts) and might not perform any better than V1.3, which is stable at 16 MHz.  Given that, I have concocted a substantially different design that consolidates glue logic and bank bits latching in two GALs, appropriately named GAL1 and GAL2. :D

Here’s the memory map this new contraption will set up:

    Code:
    000000-00BFFF — base RAM (48 KB)
    00C000-00C37F — Input/output (1 KB)
    00C380-00C3FF — not decoded
    00C400-00CFFF — RAM (3 KB)
    00D000-00FFFF — ROM (12 KB)
    010000-07FFFF — extended RAM (448 KB)

Although the “island” of RAM created at $00C400-$00CFFF will be globally accessible (there is no memory protection), my intended use for it will be to give the firmware and/or operating system a private storage area for its direct page, MPU stack and other needed work space.  By doing so, the entire 48 KB of base RAM will be available for user programs, with the firmware or OS transparently remapping direct page and the stack during system API calls.
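
The memory map above can be expressed as a small lookup table, which is handy for sanity-checking the region sizes. This is only a sketch: the table layout, the region labels and the function names are assumptions made for the example, while the address ranges are copied from the map.

```python
# Region lookup for the POC V1.5 memory map listed above.
# Labels and helper names are illustrative, not part of the design files.

MEMORY_MAP = [
    (0x000000, 0x00BFFF, "base RAM"),
    (0x00C000, 0x00C37F, "I/O"),
    (0x00C380, 0x00C3FF, "not decoded"),
    (0x00C400, 0x00CFFF, "RAM island"),
    (0x00D000, 0x00FFFF, "ROM"),
    (0x010000, 0x07FFFF, "extended RAM"),
]

def region(addr: int) -> str:
    """Return the label of the region containing a 19-bit address."""
    for start, end, name in MEMORY_MAP:
        if start <= addr <= end:
            return name
    raise ValueError(f"address ${addr:06X} is outside the 512 KB space")

def size_kb(name: str) -> float:
    """Size of the named region in KB (1 KB = 1024 bytes)."""
    return sum(end - start + 1
               for start, end, n in MEMORY_MAP if n == name) / 1024
```

For example, `region(0x00C400)` returns the label of the RAM island in which the firmware or OS would keep its direct page and stack, and `size_kb("extended RAM")` confirms the 448 KB figure.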

The I/O block will be mapped as follows:

    Code:
    00C000 — serial I/O ports ‘A’ & ‘B’, timer ‘A’
    00C080 — serial I/O ports ‘C’ & ‘D’, timer ‘B’
    00C100 — serial I/O IRQ status*
    00C180 — real-time clock
    00C200 — expansion port chip select ‘A’*
    00C280 — expansion port chip select ‘B’
    00C300 — expansion port chip select ‘C’

    *Not wait-stated.

The above I/O map is like that of POCs V1.2, V1.3 and V1.4, except for the addresses.
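
A rough model of the I/O decode follows. It assumes each chip select covers the 128 bytes ($80) between adjacent entries in the I/O map, and that every entry without an asterisk is wait-stated; both are inferences from the listing rather than statements from the design files, and the names are hypothetical.

```python
# Sketch of the I/O block decode, assuming 128-byte chip-select slots.
IO_BASE = 0x00C000
IO_END = 0x00C37F
SLOT_SIZE = 0x80   # spacing between entries in the I/O map above

# (description, wait_stated) per slot, in address order
IO_SLOTS = [
    ("serial I/O ports A & B, timer A", True),
    ("serial I/O ports C & D, timer B", True),
    ("serial I/O IRQ status", False),
    ("real-time clock", True),
    ("expansion port chip select A", False),
    ("expansion port chip select B", True),
    ("expansion port chip select C", True),
]

def io_select(addr: int):
    """Return (description, wait_stated) for an I/O address, else None."""
    if not (IO_BASE <= addr <= IO_END):
        return None
    return IO_SLOTS[(addr - IO_BASE) // SLOT_SIZE]
```

Because the slots are $80 bytes apart, address bits below A7 never affect which chip select is chosen; for instance, `io_select(0x00C180)` and `io_select(0x00C1FF)` return the same entry.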

In order to make all this work, I’ve assigned the two GALs the following functions:

  • GAL1 contains the address decoding logic and generates corresponding chip selects.  It also tells the clock generator when to stretch Ø2 high for the purpose of wait-stating ROM and I/O accesses.

    GAL1’s inputs are A7-A15, the aforementioned BNK0 (a low-true signal emitted by GAL2), VDA and VPA.  This GAL uses only combinatorial logic, with no pins being used as feedback nodes.  Hence it will respond to all input combinations in no more than tPD nanoseconds, tPD being the advertised pin-to-pin prop time of the device.
     
  • GAL2 serves multiple purposes:

    • Latch A16-A18 during Ø2 low;
    • Produce fully-qualified /RD (read data) and /WD (write data) control signals;
    • Aggregate three separate IRQ inputs into a single IRQ output;
    • Generate the BNK0 signal needed by GAL1 to make its logic decisions.

    GAL2’s inputs are Ø1, D0-D2 (which are A16-A18 during Ø2 low), RWB, VDA, VPA, IRQA, IRQB and IRQC.  This GAL uses both combinatorial and registered logic, but should still perform at tPD due to no pin feedback nodes being used.

As GAL1 depends on an output from GAL2 (its !BNK0 signal), both devices need to be fast in order to support high Ø2 rates.  My choices for the GAL are Microchip’s ATF22V10C-7PX, basically the venerable 22V10 but with better tPD, or their ATF750C-7PX, which may be described as a 22V10 on steroids: the 750C has about 40 percent more gates than the 22V10.  The two types are electrically interchangeable, but require somewhat different programming methods when using registered logic due to, among other things, the 750C’s more-flexible clocking.

Both parts have a 7.5ns tPD rating, which, in my implementation, should be achievable due to no pins being used as feedback nodes.  Hence the worst-case elapsed time from when the 816 emits an address to when a chip select has been generated will be 15ns, a level of performance that I cannot consistently achieve with the fastest equivalent discrete logic.  15ns total prop delay should result in stable operation at 20 MHz.
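
The arithmetic behind the 15ns figure can be written out as a small sketch. It assumes a symmetric (50 percent duty cycle) Ø2 and that the chip selects should be settled before Ø2 rises; it deliberately leaves out the 816's own address output delay and any device setup times, which are not quoted here, so the "remaining" value is only the portion of the low half-cycle not consumed by the two GALs.

```python
# Timing arithmetic from the figures quoted above (7.5 ns tPD, 20 MHz target).
# Assumes a symmetric clock; MPU and memory timing are NOT modelled.

T_PD_NS = 7.5      # advertised pin-to-pin delay of each GAL
GAL_STAGES = 2     # GAL2 produces BNK0, GAL1 consumes it

def half_cycle_ns(phi2_mhz: float) -> float:
    """Duration of the Ø2-low half of a symmetric clock, in ns."""
    return 1000.0 / phi2_mhz / 2.0

def decode_delay_ns() -> float:
    """Worst-case delay through both GALs in series."""
    return T_PD_NS * GAL_STAGES

def remaining_ns(phi2_mhz: float) -> float:
    """Part of the Ø2-low time left over after the GAL decode delay."""
    return half_cycle_ns(phi2_mhz) - decode_delay_ns()
```

At 20 MHz, `half_cycle_ns(20)` is 25 ns and `remaining_ns(20)` is 10 ns, which is the window everything else in the address path has to fit into under these assumptions.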

V1.5 also incorporates my second-generation, stretchable Ø2 clock circuit that is capable of at least 40 MHz operation.  I’m shooting for a top speed of 20 MHz in V1.5, but may try to go faster if it works.

I should note that observations of a couple of ATF22V10Cs I have here indicate that these devices run about 25 percent faster than guaranteed. While that isn’t something that should be relied upon (minor changes in voltage and/or temperature could slow things down), it might mean that I could get POC V1.5 running above 20 MHz. The clock generator circuit is designed so an extended wait-state may be configured to handle above-20 MHz operation.

Attachment:
File comment: POC V1.5 Schematic
poc_v1.5.pdf [338.42 KiB]
Attachment:
File comment: Glue Logic Design Files
logic.zip [3.92 KiB]
Attachment:
File comment: Microchip 22V10 GAL
atf22v10c.pdf [1.87 MiB]
Attachment:
File comment: Microchip ATF750C “Super GAL”
atf750c.pdf [491.27 KiB]

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sun Sep 15, 2024 6:50 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
I’ve got a PCB layout done for POC V1.5. The board dimensions are 6-1/2" × 4", and it is in four layers.  I found that by positioning the SRAM and the data bus transceiver east-west, it was easier to make the connections without an excessive number of vias.  I also tinkered a bit with the schematic to add numerous test points that I can use with the logic probe, scope and/or logic analyzer to observe circuit operation.

Attachment:
File comment: POC V1.5 Schematic
poc_v1.5_sch.pdf [348.28 KiB]
Attachment:
File comment: POC V1.5 Printed Circuit Board
pocV1.5_pcb.gif [721.15 KiB]



Last edited by BigDumbDinosaur on Sun Sep 15, 2024 7:25 pm, edited 1 time in total.

PostPosted: Sun Sep 15, 2024 7:10 am 
Joined: Mon Jan 19, 2004 12:49 pm
Posts: 958
Location: Potsdam, DE
Nice board layout, BDD. I assume the inner layers are the power.

But so many pullups? Doesn't the GAL pull to the rail? Or is it an open collector output?

And I'm unsure about the db[0..7] outputs after the buffer from the processor... high bits of the address? I'm only vaguely familiar with the 816.

Neil


PostPosted: Sun Sep 15, 2024 7:24 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
barnacle wrote:
Nice board layout, BDD. I assume the inner layers are the power.

Thanks!

The inner layers are power and ground.

Quote:
But so many pullups? Doesn't the GAL pull to the rail? Or is it an open collector output?

A GAL has TTL-level outputs, not CMOS, with a guaranteed VOH of 2.4 volts, well below what constitutes a valid CMOS logic 1 in a 5 volt system.

In my post about the GAL test rig I built, I mentioned that I did some output voltage checks.  I discovered that while a GAL could drive its unloaded outputs a little past 4 volts, the voltage quickly deteriorated with loading.  Assuming that an unambiguous CMOS logic 1 occurs at 70 percent of VCC, that would be 3.5 volts in a 5 volt system.  A GAL’s outputs barely stay in that range under load, so there is a risk of noise-sensitivity.  Of course, “under load” when driving a CMOS input mostly means charging parasitic capacitance, since CMOS devices draw virtually no input current when the voltage is steady.  Hence the use of the pullups is mainly to assist the GAL in charging the parasitic capacitance.

Quote:
And I'm unsure about the db[0..7] outputs after the buffer from the processor... high bits of the address? I'm only vaguely familiar with the 816.

The 65C816 multiplexes the A16-A23 address bits on D0-D7 during Ø2 low.  A transparent latch, GAL2 in this design, is used to capture and latch those bits—actually, A16-A18—and drive them onto the corresponding inputs of the SRAM.  When Ø2 goes high, there is a small amount of overlap before the 816 stops emitting A16-A23 and starts treating the data bus as a data bus.  This overlap gives the latch enough time to close on the rise of the clock before the 816 “turns around” the bus.

However, the overlap period also creates a window of opportunity for bus contention.  During a read cycle, /RD will be asserted right after the rise of the clock and the selected device will start driving the data bus, possibly while the 816 is still emitting A16-A23.  The resulting contention may inject a lot of noise into the power and ground planes due to the momentarily-high current flow.  The transceiver, which will be in the high-Z state during Ø2 low, will remain in that state during the overlap period, thus closing the bus contention window.

The bus pullups prevent floating while the transceiver is in the high-Z state.  The transceiver also acts as a level converter, as both the SRAM and ROM have TTL-level outputs, whilst the 816’s inputs are CMOS.

PostPosted: Sun Sep 15, 2024 7:35 pm 
Joined: Mon Jan 19, 2004 12:49 pm
Posts: 958
Location: Potsdam, DE
Gotcha, thanks.


PostPosted: Fri Sep 20, 2024 4:56 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
Okay, it appears I’ve run into a design problem involving GAL2, which is, among other things, in charge of capturing and latching the A16-A18 component of the effective address.  Here’s the CUPL code for this GAL, with some lines excised for brevity:

Code:
Name        gal2;
PartNo      C406210002;
Date        2024/06/21;
Revision    1.0.0;
Designer    BDD;
Company     BCS Technology Limited;
Assembly    POC V1.5;
Location    U4;
Device      v750c;     /* an enhanced 22V10 */

/*       SIGNAL     TYPE    FUNCTION                  */
/*====================================================*/
pin  1 = PHI2;   /* input   PHI2 clock                */
pin  6 = D0;     /* input   MPU bank/data             */
pin  7 = D1;     /* input   MPU bank/data             */
pin  8 = D2;     /* input   MPU bank/data             */

pin 16 = A16;    /* output  address bit               */
pin 17 = A17;    /* output  address bit               */
pin 18 = A18;    /* output  address bit               */
/*====================================================*/


/*** EXTENDED RAM LOGIC ***/

[A16..18].ck = PHI2;           /* clocks the “latches”  */
[A16..18].d  = [D0..2];        /* A16-A18 should be following D0-D2 while PHI2 is low */

The above code will compile without error, but testing in the GAL tester indicates that the latching isn’t working as intended, in that A16-A18 latching is not transparent.

If I set the D0-D2 pattern while PHI2 is low, the A16-A18 outputs should follow the D0-D2 inputs.  That doesn’t happen until PHI2 goes high, which means latching will fail to meet timing deadlines.  The GAL needs to act like a 74x373 or 74x573 D-type transparent latch.  Instead, it is acting like a C-D flip-flop, e.g., one-half of a 74x74.
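
The difference between the two behaviours described above can be shown with a small Python model. It is a behavioural sketch only (class names and the stimulus are invented for the example) and says nothing about how either device is built internally: the latch follows D while PHI2 is low and holds while PHI2 is high, whereas the flip-flop only updates on a rising edge of PHI2.

```python
# Behavioural comparison: transparent latch vs. edge-triggered D flip-flop.
# Class names and stimulus values are illustrative.

class TransparentLatch:
    """Follows D while PHI2 is low; holds the last value while PHI2 is high."""
    def __init__(self):
        self.q = 0

    def update(self, phi2: int, d: int) -> int:
        if phi2 == 0:
            self.q = d
        return self.q

class DFlipFlop:
    """Copies D to Q only on a rising edge of PHI2."""
    def __init__(self):
        self.q = 0
        self._prev_phi2 = 0

    def update(self, phi2: int, d: int) -> int:
        if self._prev_phi2 == 0 and phi2 == 1:
            self.q = d
        self._prev_phi2 = phi2
        return self.q

def run(device, stimulus):
    """Apply (phi2, d) pairs in order and collect the output after each."""
    return [device.update(phi2, d) for phi2, d in stimulus]

# D0-D2 change twice while PHI2 is low, then again after PHI2 has risen.
STIMULUS = [(0, 0b101), (0, 0b011), (1, 0b011), (1, 0b110), (0, 0b110)]

print("latch:    ", run(TransparentLatch(), STIMULUS))
print("flip-flop:", run(DFlipFlop(), STIMULUS))
```

In the first two steps, with PHI2 low, the latch output already tracks the input pattern while the flip-flop output has not moved; that early visibility of A16-A18 is what the design depends on.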

I spent a fair amount of time studying the CUPL programmer’s guide and also searching on-line to find out how to do a transparent latch with a GAL.  So far, I’ve found nothing of value.  It seems I may have designed something that cannot be implemented in hardware.

Incidentally, I tested this using a 22V10 instead of the ATF750C.  The behavior is exactly the same.

Urk!

Attachment:
File comment: CUPL Programmer’s Reference
cupl_reference.pdf [814.53 KiB]

PostPosted: Fri Sep 20, 2024 2:21 pm 
Joined: Sun May 13, 2018 5:49 pm
Posts: 255
BigDumbDinosaur wrote:
I spent a fair amount of time studying the CUPL programmer’s guide and also searching on-line to find out how to do a transparent latch with a GAL.  So far, I’ve found nothing of value.  It seems I may have designed something that cannot be implemented in hardware.
Do the extensions .L (transparent latch input) and .LE (Latch Enable) not work on that PLD? (Page 24 of the manual you linked to)

Quick testing shows the answer is no. CUPL doesn't recognize .L on that part. Changing to an ATF1500 lets you compile using .L and .LE, which I think might do what you want.


PostPosted: Fri Sep 20, 2024 3:01 pm 
Joined: Tue Sep 03, 2002 12:58 pm
Posts: 336
The answer is in the datasheets. ATF1500 outputs can be configured as latches. ATF750 and 22V10 outputs cannot. The ATF1500 datasheet makes a big enough deal about the latch feature that I can't imagine it wouldn't be mentioned in the others if they had it.

It's possible to roll your own latch with a multiplexer, feeding the output into one of the inputs. But that might not translate into a GAL - the multiplexing logic inevitably gets mixed with the input signal logic, and even if it's possible to avoid glitches (adding redundant terms to cover all paths from one state to another) I wouldn't trust a compiler to get it right.
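
To make the glitch concern concrete, here is a crude unit-delay model of a multiplexer-style latch, Q = (EN & D) | (!EN & Q), with and without the redundant (consensus) term D & Q. The one-step skew between EN and its inverse is an assumption chosen to provoke the hazard, not a measurement of any real part, and the function names are hypothetical.

```python
# Unit-delay sketch of a mux-based latch and its static-1 hazard.
# en and en_n are passed separately so that en_n can lag behind en.

def latch_step(en: int, en_n: int, d: int, q: int, with_consensus: bool) -> int:
    """One evaluation of Q = (EN & D) | (!EN & Q) [| (D & Q)]."""
    out = (en & d) | (en_n & q)
    if with_consensus:
        out |= d & q   # redundant term that bridges the EN transition
    return out

def close_latch(with_consensus: bool):
    """Hold D = 1 and drop EN from 1 to 0 with the inverted EN one step late."""
    d, q = 1, 1
    trace = []
    # (en, en_n): latch open, skew window where both are 0, then closed
    for en, en_n in [(1, 0), (0, 0), (0, 1), (0, 1)]:
        q = latch_step(en, en_n, d, q, with_consensus)
        trace.append(q)
    return trace

print("without consensus term:", close_latch(False))
print("with consensus term:   ", close_latch(True))
```

Without the extra term, the stored 1 is lost during the skew window and never recovers; with it, the output stays at 1 throughout. A logic minimizer that removes "redundant" terms would undo exactly this protection, which is the reason for not trusting the compiler here.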


PostPosted: Fri Sep 20, 2024 7:14 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
BigDumbDinosaur wrote:
Okay, it appears I’ve run into a design problem involving GAL2, which is, among other things, in charge of capturing and latching the A16-A18 component of the effective address...

A query to Microchip settled this.

Cutting to the chase, what I want to do cannot be done with an SPLD without eating up a lot of logic resources and most critically, using pin nodes for logic feedback.  Doing the latter will effectively double the prop time when latching the bank bits, which arrangement will not be able to meet the timing deadlines I have set.

Bottom line is I’ve deemed my dual GAL design unworkable, unless I want to run a relatively-slow Ø2 clock (which I don’t).

PostPosted: Wed Oct 23, 2024 5:09 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
BigDumbDinosaur wrote:
Bottom line is I’ve deemed my dual GAL design unworkable...

I’m respinning V1.5 to use an ATF1504AS CPLD in PLCC44.  While my design won’t strain the 1504 for logic resources (I’m using only 32 percent of the available macrocells), it does use all 32 uncommitted I/O pins.  One of these days, I need to work out how to use the QFP100 package, which offers 64 uncommitted pins in the 1504 (and 80 such pins in the 1508).

PostPosted: Sat Oct 26, 2024 6:33 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
I’m busy hacking away at the design for my respun POC V1.5 and have had to solve a resource problem.  I’m using an ATF1504AS CPLD for glue logic and figured since I have a bunch of macrocells, flip-flops and such at my disposal, why not get fancy with the design?

The only flaw with that thinking is I quickly used up all the available I/O pins.  That prompted me to move the generation of the /RD (read data) and /WD (write data) signals out of the CPLD and back to discrete hardware so I could use those CPLD pins for other functions.  That’s not as bad as it might sound, as a couple of NAND gates in a single SOIC-14 chip that takes up relatively little PCB space is all that is required to produce those signals.

I should mention that V1.5 will have ROM in the $00D000-$00FFFF range, following a memory-mapping pattern I established starting with POC V1.2.  Since I’ve got a lot more flexibility with doing the glue logic in a CPLD, I added the capability of causing a write to ROM address space to “bleed through” to RAM at the same address, that RAM below ROM being referred to as HIRAM.  With this feature, it will be possible to copy (“shadow”) ROM to HIRAM, switch out the ROM and run the firmware from HIRAM, avoiding the frequent wait-states that occur with running from ROM.

Something to consider with such an arrangement is the danger of a misbehaving program accidentally scribbling on HIRAM and corrupting the firmware.  I needed to devise a way to write-protect HIRAM when ROM is mapped out.  This proved to be not as simple as it could be because the /RD and /WD generation was no longer in the CPLD, where it would have been a trivial exercise to arrange write protection.  However, by moving /RD and /WD out of the CPLD, I did free two I/O pins, which I could use to control HIRAM access.

As my thinking evolved, I decided that when ROM is mapped in, HIRAM will be write-enabled, which arrangement will support the copying of ROM into HIRAM (an excellent application for the 65C816’s MVN instruction).  When ROM is mapped out, HIRAM will be write-protected.  I would do this by using one of the now-free I/O pins to act as an input to control the HIRAM function, and use the other now-free I/O pin to tell the read/write logic to prevent a write to HIRAM when ROM is mapped out.
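
The intended HIRAM behaviour can be summarised with a small software model. This is a sketch of the rules stated above, not of the CPLD logic: the class, its `rom_mapped_in` flag and the byte-by-byte copy loop (standing in for an MVN block move) are all invented for the illustration, and how the real hardware switches ROM out is not modelled.

```python
# Behavioural model of ROM shadowing into HIRAM at $00D000-$00FFFF.
# Names are hypothetical; only the read/write rules come from the post.

ROM_START, ROM_END = 0x00D000, 0x00FFFF

class ShadowedMemory:
    def __init__(self, rom_image: bytes):
        if len(rom_image) != ROM_END - ROM_START + 1:
            raise ValueError("ROM image must be 12 KB")
        self.rom = bytes(rom_image)
        self.hiram = bytearray(len(rom_image))
        self.rom_mapped_in = True

    def read(self, addr: int) -> int:
        """Reads come from ROM while it is mapped in, otherwise from HIRAM."""
        offset = addr - ROM_START
        return self.rom[offset] if self.rom_mapped_in else self.hiram[offset]

    def write(self, addr: int, value: int) -> None:
        """Writes bleed through to HIRAM only while ROM is mapped in."""
        if self.rom_mapped_in:
            self.hiram[addr - ROM_START] = value & 0xFF
        # ROM mapped out: HIRAM is write-protected, so the write is dropped.

    def shadow(self) -> None:
        """Copy ROM onto the HIRAM beneath it, then map ROM out."""
        for addr in range(ROM_START, ROM_END + 1):
            self.write(addr, self.read(addr))
        self.rom_mapped_in = False
```

After `shadow()` has run, reads return the copied firmware from HIRAM and any later write to the range is silently ignored, which is the protection against a misbehaving program.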

All well and good, but the read/write logic that I have used in earlier POC permutations blindly assumes that when Ø2 is high, it’s okay to read or write.  That would have to change.  Here is the discrete read/write circuit I have used in older POC units:

Attachment:
File comment: Read/Write Generation
read_write_qualify_alt.gif [46.98 KiB]

My plan was to use the above basic circuit, but with it rigged up so I could prevent /WD from being driven low when the effective address is $00D000-$00FFFF and ROM is mapped out.  I pondered the situation for a while before a simple solution presented itself.  Here it is:

Attachment:
File comment: Read/Controlled Write Generation
controlled_read_write.gif [12.66 KiB]

Operation is similar to that of the first circuit, except if the /WDI (write data inhibit) signal is driven low, the /WD output can never go low despite the MPU being in a write cycle.  The CPLD logic will control /WDI and assert it if the effective address is $00D000-$00FFFF and ROM has been mapped out.  That will occur during Ø2 low around 7 nanoseconds after the MPU has emitted a valid address, which is well before Ø2 goes high and the potential write cycle starts.
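
The signal behaviour just described can be captured as truth-table functions. The attached schematics are not reproduced here, so this sketch assumes the simplest reading of the text: /RD and /WD are low-true, qualified by Ø2 and RWB only, and /WD is additionally blocked when /WDI is low. The function names are hypothetical, and any other qualification the real circuit may apply is left out.

```python
# Truth-table sketch of controlled /RD and /WD generation (1 = high, 0 = low).
# Assumed behaviour only; the actual gate arrangement is in the attachment.

def rd_n(phi2: int, rwb: int) -> int:
    """/RD is low only while Ø2 is high during a read cycle (RWB high)."""
    return 0 if (phi2 and rwb) else 1

def wd_n(phi2: int, rwb: int, inhibit_n: int = 1) -> int:
    """/WD is low only while Ø2 is high, RWB is low and /WDI is high."""
    return 0 if (phi2 and not rwb and inhibit_n) else 1

def write_inhibit_n(addr: int, rom_mapped_out: bool) -> int:
    """/WDI as the CPLD would drive it: low for $00D000-$00FFFF with ROM out."""
    in_hiram = 0x00D000 <= addr <= 0x00FFFF
    return 0 if (in_hiram and rom_mapped_out) else 1
```

For a write to $00E000 with ROM mapped out, `wd_n(1, 0, write_inhibit_n(0x00E000, True))` stays high, so no write strobe reaches HIRAM; the same call with ROM mapped in yields a low /WD.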

There are probably other ways of accomplishing this, but it is, as I said, a simple solution and introduces little propagation delay to /RD and /WD generation—there is only one gate delay when /RD or /WD is to be asserted.  I was hoping to keep the discrete logic to an absolute minimum, but unless I go to a larger CPLD package (e.g., PLCC84 or QFP100), I don’t have the required number of I/O pins to put all the logic into the CPLD.
