 Post subject: POC VERSION TWO
PostPosted: Mon Nov 15, 2010 9:05 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
Now that I've got POC V1.0 working pretty well (at speeds up to 10 MHz), I'm ready to march forward with POC V2.0.

——————————————————————————————————————————————————————————————————————————————
NOTICE: I have terminated this project. A new topic has been started on the "new direction" POC V2.
——————————————————————————————————————————————————————————————————————————————

This new version uses a 22V10C GAL to replace the 74xx glue logic in the first unit, and also implements wait-stating when I/O hardware or ROM is accessed. The wait-state logic is handled by a second GAL (16V8C) and a 74ABT74 dual D-type flip-flop. Since the gate delays imposed by the old logic have been eliminated and wait-stating is now available, higher speeds than before should be possible. I'm going to try for a 20 MHz Ø2 clock this time around.

Also in this new design is a SCSI-SE port, driven by a 53C94 intelligent SCSI controller. The 'C94 is able to execute SCSI bus protocol sequences in hardware, thus offloading a considerable amount of work from the MPU. Without having to manipulate the SCSI bus signals in software, relatively simple (?) interrupt-driven code can be used to access SCSI devices. The 'C94 can be rigged up to run in PIO or DMA mode—I'm implementing PIO in this design due to not having a suitable DMA controller.

As with the POC V1 unit, POC V2.0 includes a 2692A DUART (dual ACIA) for EIA-232 I/O and a Maxim (Dallas) DS1511 real-time clock and watchdog timer.

POC V1 had contiguous RAM from $0000 to $CFFF—implemented in 12 ns SRAM—and I/O at $D000, leaving 8K for the ROM. This version will reduce RAM to 48K, topping out at $BFFF, a change made necessary to get more room in the ROM to add the SCSI API. Hence ROM will be split into two sections, one at $C000-$CFFF and the other starting at $E000. The machine language monitor will be moved to the $C000 block to open up space in the $E000 block.
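For illustration, here is a quick Python sketch of that decode. The boundaries follow the map just described, the I/O block is assumed to span $D000-$DFFF, and the function is only a model, not the actual GAL select equations.

Code:
# Illustrative sketch of the POC V2.0 memory map described above.
# Boundaries follow the text; the I/O block is assumed to run $D000-$DFFF.
# This is a model only, not the GAL logic.

def decode(addr):
    """Return what the 65C816 sees at a 16-bit address."""
    if addr <= 0xBFFF:
        return "RAM"    # 48K of SRAM, $0000-$BFFF
    if addr <= 0xCFFF:
        return "ROM"    # 4K ROM block, $C000-$CFFF (monitor)
    if addr <= 0xDFFF:
        return "I/O"    # I/O block at $D000
    return "ROM"        # 8K ROM block, $E000-$FFFF

if __name__ == "__main__":
    for a in (0x0000, 0xBFFF, 0xC000, 0xD000, 0xE000, 0xFFFF):
        print(f"${a:04X} -> {decode(a)}")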

Here are the schematics:

Memory Map & I/O Assignments

Microprocessor Interface

RAM & ROM

Real Time Clock & EIA-232 Interface

SCSI-SE Interface

External Interface

In redoing the printed circuit board layout, I decided to stick with the six inch width of the first version and increase the height to four inches to accommodate the SCSI hardware. I was going to try to squeeze the entire layout into 21 square inches (3.50 inch board height) to take advantage of ExpressPCB's ProtoPro service. However, ProtoPro has a limit of 650 holes per board and even though I eliminated some holes by not having all those PDIP packages that used to constitute the glue logic, I couldn't stay within that limit. Between the 53C94, the 50 pin SCSI receptacle and the SCSI bus termination resistors, 174 holes were added, putting the layout some 50 holes over the limit—and that didn't even account for the extra vias needed to connect everything. I considered directly soldering the 53C94 to the board on SMT pads, but thought better of it—how would I remove it if the design didn't work right? So this layout will use EPCB's production service.

Printed Circuit Board Layout

With this unit, I will now have the capability to implement some kind of operating system over and above the ROM code, due to mass storage being available. Once I've verified that the hardware is working, I'll have to get busy writing a SCSI API that can do simple tasks, such as device inquiry, read long, write long and request sense. Device inquiry will initially be the most useful, as it can prove out several aspects of the API, not the least of which would be the ability to read and write on the bus. Doing all this should keep me out of mischief for a while. :)
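To give a sense of what even the simplest of those commands involves, here is a rough Python sketch of the six-byte INQUIRY CDB the API will have to hand to the 53C94. The $12 opcode and field layout come from the SCSI standard; the helper function and its defaults are strictly illustrative and are not the actual firmware, which will be written in '816 assembly.

Code:
# Rough sketch of a six-byte SCSI INQUIRY CDB (opcode $12).  Field layout
# per the SCSI standard; this helper is an illustration only, not the
# POC firmware.

def build_inquiry_cdb(lun=0, alloc_len=36):
    """Return the six CDB bytes to feed the 53C94 for an INQUIRY."""
    return bytes([
        0x12,                # operation code: INQUIRY
        (lun & 0x07) << 5,   # LUN in bits 7-5, EVPD = 0
        0x00,                # page code (0 = standard inquiry data)
        0x00,                # reserved
        alloc_len & 0xFF,    # allocation length (36 = standard data)
        0x00,                # control byte
    ])

if __name__ == "__main__":
    print(" ".join(f"${b:02X}" for b in build_inquiry_cdb()))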

————————————————————

Edit #1: my typing basically sucks tonight...

Edit #2: POC V1 manages to run on a 15 MHz Ø2 clock. Dunno why, as none of the I/O hardware is rated for a 67 ns cycle time.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Fri Jul 23, 2021 8:18 pm, edited 12 times in total.

 Post subject:
PostPosted: Mon Nov 15, 2010 9:45 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hi BDD
interesting developments!

Just one question: once you have mass storage, isn't it tempting to have minimal ROM - maybe paged in - which only has the intelligence to load a boot block from storage? This way, you can have the maximum amount of RAM, with just a bit missing for I/O and of course some dedicated to the vectors.

I can see that bringing up such a system would be a bit more difficult, but if you do have a paged ROM approach, you can page quite a big ROM in during development. Once the bootloading process is stable, you can place your software on disk and never need to touch the ROM for the rest of your development.

So, you trade (initial) development difficulty for maximum RAM and a clean memory map.

Cheers
Ed


 Post subject:
PostPosted: Mon Nov 15, 2010 10:07 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
BigEd wrote:
Just one question: once you have mass storage, isn't it tempting to have minimal ROM - maybe paged in - which only has the intelligence to load a boot block from storage?

At initial power on, no. There needs to be code to drive the console POST display, test RAM and detect hardware presence, as well as run the SCSI API. Any such API has to account for all SCSI bus phases, so more than a little space will be required. Plus, for efficiency, the part of the API that handles phase changes will be an IRQ-driven function (the 53C94 can generate an IRQ when the bus phase changes). That all has to be in ROM, at least at reset. However, there's no reason why a write to a ROM address can't bleed through to the RAM underneath, allowing the ROM code to do an ISL (initial software load) under the ROM, followed by a jump to that code.

Quote:
I can see that bringing up such a system would be a bit more difficult, but if you do have a paged ROM approach, you can page quite a big ROM in during development.


I had considered doing so but felt that once I had a working SCSI API I wouldn't need additional ROM space—I could page what I needed right off the disk. Also, once I am able to load a kernel, the kernel would include the SCSI API and device drivers, along with interrupt handlers and corresponding vectors, and ROM could be completely mapped out, leaving everything running in RAM. BTW, if I were to continue to run things out of ROM after ISL, I'd be constrained by wait-states.

Interesting aside: even with the '816 running at full throttle, data can flow on the SCSI bus at a higher rate than the '816 can move it from one place to another. So access speed from core would, at best, be no better than from the disk, unless the latter had to perform a seek.
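Some rough numbers to back that up: the seven clocks per byte is the '816's MVN/MVP block-move cost, while the SCSI figure is only an assumed asynchronous rate for comparison, not a measurement.

Code:
# Back-of-the-envelope comparison.  MVN/MVP cost seven clock cycles per
# byte moved; the SCSI rate below is an assumed async single-ended figure.

PHI2_HZ = 20_000_000              # proposed 20 MHz Ø2
MVN_CYCLES_PER_BYTE = 7           # '816 block-move cost per byte

mvn_rate = PHI2_HZ / MVN_CYCLES_PER_BYTE   # ~2.86 MB/s
assumed_scsi_rate = 3_000_000              # ~3 MB/s async (assumption)

print(f"MVN block move at 20 MHz: {mvn_rate / 1e6:.2f} MB/s")
print(f"Assumed async SCSI rate : {assumed_scsi_rate / 1e6:.2f} MB/s")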

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject:
PostPosted: Tue Nov 16, 2010 2:29 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BDD,

Awesome is all I can say! I like the mass storage especially.

I just wanted to chip in and say I've not looked at the '816 datasheet, but if the Fmax vs VDD specs are the same as the WDC65C02, you may be able to run well above 20MHz if your system is running @5v. I was able to run my 3.3V WDC65C02 PWA system, using 10ns RAM, at 20MHz (spec says ~14MHz)... I may have been able to run faster than that, but having a simple FF divider, the only faster choice I had was 40MHz. That did not work...

Good luck m8!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: POC Version 2
PostPosted: Tue Nov 16, 2010 3:31 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
ElEctric_EyE wrote:
I just wanted to chip in and say I've not looked at the '816 datasheet, but if the Fmax vs VDD specs are the same as the WDC65C02, you may be able to run well above 20MHz if your system is running @5v. I was able to run my 3.3V WDC65C02 PWA system, using 10ns RAM, at 20MHz (spec says ~14MHz)... I may have been able to run faster than that, but having a simple FF divider, the only faster choice I had was 40MHz. That did not work...

The Fmax vs. VDD curve appears to be the same for the '816, but its more complex architecture may make it stumble if the clock is elevated.

The constraint will be ROM and I/O access speed, which could be handled with longer wait-states. The above design generates one wait-state for ROM or I/O access, which would be 100 ns from the time chip select occurs until the MPU expects to be able to access the device, assuming a 20 MHz Ø2. That might be pushing it a bit—ROM and all I/O devices are rated at 70 ns—but I have run the POC V1 unit at 10 MHz with no wait-states. At 25 MHz, I'd definitely need two wait states to avoid ROM and I/O timing violations.
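Put as arithmetic (ignoring GAL decode and flip-flop delays, which will eat into these figures), it works out like this:

Code:
# Access-window arithmetic for the wait-state discussion above.  Decode
# and flip-flop delays, which shave a bit off each figure, are ignored.

def access_window_ns(phi2_mhz, wait_states):
    """Clock time from chip select to when the MPU wants its data, in ns."""
    cycle_ns = 1000.0 / phi2_mhz
    return (1 + wait_states) * cycle_ns

for mhz, ws in ((20, 0), (20, 1), (25, 1), (25, 2)):
    window = access_window_ns(mhz, ws)
    print(f"{mhz} MHz, {ws} wait-state(s): {window:.0f} ns for the 70 ns ROM and I/O parts")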

As I'm not super-constrained right now with board real estate, I could leave room for one more flip-flop and rig up jumpers to allow me to change from one to two wait-states if the unit proves to be unstable with the clock cranked to the max.

I really don't think speeds beyond 25 MHz will be practical. The GALs can certainly handle it (I'm going to use 10 ns parts) but the board layout may become critical at that speed and I may not be able to get it to work well without using more SMT parts. Unfortunately, I'm getting too old to handle working with that tiny stuff. If I do go in that direction I suspect some sort of makeshift reflow oven is in my future. :)

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: POC Version 2
PostPosted: Tue Nov 16, 2010 5:21 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
After contemplating my work for a while, I decided to modify the circuit to make it possible to jumper-select the number of wait-states that occur on ROM or I/O access. Implementing this feature requires that I add another 74ABT74 dual D-type flip-flop, along with a three-position jumper block. Having one left-over flop in one of the packages, I also decided to go back to the clock generator design I used in POC V1. The rationale (aside from having the unused flop) is that the rise and fall times on most readily available TTL can oscillators are typically 5-7 ns, whereas the 'ABT74 can produce rise and fall times down near 2 ns, which helps to keep the timing straight as the Ø2 rate is cranked up.

Here are the revised schematics and the corresponding PCB layout:

Memory Map & I/O Assignments

Microprocessor Interface

RAM & ROM

Real Time Clock & EIA-232 Interface

SCSI-SE Interface

External Interface

Printed Circuit Board Layout

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Tue Nov 23, 2010 5:58 pm, edited 1 time in total.

 Post subject:
PostPosted: Tue Nov 16, 2010 9:19 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Interesting info on the reflow oven. I see those guys at the link you posted were using it for QFP-style packages. That's not the only choice for QFP... see here (I've posted it for others): viewtopic.php?t=1492

If you get into that BDD, I'd like to see your results!

BGA packages seem to be the future for the higher-tech devices like 32-bit-wide RAMs and such. For devices like that, an oven like this would be mandatory. So, I am interested as well. :twisted:

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: POC Version 2
PostPosted: Wed Nov 17, 2010 11:11 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
ElEctric_EyE wrote:
If you get into that BDD, I'd like to see your results...So, I am interested as well. :twisted:

I have several unpopulated boards from the POC V1 project, which I will use as reflow guinea pigs. Let's hope that when I put them into the toaster-oven they don't turn into...er...toast. :)

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: Re: POC Version 2
PostPosted: Thu Nov 18, 2010 12:05 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigDumbDinosaur wrote:
... The GALs can certainly handle it (I'm going to use 10 ns parts) but the board layout may become critical at that speed...


I believe it was kc5tja who pointed out in another thread that Crays were wire-wrapped and achieved speeds up to 80MHz. Your 4 layer board, I would think, is much more noise-immune than the wirewrap I currently use on my PWA project at 20MHz (I am using the ALU at this point too, and it's still working fine). You are using the 65816, though. It will be interesting to see!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject:
PostPosted: Thu Nov 18, 2010 4:09 am 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Yes, the Cray-1 supercomputer was wirewrapped, and built in a circular form-factor to minimize wiring delays between circuits that would otherwise be quite far apart on a flat PCB. It used differential ECL technology for high noise immunity as well.

If you ever find yourself in the Bay Area, you can visit the Computer History Museum, where a Cray-1 sits just outside the storage warehouse and you can see how it was constructed.


 Post subject: POC Version 2
PostPosted: Thu Nov 18, 2010 5:20 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
I recall back in the day when the Cray 1 was announced and how we were all astonished that the unit could run at 80 MHz. Nowadays, the average video card's processor runs much faster. You just have to wonder how much tinkering went into the Cray 1 before they got it to be stable at the maximum clock rate.

While I may get a burr under my saddle to try to run POC V2's clock beyond 20 MHz, I suspect it won't fly. Even if the board doesn't become critical, it is likely the '816 simply can't run that fast due to internal constraints. Particularly worrisome is the very short amount of setup time the '816 has when Ø2 is low. At 25 MHz, that's only 20 ns. I don't think the MPU can do it, but I'm certainly willing to try. The worst that can happen is I waste a few bucks on a 50 MHz oscillator. I'll start the thing off at 10 MHz (20 MHz oscillator), which I know works on POC V1, and scale up from there.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: POC Version 2
PostPosted: Thu Dec 16, 2010 3:41 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
Hmm...funny how things can just start growing on their own. A few posts ago, I described how I was going to use a couple of GALs for the glue logic in POC V2. I incorporated Daryl's improved wait-state circuit and then started thinking about the design some more. That's when the growing started.

Right now I'm using a single 128k x 8 SRAM, with only 48K actually mapped into the system. SRAM up to 512k x 8 is available in an SOJ36 package, which is only slightly larger than the SOJ32 package of the 128k x 8 part. Of course, merely substituting the larger SRAM for the smaller one wouldn't accomplish anything. So, my muddled mind was thinking, why not scale up the design to use a larger PLD and introduce the logic required to implement more RAM?

There are two ways I could go. The first would be to latch the bank address emitted by the 65C816 on D0-D7 during Ø2 low and use it to control the A16-A18 address lines on the larger SRAM. The second method would be to create banks of RAM, with a common area somewhere to ensconce the bank-switching code that would be required. I decided on the latter method, one of the reasons being that the banking scheme wired into the 65C816 has some annoyances, the biggest being that the stack and zero page are always in bank 0. I really wanted to maintain multiple ZPs and stacks.

With this feature settled, here's the new and "improved" memory map for this mess:

Code:
                           MMU
                           BIT
      Address            Pattern  RWB  Hardware     Symbol
      ----------------------------------------------------
      $bb0000-$bbBFFF   xx000bbb   x   banked RAM   b_mem
      $00C000-$00CFFF   x0000xxx   x   common RAM   c_mem
                        x1000xxx   H   ROM (4K)     c_mem
                        x1000xxx   L   common RAM   c_mem
      $00D000-$00DEFF   xx000xxx   x   I/O          IOBLK
      $00DF00           xx000xxx   x   MMU          mmu
      $00E000-$00FFFF   0x000xxx   H   ROM (8K)     e_mem
                        0x000xxx   L   common RAM   e_mem
                        1x000xxx   x   common RAM   e_mem
      ----------------------------------------------------

   A write to c_mem or e_mem always bleeds through to RAM.  x = don't care.

The MMU, which is a virtual device, allows up to eight banks of RAM to be defined, addressed as $000000-$00BFFF, $010000-$01BFFF, $020000-$02BFFF, and so forth, up to $070000-$07BFFF. This amounts to a total of 384 KB of RAM. As the 65C816 doesn't actually know anything about what is going on in the PLD logic, it merely sees the banked RAM as eight isolated 48K spaces, each with an independent stack and zero page.

The active bank is selected by writing the appropriate bit pattern (bits 0-2) into the MMU, which "device" exists in the PLD and is addressed at $00DF00. Bits 6 and 7 can be used to map RAM or ROM in either the $00C000 block or the $00E000 block. PLD code causes ROM to be mapped in at reset in the $00E000 block. RAM in either block is "common" RAM and is physically located in the first 64K of the SRAM. The banks come into being by manipulating SRAM address lines A16-A18, whose bit patterns correspond one-for-one with the pattern written into the MMU.
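For anyone who would rather read code than tables, here is a little Python model of what the CPLD is supposed to do. The bit assignments follow the table above; the model works on the low 16 address bits only (the '816's own bank byte is ignored in this scheme), lumps $DF01-$DFFF in with the I/O block for simplicity, and is purely illustrative, not the CUPL source.

Code:
# Python model of the MMU "device" at $00DF00 as described above.  Bit
# assignments follow the memory-map table; names and structure are
# illustrative only.  Only the low 16 address bits matter here.

class MMUModel:
    def __init__(self):
        # Bit 7 = 0 at reset so the 8K ROM answers at $E000-$FFFF, as the
        # PLD is described as doing.  Bit 6's reset state is assumed 0.
        self.reg = 0x00

    def write(self, value):        # a write to $00DF00
        self.reg = value & 0xFF

    @property
    def bank(self):                # bits 0-2 drive SRAM A16-A18
        return self.reg & 0x07

    def read_source(self, addr):
        """Which device answers a read of the given 16-bit address."""
        if addr <= 0xBFFF:
            return f"banked RAM, SRAM A18-A16 = {self.bank:03b}"
        if addr <= 0xCFFF:
            return "ROM (4K)" if self.reg & 0x40 else "common RAM"
        if addr == 0xDF00:
            return "MMU"
        if addr <= 0xDFFF:
            return "I/O"
        return "common RAM" if self.reg & 0x80 else "ROM (8K)"

    def write_target(self, addr):
        """Writes to the ROM-mapped blocks always bleed through to RAM."""
        if addr <= 0xBFFF:
            return f"banked RAM, SRAM A18-A16 = {self.bank:03b}"
        if addr <= 0xCFFF:
            return "common RAM"    # bleed-through under the 4K ROM
        if addr == 0xDF00:
            return "MMU"
        if addr <= 0xDFFF:
            return "I/O"
        return "common RAM"        # bleed-through under the 8K ROM

if __name__ == "__main__":
    mmu = MMUModel()
    print(mmu.read_source(0xE123))   # ROM (8K) right after reset
    mmu.write(0b10000011)            # select bank 3, map RAM at $E000
    print(mmu.read_source(0x4000))   # banked RAM, A18-A16 = 011
    print(mmu.read_source(0xE123))   # now common RAM
    print(mmu.write_target(0xC000))  # writes land in common RAM regardless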

Doing all this is well beyond what can be implemented in a single GAL. There aren't enough inputs, not enough outputs, not enough P-terms, etc. I could use several GALs but would then have to provide more board real estate. Also, as some logic functions are dependent on others, I'd have the case where some outputs of one GAL would be driving inputs on another, causing propagation delay stackup.

So I decided to abandon GALs altogether and use a single CPLD, specifically an Atmel ATF2500C in a PLCC44 package. The 2500C has enough of everything I need to implement the logic, including the active high reset and active low write signals some of the hardware will need. Also included will be wait-stating for ROM and I/O accesses, with a jumper-selectable option of one or two wait-states.

After stumbling over a glitch in Atmel's WinCUPL software, I finally have some working code for the 2500C, and so far, it appears my logic is...well...logical. I've updated the schematics and the PCB layout, which are below.

Microprocessor Interface

RAM and ROM

Real-Time Clock & EIA-232 Interface

SCSI2-SE Interface

External Interface

PCB Layout

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject:
PostPosted: Thu Dec 16, 2010 6:59 am 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Quote:
one of the reasons being the banking scheme wired into the 65C816 has some annoyances, the biggest being that the stack and zero page are always in bank 0. I really wanted to maintain multiple ZPs and stacks.

It is intended to do exactly that -- many direct pages and stacks in bank 0, with the applications being in other banks.


 Post subject:
PostPosted: Thu Dec 16, 2010 8:57 am 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
It may be intended, but it's still an annoyance. :-) It, thankfully, can be worked around with an MMU.


 Post subject:
PostPosted: Thu Dec 16, 2010 6:07 pm 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
GARTHWILSON wrote:
It is intended to to exactly that-- many direct pages and stacks in bank 0, with the applications being in other banks.

My beef with that arrangement is that, when running multiple processes, a context switch requires pushing the entire system state onto a single stack, whose size is constrained by the requirement that it be entirely in bank 0—along with zero page and whatever else is running there. Each time an interrupt (any kind) occurs, we get forced into bank 0. So the interrupt handlers have to be in bank 0 as well. Plus the MPU vectors have to be in bank 0, which means ROM has to be present in bank 0 in order to get the system going at boot-time. It's starting to get awfully crowded in there...

In the banking scheme I am contemplating, the interrupt handler that would cause a context switch would start by pushing the MPU state onto the current stack. Then the MMU would be written to make the context switch to a different bank. The MPU state would then be pulled from the stack of that bank, and when RTI is executed, a clean context switch has occurred without diddling a reserved area of RAM. Fewer clock cycles to complete, fewer things to mess up.
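To make that sequence concrete, here is a toy Python rendition of it. The saved state and the names are illustrative only; the real thing will be a short interrupt handler in '816 assembly.

Code:
# Toy model of the context switch described above: every bank keeps its
# own stack, so a switch is push state, poke the MMU, pull state, RTI.
# The "state" tuple and the names here are illustrative only.

bank_stacks = {bank: [] for bank in range(8)}    # one stack per 48K bank
current_bank = 0

def irq_context_switch(next_bank, cpu_state):
    """Save state on this bank's stack, switch banks, return the next task's state."""
    global current_bank
    bank_stacks[current_bank].append(cpu_state)  # push C, X, Y, DP, P, PC...
    current_bank = next_bank                     # write bits 0-2 of the MMU
    if bank_stacks[current_bank]:                # pull the suspended state,
        return bank_stacks[current_bank].pop()   # then RTI resumes that task
    return None                                  # fresh bank: nothing to resume

# Example: the task in bank 0 is interrupted and the task in bank 3 resumes.
bank_stacks[3].append(("C=$0000", "X=$00FF", "Y=$0000", "PC=$8000"))
resumed = irq_context_switch(3, ("C=$1234", "X=$0000", "Y=$0000", "PC=$C0DE"))
print(f"now in bank {current_bank}, resuming: {resumed}")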

The reality is that few 65xx programs are all that large or require a really large data set—Lee Davison's EhBASIC is a rare exception (which I may try to implement some time in the future). The '816 cannot cross banks during program execution, so the absolute maximum program size that is possible is 64K. Hence the 48K bank size I'm looking at doesn't amount to a significant constraint.

In exchange for a slightly smaller execution space, I gain the ability to "sandbox" each running process in hardware and, once I figure out exactly how to implement it (no doubt via a larger CPLD), block attempts to directly access memory outside the process' space by use of the MPU's ABORT input. That would help keep a runaway process from crashing the entire system. The only real penalty of this scheme, aside from not having megabytes of linear storage for data, is the need for cross-bank transfer code to get data between protected areas of RAM, which will slow execution a bit. However, on a 20 MHz '816 running against 10 ns SRAM, it won't be all that slow.

Another advantage I foresee with this banking scheme is that I can run up to 256 separate processes in core if I equip the system with a full 16 MB of RAM. The logic required to handle that much RAM isn't much more complicated than what I've already devised—I'd be adding some chip selects to the equations:

Code:
BANK    A16   A17   A18   /CS0   /CS1   /CS2   /CS3 ... /CS31
-------------------------------------------------------------
 $00     0     0     0      0      1      1      1         1
 $01     1     0     0      0      1      1      1         1
 $02     0     1     0      0      1      1      1         1
 $03     1     1     0      0      1      1      1         1
 $04     0     0     1      0      1      1      1         1
 $05     1     0     1      0      1      1      1         1
 $06     0     1     1      0      1      1      1         1
 $07     1     1     1      0      1      1      1         1
 $08     0     0     0      1      0      1      1         1
...
 $0F     1     1     1      1      0      1      1         1
...
 $10     0     0     0      1      1      0      1         1
...
 $18     0     0     0      1      1      1      0         1
...
 $1F     1     1     1      1      1      1      0         1
...
 $FF     1     1     1      1      1      1      1         0
-------------------------------------------------------------

The above would be for a full 16 MB (32 512k x 8 SRAMs). Of course, the CPLD would have to be even larger than the ATF2500C—managing memory alone would require 35 outputs, plus all the outputs needed for other functions. Also, board real estate would become interesting with 32 SRAMs installed. The only practical way to do it would be to mount the SRAMs on plug-in SIMMs, probably eight SRAMs per module.
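The decode itself is trivial, as this little sketch of the table shows: bank bits 0-2 become A16-A18 and bits 3-7 pick the chip select. Again, this is only an illustration, not the CPLD equations.

Code:
# Sketch of the 16 MB decode tabulated above: the low three bank bits
# drive SRAM A16-A18, the high five bits pick one of 32 active-low chip
# selects.  Illustration only; the real thing would be CPLD equations.

def decode_bank(bank):
    """Return (A18, A17, A16) and which /CSn is asserted for a bank number."""
    a16 = bank & 0x01
    a17 = (bank >> 1) & 0x01
    a18 = (bank >> 2) & 0x01
    cs = bank >> 3                  # which of the 32 SRAMs gets /CS low
    return (a18, a17, a16), cs

for bank in (0x00, 0x07, 0x08, 0x1F, 0xFF):
    (a18, a17, a16), cs = decode_bank(bank)
    print(f"bank ${bank:02X}: A18 A17 A16 = {a18} {a17} {a16}, /CS{cs} asserted")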

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!

