
All times are UTC




 Post subject: Dual 65C816/65C02 build
PostPosted: Thu Mar 31, 2022 7:01 am 

Joined: Thu Mar 31, 2022 6:40 am
Posts: 11
Hi, I'm finally joining the forums, after lurking for a while. To start off with, thank you SO MUCH for the expertise made available here.

I unwrapped my first Commodore 64 on Christmas Day, 1984, in Louisville, Kentucky, where I still live. The Programmer's Reference Manual was a revelation to me at age 14.9 (my birthday is right after Christmas). The thing that I most remember thinking, after reading the assembly language section was, "That's really IT? That's all a computer does?!"

I've got my build of the Ben Eater 6502 saying "Hello, world!" to me, and I'm following Adrien Kohlbecker's 65816 build.

I have moved to a soldered and stackable board architecture made of Busboard BB1660s (double wide BB-830) and Arduino shield standoffs as the "backplane". I think I've done more soldering in the past week than I have in twenty years.

It was a gamble but it turns out to be really easy to make the pins line up and plug my CPU board on top of (or under) my memory board. If they get bent you just bend them back. It gives me about 0.7" spacing between the bottom of one board and the bottom of the board on top of it, plenty of space for tall capacitors, etc.

My goal is a dual-processor system where either processor can be a 65C02 or a 65C816. If both are 65C02s, they will have to split the stack in page one. If one is a 65C816, it can gracefully move its stack and zero page out of the way of the other processor.

I'm a big fan of the Commodore's bank switching, implemented using the two registers $00 and $01 on the 6510, so I'm building two memory-mapped octal latches plus address decoding logic. Effectively, the system will have 16 bits of software-accessible "state" to control bank switching of the ROM, I/O area, etc. Some of these lines aren't latched but are direct inputs, such as whether the active clock belongs to processor A or B, so that each processor can work out in software which one it is.
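To make the idea concrete, here is a minimal behavioural sketch in Python of such a 16-bit bank register, with one bit that is a direct input rather than latched. The bit assignments (bits 0-14 latched, bit 15 as the processor ID) and the names BankRegister, LATCHED_MASK and CPU_ID_BIT are assumptions made up for illustration, not the actual layout of this build.

```python
# Hypothetical model of the 16-bit bank-control register described above.
# Bit assignments are illustrative assumptions, not the real design.
LATCHED_MASK = 0x7FFF  # bits 0-14: written by software, held in the octal latches
CPU_ID_BIT = 0x8000    # bit 15: direct input, follows whichever clock is active


class BankRegister:
    def __init__(self):
        self.latched = 0

    def write(self, value):
        # Writes only reach the latched bits; the direct-input bit is ignored.
        self.latched = value & LATCHED_MASK

    def read(self, active_cpu):
        # active_cpu is 0 for processor A, 1 for processor B.
        direct = CPU_ID_BIT if active_cpu else 0
        return self.latched | direct


reg = BankRegister()
reg.write(0xFFFF)
value_seen_by_a = reg.read(active_cpu=0)
value_seen_by_b = reg.read(active_cpu=1)
```

In this model both processors read back the same latched bits, and only the top bit differs depending on who is reading.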

Other bus register bits may control which VIA's IRQ goes to which processor. The other way I've thought of is to have a write from either processor to a VIA set a flip-flop that routes that VIA's future IRQs to the last processor that wrote to it.
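A rough Python model of that last-writer routing, assuming one flip-flop per VIA and a power-on default of processor A (0). The class and method names are hypothetical; this only models the behaviour, not the logic chips.

```python
class IrqRouter:
    """Hypothetical model: one flip-flop per VIA remembers the last CPU that wrote to it."""

    def __init__(self, via_count, default_cpu=0):
        # Assumed reset state: every VIA's IRQ is routed to default_cpu.
        self.owner = [default_cpu] * via_count

    def on_write(self, cpu, via):
        # Any write to a VIA register sets that VIA's flip-flop to the writer.
        self.owner[via] = cpu

    def route_irq(self, via):
        # An IRQ from this VIA goes to whichever CPU wrote to it last.
        return self.owner[via]


router = IrqRouter(via_count=2)
router.on_write(cpu=1, via=0)  # processor B configures VIA 0
targets = [router.route_irq(0), router.route_irq(1)]
```

Here VIA 0 now interrupts processor B, while VIA 1, which nobody has written to, still interrupts the default processor.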

I've been a software developer all my life (I had an Atari 400 before the C64) and I know a decent amount about electronics, but I've never gone the distance and made a real computer from scratch, and now that I am, I want it to be a really good one, perhaps the best that could possibly have been made in 1983.

I bet I'll have a good question or two, but for now, thanks to the 6502.org community, and I hope I can live up to the standards of the community.

Alan


PostPosted: Thu Mar 31, 2022 1:24 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Hi, Alan. Nice to see you've taken the plunge and renounced your lurker status. :wink:

Do you have any photos of your project using the "stackable board architecture made of Busboard BB1660s"? I think we'd be interested to see them if you care to share. Remember you're allowed to attach images to your post. There's no need to use a third-party image-sharing site.

What's the reason for dual processors in your next project? I realize you probably just want to challenge yourself. But did you have a specific division of workload in mind? (For example, would one processor be dedicated to I/O?)

I'm a big fan of the '816, but if you insist on dual processors then you might be better advised to use a pair of C02's, just to limit the hardware complexity somewhat (address decoding, for example). You are going to have your hands full, and it's no fun launching a project that's eventually found to be beyond one's abilities.

Edit: maybe the approach using C02's only seems like it wouldn't be as much fun. But trust me, getting the software working properly and interacting between two processors will be more than enough to entertain you!

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Thu Mar 31, 2022 11:11 pm 

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
Dr Jefyll wrote:
... I'm a big fan of the '816, but if you insist on dual processors then you might be better advised to use a pair of C02's, just to limit the hardware complexity somewhat (address decoding, for example). ...


Or just use the 65C816 in the bottom 64K of its address space, ignoring the bank address multiplexed onto the data lines.


PostPosted: Fri Apr 01, 2022 2:20 am 

Joined: Thu Mar 10, 2016 4:33 am
Posts: 181
If the clock-inverting method is used to get a dual-processor system working, not all of the memory needs to be shared; it would be possible for each 65C02 to have its own (non-shared) zero page and stack. It would also be helpful to set up a memory location to identify if the processor is the first or second one, simply by being able to read the phi0 clock.

Rockwell had a dual processor chip in their data book, but interestingly it wasn’t just two 6502’s with inverted clocks, they actually shared most parts apart from the internal registers, so things like the ALU were working twice as fast, but this surely was a very efficient multiprocessor.


PostPosted: Fri Apr 15, 2022 1:07 pm 

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
I'm a huge fan of multi-core 6502 systems; however, there are significant limitations when a dual core uses both bus phases.

Firstly, an agnostic 6502/65816 will require tri-state bus transceivers. NMOS 6502 always outputs address lines. 65C02 is more graceful. 65816 uses both bus phases to output its extended address range. A 65C02 only system is less general but more tractable.

Secondly, two sequential writes to SRAM won't work because writes will be scribbled elsewhere when bus control switches. Video displays which use the opposite phase avoid this case because they only read. Commodore's dual core systems used DRAM chips which are now obsolete. If you want a period accurate system, study Commodore's dual core floppy drives and be prepared to source dodgy DRAM chips. This is not a beginner project. It may be possible to interface contemporary SDRAM. However, this may require periodic interrupt to a routine which takes the RAM off-line for refreshing. This will hugely affect interrupt response time and is also not a beginner project. It may also be more expensive than SRAM. I'm investigating a SRAM cross-over configuration where one processor writes to an individual chip and the other processor reads it. This requires a minimum of four SRAM chips and has the advantage that each core gets private stack by default.

Thirdly, 65xx peripheral chips are really not suited for use on both bus phases. The NMOS versions used the idle phase to draw energy in preparation for the processor reading a register. An NMOS system which overcame this limitation would then encounter the two sequential write problem. If you devise a system where a peripheral chip switches bus phase, timers will drift. This means periodic interrupts will only be approximate. This may be acceptable for task switching but it may be incompatible with tasks such as software UART. It is preferable for peripheral chips to have an affinity to one core or one bus phase. If you want two cores to share all RAM then it is preferable for one core to receive interrupts. This allows cores to be differentiated by idling in a loop until one receives an interrupt.

Fourthly, a parallel bus over multiple boards may greatly limit the maximum operating frequency. This may be incompatible with video or lead to a system where all cores are slower than a single core system. Distance (and cost) can be reduced by chip stacking processors, cross-over SRAM and I/O chips. Two processor cards may be a boost if they have integrated peripherals. However, a RAM card, a serial port/parallel port card, mouse/keyboard card, a sound card, a video card and two processor cards will only keep pace if the parallel bus consumes a huge amount of energy. I hope you didn't want a portable version of your design.

jds on Fri 1 Apr 2022 wrote:
It would also be helpful to set up a memory location to identify if the processor is the first or second one, simply by being able to read the phi0 clock.


If two cores have the same RAM (somehow) and the same ROM but only one core has I/O then both may mark a boot-strap state. Both may copy a loop from ROM to RAM. Both may configure an interrupt. (One core does so in vain.) One core receives the interrupt and escapes the loop. This core performs LDX #$00 // TXS. The other core is coaxed out of the loop and performs LDX #$80 // TXS. Core number is now stored in the top bit of RegS. It is now possible to determine core using TSX // TXA // BMI or similar. This eliminates the hardware for a clock phase register. Although, it does so at the expense of maximum interrupt performance. If interrupts are vectored in RAM then it requires no additional overhead.
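A small Python model of the stack-pointer test may help here. One caveat: the 6502 stores a pushed byte at $0100+S and then decrements S, so a stack pointer initialised to $00 wraps to $FF on the first push and its top bit changes. This sketch therefore assumes initial values of $7F and $FF rather than $00 and $80, so that each core stays inside its own half of page one. The function names are hypothetical.

```python
def push(sp):
    """6502 push: the byte goes to $0100+sp, then sp decrements modulo 256."""
    return (sp - 1) & 0xFF


def core_id(sp):
    """Top bit of the stack pointer, as tested by TSX / TXA / BMI."""
    return sp >> 7


def ids_while_pushing(initial_sp, pushes):
    """Collect every core ID observed while the stack grows by `pushes` bytes."""
    sp = initial_sp
    seen = {core_id(sp)}
    for _ in range(pushes):
        sp = push(sp)
        seen.add(core_id(sp))
    return seen


# Assumed initial values: $7F for the first core, $FF for the second.
core0_ids = ids_while_pushing(0x7F, 100)
core1_ids = ids_while_pushing(0xFF, 100)
```

With these starting values the top bit stays constant for each core as long as neither stack grows to its full 128 bytes.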

jds on Fri 1 Apr 2022 wrote:
Rockwell had a dual processor chip in their data book, but interestingly it wasn't just two 6502's with inverted clocks, they actually shared most parts apart from the internal registers, so things like the ALU were working twice as fast, but this surely was a very efficient multiprocessor.


I strongly suspect that the dual core, NMOS Rockwell processor was vaporware. 6502 uses both clock phases internally and therefore scope for multiplexing ALU is very limited. Yield for two cores on one die would be poor. It would also be very awkward to stack two dies in one package. The easiest option is to stack two packaged chips. Even here, gains for NMOS 6502 are small because tri-state bus transceivers are required on all address lines. This is in addition to separate interrupt lines and two phase clock.

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


PostPosted: Wed Jun 01, 2022 4:24 pm 

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
The awkward case with dual core is two sequential writes. This can be averted by forcing one read or idle phase between two writes. This can be implemented with multiple techniques:

  • No shared memory.
  • DRAM which divides into horrible cases, such as small obsolete units or large pipe-lined units.
  • Video memory where one phase is always reads.
  • Cross-over memory where one core writes and one core reads.
  • Cross-bar memory where two or more cores, on the same phase, have equal priority to memory.
  • NUMA where cores have preferential access to memory.

If you don't care about through-put or latency then no shared memory may be preferable. If you want two or more cores and large video displays then cross-bar memory may be preferable. If you want strictly two cores, don't care about one core slowing the other and don't care about uniform pointers then cross-over memory may be preferable. Overall, your preferences will determine if all cores are permanently in phase, drift in and out of phase or remain permanently out of phase.
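As a toy illustration of forcing an idle phase between two writes, the Python sketch below treats the shared bus as a list of phases ("R" or "W") and inserts an "idle" wherever two writes would otherwise be adjacent. It is only a scheduling model with made-up names; in hardware the same effect would come from clock stretching or arbitration logic.

```python
def separate_writes(phases):
    """Return a new phase list with an idle phase between any two adjacent writes."""
    out = []
    for op in phases:
        if op == "W" and out and out[-1] == "W":
            out.append("idle")
        out.append(op)
    return out


schedule = separate_writes(["W", "W", "R", "W", "W", "W"])
```

The cost is visible in the result: every back-to-back write pair lengthens the schedule by one phase, which is the through-put penalty mentioned above.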



PostPosted: Wed Jun 01, 2022 11:59 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Sheep64 wrote:
Secondly, two sequential writes to SRAM won't work because writes will be scribbled elsewhere when bus control switches.
Sheep64 wrote:
The awkward case with dual core is two sequential writes. This can be averted by forcing one read or idle phase between two writes. This can be implemented with multiple techniques: [etc]

I agree there's a potential problem of writes being scribbled to unintended addresses. But the solution I'd use -- and it's simple enough -- would be to have the Clock Generator produce, along with each processor's Phi2 pulse, a narrower version of each processor's Phi2 pulse. The narrower pulse would slightly lag Phi2 when going high but would go low simultaneously with Phi2.

Then, as we alternate between CPU-A in control of memory and CPU-B in control of memory, we alternate between Narrow-Phi2-A and Narrow-Phi2-B for qualifying the /WR pulse sent to memory.

I hope I explained that satisfactorily. I suppose it could be considered a four-phase clock scheme. (Or an idle phase, as per your remark I quoted.) But that perhaps makes it sound more daunting and complex than it really is.
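For anyone who prefers to see it as a timing model, here is a rough Python sketch of the narrowed write strobe. Time is divided into ticks, phases alternate CPU-A, CPU-B, CPU-A and so on, and the write strobe to memory is asserted only while the owning CPU is writing and its narrow Phi2 is high. The tick counts, the lag value and the function names are arbitrary assumptions for illustration, not real timing figures.

```python
def write_strobe(writes, ticks_per_phase=8, lag=2):
    """Return a per-tick list of booleans: True while /WR to memory is asserted.

    writes[i] says whether the CPU owning phase i is writing. The strobe is
    qualified by a narrowed Phi2 that goes high `lag` ticks late and falls
    at the same time as Phi2 (the end of the phase).
    """
    strobe = []
    for writing in writes:
        for tick in range(ticks_per_phase):
            narrow_phi2 = tick >= lag
            strobe.append(writing and narrow_phi2)
    return strobe


def released_at_every_handover(strobe, ticks_per_phase=8):
    """True if /WR is released on the first tick of every phase."""
    return not any(strobe[i] for i in range(0, len(strobe), ticks_per_phase))


# Both CPUs write in consecutive phases: the awkward case.
unqualified = write_strobe([True, True], lag=0)  # plain Phi2 qualification
qualified = write_strobe([True, True], lag=2)    # narrow Phi2 qualification
```

With lag=0 the strobe stays asserted straight through the handover, which is when a write could be scribbled to the wrong address. With a non-zero lag the strobe is released for the first ticks of each phase, while the new owner's address settles.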

-- Jeff


