
All times are UTC




Posted: Wed Apr 03, 2019 12:14 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Nearby, in the newbies forum, an off-topic excursion concerning using multiple 6502s to implement a complex graphics subsystem:

railsrust wrote:
So I got this email from "my little helper" yesterday:

Quote:
...what would you think of 4 or more 6502s, closely coupled but yet able to execute independently? If we think of the theme of throwing processors at the problem ...
...
Heh, and a bargraph led display that acts like a speedometer the more 6502s you use.

I know people have done multiprocessor ’02s before; I wonder if there is anything in the open domain, namely the task dispatcher and manager. They have also done a bunch of 8051s, which are more self-contained.

I gotta believe one FPGA can manage a crapload of ‘02s.


Anyone know of a way to manage multiple 6502s like this?


GARTHWILSON wrote:
railsrust wrote:
Anyone know of a way to manage multiple 6502s like this?

It's getting off-topic, but that's ok. It's your own topic. :)

Here are some earlier topics that are very relevant, and kc5tja made valuable contributions on:

WDC's W65C02S adds some more signals at the pins, namely ML\ (memory lock not) output (pin 5 on a DIP), BE (bus enable) input (pin 36 on a DIP), and VP\ (vector pull not) output (pin 1 on a DIP).


I have a couple more links to add to Garth's list of previous topics:

but I'll very much second the idea noted in the initial quote: it's the software, the management of tasks and of data transfer, which is the major challenge here.


Posted: Wed Apr 03, 2019 5:46 pm
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
BigEd wrote:
but I'll very much second the idea noted in the initial quote: it's the software, the management of tasks and of data transfer, which is the major challenge here.

Depends on how tightly the CPUs are integrated. If they're essentially standalone machines (with independent CPU/ROM/RAM) connected through some networking, then, yeah, it's mostly a software issue.

But if they're sharing RAM and/or other devices, where handshaking is done at a hardware level, then it's a different animal.

Folks are struggling getting the 65816 address decoded properly.

Just having several CPUs fighting for a common bus can be an issue.


Posted: Wed Apr 03, 2019 8:08 pm
Joined: Mon May 12, 2014 6:18 pm
Posts: 365
I have thought about trying to run two CPUs through one CPLD. When the clock goes low and you are waiting 30ns for the processor's address lines to settle, you could have the CPLD switch to the address lines of a second processor that has already settled and let that access memory while you wait. I don't think you could run them both at full speed but you would at least be doing something useful during that 30ns.


Posted: Wed Apr 03, 2019 8:37 pm
Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Druzyek wrote:
I have thought about trying to run two CPUs through one CPLD. When the clock goes low and you are waiting 30ns for the processor's address lines to settle, you could have the CPLD switch to the address lines of a second processor that has already settled and let that access memory while you wait. I don't think you could run them both at full speed but you would at least be doing something useful during that 30ns.


Not sure why you even need a CPLD for just 2 x 6502's into a common memory system - after all, this is how video is done on some of the older systems - Apple II, etc. 6502 accesses RAM on one half cycle, video on the other - one reason the clock crystals seemed a bit weird then. (Exact multiples of NTSC or PAL scan frequency)

If you ran both 6502's off the same Ph2 clock, but one inverted, then we know that the 6502 only uses (less than) half a cycle to access RAM/ROM, so that leaves the other half cycle for the other processor.
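The alternating half-cycle idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `run_interleaved` is hypothetical, the convention that CPU B owns the bus while Ph2 is low and CPU A while Ph2 is high is arbitrary, and only writes are modelled (no BE, R/W or timing behaviour).

```python
def run_interleaved(ops_a, ops_b, ram_size=16):
    """Toy model of two CPUs sharing one RAM on opposite clock phases.

    ops_a / ops_b are lists of (address, value) writes.
    Assumed convention: even half cycles (Ph2 low) belong to CPU B,
    odd half cycles (Ph2 high) belong to CPU A.
    """
    ram = [0] * ram_size
    log = []
    queues = {"A": list(ops_a), "B": list(ops_b)}
    half_cycle = 0
    while queues["A"] or queues["B"]:
        owner = "A" if half_cycle % 2 else "B"
        if queues[owner]:
            # Only the current owner may touch the bus in this half cycle.
            addr, value = queues[owner].pop(0)
            ram[addr] = value
            log.append((half_cycle, owner, addr, value))
        half_cycle += 1
    return ram, log


ram, log = run_interleaved([(0, 0x11), (1, 0x22)], [(0, 0x33)])
for entry in log:
    print(entry)
```

Because ownership strictly alternates, the two CPUs never drive the bus in the same half cycle; the price is that each one sees the memory for only half of every cycle.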

Some glue might be needed to toggle the BE pins (65C02) appropriately, and deal with R/W which I don't think is tri-stated with BE.

The '816 has the added complication of the upper 8 bits of address being latched on the "dead" half cycle, so you might need a separate tri-state buffer on the output of that latch.

Other than that... am I being too naive?

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Wed Apr 03, 2019 8:40 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
whartung wrote:
BigEd wrote:
but I'll very much second the idea noted in the initial quote: it's the software, the management of tasks and of data transfer, which is the major challenge here.

Depends on how tightly the CPUs are integrated. If they're essentially standalone machines (with independent CPU/ROM/RAM) connected through some networking, then, yeah, it's mostly a software issue.

But if they're sharing RAM and/or other devices, where handshaking is done at a hardware level, then it's a different animal.

I'm not concerned with the handshaking so much as the administration job of deciding how to distribute the work load to keep the various processors busy in a way that's productive overall.

Thanks for the additional links, Ed.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Apr 03, 2019 9:05 pm
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
whartung wrote:
Just having several CPUs fighting for a common bus can be an issue.

Depends on how you try to attempt that and what you are willing to pay :)

Assume you really wish to use one (extraordinarily fast) memory as a common RAM for n CPUs. Assume further that you arrange a clock generator with n outputs, each delayed from the previous one by t ns, where t is the cycle time of the common RAM. You then have to mux each CPU's bus to that RAM, put or fetch and latch a byte, and then serve the next CPU. Really challenging, I think, and even with, say, 7 ns RAM and virtually no delay from the muxes, only 10 CPUs (yielding a total cycle time of 70 ns, about 14 MHz) could interact this way.
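The arithmetic behind that estimate can be written out as a small sketch. The function name `time_sliced_clock` is hypothetical, and the model assumes ideal muxes with zero delay, as the paragraph above does.

```python
def time_sliced_clock(n_cpus, ram_cycle_ns):
    """Each CPU gets one RAM cycle per round, so a CPU cycle lasts n * t.

    Returns (cpu_cycle_ns, cpu_clock_mhz). Mux and latch delays are
    assumed to be zero, which a real design would not achieve.
    """
    cpu_cycle_ns = n_cpus * ram_cycle_ns
    cpu_clock_mhz = 1000.0 / cpu_cycle_ns
    return cpu_cycle_ns, cpu_clock_mhz


cycle_ns, clock_mhz = time_sliced_clock(10, 7.0)
print(f"{cycle_ns:.0f} ns per CPU cycle, about {clock_mhz:.1f} MHz")
```

Any real mux or latch delay adds to the per-CPU slot, so either the CPU count or the clock rate drops accordingly.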

But most likely only a fraction of the RAM needs to be "common", e.g. one KB or two. You could then use dual-port RAM and use the "other" side to synchronize all DPRAMs. A 6502 can only write every fourth cycle (every third if zero page, but that would make less sense), so even at a 14 MHz clock one CPU could issue a byte only every 4 x 70 ns = 280 ns. The "other" side of the DPRAM could be operated by some logic at full speed, e.g. 14 ns. This logic could select one DPRAM to deliver its contents while all other DPRAMs are simultaneously written with that data. With no further delays, 280/14 = 20 quasi-simultaneous writes could be served. Here the problem is to fetch all CPU-side accesses and queue them up for processing on the "other" side. :shock: Again challenging, I assume :)
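The same budget can be expressed as a short calculation. This is only a sketch: `broadcast_budget` is a hypothetical name, and the figures (70 ns CPU cycle, one write every 4 cycles, 14 ns per synchronisation slot) are the ones assumed in the paragraph above.

```python
def broadcast_budget(cpu_cycle_ns, cycles_between_writes, sync_slot_ns):
    """How many DPRAM broadcast slots fit between two writes of one CPU.

    Returns (write_interval_ns, slots). Assumes the synchronising logic
    has no overhead beyond sync_slot_ns per broadcast.
    """
    write_interval_ns = cpu_cycle_ns * cycles_between_writes
    slots = write_interval_ns // sync_slot_ns
    return write_interval_ns, slots


interval_ns, slots = broadcast_budget(70, 4, 14)
print(f"one write every {interval_ns} ns leaves room for {slots} broadcasts")
```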

Using an FPGA with its internal block RAMs might be easier. Inside the FPGA you can operate even faster. On the other hand, each 6502 requires 20 pins (10 AB, 8 DB, PHI2, RWB)....

Methinks a couple of loosely coupled autonomous computers exchanging highly condensed information at low frequency are much, much easier to manage. A W65C265S with its four serial ports could act like a poor man's Transputer.

my 2 cents :)


Posted: Wed Apr 03, 2019 9:48 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
drogon wrote:
Not sure why you even need a CPLD for just 2 x 6502's into a common memory system - after all, this is how video is done on some of the older systems - Apple II, etc. 6502 accesses RAM on one half cycle, video on the other [...] Some glue might be needed to toggle the BE pins

Slightly OT, since 2 processors doesn't qualify as massively parallel, but since the subject was mentioned here is a vintage homebrew using two 6809's in that fashion. The two CPU's take turns accessing a shared RAM. And one of the CPU's is also the video system. This project predates CPLD's (although I did exploit programmable logic in the form of 32 x 8 TTL PROM's).

Because DRAM requires multiplexers anyway, this design doesn't do the trick of tying the two CPU address buses together and toggling the BE pins.

Using static RAM you *could* tie the buses together that way (ie, omit the multiplexer). Having each CPU tristated for the first half of every cycle won't affect the RAM because the RAM is fast enough to do the entire access in the remaining half of the cycle. But tristating for the first half of every cycle means extra delay before the address decoder for memory-mapped I/O can begin doing its job, and that might force a reduction in clock speed (as Druzyek mentioned).

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Fri Apr 05, 2019 10:17 pm
Joined: Sat Jan 02, 2016 10:22 am
Posts: 197
One possible way to synchronise multiple CPUs is to alter the clock ratio.

I've done a quick and dirty test on a breadboard. It's got a 25.175 MHz can oscillator (as that's what I had!) driving a 74HC163 4-bit synchronous counter.

The B, C and D outputs from the counter feed the A, B and C inputs of a permanently enabled 74HC138 3-to-8 decoder to produce 8 non-overlapping clocks with a 7:1 high-to-low ratio. Sending the Y0 to Y7 outputs from the decoder through a 74HC240 inverts the signals to produce 8 "CPU clocks".

If the same Y0-Y7 outputs controlled the enable pins of a set of 74HC244 buffers for each CPU, then each CPU would only be connected to the target memory for 1/8 of the time but would appear to have full access.
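The counter and decoder arrangement can be modelled tick by tick. This is a behavioural sketch, not a description of the actual breadboard: `cpu_clocks` is a hypothetical name, the counter is assumed to free-run from 0 to 15, and propagation delays are ignored.

```python
def cpu_clocks(master_ticks, n_outputs=8):
    """Model the counter -> decoder -> inverter chain, one row per master tick.

    Each row holds the 8 inverted ("CPU clock") levels. Assumptions:
    the 4-bit counter free-runs 0..15 and all delays are zero.
    """
    rows = []
    for tick in range(master_ticks):
        count = tick % 16      # 74HC163: 4-bit synchronous counter
        select = count >> 1    # QB, QC, QD drive decoder inputs A, B, C
        # 74HC138 outputs are active low: only the selected Y is 0.
        y = [0 if i == select else 1 for i in range(n_outputs)]
        # 74HC240 inverts, so exactly one CPU clock is high at a time.
        rows.append([1 - level for level in y])
    return rows


for row in cpu_clocks(16):
    print("".join(str(bit) for bit in row))
```

Each CPU clock is high for 2 of every 16 master ticks, and no two are ever high together, which is what lets the active-low Y outputs double as buffer enables.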

The test board produced a high time of just under 80 ns, which is comfortably more than the access time of modern SRAM. A 32 MHz master clock would have produced 2 MHz CPU clocks and access periods of about 62 ns.
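These figures follow from the divider chain: each decoder state lasts two master clock periods, and eight states make one CPU cycle. A minimal sketch of that calculation, with `slot_timing` as a hypothetical name:

```python
def slot_timing(master_mhz, counts_per_slot=2, n_slots=8):
    """Return (slot_ns, cpu_clock_mhz) for the counter/decoder clock scheme.

    counts_per_slot=2 reflects feeding the decoder from counter outputs
    B, C and D, so each decoder state spans two master clock periods.
    """
    master_period_ns = 1000.0 / master_mhz
    slot_ns = counts_per_slot * master_period_ns
    cpu_clock_mhz = master_mhz / (counts_per_slot * n_slots)
    return slot_ns, cpu_clock_mhz


for mhz in (25.175, 32.0):
    slot_ns, cpu_mhz = slot_timing(mhz)
    print(f"{mhz} MHz master: {slot_ns:.1f} ns slot, {cpu_mhz:.2f} MHz CPU clock")
```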

The question is then whether 8 CPUs in the 1-2 MHz range is a worthwhile goal.


Attachments:
two async clocks.jpg: Two CPU clocks
master clock plus one async.jpg: 25.175 MHz master clock, 7:1 ratio CPU clock at 1.57 MHz
test board.jpg: Can oscillator, 74HC163, 74HC138, 74HC240