commodorejohn wrote:
Well, there's the AppleCrate II, although that's not a multiprocessor system so much as a networked cluster.
kc5tja wrote:
This assumes that every operation the 6502 performs occupies 5 clock cycles, and so maintains phase relative to its peers. This, however, won't be the case. So, with surprising frequency in fact, your 6502s will conflict with each other while accessing shared memory.
What you're describing is called NUMA (Non-Uniform Memory Access), where CPUs have local memory resources and remote memory resources mapped into a common address space. It's really the only way to scale beyond the traditional SMP limits.
However, straight message passing would actually be faster than NUMA in the general case (particularly if all the links are point-to-point, or a reasonable approximation thereof), since you pass messages directly from one node to another without contending for a shared resource; or, if you do contend, it's software-mitigated, so event-driven code can forward data at adaptively scheduled intervals (think R-ALOHA, for instance). You could use a simple FLIT-routed network, or, if you can afford the hardware, a point-to-point mesh network between your processing elements, and never have to worry about sluggishness or synchronization.
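A minimal sketch of that point-to-point idea (my illustration in Python, not anyone's actual hardware): each directed link gets its own FIFO, so a send only ever touches the link between the two nodes involved and never arbitrates for a shared bus.
Code: Select all
# Point-to-point message passing: one dedicated FIFO per directed link.
# Illustrative only -- node names and the API are made up for this sketch.
from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.links = {}                    # neighbour name -> outgoing FIFO

    def connect(self, other):
        self.links[other.name] = deque()   # link self -> other
        other.links[self.name] = deque()   # link other -> self

    def send(self, dst, word):
        self.links[dst].append(word)       # never blocks on unrelated traffic

    def receive(self, src_node):
        q = src_node.links[self.name]
        return q.popleft() if q else None

a, b = Node("A1"), Node("A2")
a.connect(b)
a.send("A2", 0x42)
print(hex(b.receive(a)))                   # 0x42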
Edit: A simple analysis for a case where each 6502 accesses the shared bus in 1/5 of its cycles gives a 1-cycle delay 20% of the time with two 6502s, 36% of the time with three 6502s, and around 50% of the time with four 6502s. That means each 6502 needs 5.5 cycles (on average) for a shared SRAM access. By using CLK2 HIGH/LOW multiplexing you can have eight 6502s on such a shared bus, and it only slows them down by around 10%.
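For reference, a quick back-of-the-envelope check of those figures (my sketch, assuming each CPU independently wants the shared bus in 1/5 of its cycles and a conflict costs exactly one extra wait cycle):
Code: Select all
# Rough check of the contention figures above.
# Simplified model: ignores having to queue behind more than one CPU,
# which is why it is only a fair approximation for small CPU counts.
p_access = 1 / 5          # probability a CPU wants the shared bus in a given cycle

for n_cpus in range(2, 5):
    # probability that at least one of the other CPUs also wants the bus
    p_conflict = 1 - (1 - p_access) ** (n_cpus - 1)
    avg_cycles = 5 + p_conflict   # 5-cycle operation plus the expected wait
    print(f"{n_cpus} CPUs: conflict {p_conflict:.0%}, "
          f"~{avg_cycles:.2f} cycles per shared access")
# prints: 20% / 36% / ~49%, i.e. ~5.2, ~5.4 and ~5.5 cycles per access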
As of today, I have a single-core solution for my 6502/65C02 multiplexed system, with a 65C22 on a 4-layer "CPU board" in production: it is missing some logic, but I may build a new interface card and add several of these boards to see how well they cope with different shared-memory divisors. There is also a connector for the 65C22 parallel port, which would make point-to-point communication possible (although software-driven). I am not sure there is much to gain by sending data over the 65C22 instead of the shared SRAM in this small system, but maybe there is for larger ones.
Another implementation would be to have two different shared memory areas in a mesh, one along the "horizontal" direction and one along the "vertical" direction:
Code: Select all
A1 A2 A3 A4 .....
B1 B2 B3 B4 .....
C1 C2 C3 C4 .....
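One way to read that layout (my interpretation, not a worked-out design): every CPU shares one memory area with its row (A, B, C, ...) and another with its column (1, 2, 3, ...), so any two CPUs can exchange data in at most two hops via the CPU sitting on the sender's row and the receiver's column. A small sketch of that routing rule:
Code: Select all
# Routing in the row/column shared-memory mesh above (illustrative only).
def route(src: str, dst: str) -> list[str]:
    """Return the chain of CPUs a message passes through, e.g. 'B1' -> 'C4'."""
    if src[0] == dst[0] or src[1:] == dst[1:]:
        return [src, dst]                  # same row or same column: one hop
    relay = src[0] + dst[1:]               # corner node, e.g. 'B4'
    return [src, relay, dst]               # row hop, then column hop

print(route("B1", "C4"))   # ['B1', 'B4', 'C4']
print(route("A2", "A4"))   # ['A2', 'A4']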
Thanks for sharing info on multiprocessor networking. I will do some more reading during the upcoming holidays!