Multiprocessing on FPGA using dual-port RAM (pipedream)
Posted: Fri Oct 31, 2014 5:56 pm
This was a pipe-dream thought, but perhaps worth sharing: a 6502 core on an FPGA is pretty small, so it's surely possible to fit 4, or maybe even 16, of them on a reasonably priced FPGA. As the block RAMs (at least on Xilinx FPGAs) are dual-ported, it would be simple enough to hook each RAM up to a pair of 6502s, and if every 6502 were hooked up to 2 or 4 RAMs, it would be possible to make a pipeline, a mesh, or a torus of processors. It would be easy to share code too, if that makes sense, but most crucially the processors could communicate by posting data to the shared memory and setting a flag to indicate that it's ready.
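To make the "post data, then set a flag" idea concrete, here is a minimal sketch in Python rather than 6502 code. The mailbox layout (a flag byte, a length byte, then the payload) and the names `post` and `poll` are my own assumptions for illustration, and a `bytearray` stands in for one dual-port block RAM.

```python
# Hypothetical mailbox layout inside one shared 2k block (offsets are assumptions).
FLAG = 0x000     # 0 = mailbox empty, 1 = message ready
LENGTH = 0x001   # number of payload bytes (0..255)
PAYLOAD = 0x002  # first payload byte

def post(ram, data):
    """Sender: write the payload first and set the flag last."""
    if len(data) > 255:
        raise ValueError("payload too long for a one-byte length field")
    if ram[FLAG] != 0:
        return False                      # receiver has not taken the last message
    ram[LENGTH] = len(data)
    ram[PAYLOAD:PAYLOAD + len(data)] = data
    ram[FLAG] = 1                         # publish only once the data is in place
    return True

def poll(ram):
    """Receiver: return the message if one is ready, otherwise None."""
    if ram[FLAG] == 0:
        return None
    data = bytes(ram[PAYLOAD:PAYLOAD + ram[LENGTH]])
    ram[FLAG] = 0                         # hand the mailbox back to the sender
    return data

shared = bytearray(2048)                  # stands in for one dual-port block RAM
```

One CPU would call something like `post(shared, b"hi")` and its neighbour would spin on `poll(shared)`. The ordering is the point: the flag is written last by the sender and cleared last by the receiver, so each side only touches the payload while it owns the mailbox. A mailbox like this is one-way, so traffic in both directions would presumably need two of them per shared block.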
At the same time, this gives each processor more memory than it would otherwise have, and connects the processors together.
With 4 neighbours, each 6502 would see 4 blocks of 2k RAM, all of them shared, but each one shared with a different neighbour. By convention, part of each RAM would be private to one side or the other, and part would be shared. The address map might look interesting: a patchwork, with each patch appearing in a different part of the address map on each side of the shared block.
+------------------------------------------+
|                                          |
| +---+                                    |
| |   |                                    |
| | +-+--+ +---+ +----+ +---+ +----+ +---+ |
+---+6502| |RAM| |6502| |RAM| |6502| |RAM+-+
  | +----+ +---+ +----+ +---+ +----+ +---+
  |
  | +---+        +---+        +---+
  | |RAM|        |RAM|        |RAM|
  | +---+        +---+        +---+
  |
  | +----+ +---+ +----+ +---+ +----+ +---+
  | |6502| |RAM| |6502| |RAM| |6502| |RAM|
  | +----+ +---+ +----+ +---+ +----+ +---+
  |
  | +---+        +---+        +---+
  | |RAM|        |RAM|        |RAM|
  | +-+-+        +---+        +---+
  |   |
  +---+
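The patchwork address map for a torus like the one drawn above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the 3 by 2 grid size, the window base addresses in `BASE`, and the convention of naming each RAM after the CPU to its west (horizontal links) or to its north (vertical links).

```python
W, H = 3, 2   # columns and rows of CPUs (assumed to match the diagram)

# Hypothetical base address of the 2k window each CPU uses for each neighbour.
BASE = {"N": 0x2000, "E": 0x2800, "S": 0x3000, "W": 0x3800}

def ram_id(x, y, side):
    """Name the RAM on the given side of CPU (x, y), wrapping at the edges."""
    if side == "E":
        return ("h", x, y)
    if side == "W":
        return ("h", (x - 1) % W, y)
    if side == "S":
        return ("v", x, y)
    if side == "N":
        return ("v", x, (y - 1) % H)
    raise ValueError(side)

def address_map(x, y):
    """Map each window base address on CPU (x, y) to the RAM behind it."""
    return {BASE[s]: ram_id(x, y, s) for s in "NESW"}

for y in range(H):
    for x in range(W):
        windows = ", ".join(f"${a:04X}={r}" for a, r in address_map(x, y).items())
        print(f"CPU ({x},{y}): {windows}")
```

With these assumptions the RAM between CPU (0,0) and CPU (1,0) shows up at $2800 on one side and at $3800 on the other, which is the "different part of the address map on each side" effect. It also shows the block count: N processors with 4 neighbours each need 2N RAM blocks, since every block is shared by exactly two CPUs.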
Just maybe, the zero page and stack could be implemented as distributed RAM, and therefore be private. Or maybe there's enough block RAM to have a private block as well as the shared blocks - depends on how big the FPGA is, and how many CPUs to squeeze in. As we know from the Atari 2600 and other efforts, we don't need a full page 0 or page 1 to make a viable machine. Even 64 bytes mapped into both pages can be useful.
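As a rough sketch of what "64 bytes mapped into both pages" could mean for the address decoder: any access to $0000-$01FF keeps only the low 6 address bits and lands in the small private RAM. The function name `decode` and the choice to mirror across the whole of both pages are my assumptions, not a worked-out design.

```python
PRIVATE_SIZE = 64   # bytes of private (distributed) RAM, must be a power of two

def decode(addr):
    """Return ('private', offset) for pages 0 and 1, else ('other', addr)."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("6502 addresses are 16 bits")
    if addr < 0x0200:
        # Zero page and stack page both fold onto the same 64 bytes.
        return ("private", addr & (PRIVATE_SIZE - 1))
    return ("other", addr)

for a in (0x0000, 0x0010, 0x0110, 0x01FF, 0x0200):
    print(f"${a:04X} -> {decode(a)}")
```

The catch is visible straight away: zero-page location $10 and stack location $0110 are the same physical byte. With the stack pointer starting at $FF, the stack grows down from offset $3F while zero-page variables grow up from offset $00, so the two have to be kept from meeting in the middle.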
As for programming such a network, well that's a software problem!
(The transputer was all about local memory and synchronous communication with up to 4 neighbours over serial links that moved data a byte at a time, but we don't have an FPGA model for the transputer. It would of course be possible to design a byte-wide channel as a peripheral, but shared memory comes for free.)