Forgive me for not making this 6502/65816 specific; however, I don't know where else to post this to solicit design review feedback from people I actually trust on the matter.
This message is primarily a request for Garth Wilson and/or Andre Fachat to review my work; however, I know others who frequent this forum have relevant experience as well. I'd like to hear any feedback on the following design ideas.
As folks might already know, I've been working on building my own home computer for some time. It started with the 65C816-based Kestrel-1, then moved to a custom CPU design on an FPGA with the Kestrel-2, and I'm changing things up again for the Kestrel-3 (this time using a 64-bit RISC-V compatible CPU). Due to unfortunate circumstances combined with a dash of impatience, instead of using a COTS FPGA development board with millions of gates equivalent for the Kestrel-3, I've decided I want to try building a computer using a backplane, similar in spirit to the RC2014 Z-80-based computer (
https://www.tindie.com/products/Semacht ... puter-kit/), or Andre's own Caspaer (
http://6502.org/users/andre/csa/index.html#caspaer). This is motivated in part by the "Big FPGA" companies' failure to make reliable software that I can run without much headache, and by the fact that the open-source toolchain, yosys and IceStorm, currently works only with Lattice iCE40-based parts, which are significantly smaller than the usual Xilinx or Altera FPGAs found in dev boards. Thus, I need to decompose the Kestrel into two or three FPGA chips, each doing its own thing.
Ideally, I would place all two or three of these chips on a single circuit board; however, I determined that going that route will not support rapid turn-around or minimal financial expense while I'm still learning how to work with FPGAs at this level. I fully envision the case where I hack on an FPGA circuit, ruin the motherboard, and need to order another batch. Since this will be relatively expensive ($5/sq. in. for a 2-layer board, $10/sq. in. for a 4-layer board), I want to minimize the size of the PCB for each functional unit under development. This just screams backplane. I figure once I have a working constellation of cooperating PCBs, I can "cost reduce" the design into a single board with much greater confidence and probability of success.
My plan for the backplane is a 16 sq. in. PCB with room for four DIN 41612 sockets. The vast majority of the pins are bussed together; only a small number are not (which I explain below). The pin-out follows. Note that undocumented pins MAY NOT be bussed, and cards ARE NOT to be connected to them. That way, I can define uses for them later with reduced concern for backward compatibility.
Code:
        A       B     C
 1      D0      +5V   WE
 2      D1      +5V   A1
 3      D2      +5V   A2
 4      D3      +5V   A3
 5      D4      +5V   A4
 6      D5      +5V   A5
 7      D6      +5V   A6
 8      D7      +5V   A7
 9      D8      GND   A8
10      D9      GND   A9
11      D10     GND   A10
12      D11     GND   A11
13      D12     GND   A12
14      D13     GND   A13
15      D14     GND   A14
16      D15     GND   A15
17      50MHz   GND   A16
18      RESET   GND   A17
19      CDONE   GND   A18
20      --      GND   A19
21      --      GND   A20
22      --      GND   A21
23      --      GND   A22
24      SEL0    GND   A23
25      SEL1    +5V   A56
26      ACK     +5V   A57
27      STB     +5V   A58
28      CYC#    +5V   A59
29      CYCA    +5V   A60
30      BCL#    +5V   A61
31      BGO     +5V   A62
32      BGI     +5V   A63
Note that a 50MHz reference clock exists to synchronize bus transactions; specifically, all transitions on the bus happen on the rising edge of the 50MHz clock. Personally, I also intend to drive my FPGAs with this reference clock. On paper, this bus should be capable of 100MBps data transfer performance (16-bit data path at 50MHz); however, I doubt I'll ever see that in reality. I only need 25MBps of throughput to feed the video circuits fast enough. (Both the CPU and the video hardware compete for memory access using the bus arbitration mechanism.)
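For the curious, here's the back-of-the-envelope arithmetic behind those figures (a little Python sketch, purely illustrative; nothing here is measured):
Code:
# Where the 100MBps paper figure comes from -- illustrative only.
bus_clock_hz   = 50_000_000     # shared 50MHz reference clock
bytes_per_beat = 2              # D0-D15 is a 16-bit data path

peak_bandwidth = bus_clock_hz * bytes_per_beat   # 100,000,000 B/s = 100MBps
video_demand   = 25_000_000                      # stated video requirement

print(f"peak: {peak_bandwidth / 1e6:.0f} MB/s")
print(f"video uses {video_demand / peak_bandwidth:.0%} of peak")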
Pins which are NOT bussed are:
* CDONE -- driven high only when all FPGAs on the card have been configured. RESET is the logical NAND feedback of all CDONE pins. (I.e., if any one CDONE pin is low, RESET is high.)
* CYC# -- driven low only when the card wants to start a transfer cycle on the bus. Similarly, CYCA (Cycle Announce) is the NAND of all CYC# pins. Note that a card must drive CYC# low only when it has permission to. The reason this isn't open-drain is that I need CYCA to respond within a single 50MHz cycle.
* BGI, BGO -- these form a daisy-chained, decentralized, round-robin bus arbitration mechanism. When RESET is asserted, all cards must drive BGO low (the backplane will drive BGI of the left-most card high). A card "has permission" to drive CYC# if, and only if, BGI XOR BGO = 1. If a card doesn't want the bus (anymore), it just passes BGI to BGO for the benefit of the next card. The right-most plug's BGO pin is connected to the left-most plug's BGI pin through an inverter. Unoccupied slots require a jumper from BGI to BGO (like VMEbus). (A quick simulation sketch of this rotation follows after this list.)
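To sanity-check the ring for myself, here's a quick behavioral model in Python. Names and structure are purely illustrative -- the real thing is HDL in the FPGAs, and a real master would finish its in-flight cycle before yielding to BCL# -- but it captures the NAND feedback lines and the XOR grant rule:
Code:
# Behavioral sketch only -- names are illustrative, not part of the design.

def backplane_reset(cdone_pins):
    # RESET is the NAND of all CDONE pins: high while any FPGA is unconfigured.
    return 0 if all(cdone_pins) else 1

def cycle_announce(cyc_n_pins):
    # CYCA is the NAND of all CYC# pins: high while any card drives CYC# low.
    return 0 if all(cyc_n_pins) else 1

class Card:
    def __init__(self, name):
        self.name = name
        self.bgi = 0
        self.bgo = 0            # all cards drive BGO low at RESET
        self.wants_bus = False

    def has_grant(self):
        # A card may drive CYC# low only while BGI XOR BGO == 1.
        return (self.bgi ^ self.bgo) == 1

    def tick(self, bcl_asserted=False):
        # On each rising edge: if the card holds the grant but no longer
        # wants the bus (or is yielding to a BCL# request), it passes BGI
        # through to BGO, handing the grant to the next card downstream.
        if self.has_grant() and (not self.wants_bus or bcl_asserted):
            self.bgo = self.bgi

def propagate(cards):
    # Backplane wiring: each BGO feeds the next slot's BGI; the right-most
    # BGO returns to the left-most BGI through an inverter.  An unoccupied
    # slot would just be a BGI-to-BGO jumper here.
    cards[0].bgi = 1 - cards[-1].bgo
    for i in range(1, len(cards)):
        cards[i].bgi = cards[i - 1].bgo

cards = [Card(f"slot{i}") for i in range(4)]
for cycle in range(8):              # a handful of 50MHz ticks
    propagate(cards)
    holders = [c.name for c in cards if c.has_grant()]
    print(cycle, holders)           # exactly one holder per tick, rotating
    for c in cards:
        c.tick()
In this little model exactly one card holds the grant on each tick, and the grant rotates around the ring, which matches what I expect from the XOR rule.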
Only BCL# is open-drain/open-collector. It is used by a card that *really* wants the bus and prefers not to wait its usual turn. It's a polite request to the current bus master to cut its current tenure short if it can. All other pins are actively driven or are three-state in nature. This bus follows Wishbone bus semantics, modified as appropriate to support chip-to-chip and card-to-card interconnects. The Wishbone specs are fully open and easily accessible via Google.
You may be wondering what happened to A24-A55 -- those are not exposed on the bus, in part because I just don't need them. My goal is to support up to 16MB of video memory (hence A1-A23), and most other peripherals are substantially smaller than that. I do use the upper-most byte for address decoding, though. The goal of this project is two-fold: (1) to help me learn FPGA-based construction techniques, not so much to realize a finished, commercializable product; and (2) to get real hardware working by January 2017, in time for the next RISC-V Workshop.
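For concreteness, here's roughly how I picture a card decoding its addresses (Python sketch; the device-ID value and my reading of SEL0/SEL1 as byte-lane selects are working assumptions, not settled parts of the design):
Code:
# Illustrative decoding sketch -- the device ID and mask are placeholders.
MY_DEVICE_ID = 0x02          # example value assigned to this card

def card_selected(addr64):
    # Compare the upper-most address byte (A56-A63) against this card's ID.
    return ((addr64 >> 56) & 0xFF) == MY_DEVICE_ID

def local_offset(addr64):
    # A1-A23 select up to 16MB within the card; on a 16-bit bus there is
    # no A0 -- SEL0/SEL1 pick the byte lane(s) within the 16-bit word.
    return addr64 & 0x00FFFFFE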
Interrupts are handled through a "message-signalled interrupt" mechanism, where a controller takes control of the bus and issues a memory write to a special, well-known memory location. The CPU card would monitor the bus for these writes and, upon seeing one, issue a local interrupt.
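In other words, something like the following (Python sketch; the doorbell address and the meaning of the payload are placeholders I haven't pinned down yet):
Code:
# Sketch of the CPU card's bus monitor -- address and payload meaning are
# placeholders, not final values.
MSI_DOORBELL = 0xFF00000000000000    # hypothetical well-known write target

class MsiMonitor:
    def __init__(self):
        self.irq_pending = False
        self.last_message = None

    def observe_write(self, addr, data):
        # Called for every write cycle the CPU card sees on the backplane
        # (WE high during an acknowledged STB/CYC transfer).
        if addr == MSI_DOORBELL:
            self.last_message = data     # e.g., the interrupting device's ID
            self.irq_pending = True      # raise a local interrupt to the CPU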
Anyway, I have 16 positive supply rails and 16 grounds. The positive supply is 5V, but the rest of the logic in the system runs at 3.3V (FPGA cores run at 1.2V or less). Since an FPGA circuit might need up to four supplies (typically 3.3V, 1.8V, 1.2V, and/or 0.9V), I decided to stick with 5V and let each card regulate its own supplies. This is more power-hungry, BUT more flexible, and it helps R&D efforts. I placed the supply and ground pins in column B of the DIN connectors in an effort to help manage high-frequency signal aberrations.
Wow, that was a lot, and it may seem like rambling. But for those with experience designing multi-drop bus systems, I would love to be made aware of any pitfalls before moving forward with this design, particularly with respect to power and ground rails or signal integrity. Thanks in advance!