Most FPGAs no longer support internal tri-state buses, and some respected sources recommend against using internal tri-state buffers at all. I, on the other hand, believe in letting the tools transform internal tri-state buses (in FPGAs) into the appropriate implementation. I find it error-prone and tedious to specify each and every bus and to explicitly define the required multiplexers.
EEyE's block diagram/schematic in an earlier post is an example of the structure I try to avoid. I am not saying it is incorrect or otherwise a bad thing to do. I am simply saying that I try to avoid having to explicitly manage the buses and the required bus multiplexers as shown in EEyE's block diagram/schematic.
To accomplish this, I generally create an internal "tri-state" bus, even though the FPGA can't implement it directly. This "tri-state" bus allows me to create a single common port on all modules that will connect to the "bus". In this manner, whenever I add or remove a module, the synthesizer implicitly grows or shrinks the OR gate that combines all of the buses together, as EEyE shows in his block diagram/schematic. Because the FPGA itself has no way of implementing a "tri-state" bus, the synthesizer has to transform the Verilog "tri-state" bus into a multiplexer of some sort and use point-to-point connections to tie everything together.
One way the multiplexer can be constructed is illustrated by EEyE's block diagram/schematic: each module's output is gated by an enable (an AND gate), and an OR gate collects all of the gated outputs together. An AND gate outputs a logic 0 when not enabled, and an OR gate outputs a logic 0 only when all of its inputs are 0. In EEyE's circuit, the three memories are selected mutually exclusively, and the output of any unselected memory is forced to logic 0. The OR gate combines the output data of all the memories. Since only one is selected at a time, the OR gate's output is simply the output of the enabled memory.
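For reference, the explicit AND-OR structure described above might look something like the following. This is only a minimal sketch, not EEyE's actual circuit: the module name and_or_mux and the Sel_A/Sel_B/Sel_C select inputs are names I made up for illustration, and the selects are assumed to be mutually exclusive.

```verilog
// Explicit AND-OR read multiplexer (illustrative sketch; names are hypothetical).
module and_or_mux #(
    parameter pDataWidth = 32
)(
    input  wire                      Sel_A, Sel_B, Sel_C, // assumed mutually exclusive
    input  wire [(pDataWidth - 1):0] RAM_A_DO,
    input  wire [(pDataWidth - 1):0] RAM_B_DO,
    input  wire [(pDataWidth - 1):0] RAM_C_DO,
    output wire [(pDataWidth - 1):0] CPU_DI
);

// AND stage: an unselected source is forced to all zeros.
wire [(pDataWidth - 1):0] Gated_A = RAM_A_DO & {pDataWidth{Sel_A}};
wire [(pDataWidth - 1):0] Gated_B = RAM_B_DO & {pDataWidth{Sel_B}};
wire [(pDataWidth - 1):0] Gated_C = RAM_C_DO & {pDataWidth{Sel_C}};

// OR stage: with at most one select asserted, the selected source passes through.
assign CPU_DI = Gated_A | Gated_B | Gated_C;

endmodule
```

Note that every additional source means another port on this module, another gated term, and another input on the OR. That bookkeeping is exactly what I try to avoid doing by hand.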
An alternative is to define a single bus. Let's name it something like EEyE has named the output of the OR gate, CPU_DI. At each memory, I would connect its output data to CPU_DI using a tri-state construction:
Code:
localparam pDataWidth = 32;
wire [9:0] Addrs;
wire Clk;
wire CPU_WE, CPU_RE;
wire [(pDataWidth - 1):0] CPU_DI, CPU_DO;
reg CS_RAM_A, CS_RAM_B, CS_RAM_C;
wire WE_RAM_A, OE_RAM_A;
wire WE_RAM_B, OE_RAM_B;
wire WE_RAM_C, OE_RAM_C;
reg [(pDataWidth - 1):0] RAM_A [0:255];
reg [(pDataWidth - 1):0] RAM_B [0:255];
reg [(pDataWidth - 1):0] RAM_C [0:511];
reg [(pDataWidth - 1):0] RAM_A_DO, RAM_B_DO, RAM_C_DO;
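always @(*)
begin
    // Combinational address decode: blocking assignments, casez for the don't-care bit.
    casez(Addrs[9:8])
        2'b00   : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b1, 1'b0, 1'b0};
        2'b01   : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b1, 1'b0};
        2'b1?   : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b0, 1'b1};
        default : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b0, 1'b0};
    endcase
end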
assign WE_RAM_A = CS_RAM_A & CPU_WE;
assign WE_RAM_B = CS_RAM_B & CPU_WE;
assign WE_RAM_C = CS_RAM_C & CPU_WE;
assign OE_RAM_A = CS_RAM_A & CPU_RE;
assign OE_RAM_B = CS_RAM_B & CPU_RE;
assign OE_RAM_C = CS_RAM_C & CPU_RE;
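// Writes take their data from CPU_DO (CPU data out); CPU_DI is the shared read bus.
// Each RAM is indexed with only the address bits that fit its depth.
// The #1 delays affect simulation only; synthesis ignores them.
always @(posedge Clk)
begin
    if(WE_RAM_A)
        RAM_A[Addrs[7:0]] <= CPU_DO;
    RAM_A_DO <= #1 RAM_A[Addrs[7:0]];
end
assign CPU_DI = ((OE_RAM_A) ? RAM_A_DO : {pDataWidth{1'bZ}});
always @(posedge Clk)
begin
    if(WE_RAM_B)
        RAM_B[Addrs[7:0]] <= CPU_DO;
    RAM_B_DO <= #1 RAM_B[Addrs[7:0]];
end
assign CPU_DI = ((OE_RAM_B) ? RAM_B_DO : {pDataWidth{1'bZ}});
always @(posedge Clk)
begin
    if(WE_RAM_C)
        RAM_C[Addrs[8:0]] <= CPU_DO;
    RAM_C_DO <= #1 RAM_C[Addrs[8:0]];
end
assign CPU_DI = ((OE_RAM_C) ? RAM_C_DO : {pDataWidth{1'bZ}});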
In this example, the OR gate is implicitly created by the synthesizer.
This approach may or may not help, but it is the technique that I use when I want to connect a varying number of modules/components together. The approach (1) is automatically transformed into cascaded AND-OR gates as discussed above, (2) automatically accounts for all of the connections, and (3) keeps me from having to define the multiplexer and/or the variable-width OR manually.
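To show what the "single common port" idea looks like when the memories are wrapped in modules, here is a rough sketch. The module names ram_block and mem_bus_top, the pAddrWidth parameter, and the CS/WE/RE/DI/DO port names are all hypothetical, chosen only for this illustration. The address map is the same as in the decoder above. Whether a given synthesizer converts internal tri-states to logic across module boundaries depends on the tool and its settings, so treat this as something to check against your own tool's documentation.

```verilog
// Hypothetical memory block with one shared read port (illustrative sketch).
module ram_block #(
    parameter pDataWidth = 32,
    parameter pAddrWidth = 8
)(
    input  wire                      Clk,
    input  wire                      CS,      // chip select from the address decoder
    input  wire                      WE, RE,  // CPU write and read strobes
    input  wire [(pAddrWidth - 1):0] Addrs,
    input  wire [(pDataWidth - 1):0] DI,      // write data from the CPU
    output wire [(pDataWidth - 1):0] DO       // shared read bus; high-Z unless selected
);

reg [(pDataWidth - 1):0] RAM [0:((1 << pAddrWidth) - 1)];
reg [(pDataWidth - 1):0] RAM_DO;

always @(posedge Clk)
begin
    if(CS & WE)
        RAM[Addrs] <= DI;
    RAM_DO <= RAM[Addrs];
end

// Drive the shared bus only when this block is selected for a read.
assign DO = ((CS & RE) ? RAM_DO : {pDataWidth{1'bZ}});

endmodule

// Hypothetical top level: every block connects to the same CPU_DI net.
module mem_bus_top (
    input  wire        Clk,
    input  wire [9:0]  Addrs,
    input  wire        CPU_WE, CPU_RE,
    input  wire [31:0] CPU_DO,
    output wire [31:0] CPU_DI
);

// Same address map as the decoder above.
wire CS_RAM_A = (Addrs[9:8] == 2'b00);
wire CS_RAM_B = (Addrs[9:8] == 2'b01);
wire CS_RAM_C =  Addrs[9];

ram_block #(.pAddrWidth(8)) RAM_A (
    .Clk(Clk), .CS(CS_RAM_A), .WE(CPU_WE), .RE(CPU_RE),
    .Addrs(Addrs[7:0]), .DI(CPU_DO), .DO(CPU_DI)
);

ram_block #(.pAddrWidth(8)) RAM_B (
    .Clk(Clk), .CS(CS_RAM_B), .WE(CPU_WE), .RE(CPU_RE),
    .Addrs(Addrs[7:0]), .DI(CPU_DO), .DO(CPU_DI)
);

ram_block #(.pAddrWidth(9)) RAM_C (
    .Clk(Clk), .CS(CS_RAM_C), .WE(CPU_WE), .RE(CPU_RE),
    .Addrs(Addrs[8:0]), .DI(CPU_DO), .DO(CPU_DI)
);

endmodule
```

Adding a fourth block in this style is just one more instance tied to CPU_DI plus one more chip select. There is no multiplexer or OR gate to widen by hand.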