Concept & Design of 3.3V Parallel 16-bit VGA Boards
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
So my first step was to do a block diagram with all signals and signal directions present. I tried to just start typing in the Verilog for the top-level interconnects, but found there were just too many signals to lay it all out from memory.
The idea for the 'initial' operation is for the CPU to be able to read and write to the external synchronous RAM during the horizontal and vertical retrace periods. All the square blocks will be the individual Verilog modules, although I may have to add another module to accommodate the bidirectional data bus of the SRAM. Also, I expect the timing will be off as I use a common clock for everything, with no provisions for the delay of the FPGA and SRAM, but I would expect to see some recognizable activity if the .b core software is actually running. I will just try to clear the video RAM for the first test.
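The retrace-window idea above could be sketched roughly as follows. This assumes VESA 1024x768 @ 60 Hz timing (65 MHz pixel clock) and uses hypothetical names (`vga_timing`, `hcount`, `vcount`, `blank`, `cpu_grant`). The counters that generate HSYNC/VSYNC also flag the blanking interval, which contains both sync pulses. In this sketch the CPU is simply granted the SRAM whenever the beam is outside the visible area.

```verilog
// Hypothetical VGA timing generator: counts pixels and lines and flags the
// blanking (retrace) intervals during which the CPU may own the video SRAM.
// Defaults are the VESA 1024x768 @ 60 Hz values (65 MHz pixel clock).
module vga_timing #(
    parameter H_VISIBLE = 1024, H_FRONT = 24, H_SYNC = 136, H_BACK = 160,
    parameter V_VISIBLE = 768,  V_FRONT = 3,  V_SYNC = 6,   V_BACK = 29
) (
    input  wire        clk,
    input  wire        rst,
    output reg  [10:0] hcount,
    output reg  [9:0]  vcount,
    output wire        hsync_n,   // sync pulses are active-low in this mode
    output wire        vsync_n,
    output wire        blank,     // high outside the visible area
    output wire        cpu_grant  // high when the CPU may access video RAM
);
    localparam H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK; // 1344 by default
    localparam V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK; // 806 by default

    // Pixel counter wraps at the end of each line; line counter at end of frame.
    always @(posedge clk) begin
        if (rst) begin
            hcount <= 0;
            vcount <= 0;
        end else if (hcount == H_TOTAL - 1) begin
            hcount <= 0;
            vcount <= (vcount == V_TOTAL - 1) ? 0 : vcount + 1;
        end else begin
            hcount <= hcount + 1;
        end
    end

    assign hsync_n = ~((hcount >= H_VISIBLE + H_FRONT) &&
                       (hcount <  H_VISIBLE + H_FRONT + H_SYNC));
    assign vsync_n = ~((vcount >= V_VISIBLE + V_FRONT) &&
                       (vcount <  V_VISIBLE + V_FRONT + V_SYNC));
    assign blank   = (hcount >= H_VISIBLE) || (vcount >= V_VISIBLE);

    // In this sketch the CPU window is simply the blanking interval.
    assign cpu_grant = blank;
endmodule
```

Other modes or clock rates only change the parameters. In a real design, the grant would probably have to be shifted a few clocks ahead of the visible area to cover the FPGA/SRAM delays mentioned above. That adjustment is not shown.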
This part of the project is basically a compilation of everything I've learned from previous FPGA projects using 6502 soft cores with blockRAM together with what I've most recently learned about video using the parallel video boards.
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
I've added the two blockRAMs for the .b core's zero page and stack, but not the ROM yet. R1 and R2 are the FPGA blockRAM data bus outputs. Also, that ORs module in the above pic turned out to be an extremely simple line of code inside the top_level module, so a separate ORs module wasn't even needed. This is what attracts me to Verilog: building the equivalent structure with schematic entry, ORing four 16-bit input buses into 16 outputs, was a PITA. Now, using Verilog, it takes one simple line of code!
Code: Select all
assign cpuDI = ( R1Dout | R2Dout | VramDout );
So, I've yet to add the ROM and the address decoding module, and to finish the SRAMif module, before some real-world testing can begin. I plan on 1024x768 with a 70MHz system clock.
Also, screw that part of trying to write to the video RAM only when HSYNC & VSYNC are inactive, because the CPU would have to be interrupted, or it would have to read some bits from a port to see when HSYNC or VSYNC was inactive. At this point the 'snow effect' should be acceptable before transitioning to a FIFO buffer. I will need help with this, though!
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Most FPGAs no longer support internal tri-state buses. Some respected sources recommend against using internal tri-state buffers. I, on the other hand, believe in letting the tools transform internal tri-state buses (in FPGAs) into the appropriate implementation. I find it error-prone and tedious to specify each and every bus, and to explicitly define the required multiplexers.
EEyE's block diagram/schematic in an earlier post is an example of the structure I try to avoid. I am not saying it is incorrect, or otherwise a bad thing to do. I am simply saying that I try to avoid having to explicitly manage the buses and the required bus multiplexers as shown in EEyE's block diagram/schematic.
To accomplish this I generally create an internal "tri-state" bus, which can't be implemented in the FPGA. This "tri-state" bus allows me to create a single common port on all modules that will connect to the "bus". In this manner, the synthesizer is implicitly increasing or decreasing the size of the OR gate that combines all of the buses together as EEyE shows in his block diagram/schematic. Because the FPGA itself has no way of implementing a "tri-state" bus, the synthesizer has to transform the Verilog "tri-state" bus into a multiplexer of some sort and use point to point connections to tie everything together.
One way the multiplexer can be constructed is illustrated by EEyE's block diagram/schematic: a module with an enable and an OR gate to collect all of the connections together. An AND gate, when not enabled, outputs a logic 0, and an OR gate only outputs a logic 0 when all of its inputs are 0. Thus, in EEyE's circuit, the three memories are mutually exclusively selected. If unselected, their outputs are forced to logic 0. The OR gate samples each of the memories' output data. Since only one is selected at a time, there is no problem correctly resolving the logic level of the output of the enabled memory.
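As a concrete sketch of the AND-OR structure just described, assuming three 16-bit sources with one-hot selects and hypothetical module/port names, the explicit multiplexer could be written out like this. Adding a fourth source means editing both the port list and the OR expression. That bookkeeping is what Michael is trying to avoid.

```verilog
// Explicit AND-OR multiplexer: each source is ANDed with its select, then all
// gated sources are ORed. Unselected sources contribute 0, so with mutually
// exclusive selects the output equals the one selected source.
module and_or_mux #(parameter W = 16) (
    input  wire [W-1:0] a_do, b_do, c_do,    // memory data outputs
    input  wire         a_sel, b_sel, c_sel, // mutually exclusive selects
    output wire [W-1:0] cpu_di
);
    // {W{sel}} replicates the select bit so it can gate a whole bus.
    assign cpu_di = ({W{a_sel}} & a_do)
                  | ({W{b_sel}} & b_do)
                  | ({W{c_sel}} & c_do);
endmodule
```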
An alternative is to define a single bus. Let's name it CPU_DI, like EEyE named the output of his OR gate. At each memory, I would connect its output data to CPU_DI using a tri-state construction. In the following example, the OR gate is implicitly created by the synthesizer:
Code: Select all
localparam pDataWidth = 32;
wire Clk;                                  // system clock, driven elsewhere
wire [9:0] Addrs;
wire CPU_WE, CPU_RE;
wire [(pDataWidth - 1):0] CPU_DI, CPU_DO;  // CPU_DI: read data to CPU; CPU_DO: write data from CPU
reg CS_RAM_A, CS_RAM_B, CS_RAM_C;
wire WE_RAM_A, OE_RAM_A;
wire WE_RAM_B, OE_RAM_B;
wire WE_RAM_C, OE_RAM_C;
reg [(pDataWidth - 1):0] RAM_A [0:255];    // Addrs 0x000-0x0FF
reg [(pDataWidth - 1):0] RAM_B [0:255];    // Addrs 0x100-0x1FF
reg [(pDataWidth - 1):0] RAM_C [0:511];    // Addrs 0x200-0x3FF
reg [(pDataWidth - 1):0] RAM_A_DO, RAM_B_DO, RAM_C_DO;
// Address decoder: one-hot chip selects (combinational, so blocking assignments)
always @(*)
begin
    casex(Addrs[9:8])
        2'b00 : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b1, 1'b0, 1'b0};
        2'b01 : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b1, 1'b0};
        2'b1x : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b0, 1'b1};
    endcase
end
assign WE_RAM_A = CS_RAM_A & CPU_WE;
assign WE_RAM_B = CS_RAM_B & CPU_WE;
assign WE_RAM_C = CS_RAM_C & CPU_WE;
assign OE_RAM_A = CS_RAM_A & CPU_RE;
assign OE_RAM_B = CS_RAM_B & CPU_RE;
assign OE_RAM_C = CS_RAM_C & CPU_RE;
// Each RAM stores the CPU write data (CPU_DO) and is indexed only by the
// address bits below its decoded range.
always @(posedge Clk)
begin
    if(WE_RAM_A)
        RAM_A[Addrs[7:0]] <= CPU_DO;
    RAM_A_DO <= #1 RAM_A[Addrs[7:0]];
end
// Drive the shared bus only when enabled; otherwise release it (Z).
assign CPU_DI = ((OE_RAM_A) ? RAM_A_DO : {pDataWidth{1'bZ}});
always @(posedge Clk)
begin
    if(WE_RAM_B)
        RAM_B[Addrs[7:0]] <= CPU_DO;
    RAM_B_DO <= #1 RAM_B[Addrs[7:0]];
end
assign CPU_DI = ((OE_RAM_B) ? RAM_B_DO : {pDataWidth{1'bZ}});
always @(posedge Clk)
begin
    if(WE_RAM_C)
        RAM_C[Addrs[8:0]] <= CPU_DO;
    RAM_C_DO <= #1 RAM_C[Addrs[8:0]];
end
assign CPU_DI = ((OE_RAM_C) ? RAM_C_DO : {pDataWidth{1'bZ}});
This approach may or may not help, but it is the technique that I use when I want to connect a varying number of modules/components together. The approach (1) is automatically transformed into cascaded AND-OR gates as discussed above, (2) automatically accounts for all of the connections, and (3) keeps me from having to define the multiplexer and/or the variable-width OR manually.
Michael A.
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Nowadays I just use the 'wor' signal type. It implements a wired OR, so you can simply drive the signal from multiple modules. I design every module so that it outputs '0' when not selected. Usually this can be done with a minimum of resources: the block RAMs already have a dedicated signal to reset the output, and flip-flops have reset inputs as well. Even 4-input LUTs that implement a full adder can use a spare input on the LUT to produce a zero output. The tools will automatically infer a wide OR gate to combine the signals, which is cheaper than a general MUX because it doesn't require any select signals. So a 6-input LUT can be a 6-input OR, rather than a 4-to-1 MUX.
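A minimal sketch of the wired-OR bus described above follows. It assumes the synthesizer accepts `wor` nets, which Arlet reports for his flow. The module and port names are hypothetical. Here each source is zeroed with an explicit conditional. In a real design, the zero could instead come from the block RAM output reset or a flip-flop reset, as Arlet describes.

```verilog
// Wired-OR bus: every source drives the same 'wor' net, and the multiple
// continuous drivers are resolved as a bitwise OR. Each source forces its
// contribution to 0 when not selected, so only the selected one shows up.
module wor_bus_demo (
    input  wire [15:0] ram1_q, ram2_q, rom_q,
    input  wire        ram1_cs, ram2_cs, rom_cs,
    output wire [15:0] cpu_di
);
    wor [15:0] data_bus;

    assign data_bus = ram1_cs ? ram1_q : 16'h0000;
    assign data_bus = ram2_cs ? ram2_q : 16'h0000;
    assign data_bus = rom_cs  ? rom_q  : 16'h0000;

    assign cpu_di = data_bus;
endmodule
```

Adding a source is one more `assign data_bus = ...` line. No select encoding or OR expression has to be maintained by hand.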
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Michael, thanks for your input. Currently I'm going with what I know has worked in my past projects.
So, I'm at the point where everything passes synthesis, although the SRAM interface module is not complete. It's not passing implementation with a vague 'null BMM file' error. ISE seems to be looking for a BMM file although I have INIT_FILE ("boot.coe") in the ROM1 module.
I have only used COE files in the past, but that was after generating the ROM with the lightbulb CoreGenerator tool. Now I am strictly using Verilog and instantiating a BRAM using RAMB16BWER. Is there any way to change a setting so ISE looks for a COE file for the initialization?
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
I think I figured it out. I had to use CoreGen to generate the 1Kx16 stack and ZP RAMs and the 4Kx16 ROM. Then I used the same names (ZPRAM, STRAM, SYSROM) as in CoreGen and assigned the pins in the Verilog top level.
ISE automatically linked to the .XCO files inside the project.
Code: Select all
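// rsta resets the output latch, forcing douta to 0, so it is driven with the
// inverse of the chip select (assumes active-high CS signals).
ZPRAM XLXI_7 (.clka(clk),
              .rsta(~ram1CS),
              .ena(1'b1),
              .wea(ram1WE),
              .addra(cpuAB[9:0]),   // 1K words -> 10 address bits
              .dina(cpuDO),
              .douta(ram1DO));
STRAM XLXI_8 (.clka(clk),
              .rsta(~ram2CS),
              .ena(1'b1),
              .wea(ram2WE),
              .addra(cpuAB[9:0]),   // 1K words -> 10 address bits
              .dina(cpuDO),
              .douta(ram2DO));
SYSROM XLXI_9 (.clka(clk),
               .rsta(~rom1CS),
               .ena(1'b1),
               .addra(cpuAB[11:0]), // 4K words -> 12 address bits
               .douta(rom1DO));
Now the framework is there for a complete system with all modules present and passing synthesis and implementation (for the most part).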
Now it's time to work on the SRAMif, the interface module for the SyncRAM. At the completion of this module, hopefully I will have something running!
EDIT: I forgot about the address decoding. Is it as simple as something like this?
Code: Select all
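always @*
    // SyncRAM window 0x8000_0000-0x801F_FFFF (2M locations, 2M-aligned)
    if ( cpuAB >= 32'h8000_0000 && cpuAB <= 32'h801f_ffff )
        SRCS = 1'b1;
    else
        SRCS = 1'b0;
// Equivalent aligned-window decode, with SRCS declared as a wire instead:
// assign SRCS = (cpuAB[31:21] == 11'h400);
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards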
EEyE:
You should be able to infer the Spartan-6 block RAMs in either single- or dual-port configurations. The following code snippet (taken from M65C02_Core.v) shows how to use the initial statement to initialize the block RAM (whether inferred as a ROM or a RAM) without having to rely on CoreGen.
The system function $readmemb() or $readmemh() can be used. I built a "DOS" utility program, available on GitHub, that takes the binary output of the Kingswood A65 assembler and outputs an ASCII hex file padded out to a power-of-two size.
Code: Select all
// Infer Microprogram ROM and initialize with file created by MCP_Tool
initial
$readmemb(pM65C02_uPgm, uP_ROM, 0, (pROM_Depth - 1));
always @(posedge Clk)
begin
if(MPC_En | Rst)
uPL <= #1 uP_ROM[MA];
end
With this approach, I've embedded copyright notices and other initialization data in FIFOs and other memories in FPGAs that I ship.
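Below is a self-contained sketch of the same technique. It assumes a hypothetical `boot_rom` module (sized 4Kx16 like SYSROM) and a hex file with one word per line, passed in through a hypothetical `INIT_FILE` parameter. The registered read is what lets the synthesizer map the array onto a block RAM. The `INIT_FILE != ""` guard only allows the module to be simulated without a file.

```verilog
// Inferred, initialized ROM without CoreGen: the initial block loads the
// contents from a hex file, and the synchronous read allows block RAM inference.
module boot_rom #(
    parameter ADDR_W    = 12,   // 4K words
    parameter DATA_W    = 16,
    parameter INIT_FILE = ""    // hex file, one word per line (hypothetical name)
) (
    input  wire              clk,
    input  wire              en,
    input  wire [ADDR_W-1:0] addr,
    output reg  [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] rom [0:(1 << ADDR_W) - 1];

    initial
        if (INIT_FILE != "")
            $readmemh(INIT_FILE, rom);

    // Registered read: dout updates on the clock edge only while enabled.
    always @(posedge clk)
        if (en)
            dout <= rom[addr];
endmodule
```

Instantiated as, for example, `boot_rom #(.INIT_FILE("boot.hex")) rom1 (...)`, this replaces the CoreGen .COE flow with a plain Verilog file.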
Michael A.
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Thanks for the pointer. Trying to find info about readmem led me to the Xilinx XST manual. I've never read this manual before.
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
MichaelM wrote:
...An alternative is to define a single bus. Let's name it something like EEyE has named the output of the OR gate, CPU_DI. At each memory, I would connect its output data to CPU_DI using a tri-state construction:
Code: Select all
assign CPU_DI = ...
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
There's no conflict, because the CPU_DI drivers are all 3-state. That's what the conditional assignment with 1'bZ is for: it drives 'Z' on the bus when the OE signal is not asserted. So only one driver (assuming no mistakes) drives the bus, and the others are all 'Z', so they don't conflict.
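A small simulation-level sketch shows how the multiple drivers resolve. It uses a hypothetical module and port names and an 8-bit bus for brevity. Each driver releases the plain `wire` with `Z` when its enable is low. With one driver enabled, the net takes that driver's value. With none enabled, it floats to `z`. With two enabled drivers that disagree, it goes to `x`, which is the "mistake" case. On devices without internal tri-state buffers, synthesis has to turn this into multiplexer logic instead.

```verilog
// Two tri-state drivers on one net: a disabled driver outputs Z and so does
// not contribute; the simulator resolves the net to the single active driver.
module tristate_bus_demo (
    input  wire [7:0] a_q, b_q,
    input  wire       a_oe, b_oe,
    output wire [7:0] bus
);
    assign bus = a_oe ? a_q : 8'bz;
    assign bus = b_oe ? b_q : 8'bz;
endmodule
```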
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
But the tools will not make an internal tri-state data bus, will they? Maybe an older version of ISE would?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
It depends on the device. If the device has internal tri-state buses, the tools can use them. The Spartan-3 and Spartan-6 don't have these buses, so the tools (even older versions) will convert the logic to a suitable mux.
That's why I suggested using a wired OR instead. A wide OR uses fewer resources (and is faster) than a MUX. Depending on the situation, you may have to use some extra logic to produce a '0' when the device is not selected, but very often this logic is free.
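Here is one way to produce the '0' for an inferred synchronous RAM, sketched with hypothetical names. Because the block RAM output is registered, the zeroing is done in the same output register, on the same clock edge as the read. As a result, the output is 0 exactly in the cycle where unselected data would otherwise have appeared. The `else` branch behaves like a block RAM's synchronous output reset. Whether a given synthesizer maps it onto that pin rather than extra logic is an assumption, not something verified here.

```verilog
// Synchronous RAM whose registered output reads 0 when not selected, so it
// can be ORed (or 'wor'-ed) onto the CPU data-in bus with no mux.
module zeroing_ram #(parameter AW = 10, parameter DW = 16) (
    input  wire          clk,
    input  wire          cs,     // active-high chip select
    input  wire          we,
    input  wire [AW-1:0] addr,
    input  wire [DW-1:0] din,
    output reg  [DW-1:0] dout
);
    reg [DW-1:0] mem [0:(1 << AW) - 1];

    always @(posedge clk) begin
        if (cs && we)
            mem[addr] <= din;
        // Registered read when selected; otherwise clear the output register.
        if (cs)
            dout <= mem[addr];
        else
            dout <= {DW{1'b0}};
    end
endmodule
```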
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
I've not yet attempted to use Arlet's recommendation: using a signal whose type is declared as wor. In the past, my concern over making the RTL work led me to avoid wor, wand, and several other signal types not generally associated with synthesizable code. Habits develop, but synthesizers evolve. I just never went back and tried some of these other signal types to see if the synthesizer accepted them.
The code example I provided will be automatically reduced to the multiple-input multiplexer that you drew in your block diagram. I was attempting to demonstrate a technique that I use to avoid having to construct the multiplexer manually. If an unknown or variable number of signals connect to the "bus", then an explicit multiplexer has to be adjusted by hand whenever you add data sources to the bus or remove them from it.
My suggested approach relieves you from keeping track of this. Arlet's suggested mechanism does the same thing. The point of using these approaches is to let the tool automatically construct the multiplexer.
Michael A.
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
MichaelM wrote:
...My suggested approach relieves you from keeping track of this. Arlet's suggested mechanism does the same thing. The point of using these approaches is to let the tool automatically construct the multiplexer.
Code: Select all
assign cpuDI = ( ram1DO | ram2DO | rom1DO | SRDO );