Concept & Design of 3.3V Parallel 16-bit VGA Boards
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
So a Block RAM should be used to store up to 1024 lines x 56 bits of line info? Data transfer to this BRAM could happen during every screen vertical refresh (vblank & vsync)?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Yes, that's one way to do it. And because the BRAM is dual ported, this is quite easily implemented. Connect one port to the line module, and use the other port to write during the blanking interval.
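A minimal sketch of the dual-ported arrangement described above, assuming a single common clock and the 1024 x 56-bit size mentioned in the question. The module name `line_bram` and all port names are hypothetical, chosen only for illustration; they are not taken from the actual project code.

```verilog
// Hypothetical line-parameter store: 1024 entries x 56 bits.
// One port writes (CPU/loader side), the other reads (line module side).
module line_bram (
    input  wire        clk,
    // Write port, intended to be used during the blanking interval
    input  wire        we,      // write enable, e.g. gated with a blanking signal
    input  wire [9:0]  waddr,
    input  wire [55:0] wdata,
    // Read port, used by the line-drawing module
    input  wire [9:0]  raddr,
    output reg  [55:0] rdata
);
    reg [55:0] mem [0:1023];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        rdata <= mem[raddr];    // registered read, the usual pattern for block RAM inference
    end
endmodule
```

Gating `we` with the blanking signal outside this module keeps the line module's reads from racing against updates while the frame is being drawn.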
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Some progress with Bresenham, but still a few bugs. Only the first frame works, because I don't reset the per-vector state. Also, as you can see in the picture, the red lines have some weird effects on the left side.
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Nice progress! How many lines total? ~70?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
And they flicker if you scroll the display back and forth quickly.
Pretty cool.
Michael A.
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
ElEctric_EyE wrote:
Nice progress! How many lines total? ~70?
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
I got my Christmas present early from work and was told to 'hit the road Jack', so I spent last week finding a new job and have been away from the project. Now I have the next 4 days off and wish to forget about my situation.
Arlet, any chance we can see your new code even though it may not be fully functional?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
I just put it up on github. I fixed some of the bugs, but there are still some things not working properly. I've been too busy with work last week, so I haven't done much Verilog lately.
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
At the risk of sounding self-important, I wish to say that I am on a firm path to get my old job back, so stress levels are beginning to decrease. 
Soon, I hope to get back into the project and make a useful contribution with a fresh point of view.
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Thanks Ed. 
I think the next step is to assemble another parallel video board and then to experiment with the digital mixing, even if this means plotting some pixels by plotting a scrolling character and sending data to be mixed into the next board. The next board would scroll another character of a different color 'through' the previous video board's output. Monochrome is easy IIRC, one can just use the XOR math function.
I'm going to continue on my initial path of utilizing the external SyncRAM to store the bitmap, also add the 65Org16.b CPU into the design and a bi-directional data path to the SyncRAM. It will be interesting to observe the top speed limitations.
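The monochrome XOR mixing mentioned above could look something like the following sketch. It assumes each board passes a single-bit pixel to the next one; the module name `mono_mix` and its port names are hypothetical.

```verilog
// Hypothetical 1-bit pixel mixer for two daisy-chained video boards.
module mono_mix (
    input  wire upstream_pixel,  // pixel arriving from the previous board
    input  wire local_pixel,     // pixel generated on this board
    output wire mixed_pixel      // pixel passed on to the DAC or the next board
);
    // XOR: a pixel lit on exactly one board stays lit,
    // and where both boards are lit the result goes dark.
    assign mixed_pixel = upstream_pixel ^ local_pixel;
endmodule
```

With XOR, overlapping characters show up as an inverted region, which makes the overlap easy to spot on screen. An OR would instead show the union of the two images.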
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
So my first step was to do a block diagram with all signals and signal directions present. I tried to just start typing in the Verilog for the top-level interconnects, but found there were just too many signals to lay it all out from memory.
The idea for the 'initial' operation is for the CPU to be able to read and write to the external synchronous RAM during the horizontal and vertical retrace periods. All the square blocks will be the individual Verilog modules, although I may have to add another module to accommodate the bidirectional data bus of the SRAM. Also, I expect the timing will be off as I use a common clock for everything, with no provisions for the delay of the FPGA and SRAM, but I would expect to see some recognizable activity if the .b core software is actually running. I will just try to clear the video RAM for the first test.
This part of the project is basically a compilation of everything I've learned from previous FPGA projects using 6502 soft cores with blockRAM together with what I've most recently learned about video using the parallel video boards.
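For the bidirectional SRAM data bus mentioned above, a common idiom is a small wrapper around the `inout` pins. The sketch below is only an assumption about how such a module might look; `sram_dbus`, `drive`, `dout` and `din` are hypothetical names, and the width defaults to 16 bits to match the boards discussed in this thread.

```verilog
// Hypothetical wrapper for a bidirectional external SRAM data bus.
// Tri-state is fine here because these are real I/O pins, not internal nets.
module sram_dbus #(parameter W = 16) (
    inout  wire [W-1:0] sram_d,  // pins wired to the SRAM data lines
    input  wire         drive,   // 1 = FPGA drives the bus (write cycle)
    input  wire [W-1:0] dout,    // data from the CPU toward the SRAM
    output wire [W-1:0] din      // data from the SRAM toward the CPU
);
    assign sram_d = drive ? dout : {W{1'bz}};  // release the bus when not writing
    assign din    = sram_d;                    // always observe the pins
endmodule
```

The `drive` input would come from whatever logic decides a write cycle is in progress. The SRAM's own output enable must be off whenever `drive` is high so the two sides never fight over the bus.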
-
ElEctric_EyE
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
I've added the 2 blockRAMs for the .b core zero-page and stack, but not the ROM yet. R1 and R2 are the FPGA blockRAM databus outputs. Also, that ORs module in the above pic turned out to be an extremely simple line of code inside the top_level module, so a separate ORs module wasn't even needed. This is what attracts me to Verilog: building the equivalent structure with schematic entry, ORing four 16-bit databus inputs into 16 outputs, was a PITA. Now, using Verilog, it takes 1 simple line of code!
Code: Select all
assign cpuDI = R1Dout | R2Dout | VramDout;
So, I've yet to add the ROM and address decoding module and to finish the SRAMif module before some real world testing can begin. I plan on 1024x768 with a 70MHz system clock.
Also, screw that part of trying to write to the video RAM when HSYNC & VSYNC were inactive, only because the CPU would have to be interrupted or it would have to read some bits from a port when HSYNC or VSYNC was inactive. At this point the 'snow effect' should be acceptable before transitioning to a FIFO buffer. I will need help with this though!
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Most FPGAs no longer support internal tri-state busses. There are some respected sources that recommend against using internal tri-state buffers. I, on the other hand, believe in letting the tools transform internal tri-state busses (in FPGAs) into the appropriate implementation. I find it error-prone and tedious to specify each and every bus, and to explicitly define the required multiplexers.
EEyE's block diagram/schematic in an earlier post is an example of the structure I try to avoid. I am not saying it is incorrect, or otherwise a bad thing to do. I am simply saying, that I try to avoid having to explicitly manage the buses and the required bus multiplexers as shown in EEyE's block diagram/schematic.
To accomplish this I generally create an internal "tri-state" bus, which can't be implemented in the FPGA. This "tri-state" bus allows me to create a single common port on all modules that will connect to the "bus". In this manner, the synthesizer is implicitly increasing or decreasing the size of the OR gate that combines all of the buses together as EEyE shows in his block diagram/schematic. Because the FPGA itself has no way of implementing a "tri-state" bus, the synthesizer has to transform the Verilog "tri-state" bus into a multiplexer of some sort and use point to point connections to tie everything together.
One way the multiplexer can be constructed is illustrated by EEyE's block diagram/schematic: a module with an enable and an OR gate to collect all of the connections together. An AND gate, when not enabled, outputs a logic 0, and an OR gate only outputs a logic 0 when all of its inputs are 0. Thus, in EEyE's circuit, the three memories are mutually exclusively selected. If unselected, their outputs are forced to logic 0. The OR gate samples each of the memories' output data. Since only one is selected at a time, there is no problem correctly resolving the logic level of the output of the enabled memory.
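The AND-OR structure described above can be written out explicitly as in the sketch below. This is an illustration only: the module name `and_or_mux` and the signal names are hypothetical and are not taken from EEyE's design, and the select inputs are assumed to be mutually exclusive.

```verilog
// Hypothetical explicit AND-OR multiplexer for three memory outputs.
module and_or_mux #(parameter W = 16) (
    input  wire         sel_r1, sel_r2, sel_vram,  // at most one high at a time
    input  wire [W-1:0] r1_raw, r2_raw, vram_raw,  // ungated memory outputs
    output wire [W-1:0] cpu_di
);
    // AND stage: an unselected source is forced to all zeros
    wire [W-1:0] r1_gated   = r1_raw   & {W{sel_r1}};
    wire [W-1:0] r2_gated   = r2_raw   & {W{sel_r2}};
    wire [W-1:0] vram_gated = vram_raw & {W{sel_vram}};

    // OR stage: only the selected source can contribute ones
    assign cpu_di = r1_gated | r2_gated | vram_gated;
endmodule
```

If two selects were ever high together, the output would be the bitwise OR of both sources, so the address decoder has to guarantee exclusivity.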
An alternative is to define a single bus. Let's name it something like EEyE has named the output of the OR gate, CPU_DI. At each memory, I would connect its output data to CPU_DI using a tri-state construction:
In the example below, the OR gate is implicitly created by the synthesizer.
Code: Select all
localparam pDataWidth = 32;

wire Clk;                                   // system clock, driven elsewhere
wire [9:0] Addrs;
wire CPU_WE, CPU_RE;
wire [(pDataWidth - 1):0] CPU_DI, CPU_DO;   // CPU_DI: data into the CPU, CPU_DO: data out of the CPU

reg  CS_RAM_A, CS_RAM_B, CS_RAM_C;
wire WE_RAM_A, OE_RAM_A;
wire WE_RAM_B, OE_RAM_B;
wire WE_RAM_C, OE_RAM_C;

reg [(pDataWidth - 1):0] RAM_A [0:255];
reg [(pDataWidth - 1):0] RAM_B [0:255];
reg [(pDataWidth - 1):0] RAM_C [0:511];
reg [(pDataWidth - 1):0] RAM_A_DO, RAM_B_DO, RAM_C_DO;

// Address decode. This is combinational logic, so blocking assignments are used.
always @(*)
begin
    casez(Addrs[9:8])
        2'b00   : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b1, 1'b0, 1'b0};
        2'b01   : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b1, 1'b0};
        2'b1?   : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b0, 1'b1};
        default : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b0, 1'b0};
    endcase
end

assign WE_RAM_A = CS_RAM_A & CPU_WE;
assign WE_RAM_B = CS_RAM_B & CPU_WE;
assign WE_RAM_C = CS_RAM_C & CPU_WE;

assign OE_RAM_A = CS_RAM_A & CPU_RE;
assign OE_RAM_B = CS_RAM_B & CPU_RE;
assign OE_RAM_C = CS_RAM_C & CPU_RE;

// RAM A: 256 words. Writes take their data from CPU_DO, because CPU_DI
// is the read bus and is not driven during a write cycle.
always @(posedge Clk)
begin
    if(WE_RAM_A)
        RAM_A[Addrs[7:0]] <= CPU_DO;
    RAM_A_DO <= #1 RAM_A[Addrs[7:0]];
end

// Each memory drives the shared read bus only when its output enable is set
assign CPU_DI = ((OE_RAM_A) ? RAM_A_DO : {pDataWidth{1'bZ}});

// RAM B: 256 words
always @(posedge Clk)
begin
    if(WE_RAM_B)
        RAM_B[Addrs[7:0]] <= CPU_DO;
    RAM_B_DO <= #1 RAM_B[Addrs[7:0]];
end

assign CPU_DI = ((OE_RAM_B) ? RAM_B_DO : {pDataWidth{1'bZ}});

// RAM C: 512 words
always @(posedge Clk)
begin
    if(WE_RAM_C)
        RAM_C[Addrs[8:0]] <= CPU_DO;
    RAM_C_DO <= #1 RAM_C[Addrs[8:0]];
end

assign CPU_DI = ((OE_RAM_C) ? RAM_C_DO : {pDataWidth{1'bZ}});
This approach may or may not help, but it is the technique that I use when I want to connect a varying number of modules/components together. The approach (1) is automatically transformed into cascaded AND-OR gates as discussed above, (2) automatically accounts for all of the connections, and (3) keeps me from having to define the multiplexer and/or the variable-width OR manually.
Michael A.