Concept & Design of 3.3V Parallel 16-bit VGA Boards

Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by Arlet »

ElEctric_EyE wrote:
I've been looking over your code. It's sort of difficult to understand, but it looks like it wouldn't be too hard to modify it to read from the RAM. For a simple test, is it as straightforward as assigning the x and y pixel counters as the RAM address in the main.v module, then setting fifo_data <= SRAMdata?
To display from SRAM, you don't even need x/y pixel counters, just an SRAM address counter. When vtrigger is asserted, set address <= 0. Whenever the fifo is not full, read from SRAM at the current address and increment the address. When the data comes back from the SRAM, write it to the fifo. When the address reaches 640*480, wait until the next vtrigger.
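The address counter described above can be sketched in Verilog as follows. This is a minimal illustration under assumptions, not code from the thread: the module and port names (`frame_addr_gen`, `rd_en`) are hypothetical, and it assumes one 16-bit word per pixel stored at addresses 0 to 640*480-1. It only issues the reads; the returning data still has to be written into the FIFO once the SRAM latency has elapsed.

```verilog
// Minimal sketch of a frame address counter for the SRAM-to-FIFO path.
// Assumptions (not from the original design): one 16-bit word per pixel,
// stored at addresses 0 .. FRAME_WORDS-1; all names are hypothetical.
module frame_addr_gen #(
    parameter FRAME_WORDS = 640 * 480
) (
    input             clk,                  // main 100 MHz clock
    input             vtrigger,             // pulse: start fetching a new frame
    input             fifo_full,            // FIFO cannot accept more data
    output reg [20:0] addr = FRAME_WORDS,   // idle until the first vtrigger
    output            rd_en                 // high on cycles where a read is issued
);

    // All words of the current frame have been requested.
    wire frame_done = (addr == FRAME_WORDS);

    // Issue one read per clock while the FIFO has room and the frame
    // is not finished; vtrigger takes priority and restarts the frame.
    assign rd_en = !vtrigger && !frame_done && !fifo_full;

    always @(posedge clk)
        if (vtrigger)
            addr <= 0;
        else if (rd_en)
            addr <= addr + 1;

endmodule
```

`rd_en` marks the cycles on which an address is presented to the SRAM; the matching data arrives a few cycles later, and only then should it be written into the FIFO.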

The nice part about having the fifo is that you don't have to worry about how many cycles it takes to get the data from the SRAM, so you could put a memory controller in there to allow CPU access to the SRAM as well.
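One way to picture "a memory controller in there" is a fixed-priority multiplexer in front of the SRAM: the video reader takes a cycle whenever it needs one, and the CPU gets every other cycle. The sketch below is an assumption, not code from this thread: all port names are hypothetical, the data-bus direction control is left out, and the CPU is assumed to hold `cpu_req` until it sees `cpu_grant`.

```verilog
// Minimal sketch of sharing one SRAM port between the video reader and a CPU.
// Hypothetical names; the video side has fixed priority, and the CPU is
// assumed to keep cpu_req asserted until cpu_grant is seen.
module sram_mux (
    input  [20:0] video_addr,  // address from the frame address counter
    input         video_rd,    // video reader wants the SRAM this cycle
    input  [20:0] cpu_addr,
    input         cpu_req,     // CPU wants the SRAM
    input         cpu_we,      // CPU access is a write
    output [20:0] sram_addr,
    output        sram_we_n,   // active-low write enable to the SRAM
    output        cpu_grant    // CPU access is performed this cycle
);

    // The video FIFO gets first pick; the CPU uses every leftover cycle.
    assign cpu_grant = cpu_req && !video_rd;
    assign sram_addr = video_rd ? video_addr : cpu_addr;

    // Only a granted CPU write may drive the write enable low.
    assign sram_we_n = !(cpu_grant && cpu_we);

endmodule
```

Because the video side only asks for cycles while the FIFO has room, the CPU gets the SRAM whenever the FIFO is full or the whole frame has already been fetched.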
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

Arlet wrote:
...The nice part about having the fifo is that you don't have to worry about how many cycles it takes to get the data from the SRAM, so you could put a memory controller in there to allow CPU access to the SRAM as well.
Yeah, that's awesome. Thanks!

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by Arlet »

Note that in addition to SRAM latency, you have I/O delay on the pads, and delay on the board. Depending on the clock, you'll be looking at 1-2 cycles between setting the address, and receiving the data. Of course, when your SRAM is just filled with random data, a timing error will be hard to notice :)
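A common way to absorb that 1-2 cycle delay is to pass the "read issued" strobe through a short shift register, so that the FIFO write enable lines up with the data actually arriving at the pins. This is a minimal sketch under assumptions: `LATENCY` is a hypothetical parameter that has to be found per board (for example by trying 1 and 2 with a known test pattern in the SRAM), and the module and port names are illustrative.

```verilog
// Minimal sketch: delay the "read issued" strobe by the SRAM round-trip
// latency so the FIFO write lines up with the returning data.
// LATENCY is an assumption, to be tuned per board (>= 1).
module read_valid_pipe #(
    parameter LATENCY = 2
) (
    input         clk,
    input         rd_en,       // an address was presented to the SRAM this cycle
    input  [15:0] sram_data,   // data arriving from the SRAM pins
    output        fifo_write,  // write enable into the FIFO
    output [15:0] fifo_data
);

    reg [LATENCY-1:0] pipe = 0;
    integer i;

    // Shift rd_en through LATENCY flip-flops.
    always @(posedge clk) begin
        pipe[0] <= rd_en;
        for (i = 1; i < LATENCY; i = i + 1)
            pipe[i] <= pipe[i-1];
    end

    // The write strobe appears LATENCY clocks after the matching read.
    assign fifo_write = pipe[LATENCY-1];
    assign fifo_data  = sram_data;

endmodule
```

Because up to `LATENCY` reads can be in flight when the FIFO fills, the flag that stops the address counter should be an "almost full" flag with at least `LATENCY` words of headroom; otherwise the last returning words are dropped.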

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

Is it OK to post some of your code that I've modified?

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by Arlet »

Sure...

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

I spent a few hours trying to modify the main module to use the pixel clk as the clock for the RAM's address counter. It only seemed to stretch the pixels horizontally. I also did a ton of other experiments that failed in similar ways, so I decided to start with a small modification and learn at a slower pace, with guidance...

This piece of code is the bottom part of the main.v module, where it sends data to the fifo. The data is obviously not delayed relative to the address generator. Of even more concern to me is that the address counter runs on the 100MHz clock. On screen it looks OK, though: no flickering, correct pixel sizes, etc.

Code:

/*
 * only write fifo during active pixels
 */
assign fifo_write = !ydone;

/*
 * SyncRAM address generator
 */
parameter a=3,b=4;					//block size

always @(posedge clk)
	if( vtrigger )
		SRA <= 0;
	else if( !fifo_full & (x[b] ^ y[b]) )
		SRA <= SRA + 1;

/*
 * demo test output
 */
always @*
    if( y < 8 || y > 472 || x < 8 || x > 632 )
        fifo_data = 16'b00000_000000_11111;	// blue border
    else if( x[a:0] == 0 || y[a:0] == 0 )
        fifo_data = 16'b11111_000000_00000;	// red lines
    else if( x[b] ^ y[b] )
        fifo_data = SRD;							// RAM data
    else
        fifo_data = 16'b00000_000000_00000;	// black squares
endmodule
This next version seems more correct to me, because I've delayed the data going into the fifo by one cycle of the 100MHz clk relative to the address generated on pclk0, but the pixels are stretched horizontally. When I use pclk0 instead of pclk1 to clock the fifo data, I do see flickering:

Code:

/*
 * only write fifo during active pixels
 */
assign fifo_write = !ydone;

/*
 * SyncRAM address generator
 */
parameter a=3,b=4;					//block size

reg pclk1;

always @(posedge clk) begin
	pclk1 <= pclk0;
end

always @(posedge pclk0)
	if( vtrigger )
		SRA <= 0;
	else if( !fifo_full & (x[b] ^ y[b]) )
		SRA <= SRA + 1;

/*
 * demo test output
 */
always @(posedge pclk1)
    if( y < 8 || y > 472 || x < 8 || x > 632 )
        fifo_data <= 16'b00000_000000_11111;	// blue border
    else if( x[a:0] == 0 || y[a:0] == 0 )
        fifo_data <= 16'b11111_000000_00000;	// red lines
    else if( x[b] ^ y[b] )
        fifo_data <= SRD;						// RAM data
    else
        fifo_data <= 16'b00000_000000_00000;	// black squares
endmodule

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by Arlet »

You should only use the main 100 MHz clock to write data into the FIFO. By writing the data at 100 MHz and reading it at 25 MHz, you can quickly fill the FIFO, which lets you pause and do other processing later. That's one of the benefits of using the FIFO. So your first block of code is correct, except that it ignores the delay between setting the address and the read data becoming valid (which would be tricky to fix if you wanted to keep the square pattern superimposed on the bitmap).
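To put rough numbers on "quickly fill the FIFO ... pause for processing": with writes at f_w = 100 MHz, reads at f_r = 25 MHz while pixels are being displayed, and a FIFO depth D, the fill and pause times work out as below. The depth D = 512 words is only an assumed example; the actual depth of the FIFO in the vga module isn't stated in this thread.

$$
\begin{aligned}
t_{\text{fill}} &= \frac{D}{f_w - f_r} = \frac{512}{75\ \text{MHz}} \approx 6.83\ \mu\text{s} \\
t_{\text{pause}} &= \frac{D}{f_r} = \frac{512}{25\ \text{MHz}} = 20.48\ \mu\text{s} = 2048 \text{ cycles at } 100\ \text{MHz}
\end{aligned}
$$

Under these assumptions, once the FIFO is full the write side can spend up to about 2048 main-clock cycles on something else (such as CPU accesses) before the display runs dry.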

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

Arlet wrote:
...(which would be tricky to fix if you wanted to keep the square pattern superimposed on the bitmap)
No, there's no point to that. I just need to learn proper timing for reading only from the SyncRAM at this point. I'll work on the more correct version, thanks.

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

Oh, I'm beginning to understand now. Sorry, I'm old-school and thought we had to present the pixel clock to the RAM. So we run the SyncRAM at 100MHz for the FIFO's sake, and the FIFO takes care of delivering the data at the other end at 1/4 that rate. So "the faster the SyncRAM, the better" is, no doubt, a hard and fast rule.

So with a main clock of 100MHz, system throughput will drop as the pixel clock approaches the main clock...
Remember the old days, when graphics RAM operations were limited to the VSync & HSync intervals? Keeping that lower performance in mind, what ratio of main clock to pixel clock would give similar performance in a FIFO-type system? Any estimations or real world experience?

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

This is what I had previously. I thought it would be cool to show the full code and some pics.

Code:

/*
 * top level module
 *
 * (C) Arlet Ottens <arlet@c-scape.nl>
 *
 */

module main( 
    input clk100,
    input  [15:0] SRD,              //SyncRAM data
    output pclk_out,
    output SRCLK,                   //SyncRAM clock
    output [4:0] red,
    output [5:0] green,
    output [4:0] blue,
    output vsync,
    output hsync,
    output reg DACBLANKn = 1,
    output reg SRCS = 1,            //SyncRAM CS, active high
    output reg WEn = 1,             //SyncRAM WE, active low
    output reg [20:0] SRA           //SyncRAM Address
    );

wire [15:0] rgb;
assign blue = rgb[4:0];
assign green = rgb[10:5];
assign red = rgb[15:11];
wire fifo_write;
wire fifo_full;
reg [15:0] fifo_data;
wire pclk0;
wire vtrigger;

/* pixel clock output using DDR flipflop */
ODDR2 ODDRA (
    .Q(pclk_out),
    .C0(pclk),
    .C1(~pclk),
    .CE(1'b1),
    .D0(1'b1),
    .D1(1'b0),
    .R(1'b0),
    .S(1'b0)
    );

/* SyncRAM Clock output also using DDR flipflop */
ODDR2 ODDRB (
    .Q(SRCLK),
    .C0(pclk),
    .C1(~pclk),
    .CE(1'b1),
    .D0(1'b1),
    .D1(1'b0),
    .R(1'b0),
    .S(1'b0)
    );
	 
wire dcm_clk100;
wire clk;

/* clock buffers */
IBUFG IBUFG_clk( .I(clk100), .O(dcm_clk100) );
BUFG BUFG_clk( .I(dcm_clk100), .O(clk) );
BUFG BUFG_PCLK( .I(pclk0), .O(pclk) );

/* Use DCM to generate 25 MHz VGA pixel clock from 100 MHz main clock */

DCM_SP #(
         .CLKDV_DIVIDE(4.0),
         .CLKFX_DIVIDE(8),
         .CLKFX_MULTIPLY(2),
         .CLKIN_DIVIDE_BY_2("FALSE"),
         .CLKIN_PERIOD(10.0),
         .CLKOUT_PHASE_SHIFT("FIXED"),
         .CLK_FEEDBACK("1X"),
         .DESKEW_ADJUST("SYSTEM_SYNCHRONOUS"),
         .DLL_FREQUENCY_MODE("LOW"),
         .DUTY_CYCLE_CORRECTION("TRUE"),
         .PHASE_SHIFT(0),
         .STARTUP_WAIT("FALSE")
) DCM_SP_inst (
        .CLKFX(pclk0),      // 0 degree DCM CLK output
        .CLKFB(pclk),      // DCM clock feedback
        .PSEN(1'b0),       // no variable phase shift
        .CLKIN(dcm_clk100),       // Clock input (from IBUFG, BUFG or DCM)
        .RST(1'b0)
);

/*
 * VGA generator
 */
vga vga( 
	.clk(clk),
	.pclk(pclk),
	.hsync(hsync),
	.vsync(vsync),
	.fifo_data(fifo_data),
	.fifo_write(fifo_write),
	.fifo_full(fifo_full),
	.rgb(rgb) ,
	.vtrigger(vtrigger)
   	);

/*
 * when vtrigger is pulsed, generate new frame by sending 640x480 pixels
 * to FIFO.
 */

reg [11:0] x = 0;
reg [10:0] y = 0;

wire xdone = (x == 639) && !fifo_full;
wire ydone = (y == 480);

/*
 * count x, reset at end of line, and pause when FIFO is full
 */
always @(posedge clk)
    if( vtrigger || xdone )
        x <= 0;
    else if( !fifo_full ) 
        x <= x + 1;
		
/*
 * count y, reset at start of new frame, and increment at end
 * of line. Pause when FIFO is full.
 */
always @(posedge clk)
    if( vtrigger ) 
        y <= 0;
    else if( xdone && !ydone )
        y <= y + 1;

/*
 * only write fifo during active pixels
 */
assign fifo_write = !ydone;

/*
 * SyncRAM address generator
 */
parameter a=3, b=4;					//block size

always @(posedge clk)
	if( vtrigger )
		SRA <= 0;
	else if( !fifo_full & (x[b] ^ y[b]) )
		SRA <= SRA + 1;

/*
 * demo test output using external SyncRAM input
 */
always @*
    if( y < 1 || y > 479 || x < 1 || x > 639 )
        fifo_data = 16'b00000_000000_11111;	// blue border
    else if( x[a:0] == 0 || y[a:0] == 0 )
        fifo_data = 16'b11111_000000_00000;	// red lines
    else if( x[b] ^ y[b] )
        fifo_data = SRD;							// RAM data
    else
        fifo_data = 16'b00000_000000_00000;	// black squares
endmodule
Image
CLOSE-UP - (Sorry about the focus):
Image

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by Arlet »

ElEctric_EyE wrote:
Any estimations or real world experience?
Hard to give general statements, because it all depends on what you want to do: resolution, clock frequencies and access patterns. In general, you calculate how much time you have based on the VGA timings, and how much time you need based on the amount of data and the access times.
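As a worked example of that calculation, assume standard 640x480@60 timing (800 x 525 pixel clocks per frame including blanking), the 25 MHz pixel clock f_p and 100 MHz main clock f_clk used in this thread, and an SRAM that completes one access per main-clock cycle. These are assumptions; the exact totals depend on the vga module's timing parameters.

$$
\begin{aligned}
T_{\text{frame}} &= \frac{800 \cdot 525}{25\ \text{MHz}} = 16.8\ \text{ms} \\
N_{\text{clk}} &= T_{\text{frame}} \cdot 100\ \text{MHz} = 1\,680\,000 \\
N_{\text{video}} &= 640 \cdot 480 = 307\,200 \\
\frac{N_{\text{video}}}{N_{\text{clk}}} &= \frac{640 \cdot 480}{800 \cdot 525} \cdot \frac{f_p}{f_{\text{clk}}} \approx 0.731 \cdot 0.25 \approx 18.3\%
\end{aligned}
$$

That leaves roughly 1,372,800 cycles (about 81.7%) per frame for other accesses. For comparison, a blanking-only scheme gives the CPU 1 - 0.731, about 26.9%, of the frame, while under the same assumptions the FIFO scheme gives 1 - 0.731 f_p / f_clk. The two break even at f_clk = f_p, and the FIFO scheme comes out ahead at any higher main-clock to pixel-clock ratio.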

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

Ok. Another question regarding fifo usage: Can we make the fifo clock any frequency we want? I would prefer to try to max it out, with a PLL if necessary.

So I think it would probably be best to use another clock output from the current DCM, phase-shifted by a certain amount, as the fifo clock to compensate for the RAM access + FPGA delay. I'm done with the pattern generator. And would a CPU-type memory interface have to be made part of main.v, since the addresses to the SyncRAM would have to be multiplexed?

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by Arlet »

Yes, you can make the fifo clock (or main clock, as you'd probably want to use it for everything else too) as high as you like, as long as your design meets timing. For my SDRAM module, I made a phase-shifted clock, but I think a better approach is to use the same clock for the CPU as for the SRAM. Any delay that's less than a whole clock cycle can be fixed in the IO block by setting an input delay. The delay should be tuned so that the incoming data is stable during the setup/hold window around the clock edge. This tuning can be done by hand: find the minimum and maximum delay where it still works, and then pick the value in the middle.

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by ElEctric_EyE »

Arlet wrote:
... Any delay that's less than a whole clock cycle can be fixed in the IO block by setting an input delay...
I've read about this elsewhere. After some quick reading, it appears one can set it in the code, right before the module declaration, using:

Code:

(* IOBDELAY = "{NONE|BOTH|IBUF|IFD}" *)
or from within the .ucf using:

Code:

INST "instance_name" IOBDELAY = {NONE|BOTH|IBUF|IFD};
I would prefer to do it in the module. Then, to set the delay value (0-16) for the IBUF path (the unregistered input into the fabric):

Code:

(* IBUF_DELAY_VALUE = "value" *) input top_level_port_name;
and for the IFD path (the input flip-flop in the IOB), a value of 0-8:

Code:

(* IFD_DELAY_VALUE = "value" *) input top_level_port_name;
I found this information in the Xilinx constraints guide UG625, v13.4.

There appear to be no provisions for setting a delay on an output pin.
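For the IFD delay to apply, the SRAM data has to be captured in the input flip-flops inside the I/O blocks, which means registering it directly on the clock that drives the SRAM. A minimal sketch, assuming XST and the `IOB` constraint from the same constraints guide; the module name is hypothetical, and it is worth confirming in the map report that the registers were actually packed into the IOBs.

```verilog
// Minimal sketch (hypothetical module name): register the SRAM data bus on
// the same clock that drives the SRAM. The IOB attribute asks XST to place
// these flip-flops in the input blocks, where the IFD delay takes effect.
// Simulators ignore the attribute.
module sram_data_capture (
    input         clk,    // main clock, also forwarded to the SRAM
    input  [15:0] SRD,    // SyncRAM data pins
    output [15:0] srd_q   // registered data, valid one clock after the pins
);

    (* IOB = "TRUE" *) reg [15:0] srd_iob = 16'h0000;

    always @(posedge clk)
        srd_iob <= SRD;

    assign srd_q = srd_iob;

endmodule
```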

Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards

Post by Arlet »

ElEctric_EyE wrote:
There appear to be no provisions for setting a delay on an output pin.
Correct, but usually you don't need that. In case you do need a delay, you could use a DCM to produce a delayed clock, and use that to drive the output pad.
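For reference, the size of a fixed DCM phase shift is set by the `PHASE_SHIFT` attribute, which the DCM_SP instance in main.v above already sets to 0 with `CLKOUT_PHASE_SHIFT("FIXED")`. According to Xilinx's DCM documentation for the Spartan-3 and Spartan-6 families (worth confirming in the user guide for the exact device), in fixed mode the shift is a fraction of the input clock period:

$$
t_{\text{shift}} = \frac{\texttt{PHASE\_SHIFT}}{256} \cdot T_{\text{CLKIN}}, \qquad -255 \le \texttt{PHASE\_SHIFT} \le 255
$$

With the 100 MHz input (T_CLKIN = 10 ns), `PHASE_SHIFT(64)` would delay the DCM outputs by 64/256 x 10 ns = 2.5 ns, a quarter of a period.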