1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board

Topics relating to PALs, CPLDs, FPGAs, and other PLDs used for the support or creation of 65-family processors, both hardware and HDL.
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board

Post by ElEctric_EyE »

This morning I finally tracked down the last of the problems and now have a successful READ from the SyncRAM. In this instance, the cpu sends the X coordinate, then the Y coordinate. On reception of the Y coordinate, the graphics accelerator halts the cpu, puts the pixel color info into a read-only register, and then frees up the cpu.
All of the accelerator functions are performed after it receives a Y coordinate.
So far the hardware graphics accelerator can perform:
Draw Line
Copy/Paste
Pixel Plot
Fill (simple rectangular)
8x8 Character Plot (background/foreground colors)
Pixel Color Read

What's strange is I had early success on the blitter, so I assumed I could take the READ part of that state machine and use it for another READ-only state machine, as opposed to a READ-WRITE-READ-WRITE... sequence. This was not the case, and more observation is needed to fully comprehend why. 3 weeks spent on this one!

Post by ElEctric_EyE »

I'll continue developing on the four populated boards I currently have, but I've been wanting to delve into the BGA world of Xilinx FPGAs and design a new more capable board. Earlier I've had my eye on a larger Spartan 6 1mm 256-pin BGA. I forget the price, something like $50+.
But recently I've seen the 28nm 256-pin BGA Xilinx Artix FPGA for sale for $35 for the slowest version: XC7A35T-1FTG256C. This may be the path I choose, but I do not see it available in ISE. Will do more research... New thread here.

BTW, here are the resources currently used (from that thread). The project has almost Max'd this LX9:
Attachments
output board utilization.jpg
Last edited by ElEctric_EyE on Mon Jul 28, 2014 7:54 pm, edited 1 time in total.

Post by ElEctric_EyE »

So the project had 2 16Kx16 BRAMs to store X & Y coordinates; only single-port mode was used. The cpu would send a 10-bit 'phase' value to the sin/cos LUT, and the LUT output 10-bit data back to the cpu.
The software stores 2 sine waves into the 2 BRAMs. A sine shifted by 1/4 of a period (i.e. 256/1024) is actually a cosine. These (sin,cos) values, used as (X,Y) coordinates, will plot a circle. Modifying the phase of either the X register or the Y register will make an ellipse:

Code:

                  LDA #650
                  STA XORG
                  LDA #450
                  STA YORG
                  
                  LDWi $0000             
                  LDX #0
                  LDY #256
sinwave1          STX phase
                  LDA sinout
                  CLC
                  ADC XORG
                  STAaw scratchx1        
                  STY phase
                  LDA sinout
                  CLC
                  ADC YORG
                  STAaw scratchy1        
                  CPX #1024
                  BMI XNZ1
                  LDX #0
XNZ1              INX                   
                  CPY #1024
                  BMI YNZ1
                  LDY #0
YNZ1              INY
                  INW
                  CPWi $0400             
                  BNE sinwave1
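The quarter-period trick in the loop above can be sketched in C. This is only an illustration, not the project's code: the function names and the tiny 8-entry table are made up for the example (the real LUT has 1024 entries and its exact scaling is not given in the post).

```c
#include <stdint.h>

/* Hypothetical 8-entry sine table, scaled by 1000, standing in for the
 * 1024-entry LUT described in the post. */
const int16_t demo_sin8[8] = { 0, 707, 1000, 707, 0, -707, -1000, -707 };

/* Sine lookup: the phase wraps around the table size. */
int16_t lut_sin(const int16_t *lut, unsigned size, unsigned phase)
{
    return lut[phase % size];
}

/* Cosine is the same table read a quarter period ahead
 * (256 entries ahead for a 1024-entry table). */
int16_t lut_cos(const int16_t *lut, unsigned size, unsigned phase)
{
    return lut[(phase + size / 4) % size];
}

/* X coordinate: origin plus the table value at the X phase. */
int circle_x(const int16_t *lut, unsigned size, unsigned phase, int xorg)
{
    return xorg + lut_sin(lut, size, phase);
}

/* Y coordinate: origin plus the table value a quarter period ahead. */
int circle_y(const int16_t *lut, unsigned size, unsigned phase, int yorg)
{
    return yorg + lut_cos(lut, size, phase);
}
```

Stepping phase from 0 to size-1 and storing circle_x/circle_y walks once around the circle. Starting the Y phase at something other than a quarter period is what skews it into an ellipse.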
Now I have added a state machine to the hardware accelerator that does the above in hardware. In addition I've added another LUT so that the X & Y values can be computed simultaneously and sent to each BRAM. (I love hardware!) :D
Now the software looks something like this, although with different offsets. The state machine auto-starts and holds the cpu with RDY upon receipt of the Yphase value:

Code:

                  LDA #0                 ;0-15 1K blocks
                  STA Xblock
                  STA Yblock
                  
                  LDA #20                ;pixel offset values for screen placement
                  STA Xoffset
                  STA Yoffset
                  
                  LDA #0
                  STA Xphase
                  LDA #256
                  STA Yphase
                  
                  LDA #1
                  STA Xblock
                  STA Yblock
                  
                  LDA #1920-20-1024
                  STA Xoffset
                  LDA #20
                  STA Yoffset
                  
                  LDA #0                  ;10-bit phase resolution
                  STA Xphase
                  LDA #256
                  STA Yphase
It was nice to realize after coding the Verilog for DP BRAM initially, that ISE14.7 actually recognized I was trying to infer DP BRAMs. So on port A the cpu can read and write to the 16Kx16 X & Y BRAMs, on port B the hardware accelerator also can read and write simultaneously. Granted, the cpu is halted when the hardware accelerator is reading or writing, but the dual ports really make a design easy.

On the top_level the X & Y SCRAMs (scratchpad BRAMs) are instantiated like this:

Code:

    SCRAM SCRATCHRAMX (	.clkA(clk),					  
								.weA(SR1WE),
								.rstA(SR1CS),
								.addrA(cpuAB [13:0]),
								.dinA(cpuDO [15:0]),
								.doutA(cpuDI [15:0]),
								.clkB(clk),
								.weB(xwe),
								.rstB(1'b0),
								.addrB(addrx [13:0]),
								.dinB(xdata [SINaw:0]),
								.doutB(xdataout [SINaw:0]));
					
	SCRAM SCRATCHRAMY (	.clkA(clk),					  
								.weA(SR2WE),
								.rstA(SR2CS),
								.addrA(cpuAB [13:0]),
								.dinA(cpuDO [15:0]),
								.doutA(cpuDI [15:0]),
								.clkB(clk),
								.weB(ywe),
								.rstB(1'b0),
								.addrB(addry [13:0]),
								.dinB(ydata [SINaw:0]),
								.doutB(ydataout [SINaw:0]));
The actual SCRAM code:

Code:

// 16Kx11 scratch page FPGA dualport blockRAM
`timescale 1ns / 1ps

module SCRAM ( input clkA, clkB,
					input weA, weB,
					input rstA, rstB,
					input [13:0] addrA, addrB,
					input [15:0] dinA, dinB,
					output reg [15:0] doutA, doutB
					);
					
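// Storage is 11 bits wide: the upper 5 bits of dinA/dinB are dropped on write,
// and doutA/doutB are zero-extended on read.
reg [10:0] RAM [16383:0];

// Port A (cpu side): synchronous write, otherwise registered read.
// rstA forces the read data to zero.
always @(posedge clkA)
	if (weA)
		RAM[addrA] <= dinA;
	else
		doutA <= rstA ? 16'h0000 : RAM[addrA];

// Port B (hardware accelerator side): same behavior as port A.
always @(posedge clkB)
	if (weB)
		RAM[addrB] <= dinB;
	else
		doutB <= rstB ? 16'h0000 : RAM[addrB];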
		
endmodule
EDIT: Uses just a bit more resources. The next goal is to modify the size of the output, which will require adding a certain value in 2 quadrants and subtracting that value in the other 2 quadrants.
Attachments
output board utilizationb.jpg

Post by ElEctric_EyE »

Resizing a circle/ellipse using the sin/cos LUTs only works when shifting values to the right, which is a divide by 2, 4, 8, etc. This is unacceptable because only ~10 circle sizes are realizable for 10 bits in 1920x1080 resolution. Duh, right?
So, after much time and effort, the sin/cos LUTs are not going to work given the limited BRAM space inside the FPGA.
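To put a number on the "~10 sizes", here is a small C sketch that counts how many distinct non-zero peak amplitudes right-shifting alone can produce. It assumes an unsigned 10-bit peak value of 1023; the function name is made up for the example.

```c
#include <stdint.h>

/* Counts the distinct non-zero amplitudes reachable by shifting the peak
 * value right by 0, 1, 2, ... bits. */
int count_shift_sizes(uint16_t peak)
{
    int sizes = 0;
    while (peak > 0) {
        sizes++;      /* the current amplitude is one usable size */
        peak >>= 1;   /* the next size is half of this one */
    }
    return sizes;
}
```

For a peak of 1023 the amplitudes are 1023, 511, 255, ... down to 1, so count_shift_sizes(1023) gives 10 sizes, each half the previous one.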

I had early success with the hardware plotting of a circle using 8 octants with the Bresenham circle algorithm in hardware (i.e. Verilog). Plotting directly to the video SyncRAM in (X,Y) Cartesian coordinates was easy with the algorithm I adapted from Wikipedia, but this was not sending coordinates to the BRAMs. Looking forward, adapting the state machine from immediate plotting of 8 octants to storing the points at linear addresses is going to be a challenge.
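For reference, here is a C sketch of the integer midpoint (Bresenham-style) circle with the 8 octant points written to consecutive buffer slots instead of being plotted. It follows the commonly published form of the algorithm and is not a translation of the Verilog in this project. point_t, circle_to_buffer and the buffer layout are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x, y; } point_t;

/* Generates the circle of radius r centred on (cx, cy) and stores the points
 * at consecutive ("linear") indices of buf. Returns the number of points
 * generated. Points beyond cap are counted but not stored, and buf may be
 * NULL to only count them. */
size_t circle_to_buffer(point_t *buf, size_t cap, int cx, int cy, int r)
{
    size_t n = 0;
    int x = r, y = 0;
    int decision = 1 - r;             /* midpoint decision variable */

    while (y <= x) {
        /* The 8 symmetric points for the current (x, y) step. */
        const int dx[8] = {  x,  y, -y, -x, -x, -y,  y,  x };
        const int dy[8] = {  y,  x,  x,  y, -y, -x, -x, -y };
        for (int k = 0; k < 8; k++) {
            if (buf != NULL && n < cap) {
                buf[n].x = (int16_t)(cx + dx[k]);
                buf[n].y = (int16_t)(cy + dy[k]);
            }
            n++;
        }
        y++;
        if (decision <= 0) {
            decision += 2 * y + 1;        /* stay on the same x */
        } else {
            x--;
            decision += 2 * (y - x) + 1;  /* step x inward */
        }
    }
    return n;
}
```

Note that the points come out interleaved across the octants rather than in angular order, and points on the octant boundaries are stored more than once. A version that needs the coordinates sorted by angle would have to buffer per octant or write each octant to its own address range.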

Tomorrow I'll post a 1920x1080 static pic (and more explanation) of the 10 sine waves in blue, cosine waves in red and circles in green. Right now most of it is in purple and I've got no time left tonight.
Attachments
sincosLUTscharplotreadfillcopypaste.jpg
Last edited by ElEctric_EyE on Wed Aug 06, 2014 1:02 am, edited 1 time in total.
MichaelM
Posts: 761
Joined: 23 Apr 2012
Location: Huntsville, AL

Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board

Post by MichaelM »

EEyE:

Have you looked at Jack Crenshaw's Real-Time Math Toolkit? He has a very readable writing style for many practical mathematical tools. In particular, he has a good description of how to use range reduction to extend the practical use of sine/cosine tables. His book may offer a way to extend your circles and ellipses so that you can continue to use a lookup table for at least a portion of your plotting routine.
Michael A.

Post by ElEctric_EyE »

MichaelM wrote:
...Have you looked at Jack Crenshaw's Real-Time Math Toolkit?
Thanks for the input Michael! I'll look into it.
Colors are correct now.
Attachments
sincosLUTscharplotreadfillcopypaste2b.jpg
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California

Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board

Post by GARTHWILSON »

MichaelM wrote:
EEyE:

Have you looked at Jack Crenshaw's Real-Time Math Toolkit? He has a very readable writing style for many practical mathematical tools. In particular, he has a good description of how to use range reduction to extend the practical use of sine/cosine tables. His book may offer a way to extend your circles and ellipses so that you can continue to use a lookup table for at least a portion of your plotting routine.
Discussed at viewtopic.php?f=2&t=1971. EEye was the last to post on that topic. I have the paper book which our professional-programmer daughter-in-law got me.
math book2.jpg
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

Post by ElEctric_EyE »

GARTHWILSON wrote:
...Discussed at viewtopic.php?f=2&t=1971. EEye was the last to post on that topic...
I did read Jack Crenshaw's chapter on sines and cosines today... All I can say is that the Wikipedia contributors seem to know their integer math as well when it comes to the Bresenham circle and line algorithms. That's where I found the C/C++(?) code I translated into Verilog a few months ago. I rechecked the site and it has even more information now. I've an idea I'm going to try. Thanks for the input. I just need something that works first, then I can optimize based on that book. After all, who wouldn't listen to what a NASA engineer has to say?!

Post by ElEctric_EyE »

I modified Jack's square root algorithm on page 86 to work in Verilog. Takes 50 cycles and actually works on 2 16-bit numbers up to a value of 46,340 (for the formula SQRT(X^2+Y^2)), except I had to perform the last shift right 2x, not 1x like in his algorithm on the last line.

Jack's Code:

Code:

/* Assumes unsigned long is 32 bits wide: (a >> 30) picks off the top two bits. */
unsigned short sqrt(unsigned long a){
    unsigned long rem = 0;
    unsigned long root = 0;
    for(int i=0; i<16; i++){
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        root++;
        if(root <= rem){
            rem -= root;
            root++;
        }
        else
            root--;
    }
    return (unsigned short)(root >> 1);
}
My copy of his work in Verilog. The variable 'a' is 32 bits wide:

Code:

.......//SQRT state machine

			SQRTINIT:
				state <= CALC3;
				
			CALC3:
				state <= CALC4;
			
			CALC4:
				state <= CALC5;
				
			CALC5:
				if (i<16)
					state <= CALC3;
					else state <= CALC6;
					
			CALC6:
				state <= WAIT;
............................................
//square root generator

		SQRTINIT:
			begin
				LGREADY <= 0;
				i <= 0;
				rem <= 0;
				Root <= 0;
				a <= Xs*Xs + Ys*Ys;
			end
			
		CALC3:
			begin
				Root <= Root << 1;
				rem <= ((rem << 2) + (a >> 30));
			end
			
		CALC4:
			begin
				a <= a << 2;
				Root <= Root + 1;
			end
			
		CALC5:
			begin
			i <= i + 1;
			if (Root <= rem) begin
				rem <= rem - Root;
				Root <= Root + 1;
			end
				else Root <= Root - 1;
			end			
				
		CALC6:
			Root <= Root >> 2;
At the conclusion of the chapter on page 87 he says:
Quote:
If there’s any one lesson to be learned from this chapter, it is this:
Never trust a person who merely hands you an equation.
So maybe he did this on purpose?
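One possible explanation for the extra shift, offered as a guess from the posted code and not verified on the hardware: the state machine tests i<16 in CALC5 on the same clock edge that increments i with a nonblocking assignment, so the test sees the old value of i and the loop body would run 17 times, not 16. The 17th pass pulls in two zero bits, which leaves Root equal to twice the integer square root of 4*a, and shifting right by 2 then recovers the square root of a. The C sketch below parameterizes the iteration count and final shift to show both variants agreeing. The function name and parameters are made up, and uint32_t is swapped in for unsigned long so the width is explicit.

```c
#include <stdint.h>

/* Crenshaw-style bit-pair integer square root with the loop count and the
 * final shift exposed. iterations=16 with final_shift=1 matches the book's
 * code; iterations=17 with final_shift=2 models the suspected extra pass. */
uint16_t isqrt_shift(uint32_t a, int iterations, int final_shift)
{
    uint32_t rem = 0;
    uint32_t root = 0;

    for (int i = 0; i < iterations; i++) {
        root <<= 1;
        rem = (rem << 2) + (a >> 30);   /* bring down the next two bits */
        a <<= 2;
        root++;
        if (root <= rem) {
            rem -= root;
            root++;
        } else {
            root--;
        }
    }
    return (uint16_t)(root >> final_shift);
}
```

If this reading is right, changing the CALC5 test to i<15 (or counting from 1) and shifting right once should give the same results in 16 passes.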

Looks like the ISE 14.7 has offloaded some work to the DSP48A1 tiles:
Attachments
output board utilizationafterSQRT.jpg

Post by MichaelM »

EEyE:

That's cool. I was reading and thinking about the integer square root function that Dr. Crenshaw published this weekend. Really impressed that it can be implemented succinctly in Verilog as your code shows.
Michael A.

Post by ElEctric_EyE »

MichaelM wrote:
EEyE:

That's cool. I was reading and thinking about the integer square root function that Dr. Crenshaw published this weekend. Really impressed that it can be implemented succinctly in Verilog as your code shows.
I hope to use it to control amplitude of the sin or cos table Xilinx has generated from LUTs and I've stored in BRAM.... Not sure yet. But I am borrowing the following .GIF from the sine wiki so the general idea can be seen, although I currently modify the phase for ellipse. A strictly horizontal or vertical ellipse requires amplitude modulation for the sin or cos.
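One common way to get a finer amplitude control than shifting is a fixed-point multiply on each table sample. The C sketch below is only an illustration of that idea, not what this project does. The Q8 gain format (256 = unity) and the function name are assumptions for the example.

```c
#include <stdint.h>

/* Scales a signed table sample by gain_q8 / 256. A gain of 256 leaves the
 * sample unchanged and 128 halves it. Division is used instead of a right
 * shift so negative samples truncate toward zero portably. */
int16_t scale_q8(int16_t sample, uint16_t gain_q8)
{
    int32_t product = (int32_t)sample * (int32_t)gain_q8;
    return (int16_t)(product / 256);
}
```

With an 8-bit fractional gain this gives 256 amplitude steps up to unity instead of about 10, at the cost of one multiplier per axis. Using a different gain for X than for Y would give the horizontal or vertical ellipse mentioned above.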
Attachments
Circle_cos_sin.gif

Post by ElEctric_EyE »

I still haven't made any headway into generating a sine wave with a controllable amplitude from a LUT, so I decided to tackle another portion of the project which needs attention, i.e. how to resolve video RAM contention between the video output and cpu/hardware accelerator access. Also, on this project's software side of things, I am attempting to port Bruce's take on Bob Bishop's Mandelbrot program to 65Org16. I will start a new thread on that very soon...

Now as far as the video databus contention issue, I am not pleased when the cpu only has access during non active Hsync or non active Vsync. It's very slow and old hat, quite honestly.
Even though I'm not as good as some EE's out there that have experience with scanline buffers, FIFOs and dual port FPGA blockRAMs, I think I have an idea after doing more than a little bit of research and some testing.

The GS8320Z18 SyncRAM I'm using has a 4.0ns delay time in flow-through mode, which I currently use with a 150MHz pixel clock, so it is within the 250MHz limit for that mode. In pipelined mode the delay is spec'd at 2.5ns, which would allow the device to run at 400MHz. The device is not hardwired for pipelined mode, however, so I am currently running it in flow-through mode at 300MHz. It does seem to be working for some states of my hardware graphics accelerator, which is exciting to see, since the goal is interleaved access to the SyncRAM, and back-to-back reads and/or writes are possible with this ZBT RAM. A post by a member named PK on this thread in another electronics forum gave me more confidence this idea would work. Lots of good links to follow from this guy!
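The numbers above can be checked with a couple of lines of C, along with a toy model of the interleaving idea. The even/odd slot assignment is an assumption made for illustration (the post does not say how the slots would be shared), and the names are made up.

```c
#include <stdint.h>

typedef enum { SLOT_VIDEO, SLOT_ACCEL } slot_owner_t;

/* Highest clock in MHz whose period is no shorter than the given delay in
 * picoseconds: 4000 ps gives 250 MHz, 2500 ps gives 400 MHz. */
uint32_t max_clock_mhz(uint32_t delay_ps)
{
    return 1000000u / delay_ps;
}

/* With the RAM clocked at twice the pixel clock (300MHz against 150MHz)
 * there are two RAM cycles per pixel. In this model even cycles fetch pixels
 * for the videoDAC and odd cycles belong to the cpu/accelerator. */
slot_owner_t slot_owner(uint32_t ram_cycle)
{
    return (ram_cycle & 1u) ? SLOT_ACCEL : SLOT_VIDEO;
}
```

Under that assumption the accelerator would get one access per pixel time at any point in the frame, instead of only during blanking.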

So now I have to re-test 6 of the 8 functions this hardware generator has that plots directly to the syncRAM and try to fix up timing issues:
Line Generator............: not 100%
Rectangular fill............: not 100%
Pixel Plot...................: appears working
Read Pixel Color...........: appears working
Copy & Paste Rectangle: not working
Character Plot.............: not 100%

The project has 4 clocks generated utilizing the Spartan 6 PLL_ADV from an onboard 100MHz can oscillator:
1) 75MHz 65Org16 cpu/hardware accelerator/FPGA blockRAMs used for coordinate storage
2) 300MHz 2MBx16 Synchronous RAM (videoram)
3) 150MHz pixel clock to videoDAC and H/V Sync generator
4) 150MHz main signal to next Parallel Video Board (unused presently)

EDIT:
I realize I may be violating some timing spec's of the syncRAM since it's running 50MHz above spec, so all this is just a test at 1920x1080 with the realization this may not work at this speed.
Finally added in bypass cap's for the syncRAM with no change in behavior. (I suspect the properties of the board material are providing good enough capacitance for power supply filtering)
Tried changing syncRAM pin drive strength and slew rate from the FPGA with no better performance.

Post by ElEctric_EyE »

This past experiment was a failure, as expected...
I have reached the limits of my current knowledge of Verilog with this board design. I'm still working on the Mandelbrot 65Org16 software in my spare time, which will take advantage of the capabilities I've developed so far.

I'm currently looking at a new design where I will be recycling the 4 SyncRAMs I'm currently using across 4 PVBs and putting them onto 2 newly designed boards @1920x1080 @150MHz. Each board will have 2 x 2Mx18 4ns GSI SyncRAMs and a larger Spartan 6 IC in a BGA-style package. The FPGA will have 2 separate address and data buses for them. The old backplane will still be able to support up to 6 of these new PVBs because the I/O connector will be the same 96-pin 3-row style. The idea behind the 2 SyncRAMs is to make 2 possibilities an easy reality: 1) sprites, and 2) a buffer for the videoDAC.

I'm looking at a new PCB manufacturer called eurocircuits. They've been mentioned here before in other threads. They're very reasonably priced and are capable of up to 16 layers. Also, a great many board construction variables can be set by the maker and are finally checked by their ordering system. They favor the Eagle so I'll have to learn this program as I was used to EPCB.

What really turned me on to them is that since I'll be mounting my own BGA packages now, they offer custom stencils not only for the BGA packages but for the entire board for ~$50 for a 4"x3" board. One can order a stencil for the top and/or the bottom!

Post by GARTHWILSON »

ElEctric_EyE wrote:
They favor the Eagle so I'll have to learn this program as I was used to EPCB.
Almost all board houses accept industry-standard Gerber files, so any CAD that produces them will work. That would be any except the ones done by board houses that want to lock you into their service alone. If you supply the standard files (Gerber for the copper layers, Excellon for the drill files), the board house won't care what CAD you used, and they don't have to have even heard of it.

Post by ElEctric_EyE »

Garth, check out their PCB guidelines on page 3 of their PCB Design PDF link.

I was about to look into OrCad until I did more research into this PCB manufacturing house.