1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
Initial 3-D vector animation success! It's not very fast, since it does the plotting and clearing during the 1/60th-of-a-second vertical refresh.
Will attach YouTube video link soon:
Here it is!
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
Got 2 more boards outputting the same triangle animation. Now that the display is dynamic, I've noticed a small problem: the other 2 boards are drawing at roughly 1/4 speed at best. I surmise this is because the boards' HSYNC and VSYNC are delayed in order to line up all 3 parallel pixel streams in hardware, while the software waits for VSYNC to go high before it starts drawing.
Coincidentally, this was going to be my next area of focus, i.e. how to read/write the video RAM while it is simultaneously outputting data to the video DAC. I will have to do some research, but I think Arlet used a dual-ported block RAM one scan line deep. Does this seem correct? A FIFO buffer and then the DP block RAM? I only have 2 RAMB16BWERs left, so I may have to sacrifice some of the 16 16Kx11 X&Y scratchpad RAMs.
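For a rough sense of what a line-deep buffer costs, here is a small C sketch that counts RAMB16BWER blocks for a given depth and width. It assumes the true-dual-port aspect ratios of the Spartan-6 18Kb block RAM (16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18) and 16- or 18-bit pixels, which is an assumption since the pixel format isn't stated here. Under those assumptions one 1920-pixel scan line needs exactly two blocks, and a double-buffered (two-line) arrangement would need four. The function name is made up for the sketch.

```c
#include <stddef.h>

/* True-dual-port aspect ratios of one Spartan-6 RAMB16BWER (depth x width).
   The 9- and 18-bit modes include the parity bits. */
static const struct { int depth, width; } ramb16_modes[] = {
    {16384, 1}, {8192, 2}, {4096, 4}, {2048, 9}, {1024, 18}
};

/* Fewest RAMB16BWER blocks that hold `depth` words of `width` bits, tiling
   blocks in depth and width with a single aspect ratio. This is a
   simplification of what the synthesis tool might actually choose. */
static int ramb16_blocks(int depth, int width)
{
    int best = -1;
    for (size_t m = 0; m < sizeof ramb16_modes / sizeof ramb16_modes[0]; m++) {
        int deep = (depth + ramb16_modes[m].depth - 1) / ramb16_modes[m].depth;
        int wide = (width + ramb16_modes[m].width - 1) / ramb16_modes[m].width;
        if (best < 0 || deep * wide < best)
            best = deep * wide;
    }
    return best;
}
```

For example, `ramb16_blocks(1920, 16)` comes out at 2 (1Kx18 mode, two blocks deep), and `ramb16_blocks(2 * 1920, 16)` at 4.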
But first, I'm going to have some 3D Vector fun with just the one board in another thread.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
The other 2 boards are drawing very slowly, no doubt due to the way I'm syncing them up. I'm not pleased with this...
I've added a 32-bit timer clocked by the 148.5MHz pixel clock so I can try to analyze what is going on. Since the CPU runs at half that speed, the timer can measure its own CPU cycles as well. I calibrated it with one NOP and got back a value of 4, then added a second NOP and got a hex reading of 8, which is correct.
In the video, the fast-spinning green cube is the OUTPUT PVB; it uses its counter to count and display the cycles for those 2 NOPs, just to calibrate the identical counters in the other 2 pass-thru boards.
This OUTPUT board sends its HSYNC, VSYNC and pixel clock to the 2nd board... The 2nd board outputs the yellow pixels, and its 32-bit counter measures the time VSYNC is high, i.e. the vertical retrace. The hex value is extremely large, and interestingly, the hex values cycle between 04308442, 04308442, 0438448. I say it's interesting because there are 3 different values, maybe for 3 different boards?
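To put those readings in perspective, here is a minimal C sketch that converts 148.5MHz timer ticks into CPU cycles, seconds and frames. It assumes the CPU clock is exactly half the pixel clock (as stated above), that a NOP takes 2 CPU cycles as on a stock 6502, and standard CEA-861 1080p60 timing (2200 total pixel clocks per line, 1125 total lines, 5 VSYNC lines, 45 blanking lines). Those timing totals are an assumption, since the board's actual timing parameters aren't listed in the post.

```c
#include <stdint.h>

#define PIXEL_CLK_HZ 148500000.0  /* timer clock = pixel clock */

/* Assumed CEA-861 1080p60 totals (not stated in the post). */
#define H_TOTAL      2200u        /* pixel clocks per line, incl. blanking */
#define V_TOTAL      1125u        /* lines per frame, incl. blanking */
#define VSYNC_LINES  5u
#define VBLANK_LINES 45u

/* CPU runs at half the pixel clock: 2 timer ticks per CPU cycle. */
static uint32_t ticks_to_cpu_cycles(uint32_t ticks) { return ticks / 2u; }

static double ticks_to_seconds(uint32_t ticks) { return ticks / PIXEL_CLK_HZ; }

static double ticks_to_frames(uint32_t ticks)
{
    return ticks / (double)(H_TOTAL * V_TOTAL);
}
```

With these numbers a 2-cycle NOP reads as 4 ticks and two NOPs as 8, matching the calibration. A VSYNC pulse would last 5 x 2200 = 11,000 ticks and the whole vertical blanking interval 45 x 2200 = 99,000 ticks (about 0.67 ms). By comparison, 0x04308442 is 70,288,450 ticks, roughly 0.473 s or about 28 frames, so if the counter really runs at 148.5MHz throughout, that reading spans far more than a single retrace.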
I've not quite gotten the 3rd board up to speed yet... In the video it is just displaying data from one of the sine-wave BRAMs and rotating the red cube at an even slower rate, although all boards are running with extremely small delays.
Sorry for the bad quality. Video is ==> Here
The blinking in the red and yellow boards is not seen in real life...
Last edited by ElEctric_EyE on Wed Apr 16, 2014 8:46 pm, edited 1 time in total.
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
Hi EEye
that video link should probably be https://www.youtube.com/watch?v=dz_CkFuHWXg or https://www.youtube.com/user/UltimateRoadWarrior9
Cheers
Ed
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
Thanks Ed! It has been a long day here...
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
I went off on a tangent after reading about the Amiga video chips, namely the Amiga OCS...
After 4 weeks of attempting to digitally mix a 960x540 RGB video stream generated by PVB2 into a 1920x1080 output from PVB1, today I finally had to give up.
Using a 74.25MHz pixel clock for PVB1 (derived by halving PVB2's 148.5MHz pixel clock) and arriving at the same horizontal frequency as PVB2, the horizontal pixels were stretched as expected, but the vertical pixels were not... I threw a lot of effort into trying to stretch the vertical, but nothing would 'stick'. The vertical video was doubled, i.e. 2 identical pictures. I tried making a scanline doubler, unsuccessfully. Since this involves more than 1 board, I can't see what is happening in simulation, so I have to quit at this point. All PVBoards will have to output the same video frequencies for now.
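One way to picture the "2 identical pictures" symptom, and what a scanline doubler has to do, is as an address mapping from the 1920x1080 output raster back to the 960x540 source. This C sketch is only a model: it assumes PVB2's frame is stored as a linear 960-pixel-wide buffer (a hypothetical layout) and ignores blanking. If the 540-line source simply free-runs at the same horizontal rate, its lines wrap and repeat twice per 1080-line output frame; a doubler instead advances the source line only on every second output line, the same way halving the pixel clock already doubles each pixel horizontally.

```c
#define SRC_W 960
#define SRC_H 540

/* Source line seen on output line y if the 540-line source just free-runs
   at the same horizontal rate: it wraps, giving two stacked copies. */
static int src_row_wrapped(int out_y) { return out_y % SRC_H; }

/* Scanline doubling: each source line is shown on two output lines. */
static int src_row_doubled(int out_y) { return out_y >> 1; }

/* Pixel doubling, which the half-rate pixel clock already provides. */
static int src_col_doubled(int out_x) { return out_x >> 1; }

/* Linear source address for output pixel (x, y); hypothetical layout. */
static long src_addr(int out_x, int out_y)
{
    return (long)src_row_doubled(out_y) * SRC_W + src_col_doubled(out_x);
}
```

In hardware this typically means either re-reading each source line from memory twice, or capturing it into a one-line buffer and replaying it on the following output line.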
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
This morning I finally tracked down the last of the problems and now have a successful READ from the SyncRAM. In this instance, the CPU sends the X coordinate, then the Y coordinate. On reception of the Y coordinate, the graphics accelerator halts the CPU, puts the pixel color info into a read-only register and then frees up the CPU.
All of the accelerator functions are performed after it receives a Y coordinate.
So far the hardware graphics accelerator can perform:
Draw Line
Copy/Paste
Pixel Plot
Fill (simple rectangular)
8x8 Character Plot (background/foreground colors)
Pixel Color Read
What's strange is that I had early success with the blitter, so I assumed I could take the READ part of that state machine and use it for another, READ-only state machine, as opposed to READ-WRITE-READ-WRITE... This was not the case, and more observation is needed to fully comprehend why. 3 weeks spent on this one!
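The "X first, then Y triggers the operation" convention can be modelled as a tiny register interface: writing X only latches it, and writing Y latches Y and runs the selected operation, which is also the point where the real hardware holds the CPU. This C sketch covers only Pixel Plot and Pixel Color Read. The op encoding, the `accel_t` layout and the linear `y * 1920 + x` frame-buffer addressing are all hypothetical; 1920x1080 = 2,073,600 words does fit in the SyncRAM if "2MBx18" means 2M words of 18 bits, but the board's real address mapping isn't given.

```c
#include <stdint.h>

#define SCREEN_W 1920
#define SCREEN_H 1080

/* Stands in for the video SyncRAM; hypothetical linear layout. */
static uint16_t framebuffer[SCREEN_W * SCREEN_H];

typedef enum { OP_PLOT, OP_READ } accel_op;  /* hypothetical encoding */

typedef struct {
    uint16_t x, y;
    uint16_t color;      /* color used by OP_PLOT */
    uint16_t read_data;  /* read-only result register for OP_READ */
    accel_op op;
} accel_t;

/* Writing X only latches the coordinate. */
static void accel_write_x(accel_t *a, uint16_t x) { a->x = x; }

/* Writing Y latches it and runs the selected operation. In hardware the CPU
   would be held (via RDY) until this completes; here it is instantaneous. */
static void accel_write_y(accel_t *a, uint16_t y)
{
    a->y = y;
    if (a->x >= SCREEN_W || a->y >= SCREEN_H)
        return;                                  /* ignore off-screen */
    uint32_t addr = (uint32_t)a->y * SCREEN_W + a->x;
    switch (a->op) {
    case OP_PLOT: framebuffer[addr] = a->color;     break;
    case OP_READ: a->read_data = framebuffer[addr]; break;
    }
}
```

The point of the split is that the CPU side stays simple: the same two stores start every accelerator function, and the operation register decides what happens on the Y write.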
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
I'll continue developing on the four populated boards I currently have, but I've been wanting to delve into the BGA world of Xilinx FPGAs and design a new, more capable board. Earlier I had my eye on a larger Spartan-6 in a 1mm 256-pin BGA. I forget the price; something like $50+.
But recently I've seen the 28nm 256-pin BGA Xilinx Artix FPGA for sale at $35 in its slowest version: XC7A35T-1FTG256C. This may be the path I choose, but I do not see it available in ISE. Will do more research... New thread here.
BTW, here are the resources currently used (from that thread). The project has almost maxed out this LX9:
Last edited by ElEctric_EyE on Mon Jul 28, 2014 7:54 pm, edited 1 time in total.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
So the project had 2 16Kx16 BRAMs to store X & Y coordinates; only single-port mode was used. The CPU would send a 10-bit 'phase' value to the sin/cos LUT, and the LUT output 10-bit data back to the CPU.
The software stores 2 sine waves into 2 BRAMs. A sine shifted by 1/4 period (i.e. 256/1024) is a cosine, so these (sin, cos) pairs used as (X,Y) coordinates plot a circle. Modifying the phase of either the X register or the Y register makes an ellipse:
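Here is a minimal C sketch of the same idea: a 1024-entry (10-bit phase) sine table, with the Y coordinate read 256 entries (a quarter period) ahead of X so that it acts as the cosine. The signed amplitude of 511 and all helper names are assumptions for illustration; the post doesn't give the table's scaling. The sine is computed with a short Taylor series only so the example needs no math library.

```c
#include <stdint.h>

#define PHASES  1024          /* 10-bit phase, as in the post */
#define QUARTER (PHASES / 4)
#define AMPL    511           /* hypothetical signed amplitude */

static const double PI_D = 3.14159265358979323846;
static int16_t sin_lut[PHASES];

/* sin(x) for x in [-pi, pi] via its Taylor series (12 terms is plenty). */
static double taylor_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n < 12; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

static void build_sin_lut(void)
{
    for (int p = 0; p < PHASES; p++) {
        double ang = 2.0 * PI_D * p / PHASES;
        if (ang > PI_D)
            ang -= 2.0 * PI_D;         /* keep the series argument in [-pi, pi] */
        double v = AMPL * taylor_sin(ang);
        sin_lut[p] = (int16_t)(v >= 0 ? v + 0.5 : v - 0.5);  /* round to nearest */
    }
}

/* Point for phase p: y_shift = QUARTER gives a circle, other shifts an ellipse. */
static void lut_point(int p, int y_shift, int xorg, int yorg, int *x, int *y)
{
    *x = xorg + sin_lut[p & (PHASES - 1)];
    *y = yorg + sin_lut[(p + y_shift) & (PHASES - 1)];
}
```

With `y_shift = QUARTER` the points stay within rounding error of a circle of radius 511 around (xorg, yorg); with `y_shift = 0` they collapse onto a diagonal line, and values in between give tilted ellipses.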
Now I have added a state machine to the hardware accelerator that does the above in hardware. In addition I've added another LUT so that the X & Y values can be computed simultaneously and sent to each BRAM. (I love hardware!)
Now the software looks something like this, although with different offsets. The state machine auto-starts and holds the CPU via RDY upon receipt of the Yphase value:
After coding the Verilog for the DP BRAMs, it was nice to realize that ISE 14.7 actually recognized I was trying to infer DP BRAMs. So on port A the CPU can read and write the 16Kx16 X & Y BRAMs, while on port B the hardware accelerator can also read and write simultaneously. Granted, the CPU is halted when the hardware accelerator is reading or writing, but the dual ports really make the design easy.
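To make the port behaviour of the SCRAM module (listed further down) concrete, here is a small C model of one clock edge on one port. It mirrors the Verilog as posted: a write leaves that port's output register unchanged, a read returns the stored word (or 0 when rst is high), and because the array is `reg [10:0]`, only the low 11 bits of the 16-bit data bus are kept. The `scram_t` type and function name are made up for the sketch, and it evaluates ports one at a time, so it says nothing about same-address collisions between port A and port B in a real block RAM.

```c
#include <stdint.h>

#define SCRAM_DEPTH 16384u   /* 14-bit address */
#define SCRAM_MASK  0x07FFu  /* reg [10:0]: 11 stored bits */

typedef struct {
    uint16_t mem[SCRAM_DEPTH];
} scram_t;

/* One rising clock edge on one port of the dual-port RAM. */
static void scram_clock(scram_t *s, int we, int rst,
                        uint16_t addr, uint16_t din, uint16_t *dout)
{
    addr &= SCRAM_DEPTH - 1u;
    if (we)
        s->mem[addr] = (uint16_t)(din & SCRAM_MASK);  /* bits [15:11] are dropped */
    else
        *dout = rst ? 0 : s->mem[addr];               /* zero-extended to 16 bits */
}
```

So a CPU write of $FFFF through port A reads back as $07FF on either port, which is worth keeping in mind when the RAMs are described as 16 bits wide.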
On the top_level the X & Y SCRAMs (scratchpad BRAMs) are instantiated like this:
The actual SCRAM code:
EDIT: Uses just a bit more resources. The next goal is to modify the size of the output, which will require adding a certain value in 2 quadrants and subtracting that value in the other 2 quadrants.
Code (original software loop that fills the X & Y BRAMs):
LDA #650
STA XORG
LDA #450
STA YORG
LDWi $0000
LDX #0
LDY #256
sinwave1 STX phase
LDA sinout
CLC
ADC XORG
STAaw scratchx1
STY phase
LDA sinout
CLC
ADC YORG
STAaw scratchy1
CPX #1024
BMI XNZ1
LDX #0
XNZ1 INX
CPY #1024
BMI YNZ1
LDY #0
YNZ1 INY
INW
CPWi $0400
BNE sinwave1
Code (software setup for the new hardware state machine):
LDA #0 ;0-15 1K blocks
STA Xblock
STA Yblock
LDA #20 ;pixel offset values for screen placement
STA Xoffset
STA Yoffset
LDA #0
STA Xphase
LDA #256
STA Yphase
LDA #1
STA Xblock
STA Yblock
LDA #1920-20-1024
STA Xoffset
LDA #20
STA Yoffset
LDA #0 ;10-bit phase resolution
STA Xphase
LDA #256
STA Yphase
Code (top-level instantiation of the X & Y SCRAMs):
SCRAM SCRATCHRAMX ( .clkA(clk),
.weA(SR1WE),
.rstA(SR1CS),
.addrA(cpuAB [13:0]),
.dinA(cpuDO [15:0]),
.doutA(cpuDI [15:0]),
.clkB(clk),
.weB(xwe),
.rstB(1'b0),
.addrB(addrx [13:0]),
.dinB(xdata [SINaw:0]),
.doutB(xdataout [SINaw:0]));
SCRAM SCRATCHRAMY ( .clkA(clk),
.weA(SR2WE),
.rstA(SR2CS),
.addrA(cpuAB [13:0]),
.dinA(cpuDO [15:0]),
.doutA(cpuDI [15:0]),
.clkB(clk),
.weB(ywe),
.rstB(1'b0),
.addrB(addry [13:0]),
.dinB(ydata [SINaw:0]),
.doutB(ydataout [SINaw:0]));
Code (SCRAM module):
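// 16Kx11 scratch page FPGA dualport blockRAM
// Port A: CPU side; port B: hardware accelerator side.
// The data ports are 16 bits wide but only 11 bits are stored:
// writes keep din[10:0], reads return the word zero-extended to 16 bits.
`timescale 1ns / 1ps
module SCRAM ( input clkA, clkB,
               input weA, weB,
               input rstA, rstB,            // 1 = force the output register to 0
               input [13:0] addrA, addrB,
               input [15:0] dinA, dinB,
               output reg [15:0] doutA, doutB
             );
reg [10:0] RAM [16383:0];
always @(posedge clkA)
  if (weA)
    RAM[addrA] <= dinA[10:0];               // doutA holds its value during a write
  else
    doutA <= rstA ? 16'h0000 : {5'b0, RAM[addrA]};
always @(posedge clkB)
  if (weB)
    RAM[addrB] <= dinB[10:0];
  else
    doutB <= rstB ? 16'h0000 : {5'b0, RAM[addrB]};
endmodule
-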
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
Resizing a circle/ellipse using the sin/cos LUTs only works when shifting values to the right, which divides by 2, 4, 8, etc. This is unacceptable because only ~10 circle sizes are realizable with 10 bits at 1920x1080 resolution. Duh, right?
So, after much time and effort: the sin/cos LUTs are not going to work with the limited BRAM space inside the FPGA.
I had early success plotting a circle in hardware (i.e. Verilog) with the 8-octant Bresenham circle algorithm. Plotting directly to the video SyncRAM in (X,Y) Cartesian coordinates was easy with the algorithm I adapted from Wikipedia, but that did not send coordinates to the BRAMs. Looking forward, adapting the state machine, which plotted all 8 octants immediately, to store them at linear addresses instead is going to be a challenge.
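For the octant-to-linear-address problem, one approach is to compute a single octant and then emit all eight reflections in angular order, walking every other octant backwards so that consecutive addresses hold neighbouring pixels. This C sketch uses the integer midpoint (Bresenham-style) circle recurrence. The function name, the output arrays standing in for the X & Y BRAMs, and the counterclockwise ordering are assumptions for illustration, not the state machine described above.

```c
#define OCT_MAX 4096

/* Returns the number of points written to xs/ys (8 * octant length),
   or -1 if r is out of range or the arrays are too small. */
static int circle_linear(int r, int *xs, int *ys, int max)
{
    int ox[OCT_MAX], oy[OCT_MAX];
    int n = 0, x = r, y = 0, d = 1 - r;

    if (r < 1 || r > OCT_MAX)
        return -1;

    /* Octant 0 (0 to 45 degrees): y rises from 0 while y <= x. */
    while (y <= x) {
        ox[n] = x;
        oy[n] = y;
        n++;
        y++;
        if (d < 0) {
            d += 2 * y + 1;           /* midpoint inside the circle: keep x */
        } else {
            x--;
            d += 2 * (y - x) + 1;     /* midpoint outside: step x inward */
        }
    }
    if (8 * n > max)
        return -1;

    /* Reflect into octants 0..7 counterclockwise; odd octants run backwards
       so each stored point neighbours the one before it. */
    int k = 0;
    for (int oct = 0; oct < 8; oct++) {
        for (int j = 0; j < n; j++) {
            int i = (oct & 1) ? n - 1 - j : j;
            int a = ox[i], b = oy[i];
            switch (oct) {
            case 0:  xs[k] =  a; ys[k] =  b; break;
            case 1:  xs[k] =  b; ys[k] =  a; break;
            case 2:  xs[k] = -b; ys[k] =  a; break;
            case 3:  xs[k] = -a; ys[k] =  b; break;
            case 4:  xs[k] = -a; ys[k] = -b; break;
            case 5:  xs[k] = -b; ys[k] = -a; break;
            case 6:  xs[k] =  b; ys[k] = -a; break;
            default: xs[k] =  a; ys[k] = -b; break;
            }
            k++;
        }
    }
    return k;
}
```

The result has 8 x n entries (a few pixels at the octant seams are stored twice), relative to the circle's center, so it can be written to consecutive BRAM addresses and later read back in order with an origin added.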
Tomorrow I'll post a 1920x1080 static pic (and more explanation) of the 10 sine waves in blue, cosine waves in red and circles in green. Right now most of it is in purple and I've got no time left tonight.
Last edited by ElEctric_EyE on Wed Aug 06, 2014 1:02 am, edited 1 time in total.
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
EEyE:
Have you looked at Jack Crenshaw's Real-Time Math Toolkit? He has a very readable writing style for many practical mathematical tools. In particular, he has a good description of how to use range reduction to extend the practical use of sine/cosine tables. His book may offer a way to extend your circles and ellipses so that you may continue to use a look-up table for at least a portion of your plotting routine.
Michael A.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
MichaelM wrote:
...Have you looked at Jack Crenshaw's Real-Time Math Toolkit?
Colors are correct now.
- GARTHWILSON
- Forum Moderator
- Posts: 8774
- Joined: 30 Aug 2002
- Location: Southern California
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
MichaelM wrote:
EEyE:
Have you looked at Jack Crenshaw's Real-Time Math Toolkit? He has a very readable writing style for many practical mathematical tools. In particular, he has a good description of how to use range reduction to extend the practical use of sine/cosine tables. His book may offer a way to extend your circles and ellipses so that you may continue to use a look-up table for at least a portion of your plotting routine.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
GARTHWILSON wrote:
...Discussed at viewtopic.php?f=2&t=1971. EEye was the last to post on that topic...
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: 1080p HD Video on custom FPGA/VDAC/2MBx18 SyncRAM board
I modified Jack's square root algorithm on page 86 to work in Verilog. It takes 50 cycles and works on two 16-bit numbers up to a value of 46,340 (for the formula SQRT(X^2+Y^2)), except that on the last line I had to shift right by 2, not by 1 as in his algorithm.
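The 46,340 limit follows from the 32-bit radicand: with both inputs at their maximum, X^2 + Y^2 must still fit in the 32-bit 'a' register, and 46,340 is the largest value for which it does.

```latex
2 \cdot 46340^2 = 4\,294\,791\,200 \;<\; 2^{32} = 4\,294\,967\,296 \;<\; 2 \cdot 46341^2 = 4\,294\,976\,562
```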
Jack's Code:
My copy of his work in Verilog (the variable 'a' is 32-bit):
At the conclusion of the chapter, on page 87, he says the following (quoted after the code). So maybe he did this on purpose?
Code (Jack's C):
unsigned short sqrt(unsigned long a){
unsigned long rem = 0;
unsigned long root = 0;
for(int i=0; i<16; i++){
root <<= 1;
rem = ((rem << 2) + (a >> 30));
a <<= 2;
root ++;
if(root <= rem){
rem -= root;
root++;
}
else
root--;
}
return (unsigned short)(root >> 1);
}
Code (my Verilog):
.......//SQRT state machine
SQRTINIT:
  state <= CALC3;
CALC3:
  state <= CALC4;
CALC4:
  state <= CALC5;
CALC5:
  if (i<16)
    state <= CALC3;
  else state <= CALC6;
CALC6:
  state <= WAIT;
............................................
//square root generator
SQRTINIT:
  begin
    LGREADY <= 0;
    i <= 0;
    rem <= 0;
    Root <= 0;
    a <= Xs*Xs + Ys*Ys;          // 32-bit radicand X^2 + Y^2
  end
CALC3:                           // C: root <<= 1; rem = (rem << 2) + (a >> 30);
  begin
    Root <= Root << 1;
    rem <= ((rem << 2) + (a >> 30));
  end
CALC4:                           // C: a <<= 2; root++;
  begin
    a <= a << 2;
    Root <= Root + 1;
  end
CALC5:                           // C: loop counter, then the if/else on root <= rem
  begin
    i <= i + 1;
    if (Root <= rem) begin
      rem <= rem - Root;
      Root <= Root + 1;
    end
    else Root <= Root - 1;
  end
CALC6:
  Root <= Root >> 2;             // the C version returns root >> 1
Quote:
If there’s any one lesson to be learned from this chapter, it is this:
Never trust a person who merely hands you an equation.
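A possible explanation for needing >> 2, offered only as a guess since the surrounding Verilog is elided: if the state register and `i` are updated on the same clock edge, the `i<16` test in CALC5 sees `i` before the datapath's `i <= i + 1` takes effect, so the loop body runs 17 times instead of 16. The extra pass doubles Root once more (it effectively computes one fractional bit), so the final shift has to be 2 instead of 1. If that is what happens, the count would be 1 + 17 x 3 + 1 = 53 cycles rather than 1 + 16 x 3 + 1 = 50. This C sketch of Crenshaw's loop (the function name is made up) takes the iteration count and final shift as parameters so both combinations can be compared. It uses `uint32_t` because the algorithm relies on `a` being exactly 32 bits; with a 64-bit `unsigned long`, `a >> 30` would no longer return just the top two bits.

```c
#include <stdint.h>

/* Crenshaw's bit-by-bit integer square root with the loop count and the
   final shift as parameters. isqrt_iter(a, 16, 1) is the book's version;
   isqrt_iter(a, 17, 2) models the 17-pass hypothesis. */
static uint32_t isqrt_iter(uint32_t a, int iterations, int final_shift)
{
    uint32_t rem = 0, root = 0;
    for (int i = 0; i < iterations; i++) {
        root <<= 1;
        rem = (rem << 2) + (a >> 30);   /* bring down the next 2 bits of a */
        a <<= 2;
        root++;
        if (root <= rem) {
            rem -= root;
            root++;
        } else {
            root--;
        }
    }
    return root >> final_shift;         /* root holds 2x (or 4x) the result */
}
```

Both `isqrt_iter(a, 16, 1)` and `isqrt_iter(a, 17, 2)` give floor(sqrt(a)), while `isqrt_iter(a, 17, 1)` gives floor(2 * sqrt(a)), e.g. 4 instead of 2 for a = 4.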
Looks like the ISE 14.7 has offloaded some work to the DSP48A1 tiles: