6502.org :: View topic - Highest speed graphic functions in a Spartan 6 XC6SLX9

6502.org http://forum.6502.org/

Highest speed graphic functions in a Spartan 6 XC6SLX9 http://forum.6502.org/viewtopic.php?f=10&t=2329	Page 1 of 4

Author:	ElEctric_EyE [ Fri Nov 09, 2012 9:32 pm ]
Post subject:	Highest speed graphic functions in a Spartan 6 XC6SLX9
Towards the end of the thread "HDL Implementation of Video Generator Test for 16-bit PVB's", I started experimenting with drawing lines by putting coordinates in a Spartan 6 BRAM, using a resolution of 640x480. I was able to successfullyput X coordinates in one port and Y coordinates in the other port of these 1Kx16 dual-port BlockRAMs. A quick test I was doing for simplicity was putting X values from the horizontal pixel counter into X and Y for a diagonal line. It passed the "smell test" as I saw a solid line with no random pixels on the PVB, after some trial and error. This verified not only the PVB board design and noise issues after assembly, but also timing of the Verilog code I intend on building upon. Arlet had an idea, at the end of the thread, for generating lines during the horizontal blank on the fly, but I am unsure how many lines would be able to be drawn during this time period. However, this would have the benefit of using the BRAMs for only the endpoints' X,Y storage, as opposed to full coordinate storage for an entire line, so potentially hundreds of lines could be drawn. Today, I found some good info on Bresenham's line algorithm in wikipedia. The C? code looks like this: Code: function line(x0, y0, x1, y1) dx := abs(x1-x0) dy := abs(y1-y0) if x0 < x1 then sx := 1 else sx := -1 if y0 < y1 then sy := 1 else sy := -1 err := dx-dy loop setPixel(x0,y0) if x0 = x1 and y0 = y1 exit loop e2 := 2*err if e2 > -dy then err := err - dy x0 := x0 + sx end if if e2 < dx then err := err + dx y0 := y0 + sy end if end loop Looks very similar to Verilog. I intend to try it soon. Last note on my intentions: Looking at the available BRAM resources in the Spartan 6, I should be able to draw 32 full length diagonal lines using the 32 RAMB16BWERs and another 32 lines using the 64 RAMB8BWERs. I may have something worthwhile pursuing here... Maybe in the future I'll be able to optimize BRAM usage based on lengths of each line.

Author:

White Flame [ Sat Nov 10, 2012 6:01 am ]

Post subject:

Re: Highest speed graphic functions in a Spartan 6 XC6SLX9

If I understand this correctly, this is talking about rerendering a single scanline's worth of bitmap in the horizontal blanking period. Lines that are more horizontal than vertical would take longer during that blank, simply because there are more pixels to render for that scanline, whereas a more vertical line would generally only have to draw 1 pixel per scanline. The overall length of the line doesn't matter, you'd be bounded by the dx/dy ratios being large per scanline that the line drawing is active.

The state that you'd have to maintain per active line isn't just the x0,y0,x1,y1 coords, but also includes sx or sy, and err in the algorithm you listed.

For those that don't understand Bresenham, consider a typical line drawing algorithm first:

Start at x0, y0.
Calculate dx and dy.
If dy > dx, you're a more vertical line, else you're a more horizontal line.
Vertical: Loop over y0->y1, at each step calculate x0 += dx/dy, plot pixel at (x0,y0).
Horizontal: Loop over x0->x1, at each step calculate y0 += dy/dx, plot pixel at (x0,y0).

Now, this requires division with a fixed (or floating) point result. What Bresenham does is skip the division step, and considers the numerator & denominator separately to keep everything as integers.

Let's take the vertical-line approach, as that's the direction of the raster scan.

Precalculate sx = the sign of dx, either -1 or +1.
Loop over y0->y1
Instead of x0 += dx/dy, track both x0 and a separate numerator (called "err" in the posted algo)
num += dx.
If num >= dy, then our fractional component is finally greater than 1, so carry the fractional surplus over to our main x0.
- num -= dy (the denominator)
- x0 += sx

Both Bresenham and fixed-point methods use separate loops for lines that are more horizontal or more vertical, and some even use up to 8 different loops for the 4 cardinal directions simplified, and then the 4 angular quadrants. But in general, you want to swap coordinates and use absolute values in there to just deal with distances and not directions as much as possible.

A way to get around this is to make the 'if' step in the above list a 'while' loop. This means that while vertically traversing a screen, you could draw more than 1 pixel for more horizontal lines, the worst case being a ton of purely horizontal lines that span the screen and all overlap in the same scanline. You also need to make sure that this iteration stops once x1 is reached, so you don't overshoot your intended endpoint while accumulating a potentially large dx/dy.

So, Bresenham requires more intermediate state variables per line, and is an add, test, conditional add & sub per step.

Fixed point, on the other hand, could precalculate all its dx/dy deltas during vblank. Then each step is just x0 += dx, and the fixed-point addition easily tracks both the integer & fractional portions of the coordinate. However, the variables do need to be twice as long to carry the required precision.

In order to draw horizontal lines, you'd need to remember both x0 and the new x0'=x0+dx.

If x0' is still the same as x0, then just draw the 1 pixel. This is a vertical portion of the line.
If x0' changed, then draw all the pixels between x0 and x0', not including x0.

As I'm not a hardware guy, I don't know which loop would be cheaper/faster in hardware to implement, but these are the very basic steps between the two options.

(Note also that in order to draw lines well-balanced, you need to start off with a half-pixel offset fudge:

Code:

0#########    Going from pixel-corner to pixel-corner, no fudge
          #1

0#####        Going from pixel-center to pixel-center, with a half-pixel vertical fudge
      #####1

In Bresenham, this means the numerator starts off with half the appropriate delta instead of at zero; in fixed point this means your starting coordinate gets a +0.5 offset, so that it overflows sooner.

Also, if you want smooth motion for 3d type stuff, you really want sub-pixel precision coordinates for the endpoints given to the renderer, which will certainly demand fixed point and not work well with Bresenham.)

Author:	Arlet [ Sat Nov 10, 2012 12:53 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
When generating lines on the fly, three things are needed: first generate a list of all lines that intersect the current line. An easy way is to cycle through all the lines, and determine whether current line count falls between Y0 and Y1. A simple implementation would take 1 cycle per line, and we need to be done within one hsync period. For example, if we have a 1024x768 display @ 60Hz, a line takes about 20 usecs. If we run the state machine at 100 MHz, this gives us a limit of 2000 lines. The second part is determining the X coordinates where the line intersects the scanline. This should be done in a single cycle per line, using either a multiplier, or by incrementally updating all active lines. This can be done in a pipeline, and would still leave us with a maximum of 2000 lines. The third part is drawing all the pixels in the linebuffer. A simple implementation takes 1 cycle/pixel, so this puts an upper limit of 2000 active pixels per line. This is only a problem if you draw many overlapping horizontal lines, which usually you should be able to avoid, especially if you also have a CPU that has some control over what to draw.

Author:	teamtempest [ Sat Nov 10, 2012 5:21 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
Yow! Okay, there are things I find a little fuzzy about all three preceeding posts... Regarding the posted version of Bresenham's line-drawing algorithm, it probably works, but it sure doesn't look optimal. For one thing there's a multiplication in it. I know that particular multiplication could be replaced by a shift, but in general Bresenham's uses only addition and subtraction in the main loop. Given two co-ordinates x0,y0 and x1,y1 I come up with several possibilities: 1) dx = dy = 0 : a point (not a line after all - plot it and be done) 2) dx = dy <> 0 : a diagonal line ('dx' or 'dy' iterations, no need to track error term) 3) dx <> 0, dy == 0 : a horizontal line ('dx' iterations, no need to track error term) 4) dx = 0, dy <> 0 : a vertical line ('dy' iterations, no need to track error term) 5) dx > dy <> 0 : a line more horizontal than vertical ('dx' iterations, track error term to learn when to move up or down) 6) dx < dy <> 0 : a line more vertical than horizontal ('dy' iterations, track error term to learn when to move left or right) A case can be made for separating all these out during initial setup and giving each one its own drawing loop. As each such loop is simplified relative to the posted code, overall speed increases. In scenarios 2-6 it also helps to swap co-ordinates if necessary so that x-major lines are always drawn in one direction, say left-to-right, and y-major lines are always drawn, say, top-to-bottom (whatever the hardware makes easiest, actually). This reduces the number of possible loops and, more subtly, guarantees that on a raster display a line from x0,y0 to x1,y1 always exactly overlaps a line drawn from x1,y1 to x0,y0. Code: For example, if we have a 1024x768 display @ 60Hz, a line takes about 20 usecs. If we run the state machine at 100 MHz, this gives us a limit of 2000 lines. This seems to be about deciding during the horizontal blanking interval which pixels on a scanline to light up depending on which drawn lines intersect that scanline. So the calculation is re-run during each horizontal blanking interval in preparation for the next scanline? Mmm, but if the display is only 1024 pixels wide, how does a 2000 line limit apply? Or is that for the 1024x786 display as a whole?

Author:	ElEctric_EyE [ Sat Nov 10, 2012 5:58 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
teamtempest wrote: ...Mmm, but if the display is only 1024 pixels wide, how does a 2000 line limit apply? Or is that for the 1024x786 display as a whole? Hi TT, I think Arlet was just pointing the max # of lines based on timings alone. I agree with you, no need for error tracking at this stage.

Author:	White Flame [ Sat Nov 10, 2012 6:04 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
If you could double-buffer the scanline bitmap buffer, that would open up a ton of timing, giving you an entire line's worth of time for drawing instead of just during the horizontal sync.

Author:	Arlet [ Sat Nov 10, 2012 6:22 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
The 20 usecs is for the whole line. I use 'hsync period' as the time between two hsync pulses.

Author:	Arlet [ Sat Nov 10, 2012 6:35 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
teamtempest wrote: This seems to be about deciding during the horizontal blanking interval which pixels on a scanline to light up depending on which drawn lines intersect that scanline. So the calculation is re-run during each horizontal blanking interval in preparation for the next scanline? Mmm, but if the display is only 1024 pixels wide, how does a 2000 line limit apply? Or is that for the 1024x786 display as a whole? My idea was that you keep a list of all the lines, and that for each line you keep parameters like Y0 (starting line) and Y1 (end line), or Y0 and H (height). During display scanning, when current line is Y. You go through the entire list of lines, looking for all lines that span the current Y coordinate. Note that I assume the list isn't presorted by Y coordinate, otherwise you can do it faster.

Author:	Arlet [ Sun Nov 11, 2012 10:49 am ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
Of course, the bigger problem is setting the overall goals for this project. While a "graphics accelerator" sounds like a goal, it is too broad to suggest an optimal implementation, risking a first step on the Wheel of Reincarnation (thanks to Ed for the original pointer in another thread).

Author:	ElEctric_EyE [ Sun Nov 11, 2012 1:10 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
Arlet wrote: Of course, the bigger problem is setting the overall goals for this project... The PVBs have 2 PROMs for the FPGA, so I guess there could be 2 goals. 1) Vector based graphics in one PROM. 2) Raster based graphics in the other.

Author:	Arlet [ Sun Nov 11, 2012 2:46 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
Yes, but I meant total goals. For instance, a vector based system can't display photographic images. So, if your goal is to create a unit that can display a user interface with a bunch of lines, but also some bitmaps and some text, a vector based system isn't really suited. If your goal is to make a Vectrex clone, it would be perfect. If you just want to play around, anything fun is suitable. If you want to go for maximum versatility, I'd recommend a RAM based bitmap. With a bitmapped display, you can use a general purpose CPU to draw lines, but you still have to option to offload some of the work to a graphics accelerator.

Author:	ElEctric_EyE [ Sun Nov 11, 2012 3:46 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
Here's one possible scenario: The first Raster based board (after the controller board, itself most likely going to be Raster based, as initial RGB data will be streaming out from a hard drive or a camera module) would output its RGB signals into PVB #1 which could be a Vector based board. At this point the Vector based board has data ready to go to the display, but data coming in will have to go through the video mixer... EDIT: I've heavily edited this post throughout the day, as I had a very busy day at my job I could not devote the time initially necessary for a response worthy of consideration. I apologize to those that have followed this post throughout today.

Author:	ElEctric_EyE [ Fri Nov 16, 2012 1:34 am ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
I still pursue line drawing algorithms in HDL. It is a natural first step for my goals and physical hardware construction is easy enough, but for me the Verilog is another more difficult issue... I've found a similar project authored by Frank Buss, also on Github, although his Line Drawing routine is in VHDL and commands are sent in through serial SPI. He has a video although it shows slow operation due to the serial SPI... I'm still trying to convert his VHDL line routine to Verilog, almost there.

Author:	White Flame [ Fri Nov 16, 2012 5:36 am ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
ElEctric_EyE: I did technically list all the algorithmic info you need in my post, though I admit it's densely packed and not very tutorial if you're learning it fresh. The code you just posted works on a frame buffer, and has separate branches for vertical & horizontal lines, which will not work well for your scanline buffer. You need to switch to a vertical-only algorithm, invoking it once per scanline per line-drawing segment and saving per-line intermediate state in BRAM. Buss's algorithm only keeps one intermediate state active at a time, because it only draws one line at a time. You're looking at drawing many lines "in parallel", round-robining through all of them once per scanline. I (and others) can explain further, to get you to the point where you can write the Verilog from scratch. Properly understanding the steps & options is far more effective than trying to port copy/pasted code that may or may not work toward your goals. Is there any aspect in particular that's the most opaque at the moment?

Author:	ElEctric_EyE [ Fri Nov 16, 2012 12:47 pm ]
Post subject:	Re: Highest speed graphic functions in a Spartan 6 XC6SLX9
Hi WF, I have read you first post more than once and it is packed with info. And thanks for responding again. I need to do more research, but I figured I could get any line drawing algorithm, get the outputted X and Y coordinates into a BRAM, then sort them by Y. This is probably not the most efficient method though. Have you ever done something similar?

Page 1 of 4	All times are UTC
Powered by phpBB® Forum Software © phpBB Group http://www.phpbb.com/