Video/sprites for the 65org16 Dev. board

Post by **Arlet** » Tue Apr 03, 2012 4:23 pm

Because the main thread for the dev board was getting big, I decided to start a new topic specifically for video output.

After a bit of thinking, I came up with this preliminary design for a video output system, useful for implementing games. I wanted something that would be efficient, flexible, and not too hard to write in Verilog. I also wanted to be able to display bitmaps, transparency, overlapping windows, tiles and sprites, but I didn't want a whole bunch of separate modes and systems that would make it very complex.

So, I came up with the following:

1. One block RAM will be used to hold a big list of items that are shown on the screen. Each item is a rectangular bitmap of arbitrary size. For each item, it will store the X/Y coordinates and the item number. I was planning to use 8 bits for X, 8 bits for Y, and 8 bits for the item #. The remaining 8 bits are reserved for later.

So, for instance, if item #0 represents a 64x80 pixel tree, you could make a table that says:

```
(100,100,#0)
(200,100,#0)
(300,200,#0)
```

and it would display 3 trees on the screen at different coordinates. These could be used as a background, similar to what tile-based rendering engines use. The difference is that my system would support fewer (but bigger) tiles, and they wouldn't have to be in a grid. Since I have 32 bits per object, a single block RAM could store 512 of these objects. In case that's not enough, a second block RAM could be used. Since each object stores the X/Y coordinates, objects could be static background tiles, as well as foreground sprites.
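As a rough illustration of how one table entry could be packed into a 32-bit block RAM word, here is a Python sketch. The byte order (X in the lowest byte, then Y, item #, reserved) is an assumption for illustration. It also makes one limit visible: with only 8 bits for X, a coordinate like 300 does not fit.

```python
# Hypothetical model of one 32-bit object-table entry.
# Assumed byte layout (lowest byte first): X, Y, item number, reserved.
BLOCK_RAM_ENTRIES = 512  # a 512 x 32-bit block RAM holds 512 entries


def pack_entry(x, y, item, reserved=0):
    """Pack X, Y, item # and the reserved byte into one 32-bit word."""
    for name, value in (("x", x), ("y", y), ("item", item), ("reserved", reserved)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name}={value} does not fit in 8 bits")
    return x | (y << 8) | (item << 16) | (reserved << 24)


def unpack_entry(word):
    """Split a 32-bit word back into (x, y, item, reserved)."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))


# Two trees (item #0) at different positions. A copy at X=300 would
# need more than 8 bits for X, so pack_entry rejects it.
table = [pack_entry(100, 100, 0), pack_entry(200, 100, 0)]
```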

The objects support transparency (later maybe also other effects such as shadow, fog, and similar color manipulations), so they can be overlapped to create extra effects. By overlapping the trees, you could make a nice-looking forest, for instance. Or you could make a foreground layer and a background layer, and scroll them at different speeds to create a 3D parallax effect. Of course, the rendering time and bandwidth usage will go up depending on the overlap. Animation, such as a running hero, could be accomplished simply by changing the item # in the table. Each different number would display a different image in the same position, and it would only require 1 byte to be written.

2. The description of each possible object is stored in a second table, indexed by item number. So, for our tree item #0, this table will hold:

```
0: width=64, stride=64, height=80, base=$1239A
```

Pixel (x,y) of the image will be stored in SDRAM at location base + x + y*stride. The 'stride' can be different from the width, so you can take subsections of a larger bitmap. This allows scrolling inside an object, as well as different forms of animation. You could, for instance, have a 64x200 image of a pipe, and use it to create 64x50, 64x100 and 64x150 sub-images. It's also possible to create animations by changing the 'base' parameter. So, our running hero could remain item #1, but by pointing the 'base' to different parts of the bitmap storage, it could still be animated. Of course, if item #1 is used in more than one place on the screen, they would all be animated synchronously.

This table would require 64 bits per item, so we could use a single 32-bit-wide block RAM with 2 accesses per item, or two 32-bit-wide block RAMs. Using two block RAMs would also mean we could have 512 different objects instead of 256.

All bitmaps are stored in SDRAM. This provides plenty of space, as well as the option to manipulate the bitmaps pixel by pixel. A large object could be used as a canvas, for instance, that you could draw lines on or render text into.
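The addressing rule base + x + y*stride can be sketched in Python as follows. The class and field names are made up for illustration, and the pipe and hero base addresses and frame sizes are hypothetical. Only the tree descriptor comes from the table above.

```python
from dataclasses import dataclass


@dataclass
class ItemDescriptor:
    """One entry of the (hypothetical) item-description table."""
    width: int
    stride: int
    height: int
    base: int

    def pixel_address(self, x, y):
        """SDRAM word address of pixel (x, y): base + x + y*stride."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError(f"pixel ({x}, {y}) is outside the item")
        return self.base + x + y * self.stride


tree = ItemDescriptor(width=64, stride=64, height=80, base=0x1239A)

# Sub-images of a 64x200 pipe stored at a hypothetical base address:
# same base and stride, only the height differs.
PIPE_BASE = 0x20000
pipe_short = ItemDescriptor(64, 64, 50, PIPE_BASE)
pipe_tall = ItemDescriptor(64, 64, 150, PIPE_BASE)


def hero_frame(frame, strip_base=0x30000):
    """Animation by moving 'base': hypothetical 16x24 frames stored side by
    side in one 64-pixel-wide strip, so the stride stays 64."""
    return ItemDescriptor(width=16, stride=64, height=24, base=strip_base + 16 * frame)
```

Because the stride stays fixed while width, height and base change, the same bitmap storage can be reused for several items without copying pixels.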

I'm trying to avoid artificial limitations, so there's no 8-sprites-per-scanline limit, for instance. Bandwidth and rendering time vary with the total number of objects, objects per scanline, object size, and amount of overlap. If you try to do too much, the engine will just run out of time and start dropping objects. It's up to the programmer to keep an eye on that.

Post by **ElEctric_EyE** » Tue Apr 03, 2012 9:30 pm

Ooh, this is sounding good!

I am thinking the cheapest way to get others involved, after V1.2 of the DevBoard is complete, is to just sell the bare board at a no-profit cost of US$33 to current members of 6502.org only. I'm sure there are at least a few people out there in our community who are interested in this journey...

Post by **Arlet** » Wed Apr 04, 2012 4:40 am

I think I need some extra bitmap parameters to make it even more versatile. Per bitmap in SDRAM I want to know:

- Base address
- bitmap width (replaces 'stride')
- bitmap height
- screen width
- screen height
- X offset in bitmap
- Y offset in bitmap

It's a lot of parameters, but using these, you can do some nifty tricks, especially when you specify a subsection that crosses the edge of the original. Using the bitmap width and height, the bitmap is first turned into an image of infinite size by tiling. Then a cropped version is cut out of this infinite image and displayed on the screen. This means that you could, for instance, define a 64x64 pixel texture bitmap, but "cut out" a 640x128 version of it to display on the screen. It would automatically be repeated 10x2 times to fill the whole area. Now, by adjusting the X/Y offsets, you can make it scroll horizontally and vertically. You could also make a 1x1 blue bitmap and turn it into a piece of sky.
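The tiling-then-cropping idea boils down to a modulo on the source coordinates. Below is a minimal Python sketch of that address calculation. The field names mirror the parameter list above but are otherwise invented, and the base addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BitmapView:
    """Hypothetical container for the per-bitmap parameters listed above."""
    base: int
    bitmap_width: int
    bitmap_height: int
    screen_width: int
    screen_height: int
    x_offset: int
    y_offset: int

    def source_address(self, sx, sy):
        """SDRAM address feeding pixel (sx, sy) of the on-screen rectangle.

        The bitmap is treated as tiled infinitely in both directions, so the
        offsets wrap around with a modulo. Python's % also maps negative
        offsets into range, which keeps scrolling in either direction simple.
        """
        if not (0 <= sx < self.screen_width and 0 <= sy < self.screen_height):
            raise IndexError("outside the on-screen rectangle")
        bx = (sx + self.x_offset) % self.bitmap_width
        by = (sy + self.y_offset) % self.bitmap_height
        return self.base + bx + by * self.bitmap_width


# A 64x64 texture shown as a 640x128 area (repeated 10x2 times).
texture = BitmapView(0x1000, 64, 64, 640, 128, x_offset=0, y_offset=0)
# A 1x1 blue bitmap stretched into a strip of sky.
sky = BitmapView(0x2000, 1, 1, 720, 100, x_offset=0, y_offset=0)
```

Scrolling is then just a matter of changing `x_offset` or `y_offset`. The 1440x480 case works the same way, with a 720x480 view into a wider bitmap.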

Similarly, you could define a huge 1440x480 bitmap in SDRAM, and show a 720x480 subsection that you could scroll sideways. While you are scrolling, the CPU can modify the sections that are off-screen.

Post by **Arlet** » Wed Apr 04, 2012 7:50 pm

I started on the Verilog code for this design. This first version doesn't support the extended parameters, and doesn't even support the 'stride' parameter. Instead, the sprite in the SDRAM must be laid out in a linear fashion. So, if the sprite is 64 pixels wide, the first 64 words in SDRAM form the top line, the next 64 words form the second line, and so on. Also, the screen is still set up for non-interlaced output, with pixels doubled in the X direction to create a 256x200 resolution in 8-bit grayscale. Transparency is also not yet supported.

Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/sprites1.jpg

This is the code to draw a sprite on the screen:

```
        ldx     #30             ; X coordinate
        ldy     #30             ; Y coordinate
        lda     #64             ; width and height

        stx     $c000           ; sprite 0: X
        sty     $c001           ; sprite 0: Y
        ldy     #1              ; enable bit
        sty     $c003           ; sprite 0: spare byte, LSB enables the sprite

        sta     $c800           ; width
        sta     $c801           ; height
```

From $C000 to $C7FF is a table for the 512 sprites, with four bytes per sprite: one byte for X, one byte for Y, one for the sprite number, and a spare byte, of which the LSB is now used to enable drawing. Technically, this isn't really necessary, because you can park the sprites off-screen (by using a big Y value, for instance), but this is more convenient right now.

From $C800 to $CFFF is a table that holds width and height.
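To make the register map concrete, here is a small Python model that computes register addresses and replays the 6502 listing on a 64 KB byte array. The four-byte spacing of the width/height table is an assumption (2048 bytes divided over 512 entries). The listing only confirms the first entry. The helper names are hypothetical.

```python
SPRITE_TABLE = 0xC000   # $C000-$C7FF: 512 sprites x 4 bytes
SIZE_TABLE = 0xC800     # $C800-$CFFF: width/height table
ENTRY_BYTES = 4         # assumed spacing for both tables
SPRITE_FIELDS = {"x": 0, "y": 1, "number": 2, "spare": 3}
SIZE_FIELDS = {"width": 0, "height": 1}


def sprite_reg(n, field):
    """Address of one byte of sprite n's entry in the sprite table."""
    if not 0 <= n < 512:
        raise ValueError(f"sprite {n} out of range")
    return SPRITE_TABLE + ENTRY_BYTES * n + SPRITE_FIELDS[field]


def size_reg(item, field):
    """Address of the width or height byte for a given item number."""
    if not 0 <= item < 512:
        raise ValueError(f"item {item} out of range")
    return SIZE_TABLE + ENTRY_BYTES * item + SIZE_FIELDS[field]


def place_sprite(mem, n, x, y, item, size):
    """Model of the 6502 listing, generalized to sprite n showing a square item."""
    mem[sprite_reg(n, "x")] = x
    mem[sprite_reg(n, "y")] = y
    mem[sprite_reg(n, "number")] = item   # the listing leaves this byte untouched
    mem[sprite_reg(n, "spare")] = 1       # LSB of the spare byte enables drawing
    mem[size_reg(item, "width")] = size
    mem[size_reg(item, "height")] = size


mem = bytearray(0x10000)                  # stand-in for the 6502 address space
place_sprite(mem, 0, x=30, y=30, item=0, size=64)
```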

There's still a bug in the code. When two sprites are on the same scanline, the synchronization between the pixel coordinates and the data from the SDRAM pipeline gets messed up, and pixel data ends up slightly shifted. You can see the example below, where the character on the right is out of alignment.

Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/sprite2.jpg

Post by **ElEctric_EyE** » Thu Apr 05, 2012 10:08 pm

If there was another sprite next to the second one, would it be offset twice as badly?

512 sprites?! Awesome. I imagine it would be difficult to make some sort of collision register?

You've got some great things planned. I'm excited, even if it happens to be low-res. It's like travelling back in time to the old days, only now we have a big bag of goodies!!!

Post by **Arlet** » Fri Apr 06, 2012 4:48 am

ElEctric_EyE wrote:

If there was another sprite next to the second one, would it be offset twice as badly?

Not sure, but I think I've fixed it already. The problem was the pipeline delay from SDRAM. Every clock cycle, I try to do a read request for pixel data, but the actual data doesn't come back until 3 cycles later. The problem was that I wasn't properly keeping track of what should happen to the received data once it got back from SDRAM. I have now added a FIFO to keep track of that:

```
1. read pixel #0, store 'X0' in FIFO
2. read pixel #1, store 'X1' in FIFO
3. read pixel #2, store 'X2' in FIFO
4. read pixel #3, store 'X3' in FIFO. Receive #0 data. Take 'X0' from FIFO.
5. read pixel #4, store 'X4' in FIFO. Receive #1 data. Take 'X1' from FIFO.
```

Here 'X0' is the coordinate for pixel #0, and so on. Now, in step 4, I have pixel data #0 as well as the pixel coordinate X0, so it's easy to store the right data in the right place. Even if the SDRAM latency changes, everything will still work.
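The FIFO scheme above can be checked with a small cycle-level model in Python. This is only a software sketch of the idea, not the Verilog. The function name and request format are invented for illustration. It issues one read per cycle, pushes the target X into a FIFO, and pops the oldest X whenever data arrives `latency` cycles later.

```python
from collections import deque


def fetch_scanline(sdram, requests, latency=3):
    """Cycle-level sketch of the read pipeline with a coordinate FIFO.

    sdram:    indexable memory (word address -> pixel)
    requests: (address, screen_x) pairs, one read issued per cycle
    latency:  cycles between a read request and its data arriving
    Returns a dict mapping screen_x -> pixel value.
    """
    coord_fifo = deque()   # screen X of every read still in flight
    pipeline = deque()     # (arrival_cycle, data): models SDRAM latency
    todo = deque(requests)
    line = {}
    cycle = 0
    while todo or pipeline:
        if todo:                                   # issue one read per cycle
            address, x = todo.popleft()
            coord_fifo.append(x)
            pipeline.append((cycle + latency, sdram[address]))
        if pipeline and pipeline[0][0] == cycle:   # oldest read's data arrives
            _, data = pipeline.popleft()
            line[coord_fifo.popleft()] = data      # pair it with its own X
        cycle += 1
    return line


sdram = list(range(1000))
# Two sprites sharing a scanline: 4 pixels at X=10.., 3 pixels at X=40..
reqs = [(i, 10 + i) for i in range(4)] + [(100 + i, 40 + i) for i in range(3)]
```

Because the data and the coordinates come out in the same order they went in, the pairing stays correct for any latency, which is the point made above.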

Quote:

512 sprites?! Awesome. I imagine it would be difficult to make some sort of collision register?

Yes, I haven't figured out how to do that yet. It's probably going to be trickier than drawing the things in the first place!

Quote:

You've got some great things planned. I'm excited, even if it happens to be low-res. It's like travelling back in time to the old days, only now we have a big bag of goodies!!!

I'm using really low-res now, but that's just for convenience, and also to make it easier to see if the pixels are correct. It's fairly easy to go to 720x480 (720x576 PAL) interlaced mode later. If you want to go higher, you'll need a VGA or TFT output. The advantage of the low TV resolution is that the pixel clock rate is fairly low, which gives the hardware plenty of time to do more complicated stuff without requiring huge memory bandwidths.

Post by **Arlet** » Fri Apr 06, 2012 6:27 am

Working on full color now. I'm still using native UYVY format, which has the advantage of being well matched to the actual output data, but isn't very convenient for most other things. In order to show full color sprites, they'll have to be RGB->YUV converted first. This can be done on a PC before uploading them.

Later, I think I'll add an option to use different kinds of image formats. An 8-bit palette format would be useful for changing colors without having to redraw anything, and would also reduce memory bandwidth. Maybe also a 4-bit palette for even less bandwidth usage, and an RGB555 format for convenience.

Here are two 64x64 pixel squares, at low res. No more glitches when they share a scanline. They were supposed to be the same color, but because the X coordinate is shifted by an odd number, the U/V components got swapped.
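The U/V swap follows from how UYVY (4:2:2) data shares chroma between pixel pairs: each 16-bit word carries luma plus either U or V, and which one it is depends on whether the word lands at an even or odd position. The Python sketch below assumes the output stage treats chroma at even screen X as U and at odd X as V. The function name and color values are made up.

```python
def chroma_labels(sprite_words, x):
    """Return how the output stage labels each word's chroma byte.

    sprite_words: list of (chroma, luma) 16-bit words, stored U,Y / V,Y / U,Y ...
    x:            screen X coordinate of the sprite's first pixel
    Assumes chroma at even screen X is taken as U and at odd X as V.
    """
    return [("U" if (x + i) % 2 == 0 else "V", chroma)
            for i, (chroma, _luma) in enumerate(sprite_words)]


U, V, Y = 0x40, 0xC0, 0x80
square = [(U, Y), (V, Y)] * 2          # two UYVY pixel pairs of one color
```

At an even X the stored U is read as U. Shift by one pixel and every stored U is read as V and vice versa, which changes the color.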

Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/sprites3.jpg

Post by **Arlet** » Fri Apr 06, 2012 7:10 am

Full color is working. This is a picture of 4 sprites, each 128x128 pixels in size, displayed right next to each other to fill the entire screen. I wrote a simple tool for the PC to convert RGB to YUV, and uploaded the image through the UART to SDRAM.

Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/sprites4.jpg

I'll have to try some animation soon...

Post by **Arlet** » Fri Apr 06, 2012 9:25 pm

Here's a test with a conversion of a tile-based game like Mario Bros. Instead of using a special tile engine, my code uses 256 regular sprites (16x16 pixels each) to achieve a similar effect. Scrolling through the level should be a matter of updating the coordinates in a software loop.
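A rough idea of what that software loop could compute, as a Python sketch. It turns a tile map and a horizontal scroll position into (x, y, item) sprite entries. The function name, the map format, and the 40-tile-wide test level are hypothetical. The 256x200 screen and 16x16 tiles are from the posts.

```python
TILE = 16  # tile/sprite size in pixels


def visible_tiles(level, scroll_x, screen_w=256, screen_h=200):
    """Turn a tile map plus a horizontal scroll position into sprite entries.

    level:    list of rows, each a list of item numbers (one per 16x16 tile)
    scroll_x: horizontal scroll position in pixels (>= 0)
    Returns (x, y, item) tuples. x is negative for the partly visible
    leftmost column, which an 8-bit X register cannot hold directly.
    """
    first_col, fine = divmod(scroll_x, TILE)
    cols = screen_w // TILE + (1 if fine else 0)   # extra column when mid-tile
    rows = -(-screen_h // TILE)                    # ceiling division
    sprites = []
    for r, row in enumerate(level[:rows]):
        for c in range(cols):
            col = first_col + c
            if col < len(row):
                sprites.append((c * TILE - fine, r * TILE, row[col]))
    return sprites


level = [list(range(40)) for _ in range(13)]       # hypothetical 40-tile-wide map
```

Even with the extra partial column, a full 256x200 screen needs 17x13 = 221 entries, which stays within 256 sprites.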

Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/mario.jpg

Post by **ElEctric_EyE** » Sat Apr 07, 2012 1:20 am

Arlet wrote:

I wrote a simple tool for the PC to convert RGB to YUV, and uploaded the image through the UART to SDRAM...

This sounds like an invaluable utility you've created for sprite creation. A utility that converts any RGB .bmp file into the YUV format the CS4954 needs in order to display proper pixels.

Post by **Arlet** » Sat Apr 07, 2012 4:51 am

ElEctric_EyE wrote:

This sounds like an invaluable utility you've created for sprite creation. A utility that converts any RGB .bmp file into the YUV format the CS4954 needs in order to display proper pixels.

It's a very crude utility, though. It only accepts .ppm formatted files, and supplies the YUV in the wrong order, to compensate for the fact that the pipeline delay in the cs4954.v code causes the U/V values to be swapped.

I may also need a YUV -> RGB utility, so I can capture the output from the simulation, and convert it back, so I can see if it's working correctly.

Post by **Arlet** » Sat Apr 07, 2012 10:35 am

ElEctric_EyE wrote:

A utility that converts any RGB .bmp file into the YUV format the CS4954 needs in order to display proper pixels.

Actually, I'm starting to think that I should switch to another format. The alternating UY/VY format doesn't really work well in the low-resolution mode if you want to move a sprite with single-pixel resolution.

I'm thinking about using a YUV844 format, with 8 bits for Y and 4 bits each for the U and V components. This way, all the color information for a single pixel fits into a single 16-bit word, which makes manipulating the data really convenient. The loss in chroma detail is probably barely noticeable, given that chroma quality is already very poor.
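A possible YUV844 packing, sketched in Python. The bit layout (Y in the high byte, then U, then V) is an assumption, since the post only specifies the field widths. Unpacking replicates each 4-bit chroma value into 8 bits, which is one common way to expand it.

```python
def pack_yuv844(y, u, v):
    """Pack 8-bit Y, U, V into one 16-bit word, keeping the top 4 bits of U and V.

    Assumed layout: Y in bits 15-8, U in bits 7-4, V in bits 3-0.
    """
    for name, value in (("y", y), ("u", u), ("v", v)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name}={value} is not an 8-bit value")
    return (y << 8) | ((u >> 4) << 4) | (v >> 4)


def unpack_yuv844(word):
    """Expand back to 8-bit components; 4-bit chroma is replicated (0xA -> 0xAA)."""
    u4, v4 = (word >> 4) & 0xF, word & 0xF
    return word >> 8, (u4 << 4) | u4, (v4 << 4) | v4
```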

Post by **ElEctric_EyE** » Sat Apr 07, 2012 2:07 pm

I know you probably have it covered with your skills in Verilog, but Xilinx has an IP core for RGB to YUV conversion.

Post by **Arlet** » Sat Apr 07, 2012 3:02 pm

I was already looking at this method here: RGB to YUV approximation

The multiplications by constants can be done with a multiplier, or with a bunch of adders.
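The linked approximation isn't preserved here, so this Python sketch uses a stand-in: the widely used integer BT.601 conversion to studio-range values (Y 16-235, U/V around 128). This is not necessarily the formula from the link. The Y term shows how constant multiplications reduce to shifts and adds. U and V use plain multiplications for brevity. The helper names are hypothetical.

```python
def times66(x):
    return (x << 6) + (x << 1)          # 64x + 2x


def times129(x):
    return (x << 7) + x                 # 128x + x


def times25(x):
    return (x << 4) + (x << 3) + x      # 16x + 8x + x


def rgb_to_yuv(r, g, b):
    """8-bit RGB -> studio-range YUV with the common integer BT.601 approximation.

    '>> 8' on a negative Python int rounds toward minus infinity, like an
    arithmetic right shift in hardware.
    """
    y = ((times66(r) + times129(g) + times25(b) + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return y, u, v
```

In hardware, each shift is just wiring, so the Y channel costs only a few adders.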

But that's an option for later. For now, I'll use the YUV844 format, it's easy to implement and has the added advantage of offering better grayscale than RGB555.

Post by **Arlet** » Sat Apr 07, 2012 3:53 pm

Here's a test with scrolling sprites

I captured the image with the cheap USB video grabber, but it's dropping frames, resulting in jerky movement. On the TV, the action is smooth.

There are 238 sprites in this animation (14x17). It's a bit tricky to get the sprites to behave properly at the left and right edges. The sprite on the left edge has to have a negative X coordinate, but the screen is 256 pixels wide, and the coordinate registers are only 8 bits wide. I decided to add a 9th bit, but this isn't really convenient either, since it has to be handled separately. I don't see any better solution, though.
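The post doesn't say how the 9th bit is interpreted. One natural choice is a 9-bit two's-complement X (range -256 to 255), split into the 8-bit register plus the extra bit. The Python sketch below assumes that encoding. The function names are hypothetical.

```python
def encode_x(x):
    """Split a signed X in -256..255 into (low 8 bits, 9th bit), two's complement."""
    if not -256 <= x <= 255:
        raise ValueError(f"x={x} does not fit in 9 bits")
    v = x & 0x1FF
    return v & 0xFF, v >> 8


def decode_x(low, bit8):
    """Rebuild the signed X from the 8-bit register and the 9th bit."""
    v = (bit8 << 8) | low
    return v - 512 if v & 0x100 else v
```

With this encoding, a sprite 16 pixels past the left edge (X = -16) is written as $F0 in the X register with the 9th bit set. This extra step is the inconvenience described above.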

The 6502 does the update for all 238 sprites in the vertical retrace period. For this demo that takes about 75 microseconds.

Reading all the sprite data from SDRAM takes about 3 microseconds per scanline, which means that it's currently using less than 5% of the SDRAM bandwidth.
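As a quick check of these figures, assuming a PAL line period of 64 µs (an NTSC line of about 63.6 µs gives practically the same result):

$$
\frac{3\ \mu\text{s}}{64\ \mu\text{s}} \approx 0.047 = 4.7\,\% < 5\,\%,
\qquad
\frac{75\ \mu\text{s}}{238\ \text{sprites}} \approx 0.32\ \mu\text{s per sprite update.}
$$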

Edit: I came up with a method to ease the pain of having a separate 9th bit for the coordinate. I still have a bunch of reserved bits, and I can use one of them to switch between absolute/relative positioning. The relative positioning would be a new feature, and what it does is make the X/Y coordinate work as an offset with respect to the previous sprite.

So, you could make a sprite list where the first sprite (in the top-left corner) has an absolute position. All the other sprites would be programmed (only once) with a relative offset to this first one. If the CPU then moves the first sprite, all the others would automatically follow. This makes scrolling the entire screen a very quick operation.

Edit 2: Oops... connecting all the sprites in a single chain won't work. Attaching them in vertical bars will work, though. Every bar will get a sprite at the top that's absolute and works as the 'handle' for the whole column. That means only 17 handle sprites need to be adjusted to scroll by 1 pixel.
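The column-handle scheme can be modeled in a few lines of Python. The [x, y, relative] entry format and the 16-pixel sprite size are assumptions for illustration. The 17 columns of 14 sprites match the 238-sprite demo.

```python
def resolve_positions(entries):
    """Compute screen positions for a list of [x, y, relative] sprite entries.

    An absolute entry uses (x, y) directly; a relative entry is offset from
    the previous sprite's resolved position.
    """
    positions = []
    prev = None
    for x, y, relative in entries:
        if relative:
            if prev is None:
                raise ValueError("the first sprite must be absolute")
            x, y = prev[0] + x, prev[1] + y
        prev = (x, y)
        positions.append(prev)
    return positions


def build_columns(cols=17, rows=14, tile=16):
    """One absolute 'handle' per column, the rest stacked relative below it."""
    entries = []
    for c in range(cols):
        entries.append([c * tile, 0, False])
        entries.extend([0, tile, True] for _ in range(rows - 1))
    return entries


def scroll(entries, dx):
    """Move the whole screen by moving only the handles; returns the write count."""
    writes = 0
    for entry in entries:
        if not entry[2]:
            entry[0] += dx
            writes += 1
    return writes


entries = build_columns()
```

A horizontal scroll by one pixel touches 17 entries instead of 238, while the relative offsets inside each column never change.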