Because the main thread for the dev board was getting big, I decided to start a new topic specifically for video output.
After a bit of thinking, I came up with this preliminary design for a video output system, useful for implementing games. I wanted something that would be efficient, flexible, and not too hard to write in Verilog. I also wanted to be able to display bitmaps, transparency, overlapping windows, tiles and sprites, but I didn't want a whole bunch of separate modes and systems that would make it very complex.
So, I came up with the following:
1. One block RAM will hold a big list of the items that are shown on the screen. Each item is a rectangular bitmap of arbitrary size. For each item, the list stores the X/Y coordinates and the item number. I was planning to use 8 bits for X, 8 bits for Y, and 8 bits for the item #. The remaining 8 bits are reserved for later.
So, for instance, if item #0 represents a 64x80 pixel tree, you could make a table that says:
Code:
(100,100,#0)
(200,100,#0)
(300,200,#0)
and it would display 3 trees on the screen at different coordinates. These could be used as a background, similar to what tile based rendering engines use. The difference is that my system would support fewer (but bigger) tiles, and they wouldn't have to be in a grid. Since I have 32 bits per object, a single block RAM could store 512 of these objects. In case that's not enough, a second block RAM could be used. Since each object stores the X/Y coordinates, objects could be static background tiles, as well as foreground sprites.
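As a rough software model of the list entry described above (Python here, not Verilog), the 32-bit word could be packed like this. The post only fixes the field widths, so the bit order (X in the top byte, reserved bits at the bottom) and the names `pack_entry` and `unpack_entry` are assumptions made for illustration. Note that with 8-bit fields an X of 300, as in the third tree of the example table, does not fit; the sketch simply rejects such values, and how the real design would deal with them (wider fields, the reserved bits, coarser units) is left open.

```python
# Assumed bit positions: the post gives the widths (8/8/8/8) but not the order.
X_SHIFT, Y_SHIFT, ITEM_SHIFT = 24, 16, 8   # low 8 bits stay reserved

def pack_entry(x, y, item):
    """Pack one display-list entry into a 32-bit word."""
    for name, value in (("x", x), ("y", y), ("item", item)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name}={value} does not fit in 8 bits")
    return (x << X_SHIFT) | (y << Y_SHIFT) | (item << ITEM_SHIFT)

def unpack_entry(word):
    """Return (x, y, item) from a packed 32-bit entry."""
    return ((word >> X_SHIFT) & 0xFF,
            (word >> Y_SHIFT) & 0xFF,
            (word >> ITEM_SHIFT) & 0xFF)

# Capacity implied by the post: 512 entries of 32 bits in one block RAM.
BLOCK_RAM_BITS = 512 * 32
ENTRIES_PER_BLOCK_RAM = BLOCK_RAM_BITS // 32

# The first two trees from the example table (both use item #0).
display_list = [pack_entry(100, 100, 0), pack_entry(200, 100, 0)]
```

Keeping all three fields on byte boundaries means each one can be updated with a single byte write, which matters for the animation trick described next.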
The objects support transparency (and later maybe other effects such as shadow, fog, and similar color manipulations), so they can be overlapped to create extra effects. By overlapping the trees, you could make a nice looking forest, for instance. Or you could make a foreground layer and a background layer, and scroll them at different speeds to create a 3D parallax effect. Of course, the rendering time and bandwidth usage will go up with the amount of overlap. Animation, such as a running hero, could be accomplished simply by changing the item # in the table. For each different number, a different image would be displayed in the same position, and it would only require 1 byte to be written.
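To make the one-byte animation and the parallax idea concrete, here is a small sketch that treats the list block RAM as a byte array. The byte order inside an entry (X, Y, item #, reserved), the wrap-around at 8 bits when scrolling, and the item numbers used for the hero's run cycle are all assumptions for the sake of the example, not part of the design.

```python
ENTRY_BYTES = 4                      # 32 bits per display-list entry
X_OFF, Y_OFF, ITEM_OFF = 0, 1, 2     # assumed byte order; byte 3 is the reserved field

def put_entry(table, index, x, y, item):
    """Write a whole entry (x, y, item #); each value must fit in 8 bits."""
    start = index * ENTRY_BYTES
    table[start:start + 3] = bytes([x, y, item])

def set_item(table, index, item):
    """Animate an object: a single byte write selects a different image."""
    table[index * ENTRY_BYTES + ITEM_OFF] = item

def scroll(table, indices, dx):
    """Move a group of entries horizontally, wrapping at 8 bits (an assumption)."""
    for i in indices:
        off = i * ENTRY_BYTES + X_OFF
        table[off] = (table[off] + dx) & 0xFF

table = bytearray(512 * ENTRY_BYTES)  # one block RAM's worth of entries
put_entry(table, 0, 120, 150, 1)      # hero, first frame (item #1)
put_entry(table, 1, 100, 100, 0)      # background tree
put_entry(table, 2, 200, 100, 0)      # background tree
put_entry(table, 3, 50, 120, 0)       # foreground tree

HERO_FRAMES = [1, 2, 3]               # hypothetical item numbers for the run cycle
BACKGROUND, FOREGROUND = [1, 2], [3]

def tick(frame):
    """Per-frame update: next hero image, then scroll the two layers."""
    set_item(table, 0, HERO_FRAMES[frame % len(HERO_FRAMES)])
    scroll(table, BACKGROUND, -1)     # far layer moves slowly
    scroll(table, FOREGROUND, -2)     # near layer moves faster

tick(1)
```

The hero's position bytes are never touched by `set_item`, so the image changes in place while the layers slide past at different speeds.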
2. The description of each possible object is stored in a second table, indexed by item number. So, for our tree item #0, this table will hold:
Code:
0: width=64, stride=64, height=80, base=$1239A
Pixel (x,y) of the image will be stored in SDRAM at location: base + x + y*stride. The 'stride' can be different from the width, so you can take subsections of a larger bitmap. This allows scrolling inside an object, as well as different forms of animation. You could, for instance, have a 64x200 image of a pipe, and use it to create 64x50, 64x100 and 64x150 sub-images.
It's also possible to create animations by changing the 'base' parameter. So, our running hero could remain item #1, but by pointing the 'base' to different parts of the bitmap storage, it could still be animated. Of course, if item #1 is used in more than one place on the screen, all copies would be animated synchronously.
This table would require 64 bits per item, so we could use a single block RAM, 32 bits wide, with 2 accesses per item, or use 2 block RAMs, each 32 bits wide. Using two block RAMs would also mean we could have 512 different objects instead of 256, although an 8-bit item # can only select 256 of them, so going beyond that would need an extra bit from the reserved field.
All bitmaps are stored in SDRAM. This provides plenty of space, as well as the option to manipulate the bitmaps pixel by pixel. A large object could be used as a canvas, for instance, that you could draw lines on or render text to.
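The addressing rule and the sub-image trick from item 2 can be sketched in a few lines of Python. The `ItemDesc` type and the `sub_image` helper are hypothetical names, the pipe's base address is made up, and the sketch assumes one address unit per pixel, as the formula base + x + y*stride suggests.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ItemDesc:
    """One row of the descriptor table (field names follow the post)."""
    width: int
    stride: int
    height: int
    base: int

def pixel_address(desc, x, y):
    """SDRAM address of pixel (x, y): base + x + y*stride."""
    if not (0 <= x < desc.width and 0 <= y < desc.height):
        raise IndexError(f"({x}, {y}) is outside the {desc.width}x{desc.height} item")
    return desc.base + x + y * desc.stride

def sub_image(desc, x0, y0, width, height):
    """A window into a larger bitmap: same stride, base moved to (x0, y0)."""
    if x0 + width > desc.width or y0 + height > desc.height:
        raise ValueError("window does not fit inside the parent bitmap")
    return ItemDesc(width, desc.stride, height, pixel_address(desc, x0, y0))

# The tree from the example: item #0.
tree = ItemDesc(width=64, stride=64, height=80, base=0x1239A)

# A 64x200 pipe (hypothetical base address) and a 64x100 piece starting 50 rows down.
pipe = ItemDesc(width=64, stride=64, height=200, base=0x20000)
pipe_middle = sub_image(pipe, 0, 50, 64, 100)

# A 64x50 window into a 320-pixel-wide bitmap: here the stride differs from the width.
wide = ItemDesc(width=320, stride=320, height=100, base=0x30000)
window = sub_image(wide, 10, 5, 64, 50)
```

Scrolling inside an object or stepping through animation frames then comes down to writing a new `base` into the descriptor; the bitmap data in SDRAM does not move.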
I'm trying to avoid artificial limitations, so there's no limit of 8 sprites per scanline, for instance. Bandwidth and rendering time vary with the total number of objects, objects per scanline, object size, and amount of overlap. If you try to do too much, the engine will just run out of time and start dropping objects. It's up to the programmer to keep an eye on that.
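The "run out of time and drop objects" behaviour can be modelled roughly as follows. This is only a sketch under several assumptions that the post does not state: pixel value 0 is treated as transparent, later list entries are drawn on top of earlier ones, each fetched pixel costs one unit of a per-scanline budget, and once one object no longer fits in the budget everything after it on that scanline is dropped.

```python
TRANSPARENT = 0   # assumed color key; the post does not say how transparency is encoded

def render_scanline(line_y, display_list, descriptors, sdram, width, budget):
    """Composite one scanline. Returns (pixels, dropped_entries)."""
    line = [TRANSPARENT] * width
    dropped = []
    for x0, y0, item in display_list:
        w, stride, h, base = descriptors[item]
        row = line_y - y0
        if not 0 <= row < h:
            continue                      # object does not touch this scanline
        if w > budget:
            dropped.append((x0, y0, item))
            budget = 0                    # out of time: drop everything that follows
            continue
        budget -= w                       # assumed cost: one unit per pixel fetched
        for col in range(w):
            sx = x0 + col
            if sx >= width:
                break                     # clip at the right screen edge
            pixel = sdram[base + col + row * stride]
            if pixel != TRANSPARENT:
                line[sx] = pixel          # later entries overwrite earlier ones
    return line, dropped

# Two overlapping 4x1 objects stored back to back in a tiny stand-in for SDRAM.
sdram = bytes([1, 0, 1, 1,   2, 2, 0, 2])
descriptors = {0: (4, 4, 1, 0), 1: (4, 4, 1, 4)}   # item #: (width, stride, height, base)
display_list = [(0, 0, 0), (2, 0, 1)]              # (x, y, item #)

full, none_dropped = render_scanline(0, display_list, descriptors, sdram, 8, budget=100)
partial, some_dropped = render_scanline(0, display_list, descriptors, sdram, 8, budget=6)
```

With the generous budget both objects are composited and the transparent pixels let the lower one show through; with a budget of 6 the second object is dropped, which is the situation the programmer has to watch for.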