On the topic of high video throughput: Even though we don't have the bandwidth for a whole screen buffer to be updated at a high rate, we probably have enough for a certain number of sprites to 'flip' through a memory lookup table. I guess the questions will be: how big are the sprites, and how many are there?
How big do you want them? There's plenty of bandwidth available for big and colorful sprites, but the more sprites you want, the less memory bandwidth will be available to the CPU. It's all shared.
To display an image, we need to get a maximum of 720 pixels per scanline. At a 100 MHz clock, in a full burst, this would take 720/100 = 7.2 usec. This needs to be done every hsync period, which is 64 us long. Now, imagine you want to have a sprite that's 64 pixels wide, and partially transparent overlaying the background. In order to display that, we need to retrieve the same 720 background pixels, and also the 64 sprite pixels. To get the 64 sprite pixels from SDRAM, it would take another 64 cycle burst, plus some setup time, let's say 70 cycles total, or 0.7 usec. Now, if you want 10 sprites per scanline, each 64 pixels wide, it would take 10 times as long, so 7 usec for the sprites, and 7.2 usec for the background. That's about 25% of the total memory bandwidth, leaving 75% for the CPU. Of course, that's when every scanline has 10 sprites on it, which isn't very typical.
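For what it's worth, the arithmetic above can be checked with a few lines of Python (the 100 MHz clock, 64 usec scanline, and 70-cycle sprite burst are the figures from the post; the variable and function names are just mine):

```python
# Back-of-the-envelope SDRAM bandwidth for one scanline,
# using the figures from the post.
SDRAM_CLOCK_MHZ = 100    # SDRAM clock: 1 cycle = 0.01 usec
SCANLINE_USEC = 64.0     # one hsync period (PAL)

def burst_usec(cycles):
    """Time for a burst of the given number of SDRAM cycles, in usec."""
    return cycles / SDRAM_CLOCK_MHZ

background = burst_usec(720)          # 720 background pixels -> 7.2 usec
sprite = burst_usec(70)               # 64 sprite pixels + setup -> 0.7 usec
total = background + 10 * sprite      # 10 sprites per scanline -> 14.2 usec

print(f"background: {background:.1f} usec")
print(f"10 sprites: {10 * sprite:.1f} usec")
print(f"video share: {total / SCANLINE_USEC:.0%}")  # ~22%, call it 25%
```

So even the worst-case scanline leaves roughly three quarters of the memory cycles for the CPU.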
But maybe you don't want a plain background. Maybe you want a background, plus two transparent layers on top, each independently scrolled, plus the ten 64-pixel sprites per scanline. That would take about 30 usec per scanline, or about half the available bandwidth. But even half the bandwidth here is a lot more than any legacy 6502 system ever had.
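The same quick sanity check for this busier scenario (three full 720-pixel fetches for the background plus the two layers, then the ten sprite bursts, again using the post's 100 MHz and 64 usec figures):

```python
# Bandwidth for background + 2 scrolled layers + 10 sprites per scanline.
SDRAM_CLOCK_MHZ = 100
SCANLINE_USEC = 64.0

layers = 3 * 720 / SDRAM_CLOCK_MHZ    # background + 2 layers: 21.6 usec
sprites = 10 * 70 / SDRAM_CLOCK_MHZ   # ten 64-pixel sprite bursts: 7.0 usec
total = layers + sprites              # 28.6 usec, i.e. "about 30"

print(f"video uses {total:.1f} of {SCANLINE_USEC:.0f} usec per scanline "
      f"({total / SCANLINE_USEC:.0%})")
```

That comes out to roughly 45% of the scanline, which is where the "about half the available bandwidth" figure comes from.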
The whole video timing issue is: for the CPU to access video RAM without display interference, we have to wait for hsync or vsync to be inactive at this point. But now it looks like we have two separate clocks, judging from Arlet's Verilog.
Correct. In legacy video controllers, the problem was that the video channel needed to retrieve data at exactly the right time to show the pixel on the screen. My first VGA controllers for Spartan-3 worked like that too, but it's a big pain, because everything needs to be timed exactly right, or you'll have wrong pixels, usually at the edge of the screen. For this project, I decided from the start that I would go with a block RAM to store a single scanline. Using the 27 MHz pixel clock, the data is read from the RAM at the right time, and it's up to some other module to make sure the RAM is written before the next scanline. This makes life a lot simpler. This first version uses 1 block RAM, so that means the data has to be written in the blanking time between two scanlines. If that time isn't enough, it's fairly simple to use 2 block RAMs: one displaying the current scanline, while the other one fills up with the data for the next scanline. That gives you a full 64 usec to generate the data. At the end of the scanline, the two buffers are swapped.
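A rough software model of the two-block-RAM variant might look like this (a sketch in Python rather than the actual Verilog, and the class and method names are mine): the display scans one buffer out at the pixel clock while the renderer fills the other, and the roles swap at hsync.

```python
class LineBuffers:
    """Toy model of a double-buffered scanline store: the display reads
    one buffer while the renderer fills the other, then they swap."""

    def __init__(self, width=720):
        self.bufs = [[0] * width, [0] * width]
        self.display = 0  # index of the buffer being scanned out

    def fill_next(self, pixels):
        """Renderer writes the *other* buffer; it has a full line time."""
        self.bufs[1 - self.display][:len(pixels)] = pixels

    def read_pixel(self, x):
        """Video side reads the current buffer at the pixel clock."""
        return self.bufs[self.display][x]

    def hsync(self):
        """At the end of the scanline, swap the two buffers."""
        self.display = 1 - self.display

lb = LineBuffers(width=8)
lb.fill_next([1] * 8)        # prepare the next line while the current one shows
assert lb.read_pixel(0) == 0 # display still sees the old line
lb.hsync()
assert lb.read_pixel(0) == 1 # after the swap, the new line is visible
```

The nice property is exactly the one described above: the writer's only deadline is "be done within one scanline", instead of hitting an exact pixel slot.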
I am always amazed at how simple video controllers appear to be when written in Verilog (the CS4954 controller & some other video controllers I've seen). Not saying they're simple to develop, just that they look like a bunch of counters. I'm rambling, forgive, heh...
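The "bunch of counters" observation is pretty literal: at its core, a video timing generator is just an x counter and a y counter, with everything else derived from comparisons against them. A minimal model (the totals here are the usual PAL-style numbers, picked for illustration, not taken from this project):

```python
# Minimal model of the counters at the heart of a video timing generator.
H_TOTAL, H_ACTIVE = 864, 720   # clocks per scanline / visible pixels
V_TOTAL, V_ACTIVE = 625, 576   # lines per frame / visible lines

def tick(x, y):
    """Advance the (x, y) pixel counters by one pixel clock."""
    x += 1
    if x == H_TOTAL:           # end of scanline: wrap x, bump y
        x = 0
        y = (y + 1) % V_TOTAL  # end of frame: wrap y
    return x, y

def visible(x, y):
    """True while the counters point at a displayed pixel."""
    return x < H_ACTIVE and y < V_ACTIVE
```

Sync pulses, blanking, and RAM addresses all fall out of comparisons like `visible()` against the same two counters, which is why the Verilog ends up looking so small.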
True. In my experience, writing something in Verilog usually takes longer than you think, and when it's finally done, the result is a lot smaller than you think.
In this case, I use the CS4954 in master mode, so it will generate the HSYNC/VSYNC timing for us. This helps to save some logic, but it also makes it easy to switch between PAL and NTSC, because only the CS4954 needs to be reconfigured.