On the topic of high video throughput: Even though we don't have the bandwidth for a whole screen buffer to be updated at a high rate, we probably have enough for a certain number of sprites to 'flip' through a memory lookup table. I guess the questions will be: how big are the sprites, and how many are there?
How big do you want them? There's plenty of bandwidth available for big and colorful sprites, but the more sprites you want, the less memory bandwidth will be available to the CPU. It's all shared.
To display an image, we need to get a maximum of 720 pixels per scanline. At a 100 MHz clock, in a full burst, this would take 720/100 = 7.2 usec. This needs to be done every hsync period, which is 64 us long. Now, imagine you want to have a sprite that's 64 pixels wide, and partially transparent overlaying the background. In order to display that, we need to retrieve the same 720 background pixels, and also the 64 sprite pixels. To get the 64 sprite pixels from SDRAM, it would take another 64 cycle burst, plus some setup time, let's say 70 cycles total, or 0.7 usec. Now, if you want 10 sprites per scanline, each 64 pixels wide, it would take 10 times as long, so 7 usec for the sprites, and 7.2 usec for the background. That's about 25% of the total memory bandwidth, leaving 75% for the CPU. Of course, that's when every scanline has 10 sprites on it, which isn't very typical.
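For what it's worth, the arithmetic above can be checked with a few lines of Python (the 100 MHz clock, 64 usec scanline, and 70-cycle sprite burst are the figures from the post; the variable and function names are just mine):

```python
# Back-of-the-envelope SDRAM bandwidth for one scanline,
# using the figures from the post.
SDRAM_CLOCK_MHZ = 100    # SDRAM clock: 1 cycle = 0.01 usec
SCANLINE_USEC = 64.0     # one hsync period (PAL)

def burst_usec(cycles):
    """Time for a burst of the given number of SDRAM cycles, in usec."""
    return cycles / SDRAM_CLOCK_MHZ

background = burst_usec(720)          # 720 background pixels -> 7.2 usec
sprite = burst_usec(70)               # 64 sprite pixels + setup -> 0.7 usec
total = background + 10 * sprite      # 10 sprites per scanline -> 14.2 usec

print(f"background: {background:.1f} usec")
print(f"10 sprites: {10 * sprite:.1f} usec")
print(f"video share: {total / SCANLINE_USEC:.0%}")  # ~22%, call it 25%
```

So even the worst-case scanline leaves roughly three quarters of the memory cycles for the CPU.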
But maybe you don't want a plain background. Maybe you want a background, plus two transparent layers on top, each independently scrolled, plus the ten 64-pixel sprites per scanline. That would take about 30 usec per scanline, or about half the available bandwidth. But even half the bandwidth here is a lot more than any legacy 6502 system ever had.
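The same quick sanity check for this busier scenario (three full 720-pixel fetches for the background plus the two layers, then the ten sprite bursts, again using the post's 100 MHz and 64 usec figures):

```python
# Bandwidth for background + 2 scrolled layers + 10 sprites per scanline.
SDRAM_CLOCK_MHZ = 100
SCANLINE_USEC = 64.0

layers = 3 * 720 / SDRAM_CLOCK_MHZ    # background + 2 layers: 21.6 usec
sprites = 10 * 70 / SDRAM_CLOCK_MHZ   # ten 64-pixel sprite bursts: 7.0 usec
total = layers + sprites              # 28.6 usec, i.e. "about 30"

print(f"video uses {total:.1f} of {SCANLINE_USEC:.0f} usec per scanline "
      f"({total / SCANLINE_USEC:.0%})")
```

That comes out to roughly 45% of the scanline, which is where the "about half the available bandwidth" figure comes from.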
The whole video timing issue is: for the CPU to access video RAM without display interference, we have to wait for hsync or vsync to be inactive at this point. But now it looks like we have two separate clocks, judging from Arlet's Verilog.
Correct. In legacy video controllers, the problem was that the video channel needed to retrieve data at exactly the right time to show the pixel on the screen. My first VGA controllers for Spartan-3 worked like that too, but it's a big pain, because everything needs to be timed exactly right, or you'll have wrong pixels, usually at the edge of the screen. For this project, I decided from the start that I would go with a block RAM to store a single scanline. Using the 27 MHz pixel clock, the data is read from the RAM at the right time, and it's up to some other module to make sure the RAM is written before the next scanline. This makes life a lot simpler. This first version uses 1 block RAM, so that means the data has to be written in the blanking time between two scanlines. If that time isn't enough, it's fairly simple to use 2 block RAMs: one displaying the current scanline, while the other one fills up with the data for the next scanline. That gives you a full 64 usec to generate the data. At the end of the scanline, the two buffers are swapped.
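A rough software model of the two-block-RAM variant might look like this (a sketch in Python rather than the actual Verilog, and the class and method names are mine): the display scans one buffer out at the pixel clock while the renderer fills the other, and the roles swap at hsync.

```python
class LineBuffers:
    """Toy model of a double-buffered scanline store: the display reads
    one buffer while the renderer fills the other, then they swap."""

    def __init__(self, width=720):
        self.bufs = [[0] * width, [0] * width]
        self.display = 0  # index of the buffer being scanned out

    def fill_next(self, pixels):
        """Renderer writes the *other* buffer; it has a full line time."""
        self.bufs[1 - self.display][:len(pixels)] = pixels

    def read_pixel(self, x):
        """Video side reads the current buffer at the pixel clock."""
        return self.bufs[self.display][x]

    def hsync(self):
        """At the end of the scanline, swap the two buffers."""
        self.display = 1 - self.display

lb = LineBuffers(width=8)
lb.fill_next([1] * 8)        # prepare the next line while the current one shows
assert lb.read_pixel(0) == 0 # display still sees the old line
lb.hsync()
assert lb.read_pixel(0) == 1 # after the swap, the new line is visible
```

The nice property is exactly the one described above: the writer's only deadline is "be done within one scanline", instead of hitting an exact pixel slot.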
I am always amazed at how simple video controllers appear to be when written in Verilog (the CS4954 controller & some other video controllers I've seen). Not saying they're simple to develop, just that they look like a bunch of counters. I'm rambling, forgive, heh...
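The "bunch of counters" observation is pretty literal: at its core, a video timing generator is just an x counter and a y counter, with everything else derived from comparisons against them. A minimal model (the totals here are the usual PAL-style numbers, picked for illustration, not taken from this project):

```python
# Minimal model of the counters at the heart of a video timing generator.
H_TOTAL, H_ACTIVE = 864, 720   # clocks per scanline / visible pixels
V_TOTAL, V_ACTIVE = 625, 576   # lines per frame / visible lines

def tick(x, y):
    """Advance the (x, y) pixel counters by one pixel clock."""
    x += 1
    if x == H_TOTAL:           # end of scanline: wrap x, bump y
        x = 0
        y = (y + 1) % V_TOTAL  # end of frame: wrap y
    return x, y

def visible(x, y):
    """True while the counters point at a displayed pixel."""
    return x < H_ACTIVE and y < V_ACTIVE
```

Sync pulses, blanking, and RAM addresses all fall out of comparisons like `visible()` against the same two counters, which is why the Verilog ends up looking so small.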
True. In my experience, writing something in Verilog usually takes longer than you think, and when it's finally done, the result is a lot smaller than you think.
In this case, I use the CS4954 in master mode, so it will generate the HSYNC/VSYNC timing for us. This helps to save some logic, but it also makes it easy to switch between PAL and NTSC, because only the CS4954 needs to be reconfigured.