65Org16.x Dev. Board V1.0 using a Spartan 6 XC6SLX9-3TQG144
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Arlet wrote:
Ah,...MT48LC16M16A2TG-7E, 256Mbit, for $13, max speed 133 MHz. That would give you a top speed of somewhere between 100 and 133 MHz, depending on how far the core can be pushed.
I see them available for ~$7 from a supplier. There's also a -6A speed grade version rated @167MHz, but it's out of stock right now... This is an excellent price for such a large RAM.
If a user wanted to expand the memory, for every device added, would they have to add another core? That seems to be what one has to do with the larger Spartan 6 devices containing the MCBs, as BigEd pointed out earlier. Every MCB controls 1 memory device. I'm just asking out of curiosity; I have no intent of doing this for the development board because board space is limited. But I must keep expandability issues in mind as well.
Also, a concern I have with using the 8Mb SPI Flash instead of the 2Mb Xilinx XCF02S PROM to program the FPGA... Will iMPACT erase the entire SPI device fully before reprogramming, like it does the XCF02S?
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:47 pm, edited 2 times in total.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
I have 2 final ideas for I/O for the 16-bit 65Org16.x Core, since board space is getting tight:
1) is the 16-bit 100-pin QFP Cypress CY7C67300 configured for USB, IDE, and a 16-bit PWM.
2) is an 8-bit 6845 controller core for programmable-resolution monochrome video graphics. Monochrome to start...
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:48 pm, edited 2 times in total.
ElEctric_EyE wrote:
If a user wanted to expand the memory, for every device added, would they have to add another core?
Quote:
Also, a concern I have with using the 8Mb SPI Flash instead of the 2Mb Xilinx XCF02S PROM to program the FPGA... Will iMPACT erase the entire SPI device fully before reprogramming, like it does the XCF02S?
ElEctric_EyE wrote:
2) is an 8-bit 6845 controller core for programmable-resolution monochrome video graphics. Monochrome to start...
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Arlet wrote:
What kind of video interface do you have in mind ? VGA, TFT, composite ?...
Well I was going to start with composite and work from an old schematic of a video board my friend used in his high school science fair many years ago. Then I was going to see if I could adapt it to VGA.
Arlet wrote:
... I'm actually working on a new one with support for sprites. My first attempt, but I already have support for multiple sprites.
Neat! Will your sprites have addressable control registers so they can be controlled from assembly? Like an X, Y, and control registers? How about memory pointers? I'll let you do the video if you would like. Sounds like you are deep into it already, and feel free to update on this thread anytime.
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:49 pm, edited 2 times in total.
ElEctric_EyE wrote:
Neat! Will your sprites have addressable control registers so they can be controlled from assembly? Like an X, Y, and control registers? How about memory pointers? I'll let you do the video if you would like. Sounds like you are deep into it already, and feel free to update on this thread anytime.
Maximum sprites per scanline should be at least enough to cover the whole line if they're not overlapping, so 20 sprites @ 32 pixels each, or 40 sprites @ 16 pixels, but that all depends on the clock speed, memory bandwidth, screen resolution and pixel clock rate. These numbers are for my current design, using 640x480 TFT and 50 MHz memory, and full 16-bit color sprites and background. By lowering color depth, resolution, or increasing clock speed, these numbers get better.
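As a quick check of the coverage figures above, here is a minimal sketch. The function name is made up for illustration, and it only counts how many non-overlapping sprites span a line; it says nothing about whether the memory bandwidth can actually feed them.

```python
def sprites_to_cover_line(line_pixels: int, sprite_width: int) -> int:
    """Number of non-overlapping sprites needed to span one scanline."""
    # Ceiling division, so a partially used last sprite still counts.
    return -(-line_pixels // sprite_width)


for width in (32, 16):
    count = sprites_to_cover_line(640, width)
    print(f"{width}-pixel sprites needed for a 640-pixel line: {count}")
```

For a 640-pixel line this gives 20 sprites at 32 pixels and 40 sprites at 16 pixels, matching the numbers quoted for the current design.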
As far as video interface, I was looking at this device: Cirrus Logic CS4954 which generates PAL/NTSC composite, S-video and RGB outputs. My idea was to make something that can be hooked up to a TV, so you can make your own video games. Of course, an audio DAC would be required as well, but that can be a simple PWM output + filter/opamp.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Ok, so we are committed to the 16Mx16 SDRAM for lower memory? Now we must decide on the video RAM... Maybe go for a 167MHz version of a 4Mx16 or 8Mx16? That video IC looks very capable.
Today I started a new project file for the memory decoding of the 256x16 zero page and the 256x16 stack. Now I'm trying to make a large 28.6Kx16 ROM file work. That should leave about 56Kb for other stuff, like your 512x32 sprite descriptor block. Am I forgetting about anything else that may need BRAM resources?
I would like to start the board layout soon. I think I am over the 2.5"x3.8" limit for the cheap $98 deal on three 4-layer boards from ExpressPCB. I will have to check this out soon. It seems they also have a deal on four 21 sq. in. 4-layer boards for $195. This is reasonable.
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:49 pm, edited 2 times in total.
My idea was to use a single chip for both CPU as well as video RAM. That's how I did it on my Spartan-3 eval board demo. I lowered my horizontal frequency to the lowest that would work with my TFT panel, so I have 640 real pixels, and 232 pixels for hsync/front porch/back porch, with a 25 MHz pixel clock. That gives me 34.9 usec to generate a scanline.
The memory interface runs at 50 MHz and is 32 bits wide. With 16-bit color, I can fetch the 640 pixels I need for the background in 320 cycles, or 6.4 usecs, leaving 28.5 usecs/line for CPU access and sprite data. If you fill the entire scanline with sprites, that requires another 6.4 usecs (assuming 16-bit color). These timings are the same using 16-bit memory @ 100 MHz.
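The scanline budget described here can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up, and it assumes one bus word transferred per memory clock with no SDRAM overhead (activate, precharge, refresh), which a real controller would add.

```python
def line_time_us(pixels_per_line_total: int, pixel_clock_mhz: float) -> float:
    """Duration of one scanline, including sync and porch pixels."""
    return pixels_per_line_total / pixel_clock_mhz


def fetch_time_us(pixels: int, bits_per_pixel: int,
                  bus_width_bits: int, mem_clock_mhz: float) -> float:
    """Time to fetch pixel data, assuming one bus word per memory clock."""
    cycles = pixels * bits_per_pixel / bus_width_bits
    return cycles / mem_clock_mhz


line = line_time_us(640 + 232, 25.0)            # visible + hsync/porches
background = fetch_time_us(640, 16, 32, 50.0)   # 32-bit bus @ 50 MHz
remaining = line - background

print(f"scanline:   {line:.2f} us")
print(f"background: {background:.2f} us")
print(f"left over:  {remaining:.2f} us")
print(f"16-bit bus @ 100 MHz: {fetch_time_us(640, 16, 16, 100.0):.2f} us")
```

Under those assumptions the scanline lasts about 34.9 usecs, the background fetch takes 6.4 usecs, and roughly 28.5 usecs remain, with the 16-bit @ 100 MHz case giving the same fetch time.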
If you use paletted 8 bit color, only half the bandwidth is required, and it also allows color cycling tricks, which can be pretty effective.
Of course, a couple of block RAMs are required to buffer/prepare all the data. I use one block RAM for the sprite descriptors, and 2 block RAMs for scanline buffering. A palette would also require some block RAM, but could be shared with the sprite descriptors block by limiting those to only 256 sprites.
Because I fetch the video data in (long) bursts, this means the CPU access may be delayed (a long time). By using shorter bursts, latency can be traded for bandwidth.
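To illustrate the latency/bandwidth trade, here is a rough model. It is not taken from any controller in this thread: the per-burst overhead of 6 cycles is a placeholder rather than a datasheet figure, and the model assumes the CPU is only served in the gaps between video bursts.

```python
import math

# Placeholder, NOT a datasheet figure: cycles lost per burst to
# row activate, CAS latency and precharge.
ASSUMED_OVERHEAD_CYCLES = 6


def burst_tradeoff(words_per_line, burst_words, clock_mhz,
                   overhead_cycles=ASSUMED_OVERHEAD_CYCLES):
    """Return (video_time_us, worst_cpu_wait_us, efficiency) for one scanline."""
    bursts = math.ceil(words_per_line / burst_words)
    total_cycles = words_per_line + bursts * overhead_cycles
    # If the CPU is only served between bursts, it waits at most one burst.
    worst_wait_cycles = burst_words + overhead_cycles
    return (total_cycles / clock_mhz,
            worst_wait_cycles / clock_mhz,
            words_per_line / total_cycles)


# 320 words = 640 pixels of 16-bit color on a 32-bit bus, at 50 MHz.
for burst in (320, 64, 8):
    video_us, wait_us, eff = burst_tradeoff(320, burst, 50.0)
    print(f"burst={burst:3d}  video={video_us:5.2f} us  "
          f"worst CPU wait={wait_us:5.2f} us  efficiency={eff:.0%}")
```

With these assumed numbers, one long burst keeps the bus efficient but can stall the CPU for the whole fetch, while 8-word bursts cut the worst-case wait to a fraction of a microsecond at the cost of spending noticeably more total time on video.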
EDIT: when using the Cirrus chip with full 720 horizontal resolution, each scanline takes 64 usecs, and requires 720 pixels to be fetched from memory, which takes 7.2 usecs for 16 bit color, and 3.6 usecs for 8 bit color.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
A 65Org16.x Core spec'd @86MHz may be able to handle a second dedicated SDRAM geared specifically for the video running @167MHz?
Maybe 1 65Org16.x Core dedicated to video commands with 1 SDRAM controller, and another 65Org16.x Core dedicated to the other resources with a smaller SDRAM Core for I/O...
I know there are tons of variables I am throwing out there...
On my part, I need to keep in mind that, unfortunately, there is no 208-pin QFP version of the Spartan 6...
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:49 pm, edited 2 times in total.
Two independent memory banks would be nice, but, yes, you have to keep in mind that it takes an awful lot of pins.
Of course, if you like high memory bandwidth for the CPU, but still like some form of video out, it's always possible to use a lower resolution and fewer colors. At half the horizontal resolution and 4 bits per pixel, for instance, only 180 bytes are needed every 64 usecs, leaving nearly all of the memory bus available to the CPU.
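The 180-byte figure follows directly from half of the 720-pixel line at 4 bits per pixel. A small sketch of the arithmetic is below; the bus parameters (32 bits wide at 50 MHz) are an assumption carried over from the earlier timing discussion, and SDRAM overhead is ignored.

```python
def bytes_per_line(pixels: int, bits_per_pixel: int) -> int:
    """Bytes of pixel data needed for one scanline."""
    return pixels * bits_per_pixel // 8


def bus_fraction(pixels: int, bits_per_pixel: int, bus_width_bits: int,
                 mem_clock_mhz: float, line_time_us: float) -> float:
    """Share of the scanline time spent fetching video data."""
    cycles = pixels * bits_per_pixel / bus_width_bits
    return (cycles / mem_clock_mhz) / line_time_us


half_res = 720 // 2
print(bytes_per_line(half_res, 4), "bytes per line")
# Assumed bus: 32 bits wide at 50 MHz, 64 us per scanline.
print(f"{bus_fraction(half_res, 4, 32, 50.0, 64.0):.1%} of the bus")
```

Under those assumptions the video fetch occupies well under 2% of each scanline, which is consistent with "nearly all of the memory bus" being left for the CPU.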
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Arlet wrote:
Two independent memory banks would be nice, but, yes, you have to keep in mind that it takes an awful lot of pins...
Yes let's keep the first one simple. Sometimes I like to dream big though.
Arlet wrote:
...Of course, if you like high memory bandwidth for the CPU, but still like some form of video out, it's always possible to use a lower resolution and fewer colors. At half the horizontal resolution and 4 bits per pixel, for instance, only 180 bytes are needed every 64 usecs, leaving nearly all of the memory bus available to the CPU.
Personally, I favor the speed. I also favor the 640x480 resolution. 4-bit color is not too bad. It's a tough choice between 4-bit and 8-bit color...
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:50 pm, edited 2 times in total.
Instead of two independent memory chips, an intermediate solution is to use two parallel banks and make a 32-bit interface. This doubles throughput at the cost of about 20 pins. Of course, this doesn't get rid of the CPU/video contention, but it makes the periods shorter.
As far as the choice between resolution, color depth and bandwidth, this is something that you can always decide later, or change from one application to another. By putting everything inside the FPGA, it's really flexible. It's even possible to support several video modes in the same design, and have the CPU configure the mode at run time. You could even switch to a 40x25 text-only mode, using 2 block RAMs and no external memory.
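A rough sanity check on the 40x25 text mode fitting in 2 block RAMs is sketched below. The split (one block for character codes, one for the font) and the font geometry (256 glyphs, 8 pixels wide, 8 rows tall) are assumptions for illustration, not something stated in the thread. Each 18 Kb block RAM is treated as 2K x 8, ignoring the parity bits.

```python
BLOCK_RAM_BYTES = 2048  # 18 Kb block RAM used as 2K x 8, parity bits ignored


def text_mode_usage(columns: int, rows: int, glyphs: int, glyph_rows: int):
    """Return (screen_bytes, font_bytes, fits_in_two_block_rams).

    Assumes one byte per character cell and one byte per 8-pixel-wide
    glyph row, with screen and font each in their own block RAM.
    """
    screen_bytes = columns * rows
    font_bytes = glyphs * glyph_rows
    fits = screen_bytes <= BLOCK_RAM_BYTES and font_bytes <= BLOCK_RAM_BYTES
    return screen_bytes, font_bytes, fits


screen, font, fits = text_mode_usage(40, 25, glyphs=256, glyph_rows=8)
print(f"screen buffer: {screen} bytes, font: {font} bytes, fits: {fits}")
```

With those assumptions the screen buffer needs 1000 bytes and the font exactly fills the second block, so a taller font or per-character color attributes would need a different split.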
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Arlet wrote:
Instead of two independent memory chips, an intermediate solution is to use two parallel banks and make a 32-bit interface. This doubles throughput at the cost of about 20 pins. Of course, this doesn't get rid of the CPU/video contention, but it makes the periods shorter...
Then why not use 2 external 16-bit wide SDRAMs side by side? They are cheap @ $4.86 ea...
So far, with the basic design (the 65Org16 core and all pins brought out), there are 48 pins remaining, if I'm reading the "bonded IOB's remaining" figure correctly. (Just an observation/correction: I've noticed the version of the S6 we're using does have 2 MCBs.)
I'm presently putting together an IC list which I'll post soon, so I can get a grasp on the design and layout and to solidify current pricing.
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:50 pm, edited 2 times in total.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
The parts list will change over time. The 50MHz oscillator will most likely change. It's cheaper to buy a lower-frequency oscillator and use the frequency multiplier in the FPGA. It has PLLs, in addition to the DCMs, which I'll have to investigate.
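When picking a cheaper oscillator, it helps to see which multiply/divide pairs reach a given target clock exactly. The sketch below does a brute-force search. The default ranges are placeholders and should be replaced with the limits from the Spartan 6 clocking documentation for whichever block (DCM or PLL) is used; it also ignores the input and VCO frequency limits that real clocking blocks impose.

```python
from fractions import Fraction


def find_mult_div(osc_mhz: int, target_mhz: int,
                  mult_range=range(2, 33), div_range=range(1, 33)):
    """Return the first (multiply, divide) pair giving target = osc * M / D.

    The default ranges are illustrative placeholders, not datasheet values.
    Returns None when no exact ratio exists within the ranges.
    """
    want = Fraction(target_mhz, osc_mhz)
    for m in mult_range:
        for d in div_range:
            if Fraction(m, d) == want:
                return m, d
    return None


for osc, target in ((25, 100), (20, 50), (10, 167)):
    print(f"{osc} MHz -> {target} MHz: {find_mult_div(osc, target)}")
```

A result of None simply means the target cannot be hit exactly from that oscillator within the assumed ranges, in which case a different oscillator or a nearby target frequency would be needed.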
--missing pic needs updating---
Last edited by ElEctric_EyE on Fri Nov 11, 2011 7:51 pm, edited 2 times in total.