All times are UTC




PostPosted: Mon Apr 09, 2012 12:47 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...The relative positioning would be a new feature, and what it does is make the X/Y coordinate work as an offset with respect to the previous sprite.

So, you could make a sprite list where the first sprite (in the top left corner) has an absolute position. All the other sprites would be programmed (only one time) with a relative offset to this first one. If the CPU then moves the first sprite, all the others would automatically follow...

This is the result of the SDRAM pipeline?


PostPosted: Mon Apr 09, 2012 6:31 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
This is the result of the SDRAM pipeline?


No, right now I have a list of sprites that look like this:
Code:
X=0, Y=00, Sprite=0
X=0, Y=16, Sprite=0
X=0, Y=32, Sprite=0
...
X=0, Y=208, Sprite=2
X=0, Y=224, Sprite=2

That's a list of 14 sprites in the Mario game, forming the first column of tiles. Sprite=0 means a 16x16 image of sky, and Sprite=2 is a 16x16 image of the rocks. This is then followed by 16 more columns, at X=16, X=32, and so on.

Now, if I want to scroll the entire play field 1 pixel to the left, the CPU has to go through that list, and subtract #1 from all the X coordinates. Like this:
Code:
X=-1, Y=00, Sprite=0
X=-1, Y=16, Sprite=0
X=-1, Y=32, Sprite=0
...
X=-1, Y=208, Sprite=2
X=-1, Y=224, Sprite=2

With my new feature, the first column would look like this:
Code:
X=00, Y=00, Sprite=0
X+=0, Y+=16, Sprite=0
X+=0, Y+=16, Sprite=0
...
X+=0, Y+=16, Sprite=2
X+=0, Y+=16, Sprite=2

The "Y+=16" notation means that the sprite is positioned 16 pixels lower than the sprite before that. This option would be encoded using an extra bit in the sprite descriptor. Now, if you want to move all the sprites one pixel to the left, the CPU only has to change the first one:
Code:
X=-1, Y=00, Sprite=0
X+=0, Y+=16, Sprite=0
X+=0, Y+=16, Sprite=0
...
X+=0, Y+=16, Sprite=2
X+=0, Y+=16, Sprite=2

The other 13 sprites in this list would automatically be shifted as well, because they are positioned at -1 + 0 = -1.
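
The relative-offset idea can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not the actual Verilog: the `Sprite` class, its `relative` flag (standing in for the extra descriptor bit) and the `resolve` function are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    x: int
    y: int
    image: int
    relative: bool = False  # hypothetical stand-in for the extra descriptor bit


def resolve(sprites):
    """Walk the list once and return absolute (x, y, image) tuples."""
    out = []
    px = py = 0  # position of the previous sprite
    for s in sprites:
        if s.relative:
            # Offset with respect to the previous sprite in the list.
            px, py = px + s.x, py + s.y
        else:
            # Absolute position, as in the existing design.
            px, py = s.x, s.y
        out.append((px, py, s.image))
    return out


# First sprite absolute, the rest placed 16 pixels below their predecessor.
column = [Sprite(0, 0, 0)] + [Sprite(0, 16, 0, relative=True) for _ in range(3)]
before = resolve(column)

# Scrolling one pixel to the left only touches the first entry.
column[0].x = -1
after = resolve(column)
```

Only `column[0]` is modified between the two calls, yet every resolved X coordinate in `after` has moved to -1.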


PostPosted: Tue Apr 10, 2012 5:41 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I've been trying to modify the video rendering pipeline. Instead of a shared memory between the cs4954 module and the sprite renderer, I've moved to a local memory in the sprite renderer, and an async FIFO between the two blocks.

This means there's no more line-by-line lockstep operation between generating the pixel values and displaying them. The renderer starts generating a new screen when it gets a signal, and then generates the whole screen as fast as possible. The data is then sent through a FIFO. When the FIFO gets full, the renderer pauses.

This design makes it easy to send the pixel data to SDRAM instead of the screen, or even to my UART port for a low-speed (but 100% digital) video capture. The new local memory inside the sprite module uses both ports in parallel, so it can do single cycle read/modify/writes on the data. This will be useful for alpha blending.

There's still a bug in the code. Sometimes there's some jitter in the screen, which I suspect is a problem somewhere at the corner cases of the FIFO empty/full handling. Since it appears to be dependent on exact timing between CS4954-generated hsync/vsync and the FPGA, it is hard to reproduce in sims.
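
The renderer/FIFO arrangement described above can be illustrated with a small cycle-by-cycle Python model. It is a sketch under simplifying assumptions, not the real design: a single clock stands in for the two clock domains of the async FIFO, and `simulate`, `fifo_depth` and `display_every` are hypothetical names.

```python
from collections import deque


def simulate(total_pixels, fifo_depth, display_every):
    """Model a renderer that pushes one pixel per cycle unless the FIFO is
    full, and a display that pops one pixel every `display_every` cycles."""
    fifo = deque()
    produced = 0
    displayed = []
    stalls = 0  # cycles in which the renderer had to pause
    cycle = 0
    while len(displayed) < total_pixels:
        # Display side: consume at a fixed, slower rate.
        if cycle % display_every == 0 and fifo:
            displayed.append(fifo.popleft())
        # Renderer side: run as fast as possible, pause when full.
        if produced < total_pixels:
            if len(fifo) < fifo_depth:
                fifo.append(produced)
                produced += 1
            else:
                stalls += 1
        cycle += 1
    return displayed, stalls


pixels, stalls = simulate(total_pixels=20, fifo_depth=4, display_every=2)
```

In this model the pixels always come out in order, and the renderer only stalls when the display drains the FIFO more slowly than the renderer fills it. If the full check were ignored, pixels would be lost instead.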


PostPosted: Wed Apr 11, 2012 9:28 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Arlet wrote:
There's still a bug in the code. Sometimes there's some jitter in the screen, which I suspect is a problem somewhere at the corner cases of the FIFO empty/full handling. Since it appears to be dependent on exact timing between CS4954-generated hsync/vsync and the FPGA, it is hard to reproduce in sims.
Ah, found it. The 'fifo_full' signal was declared as an output instead of an input, so it wasn't detecting at all that the FIFO was full. The weird part is that ISE didn't give me an error message for connecting two outputs together, probably because the signal was only assigned to in one of the modules. The simulator apparently just turned the output into an input, so it worked correctly.


PostPosted: Wed Apr 11, 2012 8:21 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I tried the YUV844 format, but I didn't really like how it looked for subtle color changes, such as skin tones. I decided to add an RGB555 -> YUV conversion, and see if RGB555 looked better. I think it does. Here's the result of a test image:
Image
I think it's pretty good, considering it's only 256x256 pixels, and the image wasn't dithered. With dithering, and at double the resolution, it should look even better.

Here's the code for the RGB->YUV conversion. It's actually quite straightforward, and ISE does a good job of optimizing out all the multiplications.
Code:
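// Stage 1: multiply each 5-bit channel by its fixed-point coefficient
always @(posedge clk54) begin
        Yr <= 13'd66  * r;
        Yg <= 13'd129 * g;
        Yb <= 13'd25  * b;

        Ur <= 13'd38  * r;
        Ug <= 13'd74  * g;
        Ub <= 13'd112 * b;

        Vr <= 13'd112 * r;
        Vg <= 13'd94  * g;
        Vb <= 13'd18  * b;
end

// Stage 2: sum the products; the +16 rounds before the divide by 32
always @(posedge clk54) begin
        Ys <=  Yr + Yg + Yb + 16;
        Us <=  Ub - Ug - Ur + 16;
        Vs <=  Vr - Vg - Vb + 16;
end

// Stage 3: drop the 5 fraction bits and add the video offsets
// (blocking assignments, since this block is combinational)
always @* begin
        y = Ys[12:5] + 16;
        u = Us[12:5] + 128;
        v = Vs[12:5] + 128;
end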

This is also running with the new rendering code.
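
To sanity-check the arithmetic on a PC, the three stages can be mirrored in Python. This is a sketch that assumes 13-bit intermediate registers (suggested by the `13'd` constants and the `[12:5]` slices) and 8-bit outputs; the register declarations are not shown in the post, and the function name `rgb555_to_yuv` is made up.

```python
def rgb555_to_yuv(r, g, b):
    """Bit-level model of the pipeline; r, g and b are 5-bit values (0..31)."""
    mask13 = 0x1FFF  # assumed width of Ys/Us/Vs

    # Stages 1 and 2: products, sums and the +16 rounding term.
    ys = (66 * r + 129 * g + 25 * b + 16) & mask13
    us = (112 * b - 74 * g - 38 * r + 16) & mask13
    vs = (112 * r - 94 * g - 18 * b + 16) & mask13

    # Stage 3: take bits [12:5] and add the offsets, wrapping at 8 bits.
    y = ((ys >> 5) + 16) & 0xFF
    u = ((us >> 5) + 128) & 0xFF
    v = ((vs >> 5) + 128) & 0xFF
    return y, u, v


black = rgb555_to_yuv(0, 0, 0)
white = rgb555_to_yuv(31, 31, 31)
blue = rgb555_to_yuv(0, 0, 31)
```

Negative U/V sums wrap in two's complement, so slicing the upper 8 bits and adding 128 still lands on the right value. Black and white both come out with U = V = 128, as expected for greys.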


PostPosted: Sat Apr 14, 2012 3:12 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
After fixing a few more bugs in the new sprite renderer (this is probably the most complex piece of Verilog I've written so far), I added a transparency feature, and made a simple test with 10 sprites, 64x64 pixels each.

See it here on YouTube


PostPosted: Sun Apr 15, 2012 12:08 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
That is awesome! I have a couple questions:
1) Is there a delay on the fastest of those green orbs? I'm curious what max speed is.

2) What speeds are your CPU and the SDRAM clock right now?

How cool would this be when all these sprites could 'page flip' their images, so instead of an orb, maybe a rotating 3D vector cube changing colors?!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Sun Apr 15, 2012 5:59 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
There's no "delay" on any of the sprites, except the inherent delay in the frame rate. PAL has 50 fields per second. For every field, you can change the position of all of the sprites by writing new X/Y coordinates. The slowest orbs have the coordinates changed by 1 pixel, and the fastest go by 5 pixels. You can make them even faster by skipping more pixels, but the more pixels you skip the less natural the motion will seem. NTSC has 60 fields per second, so they'll go slightly faster. Because of the fixed frame rate, you'll always be limited in the choice of smooth speeds.
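
As a quick worked example of the speeds involved, here is the arithmetic in Python. The field rates come from the paragraph above; the helper name `speed_px_per_s` is just for illustration.

```python
FIELD_RATES = {"PAL": 50, "NTSC": 60}  # fields per second


def speed_px_per_s(step_pixels, standard):
    """On-screen speed when a sprite moves `step_pixels` every field."""
    return step_pixels * FIELD_RATES[standard]


slowest_pal = speed_px_per_s(1, "PAL")
fastest_pal = speed_px_per_s(5, "PAL")
fastest_ntsc = speed_px_per_s(5, "NTSC")
```

Because the step has to be a whole number of pixels per field, the available smooth speeds are multiples of the field rate.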

Everything is still clocked at 100 MHz.

By the way, the CPU interfaces with the video module by accessing a bunch of registers. For now, I have the following interfaces. There's a separate interface to control the CS4954:
Code:
$B030 = Left border width
$B032 = Top border width
$B034 = Status/Control (VSYNC bit etc)

The Status/Control byte is used to synchronize the animation to the frame rate. There is a status bit that indicates the current field is finished, so the CPU can modify the sprite tables. There's also a control bit that the CPU can set to indicate it is ready with the changes, and the new field can be rendered.
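
The handshake could look roughly like this from the CPU side, shown here as a Python sketch against a fake register bus. The bit positions `FIELD_DONE` and `CPU_READY` are hypothetical (the post does not give them), and `FakeVideo` only exists so the loop can run on a PC.

```python
STATUS_CTRL = 0xB034
FIELD_DONE = 0x01  # hypothetical bit position of the "field finished" status bit
CPU_READY = 0x02   # hypothetical bit position of the "changes done" control bit


class FakeVideo:
    """Stand-in for the video module, for illustration only."""

    def __init__(self):
        self.mem = {STATUS_CTRL: 0}
        self.polls = 0

    def read(self, addr):
        if addr == STATUS_CTRL:
            self.polls += 1
            if self.polls >= 3:  # pretend the field finishes on the third poll
                self.mem[addr] |= FIELD_DONE
        return self.mem.get(addr, 0)

    def write(self, addr, value):
        self.mem[addr] = value & 0xFF


def update_frame(bus, changes):
    """Wait for the field to finish, apply table writes, then release the renderer."""
    while not (bus.read(STATUS_CTRL) & FIELD_DONE):
        pass  # busy-wait on the status bit
    for addr, value in changes:
        bus.write(addr, value)
    bus.write(STATUS_CTRL, CPU_READY)


bus = FakeVideo()
update_frame(bus, [(0xC000, 11), (0xC200, 10)])
```

All table writes happen between the status bit going high and the control bit being set, so the renderer never sees a half-updated sprite list.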

To control the sprites, there are a number of tables, each 512 bytes long, with 1 byte per sprite. The first set of 4 tables determine the position/image of the sprite:
Code:
$C000-$C1FF = X coordinate (LSB)
$C200-$C3FF = Y coordinate
$C400-$C5FF = Image number
$C600-$C7FF = Extra/Reserved bits (currently has bit 9 of X coordinate, and enable bit)

In my demo, image number 0 is the green ball, so if you do the following:
Code:
LDA #10
STA $C000 ; X = 10
STA $C200 ; Y = 10
LDA #0
STA $C400 ; image = 0
LDA #2         
STA $C600 ; extra = 2 (sprite enable)

Then you'll get a green ball at coordinates (10,10). To move the ball, just write the new coordinates to the $C000/$C200 registers. To change the appearance of the ball, just change the image number. For instance, you can make a "boing ball" demo, by preparing a set of 8 images at different angles of rotation, and then cycle the image register through 0..7 to make the ball appear to rotate.

To describe the images there are some further tables:
Code:
$C800 - $C9FF = Image width
$CA00 - $CBFF = Image height
$CC00 - $CDFF = X offset in bitmap
$CE00 - $CFFF = Y offset in bitmap
$D000 - $D7FF = Bitmap origin in SDRAM (3 bytes address, plus 8 reserved bits)
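
For scripting or test generation, the register map above can be captured in a small Python helper. The base addresses are taken from the tables in this post; the function names are made up, and the assumption that each bitmap-origin entry occupies 4 consecutive bytes follows from the table size ($800 bytes for 512 entries) but is not confirmed by the post.

```python
SPRITE_TABLES = {"x": 0xC000, "y": 0xC200, "image": 0xC400, "extra": 0xC600}
IMAGE_TABLES = {"width": 0xC800, "height": 0xCA00,
                "x_offset": 0xCC00, "y_offset": 0xCE00}
BITMAP_ORIGIN = 0xD000
ENTRIES = 512  # each table is 512 bytes long, 1 byte per entry


def _check(n):
    if not 0 <= n < ENTRIES:
        raise ValueError("index out of range: %d" % n)


def sprite_reg(field, n):
    """Address of one byte in the per-sprite tables."""
    _check(n)
    return SPRITE_TABLES[field] + n


def image_reg(field, n):
    """Address of one byte in the per-image tables."""
    _check(n)
    return IMAGE_TABLES[field] + n


def bitmap_origin_reg(n):
    """Address of the first byte of a bitmap-origin entry
    (assumes 4 consecutive bytes per entry)."""
    _check(n)
    return BITMAP_ORIGIN + 4 * n
```

For example, `sprite_reg("x", 0)` gives $C000 and `sprite_reg("extra", 0)` gives $C600, matching the addresses used in the assembly snippet above.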


PostPosted: Sun Apr 15, 2012 9:36 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Here's an example of how you can use a small bitmap to define a larger sprite. The background is a 256x256 sprite, but only has a 64x64 bitmap definition. As a result, the bitmap is tiled to fill the entire 256x256 space. By using the X/Y bitmap offsets, it is possible to scroll the bitmap inside the sprite.

http://www.youtube.com/watch?v=VwXqZmVg-lU
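
The tiling and scrolling behaviour can be sketched in Python as a lookup with wrap-around. This assumes the hardware wraps the bitmap coordinates modulo the bitmap size, which is what the tiling suggests; `sample` and `render` are hypothetical names.

```python
def sample(bitmap, sx, sy, x_off=0, y_off=0):
    """Pixel shown at sprite position (sx, sy) for a bitmap that repeats
    in both directions and is shifted by (x_off, y_off)."""
    height = len(bitmap)
    width = len(bitmap[0])
    return bitmap[(sy + y_off) % height][(sx + x_off) % width]


def render(bitmap, width, height, x_off=0, y_off=0):
    """Fill a width x height sprite from a (possibly smaller) bitmap."""
    return [[sample(bitmap, x, y, x_off, y_off) for x in range(width)]
            for y in range(height)]


tiny = [[1, 2],
        [3, 4]]
tiled = render(tiny, 4, 2)
scrolled = render(tiny, 4, 2, x_off=1)
```

Changing only `x_off` scrolls the pattern inside the sprite without touching the sprite's own X/Y position.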


PostPosted: Mon Apr 16, 2012 12:42 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ok, so your CPU is running at 100MHz as is your SDRAM video controller. I assume the CPU is your original 8 bit version?

I would like to experiment with your SDRAM code.
I hope I'm not being too presumptuous here!... On my system, I would like to use the current version of the 65Org16.b core, although it's only spec'd at ~90MHz.

PostPosted: Mon Apr 16, 2012 5:32 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
Ok, so your CPU is running at 100MHz as is your SDRAM video controller. I assume the CPU is your original 8 bit version?

Yes, everything is running at 100 MHz. I've never tried running things at different clock frequencies, and honestly I have no idea how to do it, except in the "easy" cases where the clock domains can be separated by a dual port RAM. The core is a slightly modified version of the 8 bit original. I have removed a few cycles at the cost of an extra adder in the address calculator. Also, I moved the stack page into page zero. I didn't need so much zero page/stack, so now I have extra room for code.
Quote:
I would like to experiment with your SDRAM code

Attached. Note that the sdramif.v module is made for an 8 bit CPU. For the 65org16 core you'll need to modify a few things.
Code:
        sdram_wr_data <= { DOH, DO };
        sdram_addr <= { ABH, AB[7:0] };

Should be changed into:
Code:
        sdram_wr_data <= DO;
        sdram_addr <= AB[23:0];

And some of the signals need to be changed from 8 to 16 bit.
You can also remove this code here:
Code:
always @(posedge clk)
    if( ctrl & WE )
        case( AB[1:0] )
            0: ABH[ 7:0] <= DO;
            1: ABH[15:8] <= DO;
            2: DOH       <= DO;
        endcase


By the way, I'm not too happy with the design of the sdramif module. It's a bit of a quick hack. It seems to work okay, but I would like to replace it with a cleaner design at some point.


Attachments:
File comment: Verilog sources for SDRAM controller
65org16dev.zip [13.92 KiB]
Downloaded 99 times
PostPosted: Mon Apr 16, 2012 11:45 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I'll have to use an older 100MHz version of the .b core. Thanks so much for your contributions, they are much appreciated! I have a little troubleshooting left on my board, then I can dive in. I've put much time recently into understanding and expanding your original 6502 core, but I think I need a rest from that now.

PostPosted: Mon Apr 16, 2012 12:04 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
I'll have to use an older 100MHz version of .b core.

Not necessarily. The code I have attached will work fine at 90 MHz or even lower. The only issues I can think of are the generation of the 24 MHz and 54 MHz clocks for USB and video, which are both derived from the 100 MHz oscillator, and the SDRAM refresh time.

The SDRAM refresh can be modified in the sdram.v file. Replace the number 768 with a smaller number to get more frequent refresh cycles.
Code:
        refresh_count <= refresh_count - 16'd768;


ETA: with 768 you get a nominal refresh cycle every 768 / 100 = 7.68 usecs. The SDRAM requires 8192 refresh cycles every 64 ms. I have a mechanism in the SDRAM controller where it can delay up to 32 refresh cycles when it's busy. So, worst case I get 8192 refresh cycles in (8192+32)*7.68 = 63.2 ms.
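
The same arithmetic can be used to pick a replacement for 768 at other clock speeds. The Python sketch below takes the 8192 cycles / 64 ms requirement and the 32 deferred refreshes from the post. It assumes the constant equals the refresh interval in clock cycles, as the 768 / 100 calculation implies; the function names are made up.

```python
T_REF_MS = 64.0        # refresh window required by the SDRAM
REFRESH_CYCLES = 8192  # refresh cycles needed per window
MAX_DEFERRED = 32      # refreshes the controller may postpone when busy


def worst_case_ms(interval_clocks, clock_mhz):
    """Worst-case time to complete all refresh cycles, in ms."""
    interval_us = interval_clocks / clock_mhz
    return (REFRESH_CYCLES + MAX_DEFERRED) * interval_us / 1000.0


def max_interval_clocks(clock_mhz):
    """Largest interval (in clocks) that still fits inside the window."""
    return int(T_REF_MS * 1000.0 * clock_mhz // (REFRESH_CYCLES + MAX_DEFERRED))


at_100 = worst_case_ms(768, 100)   # the setting used in sdram.v
at_90 = worst_case_ms(768, 90)     # same constant on a 90 MHz clock
limit_90 = max_interval_clocks(90)
```

Under these assumptions, keeping 768 at 90 MHz would stretch the worst case past 64 ms, while a value of 700 or less keeps it inside the window.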


PostPosted: Mon Apr 16, 2012 9:00 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
... Also, I moved the stack page into page zero. I didn't need so much zero page/stack, so now I have extra room for code.

This is true of the 65Org16.x too!
This is definitely a wise decision IMO. I will modify my testbench and CPU and see if there is any speed increase!

PostPosted: Thu Jun 07, 2012 7:40 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Just to finish up my last comment, I didn't notice any speed increase when I put the stack and zero page in the same 64K block with the stack on top. 1Kx16 each...

For bitmap generation only, no sprites, using the SDRAM, can I use 4 modules as connected here?

Sorry, this is a poor quality picture. I'll make a larger one tonight.

EDIT: Added a larger pic just now, will take some time to update.
