
All times are UTC




PostPosted: Mon Dec 10, 2012 10:33 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The number of lines shouldn't matter for max clock speed. The number is just a counter, and it draws the lines sequentially.


PostPosted: Mon Dec 10, 2012 11:22 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I dialed the resolution settings back down to 640x480.
I was able to get the system clock up to 200MHz (so your state machines are good for that speed, awesome!), pixel clock @25MHz, and it runs well. One thing is, maybe my eyes deceive me, but there seems to be no appreciable increase in speed?

Also, I set all the initial seg_h[x] to 320. To watch all the lines trace to a common point in parallel far off the screen is cool to watch. I'll have to make a video soon. :)

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Tue Dec 11, 2012 5:38 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
I was able to get the system clock up to 200MHz (so your state machines are good for that speed, awesome!), pixel clock @25MHz, and it runs well. One thing is, maybe my eyes deceive me, but there seems to be no appreciable increase in speed?

Correct. The lines are moving at the VSYNC rate. You can't move a line during a frame, otherwise you'd get shearing, where the top half of the line would be in a different place than the bottom half. If you want the lines to move faster, you just need bigger jumps. In main.v, try w <= w + 2, and they'll move twice as fast. Alternatively, you could increase the VSYNC rate.
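
A minimal sketch of the idea, not the actual main.v code: the position register is updated once per frame, and the step size sets the apparent speed. The names move_step, vsync_start and STEP are assumptions made up for this illustration.

```verilog
// Sketch only: 'move_step', 'vsync_start' and 'STEP' are hypothetical names,
// not taken from main.v.
module move_step #(
    parameter STEP = 2                  // pixels moved per frame
) (
    input  wire       clk,
    input  wire       vsync_start,      // assumed one-cycle pulse at the start of VSYNC
    output reg  [9:0] w = 0             // position offset, as in "w <= w + 2"
);
    // Update the position only once per frame, so that the whole frame is
    // drawn with a single value of w and the line cannot shear.
    always @(posedge clk)
        if (vsync_start)
            w <= w + STEP;
endmodule
```

With STEP = 1 the lines advance one pixel per frame; STEP = 2 doubles the apparent speed without touching either clock.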


PostPosted: Tue Dec 11, 2012 1:34 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I tried again, this time successfully, to get the project working at 1024x768 with a pixel clock of 71MHz. It let me get the system clock to operate at 118MHz. Earlier I said 88MHz was the max, but I must have skipped a test in all the excitement.
So I guess this is the sweet spot as far as resolution/performance, since the BRAM used for the scanline buffer is only 1024 pixels deep.

PostPosted: Tue Dec 11, 2012 2:33 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
So, with a 25 MHz pixel clock you were able to push the main clock to 200 MHz, but with a 71 MHz pixel clock the main clock only reached 118 MHz? That's strange.

The BRAM is 2kB, so 1024 pixels at 16 bits each. I agree that's the sweet spot. Also, higher pixel clocks would mean fewer cycles available for processing.


PostPosted: Tue Dec 11, 2012 2:48 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
So, with a 25 MHz pixel clock you were able to push the main clock to 200 MHz, but with a 71 MHz pixel clock the main clock only reached 118 MHz? That's strange.

Yes. Besides changing the timings in vga.v, I had to add a bit to the rd_addr & wr_addr registers. Also, should the w reg in main.v be [10:0]?
But why would you say strange?

PostPosted: Tue Dec 11, 2012 2:54 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
It's strange that the main clock domain would suffer from a change in the pixel clock domain. They are unrelated blocks of logic. Of course, when you're making registers wider, they'll be slower, but from 200 to 118 MHz is a big drop for just one extra bit. I'll have to look at this myself when I get a chance, and go through the timing analyzer reports to see where the long paths are.


PostPosted: Tue Dec 11, 2012 7:03 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I did a quick test with a 200 MHz main clock, and a 200 MHz pixel clock, and it passed all constraints without modification. The worst case timing path for the pixel clock still has almost 0.7 ns slack, and the main clock domain has 0.44 ns slack. The longest main clock path is the 'last_segment' calculation, and its influence on the state machine followed by paths for the size calculation in the async FIFO. Both parts can be rewritten to incorporate an extra pipeline stage, without impact on the performance.

By the way: the [9:0] address vectors for the line buffer are wide enough for the single block RAM and will allow 1024 pixel horizontal resolution.


PostPosted: Tue Dec 11, 2012 8:25 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
I did a quick test with a 200 MHz main clock, and a 200 MHz pixel clock, and it passed all constraints without modification. The worst case timing path for the pixel clock still has almost 0.7 ns slack, and the main clock domain has 0.44 ns slack. The longest main clock path is the 'last_segment' calculation, and its influence on the state machine followed by paths for the size calculation in the async FIFO. Both parts can be rewritten to incorporate an extra pipeline stage, without impact on the performance...

You're going to do this even though both clocks can run @200MHz?
I must have my PLL hooked up wrong. I see in the warning it converted my PLL_base to PLL_adv, but I see no PLL_adv primitive. Right now I just have a wire from CLKFBOUT to CLKFBIN.
Arlet wrote:
...By the way: the [9:0] address vectors for the line buffer are wide enough for the single block RAM and will allow 1024 pixel horizontal resolution.

I saw that warning as well, the BRAM address size was too big... A mental block I have is to forget reg size = n-1. So when I put in 1024 decimal and convert to binary it gives me 11 bits. I know 1024=0to1023. Maybe I will remember from now on! :oops:
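
One way to sidestep the off-by-one, shown here as a sketch rather than as anything from the project sources, is to derive the address width from the buffer depth instead of converting the depth to binary by hand. The module and parameter names are made up for the example, and $clog2 assumes a Verilog-2005 capable tool.

```verilog
// Sketch only: names are hypothetical. Requires Verilog-2005 for $clog2.
module addr_width_demo;
    localparam DEPTH  = 1024;            // entries in the line buffer
    localparam ADDR_W = $clog2(DEPTH);   // 10 bits address entries 0..1023

    reg [ADDR_W-1:0] rd_addr;            // expands to [9:0]
    reg [ADDR_W-1:0] wr_addr;            // expands to [9:0]

    initial
        $display("depth %0d -> %0d address bits [%0d:0]",
                 DEPTH, ADDR_W, ADDR_W - 1);
endmodule
```

The value 1024 itself needs 11 bits to write down, but the highest address is 1023, which fits in 10.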

PostPosted: Tue Dec 11, 2012 8:33 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
You're going to do this even though both clocks can run @200MHz?

No, it was just an observation. But max speed could go down as more logic is added and routing becomes longer. In that case, it's nice to know there's still room for improvement when it becomes necessary.


PostPosted: Thu Dec 13, 2012 5:34 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
How did you hook up the FBIN/OUT in your PLL? I think that may be why my system slowed down with the higher pixel clock.
Also some other things I was wondering:
How difficult is it going to be to code for the plotting for the end of the lines?
How are we going to specify the total number of lines and their coordinates?

PostPosted: Thu Dec 13, 2012 6:23 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
How did you hook up the FBIN/OUT in your PLL? I think that may be why my system slowed down with the higher pixel clock.

I left everything the same as in the github sources, but I just changed the CLKFX_DIVIDE/CLKFX_MULTIPLY numbers (both set to 4), and I changed the timing constraint in the UCF file from 10 ns to 5 ns.

Quote:
How difficult is it going to be to code for the plotting for the end of the lines?

Not very difficult.
Quote:
How are we going to specify the total number of lines and their coordinates?

I was thinking about using block RAMs to specify the lines. You need the start/end points (or x, y, dx, dy), which requires 41 bits (10+10+10+11) for a 1024x768 display. Since a BRAM only goes to 32 bits (36 with parity), we'll need to combine two. That will give you 64 bits per line (72 with parity). Adding some color would be nice, so that uses 41+16 = 57 bits. Since there are still bits left over, I'd use one bit for visibility, and another bit to indicate the end of the list. A third BRAM will be needed to keep track of the e/x variables as the lines are drawn.
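
As an illustration of how such a 64-bit record could be split into fields, here is a sketch. The field order, the choice of dx as the signed 11-bit field, and all names are assumptions; the post only gives the bit counts.

```verilog
// Hypothetical layout of one 64-bit line record (two 32-bit BRAMs side by side).
// Field order and which delta gets the 11th bit are assumptions.
module vector_unpack (
    input  wire        [63:0] rec,      // raw data from the two BRAMs
    output wire        [9:0]  x,        // start x
    output wire        [9:0]  y,        // start y
    output wire signed [10:0] dx,       // signed: a line may lean left or right
    output wire        [9:0]  dy,       // unsigned if lines are stored top to bottom
    output wire        [15:0] col,      // 16-bit color
    output wire               visible,  // 1 = draw this line
    output wire               last      // 1 = end of list
);
    assign x       = rec[9:0];          // 10 bits
    assign y       = rec[19:10];        // 10 bits
    assign dx      = rec[30:20];        // 11 bits
    assign dy      = rec[40:31];        // 10 bits, 41 so far
    assign col     = rec[56:41];        // 16 bits, 57 so far
    assign visible = rec[57];
    assign last    = rec[58];
    // rec[63:59] is unused: 41 + 16 + 2 = 59 of the 64 bits.
endmodule
```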

The biggest challenge will be to rewrite the code so that BRAMs can be used instead of the distributed RAM, and optimize it as much as possible to allow a large number of lines to be drawn. In theory, the 3 parallel BRAMs should allow 512 lines to be on the screen at one time, but there will be limitations to how many pixels can be on a given scanline. It's also important to cycle through the list of lines as quickly as possible. I'm thinking it should be possible to handle 1 line per cycle, including drawing 1 pixel. For each additional pixel, another cycle is needed. So, for 512 vertical lines, it requires 512 cycles per scanline to draw them all. If that can be accomplished, it is even feasible to support 1024 lines, at the cost of 6 BRAMs.


PostPosted: Sun Dec 16, 2012 7:32 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I rewrote nearly all the line drawing code, see github for sources. It's still a work in progress, and right now it can't even draw diagonal lines. Instead, the code draws filled in rectangles. This is just for testing purposes to allow me to focus on the state machine and pipeline, without having to deal with Bresenham at the same time. The code works as follows:

The line module now retrieves the vectors from the outside through a new interface:
Code:
output reg [9:0] vector = 0,   // vector number
output read_vector,   // read vector enable
input [9:0] x0,   // top x coordinate
input [9:0] y0,   // top y coordinate
input [9:0] x1,   // end x coordinate
input [9:0] y1,   // end y coordinate
input [15:0] col,   // vector color
input last_vector,   // last vector

It outputs the number of the vector it wants, and a read enable signal. When the main module sees the read_vector enable, it writes the coordinates of the vector to (x0,y0) (x1,y1) and col. It also indicates if this was the last valid vector by setting the last_vector bit. For each scanline, the line module cycles through all the vectors, figures out which ones cross the current scanline, and draws them. Except instead of drawing a diagonal line, it paints the entire range between x0 and x1.
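
The per-vector test described above might look roughly like the sketch below. It is not the code from line.v; the module name, the scanline input and the span_len output are invented here, and it assumes the vectors are stored with y0 <= y1 and x0 <= x1.

```verilog
// Sketch only, not the line.v implementation. Assumes y0 <= y1 and x0 <= x1.
module span_test (
    input  wire [9:0]  scanline,   // current scanline number (hypothetical name)
    input  wire [9:0]  x0,
    input  wire [9:0]  y0,
    input  wire [9:0]  x1,
    input  wire [9:0]  y1,
    output wire        crosses,    // 1 if this vector touches the scanline
    output wire [10:0] span_len    // pixels to paint, from x0 through x1
);
    // The rectangle covers the scanline when the line number lies
    // between the top and bottom y coordinates, inclusive.
    assign crosses  = (scanline >= y0) && (scanline <= y1);

    // Widen to 11 bits so a full-width span of 1024 pixels does not overflow.
    assign span_len = {1'b0, x1} - {1'b0, x0} + 11'd1;
endmodule
```

A 1-pixel wide vertical rectangle has x0 equal to x1 and therefore a span_len of 1 on every scanline it crosses.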

The idea is that the main code can choose how to generate the vectors. They could be read from a BRAM, or generated on the fly. Also, the interface could be extended with a 'wait' signal so that the main code could take some extra time to generate the vectors. This could allow a read from external SRAM, for instance. Of course, we're still racing the beam, so you can't afford to waste too many cycles.

The next step is to combine this with the Bresenham code, so that instead of rectangles, it draws diagonal lines. Below is the demo output of the current code. Each of the green horizontal and vertical lines is a 1-pixel wide rectangle. Of course, the red square is also a rectangle.


Attachments:
0000.png
PostPosted: Sun Dec 16, 2012 4:43 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Looking at the code in the main.v module:
Code:
x0 <= 16 + (vector_nr[0] ? {vector_nr[4:1], 4'h0} : 0);
y0 <= 16 + (vector_nr[0] ? 0 : {vector_nr[5:1], 4'h0});
x1 <= 16 + (vector_nr[0] ? {vector_nr[4:1], 4'h0} : 240);
y1 <= 16 + (vector_nr[0] ? 240 : {vector_nr[5:1], 4'h0});
last_vector <= 0;
col <= GREEN;

32 green lines are drawn using the vector_nr as a simple counter. When it's odd it draws vertical lines (x0 = x1), when it's even it draws horizontal lines (y0 = y1). Back in the line.v module, this is the logic that increments the vector counter:
Code:
always @(posedge clk)
if( ~drawing )   vector <= 0;
else if( read_vector )   vector <= vector + 1;

So, if we wanted to read co-ordinates from memory, let's say, we would test the read_vector bit the same way as above, and have 32 consecutive reads (using the current example)? Do you think there would have to be another FIFO from the SRAM to the part of the code that assigns x0,y0,x1,y1?

PostPosted: Sun Dec 16, 2012 4:48 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
You don't need a FIFO. Just use the vector_nr as the RAM address input, and return the data. Of course, you need 56 bits worth of data, plus a 'last_vector' bit, so you don't have time to read from external memory.
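
A rough sketch of that arrangement, under several assumptions: the module name vector_ram and the packing order are invented, the memory contents are left uninitialized, and the one-cycle read latency of a synchronous RAM is assumed to be acceptable to the line module.

```verilog
// Sketch only: 'vector_ram' and the field packing are hypothetical.
// 1024 entries of 57 bits will span several block RAMs after synthesis.
module vector_ram (
    input  wire        clk,
    input  wire        read_vector,   // read enable from the line module
    input  wire [9:0]  vector,        // vector number, used directly as address
    output wire [9:0]  x0,
    output wire [9:0]  y0,
    output wire [9:0]  x1,
    output wire [9:0]  y1,
    output wire [15:0] col,
    output wire        last_vector
);
    // 56 bits of coordinates and color plus the last_vector flag.
    reg [56:0] mem [0:1023];
    reg [56:0] q = 0;

    // Synchronous read: data appears one clock after read_vector,
    // which lets the synthesizer infer block RAM.
    always @(posedge clk)
        if (read_vector)
            q <= mem[vector];

    // Unpack the stored word into the interface signals.
    assign {last_vector, col, y1, x1, y0, x0} = q;
endmodule
```

Loading the contents (for example from the CPU side, or with an initial block) is left out of the sketch.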

