All times are UTC




Posted: Wed Oct 10, 2012 1:16 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
... you don't keep track of the latency of the SRAM. If you were to implement that properly, you'll find that you need to update the address earlier in the pipeline than the RGB channel data.

How would I go about that?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Wed Oct 10, 2012 2:41 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The problem is that you'll need a memory controller with read/write capabilities, otherwise there's not much point in reading the correct pixel out of the SRAM. But in addition to a memory controller, you also need a video generator that doesn't mind if the data from the memory doesn't arrive with predictable timing (otherwise you can't do any writes during active video), so it needs a FIFO.

If I were doing this, I'd start with a simple test image generator, like a black/white checkerboard pattern with a blue border, and figure out how to properly align the sync signals so that all pixels end up where you intended. After that, try doing some more complicated stuff, like a rotozoom, or a low-resolution bitmapped screen from block RAM, or a character generator from block RAM. When that all works, go back to external RAM. That's basically how I started.
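As a rough illustration of such a test image generator, here is a minimal sketch. The module name, the port names, the RGB565 output format, the 640x480 resolution and the 32x32 square size are all assumptions made for the example, not anything taken from an existing design. It expects x/y pixel coordinates and an 'active' flag from a separate sync generator.

```verilog
// Hypothetical test pattern: black/white checkerboard with a blue border.
// Assumes x and y are the current pixel coordinates from a separate sync
// generator, and 'active' is high inside the visible area.
module test_pattern #(
    parameter H_RES  = 640,
    parameter V_RES  = 480,
    parameter BORDER = 1       // border thickness in pixels (assumed)
) (
    input  wire       clk,     // pixel clock
    input  wire       active,
    input  wire [9:0] x,
    input  wire [9:0] y,
    output reg  [4:0] r,       // RGB565 is an assumption for this sketch
    output reg  [5:0] g,
    output reg  [4:0] b
);
    wire border = (x < BORDER) || (x >= H_RES - BORDER) ||
                  (y < BORDER) || (y >= V_RES - BORDER);
    wire white  = x[5] ^ y[5]; // 32x32 pixel squares

    always @(posedge clk) begin
        if (!active)     {r, g, b} <= 16'h0000;  // blank outside visible area
        else if (border) {r, g, b} <= 16'h001F;  // blue
        else if (white)  {r, g, b} <= 16'hFFFF;  // white square
        else             {r, g, b} <= 16'h0000;  // black square
    end
endmodule
```

With a one-pixel border, a missing blue line on any edge of the screen shows that the sync alignment is off by at least a pixel. Note that the registered output adds one clock of latency, so the sync signals would need the same one-clock delay to stay aligned.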


Posted: Wed Oct 10, 2012 3:44 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Sounds like a good plan of attack. I've successfully made my first 100% Verilog project now as well, even the top_level.

Posted: Sun Oct 14, 2012 12:07 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...If I were doing this, I'd start with a simple test image generator, like a black/white checker board pattern with a blue border and figure out how to properly align the sync signals so that all pixels end up where you intended. After that, try doing some...

Been thinking about this for 2 days now... I think this is good advice about the pattern generator, much like the pattern generated in the Pong example on the fpga4fun website. It will make sure a proper pattern is displayed down to the pixel, especially at lower resolutions where it can be clearly seen...
Then, I believe the next step will be to learn FIFOs from a Verilog point of view. There are a lot of possibilities with a FIFO, just from initially looking at it through the .xco "lightbulb tool". It seems there may be a limitation on the width to 1024 bits though... Whereas in the Virtex 6 and higher devices one can add widths, but not in the S6. I could be wrong though, I'm tired...

Posted: Sun Oct 14, 2012 6:18 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Yes, a FIFO-based VGA generator is very convenient to use. I'd recommend an asynchronous FIFO, 16 bits wide (one pixel) and 1K deep, so it will use exactly one block RAM. The VGA module uses its own pixel clock, asynchronous to your main clock. Using the pixel clock, data is read from the FIFO and sent to the VGA output with the exact timing. The VGA module exposes the writing end of the FIFO through its interface, as well as a 'start' signal that indicates it's going to need a new frame. As soon as you see the start pulse, you need to write 640x480 (or whatever the resolution) pixels into the FIFO, but there's no need to worry about exact timing. You can write them faster than the pixel clock, and then stop for a while, which is perfect if you need to read the data from external RAM. The start pulse should be generated some time before the first pixel is needed, for example at the end of the VSYNC interval. That way there's plenty of time to fill the FIFO, so you get a head start.
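The writing side described above might look something like the sketch below. All names here (frame_feeder, pix_valid, pix_ready and so on) are hypothetical, and it assumes the start pulse has already been synchronized into the main clock domain and that the FIFO provides a 'full' flag on its write side.

```verilog
// Hypothetical writer side of the pixel FIFO, running on the main clock.
// After 'start', it passes exactly PIXELS pixels into the FIFO, pausing
// whenever the FIFO is full or the pixel source has nothing to offer.
module frame_feeder #(
    parameter PIXELS = 640 * 480
) (
    input  wire        clk,        // main clock, not the pixel clock
    input  wire        start,      // one-cycle pulse, assumed synchronized to clk
    input  wire        fifo_full,  // from the FIFO write side
    input  wire        pix_valid,  // pixel source has data this cycle
    input  wire [15:0] pix_data,
    output wire        pix_ready,  // feeder can accept pix_data this cycle
    output wire        fifo_wr,
    output wire [15:0] fifo_din
);
    reg [18:0] remaining = 0;      // 640*480 = 307200 fits in 19 bits

    wire busy = (remaining != 0);

    assign pix_ready = busy && !fifo_full;
    assign fifo_wr   = pix_ready && pix_valid;
    assign fifo_din  = pix_data;

    always @(posedge clk) begin
        if (start)
            remaining <= PIXELS;            // new frame requested
        else if (fifo_wr)
            remaining <= remaining - 1'b1;  // one pixel written
    end
endmodule
```

The pixel source is free to stall (pix_valid low) for as long as the FIFO contents last, which is where the tolerance for irregular external RAM timing comes from.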


Posted: Mon Oct 15, 2012 11:44 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I downloaded the Spartan-6 HDL library and focused on the 'RAMB16BWER' today. For the past couple of days I had been looking over UG383, the Spartan-6 FPGA Block RAM Resources User Guide, and I didn't find what I was looking for in there, although the fine details of operation are there and may need to be analyzed later.

So at the heart of the FIFO we have a 1Kx16 dualport BRAM with true dual Read & Write ports. It is easy to see that 1 read port goes to the videoDAC and the other read port goes to the controlling hardware. Now at this point, it's also easy to see the 1 write port is also coming from a controlling CPU. Easy enough for me to understand at least, not to implement yet, but...

I began to think that there's an unused port for write access. So it might be possible, in a cascaded videoboard system, to take data from a previous board and put it (whether through AND/OR/EOR/ADD/SUB/MUL/DIV) into the succeeding board's memory?

Posted: Tue Oct 16, 2012 12:30 am
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Quote:
So at the heart of the FIFO we have a 1Kx16 dualport BRAM with true dual Read & Write ports. It is easy to see that 1 read port goes to the videoDAC and the other read port goes to the controlling hardware. Now at this point, it's also easy to see the 1 write port is also coming from a controlling CPU. Easy enough for me to understand at least, not to implement yet, but...

I began to think that there's an unused port for write access. So it might be possible, in a cascaded videoboard system, to take data from a previous board and put it (whether through AND/OR/EOR/ADD/SUB/MUL/DIV) into the succeeding boards memory?

The dual-port nature of the BRAMs and the independent synchronous read/write capability on each of these ports is frequently considered a panacea. It's true that independent clocks may be used for each port, and that the signals on each side are registered with that port's controls and clock signal. The problem lies in the fact that the value (electrons) in the RAM cell cannot be simultaneously exchanged. Thus, if a write on one port addresses a RAM cell which is being read from the other side, the data read will be incorrect. If the same cell is being simultaneously written from both sides, the data stored in the cell will be undefined. Issues such as setup and hold still apply.

The independent and synchronous operation of each port of the BRAM does improve your chances of implementing a FIFO which uses non-synchronous clocks for the input and output ports. One key concept to keep in mind is that the write address counter is synchronous to the write port clock, and the read address counter is synchronous to the read port clock. That is the easy part. The hard part is tracking the number of filled cells in the FIFO, i.e. the FIFO fill counter. The input side fill counter is clocked by the write clock, but this fill count has to be transferred to the read clock domain. On the other hand, as the reads occur the fill counter needs to be decremented. Since this control is synchronous to the read clock, the read enable pulse must be crossed over to the write clock domain so that the counter is decremented on the write side. If you get the CoreGen tool to generate a two-clock BRAM FIFO, you'll find that the count sequence for the first few bits is not a simple binary sequence. It is a Gray code sequence, so that only one bit changes at a time for some small number of data counts. This makes it easier for a single-level synchronizer of the FIFO fill count to operate reliably.

As the frequencies of the two clocks approach each other, there will be long periods of time where the setup and hold times of the logic in one domain or the other will be violated. When the clocks differ from each other substantially, the rate at which this phenomenon occurs will decrease. But rest assured, this phenomenon will occur and when it does, without the right clock domain crossing circuits in place, the FIFO flags and fill count register will go completely bonkers.

I do recommend the use of the Xilinx CoreGen FIFOs when two clock domains are necessary. They are compact, and have never exhibited any metastable behaviour; you do have to be aware that the least significant bits of their counters are not necessarily linear binary sequences.
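To illustrate the Gray code idea in isolation, here is a minimal sketch of passing a binary write pointer into the read clock domain. This is not how the CoreGen FIFO is built internally; the module and signal names are made up for the example, and it uses a two-flop synchronizer as a conservative assumption rather than a single level. It only works if the pointer changes by at most one count per write clock.

```verilog
// Sketch: pass a FIFO write pointer into the read clock domain as Gray code.
module gray_ptr_sync #(parameter W = 10) (
    input  wire         wr_clk,
    input  wire [W-1:0] wr_ptr_bin,    // binary write pointer, wr_clk domain
    input  wire         rd_clk,
    output wire [W-1:0] wr_ptr_rd_bin  // the pointer as seen in rd_clk domain
);
    // Register the Gray value in the source domain so that no
    // combinational glitches reach the synchronizer.
    reg [W-1:0] gray_wr = 0;
    always @(posedge wr_clk)
        gray_wr <= wr_ptr_bin ^ (wr_ptr_bin >> 1);

    // Two-flop synchronizer. At most one bit of gray_wr changes per
    // increment, so a sample taken mid-change is either the old or the
    // new value, never a mix of unrelated bits.
    reg [W-1:0] sync1 = 0, sync2 = 0;
    always @(posedge rd_clk) begin
        sync1 <= gray_wr;
        sync2 <= sync1;
    end

    // Gray to binary: bit i is the XOR of Gray bits W-1 down to i.
    genvar i;
    generate
        for (i = 0; i < W; i = i + 1) begin : g2b
            assign wr_ptr_rd_bin[i] = ^(sync2 >> i);
        end
    endgenerate
endmodule
```

The value seen on the read side lags the real pointer by a few cycles, which is safe for an 'empty' flag: the reader may think the FIFO holds fewer entries than it really does, but never more.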

_________________
Michael A.


Posted: Tue Oct 16, 2012 7:01 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
So at the heart of the FIFO we have a 1Kx16 dualport BRAM with true dual Read & Write ports. It is easy to see that 1 read port goes to the videoDAC and the other read port goes to the controlling hardware. Now at this point, it's also easy to see the 1 write port is also coming from a controlling CPU. Easy enough for me to understand at least, not to implement yet, but...

I began to think that there's an unused port for write access. So it might be possible, in a cascaded videoboard system, to take data from a previous board and put it (whether through AND/OR/EOR/ADD/SUB/MUL/DIV) into the succeeding boards memory?

The purpose of the async FIFO is just to separate the VGA pixel/timing generator from the actual pixel calculation. The use of the FIFO allows two important things: first a different clock rate between your processor and the pixel clock, but more importantly, it allows the pixel calculation to be done in bursts, which makes the work a lot easier than having to provide exactly one pixel per clock cycle.

The FIFO itself isn't really made for pixel manipulation, since it only contains at most 1K pixels, and you don't really keep track of which pixels there are in it. Instead what you do is put all the pixel calculation, like merging different video streams, before the FIFO. For instance, in my sprite engine, I had a staging area consisting of 1 horizontal line of pixels, and a state machine that did:

  • Write background color in line buffer
  • Make a list of all the sprites that intersect current line
  • For each sprite, read the pixels from memory
  • Draw the sprite pixels in the line buffer (optionally combining them with the background pixel for transparency)

When everything was done, the line was finished, and it was sent to the pixel FIFO to be displayed. As you can see, the process is very irregular. There could be large periods where the sprite engine was busy composing the line, but would not generate any pixels. During this time, the video generator drains the pixels from the FIFO. When the sprite engine was done with a line, it would quickly dump a whole line of pixels into the FIFO, and start processing the next line right away.

You could do something similar with other operations.
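The line-composition steps listed above could be sketched roughly as follows. This is a simplified illustration, not the actual sprite engine: the module name, all port names, the fixed sprite size, and especially the zero-latency sprite memory interface (spr_x, spr_y, spr_pix, spr_opaque respond combinationally to spr_idx and spr_col, with the memory working out the sprite row from y) are assumptions made to keep the example short.

```verilog
// Sketch of a line composer: clear line buffer, draw sprites, flush to FIFO.
module line_composer #(
    parameter H_RES = 640,
    parameter NSPR  = 8,     // number of sprites (assumed)
    parameter SPR_W = 16,    // sprite width in pixels (assumed)
    parameter SPR_H = 16     // sprite height in pixels (assumed)
) (
    input  wire        clk,
    input  wire        line_start,  // pulse: begin composing screen line y
    input  wire [9:0]  y,
    input  wire [15:0] bg_color,
    // hypothetical sprite memory with zero read latency
    output reg  [7:0]  spr_idx,
    output reg  [7:0]  spr_col,
    input  wire [9:0]  spr_x,       // position of sprite spr_idx
    input  wire [9:0]  spr_y,
    input  wire [15:0] spr_pix,     // its pixel at column spr_col on line y
    input  wire        spr_opaque,  // low for transparent pixels
    // pixel FIFO write side
    input  wire        fifo_full,
    output wire        fifo_wr,
    output wire [15:0] fifo_din
);
    localparam IDLE = 2'd0, CLEAR = 2'd1, DRAW = 2'd2, FLUSH = 2'd3;

    reg [1:0]  state = IDLE;
    reg [9:0]  pos   = 0;
    reg [15:0] linebuf [0:H_RES-1];  // one line of pixels (staging area)

    wire        hit      = (y >= spr_y) && (y < spr_y + SPR_H);
    wire [10:0] dst_x    = spr_x + spr_col;
    wire        last_col = (spr_col == SPR_W - 1);
    wire        last_spr = (spr_idx == NSPR - 1);

    assign fifo_wr  = (state == FLUSH) && !fifo_full;
    assign fifo_din = linebuf[pos];

    always @(posedge clk) begin
        case (state)
        IDLE: if (line_start) begin pos <= 0; state <= CLEAR; end
        CLEAR: begin                       // write background color
            linebuf[pos] <= bg_color;
            pos <= pos + 1'b1;
            if (pos == H_RES - 1) begin
                spr_idx <= 0; spr_col <= 0; state <= DRAW;
            end
        end
        DRAW: begin                        // one sprite pixel per clock
            if (hit && spr_opaque && dst_x < H_RES)
                linebuf[dst_x[9:0]] <= spr_pix;
            if (!hit || last_col) begin    // skip or finish this sprite
                spr_col <= 0;
                spr_idx <= spr_idx + 1'b1;
                if (last_spr) begin pos <= 0; state <= FLUSH; end
            end else
                spr_col <= spr_col + 1'b1;
        end
        FLUSH: if (!fifo_full) begin       // dump the finished line
            pos <= pos + 1'b1;
            if (pos == H_RES - 1) state <= IDLE;
        end
        endcase
    end
endmodule
```

The time spent in CLEAR and DRAW varies with the number of sprites on the line, and no pixels are produced during it. That irregularity is what the FIFO absorbs.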


Last edited by Arlet on Tue Oct 16, 2012 8:42 am, edited 1 time in total.

Posted: Tue Oct 16, 2012 7:17 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
MichaelM wrote:
The independent and synchronous operation of each port of the BRAM does improve your chances of implementing a FIFO which uses non-synchronous clocks for the input and output ports. One key concept to keep in mind is that the write address counter is synchronous to the write port clock, and the read address counter is synchronous to the read port clock. That is the easy part. The hard part is tracking the number of filled cells in the FIFO, i.e. the FIFO fill counter. The input side fill counter is clocked by the write clock, but this fill count has to be transferred to the read clock domain. On the other hand, as the reads occur the fill counter needs to be decremented. Since this control is synchronous to the read clock, the read enable pulse must be crossed over to the write clock domain so that counter is decremented on the write side. If you get the CoreGen tool to generate a two clock BRAM FIFO, you'll find that the count sequence for the first few bits is not a simple binary sequence. It is a grey-code sequence so that only one bit changes at a time for some small number of data counts. This makes it easier for a single level synchronizer of the FIFO fill count to operate reliably.


I don't think transferring the counter to the other clock domain is that hard, as long as you don't mind that it takes a few cycles. I've always made my own async FIFOs by keeping 3 copies of regular binary (head/tail) counters. First is the original counter in its own clock domain, second copy is still in the same clock domain but is kept stable for a few cycles, the 3rd copy is in the other clock domain, and is taken from the 2nd copy only when it's stable. I have a separate synchronizer state machine that generates 2 enable signals that tell when it's safe to make a copy in each clock domain.

It may not be as high-performance as the Xilinx CoreGen generated FIFOs, but there's a certain satisfaction in understanding how it works. :)
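For readers who want something concrete, below is one possible way to build a "hold it stable, then copy" counter transfer. It is not the synchronizer state machine described above, which isn't shown in this thread. It swaps in a plain toggle handshake (req/ack), and every name in it is made up for the sketch.

```verilog
// Sketch: move a binary counter value across clock domains using a
// held copy and a req/ack toggle handshake (an assumed technique).
module counter_xfer #(parameter W = 10) (
    input  wire         src_clk,
    input  wire [W-1:0] src_count,  // live binary counter, src_clk domain
    input  wire         dst_clk,
    output reg  [W-1:0] dst_count   // delayed copy, dst_clk domain
);
    reg [W-1:0] hold  = 0;  // second copy: frozen while a transfer is in flight
    reg         req   = 0;  // toggles when 'hold' has a new value
    reg         ack   = 0;  // toggles when dst has taken the value
    reg [1:0]   req_s = 0;  // req synchronized into dst_clk
    reg [1:0]   ack_s = 0;  // ack synchronized into src_clk

    initial dst_count = 0;

    always @(posedge src_clk) begin
        ack_s <= {ack_s[0], ack};
        if (ack_s[1] == req) begin   // previous transfer finished
            hold <= src_count;       // take a fresh snapshot
            req  <= ~req;            // and announce it
        end
    end

    always @(posedge dst_clk) begin
        req_s <= {req_s[0], req};
        if (req_s[1] != ack) begin   // a new snapshot is waiting
            dst_count <= hold;       // hold has been stable for >1 cycle
            ack       <= req_s[1];
        end
    end
endmodule
```

Only the single-bit req and ack signals ever cross domains while changing; the multi-bit 'hold' value is sampled only after req has passed through the synchronizer, by which time it is no longer changing. The price is that dst_count updates only once every several clocks.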


Posted: Tue Oct 16, 2012 12:59 pm
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Quote:
but there's a certain satisfaction in understanding how it works.

There certainly is that. :D

Posted: Tue Oct 16, 2012 1:30 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet, what does this circuit look like in your CS4954.v module? Is it just that hblank1 is delayed by 1 clock cycle, hblank2 is delayed by 2 cycles, etc. with respect to hblank0?
Code:
// synchronize in other clock domain
always @(posedge clk ) begin
    hblank1 <= hblank0;
    hblank2 <= hblank1;
    hblank3 <= hblank2;
end

Posted: Tue Oct 16, 2012 1:34 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
You can look at it as a 2 clock cycle delay, but it's used here as a synchronizer to solve metastability issues.
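A self-contained version of that register chain might look like the sketch below. The module wrapper and the hblank_rise output are additions for illustration; in particular, using the third register for edge detection is a guess at its purpose, not something confirmed by the CS4954.v snippet.

```verilog
// Sketch: synchronize hblank0 into the 'clk' domain.
module hblank_sync (
    input  wire clk,         // destination (main) clock
    input  wire hblank0,     // driven from the video clock domain
    output wire hblank_rise  // one-clk pulse on a rising hblank (assumed use)
);
    reg hblank1 = 0, hblank2 = 0, hblank3 = 0;

    always @(posedge clk) begin
        hblank1 <= hblank0;  // may go metastable: hblank0 can change at any time
        hblank2 <= hblank1;  // has had a full clk period to settle
        hblank3 <= hblank2;  // previous synchronized value
    end

    // Only hblank2 and hblank3 are safe to use in logic. hblank1 should
    // drive nothing but the next register in the chain.
    assign hblank_rise = hblank2 & ~hblank3;
endmodule
```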


Posted: Tue Oct 16, 2012 1:45 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
... it's used here as a synchronizer to solve metastability issues.

This is because the 'clk' signal was anticipated to be a high frequency around 100MHz? I'm wondering why you do this only to the hblank signal.

Posted: Tue Oct 16, 2012 2:24 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
This is because the 'clk' signal was anticipated to be a high frequency around 100MHz? I'm wondering why you do this only to the hblank signal.

Actually, the main clock was running at 100 MHz in my case, and the CS4954 was running at 54 MHz. The hblank signal is the only one that potentially violates the setup/hold time when crossing the clock domain. The vblank signal also crosses the clock domain, but only when it's stable:
Code:
// no need to synchronize vtrigger here,
// because vcount is not near an edge
always @(posedge clk )
    vtrigger <= htrigger0 & (vcount == top);

In this code, the htrigger0 pulse only happens when vcount is stable, so there's no chance of metastability.


Posted: Sun Oct 21, 2012 12:38 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet, thanks for the explanations.
So, I will be trying to put your advice into action on my next 2 days off:
Arlet wrote:
...If I were doing this, I'd start with a simple test image generator, like a black/white checker board pattern with a blue border and figure out how to properly align the sync signals so that all pixels end up where you intended....

I also remember an earlier discussion that BigEd brought up regarding SSO (Simultaneous Switching Outputs, in the Concept & Design Thread, 4 posts down) that would be an 'ultimate test'.
So I'm thinking of adding a horizontal counter to your HVSync vga.v generator that can send out alternating colors based on even or odd bits. Wish I had a current meter, but I'll be able to observe the effect at different resolutions.
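A minimal sketch of that alternating pattern is below, written as a standalone module rather than as a patch to vga.v, since that file's internals aren't shown here. The port names, the 16-bit pixel width and the line_start pulse are assumptions.

```verilog
// Sketch: worst-case SSO pattern. All 16 colour outputs switch together
// on every visible pixel, alternating between all zeros and all ones.
module sso_pattern (
    input  wire        clk,         // pixel clock
    input  wire        line_start,  // pulse at the start of each line (assumed)
    input  wire        active,      // high inside the visible area
    output reg  [15:0] rgb
);
    reg [9:0] hcount = 0;

    always @(posedge clk) begin
        hcount <= line_start ? 10'd0 : hcount + 1'b1;
        rgb    <= active ? {16{hcount[0]}} : 16'h0000;  // odd pixels white
    end
endmodule
```

Because the counter restarts on every line, the result should appear as one-pixel-wide vertical black and white stripes, which also makes any pixel clock or alignment problem easy to spot.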
