6502.org Forum

All times are UTC




Posted: Tue Dec 28, 2021 2:47 am
Hey Guys,

I've been working on an FPGA video controller for my FPGA project, and one of my big struggles has been the video RAM. Using real-time (continuous) assignments that give the CPU priority on the RAM works great, apart from the video artifacts, especially during large writes.

The RAM is 10 ns, the video clock is at 25 MHz and the system clock is 8 MHz. Based on the speed of the RAM there should be more than enough time to fulfill the requests of both the video driver and the CPU within their respective clocks.

Below is a snippet of code for the video RAM. The topmost section contains the real-time assignments that are working, followed by the code of one of my latest of many failed attempts to share the memory between the two clocks. Of note, the main clk input is 100 MHz (10 ns per cycle), the sclk is 8 MHz and the vclk is 25 MHz. Also, while this attempt at sharing is the one I'm least proud of, it actually produced the best results and was able to return ****somewhat**** valid data to the video driver, but as usual, CPU reads and writes were nonexistent.

Additionally, I could likely use block ram on the FPGA with better results, but I'd prefer the sram as there is enough to support full bitmap at 640*480. I also don't want to use DMA as I don't want to sacrifice the CPU speed.

Any help would be greatly appreciated.

Code:
`timescale 1ns / 1ps

module vBuffer(
        input clk,
        output wire [18:0] sram_a,
        inout wire [7:0] sram_d,
        output wire sram_wen,
        output wire sram_cen,
        input wire [18:0] syAdr,
        inout wire [7:0] datIO,
        input wire syCs,
        input wire syRw,
        input wire [18:0] vAdr,
        input wire vCs);
       

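    // The CPU takes the address bus whenever it is selected; otherwise video drives it.
    assign sram_a = (!syCs) ? syAdr : vAdr;
    // Chip enable (active low) whenever either side wants the RAM.
    assign sram_cen = (!syCs || !vCs) ? 0 : 1;
    // Write enable (active low) only during a CPU write.
    assign sram_wen = (!syCs && !syRw) ? 0 : 1;
    // CPU write: release datIO and pass the CPU data to the RAM.
    // Any other time: datIO is driven from the RAM data bus.
    assign datIO = (!syCs && !syRw) ? 8'bZZZZZZZZ : sram_d;
    assign sram_d = (!syCs && !syRw) ? datIO : 8'bZZZZZZZZ;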
   
   
      /*
   
   module vBuffer(
        input clk,
        input wire vClk,
        input wire sClk,
        output wire [18:0] sram_a,
        inout wire [7:0] sram_d,
        output wire sram_wen,
        output wire sram_cen,
        input wire [18:0] syAdr,
        inout wire [7:0] datIO,
        input wire syCs,
        input wire syRw,
        input wire [18:0] vAdr,
        output reg [7:0] vDat,
        input wire vCs);
   
   
   
   reg vidEvt = 0;
   reg syReadEvt = 0;
   reg syWriteEvt = 0;
   reg vidLastClk = 0;
   reg syLastClk = 0;
   reg vidRead = 0;
   reg syWrite = 0;
   reg syRead = 0;
   
   reg [18:0]_syAdr;
   reg [7:0] _syDatIn;
   reg [7:0] _syDatOut;
   
   assign sram_a = (syWrite || syRead) ? _syAdr : vAdr;
   assign sram_d = (syWrite) ? _syDatIn : 8'bZZZZZZZZ;
   assign sram_cen = (syWrite || syRead || vidRead) ? 0 : 1;
   assign sram_wen = (syWrite) ? 0 : 1;
   assign datIO = (!syCs && syRw) ? _syDatOut : 8'bZZZZZZZZ;

   
   always @(posedge clk) begin
      if (vidRead) begin
          vDat <= sram_d;
          vidRead <= 0;
      end
      else if (syRead) begin
          _syDatOut <= sram_d;
          syRead <= 0;
      end
      else if (syWrite) syWrite <= 0;
         
      if(~vidLastClk && vClk && vidEvt) vidRead <= 1;
      else if (syLastClk && !sClk && syWriteEvt) begin
          syWrite = 1;
          syLastClk <= sClk;
      end
      else if (!syLastClk && sClk && syReadEvt) begin
          syRead = 1;
          syLastClk <= sClk;
      end
      else begin
          syLastClk <= sClk;
      end
   
      vidLastClk <= vClk;
   end
   
   always @(posedge vClk) begin
      if (!vCs) vidEvt <= 1;
      else vidEvt <= 0;
   end
   
   
   always @(sClk) begin
      if (!syCs && syRw && sClk) begin
          _syAdr <= syAdr;
          syWriteEvt <= 0;
          syReadEvt <= 1;
      end
      else if (!syCs && !syRw && !sClk) begin   
          _syAdr <= syAdr;
          _syDatIn <= datIO;
          syReadEvt <= 0;
          syWriteEvt <= 1;
      end
      else begin
          syReadEvt <= 0;
          syWriteEvt <= 0;
      end
   end
*/
endmodule


Posted: Tue Dec 28, 2021 8:26 am
A couple of years ago I started work on a FPGA based framebuffer for the Beeb that connects to the 1MHz Bus.

It used the 1MHz Bus FPGA Adapter and a cheap Xilinx LX9 FPGA board:
Attachment: IMG_1588.JPG


(there are several other projects for this board)

The frame buffer has the following characteristics:
- outputs VGA (640x480) in 256 colours with a 25MHz pixel clock
- uses a 512Kx8 10ns external SRAM that cycles @ 50MHz
- includes a palette producing 12-bit RGB
- includes a simple blitter to accelerate memory copies

Getting the SRAM sharing working was the hardest part. Access is controlled by this state machine:
https://github.com/hoglet67/Beeb1MHzBus ... r.vhd#L389

Some notes:
- clk_video is actually 50MHz and a pixel is output every two clocks
- access to the SRAM is shared three ways between the CPU, Blitter and video output
- during the video lines (active=1) video gets every other cycle
- next priority is CPU reads
- next priority is CPU writes
- finally the blitter is the lowest priority
- the CPU is on a separate clock domain; the request signals (cpu_rd_pending, cpu_wr_pending) are carefully synchronised
- these request signals toggle to indicate a request is pending
- there is no need to handshake in the other direction
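
For readers following along in Verilog, the toggle-style request described in the notes above might look roughly like the sketch below. This is not taken from the linked VHDL; the module and signal names (toggle_req_sync, cpu_req, mem_req_pulse) are made up for illustration, and it assumes the memory clock is much faster than the CPU clock so that each request is seen before the next one can arrive.

```verilog
// Hypothetical sketch: carry a request from a slow CPU clock domain into a
// fast memory clock domain by toggling a level and detecting the change.
module toggle_req_sync (
    input  wire cpu_clk,        // slow source domain
    input  wire cpu_req,        // high for one cpu_clk cycle per request
    input  wire mem_clk,        // fast destination domain
    output wire mem_req_pulse   // high for one mem_clk cycle per request
);
    // Source domain: flip a level each time a request is made.
    reg req_toggle = 1'b0;
    always @(posedge cpu_clk)
        if (cpu_req) req_toggle <= ~req_toggle;

    // Destination domain: two flops to synchronise, a third to spot changes.
    reg [2:0] sync = 3'b000;
    always @(posedge mem_clk)
        sync <= {sync[1:0], req_toggle};

    // Any change of the synchronised level marks a new request.
    assign mem_req_pulse = sync[2] ^ sync[1];
endmodule
```

The address and write data are assumed to be held stable in the CPU domain until the memory side has used them, which is why nothing needs to be handshaked back.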

I should resurrect this project, as it's quite fun!

Dave


Top
 Profile  
Reply with quote  
PostPosted: Tue Dec 28, 2021 9:07 am 
Offline

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 298
Dealing with multiple clocks requires care. What is the relationship between the clocks? Do you know the phase difference between the 100MHz and 25MHz? Is the 8MHz actually 8.333? (that would make life a lot simpler)

The first step is to draw a timing diagram. Put all of your signals on it, and add arrows to show which edges control which transitions. If you've got a signal changing too close to the clock edge that samples it, that'll cause problems. If you don't know or can't control the timing between clocks, it gets a lot more complicated.

If you haven't already learned how to use the simulator, do it now. It'll show you a reasonable approximation of what's actually happening, and that will save you a lot of time in the future. I've got a testbed set up for mine, with a fake SRAM attached to the rest of the system. There's no way I could have got anything working without it.
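
As an illustration of the kind of fake SRAM meant here, a minimal behavioural model could look like the sketch below. It is a simulation-only assumption, not the model from my testbed: the name fake_sram, the T_ACC parameter and the 10 ns default are placeholders, and it only has the chip-enable and write-enable pins used in the code earlier in the thread.

```verilog
`timescale 1ns / 1ps
// Hypothetical simulation-only model of a 512K x 8 asynchronous SRAM.
module fake_sram #(
    parameter T_ACC = 10            // assumed access time in ns
) (
    input  wire [18:0] a,           // address
    inout  wire [7:0]  d,           // bidirectional data bus
    input  wire        cen,         // active-low chip enable
    input  wire        wen          // active-low write enable
);
    reg [7:0] mem [0:(1<<19)-1];

    // Read path: drive the bus T_ACC after the address or enables settle.
    wire reading = !cen && wen;
    assign #(T_ACC) d = reading ? mem[a] : 8'bz;

    // Write path: capture the data when the write ends, i.e. when either
    // enable goes inactive. Assumes address and data are still stable then.
    wire write_n = cen | wen;
    always @(posedge write_n)
        mem[a] <= d;
endmodule
```

Hooking sram_a, sram_d, sram_cen and sram_wen from the design up to this in a testbench makes it easy to see in the waveform whether data arrives before the edge that samples it. A real part has more timing parameters (setup, hold, minimum pulse widths) than this sketch checks.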

It's also worth running a timing analysis. I don't know which FPGA you're using - if it's Xilinx, the tool you want is trce. FPGAs can be fast, but 100MHz is fast as well. It'll find your critical paths and tell you exactly how many nanoseconds you don't have spare.

I don't know what timing you intend this to have. But be aware that 10ns RAM doesn't necessarily mean a 10ns memory cycle, particularly if you want to write to it. There has to be some time between writes - you can't have a rising edge on /WE if the next cycle wants it low again. And it takes time for signals to make their way through logic and routing and to the RAM's pins. You have to take that time into account.
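
To make the /WE point concrete, here is a rough sketch of one way to guarantee a rising edge between back-to-back writes: spend two fast-clock cycles per access and only pull /WE low in the second one. The names (sram_two_cycle_we, wr_req, phase) are hypothetical, and it assumes wr_req is held for the whole access.

```verilog
// Hypothetical sketch: two fast-clock cycles per SRAM access, with /WE low
// only in the second cycle so it always returns high between writes.
module sram_two_cycle_we (
    input  wire clk,       // fast memory clock (assumed, e.g. 80 or 100 MHz)
    input  wire wr_req,    // high while the current access is a write
    output reg  phase,     // 0 = address setup cycle, 1 = data cycle
    output reg  sram_wen   // active-low write enable to the SRAM
);
    initial begin
        phase    = 1'b0;
        sram_wen = 1'b1;
    end

    always @(posedge clk) begin
        phase <= ~phase;
        // Leaving phase 0: pull /WE low for the coming data cycle if writing.
        // Leaving phase 1: always release /WE for the next setup cycle.
        sram_wen <= ~(~phase & wr_req);
    end
endmodule
```

The cost is that the access rate is half the clock rate, so a 100MHz clock would give 50M accesses per second at best. Whether the resulting pulse widths meet the datasheet still has to be checked against routing and pin delays.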

The memory in my design is driven by an 80MHz clock, and it takes two clock cycles for each memory access. One of the signals ended up 1.5 cycles long, and I had to use combinatorial logic to combine signals from both edges of the clock. It ended up with 40M memory accesses per second, which is exactly what my system needs. The other clocks that I use are derived from the 80MHz clock through logic (driving BUFGs), so I know exactly where in the memory cycle they go. I've attached a screenshot showing a couple of writes followed by a couple of reads.


Attachment: sram_timing.png

Posted: Tue Dec 28, 2021 6:56 pm
All of my clocks come out of a single MMCM clock generator and I have no skewing.

Thanks to both of you for your input; I will dig into that deeper. I'm happy to hear from someone else who has done the same thing I was attempting, and I'm hopeful that will give me some real insight.


Posted: Tue Dec 28, 2021 7:11 pm
hoglet wrote:
I should resurrect this project, as it's quite fun!


I think there is a need for a good open-source FPGA video controller. TBH this is the main reason the Commander X16 is interesting to me; I wish they'd open source the Vera. As far as I can tell they are using an iCE40, which can be had on something like the upduino for a whopping $28 and would open up a world of possibilities for all of us homebrew 6502 guys, and even Z80s and the like.


Posted: Wed Dec 29, 2021 12:51 am
jbaum81 wrote:
hoglet wrote:
I should resurrect this project, as it's quite fun!


I think there is a need for a good open-source FPGA video controller. TBH this is the main reason the Commander X16 is interesting to me; I wish they'd open source the Vera. As far as I can tell they are using an iCE40, which can be had on something like the upduino for a whopping $28 and would open up a world of possibilities for all of us homebrew 6502 guys, and even Z80s and the like.


I've got one in (slow) development, running on a ULX3S. That board isn't cheap, but the ECP5 on it has something like 496 KB of block RAM, which makes it possible to have a huge frame buffer without an external SRAM. I haven't touched it in a few months though, because I decided to get my JRC-1 SBC up and running first to use as a better platform for experimentation.

I also created a smaller video controller a year or so back based on the TinyFPGA (iCE40) board. It also used block RAM, but since that chip only has 16 KB it isn't a very big frame buffer. It does 80x25 text with FG/BG colors, hardware blink, hardware cursor, and hardware scrolling. There's also a 320x200 bitmapped mode with cell-based coloring similar to the VIC II. I can post a GitHub link if anyone's interested; it does work, but it's my first real Verilog project so the code is crap, and it's not exactly well-documented.


Posted: Mon Jan 03, 2022 6:50 pm
Ha!! I freakin did it!!!

Okay, so here are the basics: 'clk' is a 100 MHz clock for the RAM and signal generation and is skewed back 30 degrees. hdmiClk at 250 MHz and vClk at 25 MHz are synced. The HDMI encoder sets the rgb/hsync/vsync/vAct off of the vClk which, again, is 30 degrees ahead of the RAM clock it's pipelining from. I plan to drop the vClk and just set up a clock divider on the HDMI clock inside the encoder module to drive the flip-flops in there, but for now I have crisp, clean video with no artifacts during CPU reads or writes, and it doesn't appear that I'm dropping any pixels. I just need to set up hardware scrolling in text mode and re-enable bitmap mode and I should have a fully functional video display adapter!!

Code listed below. I'm not sure about 'best practices', but I did opt to use blocking assignments in the first phase of the signal generation, as it started getting confusing; this way I know exactly what my signals are at that first clock. I didn't find it necessary to use them anywhere else. Once I get everything cleaned up and refined I may share the whole project on GitHub or the like for community input and usage.


Code:
`timescale 1ns / 1ps
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12/28/2021 10:30:59 PM
// Design Name:
// Module Name: displayDriver
// Project Name:
// Target Devices:
// Tool Versions:
// Description:
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
//////////////////////////////////////////////////////////////////////////////////


module displayDriver(
    input clk,
    input syClk,
    input vClk,
    input hdmiClk,
    input wire [7:0] clrReg,
    input wire [7:0] vidReg,
    input wire [18:0] syAdr,
    input wire syCs,
    input wire syRw,
    inout wire [7:0] syDat,
    inout wire [7:0] sram_d,
    output wire [18:0] sram_a,
    output wire sram_wen,
    output wire sram_cen,
    output wire [2:0] TMDSp,
    output wire [2:0] TMDSn,
    output wire TMDSp_clock,
    output wire TMDSn_clock
    );
   
    wire [18:0] vAdr;
    assign vAdr = {vPos[8:3],hPos[9:3]};
    reg [11:0] _buffRGB;

    assign sram_a = (!clkDiv[1] && (syRead || syWrite)) ? _syAdr : vAdr;
    assign sram_cen = ( (!clkDiv[1] && (syRead || syWrite)) || (clkDiv[1] && !vCs) ) ? 0 : 1;
    assign sram_wen = (!clkDiv[1] && syWrite) ? 0 : 1;
    assign syDat = (!syCs && !syRw) ? 8'bZZZZZZZZ : _syDatOut;
    assign sram_d = (!clkDiv[1] && syWrite) ? _syDatIn : 8'bZZZZZZZZ;
   
    reg [9:0] hPosBuf,hPos,vPosBuf,vPos;
    reg vAct,vCs,hSync,vSync;
    reg lastSyEvtW,lastSyEvtR,syRead,syWrite;
    reg [7:0] _vDat,_syDatOut;
    reg [1:0] clkDiv;
    always @(posedge clk) begin
        clkDiv = clkDiv+1;
        if (clkDiv == 2'b10) begin
            hSync <= (hPosBuf > 655 && hPosBuf < 752) ? 0 : 1;
            vSync <= (vPosBuf > 490 && vPosBuf < 493) ? 0 : 1;
            vAct <= (hPosBuf<640) && (vPosBuf<480);
            _rgb <= _txtRGB;
            if (hPos == 799) begin
                hPos = 0;
                vPos = (vPos==524) ? 0 : vPos+1;
            end
            else hPos = hPos+1;
            vCs = (hPos<640 && vPos<480) ? 0 : 1;
        end
        else if (clkDiv == 2'b00) begin
            if (!lastSyEvtR && syEvtR) syRead <= 1;
            else syRead <= 0;
            if (!lastSyEvtW && syEvtW) syWrite <= 1;
            else syWrite <= 0;
            lastSyEvtW <= syEvtW;
            lastSyEvtR <= syEvtR;
        end
    end
   
    always @(negedge clk) begin
        if (clkDiv == 2'b11) begin
            hPosBuf <= hPos;
            vPosBuf <= vPos;
            _vDat <= sram_d;
        end
        else if (clkDiv == 2'b01 && syRead) _syDatOut <= sram_d;
    end
   
   
   
    reg [18:0] _syAdr;
    reg [7:0] _syDatIn;
    reg syEvtW,syEvtR;
   
    always @(posedge syClk) begin
        if (!syCs) begin
            _syAdr <= syAdr;
            if (syRw) syEvtR <= 1;
            else syEvtR <= 0;
        end
        else syEvtR <= 0;
    end
   
    always @(negedge syClk) begin
        if (!syRw && !syCs) begin
            _syDatIn <= syDat;
            syEvtW <= 1;
        end
        else syEvtW <= 0;
    end



    ///////////////////////////////////////////Text Mode///////////////////////////////////////////   
   
    wire [11:0] _txtRGB;
    textMode tm (
        .vClk(clkDiv[1]),
        .fontColors(clrReg),
        .char(_vDat),
        .row(hPos[2:0]),
        .col(vPos[2:0]),
        .pxOut(_txtRGB)
    );
   
    /////////////////////////////////////////Color Palette///////////////////////////////////////////
    wire [11:0] _bmRGB;
    colorPalette cp(
        _vDat,
        _bmRGB);
       
    ////////////////////////////////////HDMI Encoder////////////////////////////////////////////////   
    reg [11:0] _rgb;
    hdmiEncoder enc( 
        .vClk(vClk),
        .hdClk(hdmiClk),
        .vSync(vSync),
        .hSync(hSync),
        .vAct(vAct),
        .rgb(_rgb),
        .TMDSp(TMDSp),
        .TMDSn(TMDSn),
        .TMDSp_Clk(TMDSp_clock),
        .TMDSn_Clk(TMDSn_clock));
   
endmodule


Posted: Thu Jan 06, 2022 3:48 pm
Can I ask why the CPU is given priority when accessing VRAM?

I mean, the video output always needs to get its data on time to prevent artifacts, while the CPU can easily be stalled with next to no consequences.
So it seems like it would make more sense to give the video output priority and, when the CPU accesses VRAM, pause it until the next available memory cycle.
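
Just to sketch what I mean (a rough illustration with made-up names such as vram_arbiter, vid_req and cpu_rdy, not something from the code posted above), the priority logic itself could be as small as this:

```verilog
// Hypothetical sketch: video always wins a memory cycle; the CPU is held
// off with RDY whenever it collides with a video fetch.
module vram_arbiter (
    input  wire vid_req,    // video needs this memory cycle
    input  wire cpu_req,    // CPU has a VRAM access pending
    output wire vid_grant,  // video owns the SRAM this cycle
    output wire cpu_grant,  // CPU owns the SRAM this cycle
    output wire cpu_rdy     // to the CPU's RDY input: low stalls the CPU
);
    assign vid_grant = vid_req;
    assign cpu_grant = cpu_req && !vid_req;
    // Stall only when the CPU wants the RAM in a cycle that video has taken.
    assign cpu_rdy   = !(cpu_req && vid_req);
endmodule
```

One caveat: as far as I know, the NMOS 6502 ignores RDY during write cycles while the 65C02 honours it, so on an NMOS part writes would need buffering or clock stretching instead. The signals would also have to be aligned with the CPU clock before driving RDY.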

