6502.org Forum  Projects  Code  Documents  Tools  Forum
All times are UTC




Post new topic Reply to topic  [ 609 posts ]  Go to page Previous  1 ... 28, 29, 30, 31, 32, 33, 34 ... 41  Next
Author Message
PostPosted: Fri Apr 19, 2013 7:32 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Yes, I think so. I haven't looked at the whole block of code, but I imagine you could do something like this:
Code:
  LOAD:
         begin
            x0 <= x0t;
            y0 <= y0t;
            x1 <= x1t;
            y1 <= y1t;
            if (y0t > y1t)
               dyneg <= 1;
            else dyneg <= 0;
         end

or (depending on your preferred style)
Code:
dyneg <= (y0t > y1t);

Because y0/y1 are not valid until the next clock cycle, you have to look at y0t and y1t instead.
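
A minimal sketch of why that is, using a hypothetical module (nb_demo and load are illustrative names, not part of LineGen). A non-blocking assignment samples its right-hand side at the clock edge and updates the left-hand side afterwards, so a comparison of y0/y1 in the same clocked block still sees the previous line's values:

```verilog
// Hypothetical demo module, not code from the thread.
module nb_demo ( input clk,
                 input load,
                 input [8:0] y0t, y1t,
                 output reg [8:0] y0, y1,
                 output reg dyneg_stale,   // compares the registers being loaded
                 output reg dyneg          // compares the values being loaded
               );

always @(posedge clk)
   if (load) begin
      y0 <= y0t;
      y1 <= y1t;
      dyneg_stale <= (y0 > y1);      // still sees y0/y1 from before this edge
      dyneg       <= (y0t > y1t);    // agrees with the y0/y1 that appear after this edge
   end

endmodule
```

After the edge where load is high, dyneg describes the freshly loaded endpoints, while dyneg_stale describes whatever was in y0/y1 one line earlier.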


PostPosted: Sat Apr 20, 2013 1:19 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...or (depending on your preferred style)...

Only someone who has mastered something can have a style. Mastery is knowing not only what to do but, more importantly, what not to do. Right now I only know what looks right in ISim, and I look at the console. I also check the design summary. Regular runs of SmartXplorer are common at this stage; I have it set to 24 runs max. To Xilinx's credit, it always seems to find a way to keep that speed consistent, although it has taken 18 runs a couple of times. Very smart program, I think it learns...
But sometimes after long runs of SmartXplorer, my computer crashes with a blue screen of death and saves corrupted data. Last time it cut off all the vectors at the end of my 65O16.b software program. I was modifying all sorts of code before finally recognizing that the cpu was stuck in a loop because as65 was not seeing the NMI/IRQ/RES vectors. They were gone! Hours wasted retesting code that had already worked!

Anyway, I'll post the most up to date LineGen code in the hopes that maybe someone can find a way to speed it up, although it is plenty fast. A cycle here or there can have great effects in the end though:
Code:
module LineGen ( input clk,
                 input lineCS,
                 input [15:0] cpuDO,
                 input [1:0] cpuAB,
                 output reg RAMWE,
                 output reg [9:0] X,
                 output reg [8:0] Y   
               );
               
reg [9:0] x0, x1, dx, x0t, x1t;   //internal module registers
reg [8:0] y0, y1, dy, y0t, y1t;
reg [16:0] D;            //error accumulator + carry bit. if D[16] = 1, it is negative
reg steep;               //1 when dy>dx
reg dyneg;               //1 when dy is negative
reg dxneg;                //1 when dx is negative

reg [2:0] state;
parameter WAIT = 0, LOAD = 1, SLOPE = 2, DXDY = 3, CALC1 = 4, CALC2 = 5, PLOT = 6, DELAY = 7;

always @(posedge clk) begin
   if (lineCS && cpuAB == 2'b00)
      x0t <= cpuDO;
   if (lineCS && cpuAB == 2'b01)
      y0t <= cpuDO;
   if (lineCS && cpuAB == 2'b10)
      x1t <= cpuDO;
   if (lineCS && cpuAB == 2'b11)
      y1t <= cpuDO;   
end
               
always @(posedge clk) begin
   state <= WAIT;

      case (state)
         WAIT:
            if (lineCS && cpuAB == 2'b11)
                    state <= LOAD;
               else state <= WAIT;
         LOAD:
            state <= SLOPE;
            
         SLOPE:
            state <= DXDY;
           
         DXDY:                 
            state <= CALC1;   
           
         CALC1:               
            state <= CALC2;   
         
         CALC2:               
            state <= PLOT;     
           
         PLOT:
            //steep lines, e.g. (0,0) to (2,10) or (0,20) to (2,0), finish on y
            //shallow lines, e.g. (0,0) to (10,10) or (0,10) to (10,0), finish on x
            if (steep ? (y0 != y1) : (x0 != x1))
                    state <= DELAY;
               else state <= WAIT;
         
         DELAY:
            state <= CALC2;
            
      endcase
end

always @(posedge clk) begin
   case (state)
      WAIT:
         begin
            RAMWE <= 0;
            if (x0t > x1t)
               dxneg <= 1;
               else dxneg <= 0;
         end
         
      LOAD:
         if (dxneg)
            begin
               x0 <= x1t;
               y0 <= y1t;
               x1 <= x0t;
               y1 <= y0t;               
            end
            else begin
               x0 <= x0t;
               y0 <= y0t;
               x1 <= x1t;
               y1 <= y1t;
            end
     
      SLOPE:
         if (y0 > y1)
              dyneg <= 1;
         else dyneg <= 0;
         
      DXDY:               
         begin           
            dx <= x1 - x0;
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
            if (dyneg)
                 dy <= y0 - y1;
            else dy <= y1 - y0;
         end
         
      CALC1:           
         if (dx >= dy) begin   
            steep <= 0;
            D <= (dy*2 - dx);   
         end
            else begin
               steep <= 1;
               D <= (dx*2 - dy);   
            end
     
      CALC2:
         begin
         RAMWE <= 0;
         if (steep) begin   
            if ( D[16] == 0 ) begin     
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dx*2 - dy*2);   
            end
               else begin
                  y0 <= dyneg ? y0 - 1:
                                y0 + 1;   
                  D <= D + dx*2;
               end
         end
            else if ( D[16] == 0 ) begin   
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dy*2 - dx*2);   
            end
               else begin
                  x0 <= x0 + 1;   
                  D <= D + dy*2;
               end
         end
         
      PLOT:   
         begin
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
         end
         
      DELAY:
         RAMWE <= 1;
               
   endcase

end

endmodule

I actually did have to delay the write to the RAM by 1 cycle from LineGen (as you can see). I now have problems with data output from the cpu.

My SRAMif module has changed for the better, but as I said there are problems, and now it looks nasty to me to use 'if' statements when using a module for a RAM interface:
Code:
module SRAMif( input clk,
               input vramCS,
               input cpuWE,
               input RAMWE,
               inout [15:0] SRD,
               input [15:0] cpuDO,
               input [15:0] BACCout,
               input [20:0] Vaddr,
               input [31:0] cpuAB,
               input [9:0] X,
               input [8:0] Y,
               output [20:0] SRaddr,
               output reg [15:0] SRDO,
               output SRWEn
               );

always @(posedge clk)
   if (vramCS && !cpuWE && !RAMWE)
         SRDO <= SRD;                     //when reading from SyncRAM, put data into reg
      else if (!vramCS)
         SRDO <= 16'h0000;                     //else reg will output zeroes when vram is not selected

reg [20:0] cpuABopt;
   
always @* begin                           //optimize the videoRAM address for plotting (X,Y) in the (LSB,MSB) for indirect indexed
   cpuABopt [20:19] = 2'b00;               //bank bits (blocking assignments in a combinational block)
   cpuABopt [18:10] = cpuAB [24:16];      //Y[8:0]
   cpuABopt [9:0] = cpuAB [9:0];         //X[9:0]
end
   
assign SRaddr = RAMWE ? {2'b00, Y, X} : vramCS ? cpuABopt :   Vaddr;            //MUX the SyncRAM address for video timing & cpu access
assign SRWEn = ~((vramCS && cpuWE) || RAMWE);         //SyncRAM write enable, active low during a write to video memory by cpu
assign SRD = SRWEn ? 16'hZZZZ : BACCout;      //I/O MUX'd latch to SyncRAM databus. High 'Z' during a read   

endmodule

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Sat Apr 20, 2013 2:30 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I see I can eliminate SLOPE with your previous suggestion and test dyneg for the initial y0t and y1t values at the same time dxneg is tested.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Sun Apr 21, 2013 12:59 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I'm still having issues with an error I need to clean up in ISim, but intuitively the following should work. This is what I meant: comparators are not clock driven; they are purely combinatorial, i.e. no FFs are needed for a working circuit.
Code:
module LineGen ( input clk,
                 input lineCS,
                 input [15:0] cpuDO,
                 input [1:0] cpuAB,
                 output reg RAMWE,
                 output reg [9:0] X,
                 output reg [8:0] Y   
               );
               
reg [9:0] x0, x1, dx, x0t, x1t;   //internal module registers
reg [8:0] y0, y1, dy, y0t, y1t;
reg [16:0] D;            //error accumulator + carry bit. if D[16] = 1, it is negative
reg steep;               //1 when dy>dx
reg dyneg;               //1 when dy is negative
reg dxneg;                //1 when dx is negative

reg [2:0] state;
parameter WAIT = 0, LOAD = 1, DXDY = 2, CALC1 = 3, CALC2 = 4, PLOT = 5, DELAY = 6;

always @(posedge clk) begin
   if (lineCS && cpuAB == 2'b00)
      x0t <= cpuDO;
   if (lineCS && cpuAB == 2'b01)
      y0t <= cpuDO;
   if (lineCS && cpuAB == 2'b10)
      x1t <= cpuDO;
   if (lineCS && cpuAB == 2'b11)
      y1t <= cpuDO;   
end

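always @* begin                  //combinational, so use blocking assignments
   dxneg = (x0t > x1t);
   //LOAD swaps the endpoints when dxneg is set, so test y in the swapped order too
   dyneg = dxneg ? (y1t > y0t) : (y0t > y1t);
end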
               
always @(posedge clk) begin
   state <= WAIT;

      case (state)
         WAIT:
            if (lineCS && cpuAB == 2'b11)
                    state <= LOAD;
               else state <= WAIT;
         LOAD:
            state <= DXDY;
            
         DXDY:                 
            state <= CALC1;   
           
         CALC1:               
            state <= CALC2;   
         
         CALC2:               
            state <= PLOT;     
           
         PLOT:
            //steep lines, e.g. (0,0) to (2,10) or (0,20) to (2,0), finish on y
            //shallow lines, e.g. (0,0) to (10,10) or (0,10) to (10,0), finish on x
            if (steep ? (y0 != y1) : (x0 != x1))
                    state <= DELAY;
               else state <= WAIT;
         
         DELAY:
            state <= CALC2;
            
      endcase
end

always @(posedge clk) begin
   case (state)
      WAIT:
         RAMWE <= 0;
         
      LOAD:
         if (dxneg)
            begin
               x0 <= x1t;
               y0 <= y1t;
               x1 <= x0t;
               y1 <= y0t;               
            end
            else begin
               x0 <= x0t;
               y0 <= y0t;
               x1 <= x1t;
               y1 <= y1t;
            end
     
     DXDY:               
         begin           
            dx <= x1 - x0;
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
            if (dyneg)
                 dy <= y0 - y1;
            else dy <= y1 - y0;
         end
         
      CALC1:           
         if (dx >= dy) begin   
            steep <= 0;
            D <= (dy*2 - dx);   
         end
            else begin
               steep <= 1;
               D <= (dx*2 - dy);   
            end
     
      CALC2:
         begin
         RAMWE <= 0;
         if (steep) begin   
            if ( D[16] == 0 ) begin     
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dx*2 - dy*2);   
            end
               else begin
                  y0 <= dyneg ? y0 - 1:
                                y0 + 1;   
                  D <= D + dx*2;
               end
         end
            else if ( D[16] == 0 ) begin   
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dy*2 - dx*2);   
            end
               else begin
                  x0 <= x0 + 1;   
                  D <= D + dy*2;
               end
         end
         
      PLOT:   
         begin
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
         end
         
      DELAY:
         RAMWE <= 1;
               
   endcase

end

endmodule

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Sun Apr 21, 2013 6:31 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
In general, extra flip flops are a good idea. They are practically free, because half of the LUTs include a flip flop at the end, whether you use them or not. Using the flip flops means it will be easier for the tools to synthesize and meet timing.

Of course, a flip flop adds an extra clock cycle, so you have to consider that too. I recommend focusing your attention on the inner loop where you spend most of the time, and try to optimize that so you can draw 1 pixel/cycle once it starts running. The whole initialization/setup portion of the state machine is less critical. If you draw a 400 pixel line, it doesn't matter that much whether that line can be drawn in 401 or 402 cycles total.
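
A minimal sketch of what a one-pixel-per-cycle inner loop could look like. This is an illustration under stated assumptions, not the LineGen code from this thread: the module name line_inner, the start pulse, the run flag and the 12-bit error term are all hypothetical, the setup states are assumed to have produced sorted endpoints (x0 <= x1) plus dx, dy, steep and dyneg, and the RAM is assumed to accept a write on every clock.

```verilog
// Hypothetical sketch: Bresenham inner loop that plots one pixel per clock.
module line_inner ( input clk,
                    input start,              // one-cycle pulse, setup values are valid
                    input [9:0] x0, x1, dx,   // assumed held stable while the line is drawn
                    input [8:0] y0, y1, dy,
                    input steep, dyneg,
                    output reg RAMWE,
                    output reg [9:0] X,
                    output reg [8:0] Y
                  );

reg [9:0] x;
reg [8:0] y;
reg [11:0] D;              // error term, two's complement, D[11] = 1 means negative
reg run = 0;               // 1 while pixels are being plotted

wire [9:0]  major   = steep ? {1'b0, dy} : dx;     // axis stepped on every clock
wire [9:0]  minor   = steep ? dx : {1'b0, dy};     // axis stepped when D >= 0
wire [11:0] inc_neg = {1'b0, minor, 1'b0};                       // 2*minor
wire [11:0] inc_pos = {1'b0, minor, 1'b0} - {1'b0, major, 1'b0}; // 2*minor - 2*major

always @(posedge clk) begin
   RAMWE <= 0;
   if (start) begin
      x   <= x0;
      y   <= y0;
      D   <= inc_neg - {2'b00, major};     // 2*minor - major
      run <= 1;
   end
   else if (run) begin
      RAMWE <= 1;                          // plot the current pixel
      X <= x;
      Y <= y;
      if (steep ? (y == y1) : (x == x1))
         run <= 0;                         // that was the last pixel
      if (steep) begin
         y <= dyneg ? y - 1 : y + 1;
         if (!D[11]) x <= x + 1;
      end
      else begin
         x <= x + 1;
         if (!D[11]) y <= dyneg ? y - 1 : y + 1;
      end
      D <= D + (D[11] ? inc_neg : inc_pos);
   end
end

endmodule
```

The point of the sketch is that the coordinate update, the error update and the plot all happen in the same state, so the CALC2/PLOT/DELAY round trip collapses into a single clock per pixel.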


PostPosted: Fri Apr 26, 2013 2:51 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...I recommend focusing your attention on the inner loop where you spend most of the time, and try to optimize that so you can draw 1 pixel/cycle once it starts running...

That's good advice, I will zero in on that soon!

But I've been wrestling with a few issues, most of which have involved adjusting the design for real-world timing when interfacing the FPGA to an external RAM. I knew my early interface attempt was clearly wrong, although it worked. So I thought I should post now in order to keep the flow, because I am making progress:
Code:
`timescale 1ns / 1ps

module SRAMif( input clk,
               input vramCS,
               input cpuWE,
               inout [15:0] SRD,
               input [15:0] cpuDO,
               input [20:0] Vaddr,
               input [31:0] cpuAB,
               output [20:0] SRaddr,
               output reg [15:0] SRDO,
               output SRWEn
               );

always @(posedge clk) begin
   if (vramCS) begin                      
      if (!cpuWE) begin
         SRDO <= SRD;                     //when reading from SyncRAM, put data into reg
      end
         else begin
            SRDO <= cpuDO;
         end
   end
   else begin
      SRDO <= 16'h0000;                     //else reg will output zeroes when vram is not selected
   end
end

reg [20:0] cpuABopt;
   
always @* begin                           //optimize the videoRAM address for plotting (X,Y) in the (LSB,MSB) for indirect indexed
   cpuABopt [20:19] = 0;                  //bank bits (blocking assignments in a combinational block)
   cpuABopt [18:10] = cpuAB [24:16];      //Y[8:0]
   cpuABopt [9:0] = cpuAB [9:0];         //X[9:0]
end
   
assign SRaddr = vramCS ? cpuABopt :   
                         Vaddr;            //MUX the SyncRAM address for video timing & cpu access
assign SRWEn = !(vramCS && cpuWE);         //SyncRAM write enable, active low during a write to video memory by cpu
assign SRD = SRWEn ? SRDO : 16'hZZZZ;      //I/O MUX'd latch to SyncRAM databus. High 'Z' during a read   

endmodule

This code results in good plotting, believe it or not. In ISim the waveforms look like:


Attachment: tmng.JPG (ISim waveform screenshot)

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502
PostPosted: Fri Apr 26, 2013 2:59 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
You can see the VRAMCS is going active 1 cycle earlier than it should, due to the 'false read', because it's actually a write for the software doing an indirect indexed Y store to memory. I don't think the 'false read' is the problem, because there is going to be a delay of at least one cycle using this RAM. But with a faster RAM? It will be nice to correct another problem inherited from the NMOS 6502.

To me this just doesn't look right, because it would be writing ZZZZ's to the RAM and not the data, which clearly comes 1 cycle after. The write enable strobe should come at the same time as the data is valid, I would think?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Fri Apr 26, 2013 5:50 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
No, that's not looking right. It is probably working because the SRWEn signal is delayed in real life, since it's an unconstrained combinatorial output, so there's some overlap.

What you need to do is get rid of the combinatorial outputs to the SRAM (SRaddr and SRWEn), and replace them with registers. If all the signals are registered, you know they'll all switch at the same time (the registers are all in the IOB, with same delay to pad). Of course, this means that these signals will be delayed by a clock cycle (instead of being delayed by some unknown amount).
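
A minimal sketch of that change, assuming the same signal names as the SRAMif module posted above. The module name sram_regs and the wdata/drive registers are hypothetical, and this is not the SRAM interface mentioned in the edit below:

```verilog
// Hypothetical sketch: address, write enable, data and bus direction are
// all registered, so they change together one clock after the request.
module sram_regs ( input clk,
                   input vramCS,
                   input cpuWE,
                   input [20:0] cpuABopt,
                   input [20:0] Vaddr,
                   input [15:0] cpuDO,
                   inout [15:0] SRD,
                   output reg [20:0] SRaddr,
                   output reg SRWEn
                 );

reg [15:0] wdata;          // registered write data
reg drive;                 // 1 while the FPGA drives the data bus

always @(posedge clk) begin
   SRaddr <= vramCS ? cpuABopt : Vaddr;
   SRWEn  <= !(vramCS && cpuWE);       // active low during a cpu write
   wdata  <= cpuDO;
   drive  <= vramCS && cpuWE;
end

assign SRD = drive ? wdata : 16'hZZZZ; // high 'Z' whenever we are not writing

endmodule
```

Because drive and SRWEn come from the same condition on the same clock edge, the data bus is driven exactly while the write enable is low.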

Edit: here is my SRAM interface again. It works with async RAM, but it shouldn't be too hard to modify for the sync RAM. It has a read-only port (ch1) for VGA read access, and a read/write port (ch2) for CPU access. All outputs to SRAM are registered.


PostPosted: Fri Apr 26, 2013 10:56 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Maybe it is time I abandon my 1st attempt and try modifying your state machine again, since now I have a better understanding of how they work... But what has to be delayed? You send the clk, address and the WE, then delay the data, right? Or do you delay the WE and the data after sending the clk and address?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Fri Apr 26, 2013 11:12 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
When writing, everything gets sent at the same time, clock, address, data, we. When reading, the data valid signal is delayed to match the delayed data.
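
A minimal sketch of the read side, assuming the data comes back one clock after the registered address goes out; a RAM with more latency would need a longer delay chain. The names read_valid, rd_req, rd_pending, rd_valid and rd_data are hypothetical:

```verilog
// Hypothetical sketch: the valid flag is delayed by the same number of
// clocks as the data, so the two always line up.
module read_valid ( input clk,
                    input rd_req,              // a read is being issued this cycle
                    input [15:0] SRD,          // data bus from the RAM
                    output reg rd_valid,
                    output reg [15:0] rd_data
                  );

reg rd_pending;            // the address of a read is on the RAM pins

always @(posedge clk) begin
   rd_pending <= rd_req;            // address is registered out on this edge
   rd_valid   <= rd_pending;        // data is captured one clock later
   if (rd_pending)
      rd_data <= SRD;
end

endmodule
```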


PostPosted: Sat Apr 27, 2013 7:31 pm 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ah I see that now, 'state' is 1 cycle behind 'next' and the valid signals are testing the 'state'. I'll try to start on it tonight. Also later, I'll have to consider there are 3 possible addresses going to the RAM. By priority, they are from the Line Generator, cpu, and HVSync generator. The data outputted is always from the B Accumulator.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Sat Apr 27, 2013 9:41 pm 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I modified your module. There are several things I did.
Got rid of:
1) The banking
2) Read only channel 1 logic
3) VGA1 state
4) The OE signal is commented out because my RAM has OE tied active low.

Does it look functional?

Instead of running Sims on the module alone, I decided to reconcile my SRAMif module IO signals to your SRAM controller module IO signals, in order to fit your module into my project.

EDIT: Also, the structure of your state machine is a little different from the one I adapted from the orsoc opencores project. I'm going to try to redo my Line Generator module using that structure and observe any differences in resources used.


Attachment: sram_mod.txt [2.1 KiB]

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502
PostPosted: Sun Apr 28, 2013 6:30 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Looks reasonable at first sight, but why did you get rid of the second channel ? I assume you also need multiple channels for your CPU, VGA and line generator. How are you planning to attach those to the SRAM ?


PostPosted: Sun Apr 28, 2013 9:05 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
Looks reasonable at first sight, but why did you get rid of the second channel ?...

The read only VGA1 channel seemed to have priority over all the other states, since it is tested first. This was a conflict in the way I thought my system should work.
Arlet wrote:
... I assume you also need multiple channels for your CPU, VGA and line generator. How are you planning to attach those to the SRAM ?

I was thinking of an address MUX for the WRITE1 state for the Line Generator and cpu and another address MUX for the READ1 state for the cpu and HVSync generator. I'll probably have to add more states to correctly do this, but I just wanted to at least get the cpu writing to the RAM correctly first with the smallest amount of hardware in the way.

EDIT: Maybe I should put the read only channel back in for the HVSync generator, but just give it last priority. What channel did you use to get the pixel counter out to the RAM?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Sun Apr 28, 2013 9:33 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The advantage of having multiple channels in the SRAM module is that you can prioritize traffic, but also optimize read/write actions. For instance, going from read -> write needs a dummy cycle to avoid bus conflicts, but going from read -> read doesn't. By putting all the decisions in one module, you can balance the priorities better. But there are also other ways to solve this. You could have one read channel and one write channel in the SRAM module, and provide priority indication with your read/write enables (0=no read request, 1 = read request priority 1, 2 = read request priority 2, ...and so on). The SRAM state machine would then switch from read -> write if the write was higher priority, but stay in read mode if it wasn't.
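
A minimal sketch of the priority-coded variant, with one read channel and one write channel. Everything here is an assumption for illustration: the module name rw_arbiter, the 2-bit encoding (0 = no request, larger = more urgent), the state names, and the rule that requesters hold their request until it is served.

```verilog
// Hypothetical sketch: stay in READ while reads have priority, and spend one
// dummy cycle (TURN) before a write so the data bus can change direction.
module rw_arbiter ( input clk,
                    input [1:0] rd_prio,       // 0 = no read request
                    input [1:0] wr_prio,       // 0 = no write request
                    output reading,
                    output writing
                  );

reg [1:0] state = 0;
parameter IDLE = 0, READ = 1, TURN = 2, WRITE = 3;

always @(posedge clk)
   case (state)
      IDLE:
         if (wr_prio > rd_prio)      state <= WRITE;
         else if (rd_prio != 0)      state <= READ;

      READ:
         if (wr_prio > rd_prio)      state <= TURN;    // write outranks the pending read
         else if (rd_prio == 0)      state <= IDLE;    // otherwise keep reading back to back

      TURN:
         state <= WRITE;                               // dummy cycle: read -> write

      WRITE:
         state <= IDLE;                                // no write bursts in this sketch
   endcase

assign reading = (state == READ);
assign writing = (state == WRITE);

endmodule
```

With equal nonzero priorities the read wins, which keeps back-to-back reads free of dummy cycles.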

I figure VGA needs a high priority, since it has real-time requirements to keep the output going. An underrun on your pixel output will result in bad video output. For the CPU/line drawing it doesn't really matter if they are delayed a little bit.

Note that in my state machine, there are no write bursts, forcing the WE# to be deasserted after every write. I assume your syncram will also allow write bursts.

