6502.org

Posted: **Fri Apr 19, 2013 7:32 pm**

Yes, I think so. I haven't looked at the whole block of code, but I imagine you could do something like this:

Code: Select all

LOAD:
   begin
      x0 <= x0t;
      y0 <= y0t;
      x1 <= x1t;
      y1 <= y1t;
      if (y0t > y1t)
         dyneg <= 1;
      else dyneg <= 0;

or (depending on your preferred style)

Code: Select all

dyneg <= (y0t > y1t);

Because y0/y1 are not valid until the next clock cycle, you have to look at y0t and y1t instead.

Posted: **Sat Apr 20, 2013 1:19 am**

Arlet wrote:

...or (depending on your preferred style)...

Only someone who has mastered something can have a style. Mastery is knowing not only what to do, but more importantly what not to do. Right now I only know what looks right in ISim, and I look at the console. Also I check the design summary. Regular runs of smartexplorer are common at this stage, I have it set to 24 runs max. To Xilinx' credit, it always seems to find a way to keep that speed consistent but has gotten to 18 runs a couple times. Very smart program, I think it learns...
But sometimes after long runs of smartexplorer, my computer crashes and erroneously saves data after a blue screen of death. Last time it cut off all the vectors at the end of my 65O16.b software program. I was modifying all sorts of code before I finally recognized the cpu was stuck in a loop because as65 was not seeing the NMI/IRQ/RES vectors. They were gone! Lines of code lost, and hours of mine wasted testing things that had worked!

Anyway, I'll post the most up-to-date LineGen code in the hope that someone can find a way to speed it up, although it is already plenty fast. A cycle here or there can have great effects in the end though:

Code: Select all

module LineGen ( input clk,
					  input lineCS,
					  input [15:0] cpuDO,
					  input [1:0] cpuAB,
					  output reg RAMWE,
                 output reg [9:0] X,
					  output reg [8:0] Y   
               );
               
reg [9:0] x0, x1, dx, x0t, x1t;   //internal module registers
reg [8:0] y0, y1, dy, y0t, y1t;
reg [16:0] D;            //error accumulator + carry bit. if D[16] = 1, it is negative
reg steep;               //1 when dy>dx
reg dyneg;               //1 when dy is negative
reg dxneg;					 //1 when dx is negative

reg [2:0] state;
parameter WAIT = 0, LOAD = 1, SLOPE = 2, DXDY = 3, CALC1 = 4, CALC2 = 5, PLOT = 6, DELAY = 7;

always @(posedge clk) begin
	if (lineCS && cpuAB == 2'b00) 
		x0t <= cpuDO;
	if (lineCS && cpuAB == 2'b01) 
		y0t <= cpuDO;
	if (lineCS && cpuAB == 2'b10) 
		x1t <= cpuDO;
	if (lineCS && cpuAB == 2'b11)
		y1t <= cpuDO;	
end
					
always @(posedge clk) begin
   state <= WAIT;

      case (state)
			WAIT:
				if (lineCS && cpuAB == 2'b11)
						  state <= LOAD;
					else state <= WAIT;
			LOAD:
				state <= SLOPE;
				
         SLOPE:
            state <= DXDY;
           
         DXDY:                 
            state <= CALC1;   
           
         CALC1:               
            state <= CALC2;   
         
         CALC2:               
            state <= PLOT;     
           
         PLOT:
         begin
            if (!dyneg && steep)           //e.g. (0,0) to (2,10)
               if (y0 != y1)
                    state <= DELAY;
               else state <= WAIT;
               
            if (!dyneg && !steep)         //e.g. (0,0) to (10,10)
               if (x0 != x1)
                    state <= DELAY;
               else state <= WAIT;
       
            if (dyneg && !steep)          //e.g. (0,10) to (10,0)
               if (x0 != x1)
                    state <= DELAY;
               else state <= WAIT;

            if (dyneg && steep)            //e.g. (0,20) to (2,0)
               if (y0 != y1)
                    state <= DELAY;
               else state <= WAIT;
			end
			
			DELAY:
				state <= CALC2;
				
      endcase
end

always @(posedge clk) begin
   case (state)
		WAIT:
			begin
				RAMWE <= 0;
				if (x0t > x1t)
					dxneg <= 1;
					else dxneg <= 0;
			end
			
		LOAD:
			if (dxneg)
				begin
					x0 <= x1t;
					y0 <= y1t;
					x1 <= x0t;
					y1 <= y0t;					
				end
				else begin
					x0 <= x0t;
					y0 <= y0t;
					x1 <= x1t;
					y1 <= y1t;
				end
     
      SLOPE:
         if (y0 > y1)
              dyneg <= 1;
         else dyneg <= 0;
         
      DXDY:               
         begin           
            dx <= x1 - x0;
				RAMWE <= 1;
            X <= x0;
            Y <= y0;
            if (dyneg)
                 dy <= y0 - y1;
            else dy <= y1 - y0;
         end
         
      CALC1:           
         if (dx >= dy) begin   
            steep <= 0;
            D <= (dy*2 - dx);   
         end
            else begin
               steep <= 1;
               D <= (dx*2 - dy);   
            end
     
      CALC2:
			begin
			RAMWE <= 0;
         if (steep) begin   
            if ( D[16] == 0 ) begin     
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dx*2 - dy*2);   
            end
               else begin
                  y0 <= dyneg ? y0 - 1:
                                y0 + 1;   
                  D <= D + dx*2;
               end
         end
            else if ( D[16] == 0 ) begin   
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dy*2 - dx*2);   
            end
               else begin
                  x0 <= x0 + 1;   
                  D <= D + dy*2;
               end
			end
         
      PLOT:   
         begin
				RAMWE <= 1;
            X <= x0;
            Y <= y0;
         end
			
		DELAY:
			RAMWE <= 1;
					
   endcase

end

endmodule

I actually did have to delay the write to the RAM 1 cycle from LineGen (as you can see). I have problems now with data outputted from the cpu.

My SRAMif module has changed for the better, but as I said there are problems, and now it looks nasty to me to use 'if' statements when using a module for a RAM interface:

Code: Select all

module SRAMif( input clk,
					input vramCS,
					input cpuWE,
					input RAMWE,
					inout [15:0] SRD,
					input [15:0] cpuDO,
					input [15:0] BACCout,
					input [20:0] Vaddr,
					input [31:0] cpuAB,
					input [9:0] X,
					input [8:0] Y,
					output [20:0] SRaddr,
					output reg [15:0] SRDO,
					output SRWEn
					);

always @(posedge clk)
	if (vramCS && !cpuWE && !RAMWE)
			SRDO <= SRD;							//when reading from SyncRAM, put data into reg
		else if (!vramCS)
			SRDO <= 16'h0000;							//else reg will output zeroes when vram is not selected

reg [20:0] cpuABopt;
	
always @* begin									//optimize the videoRAM address for plotting (X,Y) in the (LSB,MSB) for indirect indexed
	cpuABopt [20:19] = 2'b00;					//bank bits (blocking assignments in a combinational block)
	cpuABopt [18:10] = cpuAB [24:16];		//Y[8:0]
	cpuABopt [9:0] = cpuAB [9:0];			//X[9:0]
end
	
assign SRaddr = RAMWE ? {2'b00, Y, X} : vramCS ? cpuABopt :	Vaddr;				//MUX the SyncRAM address for video timing & cpu access
assign SRWEn = ~((vramCS && cpuWE) || RAMWE);			//SyncRAM write enable, active low during a write to video memory by cpu
assign SRD = SRWEn ? 16'hZZZZ : BACCout;		//I/O MUX'd latch to SyncRAM databus. High 'Z' during a read	

endmodule

Posted: **Sat Apr 20, 2013 2:30 am**

I see I can eliminate SLOPE with your previous suggestion and test dyneg on the initial y0t and y1t values at the same time dxneg is tested.

Posted: **Sun Apr 21, 2013 12:59 am**

I'm still having issues with an error I need to clean up in ISim, but intuitively the following should work. This is what I meant earlier: comparators are not clock driven. They are purely combinatorial, i.e. no FFs are needed for a successful circuit.

Code: Select all

module LineGen ( input clk,
					  input lineCS,
					  input [15:0] cpuDO,
					  input [1:0] cpuAB,
					  output reg RAMWE,
                 output reg [9:0] X,
					  output reg [8:0] Y   
               );
               
reg [9:0] x0, x1, dx, x0t, x1t;   //internal module registers
reg [8:0] y0, y1, dy, y0t, y1t;
reg [16:0] D;            //error accumulator + carry bit. if D[16] = 1, it is negative
reg steep;               //1 when dy>dx
reg dyneg;               //1 when dy is negative
reg dxneg;					 //1 when dx is negative

reg [2:0] state;
parameter WAIT = 0, LOAD = 1, DXDY = 2, CALC1 = 3, CALC2 = 4, PLOT = 5, DELAY = 6;

always @(posedge clk) begin
	if (lineCS && cpuAB == 2'b00) 
		x0t <= cpuDO;
	if (lineCS && cpuAB == 2'b01) 
		y0t <= cpuDO;
	if (lineCS && cpuAB == 2'b10) 
		x1t <= cpuDO;
	if (lineCS && cpuAB == 2'b11)
		y1t <= cpuDO;	
end

always @* begin
	if (x0t > x1t)
		dxneg = 1;
			else dxneg = 0;
	if (y0t > y1t)
      dyneg = 1;
         else dyneg = 0;
end
					
always @(posedge clk) begin
   state <= WAIT;

      case (state)
			WAIT:
				if (lineCS && cpuAB == 2'b11)
						  state <= LOAD;
					else state <= WAIT;
			LOAD:
				state <= DXDY;
				
         DXDY:                 
            state <= CALC1;   
           
         CALC1:               
            state <= CALC2;   
         
         CALC2:               
            state <= PLOT;     
           
         PLOT:
         begin
            if (!dyneg && steep)           //e.g. (0,0) to (2,10)
               if (y0 != y1)
                    state <= DELAY;
               else state <= WAIT;
               
            if (!dyneg && !steep)         //e.g. (0,0) to (10,10)
               if (x0 != x1)
                    state <= DELAY;
               else state <= WAIT;
       
            if (dyneg && !steep)          //e.g. (0,10) to (10,0)
               if (x0 != x1)
                    state <= DELAY;
               else state <= WAIT;

            if (dyneg && steep)            //e.g. (0,20) to (2,0)
               if (y0 != y1)
                    state <= DELAY;
               else state <= WAIT;
			end
			
			DELAY:
				state <= CALC2;
				
      endcase
end

always @(posedge clk) begin
   case (state)
		WAIT:
			RAMWE <= 0;
			
		LOAD:
			if (dxneg)
				begin
					x0 <= x1t;
					y0 <= y1t;
					x1 <= x0t;
					y1 <= y0t;					
				end
				else begin
					x0 <= x0t;
					y0 <= y0t;
					x1 <= x1t;
					y1 <= y1t;
				end
     
     DXDY:               
         begin           
            dx <= x1 - x0;
				RAMWE <= 1;
            X <= x0;
            Y <= y0;
            if (dyneg)
                 dy <= y0 - y1;
            else dy <= y1 - y0;
         end
         
      CALC1:           
         if (dx >= dy) begin   
            steep <= 0;
            D <= (dy*2 - dx);   
         end
            else begin
               steep <= 1;
               D <= (dx*2 - dy);   
            end
     
      CALC2:
			begin
			RAMWE <= 0;
         if (steep) begin   
            if ( D[16] == 0 ) begin     
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dx*2 - dy*2);   
            end
               else begin
                  y0 <= dyneg ? y0 - 1:
                                y0 + 1;   
                  D <= D + dx*2;
               end
         end
            else if ( D[16] == 0 ) begin   
               x0 <= x0 + 1;           
               y0 <= dyneg ? y0 - 1:
                             y0 + 1;           
               D <= D + (dy*2 - dx*2);   
            end
               else begin
                  x0 <= x0 + 1;   
                  D <= D + dy*2;
               end
			end
         
      PLOT:   
         begin
				RAMWE <= 1;
            X <= x0;
            Y <= y0;
         end
			
		DELAY:
			RAMWE <= 1;
					
   endcase

end

endmodule
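
One thing worth checking in the listing above: LOAD swaps the two endpoints when dxneg is set, but dyneg is now computed from the unswapped y0t/y1t, so after a swap it would point the wrong way. A minimal sketch of one way to account for that is below. The module name SlopeSignsSketch is hypothetical and only there to make the fragment self-contained. The two assigns are the point.

```verilog
// Sketch only: sign flags that agree with the endpoint swap done in LOAD.
// SlopeSignsSketch is a hypothetical name, not part of the posted design.
module SlopeSignsSketch (
    input  [9:0] x0t, x1t,
    input  [8:0] y0t, y1t,
    output       dxneg,     // 1 when the endpoints will be swapped in LOAD
    output       dyneg      // 1 when Y decreases along the line after the swap
);
    assign dxneg = (x0t > x1t);
    // After a swap, y0 holds y1t and y1 holds y0t, so compare in that order.
    assign dyneg = dxneg ? (y1t > y0t) : (y0t > y1t);
endmodule
```

With no swap this reduces to the (y0t > y1t) test already suggested earlier in the thread.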

Posted: **Sun Apr 21, 2013 6:31 am**

In general, extra flip flops are a good idea. They are practically free, because half of the LUTs include a flip flop at the end, whether you use them or not. Using the flip flops means it will be easier for the tools to synthesize and meet timing.

Of course, a flip flop adds an extra clock cycle, so you have to consider that too. I recommend focusing your attention on the inner loop where you spend most of the time, and try to optimize that so you can draw 1 pixel/cycle once it starts running. The whole initialization/setup portion of the state machine is less critical. If you draw a 400 pixel line, it doesn't matter that much whether that line can be drawn in 401 or 402 cycles total.
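
As a rough illustration of the 1 pixel/cycle idea, here is a minimal sketch of just the inner loop. It is not the LineGen module from this thread. It assumes the setup states have already produced dx, dy and dyneg and hold them stable while the line is drawn, that xs <= xe and dy <= dx (the steep case is left out), and that the RAM can take a write on every clock. The names LineLoopSketch, start, xs, xe, ys and xend are made up for the example.

```verilog
// Sketch only: Bresenham inner loop that presents one pixel per clock.
// Shallow lines only (dy <= dx), endpoints already ordered so xs <= xe.
module LineLoopSketch (
    input             clk,
    input             start,   // hypothetical one-clock pulse to begin a line
    input      [9:0]  xs, xe,  // start and end X
    input      [8:0]  ys,      // start Y
    input      [9:0]  dx,      // xe - xs, held stable during the line
    input      [9:0]  dy,      // |ye - ys|, held stable during the line
    input             dyneg,   // 1 when Y decreases along the line
    output reg        RAMWE,   // high while X,Y hold a pixel to write
    output reg [9:0]  X,
    output reg [8:0]  Y
);
    reg [12:0] D;      // error term in two's complement; D[12] = 1 means negative
    reg [9:0]  xend;

    initial RAMWE = 0;

    always @(posedge clk) begin
        if (start) begin
            X     <= xs;
            Y     <= ys;
            xend  <= xe;
            D     <= {dy, 1'b0} - dx;           // 2*dy - dx
            RAMWE <= 1;                         // first pixel is presented after this edge
        end
        else if (RAMWE) begin
            if (X == xend)
                RAMWE <= 0;                     // last pixel has been presented
            else begin
                X <= X + 1;                     // X advances every clock
                if (D[12] == 0) begin           // error >= 0: step Y as well
                    Y <= dyneg ? Y - 1 : Y + 1;
                    D <= D + {dy, 1'b0} - {dx, 1'b0};
                end
                else
                    D <= D + {dy, 1'b0};
            end
        end
    end
endmodule
```

In this form a line with dx+1 pixels keeps RAMWE high for dx+1 clocks, so the setup cycles before start are the only overhead.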

Posted: **Fri Apr 26, 2013 2:51 am**

Arlet wrote:

...I recommend focusing your attention on the inner loop where you spend most of the time, and try to optimize that so you can draw 1 pixel/cycle once it starts running...

That's good advice, I will zero in on that soon!

But I've been wrestling with a few issues, most of which have involved adjusting the design for real-world timing when interfacing the FPGA to an external RAM. I knew my early interface attempt was clearly wrong, although it worked. So I thought I should post now in order to keep the flow, because I am making progress:

Code: Select all

`timescale 1ns / 1ps

module SRAMif( input clk,
					input vramCS,
					input cpuWE,
					inout [15:0] SRD,
					input [15:0] cpuDO,
					input [20:0] Vaddr,
					input [31:0] cpuAB,
					output [20:0] SRaddr,
					output reg [15:0] SRDO,
					output SRWEn
					);

always @(posedge clk) begin
	if (vramCS) begin 							
		if (!cpuWE) begin
			SRDO <= SRD;							//when reading from SyncRAM, put data into reg
		end
			else begin
				SRDO <= cpuDO;
			end
	end
	else begin
		SRDO <= 16'h0000;							//else reg will output zeroes when vram is not selected
	end
end

reg [20:0] cpuABopt;
	
always @* begin									//optimize the videoRAM address for plotting (X,Y) in the (LSB,MSB) for indirect indexed
	cpuABopt [20:19] = 0;						//bank bits (blocking assignments in a combinational block)
	cpuABopt [18:10] = cpuAB [24:16];		//Y[8:0]
	cpuABopt [9:0] = cpuAB [9:0];			//X[9:0]
end
	
assign SRaddr = vramCS ? cpuABopt :	
								 Vaddr;				//MUX the SyncRAM address for video timing & cpu access
assign SRWEn = !(vramCS && cpuWE);			//SyncRAM write enable, active low during a write to video memory by cpu
assign SRD = SRWEn ? SRDO : 16'hZZZZ;		//I/O MUX'd latch to SyncRAM databus. High 'Z' during a read	

endmodule

This code results in good plotting, believe it or not. In ISim the waveforms look like:

Posted: **Fri Apr 26, 2013 2:59 am**

You can see that VRAMCS goes active 1 cycle earlier than it should because of the 'false read': the access is actually a write, from software doing an indirect indexed Y store to memory. I don't think the 'false read' is the problem, because there is going to be a delay of at least one cycle using this RAM, but what about a faster RAM? It will be nice to correct another problem from the NMOS 6502.

To me this just doesn't look right, because it would be writing ZZZZ's to the RAM and not the data, which clearly comes 1 cycle later. The write enable strobe should come at the same time as the data is valid, I would think?

Posted: **Fri Apr 26, 2013 5:50 am**

No, that's not looking right. It is probably working because the SRWEn signal is delayed in real life, since it's an unconstrained combinatorial output, so there's some overlap.

What you need to do is get rid of the combinatorial outputs to the SRAM (SRaddr and SRWEn), and replace them with registers. If all the signals are registered, you know they'll all switch at the same time (the registers are all in the IOB, with same delay to pad). Of course, this means that these signals will be delayed by a clock cycle (instead of being delayed by some unknown amount).

Edit: here is my SRAM interface again. It works with async RAM, but it shouldn't be too hard to modify for the sync RAM. It has a read-only port (ch1) for VGA read access, and a read/write port (ch2) for CPU access. All outputs to SRAM are registered.

Posted: **Fri Apr 26, 2013 10:56 am**

Maybe it is time I abandon my 1st attempt and try modifying your state machine again, since I now have a better understanding of how they work... But what has to be delayed? You send the clk, address and the WE, then delay the data, right? Or do you delay the WE and the data after sending the clk and address?

Posted: **Fri Apr 26, 2013 11:12 am**

When writing, everything gets sent at the same time: clock, address, data, WE. When reading, the data valid signal is delayed to match the delayed data.
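
To make that concrete, here is a minimal sketch of a fully registered interface. It is not the SRAM controller referred to above. It assumes a RAM whose read data is stable one clock after the registered address appears at the pins, and it ignores the dummy cycle needed when turning the bus around from read to write. The names SRAMregSketch, req, we, addr, wdata, rdata, rvalid, SRDout, SRDoe and rd_pending are all made up for the example.

```verilog
// Sketch only: every signal to the SRAM comes from a register, so address,
// data and WE# all change on the same clock edge.
module SRAMregSketch (
    input             clk,
    input             req,      // hypothetical: access requested this cycle
    input             we,       // hypothetical: 1 = write, 0 = read
    input      [20:0] addr,
    input      [15:0] wdata,
    inout      [15:0] SRD,      // SRAM data bus
    output reg [20:0] SRaddr,
    output reg        SRWEn,    // active low
    output reg [15:0] rdata,
    output reg        rvalid    // high when rdata holds the requested word
);
    reg [15:0] SRDout;          // registered write data
    reg        SRDoe;           // 1 = FPGA drives the data bus
    reg        rd_pending;      // a read was issued on the previous clock

    initial begin
        SRWEn      = 1;
        SRDoe      = 0;
        rd_pending = 0;
        rvalid     = 0;
    end

    // Drive the bus only while writing; high Z otherwise.
    assign SRD = SRDoe ? SRDout : 16'hZZZZ;

    always @(posedge clk) begin
        // Write path: address, data, output enable and WE# registered together.
        SRaddr     <= addr;
        SRDout     <= wdata;
        SRDoe      <= req && we;
        SRWEn      <= !(req && we);
        // Read path: capture the bus every clock, and delay the valid flag
        // by the same amount as the data.
        rd_pending <= req && !we;
        rdata      <= SRD;
        rvalid     <= rd_pending;
    end
endmodule
```

A synchronous RAM with more read latency would need rd_pending stretched into a longer delay line, but the principle of delaying the valid flag alongside the data stays the same.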

Posted: **Sat Apr 27, 2013 7:31 pm**

Ah I see that now, 'state' is 1 cycle behind 'next' and the valid signals are testing the 'state'. I'll try to start on it tonight. Also later, I'll have to consider there are 3 possible addresses going to the RAM. By priority, they are from the Line Generator, cpu, and HVSync generator. The data outputted is always from the B Accumulator.

Posted: **Sat Apr 27, 2013 9:41 pm**

I modified your module. There are several things I did.
Got rid of:
1) The banking
2) Read only channel 1 logic
3) VGA1 state
4) The OE signal is commented out because my RAM has OE tied active low.

Does it look functional?

Instead of running Sims on the module alone, I decided to reconcile my SRAMif module IO signals to your SRAM controller module IO signals, in order to fit your module into my project.

EDIT: Also, the structure of your state machine is a little different from the one I adapted from the orsoc opencores project. I'm going to try to redo my Line Generator module using that structure and observe any differences in resources used.

Posted: **Sun Apr 28, 2013 6:30 am**

Looks reasonable at first sight, but why did you get rid of the second channel ? I assume you also need multiple channels for your CPU, VGA and line generator. How are you planning to attach those to the SRAM ?

Posted: **Sun Apr 28, 2013 9:05 am**

Arlet wrote:

Looks reasonable at first sight, but why did you get rid of the second channel ?...

The read only VGA1 channel seemed to have priority over all the other states, since it is tested first. This was a conflict in the way I thought my system should work.

Arlet wrote:

... I assume you also need multiple channels for your CPU, VGA and line generator. How are you planning to attach those to the SRAM ?

I was thinking of an address MUX for the WRITE1 state for the Line Generator and cpu and another address MUX for the READ1 state for the cpu and HVSync generator. I'll probably have to add more states to correctly do this, but I just wanted to at least get the cpu writing to the RAM correctly first with the smallest amount of hardware in the way.

EDIT: Maybe I should put the read only channel back in for the HVSync generator, but just give it last priority. What channel did you use to get the pixel counter out to the RAM?

Posted: **Sun Apr 28, 2013 9:33 am**

The advantage of having multiple channels in the SRAM module is that you can prioritize traffic, but also optimize read/write actions. For instance, going from read -> write needs a dummy cycle to avoid bus conflicts, but going from read -> read doesn't. By putting all the decisions in one module, you can balance the priorities better. But there are also other ways to solve this. You could have one read channel and one write channel in the SRAM module, and provide priority indication with your read/write enables (0=no read request, 1 = read request priority 1, 2 = read request priority 2, ...and so on). The SRAM state machine would then switch from read -> write if the write was higher priority, but stay in read mode if it wasn't.
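
A minimal sketch of that priority comparison is below. The encoding is an assumption made for the example: 0 means no request, and a larger number means a more urgent request. The names RWArbSketch, rd_pri, wr_pri, do_read and do_write are hypothetical.

```verilog
// Sketch only: choose between one read channel and one write channel
// based on the priority level carried in each request.
module RWArbSketch (
    input  [1:0] rd_pri,    // 0 = no read request, larger = more urgent (assumed)
    input  [1:0] wr_pri,    // 0 = no write request, larger = more urgent (assumed)
    output       do_read,
    output       do_write
);
    // The write wins only when it is strictly more urgent than the read,
    // so on a tie the state machine stays in read mode.
    wire wr_wins = (wr_pri > rd_pri);

    assign do_write = wr_wins;
    assign do_read  = (rd_pri != 0) && !wr_wins;
endmodule
```

The SRAM state machine would still have to insert the dummy cycle on a read -> write switch. This fragment only decides which request goes next.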

I figure VGA needs a high priority, since it has real-time requirements to keep the output going. An underrun on your pixel output will result in bad video output. For the CPU/line drawing it doesn't really matter if they are delayed a little bit.

Note that in my state machine, there are no write bursts, forcing the WE# to be deasserted after every write. I assume your syncram will also allow write bursts.

Concept & Design of 3.3V Parallel 16-bit VGA Boards
