Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Fri Apr 19, 2013 7:32 pm
by Arlet
Yes, I think so. I haven't looked at the whole block of code, but I imagine you could do something like this:
Code: Select all
LOAD:
   begin
      x0 <= x0t;
      y0 <= y0t;
      x1 <= x1t;
      y1 <= y1t;
      if (y0t > y1t)
         dyneg <= 1;
      else dyneg <= 0;
   end
or (depending on your preferred style)
Because y0/y1 are not valid until the next clock cycle, you have to look at y0t and y1t instead.
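The second form mentioned above is not preserved in this copy of the thread. As an assumption about what an alternative style could look like, the comparison result can be assigned directly instead of through if/else. The sketch below shows both forms side by side; the module name, the `load` input and the two output names are hypothetical and only serve to make the fragment self-contained. Both forms compare y0t and y1t, for the reason given above.

```verilog
// Sketch only: module and port names are hypothetical, not from the thread.
module dyneg_styles (
    input            clk,
    input            load,        // assumed high while the state machine is in LOAD
    input      [8:0] y0t, y1t,    // temporary endpoint registers
    output reg       dyneg_a,     // if/else style
    output reg       dyneg_b      // direct-assignment style
);
    always @(posedge clk)
        if (load) begin
            // Style 1: explicit if/else
            if (y0t > y1t) dyneg_a <= 1'b1;
            else           dyneg_a <= 1'b0;
            // Style 2: assign the 1-bit comparison result directly
            dyneg_b <= (y0t > y1t);
        end
endmodule
```

Both registers always hold the same value; the choice is purely one of readability.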
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sat Apr 20, 2013 1:19 am
by ElEctric_EyE
...or (depending on your preferred style)...
Only someone who has mastered something can have a style. Mastery is knowing not only what to do, but more importantly what not to do. Right now I only know what looks right in ISim, and I look at the console. I also check the design summary. Regular runs of SmartExplorer are common at this stage; I have it set to 24 runs max. To Xilinx's credit, it always seems to find a way to keep that speed consistent, although it has taken 18 runs a couple of times. Very smart program, I think it learns...
But sometimes after long runs of SmartExplorer, my computer crashes and saves corrupted data after a blue screen of death. Last time it cut off all the vectors at the end of my 65O16.b software program. I was modifying all sorts of code only to finally recognize that the cpu was in a loop because as65 was not seeing the NMI/IRQ/RES vectors. They were gone! Lines of code lost, and hours wasted testing things that had worked!
Anyway, I'll post the most up-to-date LineGen code in the hope that someone can find a way to speed it up, although it is plenty fast. A cycle here or there can have great effects in the end though:
Code: Select all
module LineGen ( input clk,
                 input lineCS,
                 input [15:0] cpuDO,
                 input [1:0] cpuAB,
                 output reg RAMWE,
                 output reg [9:0] X,
                 output reg [8:0] Y
                 );

reg [9:0] x0, x1, dx, x0t, x1t;   //internal module registers
reg [8:0] y0, y1, dy, y0t, y1t;
reg [16:0] D;                     //error accumulator + carry bit. if D[16] = 1, it is negative
reg steep;                        //1 when dy>dx
reg dyneg;                        //1 when dy is negative
reg dxneg;                        //1 when dx is negative
reg [2:0] state;

parameter WAIT = 0, LOAD = 1, SLOPE = 2, DXDY = 3, CALC1 = 4, CALC2 = 5, PLOT = 6, DELAY = 7;

always @(posedge clk) begin
   if (lineCS && cpuAB == 2'b00)
      x0t <= cpuDO;
   if (lineCS && cpuAB == 2'b01)
      y0t <= cpuDO;
   if (lineCS && cpuAB == 2'b10)
      x1t <= cpuDO;
   if (lineCS && cpuAB == 2'b11)
      y1t <= cpuDO;
end

always @(posedge clk) begin
   state <= WAIT;
   case (state)
      WAIT:
         if (lineCS && cpuAB == 2'b11)
            state <= LOAD;
         else state <= WAIT;
      LOAD:
         state <= SLOPE;
      SLOPE:
         state <= DXDY;
      DXDY:
         state <= CALC1;
      CALC1:
         state <= CALC2;
      CALC2:
         state <= PLOT;
      PLOT:
         begin
            if (!dyneg && steep)       //e.g. (0,0) to (2,10)
               if (y0 != y1)
                  state <= DELAY;
               else state <= WAIT;
            if (!dyneg && !steep)      //e.g. (0,0) to (10,10)
               if (x0 != x1)
                  state <= DELAY;
               else state <= WAIT;
            if (dyneg && !steep)       //e.g. (0,10) to (10,0)
               if (x0 != x1)
                  state <= DELAY;
               else state <= WAIT;
            if (dyneg && steep)        //e.g. (0,20) to (2,0)
               if (y0 != y1)
                  state <= DELAY;
               else state <= WAIT;
         end
      DELAY:
         state <= CALC2;
   endcase
end

always @(posedge clk) begin
   case (state)
      WAIT:
         begin
            RAMWE <= 0;
            if (x0t > x1t)
               dxneg <= 1;
            else dxneg <= 0;
         end
      LOAD:
         if (dxneg)
            begin
               x0 <= x1t;
               y0 <= y1t;
               x1 <= x0t;
               y1 <= y0t;
            end
         else begin
            x0 <= x0t;
            y0 <= y0t;
            x1 <= x1t;
            y1 <= y1t;
         end
      SLOPE:
         if (y0 > y1)
            dyneg <= 1;
         else dyneg <= 0;
      DXDY:
         begin
            dx <= x1 - x0;
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
            if (dyneg)
               dy <= y0 - y1;
            else dy <= y1 - y0;
         end
      CALC1:
         if (dx >= dy) begin
            steep <= 0;
            D <= (dy*2 - dx);
         end
         else begin
            steep <= 1;
            D <= (dx*2 - dy);
         end
      CALC2:
         begin
            RAMWE <= 0;
            if (steep) begin
               if ( D[16] == 0 ) begin
                  x0 <= x0 + 1;
                  y0 <= dyneg ? y0 - 1 : y0 + 1;
                  D <= D + (dx*2 - dy*2);
               end
               else begin
                  y0 <= dyneg ? y0 - 1 : y0 + 1;
                  D <= D + dx*2;
               end
            end
            else if ( D[16] == 0 ) begin
               x0 <= x0 + 1;
               y0 <= dyneg ? y0 - 1 : y0 + 1;
               D <= D + (dy*2 - dx*2);
            end
            else begin
               x0 <= x0 + 1;
               D <= D + dy*2;
            end
         end
      PLOT:
         begin
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
         end
      DELAY:
         RAMWE <= 1;
   endcase
end
endmodule
I actually did have to delay the write to the RAM 1 cycle from LineGen (as you can see). I have problems now with data outputted from the cpu.
My SRAMif module has changed for the better, but as I said there are problems, and now it looks nasty to me to use 'if' statements when using a module for a RAM interface:
Code: Select all
module SRAMif( input clk,
               input vramCS,
               input cpuWE,
               input RAMWE,
               inout [15:0] SRD,
               input [15:0] cpuDO,
               input [15:0] BACCout,
               input [20:0] Vaddr,
               input [31:0] cpuAB,
               input [9:0] X,
               input [8:0] Y,
               output [20:0] SRaddr,
               output reg [15:0] SRDO,
               output SRWEn
               );

always @(posedge clk)
   if (vramCS && !cpuWE && !RAMWE)
      SRDO <= SRD;                       //when reading from SyncRAM, put data into reg
   else if (!vramCS)
      SRDO <= 16'h0000;                  //else reg will output zeroes when vram is not selected

reg [20:0] cpuABopt;

always @* begin                          //optimize the videoRAM address for plotting (X,Y) in the (LSB,MSB) for indirect indexed
   cpuABopt [20:19] <= 2'b00;            //bank bits
   cpuABopt [18:10] <= cpuAB [24:16];    //Y[8:0]
   cpuABopt [9:0] <= cpuAB [9:0];        //X[9:0]
end

assign SRaddr = RAMWE ? {2'b00, Y, X} : vramCS ? cpuABopt : Vaddr;   //MUX the SyncRAM address for video timing & cpu access
assign SRWEn = ~((vramCS && cpuWE) || RAMWE);                        //SyncRAM write enable, active low during a write to video memory by cpu
assign SRD = SRWEn ? 16'hZZZZ : BACCout;                             //I/O MUX'd latch to SyncRAM databus. High 'Z' during a read
endmodule
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sat Apr 20, 2013 2:30 am
by ElEctric_EyE
I see I can eliminate SLOPE with your previous suggestion and test dyneg for the initial y0t and y1t values at the same time dxneg is tested.
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sun Apr 21, 2013 12:59 am
by ElEctric_EyE
I'm still having issues with an error I need to clean up in ISim, but intuitively the following should work. This is where I meant comparators are not clock driven, they are purely combinatorial, i.e. no FF's needed for a successful circuit.
Code: Select all
module LineGen ( input clk,
                 input lineCS,
                 input [15:0] cpuDO,
                 input [1:0] cpuAB,
                 output reg RAMWE,
                 output reg [9:0] X,
                 output reg [8:0] Y
                 );

reg [9:0] x0, x1, dx, x0t, x1t;   //internal module registers
reg [8:0] y0, y1, dy, y0t, y1t;
reg [16:0] D;                     //error accumulator + carry bit. if D[16] = 1, it is negative
reg steep;                        //1 when dy>dx
reg dyneg;                        //1 when dy is negative
reg dxneg;                        //1 when dx is negative
reg [2:0] state;

parameter WAIT = 0, LOAD = 1, DXDY = 2, CALC1 = 3, CALC2 = 4, PLOT = 5, DELAY = 6;

always @(posedge clk) begin
   if (lineCS && cpuAB == 2'b00)
      x0t <= cpuDO;
   if (lineCS && cpuAB == 2'b01)
      y0t <= cpuDO;
   if (lineCS && cpuAB == 2'b10)
      x1t <= cpuDO;
   if (lineCS && cpuAB == 2'b11)
      y1t <= cpuDO;
end

always @* begin
   if (x0t > x1t)
      dxneg <= 1;
   else dxneg <= 0;
   if (y0t > y1t)
      dyneg <= 1;
   else dyneg <= 0;
end

always @(posedge clk) begin
   state <= WAIT;
   case (state)
      WAIT:
         if (lineCS && cpuAB == 2'b11)
            state <= LOAD;
         else state <= WAIT;
      LOAD:
         state <= DXDY;
      DXDY:
         state <= CALC1;
      CALC1:
         state <= CALC2;
      CALC2:
         state <= PLOT;
      PLOT:
         begin
            if (!dyneg && steep)       //e.g. (0,0) to (2,10)
               if (y0 != y1)
                  state <= DELAY;
               else state <= WAIT;
            if (!dyneg && !steep)      //e.g. (0,0) to (10,10)
               if (x0 != x1)
                  state <= DELAY;
               else state <= WAIT;
            if (dyneg && !steep)       //e.g. (0,10) to (10,0)
               if (x0 != x1)
                  state <= DELAY;
               else state <= WAIT;
            if (dyneg && steep)        //e.g. (0,20) to (2,0)
               if (y0 != y1)
                  state <= DELAY;
               else state <= WAIT;
         end
      DELAY:
         state <= CALC2;
   endcase
end

always @(posedge clk) begin
   case (state)
      WAIT:
         RAMWE <= 0;
      LOAD:
         if (dxneg)
            begin
               x0 <= x1t;
               y0 <= y1t;
               x1 <= x0t;
               y1 <= y0t;
            end
         else begin
            x0 <= x0t;
            y0 <= y0t;
            x1 <= x1t;
            y1 <= y1t;
         end
      DXDY:
         begin
            dx <= x1 - x0;
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
            if (dyneg)
               dy <= y0 - y1;
            else dy <= y1 - y0;
         end
      CALC1:
         if (dx >= dy) begin
            steep <= 0;
            D <= (dy*2 - dx);
         end
         else begin
            steep <= 1;
            D <= (dx*2 - dy);
         end
      CALC2:
         begin
            RAMWE <= 0;
            if (steep) begin
               if ( D[16] == 0 ) begin
                  x0 <= x0 + 1;
                  y0 <= dyneg ? y0 - 1 : y0 + 1;
                  D <= D + (dx*2 - dy*2);
               end
               else begin
                  y0 <= dyneg ? y0 - 1 : y0 + 1;
                  D <= D + dx*2;
               end
            end
            else if ( D[16] == 0 ) begin
               x0 <= x0 + 1;
               y0 <= dyneg ? y0 - 1 : y0 + 1;
               D <= D + (dy*2 - dx*2);
            end
            else begin
               x0 <= x0 + 1;
               D <= D + dy*2;
            end
         end
      PLOT:
         begin
            RAMWE <= 1;
            X <= x0;
            Y <= y0;
         end
      DELAY:
         RAMWE <= 1;
   endcase
end
endmodule
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sun Apr 21, 2013 6:31 am
by Arlet
In general, extra flip flops are a good idea. They are practically free, because half of the LUTs include a flip flop at the end, whether you use them or not. Using the flip flops means it will be easier for the tools to synthesize and meet timing.
Of course, a flip flop adds an extra clock cycle, so you have to consider that too. I recommend focusing your attention on the inner loop where you spend most of the time, and try to optimize that so you can draw 1 pixel/cycle once it starts running. The whole initialization/setup portion of the state machine is less critical. If you draw a 400 pixel line, it doesn't matter that much whether that line can be drawn in 401 or 402 cycles total.
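As an illustration of what a 1 pixel/cycle inner loop could look like, here is a minimal sketch. It is not the LineGen module from this thread: it swaps in the common signed-error form of integer Bresenham (error term `err = dx - dy`), so that one pixel is emitted on every clock while `busy` is high. The module name, the `start` and `busy` signals and all internal names are assumptions made for the example.

```verilog
// Sketch only, under the assumptions stated above; names are hypothetical.
module line_loop_sketch (
    input             clk,
    input             start,      // one-cycle pulse while the endpoints are valid
    input      [9:0]  x0i, x1i,
    input      [8:0]  y0i, y1i,
    output reg [9:0]  X,
    output reg [8:0]  Y,
    output reg        we,         // high for one cycle per plotted pixel
    output reg        busy
);
    initial begin we = 0; busy = 0; end       // power-up values

    reg        [9:0]  x, xe, dx;
    reg        [8:0]  y, ye, dy;
    reg               sxneg, syneg;           // 1 = step in the negative direction
    reg signed [12:0] err;                    // signed error term

    // Absolute deltas of the incoming endpoints
    wire [9:0] adx = (x1i >= x0i) ? x1i - x0i : x0i - x1i;
    wire [8:0] ady = (y1i >= y0i) ? y1i - y0i : y0i - y1i;

    wire signed [12:0] sdx   = {3'b000, dx};
    wire signed [12:0] sdy   = {4'b0000, dy};
    wire signed [12:0] e2    = err <<< 1;     // 2*err
    wire               stepx = (e2 > -sdy);
    wire               stepy = (e2 <  sdx);

    always @(posedge clk) begin
        we <= 1'b0;
        if (start && !busy) begin
            // Setup takes one cycle; it is outside the inner loop
            x  <= x0i;  y  <= y0i;
            xe <= x1i;  ye <= y1i;
            dx <= adx;  dy <= ady;
            sxneg <= (x0i > x1i);
            syneg <= (y0i > y1i);
            err   <= $signed({3'b000, adx}) - $signed({4'b0000, ady});
            busy  <= 1'b1;
        end
        else if (busy) begin
            // Inner loop: emit the current pixel and compute the next one
            we <= 1'b1;
            X  <= x;
            Y  <= y;
            if (x == xe && y == ye)
                busy <= 1'b0;                 // last pixel has been emitted
            else begin
                if (stepx) x <= sxneg ? x - 1 : x + 1;
                if (stepy) y <= syneg ? y - 1 : y + 1;
                err <= err - (stepx ? sdy : 13'sd0) + (stepy ? sdx : 13'sd0);
            end
        end
    end
endmodule
```

With this structure a line of N pixels takes roughly N cycles plus the setup cycle, which is the trade-off described above: the setup portion is not optimized, only the loop.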
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Fri Apr 26, 2013 2:51 am
by ElEctric_EyE
...I recommend focusing your attention on the inner loop where you spend most of the time, and try to optimize that so you can draw 1 pixel/cycle once it starts running...
That's good advice, I will zero in on that soon!
But I've been wrestling with a few issues, most of them to do with adjusting the design for real-world timing when interfacing the FPGA to an external RAM. I knew my early interface attempt was clearly wrong, although it worked. So I thought I should post now in order to keep the flow, because I am making progress:
Code: Select all
`timescale 1ns / 1ps
module SRAMif( input clk,
               input vramCS,
               input cpuWE,
               inout [15:0] SRD,
               input [15:0] cpuDO,
               input [20:0] Vaddr,
               input [31:0] cpuAB,
               output [20:0] SRaddr,
               output reg [15:0] SRDO,
               output SRWEn
               );

always @(posedge clk) begin
   if (vramCS) begin
      if (!cpuWE) begin
         SRDO <= SRD;                    //when reading from SyncRAM, put data into reg
      end
      else begin
         SRDO <= cpuDO;
      end
   end
   else begin
      SRDO <= 16'h0000;                  //else reg will output zeroes when vram is not selected
   end
end

reg [20:0] cpuABopt;

always @* begin                          //optimize the videoRAM address for plotting (X,Y) in the (LSB,MSB) for indirect indexed
   cpuABopt [20:19] <= 0;                //bank bits
   cpuABopt [18:10] <= cpuAB [24:16];    //Y[8:0]
   cpuABopt [9:0] <= cpuAB [9:0];        //X[9:0]
end

assign SRaddr = vramCS ? cpuABopt :
                         Vaddr;          //MUX the SyncRAM address for video timing & cpu access
assign SRWEn = !(vramCS && cpuWE);       //SyncRAM write enable, active low during a write to video memory by cpu
assign SRD = SRWEn ? SRDO : 16'hZZZZ;    //I/O MUX'd latch to SyncRAM databus. High 'Z' during a read
endmodule
This code results in good plotting, believe it or not. In ISim the waveforms look like:
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Fri Apr 26, 2013 2:59 am
by ElEctric_EyE
You can see the VRAMCS is going active due to the 'false read' 1 cycle earlier than it should, because it's actually a write for the software doing an indirect indexed Y store to memory. I don't think the 'false read' is the problem because there is going to be a delay of at least one cycle using this RAM, but a faster RAM? Will be nice to correct another problem from the NMOS 6502.
To me this just doesn't look right, because it would be writing ZZZZ's to the RAM and not the data, which clearly comes 1 cycle later. The write enable strobe should come at the same time as the data is valid, I would think?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Fri Apr 26, 2013 5:50 am
by Arlet
No, that's not looking right. It is probably working because the SRWEn signal is delayed in real life, since it's an unconstrained combinatorial output, so there's some overlap.
What you need to do is get rid of the combinatorial outputs to the SRAM (SRaddr and SRWEn), and replace them with registers. If all the signals are registered, you know they'll all switch at the same time (the registers are all in the IOB, with same delay to pad). Of course, this means that these signals will be delayed by a clock cycle (instead of being delayed by some unknown amount).
Edit: here is my SRAM interface again. It works with async RAM, but it shouldn't be too hard to modify for the sync RAM. It has a read-only port (ch1) for VGA read access, and a read/write port (ch2) for CPU access. All outputs to SRAM are registered.
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Fri Apr 26, 2013 10:56 am
by ElEctric_EyE
Maybe it is time I abandon my 1st attempt and try modifying your state machine again, since now I have a better understanding of how they work... But what has to be delayed? You send the clk, address and the WE then delay the data right? or do you delay the WE and the data after sending the clk and address?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Fri Apr 26, 2013 11:12 am
by Arlet
When writing, everything gets sent at the same time, clock, address, data, we. When reading, the data valid signal is delayed to match the delayed data.
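To make the timing concrete, here is a minimal sketch of that idea: address, write enable, write data and the bus-drive enable are all registered on the same clock edge, and the read-valid flag is delayed so that it lines up with the captured data. This is not the SRAM interface module referred to in the thread. The request inputs (`wr_req`, `rd_req`, `req_addr`, `req_data`) and the `rd_data`/`rd_valid` outputs are hypothetical names, and the one-cycle read latency assumes an async-style RAM; a pipelined sync RAM would likely need more delay stages.

```verilog
// Sketch only: port names other than SRD, SRaddr and SRWEn are hypothetical.
module sram_regs_sketch (
    input             clk,
    input             wr_req,     // write requested this cycle
    input             rd_req,     // read requested this cycle
    input      [20:0] req_addr,
    input      [15:0] req_data,
    inout      [15:0] SRD,
    output reg [20:0] SRaddr,
    output reg        SRWEn,      // active low
    output reg [15:0] rd_data,
    output reg        rd_valid
);
    reg [15:0] wr_data;
    reg        drive;             // 1 = FPGA drives the data bus
    reg        rd_pending;        // a read address is on the pins this cycle

    initial begin SRWEn = 1; drive = 0; rd_pending = 0; rd_valid = 0; end

    always @(posedge clk) begin
        // Everything that goes to the RAM changes on the same edge
        SRaddr     <= req_addr;
        SRWEn      <= ~wr_req;
        wr_data    <= req_data;
        drive      <= wr_req;
        // Read side: remember the request, then capture data one cycle later
        rd_pending <= rd_req && !wr_req;
        rd_valid   <= rd_pending;
        if (rd_pending)
            rd_data <= SRD;
    end

    // The bus is only driven during a registered write
    assign SRD = drive ? wr_data : 16'hZZZZ;
endmodule
```

Because `drive` and `SRWEn` come from registers clocked together, the write data and the write strobe appear on the pins in the same cycle, rather than the strobe leading the data.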
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sat Apr 27, 2013 7:31 pm
by ElEctric_EyE
Ah I see that now, 'state' is 1 cycle behind 'next' and the valid signals are testing the 'state'. I'll try to start on it tonight. Also later, I'll have to consider there are 3 possible addresses going to the RAM. By priority, they are from the Line Generator, cpu, and HVSync generator. The data outputted is always from the B Accumulator.
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sat Apr 27, 2013 9:41 pm
by ElEctric_EyE
I modified your module. There are several things I did.
Got rid of:
1) The banking
2) Read only channel 1 logic
3) VGA1 state
4) The OE signal is commented out because my RAM has OE tied active low.
Does it look functional?
Instead of running Sims on the module alone, I decided to reconcile my SRAMif module IO signals to your SRAM controller module IO signals, in order to fit your module into my project.
EDIT: Also, the structure of your state machine is a little different from the one I adapted from the orsoc opencores project. I'm going to try to redo my Line Generator module using that structure and observe any differences in resources used.
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sun Apr 28, 2013 6:30 am
by Arlet
Looks reasonable at first sight, but why did you get rid of the second channel ? I assume you also need multiple channels for your CPU, VGA and line generator. How are you planning to attach those to the SRAM ?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sun Apr 28, 2013 9:05 am
by ElEctric_EyE
Looks reasonable at first sight, but why did you get rid of the second channel ?...
The read only VGA1 channel seemed to have priority over all the other states, since it is tested first. This was a conflict in the way I thought my system should work.
... I assume you also need multiple channels for your CPU, VGA and line generator. How are you planning to attach those to the SRAM ?
I was thinking of an address MUX for the WRITE1 state for the Line Generator and cpu and another address MUX for the READ1 state for the cpu and HVSync generator. I'll probably have to add more states to correctly do this, but I just wanted to at least get the cpu writing to the RAM correctly first with the smallest amount of hardware in the way.
EDIT: Maybe I should put the read only channel back in for the HVSync generator, but just give it last priority. What channel did you use to get the pixel counter out to the RAM?
Re: Concept & Design of 3.3V Parallel 16-bit VGA Boards
Posted: Sun Apr 28, 2013 9:33 am
by Arlet
The advantage of having multiple channels in the SRAM module is that you can prioritize traffic, but also optimize read/write actions. For instance, going from read -> write needs a dummy cycle to avoid bus conflicts, but going from read -> read doesn't. By putting all the decisions in one module, you can balance the priorities better. But there are also other ways to solve this. You could have one read channel and one write channel in the SRAM module, and provide priority indication with your read/write enables (0=no read request, 1 = read request priority 1, 2 = read request priority 2, ...and so on). The SRAM state machine would then switch from read -> write if the write was higher priority, but stay in read mode if it wasn't.
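A minimal sketch of the priority-encoded request scheme described above follows. It assumes that a larger number means a higher priority (the post does not say which way the ordering runs), that ties keep the bus in read mode, and that a single dummy cycle is enough for the read -> write turnaround. The module and signal names are hypothetical.

```verilog
// Sketch only: names and the "larger number wins" ordering are assumptions.
module rw_arbiter_sketch (
    input            clk,
    input      [1:0] rd_prio,     // 0 = no read request, 1..3 = read priority
    input      [1:0] wr_prio,     // 0 = no write request, 1..3 = write priority
    output reg       do_read,     // issue a read this cycle
    output reg       do_write     // issue a write this cycle
);
    reg last_was_read;            // the previous cycle issued a read

    initial begin do_read = 0; do_write = 0; last_was_read = 0; end

    always @(posedge clk) begin
        do_read  <= 1'b0;
        do_write <= 1'b0;
        if (wr_prio > rd_prio) begin
            // Write wins. Coming straight from a read, spend one dummy
            // cycle first so the RAM can release the data bus.
            if (last_was_read)
                last_was_read <= 1'b0;
            else
                do_write <= 1'b1;
        end
        else if (rd_prio != 0) begin
            // Read has equal or higher priority: stay in read mode
            do_read       <= 1'b1;
            last_was_read <= 1'b1;
        end
        else
            last_was_read <= 1'b0;    // idle cycle also serves as turnaround
    end
endmodule
```

In this arrangement the VGA fetch could use a high read priority and the CPU a lower one, so a CPU or line-generator write only interrupts the read stream when it outranks the pending read.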
I figure VGA needs a high priority, since it has the real-time requirements to keep the output going. An underrun on your pixel output will result in bad video output. For the CPU/line drawing it doesn't really matter if they are delayed a little bit.
Note that in my state machine, there are no write bursts, forcing the WE# to be deasserted after every write. I assume your syncram will also allow write bursts.