PostPosted: Wed Jan 16, 2013 5:40 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
So my first step was to draw a block diagram with all signals and signal directions present. I tried to just start typing in the Verilog for the top-level interconnects, but found there were too many signals to lay it all out from memory.

The idea for the 'initial' operation is for the CPU to be able to read and write to the external synchronous RAM during the horizontal and vertical retrace periods. All the square blocks will be the individual Verilog modules, although I may have to add another module to accommodate the bidirectional data bus of the SRAM. Also, I expect the timing will be off as I use a common clock for everything, with no provisions for the delay of the FPGA and SRAM, but I would expect to see some recognizable activity if the .b core software is actually running. I will just try to clear the video RAM for the first test.

This part of the project is basically a compilation of everything I've learned from previous FPGA projects using 6502 soft cores with blockRAM together with what I've most recently learned about video using the parallel video boards.


Attachments:
CPU Section.JPG [ 83 KiB ]
Clocks Video & RAM Interface.JPG [ 76.14 KiB ]

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502

PostPosted: Thu Jan 24, 2013 1:14 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I've added the 2 blockRAMs for the .b core zero-page and stack, but not the ROM yet. R1 and R2 are the FPGA blockRAM databus outputs. Also, that ORs module in the above pic turned out to be an extremely simple
Code:
assign cpuDI = ( R1Dout | R2Dout | VramDout );
line of code inside the top_level module, so a separate ORs module wasn't even needed. This is what attracts me to Verilog: building the equivalent structure with schematic entry, ORing four 16-bit data buses (inputs) into 16 outputs, was a PITA. Now, using Verilog, it takes 1 simple line of code! :lol:

So, I've yet to add the ROM and address decoding module and to finish the SRAMif module before some real world testing can begin. I plan on 1024x768 with a 70MHz system clock.

Also, screw that part of trying to write to the video RAM when HSYNC & VSYNC were inactive, only because the cpu would have to be interrupted or it would have to read some bits from a port when HSYNC or VSYNC was inactive. At this point the 'snow effect' should be acceptable before transitioning to a FIFO buffer. I will need help with this though!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Sat Jan 26, 2013 4:17 am
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Most FPGAs no longer support internal tri-state busses. There are some respected sources that recommend against using internal tri-state buffers. I, on the other hand, believe in letting the tools transform internal tri-state busses (in FPGAs) into the appropriate implementation. I find it error prone and tedious to specify each and every bus, and to explicitly define the required multiplexers.

EEyE's block diagram/schematic in an earlier post is an example of the structure I try to avoid. I am not saying it is incorrect, or otherwise a bad thing to do. I am simply saying, that I try to avoid having to explicitly manage the buses and the required bus multiplexers as shown in EEyE's block diagram/schematic.

To accomplish this I generally create an internal "tri-state" bus, which can't be implemented in the FPGA. This "tri-state" bus allows me to create a single common port on all modules that will connect to the "bus". In this manner, the synthesizer is implicitly increasing or decreasing the size of the OR gate that combines all of the buses together as EEyE shows in his block diagram/schematic. Because the FPGA itself has no way of implementing a "tri-state" bus, the synthesizer has to transform the Verilog "tri-state" bus into a multiplexer of some sort and use point to point connections to tie everything together.

One way the multiplexer can be constructed is illustrated by EEyE's block diagram/schematic: a module with an enable and an OR gate to collect all of the connections together. An AND gate, when not enabled, outputs a logic 0, and an OR gate only outputs a logic 0 when all of its inputs are 0. Thus, in EEyE's circuit, the three memories are mutually exclusively selected. If unselected, their outputs are forced to logic 0. The OR gate samples each of the memories' output data. Since only one is selected at a time, there is no problem correctly resolving the logic level of the output of the enabled memory.
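
The AND-OR structure described above can be sketched roughly as follows. This is only an illustration: the module name, port names, and the default width are hypothetical and are not taken from EEyE's design.

```verilog
// Sketch only: and_or_mux and its port names are hypothetical.
module and_or_mux #(
    parameter WIDTH = 16
)(
    input  wire             sel_a, sel_b, sel_c, // mutually exclusive selects
    input  wire [WIDTH-1:0] a_do, b_do, c_do,    // raw memory outputs
    output wire [WIDTH-1:0] cpu_di               // combined read bus
);
    // AND stage: an unselected source is forced to all zeros.
    wire [WIDTH-1:0] a_gated = a_do & {WIDTH{sel_a}};
    wire [WIDTH-1:0] b_gated = b_do & {WIDTH{sel_b}};
    wire [WIDTH-1:0] c_gated = c_do & {WIDTH{sel_c}};

    // OR stage: with at most one select active, the OR simply
    // passes the selected source through unchanged.
    assign cpu_di = a_gated | b_gated | c_gated;
endmodule
```

If two selects were ever active at once, the result would be the bitwise OR of both sources, so the scheme depends on the address decoder keeping the selects mutually exclusive.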

An alternative is to define a single bus. Let's name it something like EEyE has named the output of the OR gate, CPU_DI. At each memory, I would connect its output data to CPU_DI using a tri-state construction:
Code:
localparam pDataWidth = 32;

wire    Clk;                                    // system clock (missing from the original snippet)
wire    [9:0] Addrs;
wire    CPU_WE, CPU_RE;
wire    [(pDataWidth - 1):0] CPU_DI, CPU_DO;    // CPU_DI: shared read bus, CPU_DO: CPU write data

reg     CS_RAM_A, CS_RAM_B, CS_RAM_C;

wire    WE_RAM_A, OE_RAM_A;
wire    WE_RAM_B, OE_RAM_B;
wire    WE_RAM_C, OE_RAM_C;

reg     [(pDataWidth - 1):0] RAM_A [0:255];
reg     [(pDataWidth - 1):0] RAM_B [0:255];
reg     [(pDataWidth - 1):0] RAM_C [0:511];

reg     [(pDataWidth - 1):0] RAM_A_DO, RAM_B_DO, RAM_C_DO;

//  Address decode: exactly one chip select is active at a time.
//  Blocking assignments are used because this is combinational logic.
always @(*)
begin
      casex(Addrs[9:8])
            2'b00 : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b1, 1'b0, 1'b0};
            2'b01 : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b1, 1'b0};
            2'b1x : {CS_RAM_A, CS_RAM_B, CS_RAM_C} = {1'b0, 1'b0, 1'b1};
      endcase
end

assign WE_RAM_A = CS_RAM_A & CPU_WE;
assign WE_RAM_B = CS_RAM_B & CPU_WE;
assign WE_RAM_C = CS_RAM_C & CPU_WE;

assign OE_RAM_A = CS_RAM_A & CPU_RE;
assign OE_RAM_B = CS_RAM_B & CPU_RE;
assign OE_RAM_C = CS_RAM_C & CPU_RE;

//  RAM A: 256 words, written from the CPU's output data (CPU_DO)
always @(posedge Clk)
begin
     if(WE_RAM_A)
         RAM_A[Addrs[7:0]] <= CPU_DO;
     RAM_A_DO <= #1 RAM_A[Addrs[7:0]];
end

//  Drive the shared bus only when selected for a read; otherwise release it
assign CPU_DI = ((OE_RAM_A) ? RAM_A_DO : {pDataWidth{1'bZ}});

//  RAM B: 256 words
always @(posedge Clk)
begin
     if(WE_RAM_B)
         RAM_B[Addrs[7:0]] <= CPU_DO;
     RAM_B_DO <= #1 RAM_B[Addrs[7:0]];
end

assign CPU_DI = ((OE_RAM_B) ? RAM_B_DO : {pDataWidth{1'bZ}});

//  RAM C: 512 words, so it uses 9 address bits
always @(posedge Clk)
begin
     if(WE_RAM_C)
         RAM_C[Addrs[8:0]] <= CPU_DO;
     RAM_C_DO <= #1 RAM_C[Addrs[8:0]];
end

assign CPU_DI = ((OE_RAM_C) ? RAM_C_DO : {pDataWidth{1'bZ}});

In this example, the OR gate is implicitly created by the synthesizer.

This approach may or may not help, but it is the technique that I use when I want to connect a varying number of modules/components together. The approach (1) is automatically transformed into cascaded AND-OR gates as discussed above, (2) automatically accounts for all of the connections, and (3) keeps me from having to define the multiplexer and/or the variable-width OR manually.

_________________
Michael A.


PostPosted: Sat Jan 26, 2013 6:56 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Nowadays I just use the 'wor' signal type. It implements a wired OR, so you can drive the signal from multiple modules. I design every module so that it outputs a '0' when not selected. Usually, this can be done with a minimum of resources. For instance, the block RAMs already have a dedicated signal to reset the output, and flip-flops have reset inputs as well. Even 4-input LUTs that implement a full adder can use a spare input on the LUT to produce a zero output. The tools will automatically infer a wide OR gate to combine the signals, which is cheaper than a general MUX because it doesn't require any select signals. So a 6-input LUT can be a 6-input OR, rather than a 4-1 MUX.
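
A minimal sketch of the 'wor' approach might look like the following. The module and port names are hypothetical, and whether a particular synthesizer accepts 'wor' nets should be checked against its documentation.

```verilog
// Hypothetical bus source: drives zeros when not selected, so it can
// share a wired-OR net with other sources.
module wor_source #(
    parameter WIDTH = 16
)(
    input  wire             sel,
    input  wire [WIDTH-1:0] data,
    output wire [WIDTH-1:0] q
);
    assign q = sel ? data : {WIDTH{1'b0}};
endmodule

// Hypothetical top level: two sources drive the same 'wor' net.
module wor_bus_demo (
    input  wire        sel_a, sel_b,
    input  wire [15:0] a_do, b_do,
    output wire [15:0] cpu_di_out
);
    wor [15:0] cpu_di;  // multiple drivers on this net are resolved by OR

    wor_source #(.WIDTH(16)) src_a (.sel(sel_a), .data(a_do), .q(cpu_di));
    wor_source #(.WIDTH(16)) src_b (.sel(sel_b), .data(b_do), .q(cpu_di));

    assign cpu_di_out = cpu_di;
endmodule
```

Adding a third source here only needs another instance connected to cpu_di; no separate OR expression has to be edited.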


PostPosted: Sat Jan 26, 2013 12:26 pm
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Good point.

_________________
Michael A.


PostPosted: Tue Jan 29, 2013 5:04 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Michael, thanks for your input. Currently I'm going with what I know has worked in my past projects.

So, I'm at the point where everything passes synthesis, although the SRAM interface module is not complete. It's not passing implementation with a vague 'null BMM file' error. ISE seems to be looking for a BMM file although I have INIT_FILE ("boot.coe") in the ROM1 module.

I have only used COE files in the past, but that was after generating the ROM with the lightbulb CoreGenerator tool. Now I am strictly using Verilog and instantiating a BRAM using RAMB16BWER. Is there any way to change a setting so ISE looks for a COE file for the initialization?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Tue Jan 29, 2013 6:33 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I think I figured it out. I had to use CoreGen to generate the 1Kx16 Stack and ZP Rams and the 4Kx16 ROM. Then I used the same names (ZPRAM, STRAM, SYSROM) used in CoreGen and assigned the pins in the Verilog top level.
Code:
   ZPRAM XLXI_7 (.clka(clk),
                 .rsta(ram1CS),
                 .ena(1'b1),
                 .wea(ram1WE),
                 .addra(cpuAB),
                 .dina(cpuDO),
                 .douta(ram1DO));
               
   STRAM XLXI_8 (.clka(clk),
                 .rsta(ram2CS),
                 .ena(1'b1),
                 .wea(ram2WE),
                 .addra(cpuAB),
                 .dina(cpuDO),
                 .douta(ram2DO));
               
   SYSROM XLXI_9 (.clka(clk),
                  .rsta(rom1CS),
                  .ena(1'b1),
                  .addra(cpuAB),
                  .douta(rom1DO));


ISE automatically linked to the .XCO files inside the project.

Now the framework is there for a complete system with all modules present and passing synthesis and implementation (for the most part).
Now it's time to work on the SRAMif, the interface module for the SyncRAM. At the completion of this module, hopefully I will have something running!

EDIT: I forgot about the address decoding. Is it as simple as something like this?
Code:
// SRCS must be declared as a reg; blocking assignments suit combinational logic
always @*
     if ( cpuAB >= 32'h8000_0000 && cpuAB <= 32'h801f_ffff )
          SRCS = 1'b1;
     else
          SRCS = 1'b0;
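
Assuming the lower bound is meant to be 32'h8000_0000, the range up to 32'h801f_ffff is a 2M-word block aligned to its own size, so the same decode can also be sketched as a comparison of the upper address bits. The module name here is hypothetical; cpuAB and SRCS are the names used above.

```verilog
// Sketch only: sram_decode is a hypothetical wrapper module.
module sram_decode (
    input  wire [31:0] cpuAB,  // CPU address bus
    output wire        SRCS    // SRAM chip select, active high
);
    // The low 21 bits span 32'h0000_0000 .. 32'h001F_FFFF, so only the
    // upper 11 bits need to match: 32'h8000_0000 >> 21 = 11'h400.
    assign SRCS = (cpuAB[31:21] == 11'h400);
endmodule
```

A range that is not aligned to a power of two would still need the two magnitude comparisons.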

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Tue Jan 29, 2013 11:02 pm
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
EEyE:

You should be able to infer the Spartan 6 block RAMs in either single or dual port configurations. The following code snippet (taken from M65C02_Core.v) shows how to use the initial statement to initialize the block RAM (whether inferred as a ROM or a RAM) without having to rely on CoreGen.

Code:
//  Infer Microprogram ROM and initialize with file created by MCP_Tool
initial
    $readmemb(pM65C02_uPgm, uP_ROM, 0, (pROM_Depth - 1));

always @(posedge Clk)
begin
    if(MPC_En | Rst)
        uPL <= #1 uP_ROM[MA];
end


The system function $readmemb() or $readmemh() can be used. I built a "DOS" utility program found here on GitHub to take the binary output of the Kingswood A65 assembler and output an ASCII hex file that is padded out to a power of two size.

With this approach, I've embedded copyright notices and other initialization data in FIFOs and other memories in FPGAs that I ship.
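
As a rough, self-contained sketch of the same idea applied to a 4Kx16 ROM like the one mentioned earlier in the thread: the module name, port names, and the file name "boot.mem" are assumptions for illustration. Note that $readmemh() expects a plain text file of hex words, which is not the same layout as a CoreGen COE file.

```verilog
// Sketch only: boot_rom, its ports, and "boot.mem" are hypothetical.
module boot_rom #(
    parameter INIT_FILE  = "boot.mem", // plain hex text, one word per entry
    parameter ADDR_WIDTH = 12,         // 2^12 = 4K words
    parameter DATA_WIDTH = 16
)(
    input  wire                  clk,
    input  wire [ADDR_WIDTH-1:0] addr,
    output reg  [DATA_WIDTH-1:0] dout
);
    // Memory array that the synthesizer can map onto block RAM
    reg [DATA_WIDTH-1:0] rom [0:(1 << ADDR_WIDTH) - 1];

    // Load the initial contents when the design is elaborated
    initial
        $readmemh(INIT_FILE, rom);

    // Registered (synchronous) read, as block RAM requires
    always @(posedge clk)
        dout <= rom[addr];
endmodule
```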

_________________
Michael A.


PostPosted: Wed Jan 30, 2013 6:48 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Thanks for the pointer. Trying to find info about readmem led me to the Xilinx XST manual. I've never read this manual before.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Wed Jan 30, 2013 4:39 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
MichaelM wrote:
...An alternative is to define a single bus. Let's name it something like EEyE has named the output of the OR gate, CPU_DI. At each memory, I would connect its output data to CPU_DI using a tri-state construction: ...

I was looking over your code from a couple posts ago and I noticed you had 3 instances where you had
Code:
assign CPU_DI = ...
How is there not a conflict? And would the circuit behavior change if you had put all 3 of the 'assign CPU_DI = ...' statements at the very end of the code?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Wed Jan 30, 2013 5:01 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
There's no conflict, because the CPU_DI drivers are all 3-state. That's what the conditional assignment with 1'bZ is for: it drives 'Z' on the bus when the OE signal is not asserted. So only one driver (assuming no mistakes) drives the bus, and the others are all 'Z', so they don't conflict.
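
A small simulation-only sketch can illustrate this. It also bears on the placement question: continuous assignments are concurrent, so their order and position within a module do not affect behavior. All names below are hypothetical.

```verilog
// Simulation sketch: two tri-state drivers on one net.
module tristate_order_demo;
    reg        oe_a, oe_b;
    reg  [7:0] a_do, b_do;
    wire [7:0] bus;

    // Listed in "reverse" order on purpose; swapping or moving these
    // two lines leaves the result unchanged.
    assign bus = oe_b ? b_do : 8'bz;
    assign bus = oe_a ? a_do : 8'bz;

    initial begin
        a_do = 8'h11;
        b_do = 8'h22;
        oe_a = 1; oe_b = 0; #1 $display("A enabled:    %h", bus);
        oe_a = 0; oe_b = 1; #1 $display("B enabled:    %h", bus);
        oe_a = 0; oe_b = 0; #1 $display("none enabled: %h", bus);
        $finish;
    end
endmodule
```

With only one output enable asserted, the bus carries that driver's value; with none asserted, the bus floats at high impedance. If both were asserted with different data, the simulator would show 'x' on the conflicting bits.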


PostPosted: Wed Jan 30, 2013 5:23 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
But the tools will not make an internal tri-state data bus, will they? Maybe an older version of ISE would do this?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Wed Jan 30, 2013 5:33 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
It depends on the device. If the device has internal buses, the tools can use them. The Spartan 3 and 6 don't have these buses, so the tools (even older versions) will convert the logic to a suitable mux.

That's why I suggested using a wired OR instead. A wide OR uses fewer resources (and is faster) than a MUX. Depending on the situation, you may have to use some extra logic to produce a '0' when the device is not selected, but very often this logic is free.


PostPosted: Thu Jan 31, 2013 12:14 am
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I've not yet attempted to use Arlet's recommendation: use a signal whose type is declared as wor. In the past, my concern over making the RTL work led me to not use wor, wand, and several other signal types not generally associated with synthesizable code. Habits develop, but synthesizers evolve. I just never went back and tried some of these other signal types to see if the synthesizer accepted them.

The code example I provided will be automatically reduced to the multiple input multiplexer that you drew in your block diagram. I was attempting to demonstrate for you a technique that I use to avoid having to manually construct the multiplexer. If an unknown or variable number of signals connect to the "bus", then the explicit multiplexer has to be manually adjusted whenever you add data sources to the bus and/or when you subtract them from the bus.

My suggested approach relieves you from keeping track of this. Arlet's suggested mechanism does the same thing. The point of using these approaches is to let the tool automatically construct the multiplexer.

_________________
Michael A.


PostPosted: Tue Feb 05, 2013 2:11 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
MichaelM wrote:
...My suggested approach relieves you from keeping track of this. Arlet's suggested mechanism does the same thing, The point of using these approaches is to let the tool automatically construct the multiplexer.

With my one-liner
Code:
assign cpuDI = ( ram1DO | ram2DO | rom1DO | SRDO );
it doesn't seem so difficult to keep track of the output buses, unless my concept won't work. It is a work in progress still, so I don't know yet.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502

