Arlet Ottens 6502

SpacedCowboy · Post by **SpacedCowboy** » Tue Dec 12, 2017 7:11 pm

hoglet wrote:

Why not just run Klaus Dormann's excellent 6502 functional test suite?
https://github.com/Klaus2m5/6502_65C02_functional_tests

... because I didn't know about it

I'll grab it and take a look. Thanks again!

BigEd · Post by **BigEd** » Tue Dec 12, 2017 7:58 pm

When it comes time to test and debug BCD arithmetic, if you're going to do that, it can be fruitful to add in Bruce's tests. Some links here:
http://visual6502.org/wiki/index.php?ti ... stPrograms

jblang · Post by **jblang** » Thu Dec 21, 2017 2:24 am

I'm also trying to get Arlet's core working and I'm having some trouble with the memory timing. I tried to synthesize it on a Spartan XC3S50A FPGA board but couldn't get it to work, so I tried simulating it.

Here is my test bench: https://gist.github.com/jblang/a7fed634 ... 1eea1a1eae
And my block ram implementation: https://gist.github.com/jblang/ca99c800 ... efef6289fa
I've loaded the following test program into my ram: https://gist.github.com/jblang/424aab3c ... b65378cf67

I'm using Arlet's 6502 core directly from his github repo without any modifications.

I can see the CPU attempting to fetch the reset vector, but my RAM doesn't seem to be supplying the data fast enough (posedge.png). I also tried triggering on negedge clk, which looks a little better (at least the ram is sending the second byte of the reset vector) but it still doesn't look like it is there when the CPU expects it. I probably need to be doing something with RDY but I don't really understand what.

My synthesizable top module is very similar to my test bench except that it also maps the 8 leds on my board to $D000 and the dip switches to $D001. The intention of my test program was simply to copy the values of the dip switches to the LEDs as a simple test that I had the CPU executing code properly.
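For readers following along, the memory map described above could be sketched roughly as below. This is only an illustration, not the module from the poster's project: the module name `io_ports_sketch`, its port list, and the choice to register the read data (so that it has the same one-cycle latency as a block RAM) are all assumptions.

```verilog
// Hypothetical I/O block: LED register at offset 0 ($D000),
// DIP switches at offset 1 ($D001). Names are illustrative only.
module io_ports_sketch(clk, addr, data_in, data_out, cs, we, leds, dips);

input clk;
input [7:0] addr;            // low byte of the CPU address
input [7:0] data_in;         // from the CPU's DO bus
output reg [7:0] data_out;   // to the CPU read multiplexer
input cs;                    // assumed high when addr[15:8] == 8'hD0
input we;
output reg [7:0] leds;
input [7:0] dips;

initial
	leds = 8'h00;

always @(posedge clk)
begin
	if (cs) begin
		// A write to offset 0 latches the LED register.
		if (we && addr == 8'h00)
			leds <= data_in;
		// Reads are registered, like the block RAM outputs.
		case (addr)
			8'h00:   data_out <= leds;
			8'h01:   data_out <= dips;
			default: data_out <= 8'h00;
		endcase
	end
end

endmodule
```

With a block like this, a test program only needs to load from $D001 and store to $D000 in a loop.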

I'm very new to FPGAs still, so I apologize if I'm missing something obvious. I got the seven segment display working and a simple vga test pattern but getting the CPU to work has me stumped. I wasn't sure if I should start my own thread but I thought I would ask here first. Please let me know if you want me to open a different one.

Note to Arlet: it would be very helpful for beginners if you included in your github repo a simple top module that shows how to wire the CPU up to some RAM and a few memory mapped IOs.

jblang · Post by **jblang** » Thu Dec 21, 2017 3:07 am

I looked at how SpacedCowboy had implemented his ram and modified mine to match, and now my simulation looks right.

This is my corrected ram implementation: https://gist.github.com/jblang/febdcc0b ... 40f35fdfc0

It's still not running right on my FPGA yet but I still need to do a bit more investigation before I know what questions to ask.

MichaelM · Post by **MichaelM** » Sun Dec 31, 2017 3:10 pm

jblang:

I reread your posts and found that you appear to be using a Spartan 3A FPGA for your project. Therefore, I suspect your problem is that you are using an operating mode in your description of the inferred RAM blocks which is not supported by the Block RAM of the Spartan 3A FPGA family. If you haven't resolved your simulation and implementation issues, I think the following recommendations may help.

I checked the "lightbulb" tool (Language Templates) in ISE 14 and it appears that you have implemented your RAM module using an example from the tool for the No-Change operating mode. I am fairly certain that the No-Change operating mode for BRAM is not directly supported by the Spartan 3A family. I suggest that you change your RAM module to use the Read-First mode, which is the mode that I've always used for my Spartan 3A and Spartan 6 projects:

Code:

   always @(posedge <clock>)
      if (<ram_enable>) begin
         if (<write_enable>)
            <ram_name>[<address>] <= <input_data>;
         <output_data> <= <ram_name>[<address>];
      end

If you need a bus multiplexer, I implement it following the preceding inferred RAM code using a continuous assignment statement. If you want to see an example targeting a Spartan 3A FPGA, you can take a look at some of the modules in this project on Github.

Have a safe and happy new year.

jblang · Post by **jblang** » Tue Jan 02, 2018 5:55 pm

Michael,

Thanks for the suggestions. I think I have my RAM set up the way you suggested now:

Code:

module ram(clk, addr, data_in, data_out, cs, we);
 
parameter ADDR_WIDTH = 11;
parameter DATA_WIDTH = 8;
parameter INIT_FILE = "";

input clk;
input [ADDR_WIDTH-1:0] addr;
input [DATA_WIDTH-1:0] data_in;
output reg [DATA_WIDTH-1:0] data_out;
input cs;
input we;

reg [DATA_WIDTH-1:0] mem[(1 << ADDR_WIDTH)-1:0];

initial
begin
	if (INIT_FILE != "") begin
		$readmemh(INIT_FILE, mem);
	end
end

always @(posedge clk)
begin
	if (cs) begin
		if (we)
			mem[addr] <= data_in;
		data_out <= mem[addr];
	end
end

endmodule

And I have my bus multiplexer set up using continuous assignment as you suggested:

Code:

module cpu_tb;

reg clk;
reg reset;
reg irq;
reg nmi;
reg rdy;

wire [15:0] addr;
wire [7:0] cpu_do;
wire [7:0] cpu_di;
wire we;

wire [7:0] leds;
reg [7:0] dips;

// 6502 CPU

cpu cpu0 (
	.clk(clk), 
	.reset(reset), 
	.AB(addr), 
	.DI(cpu_di), 
	.DO(cpu_do), 
	.WE(we), 
	.IRQ(irq), 
	.NMI(nmi), 
	.RDY(rdy)
);

// Low 2K of RAM $0000-$07FF
wire [7:0] ramlo_do;
wire ramlo_cs = addr[15:11] == 5'h0;
ram #(.ADDR_WIDTH(11)) ramlo (
	.clk(clk), 
	.addr(addr[10:0]), 
	.data_in(cpu_do), 
	.data_out(ramlo_do), 
	.cs(ramlo_cs), 
	.we(we)
);	

assign cpu_di = ((ramlo_cs) ? ramlo_do : 0);

// High 4K of RAM $F000-$FFFF
wire [7:0] ramhi_do;
wire ramhi_cs = addr[15:12] == 4'hF;
ram #(.ADDR_WIDTH(12), .INIT_FILE("ram.hex")) ramhi (
	.clk(clk), 
	.addr(addr[11:0]), 
	.data_in(cpu_do), 
	.data_out(ramhi_do), 
	.cs(ramhi_cs), 
	.we(we)
);

assign cpu_di = ((ramhi_cs) ? ramhi_do : 0);

/*
// Simple I/O ports $D000-$D0FF
wire [7:0] io_do;
wire io_cs = addr[15:8] == 8'hD0;
iomux ports (
	.clk(clk), 
	.addr(addr[7:0]), 
	.data_in(cpu_do), 
	.data_out(cpu_di), 
	.cs(io_cs), 
	.we(we),
	.leds(leds),
	.dips(dips)
);

assign cpu_di = ((io_cs) ? io_do : 0);
*/
initial 
begin
	// Initialize Inputs
	dips = 8'h5C;
	
	irq = 0;
	nmi = 0;
	rdy = 1;
	clk = 0;
	reset = 1;
	repeat(4) #10 clk = ~clk;
	reset = 0;
	forever #10 clk = ~clk;
	$finish;
end      

endmodule

Unfortunately, it's still not working correctly under simulation. The problem seems to be that my chip select signal is entering an indeterminate state while the RAM is outputting the second byte.

I also tried implementing the CS logic using non-blocking assignment inside an always @* block, the way you appear to be doing in your MMU module:

Code:

reg ramlo_cs;
reg ramhi_cs;

always @*
begin
	ramlo_cs <= addr[15:11] == 5'h0;
	ramhi_cs <= addr[15:12] == 4'hF;
end

The CS still goes into an indeterminate state while reading the second byte. What am I doing wrong?

Thanks,
J.B.

jblang · Post by **jblang** » Tue Jan 02, 2018 8:24 pm

I got it working. The trick was I had to delay the address my output multiplexer was using by one cycle.

Edit: here is what finally worked for anyone interested: https://github.com/jblang/verilog-6502
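A minimal sketch of what "delaying the address the output multiplexer uses by one cycle" can look like is shown below. It is not taken from the linked repository. The module name `read_mux_sketch` and the `_d` signal names are hypothetical, and it assumes RDY is held high, as in the test bench above. It also drives `cpu_di` from a single continuous assignment. The test bench earlier in the thread assigns `cpu_di` twice, and two drivers on one wire resolve to X in simulation whenever they disagree.

```verilog
// Hypothetical read multiplexer: the chip selects are registered so
// that they line up with the registered (one cycle late) RAM outputs.
module read_mux_sketch(clk, ramlo_cs, ramhi_cs, ramlo_do, ramhi_do, cpu_di);

input clk;
input ramlo_cs;          // decoded from the current address
input ramhi_cs;
input [7:0] ramlo_do;    // registered RAM outputs
input [7:0] ramhi_do;
output [7:0] cpu_di;

reg ramlo_cs_d;
reg ramhi_cs_d;

// Remember which device was selected when the address was presented.
always @(posedge clk)
begin
	ramlo_cs_d <= ramlo_cs;
	ramhi_cs_d <= ramhi_cs;
end

// Single driver for cpu_di, steered by the delayed selects.
assign cpu_di = ramlo_cs_d ? ramlo_do :
                ramhi_cs_d ? ramhi_do :
                8'h00;

endmodule
```

The point is that the RAM data for address N appears one clock after address N was presented, by which time the core may already be driving address N+1, so the multiplexer has to select on the previous cycle's decode.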

MichaelM · Post by **MichaelM** » Wed Jan 03, 2018 1:53 pm

jblang:

I just saw your last two posts. Glad to see that you have gotten your project working in simulation. I am wondering if you've actually gotten it running yet on your target board?

I haven't tried to get Arlet's core running in simulation or on a board in a few years. However, he clearly defines its operation as targeting an asynchronous RAM interface. Unfortunately, the large RAM blocks in Xilinx and Altera (Intel) FPGAs are synchronous RAMs.

This characteristic essentially adds a pipeline delay that is detrimental to the operating characteristics of Arlet's core. You can see this characteristic clearly in the timing diagram you posted yesterday: the reset address is presented on the address bus on the rising edge of the clock and then the data output of the RAM is shown asserted at the next rising edge of the clock.

One thing that I don't particularly care for in behavioral simulation of HDL is that the logic delays actually present in the real circuit are not shown in the timing diagrams. What is displayed are those transition Xs that indicate that on a clock edge the state of a signal changed. This is all well and good, but in my opinion, frequently provides a misleading picture of the operation of a particular circuit.

In the case of your address output followed by your RAM data output, I would expect that there is a half cycle overlap between the address output and the RAM data output. In actuality, there is a finite, fixed delay in generating the address inside the core, and there is a fixed data setup time before the following rising edge for the data input to the core, i.e. the RAM data output plus the data multiplexer.

When I want to use synchronous RAMs in a reliable manner for single cycle operation, I do the following: (1) output the address from the core on the rising edge, (2) capture the address into the address buffers of the RAM on the falling edge, and (3) capture the output data from the RAM/multiplexer on the rising edge of the clock. I expect that this approach will work for your project as well, and is the approach used in the project I referred you to previously.
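The three steps above might be sketched as follows. This is an illustration only, not code from the referenced project: it reuses the port names of the `ram` module posted earlier in the thread, and the module name `ram_negedge_sketch` is made up. Steps (1) and (3) happen inside the core on the rising edge, so the only change is the clock edge used by the RAM.

```verilog
// Hypothetical variant of the read-first RAM from earlier in the
// thread, clocked on the falling edge of clk.
module ram_negedge_sketch(clk, addr, data_in, data_out, cs, we);

parameter ADDR_WIDTH = 11;
parameter DATA_WIDTH = 8;

input clk;
input [ADDR_WIDTH-1:0] addr;
input [DATA_WIDTH-1:0] data_in;
output reg [DATA_WIDTH-1:0] data_out;
input cs;
input we;

reg [DATA_WIDTH-1:0] mem[(1 << ADDR_WIDTH)-1:0];

// Step (2): the RAM samples the address half a cycle after the core
// presented it on the rising edge. The data is then ready for the
// core to capture on the next rising edge, which is step (3).
always @(negedge clk)
begin
	if (cs) begin
		if (we)
			mem[addr] <= data_in;
		data_out <= mem[addr];
	end
end

endmodule
```

The cost of this arrangement is that the address path into the RAM and the data path back to the core each get roughly half a clock period instead of a full one.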

What is missing from your timing diagram, because it is not included in your source, are models of the delays inherent in the generation of the address and the RAM data output. Because I can't keep many of these behaviors in the forefront of my mind's eye when looking at timing diagrams, I include a #1 delay statement in all of my synchronous logic statements. Including such a statement in the HDL raises a warning when synthesizing for the FPGA, but I simply mark the warning as OK and have the tool ignore it. It does, however, force the simulator to display the result shifted to the right by the amount I specified, which allows me to view the signal as being generated as a result of the edge instead of thinking of it as being correctly asserted on the edge.

The combinatorial (continuous assignments) signals do not include that delay statement, but because any synchronous signals include a slight delay, the results of the combinatorial signals will also be shifted to the right. In this way, I can keep track of which signals are generated (delayed from) a clock edge and which are expected to be valid on (asserted before) a clock edge.
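As an illustration of the convention described in the last two paragraphs, a sketch might look like this. The module and signal names are invented, and writing the delay as an intra-assignment `#1` on a non-blocking assignment is an assumption about the exact form used. Synthesis tools ignore the delay (typically with a warning), so it only affects the simulated waveform.

```verilog
`timescale 1ns / 1ps

// Hypothetical demo of a #1 delay on synchronous assignments.
module delay_demo_sketch(clk, d, q, q_plus1);

input clk;
input [7:0] d;
output reg [7:0] q;      // synchronous: changes 1 time unit after the edge
output [7:0] q_plus1;    // combinatorial: no delay of its own

// The #1 shifts q to the right of the clock edge in the waveform,
// so it reads as "generated by" the edge rather than "valid on" it.
always @(posedge clk)
	q <= #1 d;

// No delay here, but because q moves late, q_plus1 moves late too.
assign q_plus1 = q + 8'd1;

endmodule
```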

One note of caution if you decide to adopt a similar approach. Don't spend too much time trying to "model" the logic delays of your logic with the delay statements I recommended you add to your HDL. That is a losing proposition; the tools do that automatically when synthesizing for the target. My recommended approach is only appropriate for the behavioral simulation phase, which doesn't care about the actual logic delays in the target. As I stated above, I use the approach primarily to help keep track of the generation and consumption of signals. Furthermore, I've encountered some problems in the simulation tools (ModelSim and ISim, in particular) where the simulator fails to provide a valid result. With the delay statements added, I've not encountered these issues further with either simulator for the projects I develop.

This post does not directly answer your question, but I think it should provide some insight to some of the problems that you seem to be having. One of my greatest frustrations in moving to HDL-based FPGA development 15+ years ago was the seeming disconnect between the code and the simulation results. I had to work hard to convince myself, by focusing on using the expected inferred logic templates the tool documentation provided, that the behavioral simulation results matched the synthesized logic behavior that I wanted. Once that confidence took hold, and I consistently applied the same logic templates in my HDL, I became much more efficient and could focus more on the abstract solution rather than the low level logic produced by the synthesizer. In fact, I am a big proponent of behavioral simulation, and I frequently use it to help me debug problems in boards with FPGAs by focusing my efforts outside of the FPGA because of the confidence that I have in the simulation/synthesis results.

Hope this helps as you continue the development of your project. Arlet's core is very efficient, and should be used as an example of good Verilog.

BigEd · Post by **BigEd** » Wed Jan 03, 2018 2:04 pm

(Umm, isn't it the other way around? Arlet's core produces outputs early, so it's easy to interoperate with on-chip synchronous RAMs, but needs a pipeline delay to interoperate with typical offchip async SRAMs. If a core takes the opposite tactic, as most do, it's easy to use external SRAM but you end up having to double-clock or inverse-clock the on-chip memories, which can effectively halve the speed.)

jblang · Post by **jblang** » Wed Jan 03, 2018 3:06 pm

Michael, I did get it working on the target board. Thanks for your detailed explanation about the differences between simulation and actual behavior of the FPGA. I will have to read it a few more times to fully digest it, but I will definitely keep it in mind going forward.

The first test program I wrote just copied the value of the board's 8 dip switches to its 8 LEDs inside a loop, and I confirmed that worked after I fixed my timing. After I was confident that the core was basically working, I added a digit multiplexer and registers to hold the segment values for the seven segment display. I created a lookup table in assembly language to translate from binary values to seven segment output, and I wrote some code to capture the lower nybble of the dip switches on startup and display that on one of the digits. Within the loop, I also continuously translate the full 8-bit dip switch value to hex and display that on the other two digits. All of this is working now and present in the github repo I linked.

I plan next to tackle interrupts via the push buttons, then move on to more complex peripherals like the VGA and UART.

Arlet's core is one of the few I've found that's small enough to run on the XC3S50A and I much prefer it to picoblaze because of the volume of software written for the 6502. I also played with light8080 but I'm not nearly as familiar with 8080 assembly and I don't like the fact that it uses one of the 3 block rams available on my board for microcode.

hoglet · Post by **hoglet** » Wed Jan 03, 2018 3:24 pm

In case it's of any use to you, there is a 65C02 version of Arlet's core available here:
https://github.com/hoglet67/verilog-6502

This was something BigEd and I worked on about 18 months ago.

There's also a forum thread on this here:
viewtopic.php?f=10&t=4224

It's only about 10% bigger, so it may still fit in your XC3S50A

Dave

jblang · Post by **jblang** » Wed Jan 03, 2018 3:47 pm

Hoglet, thanks for pointing me to your core. Yes, I believe it should fit. Arlet's 6502 core alone takes 40% of the slices on the XC3S50A, and my top module adds another 8% so far. I'll probably stick with the classic core for now since I cut my teeth programming the 6502 on the C64 (only last year--I'm a little late to the party). I will try out the 65C02 at some point though. I know it's got a few nice additions to the instruction set.

rwiker · Post by **rwiker** » Wed Jan 03, 2018 7:44 pm

hoglet wrote:

In case it's of any use to you, there is a 65C02 version of Arlet's core available here:
https://github.com/hoglet67/verilog-6502

This was something BigEd and I worked on about 18 months ago.

There's also a forum thread on this here:
viewtopic.php?f=10&t=4224

It's only about 10% bigger, so it may still fit in your XC3S50A

Dave

It should definitely fit: I plugged the 65C02 version into enso's code for the CHOCHI board, and it runs just fine. The CHOCHI board has an XC3S50A, an external RAM chip (10ns, 128kB, I think), and a soft UART.
