Arlet Ottens 6502

SpacedCowboy · Post by **SpacedCowboy** » Mon Dec 11, 2017 6:06 pm

So I'd like to use Arlet's 6502 in a design, but I'm having difficulties getting it to integrate with the blockram model I'm using. If I put byte 0xa0 (ASL A) into memory at the location pointed to by PC ($0000), set the interrupt vector to be (PC-1) and let the clock run for a bit, I see the below:

Image no longer available: http://qoldfire.oobergeek.net/imgs/arlet-6502-reset.png

It looks as though the CPU is running through its BRK routine, fetching the vector, jumping through it, and entering DECODE state, but the result coming back from RAM is only available on the following clock cycle, so IR is invalid, and we end up going through another BRK instead of successfully decoding the 'ASL A'.

The README on his git repository states

Quote:

Note: the 6502 core assumes a synchronous memory. This means that valid data (DI) is expected on the cycle *after* valid address. This allows direct connection to (Xilinx) block RAMs

... which seems to be what's happening - Address 0000 (PC) is presented 6 clocks after the 'reset' line is toggled low (shown by the marker), and the data for the instruction at that address appears on the next clock.

The CPU is linked through to the blockram just as you might expect:

Code:

module top
	(
	output		[15:0]		memAB,
	output		[7:0]		memDO,

	input 					clk,
	input					rst_n
	);

	////////////////////////////////////////////////////////////////////////////
    // Instantiate the 64k of BlockRAM for main memory
    ////////////////////////////////////////////////////////////////////////////
    wire					memWE;				// Write-enable
    wire	[7:0]			memDI;				// Data read from memory
    
    ram mem_inst
    	(
    	.clk(clk),
    		
    	.addr(memAB),
    	.we(memWE),
    	.di(memDI),
    	.do0(memDO)
    	);
	
	
	////////////////////////////////////////////////////////////////////////////
    // Instantiate the CPU
    ////////////////////////////////////////////////////////////////////////////
    reg						irq_n;				// External in: IRQ
    reg						nmi_n;				// External in: Non-maskable IRQ
    
	
	cpu cpu_inst
		(
		.clk(clk),								// System clock
		.reset(~rst_n),							// Active high reset
		
		.DI(memDO),								// Read:  data @ (address)
		.DO(memDI),								// Write: data to store @ (address)
		.AB(memAB),								// Address bus
		.WE(memWE),								// Write-enable
		
		.RDY(1'b1),								// Pause the CPU if low
		.IRQ(irq_n),							// external IRQ if low
		.NMI(nmi_n)							// non-maskable IRQ if low
		);

endmodule

And the blockram is a standard dual-ported inferred model that returns the data on the next clock:

Code:

module bram
	#(
	parameter DATA 		= 8,
	parameter ADDR		= 16
    )
    (
    // Port A
    input   wire                clka,
    input   wire                wea,
    input   wire    [ADDR-1:0]  addra,
    input   wire    [DATA-1:0]  dina,
    output  reg     [DATA-1:0]  douta,

    // Port B
    input   wire                clkb,
    input   wire                web,
    input   wire    [ADDR-1:0]  addrb,
    input   wire    [DATA-1:0]  dinb,
    output  reg     [DATA-1:0]  doutb
    );
    
    // Shared memory
    reg [DATA-1:0] mem [(2**ADDR)-1:0];
     
    // Port A
    always @(posedge clka) 
    	begin
        	if (wea) 
            		mem[addra]	 	<= dina;
        	else
              	douta      			<= mem[addra];
    	end
     
    // Port B
    always @(posedge clkb) 
    	begin
        	if (web) 
            		mem[addrb] 	   <= dinb;
        	else
               	doutb      		   <= mem[addrb];
    	end

endmodule

(There's an intermediate module called 'ram', but that is transparent for this discussion - it's a straight wire-through). I'm sure Arlet's design works well for others, so I was wondering if it's obvious what I'm doing wrong?

[Edit: Hmm, I noticed that BRK is supposed to be 7 cycles, not 6. Perhaps it's a timing thing... I'm also a little concerned that 'S' is going to 'xx' as a consequence of state BRK3 writing to the register...]

Cheers
Simon

Cray Ze · Post by **Cray Ze** » Mon Dec 11, 2017 7:01 pm

Both Xilinx and Altera BlockRAMs have an optional output register. This is probably on by default (certainly is on Altera) and could be causing the delay.
Just a guess as I've never used the core.

SpacedCowboy · Post by **SpacedCowboy** » Mon Dec 11, 2017 7:06 pm

Yep, agreed, but this is only in simulation, and I'm actually using iverilog/gtkwave, so there's no chance of the extra clock delay being introduced.

Looking at the signals, they seem reasonable - the data is appearing one clock after the address is presented. I did realise (after posting the main text) that the BRK instruction needs 7 cycles, and the core seems to be jumping to DECODE after only 6. That doesn't explain why it works for everyone else, but it's at least plausible for why I'm seeing the issue.

Thanks for the idea, though

barrym95838 · Post by **barrym95838** » Tue Dec 12, 2017 1:25 am

SpacedCowboy wrote:

... If I put byte 0xa0 (ASL A) into memory at the location pointed to by PC ($0000), set the interrupt vector to be (PC-1) and let the clock run for a bit, I see the below ...

Forgive me if I'm being unintentionally patronizing (or clueless of the true nature of your problem), but:

0xa0 is the opcode for LDY #
0x0a is the opcode for ASL A

As far as I know, the vectors should point directly to the code you wish to execute, with no (-1) offset.

Mike B.

SpacedCowboy · Post by **SpacedCowboy** » Tue Dec 12, 2017 2:37 am

barrym95838 wrote:

SpacedCowboy wrote:

... If I put byte 0xa0 (ASL A) into memory at the location pointed to by PC ($0000), set the interrupt vector to be (PC-1) and let the clock run for a bit, I see the below ...

Forgive me if I'm being unintentionally patronizing (or clueless of the true nature of your problem), but:

0xa0 is the opcode for LDY #
0x0a is the opcode for ASL A

You're right. It was a typo in the post - you can see the result being returned by the testbench code is in fact 0x0a (the clock after the marker, on the 'do' line).

barrym95838 wrote:

As far as I know, the vectors should point directly to the code you wish to execute, with no (-1) offset.

Mike B.

Well, when I set the vector contents to be 0x0000, the first read address on the bus is 0x0001, so I subtracted 1

The problem with Arlet's code is it's so damn well-written. It's wonderfully concise, maddeningly elegant, and (due to my relatively limited ability) confusing as hell. I have a design for a 6502 which I wrote myself - it's way larger, a lot slower, and (probably only to me, because I wrote it) far easier to understand

... His is so much more efficient, though, that I'd really like to use it if I can.

To put that into context - his I can get to run at ~100MHz on an S7-50 if I use area-optimised synthesis/implementation. Mine tops out at ~66 MHz, which is pretty pitiful for an 8-bit CPU running on the part in question. Mine does have the saving grace of passing my testbench though [grin], using the same ram definition.

BigEd · Post by **BigEd** » Tue Dec 12, 2017 7:04 am

I'm sure we can find a way to understand and solve this! It would be easiest if someone - maybe me - could run up a simulation of a known-working Arlet-based design and compare notes. But I'm away from keyboard right now. Some notes and observations:
- the reset vector is at FFFC/FFFD so we expect those two locations to be read
- it would be handy if you'd put two different bytes in those locations so we see the expected transfer
- the vector should point directly to the first instruction of the reset routine - the PC-1 business is a clue that something is wrong, or is misinterpreted
- we are seeing something like a BRK executed, but the IR seems to be FF, so we're probably seeing an undefined behaviour not a true BRK. But this is confusing me. Perhaps we're seeing an IRQ??
- Arlet's core is indeed a work of art, and is known working, so if something goes wrong it's most likely the glue or the memory model.
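The vector suggestions above can be sketched as follows. This is only a minimal illustration, assuming a standalone simulation array rather than the poster's actual `ram` module; the module name `vector_demo`, the array `mem`, and the start address $0200 are hypothetical. The two vector bytes are deliberately different so the low/high fetch order would be visible on the bus.

```verilog
// Hypothetical illustration only; not taken from the poster's testbench.
module vector_demo;
    reg [7:0] mem [0:65535];            // stand-in for the 64k RAM

    initial begin
        // Reset vector: low byte at $FFFC, high byte at $FFFD.
        mem[16'hFFFC] = 8'h00;
        mem[16'hFFFD] = 8'h02;          // vector = $0200, no -1 offset
        mem[16'h0200] = 8'h0A;          // ASL A (0x0A, not 0xA0)

        $display("reset vector = %h%h, first opcode = %h",
                 mem[16'hFFFD], mem[16'hFFFC], mem[16'h0200]);
        $finish;
    end
endmodule
```

With distinct bytes in $FFFC and $FFFD, a swapped or off-by-one vector shows up immediately in the first address driven after the vector fetch.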

hoglet · Post by **hoglet** » Tue Dec 12, 2017 12:07 pm

It looks like the reset sequence is happening correctly:
BRK0, BRK1, BRK2, BRK3, JMP0, JMP1, DECODE

However, the reset vector (at FFFC/FFFD) seems to be FFFF, and so execution is starting at FFFF, which is undefined (XX). This is why I think the IR then goes to XX, and at that point all bets are off.

How are you loading your RAM? You need to set FFFC/FFFD to point to 0000.

It would help if you could post the complete code (e.g. in a Zip file), rather than just fragments.

Here's a couple of other suggestions:

1. Make sure that irq_n and nmi_n are initialized to '1'; at the moment it looks like they will be 'X'.

2. Try asserting reset for several cycles, rather than just one cycle.
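Both suggestions can be sketched in a small testbench fragment. This is an assumption-laden example, not the poster's code: the signal names follow the `top` module earlier in the thread, while the module name `reset_demo`, the 10 ns clock period, and the four-cycle reset length are arbitrary choices. The level at which the core treats its interrupt inputs as inactive should be checked against the core's own source.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench fragment; clock period and reset length are arbitrary.
module reset_demo;
    reg clk   = 1'b0;
    reg rst_n = 1'b0;                   // start with reset asserted (active low)
    reg irq_n = 1'b1;                   // driven to a known level, never 'X'
    reg nmi_n = 1'b1;

    always #5 clk = ~clk;               // 10 ns clock period

    initial begin
        repeat (4) @(posedge clk);      // hold reset across several clock edges
        @(negedge clk);                 // change rst_n away from the active edge
        rst_n = 1'b1;                   // release reset
        repeat (2) @(posedge clk);
        $display("rst_n=%b irq_n=%b nmi_n=%b", rst_n, irq_n, nmi_n);
        $finish;
    end
endmodule
```

Releasing reset on the falling edge keeps the change away from the edge the CPU samples on, which avoids simulator race conditions in a fragment like this.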

Dave

SpacedCowboy · Post by **SpacedCowboy** » Tue Dec 12, 2017 3:45 pm

So, mea culpa.

I put (PC-1) into the reset vector, because I misunderstood one of the bus-read cycles. Putting the actual *correct* vector (of PC) into the reset vector fixed that particular issue. I can see the 'ASL A' instruction appear during the DECODE cycle:

Image no longer available: http://qoldfire.oobergeek.net/imgs/arlet-6502-reset.png

It's still not actually doing what I would expect (A enters the instruction as 0x80, and remains unchanged until the next reset) but it's definitely progress

If anyone's interested, I've attached the design as a zip file. Assuming you have icarus verilog installed, it's a matter of (cd srcs; make) to build and run the testbench code. Note that I don't actually expect the testbench code to report results correctly yet - I'm still porting it from my own core. I do expect the actual internal state to be correct after an instruction though.

Oh, and if your terminal software doesn't emulate a vt102 very well (with colour attributes), the output might look a little garbled. It ought to look something like:

Image no longer available: http://qoldfire.oobergeek.net/imgs/6502-pass.png

... except there are a lot more tests

Instead, I'm currently seeing:

Image no longer available: http://qoldfire.oobergeek.net/imgs/6502-fail.png

hoglet · Post by **hoglet** » Tue Dec 12, 2017 5:16 pm

SpacedCowboy wrote:

It's still not actually doing what I would expect (A enters the instruction as 0x80, and remains unchanged until the next reset) but it's definitely progress

I think that's now working.

In the cycle where the second reset happens, you end up with: A=00 C=1 Z=1 which is the correct result of ASL A on A=80.

Move the second reset further out, and it might become clearer.

It's worth noting that an instruction's write-back to the registers/flags doesn't actually happen until the end of the fetch of the next instruction. This kind of pipelining exists in the original 6502 as well.

Dave

SpacedCowboy · Post by **SpacedCowboy** » Tue Dec 12, 2017 5:22 pm

Hmm. Ok, yes I can see that if I extend the 'check' function in the testbench by another clock, A <= 0 in the clock before reset is asserted. Cool - I didn't actually know the register-updates were delayed that much.

Ok, so I can adapt the testbench to that, and hopefully we're done

Well, apart from figuring out why the stack pointer is undefined

Thanks very much for all the help

hoglet · Post by **hoglet** » Tue Dec 12, 2017 5:36 pm

SpacedCowboy wrote:

Well, apart from figuring out why the stack pointer is undefined

On an original 6502, reset does not initialise the stack pointer. So Arlet's core is correct here.

To get a sane value, you need to execute LDX #$FF then TXS.

Dave

SpacedCowboy · Post by **SpacedCowboy** » Tue Dec 12, 2017 5:43 pm

hoglet wrote:

SpacedCowboy wrote:

Well, apart from figuring out why the stack pointer is undefined

On an original 6502, reset does not initialise the stack pointer. So Arlet's core is correct here.

To get a sane value, you need to execute LDX #$FF then TXS.

Dave

Ok, makes sense I guess.

I wasn't expecting it to be actively destroyed, that's all. It starts off as 0xFF at the beginning of the reset sequence and state BRK3 turns it into XX. I can work around that ...

hoglet · Post by **hoglet** » Tue Dec 12, 2017 6:06 pm

SpacedCowboy wrote:

I wasn't expecting it to be actively destroyed, that's all. It starts off as 0xFF at the beginning of the reset sequence and state BRK3 turns it into XX. I can work around that ...

The reset sequence follows the same state transitions as BRK does, which normally results in PCH, PCL and P being written to the stack:

Code:

    BRK0   = 6'd8,  // BRK/IRQ - push PCH, send S to ALU (-1)
    BRK1   = 6'd9,  // BRK/IRQ - push PCL, send S to ALU (-1)
    BRK2   = 6'd10, // BRK/IRQ - push P, send S to ALU (-1)
    BRK3   = 6'd11, // BRK/IRQ - write S, and fetch @ fffe

https://github.com/Arlet/verilog-6502/b ... cpu.v#L181

In BRK0 the stack pointer is read, and in BRK3 the new value is written back. The intermediate values just circulate through the ALU, and are not actually written back to S.

In your test harness you are forcing S to $FF at the end of BRK0, but that doesn't help, as a value of XX has already been read and passed to the ALU. This explains why in BRK3 S becomes XX again.

You need to force S to $FF at least one cycle before releasing reset.

I would be tempted to extend the duration of reset to 3 cycles.

Dave

SpacedCowboy · Post by **SpacedCowboy** » Tue Dec 12, 2017 7:01 pm

hoglet wrote:

SpacedCowboy wrote:

I wasn't expecting it to be actively destroyed, that's all. It starts off as 0xFF at the beginning of the reset sequence and state BRK3 turns it into XX. I can work around that ...

The reset sequence follows the same state transitions as BRK does, which normally results in PCH, PCL and P being written to the stack:

Code:

    BRK0   = 6'd8,  // BRK/IRQ - push PCH, send S to ALU (-1)
    BRK1   = 6'd9,  // BRK/IRQ - push PCL, send S to ALU (-1)
    BRK2   = 6'd10, // BRK/IRQ - push P, send S to ALU (-1)
    BRK3   = 6'd11, // BRK/IRQ - write S, and fetch @ fffe

https://github.com/Arlet/verilog-6502/b ... cpu.v#L181

In BRK0 the stack pointer is read, and in BRK3 the new value is written back. The intermediate values just circulate through the ALU, and are not actually written back to S.

In your test harness you are forcing S to $FF at the end of BRK0, but that doesn't help, as a value of XX has already been read and passed to the ALU. This explains why in BRK3 S becomes XX again.

You need to force S to $FF at least one cycle before releasing reset.

I would be tempted to extend the duration of reset to 3 cycles.

Dave

The force of S->$FF happens inside the resetState() call in the test harness. Changing the code there around slightly as you suggest...

Code:

	////////////////////////////////////////////////////////////////////////////
	// Set up a task to set the internal state of the CPU
	////////////////////////////////////////////////////////////////////////////
	task setState;
		input	[7:0]		sr;
		input	[15:0]	  pc;
		input	[7:0]		a;
		input	[7:0]		x;
		input	[7:0]		y;
		input	[7:0]		sp;
	 	
		begin
			rst_n 		    = 1'b0;
			setPC(pc);
			setA(a);
			setX(x);
			setY(y);
			setPS(sr);
			setSP(sp);
			counter 			= 0;
			setMem(16'hfffc, pc[7:0]);			// reset vector, low byte
			setMem(16'hfffd, pc[15:8]);			// reset vector, high byte

        #30 	rst_n 		= 1'b1;
		end
	endtask

	////////////////////////////////////////////////////////////////////////////
	// Set up a task to set the internal state of the CPU, default params
	////////////////////////////////////////////////////////////////////////////
	task resetState;
		setState(8'h20,0,0,0,0,8'hff);
	endtask

... does produce the expected waveforms:

Image no longer available: http://qoldfire.oobergeek.net/imgs/6502 ... sl-rol.png

But with the (totally expected) side-effect of SP <= $FC. Given that the I flag is also set (correctly) by BRK3, and nothing ever resets it, I'm thinking the proper way to go about this is to inject more code for every test. Instead of having it just test the single instruction I'm passing in, I probably ought to make it do

Code:

LDX #$FF
TXS
LDX #$0
CLI
CLC
CLD
{instruction sequence}

... or similar (modified, if I'm actually testing, for example, CLI) each time. This'll mean going through the entire testbench again to make the expectations match the results, but it's more realistic in terms of what real code would do on a real 6502.
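As a rough sketch of how such a preamble could be injected, the task below writes the opcode bytes for that sequence ahead of a single-byte instruction under test and points the reset vector at it. Everything named here (`preamble_demo`, `mem`, `load_test`, the $0200 base address) is hypothetical and stands in for whatever the real testbench uses; multi-byte instructions would need an extended version.

```verilog
// Hypothetical sketch; names and the $0200 base address are illustrative only.
module preamble_demo;
    reg [7:0] mem [0:65535];            // stand-in for the 64k RAM
    integer i;

    // Write the common preamble at 'base', then one single-byte opcode.
    task load_test;
        input [15:0] base;
        input [7:0]  opcode;
        begin
            mem[base + 16'd0] = 8'hA2;  // LDX #$FF
            mem[base + 16'd1] = 8'hFF;
            mem[base + 16'd2] = 8'h9A;  // TXS
            mem[base + 16'd3] = 8'hA2;  // LDX #$00
            mem[base + 16'd4] = 8'h00;
            mem[base + 16'd5] = 8'h58;  // CLI
            mem[base + 16'd6] = 8'h18;  // CLC
            mem[base + 16'd7] = 8'hD8;  // CLD
            mem[base + 16'd8] = opcode; // instruction under test
            mem[16'hFFFC] = base[7:0];  // reset vector, low byte
            mem[16'hFFFD] = base[15:8]; // reset vector, high byte
        end
    endtask

    initial begin
        load_test(16'h0200, 8'h0A);     // test ASL A
        for (i = 0; i < 9; i = i + 1)
            $display("%h: %h", 16'h0200 + i, mem[16'h0200 + i]);
        $finish;
    end
endmodule
```

A test of CLI, CLC or CLD itself would want the matching line of the preamble changed so the instruction under test has a visible effect.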

That's a lot of changes though [sigh]. I could always accept that the thing is thoroughly tested already, and actually works [grin].

hoglet · Post by **hoglet** » Tue Dec 12, 2017 7:10 pm

Why not just run Klaus Dormann's excellent 6502 functional test suite?
https://github.com/Klaus2m5/6502_65C02_functional_tests
