My new verilog 65C02 core.

Arlet · Post by **Arlet** » Fri Nov 13, 2020 2:04 pm

Good find, and accurate debugging. Latest github has a fix.

The same bug may be present in other instructions as well. I'll have to check. The problem was that the PC wasn't updated in the last cycle of the JMP (IND). This is only an issue when the instruction is directly followed by an interrupt, so it doesn't show up in the test suite.

Edit: I've checked the microcode, and for every instruction, the last cycle loads the PC, so should be good now.

hoglet · Post by **hoglet** » Fri Nov 13, 2020 2:21 pm

Arlet wrote:

Good find, and accurate debugging. Latest github has a fix.

That's working much better, it no longer hangs on reset and I can type things in!

It runs 14.5% faster just through improved CPI:

[Attachment: capture0.png]

But for some reason the system is hanging when I run the DORMANN tests.

I'll see if I can debug this now.

Dave

Arlet · Post by **Arlet** » Fri Nov 13, 2020 2:39 pm

Nice.

hoglet · Post by **hoglet** » Fri Nov 13, 2020 3:07 pm

hoglet wrote:

But for some reason the system is hanging when I run the DORMANN tests.

It looks like the issue is also interrupt related, and occurs in test &2A which is the decimal mode add/subtract test.

The trace is a bit strange. Ignore the addresses, as they are bogus. They are inferred by my 6502Decoder tool, and it's getting a bit confused by the different data bus cycle order for JSR and RTS.

Here's a fragment of code that works:

Code: Select all

C21C : A5 53    : LDA 53         : 3
C21E : 79 04 31 : ADC 3104,Y     : 5
C221 : 08       : PHP            : 3
C222 : C5 55    : CMP 55         : 3
C224 : F0 03    : BEQ C229       : 2
C226 : 68       : PLA            : 3
C227 : 29 01    : AND #01        : 2
C229 : C5 56    : CMP 56         : 3
C22B : F0 03    : BEQ C230       : 2
C22D : 28       : PLP            : 3
C22E : 08       : PHP            : 3

Here's the same code next time it executes:

Code: Select all

C21C : A5 53    : LDA 53         : 3
C21E : 79 04 31 : ADC 3104,Y     : 5
C221 : 08       : PHP            : 3
C222 : C5 55    : CMP 55         : 3
C224 : F0 03    : BEQ C229       : 2
C226 : 68       : PLA            : 3
C227 : 29 01    : AND #01        : 2
C229 : C5 56    : CMP 56         : 2
C22B : 56 56    : LSR 56,X       : 2
C22D : 56 56    : LSR 56,X       : 2
C22F : 56 56    : LSR 56,X       : 2
C231 : 56 56    : LSR 56,X       : 2

I'm guessing that an interrupt is occurring, and is causing the microcode to crash.

For some reason the CMP 56 is only taking 2 cycles, whereas previously it took 3.

Here's the data bus cycles:

Code: Select all

0 68 1 1 1
1 29 0 1 1
2 f9 0 1 1
C226 : 68       : PLA            : 3
0 29 1 1 1
1 01 0 1 1
C227 : 29 01    : AND #01        : 2
0 c5 1 1 1
1 56 0 1 1
C229 : C5 56    : CMP 56         : 2
0 56 1 0 1
1 56 0 1 1
C22B : 56 56    : LSR 56,X       : 2
0 56 1 0 1
1 56 0 1 1
C22D : 56 56    : LSR 56,X       : 2
0 56 1 0 1
1 56 0 1 1
C22F : 56 56    : LSR 56,X       : 2

There is no sign of the interrupt being taken, but if I disable interrupts before the test starts, then it runs to completion.

Edit: Looking at the microcode, it does look like the IRQ handler is missing from the upper bank (0x180-0x1FF). I think it should be there at 0x1E0, but that jumps to the NMI handler.

Dave

Arlet · Post by **Arlet** » Fri Nov 13, 2020 3:29 pm

Ok, I can reproduce it in simulation. Same code with CLD works fine, with SED it crashes.

Looks like a design flaw in the decimal/irq/nmi handling. I need to rethink that.

Arlet · Post by **Arlet** » Fri Nov 13, 2020 4:00 pm

Ok, it wasn't so bad. It was basically the same bug I had before on my hardware. I had fixed it, but then reintroduced the same bug when I added support for NMI.

Luckily, it was just a matter of microcode updates. Fix in github.

hoglet · Post by **hoglet** » Fri Nov 13, 2020 4:18 pm

Arlet wrote:

Luckily, it was just a matter of microcode updates. Fix in github.

Excellent, that works now, and the Dormann tests pass:

[Attachment: capture1.png]

The next one is slightly strange, and I don't think it's interrupt related:

[Attachment: capture2.png]

It seems to be something setting the overflow flag when it shouldn't.

Let me dig a bit further....

Arlet · Post by **Arlet** » Fri Nov 13, 2020 4:36 pm

I just checked the V flag handling, and it looks correct to me.

hoglet · Post by **hoglet** » Fri Nov 13, 2020 4:53 pm

Here's a trace of this going wrong.

Code: Select all

0 6a 1 1 1
E7BC : 6A       : ROR A          : 1
0 28 1 1 1
1 2a 0 1 1
2 b1 0 1 1
E7BD : 28       : PLP            : 3
0 2a 1 1 1
E7BE : 2A       : ROL A          : 1
0 68 1 1 1
1 b8 0 1 1
2 8c 0 1 1
E7BF : 68       : PLA            : 3
0 b8 1 1 1
E7C0 : B8       : CLV            : 1
0 60 1 1 1
1 a0 0 1 1
2 73 0 1 1
3 e3 0 1 1
E7C1 : 60       : RTS            : 4
0 70 1 1 1
1 9a 0 1 1
E374 : 70 9A    : BVS E310       : 2
0 00 1 1 1
1 fe 0 1 1
2 e3 0 0 1
3 12 0 0 1
4 f0 0 0 1
5 1c 0 1 1
6 dc 0 1 1
pc: prediction failed at E310 old pc was 73A3
E310 : 00 FE    : BRK #FE        : 7

It looks like CLV hasn't cleared the V flag, as evidenced by the BVS E310 being taken, which then results in a "Bad Command" error.

But this would get picked up by the Dormann tests, so it must be more subtle!

Dave

Arlet · Post by **Arlet** » Fri Nov 13, 2020 5:08 pm

I can reproduce it in sims, but only with the PLA right before the CLV.

hoglet · Post by **hoglet** » Fri Nov 13, 2020 5:24 pm

Arlet wrote:

I can reproduce it in sims, but only with the PLA right before the CLV.

In the code that's going wrong, there is a PLA immediately before the CLV:

Code: Select all

>> d e7bd
E7BD : 28       : PLP 
E7BE : 2A       : ROL A
E7BF : 68       : PLA 
E7C0 : B8       : CLV 
E7C1 : 60       : RTS

Dave

Arlet · Post by **Arlet** » Fri Nov 13, 2020 5:33 pm

Ah, found the problem. In the ALU, I reconstruct the internal carry from bit 7 by XOR of both inputs and the output.

Code: Select all

wire BC7 = adder[7] ^ R[7] ^ M[7];

Where 'M' is the last value read from memory. The problem is that I changed the ALU a while ago to use a separate register 'BI' instead of 'M', in order to speed up the BCD logic, and I forgot to change it in the carry logic as well. Normally, that's not a problem, because M and BI will be the same. But CLV is a single-cycle instruction, which means that, unlike every other instruction, it does not load the M/BI registers itself, but inherits them from the instruction before it.

Edit: there are other single-cycle instructions, obviously, but none of them use the M/BI registers, because there's no time to load them in a single cycle.

Normally, that wouldn't even be a problem, but the CLV uses a trick to load V with the ALU overflow output, instead of clearing it to zero.

Quite subtle indeed, and only a problem for the generic code. The Spartan-6 specific code just grabs the value from the internal carry chain instead.
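To make the failure mode concrete, here is a minimal self-checking sketch of the carry reconstruction described above. It is not the core's actual ALU: the module and the names `a`, `b`, `stale`, `low`, `c7_good`, `c7_bad`, `v_good` and `v_bad` are all hypothetical. It assumes a plain 8-bit binary add, where the reconstructed value is the carry into bit 7 and the overflow flag is that carry XORed with the carry out. The testbench compares the formula using the real operand against the same formula using a leftover register whose bit 7 disagrees.

```verilog
// Illustrative sketch only: not the core's actual ALU. All names are hypothetical.
module carry_sketch;
    reg  [7:0] a, b;     // the two operands actually fed to the adder
    reg  [7:0] stale;    // leftover value from an earlier instruction
    reg        ci;       // carry in

    wire [8:0] sum = a + b + ci;             // sum[8] is the carry out of bit 7
    wire [7:0] low = a[6:0] + b[6:0] + ci;   // low[7] is the true carry into bit 7

    // Reconstruction from the operands that were really added.
    wire c7_good = sum[7] ^ a[7] ^ b[7];
    // Same formula, but using a register the adder did not use.
    wire c7_bad  = sum[7] ^ a[7] ^ stale[7];

    // Signed overflow: carry into bit 7 differs from carry out of bit 7.
    wire v_good = c7_good ^ sum[8];
    wire v_bad  = c7_bad  ^ sum[8];

    integer i, errors;
    initial begin
        errors = 0;
        // Walk through every combination of carry in and both operands.
        for (i = 0; i < 131072; i = i + 1) begin
            {ci, a, b} = i[16:0];
            stale = ~b;              // force bit 7 to disagree with the real operand
            #1;                      // let the continuous assignments settle
            if (c7_good !== low[7]) errors = errors + 1;  // should always match
            if (v_bad === v_good)   errors = errors + 1;  // should always be flipped
        end
        $display("errors = %0d", errors);
        $finish;
    end
endmodule
```

Whenever bit 7 of the leftover register differs from bit 7 of the operand the adder really used, the reconstructed carry is inverted, and so is any overflow value derived from it. That matches the symptom in the trace, where V depended on what the preceding PLA had left behind.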

Fix in github.

hoglet · Post by **hoglet** » Fri Nov 13, 2020 5:53 pm

Arlet wrote:

Fix in github.

Great, now I can play Planetoid (badly!):

[Attachment: capture4.png]

I'll let you know if I find anything else.

Next, to try running at speed!

Arlet · Post by **Arlet** » Fri Nov 13, 2020 6:12 pm

I loved playing Planetoid when I was a kid. I didn't have a Beeb myself, but every Saturday I would go to this computer store where they had a bunch of demo machines set up.

Arlet · Post by **Arlet** » Sat Nov 14, 2020 6:59 am

Arlet wrote:

I wrote a little tool to extract the source register used in the first cycle, and show the results in a 16x16 opcode grid.

Code: Select all

S X - - Z Z Z - S - A - - - - - 
- Z Z - Z X X - - - A - - - - - 
S X - - Z Z Z - S - A - - - - - 
- Z Z - X X X - - - A - - - - - 
S X - - - Z Z - S - A - - - - - 
- Z Z - - X X - - - S - - - - - 
S X - - Z Z Z - S - A - - - - - 
- Z Z - X X X - - - S - - - - - 
- X - - Z Z Z - Y - X - - - - - 
- Z Z - X X Y - Y - X - - - - - 
- X - - Z Z Z - A - - - - - - - 
- Z Z - X X Y - - - - - - - - - 
- X - - Z Z Z - Y - - - - - - - 
- Z Z - - X X - - - S - - - - - 
- X - - Z Z Z - - - - - - - - - 
- Z Z - - X X - - - S - - - - -

I was staring at this for a considerable time, and then I decided to try something different. I had my tool generate a big 'case' statement, one line for each specified register. Like so:

Code: Select all

case( opcode )
        8'h00: out = S;
        8'h01: out = X;
        8'h04: out = Z;
        8'h05: out = Z;
        8'h06: out = Z;
        8'h08: out = S;
        8'h0a: out = A;
        8'h11: out = Z;
        8'h12: out = Z;
       .. etcetera ..

I added an extra 'sync' input to mark the first cycle, and a 3-bit binary output to select the register number. I ran it through ISE, and it came up with a design with only 9 LUTs, which is quite excellent. Apparently, the tools are pretty good at optimizing this kind of logic. If I start with a working microcode version, and then automatically generate code for big 'case' statements (with as many "don't cares" as possible), the tools should be able to optimize that to compact logic.
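For readers who want to see what such a generated module might look like as a whole, here is a rough sketch. Only the opcode-to-register entries for the first two rows of the grid above are filled in. The module name `regsel_sketch`, the output name `sel` and the 3-bit encoding of A, X, Y, S and Z are assumptions for illustration, not taken from the actual core. Every unlisted opcode, and every cycle where `sync` is low, is left as an explicit don't-care so the synthesis tool is free to pick whatever is cheapest.

```verilog
// Hypothetical sketch of a generated register-select decoder.
// The encoding below is an assumption; only grid rows 0 and 1 are filled in.
module regsel_sketch(
    input            sync,    // high during the first cycle of an instruction
    input      [7:0] opcode,
    output reg [2:0] sel      // register number for the first cycle
);
    localparam [2:0] SEL_A = 3'd0,
                     SEL_X = 3'd1,
                     SEL_Y = 3'd2,
                     SEL_S = 3'd3,
                     SEL_Z = 3'd4;

    always @* begin
        sel = 3'bxxx;                    // don't care outside the first cycle
        if (sync)
            case (opcode)
                // row 0 of the grid
                8'h00: sel = SEL_S;
                8'h01: sel = SEL_X;
                8'h04: sel = SEL_Z;
                8'h05: sel = SEL_Z;
                8'h06: sel = SEL_Z;
                8'h08: sel = SEL_S;
                8'h0a: sel = SEL_A;
                // row 1 of the grid
                8'h11: sel = SEL_Z;
                8'h12: sel = SEL_Z;
                8'h14: sel = SEL_Z;
                8'h15: sel = SEL_X;
                8'h16: sel = SEL_X;
                8'h1a: sel = SEL_A;
                // remaining rows omitted in this sketch
                default: sel = 3'bxxx;   // '-' in the grid: don't care
            endcase
    end
endmodule
```

The `3'bxxx` defaults carry the "don't care" information to the synthesizer. In simulation they show up as X, which also makes it easy to spot any place where a don't-care value is accidentally used.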

Not only does it save a bunch of work, it should also be bug-free, and easy to modify if desired.

The crux is to identify all the "don't cares". You can't just take the current microcode table and convert it to distributed logic, and hope to get something compact. But it's possible to write a tool that looks at each line of microcode and determines whether the ALU result is used, or whether a register needs to be read.
