My new verilog 65C02 core.

BigEd · Post by **BigEd** » Tue Nov 10, 2020 11:22 am

Thanks! That's the sort of block diagram I like, with clock boundaries visible.

(Would you use diagrams like this to do the design, or to evaluate ideas?)

Arlet · Post by **Arlet** » Tue Nov 10, 2020 11:40 am

Me too. I saw it first in James Bowman's description of the J1 CPU, and I thought it made it very clear, so I wanted to do something similar.

I didn't draw any diagrams during the design, except maybe a few quick sketches of some details. I usually start with Verilog, make a very simple skeleton structure, and then use the simulator to draw the waveforms for me or, in the case of this core, to show the internal state for each clock with $display() lines.

I was actually thinking about writing some sort of blog on the design process.

BigEd · Post by **BigEd** » Tue Nov 10, 2020 12:05 pm

I'd read it! And I'd share it too...

Arlet · Post by **Arlet** » Wed Nov 11, 2020 10:54 am

I've ported nearly all of the Spartan-6 optimizations and changes back into the generic version. It passes Klaus' test suite, but I haven't tried it yet on a real board with IRQ and RDY.

There are still a few little things that need to be ported, mostly related to flag handling and BCD adjustment. It's not strictly necessary to port those, but I want to keep the two versions as close as possible, so that the generic version can act as documentation for the Spartan-specific version.

hoglet · Post by **hoglet** » Wed Nov 11, 2020 10:58 am

Do you think it might be possible to expose the sync signal on the interface?

I have a few projects that make use of this.

Dave

Arlet · Post by **Arlet** » Wed Nov 11, 2020 11:19 am

Ok, done.

I already had the internal signal, so it was just a matter of adding it to the port declaration.

Arlet · Post by **Arlet** » Wed Nov 11, 2020 11:28 am

Note that because some instructions are single cycle, you can now have 'sync' asserted for 2 cycles in a row. Some external logic may not be prepared for this, so be aware.
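To make the difference concrete, here is a small Python sketch (not part of the core) comparing two ways external logic might count opcode fetches from SYNC. The sample is the CLV / JMP (022C) sequence from the BBC Micro bus trace quoted later in the thread, where the one-cycle CLV is immediately followed by the JMP opcode fetch. The function names are hypothetical, and treating a cycle with RDY low as a stall rather than a completed fetch is an assumption.

```python
def fetches_by_edge(sync):
    """Count rising edges of SYNC. This undercounts when SYNC stays high
    for consecutive cycles (back-to-back single-cycle instructions)."""
    count, prev = 0, 0
    for s in sync:
        if s and not prev:
            count += 1
        prev = s
    return count


def fetches_by_level(sync, rdy=None):
    """Count every cycle in which SYNC is high as one opcode fetch.
    Assumption: a cycle with RDY low is a stall, not a completed fetch."""
    if rdy is None:
        rdy = [1] * len(sync)
    return sum(1 for s, r in zip(sync, rdy) if s and r)


# CLV (1 cycle) followed by JMP (022C) (5 cycles), as in the BBC Micro trace.
sync = [1, 1, 0, 0, 0, 0]
print(fetches_by_edge(sync), fetches_by_level(sync))  # 1 2
```

Edge-based logic sees only one fetch here, while level-based counting sees both instructions.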

Arlet · Post by **Arlet** » Wed Nov 11, 2020 1:26 pm

By the way, if anyone is using this core for a project, I'd love to hear about it.

okwatts · Post by **okwatts** » Wed Nov 11, 2020 11:23 pm

Following with interest especially the generic version.
I am no FPGA programmer, but I did implement the KIM-1 (6502) version on a Cyclone II, as Stephen Edwards described in a presentation (http://www.cs.columbia.edu/~sedwards/presentations.html), minus the keys and LEDs. I did need to update to your original CPU and ALU rather than the ones he included in the downloads. His version would not execute code started with the G command (which, in the ROM code, finishes with an RTI), but after updating to your version it ran fine. There is also room to increase the small on-chip RAM to 5K. It makes for great fun reliving the past (that and Oscarv's KIMUNO); now I need to dust off The First Book of KIM and other old code to see how much I can revive.
I have little HDL knowledge and got into this mostly through the retrobrew site, implementing Grant Searle's and Neil Crook's (6809) code on that old Cyclone II module. I am now able to run Carl Moser's ASSMTED and FIG-forth, as well as some Tiny Basic etc., thanks to the work of Maik Merten and Daryl Rictor, borrowing their code to make the SD card and FAT files work with a small monitor that reads and writes the SD card. Their stuff is VHDL-based and Stephen Edwards' is Verilog, so it's been a challenge to understand even that. Daryl has a monitor for the 65C02, so it would be nice to experiment with this new 65C02 version, but it will be a challenge for me.
Thanks for keeping the 6502 and family alive for those who were in the first phase of this "home computing" revolution!

Arlet · Post by **Arlet** » Thu Nov 12, 2020 9:46 am

I have added support for RDY & NMI to the generic code, which means that the generic version is now complete.

The Spartan-6 version is still missing proper NMI handling, but supports everything else.

Arlet · Post by **Arlet** » Thu Nov 12, 2020 4:22 pm

NMI support also ported to Spartan-6 version.

I noticed that the Spartan-6 version combines SYNC with RDY to save a LUT, so when you deassert RDY, SYNC goes low as well. I'll have to fix that.

Arlet · Post by **Arlet** » Thu Nov 12, 2020 7:47 pm

Oops, I accidentally ended up editing the first message instead of quoting it.

Final design, generic version: still around 60 slices (there's some random variation from run to run). Connected to a single block RAM, it meets an 8 ns constraint out of the box, and 7 ns with SmartExplorer pushing.

The Spartan-6 version has 120 LUTs, around 50 slices when synthesized without any placement constraints, and reaches around 6.6 ns with SmartExplorer. I tried hand-placing a couple of things, but that only made things worse. I'll have to test some more, but I doubt much improvement is possible.

The longest path in nearly all cases is microcode ROM -> register file -> ALU adder -> ALU shifter -> DB out mux -> RAM. Nothing fancy, really, and nothing that suggests any possible improvement. In earlier timing runs, the output mux may have been optimized away, which would explain the slightly better results I got then.

As you can see in the timing report, logic takes more than 4 ns, and routing a pretty good 2.5 ns, so there's not much hope of improving things with better placement. The critical path is already fairly tight.
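As a quick sanity check on these figures, the sketch below converts the clock periods quoted in the last two posts into maximum clock frequencies and computes the routing share of the critical path. The 4.1 ns logic delay is an assumption (6.6 ns minus the 2.5 ns of routing); the post only says the logic takes "more than 4 ns" and does not say which run the timing report comes from.

```python
def fmax_mhz(period_ns: float) -> float:
    """Maximum clock frequency in MHz for a clock period in ns."""
    return 1000.0 / period_ns


# Clock periods quoted in the thread (ns).
runs = [
    ("generic, out of the box", 8.0),
    ("generic, SmartExplorer", 7.0),
    ("Spartan-6, SmartExplorer", 6.6),
]
for label, period in runs:
    print(f"{label:<26} {period:4.1f} ns -> {fmax_mhz(period):6.1f} MHz")

# ASSUMPTION: the 6.6 ns path splits into 4.1 ns logic + 2.5 ns routing.
logic_ns, routing_ns = 4.1, 2.5
print(f"routing share of the path: {routing_ns / (logic_ns + routing_ns):.0%}")
```

Even with zero routing delay, a logic delay above 4 ns would cap the clock below 250 MHz, so placement alone cannot buy much.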

Arlet · Post by **Arlet** » Fri Nov 13, 2020 6:43 am

There are two things that could possibly be improved in this path. First of all, getting rid of the microcode ROM could help, but since that's a big project, I was thinking I could perhaps work on that in stages. I could keep the microcode ROM for everything except the register select, and then see a) how much space it would take to generate that register number in logic, and b) how much the timing impact would be.

Secondly, if we're dealing with dual-port block RAM, we can add a register stage to hold DO+AB and do the actual write one cycle later. Unless you're running self-modifying code that modifies the very next byte, I don't think this would break anything.
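Here is a minimal behavioral model in Python (not Verilog, and not the core's code) of that idea: the write address and data are captured in one cycle and committed through the second RAM port on the next, while the first port keeps reading. It shows the one hazard mentioned above: reading the byte written in the immediately preceding cycle returns stale data. The class name and the optional `bypass` forwarding path are hypothetical additions for illustration; the post does not propose a bypass.

```python
class DelayedWriteRAM:
    """Behavioral model of a dual-port RAM whose writes are delayed by one
    cycle: AB and DO are registered, and the write happens on the next clock."""

    def __init__(self, size=0x10000, bypass=False):
        self.mem = bytearray(size)
        self.pending = None    # (addr, data) registered in the previous cycle
        self.bypass = bypass   # hypothetical forwarding path, off by default

    def cycle(self, addr, we=False, data=0):
        """Simulate one clock. Returns the byte read at addr
        (meaningless on a write cycle)."""
        # Port A reads in the same cycle that port B commits the previous
        # write, so without bypass it still sees the old contents.
        rdata = self.mem[addr]
        if self.bypass and self.pending is not None and self.pending[0] == addr:
            rdata = self.pending[1]
        # Port B: commit the write that was registered last cycle.
        if self.pending is not None:
            waddr, wdata = self.pending
            self.mem[waddr] = wdata
        # Register this cycle's write (AB + DO) for the next clock.
        self.pending = (addr, data & 0xFF) if we else None
        return rdata


ram = DelayedWriteRAM()
ram.cycle(0x0300, we=True, data=0xEA)  # store to $0300
print(hex(ram.cycle(0x0300)))          # 0x0: stale, the write lands this cycle
print(hex(ram.cycle(0x0300)))          # 0xea
```

Reads of any other address in the following cycle are unaffected, which is why only code that writes the byte it is about to fetch would notice.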

I realize that the heavily optimized data path requires more complicated control logic. That's not a problem with a ROM, but it could be an issue with distributed logic. For instance, the ZP / ZP,X / ZP,Y / (ZP),Y / (ZP) / (ZP,X) modes are all handled in exactly the same way during the first cycle, namely ABL <= DB + REG, where REG is selected to be X, Y, or the special zero register. This simplifies the address logic, but it forces the control logic to select the zero register for many instructions.
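The shared first cycle can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not code from the core: the table maps each zero-page addressing mode to the register added to the operand byte, with 'Z' standing for the zero register, and the sum is truncated to 8 bits on the assumption that only ABL is computed here (ABH being zero). The names are hypothetical.

```python
# Register added to the operand byte (DB) in the first cycle of each
# zero-page addressing mode. 'Z' is the special always-zero register.
FIRST_CYCLE_SRC = {
    "zp": "Z",
    "zp,x": "X",
    "zp,y": "Y",
    "(zp),y": "Z",   # pointer address; Y is added in a later cycle
    "(zp)": "Z",
    "(zp,x)": "X",   # pointer address is zp + X
}


def first_cycle_abl(mode: str, db: int, x: int, y: int) -> int:
    """ABL <= DB + REG, wrapping within the zero page."""
    regs = {"Z": 0, "X": x, "Y": y}
    return (db + regs[FIRST_CYCLE_SRC[mode]]) & 0xFF


print(hex(first_cycle_abl("zp,x", 0xF0, x=0x20, y=0)))    # 0x10 (wraps)
print(hex(first_cycle_abl("(zp),y", 0x80, x=0x05, y=7)))  # 0x80
```

Every mode is the same adder operation; only the register select differs, and that select is what the control logic has to produce.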

I'm not too keen on redoing the entire design. My preference would be just to replace the control logic, and maintain the current data path.

Arlet · Post by **Arlet** » Fri Nov 13, 2020 7:00 am

I wrote a little tool to extract the source register used in the first cycle, and show the results in a 16x16 opcode grid.

```
S X - - Z Z Z - S - A - - - - -
- Z Z - Z X X - - - A - - - - -
S X - - Z Z Z - S - A - - - - -
- Z Z - X X X - - - A - - - - -
S X - - - Z Z - S - A - - - - -
- Z Z - - X X - - - S - - - - -
S X - - Z Z Z - S - A - - - - -
- Z Z - X X X - - - S - - - - -
- X - - Z Z Z - Y - X - - - - -
- Z Z - X X Y - Y - X - - - - -
- X - - Z Z Z - A - - - - - - -
- Z Z - X X Y - - - - - - - - -
- X - - Z Z Z - Y - - - - - - -
- Z Z - - X X - - - S - - - - -
- X - - Z Z Z - - - - - - - - -
- Z Z - - X X - - - S - - - - -
```

'Z' is the zero register, and '-' is don't care. The register file has 32 registers, so there is the option of using one-hot encoding for the most frequently used ones.
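A rough Python equivalent of such a tool might look like the sketch below. It is not Arlet's tool, which was not posted; it parses a grid in the layout shown above (row = high nibble of the opcode, column = low nibble) and counts how often each source register is selected. On the posted grid the counts come out as Z 40, X 28, S 12, A 7, Y 5, which suggests the zero register and X would be the first candidates for one-hot encoding.

```python
from collections import Counter


def parse_grid(text):
    """Parse a 16x16 opcode grid into {opcode: register}, skipping '-'."""
    rows = [line.split() for line in text.strip().splitlines()]
    if len(rows) != 16 or any(len(r) != 16 for r in rows):
        raise ValueError("expected 16 rows of 16 entries")
    return {
        (hi << 4) | lo: cell
        for hi, row in enumerate(rows)
        for lo, cell in enumerate(row)
        if cell != "-"
    }


def render_grid(table):
    """Inverse of parse_grid: print '-' for don't-care opcodes."""
    return "\n".join(
        " ".join(table.get((hi << 4) | lo, "-") for lo in range(16))
        for hi in range(16)
    )


def register_usage(table):
    """(register, count) pairs, most frequently selected first."""
    return Counter(table.values()).most_common()
```

Pasting the posted grid into `parse_grid` gives, for example, 'X' for opcode 0x15 (ORA zp,X) and 'Z' for 0x11 (ORA (zp),Y); `register_usage` then gives the counts quoted above.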

The good part is that we have an entire cycle to do the decoding, and store the result in a register, so it's feasible to use multiple logic layers, rather than very wide logic. The bad part is that such logic is hard to optimize.

hoglet · Post by **hoglet** » Fri Nov 13, 2020 1:39 pm

Arlet wrote:

By the way, if anyone is using this core for a project, I'd love to hear about it.

I'm trying to get this running in a BBC Micro.

The machine boots and I see the normal startup banner, but then it crashes almost immediately, and I think the crash is interrupt-related.

Debugging is proving to be a bit of a challenge, but this is what I think is happening:

It concerns this piece of OS 1.20 code:

```
E560 : B8       : CLV            : 1
E561 : 6C 2C 02 : JMP (022C)     : 5
E464 : 08       : PHP            : 3
E465 : 78       : SEI            : 1
E466 : BD D8 02 : LDA 02D8,X     : 4
E469 : DD E1 02 : CMP 02E1,X     : 4
E46C : F0 72    : BEQ E4E0       : 2
E46E : 28       : PLP            : 3
E46F : 38       : SEC            : 1
E470 : 60       : RTS            : 4
```

Now, it seems that if the instruction *after* JMP (022C) is interrupted, then things go wrong:

```
E560 : B8       : CLV            : 1
E561 : 6C 2C 02 : JMP (022C)     : 5
pc: prediction failed at 022E old pc was E464
022E :          : INTERRUPT !!   : 7
DC1C : 85 FC    : STA FC         : 3
DC1E : 68       : PLA            : 3
DC1F : 48       : PHA            : 3
DC20 : 29 10    : AND #10        : 2
DC22 : D0 03    : BNE DC27       : 2
DC24 : 6C 04 02 : JMP (0204)     : 5
...
8134 : 40       : RTI            : 5
022E : D1 E1    : CMP (E1),Y     : 5
0230 : A6 FF    : LDX FF         : 3
0232 : A6 FF    : LDX FF         : 3
0234 : A6 FF    : LDX FF         : 3
0236 : 90 01    : BCC 0239       : 2
0238 : 9F       : ???            : 1
0239 : 0D 0D A1 : ORA A10D       : 4
023C : 02 0D    : NOP #0D        : 1
023E : 2B       : NOP            : 1
023F : 2B       : NOP            : 1
0240 : 2B       : NOP            : 1
0241 : 2B       : NOP            : 1
0242 : 2B       : NOP            : 1
```

It looks like the wrong return address is being pushed onto the stack.

Here are the bus cycles around the interrupt:

```
0 b8 1 1 1
E560 : B8       : CLV            : 1
0 6c 1 1 1
1 2c 0 1 1
2 02 0 1 1
3 64 0 1 1
4 e4 0 1 1
E561 : 6C 2C 02 : JMP (022C)     : 5
0 08 1 1 1
1 78 0 1 1
2 02 0 0 1
3 2e 0 0 1
4 23 0 0 1
5 1c 0 1 1
6 dc 0 1 1
pc: prediction failed at 022E old pc was E464
022E :          : INTERRUPT !!   : 7
0 85 1 1 1
1 fc 0 1 1
2 00 0 0 1
DC1C : 85 FC    : STA FC         : 3
```
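One way to check this reading of the trace is to decode the interrupt cycles mechanically. The Python sketch below assumes the columns are: cycle number within the instruction, data byte (hex), SYNC, and RnW (0 = write); the meaning of the fifth column is not stated, so it is ignored. The function names are hypothetical. Since an IRQ should push the address of the instruction it interrupted, the pushed return address ought to equal the JMP (022C) target.

```python
def parse_bus(lines):
    """Parse raw bus-cycle lines like '3 2e 0 0 1' into
    (cycle, data, sync, rnw) tuples. Column meanings are assumed."""
    out = []
    for line in lines:
        cyc, data, sync, rnw, _unknown = line.split()
        out.append((int(cyc), int(data, 16), sync == "1", rnw == "1"))
    return out


def jmp_indirect_target(cycles):
    """JMP (abs) as traced here (5 cycles): cycles 3 and 4 read the
    target low and high bytes from the pointer."""
    return cycles[3][1] | (cycles[4][1] << 8)


def interrupt_frame(cycles):
    """Return (pushed return address, pushed status, vector) for a
    7-cycle IRQ/NMI sequence: three stack writes, then two vector reads."""
    writes = [i for i, (_, _, _, rnw) in enumerate(cycles) if not rnw]
    if len(writes) != 3:
        raise ValueError("expected three stack writes (PCH, PCL, P)")
    pch, pcl, status = (cycles[i][1] for i in writes)
    last = writes[-1]
    vector = cycles[last + 1][1] | (cycles[last + 2][1] << 8)
    return (pch << 8) | pcl, status, vector


jmp = parse_bus(["0 6c 1 1 1", "1 2c 0 1 1", "2 02 0 1 1",
                 "3 64 0 1 1", "4 e4 0 1 1"])
irq = parse_bus(["0 08 1 1 1", "1 78 0 1 1", "2 02 0 0 1", "3 2e 0 0 1",
                 "4 23 0 0 1", "5 1c 0 1 1", "6 dc 0 1 1"])
ret, p, vec = interrupt_frame(irq)
print(f"expected {jmp_indirect_target(jmp):04X}, pushed {ret:04X}, "
      f"P={p:02X}, vector {vec:04X}")
# expected E464, pushed 022E, P=23, vector DC1C
```

Under these assumptions the frame holds $022E, two bytes past the pointer at $022C, rather than the jump target $E464, which matches the "wrong return address" reading.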

Does this bug seem possible, or has my debugging gone astray?

Dave
