My new verilog 65C02 core.

BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Re: My new verilog 65C02 core.

Post by BigEd »

Thanks! That's the sort of block diagram I like, with clock boundaries visible.

(Would you use diagrams like this to do the design, or to evaluate ideas?)
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Post by Arlet »

Me too. I saw it first in James Bowman's description of the J1 CPU, and I thought it made it very clear, so I wanted to do something similar.

I didn't draw any diagrams during the design, except maybe a few quick sketches of some details. I usually start with Verilog and make a very simple skeleton structure, then use the simulator to draw the waveforms for me or, in the case of this core, show the internal state for each clock with $display() lines.

I was actually thinking about writing some sort of blog on the design process.

Post by BigEd »

I'd read it! And I'd share it too...

Post by Arlet »

I've ported nearly all of the Spartan-6 optimizations and changes back into the generic version. It passes Klaus' test suite, but I haven't tried it yet on a real board with IRQ and RDY.

There are still a few little things that need to be ported, mostly with flag handling and BCD adjustments. It's not strictly necessary to port those, but I want to keep them as close as possible, so that the generic version can act as documentation for the Spartan-specific version.
hoglet
Posts: 367
Joined: 29 Jun 2014

Post by hoglet »

Do you think it might be possible to expose the sync signal on the interface?

I have a few projects that make use of this.

Dave

Post by Arlet »

Ok, done.

I already had the internal signal, so it was just a matter of adding it to the port declaration.

Post by Arlet »

Note that because some instructions are single cycle, you can now have 'sync' asserted for 2 cycles in a row. Some external logic may not be prepared for this, so be aware.
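
To illustrate the caveat about back-to-back sync cycles, here is a minimal Python sketch. It is not taken from the core; the per-clock trace and the function names are made up for illustration. It compares external logic that counts opcode fetches on rising edges of sync with logic that samples sync on every clock.

```python
def count_by_rising_edge(sync_trace):
    """Count opcode fetches by looking for 0 -> 1 transitions on sync."""
    count = 0
    previous = 0
    for sync in sync_trace:
        if sync and not previous:
            count += 1
        previous = sync
    return count


def count_by_level(sync_trace):
    """Count opcode fetches by sampling sync on every clock."""
    return sum(1 for sync in sync_trace if sync)


# Hypothetical per-clock sync values: a 2-cycle instruction, then two
# single-cycle instructions back to back, then a 3-cycle instruction.
trace = [1, 0, 1, 1, 1, 0, 0]

edge_count = count_by_rising_edge(trace)
level_count = count_by_level(trace)
print(edge_count, level_count)  # prints "2 4"
```

The edge detector sees only two of the four fetches, because sync never drops between the single-cycle instructions. Logic that samples the level on each clock does not have that problem in this simplified model, which ignores RDY.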

Post by Arlet »

By the way, if anyone is using this core for a project, I'd love to hear about it.
okwatts
Posts: 110
Joined: 11 Nov 2020
Location: Kelowna Canada

Post by okwatts »

Following with interest, especially the generic version.
I am no FPGA programmer, but I did implement the KIM-1 (6502) version on a Cyclone II as Stephen Edwards revealed in a presentation (http://www.cs.columbia.edu/~sedwards/presentations.html), minus the keys and LEDs. I did need to update to your original cpu and alu rather than the ones he included in the downloads. His version would not execute code started with the G command (in the ROM code it finishes with an RTI), but after updating to your version it ran fine. Also, there is room for increasing the little internal on-chip RAM to 5K. It makes for great fun in reliving the past (that and Oscarv's KIMUNO); now I need to dust off The First Book of KIM and other old code to see how much I can revive.
I have little HDL knowledge and got into this mostly from the retrobrew site, implementing Grant Searle's and Neil Crook's (6809) code on that old Cyclone II module. I am now able to run Carl Moser's ASSMTED and FIG-Forth as well as some Tiny Basic etc., thanks to the work of Maik Merten and Daryl Rictor, whose code I stole to make the SD card and FAT files work with a small monitor that reads and writes the SD card. Their stuff is VHDL based and Stephen Edwards' is Verilog, so it's been a challenge to understand even that. Daryl has a monitor for the 65C02, so it would be nice to experiment with this new version for the 65C02, but it will be a challenge for me.
Thanks for keeping the 6502 and family alive for those who were in the first phase of this "home computing" revolution!

Post by Arlet »

I have added support for RDY & NMI to the generic code, which means that the generic version is now complete.

The Spartan-6 version is still missing proper NMI handling, but supports everything else.

Post by Arlet »

NMI support also ported to Spartan-6 version.

I noticed that the Spartan-6 version combines the SYNC with the RDY to save a LUT, so when you deassert RDY, then SYNC goes down as well. I'll have to fix that.
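
A small Python sketch of the effect described above. It assumes the shared logic behaves like a plain AND of the internal sync with RDY; the post does not show the actual logic, and the signal values below are hypothetical.

```python
def sync_pin_gated(sync_internal, rdy):
    """SYNC pin when it shares logic with RDY (assumed here to be a plain AND)."""
    return sync_internal & rdy


def sync_pin_independent(sync_internal, rdy):
    """SYNC pin driven from the internal signal alone, ignoring RDY."""
    return sync_internal


# Hypothetical opcode fetch that is stalled for two clocks by RDY = 0.
sync_internal = [1, 1, 1, 0]
rdy = [0, 0, 1, 1]

gated = [sync_pin_gated(s, r) for s, r in zip(sync_internal, rdy)]
independent = [sync_pin_independent(s, r) for s, r in zip(sync_internal, rdy)]
print(gated)        # [0, 0, 1, 0]
print(independent)  # [1, 1, 1, 0]
```

With the gated version, an external observer does not see SYNC while the fetch is being stalled, only on the clock where RDY is high again.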

Post by Arlet »

Oops, I accidentally ended up editing the first message instead of quoting it.

Final design, generic version, still has around 60 slices (there's some random variation in each run). Connected to single block RAM, it meets 8 ns constraint out of the box, and 7 ns with SmartExplorer pushing.

The Spartan-6 version has 120 LUTs, around 50 slices when synthesized without any placement constraints, and around 6.6 ns with SmartExplorer. I tried hand placement of a couple of things, but that only made things worse. I'll have to test some more with that, but I doubt there is much improvement possible.
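
For reference, the clock periods quoted above translate into maximum clock frequencies as follows. This is plain arithmetic (f = 1 / period); the labels are only shorthand for the three results mentioned in the post.

```python
def fmax_mhz(period_ns):
    """Maximum clock frequency in MHz for a given minimum clock period in ns."""
    return 1000.0 / period_ns


# Periods taken from the figures quoted in the post.
periods_ns = {
    "generic, out of the box": 8.0,
    "generic, SmartExplorer": 7.0,
    "Spartan-6, SmartExplorer": 6.6,
}

for label, period in periods_ns.items():
    print(f"{label}: {period} ns -> {fmax_mhz(period):.1f} MHz")
```

That gives roughly 125 MHz, 143 MHz, and 152 MHz respectively.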

Longest path in nearly all cases is microcode ROM->register file->ALU adder->ALU shifter->DB out mux->RAM. Nothing fancy, really, and nothing that suggests any possible improvement. In earlier timing runs that I did, the output mux may have been optimized away, so I got some slightly better results.

As you can see in the timing report, logic takes more than 4 ns, and routing a pretty good 2.5 ns, so there's not much hope of improving things with better placement. The critical path is already fairly tight.
Attachments
path.png

Post by Arlet »

There are two things that could possibly be improved in this path. First of all, getting rid of the microcode ROM could help, but since that's a big project, I was thinking I could perhaps work on that in stages. I could keep the microcode ROM for everything except the register select, and then see a) how much space it would take to generate that register number in logic, and b) how much the timing impact would be.

Secondly, if we're dealing with dual port block RAM, we can add a register stage to hold the DO+AB and do the actual write 1 cycle later. Unless you're doing self modifying code of the next byte, I don't think this would break anything.
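
The delayed write idea can be modelled in a few lines of Python. This is a behavioural sketch only, not the core's code: the class name is hypothetical, and it assumes that a read of the same address in the cycle where the delayed write lands still returns the old data.

```python
class DelayedWriteRAM:
    """RAM model where a write is held in a register and committed one cycle later."""

    def __init__(self, size=65536):
        self.mem = [0] * size
        self.pending = None  # (address, data) captured on the previous cycle

    def cycle(self, read_addr, write=None):
        """Advance one clock.

        read_addr: address presented on the read port this cycle.
        write: optional (address, data) requested by the CPU this cycle.
        Returns the data seen by the read port this cycle.
        """
        # The read port sees memory before the pending write lands (assumption).
        data = self.mem[read_addr]
        # Commit the write that was captured one cycle earlier.
        if self.pending is not None:
            addr, value = self.pending
            self.mem[addr] = value
        # Capture this cycle's write request so it is committed next cycle.
        self.pending = write
        return data


ram = DelayedWriteRAM()
ram.cycle(read_addr=0x0200, write=(0x0300, 0xEA))  # store requested
stale = ram.cycle(read_addr=0x0300)  # very next cycle: still the old value
fresh = ram.cycle(read_addr=0x0300)  # one cycle later: the new value
print(hex(stale), hex(fresh))  # 0x0 0xea
```

Only a read in the cycle directly after the store can see stale data. Since that cycle normally fetches the next opcode, the hazard is limited to code that modifies the very next byte, as noted above.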

I realize that the heavily optimized data path requires more complicated control logic. That's not a problem with a ROM, but it could be an issue with distributed logic. For instance, the ZP / ZP,X / ZP,Y / (ZP),Y / (ZP) / (ZP,X) are all handled in exactly the same way during the first cycle, namely ABL <= DB + REG, where REG is selected to be X, Y, or the special zero register. This simplifies the address logic, but it forces the control logic to select the zero register for many instructions.
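
As a rough illustration of the shared first cycle, the following Python sketch models ABL <= DB + REG for the zero page modes. The register values are made up, the table and function names are hypothetical, and the 8-bit wrap of ABL is an assumption based on normal 6502 zero page behaviour. The register chosen for each mode follows the opcode grid posted later in this thread.

```python
# Hypothetical register contents; "Z" is the special zero register.
REGS = {"X": 0x05, "Y": 0x10, "Z": 0x00}

# Register selected in the first cycle for each zero page addressing mode.
FIRST_CYCLE_REG = {
    "ZP": "Z",
    "ZP,X": "X",
    "ZP,Y": "Y",
    "(ZP),Y": "Z",  # Y is only added later, to the fetched pointer
    "(ZP)": "Z",
    "(ZP,X)": "X",
}


def first_cycle_abl(mode, db, regs):
    """ABL <= DB + REG, wrapped to 8 bits so the address stays in zero page."""
    reg = regs[FIRST_CYCLE_REG[mode]]
    return (db + reg) & 0xFF


for mode in FIRST_CYCLE_REG:
    print(f"{mode:7s} ABL = {first_cycle_abl(mode, 0xFE, REGS):02X}")
```

The address logic is the same adder in every case. All the variation sits in the register select, which is why the control logic has to pick the zero register so often.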

I'm not too keen on redoing the entire design. My preference would be just to replace the control logic, and maintain the current data path.

Post by Arlet »

I wrote a little tool to extract the source register used in the first cycle, and show the results in a 16x16 opcode grid.

S X - - Z Z Z - S - A - - - - - 
- Z Z - Z X X - - - A - - - - - 
S X - - Z Z Z - S - A - - - - - 
- Z Z - X X X - - - A - - - - - 
S X - - - Z Z - S - A - - - - - 
- Z Z - - X X - - - S - - - - - 
S X - - Z Z Z - S - A - - - - - 
- Z Z - X X X - - - S - - - - - 
- X - - Z Z Z - Y - X - - - - - 
- Z Z - X X Y - Y - X - - - - - 
- X - - Z Z Z - A - - - - - - - 
- Z Z - X X Y - - - - - - - - - 
- X - - Z Z Z - Y - - - - - - - 
- Z Z - - X X - - - S - - - - - 
- X - - Z Z Z - - - - - - - - - 
- Z Z - - X X - - - S - - - - - 
'Z' is the zero register, and '-' is don't care. The register file has 32 registers, so there is the option of using one-hot encoding for the most frequently used ones.

The good part is that we have an entire cycle to do the decoding, and store the result in a register, so it's feasible to use multiple logic layers, rather than very wide logic. The bad part is that such logic is hard to optimize.
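
To get a feel for which registers are used most often, the grid above can be parsed and tallied. This Python sketch is not the tool mentioned in the post. It assumes the usual opcode layout with the high nibble as the row and the low nibble as the column, which is consistent with entries such as $8A (TXA) showing X and $B6 (LDX ZP,Y) showing Y.

```python
from collections import Counter

# Grid copied from the post; '-' means don't care.
GRID = """\
S X - - Z Z Z - S - A - - - - -
- Z Z - Z X X - - - A - - - - -
S X - - Z Z Z - S - A - - - - -
- Z Z - X X X - - - A - - - - -
S X - - - Z Z - S - A - - - - -
- Z Z - - X X - - - S - - - - -
S X - - Z Z Z - S - A - - - - -
- Z Z - X X X - - - S - - - - -
- X - - Z Z Z - Y - X - - - - -
- Z Z - X X Y - Y - X - - - - -
- X - - Z Z Z - A - - - - - - -
- Z Z - X X Y - - - - - - - - -
- X - - Z Z Z - Y - - - - - - -
- Z Z - - X X - - - S - - - - -
- X - - Z Z Z - - - - - - - - -
- Z Z - - X X - - - S - - - - -
"""

rows = [line.split() for line in GRID.strip().splitlines()]


def source_register(opcode):
    """First-cycle source register for an opcode (row = high nibble, assumed)."""
    return rows[opcode >> 4][opcode & 0x0F]


# Tally how many opcodes need each register, ignoring don't-care entries.
usage = Counter(reg for row in rows for reg in row if reg != "-")
for reg, count in usage.most_common():
    print(reg, count)

print(source_register(0x8A), source_register(0xB6))
```

A tally like this could help decide which registers are worth a one-hot select and which can share an encoded path, though that trade-off depends on the actual logic.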

Post by hoglet »

Arlet wrote:
"By the way, if anyone is using this core for a project, I'd love to hear about it."

I'm trying to get this running in a BBC Micro.

The machine boots and I see the normal startup banner, but then it almost immediately crashes, and the crash is interrupt related I think.

Debugging is proving to be a bit of a challenge, but this is what I think is happening:

It concerns this piece of OS 1.20 code:

E560 : B8       : CLV            : 1
E561 : 6C 2C 02 : JMP (022C)     : 5
E464 : 08       : PHP            : 3
E465 : 78       : SEI            : 1
E466 : BD D8 02 : LDA 02D8,X     : 4
E469 : DD E1 02 : CMP 02E1,X     : 4
E46C : F0 72    : BEQ E4E0       : 2
E46E : 28       : PLP            : 3
E46F : 38       : SEC            : 1
E470 : 60       : RTS            : 4
Now, it seems that if the instruction *after* JMP (022C) is interrupted, then things go wrong:

E560 : B8       : CLV            : 1
E561 : 6C 2C 02 : JMP (022C)     : 5
pc: prediction failed at 022E old pc was E464
022E :          : INTERRUPT !!   : 7
DC1C : 85 FC    : STA FC         : 3
DC1E : 68       : PLA            : 3
DC1F : 48       : PHA            : 3
DC20 : 29 10    : AND #10        : 2
DC22 : D0 03    : BNE DC27       : 2
DC24 : 6C 04 02 : JMP (0204)     : 5
...
8134 : 40       : RTI            : 5
022E : D1 E1    : CMP (E1),Y     : 5
0230 : A6 FF    : LDX FF         : 3
0232 : A6 FF    : LDX FF         : 3
0234 : A6 FF    : LDX FF         : 3
0236 : 90 01    : BCC 0239       : 2
0238 : 9F       : ???            : 1
0239 : 0D 0D A1 : ORA A10D       : 4
023C : 02 0D    : NOP #0D        : 1
023E : 2B       : NOP            : 1
023F : 2B       : NOP            : 1
0240 : 2B       : NOP            : 1
0241 : 2B       : NOP            : 1
0242 : 2B       : NOP            : 1
It looks like the wrong return address is being pushed onto the stack.

Here are the bus cycles around the interrupt:

0 b8 1 1 1
E560 : B8       : CLV            : 1
0 6c 1 1 1
1 2c 0 1 1
2 02 0 1 1
3 64 0 1 1
4 e4 0 1 1
E561 : 6C 2C 02 : JMP (022C)     : 5
0 08 1 1 1
1 78 0 1 1
2 02 0 0 1
3 2e 0 0 1
4 23 0 0 1
5 1c 0 1 1
6 dc 0 1 1
pc: prediction failed at 022E old pc was E464
022E :          : INTERRUPT !!   : 7
0 85 1 1 1
1 fc 0 1 1
2 00 0 0 1
DC1C : 85 FC    : STA FC         : 3
Does this bug seem possible, or has my debugging gone astray?

Dave
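
The seven bus cycles of the interrupt sequence above can be decoded mechanically. The Python sketch below is only a reading aid. The column meanings are an assumption, since the post does not name them: cycle index, data bus byte in hex, sync, read/not-write, and a fifth column that is left uninterpreted. The expected return address comes from the decoder's "old pc was E464" message.

```python
# Bus cycles of the interrupt sequence, copied from the trace in the post.
INTERRUPT_CYCLES = """\
0 08 1 1 1
1 78 0 1 1
2 02 0 0 1
3 2e 0 0 1
4 23 0 0 1
5 1c 0 1 1
6 dc 0 1 1
"""


def parse(trace):
    """Split each line into fields, using the assumed column meanings."""
    cycles = []
    for line in trace.strip().splitlines():
        index, data, sync, rnw, _uninterpreted = line.split()
        cycles.append({
            "index": int(index),
            "data": int(data, 16),
            "sync": sync == "1",
            "read": rnw == "1",
        })
    return cycles


cycles = parse(INTERRUPT_CYCLES)

# The three write cycles push PCH, PCL and the status register, in that order.
pch, pcl, status = [c["data"] for c in cycles if not c["read"]]
pushed_return = (pch << 8) | pcl

# The last two cycles read the interrupt vector, low byte first.
vector_lo, vector_hi = (c["data"] for c in cycles[-2:])
handler = (vector_hi << 8) | vector_lo

expected_return = 0xE464   # from the "old pc was E464" message
pointer_address = 0x022C   # operand of the preceding JMP (022C)

print(f"pushed {pushed_return:04X}, expected {expected_return:04X}, handler {handler:04X}")
print("pushed == pointer + 2:", pushed_return == pointer_address + 2)
```

Under those assumptions, the pushed address is 022E rather than E464, and the vector DC1C matches the handler in the trace. The value 022E happens to equal the JMP pointer address plus 2, which may be a useful hint about which address was pushed, but that is an observation rather than a diagnosis.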