6502.org • View topic - My new verilog 65C02 core.

All times are UTC

My new verilog 65C02 core.

BigEd

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 12:29 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

Is that Verilator? I gather it's long been the highest-performance simulator, but I'd be interested in others.

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 12:37 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

Yes, latest version of Verilator.

Note this is only true on the generic code. Running the LUT-level simulation is significantly slower, because then it needs to handle each bit separately. It also generates a line of C++ code for each bit initialized in each of the LUT memories, producing thousands of extra lines of source code, which take noticeably longer to compile.

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 12:54 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

A while ago, when I was near the start of this project, I thought I had a clever idea to make a 'bypassed M input' in the ALU shifter:

Code:

 * op       function
 * ==============================
 * 00---  | unmodified adder result
 * 01---  | bypassed M input
 * 10---  | adder shift left
 * 11---  | adder shift right

The idea was that the ALU itself would not need an option to just pass M unmodified (M is the data from last memory fetch), but it would always incorporate the register file in the calculations. The result then goes into the shifter, which can pass it along, or do a shift/rotate left or right. Because there was an unused 4th option, I figured that I could use that as a bypass for unmodified M. This could be useful for doing LDA/LDX/LDY instructions.

But it now turns out that this feature is not needed. Instead, I can just calculate "M <OR> R", where R comes from the register file, with the fixed zero register selected. Not only does that reduce some routing pressure, as well as fan-out for M, but it also makes it easier to speed up the Z-flag calculation, since the bits always come from the ALU adder output.
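
A minimal sketch of the "M OR R" idea, with hypothetical module and signal names (the actual ALU source is not shown in this thread): when the register read port returns the zero register, the OR result is simply M, and the Z flag can be derived from that same result.

```verilog
// Hypothetical sketch, not the core's real ALU: a load expressed as
// "M OR R" with R read from a register that always holds zero.
module load_via_or (
    input  wire [7:0] M,       // data from the last memory fetch
    input  wire [7:0] R,       // register file read port (zero register for loads)
    output wire [7:0] result,  // value written back by LDA/LDX/LDY
    output wire       Z        // zero flag taken from the same result
);
    assign result = M | R;              // equals M whenever R == 8'h00
    assign Z      = (result == 8'h00);  // no separate bypass path to test
endmodule
```

The point of the sketch is only that no separate bypass path for M is required once a zero source is available on the register port.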

Not sure whether it's causal or just random variation, but the fastest run is now 6.405 ns (156 MHz).

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 1:31 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

By the way, some of the things are only possible because I'm using a dual port register file. For instance, LDA is doing A <= M | Z, using both A and Zero registers at the same time. And of course, the single-cycle TXA uses both X and A registers.

If you're ever designing a CPU, I highly recommend using dual port (or even triple port) distributed memory for the register file. The costs are acceptable (if your FPGA target supports it), and it makes life a lot easier.
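
As a rough illustration of what such a register file could look like, here is a sketch with one read/write port and one read-only port. The module name, port names, and the four-entry size are assumptions for the example and are not taken from the core.

```verilog
// Hypothetical dual-port register file sketch. Port A is the write
// destination (and can be read), port B is read only.
module regfile_dp (
    input  wire       clk,
    input  wire       we,   // write enable for port A
    input  wire [1:0] a,    // port A address: write destination, also readable
    input  wire [1:0] b,    // port B address: read only
    input  wire [7:0] d,    // write data
    output wire [7:0] qa,   // asynchronous read of regs[a]
    output wire [7:0] qb    // asynchronous read of regs[b]
);
    reg [7:0] regs [0:3];   // assumed contents: e.g. A, X, Y and a zero register

    // Synchronous write through port A.
    always @(posedge clk)
        if (we)
            regs[a] <= d;

    // Asynchronous reads; this style is typically what allows synthesis
    // tools to infer LUT-based (distributed) RAM instead of block RAM.
    assign qa = regs[a];
    assign qb = regs[b];
endmodule
```

With this shape, a TXA would select A on port A as the write destination and X on port B as the source in the same cycle.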

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 5:54 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

I finished the (hopefully) final rewrite of the microcode control bits for the address bus, incorporating the F7MUX, and also did some more instantiations in the Spartan6 design elements, including this ALU shifter mux that worked the first time, even though I wrote down the hex INIT strings without first making a table.

Code:

/*
 * op       function
 * ===============================
 * 0?--  | unmodified adder result
 * 10--  | adder shift left
 * 11--  | adder shift right
 */

LUT5 #(.INIT(32'hf0ccaaaa)) out0(.O(OUT[0]), .I0(add[0]), .I1(SI),     .I2(add[1]), .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out1(.O(OUT[1]), .I0(add[1]), .I1(add[0]), .I2(add[2]), .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out2(.O(OUT[2]), .I0(add[2]), .I1(add[1]), .I2(add[3]), .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out3(.O(OUT[3]), .I0(add[3]), .I1(add[2]), .I2(add[4]), .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out4(.O(OUT[4]), .I0(add[4]), .I1(add[3]), .I2(add[5]), .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out5(.O(OUT[5]), .I0(add[5]), .I1(add[4]), .I2(add[6]), .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out6(.O(OUT[6]), .I0(add[6]), .I1(add[5]), .I2(add[7]), .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out7(.O(OUT[7]), .I0(add[7]), .I1(add[6]), .I2(SI),     .I3(op[3]), .I4(op[4]));
LUT5 #(.INIT(32'hf0ccaaaa)) out8(.O(CO),     .I0(C8),     .I1(add[7]), .I2(add[0]), .I3(op[3]), .I4(op[4]));

The LUT5 is a nice thing to use, because on the Spartan 6 this means you get 8 LUTs in a slice instead of 4.
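
For readers who want to check the INIT string, here is a behavioral reading of 32'hf0ccaaaa, assuming the usual Xilinx convention that the LUT output is INIT[{I4, I3, I2, I1, I0}]. The module and testbench names are made up for this example.

```verilog
// Behavioral equivalent of LUT5 #(.INIT(32'hf0ccaaaa)), one output bit.
module shift_mux_bit (
    input  wire I0,   // adder bit n            (unmodified)
    input  wire I1,   // adder bit n-1, or SI   (shift left)
    input  wire I2,   // adder bit n+1, or SI   (shift right)
    input  wire I3,   // op[3]: 0 = left, 1 = right
    input  wire I4,   // op[4]: 0 = unmodified, 1 = shift
    output wire O
);
    // INIT[15:0]  = 16'haaaa -> O = I0 when I4 = 0
    // INIT[23:16] = 8'hcc    -> O = I1 when {I4, I3} = 2'b10
    // INIT[31:24] = 8'hf0    -> O = I2 when {I4, I3} = 2'b11
    assign O = !I4 ? I0 : (I3 ? I2 : I1);
endmodule

// Self-checking testbench: compares the model with the INIT bits
// for all 32 input combinations and reports any mismatch.
module shift_mux_bit_tb;
    wire [31:0] init_bits = 32'hf0ccaaaa;
    reg  [4:0]  in;
    wire        o;
    integer     i;

    shift_mux_bit dut (.I0(in[0]), .I1(in[1]), .I2(in[2]),
                       .I3(in[3]), .I4(in[4]), .O(o));

    initial begin
        for (i = 0; i < 32; i = i + 1) begin
            in = i[4:0];
            #1;
            if (o !== init_bits[in])
                $display("mismatch at input %b", in);
        end
        $finish;
    end
endmodule
```

The same three-way selection applies to the carry-out LUT, with C8, add[7], and add[0] as the three data inputs.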

The address bus logic is now fully instantiated, as well as the main ALU path, but not yet the random logic involving the flags and BCD adjustment. Still need to do the microcode control logic, flags, and register file.

Slice count is 44 right now, 60 flip flops, and 134 LUTs. The slice/LUT ratio indicates that slice count could go lower with better packing. Something in the 30's could be possible, maybe.

BigEd

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 6:09 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

(Just a thought here, for those wanting a full 64k of block RAM and finding a conflict with this new core's 2k microcode taking some of that up: if there's 2k of ROM in the system, perhaps put that into distributed RAM? Or even synthesise it - if it's say a font, it might be quite compressible. Or maybe give yourself 2k of distributed RAM - would that work? It might be that distributed RAM works better for general-purpose byte-wide storage than it does for microcode.)

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 6:20 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

As you can see here, automatic floorplanning doesn't look very impressive. And if you show the ratsnest for some of these isolated elements, it goes all over the place.

Of course, the Xilinx business plan is to sell you a bigger/faster FPGA instead.

Attachment: floorplan.png (15.54 KiB)

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 6:32 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

Here are the 8 base register muxes for the lower address bus, plus all their connections to other elements.

That area in the bottom left is where the lower half of program counter is stored.

Attachment: adl.png (67.36 KiB)

Chromatix

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 7:12 pm

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462

That's pretty bad. What does it look like for the same elements after you manually place stuff?

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 7:39 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

Here I have manually placed the same input mux in a row (the parts in orange). The blue parts are still left up to the tools, and are still scattered everywhere.

Some of the other orange things are just blocks I moved out of the way to make room.

The user interface of that tool (PlanAhead) is pretty horrible, by the way. I constantly ended up accidentally zooming instead of moving components. It is helpful that it shows you where you can place components. If you hover over the wrong place, it shows that the component won't fit there. For instance, in the slices with the F7MUX used, you can still use 2 of the 8 flip flops in the slice using bypass inputs, but not any of the remaining 6.

Together with the slice diagram, it's helpful to learn the possibilities and restrictions.

Unfortunately, on my (Linux) machine, the FPGA editor doesn't work. I'm missing some (old) dynamic libraries.

Attachment: manual.png (67.45 KiB)

Last edited by Arlet on Tue Oct 27, 2020 7:47 pm, edited 1 time in total.

65f02

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 7:41 pm

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany

BigEd wrote:

For the 65F02, I would not want to "hardwire" the design to one specific host -- and would hence want to grab the host ROM upon startup. So it would need to be distributed RAM, not ROM, in the FPGA.

Most host systems will have less than 64k of RAM+ROM anyway, so one could spare 2k for the microcode. Or I could forfeit acceleration for one 2k block of ROM (or RAM), and always access the host memory for that block. Opening libraries for the chess computer, or video RAM for personal computers come to mind; that would not hurt the performance much.

But all the above schemes suffer from the difficulty of excluding (only) a 2k block from the 64k memory range. As I have learned during my experiments with 9-bit-wide memory, Xilinx synthesizes the 64k*8 memory by using the block RAMs in 16k*1 configuration -- resulting in much smaller and faster 4:1 multiplexers than if it were built from 2k*8 blocks. So I wouldn't be able to carve a 2k*8 block out of the 64k memory, it seems.

BigEd

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 7:48 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

Oh, yes, I'd forgotten that architectural situation.

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 7:53 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

You could still mix and match, using 32 kB + 16 kB + 8 kB blocks, etc. Not ideal, but better than building it from 2k*8.

And instead of using a mux, you can make a 6-input OR to combine them. The block RAMs have a reset pin that allows you to set the output to all zeroes (or any value you want). So, you could do 16, 16, 16, 8, 4, 2, and use a single LUT per data bit.
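
A read-only sketch of the OR-combining idea, assuming a particular address map and hypothetical module names. Whether a given synthesis tool maps the zeroing onto the block RAM's own reset input is something to confirm in the synthesis report; the write path is left out for brevity.

```verilog
// One memory bank whose registered output is forced to zero when the
// bank is not selected, so unselected banks contribute nothing to the OR.
module bank #(
    parameter AW = 14                 // address width: 14 -> 16 kB
) (
    input  wire          clk,
    input  wire          sel,         // bank selected for this read
    input  wire [AW-1:0] addr,
    output reg  [7:0]    q
);
    reg [7:0] mem [0:(1<<AW)-1];

    always @(posedge clk)
        if (!sel) q <= 8'h00;         // models the output reset
        else      q <= mem[addr];
endmodule

module ram62k (
    input  wire        clk,
    input  wire [15:0] addr,
    output wire [7:0]  dout
);
    wire [7:0] q0, q1, q2, q3, q4, q5;

    // Assumed address map (not from the thread):
    //   0000-3FFF, 4000-7FFF, 8000-BFFF : 16 kB each
    //   C000-DFFF : 8 kB,  E000-EFFF : 4 kB,  F000-F7FF : 2 kB
    //   F800-FFFF is left unmapped here.
    bank #(14) b0 (clk, addr[15:14] == 2'b00,    addr[13:0], q0);
    bank #(14) b1 (clk, addr[15:14] == 2'b01,    addr[13:0], q1);
    bank #(14) b2 (clk, addr[15:14] == 2'b10,    addr[13:0], q2);
    bank #(13) b3 (clk, addr[15:13] == 3'b110,   addr[12:0], q3);
    bank #(12) b4 (clk, addr[15:12] == 4'b1110,  addr[11:0], q4);
    bank #(11) b5 (clk, addr[15:11] == 5'b11110, addr[10:0], q5);

    // A 6-input OR per data bit fits a single 6-input LUT.
    assign dout = q0 | q1 | q2 | q3 | q4 | q5;
endmodule
```

The six banks add up to 62 kB, which leaves one 2 kB window of the 64 kB address space outside the combined memory.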

65f02

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 9:00 pm

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany

Arlet wrote:

And instead of using a mux, you can make a 6-input OR to combine them. The block RAMs have a reset pin that allows you to set the output to all zeroes (or any value you want). So, you could do 16, 16, 16, 8, 4, 2, and use a single LUT per data bit.

But the block RAM's reset option works only with registered outputs, if I recall correctly. And that wouldn't work with your core, right?

Arlet

Post subject: Re: My new verilog 65C02 core.

Posted: Tue Oct 27, 2020 9:29 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

The block RAM data is also registered, so the reset will work the same.

I've modified my core to also work with synchronous memory, using unregistered address output. The only caveat is that the writes overlap with the reads, so it requires dual port RAM. If you're not using the 2nd port for anything, it should work with the block RAMs.
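
To illustrate the kind of memory this implies, here is a generic simple dual-port RAM sketch with one write port and one synchronous read port. It is a common inference template, not the core's actual memory interface, and all names are hypothetical.

```verilog
// Simple dual-port RAM: independent write and read addresses, so a
// write and a read can take place in the same clock cycle.
module ram_sdp #(
    parameter AW = 11                 // assumed size: 2^11 = 2 kB
) (
    input  wire          clk,
    input  wire          we,          // write enable
    input  wire [AW-1:0] waddr,       // write port address
    input  wire [7:0]    din,         // write data
    input  wire [AW-1:0] raddr,       // read port address
    output reg  [7:0]    dout         // registered (synchronous) read data
);
    reg [7:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= din;
        dout <= mem[raddr];           // in simulation, returns the old value
                                      // if raddr == waddr in a write cycle
    end
endmodule
```

What real block RAM returns when both ports hit the same address in the same cycle depends on the device and its configuration, so that case is worth checking against the FPGA documentation.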
