X6510 verilog cores


Post by player55328

Here is the Verilog source for my latest X6510 CPUs. Big thanks to Ed for getting me a version of Klaus' test suite. It found two more bugs, so do not use anything I may have posted before. I also added the RDY input signal to make the cores more flexible. I am posting 32-bit and 16-bit data bus versions of the normal memory interface. I may also post the 32-bit Harvard architecture version with pipelined memory once I get RDY working with it.

Here is some example code showing how I drove the RDY line...

`ifdef debug
reg cpu_ack2;
reg kernelRDY = 0;
reg kernelRDYa = 0;
reg basicRDY = 0;
reg basicRDYa = 0;
reg basicRDYb = 0;
reg io_spaceRDY = 0;

//#################################################
// delay accesses if required
assign RDY = (charen | ramsel) | (~(cpu_ack ^ cpu_ack2) & (io_spaceRDY | kernelRDY | basicRDY));
`else
assign RDY = kernelen | basicen | charen | ramsel | io_space; //full speed - no delays
`endif

//#################################################
// data read selection
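always @(posedge mclk)
begin
`ifdef debug
    cpu_ack2 <= cpu_ack; // previous cpu_ack, compared above to detect a change

    // kernel: two-stage delay chain
    kernelRDYa <= kernelen & ~kernelRDY & ~(cpu_ack ^ cpu_ack2);
    kernelRDY <= kernelRDYa & ~kernelRDY & ~(cpu_ack ^ cpu_ack2);
    // basic: three-stage delay chain
    basicRDYa <= basicen & ~basicRDY & ~(cpu_ack ^ cpu_ack2);
    basicRDYb <= basicRDYa & ~basicRDY & ~(cpu_ack ^ cpu_ack2);
    basicRDY <= basicRDYb & ~basicRDY & ~(cpu_ack ^ cpu_ack2);
    // I/O space: single-stage delay, gated by ~cpu_ack
    io_spaceRDY <= io_space & ~io_spaceRDY & ~cpu_ack;
`endif
    // (the rest of the data read selection logic is not shown in this excerpt)
end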
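
For anyone who wants the same kind of delay with an adjustable length, here is a minimal sketch of a counter-based wait-state generator. It is not taken from the X6510 sources: the module name `wait_state_gen`, the parameter `WAIT_CLOCKS`, and the ports `clk`, `sel`, and `rdy` are all made up for illustration, and it leaves out the `cpu_ack` change detection used in the excerpt above. It assumes the CPU completes the access on the clock where `rdy` is high.

```verilog
// Hypothetical wait-state generator (illustration only, not part of X6510).
// While sel is high, rdy pulses high for one clock after WAIT_CLOCKS clocks,
// then the delay restarts, so back-to-back accesses are each delayed.
module wait_state_gen #(
    parameter WAIT_CLOCKS = 2        // clocks to wait before rdy (0..15)
) (
    input  wire clk,
    input  wire sel,                 // region select, e.g. a ROM enable
    output wire rdy                  // high when the access may complete
);
    reg [3:0] count = 0;

    always @(posedge clk) begin
        if (!sel || count == WAIT_CLOCKS)
            count <= 0;              // idle, or access just completed: restart
        else
            count <= count + 1'b1;   // still waiting: count another clock
    end

    // Ready only while selected and the programmed delay has elapsed.
    assign rdy = sel & (count == WAIT_CLOCKS);
endmodule
```

With `WAIT_CLOCKS = 2` the output has the same shape as the two-stage kernel chain above (one ready clock out of every three while the region stays selected). One instance per slow region, ORed together with the full-speed selects, would give a combined RDY in the same style as the `assign` above.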


If you have any questions or feedback I am all ears...


edit:

New family member with a 64-bit memory bus. This one is an attempt to reduce the extra cycle that may have to be inserted when several instructions in a row write or do an absolute RMW cycle. It is also a stepping stone toward what I am working on now: executing two consecutive instructions at the same time. This bus structure lets me guarantee that at least two instructions are loaded per instruction fetch cycle. The two can be decoded simultaneously with two decode modules. The harder part is the execution. Attached is a timing comparison of these three implementations running some specific code. *** It takes too much logic to do this, so I am going to cry uncle here. There really would not be a significant throughput improvement anyway, since not all instruction combinations could be executed at the same time. ***
Attachments
Atlys64Acpu9.zip
32-bit data bus (an address bus for each 16 bits). This one is smaller than the other 32-bit version and has a better fmax. I lowered the queue size to 9, made writes and absolute RMW instructions take 2 cycles, and implemented queue write updates as these writes happen, so it is more efficient for self-modifying code. Under all the testing circumstances the instructions always take the same number of clocks to complete.
(47.97 KiB) Downloaded 224 times
CpuCompare.rtf
(118.01 KiB) Downloaded 208 times
Atlys64A64cpu.zip
64-bit data bus (an address bus for each 16 bits)
(48.58 KiB) Downloaded 216 times
Atlys64Acpu.zip
32-bit data bus (an address bus for each 16 bits)
(47.79 KiB) Downloaded 218 times
Atlys64A16cpu.zip
16-bit data bus (an address bus for each 8 bits). This one is a clock slower with branches and jumps and will need more IR load states, but it is a little smaller too.
(47.5 KiB) Downloaded 223 times