X6510 verilog cores

Posted: Mon Jan 13, 2014 6:08 am
by player55328
Here is the Verilog source for my latest X6510 CPUs. Big thanks to Ed for getting me a version of Klaus' test suite. It found 2 more bugs, so do not use anything I may have posted before. I also added the RDY input signal to make the cores more flexible. I am posting 32-bit and 16-bit data bus versions of the normal memory interface. I may also post the 32-bit Harvard architecture version with pipelined memory once I get RDY working with it.

Here is some example code showing how I drove the RDY line...

`ifdef debug
reg cpu_ack2;            // cpu_ack delayed by one mclk
reg kernelRDY = 0;
reg kernelRDYa = 0;
reg basicRDY = 0;
reg basicRDYa = 0;
reg basicRDYb = 0;
reg io_spaceRDY = 0;

//#################################################
// delay accesses if required
// charen and ramsel accesses are never delayed; the others wait for their
// delayed RDY flag while cpu_ack is unchanged from the previous clock
assign RDY = (charen | ramsel) | (~(cpu_ack ^ cpu_ack2) & (io_spaceRDY | kernelRDY | basicRDY));
`else
assign RDY = kernelen | basicen | charen | ramsel | io_space; //full speed - no delays
`endif

//#################################################
// data read selection
always@(posedge mclk)
begin
`ifdef debug
    cpu_ack2 <= cpu_ack;

    // kernelen: two-stage delay chain
    kernelRDYa <= kernelen & ~kernelRDY & ~(cpu_ack ^ cpu_ack2);
    kernelRDY <= kernelRDYa & ~kernelRDY & ~(cpu_ack ^ cpu_ack2);
    // basicen: three-stage delay chain
    basicRDYa <= basicen & ~basicRDY & ~(cpu_ack ^ cpu_ack2);
    basicRDYb <= basicRDYa & ~basicRDY & ~(cpu_ack ^ cpu_ack2);
    basicRDY <= basicRDYb & ~basicRDY & ~(cpu_ack ^ cpu_ack2);
    // io_space: single-stage delay
    io_spaceRDY <= io_space & ~io_spaceRDY & ~cpu_ack;
`endif
    // remaining data read selection logic is not shown in this excerpt
end
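
For anyone adapting this, the same idea can be sketched more generically as a counter-based wait-state generator. This is only a minimal sketch and not the logic used in the X6510 cores: the module name rdy_waitgen, the ports fast_sel and slow_sel, and the WAIT_CYCLES parameter are all made up for illustration, and it assumes the CPU moves on to its next access in the one clock where RDY is high.

```verilog
// Hypothetical wait-state generator. All names here are illustrative
// assumptions and do not come from the X6510 sources.
module rdy_waitgen #(
    parameter WAIT_CYCLES = 2      // clocks to hold RDY low per slow access (0..255)
) (
    input  wire clk,
    input  wire rst,               // synchronous reset, active high
    input  wire fast_sel,          // region that answers immediately (e.g. RAM)
    input  wire slow_sel,          // region that needs WAIT_CYCLES extra clocks
    output wire rdy
);
    reg [7:0] count = 0;

    // High for exactly one clock once the slow access has waited long enough.
    wire done = slow_sel & (count == WAIT_CYCLES);

    always @(posedge clk) begin
        if (rst | ~slow_sel | done)
            count <= 0;            // idle, or access finished: restart the wait
        else
            count <= count + 1'b1; // slow access still waiting
    end

    // Fast regions are never delayed; slow regions release RDY when done.
    assign rdy = fast_sel | done;
endmodule
```

Clearing the counter whenever done is high plays the same role as the ~kernelRDY and ~basicRDY terms above: RDY pulses for one clock, then the delay starts over, so back-to-back accesses to the slow region each get their own wait.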


If you have any questions or feedback I am all ears...


edit:

New family member with a 64-bit memory bus. This one was an attempt to reduce the extra cycle that may have to be inserted when writing or doing an absolute RMW cycle several instructions in a row. It is also a stepping stone to what I am working on now, which is trying to execute 2 consecutive instructions at the same time. This bus structure lets me guarantee loading at least 2 instructions per instruction fetch cycle. The 2 can be decoded simultaneously with 2 decode modules. The harder part is the execution. Attached is a timing comparison of these 3 implementations running some specific code. *** It takes too much logic to do this, so I am going to cry uncle here. There really would not be a significant throughput improvement anyway, since not all instruction combinations could be executed at the same time. ***