6502.org
http://forum.6502.org/

X6510 verilog cores
http://forum.6502.org/viewtopic.php?f=10&t=2842

Author:  player55328 [ Mon Jan 13, 2014 6:08 am ]
Post subject:  X6510 verilog cores

Here is the Verilog source for my latest X6510 CPUs. Big thanks to Ed for getting me a version of Klaus' test suite. It found 2 more bugs, so do not use anything I may have posted before. I also added the RDY input signal for more flexibility. I am posting 32-bit and 16-bit data bus versions of the normal memory interface. I may also post the 32-bit Harvard architecture version with pipelined memory once I get RDY working with it.

Here is some example code showing how I drove the RDY line...

`ifdef debug
reg cpu_ack2;
reg kernelRDY = 0;
reg kernelRDYa = 0;
reg basicRDY = 0;
reg basicRDYa = 0;
reg basicRDYb = 0;
reg io_spaceRDY = 0;

//#################################################
// delay accesses if required
assign RDY = (charen | ramsel) | (~(cpu_ack ^ cpu_ack2) & (io_spaceRDY | kernelRDY | basicRDY));
`else
assign RDY = kernelen | basicen | charen | ramsel | io_space; //full speed - no delays
`endif

//#################################################
// data read selection
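always @(posedge mclk)
begin
`ifdef debug
    // previous value of cpu_ack; (cpu_ack ^ cpu_ack2) is high when cpu_ack just changed
    cpu_ack2 <= cpu_ack;

    // each register stage adds one clock before the matching RDY term goes high
    kernelRDYa  <= kernelen   & ~kernelRDY   & ~(cpu_ack ^ cpu_ack2);
    kernelRDY   <= kernelRDYa & ~kernelRDY   & ~(cpu_ack ^ cpu_ack2);
    basicRDYa   <= basicen    & ~basicRDY    & ~(cpu_ack ^ cpu_ack2);
    basicRDYb   <= basicRDYa  & ~basicRDY    & ~(cpu_ack ^ cpu_ack2);
    basicRDY    <= basicRDYb  & ~basicRDY    & ~(cpu_ack ^ cpu_ack2);
    io_spaceRDY <= io_space   & ~io_spaceRDY & ~cpu_ack;
`endif
    // (the data read selection logic itself is not shown in this excerpt)
end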
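
For anyone who wants the same effect in a more generic form, here is a minimal sketch of a counter-based wait-state generator. It is not taken from the attached sources: the module name wait_states, its ports, and the WAIT parameter are all made up for illustration. It assumes the select input stays high for the whole access. As far as I can read the code above, the register chains there do something similar, with roughly one clock of delay for I/O space, two for the kernel, and three for BASIC, but treat that as an interpretation rather than a statement about the real design.

```verilog
// Hypothetical generic wait-state generator (illustration only,
// not part of the X6510 sources).
module wait_states #(
    parameter WAIT = 2              // clocks to hold rdy low at the start of each access
) (
    input  wire clk,
    input  wire sel,                // assumed high for as long as the slow region is addressed
    output wire rdy                 // high for one clock when the access may complete
);
    reg [7:0] count = 0;            // clocks waited so far in the current access

    always @(posedge clk) begin
        if (!sel || rdy)
            count <= 0;             // idle, or access just completed: start over
        else
            count <= count + 1'b1;  // still inside the wait period
    end

    // rdy is purely a function of the registered count, so there is no
    // combinational loop through the counter.
    assign rdy = sel && (count == WAIT);
endmodule
```

One instance per region could then be ORed together into RDY, for example wait_states #(.WAIT(2)) with sel tied to kernelen and another with WAIT set to 3 and sel tied to basicen. A counter scales to longer delays more easily than adding one register per wait clock, at the cost of a comparator.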


If you have any questions or feedback I am all ears...


edit:

New family member with a 64-bit memory bus. This one is an attempt to reduce the extra cycle that may have to be inserted when writes or absolute RMW instructions occur several instructions in a row. It is also a stepping stone to what I am working on now, which is to try to execute 2 consecutive instructions at the same time. This bus structure allows me to guarantee the loading of at least 2 instructions per instruction fetch cycle. The 2 can be decoded simultaneously with 2 decode modules. The harder part is the execution. Attached is a timing comparison of these 3 implementations running some specific code. *** It takes too much logic to do this, so I am going to cry uncle here. There really would not be significant throughput improvements anyway, since not all instruction combinations could be executed at the same time. ***
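
To illustrate why a 64-bit fetch can always deliver 2 whole instructions: a 6502 instruction is at most 3 bytes, so 2 of them need at most 6 of the 8 fetched bytes. The sketch below is an assumption-laden illustration, not code from the attached cores. The module name pair_split, its ports, and the idea that a separate decoder supplies len0 are hypothetical, and it assumes the 8-byte window starts at the first instruction's opcode with the lowest address in bits [7:0].

```verilog
// Hypothetical helper (illustration only): split an 8-byte fetch window
// into two instruction slots of up to 3 bytes each.
module pair_split (
    input  wire [63:0] window,  // 8 fetched bytes, lowest address in bits [7:0]
    input  wire [1:0]  len0,    // length of the first instruction in bytes (1..3), from a decoder
    output wire [23:0] insn0,   // opcode plus up to 2 operand bytes of instruction 0
    output wire [23:0] insn1    // same for instruction 1
);
    // The first instruction always starts at byte 0 of the window.
    assign insn0 = window[23:0];

    // The second instruction starts len0 bytes in. The shift amount is
    // len0 * 8 bits; the 64-bit result is truncated to the low 24 bits.
    // With len0 <= 3 the slot ends at byte 5 at the latest, inside the window.
    assign insn1 = window >> {len0, 3'b000};
endmodule
```

Bytes in a slot beyond the instruction's real length are simply the start of whatever follows and would be ignored by the decoder.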

Attachments:
File comment: 32 bit data bus (an address bus for each 16 bits) - This one is smaller than the other 32 bit version and has a better fmax. I lowered the queue size to 9, made writes and absolute RMW instructions take 2 cycles, and implemented queue write updates as these writes happen, so it is more efficient for self-modifying code. Under all the tested circumstances the instructions always take the same number of clocks to complete.
Atlys64Acpu9.zip [47.97 KiB]
Downloaded 150 times
CpuCompare.rtf [118.01 KiB]
Downloaded 160 times
File comment: 64 bit data bus (an address bus for each 16 bits)
Atlys64A64cpu.zip [48.58 KiB]
Downloaded 148 times
File comment: 32 bit data bus (an address bus for each 16 bits)
Atlys64Acpu.zip [47.79 KiB]
Downloaded 152 times
File comment: 16 bit data bus (an address bus for each 8 bits) this is a clock slower with branches and jumps and will need more IR load states but is a little smaller too...
Atlys64A16cpu.zip [47.5 KiB]
Downloaded 159 times
