I was thinking last night about synchronous vs asynchronous memory buses. While working with real hardware and trying to use the block RAMs, I had to clock them on the 'negedge' of the clock to simulate asynchronous behavior. This works just fine for a test, but it has the severe disadvantage that it essentially halves the time available in each clock period, resulting in a lower Fmax. On the other hand, writing a core for synchronous memory takes more effort and results in a bigger and slower design, which is a waste if you're dealing with a naturally asynchronous memory bus. Ideally, I want to use both at the same time, with a typical system setup of external asynchronous memory (or I/O devices) and internal block RAM for the boot code/operating system ROMs.
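For reference, the 'negedge' trick looks roughly like the sketch below. The module name, port widths and RAM size are made up for illustration, and WE/DO are assumed to be valid together with AB.
Code:
// Sketch only: names and sizes are assumptions, not taken from the core.
module ram_negedge #(
    parameter ADDR_BITS = 10                // 1 KiB, arbitrary
) (
    input  wire                 clk,
    input  wire [ADDR_BITS-1:0] AB,         // registered address from the core
    input  wire                 WE,         // write enable
    input  wire [7:0]           DO,         // write data from the core
    output reg  [7:0]           DI          // read data to the core
);

reg [7:0] mem [0:(1<<ADDR_BITS)-1];

// The RAM is clocked halfway through the cycle, so the address only has
// half a period to settle, and the data half a period to get back.
always @(negedge clk) begin
    if (WE)
        mem[AB] <= DO;
    DI <= mem[AB];
end

endmodule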
Now, I do have an early AD signal available, which is a combinatorial version of the address bus. The core does basically this:
Code:
always @(posedge clk)
    AB <= AD;
I had an idea: why not simply feed that AD into the block RAMs instead of the AB signal? That way we get an early address set up before the positive clock edge, and the RAM can deliver the result exactly when the core needs it. We can still use asynchronous memory in the same system by registering that AD signal in the output pads.
It's an obvious idea, but it fails.
The problem is with writing data. With asynchronous memory, you can have a write in one cycle, and then switch to a read on the next cycle, typically to fetch the next opcode. With synchronous memories, when a write is followed by a read, the result of that read won't be available until one cycle later.
However, there is a solution.
The block RAMs inside the FPGA are dual-ported, so we can use one port as a read port with the address bits connected to AD. The other port we will use for writing only, using the registered AB for the address. Of course, if you only use the block RAM as read-only memory (such as an OS/boot ROM), you don't need the write port at all.
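A minimal sketch of that dual-port arrangement, written so that a synthesis tool could infer a block RAM from it. The module name, parameter and sizes are made up, and it assumes WE and DO are valid in the same cycle as AB.
Code:
// Sketch only: names and sizes are assumptions, not taken from the core.
module dpram_early #(
    parameter ADDR_BITS = 10                // 1 KiB, arbitrary
) (
    input  wire                 clk,
    input  wire [ADDR_BITS-1:0] AD,         // early (combinatorial) address
    input  wire [ADDR_BITS-1:0] AB,         // registered address, AB <= AD
    input  wire                 WE,         // write enable, assumed valid with AB
    input  wire [7:0]           DO,         // write data, assumed valid with AB
    output reg  [7:0]           DI          // read data to the core
);

reg [7:0] mem [0:(1<<ADDR_BITS)-1];

// Read port: the address is sampled from AD on the same edge that loads
// AB, so DI is valid during the cycle in which AB shows that address.
always @(posedge clk)
    DI <= mem[AD];

// Write port: uses the registered address, write-only.
always @(posedge clk)
    if (WE)
        mem[AB] <= DO;

endmodule
One caveat: if a write is immediately followed by a read of the very same address, both ports hit the same location on the same clock edge. In simulation this sketch returns the old data, and on real block RAM the result depends on the device, so check that case if it matters for your system.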
If you don't have a dual-ported RAM available (or you want to use the 2nd port for something else, like a video processor), you can still use a single port, but then you have to add a wait state (using RDY=0) whenever you're trying to switch from write -> read. As long as you're not executing code from that particular memory, I don't think that should ever happen.
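A rough sketch of the single-port variant with the wait state. Again the names are made up, there is no chip select, and it assumes the core holds AB and WE steady while RDY=0, so treat it as an illustration rather than a drop-in module.
Code:
// Sketch only: names and sizes are assumptions, not taken from the core.
module spram_waitstate #(
    parameter ADDR_BITS = 10                // 1 KiB, arbitrary
) (
    input  wire                 clk,
    input  wire [ADDR_BITS-1:0] AD,         // early (combinatorial) address
    input  wire [ADDR_BITS-1:0] AB,         // registered address, AB <= AD
    input  wire                 WE,         // write enable, assumed valid with AB
    input  wire [7:0]           DO,         // write data, assumed valid with AB
    output reg  [7:0]           DI,         // read data to the core
    output wire                 RDY         // low for one cycle after a write
);

reg [7:0] mem [0:(1<<ADDR_BITS)-1];
reg       was_write = 1'b0;                 // previous cycle was a write

always @(posedge clk)
    was_write <= WE;

// Write -> read: the port was busy with the write when the read address
// was on AD, so stall for one cycle and redo the read using AB.
wire stall = was_write & ~WE;
assign RDY = ~stall;

// Writes and the stalled read use AB, normal reads use the early AD.
wire [ADDR_BITS-1:0] addr = (WE | stall) ? AB : AD;

always @(posedge clk) begin
    if (WE)
        mem[addr] <= DO;
    DI <= mem[addr];
end

endmodule
Without a chip select this stalls on every write -> read switch, even when the read goes to some other memory. That costs a cycle but is otherwise harmless; qualifying WE with an address decode would avoid it.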