Quote:
Do I take it that the 2-tick cycle length for the block RAMs is to allow for the synchronous nature? That during a single microcode state, it allows for one tick to present the address and a second tick to collect the data? Whereas the LUT RAM (distributed RAM) can be combinatorial?
Your analysis is correct. The purpose of the microcycle length controller is to match the microcycle to the device/memory providing the data required. In the recently released update, rather than having the ALU Valid signal initiate the delay required by BCD addition/subtraction, I dynamically lengthen the cycle when an ADC/SBC instruction is executed while the D flag in P is set.
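The selection rule can be sketched as a lookup from the memory being accessed to a microcycle length, with a decimal-mode ADC/SBC forcing a longer cycle. This is a hypothetical Python model, not the core's HDL. The names `Mem`, `microcycle_length`, and `BCD_MIN_CYCLES` are invented for illustration, and the 2-cycle minimum for BCD operations is an assumption, since the extension actually applied is not stated.

```python
from enum import Enum


class Mem(Enum):
    """Hypothetical memory classes, valued by their base microcycle length."""
    LUT = 1   # distributed (LUT) RAM: combinatorial read
    BRAM = 2  # block RAM: synchronous read, address then data
    EXT = 4   # external memory behind FPGA IOB registers


SUPPORTED = (1, 2, 4)  # microcycle lengths the controller currently supports

# Assumption for illustration only: a decimal-mode ADC/SBC needs at least
# this many cycles to complete the BCD adjustment.
BCD_MIN_CYCLES = 2


def microcycle_length(mem: Mem, opcode: str, d_flag: bool) -> int:
    """Pick a microcycle length for the current microinstruction."""
    length = mem.value
    if opcode in ("ADC", "SBC") and d_flag:
        length = max(length, BCD_MIN_CYCLES)
    # Round up to the next length the controller supports.
    return next(n for n in SUPPORTED if n >= length)


print(microcycle_length(Mem.LUT, "ADC", d_flag=False))  # 1
print(microcycle_length(Mem.LUT, "ADC", d_flag=True))   # 2
```

The point of the model is that the length is decided per access, so binary arithmetic out of LUT RAM keeps its single-cycle microcycle and only the decimal case pays for the longer one.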
Quote:
I'm also curious about the 4 cycle external memory. Can't you just make it 2 cycle, like the block RAMs, and use the RDY signal to extend it if necessary?
If different memory types are used to enhance the speed of the microprocessor (core plus interrupt handler, memory controller, etc.), then the microcycle length controller can dynamically adjust the length of the microcycle when different memories are accessed. The expectation is that external memory will require no fewer than two cycles (output address register delay plus input data register delay). If synchronous fall-through memory is used, then an additional cycle is required. If synchronous registered memory is used, then there will be a two-cycle delay. The fall-through memory is like an FPGA block RAM in that the output is ready on the rising edge after the address is registered. The issue with the fall-through design is that, like an asynchronous memory, it has an access delay before the data arrives back at the FPGA IOB. That delay plus the internal path delays of the FPGA IO path likely means that the data is not valid at the output of the IOB register until the second rising edge. The other synchronous SRAM style imposes an additional clock delay, but it eliminates the access delay and converts it to a register output delay, which is generally shorter. This probably makes interfacing the FPGA to the registered SyncSRAM a much better proposition. In either case, I count four cycles as being required.
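The four-cycle count can be tallied as a simple budget. This Python sketch assumes one clock for the output IOB address register, one clock for the input IOB data register, and models the fall-through case as its nominal one clock of latency plus one clock lost to the access and input-path delays. The constant names, `external_read_cycles`, and its type strings are hypothetical.

```python
# Hypothetical cycle budget for an external SyncSRAM read through FPGA IOB registers.
OUT_ADDR_REG = 1  # address registered in the output IOB
IN_DATA_REG = 1   # read data registered in the input IOB


def external_read_cycles(sram_type: str) -> int:
    """Return the clocks from address launch to registered read data."""
    if sram_type == "fall_through":
        # Data is nominally ready one edge after the address is registered,
        # but the access delay plus the FPGA input-path delay is assumed to
        # miss that edge, costing one more clock.
        sram_latency = 1 + 1
    elif sram_type == "registered":
        # Registered outputs add a clock of latency but replace the access
        # delay with a shorter clock-to-output delay.
        sram_latency = 2
    else:
        raise ValueError(f"unknown SRAM type: {sram_type!r}")
    return OUT_ADDR_REG + sram_latency + IN_DATA_REG


for kind in ("fall_through", "registered"):
    print(kind, external_read_cycles(kind))  # both print 4
```

Both styles land on the same total; the registered part simply spends its extra clock on a predictable register delay instead of an access delay that may miss an edge.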
Although the present implementation only supports 1-, 2-, and 4-cycle microcycle lengths, it is possible to include a 3-cycle microcycle. As currently implemented, the microcycle length controller accepts a Wait signal in the second and/or third states. This allows the address to be registered to the external memory, and allows the external memory to stretch the access cycle as required. It is also possible for all of the timing to be controlled from the memory access controller, but I had anticipated that the memory controller would also support external memory-mapped I/O devices. Those devices may require additional cycles that the design could support implicitly, but it would be more flexible if I/O devices inserted their own cycle extensions.
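The Wait handling can be modelled as a small state counter. In this Python sketch, a hypothetical model rather than the actual controller, `microcycle_clocks` steps through the states of one microcycle and holds the current state for a clock whenever Wait is sampled high in the second or third state. Honouring Wait in the second state of a 2-state microcycle as well is an assumption.

```python
from collections.abc import Iterable


def microcycle_clocks(length: int, wait: Iterable[bool]) -> int:
    """Count the clocks one microcycle of `length` states takes.

    `wait` supplies the Wait input, one sample per clock; once it runs out,
    Wait is treated as deasserted. Wait only holds the state in the second
    and third states and is ignored elsewhere.
    """
    if length not in (1, 2, 4):
        raise ValueError("supported lengths are 1, 2 and 4")
    samples = iter(wait)
    state, clocks = 1, 0
    while True:
        clocks += 1
        w = next(samples, False)
        if state in (2, 3) and w:
            continue  # external device stretches the access: hold this state
        if state == length:
            return clocks  # last state completed: microcycle ends
        state += 1


print(microcycle_clocks(4, []))                   # 4: no wait states
print(microcycle_clocks(4, [False, True, True]))  # 6: two wait states in state 2
```

Because the stretch comes from the device's own Wait input, a slow memory-mapped I/O device can add cycles without the memory controller having to know its timing in advance.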
On the whole, the microcycle length controller solves some problems, but it tends to leave some cycles on the floor. When a sequence of reads is required from block RAM, only the first read needs to wait one cycle. If the addresses of the second and third accesses immediately follow, then all three data bytes have been read after four cycles. The problem I was having in my second core stems from attempting to read three bytes in four cycles instead of six. The interplay among the various instruction sequences and lengths, and the read-modify-write sequences, played havoc with the microcode.
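The block RAM arithmetic can be checked with a cycle-level model. The `SyncRam` class below is a hypothetical stand-in for a block RAM with one clock of read latency, and the two read functions compare giving each read its own 2-cycle microcycle with presenting back-to-back addresses. It captures only the read timing, not the instruction-sequence and read-modify-write interactions that made the pipelined version hard to microcode.

```python
class SyncRam:
    """Block-RAM-like memory: the address is registered on one clock edge
    and the read data is sampled on the next edge."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.addr_reg = None  # address registered on the previous edge

    def clock(self, addr):
        """Advance one clock edge: return the data for the previously
        registered address (None if there was none) and register `addr`."""
        dout = None if self.addr_reg is None else self.mem[self.addr_reg]
        self.addr_reg = addr
        return dout


def read_sequential(ram, addrs):
    """Each read occupies its own 2-cycle microcycle."""
    data, clocks = [], 0
    for addr in addrs:
        ram.clock(addr)               # clock 1: present the address
        data.append(ram.clock(None))  # clock 2: collect the data
        clocks += 2
    return data, clocks


def read_pipelined(ram, addrs):
    """A new address is presented every clock; only the first read waits."""
    data, clocks = [], 0
    for addr in list(addrs) + [None]:  # one extra clock drains the last read
        out = ram.clock(addr)
        clocks += 1
        if out is not None:
            data.append(out)
    return data, clocks


ram = SyncRam([0x10, 0x20, 0x30, 0x40])
print(read_sequential(ram, [0, 1, 2]))  # ([16, 32, 48], 6)
print(read_pipelined(ram, [0, 1, 2]))   # ([16, 32, 48], 4)
```

In general, n back-to-back reads take n + 1 clocks instead of 2n, which is the gap between the four and six cycles above.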