jblang:
I just saw your last two posts. Glad to see that you have gotten your project working in simulation. I am wondering if you've actually gotten it running on your target board yet?
I haven't tried to get Arlet's core running in simulation or on a board in a few years. However, he clearly defines its operation as targeting an asynchronous RAM interface. Unfortunately, the large RAM blocks in Xilinx and Altera (Intel) FPGAs are synchronous RAMs.
This characteristic essentially adds a pipeline delay that is detrimental to the operation of Arlet's core. You can see it clearly in the timing diagram you posted yesterday: the reset address is presented on the address bus on the rising edge of the clock, and the data output of the RAM is then shown asserted at the next rising edge of the clock.
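To make the difference concrete, here is a rough sketch of the two kinds of RAM model. The module, port, and parameter names are made up for illustration and are not taken from Arlet's core or from your project; the 8-bit data width is likewise just an assumption.

```verilog
// Illustrative only: hypothetical module and signal names.

// Asynchronous-read RAM: the kind of interface Arlet's core expects.
module ram_async #(parameter AW = 10) (
    input  wire          clk,
    input  wire          we,
    input  wire [AW-1:0] addr,
    input  wire [7:0]    din,
    output wire [7:0]    dout
);
    reg [7:0] mem [0:(1<<AW)-1];

    // Writes are clocked.
    always @(posedge clk)
        if (we) mem[addr] <= din;

    // Read is combinatorial: data follows the address within the same cycle.
    assign dout = mem[addr];
endmodule

// Synchronous-read RAM: how the large FPGA block RAMs behave.
module ram_sync #(parameter AW = 10) (
    input  wire          clk,
    input  wire          we,
    input  wire [AW-1:0] addr,
    input  wire [7:0]    din,
    output reg  [7:0]    dout
);
    reg [7:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        // The address is sampled on this edge, so the data only becomes
        // available after it and cannot be used until the next edge.
        dout <= mem[addr];
    end
endmodule
```

With ram_sync, an address that the core registers on one rising edge is sampled by the RAM on the following rising edge, and the data is usable one edge after that. That is the extra cycle visible in the timing diagram.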
One thing that I don't particularly care for in behavioral simulation of HDL is that the logic delays actually present in the real circuit are not shown in the timing diagrams. What is displayed are those transition Xs that indicate that on a clock edge the state of a signal changed. This is all well and good, but in my opinion, frequently provides a misleading picture of the operation of a particular circuit.
In the case of your address output followed by your RAM data output, I would expect that there is a half cycle overlap between the address output and the RAM data output. In actuality, there is a finite, fixed delay between the clock edge and the address calculated inside the core becoming valid, and there is a fixed data setup time before the following rising edge for the data input to the core, i.e. the RAM data output plus the data multiplexer.
When I want to use synchronous RAMs in a reliable manner for single cycle operation, I do the following: (1) output the address from the core on the rising edge, (2) capture the address into the address buffers of the RAM on the falling edge, and (3) capture the output data from the RAM/multiplexer on the rising edge of the clock. I expect that this approach will work for your project as well, and is the approach used in the project I referred you to previously.
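The three steps above can be sketched roughly as follows. This is a minimal illustration under assumed names (single_cycle_fetch, next_addr, data_q, and so on are hypothetical), not the code from the project referred to, and it shows the read path only.

```verilog
// Illustrative only: hypothetical names, read path only.
module single_cycle_fetch #(parameter AW = 10) (
    input  wire          clk,
    input  wire [AW-1:0] next_addr,  // address computed inside the core
    output reg  [7:0]    data_q      // data register inside the core
);
    reg [AW-1:0] addr;                   // address bus driven by the core
    reg [7:0]    mem [0:(1<<AW)-1];      // RAM contents
    reg [7:0]    ram_dout;               // RAM output register

    // (1) The core outputs the address on the rising edge.
    always @(posedge clk)
        addr <= next_addr;

    // (2) The RAM captures the address on the falling edge.
    always @(negedge clk)
        ram_dout <= mem[addr];

    // (3) The core captures the RAM data on the next rising edge.
    always @(posedge clk)
        data_q <= ram_dout;
endmodule
```

The cost of this arrangement is that the address path and the data path each get roughly half a clock period instead of a full one, so the maximum clock rate may be lower than with a fully pipelined design.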
What is missing from your timing diagram, because it is not included in your source, are models of the delays inherent in the generation of the address and the RAM data output. Because I can't keep many of these behaviors in the forefront of my mind's eye when looking at timing diagrams, I include a #1 delay statement in all of my synchronous logic statements. Including such a statement in the HDL raises a warning when synthesizing for the FPGA, but I simply mark the warning as OK and have the tool ignore it. It does, however, force the simulator to display the result shifted to the right by the amount I specified, which allows me to view the signal as being generated as a result of the edge instead of thinking of it as being correctly asserted on the edge.
The combinatorial signals (continuous assignments) do not include that delay statement, but because any synchronous signals include a slight delay, the results of the combinatorial signals will also be shifted to the right. In this way, I can keep track of which signals are generated by (delayed from) a clock edge and which are expected to be valid on (asserted before) a clock edge.
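As a small illustration of the idea, assuming the delay is written as an intra-assignment delay on each non-blocking assignment (the exact form is an assumption here, and the module and signal names are made up):

```verilog
`timescale 1ns / 1ps

// Illustrative only: hypothetical names.
module delay_demo (
    input  wire       clk,
    input  wire       rst,
    input  wire [7:0] d,
    output reg  [7:0] q,
    output wire       q_zero
);
    // Synchronous logic: the #1 makes q change one time unit after the
    // clock edge in simulation. Synthesis ignores the delay (with a warning).
    always @(posedge clk) begin
        if (rst) q <= #1 8'h00;
        else     q <= #1 d;
    end

    // Combinatorial logic: no delay of its own, but because q moves one
    // time unit after the edge, q_zero appears shifted right as well.
    assign q_zero = (q == 8'h00);
endmodule
```

In the waveform viewer, q and q_zero then change visibly after the rising edge rather than on it, while inputs such as d, if driven without a delay by the testbench, still change exactly on an edge.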
One note of caution if you decide to adopt a similar approach. Don't spend too much time trying to "model" the logic delays of your logic with the delay statements I recommended you add to your HDL. That is a losing proposition; the tools do that automatically when synthesizing for the target. My recommended approach is only appropriate for the behavioral simulation phase, which doesn't care about the actual logic delays in the target. As I stated above, I use the approach primarily to help keep track of the generation and consumption of signals. Furthermore, I've encountered some problems in the simulation tools (ModelSim and ISim, in particular) where the simulator fails to provide a valid result. With the delay statements added, I've not encountered these issues further with either simulator for the projects I develop.
This post does not directly answer your question, but I think it should provide some insight to some of the problems that you seem to be having. One of my greatest frustrations in moving to HDL-based FPGA development 15+ years ago was the seeming disconnect between the code and the simulation results. I had to work hard to convince myself, by focusing on using the expected inferred logic templates the tool documentation provided, that the behavioral simulation results matched the synthesized logic behavior that I wanted. Once that confidence took hold, and I consistently applied the same logic templates in my HDL, I became much more efficient and could focus more on the abstract solution rather than the low level logic produced by the synthesizer. In fact, I am a big proponent of behavioral simulation, and I frequently use it to help me debug problems in boards with FPGAs by focusing my efforts outside of the FPGA because of the confidence that I have in the simulation/synthesis results.
Hope this helps as you continue the development of your project. Arlet's core is very efficient, and should be used as an example of good Verilog.