OK, forget the simulations for now. I've been going with what I have observed to work, which is running the CPU at 1/2 the video pixel clock.
Today I did exhaustive testing and narrowed the margin between what works 100% and what doesn't, as far as a read-write-read from the external SyncRAM is concerned.
First I'll explain the test, then the multiple hardware changes.
The test I performed has 5 parts:
1) CPU writes to the SyncRAM to clear the screen. This tests repeated writes only, with the smallest software delay.
2) CPU writes an 8x8 character to SyncRAM. This tests writes only, with some software algorithms to slow it down.
3) CPU writes a circle. This tests writes only, with a different software algorithm. It also lets me check that the pixels are truly square.
4) CPU copies and pastes the previously plotted character right next to the original. This tests a read-write-read.
5) Draw a line using LineGen, a hardware module running at CPU speed. This tests the lowest write delay from FPGA to SyncRAM.
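For anyone who wants to reproduce the logic of parts 2 and 4 away from the hardware, here is a rough Python sketch. Everything in it is an assumption made for illustration: the flat one-byte-per-pixel buffer standing in for the SyncRAM, the glyph data, and the names plot_char, copy_block and count_mismatches are all hypothetical and are not the actual test code. In this software model the copy always succeeds; on the real board a nonzero mismatch count would correspond to the read failures described in this post.

```python
# Hypothetical software model of test parts 2 and 4 (not the real test code).
WIDTH, HEIGHT = 1024, 768  # one of the resolutions mentioned in the post


def plot_char(ram, x, y, glyph):
    """Part 2: write an 8x8 glyph (8 row bytes, MSB = leftmost pixel)."""
    for row, bits in enumerate(glyph):
        for col in range(8):
            ram[(y + row) * WIDTH + x + col] = (bits >> (7 - col)) & 1


def copy_block(ram, src_x, src_y, dst_x, dst_y, size=8):
    """Part 4: read each source pixel and write it to the destination."""
    for row in range(size):
        for col in range(size):
            value = ram[(src_y + row) * WIDTH + src_x + col]   # read
            ram[(dst_y + row) * WIDTH + dst_x + col] = value   # write


def count_mismatches(ram, x, y, glyph):
    """Read the block back and compare it with the known glyph."""
    errors = 0
    for row, bits in enumerate(glyph):
        for col in range(8):
            expected = (bits >> (7 - col)) & 1
            if ram[(y + row) * WIDTH + x + col] != expected:
                errors += 1
    return errors


ram = bytearray(WIDTH * HEIGHT)            # cleared screen (part 1)
GLYPH_A = [0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00]  # made-up glyph
plot_char(ram, 16, 16, GLYPH_A)            # original character
copy_block(ram, 16, 16, 24, 16)            # pasted right next to it
errors = count_mismatches(ram, 24, 16, GLYPH_A)
print("mismatched pixels in the copy:", errors)
```

Comparing the copy against the known glyph, rather than against the source block, keeps a bad read of the source from hiding the error.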
I'll start with what works for all of the above tests, CPU always at 1/2 the pixel clock:
1024x768 @70MHz
1024x1024 @80MHz
1024x1024 @84MHz
Writing still works, but reading fails:
1024x1024 @88MHz
4MHz is a pretty tight margin, and a clue to something that I am failing to see right now. But I thought I would report it before I forgot.
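To put that 4MHz in terms of time, here is a small sketch that converts the last passing and first failing settings into clock periods. It assumes the MHz figures above are pixel clocks, with the CPU at half of each; period_ns and the constant names are just illustrative. Under that assumption the step from 84MHz to 88MHz removes roughly 0.54 ns per pixel clock cycle, or about 1.08 ns per CPU cycle.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


LAST_PASS_MHZ = 84.0   # highest setting where all 5 tests pass (from the post)
FIRST_FAIL_MHZ = 88.0  # lowest setting where reads fail (from the post)

# Assumption: the figures are pixel clocks, so the CPU runs at half of each.
for label, divisor in (("pixel clock", 1), ("CPU clock (1/2)", 2)):
    p_pass = period_ns(LAST_PASS_MHZ / divisor)
    p_fail = period_ns(FIRST_FAIL_MHZ / divisor)
    print(f"{label}: {p_pass:.3f} ns -> {p_fail:.3f} ns, "
          f"lost {p_pass - p_fail:.3f} ns per cycle")
```

If the read path from the SyncRAM has a fixed delay, a cut of around a nanosecond in the cycle would be enough to push it over the edge while writes still make it.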
EDIT: Added #5, which I had forgotten.