Modern 65816s are all the 14MHz part, so unless you can still buy the 2MHz model of the 65816, you'll find that the hold time tops out at 10ns.
However, look at what it's saying: the hold time required by the LCD is 2-55ns. The CPU would deliver 80ns (in reality, more like 10ns, thanks to modern fabrication processes). So the CPU, assuming you have a genuine 2MHz part, would meet the LCD's requirements.
Since you'll most likely have a much faster part (even if you clock it slower), you'll need another approach, and there are still several available to you.
The first, and easiest, solution is to drive the LCD through a VIA chip. Use one of the I/O ports as the data bus and the other port for the bus control signals. Here, you control the E line manually in software, which gives you software-defined control over the LCD bus. The drawback is speed: you can't use the MVN/MVP instructions to block-transfer graphics, for example.
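To put a rough number on "slower", here is a small Python model of the VIA approach. It is only a cycle-count sketch, and everything specific in it is an assumption: port A carries the data, port B bit 0 drives RS and bit 2 drives E (hypothetical wiring), R/W is held low, the data direction registers are already set to outputs, and each port write is an `LDA #imm` (2 cycles) plus an `STA abs` (4 cycles) in 8-bit accumulator mode.

```python
# Cycle counts in 8-bit accumulator mode (6502, or 65816 with M=1).
LDA_IMM = 2  # LDA #imm
STA_ABS = 4  # STA abs

# Hypothetical port B wiring for the LCD control lines; R/W is held low (write).
RS = 0x01
E = 0x04


def via_write_sequence(data, rs=True):
    """Return the (VIA register, value) writes that push one byte to the LCD.

    Assumes DDRA and DDRB were already configured as all outputs.
    """
    ctrl = RS if rs else 0
    return [
        ("PORTA", data & 0xFF),  # drive the byte onto the LCD data bus
        ("PORTB", ctrl),         # set RS with E low
        ("PORTB", ctrl | E),     # raise E
        ("PORTB", ctrl),         # drop E; PORTA keeps driving the data afterwards
    ]


def via_cycles_per_byte(writes):
    """Each port write costs an LDA #imm plus an STA abs."""
    return len(writes) * (LDA_IMM + STA_ABS)


seq = via_write_sequence(0x41)
for reg, value in seq:
    print(f"{reg} <- ${value:02X}")
print(via_cycles_per_byte(seq), "CPU cycles per byte")
```

Under these assumptions that comes to 24 cycles per byte, roughly six times the cost of a single 4-cycle `STA abs` to a directly mapped device, before any loop overhead.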
Second, you could write the data byte to a 74LS373 or compatible 8-bit latch. Then you write to a second I/O address that, via gating in the address decoder, drives the E line. For example:
Code:
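LDA #byte_to_write      ; byte destined for the LCD
STA lcd_write_latch     ; capture it in the '373 latch
LDA #some_dummy_value   ; the value is irrelevant; only the address matters
STA lcd_memory_address  ; decoder pulses E; the LCD reads the latched byte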
The advantage of this approach is that the data in the latch stays valid for as long as needed after the E strobe ends, so the LCD's hold-time requirement is met no matter how fast the CPU is. The disadvantage is that, while it's faster than the VIA approach, you still don't get full CPU bandwidth to graphics memory.
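A quick cycle count shows where the latch approach lands. This sketch assumes the four-instruction listing above in 8-bit accumulator mode: two `LDA #imm` (2 cycles each) and two `STA abs` (4 cycles each). The 8MHz clock is an assumed figure for illustration, and loop and data-fetch overhead are ignored.

```python
LDA_IMM = 2  # cycles for LDA #imm (8-bit accumulator)
STA_ABS = 4  # cycles for STA abs


def latch_cycles_per_byte():
    # LDA #byte / STA latch, then LDA #dummy / STA E-strobe address
    return 2 * (LDA_IMM + STA_ABS)


def bytes_per_second(clock_hz, cycles_per_byte):
    return clock_hz / cycles_per_byte


CLOCK_HZ = 8_000_000  # assumed CPU clock, for illustration only
cycles = latch_cycles_per_byte()
print(cycles, "cycles per byte")
print(f"{bytes_per_second(CLOCK_HZ, cycles):,.0f} bytes/s")
```

If the address decoder really ignores the data bus during the E-strobe write, the dummy `LDA` could be dropped (whatever is already in A will do), saving two more cycles per byte.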
Finally, you can just plain use a monostable multivibrator of some kind to bring RDY low for a single clock cycle, that is, to introduce a wait state. This gives you direct CPU access to the video memory, but LCD accesses will top out at half the CPU's maximum throughput. The exception is the MVN and MVP instructions, where each byte transferred takes 8 cycles instead of 7, since only one of those cycles actually writes to the LCD.
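The wait-state arithmetic can be checked in a few lines of Python. This sketch assumes one wait state per LCD access and the 65816's 7 cycles per byte for MVN/MVP; the helper names are hypothetical.

```python
def cycles_with_waits(base_cycles, lcd_accesses, wait_states=1):
    """Cycles for an operation once every LCD access is stretched by wait states."""
    return base_cycles + lcd_accesses * wait_states


def relative_throughput(base_cycles, lcd_accesses, wait_states=1):
    """Fraction of full speed retained after adding the wait states."""
    return base_cycles / cycles_with_waits(base_cycles, lcd_accesses, wait_states)


# A single bus cycle that touches the LCD: 1 cycle becomes 2.
print(relative_throughput(1, 1))  # 0.5

# MVN/MVP: 7 cycles per byte, only one of which writes to the LCD.
print(cycles_with_waits(7, 1))    # 8
print(relative_throughput(7, 1))  # 0.875
```

So a block move keeps 87.5% of its unstretched speed, which is why MVN/MVP fare so much better than individual accesses.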
There are a variety of other approaches I haven't covered (clock skewing, dynamic clock frequency adjustment, data bus delay lines, quadrature clocks, mutual synchronization of multiple clocks (doable with only 4 D flip-flops, I might add), etc.), but I'm trying to stick to the simplest possible solutions.
If you're interested in multiple-clock synchronization, search this forum for my posts on bus synchronization. I don't remember the specific links now, but it was a pretty hot topic sometime last year.
NOTE: ADVANCED MUSINGS...
I should point out that these problems all exist on other processor architectures too. This is why the 68000 has a fully asynchronous bus handshake. Intel processors nowadays use a quasi-asynchronous scheme for I/O and a fully synchronous, pipelined, burst-mode interface for synchronous DRAM and SRAM. The idea is that different modes of operation suit different classes of devices.
The 6502 bus architecture is simple; some might say too simple. However, looking at all the bus controller chips and whatnot for the Intel-style bus that include wait-state generators, one would think, "Gee, if it's on a chip, wait-state generators must be complex pieces of logic!" In reality, they're dreadfully simple. A wait-state generator for the 6502 consists of two 2-bit shift registers wired in a certain configuration. It takes only two 74ACT74 (dual D flip-flop) chips and a handful of 2-input gates to implement. Total cost is less than $0.30, depending on where you get your chips. Dedicating expensive chip real estate to so simple a circuit would not be justified.
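To show how little logic the idea needs, here is a cycle-level behavioral model in Python of a shift-register wait-state generator. It is not the specific circuit described above, just a sketch under stated assumptions: a chain of D flip-flops is cleared while the slow device is deselected, shifts in a 1 on each clock while it is selected, and RDY follows the last stage; once an access completes, the chain re-arms so back-to-back slow accesses each get their wait states. The function name is hypothetical.

```python
def rdy_trace(selects, wait_states=1):
    """Return the RDY level (True = ready) for each clock cycle.

    selects[i] is True when the slow device is addressed in cycle i.
    While RDY is low the CPU holds the same address, so the caller
    keeps the select asserted across the stretched cycle.
    """
    if wait_states < 1:
        raise ValueError("need at least one wait state")
    chain = [False] * wait_states  # one flip-flop per wait state
    trace = []
    for selected in selects:
        if not selected:
            rdy = True
            chain = [False] * wait_states      # cleared while deselected
        else:
            rdy = chain[-1]                    # last stage releases RDY
            if rdy:
                chain = [False] * wait_states  # access done; re-arm
            else:
                chain = [True] + chain[:-1]    # shift a 1 in on this clock
        trace.append(rdy)
    return trace


# Two back-to-back LCD writes with one wait state each, then a fast access.
print(rdy_trace([True, True, True, True, False]))
# Two wait states for a slower device.
print(rdy_trace([True, True, True, False], wait_states=2))
```

The first trace comes out as `[False, True, False, True, True]`: each slow access is stretched to two cycles, matching the halved throughput described earlier.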
Another aspect to consider is that if you're working with peripherals of many different speeds, you'll need a different number of wait states for each. The Intel-bus solutions don't generally handle this gracefully, at least not that I've seen, as long as you rely on those on-chip wait-state generators.
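With a 6502-style RDY scheme, giving each peripheral its own wait-state count is just a matter of having the address decoder pick the count. The Python sketch below models that with a hypothetical memory map; the address ranges and counts are made up for illustration.

```python
# Hypothetical memory map: (first address, last address, wait states).
REGIONS = [
    (0x0000, 0x7FFF, 0),  # fast SRAM
    (0x8000, 0x80FF, 1),  # LCD controller
    (0x8100, 0x81FF, 3),  # a much slower peripheral
]


def wait_states_for(addr):
    """Wait states the decoder would request for an access to addr."""
    for first, last, waits in REGIONS:
        if first <= addr <= last:
            return waits
    return 0  # unmapped space: no stretching


def bus_cycles(addresses):
    """Clock cycles for a sequence of single-cycle bus accesses."""
    return sum(1 + wait_states_for(a) for a in addresses)


accesses = [0x0200, 0x8000, 0x8100, 0x0201]
print([wait_states_for(a) for a in accesses])  # [0, 1, 3, 0]
print(bus_cycles(accesses))                    # 8
```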
So which bus is simpler? You're pulling your hair out over the 6502 bus now, and with a Z-80 you might get your project running sooner. But in terms of actual cost, I think you'll find the 6502 substantially easier to work with. It's also substantially faster: since every 6502 clock cycle is a bus cycle, an 8MHz 6502 bus transfers data at 8MB/s (assuming no wait states). A Z-80 at the same clock can manage at best about 2.67MB/s, and an 8085 only 2MB/s.
To transfer data faster than this, you'll start looking at wider buses rather than faster CPU clocks. A 6502 at 4MHz compares favorably with a Z-80 at 8MHz, and a 16-bit-wide bus at 4MHz moves two bytes per cycle, so it still supports a transfer rate of 8MB/s.
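The bandwidth figures follow from clock rate times bytes per transfer, divided by clocks per transfer. The Python sketch below assumes the best case for each CPU: one clock per access on the 6502, three clocks per memory read or write machine cycle on the Z-80 and 8085, an 8MHz Z-80, and a 6MHz 8085 (that clock rate is an assumption, chosen because it yields the 2MB/s figure).

```python
def bus_bytes_per_second(clock_hz, clocks_per_transfer, bytes_per_transfer=1):
    """Peak bus bandwidth with no wait states."""
    return clock_hz * bytes_per_transfer / clocks_per_transfer


rates = {
    "6502 @ 8MHz": bus_bytes_per_second(8e6, 1),
    "Z-80 @ 8MHz": bus_bytes_per_second(8e6, 3),
    "8085 @ 6MHz (assumed)": bus_bytes_per_second(6e6, 3),
    "16-bit bus @ 4MHz": bus_bytes_per_second(4e6, 1, bytes_per_transfer=2),
}
for name, rate in rates.items():
    print(f"{name}: {rate / 1e6:.2f} MB/s")
```

This is raw bus bandwidth only; instruction overhead (the Z-80's 4-clock opcode fetches, for instance) lowers the real-world figures further.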
What this boils down to is philosophy. To borrow a software analogy, the 6502 bus follows the Unix philosophy, that is, "worse is better." The CPU's bus is just adequate to let the CPU talk with peripherals. It makes no promise about how easily it can, only that it can, and that's all that matters. The CPU's bus is optimized for what it has to do. Intel's bus (and, to a large extent, Motorola's) follows the philosophy of being as correct as possible. It's a nice goal to strive for, but how can you be correct for all possible people?
The result is greater complexity and more difficulty in understanding all the different modes and sub-modes of the bus. Intel-style buses, for example, have fairly sophisticated timing requirements that are not obvious from reading the timing diagrams. After _RD and _WR are negated, how long until the next cycle can begin? That is critical knowledge, and peripheral datasheets put it anywhere from 0ns to 100ns. How does _RD interact with _CS? Read the datasheets of RAM chips and you'll find at least 3 different interactions. And so on.
I've also pulled my hair out when working with the 65816. Just search this forum for "Kestrel 1" and you'll see the problems I encountered very early on. But given a choice between the 6502 bus and the 68000 bus today, I'd definitely pick the 6502 bus. The fully synchronous design really is simpler, even if you don't see the simplicity up-front.
OK, I'll get off the soap-box now.