I can now poll the LCD busy signal during normal operation, but I still need to observe the mandated delays during LCD initialization. The shortest delay I need is 100 µs, which I calculate to be 400 clocks, since I'm running at 4 MHz.
Code:
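; jsr DELAY_100us        ; 6 clocks (394 of the 400-clock budget left)
DELAY_100us:
    txa                  ; 2 clocks (392)
    pha                  ; 3 clocks (389)
    ldx #75              ; Magic number! 2 clocks (387)
delay100us:
    dex                  ; 2 clocks per pass
    bne delay100us       ; 3 clocks when taken, 2 on the final pass: 75 * 5 - 1 = 374
    pla                  ; 4 clocks (383)
    tax                  ; 2 clocks (381)
    rts                  ; 6 clocks (375 left for the loop, which uses 374)
; Total with the jsr: 399 clocks (99.75 us). ldx #76 would give 404 clocks (101 us).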
I figured that 1 ms is ten of those, so I made a parent subroutine that is basically the same idea, except with 10 iterations instead of 75:
Code:
; jsr DELAY_1ms          ; 6 clocks
DELAY_1ms:
    txa                  ; 2 clocks
    pha                  ; 3 clocks
    ldx #10              ; 2 clocks
delay1ms:
    jsr DELAY_100us      ; 399 clocks per pass (jsr and rts included)
    dex                  ; 2 clocks per pass
    bne delay1ms         ; 3 clocks when taken, 2 on the final pass: 10 * 404 - 1 = 4039
    pla                  ; 4 clocks
    tax                  ; 2 clocks
    rts                  ; 6 clocks (4064 clocks total, about 1.016 ms)
I make that 4064 clocks: 25 clocks of overhead plus 10 passes of 404, less 1 because the final bne is not taken and costs 2 clocks rather than 3 (each call to DELAY_100us costs 399 clocks for the same reason). That is 1 ms plus an extra 16 µs, which is fine since we want "at least" a 1 ms delay. I did exactly the same thing for 4 ms (calling DELAY_1ms 4 times) and for 40 ms (calling DELAY_4ms 10 times). I will refrain from including them, since they are literally the same subroutines, copy/pasted with the labels and magic numbers changed. Each one picks up a few extra microseconds beyond spec from saving and restoring the X register, and so on. Again this seems fine, since we want "at least," and everything seems to be working.
HOWEVER.
Having four special-purpose nested subroutines does not seem to me like an elegant solution to this problem. The roughly 50 bytes they occupy is not a big ROM cost, but since I've only got 28 KB, I expect that things are going to get crowded as I keep adding features.
Since I have two index registers that, in combination, can count up to $FFFF, I feel intuitively that I should be able to create one general-purpose delay function that simply accepts a parameter telling it what to count to. The single-page section about delay loops in Leventhal does something like this, but it is hard-coded for 1 MHz, and the algebra he uses produces constants for faster clocks that are larger than what can be stored in a single byte. I have not tried too hard to invent this subroutine, because I bet that at some point in the last 50 years someone has already done it, and that someone here on these forums knows where to find it.
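To make the idea concrete, here is a rough sketch of the kind of routine I have in mind. It is not taken from Leventhal and is untested on hardware; the cycle counts are worked out by hand. The label DELAY_Y, the inner count of 79, and the choice of passing the number of roughly 100 µs units in Y are all my own assumptions. It also assumes a 4 MHz clock and that neither branch crosses a page boundary (a taken branch across a page costs one extra clock).

```asm
; Hypothetical general-purpose delay: Y = number of ~100 us units (1..255).
; Y = 0 on entry wraps around and gives 256 units.
; Clobbers X and Y (unlike the routines above, X is not preserved).
; ldy #n                 ; 2 clocks
; jsr DELAY_Y            ; 6 clocks
DELAY_Y:
    ldx #79              ; 2 clocks per outer pass
delay_inner:
    dex                  ; 2 clocks per inner pass
    bne delay_inner      ; 3 when taken, 2 on the final pass: 79 * 5 - 1 = 394
    dey                  ; 2 clocks per outer pass
    bne DELAY_Y          ; 3 when taken, 2 on the final pass: 401 per outer pass
    rts                  ; 6 clocks; total = 401 * n + 13, ldy and jsr included
```

By that arithmetic, ldy #1 gives 414 clocks (103.5 µs), ldy #10 gives 4023 clocks (about 1.006 ms), and ldy #40 gives 16053 clocks (about 4.013 ms). A 40 ms delay needs 400 units, which does not fit in one byte, so it would take two calls with ldy #200 (80213 clocks, about 20.05 ms each). The routine itself is 9 bytes. For a different clock speed, the inner count would be the number of clocks in 100 µs, minus 6, divided by 5 and rounded up.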