Blue April - RC6502 Project Space

Paganini · Post by **Paganini** » Sun Apr 17, 2022 4:03 pm

Michael wrote:

Perhaps you could just use CB2 plus two port pins and bit-bang the driver and enjoy using the LCD 8-bit interface mode?

This was my plan...

I think the way Garth has the E signal going out of Q0 is REALLY CLEVER though; it makes me want to do it that way.

Paganini · Post by **Paganini** » Tue Apr 19, 2022 10:34 pm

I am still having some trouble with this. I wired it up like Garth's example:

And I can verify that the '595 is working, by replacing the LCD with some LEDs, and the delay loops are long enough that I can actually see the bytes changing. I suspect the problem might have to do with this:

GARTHWILSON wrote:

Be sure to leave E down after every read or write, and only change the RS and R/W bits when E is down, and never at the same time with raising or lowering E.

From an old thread.

With the control signals on bits 0 and 1 of the SR, they appear instantly, along with the nibble of data, when I pulse the 595. I am thinking that I will have to do something like this, in order to send, say 00111100 to the LCD:

Code:

High nibble send:
 0011  0000
|    ||    |
+----++----+
 0011  0001 <- Enable bit
|    ||    |
+----++----+
 0011  0000
|    ||    |
+----++----+

Low nibble send:
 1100  0000
|    ||    |
+----++----+
 1100  0001 <- Enable bit
|    ||    |
+----++----+
 1100  0000
|    ||    |
+----++----+

In other words, send 6 bytes per byte of data, along with the accompanying 16 clock delays for the SR. That seems like a fair bit of extra steps / delays; before I try it that way I want to make sure I have the concept right.
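
To make the six-write sequence concrete, here is a minimal Python sketch (not from the thread) that lists the bytes the '595 would have to latch for one LCD byte. It assumes the layout in the diagram above: data nibble on bits 7-4 and E on bit 0. Putting RS on bit 1 is only a guess based on "bits 0 and 1", and the names `lcd_frames`, `E_BIT` and `RS_BIT` are made up for illustration.

```python
E_BIT = 0x01   # LCD enable on shift-register bit 0 (as in the diagram above)
RS_BIT = 0x02  # assumption: register select on bit 1

def lcd_frames(value, rs=False):
    """Return the six bytes to latch, in order, to send one byte in 4-bit mode."""
    control = RS_BIT if rs else 0
    frames = []
    for nibble in ((value >> 4) & 0x0F, value & 0x0F):  # high nibble first
        base = (nibble << 4) | control
        # Set up data and RS with E low, raise E, then drop E again.
        frames += [base, base | E_BIT, base]
    return frames

print([f"{b:08b}" for b in lcd_frames(0b00111100)])
```

Within each group of three writes only the E bit changes, so RS and the data are already stable before E rises and stay put until after it falls.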

GARTHWILSON · Post by **GARTHWILSON** » Wed Apr 20, 2022 12:59 am

Paganini wrote:

I wired it up like Garth's example:

I guess I shouldn't have reduced it so much, losing some of the resolution and making it kind of hard to read. It's a scan of the page in the 3x5" loose-leaf ring binder quick-reference guide I keep on the workbench. It was not originally intended for anyone else's eyes.

Quote:

In other words, send 6 bytes per byte of data, along with the accompanying 16 clock delays for the SR. That seems like a fair bit of extra steps / delays; before I try it that way I want to make sure I have the concept right.

Right. These displays have few enough characters though that it will still appear to be instant, which would not be the case if it were 132 columns of 43, 50, or 60 lines and you had to change it all at once!

16 clocks is only about four instructions' worth though, so the added delay to get there isn't much. Five NOPs might do it, six definitely would, or find an instruction with more cycles that doesn't do any damage, like rotate an unused RAM byte.

Paganini · Post by **Paganini** » Wed Apr 20, 2022 1:39 am

GARTHWILSON wrote:

I guess I shouldn't have reduced it so much, losing some of the resolution and making it kind of hard to read. It's a scan of the page in the 3x5" loose-leaf ring binder quick-reference guide I keep on the workbench. It was not originally intended for anyone else's eyes.

I did zoom in on it some, but I think it's pretty clear. Some of your pin names were different from my datasheet, but you also gave pin numbers, so I could cross reference. That made getting everything hooked up pretty straightforward.

Quote:

Right. These displays have few enough characters though that it will still appear to be instant, which would not be the case if it were 132 columns of 43, 50, or 60 lines and you had to change it all at once!

Hah! OK, cool.

Quote:

16 clocks is only about four instructions' worth though, so the added delay to get there isn't much. Five NOPs might do it, six definitely would, or find an instruction with more cycles that doesn't do any damage, like rotate an unused RAM byte.

I made a subroutine called "wait_16." Leventhal says jsr is 6 clocks and rts is six clocks, so I just filled in two nops. It seems to work OK - at least my LEDs blink the right bits!

GARTHWILSON · Post by **GARTHWILSON** » Wed Apr 20, 2022 3:10 am

Paganini wrote:

I made a subroutine called "wait_16." Leventhal says jsr is 6 clocks and rts is six clocks, so I just filled in two nops.

NOP is 2 clocks; so JSR to an RTS is 6 NOPs' time. The STA SR itself adds 4 more clocks. So

Code:

      STA  SR
      JSR  <to an empty subroutine>
      ORA  #1
      STA  SR

already gives you 18 from one write to the SR to the next, without any NOPs. However, you still need to pulse the 595's latch input after completing each byte shift. You can do the ORA #1 (for the second write, to set E true) and the AND #$FE (for the third write, to set E false again) while the shifting is being completed, before pulsing the latch input. By the time you do that, the JSR will be overkill for delay, especially if you need the accumulator for both the latching and the ORing and ANDing of what you write to the SR. Without testing, what I don't know is whether reading the SR after the shifting is completed will give you back the byte you wrote to it, such that you could do INC SR...DEC SR (since your LCD's E line is on bit 0) so you wouldn't need the accumulator for the ORing and ANDing, so it's free to use exclusively to pulse the 595's latch line after each byte shift is completed—if indeed you do need the accumulator for that. (Again, I don't know what you have it connected to.)
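
The cycle arithmetic in this exchange can be checked with a few lines of Python. This is only a bookkeeping sketch, not from the thread: it assumes the SR is written with absolute addressing (4 cycles for STA) and uses the standard 6502 counts for the other instructions. The table and the `total` helper are illustrative names.

```python
# Standard 6502 cycle counts for the instructions discussed above
CYCLES = {
    "STA abs": 4,   # assumes the VIA's SR is written with absolute addressing
    "JSR": 6,
    "RTS": 6,
    "ORA #imm": 2,
    "NOP": 2,
}

def total(*ops):
    """Sum the cycle counts of a straight-line instruction sequence."""
    return sum(CYCLES[op] for op in ops)

# From the end of one STA SR to the end of the next STA SR
write_to_write = total("JSR", "RTS", "ORA #imm", "STA abs")

# The "wait_16" subroutine: JSR, two NOPs, RTS
wait_16 = total("JSR", "NOP", "NOP", "RTS")

print(write_to_write, wait_16)
```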

Paganini · Post by **Paganini** » Mon Apr 25, 2022 11:46 pm

I did get it working using the VIA shift register... kindasorta. The shift register was working properly, latching data, etc., and the LCD would initialize, but not reliably. My particular one seems to be extra fussy about the timing loops - probably because it's a cheap Chinese knockoff. I decided to go back to using the VIA's output port (but now in 4-bit mode) so I could check the busy flag. That's where it is now. I don't consider the serial excursion to be wasted time even though I ended up abandoning that path; I gained a good basic understanding of how to use the 6522's shift register; it seems like there are plenty of interesting things to do with it so that it's almost a waste only to use it to drive the LCD!

I do have a question about how the 'pla' instruction works though. Right now I'm doing:

Code:

pla
cmp    #$FF
bmi    etc

To check the busy flag in bit 7. I wasn't sure if the cmp instruction is necessary; it looks like pla does set the N flag, but the (exceedingly confusing) diagram in Leventhal seemed to suggest that it does it based on the value of the SP (maybe to be used for testing for stack overflows and other such errors?), not the value pulled off the stack. So, I added the cmp just to be safe.

GARTHWILSON · Post by **GARTHWILSON** » Tue Apr 26, 2022 12:53 am

Paganini (do you play violin?), for reliable LCD initialization, be sure to write the function set three times, and if you can't poll the busy bit, wait the prescribed amount of time after each write. Both of these are shown in my sample code at http://wilsonminesco.com/6502primer/LCDcode.asm . Most instructions take 10 HD44780 controller clock cycles to process, except the "clear display" and "home cursor" instructions which take 410. The clock is typically in the range of 100-250kHz, which puts the time per cycle in the range of 4-10µs; so 10 cycles may be as long as 100µs, and 410 cycles may be as long as 4.1ms. You should wait at least 15ms after power-up before starting the initialization procedure. Observing these delays, I've never had any timing problems, regardless of LCD brand.
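
The timing figures above follow directly from the controller clock range. A short Python sketch of the conversion, using only the numbers quoted in this post (the function name is made up for illustration):

```python
def instruction_time_us(controller_cycles, clock_khz):
    """Execution time in microseconds for a given HD44780 controller clock."""
    return controller_cycles * 1000.0 / clock_khz

for clock_khz in (100, 250):
    normal = instruction_time_us(10, clock_khz)    # most instructions
    slow = instruction_time_us(410, clock_khz)     # clear display / home cursor
    print(f"{clock_khz} kHz: {normal:.0f} us per instruction, "
          f"{slow / 1000:.2f} ms for clear/home")
```

Fixed delays have to be sized for the slow end of the range (100 kHz), since the code cannot tell which clock a particular module runs at.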

PLA's flag results do reflect the contents of what you pulled off the stack, not the contents of the stack pointer.

Paganini · Post by **Paganini** » Tue Apr 26, 2022 2:56 pm

GARTHWILSON wrote:

Paganini (do you play violin?)

I do! It's my day job, so to speak. "Paganini" has been my handle since my folks got our first dialup account in 1998. So many people now know me by that nickname that I respond equally readily to "Pag" or "Pags" as I do to my own actual name (which is "Nathan," although some call me... "Nate").

Sadly, I am only a good violinist, and therefore only attempt the easy Paganini (like Barucaba Variations) and leave the boundary pushing ones to the great violinists.

GARTHWILSON wrote:

for reliable LCD initialization, be sure to write the function set three times, and if you can't poll the busy bit, wait the prescribed amount of time after each write. Both of these are shown in my sample code at http://wilsonminesco.com/6502primer/LCDcode.asm . Most instructions take 10 HD44780 controller clock cycles to process, except the "clear display" and "home cursor" instructions which take 410. The clock is typically in the range of 100-250kHz, which puts the time per cycle in the range of 4-10µs; so 10 cycles may be as long as 100µs, and 410 cycles may be as long as 4.1ms. You should wait at least 15ms after power-up before starting the initialization procedure. Observing these delays, I've never had any timing problems, regardless of LCD brand.

All the delays work as you describe for mine, except it seems to need that first delay to be 40ms as described here rather than 15. I'd like to share my delay loops code in the next post. It's working, but I want to make sure I haven't violated any best practices, and maybe it could be improved. Can't really make it run faster (hah!) but maybe I can make it smaller...

GARTHWILSON wrote:

PLA's flag results do reflect the contents of what you pulled off the stack, not the contents of the stack pointer.

Cool. I'll get rid of that cmp then.

Paganini · Post by **Paganini** » Tue Apr 26, 2022 4:42 pm

I can now poll the LCD busy signal during normal operation, but I still need to observe the mandated delays during LCD initialization. The shortest delay I need is 100 µs, which I calculate to be 400 clocks, since I'm running at 4 MHz.

Code:

;   jsr     DELAY_100us                       6 clocks
DELAY_100us:
    txa                     ;                 2 clocks (392)
    pha                     ;                 3 clocks (389)
    ldx     #75             ; Magic number!   2 clocks (387)
delay100us:
    dex                     ;                 2 clocks per loop
    bne     delay100us      ;               + 3 clocks per loop = 5 clocks per loop * 75 = 375
    pla                     ;                 4 clocks (383)
    tax                     ;                 2 clocks (381)
    rts                     ;                 6 clocks (375)
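
A quick cross-check of the count, as a Python sketch that is not part of the original post. One detail worth noting: BNE costs 3 cycles when taken but only 2 when it falls through, so the last pass of the loop is one cycle shorter than the others. The sketch assumes the loop does not cross a page boundary and that the count is between 1 and 255; the function name is made up for illustration.

```python
CPU_HZ = 4_000_000  # 4 MHz clock, as stated in the post

def delay_100us_cycles(count=75):
    """Cycles from the JSR through the RTS for the routine above."""
    entry = 6 + 2 + 3 + 2            # JSR, TXA, PHA, LDX #imm
    # DEX (2) + BNE taken (3) on every pass except the last,
    # where BNE falls through and costs only 2.
    loop = (count - 1) * (2 + 3) + (2 + 2)
    exit_ = 4 + 2 + 6                # PLA, TAX, RTS
    return entry + loop + exit_

for count in (75, 76):
    cycles = delay_100us_cycles(count)
    print(count, cycles, cycles * 1_000_000 / CPU_HZ, "us")
```

Under those assumptions a count of 75 comes out one cycle short of 400, while 76 lands a few cycles over, which is the safer side for an "at least" delay.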

I figured that 1ms is ten of those, so I made a parent subroutine that is basically the same idea, except with 10 iterations instead of 75:

Code:

;   jsr     DELAY_1ms       ;   6 clocks
DELAY_1ms:
    txa                     ;   2 clocks
    pha                     ;   3 clocks
    ldx     #10             ;   2 clocks
delay1ms:
    jsr     DELAY_100us     ;   400 clocks per loop
    dex                     ;   2 clocks per loop
    bne     delay1ms        ; + 3 clocks per loop = 405 clocks per loop * 10 = 4050
    pla                     ;   4 clocks
    tax                     ;   2 clocks
    rts                     ;   6 clocks

I make that 4075 clocks, which is 1 ms + an extra 18.75 µs, which is fine since we want "at least" 1 ms delay. I did exactly the same thing for 4 ms (called DELAY_1ms 4 times) and for 40 ms (called DELAY_4ms 10 times). I will refrain from including them, since they are literally the same subroutines, copy/pasted with the labels and magic numbers changed. Each one picks up a few extra microseconds beyond spec from saving / restoring the X register, and so on. Again this seems fine, since we want "at least," and everything seems to be working.
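
The nesting arithmetic can be sketched in Python as follows. This is not from the post: `outer_delay_cycles` is a hypothetical helper, and the 4 ms and 40 ms stages are modelled on the assumption that they really are the same routine with different counts, as described. The `last_branch` parameter lets the same function reproduce the post's figure (every BNE counted as 3) or the slightly shorter count where the final BNE falls through in 2 cycles.

```python
CPU_HZ = 4_000_000  # 4 MHz clock, as stated earlier in the thread

def outer_delay_cycles(inner_cycles, count, last_branch=2):
    """JSR-to-RTS cycles for a loop that calls an inner delay `count` times.

    inner_cycles includes the JSR into the inner routine and its RTS.
    """
    entry = 6 + 2 + 3 + 2                        # JSR, TXA, PHA, LDX #imm
    full_passes = (count - 1) * (inner_cycles + 2 + 3)   # DEX + BNE taken
    final_pass = inner_cycles + 2 + last_branch          # DEX + last BNE
    exit_ = 4 + 2 + 6                            # PLA, TAX, RTS
    return entry + full_passes + final_pass + exit_

def microseconds(cycles):
    return cycles * 1_000_000 / CPU_HZ

as_posted = outer_delay_cycles(400, 10, last_branch=3)   # the post's accounting
one_ms = outer_delay_cycles(399, 10)     # inner routine counted as 399 cycles
four_ms = outer_delay_cycles(one_ms, 4)      # assumed structure, not shown
forty_ms = outer_delay_cycles(four_ms, 10)   # assumed structure, not shown

for name, c in [("as posted", as_posted), ("1 ms", one_ms),
                ("4 ms", four_ms), ("40 ms", forty_ms)]:
    print(name, c, microseconds(c), "us")
```

Either way of counting, each stage stays above its nominal delay because the per-call overhead accumulates through the nesting.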

HOWEVER.

Having four special purpose nested subroutines does not seem to me like an elegant solution to this problem. The roughly 50 bytes they occupy is not that big of a ROM cost, but since I've only got 28kb, I expect that things are going to get crowded as I keep adding features.

Since I have two index registers that, in combination, can count up to $FFFF, I feel intuitively that I should be able to create one general purpose delay function that simply accepts a parameter telling it what to count to. The (one single page) section about delay loops in Leventhal does something like this, but it is hard-coded for 1 MHz, and the algebra he uses produces constants for faster clocks that are larger than what can be stored in a single byte. I have not tried too hard to invent this subroutine, because I bet that at some point in the last 50 years someone has already done it, and that someone here on these forums knows where to find it.

gfoot · Post by **gfoot** » Tue Apr 26, 2022 5:41 pm

Paganini wrote:

HOWEVER.

Having four special purpose nested subroutines does not seem to me like an elegant solution to this problem. The roughly 50 bytes they occupy is not that big of a ROM cost, but since I've only got 28kb, I expect that things are going to get crowded as I keep adding features.

Since I have two index registers that, in combination, can count up to $FFFF, I feel intuitively that I should be able to create one general purpose delay function that simply accepts a parameter telling it what to count to. The (one single page) section about delay loops in Leventhal does something like this, but it is hard-coded for 1 MHz, and the algebra he uses produces constants for faster clocks that are larger than what can be stored in a single byte. I have not tried too hard to invent this subroutine, because I bet that at some point in the last 50 years someone has already done it, and that someone here on these forums knows where to find it.

For precise timing I use bespoke loops, and sometimes macros to generate them. Recently I needed to align precisely based on a 6522 timer 1 interrupt, for video synchronization, so my interrupt handler read back the low byte of T1 to see how much it had ticked since reaching zero (variation in interrupt processing speed), then used conditional branches to waste some cycles if necessary depending on the low bits of the value read, followed by an eight-cycle loop executed based on the higher bits of the value read back from the timer. I can share but I don't think it's very useful for your case as you don't need to be precise.

So for "at least" delays where you don't need to be precise, busy-waiting like this is probably not the best approach; but you could just use one index register like this to make an easy-to-use configurable delay similar to what you're already doing:

Code:

    LDX #4
    JSR delay_Xms
...
.delay_Xms
    JSR delay_1ms
    DEX
    BNE delay_Xms
    RTS

You could wrap it again with a Y-based loop for longer delays, and/or have variants using different base routines that already wait different periods. But overall it's probably better to use a 6522 timer here, either directly or by setting up an "OS"-level time counter.

BigEd · Post by **BigEd** » Tue Apr 26, 2022 5:48 pm

(crossed in the post, but here we go anyway...)

Floobydust posted a delay routine here (viewtopic.php?p=29795#p29795) which could surely be adapted. You can use two nested loops easily enough, counting in X and Y, and you can add NOPs to the innermost loop for a coarse slowdown. You can even have three loops if you bring A into play.

There's also a topic, "Delay routines", which might be of interest.

As George suggests, if you have a timer, you might well use that in preference. Although I think counting is conceptually simpler.

GARTHWILSON · Post by **GARTHWILSON** » Tue Apr 26, 2022 6:28 pm

Paganini, I play cello, and Dr Jefyll and BigDumbDinosaur here play bass.

I see you have a TXA, PHA above, later followed by PLA, TAX. Are you limited to the NMOS instruction set? The CMOS 65c02 lets you push and pull X and Y directly, with PHX, PLX, PHY, and PLY. I summarized the differences at http://wilsonminesco.com/NMOS-CMOSdif/ . However, since you're not using X for actual indexing, you could leave X alone and just use A as the loop counter, again if you have a CMOS 65c02, using DEA (DEcrement Accumulator).

Quote:

I'd like to share my delay loops code in the next post. It's working, but I want to make sure I haven't violated any best practices, and maybe it could be improved. Can't really make it run faster (hah!) but maybe I can make it smaller...

From the programming tips page of my 6502 primer:

Need a slick delay? Take this one from Bruce Clark. The delay is 9*(256*A+Y)+8 cycles (plus 12 more for JSR & RTS if you make it a subroutine). This assumes that the BCS does not cross a page boundary.
```
loop:   CPY  #1
        DEY
        SBC  #0
        BCS  loop
```
He writes: "A and Y are the high and low bytes (respectively) of a 16-bit value; multiply that 16-bit value by 9, then add 8 and you get the cycle count. So the delay can range from 8 to 589823 cycles, with a resolution of 9 cycles. One of the nice things about this code is that it's easy to figure out what values to put in A and Y when you want a delay of, e.g. (approximately) 10000 cycles." Here's the same thing with my structure macros (the resulting machine code being identical):
```
        BEGIN
           CPY  #1
           DEY
           SBC  #0
        UNTIL_CARRY_CLEAR
```
There's more at http://6502org.wikidot.com/software-delay . In fact, that wiki, although not very big, has a lot of great resources for this kind of thing. Check out also the source code repository on 6502.org.
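
To show how the A and Y values fall out of that formula, here is a small Python sketch (not from the primer; `delay_cycles` and `pick_a_y` are illustrative names). It uses only the quoted formula, 9*(256*A+Y)+8 plus 12 for JSR and RTS, and does not count the cycles spent loading A and Y before the call.

```python
def delay_cycles(a, y, as_subroutine=False):
    """Cycle count from the quoted formula: 9*(256*A+Y)+8, +12 for JSR/RTS."""
    return 9 * (256 * a + y) + 8 + (12 if as_subroutine else 0)

def pick_a_y(min_cycles, as_subroutine=False):
    """Smallest (A, Y) pair whose delay is at least min_cycles."""
    overhead = 8 + (12 if as_subroutine else 0)
    n = max(0, -(-(min_cycles - overhead) // 9))   # ceiling division
    if n > 0xFFFF:
        raise ValueError("delay too long for a 16-bit count")
    return n >> 8, n & 0xFF

# 100 us and 1 ms at 4 MHz are 400 and 4000 cycles respectively
for target in (400, 4000):
    a, y = pick_a_y(target, as_subroutine=True)
    print(target, (a, y), delay_cycles(a, y, as_subroutine=True))
```

Because the resolution is 9 cycles, rounding the count up never overshoots the target by more than 8 cycles, which suits "at least" delays like the LCD initialization waits.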

Quote:

except it seems to need that first delay to be 40ms as described here rather than 15

The 15ms came from a quick-reference card I keep in my 3x5" QR ring binder on the workbench. It agrees with other things in a data sheet I have in the file cabinet, but I've always gone longer for that delay since it's on boot-up and waiting for example one-eighth of a second is not going to be significant. I guess the 1/8 second came from the fact that the first time we used this kind of LCDs at work, we had a 1/8-second delay for something else, so we just used it. So I never actually tried 15ms. Your 40 sure won't hurt anything.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Apr 26, 2022 8:37 pm

Paganini wrote:

GARTHWILSON wrote:

Paganini (do you play violin?)

I do! It's my day job, so to speak.

Cool! I admire anyone who can coax music out of a violin.

Me playing a violin (or a viola, for that matter) would be akin to a grizzly bear filleting a fish. I don't have the hands for it, so the closest I can get is to play a “violin on steroids,” which is my wife’s description of the double bass.

GARTHWILSON wrote:

Paganini, I play cello, and Dr Jefyll and BigDumbDinosaur here play bass.

Also, we have other members who play an instrument. I seem to recall we have a topic about it somewhere around here.

Speaking of which, there is a video on U-Toobe of Jeff and his pal floating down a river on a boat while playing some tunes. Unfortunately, I don't seem to have the link to it. There aren't any videos of me floating down a river, but there is one of my jazz trio making like a garage band during COVID lockdown. It was our first try at playing “Laura,” the “first try” aspect being painfully audible at times.

Dunno if any video exists of Garth playing his ’cello.

GARTHWILSON · Post by **GARTHWILSON** » Tue Apr 26, 2022 8:55 pm

BigDumbDinosaur wrote:

Me playing a violin (or a viola, for that matter) would be akin to a grizzly bear filleting a fish. I don't have the hands for it, so the closest I can get is to play a “violin on steroids,” which is my wife’s description of the double bass.

I have a violin that a great-great uncle made; but I never got good enough at it to play in public. There were 40 of his violins in the house when he died, so my grandfather and each of his seven brothers and sisters got five. I have one of those, and will probably get my mother's when she goes. She's 87. For her, it's just decoration to hang on a wall, as her musical IQ can be expressed with a single digit, at least a hex digit.

Quote:

but there is one of my jazz trio making like a garage band

Nice!

Quote:

Dunno if any video exists of Garth playing his ’cello.

I've been video'ed many times, but I am not aware of any of them being available online. Doing a search for any, I find there's another Garth Wilson who's apparently a professional cellist.

Paganini · Post by **Paganini** » Tue Apr 26, 2022 9:54 pm

GARTHWILSON wrote:

Paganini, I play cello, and Dr Jefyll and BigDumbDinosaur here play bass.

Great! I don't suppose any of you might be planning to attend the upcoming VCF in Atlanta?

GARTHWILSON wrote:

I see you have a TXA, PHA above, later followed by PLA, TAX. Are you limited to the NMOS instruction set?

I have two types of 6502, at the moment. One is the modern 14 MHz WDC65C02; the other type is old, 4 MHz, made by California Micro Devices. In theory it's a clone of the Rockwell 65C02s of the same era, but I haven't been able to find a datasheet for it, and I haven't tested to see what it does if you feed it the "new" WDC instructions. The 6502 Assembly Language Programming book by Leventhal *is* limited to the NMOS instruction set, which is what I've been using as my reference. Kind of a period practice for retrocomputing.

BigDumbDinosaur wrote:

There aren't any videos of me floating down a river, but there is one of my jazz trio making like a garage band during COVID lockdown. It was our first try at playing “Laura,” the “first try” aspect being painfully audible at times.

You guys sound great! I did a little jazz back in my student days, but I was never any good at it. I find the lack of immediate structure very intimidating. Garth, I notice you're located in southern CA; you wouldn't happen to know Richard Levine by any chance? He's a cellist with the San Diego Symphony, and also a fan of the 6502. I believe he got his KIM-1 back when KIM-1s were a brand new market player.

Thanks everyone for all the material on delays. I like the idea of doing it with a timer, for not the least reason that I have no idea how those VIA timers work, and this seems like a good opportunity to learn. In the long term (once I get some basic I/O up and running on this thing) my plan for Blue April is to go through the XINU book and make a little multitasking OS. I will no doubt need those VIA timers to implement something resembling a RTC when I do that. I'll combine my subroutines into one of the snazzy seedable ones from up-thread first so I have a working backup.
