Posted: Mon Oct 28, 2019 3:08 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1383
I decided to update my BIOS and Monitor, which use the NXP/Philips SCC2691 UART, to take advantage of the extended baud rate table. This allows configuration of baud rates up to 115,200 bps.

This is a bit of an odd programming feature, as you need to toggle the "BRG Test" register per the datasheet. As was discovered during the initial development of the BIOS to support the various features of the SCC2691, the 65C02 has a few anomalies in certain addressing modes, that can toggle certain registers, like the MR1/2 registers, which are combined under a single hardware register and accessed sequentially. There are two other registers which are toggled between two states simply by accessing them via a Read function. One is the "X1/X16 Test" register and the other is the "BRG Test" register. With both of these registers, you can only toggle them between states/tables. You can NOT reset them via software, only a physical hardware reset (activating the RESET line) will reset them to their default state.

Needless to say, it becomes an interesting programming challenge to ensure you can select the extended baud rates and "sorta" track the active table, so you can recover from a crash, a reconfigure of the UART operating parameters (not involving a hardware reset) or a software reset/restart of the BIOS and Monitor code. The BIOS has two main routines which are responsible for configuring the SCC2691. The first routine does a software power up of the chip and performs a reset against all basic operating functions. This simply sends a list of commands in sequence to the Command register. The second routine uses a table to configure the various registers of the chip and defines the operating parameters and options so it can be used as a console for the SBC.

Both of these routines are called during startup of the SBC. Once the BIOS is setup, it jumps to the cold start vector for the Monitor and you're off and running, so to speak. One can change the soft config data in page $03 and then call the BIOS routine to reconfigure the chip on the fly (no hardware reset required). There is also a Panic routine via the NMI vector to recover the SBC BIOS/Monitor which is triggered by a manual push button. The routine does some cleanup, restores vectors and soft config data, then reinitializes the UART and jumps to the Monitor warm start vector.

So, to get to the heart of this, the "BRG Test" register has to be toggled an "even" number of times to ensure the active table does NOT change. If an "odd" number of toggles happens, then the table is switched between normal and extended baud rates (in either direction). The other part of this is that, to get to the extended baud rates, the "BRG Test" register needs to be toggled an "odd" number of times (from initial RESET).

As the BIOS setup routines control the config of the SCC2691, it determines what gets loaded into each register and what gets toggled for setup. Using the BIOS routines to reinitialize the chip to the same or different parameters also needs to know what was initially setup. As the NMI Panic routine also uses the BIOS routines to reinitialize the UART, it also needs to know what the initial setup was. Finally, the Monitor code (which is separate) has an option to Reset the SBC by jumping to the cold start vector, which starts everything over, so it too needs to know what the initial setup was... whew!

So there's only two things that really need to change in the config:
1- change the baud rates for Receive/Transmit (one register)
2- determine if the "BRG Test" register needs to be toggled and toggle it if required.

Seems simple enough... until you realize that certain rates need to toggle the "BRG Test" register and other rates don't. Doing a software reset also needs to be aware of the active baud rate table and handle it accordingly.

I opted to use Bit 7 of Mode Register 2 (MR2) to indicate whether or not an extended rate is used. This saves using another config byte where we only need a single bit for indicating if the "BRG Test" register requires toggling to extended rates. The BIOS routine which initializes the chip has been modified slightly to sense Bit 7, take the appropriate action and also mask off Bit 7 to configure MR2. The NMI Panic routine (also part of the BIOS) was modified to ensure an even number of register toggles are done to maintain the baud rate table. Finally, the Monitor code also needed a slight modification of its Reset (and Zero memory function, which calls Reset) routine to maintain the baud rate table.
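
A minimal sketch of the Bit 7 handling described above. This is not the actual BIOS code. CFG_MR2 and its address are made up for illustration, the two UART addresses are taken from the I/O layout posted later in this thread, and the sketch assumes the MR pointer already points at MR2 and the BRG Test register is still in its reset (normal table) state.

Code:
UART_MODEREG   .EQU     $FE80           ;MR1/MR2, same address, sequential access
UART_BRGTST    .EQU     $FE82           ;BRG Test Register (a READ toggles the table)
CFG_MR2        .EQU     $0300           ;Hypothetical soft config byte for MR2
;
SET_MR2         LDA     CFG_MR2         ;Get MR2 config data, Bit 7 = extended rate flag
                BPL     NO_EXT          ;Bit 7 clear, stay on the normal table
                BIT     UART_BRGTST     ;Read access toggles to the extended table
NO_EXT          AND     #%01111111      ;Mask off Bit 7 (flag only, not MR2 data)
                STA     UART_MODEREG    ;Absolute store, so no indexed false read
                RTS                     ;Return to caller

BIT is handy here because it performs the read without disturbing the A register, so the config byte is still available for the mask and store.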

I've done a fair amount of testing and so far, it appears to work correctly. This basically means:
1- A Reset and/or Zero RAM function from the Monitor works
2- The Panic routine invoked via the NMI trigger works
3- A change in UART operating parameters can be done without a hardware reset

Unfortunately, there's no absolute guarantee that an end user couldn't manage to get the table out of sync and have the baud rate change on the fly. If, however, an end user only uses the Monitor memory commands and BIOS routines for hardware access, it should be fine. I plan on running some additional tests and will post updated BIOS and Monitor code shortly.

_________________
Regards, KM
https://github.com/floobydust


Posted: Tue Oct 29, 2019 9:19 pm
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
floobydust wrote:
... There are two other registers which are toggled between two states simply by accessing them via a Read function. One is the "X1/X16 Test" register and the other is the "BRG Test" register. With both of these registers, you can only toggle them between states/tables. You can NOT reset them via software, only a physical hardware reset (activating the RESET line) will reset them to their default state ...

So these are write-only registers that are triggered by a read operation? Kinda kinky ... and kludgy ... Woz did it with the Apple ][ "soft-switches", but at least he used separate addresses for ON and OFF for the ones that mattered.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Tue Oct 29, 2019 9:56 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1383
barrym95838 wrote:
floobydust wrote:
... There are two other registers which are toggled between two states simply by accessing them via a Read function. One is the "X1/X16 Test" register and the other is the "BRG Test" register. With both of these registers, you can only toggle them between states/tables. You can NOT reset them via software, only a physical hardware reset (activating the RESET line) will reset them to their default state ...

So these are write-only registers that are triggered by a read operation? Kinda kinky ... and kludgy ... Woz did it with the Apple ][ "soft-switches", but at least he used separate addresses for ON and OFF for the ones that mattered.


Actually, it's a bit stranger than that. The SCC2691 has 8 registers. Many have two dissimilar functions, one for write and another for read. Here's my I/O layout for them:

Code:
;**************************************************************************************************
IOPAGE         .EQU     $FE00            ;I/O Page Base Start Address
;**************************************************************************************************
SCC2691_BASE   .EQU     IOPAGE+$80              ;Beginning of Console UART address
;
UART_MODEREG   .EQU     SCC2691_BASE+$00        ;MR1/MR2 same address, sequential read/write
UART_STATUS    .EQU     SCC2691_BASE+$01        ;UART Status Register (READ)
UART_CLKSEL    .EQU     SCC2691_BASE+$01        ;UART Clock Select Register (WRITE)
UART_BRGTST    .EQU     SCC2691_BASE+$02        ;UART BRG Test Register (READ)
UART_COMMAND   .EQU     SCC2691_BASE+$02        ;UART Command Register (WRITE)
UART_RECEIVE   .EQU     SCC2691_BASE+$03        ;UART Receive Register (READ)
UART_TRANSMIT  .EQU     SCC2691_BASE+$03        ;UART Transmit Register (WRITE)
UART_CLKTEST   .EQU     SCC2691_BASE+$04        ;X1/X16 Test Register (READ)
UART_AUXCR     .EQU     SCC2691_BASE+$04        ;Aux Command Register (WRITE)
UART_ISR       .EQU     SCC2691_BASE+$05        ;Interrupt Status Register (READ)
UART_IMR       .EQU     SCC2691_BASE+$05        ;Interrupt Mask Register (WRITE)
UART_CNTU      .EQU     SCC2691_BASE+$06        ;Counter/Timer Upper Register (READ)
UART_CNTUP     .EQU     SCC2691_BASE+$06        ;Counter/Timer Upper Preset Register (WRITE)
UART_CNTL      .EQU     SCC2691_BASE+$07        ;Counter/Timer Lower Register (READ)
UART_CNTLP     .EQU     SCC2691_BASE+$07        ;Counter/Timer Lower Preset Register (WRITE)
;
;**************************************************************************************************


As per above, the BRG Test register (READ operation) is the Command Register when written to. The same applies to the X1/X16 Test register, when written to, it's the Aux Command register. In short, there's nothing of value to be read from either of these registers, so it's nothing more than a Read access that toggles them between states. A simple BIT "register" is all that's required to toggle it.

During development of my BIOS, I realized that the W65C02 was doing a false read of registers when using an indexed access to store to them. The CPU performs an extra read cycle ahead of the write when the indexed address stays within the same page. This caused an issue with loading MR1 and MR2, as the internal MR pointer would also get incremented by the read, so I couldn't load MR1 via an indexed loop (it pointed to MR2 before the first write). The same happens to the X1/X16 Test and BRG Test registers during setup, but as the end result is an even number of accesses, their state is not actually changed.
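
One way around the MR1/MR2 problem is to load the mode registers with plain absolute stores instead of an indexed loop. The fragment below is only a sketch using the equates above. CFG_MR1 and CFG_MR2 are hypothetical config bytes (with Bit 7 of MR2 assumed clear), and the $10 "reset MR pointer" command value is my reading of the SCC2691 datasheet, so please verify it before relying on it.

Code:
CFG_MR1        .EQU     $0300           ;Hypothetical soft config byte for MR1
CFG_MR2        .EQU     $0301           ;Hypothetical soft config byte for MR2
;
INIT_MR         LDA     #%00010000      ;Command: reset MR pointer to MR1 (assumed value)
                STA     UART_COMMAND    ;A write here does not toggle BRG Test
                LDA     CFG_MR1         ;Get MR1 config data
                STA     UART_MODEREG    ;Write MR1, pointer advances to MR2
                LDA     CFG_MR2         ;Get MR2 config data
                STA     UART_MODEREG    ;Write MR2
                RTS                     ;Return to caller

Absolute (non-indexed) stores have no extra read cycle, so the MR pointer only advances on the two intended writes.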

The baud rate tables have two rates which are the same for extended or normal, which are 38.4K and 19.2K, so if either of these are used, toggling the table is harmless. Use any other baud rates and toggling the table changes the rate to something different. This becomes the gotcha.. as an odd number of accesses need to happen to select the extended baud rates.

I hope my response is clear... it doesn't have to make sense... the 2691 is a bit of an odd chip, but once it's configured, it's rock solid. The issue arises when using it as a console on a SBC that allows anything and everything to be changed via the Monitor. You can get it hosed up... and it might require a hardware reset rather than jumping to the coldstart vector.

_________________
Regards, KM
https://github.com/floobydust


Posted: Wed Oct 30, 2019 4:38 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8406
Location: Midwestern USA
floobydust wrote:
...the 2691 is a bit of an odd chip, but once it's configured, it's rock solid. The issue arises when using it as a console on a SBC that allows anything and everything to be changed via the Monitor. You can get it hosed up... and it might require a hardware reset rather than jumping to the coldstart vector.

The false read when using indexed addressing on the UART is due to a bug in the 65C02 and is not specific to the 26xx/28xx NXP UARTs. Read before write has the potential to affect any hardware in which an unanticipated read can change something. For example, inadvertently touching an interrupt status register could clear a pending interrupt before it has been serviced. Unlike the 65C816, the 'C02 doesn't tell you when a particular address is bogus. As every 'C02 clock cycle results in bus activity of some sort, keeping the 'C02 from touching registers via an incidental read doesn't appear to be possible without sophisticated logic that knows the steps of each instruction and knows when to isolate D0-D7 of the MPU from the rest of the system.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Oct 31, 2019 4:46 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1383
BigDumbDinosaur wrote:
floobydust wrote:
...the 2691 is a bit of an odd chip, but once it's configured, it's rock solid. The issue arises when using it as a console on a SBC that allows anything and everything to be changed via the Monitor. You can get it hosed up... and it might require a hardware reset rather than jumping to the coldstart vector.

The false read when using indexed addressing on the UART is due to a bug in the 65C02 and is not specific to the 26xx/28xx NXP UARTs. Read before write has the potential to affect any hardware in which an unanticipated read can change something. For example, inadvertently touching an interrupt status register could clear a pending interrupt before it has been serviced. Unlike the 65C816, the 'C02 doesn't tell you when a particular address is bogus. As every 'C02 clock cycle results in bus activity of some sort, keeping the 'C02 from touching registers via an incidental read doesn't appear to be possible without sophisticated logic that knows the steps of each instruction and knows when to isolate D0-D7 of the MPU from the rest of the system.


All true... this subject was also discussed in an older post when I was writing the BIOS for the 2691 UART. The comments in my source file also cover this.

viewtopic.php?f=2&t=4992#p57443

I still consider the 2691 UART a bit of an odd chip, as there's simply no ability to query the chip to find out what the BRG Test register status is. The same applies to the X1/X16 Test register. While I am sold on using the NXP UARTs going forward with newer designs, it's just the fact that one needs to realize what the CPU does and how the UART reacts... and what can be done to ensure it (UART) can be configured for the intended purpose.

I'm going to put some additional effort into getting the extended baud rates better implemented, but there are some hurdles that ultimately will limit the success, one being how the CPU does false reads, and the other being the inability to query the UART to determine status of the two test registers. I do appreciate everyone's feedback and assistance from the earlier post above that helped with the initial BIOS release.

_________________
Regards, KM
https://github.com/floobydust


Posted: Thu Oct 31, 2019 5:01 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
BigDumbDinosaur wrote:
... As every 'C02 clock cycle results in bus activity of some sort, keeping the 'C02 from touching registers via an incidental read doesn't appear to be possible without sophisticated logic that knows the steps of each instruction and knows when to isolate D0-D7 of the MPU from the rest of the system.

Yeah, but ... I thought that the 'C02 did the bogus read from (PC) instead of (base address + 0), or am I mis-remembering again?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Thu Oct 31, 2019 1:37 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8406
Location: Midwestern USA
barrym95838 wrote:
BigDumbDinosaur wrote:
... As every 'C02 clock cycle results in bus activity of some sort, keeping the 'C02 from touching registers via an incidental read doesn't appear to be possible without sophisticated logic that knows the steps of each instruction and knows when to isolate D0-D7 of the MPU from the rest of the system.

Yeah, but ... I thought that the 'C02 did the bogus read from (PC) instead of (base address + 0), or am I mis-remembering again?

It's a bug involving an indexed write. Suppose you want to write to a device register at $D007. Let's also suppose the instruction you are using is STA $D000,X, where .X is loaded with $07. What the 'C02 will do is execute a false read on $D000 and then do the write to $D007.

Specific to the NXP 2691 UART, the mode register (MR) is at $00 in the device's register set. MR is actually two registers at the same address, each with a different function. A UART command sets MR to the first register (MR1). A subsequent access to MR reads/writes MR1 and then causes MR to increment and point to MR2.

Now, suppose MR has been set to MR1, which would be for initial setup of the 2691. If the STA $D000,X instruction is executed, the false read by the 'C02 will "touch" $D000, which is where MR is found. Ergo MR1 becomes MR2 and a subsequent write to MR ends up in the wrong register.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Oct 31, 2019 2:40 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
barrym95838 wrote:
Yeah, but ... I thought that the 'C02 did the bogus read from (PC)
Yes, the 'C02 does have such a "fix" but it's flawed in that it only kicks in when there's a page crossing. Thus the flaw doesn't affect reads, which are the majority of cases (LDA AND ORA CMP etc). If a read has a page crossing then the fix kicks in, and if there's no page crossing then there'll be no extra cycle, and hence no threat. But if a STA -- a write -- uses abs,X or abs,Y address mode then the extra cycle will always be present, whether there's a page crossing or not. And the fix only kicks in if there is.

BigDumbDinosaur wrote:
Suppose you want to write to a device register at $D007. Let's also suppose the instruction you are using is STA $D000,X, where .X is loaded with $07. What the 'C02 will do is execute a false read on $D000 and then do the write to $D007.
Correct. In this case there's an abs,X STA with no page crossing. During the extra cycle, a read will occur from $D007, which is the address that's about to be written. Some IO chips will gag on that spurious access! But if the code is modified to instead do a STA $CFFF,X, with X equal to $08 then a page crossing occurs and the fix kicks in. During the extra cycle, the read will occur from PC.
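
The two cases just described can be put side by side. This is only an illustration of the addressing, using the same example addresses, and assumes A already holds the data to be written:

Code:
;Case 1: no page crossing, the extra cycle touches the device
                LDX     #$07            ;Offset of the target register
                STA     $D000,X         ;Extra cycle reads $D007, then the write hits $D007
;
;Case 2: base address moved down one byte to force a page crossing
                LDX     #$08            ;$CFFF + $08 = $D007
                STA     $CFFF,X         ;Extra cycle reads from PC, then the write hits $D007

The write lands on the same register in both cases; only the address of the extra read differs.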

The 'C02 abs,X bug is probably the most troublesome, judging by discussions I've noticed on this forum. But other questions will arise, for example regarding indexed modes other than abs,X and abs,Y, and the dead cycles that occur during indexed and non-indexed Read-Modify-Writes. For a complete description of these behaviors, see the document posted here.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Sat Nov 02, 2019 7:42 am, edited 2 times in total.

Posted: Sat Nov 02, 2019 4:32 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1383
So, first... thanks to all for responding to this post. I think the issue with false reads is a good topic to bring back front and center once in a while, as new members get some knowledge about some (65C02) caveats and some of us older folks get a reminder of them as well.

I've taken an approach of modifying my most recent C02 BIOS/Monitor code (was Version 2.04) and now have an interim 2.05 version. This code implements a software tracking of the BRG Test register to ensure it's properly set for the desired baud rate. As long as the BIOS routines are used to change the UART config, it should maintain correct operation. The Monitor code also required a slight modification, as the Reset and Zero RAM commands end up calling the BIOS Cold Start routine, which assumes the BRG Test register is not active, as this routine starts from a hardware reset... and the page zero MATCH flag (see below) is cleared during startup.

At a simple level, I've used two extra bits of a MATCH flag that BIOS uses for other functions, (delays and benchmark counting). The flag is in page zero and Bit 5 is used to show that extended baud rates are configured based on config data. Bit 4 is used to show the current status of the BRG Test register, so the BIOS routines can either toggle it or not depending on the current status and the requested change. I've done a fair amount of testing and this has worked out quite well. Also note that Bit 7 of the MR2 config data is normally "0", but it is used to indicate extended baud rates when set to "1". This does require that the BIOS routines strip off Bit 7 before loading the data to the Mode Register.
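
To illustrate the Bit 5 / Bit 4 scheme, here is a rough sketch of how the "wanted" and "current" bits could be compared before touching the chip. It is not lifted from the 2.05 source. The page zero address for MATCH is made up, the routine name is hypothetical, and it assumes Bit 5 has already been set or cleared from the MR2 config data.

Code:
MATCH          .EQU     $E0             ;Hypothetical page zero address of the flag byte
UART_BRGTST    .EQU     $FE82           ;BRG Test Register (a READ toggles the table)
;
SYNC_BRG        LDA     MATCH           ;Get flag byte
                LSR     A               ;Move Bit 5 (wanted) into the Bit 4 position
                EOR     MATCH           ;Bit 4 is now wanted XOR current
                AND     #%00010000      ;Isolate that bit
                BEQ     BRG_OK          ;Zero means the table already matches
                BIT     UART_BRGTST     ;Read access toggles the baud rate table
                LDA     MATCH           ;Get flag byte again
                EOR     #%00010000      ;Flip Bit 4 to track the new table state
                STA     MATCH           ;Save updated flag byte
BRG_OK          RTS                     ;Return to caller

Since the register can only be toggled, the routine does exactly one read when the two bits disagree and none when they agree, which keeps the tracked state and the chip in step as long as nothing else reads that address.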

There are two required changes to config data to change the baud rate. One is the baud rate register (XMIT/RCV) and the other is for the MR2 data (Bit 7 only). Changing the ROM default table changes the config from an initial startup. Making a soft change to these values which are copied to page $03 during startup, and calling BIOS routine $FF51 will temporarily change the baud rate until a hardware or software reset is done, or the panic routine (NMI trigger) is invoked.

I'll likely spend some time examining the code over the next few weeks as time permits and will eventually release a 2.06 version of the BIOS and Monitor, which might have other fixes and/or code changes. But for now, I'm attaching a zip file that has version 2.05. On the good side, I've been running four of my C02 Pocket SBC's for almost two years now... not a single hardware failure yet... and the BIOS and Monitor have matured quite a bit with version changes and a CMOS version of Enhanced Basic has been added. So far, I've been very satisfied with the results.

Attachment:
Version 2.05.zip [150.26 KiB]

_________________
Regards, KM
https://github.com/floobydust


Posted: Sat Nov 02, 2019 6:58 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Save three bytes and 10 cycles, at the expense of some clarity? (untested):
Code:
 ;ASC2BIN subroutine: Convert 2 ASCII HEX digits to a binary (byte) value
 ; Enter: A register = high digit, Y register = low digit
 ; Return: A register = binary value
ASC2BIN         STZ     TEMP1           ;Initialize temp area
                JSR     BINARY          ;Convert high digit to 4-bit nibble
                ASL     A               ;Shift to high nibble
                ASL     A
                ASL     A
                ASL     A
                STA     TEMP1           ;Store it in temp area
                TYA                     ;Get Low digit and fall through
 ;
BINARY          EOR     #$30            ;ASCII -> HEX nibble
                CMP     #$0A            ;Check for result < 10
                BCC     BNOK            ;Branch if 0-9
                SBC     #$67            ;Compensate for A-F
BNOK            ORA     TEMP1           ;OR in the high nibble
RESERVED        RTS                     ;Return to caller
 ;

Save three bytes and four cycles in HEX2ASC? (untested):
Code:
               [...]
                LDA     BINVALL         ;Get the Low byte
                ORA     BINVALH         ;OR in the High byte (check for zero)
                BNE     CNVERT          ;Branch back until done
  ;
 ;Conversion is complete, get the string address, add offset, then call prompt routine and return
 ; note DATABUFF is fixed location in Page 0, carry flag need not be cleared as ROL A instruction
 ; above should never set flag.
                TXA                     ;Get buffer offset
                ADC     #<DATABUFF      ;Add Low byte Address (point to active data only)
                LDY     #>DATABUFF      ;Get High byte address, drop into PROMPTR and finish
                BRA     PROMPTR         ;Branch to PROMPTR to Print numeric string
  ;

Save one byte in HEXINPUT by replacing CPX #$00 with TXA? (untested)
Save one byte in SRCHBYT by replacing CPX #$00 with TXA? (untested)
Save two bytes in XFER_S19 by eliminating CPY #$00? (untested)

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Sat Nov 02, 2019 3:16 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8406
Location: Midwestern USA
floobydust wrote:
This code implements a software tracking of the BRG Test register to ensure it's properly set for the desired baud rate.

In other words, you've set up a shadow register for the 2691's auxiliary control register (ACR) at offset $04. It may be useful to also shadow MR if you are going to allow user code to tinker with UART settings a la stty in UNIX/Linux.

In developing the BIOS for my POC units, I selected one bit rate table and stuck with it. I do not have a BIOS call that would allow user-written code to change bit rates and/or data format, as that could potentially disrupt console operation. If an operating system were to implement its own UART driver, rather than make the (slower) BIOS calls for serial I/O, then it could also allow changes to the UART configuration via an "ioctl" type of kernel call.

BTW, if you decide to continue with the single-channel UART for your next design, I suggest you take a look at the 28L91. The programming model is a superset of the 2691's and the FIFOs will be available even without specifically configuring them. Also, the 28L91's bus interface is substantially faster than that of the 2691. The 28L91 is available in PLCC-44, which actually takes up less space than the DIP-40 version of the 2691.

Attachment:
File comment: NXP 28L91 UART
28L91.pdf [268.4 KiB]

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sat Nov 02, 2019 5:08 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1383
Mike,

Once again, thanks for doing some critique on the code and coming up with some updates. I also found one in the XMODEM code to delete a CPX as well, saving yet another two bytes. I've implemented most of these changes already and will do some additional testing. Danke!!

BDD,

Well, setting up a shadow of SCC2691 registers is of no value for trying to track the BRG Test Register, as you can't actually access it. All you can do is perform a read to it (offset at $02) which will toggle it between normal and extended tables. Based on this, I simply track it from a cold start of the system, which has it set to normal baud rates. Just as long as the BIOS routines are used to manage the UART config, it shouldn't create any problems. Granted, you can simply avoid changing the config altogether, but I wanted the additional flexibility should an odd case arise when trying to use a terminal program and/or machine that has limited async configs (granted, it would be very rare).

Moving forward, I already have several of the NXP DUARTS sitting around... for my next project ;-)

_________________
Regards, KM
https://github.com/floobydust


Posted: Sun Nov 03, 2019 2:05 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8406
Location: Midwestern USA
floobydust wrote:
I've taken an approach of modifying my most recent C02 BIOS/Monitor code (was Version 2.04) and now have an interim 2.05 version.

I took some time reading the source code for your BIOS and would like to offer some constructive comments.

  1. Generally speaking, a simple ROM BIOS doesn't need to vector API calls. The theory behind this is that a BIOS's role is to provide something for the MPU to run following reset. Unless you can think of a scenario where you would need to wedge into a BIOS API function, I suggest you eliminate the indirect jumps, conserving RAM, improving execution speed, and reducing the risk of a crash due to inadvertent vector overwrite.

    That said, I would retain the indirect jump vectors for the interrupt handlers, as they will be handy for wedging in extra ISR code for test purposes. I did this in my POC BIOS so I could develop the low-level, interrupt-driven SCSI driver routines without having to burn a new ROM each time I changed/fixed something.

  2. In examining the UART receiver interrupt handler (ISR), I see some things that could be refined to improve serial I/O (SIO) performance. Currently the receiver ISR is as follows:

    Code:
    UART_RCV       LDY      ICNT            ;Get input buffer count (3)
                   BMI      BUFFUL          ;Check against limit ($80), branch if full (2/3)
                   LDA      UART_RECEIVE    ;Else, get character from 2691 (4)
    ;
                   LDY      ITAIL           ;Get the tail pointer to buffer (3)
                   STA      IBUF,Y          ;Store into buffer (5)
                   INC      ITAIL           ;Increment tail pointer (5)
                   RMB7     ITAIL           ;Strip off bit 7, 128 bytes only (5)
                   INC      ICNT            ;increment character count (5)

    Consider the following:

    1. Currently, you are testing for receiver circular queue (RQ, symbol IBUF) space before getting a datum from the UART (note: what you are receiving are datums, not characters :D ). The problem with doing so is if a receiver IRQ is not serviced with alacrity, an overrun error during sustained, high-speed transfer is likely to occur. This is an important consideration with the 2691, which only has a 4-deep RHR. Better to prevent an overrun than to try to detect it after your data stream has been corrupted.

      Also consider that if your receiver is operating at 115.2 Kbps it will interrupt 11,520 times per second during sustained reception. This is because you are not clearing the RHR on each IRQ, only reading one datum from it. The result is that as soon as your ISR terminates and the MPU returns to the foreground it will be interrupted again if the RHR is not empty, adversely affecting foreground performance, since the MPU is executing those pre- and post-amble ISR instructions 11,520 times per second.

      A better procedure would be to read the RHR, verify that RQ space is available and if so, queue the datum. If the RQ is full, discard the datum, since there is no place to put it. For more efficient code, the receive part of your ISR should loop back and check to see if the RHR still has datums. If not, the receive part of the ISR is done. Otherwise, repeat the RQ test and datum store code. I will illustrate the technique later on.

    2. There is no need to maintain a separate datum count for RQ, as a comparison between IHEAD and ITAIL will tell you what you need to know about the state of the queue. If IHEAD = ITAIL, RQ is empty. If (ITAIL+1) AND $7F = IHEAD, RQ is full (ITAIL being the "put" index and IHEAD the "get" index). Those are the only conditions about which your code needs to be concerned.

    3. The following code fragment is not as efficient as it could be, and is also non-portable:

      Code:
                     INC      ITAIL           ;Increment tail pointer (5)
                     RMB7     ITAIL           ;Strip off bit 7, 128 bytes only (5)
                     INC      ICNT            ;increment character count (5)

      The RMB/SMB instructions are non-portable because they are available only with the 65C02 (and not all 65C02s). Neither the NMOS device (which shouldn't be used in new designs) nor the 65C816 has them; the latter uses the corresponding opcodes for other purposes.

      You already have ITAIL loaded into .Y, so why not just copy it to .A, increment .A, AND .A to eliminate bit 7 and store the result?
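
    The empty test from item 2 and the copy/increment/AND index update from item 3 can also be seen from the foreground (consumer) side. The routine below is only a sketch: the name GETRQ, the page zero addresses and the buffer address are all made up, and it assumes the ISR is the only code that changes ITAIL.

    Code:
    IHEAD          .EQU     $E1             ;Hypothetical RQ "get" index (page zero)
    ITAIL          .EQU     $E2             ;Hypothetical RQ "put" index (page zero)
    IBUF           .EQU     $0200           ;Hypothetical 128-byte receive queue
    ;
    GETRQ           LDX     IHEAD           ;Fetch RQ "get" index
                    CPX     ITAIL           ;Same as "put" index?
                    BEQ     GETRQ_MT        ;Yes, RQ is empty
                    LDY     IBUF,X          ;Fetch oldest datum
                    TXA                     ;Copy "get" index to A
                    INC     A               ;Bump it (65C02 accumulator increment)
                    AND     #%01111111      ;Wrap at 128 bytes
                    STA     IHEAD           ;Store updated "get" index
                    TYA                     ;Datum to A
                    CLC                     ;Carry clear = datum returned
                    RTS                     ;Return to caller
    GETRQ_MT        SEC                     ;Carry set = nothing waiting
                    RTS                     ;Return to caller

    With ITAIL in page zero, the TYA / INC A / AND / STA form of the update is 9 clocks, against 10 for the INC / RMB7 pair, and it also assembles for the 65C816.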

    Here's how I would do the receiver ISR. This code eliminates references to ICNT and empties the RHR before moving on:

    Code:
    UART_RCV: ;read datum from UART & attempt to queue it
    ;
              lda uart_status      ;RHR status                      (4)
              lsr A                ;RHR empty?                      (2)
              bcc tl0010           ;yes, done with receiver...      (2,3,4)
    ;
    ;       ———————————————————————————————————————
    ;                    branch not taken: 2 clocks
    ;       branch taken wo/page crossing: 3 clocks
    ;        branch taken w/page crossing: 4 clocks
    ;       ———————————————————————————————————————
    ;
              ldy uart_receive     ;fetch waiting datum             (4)
              lda itail            ;fetch RQ "put" index            (3)
              inc A                ;bump it &...                    (2)
              and #%01111111       ;wrap it                         (2)
              cmp ihead            ;RQ "get" index                  (3)
              beq uart_rcv         ;RQ is full...                   (2,3,4)
    ;
    ;       —————————————————————————————————————————————————————————
    ;       If RQ is full we discard the datum & loop back.  Doing so
    ;       clears the RHR & prevents a receiver overrun.  This won't
    ;       happen if the foreground promptly processes the incoming
    ;       data stream.
    ;       —————————————————————————————————————————————————————————
    ;
             ldx itail             ;RQ "put" index                  (3)
             tya                   ;datum to .A (no STY abs,X)      (2)
             sta ibuf,x            ;queue newest datum              (5)
             txa                   ;copy current RQ "put" index     (2)
             inc A                 ;bump it,...                     (2)
             and #%01111111        ;wrap it &...                    (2)
             sta itail             ;store it                        (3)
             bra uart_rcv          ;go back for more                (3,4)
    ;
    tl0010   ...end of receiver ISR...

    The above code addresses a number of things:

    1. Each pass through the routine starts by determining if the RHR has at least one datum. If not, the routine terminates. Otherwise, the datum is gotten and an attempt is made to store it in the queue (IBUF). If IBUF is full the datum is discarded.

    2. No queue content counter is used or needed. Testing of the "get" and "put" indices (IHEAD and ITAIL, respectively) tells us all we need to know about the state of IBUF. A byte of zero page space is recovered for other purposes.

    3. IBUF management avoids the non-portable RMB instruction and uses register-based instructions that work with any 65C02, as well as the 65C816.

    4. All waiting datums are gotten from the UART in a single interrupt, avoiding the expense of processing back-to-back IRQs. Running at 115.2 Kbps continuous data flow results in 2880 IRQs per second instead of 11,520. Although the overall code has more instructions, the actual processing time is substantially reduced.
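
    The two interrupt rates quoted in point 4 follow from simple arithmetic. This is a sketch assuming 8-N-1 framing (10 bits on the wire per datum) and that up to four datums are waiting in the receiver each time the ISR runs, which is what the 2880 figure implies:

```latex
\frac{115{,}200\ \text{bits/s}}{10\ \text{bits/datum}} = 11{,}520\ \text{datums/s},
\qquad
\frac{11{,}520\ \text{datums/s}}{4\ \text{datums/IRQ}} = 2{,}880\ \text{IRQs/s}
```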

  3. A significant limitation of the 2691 is the lack of a transmitter FIFO, which means the looping technique illustrated in the receive ISR can't be used when a transmitter IRQ must be serviced. That is, there is no way to avoid the "IRQ storm" that sustained, high-speed transmission will cause (which is why the 26C92, 28L91 or 28L92 are better choices—they have transmit FIFOs). However, some code changes to the transmit ISR can be made to eliminate the OCNT queue counter and avoid the use of the non-portable RMB instruction. Additionally, management of the transmitter can be implemented with smaller code:

    Code:
    UART_XMT: ;fetch datum from transmit queue & write to UART
    ;
             lda uart_isr          ;get TxD IRQ status             (4)
             lsr A                 ;transmitter interrupting?      (2)
             bcc tl0020            ;no                             (2,3,4)
    ;
    ;       —————————————————————————————————————————————————
    ;       As the 2691 lacks a TxD FIFO, there is no need to
    ;       check the TxRDY bit in the status register.  This
    ;       is because the UART will interrupt as soon as all
    ;       bits in the THR are shifted onto the wire.
    ;       —————————————————————————————————————————————————
    ;
             ldx ohead             ;fetch TQ "get" index            (3)
    ;
    ;       ————————————————————————————————————————————
    ;       TQ is the transmit circular queue, aka OBUF.
    ;       ————————————————————————————————————————————
    ;
             cpx otail             ;TQ "put" index                  (3)
             beq tl0010            ;TQ is empty...                  (2,3,4)
    ;
             lda obuf,x            ;get datum from TQ &...          (5)
             sta uart_transmit     ;send it                         (4)
             txa                   ;copy "get" index                (2)
             inc A                 ;bump it,...                     (2)
             and #%01111111        ;wrap it &...                    (2)
             sta ohead             ;store it                        (3)
             bra tl0020            ;done for now                    (3,4)
    ;
    ;       ———————————————————————————————————————————————————————————
    ;       If TQ is empty, transmitter IRQs must be disabled so as to
    ;       avoid deadlock.  This is accomplished by disabling the
    ;       transmitter.  A flag is set to let the foreground know the
    ;       transmitter has been disabled.
    ;
    ;       Although the below code disables the transmitter without
    ;       checking the status of the THR, there is no problem in
    ;       doing so.  The UART will not actually shut down the trans-
    ;       mitter until the last bit has been shifted out to the wire.
    ;       ———————————————————————————————————————————————————————————
    ;
    tl0010   lda #%00001000        ;disable...                      (2)
             sta uart_command      ;transmitter                     (4)
             lda #%10000000        ;tell foreground...              (2)
             sta txdflag           ;about it                        (3)
    ;
    tl0020   ...end of transmitter ISR...

    The above code does the following:

    1. The ISR begins by determining if the transmitter is interrupting. If it is not then it must have been disabled and no further processing is required. As the 2691 lacks a transmit FIFO there is no reason to check the status of THR—the UART will interrupt as soon as it shifts the datum in THR out to the wire.

    2. The queue management technique illustrated in the receive ISR is used here as well, eliminating the need for OCNT. Its location on zero page can be given to the txdflag transmitter status.

    3. The code that tested the state of THR has been eliminated, as it is unnecessary. The transmitter can be safely disabled while it is still shifting bits out to the wire. The actual shutdown will occur as soon as the final bit has been sent. txdflag is set to %10000000 so the transmit foreground function will know that it will have to restart the transmitter after queuing a datum. The actual flag value can be anything.

  4. The changes in §2 (above) require corresponding changes to the receive foreground to accommodate the revised queue management method. Also, the current CHRIN/CHRIN_NW code is needlessly redundant. There is no reason to put a spin loop into the BIOS just to wait for some data to arrive—the higher level functions that are waiting for data can spin if needed. Also, why consume resources by requiring that one routine JSR to another? Furthermore, the logic involving the use of carry to indicate if there is anything in RQ is contrary to the nearly universal programming practice that uses carry to indicate "true" (carry cleared) or "false" (carry set) status.

    The following code replaces the CHRIN/CHRIN_NW functions with a simplified routine that includes the customary use of carry as a status indicator:

    Code:
    CHRIN: ;fetch datum from receive queue w/immediate return
    ;
    ;   Calling syntax: jsr chrin
    ;                   bcs no_datum
    ;
    ;   Exit registers: .A: datum or entry value
    ;                   .X: entry value
    ;                   .Y: entry value
    ;                   SR: NV-BDIZC
    ;                       ||||||||
    ;                       |||||||+———> 0: datum returned
    ;                       |||||||      1: queue empty
    ;                       ++++++++———> undefined
    ;
             phx                   ;preserve &...                   (3)
             phy                   ;preserve                        (3)
             ldx ihead             ;fetch RQ "get" index            (3)
             cpx itail             ;RQ "put" index                  (3)
             beq tl0010            ;RQ is empty...                  (2,3,4)
    ;
    ;   ———————————————————————————————————————————————————————————
    ;   The ihead —> itail comparison automatically conditions .C.
    ;   ———————————————————————————————————————————————————————————
    ;
             ldy ibuf,x            ;get datum from RQ               (4)
             txa                   ;copy "get" index                (2)
             inc A                 ;bump it,...                     (2)
             and #%01111111        ;wrap it &...                    (2)
             sta ihead             ;store it                        (3)
             tya                   ;copy datum                      (2)
             clc                   ;datum gotten                    (2)
    ;
    tl0010   ply                   ;restore &...                    (4)
             plx                   ;restore                         (4)
             rts                   ;return to caller                (6)
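
    As an illustration of leaving the spin to the caller, a higher-level routine that needs to wait for a datum could wrap CHRIN as follows. This is only a sketch: getch is a hypothetical label, not part of the BIOS, and it assumes IRQs are enabled so the receive ISR can fill RQ while the loop runs:

```asm
getch    jsr chrin             ;try to fetch a datum
         bcs getch             ;carry set: RQ empty, try again
         rts                   ;carry clear: datum is in .A
```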

  5. The changes in §3 (above) require corresponding changes to the transmit foreground to accommodate the revised queue and UART management methods. The following code addresses all that, plus offers both blocking and non-blocking modes:

    Code:
    CHROUT: ;write datum to transmit queue
    ;
    ;   Calling syntax: clc             ;block on full queue
    ;                   ...or...
    ;                   sec             ;return on full queue
    ;                   jsr chrout
    ;                   bcs queue_full
    ;
    ;   Exit registers: .A: entry value
    ;                   .X: entry value
    ;                   .Y: entry value
    ;                   SR: NV-BDIZC
    ;                       ||||||||
    ;                       |||||||+———> 0: datum accepted
    ;                       |||||||      1: queue full
    ;                       ++++++++———> undefined
    ;
             phy                   ;preserve,...                   (3)
             phx                   ;preserve &...                  (3)
             pha                   ;preserve                       (3)
             tax                   ;keep datum handy               (2)
             lda #0                ;pick up...                     (2)
             ror A                 ;carry &...                     (2)
             tay                   ;save as a flag                 (2)
    ;
    tl0010   lda otail             ;fetch TQ "put" index           (3)
             inc A                 ;bump it &...                   (2)
             and #%01111111        ;wrap it                        (2)
             cmp ohead             ;TQ "get" index                 (3)
             beq tl0040            ;TQ is full                     (2,3,4)
    ;
             txa                   ;recover datum                  (2)
             ldx otail             ;TQ "put" index                 (3)
             sta obuf,x            ;queue datum for transmission   (5)
             txa                   ;copy current TQ "put" index    (2)
             inc A                 ;bump it,...                    (2)
             and #%01111111        ;wrap it &...                   (2)
             sta otail             ;store it                       (3)
             clc                   ;datum queued                   (2)
             lda #%10000000        ;is the transmitter...          (2)
             trb txdflag           ;enabled?                       (5)
             beq tl0030            ;yes, we're done                (2,3,4)
    ;
    tl0020   lda #%00000100        ;wake up...                     (2)
             sta uart_command      ;transmitter                    (4)
    ;
    tl0030   pla                   ;restore,...                    (4)
             plx                   ;restore &...                   (4)
             ply                   ;restore                        (4)
             rts                   ;done                           (6)
    ;
    ;
    ;   handle full queue...
    ;
    tl0040   sec                                                   (2)
             tya                   ;blocking?                      (2)
             bmi tl0020            ;no, exit                       (2,3,4)
    ;
             wai                   ;yes, wait for next IRQ &...    (3+)
             bra tl0010            ;try again                      (3,4)

    Salient points of this function are:

    1. In a uni-tasking environment, blocking on any I/O is perfectly acceptable—the MPU really has nothing else it can do until the I/O completes. Should you eventually build an operating system kernel that can support pre-emption you can switch to non-blocking SIO. In the above code, blocking is performed by WAIting for a hardware interrupt, be it a UART IRQ, a jiffy IRQ, etc. If blocking has been disabled the routine will give the transmitter a kick before exiting to make sure it's running.

    2. It's not advisable to put spin loops into BIOS APIs that involve I/O. If the condition on which the MPU is spinning never materializes your system will go into deadlock, forcing a restart. For example, most Linux kernel I/O APIs do not block unless requested. Instead they return an error code to the calling function. It is up to the higher level function that called the API to spin/sleep/retry, etc.

    3. The TRB instruction does double duty by first determining if txdflag is set, thereby conditioning .Z. After that, TRB clears the flag. With this arrangement, fewer instructions are required to manage the transmitter and execution speed is improved. Also, this basic technique is easily adapted for use with DUARTs and QUARTs.

    4. Carry is cleared if the write operation is successful, which it will always be in blocking mode. You should follow that logic everywhere in your BIOS: carry clear = TRUE and carry set = FALSE. In this context, TRUE can also be OKAY and FALSE can be ERROR.
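
    For completeness, here is how a caller might use CHROUT in blocking mode to print a string. This is a minimal sketch, assuming a hypothetical null-terminated string, defined elsewhere at label msg, that is shorter than 256 bytes; prtstr, ps0010 and ps0020 are likewise made-up labels. It relies on CHROUT preserving .X, as documented in its exit registers:

```asm
prtstr   ldx #0                ;string index
;
ps0010   lda msg,x             ;fetch next character
         beq ps0020            ;null terminator: done
         clc                   ;select blocking mode
         jsr chrout            ;queue it for transmission
         inx                   ;next character
         bne ps0010            ;loop (string must be < 256 bytes)
;
ps0020   rts                   ;return to caller
```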

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Sun Dec 01, 2019 6:51 am, edited 2 times in total.

PostPosted: Mon Nov 04, 2019 6:03 am 

Joined: Tue Mar 05, 2013 4:31 am
Posts: 1383
Hi BDD,

In a word... wow! You spent a lot of time going through my latest BIOS release and put together a very detailed and informative response. Very thoughtful and useful feedback and I certainly hope that other forum users do a hard read on this... much insight to be had. So, a huge thanks for spending the time to do this, very much appreciated.

So, in my defense of why I wrote the BIOS the way I did, here’s some feedback and details, albeit not in any particular order.

1- Using some instructions which are not in all 65(C)02 processors, namely the SMB/RMB opcodes:

- The Rockwell CMOS parts do have these instructions, as does the WDC part. In ALL my current designs, I only use the WDC part. I do have a small stock of 4MHz Rockwell parts which I use for updating older hardware, as it's a simple plug-in replacement. I'm aware that the NCR 65C02 does not support some of the instructions and perhaps the GTE/CMD parts may not as well. Still, I've found them to be very useful and most of my current coding uses them quite a bit. I do understand your comment, but I doubt many folks will be leveraging my code for their own projects. If and when I dive into a W65C816 design, I'll write a new BIOS from scratch to support whatever hardware is used at that time.

2- Indirect Jumps via the JMP table and vectored routines:

- Yes, I agree that it's not necessary and in fact, does add 3 clock cycles and 3 bytes of code for each BIOS function it front ends. However, it gives me the luxury of moving the BIOS code around for a different hardware requirement and all accesses are maintained in page $FF. The soft vectors in Page $03 are useful for adding other ISR routines for new hardware either pre- or post- processing. As I intend to add new hardware over time, it gives me some flexibility and it will be easy to insert additional routines. Also note that the (current) BIOS supports all available functions of the 2691. Size, including all I/O space (160 bytes), vector and HW config space (96 bytes allocated), JMP table (only 16 of 32 are used) and code is still less than 1KB of the currently allocated 2KB space for BIOS and I/O.

Many of the comments below are in the spirit of console use for the Monitor code and Basic interpreter. The Monitor also uses the UART for Xmodem-CRC upload/download.

3- CHRIN/CHRIN_NW and CHROUT routines:

- The CHRIN_NW routine was specifically added for special use, like Enhanced Basic, which requires the Carry flag to indicate data is available. Also, by using the ICNT variable, the time required to return when no data exists is very quick. This directly affects the performance of Enhanced Basic and would likely do the same for any other code that requires a CHRIN_NW routine. The current BIOS handles this in 14 clock cycles, or 17 clock cycles with the JMP table. As the Monitor program uses the BIOS via the JMP table, I would need to handle the loop (waiting for command/data input) in the Monitor code versus the BIOS. With clock cycle counts, my current BIOS is pretty quick.

- The CHROUT routine is also to provide output for the console and, of course, handle Xmodem-CRC transmission. I do, however, like your routine for handling a full buffer queue and the WAI instruction (which only exists on the WDC parts). The BIOS for the 2691 UART (and the terminal software) is configured for RTS/CTS handshaking, so if the queue is full and waiting for data transmission, it will simply loop waiting for the UART to send data and open up an available queue location. Once again, being used as a console port, there's not much else that could happen with the Monitor (or Enhanced Basic) if the transmit queue is full and waiting to be able to handle another byte of data. If there was a second serial port being used for something else, your routines are a more sensible approach.

- One last comment on my CHRIN/CHROUT routines is timing. I've tried to optimize both the ROM space and clock cycles each takes to execute. They are fairly small in size and pretty quick. Even with the JMP table indirect, they require 32/46 clock cycles. I also took an approach of activating the transmitter every time CHROUT puts data in the queue. While I could probably save a few clock cycles sensing it first, I opted not to and just take the small clock cycle hit and save some bytes in ROM as well.

4- ISR routines:

- The ISR routines are always a good area to optimize operation. My ROM routine handles saving and restoring of the registers, so that's not required in any of the 2691 support routines. There is a 25 clock cycle time to get to the interrupt vectored code, plus another 18 clock cycle time to finish up to RTI after the vectored return to ROM. Counting all clock cycles, receiving a data byte from the receive queue will take 106 clock cycles. With a 6MHz CPU clock, cycle time is 166.67 nanoseconds. Doing some simple math, one pass thru the BIOS’ UART receive will take about 17.67 microseconds. At a 115.2K bps baud rate (11,520 interrupts per second), the time required to send or receive one byte of data is about 86.81 microseconds. Unless some bit of code has masked off interrupts, it’s going to be very difficult to get more than one byte in the RHR, as the ISR routine services the UART in a fraction of the time required. The same goes for the transmit as well, which is 105 clock cycles to send a byte from the queue. There’s also the timer/counter which is set up as a jiffy clock at 100 interrupts per second.

- Needless to say, when running a slower CPU clock, down around 2MHz, one might run into issues where your UART_RCV routine would come in handy, albeit it does require more clock cycles to execute per queue entry. In my testing to date, the Monitor itself as well as Enhanced Basic has not had any issues running at any configured baud rate, from as low as 1200 to 115,200. I would note that Xmodem-CRC receive with S-Record support likely slows things down a bit, as everything is calculated realtime (i.e., no tables for CRC checking). However, having the console interface configured for RTS/CTS handshaking, there’s no data loss with Xmodem-CRC upload or download. A simple test to download a 27KB S-Record file shows a bit of it. At 38.4K baud rate, it took about 12 seconds to download. At 115.2K baud rate, the same file took 8 seconds.

- Some tests using Enhanced Basic were also interesting. Using the text based Mandelbrot posted by Bill O some time ago, there was no advantage in program execution running at 38.4K versus 115.2K, as Basic couldn’t generate output quick enough to run up the transmit queue.
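
To put numbers on the ISR timing argument above, the per-byte cost can be compared with the number of clock cycles available in one character time. This sketch takes the 106-cycle figure as given and assumes 8-N-1 framing, so 11,520 characters per second at 115.2K:

```latex
\begin{aligned}
\text{6 MHz:}\quad \frac{6{,}000{,}000}{11{,}520} &\approx 520.8\ \text{cycles per character}, &\quad \frac{106}{520.8} &\approx 20\% \\
\text{2 MHz:}\quad \frac{2{,}000{,}000}{11{,}520} &\approx 173.6\ \text{cycles per character}, &\quad \frac{106}{173.6} &\approx 61\%
\end{aligned}
```

So at 6MHz the receive path would use roughly a fifth of each character time under sustained reception, while at 2MHz it would use well over half, before counting transmit and jiffy clock interrupts.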

Having said (written) all of this, I might try and do a test BIOS version that will leverage your routines (time permitting) and see what differences I might realize. I do have a SC28L92 which I’ll be working into my next project, so I’m thinking your routines will be more applicable to support two UARTs. Again, many thanks for the detailed review.

_________________
Regards, KM
https://github.com/floobydust


PostPosted: Thu Nov 07, 2019 6:59 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8406
Location: Midwestern USA
floobydust wrote:
So, in my defense of why I wrote the BIOS the way I did, here’s some feedback and details, albeit not in any particular order...

There's nothing for you to defend. If the software functions the way you want it to function then you've accomplished your goal. My intention was to highlight areas where some tweaking might help in various ways...food for thought, as it were.


