Well, since I've rewritten the basic startup procedure and replaced all of the interrupt code plus character in/out routines (in my latest SyMon update), I started looking at execution times for sending and receiving a character and the overall percentage of CPU time it takes to manage an Async controller. For reference, I'll put the code execution time here and then show the code used to service the 6551 chip.
There are a handful of routines which initialize the soft vectors and the I/O devices, and all interrupt service routines use indirect jumps to page $03 to get in and out of the ROM routines. This costs some time (10 clock cycles) but allows the flexibility to intercept all of the interrupt processes (NMI/BRK/IRQ). The CHIN and CHOUT routines manage the 128-byte circular buffers for receive and transmit. CHOUT also checks a flag to see if the 6551 is enabled for XMIT IRQ and, if not, turns it on. The IRQ service routine moves data between the chip and the buffers, working the opposite ends of the buffers from the CHIN/CHOUT routines. It also handles the BRK instruction, and treats a received null character as a software panic break. The transmit routine turns off the XMIT IRQ once the buffer empties out and clears the flag. All pointers are in page $00 and the dual 128-byte buffers are in page $02.
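For anyone who wants to play with the buffer logic on a PC first, here is a rough host-side model of one of the 128-byte circular buffers. It is only a sketch in Python, not the 65C02 code: the class and method names are made up for illustration, and only the head/tail/count idea is taken from the listing further down.

```python
BUF_SIZE = 128  # same size as the receive and transmit buffers


class RingBuffer:
    """Illustrative model of one circular buffer with head, tail and count."""

    def __init__(self):
        self.data = bytearray(BUF_SIZE)
        self.head = 0   # next slot to read  (plays the role of IHEAD/OHEAD)
        self.tail = 0   # next slot to write (plays the role of ITAIL/OTAIL)
        self.count = 0  # bytes currently queued (plays the role of ICNT/OCNT)

    def put(self, byte):
        """Store one byte; return False if the buffer is already full."""
        if self.count >= BUF_SIZE:
            return False
        self.data[self.tail] = byte
        self.tail = (self.tail + 1) % BUF_SIZE  # wrap back to 0 at $80
        self.count += 1
        return True

    def get(self):
        """Remove and return the oldest byte, or None if the buffer is empty."""
        if self.count == 0:
            return None
        byte = self.data[self.head]
        self.head = (self.head + 1) % BUF_SIZE  # wrap back to 0 at $80
        self.count -= 1
        return byte


if __name__ == "__main__":
    rb = RingBuffer()
    for ch in b"HELLO":
        rb.put(ch)
    print(bytes(rb.get() for _ in range(5)))
```

On receive, the IRQ handler plays the part of put() and CHIN plays the part of get(); on transmit the roles are swapped, with CHOUT filling the buffer and the IRQ handler draining it.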
I've done the calculations for service times in clock cycles:
IRQ Service Routine:
7 clock cycles IRQ response latency (from current executing instruction to jump to the IRQ vector)
48 clock cycles for ROM IRQ pre-processing and post-processing including both indirect JMPs to page $03
Shortest time thru IRQ vector (not 6551 as cause) = 7 clock cycles (62 total for non 6551 IRQ)
RCV routine branch = 15 clock cycles
XMT routine branch = 23 clock cycles
CTS Error (fall thru IRQ decision tree) = 31 clock cycles (86 total for CTS error)
RCVCHR routine: LO/HI = buffer wraparound = No/Yes
Clock cycles = 41/42 for no XMT bit on (111/112 total for no xmit)
Clock cycles = 40/41 drop to XMTCHR routine (110/111 total for drop to xmit)
XMTCHR routine: LO/HI = buffer wraparound = No/Yes
Clock cycles = 32/33 if xmit stays on (110/111 total if XMIT left on)
Clock cycles = 44/45 if xmit to be turned off (122/123 total if XMIT gets turned off)
Clock cycles = 19 if no data left, turn off xmit and exit (97 total)
Character routines:
CHIN - 47/48 clock cycles to get character from buffer
- input wait loop is 6 clock cycles per pass, then add 47/48 per above
CHOUT - 54/55 clock cycles if XMIT already on, 67/68 if XMIT needs to be turned on
- output wait loop is 6 clock cycles per pass, then add 54/55 or 67/68 as per above
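The IRQ totals in the list can be cross-checked by adding the fixed overhead (7 cycles latency plus 48 cycles of ROM pre/post-processing) to the branch cost and the routine cost. A small Python sketch, using only the figures quoted above (the dictionary keys and function name are just labels for this example); it shows the no-wraparound numbers, so add 1 where a buffer wraps:

```python
IRQ_LATENCY = 7     # IRQ response latency
ROM_OVERHEAD = 48   # ROM pre/post-processing, both indirect JMPs included
FIXED = IRQ_LATENCY + ROM_OVERHEAD

TO_RCV = 15         # cycles through the decision tree to reach RCVCHR
TO_XMT = 23         # cycles through the decision tree to reach XMTCHR

# Cycles spent in the soft-vectored handler for each path (no wraparound).
paths = {
    "non-6551 IRQ": 7,
    "CTS error": 31,
    "RCV, no xmit": TO_RCV + 41,
    "RCV, drop to xmit": TO_RCV + 40,
    "XMT, xmit stays on": TO_XMT + 32,
    "XMT, xmit turned off": TO_XMT + 44,
    "XMT, no data left": TO_XMT + 19,
}


def total_cycles(path):
    """Total cycles from the interrupted instruction to RTI for one path."""
    return FIXED + paths[path]


if __name__ == "__main__":
    for name in paths:
        print(f"{name:22s} {total_cycles(name):3d}")
```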
Pretty sure I got the clock cycles right. If you run the 6551 at a standard 19.2K bps, that's about 1920 interrupts per second (10 bits per character) when sending data continuously, or an IRQ from the 6551 roughly every 521 microseconds. Under ideal conditions (once XMIT is turned on), the IRQ service takes 110 clock cycles per character sent, plus 1 cycle each time the buffer wraps around. That's 211215 clock cycles to send 1920 characters in 1 second. Sustained at this rate, that takes just over 21% of the CPU bandwidth of a 1MHz 65C02. The CHOUT routine adds another 54 clock cycles per character to put the characters in the buffer, plus 1 cycle for buffer wraparound. That's another 103695 clock cycles for 1920 characters, or a further 10% of the CPU bandwidth. In total, that's 314910 clock cycles per second for sustained transmit (under ideal conditions), or about 31.5% of the CPU bandwidth. Add in sustained receive and things get worse, although the code can do a RCV and an XMIT in one IRQ service under ideal conditions.
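The arithmetic above is easy to reproduce. Here is a quick Python sketch of it; the 10 bits per character (1 start, 8 data, 1 stop) is an assumption that matches the 1920 characters per second figure, and the variable names are only for this example.

```python
BAUD = 19200
BITS_PER_CHAR = 10       # assumption: 1 start + 8 data + 1 stop bit
BUF_SIZE = 128           # buffer wraps once every 128 characters
CPU_HZ = 1_000_000       # 1MHz 65C02

chars_per_sec = BAUD // BITS_PER_CHAR        # characters (and IRQs) per second
wraps_per_sec = chars_per_sec // BUF_SIZE    # buffer wraparounds per second
irq_interval_us = 1_000_000 / chars_per_sec  # time between 6551 IRQs


def cycles_per_second(cycles_per_char):
    """Cycles used per second, adding 1 extra cycle per buffer wraparound."""
    return chars_per_sec * cycles_per_char + wraps_per_sec


irq_cycles = cycles_per_second(110)    # IRQ service, XMIT left on
chout_cycles = cycles_per_second(54)   # CHOUT, XMIT already on
total_cycles = irq_cycles + chout_cycles

if __name__ == "__main__":
    print(f"IRQ every {irq_interval_us:.0f} us")
    print(f"IRQ service: {irq_cycles} cycles ({100 * irq_cycles / CPU_HZ:.1f}%)")
    print(f"CHOUT:       {chout_cycles} cycles ({100 * chout_cycles / CPU_HZ:.1f}%)")
    print(f"Total:       {total_cycles} cycles ({100 * total_cycles / CPU_HZ:.1f}%)")
```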
Two things come out of this. First, going to higher speeds with a 6551-style chip (i.e., no on-board FIFOs) becomes a performance inhibitor. Second, it makes sense to run at much faster clock rates when possible. The same code doing the same function on an 8MHz 65C02 would only require about 4% of the CPU bandwidth. For reference, here are the pertinent routines:
Code:
IRQ_VECTOR ;This is the ROM start for the BRK/IRQ handler
PHA ;Save A Reg (3)
PHX ;Save X Reg (3)
PHY ;Save Y Reg (3)
TSX ;Get Stack pointer (2)
LDA $0100+4,X ;Get Status Register (4)
AND #$10 ;Mask for BRK bit set (2)
BEQ DO_IRQ ;If not set, handle IRQ (2/3)
JMP (BRKVEC) ;Jump to Soft vectored BRK Handler (5) (24 clock cycles to vector routine)
DO_IRQ JMP (IRQVEC) ;Jump to Soft vectored IRQ Handler (5) (25 clock cycles to vector routine)
;
IRQ_EXIT ;This is the standard return for the IRQ/BRK handler routines
PLY ;Restore Y Reg (4)
PLX ;Restore X Reg (4)
PLA ;Restore A Reg (4)
RTI ;Return from IRQ/BRK routine (6) (18 clock cycles from vector jump to IRQ end)
;
Code:
BRKINSTR PLY ;Restore Y reg
PLX ;Restore X Reg
PLA ;Restore A Reg
STA ACCUM ;Save A Reg
STX XREG ;Save X Reg
STY YREG ;Save Y Reg
PLA ;Get Processor Status
STA PREG ;Save in PROCESSOR STATUS preset/result
TSX
STX SREG ;Save STACK POINTER
PLA ;Pull RETURN address from STACK then save it in INDEX
STA INDEX ;Low byte
PLA
STA INDEXH ;High byte
JSR CR2 ;Send 2 CR,LF to terminal
JSR PRSTAT ;Display contents of all preset/result memory locations
JSR CROUT ;Send CR,LF to terminal
JSR DISLINE ;Disassemble then display instruction at address pointed to by INDEX
LDA #$00 ;Clear all PROCESSOR STATUS REGISTER bits
PHA
PLP
BREAKEY2 LDA #$7F ;Set STACK POINTER preset/result to $7F
STA SREG
STZ ITAIL ;Zero out input buffer and reset pointers
STZ IHEAD
STZ ICNT
BR_NMON BRA NMON ;Done interrupt service process, re-enter monitor
;
BREAKEY PLY ;Pull Y Reg (4)
PLX ;Pull X Reg (4)
PLA ;Pull A Reg (4)
CLI ;Enable IRQ (2)
BRA BREAKEY2 ;Finish Break Key processing (2/3)
;
;new full duplex IRQ handler
;
INTERUPT BIT SIOSTAT ;xfer irq bit to n flag (4)
BPL REGEXT ;if set, 6551 caused irq (do not branch) (2/3) (7 clock cycles to REGEXT if not)
;
ASYNC LDA SIOSTAT ;get 6551 status reg (4)
AND #%00001000 ;check receive bit (2)
BNE RCVCHR ;get received character (2/3) (15 clock cycles to jump to RCV)
;
LDA SIOSTAT ;get 6551 status reg (4)
AND #%00010000 ;check xmit bit (2)
BNE XMTCHR ;send xmit character (2/3) (23 clock cycles to jump to XMIT)
;
;no bits on means cts went high
LDA #%00010000 ;cts high mask (2)
;
IRQEXT STA STTVAL ;update status value (3) (31 clock cycles to here of CTS fallout)
;
REGEXT JMP (IRQRTVEC) ;handle old irq (5)
;
BUFFUL LDA #%00001100 ;buffer overflow (2)
BRA IRQEXT ;branch to exit (2/3)
;
RCVCHR LDA SIODAT ;get character from 6551 (4)
BEQ BREAKEY ;If Break character, branch to Break Key process (2/3)
;
RCV0 LDY ICNT ;get buffer counter (3)
BMI BUFFUL ;check against limit, branch if full (2/3)
;
LDY ITAIL ;room in buffer (3)
STA IBUF,Y ;store into buffer (5)
INY ;increment tail pointer (2)
BPL RCV1 ;check for wraparound ($80), branch if not (2/3)
LDY #$00 ;else, reset pointer (2)
RCV1 STY ITAIL ;update buffer tail pointer (3)
INC ICNT ;increment character count (5)
;
LDA SIOSTAT ;get 6551 status reg (4)
AND #%00010000 ;check for xmit (2)
BEQ REGEXT ;exit (2/3) (41 if exit, else 40 and drop to XMT)
;
XMTCHR LDA OCNT ;any characters to xmit? (3)
BEQ NODATA ;no, turn off xmit (2/3)
;
OUTDAT LDY OHEAD ;get pointer to buffer (3)
LDA OBUF,Y ;get the next character (4)
STA SIODAT ;send the data (4)
INY ;increment index (2)
BPL OUTD1 ;check for wraparound ($80), branch if not (2/3)
LDY #$00 ;else, reset pointer (2)
;
OUTD1 STY OHEAD ;save new head index (3)
DEC OCNT ;decrement counter (5)
BNE REGEXT ;If not zero, exit and continue normal stuff (2/3) (32 if branch, 31 if continue)
;
NODATA LDA #$09 ;get mask for xmit off / rcv on (2)
STA SIOCOM ;turn off xmit irq bits (5)
STZ OIE ; zero pointer (3)
BRA REGEXT ;exit (3) (13 clock cycles added for turning off xmt)
;
Code:
;CHOUT subroutine: takes the character in the ACCUMULATOR and places it in the xmit buffer
;this routine also preserves the character sent in the A reg on exit (standard one did also)
; new routine to work with the new IRQ service routine for the 6551
; now transmit is IRQ driven and buffered
; the output buffer is fixed at 128 bytes, so buffer management is added
;
CHOUT PHY ;save Y reg
OUTCH LDY OCNT ;get character output count in buffer
BMI OUTCH ;check against limit, loop back if full
;
PHP ;Save IRQ state
SEI ;Disable irq
LDY OTAIL ;Get index to next spot
STA OBUF,Y ;and place in buffer
INY ;Increment index
BPL OUTC1 ;Check for wrap-around ($80), branch if not
LDY #$00 ;Yes, zero pointer
;
OUTC1 STY OTAIL ;Update pointer
INC OCNT ;Increment character count
BIT OIE ;Is xmit on?
BMI OUTC2 ;Yes, operation done
;
LDY #$05 ;Get mask for xmit on
STY SIOCOM ;Turn on xmit irq
DEC OIE ;Update flag
;
OUTC2 PLP ;Restore IRQ flag
PLY ;Restore Y reg
RTS ;Return
;
;CHIN subroutine: Wait for a keystroke from input buffer, return with keystroke in A Reg
; new routine to work with the new IRQ service routine for the 6551
; the input buffer is fixed at 128 bytes, so buffer management is replaced
;
CHIN LDA ICNT ;Get character count
BEQ CHIN ;If zero (no character, loop back)
;
PHY ;Save Y reg
PHP ;Save CPU flag set
SEI ;Disable IRQ to work with buffer pointers
LDY IHEAD ;Get the buffer head pointer
LDA IBUF,Y ;Get the character from the buffer
INY ;Increment the buffer index
BPL CHIN1 ;Check for wraparound ($80), branch if not
LDY #$00 ;Reset the buffer pointer
CHIN1 STY IHEAD ;Update buffer pointer
DEC ICNT ;Decrement the buffer count
PLP ;Restore previous CPU flags (IRQ)
PLY ;Restore Y Reg
RTS ;Return to caller with character in A reg
;
Overall the code has been very solid. Outside of removing the indirect jumps for soft vectors, I don't see much of a way to streamline the code any further, sans increasing the buffers to 256 bytes each, which would streamline the buffer management a bit. My next board will hopefully be running at 8MHz and using a console chip running at 38.4k, but not the 6551 chip. Comments welcome.
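To get a feel for how the load scales with clock rate and baud rate, here is a rough Python sketch. It simply reuses the sustained-transmit cost from this post (110 cycles of IRQ service plus 54 cycles of CHOUT per character, and 2 extra cycles per 128 characters for the two buffer wraparounds). Those counts are for the 6551 code here, so applying them to a different console chip is purely an assumption for illustration, and the function name is made up.

```python
def cpu_load_percent(cpu_hz, baud, cycles_per_char=164,
                     bits_per_char=10, buf_size=128):
    """Estimated percent of CPU time spent on sustained transmit.

    cycles_per_char defaults to 110 (IRQ service) + 54 (CHOUT), the
    figures measured for the 6551 routines; other hardware will differ.
    """
    chars_per_sec = baud / bits_per_char
    wrap_cycles = 2 * chars_per_sec / buf_size  # one wrap each in IRQ and CHOUT
    cycles = chars_per_sec * cycles_per_char + wrap_cycles
    return 100 * cycles / cpu_hz


if __name__ == "__main__":
    for mhz, baud in [(1, 19200), (8, 19200), (8, 38400)]:
        load = cpu_load_percent(mhz * 1_000_000, baud)
        print(f"{mhz}MHz at {baud} bps: {load:.1f}% of CPU")
```

With the same per-character cost, doubling the baud rate doubles the load and raising the clock divides it, so 38.4k at 8MHz would come out at roughly twice the 8MHz figure for 19.2k.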