. . . the synchronous serial port!
I couldn't determine whether to put this in Hardware or in Programming, because it kind of applies to both. But since I'm looking at this from a hardware engineering perspective, I'll place it here.
From what I can see, the only truly sane way to offer general-purpose, mid-speed expandability to a 6502-based system is to use the VIA's serial port.
I will be comparing the asynchronous, hardware-handshake method on port A and the use of the synchronous serial port.
At no point will I take software protocol overhead into consideration, since it will affect both approaches comparably. All numbers assume a 4MHz VIA and a 4MHz 65816 microprocessor running with 8-bit wide index and accumulator registers.
By carefully examining the awfully confusing timing diagrams for the VIA's hardware handshake signals, I've determined that the best possible transfer speed with the hardware handshake is about 500kBps (8 cycles per byte at 4MHz), and that only by discarding the asynchronous nature of the protocol and assuming the consumer will be able to keep up. The transmitter code looks like this:
Code:
        LDA buffer,X    ; 4 cycles
        STA PORT_A      ; 4 cycles
        ; ...repeated as necessary...
As you can see, this is very dedicated code, and it can't accommodate any variation in timing whatsoever. Since that generally isn't a good thing, we need to introduce some kind of feedback for rate control:
Code:
TX:     LDA buffer,X    ; 4 cycles
        STA PORT_A      ; 4 cycles
1$:     LDA PORT_B      ; 4 cycles
        BPL 1$          ; 2 or 3 cycles, depending on bit 7 of port B
        INX             ; 2 cycles
        BNE TX          ; 3 cycles mostly
        RTS             ;19 TOTAL

RX:     LDA PORT_B      ; 4 cycles
        BPL RX          ; 2 or 3 cycles
        LDA PORT_A      ; 4 cycles
        STA buffer,X    ; 4 cycles
        INX             ; 2 cycles
        BNE RX          ; 3 cycles usually
        RTS             ;19 TOTAL
This, under absolute best-case conditions, now requires 19 cycles per transmitted byte, dropping the real-world best-case performance to only 210.5kBps. Still not too shabby, all things considered. If we somehow place the I/O registers in direct page, each of the two port accesses saves a cycle, so we need only 17 cycles per byte, giving 235kBps throughput. Note that even this rate assumes that the producer and consumer are 100% synchronized, and that the CA1 ("data available") signal is also tied to PB7.
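The figures quoted so far all come from dividing the clock rate by the cycles spent per byte. As a quick sanity check, here is a minimal Python sketch of that arithmetic. The function name `throughput_kbps` is made up for illustration, and the sketch assumes the 4MHz clock stated earlier, no wait states, and no page-crossing penalties.

```python
CLOCK_HZ = 4_000_000  # 4MHz CPU and VIA clock, as assumed throughout the post


def throughput_kbps(cycles_per_iteration, bytes_per_iteration=1, clock_hz=CLOCK_HZ):
    """Best-case throughput in kilobytes per second (1 kB = 1000 bytes)."""
    iterations_per_second = clock_hz / cycles_per_iteration
    return iterations_per_second * bytes_per_iteration / 1000.0


if __name__ == "__main__":
    # 8 cycles:  unrolled LDA/STA pair with no feedback
    # 19 cycles: polled loop with the ports at absolute addresses
    # 17 cycles: the same loop with the ports in direct page
    for cycles in (8, 19, 17):
        print(f"{cycles:2d} cycles/byte -> {throughput_kbps(cycles):.1f} kBps")
```

Rounded to one decimal place, the three cases come out to 500.0, 210.5 and 235.3 kBps, which matches the numbers above.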
Using the VIA's serial port produces some awfully compelling numbers in comparison, especially considering how many fewer pins are required! First, the code to receive or send data:
Code:
; All accesses to I/O space are assumed to be
; in direct page.
VIABASE = $E0
SR      = VIABASE+10
IFR     = VIABASE+13

RX:     LDA #$04        ; 2 cycles
1$:     BIT IFR         ; 3 cycles
        BEQ 1$          ; 2 cycles or 3 cycles
        LDA SR          ; 3 cycles
        STA buffer,X    ; 4 cycles
        INX             ; 2 cycles
        BNE RX          ; 3 cycles usually
        RTS             ;19 TOTAL

TX:     LDA #$04        ; 2 cycles
2$:     BIT IFR         ; 3 cycles
        BEQ 2$          ; 2 or 3 cycles
        LDA buffer,X    ; 4 cycles
        STA SR          ; 3 cycles
        INX             ; 2 cycles
        BNE TX          ; 3 cycles usually
        RTS             ;19 TOTAL
Whether we are sending or receiving, and assuming we have I/O mapped into direct page (the 65816 makes this pretty easy: just change the D register, keeping its low byte at $00 to get the timing advantage), each loop iteration takes 19 cycles best-case, thus providing a whopping 210.5kBps throughput!
Thus, we can see that using the serial port is highly competitive with using the parallel port when it comes to raw data throughput, while using 6 fewer pins at the peripheral adapter interface. Using the CLK-latching circuit to overcome the 6522's serial port bug requires a 74HC4066 chip to perform proper signal routing; PB7 (or any other I/O pin) can be used to select the routing. For example, when PB7 is low, the serial clock is configured for input; when it is high, the serial clock is configured for output.
With creative addressing and some additional external TTL-ACT or TTL-HC logic, you can increase your data rate by about 52% (320kBps versus 210.5kBps) using essentially the same code as above, but with 16-bit wide registers:
Code:
; All accesses to I/O space are assumed to be
; in direct page.
;
; VIA is mapped into every *odd* byte starting at $E1.
; This lets every even byte within that range correspond
; to the lower half of a 16-bit word.
VIABASE = $E1
SR      = VIABASE+19    ; -1 bias ensures all 16 bits loaded before shifting
IFR     = VIABASE+26

RX:     LDA #$04        ; 3 cycles
1$:     BIT IFR         ; 4 cycles
        BEQ 1$          ; 2 cycles or 3 cycles
        LDA SR          ; 4 cycles
        STA buffer,X    ; 5 cycles
        INX             ; 2 cycles
        INX             ; 2 cycles
        BNE RX          ; 3 cycles usually
        RTS             ;25 TOTAL

TX:     LDA #$04        ; 3 cycles
2$:     BIT IFR         ; 4 cycles
        BEQ 2$          ; 2 or 3 cycles
        LDA buffer,X    ; 5 cycles
        STA SR          ; 4 cycles
        INX             ; 2 cycles
        INX             ; 2 cycles
        BNE TX          ; 3 cycles usually
        RTS             ;25 TOTAL
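This software should get us a good 320kBps throughput: 4MHz divided by 25 cycles gives 160,000 iterations per second, and each iteration moves two bytes, so we are in effect running two concurrent 160kBps serial streams.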
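The SR and IFR offsets in the 16-bit listing are easy to get wrong, so here is a small Python sketch that derives them. It assumes that register n of the VIA sits at VIABASE + 2n, which is one reading of "every odd byte starting at $E1", and that the -1 bias exists so the VIA's register ends up as the high byte of a little-endian word. The helper names are made up for illustration.

```python
VIABASE = 0xE1   # first odd byte occupied by the VIA, per the listing
REG_SR = 10      # 6522 shift register number
REG_IFR = 13     # 6522 interrupt flag register number


def interleaved_address(register, base=VIABASE):
    """Address of a VIA register when the chip occupies every other byte."""
    return base + 2 * register


def word_access_address(register, base=VIABASE):
    """Start address of a 16-bit access whose HIGH byte is the register.

    The 65816 is little-endian, so the word starts one byte lower,
    on the even address just below the VIA register.
    """
    return interleaved_address(register, base) - 1


if __name__ == "__main__":
    print(f"SR  byte address: ${interleaved_address(REG_SR):02X}")
    print(f"SR  word access:  ${word_access_address(REG_SR):02X}")
    print(f"IFR byte address: ${interleaved_address(REG_IFR):02X}")
```

Under those assumptions the SR word access lands at $F4, which is VIABASE+19, and IFR sits at $FB, which is VIABASE+26, both agreeing with the listing. IFR is left unbiased there, presumably because the $04 mask only tests the low byte of the word.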
Other feedback signals, such as GPIB's EOI or ATN, require separate signals regardless of the bus width. Therefore, these are treated more or less as fixed I/O pin costs.
The question might now arise, "How do you communicate with devices which might not process data that fast?" With GPIB and related protocols, there is an explicit feedback path to help control flow. But what if we never had to depend on constantly checking some protocol handshake line in software? What if, instead, the device told us up front what data rate it can support without error, rather than having us find out on every single character? In some respects, that would make a lot of things easier.
This can be generalized too: if you know you're going to be addressing multiple listeners on the bus (to borrow from GPIB's vocabulary), then naturally you must use a set of timings that can accommodate all of the listeners being addressed. Use the minimum bit transmission rate, the maximum inter-byte gap, etc.
Discovering this information would be part of some higher-level protocol, and to facilitate it, all devices would need to handle some guaranteed bit-rate and byte-rate as a baseline. Once learned, of course, this can be cached for future use so that bus overhead is minimized by not having to re-determine this information every time.
Note that when using the VIA's hardware handshaking features, port A must be dedicated to one and only one other device. Attempting to drive the CA signals with more than one peripheral requires making the Data-Available and Data-Received lines open-drain, which can reduce maximum throughput. The serial interface, however, can be arranged as a true bus, allowing one talker and many listeners concurrently (indeed, the C128 burst-mode bus demonstrates this nicely). Other signals are still needed for bus arbitration, however, whether GPIB-like ATN and IFC signals or a bus request/grant system.
I'm thinking of exploring this territory further by resurrecting Garth's SS-22 interface and turning it into a viable expansion bus with accompanying high-level protocol that can be considered "standard" for any 6502- or 65816-based project. This should be a fun (and perhaps even useful/beneficial) exercise.