Here's a cool way to provide a 65C02 system with an ultra-fast, 5-bit output port. There are some limitations -- beginning with the fact that it only works on the 65C02 -- but the speed utterly buries any conventional approach! The total program overhead to write to the port is a single cycle -- twice as fast as the purported 2-cycle minimum for a 65xx instruction. A 14 MHz system could do 14 million port writes per second, so you could be very brisk indeed when talking to SPI, to name just one example!
The key lies in the use of "illegal" (actually just NOP) opcodes in the _3 and _B columns of the opcode map. As drawn below, the circuit responds to the one-cycle NOPs in both of these columns. The lowest three bits of these opcodes always equal 011. And the high 5 bits are "Don't Care"... to the CPU, but not to us!
All 32 of the _3 and _B column opcodes are single-cycle NOPs (only 30 on WDC parts; see the note below). These 1-byte, 1-cycle NOPs produce no dead cycles and are free of bizarre effects like those seen for undefined opcodes on NMOS. When a Col_3 or Col_B NOP appears in the instruction stream, the 'C02 will instantly swallow it and proceed to fetch another opcode on the very next cycle. They truly are one-cycle NOPs, and all we have to do is...
- detect them, and
- capture the Don't Care bits for our own use.
I've drawn the circuit as though implemented with 74xx building blocks, but programmable logic is an obvious alternative. In either case the key is simply to detect the coincidence of SYNC (indicating a 65C02 opcode fetch) with a certain binary pattern on the low bits of the data bus. The circuit detects binary xxxxx011 on the data bus. At the end of any such bus cycle, the high 5 bits of the data bus are copied all at once into the 74_377 register. For example, binary opcode 00000011 will output 00000 to the port, opcode 11111011 will output 11111, and so on.
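The decode rule described above can be sketched as a small behavioural model. This is only an illustration in Python, not a description of the actual gates: the class name `FastPort`, the method `bus_cycle`, and the power-up value of 0 are my own assumptions, and bus timing is reduced to "one call per bus cycle".

```python
class FastPort:
    """Behavioural sketch of the decoder plus the 5-bit output register."""

    def __init__(self):
        # Assumption: the register is modelled as starting at 0.
        self.value = 0

    def bus_cycle(self, sync, data):
        """Model one bus cycle; return the port value after the cycle ends."""
        # Latch only when SYNC marks an opcode fetch AND the low three
        # bits of the data bus read 011 (the xxxxx011 pattern).
        if sync and (data & 0b111) == 0b011:
            self.value = (data >> 3) & 0b11111  # copy the high 5 bits
        return self.value


port = FastPort()
port.bus_cycle(True, 0b00000011)   # opcode $03 -> port shows 00000
port.bus_cycle(True, 0b11111011)   # opcode $FB -> port shows 11111
port.bus_cycle(False, 0b10101011)  # same pattern as an operand: ignored
```

The last call shows why SYNC matters: a byte that happens to match xxxxx011 but is fetched as an operand or as data must not disturb the port.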
Notice that this is not memory-mapped I/O; no address space or address decoding is used.
Attachment: Ultra-fast_65c02_output_port.gif
One note regarding the 5-bit port:
- unlike Rockwell's 'C02, the WDC W65C02S has 2 legitimate opcodes in the _B column: $CB (WAI) and $DB (STP). When using a WDC chip you'll probably want to avoid executing those. You may even choose to avoid all of Column B and aim for Column 3 only, which yields a 4-bit (not 5-bit) output port.
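To make the WDC restriction concrete, here is a short Python sketch that enumerates the usable opcodes. The function name `port_opcodes` and its flags are hypothetical helpers for illustration only; the opcode values $CB and $DB come from the note above.

```python
WDC_RESERVED = {0xCB, 0xDB}  # WAI and STP on the WDC W65C02S


def port_opcodes(wdc=False, column_3_only=False):
    """List the opcodes that can safely be used to drive the port."""
    ops = []
    for high in range(16):
        ops.append((high << 4) | 0x3)      # Column 3: $03, $13, ... $F3
        if not column_3_only:
            ops.append((high << 4) | 0xB)  # Column B: $0B, $1B, ... $FB
    if wdc:
        ops = [op for op in ops if op not in WDC_RESERVED]
    return sorted(ops)


# Port values that become unavailable on a WDC part (high 5 bits of $CB, $DB)
lost_values = sorted(op >> 3 for op in WDC_RESERVED)
```

With Column 3 only, data bit 3 is always 0, so the lowest bit of the 5-bit register never changes and the four remaining bits (data bits 7 to 4) form the 4-bit port.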
Further notes pertaining to the 5-bit port:
- the value that's output doesn't come from a CPU register. That's OK if you know in advance (i.e., at assembly time) what value to output. But if the values are computed at run-time then you'll need to use self-modifying code or some sort of Case construct.
- WDC details all the 65C02 NOPs, including multi-cycle NOPs and the one-cycle NOPs used here, in Table 7-1 of their 'C02 data sheet. From experiments I performed, I know all NOPs on the Rockwell 'C02 have the same behavior as WDC's, and it seems a virtual certainty that 'C02s from Synertek and other manufacturers are the same.
- Interrupts will not be recognized while a one-cycle NOP (or a string of such NOPs) is executing. Interrupts are recognized on the first non-one-cycle instruction that follows. More info here.
- For a fast port scheme that allows input as well as output, visit Garth's circuit potpourri page, here.
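The two run-time options mentioned in the notes (self-modifying code, or a Case construct) both come down to turning a 5-bit value into its opcode byte. The Python sketch below only models the byte manipulation; on real hardware this would be done in assembly. The names `opcode_for`, `patch_output`, and `build_case_table` are hypothetical, as is the table layout of one output opcode followed by RTS ($60) per entry.

```python
RTS = 0x60  # 65C02 return-from-subroutine opcode


def opcode_for(value):
    """Return the one-cycle NOP opcode whose high 5 bits equal value."""
    if not 0 <= value <= 0b11111:
        raise ValueError("port value must fit in 5 bits")
    return (value << 3) | 0b011


def patch_output(ram, address, value):
    """Self-modifying approach: overwrite the opcode byte held in RAM."""
    ram[address] = opcode_for(value)


def build_case_table():
    """Case approach: 32 two-byte stubs, stub n = output opcode n, then RTS."""
    table = bytearray()
    for value in range(32):
        table += bytes([opcode_for(value), RTS])
    return bytes(table)


ram = bytearray(4)               # stand-in for a code buffer in RAM
patch_output(ram, 2, 0b10101)    # the byte at offset 2 becomes $AB
```

Either way the run-time overhead is no longer a single cycle: patching costs a store, and a stub table costs the call and return, so the one-cycle figure applies only to values known at assembly time.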
Finally, what would be the application for a circuit like this? The first thing that comes to my mind is accelerated bank switching! Using the one-byte output instructions might be a great way of updating memory mapping circuitry. (In fact I think ElEctric EyE and I discussed this a while back -- and Nightmare Tony presently has an application as well.) I myself prefer to avoid bank switching, but there's no denying that in some situations it can be a satisfactory solution. And having a supremely small and fast output instruction helps!
-- Jeff