Ok, I've come up with something completely different.
Looking at the standard interfaces, at 30,000 feet they're all very similar. Assert your position, put data on the bus, clock it in.
EPP requires that you put the bus into "modes" to read and write, but it doesn't have a mechanism for the client to tell you that it has data waiting. GPIB is designed to arbitrate a bunch of daisy-chained devices, a complexity I don't need.
Taking inspiration from something like the CH376, I don't really need a hardwired bus protocol. I just need to clock data in, and to know when data is waiting.
So, here's what I've come up with:
Pin definitions, with directions relative to the Host (--> = Host to Co-Processor, <-- = Co-Processor to Host):
--> RST - Reset the co-processor
<-- RDY - Co-Processor Ready
--> CE - Co-Processor enable
<-- DA - Data Available
--> R/W - Read/Write mode
--> HDR - Host Data Ready
<-- CDR - Co-Processor Data Ready
<-> DAT - 8 Data pins
RST is a way for the 65xx to reset the co-processor into a known state: basically abort any transfers, close any channels, etc.
RDY tells us that the co-processor is ready to go. Since, notably early on, we have to wait at a minimum for Linux to boot, this can be a while.
CE enables the co-processor. Basically this is a chip-select mechanism. It simply lets the CoP know it can pay attention to the rest of the signals.
DA is a signal from the CoP that there is data waiting to be read. This ideally is connected to an interrupt on the 65xx.
R/W tells us if we're reading from or writing to the CoP.
HDR (Host Data Ready) goes high when the data the Host has put on the bus is stable, or when the Host has clocked in the data.
CDR (CoP Data Ready) is the CoP side of HDR. It goes high when the CoP has clocked in the data, or when the data it has put on the bus is stable.
DAT are the 8 data pins.
So, I managed to eat up 15 pins out of 16. I don't really know if there's a need yet to reduce this. CE could be one of the first to go if I needed to, as the plan is to have it on a dedicated set of ports and not sharing with anything.
Here are my crude timing diagrams for the 3 basic transactions. During READ and WRITE, it's basically a game of appropriately cycling the HDR and CDR pins to clock the data.
Code:
RESET Sequence

     ____________________________
CE

         ______________
RST ____/              \________

                         _______
RDY ====>_______________/

                  _________
CDR _____________/         \____


WRITE Sequence

     ____________________________
CE

       _________________________
R/W ___/

           _________
HDR ______/         \___________

DAT --<====================>----

              _________
CDR _________/         \________


READ Sequence

     ____________________________
CE

        ________
DA  ___/        \_______________

R/W ____________________________

           _________
CDR ______/         \___________

DAT --<====================>----

               _________
HDR __________/         \________
The other part is that there's an underlying protocol to the data being exchanged. I don't have that in detail, but there are a couple of high-level concepts. For example, the system knows how much data is coming, so there's no "End of packet" token -- it's all just prefixed with lengths.
Another one, which I lifted from IEEE 1284.4, is the idea of "Credit". In 1284.4, Credit is basically "how many packets of data can I send without handshake".
In practice what it means is that the CoP can't send any data that it doesn't have permission for.
The premise here is that it's very easy to visualize having lots of data queued up and ready to send, but the host simply isn't ready for it.
The hardware handshake that we have going is only good for clocking in the actual packets, but not for higher level handshaking.
So, a simple exchange would be for the Host to open a channel on the CoP. Say "open file 'text.txt'", which gives the CoP a credit of 1 to send back a reply: "OK, file is open on channel 1" (or not). Since the CoP sent a packet, it consumed its one point of credit and cannot send any more. Then the Host sends "read channel 1", giving the CoP one more credit. When ready, the CoP replies with the first block of data. In order to get more data, the Host is going to have to send another read request.
Now you can also visualize a network socket.
Host tells the CoP to "open socket port 123", and the CoP says "OK". Next, the Host sends a "Listen" on the socket. Later, something connects to the socket, and the CoP uses the credit from the Listen to send back the connection notice. But if the CoP gets another connection, it can't blindly send back a payload until the Host sends another Listen request.
Now, in theory, there's nothing to say that the Host can't send a Bulk Read to the CoP -- "send me back 4 packets from file XXX". But short term, I just want to enforce request/reply.
The DA pin is to tell the Host there is something to read. As the CoP services requests, it'll queue up replies. When it has one ready, it asserts DA, which can interrupt the Host.
The interrupt handler starts to read the packet, pulls out any metadata (notably the channel number), looks up the buffer location for that channel, and routes the data to that buffer.
The problem is what happens if data becomes available while the Host is trying to write data. My thinking is that the DA pin can only be active while the R/W selector is in READ mode.
In order to write data, the Host disables interrupts and turns on WRITE mode; in response, the CoP de-asserts the DA pin (if it was asserted). The Host should check and wait for DA to go to zero. After enabling WRITE mode, the Host re-enables interrupts. When finished writing, the Host disables interrupts, turns off WRITE mode -- giving the CoP free rein to assert DA -- then re-enables interrupts. At this point DA can trigger the interrupt on the Host.
That's my current scheme. There's a bunch of work at the actual packet level and data handling on both sides. Another nice feature is that I can prioritize I/O. For example, I can tell the CoP that incoming keyboard/terminal is more important than disk data, so when both are ready, the Keyboard I/O can be handled first.
In the end, the CoP can only queue up as much data as there are pending outstanding requests. I can have multiple processes: one loading a file, another waiting on the keyboard, a third listening to a socket. And even though the file has 1000 packets to go, the CoP can only queue up one. So, in this case, I can only have 3 pending I/O requests. Even a single process can have multiple outstanding I/O requests; it just polls them until they've completed.
But at the same time, you can see that if there is more than one pending I/O request, they're just going to fire the interrupt one right after the other. As soon as one packet is read, DA is going to go high again pretty much immediately.
But that's ok, that's the way of it. The Host gets to catch up on its own time. If the Host asks for 1000 packets (via the credit system), then odds are pretty high not a lot of other work is going to get done while all that streams over. But, that's the Host's choice to make.
It's starting to take shape.
Thoughts welcome.