6502.org Forum
 Post subject: Cell Network Hardware
PostPosted: Mon Dec 21, 2020 2:00 pm 
Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
For too long I've wanted to connect small, inexpensive computers with a network protocol. In ignorance of UART, I thought about the problem in abstract and thought about particular cases. The MVP [Minimum Viable Product] would be something like MIDI for process control. And, actually, why is MIDI not used for process control? One or more numbered devices would receive commands, measure variables and switch relays. It could be run in an open loop or, preferably, in a closed loop. At this point, it should be obvious to anyone that I should have investigated MIDI in more detail. It should also be obvious that MIDI lacks something.

A good start to this process was to devise a protocol with three byte payloads: device, command, payload. The intent was to send three byte triples around a ring. For every triple sent, it is also possible to receive a triple. Meanwhile, specific devices fill a (possibly dummy) payload with an acknowledgement or a measurement. Further, it is possible to decrement device number at each hop. This eliminates all of the tedious device numbering and associated device contention. Also, for a micro-controller, fewer pins for jumper switches means more pins for I/O.

The astute will notice this scheme is unworkable because there is no mechanism to prevent device, command or payload from being jumbled. The automatic numbering is merely an extra mechanism for packet mangling. It is possible to patch this scheme with reserved values and escape mechanisms but it becomes very inefficient and nothing like the original intention. How do most people solve this problem? Well, they don't. A ridiculous number of protocols require low error rates or no transmission error. If you've even *considered* bit errors then you're ahead of some people. If you're lucky, a protocol might have a 16 bit checksum. It might even be a good one. However, for a high volume of data or widespread deployment this is demonstrably inadequate.
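To make the failure concrete, here is a minimal Python sketch of the three byte ring described above. The function names and sample values are invented for illustration, and the sketch assumes nothing beyond "decrement the device number at each hop".

```python
def hop(triple):
    """One ring hop: decrement the device number, leave the rest alone."""
    device, command, payload = triple
    return ((device - 1) & 0xFF, command, payload)

def parse(stream):
    """Split a raw byte stream into (device, command, payload) triples."""
    usable = len(stream) - len(stream) % 3
    return [tuple(stream[i:i + 3]) for i in range(0, usable, 3)]

# Two triples on the wire: device 2 gets command 0x10, device 5 gets 0x20.
wire = bytes([2, 0x10, 0x00, 5, 0x20, 0x00])
good = parse(wire)

# Lose a single byte in transit and every later field shifts by one place:
# the old command byte is now read as a device number.
damaged = parse(wire[1:])
```

With no frame marker, the receiver has no way to tell that `damaged` is misaligned, and the per-hop decrement then rewrites a byte that was never a device number.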

I was stuck but not vanquished. Along the way I learned about parity, CRC polynomials, Hamming code, Reed Solomon, Viterbi, LDPC, Turbo codes, Raptor codes, Fountain codes, FEC and various aspects of oversampling. However, this is all very abstract in absence of wire format encodings, such as UART start/stop bits, Manchester encoding and bit stuffing. Techniques for bit stuffing may seem overly pedantic to a programmer but they are essential when handling the maximum size of a magnetic domain, the minimum length of a pit on a pressed Compact Disc or the accumulated charge over long distance cabling. Indeed, these are all related problems which may require mutually incompatible solutions.

Bit stuffed encodings typically use combinatorics and this may leave free variables or unused encodings. As someone trying to convey binary data, I initially ignored this quirk but I eventually realized that it solves problems further up the network stack (ISO seven layer or otherwise). Specifically, bit stuffing may provide a frame marker which prevents accidental mangling or malicious packet-in-packet attack.

The reason for ignoring this matter was encapsulation. I preferred each layer to be fully encapsulated because it decreases coupling and increases portability and the generality of the solution. In practice, it is common for one loose end to be used further up the stack. Obviously, one of the upper layers now depends on a loose end of some form. This may lead to ridiculous hacks which can most generously be described as loose end encapsulation. The most ridiculous part is that we have an inversion of concerns and that we are encapsulating the upper layer from the vantage of the lower layer. One of the worst examples is AAL5 which is used to encapsulate Internet Protocol or raw Ethernet over ATM. If you've ever used copper or fiber broadband then you've very probably used AAL5. However, the one dangling bit used to determine the end of a sequence of fixed length cells is an offense to my sensibilities. It was sufficiently bad for me to devise something with better encapsulation and better channel capacity. (AAL5 requires 8-10 bytes of a 48 byte payload. Oh, plus one bit.)

This was enough to prod me to devise a network stack from the wire format upwards. Starting from the crudest 4/5 bit stuffing (one bit of every nybble is Manchester encoded), I was able to devise a 256 bit cell format which is ideally suited to 8 bit computing due to the use of, er, 8 bit counters. It has a 16 bit frame marker and three sets of 80 bit payloads. Each of these contains 64 bits of raw data. That's 24 bytes per cell and it is therefore a miniature version of ATM. With a minor alteration, I switched to 8/9 bit stuffing and gained enough space to add a Hamming code. This allows one bit fix per 80 bit payload. In optimistic cases, this allows multiple bit errors to be fixed.
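The arithmetic behind the cell layout can be checked with a short Python sketch. The constants come from the paragraph above; the assumption that the spare bits in each 80 bit field hold the Hamming parity is an inference, since the exact bit allocation is not given here.

```python
CELL_BITS = 256
MARKER_BITS = 16
FIELDS = 3
FIELD_BITS = 80
DATA_BITS = 64

def stuffed_bits(data_bits, group):
    """Bits on the wire when one extra bit follows every `group` data bits."""
    return data_bits + data_bits // group

def hamming_parity_bits(message_bits):
    """Smallest r with 2**r >= message_bits + r + 1 (single error correction)."""
    r = 0
    while 2 ** r < message_bits + r + 1:
        r += 1
    return r

# Frame marker plus three fields fills the cell exactly.
assert MARKER_BITS + FIELDS * FIELD_BITS == CELL_BITS

spare_45 = FIELD_BITS - stuffed_bits(DATA_BITS, 4)  # 4/5 stuffing fills the field
spare_89 = FIELD_BITS - stuffed_bits(DATA_BITS, 8)  # 8/9 stuffing leaves room
needed = hamming_parity_bits(stuffed_bits(DATA_BITS, 8))  # parity over encoded bits
payload_bytes = FIELDS * DATA_BITS // 8
```

Under these assumptions, 4/5 stuffing leaves no spare bits, while 8/9 stuffing leaves 8 bits per field, enough for the 7 parity bits that single error correction over 72 encoded bits needs.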

This was deeply satisfying because I was now devising an almost universal PAN/LAN/WAN format which can also be adapted to tape or disk storage with minor alteration. However, there is only a finite amount of research which can be completed before embarking upon a project. And much of my research determined that I had spent an atypical amount of time on research. However, I might have saved time by concentrating on 1980s home computer formats. In particular, after implementation, I found the state diagram of the 6854 network adapter used in AppleTalk and Acorn's EcoNet. It is not often that I could have drawn the diagram myself.

Unfortunately, my work has determined why serial formats developed in such a piecemeal manner. A software-only, bit-banged magnetic/PAN/LAN/WAN format requires about 6.5KB ROM. It also requires a horrendous amount of processing power. If a device is doing nothing else, this is feasible. For example, when loading from tape. However, the same system is infeasible to obtain mouse position. This would explain at least some of the proliferation of serial formats. However, some of it is willful incompatibility. For example, CAN bus was developed for similar reasons to USB: to contain a proliferation of incompatible connectors and protocols. And it might have been acceptable for an industrial standard to develop one year ahead of a consumer standard. However, civil aircraft and military aircraft already provided two incompatible standards which were suitable in cars and lorries.

There's a babel of incompatible formats out there. It is absolute madness. There are the wired network formats, the wireless network format, the infrared formats, the tape formats, the floppy disk formats, the hard disk formats. Anyone with any sense has retreated to clocked serial protocols. However, even here there is incompatibility. Geeks argue about the merits of big-endian and little-endian (or the lesser known middle-endian). Geeks also argue about Ethernet and ATM sending the bits in a byte in opposite order. And for the clocked protocols, geeks argue about clocking on the rising edge, the falling edge - or both. If there were two ways to breathe, a geek would find a third method.

That's how we get ADB, SIO, IWM, PS/2 mouse, MIDI, AppleTalk, EcoNet, X10, DMX512, DALI, iLink, I2C, I2S, RC-5, IrDA, Ethernet, ATM and thousands of more obscure protocols. Unfortunately, when much of this was rolled into USB, the wire format was so awful that it required two incompatible replacements while failing to incorporate developments from FireWire and ThunderBolt. Radio formats are no better. BlueTooth and Zigbee may sound exotic but these 802.15.1 and 802.15.4 variants are merely 802.11 with incompatible headers, incompatible packet size and incompatible authentication running on incompatible radio frequencies. The net result is an increased attack surface with no appreciable gain.

So, my answer to this problem is add to the dung pile and define more protocol. Any fool can define a wire format with CRC in 8KB ROM. I am one such fool. Now try doing the same in 2KB or less. By the time 8KB became affordable, protocols had already fragmented. This would account for much of the difficulty in historical systems. For example, a superior tape format was planned for Galaksija but it required too much space and an inferior protocol was substituted. And why was firmware so expensive? Assuming four transistors per bit of ROM, 8KB requires 2^18 transistors. It is cheaper to implement dedicated hardware with a large number of useless options. More recently, I've found that:

  • A large number of protocols use a random doubling of 75Hz, 225Hz or 25MHz.
  • Protocols start at a random speed and get faster. Anything slower is always outside of the scope of the protocol.
  • Protocols start with a random size address-space and often require address extension.

This leads to inanity such as iLink with an 8 bit address-space, I2C with a 7 bit address-space, DALI with a 6 bit address-space, RC-5 with a 5 bit address-space and HDMI slow bus with a 4 bit address-space. Do we have any bids for a 3 bit address-space or smaller?

I have plans to avoid the numerous limitations of the past. Unfortunately, the result looks like a random mix of IPv4, 802.15.4, USB, SNA and SONET with a PS/2 connector. I apologize in advance. I initially planned to use a 7 pin Mini-DIN connector and allow mobile devices to share energy. However, due to the proliferation of PS/2 devices, 6 pin connectors are noticeably cheaper than other variants. Unfortunately, this requires some features to be dropped. Regardless, if the connectors are physically compatible with PS/2 it is worthwhile to investigate electrical compatibility. Specifically, it is possible to negotiate the protocol up from a PS/2 keyboard connection while hubs are expected to encapsulate legacy keyboards and mice.

While it is nice to think about an alternate history where CAN bus never existed and likewise for the "Universal" Serial Bus with its nine connectors, three wire formats and numerous optional extensions, there is the small matter of implementation.

I initially considered using a long chain of shift registers. Actually, a very long line of shift registers because I am trying to implement a 256 bit cell format. Assuming that I use 8 bit latching shift registers, that would be 32 chips to send and 32 chips to receive. Oh no, actually, due to the Nyquist limit, we have to 2x oversample or maybe 3x to compensate for the fun things like ring and ground bounce. I also thought that the magic frame marker could set input latches and trigger an interrupt. Obviously, bit errors interfere in this process but this is the bootstrap version. So, that's a minimum of 128 chips per channel and we probably want a minimum of four channels per computer. Ye gads!!! That's at least 512 chips. How does MyCPU solve this? MyCPU is a computer loosely based on 6502 and made from discrete components, oh, except the network interface which uses a micro-controller.

This is looking like an intractable problem. I am quite certain that it is possible to dash off a short circuit description and deploy it on FPGA. However, I hoped for discrete hardware implementation which didn't look like a DRAM shortage. And that's the answer. The design has a large amount of needless redundancy. After consideration of video display, and the minimal implementation of a Sinclair ZX Spectrum in particular, I realized that parallel to serial conversion only requires one 8 bit latching shift register. 3x oversampling can technically be performed with one chip but it may be preferable to collect each phase of sampling separately. The remainder of the exercise is FIFO and maybe the occasional interrupt. Yup, that's it. We're not conforming to any other standards beyond UART at one speed.

Does this work? Can we just count to 768 and interrupt a processor? Yes. Framing can be implemented in software. The device driver for the mag/PAN/LAN/WAN interface holds the previous 32 bytes and the current 32 bytes before looking at the most likely locations for a frame marker. This may drift around by one phase as sender and receiver clocks drift. Indeed, this technique is tolerant to bit errors in the frame marker while Hamming codes also allow bit errors in the cell payload. To reduce processor load, it may be desirable to provide a specialist barrel shifter which decodes the 8/9 wire format. This unit would be shared by all channels but is not essential.
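A minimal Python sketch of software framing along these lines follows. The marker value `0xA5C3`, the bit order and the function names are all hypothetical, and the sketch scans every bit offset rather than only "the most likely locations", which a real driver would want to avoid.

```python
MARKER = 0xA5C3  # hypothetical 16 bit frame marker; the real value is not given

def bits_of(data):
    """Expand bytes into a list of bits, most significant bit first (assumed)."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def find_marker(previous, current, max_errors=1):
    """Return the bit offset whose 16 bit window is closest to MARKER.

    Searches the previous and current 32 byte buffers as one bit string and
    returns None if even the best window has more than max_errors wrong bits.
    """
    bits = bits_of(previous + current)
    best_offset, best_distance = None, 17
    for offset in range(len(bits) - 15):
        window = 0
        for bit in bits[offset:offset + 16]:
            window = (window << 1) | bit
        distance = bin(window ^ MARKER).count("1")  # Hamming distance
        if distance < best_distance:
            best_offset, best_distance = offset, distance
    return best_offset if best_distance <= max_errors else None
```

Accepting a window that is within `max_errors` bits of the marker is what makes the search tolerant to bit errors in the marker itself; the threshold of one bit is an arbitrary choice for the sketch.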

Anyhow, the minimal implementation requires less than 20 chips and some of this is shared across multiple bi-directional channels. It is possible to start each channel in PS/2 compatibility mode then switch to 31kHz mode, 1MHz mode or faster. By running each port at different frequencies, it is possible to massively oversample UART with, for example, 256 samples for 9600 baud 8N1. This would be processed in the same manner as a 256 bit cell. Specifically, processing would occur with reference to the previous 256 samples across three phases.
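As a rough illustration of oversampled UART reception, the Python sketch below majority-votes each bit cell of one 8N1 frame. It is not given above how the 256 samples map onto bit cells, so the sketch assumes a whole number of samples per bit (`factor`, a made-up parameter) and a receiver already aligned to the start bit.

```python
def oversample(byte, factor):
    """Render one 8N1 frame (start, 8 data bits LSB first, stop) as samples."""
    frame = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [bit for bit in frame for _ in range(factor)]

def decode_8n1(samples, factor):
    """Majority-vote each bit cell; return the byte or None on a framing error."""
    bits = []
    for cell in range(10):
        chunk = samples[cell * factor:(cell + 1) * factor]
        bits.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    if bits[0] != 0 or bits[9] != 1:
        return None  # start bit must be 0 and stop bit must be 1
    return sum(bit << i for i, bit in enumerate(bits[1:9]))
```

A few corrupted samples inside a bit cell are outvoted, which is the practical benefit of sampling far above the baud rate.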

Some people are surprised by some implementation details. For example, Hamming code is applied to encoded data. This allows a network switch to optionally re-generate a payload without decoding it. Indeed, on the least powerful devices with the least memory, such as 6502 with 2KB RAM, it may be preferable to buffer and route encoded 32 byte cells rather than decoded 24 byte payloads. Another surprise is the choice to not interleave the 80 bit fields. This would provide extra resilience against periodic noise, such as mains switching. However, I regard the transmission rate as incidental and specific sources of interference are most problematic at specific frequencies. It is preferable to provide an implementation in which cell size may be varied or Hamming codes may be applied in hardware at wire speed. These considerations are simplified by not interleaving. Bit stuffing is not current balanced but it guarantees a signal transition every 9-10 bits. Although 64/66 encoding and similar provide more channel capacity, 8/9 encoding provides easier options for framing and phase locking.
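One reading of the trivial 8/9 bit stuffing is sketched below in Python: each byte is followed by the inverse of its last bit, so every 9 bit symbol ends in a transition. This is an assumption based on the "one bit is Manchester encoded" description of the 4/5 variant, not a confirmed wire format, and the bit order is also assumed.

```python
def stuff_8_9(data):
    """Encode bytes as 9 bit symbols: 8 data bits, then the inverse of the last."""
    out = []
    for byte in data:
        bits = [(byte >> (7 - i)) & 1 for i in range(8)]  # MSB first (assumed)
        out.extend(bits + [bits[-1] ^ 1])
    return out

def unstuff_8_9(bits):
    """Decode 9 bit symbols; raise ValueError if a stuffed bit is wrong."""
    data = bytearray()
    for i in range(0, len(bits), 9):
        symbol = bits[i:i + 9]
        if len(symbol) != 9 or symbol[8] == symbol[7]:
            raise ValueError("bad stuffed bit in symbol %d" % (i // 9))
        data.append(sum(bit << (7 - j) for j, bit in enumerate(symbol[:8])))
    return bytes(data)

def longest_run(bits):
    """Length of the longest run of identical bits."""
    best = run = 0
    previous = None
    for bit in bits:
        run = run + 1 if bit == previous else 1
        best = max(best, run)
        previous = bit
    return best
```

With this particular encoding the longest possible run of identical bits is 9 (a stuffed bit followed by eight matching data bits), which is in line with the transition guarantee mentioned above. The stuffed bit also doubles as a cheap per-symbol sanity check.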

Finally, I discovered why MIDI is unsuitable for process control. It doesn't enforce checksums and is therefore unsuitable for critical applications.


Attachments:
cell-network1-0-1.odg [8.08 KiB]
Downloaded 44 times

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!
PostPosted: Mon Dec 21, 2020 8:10 pm 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
Relating especially to your first paragraphs above: You may want to see the things we considered in the topic where we developed the 65SIB (6502.org Serial Interface Bus), at viewtopic.php?f=4&t=1064 . A minor thing I'm not too fond of is the 20-conductor ribbon. Sam Falvo was also going to make a version of it that involved serializing the address in order to cut the number of conductors down and use a smaller connector. I'm not aware of any completion of that project on his part. 65SIB allows, but does not require, a lot of intelligence for autoconfiguration and different things, but can also be used all the way down to dumb shift registers for I/O. It's quite flexible, IMO.

MIDI has a good thing in the fact that it's serial (not requiring many conductors, although I detest those DIN plugs!) and that it's optically isolated.

I think we talked about HP-IL in the topic above. It uses pulse transformers, so devices can have their grounds separated by even a thousand volts, without trouble. It goes in a loop of up to over 900 devices (with extended addressing; 31 devices with simple addressing), has total error-checking without complex algorithms of any kind (because every message gets passed all the way around the loop so the originator can compare to what it sent and see if it matches), auto-addressing, could pass control around, and a lot of other benefits. Cables were simple 2-conductor zip cord. There's no reason the idea could not have been taken to much higher speeds, even with optical fibers and long distances.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon Dec 28, 2020 12:25 pm 
Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
I've become a reluctant fan of serial protocols. Reading through the forum archive disabused me of the notion of making a 19 inch, 10MHz, parallel backplane like the MyCPU project. Ignoring signal skew, reliability concerns or the ultramagnetic shizzle going everywhere and making the normies complain, the clincher is energy consumption. The energy to drive the bus, per signal line, per card exceeds the reasonable energy consumption of a processor core. So, it is possible to switch from a parallel shared bus to a point-to-point serial bus, add cores and reduce the overall energy budget. Latency may suck but, hey, overall it is cheaper and faster.

Among serial protocols, there is common bus or point-to-point and there is clocked or self-synchronized. Anyone with a shred of sanity appears to have retreated to clocked serial because it eliminates concerns about baud rate and over-sampling. However, clocked serial has a concern which is analogous to little/middle/big endian. Specifically, data may be clocked on rising/falling/both edges. This is in addition to bit ordering and unusual word sizes which are common across many clocked and self-synchronized protocols.

In the forum archive, I've seen reference to 6 pin, 10 pin and 20 pin IDC clocked serial protocols. I am strongly dis-inclined to use the 6 pin or 10 pin options because they are physically compatible yet electrically incompatible with SPI used for Atmel AVR ICSP. However, the 20 pin 65SIB protocol is gorgeous and is my preferred choice for interoperable clocked serial above 1Mb/s. I particularly like the rotation of the fully decoded select lines (reminds me of Amiga floppy drives) and the graceful degradation of the automatic configuration system. I would dearly like an ITX motherboard with 20-way ribbon cable to a front panel. This would be particularly suitable for multiple MicroSD cards. Data transfer rate would be reasonable and it would only incur MicroSD's typical bit error rate.

I suggest that 65SIB can be complemented with unclocked tape, floppy, keyboard, mouse, modem and differential, unshielded local network up to 1Mb/s with the ability to fix 1:80 bit errors and reject more significant errors. This system would also permit a two phase commit with deferred time offset. In the context of MIDI, this allows two or more instruments to play notes at exactly the same time. Using latency trickery, it is possible to do this with 70% packet loss. While 65SIB is trivial to interface to 6522 or similar, I suggest a spiritual successor to 6854 networking which can be implemented with discrete DIP components and requires, at most, deep CMOS FIFOs to reduce interrupts. Indeed, with a processor stacked configuration, one core can alternate between raster display and networking while the other core drives 65SIB continuously. For the FPGA enthusiasts, I presume that a downwardly compatible implementation can run much faster. Also, I believe that it provides more channel capacity at the same bit rate than 100MHz Ethernet or AAL5 over 155MHz ATM.

Regarding connector, omni-directional is preferable but these are typically used for audio, video or power. Regardless, I am keen to reduce the hassle of reaching to the back of equipment and attempting to fit a round connector which has one orientation. Or the worse case with USB where the most likely orientation is tentatively tried, the reverse orientation is worse and it fits on the third attempt. I wanted to use 7 pin mini-DIN because a future iteration could be crimped to allow two-way or three-way rotational symmetry. This arrangement would also allow an optional center pin to selectively share power without incurring losses from a diode bridge. However, due to cost and device compatibility, it is preferable to abandon these considerations and agglomerate the remainder onto PS/2.

I thought that HP-IL's 900 address system was unusual. I presume it uses 5 bit addressing and 5 bit address extension minus reserved addresses. I also presume that the address extension is a loop of loops otherwise it would be like fixing a string of 900 Christmas lights. I hope to implement four tiers of smaller loops without address extension. In the trivial case, to an end-user, this looks like USB with the exception that there is no address exhaustion, up to 16 ports per hub, PS/2 connectors and PS/2 keyboard/mouse compatibility. To a programmer, each port is a network with 32 bit addressing and no fussy time-outs.

Quote:
total error-checking without complex algorithms of any kind (because every message gets passed all the way around the loop so the originator can compare to what it sent and see if it matches)


That works in the general case. Unfortunately, there is a really awkward case where bits are flipped and the same bits are flipped back. Rare? Maybe. But how rare? How catastrophic? And how should it be avoided?
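The awkward case can be shown in a few lines of Python. The helper name and the sample bytes are invented; each hop's transmission error is modelled as an XOR mask.

```python
def traverse(message, error_masks):
    """Pass a message around a loop, XORing in each hop's error mask.

    Returns what every device saw and what arrives back at the originator.
    """
    seen = []
    for mask in error_masks:
        message = bytes(a ^ b for a, b in zip(message, mask))
        seen.append(message)
    return seen, message

sent = bytes([0x12, 0x34, 0x56])
flip = bytes([0x00, 0x08, 0x00])   # one bit flipped on the first hop
clean = bytes(3)                   # no error on the second hop
seen, returned = traverse(sent, [flip, clean, flip])
```

Here `returned` equals `sent`, so the originator's comparison passes, yet the first two devices in `seen` acted on a corrupted message.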

PostPosted: Mon Dec 28, 2020 8:19 pm 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
Sheep64 wrote:
I've become a reluctant fan of serial protocols. Reading through the forum archive disabused me of the notion of making a 19 inch, 10MHz, parallel backplane like the MyCPU project. Ignoring signal skew, reliability concerns or the ultramagnetic shizzle going everywhere and making the normies complain, the clincher is energy consumption. The energy to drive the bus, per signal line, per card exceeds the reasonable energy consumption of a processor core. So, it is possible to switch from a parallel shared bus to a point-to-point serial bus, add cores and reduce the overall energy budget. Latency may suck but, hey, overall it is cheaper and faster.

Among serial protocols, there is common bus or point-to-point and there is clocked or self-synchronized. Anyone with a shred of sanity appears to have retreated to clocked serial because it eliminates concerns about baud rate and over-sampling.

Without taking the time to look into it right now, I believe what HP-IL did, with its pulse transformers, was to send a positive pulse followed by a negative pulse for a '1', and vice-versa for a '0', with a dead space between bits but not between the two pulses of any given bit. Then a separate clock line is not needed, and you could have any number of 1's or 0's in a row, and a wide range of speed could be accepted as long as it's within the capability of the transformers.
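The pulse scheme as recalled in the paragraph above can be sketched in Python as three line levels per bit. This follows that description only and has not been checked against the HP-IL specification; the level values and function names are illustrative.

```python
def encode_pulses(bits):
    """Map each bit to (first pulse, second pulse, dead space) as line levels."""
    levels = []
    for bit in bits:
        levels.extend([+1, -1, 0] if bit else [-1, +1, 0])
    return levels

def decode_pulses(levels):
    """Recover bits from groups of three line levels; reject malformed groups."""
    bits = []
    for i in range(0, len(levels), 3):
        group = tuple(levels[i:i + 3])
        if group == (1, -1, 0):
            bits.append(1)
        elif group == (-1, 1, 0):
            bits.append(0)
        else:
            raise ValueError("malformed pulse group at index %d" % i)
    return bits
```

Every bit sums to zero on the line, which is why any run of 1's or 0's passes through a transformer, and the pulse pair itself carries the timing, so no separate clock is needed.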

Quote:
In the forum archive, I've seen reference to 6 pin, 10 pin and 20 pin IDC clocked serial protocols. I am strongly dis-inclined to use the 6 pin or 10 pin options because they are physically compatible yet electrically incompatible with SPI used for Atmel AVR ICSP.

That's always a problem; but a major goal of 65SIB, SPI-10, and I2C-6 was to be able to use commonly available connectors that could go in standard perfboard with holes on .100" centers, to make them extra hobbyist-friendly. At least they're keyed. Hot-pluggable would always be nice, but I couldn't find any suitable OTS connectors for this. Such a connector would make sure that the ground connections were mated first, followed by power, and then the signal connections, as the connectors are mated. Unplugging would be the reverse.

Quote:
However, the 20 pin 65SIB protocol is gorgeous and is my preferred choice for interoperable clocked serial above 1Mb/s. I particularly like the rotation of the fully decoded select lines (reminds me of Amiga floppy drives) and the graceful degradation of the automatic configuration system.

I'm glad you like it. :D It has been on my to-do list for a long time to make and offer a tiny PCB with a pair of the connectors (an IN which goes toward the controller, and an OUT which goes toward the next device) with the break-out for one address, and maybe another with two or even three addresses, and a regulator which could be set for 5V, 3.3V, or custom voltage, plus bypass capacitors. This would reduce the labor of building 65SIB devices.

Quote:
Regarding connector, omni-directional is preferable but these are typically used for audio, video or power. Regardless, I am keen to reduce the hassle of reaching to the back of equipment and attempting to fit a round connector which has one orientation.

Something like phone plugs do not require any orientation. 3.5mm ones are now commonly available up to four conductors and are very compact, but we're back to the problem of the jacks not fitting into common perfboard, among other problems.

Quote:
I thought that HP-IL's 900 address system was unusual. I presume it uses 5 bit addressing and 5 bit address extension minus reserved addresses.

I have never had any reason to look into extended addressing, as I have never had more than six or seven devices connected at once, and three or four of those were IEEE-488 equipment accessed by looking through the HP82169A HPIL-to-HPIB (IEEE-488) interface converter which was pretty much transparent from the user's perspective.

Quote:
Quote:
total error-checking without complex algorithms of any kind (because every message gets passed all the way around the loop so the originator can compare to what it sent and see if it matches)

That works in the general case. Unfortunately, there is a really awkward case where bits are flipped and the same bits are flipped back. Rare? Maybe. But how rare? How catastrophic? And how should it be avoided?

I can imagine such a case, where one error is followed by another one that happens to be exactly the opposite, so the overall effect appears in the end that there was no error; but I have never experienced an error in any serial communications of any kind, except when I made an ultra-cheap cassette tape modem and ran it at double the intended bit rate to see how often I'd find an error, and get some idea of the safety margin. It turned out that at double the rate, I got an error once every several hundred bytes, or maybe it was one every couple thousand bytes. I don't remember for sure. This used a 65c51 ACIA (UART) at 300bps which made the modem alternate between 1200 and 2400Hz. At 300bps I never, ever saw an error. At 600bps, there was the occasional error, but it was a lot less than I expected. We didn't need much speed anyway.

PostPosted: Fri Jan 15, 2021 11:41 am 
Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
URL: viewtopic.php?f=4&t=6414
Subject: Re: Cell Network Hardware

It occurred to me after devising my trivial (4,5) wire encoding that the design veered towards Commodore floppy disk format. However, I did not seriously consider a wire format which was optimized for Galvanic isolation between network nodes. The reason for this is quite simple. The standard technique for 10/100Mb/s Ethernet, MIL-STD-1553 and, apparently, HP-IL, is to use Manchester encoding (or a variant thereof). However, this consumes a minimum of 1/2 channel capacity.

Arguably, all serial protocols use less than 1/2 channel capacity because they either use a second synchronous channel (clocked serial) or oversampling plus fore-knowledge of the baud rate (unclocked serial). The use of Manchester encoding (to provide Galvanic isolation) discards 1/2 of the remaining channel capacity. It is possible to recover most of this second stage loss by using the undifferentiated magnetic formats pioneered by Commodore, Apple and others. Many of these techniques provably work when bit banged on 1MHz 6502.

Ignoring the inherent losses of clocked/unclocked serial, the suggested wire format consumes exactly 1/4 of the channel capacity (64 bits of overhead out of 256 bits) while providing more resilience and throughput than Ethernet or ATM. The suggested cell header requires 1/2 of the first cell and 1/4 of all subsequent cells. In the optimistic case, channel capacity at the application layer approaches 9/16 and is typically better than 6/16. This exceeds the raw channel capacity of any format which uses Manchester encoding. I did not choose an (8,9) encoding which was most suitable for a pulse transformer because this would have required a 9 bit look-up table. Likewise, I did not choose Commodore's (4,5) GCR because the step backwards to (4,5) encoding sacrifices placement of the Hamming code. The Hamming code can be re-introduced at a subsequent stage. However, this requires an additional 1/8 of the channel capacity. This is sufficient to skew the figures such that the application layer never attains 1/2 channel capacity. This erases one of the distinctions of a 24 byte cell format over 1500 byte Ethernet packets.
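The 9/16 and 6/16 figures can be reproduced with exact fractions. The Python sketch below assumes the header fractions apply to the data portion of a cell, which is the reading that matches the quoted numbers; `average_capacity` is a made-up helper for multi-cell messages.

```python
from fractions import Fraction

wire = Fraction(256 - 64, 256)            # 192 data bits per 256 bit cell
first_cell = wire * (1 - Fraction(1, 2))  # header takes 1/2 of the first cell
later_cell = wire * (1 - Fraction(1, 4))  # header takes 1/4 of later cells
manchester = Fraction(1, 2)               # raw limit of Manchester encoding

def average_capacity(cells):
    """Application-layer share of the wire for a message of `cells` cells."""
    return (first_cell + (cells - 1) * later_cell) / cells
```

On this reading, `first_cell` is 6/16 and `later_cell` is 9/16, so a long message approaches 9/16 and overtakes the 1/2 Manchester limit once it spans a few cells (four cells already average 33/64).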

I don't encourage a proliferation of wire formats. (That is the forte of USB.) However, if Galvanic isolation is utterly critical to an application then it is possible to choose a non-trivial (4,5), (8,9), (16,18) or similar encoding. All have advantages and dis-advantages.

The trivial (8,9) encoding passes through a pulse transformer. However, this does not occur with maximum speed nor fidelity. These shouldn't be concerns given that the proposed second iteration remains slower than 10Mb/s Ethernet. I have instead maximized emphasis upon framing and error handling and placed moderate emphasis upon channel capacity knowing that the result is broadly compatible with 6502, Zener clamping and opto-isolation while also being a programmer friendly extension of PS/2. It is possible to maximize emphasis upon electrical isolation but it'll be at the expense of something else, such as cell size, channel capacity, error correction or it may require look-up tables which exceed 128KB.

Regarding connectors, I suppose that I want a six or seven connector jack which is physically incompatible with all other jacks or falls back to handling the numerous incompatible uses of a three or four connector jack. This includes stereo audio/composite video and the three or more mutually incompatible configurations for headphones/microphone. This is not impossible because more options will become available. For example, I believe that Apple patented a 1mm flat jack which works like a key. This is ideally suited for multiple connections in one or more orientations. Like keys, there are multiple options to physically exclude incompatible devices. Apple doesn't appear to license this patent to anyone but it'll be available to all sooner or later. Indeed, I look forward to the logical extension of a flat jack implementing the reversible hot-swap sequence of ground, power, data in the manner of rotating a key in a lock.

GARTHWILSON on Mon 28 Dec 2020 wrote:
At 300bps I never, ever saw an error. At 600bps, there was the occasional error, but it was a lot less than I expected.


Firstly, you are to be commended for following the Kansas City Tape Format so closely. Life would be easier if everyone followed standards so closely.

Secondly, I quite believe that it is possible to operate at 300bps for years without error. The problem is Moore's law and the related laws, such as Butters' law, Hendy's law and Huang's law. For example, the early experimental version of Ethernet ran at about 3Mb/s. Nowadays, 40Gb/s Ethernet is becoming fairly common. If error per bit remains constant but bit-rate vastly increases, error becomes more likely than not. I note an anecdote from SlashDot from more than 10 years ago. Someone imaged an 80GB harddisk. 80GB was copied over two hops of an office network: from laptop, to switch, to server. The MD5 checksum on the server differed because corruption slipped past both the 32 bit CRC of Ethernet and the 16 bit checksum of TCP/IP. In a more recent example, people are strongly discouraged from using RAID5. Read error per sector is approximately 1/10^10 or somewhat better. However, nowadays, storage clusters may have more than 10^10 sectors. Therefore, error is almost guaranteed during sequential scan, such as occurs during RAID rebuild. And, with RAID5, parity is present within the system - except during rebuild.

Anyhow, bits look increasingly analog when there is an increasing quantity of them. How can this problem be overcome? Short answer: If the volume of data increases by a factor of 1000 then the CRC must grow by 10 bits to maintain the same rate of false positives. General answer: Over-specify error detection and error correction to accommodate future expansion. Of course, this leads to the next question: How is it possible to implement robust, scalable error detection and error correction on 6502?
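
The "10 bits per factor of 1000" rule follows from 2^10 = 1024. A minimal sketch, assuming an idealised n bit check which passes a corrupted message with probability 2^-n (the function name is hypothetical):

```python
import math

def extra_check_bits(growth_factor):
    """Extra check bits needed to keep the expected number of undetected
    errors constant when the data volume grows by growth_factor."""
    # Each added bit halves the false positive rate of an idealised check.
    return math.ceil(math.log2(growth_factor))

print(extra_check_bits(1000))       # factor of one thousand
print(extra_check_bits(1_000_000))  # factor of one million
```

This ignores the structure of any particular CRC polynomial. It is only a sizing rule of thumb.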

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


PostPosted: Fri May 21, 2021 11:13 am 
I may have a reputation for being drawn to dual core, triple screen systems and it should not be a surprise that I am also drawn to double clocked, quad SPI. I've noted that self-clocked serial (such as UART) incurs a Nyquist sampling limit and therefore never uses more than half of the channel capacity. I also noted that clocked serial with two wires offers no efficiency gain. However, there are gains with, for example, one clock and four data lines. These gains are lost as a bus becomes wider due to signal skew, cross talk and the reliability of so many insulated connections.

Double clocked serial is an illusory gain because the staggered clock and data remain subject to Nyquist limit. Nonetheless, the ability to atomically clock one byte over four data lines has a particular allure. Even in the trivial case of single clocked, single channel SPI, there is the very practical advantage that communication may be interrupted mid byte, switched to an unrelated device and then continued. Double clocked, quad SPI would allow this without bit shuffling. In a practical application, I believe that 6502 may read or write to MicroSD at more than 1MB/s. This would be faster (and more deterministic) than a Raspberry Pi's MicroSD interface.

The ability to clock 1, 2, 3(?) or 4 wires on either edge or both edges would possibly allow:-

  • Any practical number of 65SIB ports.
  • Any practical number of MAX3100 SPI-to-UART bridges, possibly via 65SIB.
  • Any practical number of MicroSD volumes, possibly via 65SIB.
  • Any practical number of I2C devices via 4052 or similar.
  • Any practical number of Arduino devices or similar, via UART, SPI or I2C.
  • Any practical number of Commodore serial IEEE-488 devices or similar.
  • Any practical number of cell network interfaces.
  • Zero or more PS/2 keyboards.

I don't want to force a bad fit. Therefore, I am least inclined to include bi-directional protocols, such as I2C. That is the intention but the mere problem remains of implementing double clocked, quad SPI. It should be fairly obvious that the solution is likely to include one 74x157 two nybble multiplexer but I was otherwise stuck. Indeed, I was concerned that a fixation on one component was hindering a solution. I was also concerned that good theory was hindering a solution. I may have highfalutin' theory 'bout 4-colorable graphs, 3-address machines (and triggernometry). Discard all of that if you wish to simultaneously read and write one byte of I/O with, for example, LDA $PP00,X where RegX is written somewhere in page $PP while RegA is simultaneously read. However, a multiplexer and a 256 byte I/O region remains a partial solution.

The final advance came from clock stretching. With arbitrary scale clock stretching, it is possible to pause a processor and run through a fixed number of states. This works because we do not treat the clock stretching circuit as a sealed unit. Instead, counter outputs may be decoded in a similar manner as address decode. In this case, +64 wait state is divided in the following manner:-

  • Sacrifice one wait state for video phase compatibility.
  • Write first nybble of RegX to SPI out using 74x157.
  • Wait 16 phases.
  • SPI clock transition.
  • Wait 16 phases.
  • Write second nybble of RegX to SPI out. Latch first nybble of SPI in.
  • Wait 16 phases.
  • SPI clock transition.
  • Wait 16 phases.
  • Processor resumes and reads 8 bits into accumulator.
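
The sequence above can be modelled in software. This is only a behavioural sketch, not the circuit: it assumes the high nybble goes first (which may need swapping, as noted below) and it models the remote device as a hypothetical callback which answers each SPI clock transition with one nybble.

```python
def exchange_byte(reg_x, peer_nybble):
    """Model one stretched bus cycle: send reg_x, return (byte_read, trace)."""
    trace = ["skip 1 phase (video phase compatibility)"]
    first_out = (reg_x >> 4) & 0xF   # assumption: high nybble first
    second_out = reg_x & 0xF

    trace.append(f"drive {first_out:X} via 74x157")
    trace.append("wait 16 phases")
    first_in = peer_nybble(first_out) & 0xF    # SPI clock transition
    trace.append("wait 16 phases")

    trace.append(f"drive {second_out:X}, latch {first_in:X}")
    latched = first_in                         # held until the processor resumes
    trace.append("wait 16 phases")
    second_in = peer_nybble(second_out) & 0xF  # SPI clock transition
    trace.append("wait 16 phases")

    trace.append("processor resumes and reads 8 bits")
    return (latched << 4) | second_in, trace

# Loopback peer: whatever is shifted out comes straight back.
value, steps = exchange_byte(0xA5, lambda nybble: nybble)
print(f"{value:02X}")
```

With a loopback peer, the value read into the accumulator equals the value held in RegX, which is a convenient self-test for any real implementation.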

I might have to swap nybbles or SPI clock polarity. Regardless, discrete implementation, excluding multiplexing, requires a maximum of eight chips, although some of these may be shared with other subsystems. Specifically, it requires two chips for clock stretching, two chips for state machine, two chips for buffered output and two chips for buffered input. This may form the basis of all system I/O except audio and video. Even here, cases are divided. Of most interest to me, I now have the option of implementing cell network channels using little more than shift registers per channel and common, multiplexed 4 bit or 8 bit FIFOs - one for input and one for output. This may share or replace all UART circuitry.

Nonetheless, I share GARTHWILSON's enthusiasm for MAX3100 SPI-to-UART - and not only because multiple units may be accessed via 65SIB or my own circuit design. To the uninformed, MAX3100 may look like a protocol bodge. However, SPI (low voltage, high speed, synchronously clocked serial, traveling a short distance where physics is suitably constrained) is interfaced to UART (self-clocked with one less wire, over longer distance, at lower speed, at higher voltage where the physics is less favorable). This is not a protocol bodge. It is a matter of practicality. With generalized serial I/O, use of such a device may be a corollary of other functionality.


Attachments:
quad-spi0-0-1.pdf [121.63 KiB]
Downloaded 52 times
quad-spi0-0-0.odg [8.85 KiB]
Downloaded 55 times

PostPosted: Thu May 27, 2021 12:28 pm 
In my quest for a general design, I overlooked the very obvious fact that MicroSD only has eight pins. With power, ground and clock, it is very obvious that MicroSD QSPI involves bus turn-around. I am therefore much more inclined to support bi-directional protocols, like I2C, if it also covers this case.

I have not investigated in detail but it has become apparent to me that the major source of MicroSD flakiness is an incredibly weak 16 bit checksum which may be downward compatible with 20MB IDE harddisk. This was known to be insufficient for 80GB and many consider MicroSD with less than 512GB to be laughably small. I would suggest using end-to-end 32 bit checksums or better on top of MicroSD. However, this only covers data. If data *and* meta-data use weak checksum then data may be scribbled anywhere (or nowhere) within the storage volume. Any significant use of MicroSD - without mirroring - is guaranteed to lose data. This may also be true for SATA and USB storage.
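
As a minimal sketch of an end-to-end check, each 512 byte sector could carry 508 bytes of payload and a 32 bit CRC. The layout, the names and the choice to mix the sector number into the CRC (so that a sector written to the wrong place fails verification) are my assumptions and not part of any MicroSD specification.

```python
import zlib

SECTOR = 512
PAYLOAD = SECTOR - 4  # hypothetical layout: payload followed by CRC-32

def seal(payload: bytes, lba: int) -> bytes:
    """Append a CRC-32 covering the sector number and the payload."""
    if len(payload) != PAYLOAD:
        raise ValueError("payload must be exactly 508 bytes")
    crc = zlib.crc32(lba.to_bytes(4, "little") + payload)
    return payload + crc.to_bytes(4, "little")

def verify(sector: bytes, lba: int) -> bool:
    """Check a sector as read back from the given sector number."""
    if len(sector) != SECTOR:
        return False
    crc = zlib.crc32(lba.to_bytes(4, "little") + sector[:PAYLOAD])
    return crc == int.from_bytes(sector[PAYLOAD:], "little")

sector = seal(bytes(PAYLOAD), lba=7)
print(verify(sector, lba=7), verify(sector, lba=8))
```

This detects corruption and misplaced sectors but it does not correct them. Mirroring or error correction would still be required for recovery.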

After publication, I re-considered the indexed write problem related to 6502 I/O. It is worth mentioning because my proposed design is in the corner case which is unaffected. A 6502 indexed read which incurs a carry from the address low byte to the high byte, or any indexed write (likewise for a 65816 in emulation mode), incurs a dead cycle which will read or write junk. This is not a problem if the target address is ROM or RAM. However, it may be disastrous for I/O. A read indexed from a page boundary always escapes this problem. Likewise for a read constrained within a page. The suggested design narrowly avoids this problem because it "reads to write" - and does so from a page boundary. This was one of the original design goals along with "It'll probably use a 74x157." However, I cannot remember everything at all times. Thankfully, my peculiar design choices continue to meet this constraint.
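
The page boundary argument can be checked exhaustively. In this sketch, $C000 stands in for the hypothetical I/O page $PP00 and the function only tests for a carry out of the address low byte. It does not model the processor.

```python
def crosses_page(base, x):
    """True if base,X carries from the address low byte into the high byte."""
    return (base & 0xFF) + (x & 0xFF) > 0xFF

IO_PAGE = 0xC000  # hypothetical stand-in for $PP00

# From a page boundary, no value of RegX can cause a carry.
print(any(crosses_page(IO_PAGE, x) for x in range(256)))

# From mid-page, the upper half of the index range does.
print(crosses_page(0xC080, 0x7F), crosses_page(0xC080, 0x80))
```

The first line printed is False for every page aligned base, which is why an indexed read from $PP00 avoids the dead cycle at a wrong address.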

While confirming that my design was not fatally flawed on 6502 or 65816, I found that 16 bit reads are possible on 65816. This was not an intended feature but it is a fortuitous bonus. Writes remain confined to 8 bit and may duplicate values. (Unless you wish to allocate 64KB or more of a 65816 memory map.) Reading 16 bit values to accumulator, possibly with no more than XBA opcode to byte swap IDE cruftiness, is exactly the type of operation which would benefit in a downwardly compatible manner from a processor feature test. In the common case of reading an even number of bytes on 65816, it is possible to do this one case faster - and do it in a manner which remains compatible with 6502.

PostPosted: Thu Nov 11, 2021 5:37 pm 
My investigation of serial protocols continues without bounds and now includes NES/SNES joypad protocol, WS2812B digital LEDs, KIM-1 Tape Format, SPDIF and SCN-485.

I wrongly assumed that PS/2 keyboard protocol was a full duplex UART protocol. That would be sensible. The 11 bit, half duplex protocol (clocked on opposite edges for input and output) looks like it was designed by a bunch of complicators for the specific purpose of peripheral incompatibility. Anyone attempting to implement this protocol has been duped. The sensible solution to this "Gordian knot" is to hack off the keyboard controller and replace it with one counter chip, two row decode chips and three shift registers.

My most significant find comes from Plagiarist Central. I quite enjoy reading Reddit's Arduino section and Ben Eater section; mostly because it is a constant "train wreck" of mis-applied technology. Occasionally, someone asks a question which saves me looking like the fool I am. Very occasionally, someone has a bright idea. Use of two 6522 chips to implement SPI is a rare pearl of wisdom which greatly changed my outlook.

When I joined the 6502 Forum, I regarded 6522 as an elegant design but otherwise an over-priced anachronism. I now believe that the ability to scale I/O with the 65xx/68xx bus protocol is greatly under-appreciated. In particular, system-on-chip designs get more expensive per pin as manufacturing yield decreases. Whereas, in the 1970s, I/O got less expensive per pin as the quantity of 6521 PIA, 6522 VIA, 6530 RRIOT, 6532 RIOT, 6551 ACIA and similar chips increased. One chip designs may be cheaper overall but we've definitely lost something along the way. This is especially true when a one chip design fails because the failure cannot be isolated or repaired.

Like violence and XML, if it doesn't work with one 6522, try increasing the quantity. GARTHWILSON has noted that one 6522 is insufficient to implement full duplex, single channel SPI in hardware. However, if CB1 is ganged, one 6522 may shift out while another 6522 may shift in. This allows, for example, easy interfacing with a MAX3100 UART. To reduce cost, one 6522 may be paired with a 74x595 serial to parallel shift register. To save space, a 0.3 inch wide DIP 74x595 may be placed under a socketed 0.6 inch wide 6522. It is an example of a mixed chip stack and chip stacking in various forms is a popular, ongoing topic on the 6502 Forum.

The addition of one 4052 allows clock and bi-directional data to be multiplexed across four channels. The set-up time for a 4052 is horrendous. However, a 4052 is rated to convey a 5MHz sine wave with minimal attenuation. It should be sufficient to convey square waves at 400kHz or considerably more. This generously allows one 6522 to switch between PS/2 keyboard, PS/2 mouse, I2C and half duplex, single channel SPI where one 6522 operates in a bi-directional mode and/or a subordinate 6522 or 74x595 provides the return data path. Optionally, 1-3 74x138 chips provide seven or more 65SIB ports. This resolves my concerns about the best grouping of incompatible protocols. PS/2, I2C and single channel SPI have stronger affinity than single/double/triple/quad channel SPI. And, yes, triple SPI exists in the form of the NES/SNES joypad protocol.

I hoped to use a 256 bit cell networking protocol for PAN/LAN/WAN. This covers a desktop peripheral bus which is intended to be more useful and easier to implement than ADB or USB. Following BigEd's advice that uni-directional streams are easier to implement, I found that input peripherals up to and including the complexity of the virtual reality Mattel Power Glove may variously extend NES/SNES joypad protocol and that such peripherals were developed exclusively for 6502/65816 systems. Going back to my original, unworkable idea of sending byte triples of Device-Command-Data, it is possible to make a 24 bit, daisy-chain, shift register arrangement if there is an explicit framing signal and errors are overcome through repetition. Furthermore, it is possible to implement this with three shift registers and zero firmware within a peripheral. Furtherfurthermore, it is possible to retroactively include SNES joypads if meta-data, such as device type and register multiplexing, is kept outside of the first 12 bits or so. Further yet, this may be implemented efficiently by ganging any number of 74x595 shift registers to one 6522. Indeed, one common 6522 may implement PS/2, I2C, 65SIB and SNES peripheral protocol. In this arrangement, one unused 65SIB channel may be used to clock all joypads, dance mats, hexpads, keyboards, mice and virtual reality gloves connected to the host - and there is no requirement to connect any particular device to any particular port. Understandably, PS/2 is not the preferred input protocol.
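
As a host-side sketch of the 24 bit frame, the triple can be packed into one word and repeated reads can be reconciled. The field order, the 8 bit field widths and the use of a bitwise majority over three reads are all my assumptions. The text above only says that errors are overcome through repetition.

```python
def pack(device, command, data):
    """Pack a Device-Command-Data triple into one 24 bit frame."""
    return ((device & 0xFF) << 16) | ((command & 0xFF) << 8) | (data & 0xFF)

def unpack(frame):
    """Split a 24 bit frame back into (device, command, data)."""
    return (frame >> 16) & 0xFF, (frame >> 8) & 0xFF, frame & 0xFF

def majority3(a, b, c):
    """Bitwise majority of three reads of the same frame."""
    return (a & b) | (a & c) | (b & c)

frame = pack(0x01, 0x02, 0x03)
noisy = frame ^ 0x000400           # one read with a single flipped bit
print(unpack(majority3(frame, noisy, frame)))
```

A simpler policy, such as discarding any frame which does not match its predecessor, may suit a joypad better because stale input is cheap to re-read.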

In Feb 2012, I downloaded a legal PDF version of the John Wiley & Sons book Advanced FPGA Design by Steve Kilts. In Oct 2021, I read it; partly to help along jfoucher and Proxy with their projects. Obviously, I should have read it sooner. Indeed, "Four hours in the lab often saves one hour in the library." In my case, I could have saved considerable effort with a working implementation of SPDIF. Sony/Philips Digital Interface Format is a 32 bit cell format intended for audio. It uses Biphase Mark Code which has the same channel capacity as Manchester encoding but has the advantage that it is polarity agnostic and that a violation of the pattern may be a frame marker. Three frame markers are defined: primary left channel, subsequent left channel and right channel. This is sufficient to convey stereo audio with 24 bit PCM - or significantly more channels in a downwardly compatible manner. However, it is also useful outside of this context. For example, BMC is suitable as the link layer of Ethernet and is compatible with a pulse transformer (and therefore works safely when there is an extremely high voltage gradient). Oh, and incidentally, the 24 bit PCM data of SPDIF tallies perfectly with my proposed 24 bit extension of SNES joypad protocol.

Biphase Mark Code is very much like a Boolean version of the Kansas City Tape Standard. Waves of single or double frequency convey zero or one. However, this is a stick-or-twist encoding which alternates with mandatory flips. As examples, 1111 is encoded as 10101010 (alternating bits) or the complement. 0000 is conveyed as 11001100 (alternating pairs) or the complement. 0101 is conveyed as 11010010 - a low frequency pair, a high frequency pair, another low frequency pair and another high frequency pair. Frame markers have three unwavering bits. Given that the shortest pulse and the longest pulse differ by no more than a factor of three and given that the shortest and longest duration pulses always occur within a frame marker, it is trivial to adjust the receiving speed by more than a factor of 1000. Furthermore, it is possible to achieve this with a Boolean grammar which never looks more than two bits ahead. This allows very compact FPGA implementation. In particular, a 6502/65816 system with audio/video FPGA could use SPDIF ports or similar for surround sound, LAN and, perhaps, input peripherals. Indeed, it is possible to make a very capable and accessible desktop computer which only has MicroSD ports, SNES ports, SPDIF ports and one VGA or HDMI connector. If the design is not wholly subsumed into one FPGA, this requires a maximum of four 40/44 pin chips and two 32 pin chips.
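
The three examples above can be reproduced with a few lines of code. This is a minimal sketch of the data encoding only: frame markers are omitted, the function names are my own and the starting line level is an assumption (the opposite level gives the complement).

```python
def bmc_encode(bits, level=0):
    """Encode a string of '0'/'1' as Biphase Mark Code half-cells."""
    out = []
    for bit in bits:
        level ^= 1              # mandatory flip at the start of every bit cell
        out.append(level)
        if bit == "1":
            level ^= 1          # extra mid-cell flip conveys a one
        out.append(level)
    return "".join(str(half) for half in out)

def bmc_decode(halves):
    """Decode half-cells: a cell whose halves differ is a one."""
    pairs = zip(halves[0::2], halves[1::2])
    return "".join("1" if a != b else "0" for a, b in pairs)

for word in ("1111", "0000", "0101"):
    print(word, bmc_encode(word), bmc_encode(word, level=1))
```

Note that the decoder only compares the two halves of each cell. It never inspects the absolute level, which is why the encoding is polarity agnostic.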

It may appear that I have deprecated my own network protocol through the advocacy of SNES protocol and SPDIF. However, SNES protocol is decidedly uni-directional and SPDIF only uses 3/8 channel capacity. A hypothetical bi-directional SPDIF connection could begin protocol negotiation at *any* speed from 4kHz to 4MHz, increase cell size and switch to (8,9) encoding and thereby double the channel capacity before increasing the bit rate. Or maybe skip a few steps along the way and have an FPGA implementation which only provides Rational DAC or SPDIF out for audio. The main problem is that trivial (8,9) encoding does not have the 3:1 pulse length guarantee. A maximum length run of bits may only be statistically likely. Regardless, it remains possible to statistically adjust bit rate such that matching occurs as desired.
