6502.org Forum

All times are UTC




Posted: Wed Apr 29, 2020 1:33 pm

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Problem: With traditional console protocols (eg. VT-100, WYSE, Acorn VDU), it is not possible to unambiguously determine whether or not you are in the middle of a control sequence, and thereby synchronise to the correct control/data sense if you begin receiving in the middle of a session. In a hobbyist context this could easily happen due to accidentally power-cycling your terminal device or jostling the serial cable.

With some console protocols, it is also necessary to maintain a table of how many bytes each command sequence consumes, even if that command sequence is not implemented in that particular terminal. This is an unnecessary complication and source of bugs for small, low-cost terminals. Some commands require a relatively large number of parameters, which may have moderately complex encodings.

Solution: UTF-8 to the rescue!

UTF-8 itself is designed as a self-synchronising protocol, in that it is unambiguous whether any given UTF-8 byte is a standalone codepoint (bit 7 clear), the beginning of a multi-byte sequence describing a codepoint (bits 7 and 6 both set), or the interior of a multi-byte sequence (bit 7 set, bit 6 clear). Additionally, several byte values are never found in a valid UTF-8 sequence containing only Unicode text; eg. $C0 and $C1 would represent the beginning of "overlong" encodings of 7-bit codepoints, which are defined as invalid. By defining our protocol as consisting of a modified UTF-8 sequence, we can exploit both of these properties.
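
The byte classes described above can be sketched as follows. This is only an illustration; the function names `classify` and `is_overlong_lead` are hypothetical and not part of the proposal.

```python
def classify(byte: int) -> str:
    """Classify one byte by its top two bits, as UTF-8 does."""
    if byte & 0x80 == 0:
        return "single"        # bit 7 clear: standalone 7-bit value
    if byte & 0x40:
        return "lead"          # bits 7 and 6 set: starts a multi-byte sequence
    return "continuation"      # bit 7 set, bit 6 clear: interior byte


def is_overlong_lead(byte: int) -> bool:
    """$C0 and $C1 can only begin overlong two-byte encodings of $00-$7F."""
    return byte in (0xC0, 0xC1)
```

A receiver that joins mid-stream can apply `classify` to each incoming byte without any history, which is the self-synchronising property being relied on.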

The UTF-8 encoding can accommodate codepoints that are up to 31 bits long (as 6-byte sequences), although there are spare byte values that could theoretically extend this to 36 or even 42 bits (7 and 8 byte sequences). This means we can use UTF-8 itself as a variable-length encoding for command parameters, freeing us from the tyranny of having to make early design tradeoffs in that area.
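
As a rough sketch of using the original 1- to 6-byte UTF-8 scheme as a variable-length integer encoding: the function name `encode_param` and the `min_len` argument are assumptions made for illustration. `min_len=2` forces an overlong form so that every byte of a small value has bit 7 set.

```python
# (lead-byte marker, payload bits) for 1- to 6-byte sequences
_FORMS = [(0x00, 7), (0xC0, 11), (0xE0, 16), (0xF0, 21), (0xF8, 26), (0xFC, 31)]


def encode_param(value: int, min_len: int = 1) -> bytes:
    """Encode a non-negative integer in the shortest form of at least min_len bytes."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for n, (marker, bits) in enumerate(_FORMS, start=1):
        if n >= min_len and value < (1 << bits):
            # Continuation bytes carry 6 bits each, least significant group first here.
            tail = [0x80 | ((value >> (6 * i)) & 0x3F) for i in range(n - 1)]
            lead = marker | (value >> (6 * (n - 1)))
            return bytes([lead] + tail[::-1])
    raise ValueError("value needs more than 31 bits")
```

For example, `encode_param(0xE9)` gives the ordinary two-byte form `C3 A9`, while `encode_param(5, min_len=2)` gives the overlong form `C0 85`.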

Design Theorem: Commands begin with a byte below 32. Subsequent bytes of a multi-byte command sequence will always have bit 7 set (using overlong encodings if required). Each command sequence is followed by a byte with bit 7 clear, which may itself be the beginning of a new command sequence. Therefore, the protocol is self-synchronising with respect to command sequences.
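
One way a receiver might apply this rule is sketched below. The name `split_stream` and the event tuples are hypothetical, and NUL is treated as never taking parameters, following its description as a one-byte command. High-bit bytes seen before the first bit-7-clear byte are reported as ambiguous rather than interpreted.

```python
def split_stream(data: bytes) -> list:
    """Split a byte stream into events without a table of command lengths."""
    events = []
    i, n = 0, len(data)
    # Before the first byte with bit 7 clear we cannot tell text from parameters.
    while i < n and data[i] & 0x80:
        i += 1
    if i:
        events.append(("unsynced", data[:i]))
    while i < n:
        first = data[i]
        j = i + 1
        while j < n and data[j] & 0x80:   # gather following high-bit bytes
            j += 1
        if 0 < first < 0x20:
            events.append(("cmd", first, data[i + 1:j]))   # command + raw parameters
        elif first == 0:
            events.append(("cmd", 0, b""))                 # NUL takes no parameters
            if j > i + 1:
                events.append(("text", data[i + 1:j]))     # non-ASCII text after NUL
        else:
            events.append(("text", data[i:j]))             # ASCII byte plus any non-ASCII text
        i = j
    return events
```

Note that the parser never needs to know how many parameters an unimplemented command takes; it simply skips to the next byte with bit 7 clear.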

With the above in mind, we can define a generic console protocol with basic capabilities, which can be extended at will. The same encoding is used in both directions, although most commands are meaningful only for the host-to-terminal direction. Receivers must be prepared to receive invalid sequences of any kind, and should resynchronise if that occurs. Receivers should also be prepared to receive parameters that are out of range for their capabilities, or are encoded oversize, and to process them in some reasonable manner.

  • $00 - NUL - A one-byte command which does nothing. Can be used to guarantee that the end of a command sequence is detected. Should be inserted between a multi-byte command sequence and a printable character from outside the ASCII range. Should also be the first byte sent by both host and terminal after each is reset.
  • $05 - ENQ - Enquiry. Used to obtain status and capability readback from the other device. Replies shall use the ACK command format. The only mandatory form of this command has one parameter, an encoded version of ENQ, which requests a human-readable identifier of the other device. Other values for this parameter allow determining whether commands are supported (by whether an ACK or a NAK is returned), and for reading back state settable by a command (eg. use HT to obtain the current column of the text cursor).
  • $06 - ACK - Acknowledge. Contains the response to any command requiring one. The first parameter contains the command code which triggered the ACK. The remaining parameters, if any, are specified by the triggering command.
  • $07 to $0D - BEL to CR - Standard ASCII meanings. On the terminal, cursor movement occurs without erasure. The terminal sends CR to indicate that the Return key was pressed, and codes $08 to $0B to indicate that cursor keys were pressed.
  • $15 - NAK - Negative Acknowledge. Indicates that a command was not recognised. Should not be sent for invalid encodings, only for valid encodings of unknown meaning. There must be at least one parameter, containing the triggering command code. If some form of that command code is supported, further parameters should describe the command up to the point of invalidity.
  • $16 - SYN - Synchronise. May take any number of parameters. The receiver must respond with an ACK command with the SYN reason, followed by the given parameters verbatim.
  • $1A - SUB - Reinitialise display. All persistent state is reset to the default condition; the entire display is cleared and made available for text output.
  • $1B - ESC - In the terminal-to-host direction, indicates that the Escape key was pressed. In the host-to-terminal direction, may be used to convey a legacy VT escape sequence, with the suffix characters in the sequence conveyed as parameters.
  • $20 to $7E - Printable ASCII characters. The terminal sends these to represent keypresses. The host sends them to be displayed at the cursor, which is then advanced to the next position on the same line. If a character would extend beyond the edge of the display if printed at the current cursor, the cursor is first moved to the beginning of the next line, scrolling the text area of the display if necessary to create a new line to display on.
  • $7F - DEL - Backspaces the cursor and erases the character there.
  • $80 to $BF - UTF-8 multibyte continuation bytes. Each carries the next-most-significant 6 bits of the codepoint in its low 6 bits.
  • $C0 and $C1 - Start byte of overlong UTF-8 encodings of values $00-$7F. May be used in parameters to commands.
  • $C2 to $DF - Begins valid UTF-8 two-byte encodings of values $080-$7FF. May be used both as parameters to commands and as printable text.
  • $E0 to $EF - Begins valid UTF-8 three-byte encodings of values $0800-$FFFF. May be used both as parameters to commands and as printable text.
  • $F0 to $F7 - Begins UTF-8 four-byte encodings of values $010000-$1FFFFF, some of which are valid Unicode codepoints. May be used both as parameters to commands and, where valid, as printable text.
  • $F8 to $FB - Begins UTF-8 five-byte encodings of values $200000-$3FFFFFF. May be used in parameters to commands.
  • $FC and $FD - Begins UTF-8 six-byte encodings of values $4000000-$7FFFFFFF. May be used in parameters to commands.
  • $FE and $FF - Reserved for future expansion.
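
The lead-byte ranges in the list above translate fairly directly into a decoder. The following is a minimal sketch under that reading; `seq_length` and `decode_one` are hypothetical names, and overlong forms are deliberately accepted because the protocol uses them for parameters.

```python
def seq_length(lead: int) -> int:
    """Total sequence length implied by a first byte; 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1
    if lead < 0xC0:
        return 0          # $80-$BF: continuation byte
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5
    if lead < 0xFE:
        return 6
    return 0              # $FE and $FF: reserved


def decode_one(data: bytes, pos: int = 0):
    """Return (value, payload_bits, next_pos) for the sequence starting at pos."""
    n = seq_length(data[pos])
    if n == 0:
        raise ValueError("not a valid first byte")
    if n == 1:
        return data[pos], 7, pos + 1
    if pos + n > len(data):
        raise ValueError("truncated sequence")
    value = data[pos] & (0x7F >> n)        # payload bits of the lead byte
    for b in data[pos + 1:pos + n]:
        if b & 0xC0 != 0x80:
            raise ValueError("expected continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value, 5 * n + 1, pos + n       # 11, 16, 21, 26 or 31 payload bits
```

Returning the payload width alongside the value lets a caller sign-extend signed parameters from the width the sender chose.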

Encoded parameters may be either signed or unsigned integers, as defined for each command. Signed integers use two's complement representation, and are sign-extended from however many bits are specified in the UTF-8 encoding chosen by the sender. For example, in the six-byte encoding, an $FD initial byte would be used for negative numbers, and $FC for non-negative numbers.
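
The sign-extension rule can be illustrated as below. This is a sketch only; `sign_extend` and `to_field` are hypothetical helper names, and `bits` is the payload width of the chosen encoding (11, 16, 21, 26 or 31 for the 2- to 6-byte forms).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret a bits-wide field as a two's complement signed integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def to_field(number: int, bits: int) -> int:
    """Truncate a signed integer to a bits-wide two's complement field."""
    if not -(1 << (bits - 1)) <= number < (1 << (bits - 1)):
        raise ValueError("number does not fit in the chosen encoding")
    return number & ((1 << bits) - 1)
```

In the six-byte form the top payload bit is the low bit of the lead byte, so a field with that bit set has an $FD lead byte and decodes as negative: `sign_extend(0x7FFFFFFF, 31)` is -1.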


Posted: Wed Apr 29, 2020 2:38 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10827
Location: England
Interesting!

I think, then, that if an unsynchronised decoder starts off seeing bytes with the high bit set, it needs to ignore them in case they are command parameters - but they could be examples of "printable character from outside the ASCII range." (Or do I misunderstand?)

Once synchronised, there's no ambiguity.


Posted: Wed Apr 29, 2020 2:45 pm

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
A terminal coming online mid-stream could print a few bogons if it sees the middle of a command, but will reliably synchronise to the command stream at the end of that command. It could send a SYN command to alert the host that it might not be in the expected state, and then rely on the host to fix things up if it sees fit.

A host in the same situation should use the SYN command, and ignore any high-order character input until the corresponding ACK is seen. Spurious input is more dangerous than spurious output.


Posted: Wed Apr 29, 2020 9:14 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8214
Location: Midwestern USA
Chromatix wrote:
Problem: With traditional console protocols (eg. VT-100, WYSE, Acorn VDU), it is not possible to unambiguously determine whether or not you are in the middle of a control sequence, and thereby synchronise to the correct control/data sense if you begin receiving in the middle of a session. In a hobbyist context this could easily happen due to accidentally power-cycling your terminal device or jostling the serial cable.

Speaking as one who has installed and set up hundreds of host-based systems, as well as installed and set up thousands of terminals (and has debugged countless serial interfaces), I respectfully opine you are describing a non-problem.

Assuming the serial interface has proper flow control set up (meaning CTS/RTS, not XON/XOFF) an unplugged cable will automatically cause CTS to be deasserted at both ends, completely stopping all activity. Similarly, if the terminal is off-line for any reason (not powered, in setup mode, etc.) it will deassert its RTS output, which will deassert the host's CTS, causing the host to cease transmitting. The worst that could happen is a couple of boo-boos might appear on the screen when the terminal is powered (when in setup mode, all terminals with which I have experience deassert RTS, thus halting transmission from the host).

Quote:
$00 - NUL - A one-byte command which does nothing. Can be used to guarantee that the end of a command sequence is detected. Should be inserted between a multi-byte command sequence and a printable character from outside the ASCII range. Should also be the first byte sent by both host and terminal after each is reset.

In general, nulls are best avoided in serial communications. In some circumstances, a null may look like a break to the host if a framing error occurs and missing stop bit detection is not implemented. If the host poorly handles a break there is no telling what will happen next (on my POC units, reception of a break causes a spurious interrupt). Avoidance of nulls in the data stream was one of several reasons for the development of data transfer protocols such as Motorola S-record, Intel hex, XMODEM, etc.

Quote:
$16 - SYN - Synchronise. May take any number of parameters. The receiver must respond with an ACK command with the SYN reason, followed by the given parameters verbatim.

A solution in search of a problem.

Quote:
$7F - DEL - Backspaces the cursor and erases the character there.

That's not how <DEL> works. Your suggestion would be contrary to long-accepted industry standards.

In strict terms, <DEL> erases the character under the cursor, and does nothing else (back in my Tele-Type days, we referred to <DEL> as "rub-out," since that was its effect). The behavior you describe is a "destructive backspace," not a delete, and has to be synthesized in software by a sequence of procedures that no terminal with which I am familiar can do in hardware.

Quote:
Solution: UTF-8 to the rescue!

Yet another solution in search of a problem.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Apr 29, 2020 11:50 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I can see situations where it may be advantageous to know where in a command data stream a system finds itself on enabling its receivers and coming out of reset.

Since you're willing to utilize UTF-8, and all of the processing that entails, how about simply considering a MARK parity to indicate the message payload, and a SPACE parity to mark the beginning of a message? This processing can all be built into the interrupt service routines that receive/transmit characters.

_________________
Michael A.


Posted: Thu Apr 30, 2020 12:35 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8214
Location: Midwestern USA
MichaelM wrote:
I can see situations where it may be advantageous to know where in a command data stream a system finds itself on enabling its receivers and coming out of reset.

I've been working with this stuff for 50 years and have yet to see a case where that mattered. Again, with proper hardware flow control, the host will not send anything to a receiver that is not on-line.

Quote:
Since you're willing to utilize UTF-8, and all of the processing that entails, how about simply considering a MARK parity to indicate the message payload, and a SPACE parity to mark the beginning of a message? This processing can all be built into the interrupt service routines that receive/transmit characters.

Modern UARTs cannot easily switch data formats mid-stream due to the use of a transmitter FIFO (NXP, for example, specifically cautions against switching format unless the transmitter FIFO is empty and the transmitter itself has been disabled). In order to be able to use parity as a message-synchronization device, the host would have to disable the FIFO and resort to datum-at-a-time processing, all the while keeping track of what is in the data stream so as to determine when to switch data formats. Naturally, the receiver would likewise have to switch formats at the right time, otherwise it would be processing gibberish.

From the host's perspective, disabling the FIFO would be terribly inefficient. The whole point of having a FIFO is to reduce the interrupt burden on the MPU. Lacking the FIFO, the UART will have to interrupt after each datum has been serialized and sent. The whole system would bog down to the speed at which the UART can dispose of datums.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Apr 30, 2020 9:09 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10827
Location: England
[See my comment elsewhere: if everyone could just for a moment stop trying to prevail, and let ideas stand on their own merit, and allow anyone who finds something worth exploring to do their exploring, I suspect we'd all be happier.]


Posted: Fri May 01, 2020 1:42 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
So, I had to check this to be certain, but the behaviour I specified for DEL ($7F) is actually standard behaviour - for the BBC Micro, just like several other codes I included. I think in a 6502 context, that's a pretty common source to crib from. The backspace-and-rubout behaviour is what you'd expect to happen when pressing the corresponding key, just so long as you're not a member of a design committee (or, perhaps, working with mechanical teletypes or punched cards).

This behaviour has also survived to the present day in the Acorn lineage, in the form of RISC OS Open, which you can run natively on a Raspberry Pi. So it's a de-facto standard, just not one that ran on your average minicomputer.

It is possible that including the SYN command is overkill, but it's easy to implement the response for, and you would only want to send it if you found a need for it. The payload is a nonce so that you can disambiguate the corresponding ACK from any previously queued ACKs, which might have been sitting around due to the RTS/CTS lines being disconnected.
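
A sketch of how the nonce might be used follows. It assumes a single nonce parameter below $800 sent in two-byte form, and an ACK laid out as the ACK code, the triggering command code (SYN) as its first parameter, then the SYN parameters verbatim. All function names are hypothetical.

```python
import secrets

SYN, ACK = 0x16, 0x06


def two_byte(value: int) -> bytes:
    """Encode 0..$7FF as two bytes (overlong below $80) so bit 7 is always set."""
    if not 0 <= value < 0x800:
        raise ValueError("value out of range for the two-byte form")
    return bytes([0xC0 | (value >> 6), 0x80 | (value & 0x3F)])


def make_syn(nonce: int) -> bytes:
    """SYN command carrying the nonce as its only parameter."""
    return bytes([SYN]) + two_byte(nonce)


def expected_ack(nonce: int) -> bytes:
    """ACK whose first parameter is the SYN code, followed by the nonce verbatim."""
    return bytes([ACK]) + two_byte(SYN) + two_byte(nonce)


def is_matching_ack(reply: bytes, nonce: int) -> bool:
    """True only for the ACK answering our SYN, not a stale queued one."""
    return reply == expected_ack(nonce)


nonce = secrets.randbelow(0x800)   # fresh nonce for each SYN
syn_bytes = make_syn(nonce)
```

A stale ACK left over from an earlier SYN would carry a different nonce and so fail `is_matching_ack`.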

And of course there is plenty of room left to add support for additional cursor controls, font redefinition, graphics, and even file transfers.


Posted: Fri May 01, 2020 2:02 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Chromatix wrote:
The backspace-and-rubout behaviour is what you'd expect to happen when pressing the corresponding key, just so long as you're not a member of a design committee (or, perhaps, working with mechanical teletypes or punched cards).

The use of DEL that I'm familiar with is on paper tape, where you can punch it over any other character to "delete" that character, basically creating a NOP. (Well, 7-bit-wise, anyway.) My feeling would be that it should be made a printing character that clearly indicates "the character at this position has been rubbed out" (the Apple IIe DEL glyph is good for this), but then again, printing terminals (which cannot erase what's already printed) are a concern of mine, which probably makes me quite outside the mainstream in this day and age.

For output to a VDT, I'd probably just use a BS-Space-BS sequence because that's easy and works reliably. For input of the "undo last char input" action I have a probably irrational very strong preference for backspace doing this. But of course in the real world people set it to whatever they like using stty.
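
To make the BS-Space-BS idea concrete, here is a toy single-line display model. It is purely illustrative: `render` is a hypothetical helper that understands only BS and printable ASCII.

```python
def render(stream: bytes, width: int = 16) -> str:
    """Apply BS ($08) and printable ASCII to a one-line display buffer."""
    cells = [" "] * width
    col = 0
    for b in stream:
        if b == 0x08:
            col = max(col - 1, 0)          # non-destructive backspace
        elif 0x20 <= b <= 0x7E and col < width:
            cells[col] = chr(b)            # overwrite the cell, then advance
            col += 1
    return "".join(cells).rstrip()


RUBOUT = b"\x08 \x08"                      # back up, blank the cell, back up again
```

With this model, `render(b"cat" + RUBOUT + b"r")` yields `car`: the `t` is blanked and the cursor is left where the replacement character goes.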

Arguing about the "correct" or "standard" interpretation of ASCII control characters seems to me an utterly lost cause; so many have been interpreted in so many different ways over the last fifty years that you can now generate a plausible argument with a lot of support for seemingly almost any interpretation. (And this is why stty has so many options.) Certainly the ASCII committee wasn't making any real attempt to define this stuff; they didn't even come up with a standard for what character or characters one uses to separate or terminate a line.

I think that this is an interesting and worthwhile discussion, though, because even if it doesn't result in a new standard, it's helpful to get an understanding of the problems we need to deal with when using existing systems.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Fri May 01, 2020 2:14 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
BigDumbDinosaur wrote:
In general, nulls are best avoided in serial communications.

Well, some would disagree with you. For many terminals (particularly printing ones), it's the standard padding character used to create time delays.

Quote:
Avoidance of nulls in the data stream was one of several reasons for the development of data transfer protocols such as Motorola S-record, Intel hex, XMODEM, etc.

I'm not buying that. First, XMODEM does not avoid nulls at all; if you have a $00 byte to send it sends it as-is. Further, even if your data has no $00 bytes in it, XMODEM itself might generate a $00 byte for the checksum for certain packets.
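
A quick illustration of the checksum point, assuming the original XMODEM checksum mode (the 8-bit sum of the 128 data bytes, not the later CRC variant); `xmodem_checksum` is a hypothetical helper name.

```python
def xmodem_checksum(block: bytes) -> int:
    """8-bit arithmetic sum of a 128-byte XMODEM data block."""
    if len(block) != 128:
        raise ValueError("XMODEM data blocks are 128 bytes")
    return sum(block) & 0xFF


block = bytes([0x02]) * 128        # no $00 bytes anywhere in the data
checksum = xmodem_checksum(block)  # 128 * 2 = 256, which wraps to $00
```

Here the payload contains no nulls at all, yet the checksum byte sent on the wire is $00.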

Motorola S-record and Intel hex format seem to me pretty clearly to have other design considerations much more important than just avoiding NULs, such as avoiding any non-printing characters other than line termination, ease of parsing, human readability, not to mention including address information along with the data. I've not seen any evidence that avoiding NUL was considered to be anything different from avoiding non-printing chars in general.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Fri May 01, 2020 2:34 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
One useful property of the above spec is that, when bringing up an SBC, you can ignore all of the multibyte control messages and just send 7-bit ASCII. The terminal will know that these do not form parameters to commands, and won't have any reason to send a SYN for you to ACK. It's actually quite pleasing.

However, I'm going to have to put this (and everything else) on hold for a bit, as I have something to work on in my day job.


Posted: Fri May 01, 2020 4:00 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Chromatix wrote:
$07 to $0D - BEL to CR - Standard ASCII meanings. On the terminal, cursor movement occurs without erasure. The terminal sends CR to indicate that the Return key was pressed, and codes $08 to $0B to indicate that cursor keys were pressed.

BS/$08 for cursor left is a nice overloading, and not uncommonly used in other systems (Apple II, for example). The same goes for LF/$0A as cursor down, though I don't know if this is as common an overloading.

But that leaves TAB/$09 and VT/$0B for cursor right and cursor up. The latter's not a big deal, being little used, but a lot of terminals have a tab key and it's often well-used in programming editors, both to generate an actual tab character and as an indentation command.

I suggest leaving TAB/$09 alone and instead using the following assignments:
Code:
$08  BS  Ctrl-H  ←
$0A  LF  Ctrl-J  ↓
$0B  VT  Ctrl-K  ↑
$0C  FF  Ctrl-L  →

This has the disadvantage of conflicting with the not uncommon (though not overwhelmingly common) convention of FF/$0C (form feed) as indicating "clear screen and reset cursor to upper left-hand corner." But on the other hand, it has the advantage of giving you one of the (arguably the) best-known mappings of letters to directions in the world, usable simply by holding down the control key. And it's even compatible with a particularly popular series of 1970s terminals, as you can see from the attached photo of the ADM-3A keyboard. (The control codes sent to the terminal to do cursor motions were exactly the above, as well.)

While you're at it, it might or might not be worth defining RS/$1E (Ctrl-^ or Ctrl-~) as "move cursor to upper left corner" as the ADM-3A did.
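
The relationship between these control codes and the control key can be shown in a couple of lines: holding Ctrl conventionally keeps only the low five bits of the key's ASCII code. The names `ctrl`, `ARROWS` and `HOME` are hypothetical.

```python
def ctrl(key: str) -> int:
    """Control code produced by Ctrl plus the given key (low five bits of its ASCII code)."""
    return ord(key.upper()) & 0x1F


# The suggested cursor assignments, derived from the letters H, J, K, L.
ARROWS = {
    "left": ctrl("H"),    # $08 BS
    "down": ctrl("J"),    # $0A LF
    "up": ctrl("K"),      # $0B VT
    "right": ctrl("L"),   # $0C FF
}

HOME = ctrl("^")          # $1E RS, the ADM-3A "home" code mentioned above
```

By the same rule Ctrl-I gives $09, which is why the Tab key and Ctrl-I are indistinguishable on such keyboards.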


Attachments:
LSI-ADM3A-full-keyboard.jpg [ 380.6 KiB ]

_________________
Curt J. Sampson - github.com/0cjs
Posted: Fri May 01, 2020 5:09 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
For me, that just trades one incompatibility for another. Given the opportunity, I'd rather have the meanings of input characters match those of outputs.

On the BBC Micro, four high-order ASCII codes are produced by the cursor keys ($88 to $8B, with an obvious relationship to the output codes), while Tab produces code $09. However, the Tab key is not given any useful meaning in BASIC and is only really used by applications. And the UTF-8 encoding implies that these high-order codes should map to printable characters.

Turning instead to the VDU output codes, code 31 ($1F) begins the TAB command sequence, used to position the text cursor arbitrarily. I might as well have the Tab key produce that code. I think there are other keyboard input problems to consider later, too - such as function and navigation keys, and querying whether a key is held down.


Posted: Fri May 01, 2020 5:45 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Chromatix wrote:
For me, that just trades one incompatibility for another. Given the opportunity, I'd rather have the meanings of input characters match those of outputs.

I was suggesting that Ctrl-H/J/K/L have the same meanings for both input and output. Perhaps I'm missing something here?

And I don't really see it as an even trade, since I think very few systems must use H/J/K/I as cursor movement keys (I've never heard of one).

Quote:
On the BBC Micro, four high-order ASCII codes are produced by the cursor keys ($88 to $8B, with an obvious relationship to the output codes), while Tab produces code $09.

If I'm correctly understanding your aim, you'd change the software interface so that someone typing on the keyboard would be sending $08/$09/$0A/$0B codes when they used the arrow keys, though, right?

Quote:
Turning instead to the VDU output codes, code 31 ($1F) begins the TAB command sequence, used to position the text cursor arbitrarily. I might as well have the Tab key produce that code.

You may not be able to on a lot of hardware out there. On the Apple IIe/IIc, for example, where one reads "ASCII" codes from the $C000 keyboard input address, the Tab key will always produce code $09. I've seen plenty of other keyboards with similar hard-wiring of the Tab key, sometimes to the level where one would need to cut traces and add jumpers to the keyboard PCB to change this.

And of course, with terminals on serial lines, even if you can reprogram the terminal's firmware or your terminal program to produce a different code for the Tab key, users may not be terribly willing to do this.

But again, maybe I'm misunderstanding where you're going with this.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Fri May 01, 2020 5:59 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Generally, a keyboard does not itself produce ASCII codes. It produces scancodes, which usually have little if any relation to ASCII. You have to translate them in software, and usually the OS can do that for you.

On the BBC Micro, by default the cursor keys actually produce no ASCII code at all. Instead they are trapped to perform an editing function. But if you issue the command to disable that function, they are translated to special codes which are clearly related to VT, HT, BS, LF - just with bit 7 set. You have to remember that the BBC Micro is not based on a serial terminal, but has a built-in keyboard. The scancodes for these keys, if you want to test them directly, are -58, -42, -26, -122; apparently chaotic.

The low-cost serial terminal hardware this project was inspired by interfaces to a PC-type keyboard. The scancode system in those is completely ridiculous. But we can map it to ASCII and/or some extended system in any way we like. Some of the keys will have no direct ASCII equivalent, and will need to be handled specially.

