Historical question about ASCII

Posted: Tue May 02, 2017 2:42 am
by Dan Moos
I've always wondered this.

Is there a reason that the ASCII codes 0-9 do not map to the corresponding characters? As in ASCII 0 is "0", 1 is "1", etc.

It's just an annoyance to deal with when converting strings to numbers today, but on the computers of yesterday, wouldn't the extra step of subtracting "0" from the characters to make the numbers match have been a bigger deal computation-wise?

The only problem I see with having "0" be 0 is that you couldn't have a NULL character. Is that reason enough? As I write this, that small thing does seem like a good reason, I guess.

Anyone know how it came to be as it is?

Re: Historical question about ASCII

Posted: Tue May 02, 2017 4:53 am
by GARTHWILSON
I sure wish the A came right after the 9, so you wouldn't have to test in the conversion of hex numbers, just subtract $30.
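The test Garth alludes to can be sketched like this (Python for illustration; the helper name `hex_digit_value` is made up for this sketch):

```python
def hex_digit_value(ch: str) -> int:
    """Convert one ASCII hex digit to its numeric value.

    Because 'A' ($41) does not directly follow '9' ($39) in ASCII,
    subtracting $30 alone is not enough: letters need an extra
    adjustment, which is the range test referred to above.
    """
    code = ord(ch)
    value = code - 0x30                   # correct for '0'-'9' ($30-$39)
    if value > 9:                         # must be a letter, so adjust
        value = (code & ~0x20) - 0x41 + 10  # fold to uppercase, map 'A' -> 10
    return value
```

Had the standard placed "A" at $3A, the single subtraction would have sufficed for all sixteen digits.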

Re: Historical question about ASCII

Posted: Tue May 02, 2017 5:30 am
by Arlet
GARTHWILSON wrote:
I sure wish the A came right after the 9, so you wouldn't have to test in the conversion of hex numbers, just subtract $30.
That wouldn't help me, because I generally prefer lower case :D

Re: Historical question about ASCII

Posted: Tue May 02, 2017 5:30 am
by BigDumbDinosaur
Dan Moos wrote:
I've always wondered this.

Is there a reason that the ASCII codes 0-9 do not map to the corresponding characters? As in ASCII 0 is "0", 1 is "1", etc.

It's just an annoyance to deal with when converting strings to numbers today, but on the computers of yesterday, wouldn't the extra step of subtracting "0" from the characters to make the numbers match have been a bigger deal computation-wise?

The only problem I see with having "0" be 0 is that you couldn't have a NULL character. Is that reason enough? As I write this, that small thing does seem like a good reason, I guess.

Anyone know how it came to be as it is?
It's a long story and I suggest you do some searching, starting with Baudot code, the distant ancestor of ASCII that was intended for use with nineteenth-century telegraph systems. Also, read about teleprinters, which were an essential tool of news services such as AP and Reuters for many years.

Despite the seeming incongruities of the ASCII character set, there is a method to the madness. Only those of us who have been around long enough to have worked with Teletype machines and Friden Flexowriters consider ASCII to be 100 percent logical. :D

Re: Historical question about ASCII

Posted: Tue May 02, 2017 5:32 am
by Arlet

Re: Historical question about ASCII

Posted: Tue May 02, 2017 10:55 am
by Rob Finch
I was wondering at one point why the newer Unicode standard doesn't support things that appear in a keyboard stream, like cursor controls. I got talking to a Unicode expert about it one day and they seemed to have a reasonable explanation. So I made up my own set of virtual keycodes for one project.
http://www.finitron.ca/Documents/virtual_keycodes.html
I'd like to know: what is the standard for virtual keycodes?
ASCII is an older code which is great when characters fit into six or eight bits. But for any apps that need to be internationalized, a wide code like Unicode is required.

Re: Historical question about ASCII

Posted: Tue May 02, 2017 9:22 pm
by BigDumbDinosaur
Rob Finch wrote:
ASCII is an older code which is great when characters fit into six or eight bits. But for any apps that need to be internationalized a wide code like Unicode is required.
Actually, there is only one form of ASCII, called "US-ASCII," and that is seven bits to the datum. ASCII does not define the meanings of data that are in the range $80 to $FF, inclusive. ASCII was strictly a product of the forerunner of the American National Standards Institute, hence the US-ASCII moniker. INCITS uses that reference to avoid confusion with informal extensions to the ASCII set, as well as ASCII-like enhancements, such as Unicode.

Unicode produces a bulkier data stream than ASCII and in situations in which the alphanumeric set plus punctuation and control codes is all that is needed, ASCII will be substantially more efficient and economical of bandwidth. For example, transmitting binary data in Intel or Motorola hex form can be done solely with seven bit ASCII, using only numerals, uppercase letters A-F and a few control codes (typically <CR>, <LF> and <EOT>). Western languages that use only the Latin alphabet are transmittable in ASCII and even when lacking some diacritical marks, are usually intelligible to native speakers. Unicode was primarily developed to handle Latin characters with diacritical marks, such as ü, å, etc., localized characters, such as Æ, as well as the characters found in non-Latin alphabets, e.g., Cyrillic.
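As a sketch of that point, an Intel HEX data record can be built entirely from the 7-bit ASCII subset (Python for illustration; `intel_hex_record` is a hypothetical helper, following the standard byte-count / address / record-type / data / checksum layout):

```python
def intel_hex_record(address: int, data: bytes) -> str:
    """Build one Intel HEX data record (type 00) from raw bytes.

    Every character of the result is drawn from the 7-bit ASCII set:
    ':', the digits 0-9 and the uppercase letters A-F.
    """
    # Record body: byte count, address high/low, record type 00, then data.
    record = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, 0x00]) + data
    # Checksum is the two's complement of the sum of all body bytes.
    checksum = (-sum(record)) & 0xFF
    return ":" + (record + bytes([checksum])).hex().upper()
```

Framing the transmission only needs a few control codes on top of this, e.g. <CR> and <LF> between records.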

Incidentally, one of the reasons the control codes in the ASCII set are $00-$1F is that the mechanism in Teletypes was arranged to recognize the low bit patterns as control functions, not printing characters. As ASCII evolved, this characteristic was accommodated so Teletypes could be used as computer I/O devices. The spread between "0"-"9", "A"-"Z" and "a"-"z" exists because two bits decide whether the character set will be numerals, uppercase letters or lowercase letters. It all makes sense in the context of doing case conversion or determining if a user typed a numeral or a letter of either case.
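A rough illustration of those bit patterns (Python; the helper names are invented for this sketch):

```python
def ascii_upper(ch: str) -> str:
    """Case conversion is a single-bit operation in ASCII: bit 5 ($20)
    is the only difference between 'a'-'z' ($61-$7A) and 'A'-'Z' ($41-$5A)."""
    code = ord(ch)
    if 0x61 <= code <= 0x7A:       # 'a'-'z'
        code &= ~0x20              # clear bit 5 -> uppercase
    return chr(code)

def ascii_class(ch: str) -> str:
    """Classify a character by which column of the ASCII chart it sits in."""
    code = ord(ch)
    if 0x30 <= code <= 0x39:
        return "digit"
    if 0x41 <= code <= 0x5A:
        return "upper"
    if 0x61 <= code <= 0x7A:
        return "lower"
    return "other"
```

On a 6502 the same ideas reduce to an AND #$DF for case folding and a pair of compares for classification.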

Re: Historical question about ASCII

Posted: Tue May 02, 2017 9:38 pm
by BigEd
Unicode, when encoded in UTF-8, and carrying ASCII, is no more bulky - unless you regard 8 bits as more than 7 bits, which you might!

Re: Historical question about ASCII

Posted: Tue May 02, 2017 9:56 pm
by BigDumbDinosaur
BigEd wrote:
Unicode, when encoded in UTF-8, and carrying ASCII, is no more bulky...
Correct on UTF-8, which parallels ASCII, but supports the $80-$FF range. I was referring to UTF-16.
Quote:
- unless you regard 8 bits as more than 7 bits, which you might!
Funny you mention that. We think in terms of eight bits to the byte, yet in serial communications, seven bits continue to be used in setups that use only ASCII, hand-held serial bar code scanners being one such case. As another example, I have here in my office a Welch Allyn ST6980 magnetic stripe reader (MSR, aka credit card reader) that is interfaced to a TIA-232 port on my Linux software development machine. The reader uses seven bit data format, which means, in theory, the UART has less work to perform to serialize and deserialize a datum being exchanged with the MSR. :D However, I suspect that any performance gain in that regard will be vanishingly small. :D

Re: Historical question about ASCII

Posted: Tue May 02, 2017 10:01 pm
by BigEd
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)

Re: Historical question about ASCII

Posted: Tue May 02, 2017 10:37 pm
by commodorejohn
BigDumbDinosaur wrote:
However, I suspect that any performance gain in that regard will be vanishingly small. :D
Well, at the same baud rate, assuming a fairly standard-ish packet with one stop bit and one start bit per character, it's an 11% increase in throughput.
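That figure is easy to check: assuming one start bit, one stop bit and no parity, an 8-bit character is 10 bits on the wire and a 7-bit character is 9, so at equal baud the 7-bit setup moves about 11% more characters per second:

```python
# Frame sizes on the wire: start bit + data bits + stop bit, no parity.
bits_8n1 = 1 + 8 + 1   # 10 bits per character
bits_7n1 = 1 + 7 + 1   #  9 bits per character

# At the same baud rate, throughput gain of 7N1 over 8N1.
gain = bits_8n1 / bits_7n1 - 1
print(f"{gain:.1%}")   # -> 11.1%
```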

Re: Historical question about ASCII

Posted: Wed May 03, 2017 12:26 am
by BigDumbDinosaur
BigEd wrote:
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)
I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.
commodorejohn wrote:
BigDumbDinosaur wrote:
However, I suspect that any performance gain in that regard will be vanishingly small. :D
Well, at the same baud rate, assuming a fairly standard-ish packet with one stop bit and one start bit per character, it's an 11% increase in throughput.
You are confusing the data rate in bits per second with baud rate—the two are not directly related. Baud refers to the symbol rate on the medium, not the data transmission rate. In the case of telephone modems, baud rate is often a fraction of the bit rate due to the encoding scheme being used. For instance, a typical analog telephone link that spans more than one central office cannot support frequencies much above three kilohertz. If the baud rate of a pair of modems using that link was the same as the practical bit rate, you'd have a 3 Kbps link, neglecting the effects of errors. The 56K rate achieved with the V.90 standard was the result of using an advanced encoding scheme that allows many bits to be exchanged within a single symbol, using 8000 baud coming to the subscriber and 3429 going from the subscriber.
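The downstream arithmetic works out as a simple product, assuming the figures above (8000 symbols per second, up to 7 bits carried per symbol):

```python
# Symbol rate vs. bit rate for the V.90 downstream direction.
symbol_rate = 8000        # baud: symbols per second on the line
bits_per_symbol = 7       # bits encoded in each symbol
bit_rate = symbol_rate * bits_per_symbol
print(bit_rate)           # -> 56000, the familiar "56K"
```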

On a hardwired TIA-232 link, baud rate and bit rate are the same. If a pair of short-haul modems is introduced, as was a once-common arrangement in large factories and adjacent buildings sharing a common host, the baud rate between the modems will usually be lower than the bit rate on the TIA-232 connections to the modems, since the bandwidth limits of analog telephone lines still apply.

Incidentally, a format of seven data bits and two stop bits is possible with most serial devices, as is seven data bits, one stop bit and parity, the latter of which is often used with MSRs and bar code scanners.

Re: Historical question about ASCII

Posted: Wed May 03, 2017 6:51 am
by BigEd
BigDumbDinosaur wrote:
BigEd wrote:
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)
I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.
UTF-8 is really rather clever and interesting - I think perhaps you don't know what it is. Well worth looking into!

Re: Historical question about ASCII

Posted: Wed May 03, 2017 7:10 am
by rwiker
BigDumbDinosaur wrote:
BigEd wrote:
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)
I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.
This is incorrect. UTF-8 and UTF-16 are different encodings of the same character set, Unicode. UTF-8 encodings can be up to 6 bytes long (I think - it's been a while since I looked at the details), but the US-ASCII subset of Unicode needs only one byte for each character.
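This is easy to check in any language with Unicode strings; a short Python experiment shows the per-character UTF-8 byte counts for characters from different ranges (one byte for ASCII, up to four for supplementary-plane characters):

```python
# UTF-8 byte lengths for characters from different Unicode ranges:
# ASCII, Latin-1 Supplement, CJK, and a supplementary-plane character
# (which needs a surrogate pair in UTF-16, yet only four bytes in UTF-8).
for ch in ("A", "é", "中", "𝄞"):
    print(ch, len(ch.encode("utf-8")))
```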

Re: Historical question about ASCII

Posted: Wed May 03, 2017 9:48 am
by Bregalad
Quote:
I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.
UTF-8 fully supports traditional Chinese.
Quote:
Unicode, when encoded in UTF-8, and carrying ASCII, is no more bulky - unless you regard 8 bits as more than 7 bits, which you might!
UTF-7 comes to the rescue, then.