6502.org Forum  [ 16 posts ]

All times are UTC
PostPosted: Tue May 02, 2017 2:42 am 
Joined: Sat Mar 11, 2017 1:56 am
Posts: 276
Location: Lynden, WA
I've always wondered this.

Is there a reason that ASCII codes 0-9 do not map to the corresponding digit characters? As in, ASCII 0 is "0", 1 is "1", etc.

It's just an annoyance to deal with when converting strings to numbers today, but on the computers of yesterday, wouldn't the extra step of subtracting "0" from each character to make the numbers match have been a bigger deal computation-wise?

The only problem I see with having "0" be 0 is that you couldn't have a NULL character. Is that reason enough? As I write this, that small thing does seem like a good reason, I guess.

Anyone know how it came to be as it is?


PostPosted: Tue May 02, 2017 4:53 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
I sure wish the A came right after the 9, so you wouldn't have to test in the conversion of hex numbers, just subtract $30.
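The test Garth refers to exists because seven characters (":" through "@") sit between "9" ($39) and "A" ($41), so converting an ASCII hex digit needs a range check and a second adjustment instead of a single subtract-$30. A minimal C sketch:

```c
/* Convert one ASCII hex digit to its value 0-15.
   Returns -1 for non-hex characters. */
int hex_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';          /* $30-$39 -> 0-9: just subtract $30 */
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;     /* $41-$46 -> 10-15: the extra test and adjust */
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}
```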

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Tue May 02, 2017 5:30 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
I sure wish the A came right after the 9, so you wouldn't have to test in the conversion of hex numbers, just subtract $30.

That wouldn't help me, because I generally prefer lower case :D


PostPosted: Tue May 02, 2017 5:30 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8408
Location: Midwestern USA
Dan Moos wrote:
I've always wondered this.

Is there a reason that ASCII codes 0-9 do not map to the corresponding digit characters? As in, ASCII 0 is "0", 1 is "1", etc.

It's just an annoyance to deal with when converting strings to numbers today, but on the computers of yesterday, wouldn't the extra step of subtracting "0" from each character to make the numbers match have been a bigger deal computation-wise?

The only problem I see with having "0" be 0 is that you couldn't have a NULL character. Is that reason enough? As I write this, that small thing does seem like a good reason, I guess.

Anyone know how it came to be as it is?

It's a long story and I suggest you do some searching, starting with Baudot code, which is the distant ancestor of ASCII that was intended for use with nineteenth-century telegraph systems. Also, read about teleprinters, which were an essential tool of news services such as AP and Reuters for many years.

Despite the seeming incongruities of ASCII with character sets, there is a method to the madness. Only those of us who have been around long enough to have worked with Teletype machines and Friden Flexowriters consider ASCII to be 100 percent logical. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue May 02, 2017 5:32 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
A bit of explanation:
https://en.wikipedia.org/wiki/ASCII#History


PostPosted: Tue May 02, 2017 10:55 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 452
Location: Canada
I was wondering at one point why the newer standard Unicode doesn't support things that appear in a keyboard stream like cursor controls. I got talking to a Unicode expert about it one day and they seemed to have a reasonable explanation. So I made up my own set of virtual keycodes for one project.
http://www.finitron.ca/Documents/virtual_keycodes.html
I'd like to know: what is the standard for virtual keycodes?
ASCII is an older code which is great when characters fit into six or eight bits. But for any apps that need to be internationalized a wide code like Unicode is required.

_________________
http://www.finitron.ca


PostPosted: Tue May 02, 2017 9:22 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8408
Location: Midwestern USA
Rob Finch wrote:
ASCII is an older code which is great when characters fit into six or eight bits. But for any apps that need to be internationalized a wide code like Unicode is required.

Actually, there is only one form of ASCII, called "US-ASCII," and that is seven bits to the datum. ASCII does not define the meanings of data that are in the range $80 to $FF, inclusive. ASCII was strictly a product of the forerunner of the American National Standards Institute, hence the US-ASCII moniker. INCITS uses that reference to avoid confusion with informal extensions to the ASCII set, as well as ASCII-like enhancements, such as Unicode.

Unicode produces a bulkier data stream than ASCII and in situations in which the alphanumeric set plus punctuation and control codes is all that is needed, ASCII will be substantially more efficient and economical of bandwidth. For example, transmitting binary data in Intel or Motorola hex form can be done solely with seven bit ASCII, using only numerals, uppercase letters A-F and a few control codes (typically <CR>, <LF> and <EOT>). Western languages that use only the Latin alphabet are transmittable in ASCII and even when lacking some diacritical marks, are usually intelligible to native speakers. Unicode was primarily developed to handle Latin characters with diacritical marks, such as ü, å, etc., localized characters, such as Æ, as well as the characters found in non-Latin alphabets, e.g., Cyrillic.

Incidentally, one of the reasons the control codes in the ASCII set are $00-$1F is that the mechanism in Teletypes was arranged to recognize the low bit patterns as control functions, not printing characters. As ASCII evolved, this characteristic was accommodated so Teletypes could be used as computer I/O devices. The spread between "0-9", "A-Z" and "a-z" exists because two bits decide whether the character set will be numerals, uppercase letters or lowercase letters. It all makes sense in the context of doing case conversion or determining whether a user typed a numeral or a letter of either case.
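That layout is what makes case handling cheap: "A" ($41) and "a" ($61) differ only in bit 5 ($20), so case conversion is a single bit operation plus a range check. A C sketch:

```c
/* ASCII upper and lower case differ only in bit 5 ($20):
   'A' = $41, 'a' = $61. Clearing or setting that bit converts case. */
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
        return c & ~0x20;   /* clear bit 5 */
    return c;
}

char to_lower_ascii(char c)
{
    if (c >= 'A' && c <= 'Z')
        return c | 0x20;    /* set bit 5 */
    return c;
}
```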

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue May 02, 2017 9:38 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10940
Location: England
Unicode, when encoded in UTF-8, and carrying ASCII, is no more bulky - unless you regard 8 bits as more than 7 bits, which you might!


PostPosted: Tue May 02, 2017 9:56 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8408
Location: Midwestern USA
BigEd wrote:
Unicode, when encoded in UTF-8, and carrying ASCII, is no more bulky...

Correct on UTF-8, which parallels ASCII, but supports the $80-$FF range. I was referring to UTF-16.

Quote:
- unless you regard 8 bits as more than 7 bits, which you might!

Funny you mention that. We think in terms of eight bits to the byte, yet in serial communications, seven bits continue to be used in setups that use only ASCII, hand-held serial bar code scanners being one such case. As another example, I have here in my office a Welch-Allen ST6980 magnetic strip reader (MSR, aka credit card reader) that is interfaced to a TIA-232 port on my Linux software development machine. The reader uses seven bit data format, which means, in theory, the UART has less work to perform to serialize and deserialize a datum being exchanged with the MSR. :D However, I suspect that any performance gain in that regard will be vanishingly small. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue May 02, 2017 10:01 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10940
Location: England
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)


PostPosted: Tue May 02, 2017 10:37 pm 

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 276
Location: Placerville, CA
BigDumbDinosaur wrote:
However, I suspect that any performance gain in that regard will be vanishingly small. :D

Well, at the same baud rate, assuming a fairly standard-ish packet with one stop bit and one start bit per character, it's an 11% increase in throughput.


PostPosted: Wed May 03, 2017 12:26 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8408
Location: Midwestern USA
BigEd wrote:
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)

I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.

commodorejohn wrote:
BigDumbDinosaur wrote:
However, I suspect that any performance gain in that regard will be vanishingly small. :D

Well, at the same baud rate, assuming a fairly standard-ish packet with one stop bit and one start bit per character, it's an 11% increase in throughput.

You are confusing the data rate in bits per second with baud rate—the two are not directly related. Baud refers to the symbol rate on the medium, not the data transmission rate. In the case of telephone modems, baud rate is often a fraction of the bit rate due to the encoding scheme being used. For instance, a typical analog telephone link that spans more than one central office cannot support frequencies much above three kilohertz. If the baud rate of a pair of modems using that link was the same as the practical bit rate, you'd have a 3 Kbps link, neglecting the effects of errors. The 56K rate achieved with the V.90 standard was the result of using an advanced encoding scheme that allows many bits to be exchanged within a single symbol, using 8000 baud coming to the subscriber and 3429 going from the subscriber.

On a hardwired TIA-232 link, baud rate and bit rate are the same. If a pair of short-haul modems is introduced, as was a once-common arrangement in large factories and adjacent buildings sharing a common host, the baud rate between the modems will usually be lower than the bit rate on the TIA-232 connections to the modems, since the bandwidth limits of analog telephone lines still apply.

Incidentally, a format of seven data bits and two stop bits is possible with most serial devices, as is seven data bits, one stop bit and parity, the latter of which is often used with MSRs and bar code scanners.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed May 03, 2017 6:51 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10940
Location: England
BigDumbDinosaur wrote:
BigEd wrote:
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)

I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.

UTF-8 is really rather clever and interesting - I think perhaps you don't know what it is. Well worth looking into!


PostPosted: Wed May 03, 2017 7:10 am 

Joined: Thu Mar 03, 2011 5:56 pm
Posts: 284
BigDumbDinosaur wrote:
BigEd wrote:
(Umm, UTF-8 expresses any Unicode character - as can UTF-16, but UTF-8 is more compact.)

I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.


This is incorrect. UTF-8 and UTF-16 are different encodings of the same character set, Unicode. UTF-8 encodings can be up to 6 bytes long (I think - it's been a while since I looked at the details), but the US-ASCII subset of Unicode needs only one byte for each character.
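(The original UTF-8 design did allow sequences up to 6 bytes; RFC 3629 later capped it at 4, which covers all of Unicode up to U+10FFFF.) The length rule is simple enough to sketch in C:

```c
/* Number of bytes UTF-8 uses to encode a Unicode code point (RFC 3629). */
int utf8_len(unsigned int cp)
{
    if (cp <= 0x7F)     return 1;  /* US-ASCII, encoded unchanged */
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;  /* includes most CJK, e.g. U+4E2D */
    if (cp <= 0x10FFFF) return 4;  /* where UTF-16 needs a surrogate pair */
    return -1;                     /* not a valid Unicode code point */
}
```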


PostPosted: Wed May 03, 2017 9:48 am 

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Quote:
I don't believe UTF-8 supports many non-Latin character sets, such as traditional Chinese. In fact, I seem to recall that pairs of UTF-16 words may be used in such cases, resulting in 32 bits being passed per character.

UTF-8 fully supports traditional Chinese.

Quote:
Unicode, when encoded in UTF-8, and carrying ASCII, is no more bulky - unless you regard 8 bits as more than 7 bits, which you might!

UTF-7 comes to the rescue, then.

