Rob Finch wrote:
ASCII is an older code which is great when characters fit into six or eight bits. But for any apps that need to be internationalized a wide code like Unicode is required.
Actually, there is only one form of ASCII, called "US-ASCII," and it is seven bits per datum. ASCII does not define any meaning for values in the range $80 to $FF, inclusive. ASCII was strictly a product of the forerunner of the American National Standards Institute, hence the "US-ASCII" moniker; INCITS uses that name to avoid confusion with informal extensions to the ASCII set, as well as ASCII-compatible supersets such as Unicode.
Unicode produces a bulkier data stream than ASCII, and in situations where the alphanumeric set plus punctuation and control codes is all that is needed, ASCII will be substantially more efficient and economical of bandwidth. For example, binary data can be transmitted in Intel or Motorola hex form using only seven-bit ASCII: numerals, the uppercase letters A-F, a record-mark character (':' for Intel hex, 'S' for Motorola S-records) and a few control codes (typically <CR>, <LF> and <EOT>). Western languages that use only the Latin alphabet are transmittable in ASCII and, even when some diacritical marks are dropped, are usually intelligible to native speakers. Unicode was developed primarily to handle Latin characters with diacritical marks, such as ü and å, localized characters such as Æ, as well as the characters found in non-Latin alphabets, e.g., Cyrillic.
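To make that concrete, here is a small C sketch (my own illustration, with made-up sample bytes, not any canonical implementation) that emits one Intel hex data record followed by the end-of-file record. Every character it writes - ':', the hex digits 0-9 and A-F, <CR> and <LF> - falls within the seven-bit ASCII set:

```c
#include <stdio.h>

/* Emit one Intel hex data record for up to 255 bytes at a 16-bit address.
 * The checksum is the two's complement of the sum of the count, the two
 * address bytes, the record type (0 for data) and the data bytes.       */
static void emit_ihex_record(FILE *out, unsigned addr,
                             const unsigned char *data, unsigned len)
{
    unsigned sum = len + ((addr >> 8) & 0xFF) + (addr & 0xFF); /* type 0 adds nothing */
    fprintf(out, ":%02X%04X00", len & 0xFF, addr & 0xFFFF);
    for (unsigned i = 0; i < len; i++) {
        fprintf(out, "%02X", data[i]);
        sum += data[i];
    }
    fprintf(out, "%02X\r\n", (0x100 - (sum & 0xFF)) & 0xFF);
}

int main(void)
{
    /* Arbitrary example bytes, placed at an arbitrary address.          */
    const unsigned char code[] = { 0xA9, 0x00, 0x8D, 0x00, 0x02 };
    emit_ihex_record(stdout, 0x8000, code, sizeof code);
    fputs(":00000001FF\r\n", stdout);   /* standard end-of-file record   */
    return 0;
}
```

Nothing in the output needs the eighth bit, which is why hex loaders work fine over seven-bit serial links.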
Incidentally, one of the reasons the control codes in the ASCII set occupy $00-$1F is that the mechanism in Teletypes was arranged to recognize the low bit patterns as control functions, not printing characters. As ASCII evolved, this characteristic was accommodated so Teletypes could be used as computer I/O devices. The spread between "0-9", "A-Z" and "a-z" exists because two bits decide whether the character falls among the numerals, the uppercase letters or the lowercase letters. It all makes sense in the context of doing case conversion or determining whether a user typed a numeral or a letter of either case.
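To illustrate that column layout, here is a short C sketch (again just my own example) that prints the two deciding bits - bits 6 and 5 - for each character of a sample string, and does case conversion by flipping bit 5 ($20). Note the digit/punctuation column still needs a range check, since punctuation shares it:

```c
#include <stdio.h>

/* Bits 6 and 5 of a seven-bit ASCII code select its "column":
 * 00 = control codes, 01 = digits and punctuation,
 * 10 = uppercase (and some punctuation), 11 = lowercase.
 * Upper and lower case letters differ only in bit 5 ($20).   */
int main(void)
{
    for (const char *p = "Hello, 6502!"; *p; p++) {
        unsigned c   = (unsigned char)*p;
        unsigned col = (c >> 5) & 3;          /* the two deciding bits */

        if (c >= '0' && c <= '9')
            printf("'%c'  col=%u  numeral\n", c, col);
        else if (c >= 'A' && c <= 'Z')
            printf("'%c'  col=%u  upper -> '%c'\n", c, col, c | 0x20);
        else if (c >= 'a' && c <= 'z')
            printf("'%c'  col=%u  lower -> '%c'\n", c, col, c & ~0x20u);
        else
            printf("'%c'  col=%u  other\n", c, col);
    }
    return 0;
}
```

Case conversion being a single bit flip is exactly the property that made software on small machines so compact.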