BigDumbDinosaur wrote:
Yuri wrote:
To be honest, I don't think I ever worried much about a few spaces vs. a tab, even when I was working on my 386 and storing stuff on floppies. Considering that files are all going to take up a multiple of 512-byte blocks, I don't think I ever noticed the difference in disk usage.
Hardly terabytes and gigabytes of space, though.
Quote:
White space that consists of blanks (ASCII 32) is rendered as a single space. An extended blank (ASCII 160) is treated by most browsers as a distinct character—a string of them will be rendered as an equivalent number of blanks like this.
The HTML spec accounts for non-breaking spaces, though. Most lay people wouldn't know how to type a literal character 160 on their keyboard, hence the use of &amp;nbsp; and other HTML entities.
Quote:
Browser behavior when encountering a tab character varies. Some browsers seem to render a tab as a series of blanks. I’m not up on the latest HTML standard, so I have no clue what a browser is supposed to do when it encounters a tab or other control character in HTML.
Quote:
Who’s talking about a spreadsheet? I’m referring to computing in general. BTW, I’ve never found a good use for a spreadsheet. 
In any event, I was pointing out that the practice of using a TAB (or another such single character) as a data delimiter is not dead; it's very much still alive and in use.
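Case in point: Python's standard csv module handles tab-delimited data out of the box. A minimal sketch (the sample data is made up):

```python
# Sketch: TSV is still a routine interchange format; Python's csv
# module reads it by simply swapping the delimiter from ',' to '\t'.
import csv
import io

data = "name\tage\ncarol\t42\n"
rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
print(rows)  # [['name', 'age'], ['carol', '42']]
```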
Quote:
I can tell you haven’t worked enough with real computers!
Just kidding!
IDK, I think I'm going on 40 years now; my parents couldn't pry me off the computer when I was a kid, as much as they sometimes tried.
Quote:
One of the most useful of the 31 control codes is <ESC>. If you look very carefully at how devices such as “dumb” terminals (which, starting with the WYSE 60, became quite smart) work, or how a modern page printer understands what the computer is telling it to do, you will see how it is possible to extend those 31 control codes almost to infinity by beginning your control sequence with <ESC>. 
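That ESC-prefix trick is easy to demonstrate. A minimal sketch using ANSI/ECMA-48 SGR sequences, which most modern terminal emulators still understand:

```python
# Sketch: ESC (0x1B) opens a multi-byte sequence, turning one control
# code into an open-ended command space. These are ANSI/ECMA-48 SGR
# ("Select Graphic Rendition") sequences.
ESC = "\x1b"
bold_on = ESC + "[1m"    # SGR 1: bold
reset   = ESC + "[0m"    # SGR 0: reset all attributes
print(bold_on + "hello" + reset)
```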
Quote:
I, as a user, am interested in my documents being properly formatted, but I don’t particularly care if the actual mumbo-jumbo that, say, selects italic Helvetica as the current font is human-readable—I'm not going to do a hex dump of the file. All I care about is what ends up on the printed page.
Quote:
Yes, but how do you explain the near-universality of Hewlett-Packard’s printer control language (PCL), which is not a plain-text format? PCL commands always start with either <ESC> or certain other ASCII control codes, e.g., <FF> (ASCII 12) to dump the image buffer to the page and then eject it. A typical PCL command might be <ESC>&l0O, which selects portrait orientation. Don’t you think if H-P had thought human readability was important they would have instead implemented <orientation portrait> or similar, instead of some ESCape mumbo-jumbo? H-P did what they did because they were mostly concerned about throughput...the less overhead passed in the data stream, the better the throughput.
Wikipedia states it "became a de facto industry standard," suggesting to me that it wasn't really HP's intention to make any kind of standard at all. And I'd guess that what happened is that others reverse-engineered it to make "compatible" products.
(Pure speculation on my part though)
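Those PCL sequences really are tiny, which backs up the throughput point. A sketch using the two commands quoted above (the payload text is just filler):

```python
# Sketch of how compact PCL is: each command is ESC plus a few bytes.
# The two commands below are the ones quoted above from the thread.
ESC = b"\x1b"
portrait  = ESC + b"&l0O"   # select portrait orientation
form_feed = bytes([12])     # FF: print the buffered page and eject it
job = portrait + b"Hello, page." + form_feed
print(len(job))  # control overhead is only 6 bytes of the 18 total
```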
Form feed got a lot of its meaning from the teletype machines, much like CR and LF did. I don't doubt some of the other characters had specialized purposes back in the day, but heck if I know what "Device Control One" (17), "Device Control Two" (18), etc., are supposed to mean or do.
That being said, there are formats that do have both text and binary versions. PDF and FBX for example come to mind.
Plain-text formats that also come to mind, other than HTML, would be TeX, CSV, innumerable Un*x config files (e.g. /etc/fstab), and a goodly number of Internet protocols (SMTP, HTTP, and IRC, to name a few).
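And part of the appeal of those formats is how trivially they parse. A sketch for a whitespace-delimited file in the style of /etc/fstab (the sample line is illustrative, not from a real box):

```python
# Sketch: whitespace-delimited Un*x config files parse with one split()
# per record. Fields here follow fstab's layout: device, mount point,
# fs type, options, dump flag, fsck pass number.
line = "/dev/sda1  /  ext4  defaults  0  1"
device, mountpoint, fstype, options, dump, passno = line.split()
print(fstype)  # ext4
```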
Quote:
Clearly, the ability of disparate systems to parse a data stream couldn’t have been much of a problem with the widely-used ANSI/ECMA control sequences used with many displays, including the Linux console. A typical ANSI/ECMA sequence starts with that ubiquitous <ESC> and finishes with mumbo-jumbo that even someone like me who has been working with it for some forty years still can’t decipher on sight.
<ESC>[ (ignore until the final byte) only works part of the time. Hence Linux and other Un*x-like variants use the (in)famous termcap file, which is just a massive database that, as I vaguely recall someone else here putting it, "must be maintained by someone who is mentally unstable." (Not a direct quote.)
Heck, even just trying to support the VT-100 codes for my own 65xxx software is a bit of a nightmare because different terminals all want to send the codes in their own "special" way..... >_>
"You said you wanted us to send VT-100 codes, but we decided that for these 4 keys, we'd use the XTerm escapes instead....." <screams internally>
Don't get me started on what my poor friend has to deal with when working on their BBS software..... (I never hear the end of it when a bug comes in about another BBS or terminal that doesn't work because it follows the "spec" *eyeroll*)
If I'm not mistaken, isn't the termcap file itself plain text? (Or starts out as such and then gets compiled or something like that?)
Quote:
ASCII is the character-encoding standard in the computing world and has been so since the 1960s, IBM’s EBCDIC notwithstanding. GSM is primarily a telecommunications thing—it’s not something that an e-mail server would use to forward a message to another server.
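The 7-bit limit that separates ASCII from encodings like EBCDIC or GSM's SMS alphabet is easy to see in practice; a quick sketch:

```python
# Sketch: ASCII is strictly 7-bit, so any code point past 127 refuses
# to encode. Other encodings (EBCDIC, the GSM 03.38 SMS alphabet, etc.)
# carve up their code space differently.
ok = "plain text".encode("ascii")
try:
    "café".encode("ascii")
    outside_ascii = False
except UnicodeEncodeError:
    outside_ascii = True
print(ok, outside_ascii)
```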