A continuation from:
Crenshaw - Let's Build a Compiler

BigDumbDinosaur wrote:
Yuri wrote:
To be honest, I don't think I ever generally worried too much about a few spaces vs. a tab, even when I was working on my 386 and storing stuff on floppy. Considering that files are all going to take a multiple of 512-byte blocks, I don't think I ever noticed the difference in disk usage.
I started working with computers before microprocessors even existed. The first system I professionally programmed had 8K of RAM...that’s 8 kiloBITs, not bytes. You can be sure we were “encouraged” to economize so code and data would fit into available core.

Fair enough, the first computer I worked with was a Mac 128K, so I had considerably more memory than that starting out. *shrug*
Hardly terabytes and gigabytes of space, though.
Quote:
White space that consists of blanks (ASCII 32) is rendered as a single space. An extended blank (ASCII 160) is treated by most browsers as a distinct character—a string of them will be rendered as an equivalent number of blanks like this.
Character 160 isn't standard ASCII at all (ASCII is 7-bit); it's the "non-breaking space" from the common 8-bit extensions (Latin-1). Under UTF-8 that character gets translated into two bytes, 0xC2 0xA0. UTF-16 encodes it as 0x00A0 (160).
The HTML spec accounts for non-breaking spaces, though. Most lay people wouldn't know how to type a literal character 160 on their keyboard, hence the use of &nbsp; and other HTML entities.
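For what it's worth, a quick Python sketch bears those encodings out:

Code:
nbsp = "\u00a0"                        # U+00A0, the non-breaking space
print(nbsp.encode("latin-1").hex())    # 'a0'   -> single byte 160
print(nbsp.encode("utf-8").hex())      # 'c2a0' -> two bytes 0xC2 0xA0
print(nbsp.encode("utf-16-be").hex())  # '00a0' -> one 16-bit code unit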
Quote:
Browser behavior when encountering a tab character varies. Some browsers seem to render a tab as a series of blanks. I’m not up on the latest HTML standard, so I have no clue what a browser is supposed to do when it encounters a tab or other control character in HTML.
I think it is supposed to ignore it and pass it on to whatever handles the actual text rendering. (Basically, leave it up to the character encoding on the OS to figure out what to do with it.)
Quote:
Who’s talking about a spreadsheet? I’m referring to computing in general. BTW, I’ve never found a good use for a spreadsheet.
I've used them for all sorts of things. Mileage may vary, of course, depending on what you're used to.
In any event, I was pointing out that the practice of using a TAB (or another such single character) as a data delimiter is not dead; it's very much still alive and in use.
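Just as a sketch of how simple that makes parsing, here's splitting a tab-delimited record in Python (the file name is a hypothetical example):

Code:
import csv

# Each row comes back as a list of the fields that were separated by TAB.
with open("records.tsv", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        print(row)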
Quote:
I can tell you haven’t worked enough with real computers!
Just kidding!
IDK, I think I'm going on 40 years now; my parents couldn't pry me off the computer when I was a kid, as much as they sometimes tried.
Quote:
One of the most useful of the 31 control codes is <ESC>. If you look very carefully at how devices such as “dumb” terminals (which, starting with the WYSE 60, became quite smart) work, or how a modern page printer understands what the computer is telling it to do, you will see how it is possible to extend those 31 control codes almost to infinity by beginning your control sequence with <ESC>.
Not sure I see much difference, really, between <ESC>(insert sequence of chars to interpret); and <(insert sequence of chars to interpret)>. In the end they are just characters, and when it comes down to it, what the computer sees is just numbers; it's the meaning that is assigned to them that makes them significant. The only really significant difference in my mind is that almost all software will try to intercept the press of the escape key and do something special with it. (It is a control character/key, after all.)
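To put the "it's all just numbers plus assigned meaning" point in code, here's a toy Python reader; the two-byte command length and the commands themselves are made up for the example:

Code:
ESC = 0x1B

def read_stream(data: bytes):
    # Printable bytes are text; ESC introduces a (made-up) 2-byte command.
    i = 0
    while i < len(data):
        if data[i] == ESC:
            print("command:", data[i + 1:i + 3])
            i += 3
        else:
            print("text:", chr(data[i]))
            i += 1

read_stream(b"Hi\x1b&l")  # 'H', 'i', then the pretend command b'&l'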
Quote:
I, as a user, am interested in my documents being properly formatted, but I don’t particularly care if the actual mumbo-jumbo that, say, selects italic Helvetica as the current font is human-readable—I'm not going to do a hex dump of the file. All I care about is what ends up on the printed page.
But if you were making a document to be rendered on multiple different computers, without the need to write a specialized editor application, you probably would be. (Hence HTML, for example.)
Quote:
Yes, but how do you explain the near-universality of Hewlett-Packard’s printer control language (PCL), which is not a plain-text format? PCL commands always start with either <ESC> or certain other ASCII control codes, e.g., <FF> (ASCII 12) to dump the image buffer to the page and then eject it. A typical PCL command might be <ESC>&l0O, which selects portrait orientation. Don’t you think if H-P had thought human readability was important they would have instead implemented <orientation portrait> or similar, instead of some ESCape mumbo-jumbo? H-P did what they did because they were mostly concerned about throughput...the less overhead passed in the data stream, the better the throughput.
I'm not overly familiar with PCL or how it came to be; that being said, as far as I know, the intention was to allow a driver to communicate with a printer.
Wikipedia states it "became a de facto industry standard," which suggests to me that it wasn't really HP's intention to make any kind of standard at all. I'd guess that what happened is that others reverse-engineered it to make "compatible" products. (Pure speculation on my part, though.)
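Going by the commands quoted above, driving a printer with raw PCL is nothing more than writing those escape bytes at it. A Python sketch, with the device path being a hypothetical example:

Code:
PORTRAIT = b"\x1b&l0O"   # <ESC>&l0O: select portrait orientation
FORM_FEED = b"\x0c"      # <FF>, ASCII 12: print the buffered page and eject

with open("/dev/usb/lp0", "wb") as printer:   # hypothetical device path
    printer.write(PORTRAIT)
    printer.write(b"Hello from raw PCL")
    printer.write(FORM_FEED)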
Form feed got a lot of its meaning from the teletype machines, much like CR and LF did. I don't doubt some of the other characters had other specialized purposes back in the day, but heck if I know what "Device Control One" (17), "Device Control Two" (18), etc. are supposed to mean/do.
That being said, there are formats that do have both text and binary versions. PDF and FBX for example come to mind.
Plain-text formats that also come to mind, other than HTML, would be TeX, CSV, innumerable Un*x config files (e.g. /etc/fstab), and a goodly number of internet protocols (SMTP, HTTP, and IRC, to name a few).
Quote:
Clearly, the ability of disparate systems to parse a data stream couldn’t have been much of a problem with the widely-used ANSI/ECMA control sequences used with many displays, including the Linux console. A typical ANSI/ECMA sequence starts with that ubiquitous <ESC> and finishes with mumbo-jumbo that even someone like me who has been working with it for some forty years still can’t decipher on sight.
The one I often shake my fist at, because there isn't a clean way to determine where a sequence starts and ends if you don't know what those random letters mean? The one that, as I recall, started out simply enough and got bastardized over the decades to become the monster it is now?
<ESC>[(ignore until); only works part of the time, thus leading Linux and other Un*x-like variants to use the (in)famous termcap file, which is just a massive database that, as I vaguely recall someone else here putting it, "must be maintained by someone who is mentally unstable." (Not a direct quote.)
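To be fair, ECMA-48 does pin down the framing for the CSI family: after <ESC>[ come parameter bytes (0x30-0x3F), then intermediate bytes (0x20-0x2F), then a final byte in 0x40-0x7E. A Python sketch that skips sequences on that assumption; the non-CSI escapes are exactly why this "only works part of the time":

Code:
import re

# ESC [ <params 0x30-0x3F> <intermediates 0x20-0x2F> <final 0x40-0x7E>
CSI = re.compile(rb"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")

def strip_csi(data: bytes) -> bytes:
    return CSI.sub(b"", data)

print(strip_csi(b"\x1b[1;31mred\x1b[0m"))  # b'red'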
Heck, even just trying to support the VT-100 codes for my own 65xxx software is a bit of a nightmare because different terminals all want to send the codes in their own "special" way..... >_>
"You said you wanted us to send VT-100 codes, but we decided that for these 4 keys, we'd use the XTerm escapes instead....."
<screams internally>
Don't get me started with what my poor friend has to deal with when working on their BBS software..... (I never hear the end of it when a bug comes in about another BBS or terminal that doesn't work because it follows the "spec". *eyeroll*)
If I'm not mistaken, isn't the termcap file itself plain text? (Or starts out as such and then gets compiled or something like that?)
Quote:
ASCII is the character-encoding standard in the computing world and has been so since the 1960s, IBM’s EBCDIC notwithstanding. GSM is primarily a telecommunications thing—it’s not something that an E-mail server would use to forward a message to another server.
Yet, perhaps somewhat ironically, that is almost exactly what the software I work on in my day job does.