Kinda hijacked the conversation here....
BigDumbDinosaur wrote:
BigEd wrote:
My personal preference is never to insert or retain tabs in source code. Tab as a key on the keyboard, which the editor can use to insert an appropriate amount of space, is a good way to go, in my opinion.
The UltraEdit editor, which I heavily use, may be configured so striking [Tab] will insert an arbitrary number of spaces in place of a single tab. Unfortunately, the editor in the Kowalski package doesn’t have that feature, although it can be configured to render a tab as a configurable number of blanks on the screen—however, a tab is stored in the file. Annoyingly, as Garth infers, tab rendering in programs is arbitrary, which can produce a display that has little resemblance to what was originally intended.
Outside of word processing programs and the like, this is usually why I tend to avoid using them. In most word processors you can explicitly set where in the document the tab stops lie and how they behave, and those details get stored along with the document.
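To make the rendering problem concrete, here is a minimal Python sketch (the sample assembly line and the helper name `tabs_to_spaces` are just made up for illustration). It shows one stored line displayed at three tab widths, and a one-time conversion to spaces so the layout stops depending on the viewer.

```python
# The same stored line, shown at three different tab widths.
line = "LDA\t#$00\t; clear A"

for width in (4, 8, 16):
    # str.expandtabs pads each tab with spaces up to the next tab stop
    print(f"{width:2d}: {line.expandtabs(width)}")


def tabs_to_spaces(text: str, width: int = 8) -> str:
    """Replace tabs with spaces so the layout no longer depends on the viewer."""
    return text.expandtabs(width)
```

This is roughly what an editor does when it inserts spaces for the [Tab] key, except here it is applied after the fact.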
To be honest, I don't think I ever worried much about a few spaces vs. a tab, even when I was working on my 386 and storing stuff on floppy. Considering that files all take up a multiple of 512-byte blocks, I don't think I ever noticed the difference in disk usage.
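A rough sketch of that block arithmetic in Python, assuming 512-byte blocks as above and a made-up 200-line source file (both the file contents and the function name `blocks_used` are hypothetical):

```python
BLOCK_SIZE = 512  # bytes per block, as in the post; real file systems vary


def blocks_used(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks needed to hold size_bytes (rounds up)."""
    return (size_bytes + block_size - 1) // block_size


# Hypothetical source file: 200 lines, each indented with one tab or eight spaces
line_tab = "\tLDA #$00\n"
line_spaces = " " * 8 + "LDA #$00\n"
size_tab = len(line_tab) * 200
size_spaces = len(line_spaces) * 200

print(size_tab, "bytes ->", blocks_used(size_tab), "blocks")
print(size_spaces, "bytes ->", blocks_used(size_spaces), "blocks")
```

For a file this small the difference comes out to a handful of blocks, which is easy to miss on a floppy.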
Quote:
Web browsers, in particular, seem to have no standard way to render a tab.
As I recall, white space in HTML is all usually treated like a single space character used for token/word separation. I'd have to track down the specs for it to be certain though.
Character entities were intended to encode special formatting characters like tab when they are needed. The rules also change depending on the element your text is in (e.g. <pre> or <code>).
I want to say those rules are defined clearly in the HTML 4.01, 5.0 and XHTML 1.0 DTDs, but I can't swear to that; I haven't looked at them in years. Really haven't needed them for any of the modern web stuff I work on day to day to be honest; formatting is largely handled with CSS these days.
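As a rough approximation of the collapsing behaviour I'm describing (not the actual algorithm from any spec, and ignoring <pre> and CSS entirely), something like this Python sketch shows what happens to a tab in ordinary flowed text. The function name `collapse_whitespace` is my own.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Approximate normal-flow rendering: any run of spaces, tabs,
    form feeds or line breaks is shown as a single space."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)


print(collapse_whitespace("name\t\taddress   \n  phone"))
```

So a tab in the source survives in the file but is indistinguishable from a space once displayed.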
Quote:
However, real tabs have their place in a data stream. For example, I wrote a program that can read the address book used by the Mozilla Thunderbird E-mail client and generate a list consisting of E-mail addresses and matching names (i.e., the display name field in each address book record), with the output sorted by name—the resulting list can be read and parsed by external programs. Since the name field will likely have at least one blank, a blank obviously cannot be considered a field separator. So I use <HT> (horizontal tab), which, conveniently, is easy for BASH and PHP to parse for word-splitting purposes.
To this day most spreadsheets will still happily import a CSV, tab-delimited or other delimited text file just fine.
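For what it's worth, reading that kind of tab-separated list takes only a few lines in Python as well. This is a sketch with invented sample data in the shape described in the quote (display name, <HT>, address); it is not the actual program's output format.

```python
import csv
import io

# Hypothetical sample: display name <HT> e-mail address, one record per line
data = "Jane Q. Public\tjane@example.org\nJohn Doe\tjohn@example.org\n"


def read_tsv(text: str) -> list[list[str]]:
    """Split tab-separated text into rows of fields."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


rows = read_tsv(data)
for name, address in rows:
    # Blanks inside the name are preserved because only tabs split fields
    print(f"{name!r} -> {address}")
```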
Quote:
As Garth notes, one of the reasons for the use of <HT> as a field separator (aside from ease of parsing) is the desire to make files smaller and faster-loading. My professional computing experience began during a time when file-size conservation was front-and-center in any program’s design philosophy—something I unconsciously continue to perpetuate in my code.
Also, since all ASCII values below 32 are control values, it’s a snap to distinguish in-band control information from actual data.
All well and good until you run out of control codes to mean stuff; there are only 32 (31) of them, after all. How would I, for example, encode details about the selected font for a block of text in a word processor when I could have any list of fonts installed on my computer, independent of the list on yours?
At that point you have to add additional metadata about what needs to be loaded to render that correctly. Sure, you could add a control code that says "switch to next listed font", but at that point it isn't much different from using <font name="foo" />, which can then be edited by hand. Yeah, a set of control codes could reduce the size of that file, and if space is at a premium that would matter, but if readability is what you need, then it is a detriment.
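Here is a small Python sketch of that trade-off. The meaning assigned to 0x0E ("switch to next listed font") is purely my own hypothetical convention for this example, as is the helper `split_stream`.

```python
def split_stream(data: bytes) -> tuple[bytes, bytes]:
    """Separate a byte stream into data bytes (>= 32) and control bytes (< 32)."""
    text = bytes(b for b in data if b >= 32)
    controls = bytes(b for b in data if b < 32)
    return text, controls


# Hypothetical convention: 0x0E means "switch to the next listed font"
NEXT_FONT = b"\x0e"
compact = b"Hello " + NEXT_FONT + b"world"
markup = b'Hello <font name="foo" />world'

text, controls = split_stream(compact)
print(text, controls)
print(len(compact), "bytes vs", len(markup), "bytes")
```

The control-code version is much shorter and trivial to separate from the data, but nobody can tell what 0x0E means, or which font it selects, without the program that wrote it.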
Quote:
Taking advantage of the control range means quite a bit of metadata can be embedded into a data stream without consuming a lot of precious space—the programmer is free to interpret those 31 control codes as he or she sees fit.¹
I can see upsides and downsides to that. If you're working on a single program that doesn't need to worry about interoperability too much, that'd be fine; it's certainly something I've done myself.
But when it comes to trying to make a format that can at least be parsed by many different systems, that idea quickly starts to break down. Like it or not, plain text formats have been a thing for a long time and continue to be a thing, if only because they are just easy to work with.
Quote:
It seems much of the ASCII control range is neglected in contemporary programs. I guess the current thinking now is that we’ve got gigs of RAM, terabytes of disk space, and MPUs running a thousand times faster than what we had when I started out doing this stuff. So who needs to worry about file sizes?
IDK if neglected is really the term I'd use. Forgotten is probably a more likely answer. I would imagine most people know what CR and LF do in modern systems, but I'd wager they have no clue where the names "carriage return" and "line feed" come from.
Kinda like this little gem:
[Attachment: fj793d9i8mmb1.png]
*face palm*
Quote:
¹I don’t consider <NUL> to be a true control character—in string data, I only use it as a terminator.
I should point out that it also all depends on the character encoding. ASCII is one such encoding; I tend to do a lot of work with GSM 03.38, which encodes character 0 as the at sign (among other oddities).
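A minimal Python sketch of why that matters for NUL-terminated strings. The table below is deliberately partial (only a few code points from the GSM 03.38 default alphabet), it assumes unpacked septets with one code per byte, and the function name is my own.

```python
# Partial table only: a handful of GSM 03.38 default-alphabet code points.
GSM_PARTIAL = {
    0x00: "@",
    0x01: "£",
    0x02: "$",
    0x0A: "\n",
    0x0D: "\r",
}


def decode_gsm_partial(data: bytes) -> str:
    """Decode unpacked GSM 03.38 septets using the partial table above."""
    out = []
    for b in data:
        if b in GSM_PARTIAL:
            out.append(GSM_PARTIAL[b])
        elif 0x30 <= b <= 0x39 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
            out.append(chr(b))  # digits and unaccented letters match ASCII
        else:
            out.append("?")  # not covered by this partial table
    return "".join(out)


print(decode_gsm_partial(b"user\x00example"))
```

Code that treats byte 0 as a string terminator would cut that address off right before the at sign.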