Text Editor Shifting Mechanics

drogon · Post by **drogon** » Sun Aug 31, 2025 4:19 pm

stefan1 wrote:

Hi Gordon,

Linked arrays in different forms are a common solution indeed.

And easy to "blow up" - ie run off the end and corrupt the structure. My BCPL version keeps "guard" words at the start and end of each allocated chunk, so that if nothing else the 'free' routine can check and determine if corruption has happened.

The tricky part is working out what to do next - especially in an environment without any sort of memory protection...

Quote:

Impressive work on ”Ruby”!

Thanks.

Back to the editor ...

One thing we shouldn't forget is that almost all BASICs are effectively editors for text - inserting and deleting (and, worse) overwriting lines. Most BASICs will store the program text (tokenised or not) linearly with nothing much more than a simple line length of the current line that you need to manipulate to get the start of the next line - yet, they all handle insert and delete very efficiently in the same 64KB address space (often much less, who wrote large BASIC programs that filled memory up back in the day?)

Maybe we ought to study them more and take a leaf out of their books?

My own tinyBasic follows this traditional format too - to overwrite a line, I delete the current line (move memory down), then insert space for the new line (move memory up), then copy the line into the gap. A line has a line number (2 bytes), then the line length byte (one byte) then the line data... And TinyBasic does nothing more - no tokenisation, no formatting, etc. it's almost a 'perfect' line editor... Maybe I'll artificially create a 32KB program in my Ruby system and see just how long it actually takes...

It doesn't have a LIST N command (not enough room to include it, just LIST the whole program), but if it had, then it would need to do a linear search for the line - no worse than GOTO/GOSUB though...

Anyone ever use AppleWorks Writer? I never used it in anger, but the secretaries and admin assistants at the uni I was at used it way back to ptoduce some fairly large documents... I wonder if there are any technical documents about how that worked internally... Maybe we're re-inventing wheels that are already polished and perfectly circular...

-Gordon

gfoot · Post by **gfoot** » Sun Aug 31, 2025 5:01 pm

Line editors like most micro BASICs have the advantage that they only need to insert text in large chunks (a line at a time) and at a point where the user is not in the middle of typing something so can put up with a little slowdown. Early Unix editors like ed are presumably similar and the sliding gap approach would work very well there. I believe vi - though not a line editor - works in a similar way.

barnacle · Post by **barnacle** » Sun Aug 31, 2025 7:52 pm

That was my thought with NTB... I wanted to make a line-number-free version, but that required at least a line editor, and better, a character based screen editor. For the time being, I gave up until I get a system with storage together.

Neil

stefan1 · Post by **stefan1** » Mon Sep 01, 2025 4:44 am

drogon wrote:

Anyone ever use AppleWorks Writer? I never used it in anger, but the secretaries and admin assistants at the uni I was at used it way back to ptoduce some fairly large documents... I wonder if there are any technical documents about how that worked internally... Maybe we're re-inventing wheels that are already polished and perfectly circular...

I found the Speedscript source online, https://github.com/gillham/speedscript/ ... ript32r2.s

If I’m not mistaken, the entire tail of the text buffer is shifted one step on each insert. Maybe they got away with that if the buffer was small.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Sep 01, 2025 6:40 am

commodorejohn wrote:

As long as chunk size isn't too huge, defragmentation could also be an "idle task" when the editor is waiting on user input (which, in a text editor, is gonna be a fair bit of the time) without causing noticeable lag.

Yep! From a microprocessor’s perspective, even a slow one, eons elapse in between keystrokes.

Analysis over the years by people who study office efficiency (probably one of the more boring of occupations) has indicated that in typed documents, an average word’s length—no whitespace—is five characters (likely a lowball figure if you’re an Oxford English professor

). It was noted that most skilled typists are able to sustain 70-80 words per minute (WPM). Assuming the best case, 80 WPM, the sustained keystroke rate would be 400 per minute or 6.6 per second. In a 6502 system with interrupt-driven I/O, that will definitely not cause the MPU to work up a sweat, even at a sedate 1 MHz.

My POC V1.3 unit runs at 16 MHz (Ø2 cycle time of 62.5 nanoseconds) and has a 100 Hz jiffy IRQ—the I/O subsystem is the other source of IRQs. The jiffy IRQ itself doesn’t do a lot; mainly it updates two software timers—system uptime and binary sequential time, and those are updated only once per second. A sleep() function slaved to the uptime timer is the only other thing driven by the jiffy IRQ and if no sleep time has been set, that code isn’t even executed.

Given all that, I estimate the amount of “idle” time between jiffies to be around 158,000 cycles. Even after accounting for IRQs being triggered by those 6.6 keystrokes each second, there is a huge amount of processing time that is going to waste as the system awaits input. In fact, most of the waiting is done with a WAI instruction.

So the MPU isn’t even doing anything in between interrupts.

A lot of stuff can get done in an editor while the main input loop is waiting for the bloke at the keyboard to do something. If console I/O is interrupt-driven, the input loop can rigged up to do housekeeping things in between IRQs. When an IRQ hits, that would be the editor’s cue to check for a keystroke. If one is available, then process it, update the display and return to whatever task was being carried out before the IRQ. If the interrupt was not from typing, the loop can continue with defragging the text buffer, doing a timed save, checking for loose PCB connections, etc., until the next IRQ hits.

barnacle · Post by **barnacle** » Mon Sep 01, 2025 7:03 am

Sounds like you've reinvented the event loop, BDD. Though that gets potentially messy when you include external storage, depending whether or not it's blocking - on a 6502 it probably is. Which means - actually, probably means anyway - that you need at least a couple of IO-to-storage threads running as well.

Most of the time, I suspect that it would a case of:
- key was hit
- insert it in text buffer
- update screen
- add to undo buffer
but every now and then, there would be a case where you'd want to move to the next buffer, possibly retrieving it from disc.

How about a schema that, when a file is initially opened for editing expands the file to a new 'live' file for actual editing, consisting of 128 bytes of text and 128 bytes of space (so now we have the original file on disc - for 'oh dear' moments - and a working file also on disc in the same format it would be in memory. Final save would compress the working file back to text, including following any links to new text outside the existing space (tagged on the end of the file?) and putting them in the right order. Probably after renaming the original file .bak or something similar.

The in-memory space could be much reduced this way, probably a few k or so would be enough, enough for a couple of screens of text. But it does rely on a fast and efficient disc subsystem.

Neil

Neil

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Sep 01, 2025 7:17 am

stefan1 wrote:

I found the Speedscript source online...If I’m not mistaken, the entire tail of the text buffer is shifted one step on each insert. Maybe they got away with that if the buffer was small.

As I recall when I tried out Speedscript on my C-64, as I neared a full editing buffer (about 10 single-spaced pages), there was a small-but-perceptible lag from when I typed near the start of the document to when I saw the change on the screen—my recollection it was a couple of tenths of a second. That lag would have been caused by the editing buffer being shifted up in memory. As the C-64’s keyboard buffer only held 10 keystrokes, it was possible to out-type Speedscript if within the first couple of lines of a large document.

Right around the time Speedscript became available, I got my first C-128 and I purchased Precision Software’s SuperScript 128 (SS128) to use for writing reports for work. SS128 was interesting in that the program itself ran in RAM1 and the main editing buffer was in RAM0, using RAM from $01C00 to $0FEFF, nearly 58KB, which was good for 16 pages of single-spaced text. The cut-and-paste buffer was also in RAM1, right below the boot signature. The software was copy-protected and when started, automatically kicked the C-128 into FAST mode if the 40/80 key was down.

After I got my Lt. Kernal hard drive system, I decided to see about hacking SS128 to defeat the copy protection so I could run it right off the hard drive—initially, I had to load it from floppy disk. It took a while, but I finally was able to halt the program once it had been loaded and save an image to the hard disk. I soon found the code that handled the copy protection and disposed of it.

During the hacking exercise (tons of time spent in the M/L monitor), I was able to figure out how the main text buffer was managed—it used the “sliding window” approach, taking advantage of the relatively large amount of RAM. Precision Software’s buffer management-algorithm was efficient; unlike Speedscript on the C-64, I was not able to out-type SS128, even with the cursor at the first character in a 15-page document.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Sep 01, 2025 8:11 am

barnacle wrote:

Sounds like you've reinvented the event loop, BDD.

Sort of...just out-loud thinking, so to speak.

Quote:

Though that gets potentially messy when you include external storage, depending whether or not it's blocking - on a 6502 it probably is. Which means - actually, probably means anyway - that you need at least a couple of IO-to-storage threads running as well.

Realistically, you need a preemptive kernel to avoid getting bogged down by blocking mass storage. At least with SCSI, IRQs can tell the kernel when to tend to the disk and when to do something else. So that would (theoretically) take care of a non-blocking disk subsystem. Even there, the complexity can get...er...complex.

Quote:

Most of the time, I suspect that it would a case of:
- key was hit
- insert it in text buffer
- update screen
- add to undo buffer
but every now and then, there would be a case where you'd want to move to the next buffer, possibly retrieving it from disc.

The screen update is the other part where things could get bogged down. In my application, I am using some thin clients as “dumb” terminals, which means my POC unit doesn’t have the burden of managing the display hardware. All it has to do to put something on the screen is write to a UART channel—the thin client’s hardware does the laborious part. With such an arrangement, display updates happen with alacrity.

If, on the other hand, I had a video display controller to manage, then a lot of grunt work would be going on in the BIOS to determine where to write in video RAM, keep track of the cursor’s location, scroll the screen when needed (video RAM memory move), turn on or off video attributes, etc., and things would slow down.

Quote:

How about a schema that, when a file is initially opened for editing expands the file to a new 'live' file for actual editing, consisting of 128 bytes of text and 128 bytes of space (so now we have the original file on disc - for 'oh dear' moments - and a working file also on disc in the same format it would be in memory. Final save would compress the working file back to text, including following any links to new text outside the existing space (tagged on the end of the file?) and putting them in the right order. Probably after renaming the original file .bak or something similar.

That all assumes one’s 6502 contraption is running an operating system that includes efficient filesystem support. I, for one, am not yet at that point.

Quote:

The in-memory space could be much reduced this way, probably a few k or so would be enough, enough for a couple of screens of text. But it does rely on a fast and efficient disc subsystem.

Fast and efficient...hmm.

Reminds me of an old joke about traveling by train during the 19th century, especially in the USA. The railway can offer speed, safety and comfort. Pick any two...

barnacle · Post by **barnacle** » Mon Sep 01, 2025 8:56 am

Cheops' Law (regarding pyramid building): On time, on spec, on budget. Pick two.

White Flame · Post by **White Flame** » Tue Sep 02, 2025 10:09 am

stefan1 wrote:

That sounds interesting! Is your editor publicly available?

I've not finished with the implementation, so there's nothing but notes and some partial algo implementations. But feel free to incorporate the ideas.

drogon wrote:

Which is the essence of my editor prviously mentioned - a list of addresses to each line and each line is allocated from memory using malloc() in the C version and getvec() in the BCPL version.

No, this is just a concatenated stream of lines across allocated sectors, file bytestream style, without any per-line indices. Empty gaps between lines will be garbage collected and compacted. So there's really no space overhead for per-line storage besides the 1-byte EOL marker, which needs to exist anyway as a linefeed character or whatever. A malloc-based approach would add another 2 or 3 byte pointer per line, plus all the allocation table overhead. If the EOL marker contains a length, then traversing line to line is quite fast, though would limit line lengths to ~255 chars. But the main benefit is that any edit requires only local changes to 2 sectors max to take effect, without space overhead of indices. The compactor can run in human idle time, making small atomic local compactions iteratively. Contrast to BASICs that, at least on the C64, could take quite a few seconds to update a single line in a large program after pressing Return, because the whole rest of the program needs to be shifted. Gap buffers, even though they're simple, would have this same overhead for large files.

My design is a singly-linked list of sector pointers, which means only 1 byte of overhead per 255 bytes of content (0.39% overhead), but with a temporary cache indexing every 256 lines, for faster forward-only starting points. That cache can be quite flexible in how/what it contains and can be flushed or optional, and is a fixed size regardless of file size. Even having a 2-byte sector link (16MB capacity) brings the storage overhead to 0.78%. If your average line length is, say 15 or 20 characters, then individual line pointers plus their malloc structures is going to be significantly larger overhead, well into the double digits percentage.

barnacle · Post by **barnacle** » Tue Sep 02, 2025 10:41 am

So your lines are sequential in memory, allowing you to scan backwards?

Neil

White Flame · Post by **White Flame** » Tue Sep 02, 2025 5:20 pm

No, the lines could contain any value 0-255, so a backwards scan isn't intended and the tags wouldn't be unambiguous in that direction. Minimizing space overhead means singly linked and it can only traverse forward, that's why there's a cache of prior lines to retrace forward from.

barnacle · Post by **barnacle** » Tue Sep 02, 2025 6:26 pm

Ah, ok. Thanks.

No True Scotsman · Post by **No True Scotsman** » Mon Jan 26, 2026 6:56 pm

BigDumbDinosaur wrote:

gfoot wrote:

I don't remember what it is called but it's a method where characters before the cursor are stored at the start of the block and characters after the cursor are at the end of the block, with a big gap in between to allow fast editing. When the cursor moves (or at least when an edit starts) you move a chunk of data to create this situation before inserting the new text.

That was brought up a number of posts ago...the phrase “sliding window” comes to mind.

Is this what they call a piece table data structure?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Jan 26, 2026 8:59 pm

No True Scotsman wrote:

BigDumbDinosaur wrote:

gfoot wrote:

I don't remember what it is called but it's a method where characters before the cursor are stored at the start of the block and characters after the cursor are at the end of the block, with a big gap in between to allow fast editing. When the cursor moves (or at least when an edit starts) you move a chunk of data to create this situation before inserting the new text.

That was brought up a number of posts ago...the phrase “sliding window” comes to mind.

Is this what they call a piece table data structure?

Dunno...I haven’t time right now to read the entirety of the paper for context.

Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics

Re: Text Editor Shifting Mechanics