ChuckT wrote:
I think that a more modern system could be devised where the variable system acts more like a disk drive in RAM.
I kept running out of variable space on the Commodore 64.
The C-64 sandwiched variables between the end of program text and the start of the BASIC interpreter ROM, which is at $A000. Not only was variable storage constrained, garbage collection was painfully slow if a lot of string manipulation occurred. Exacerbating the situation was the reduction in BASIC RAM that occurred if the fake RS-232 functions in the kernel were used, as the RxD and TxD buffers were set up right below the interpreter ROM.
Interpreted business BASICs (BB), such as BBx and Thoroughbred, use an indexing system for variables that does have some resemblance to a filesystem on disk. The index is actually part of the program, that is, stored as part of the program text, in the form of a symbol table (in fact, Thoroughbred has a function that can examine the symbol table of a disk-resident program). The symbol table, in which variable names are lexically sorted, is generated when a program is saved, and thus is loaded into memory when the program is run. Hence when a variable reference is made, the interpreter runtime engine has only minimal work to do to find the variable and access the data that it contains. A simple binary search of the symbol table is all it takes to find a variable descriptor.
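To make the lookup concrete, here is a minimal sketch in Python of a binary search over a lexically sorted symbol table. The table contents and the find_descriptor name are made up for illustration, and the "descriptor" is reduced to a slot number; the actual BBx and Thoroughbred table layouts are not described here.

```python
from bisect import bisect_left

# Hypothetical symbol table: variable names kept in lexical order, as the
# interpreter would find them after loading a saved program.
SYMBOLS = sorted(["CUSTNAME$", "BALANCE", "I", "INVOICE_TOTAL", "X$"])

def find_descriptor(name):
    """Binary-search the sorted names; return the slot index or None."""
    key = name.upper()                 # variable names are case-insensitive
    pos = bisect_left(SYMBOLS, key)    # O(log n) comparisons
    if pos < len(SYMBOLS) and SYMBOLS[pos] == key:
        return pos
    return None
```

With five names the search needs at most three comparisons, and the count grows only logarithmically as the table gets larger.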
The BB indexing method, in addition to improving execution speed, preserves variables when program chaining is used in a large application, a common technique in BB software. Indexing also allows arbitrarily long variable names, up to 33 characters in the case of Thoroughbred. Using a 3-for-2 encoding scheme, only 22 bytes are needed to store the longest variable name (variable names are case-insensitive).
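As a rough illustration of how a 3-for-2 scheme can work, the Python sketch below packs three characters into one 16-bit word using a 40-symbol alphabet (40 × 40 × 40 = 64000, which fits in two bytes). The alphabet, the byte order, and the function names are assumptions made for the example; I am not claiming this is the exact encoding Thoroughbred uses.

```python
# Assumed 40-symbol alphabet; index 0 (space) doubles as padding.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$%_"

def pack_name(name):
    """Pack a variable name, three characters per two bytes."""
    text = name.upper()                    # names are case-insensitive
    text += " " * (-len(text) % 3)         # pad to a multiple of three
    out = bytearray()
    for i in range(0, len(text), 3):
        a, b, c = (ALPHABET.index(ch) for ch in text[i:i + 3])
        word = (a * 40 + b) * 40 + c       # at most 63999, fits in 16 bits
        out += word.to_bytes(2, "big")
    return bytes(out)

def unpack_name(data):
    """Reverse pack_name, dropping the trailing padding."""
    chars = []
    for i in range(0, len(data), 2):
        word = int.from_bytes(data[i:i + 2], "big")
        chars.append(ALPHABET[word // 1600]
                     + ALPHABET[word // 40 % 40]
                     + ALPHABET[word % 40])
    return "".join(chars).rstrip(" ")
```

Under these assumptions a 33-character name packs into exactly 22 bytes, which matches the figure above.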
Contrast this with the method used in Microsoft BASIC, in which variable descriptor information is not part of the program and must be recreated each time the program is run. Also, as descriptors are stored in the order in which they are created, a linear search of descriptor space is required to locate a variable, which, incidentally, explains the two-character limit for variable names found in older versions of MS BASIC. The linear search characteristic also explains why MS BASIC programs will run faster when the most frequently used variables are declared first.
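The effect of declaration order is easy to see in a small Python sketch. The descriptor list and the count_probes helper are hypothetical and stand in for the real descriptor area, which holds more than just names.

```python
def count_probes(descriptors, name):
    """Return how many descriptors a linear scan examines to find name."""
    for probes, entry in enumerate(descriptors, start=1):
        if entry == name:
            return probes
    return None  # not found; a real interpreter would create the variable

# Descriptors sit in creation order, so a variable first used late in the
# program costs more on every later reference.
late = ["A$", "B", "C", "N"]
early = ["N", "A$", "B", "C"]
```

Here count_probes(late, "N") examines four descriptors while count_probes(early, "N") examines one, and that difference is paid on every reference inside a loop.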
In order for the BB indexing methodology to work, memory assigned to variable content storage obviously has to be independent of program storage, which is a feature of BASIC 7.0 on the Commodore 128. The C-128 does it by having separate 64K RAM banks for programs and variables, and uses a set of cross-bank transfer subroutines to read and write variables. As the transfer functions have to change the memory map twice for each byte retrieved or written, variable access tends to be slow, even in 2 MHz mode. When the linear search of variable descriptors is taken into account, it's easy to see why C-128 BASIC programs benefited so much from optimizing compilers like Blitz-128. The C-64 didn't have to constantly remap the system to access variables, so execution was as fast as or faster than with the C-128.
I have been thinking about how to adapt Lee's EhBASIC to my next-generation POC computer. The memory mapping scheme I envision for that unit produces 48K banks (up to 256 of them if a full complement of RAM is installed) that are available for programs and data. I could conceivably add code to EhBASIC that would implement a cross-bank variable storage function, with the kernel doling out storage as needed. At 20 MHz, the 65C816 would handle the transfers with alacrity, thus avoiding the principal performance bottleneck seen with BASIC in the C-128.
However, large amounts of variable storage are a hallmark of a BASIC environment in which limited mass storage facilities are available. All of the systems for which BB interpreters have been developed include ample mass storage, and the language implementation includes indexing methods that allow disk-resident files to be essentially treated as large variable arrays. Hence, the traditional methods of variable storage in, say, MS BASIC, are less important: a BB programmer is not likely to set up a 1000-element string variable array in RAM and write code to sort and search it when s/he can use a disk-resident B-tree index and some simple BASIC statements to accomplish the same thing.
As I have the hardware needed to implement SCSI-2 on my next-generation POC, I may try to devise an extension to EhBASIC to implement the keyed-index file functions of BB. I do have some experience with implementing a B-tree in the C-128 environment on the Lt. Kernal hard drive subsystem (along with a matching record-oriented random access data file), so at least I wouldn't be starting from a point of total ignorance. Yet another thing to add to the "bucket list"...