Not sure where you downloaded it, but there's a chance it's a NuFX archive (a.k.a. a Shrinkit archive), which is a funky Apple II archive format (Shrinkit is the name of the application that creates and extracts these files). NuFX files usually have a .shk extension. NuFX files can be identified by their first 6 bytes: $4E $F5 $46 $E9 $6C $E5, which is the ASCII string "NuFile" with bit 7 alternating. The header for the first (archived) file starts at the 48th byte, so those 4 bytes will be $4E $F5 $46 $D8, which is the string "NuFX", again with bit 7 alternating. (The header for each archived file contains the "NuFX" ID string.) Anyway, there's a C program out there specifically designed for creating/extracting these files on non-Apple systems. However, you might be able to save yourself a lot of trouble: I already have "unshrunk" Qforth source code sitting around that I downloaded from somewhere, the Ground archive I think. If you need an exact URL, I can look it up.
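If you want to check a file yourself before going any further, here's a minimal C sketch of the ID-byte test (just the 6-byte check at the start of the file; everything past that is a job for the real tools):
Code:
#include <stdio.h>
#include <string.h>

/* "NuFile" with bit 7 alternating: the ID bytes at the very start
   of a NuFX (Shrinkit) archive.                                    */
static const unsigned char nufile_id[6] = { 0x4E, 0xF5, 0x46, 0xE9, 0x6C, 0xE5 };

int main(int argc, char **argv)
{
    unsigned char buf[6];
    FILE *fp;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        fprintf(stderr, "can't open %s\n", argv[1]);
        return 1;
    }
    if (fread(buf, 1, 6, fp) == 6 && memcmp(buf, nufile_id, 6) == 0)
        printf("%s looks like a NuFX (Shrinkit) archive\n", argv[1]);
    else
        printf("%s doesn't look like a NuFX archive\n", argv[1]);
    fclose(fp);
    return 0;
}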
Anyway, if you don't want to go wading through all the Qforth source code, here's a brief description of hashing (at least as it applies to Forth dictionary searching), and how Qforth does it. The grouping by name length that you have proposed above is a form of hashing. The item you are searching for is called the key. In a Forth dictionary lookup, the key is the name of the word. The idea is to divide the dictionary up into smaller pieces. Each piece is referenced by a number. You compute this number using the hash function, which is a function of the key; in other words, f(key) is your hash function. In your group-by-name-length method, f(name) -- remember, in this instance, the name of the word is the key -- is:
Code:
f(name) = length(name)   if length(name) < 10
        = 10             if length(name) >= 10
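As a C sketch of that same function (the 10-way split is from your description; everything else is just illustration):
Code:
#include <string.h>

/* Group-by-name-length hash: each length 1..9 gets its own piece,
   and everything 10 characters or longer lands in piece 10.        */
static int length_hash(const char *name)
{
    size_t len = strlen(name);
    return (len < 10) ? (int)len : 10;
}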
Ideally, you want to choose a hash function that distributes the words as evenly as possible between the various pieces. If you have N total words and K pieces, then each piece need contain only about N/K words. In other words, f(key) returns only K distinct values. In your example, K=10. You also want a hash function that can be computed quickly, since it doesn't do much good for the search to be faster if the preparation time (i.e. computing the hash function) eats up the savings. There is something of an art to choosing a hash function. Length, while easy to compute, isn't really ideal, since there are (well, I haven't counted them, but there probably are) more standard 3-character words than standard 9-character words, for instance. In Qforth, K=128, and the hash function is:
f(name) = (char1 XOR char2 XOR ... XOR charL) AND $7F
where L is the number of characters in the word. Qforth then uses that number to look up the address of the appropriate piece in an array. However, if you use a word like MARKER you will have to save the current lengths (or ends) of all K pieces so that you can dispose of the heads. (You could generate a new array when you get to MARKER, but that's also space consuming.) An alternative is to rebuild all the pieces when it's time to dispose of the heads, which is time consuming. Qforth, for example, builds each of the pieces at cold start, which does add to the initialization time.
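Here's roughly what that looks like in C; the head layout and the piece[] array are just a sketch, not Qforth's actual data structures:
Code:
#include <string.h>

#define K 128                    /* number of pieces (Qforth uses 128)  */

struct head {
    const char  *name;           /* word name                           */
    struct head *next;           /* next head in the same piece         */
    /* ... code field, etc. ...                                         */
};

static struct head *piece[K];    /* array of list heads, one per piece  */

/* XOR all the characters of the name together, keep the low 7 bits. */
static unsigned qforth_hash(const char *name)
{
    unsigned h = 0;
    while (*name)
        h ^= (unsigned char)*name++;
    return h & 0x7F;
}

/* Search only the piece selected by the hash of the name. */
static struct head *find_word(const char *name)
{
    struct head *p;
    for (p = piece[qforth_hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
In this sketch, the bookkeeping for MARKER amounts to snapshotting all K entries of piece[] (or their ends) so the newer heads can be unlinked later.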
One other reason why you may see less speed-up than you hoped (with any method), particularly in Forth, is that many definitions use the last few words just defined, and those will be found almost immediately even if the words are merely stored in a single linked list, as in FIG-Forth.
Another option with separated heads is to group by name length and then arrange the heads of the words in alphabetical order. Then each name comparison rules out about half of the remaining candidates. (You do have to think it through to make sure that your search implementation works correctly.) So it will take at most about log2(N) comparisons to find a word. Very fast, and there isn't much of a space penalty. Creating a new head (as well as deleting one) can be slow, though, since a lot of bytes may need to be moved to insert the new head in alphabetical order. So this method isn't so good for user-defined words, but it works well for words in the kernel.
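A C sketch of the lookup within one group, assuming its heads have been collected into an alphabetically sorted array of name pointers (the array layout is mine, purely for illustration):
Code:
#include <string.h>

/* Binary search over an alphabetically sorted group of names.
   Each comparison rules out about half of the remaining group,
   so at most about log2(n) comparisons are needed.             */
static int find_sorted(const char *const names[], int n, const char *key)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, names[mid]);
        if (cmp == 0)
            return mid;          /* found: index into the group   */
        else if (cmp < 0)
            hi = mid - 1;        /* key sorts before names[mid]   */
        else
            lo = mid + 1;        /* key sorts after names[mid]    */
    }
    return -1;                   /* not in this group             */
}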
A third option is to use an AVL tree. There's an article about AVL trees in Forth Dimensions V2#4 starting on p. 96 (it starts on p. 112 of the V2 .pdf file available at www.forth.org) that includes some 6800 assembly and Forth code for searching, inserting, and deleting. AVL trees are also described (among other places) in Knuth's "The Art of Computer Programming", Volume 3, Section 6.2.3.
Briefly, the idea is similar to pre-alphabetizing the dictionary, but instead of arranging the names in a particular order, you simply add 2 pointers to EACH head. One points to words that are "lower" (i.e. alphabetically) and the other points to words that are "higher" (alphabetically). Adding to the tree is easy; the trick is to keep the tree balanced, so that if you have N heads, the height stays close to the log2(N) maximum of an optimal tree. For example, the optimal arrangement of the words A through G is:
Code:
      D
     / \
    /   \
   B     F
  / \   / \
 A   C E   G
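In C terms, each head might carry something like the following; the field names are mine, and the one-byte balance factor discussed further below is only touched when re-balancing, never when searching:
Code:
#include <string.h>

struct avl_head {
    const char      *name;     /* word name                                   */
    struct avl_head *lower;    /* words that sort alphabetically before this  */
    struct avl_head *higher;   /* words that sort alphabetically after this   */
    signed char      balance;  /* -1, 0, or +1; used only when re-balancing   */
    /* ... code field, etc. ...                                               */
};

/* Plain binary-tree search: the balance factor plays no part here. */
static struct avl_head *avl_find(struct avl_head *node, const char *key)
{
    while (node != NULL) {
        int cmp = strcmp(key, node->name);
        if (cmp == 0)
            return node;
        node = (cmp < 0) ? node->lower : node->higher;
    }
    return NULL;
}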
It turns out that maintaining an optimal tree when adding heads is difficult. An AVL tree won't always be optimal (it depends on the order in which you add the heads), but its worst-case height is less than 1.5*log2(N), which is still very efficient. (There's a more precise formula for the worst-case height, but it involves log base sqrt(1.25)+0.5. Really.) Insertion (including re-balancing the tree) is very fast and easy; no more than a single re-balancing operation is ever needed. Deleting heads is a bit more involved, but at most log2(N) re-balancing operations are needed, and even that isn't bad.
It only takes a single pointer to define a new tree, so MARKER can be easily implemented also. In that case, you would search the new tree first, and the old tree second, which would take 1.5*log2(N2)+1.5*log2(N1) comparisons, rather than the 1.5*log2(N2+N1) comparisons that a single tree would take, but it's still very fast. It's even possible to handle redefined words without overwriting the old head.
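Continuing the sketch above, the two-tree search order is just this (names hypothetical):
Code:
/* Search the tree built since MARKER first, then the older tree:
   about 1.5*log2(N2) + 1.5*log2(N1) comparisons in the worst case. */
static struct avl_head *find_with_marker(struct avl_head *new_root,
                                         struct avl_head *old_root,
                                         const char *key)
{
    struct avl_head *hit = avl_find(new_root, key);
    return (hit != NULL) ? hit : avl_find(old_root, key);
}
Since the newer tree is searched first, a redefined word naturally shadows the old head without overwriting it.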
As I alluded to above, the biggest issue is that it requires 2 pointers plus 1 byte (a balance factor) PER HEAD. So for 2-byte pointers, that's 5*N additional bytes of head space required. The balance factor isn't used when searching for words; it's only used to re-balance the tree when adding or deleting heads. (There are only three possible values for the balance factor, so it could conceivably be stored in two bits fairly easily, but masking off those two bits -- wherever they may be packed -- will take additional time.)
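For what it's worth, a sketch of that two-bit packing (storing -1/0/+1 as 0/1/2 in the low bits of some flags byte; the layout is hypothetical) shows the extra mask-and-adjust step:
Code:
#define BAL_MASK 0x03            /* low two bits hold the balance field */

/* Unpack: 0/1/2 in the field maps back to a balance of -1/0/+1. */
static int get_balance(unsigned char flags)
{
    return (int)(flags & BAL_MASK) - 1;
}

/* Pack: clear the field, then store balance+1 (0, 1, or 2). */
static unsigned char set_balance(unsigned char flags, int balance)
{
    return (unsigned char)((flags & ~BAL_MASK) | (unsigned)(balance + 1));
}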