Not sure where you downloaded it, but there's a chance it's a NuFX archive (a.k.a. a Shrinkit archive), which is a funky Apple II archive format (Shrinkit is the name of the application that creates and extracts these files). NuFX files usually have a .shk extension. NuFX files can be identified by their first 6 bytes: $4E $F5 $46 $E9 $6C $E5, which is the ASCII string "NuFile" with bit 7 alternating. The header for the first (archived) file starts at the 48th byte, so those 4 bytes will be $4E $F5 $46 $D8, which is the string "NuFX", again with bit 7 alternating. (The header for each archived file contains the "NuFX" ID string.) Anyway, there's a C program out there specifically designed for creating/extracting these files on non-Apple systems. However, you might be able to save yourself a lot of trouble: I already have "unshrunk" Qforth source code sitting around that I downloaded from somewhere, the Ground archive I think. If you need an exact URL, I can look it up.
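If you want to check a file yourself before going any further, here's a minimal C sketch of the ID-byte test (just the 6-byte check at the start of the file; everything past that is a job for the real tools):
Code:
#include <stdio.h>
#include <string.h>

/* "NuFile" with bit 7 alternating: the ID bytes at the very start
   of a NuFX (Shrinkit) archive.                                    */
static const unsigned char nufile_id[6] = { 0x4E, 0xF5, 0x46, 0xE9, 0x6C, 0xE5 };

int main(int argc, char **argv)
{
    unsigned char buf[6];
    FILE *fp;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        fprintf(stderr, "can't open %s\n", argv[1]);
        return 1;
    }
    if (fread(buf, 1, 6, fp) == 6 && memcmp(buf, nufile_id, 6) == 0)
        printf("%s looks like a NuFX (Shrinkit) archive\n", argv[1]);
    else
        printf("%s doesn't look like a NuFX archive\n", argv[1]);
    fclose(fp);
    return 0;
}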
Anyway, if you don't want to go wading through all the Qforth source code, here's a brief description of hashing (at least as it applies to Forth dictionary searching), and how Qforth does it. The grouping by name length that you have proposed above is a form of hashing. The item you are searching for is called the key. In a Forth dictionary lookup, the key is the name of the word. The idea is to divide the dictionary up into smaller pieces. Each piece is referenced by a number. You compute this number using the hash function, which is a function of the key; in other words, f(key) is your hash function. In your group-by-name-length method, f(name) -- remember, in this instance, the name of the word is the key -- is:
Code:
f(name) = length(name)   if length(name) < 10
        = 10             if length(name) >= 10
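As a C sketch of that same function (the 10-way split is from your description; everything else is just illustration):
Code:
#include <string.h>

/* Group-by-name-length hash: each length 1..9 gets its own piece,
   and everything 10 characters or longer lands in piece 10.        */
static int length_hash(const char *name)
{
    size_t len = strlen(name);
    return (len < 10) ? (int)len : 10;
}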
Ideally, you want to choose a hash function that distributes the words as evenly as possible between the various pieces. If you have N total words and K pieces, then each piece need contain only about N/K words. In other words, f(key) returns only K distinct values. In your example, K=10. You also want a hash function that can be computed quickly, since it doesn't do much good for the search to be faster if the preparation time (i.e. computing the hash function) eats up the savings. There is something of an art to choosing a hash function. Length, while easy to compute, isn't really ideal, since there are (well, I haven't counted them, but there probably are) more standard 3-character words than standard 9-character words, for instance. In Qforth, K=128, and the hash function is:
f(name) = (char1 XOR char2 XOR ... XOR charL) AND $7F
where L is the number of characters in the word. Qforth then uses that number to look up the address of the appropriate piece in an array. However, if you use a word like MARKER you will have to save the current lengths (or ends) of all K pieces so that you can dispose of the heads. (You could generate a new array when you get to MARKER, but that's also space consuming.) An alternative is to rebuild all the pieces when it's time to dispose of the heads, which is time consuming. Qforth, for example, builds each of the pieces at cold start, which does add to the initialization time.
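Here's roughly what that looks like in C; the head layout and the piece[] array are just a sketch, not Qforth's actual data structures:
Code:
#include <string.h>

#define K 128                    /* number of pieces (Qforth uses 128)  */

struct head {
    const char  *name;           /* word name                           */
    struct head *next;           /* next head in the same piece         */
    /* ... code field, etc. ...                                         */
};

static struct head *piece[K];    /* array of list heads, one per piece  */

/* XOR all the characters of the name together, keep the low 7 bits. */
static unsigned qforth_hash(const char *name)
{
    unsigned h = 0;
    while (*name)
        h ^= (unsigned char)*name++;
    return h & 0x7F;
}

/* Search only the piece selected by the hash of the name. */
static struct head *find_word(const char *name)
{
    struct head *p;
    for (p = piece[qforth_hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
In this sketch, the bookkeeping for MARKER amounts to snapshotting all K entries of piece[] (or their ends) so the newer heads can be unlinked later.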
One other reason why you may see less speed-up than you hoped (with any method), particularly in Forth, is that many definitions use the last few words just defined, and those will be found almost immediately even if the words are merely stored in a single linked list, as in FIG-Forth.
Another option with separated heads is to group by name length and then arrange the heads of the words in alphabetical order. Then each name comparison rules out about half of the remaining candidates. (You do have to think it through to make sure that your search implementation works correctly.) So it will take at most about log2(N) comparisons to find a word. Very fast, and there isn't much of a space penalty. Creating a new head (as well as deleting one) can be slow, though, since a lot of bytes may need to be moved to insert the new head in alphabetical order. So this method isn't so good for user-defined words, but it works well for words in the kernel.
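A C sketch of the lookup within one group, assuming its heads have been collected into an alphabetically sorted array of name pointers (the array layout is mine, purely for illustration):
Code:
#include <string.h>

/* Binary search over an alphabetically sorted group of names.
   Each comparison rules out about half of the remaining group,
   so at most about log2(n) comparisons are needed.             */
static int find_sorted(const char *const names[], int n, const char *key)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, names[mid]);
        if (cmp == 0)
            return mid;          /* found: index into the group   */
        else if (cmp < 0)
            hi = mid - 1;        /* key sorts before names[mid]   */
        else
            lo = mid + 1;        /* key sorts after names[mid]    */
    }
    return -1;                   /* not in this group             */
}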
A third option is to use an AVL tree. There's an article about AVL trees in Forth Dimensions V2#4 starting on p. 96 (it starts on p. 112 of the V2 .pdf file available at www.forth.org) that includes some 6800 assembly and Forth code for searching, inserting, and deleting. AVL trees are also described (among other places) in Knuth's "The Art of Computer Programming", Volume 3, Section 6.2.3.
Briefly, the idea is similar to pre-alphabetizing the dictionary, but instead of arranging the names in a particular order, you simply add 2 pointers to EACH head. One points to words that are "lower" (i.e. alphabetically) and the other points to words that are "higher" (alphabetically). Adding to the tree is easy; the trick is to keep the tree balanced, so that if you have N heads, the height stays close to the log2(N) maximum of an optimal tree. For example, the optimal arrangement of the words A through G is:
Code:
      D
     / \
    /   \
   B     F
  / \   / \
 A   C E   G
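In C terms, each head might carry something like the following; the field names are mine, and the one-byte balance factor discussed further below is only touched when re-balancing, never when searching:
Code:
#include <string.h>

struct avl_head {
    const char      *name;     /* word name                                   */
    struct avl_head *lower;    /* words that sort alphabetically before this  */
    struct avl_head *higher;   /* words that sort alphabetically after this   */
    signed char      balance;  /* -1, 0, or +1; used only when re-balancing   */
    /* ... code field, etc. ...                                               */
};

/* Plain binary-tree search: the balance factor plays no part here. */
static struct avl_head *avl_find(struct avl_head *node, const char *key)
{
    while (node != NULL) {
        int cmp = strcmp(key, node->name);
        if (cmp == 0)
            return node;
        node = (cmp < 0) ? node->lower : node->higher;
    }
    return NULL;
}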
It turns out that maintaining an optimal tree when adding heads is difficult. An AVL tree won't always be optimal (it depends on the order in which you add the heads), but its worst-case height is less than 1.5*log2(N), which is still very efficient. (There's a more precise formula for the worst-case height, but it involves log base sqrt(1.25)+0.5. Really.) Insertion (including re-balancing the tree) is very fast and easy; no more than a single re-balancing operation is ever needed. Deleting heads is a bit more involved, but at most log2(N) re-balancing operations are needed, and even that isn't bad.
It only takes a single pointer to define a new tree, so MARKER can be easily implemented also. In that case, you would search the new tree first, and the old tree second, which would take 1.5*log2(N2)+1.5*log2(N1) comparisons, rather than the 1.5*log2(N2+N1) comparisons that a single tree would take, but it's still very fast. It's even possible to handle redefined words without overwriting the old head.
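Continuing the sketch above, the two-tree search order is just this (names hypothetical):
Code:
/* Search the tree built since MARKER first, then the older tree:
   about 1.5*log2(N2) + 1.5*log2(N1) comparisons in the worst case. */
static struct avl_head *find_with_marker(struct avl_head *new_root,
                                         struct avl_head *old_root,
                                         const char *key)
{
    struct avl_head *hit = avl_find(new_root, key);
    return (hit != NULL) ? hit : avl_find(old_root, key);
}
Since the newer tree is searched first, a redefined word naturally shadows the old head without overwriting it.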
As I alluded to above, the biggest issue is that it requires 2 pointers plus 1 byte (a balance factor) PER HEAD. So for 2-byte pointers, that's 5*N additional bytes of head space required. The balance factor isn't used when searching for words; it's only used to re-balance the tree when adding or deleting heads. (There are only three possible values for the balance factor, so it could conceivably be stored in two bits fairly easily, but masking off those two bits -- wherever they may be packed -- will take additional time.)
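For what it's worth, a sketch of that two-bit packing (storing -1/0/+1 as 0/1/2 in the low bits of some flags byte; the layout is hypothetical) shows the extra mask-and-adjust step:
Code:
#define BAL_MASK 0x03            /* low two bits hold the balance field */

/* Unpack: 0/1/2 in the field maps back to a balance of -1/0/+1. */
static int get_balance(unsigned char flags)
{
    return (int)(flags & BAL_MASK) - 1;
}

/* Pack: clear the field, then store balance+1 (0, 1, or 2). */
static unsigned char set_balance(unsigned char flags, int balance)
{
    return (unsigned char)((flags & ~BAL_MASK) | (unsigned)(balance + 1));
}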