Now that the weather is finally horrible again I get to come back inside and code. And I have had an idea to speed up Dictionary search in Forth and probably save memory as well that I'd like to run by y'all.
Traditionally, the Forth dictionary on simple machines is a singly linked list. The Dictionary Pointer (DP) points to the last (or rather, most recently added) element of that list, which contains a link to the next element, and so forth (no pun intended) until we find a link that is zero, which signals the end of the list.
Code:
DP --> (W) --> (W) --> ... --> (W) --> (0000)
To see if a word is in the Dictionary, we take the (addr n) pair of the string from PARSE-NAME and pass it to FIND-NAME (see
viewtopic.php?f=9&t=4364 for details). This walks through the list, comparing the n value of the string - its length - to the length of the word of the current list element, which is found in the header. If they are the same, the string is compared character-by-character until it is clear if they are the same or different.
The problem is that this can be really slow for long lists (I think real computer scientists talk about "theta(n)" search time). Yes, you can stack the deck by putting the most common words earlier in the Dictionary, but if you're unlucky, you have to search the whole damn thing.
Excluding stuff like hash tables that are (probably) beyond the capacity of the poor 65c02, my idea is to use 16 different singly linked lists instead of one, selected by the length of the word, reduced with some cheap modulo substitute (probably by bit masking the length value). So the single-letter words would get one list, the words with two characters would get their own, etc.
Code:
DP01 --> (W01) --> (W01) --> ... --> (W01) --> (0000)
DP02 --> (W02) --> (W02) --> ... --> (W02) --> (0000)
DP03 --> (W03) --> (W03) --> ... --> (W03) --> (0000)
...
DP16 --> (W16) --> (W16) --> ... --> (W16) --> (0000)
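To make the "cheap modulo" idea concrete, here is a minimal sketch in Python of how the length could be turned into a table offset. The function name is made up for illustration; the assumptions are that each DP is a two-byte (16-bit) pointer and that masking to four bits lets a 16-character word wrap around to slot 0, so lengths 1 through 16 cover all 16 slots without a range check:

Code:
# Hypothetical helper, not the actual 65c02 code.
def dp_table_offset(length):
    # AND with $0F is the cheap stand-in for "modulo 16";
    # multiply by 2 because each DP entry is a 2-byte pointer.
    return (length & 0x0F) * 2

On the 65c02 the same thing would just be an AND #$0F followed by an ASL, which is about as cheap as address calculation gets.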
The trick here is that PARSE-NAME gives us the length of the string as n anyway for free. Therefore, we can put the DPs in a table, use that n value (doubled, since each DP is a two-byte pointer) as an offset from the beginning of said table, and get to the correct list for a word of that length very quickly. This means we only have to search the list of words with the same length. This should save a lot of time. In pseudo-code:
Code:
addr, n = parse_name
our_dp = fetch(dp_table_start + 2 * n)
while our_dp <> 0
    if string_from(our_dp) == string_at(addr) then
        yay_we_found_the_word
    else
        our_dp = next_word_from(our_dp)
oops_word_not_in_dict
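The whole scheme can be modeled in a few lines of Python to sanity-check the logic before writing any assembly. All the names here (Word, add_word, find_name, bucket) are illustrative stand-ins, not the real Forth words:

Code:
NUM_LISTS = 16

class Word:
    def __init__(self, name, link):
        self.name = name   # header stores only the characters;
        self.link = link   # the length is implied by which list we're in

def bucket(name):
    return len(name) & 0x0F          # cheap modulo via bit mask

def add_word(dp_table, name):
    i = bucket(name)
    dp_table[i] = Word(name, dp_table[i])   # new word becomes head of its list

def find_name(dp_table, name):
    w = dp_table[bucket(name)]       # only one list to walk
    while w is not None:
        if w.name == name:           # lengths already match by construction
            return w
        w = w.link
    return None                      # oops, word not in dict

For example, after add_word(dp, "swap") and add_word(dp, "drop"), both words land in the same length-4 list and the collision is handled by walking that (short) list, while find_name(dp, "rot") fails after checking only the length-3 list.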
The downside would seem to be that we need 16 * 2 = 32 bytes for the new table, and we have to limit the length of a word to 16 characters. I don't think that last restriction is really relevant in the real world.
Also, as far as I can tell from just thinking about this, multiple singly linked lists by size would eliminate the need to store the size value in the header of a word at all. So if we have, say, 100 words in the Dictionary at the beginning, we save 100 bytes.
Obviously I'll need to write some actual code to test this all. Any feedback on the general idea would be most welcome - this can't be a totally new idea after all.