whartung wrote:
I haven't looked at how F83 FORGETs words. It hashed the words across 4 different paths. But each of those were, effectively, just 4 independent dictionaries, so the word headers were not changed at all to accommodate this technique.
Simply, you'd hash the value, decide which of the 4 lanes to pursue, and then search that lane to find your word.
Similarly, when a word is added, it picks the lane, and just appends the word to that lane.
I didn't actually change the header structure, just which part of the previous words header the link field pointed to. I'm not that familiar with F83's header structure, for all I know it might have the link fields point to link fields. For a 6502 based Forth that has the link field point to the previous word's name field, I believe it would be worth changing the system so the link field of a word points to the previous word's link field. There should be no problem doing this
with a dictionary that is hashed across 4 or more different paths (like 8 or 16) to yield even better performance.
I've already experimented with making a kernel where the dictionary is hashed across 4 paths. I used a code primitive
>VT, to vocabulary thread, to add an offset to the address of the vocabulary. Each vocabulary had 4 cells for the dictionary paths.
>VT exclusively ored the low two bytes of the count with the low two bytes of the first character of the words name, left shifted once ( double the offset, the cells were two bytes ) and added this offset to the address of the vocabulary. The find primitive,
(FIND) was used like this
Code:
CONTEXT @ >VT @ (FIND)
With the improved
(FIND), the code would look like this:
Code:
CONTEXT @ >VT (FIND)
But it gets better! When I get time, I'll take another look at this version of my Forth kernel and rewrite
>VT like so:
Code:
CODE >VT ( ADR VOC -- ADR VOC+OFFSET )
HERE 6 + JSR,
NEXT JMP,
<code to hash name and add offset of 0-3 multiplied by 2 to vocabulary address>
RTS,
Create will need to call
>VT but
(FIND) can jsr to the body of
>VT:
Code:
CODE (FIND) ( ADR VOC -- ADR2 FLAG )
' >VT @ 6 + JSR, \\ PERFORM 4 WAY HASH
<REST OF FIND>
So that
(FIND) is used like this:
Code:
CONTEXT @ (FIND) // (FIND) HASHES NAME AND SEARCHES ALONG CORRECT THREAD
But for now, it's on a 'back burner'. I didn't pursue it further because I was concerned about how much more memory was used.
FORGET became more complicated. It had to prune not just the vocabulary link and all surviving vocabularies, it had to prune each thread for each vocabulary.
CREATE only got slightly larger but the code for
WITH-WORDS, the word used by
WORDS, also got more complicated. When I use
WORDS, I like to see the words I defined displayed first, not some of the words I defined then a quarter of the words in the system, then more of the words I defined followed by another quarter of the system, etcetera.
Given the Commodore 64's slow disk drives, it just didn't seem worth it. If I ever get a chance to build an SBC and had source code sent over a serial line from the PC, it would be useful. On the other hand, the alterations to support a faster
(FIND) resulted in a smaller memory footprint and made the system feel more responsive when interactively experimenting at the console, so it should be useful on a Commodore computer or an SBC.
Cheers,
Jim
Edit: Actually, my Forth didn't have
WITH-WORDS at that time. The extra complexity in
WORDS,
SEEALL ( a word to test that
SEE works correctly for all words in the context vocabulary), and
SMUDGE-TEST ( a metacompiler word to test if there are any smudged words, and therefore wasted space, in the new kernel) motivated me to write
WITH-WORDS as a common factor for
WORDS,
SEEALL, and
SMUDGE-TEST which I then wrote for my simple linear-list implementation. In a way, experimenting with dictionary hashing helped me improve my non hashed implementation.
Cheers,
Jim