6502.org

Posted: **Fri Aug 17, 2018 11:20 pm**

whartung wrote:

I haven't looked at how F83 FORGETs words. It hashed the words across 4 different paths. But each of those were, effectively, just 4 independent dictionaries, so the word headers were not changed at all to accommodate this technique.

Simply, you'd hash the value, decide which of the 4 lanes to pursue, and then search that lane to find your word.

Similarly, when a word is added, it picks the lane, and just appends the word to that lane.

I didn't actually change the header structure, just which part of the previous words header the link field pointed to. I'm not that familiar with F83's header structure, for all I know it might have the link fields point to link fields. For a 6502 based Forth that has the link field point to the previous word's name field, I believe it would be worth changing the system so the link field of a word points to the previous word's link field. There should be no problem doing this with a dictionary that is hashed across 4 or more different paths (like 8 or 16) to yield even better performance.
I've already experimented with making a kernel where the dictionary is hashed across 4 paths. I used a code primitive >VT, to vocabulary thread, to add an offset to the address of the vocabulary. Each vocabulary had 4 cells for the dictionary paths. >VT exclusively ored the low two bytes of the count with the low two bytes of the first character of the words name, left shifted once ( double the offset, the cells were two bytes ) and added this offset to the address of the vocabulary. The find primitive, (FIND) was used like this

Code: Select all

CONTEXT @ >VT @ (FIND)

With the improved (FIND), the code would look like this:

Code: Select all

CONTEXT @ >VT (FIND)

But it gets better! When I get time, I'll take another look at this version of my Forth kernel and rewrite >VT like so:

Code: Select all

CODE >VT ( ADR VOC -- ADR VOC+OFFSET )
   HERE 6 + JSR,
   NEXT JMP,
   <code to hash name and add offset of 0-3 multiplied by 2 to vocabulary address>
   RTS,

Create will need to call >VT but (FIND) can jsr to the body of >VT:

Code: Select all

CODE (FIND)  ( ADR VOC -- ADR2 FLAG )
   ' >VT @ 6 + JSR,  \\ PERFORM 4 WAY HASH
   <REST OF FIND>

So that (FIND) is used like this:

Code: Select all

CONTEXT @ (FIND)  // (FIND) HASHES NAME AND SEARCHES ALONG CORRECT THREAD

But for now, it's on a 'back burner'. I didn't pursue it further because I was concerned about how much more memory was used. FORGET became more complicated. It had to prune not just the vocabulary link and all surviving vocabularies, it had to prune each thread for each vocabulary. CREATE only got slightly larger but the code for WITH-WORDS, the word used by WORDS, also got more complicated. When I use WORDS, I like to see the words I defined displayed first, not some of the words I defined then a quarter of the words in the system, then more of the words I defined followed by another quarter of the system, etcetera.
Given the Commodore 64's slow disk drives, it just didn't seem worth it. If I ever get a chance to build an SBC and had source code sent over a serial line from the PC, it would be useful. On the other hand, the alterations to support a faster (FIND) resulted in a smaller memory footprint and made the system feel more responsive when interactively experimenting at the console, so it should be useful on a Commodore computer or an SBC.
Cheers,
Jim

Edit: Actually, my Forth didn't have WITH-WORDS at that time. The extra complexity in WORDS, SEEALL ( a word to test that SEE works correctly for all words in the context vocabulary), and SMUDGE-TEST ( a metacompiler word to test if there are any smudged words, and therefore wasted space, in the new kernel) motivated me to write WITH-WORDS as a common factor for WORDS, SEEALL, and SMUDGE-TEST which I then wrote for my simple linear-list implementation. In a way, experimenting with dictionary hashing helped me improve my non hashed implementation.

Cheers,
Jim

Posted: **Fri Aug 17, 2018 11:55 pm**

chitselb wrote:

In PETTIL I separated the header from the code entirely, putting names and CFAs in a symbol table. I also divided the dictionary in half with anything that touches the symbol table in the upper half (interpret, editor, assembler, compiler...). The lower half contains a 'core' cadre of words, and any user code that gets compiled. So when all the coding is said and done, the symbol table and developer environment can go away and free up 10K or so.

I tried something over twenty years ago that you might find useful, a turnkey generator. Sadly, we moved and my project got put on a 'back burner' and never got looked at again. My decompiler decompiles high level as well as low level code. What I was working on was a decompiler that didn't display the code, it recursively decompiled the high level code until it got down to assembly code. it even followed jumps outside the assembly code.
It would reconstruct the assembly code in virtual memory ( using Forth's blocks with block numbers so high that the Ram Expansion Unit was used ) and reconstruct the high level code on top of that. There were regions it would not decompile but just use the address as is, such as the Commodore 64 kernel.
I only got as far as testing a simple application before I had to set this project aside. I still had not gotten around to figuring out how to determine how large the parameter field was for a CREATE DOES> or a CREATE ;CODE word and I'm sure there were other issues. One thing I recall is that the data stack was too small. I had to include code to monitor the stack size and move some of it to and from a large buffer, complete with hysteresis. If it was three fourths full, the lower half was buffered and that remaining quarter was moved down, if it was one quarter full and there was stack data in the buffer ( it had a pointer ) that quarter was moved to the upper half ( or by an amount equal to what was in the buffer if it was less that half a stacks worth ) and the lower half read in from the buffer. It was a rather complicated and slow process, but it only needed done once to produce a program that didn't have any of the code in the system that was not needed, and it didn't have any headers, just the code needed by the high level word that was turnkey compiled.
Hopefully, this will suggest some ways to strip out unneeded code from your completed project.
I hope you succeed,
Jim

Edit: I'm sorry, but I will not be able to provide code for this anytime soon. I no longer have a C64 ( I use VICE, the C64 simulator on my linux box) and I don't have any of my C64 source files from then.

Cheers,
Jim

6502.org

dictionary hashing

Re: dictionary hashing

Re: dictionary hashing