6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Mon May 13, 2024 3:32 am

All times are UTC




Post new topic Reply to topic  [ 32 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
 Post subject:
PostPosted: Fri Mar 25, 2005 11:11 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California
Hmmm... I think that would foul things up for FORGET or MARKER too. Still, let me know if I'm misunderstanding your idea.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sat Mar 26, 2005 5:06 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
Hmmm... I think that would foul things up for FORGET or MARKER too. Still, let me know if I'm misunderstanding your idea.


Well, obviously, any modifications you make to the dictionary search structure, you'll want to suitably modify FORGET and MARKER too, since it is an intrinsic "method" of the database. Remember: when adding any facility to a system, you always want to CYA. And to do that with information services, you need to think of the following triad: ADD, EDIT, DELETE. :, DEFER, CODE, and CREATE obviously are good ad ADDing things. FORGET/MARKER at DELETE. IS (if DEFER is supported) constitutes the EDIT capability. Some Forths even have a PATCH capability for words which were not defined to be revectorable.

Anyway, for any given information service, ADD, EDIT, and DELETE all need to know the internal structure of the database they're adding to, editing, or removing from, and all need to know how to search for their data. Otherwise, they couldn't do their task. Hence, if you make changes to how things are added or edited, you can bet that you're going to need to adjust the deletion procedures too.

In other words, what I'm trying to say is, "expect it, and prepare for it." :)

I'm actually undergoing this right now with my own FTS/Forth (a native-booting Forth for x86 PCs!!), being based on RetroForth 8 running under the L4 microkernel. Well, I've had to rewrite pretty close to 66% of the code, because of all the dependencies the code has on the vocabulary and how it handled user input. (When you think about it, there are only three major components to any Forth system: keyboard input (which I'm going to rewrite shortly), video output (already rewritten), and dictionary database (which I've rewritten about half the code for). So yeah, I'm easily approaching a 66% rewrite of the code. :-)

I'm going to end up rewriting everything again soon enough anyway, but right now I'm concentrating getting as much of the original code running as possible. Believe it or not, I actually have about four people waiting for its completion, including (ironically) the author of RF8. :-)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Mar 28, 2005 4:26 am 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
GARTHWILSON wrote:
(This would of course mean the headers all have to be in RAM, not ROM.)


You don't have to put the whole head in RAM, just the link field. For example, assuming that A, B, C, and D, were defined in that order:

Code:
       ORG TOP_OF_RAM-NUMBER_OF_HEADS*4
LINK4: DW  LINK3,D_HEAD
LINK3: DW  LINK2,C_HEAD
LINK2: DW  LINK1,B_HEAD
LINK1: DW  END_OF_LINKS,A_HEAD


In other words, link fields build downward (toward lower addresses) in RAM. You can use MVN or MVP to set up the link fields at cold start. Notice that it doesn't matter whether the heads are ROM or RAM. The head and definition of say : NIP SWAP DROP ; might look something like:

Code:
NIP_HEAD: DB  3,"NIP"                 ; name length, name
NIP:      DW  DO_COLON,SWAP,DROP,EXIT


To handle FORGET or MARKER, just reconstruct the links, which can be done several ways. It may not be particularly fast, but FIND is used far more often than MARKER, so you will usually win big even if FIND is only slightly faster and MARKER is really slow.

Anyway, having the entire dictionary cached does have some interesting properties.

1. You don't have to decide how big to make the cache, or whether you would want to allow it to be resized.

2. FIND will be a smaller, simpler (but not necessarily faster) routine, since it only has to search the cache, not the cache then the dictionary.

3. FIND doesn't have to compare the same word twice. For example, if A is a recently defined word and is in the cache, and B is not recent and not in the cache, then when using a separate cache and searching for B, FIND will usually compare B to A twice, once when searching the cache and once when searching the dictionary.

4. Words that are used occasionally (i.e. they wouldn't usually be found in a separate cache), would not necessarily fall all the way to the end (i.e. last word found) of the dictionary, so you might be able to do without hashing. I should point out that the use of a cache does not depend on any particular dictionary implementation (hashing, linked lists, etc.).

So there are some trade-offs here, but it's an interesting approach.


Top
 Profile  
Reply with quote  
 Post subject: Re: Perfect hash
PostPosted: Fri Jul 22, 2005 1:21 am 
Offline

Joined: Wed Jul 20, 2005 11:08 pm
Posts: 53
Location: Hawaii
TMorita wrote:
It's called a "perfect hash".

It's pretty impossible to have a perfect hash function. Considering that on the 6502, we can store 256 diffrent values. ASCII is an 7-bit code. 128/8=16. According to The Art Of Computer Programming, Vol 3, there are 41e31 approx. 10e50 possible functions from a 31-element set into a 41-element set, which is less then 128. 41*40*...*11=41!/10! approx 10e43 are perfect. Factor this to the fact that the 6502 is 8 bits and there is no possible way to get a perfect hash (with a dictionary that has more then 256 (65536 on 816) elements). Even on the 65C816.

_________________
Sam

---
"OK, let's see, A0 on the 6502 goes to the ROM. Now where was that reset vector?"


Last edited by asmlang_6 on Sun Aug 21, 2005 4:52 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject: Re: Perfect hash
PostPosted: Fri Jul 22, 2005 3:06 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
asmlang_6 wrote:
TMorita wrote:
It's called a "perfect hash".

It's pretty impossible to have a perfect hash function. Considering that on the 6502, we can store 256 diffrent values. ASCII is an 7-bit code. 128/8=16. According to The Art Of Computer Programming, Vol 3, there are 41e31 approx. 10e50 possible functions from a 31-element set into a 41-element set, which is less then 128. 41*40*...*11=41!/10! approx 10e43 are perfect. Factor this to the fact that the 6502 is 8 bits and there is no possible way to get a perfect hash. Even on the 65C816.


41e31 is substantially larger than 128, as is 10e50. I'm very confused about what you're trying to say there.

The 6502 and 65816 can store a whole lot more than just 256 data. The 6502 has a 64KB address space, and the 65816 can address 16MB.

*NO* processor, regardless of how wide its data paths, can arrange for a perfect hash with an arbitrary set of keys. Even MD5 has recently had collisions.

The name "Perfect Hash" stems from the fact that, given n keys, you can guarantee that each key will hash into buckets 0..n-1, without any collisions. But there are many restrictions: see http://burtleburtle.net/bob/hash/perfect.html


Top
 Profile  
Reply with quote  
 Post subject: Re: Perfect hash
PostPosted: Sun Jun 18, 2006 12:53 am 
Offline

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
asmlang_6 wrote:
TMorita wrote:
It's called a "perfect hash".

It's pretty impossible to have a perfect hash function. Considering that on the 6502, we can store 256 diffrent values. ASCII is an 7-bit code. 128/8=16. According to The Art Of Computer Programming, Vol 3, there are 41e31 approx. 10e50 possible functions from a 31-element set into a 41-element set, which is less then 128. 41*40*...*11=41!/10! approx 10e43 are perfect. Factor this to the fact that the 6502 is 8 bits and there is no possible way to get a perfect hash (with a dictionary that has more then 256 (65536 on 816) elements). Even on the 65C816.


No, you're missing the point.

Perfect hashes are used for things like decoding statically-defined keywords, so all the keywords will hash to a different value and you only need to perform one string check.

It's not to be used for general-purpose string matching.

Toshi


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Sep 20, 2010 4:44 am 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
I don't know if anyone's following chitselb's adventures in comp.lang.forth
but I thought I'd point out this thread:

"What are my options for dictionary hashing?"

There's some interesting stuff there, unfortunately there's also a lot of
rubbish to wade through.


Looking at the statistics there I think I'd try finding some very simple
(ie fast) hash that put the most frequently used words in seperate buckets
(while retaining a reasonable distribution for everything else) then
promote the most recently used to the second spot in the list (leaving
the predetermined most used in first place)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Sep 20, 2010 6:12 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California
bogax, thanks for the alert. I've been reading for a long time and I'm still only half way through it. As for the "rubish," yes, there are a couple of particularly abrasive people on there dukin' it out. It's rather distasteful but I can't say I haven't learned anything from it. There's a ton of material there (taking several pages) and Elizabeth Rather herself even posted there. It seems to me that we made more-efficient progress here.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Sep 22, 2010 1:01 am 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
Wow, I didn't know this thread was here. Guess I have some catching up to do. Thanks for waking it up after 5 years, bogax! I admit it, I was about as thrilled to see Elizabeth Rather posting as I would have been to have Grace Hopper check out my COBOL program or having Phil Zimmermann sign my PGP key. Oh wait, he did that :)

I'd like to get your feedback on what I'm doing, to wit:
* make a lot of internals words headerless since users won't need them
* standard linked list searching/chaining in child vocabularies...
* until getting to the top of the root (FORTH) vocabulary
* 4-bit Pearson hash indexing into...
* sixteen uniformly-distributed threads...
* sorted in ascending order by name length...
* with a Bloom filter to prevent us from trying to find most numbers

It could always be "more optimal" but it's already going to blow the doors off single linked list (and worst case for all numeric literals) that fig/Blazin'/et al do.


Top
 Profile  
Reply with quote  
 Post subject: Re: dictionary hashing
PostPosted: Wed Apr 09, 2014 5:20 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA
Another bump ...

I have noticed that the different Forths out and about have quite different dictionary orders. Some seem to group primitives together, then secondaries. Others seem to group by function, and others seem to group by popularity. Do any of the more-experienced users here have a preference for a simple linear-search (FIND) implementation? I realize that certain words like FORGET can have much-different global effects based on the dictionary orders, right? Or is the 'base' vocabulary usually FENCEd-off out of harm's way?

I'm sorry if these questions have painfully-obvious answers. I'm a Forth n00b, but I couldn't decide whether to post here or in 'Newbies'.

Mike


Top
 Profile  
Reply with quote  
 Post subject: Re: dictionary hashing
PostPosted: Wed Apr 09, 2014 1:49 pm 
Offline

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91
I also like a simple linear-list implementation. But then I work in embedded systems; execution speed is important, but compilation speed is not.

I believe some people have gone to the trouble of analyzing which kernel words are most often used in compilation, and placing them later in the linked list so they're found quickly. (I vaguely recall that New Micros' 68HC11 Max-Forth did this.)

IIRC, the only requirement ANS Forth places on dictionary order is that if you redefine a word, the newest definition must appear when you search the dictionary for that word.

As a rule, one should not FORGET words which are in the kernel. :) Many Forths deliberately block this action. The ANS Forth MARKER word avoids this problem.

_________________
Because there are never enough Forth implementations: http://www.camelforth.com


Top
 Profile  
Reply with quote  
 Post subject: Re: dictionary hashing
PostPosted: Fri Aug 17, 2018 12:40 am 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 858
Has anyone tried tinkering with the header structure?
My forth for the C64 has the following header structure: link field name field code field parameter field.
The link field of one word would point to the name field of the previous word in the dictionary. When I changed the system so the link field of a word pointed to the link field of the previous word, I was able to rewrite (FIND), the find primitive, so that it took about two thirds the time and it was also smaller. (FIND) also now took the address of a vocabulary instead of the address of the last word's name field in that vocabulary.
Code:
This:
CONTEXT @ @ (FIND)
Became this:
CONTEXT @ (FIND)
And this:
CURRENT @ @ (FIND)
Became this:
CURRENT @ (FIND)

I also had to rewrite FORGET, CREATE, and my metacompiler's version of CREATE but it was worth it! Faster dictionary searches without any hash table or hash function.
I also wrote a word for use in WORD to speed things up. >HERE is a code word that takes an address and a count. It copies the string to HERE as a counted string with a trailing blank as per Forth-83.

Cheers,
Jim


Top
 Profile  
Reply with quote  
 Post subject: Re: dictionary hashing
PostPosted: Fri Aug 17, 2018 1:06 am 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 858
Brad R wrote:
As a rule, one should not FORGET words which are in the kernel. :) Many Forths deliberately block this action. The ANS Forth MARKER word avoids this problem.

My Forth for the Commodore 64, a Forth-83 standard system, has the variable FENCE, which can be lowered, it also has a hidden internal fence built into FORGET, a literal address FORGET compares the requested forget point with. Only words above FORTH-83 can be forgotten, provided FENCE is low enough. It seemed like the reasonable thing to do.


Top
 Profile  
Reply with quote  
 Post subject: Re: dictionary hashing
PostPosted: Fri Aug 17, 2018 4:16 am 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
In PETTIL I separated the header from the code entirely, putting names and CFAs in a symbol table. I also divided the dictionary in half with anything that touches the symbol table in the upper half (interpret, editor, assembler, compiler...). The lower half contains a 'core' cadre of words, and any user code that gets compiled. So when all the coding is said and done, the symbol table and developer environment can go away and free up 10K or so.


Top
 Profile  
Reply with quote  
 Post subject: Re: dictionary hashing
PostPosted: Fri Aug 17, 2018 8:49 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
I haven't looked at how F83 FORGETs words. It hashed the words across 4 different paths. But each of those were, effectively, just 4 independent dictionaries, so the word headers were not changed at all to accommodate this technique.

Simply, you'd hash the value, decide which of the 4 lanes to pursue, and then search that lane to find your word.

Similarly, when a word is added, it picks the lane, and just appends the word to that lane.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 32 posts ]  Go to page Previous  1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: