6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Nov 21, 2024 7:24 pm

All times are UTC




Post new topic Reply to topic  [ 25 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Wed Sep 13, 2017 12:20 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Now that the weather is finally horrible again I get to come back inside and code. And I have had an idea to speed up Dictionary search in Forth and probably save memory as well that I'd like to run by y'all.

Traditionally, the Forth dictionary on simple machines is a simply linked list. The Dictionary Pointer (DP) points to the last (or rather, most recently added) element of that list, which contains a link to the next element, and so forth (no pun intended) until we find a link that is zero, which signals its end.
Code:
DP --> (W) --> (W) --> ... --> (W) --> (0000)
To see if a word is in the Dictionary, we take the (addr n) pair of the string from PARSE-NAME and pass it to FIND-NAME (see viewtopic.php?f=9&t=4364 for details). This walks through the list, comparing the n value of the string - its length - to the length of the word of the current list element, which is found in the header. If they are the same, the string is compared character-by-character until it is clear if they are the same or different.

The problem is that this is can be really slow for long lists (I think real computer scientists talk about "theta(n)" search time). Yes, you can stack the deck by putting the most common words earlier in the Dictionary, but if you're unlucky, you have to search the whole damn thing.

Excluding stuff like hash tables that are (probably) beyond the capacity of the poor 65c02, my idea is to use 16 different simply linked lists instead of one, divided by some cheap modulo-version length of the word (by bit masking the length value probably). So the single-letter words would get one list, the words with two characters would get their own, etc.
Code:
DP01 --> (W01) --> (W01) --> ... --> (W01) --> (0000)
DP02 --> (W02) --> (W02) --> ... --> (W02) --> (0000)
DP03 --> (W03) --> (W03) --> ... --> (W03) --> (0000)
...
DP16 --> (W16) --> (W16) --> ... --> (W16) --> (0000)
The trick here is that PARSE-NAME gives us the length of the string as n anyway for free. Therefore, we can put the DPs in a table, use that n value as an offset to the beginning of said table, and get to the correct list of a word of that length very quickly. This means we only have to search the list of words with the same length. This should save a lot of time. In pseudo-code:
Code:
addr, n = parse_name
our_dp = dp_table_start + n

while our_dp <> 0

    if string_from(our_dp) == string_at(addr) then
        yay_we_found_the_word
    else
        our_dip = next_word_from(our_dp)

oops_word_not_in_dict
The downside would seem to be that we need 16 * 2 bytes for the new table, and we have to limit the length of a word at 16 characters. I don't think the last problem is really real-world relevant.

Also, as far as I can tell from just thinking about this, multiple simply linked lists by size would eliminate the necessity of including the size value in the header of a word at all. So if we have, say, 100 words in the Dictionary at the beginning, we save 100 bytes.

Obviously I'll need to write some actual code to test this all. Any feedback on the general idea would be most welcome - this can't be a totally new idea after all.


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 13, 2017 12:43 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Having separate lists will make FORGET harder to implement.

As its a compile time issue is it really that important? A 6502 forth will never have that many words to search.

Is the search being performed by a compiled word or a native word? Coding the search as a native word or having more native primitives called by it might speed it up enough without any structural changes.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 13, 2017 7:38 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Here's a piece copied out of a doc file about my '816 Forth, regarding the number of headers in each of the files (most of them being optional INCLude files) and the number of names of each lenth:

Code:
How 'bout dictionary hashing to make FIND faster?  If you keep a different linked list for each name length, FIND
won't need to check the length.  Actually I like Samuel Falvo's cache idea more.  Set aside 4K of bank 1 for 128
32-byte fields in a circular buffer.  It gets init'd at all 00, and opened-up space after FORGET is also zeroed.

number of headers of each length in each .asm file:
number of char.s │  1   2   3   4   5   6   7   8   9  10  11  12  13
─────────────────┼────────────────────────────────────────────────────────────────────────────────────
816FORTH.ASM     │ 31  59  50  51  59  44  32  11   9   2   1           350 headers, taking 2700 bytes
816ASSY.ASM      │  1   0  87  28  29  34  46  43  29   6               303 headers, taking 2730 bytes
816HDWR.ASM      │  0   6   3  12  11  11  27  35   8  12   5  14       148 headers, taking 1576 bytes
816TIME.ASM      │  0   2   5   1   2   4   9   8   3   2   3            39 headers, taking  398 bytes
816PIRQ.ASM      │                      1   3   1   1                     6 headers, taking   64 bytes (needs updating)
816XMATH.ASM     │      4   7   6   2   2   5   3   1       1            24 headers, taking  184 bytes
816STR.ASM       │      2       4   4   1   1   1       1                12 headers, taking  102 bytes
816TOOLS.ASM     │          1   2   2   2   5   3   1       1            17 headers, taking  170 bytes
─────────────────┼────────────────────────────────────────────────────────────────────────────────────
totals           │ 32  73 153 104 109  99 128 105  52  23  11  14       899 headers, taking 7924 bytes

Yeah, 8KB in headers is a lot. There's a lot of great stuff there though, and again, the user can elect to leave out source-code files he doesn't need.

scotws wrote:
The downside would seem to be that we need 16 * 2 bytes for the new table, and we have to limit the length of a word at 16 characters. I don't think the last problem is really real-world relevant.

As shown in the table above, I have no names that are that long. The user might legitimately have such long names though. A possible solution would be to put all the names that are above 15 or 16 characters in length in one short list.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 13, 2017 8:22 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
To be pedantic, it should be "Vocabulary" searches, rather than "Dictionary" searches. Since it's Vocabularies that actually hold the word names.

F83 uses what it calls "threads" for its vocabularies, and it has 4 of them. It's based on the lower 2 bits of the first letter of the word to search.

Each vocabulary stores, essentially, 4 pointers to linked lists, one for each thread.


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 13, 2017 11:08 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Charlie has shared some of his thoughts on FORGET here. It is a bit too advanced for me to understand 100%, but maybe you can give it a browse and see if there's anything there that could prove immediately useful.

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Thu Sep 14, 2017 6:06 am 
Offline

Joined: Tue Jun 08, 2004 11:51 pm
Posts: 213
An alternate thought is to make an xor of all the letters in each header.
You can then either use 7, 6, 5 or less bits to make strings of list. If 5, that
would break it into 32, roughly, equal list.
Just a thought
Dwight


Top
 Profile  
Reply with quote  
PostPosted: Thu Sep 14, 2017 10:41 am 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Thanks for all the feedback! FORGET will not be a problem, because it was replaced in ANSI Forth by MARKER (https://www.complang.tuwien.ac.at/forth ... words.html and http://lars.nocrew.org/forth2012/core/MARKER.html). That will still be a pain - you'd have to include any "mark" in every single of the lists - but should probably be doable. FIND-NAME is native code, so this would give it extra speed (speed, like RAM and screen space, is always good).

But there might be a different way that is even better - Mike's link contains an example of using Pearson hash (https://en.wikipedia.org/wiki/Pearson_hashing) for PETTIL I wasn't aware of yet (it doesn't seem to be in Cormen et al, unfortunately, 8-bit processors just don't get any respect). I'll try to understand the details, including the headless stuff at https://pettil.tumblr.com/post/87229455 ... d-and-done next.

Thanks again!


Top
 Profile  
Reply with quote  
PostPosted: Thu Sep 14, 2017 2:14 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
scotws wrote:
Excluding stuff like hash tables that are (probably) beyond the capacity of the poor 65c02, my idea is to use 16 different simply linked lists instead of one, divided by some cheap modulo-version length of the word (by bit masking the length value probably).

Yeah, hash tables would be way too hard on the 65c02, so you created ... drum roll ... a hash table! ;)

You should measure how all words in some example dictionary would distribute across the different buckets, because you get a bit of the counting problem here. If most word names are, say, 8 characters or fewer, then you'll only ever really use up to 8 lists. A hash can't take an input vocabulary of 8 distinct units and distribute them among 16 distinct output buckets, no matter how much you transform it.

Even simpler than the Pearson hash, I wonder if just taking the low nybble of the first character would give you a reasonably good distribution, similar to what whartung references.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Fri Sep 15, 2017 5:53 am 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
scotws wrote:
Thanks for all the feedback! FORGET will not be a problem, because it was replaced in ANSI Forth by MARKER (https://www.complang.tuwien.ac.at/forth ... words.html and http://lars.nocrew.org/forth2012/core/MARKER.html). That will still be a pain - you'd have to include any "mark" in every single of the lists - but should probably be doable. FIND-NAME is native code, so this would give it extra speed (speed, like RAM and screen space, is always good).

But there might be a different way that is even better - Mike's link contains an example of using Pearson hash (https://en.wikipedia.org/wiki/Pearson_hashing) for PETTIL I wasn't aware of yet (it doesn't seem to be in Cormen et al, unfortunately, 8-bit processors just don't get any respect). I'll try to understand the details, including the headless stuff at https://pettil.tumblr.com/post/87229455 ... d-and-done next.

Thanks again!


The 'headless' stuff was an effort to rip all the name fields and link fields out of the dictionary, leaving nothing but code and a little data behind. Then it turned out it would be pretty easy to divide all that code roughly in half, between "things that need the symbol table" and "things that don't". Programmer code is compiled onto the end of 'core', but after compilation a program could reuse that memory for something else. RAM allocation works out to be: the lowest 1K to the commodore/6502 gods, $0400-$1B80 is 6K for PETTIL core (including Sweet16). The symbol table and Transient Dictionary get copied to the top of RAM by startup, $5800-$7FFF. about 10K there. I'd like to achieve 16K+ available memory for the programmer when the system starts up. Even cooler would be to make it run on a 16K PET, or a C=64.

The Wikipedia article on Pearson Hashing is where I figured most of it out. Using the bottom four bits of the first character in the name would certainly be faster, probably not as 'fair' about balancing the number of items in each thread, and definitely not as fun. Toys like this wouldn't make it into the 'core' (the 6K of all the compressed nuts and bolts) but I enjoy including them in the developer tools.

I introduced some other gratuitous complication in the symbol table too, like setting a flag on the length byte and appending an unsigned 8-bit integer (vocabulary id) to the end of names that aren't in the main 'forth' vocabulary. Assembler is vocabulary #1 and Editor is vocabulary #2, #3 is the first vocabulary the user creates. Vocabulary id 0 (Forth) is implied but never appended, because that's what the flag on the length byte is for. I use 'smudge' for a slightly different purpose. Names have the smudge bit set during compilation so they won't 'find themselves' like a Klein bottle. But when a word is redefined, I keep the earlier entry in the symbol table, with the smudge bit set. Because we might need the code field address again, if the programmer invokes FORGET. Only one 'clone' can be unsmudged, the one most recently defined. FORGET was a bear to write, but wrestling bears is its own reward. FORGET is also necessary as a memory maintenance tool if you define 'a lot' of new words. The symbol table eventually crashes upward into the Transient Dictionary. Also newly created symbols are just appended to the end of the last pearson hash thread so searching will slow down a bit. Every few hundred bytes of new symbols the programmer needs to utter REHASH to rebuild the symbol table and fix things : REHASH HERE (FORGET) ; , and (forget) is simply FORGET with a CFA passed on the stack instead of a name token ahead of it in the input stream. Sorry for no tl;dr here. Maybe an example will help.

Code:
vocabulary assembler
assembler definitions
4 cconstant n

symbol table entry and object code for "assembler N"
59DD: 36 7C 41 4E 01
7C36: 20 94 10 04


vocabulary editor
editor definitions
: n   ( -- ) 1 >np ;
: p   ( -- ) -1 >np ;

symbol table entry and object code for "editor N"

651C: 14 7C 41 4E 02
7C14: 20 79 11 90 0A F3 7B D3 11   
( jsr enter; .word one; .word donp; .word exit )
N (assembler vocabulary) puts '4' on the stack, which is the base address of the `N` storage area on zeropage

N (editor vocabulary) "next" adds one to SCR (the user variable that knows what screen you're looking at)
L (editor vocabulary) "list" the current screen (in PETTIL I just unpack the RLE-compressed block packet directly to the screen)
P (editor vocabulary) "prev" subtracts one from SCR. Straight up Leo Brodie editor commands here.
>NP (editor vocabulary, no symbol) is N and P sharing code to modify SCR with min/max filter

Code:
forth definitions 
: n ; 
: n ; immediate
: n ;

symbol table entry for "multiple redefinitions of plain old Forth N" (which doesn't exist)
6687: 2F 1E 81 4E 34 1E A1 4E 39 1E 01 4E 00 00 00
1E2F: 20 79 11 D3 11 20 79 11 D3 11 20 79 11 D3 11 <here>
The 1st one has the smudge($80) bit set, the 2nd one has the smudge($80) and immediate($20) bits set, the 3rd one has no flags. None of them has the vocabulary($40) bit set, so there are no vocabulary id bytes appended after the N character here either. The triple-null at the end marks the end of the symbol table


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 16, 2017 11:36 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
After reading up a bit more on PETTIL (very impressive, and three cheers for the TRS-80, the first computer I ever physically touched!), I started to wonder how any of this will work with multitasking. Does PETTIL in fact do multitasking? I haven't found it mentioned (apols if missed), and "PAUSE" for instance seems to do something totally different.

It would seem at first glance that my envisioned model would force every task to have its own 32 byte block of DPs, which is a lot of memory. In that case, multitasking would have higher priority, and I'd go back to a simple linked list.

But I'd still have learned what Pearson Hashing is, so big win either way :D .


Top
 Profile  
Reply with quote  
PostPosted: Sun Sep 17, 2017 12:21 am 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
Multitasking would probably involve separated uservariable spaces for each process, but probably just a few of them. I haven't done it, leaving all of that to the application layer. It is an NMOS 6502 after all, having a decent Forth with integrated Sweet16 and 6502 assembly is probably ambitious enough. The thing I like best about using PETTIL is how the native PET editor was so easily transformed into a block editor. I use "STOP" as an escape key in the editor to prefix one of a dozen or so commands. Hitting STOP reverses the entire screen, and the next keystroke is the command, which reverses it back (and does whatever editor command, e.g. Previous Screen)

PAUSE ( flag -- ) is just a "hit any key" i/o, with the flag indicating whether to print a message or not. I think I put it in so I could pause/resume a long DUMP by tapping the spacebar.

The current direction I'm taking it is to have small teams of words ('conspiracies') be able to use and reuse different Sweet16 registers within primitive and secondary Forth words. Like for the outer interpreter, uservariables like >IN BLK SPAN are copied to R5 R6 R7 and used there. Things become faster and smaller by having 8-bit addressing modes, less stack activity, etc... Words acting in a conspiracy are obligated to update the true copies of uservariables by the time it's over.


Top
 Profile  
Reply with quote  
PostPosted: Mon Oct 15, 2018 9:20 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
chitselb wrote:
PAUSE ( flag -- ) is just a "hit any key" i/o, with the flag indicating whether to print a message or not. I think I put it in so I could pause/resume a long DUMP by tapping the spacebar.

Blazin' Forth also had a word PAUSE that was not related to multitasking. It didn't require any parameters, but it left a flag on the stack. If no key was pressed, it left FALSE on the stack. If the STOP key was pressed, it left TRUE. If any other key was pressed, it would wait for another keypress. If the STOP key was pressed, it would leave TRUE on the stack, otherwise it would leave FALSE on the stack.
My Forth has a word to do the same as Blazin' Forth's PAUSE but it is called DONE?. PAUSE is a defered word used for multitasking support. SINGLE sets it to NOOP, a no operation word. NOOP lacks a parameter field. Its code field points to NEXT.
I think it would be prudent to reserve the name PAUSE for the task switcher in a multitasking system. Even if your system lacks multitasking, using that name for something else can lead to misunderstanding.

Cheers,
Jim


Top
 Profile  
Reply with quote  
PostPosted: Tue Oct 16, 2018 5:51 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Historically, the term of art for PAUSE functionality in cooperative multi tasking systems is called YIELD.


Top
 Profile  
Reply with quote  
PostPosted: Wed Oct 17, 2018 7:50 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
whartung wrote:
Historically, the term of art for PAUSE functionality in cooperative multi tasking systems is called YIELD.

What Forth system?
Back in the November/December 1983 issue of Forth Dimensions ( FD vol. 5 no. 4 ), Henry Laxen introduces the word PAUSE as the task switcher in his article "Techniques Tutorial: Multi-tasking, Part I.
In 1993, in his book "Real Time Forth", Tim Hendtlass uses version 3.56 of F-PC. Page 124 identifies PAUSE as the task switcher.
From his paper Forth Multitasking in a Nutshell,
Brad Rodriguez wrote:
The Forth word which switches tasks is traditionally called PAUSE.


I've seen YIELD mentioned in discussions concerning coroutines in Forth.


Top
 Profile  
Reply with quote  
PostPosted: Fri Oct 19, 2018 6:06 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
JimBoyd wrote:
whartung wrote:
Historically, the term of art for PAUSE functionality in cooperative multi tasking systems is called YIELD.

What Forth system?

I didn't say any Forth system. I just said cooperative multi tasking systems generically.

F83 uses PAUSE as well.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 25 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 9 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: