Redesigning Forth

IamRob · Post by **IamRob** » Sun May 31, 2020 6:27 am

I don't know if this is frowned upon to deviate from the Forth standard, but I will give some background on an idea I have to greatly improve the compile time speed, makes a few words redundant and greatly reduces the size of a finished Forth program.

A little background first.

A little while back I wrote a very flexible sorting algorithm that is basically just an insertion sort with a twist. It is not the word itself that gets inserted but a pointer to the word. Words are added to memory in the order they are received, then a binary search is done where the word should go alphabetically. A pointer is then created for that word and inserted into the pointer list. The pointer list has all the pointers to the words in an order that makes the words alphabetical. This also greatly increases the insertion speed as only 2 bytes for each pointer are moved up instead of 5-8 character words on average.

The idea I had then was that all Forth words could be sorted in alphabetical order (yes, it sorts symbols and punctuation too), to make finding a word much quicker. I believe a binary search can find any word out of hundreds in an average of 7-9 tries.

Storing words in a separate list from the Forth system dictionary will result in these savings.

The Word length and Word Name are no longer part of the Forth system dictionary which results in a final program of very compact code.
Link fields are no longer required since doing a binary search to find a word is much faster. Instead the address to a Word's CFA is stored with the word which is only used in compiling and not needed to run the final program and also is not included in the final program.

Words like "Traverse", "LFA" and "CFA" are no longer needed, and PFA is just CFA+2.

Printing the words also couldn't be easier as they are already stored in memory in order they were entered, or they can also be retrieved in alphabetical order along with CFA address so to know where that Word's description starts.

I believe doing it this way would allow for better quality, and more compact applications or games to be created.

Any thoughts, suggestions or criticisms of this way of programming Forth?

GARTHWILSON · Post by **GARTHWILSON** » Sun May 31, 2020 7:06 am

I think I would need a little more explanation to catch on; but you will be interested in the "Dictionary hashing" topic at viewtopic.php?f=9&t=555 .

barrym95838 · Post by **barrym95838** » Sun May 31, 2020 7:07 am

I understand just enough about Forth to get myself in trouble, but it seems to me that you have sufficient skill and momentum to give it a try and share your results with us. Do you need any coders or testers to help you out with your plan, because I might be just barely dangerous enough to qualify for either (very limited spare time permitting)?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun May 31, 2020 6:16 pm

IamRob wrote:

A little while back I wrote a very flexible sorting algorithm that is basically just an insertion sort with a twist. It is not the word itself that gets inserted but a pointer to the word...The idea I had then was that all Forth words could be sorted in alphabetical order (yes, it sorts symbols and punctuation too), to make finding a word much quicker. I believe a binary search can find any word out of hundreds in an average of 7-9 tries.

I don't want to rain on your parade, but your "twist" dates back to the 1950s.

In performing a binary search, the theoretical number of execution iterations is log2N, or O(logN) in Big-O notation, where N is the number of fixed-length elements in the list. Iterations, of course, include the string comparison, the speed of which will primarily be determined by the length of the search element. i.e., the word being looked up.

The value of implementing this algorithm in Forth is something for the Forth enthusiasts here to debate. Other discussion has been held on using hashing, which may outperform a binary search in some cases. Not being a Forth user, I can't opine on how much (or little) using a binary search to look up words will help performance.

IamRob · Post by **IamRob** » Sun May 31, 2020 10:10 pm

GARTHWILSON wrote:

I think I would need a little more explanation to catch on; but you will be interested in the "Dictionary hashing" topic at viewtopic.php?f=9&t=555 .

I should have quoted that thread as I had read it before posting and I think I left my original post somewhat incomplete and should elaborate a little more.

I am an 8-bit programmer but prefer the 65c02 over the 6502. And love programming ways to make use of Auxiliary memory. A very fast sort routine with similar results to the ultimate fastest sort routine is preferred if the code is quite a bit smaller. Such as my insertion sort over a hashing algorithm sort.

I tend to shy away from hashing as I still struggle to follow the code, and if I can't follow the code easily, then I also struggle to find other uses for it. Plus I tend to find hashing algorithms a little larger than I would like to use on a computer with limited memory. If I am reading Hashing right, it would require a bit of math to calculate what group a word should be in.

My version of the insertion sort can handle all the basic needs required for Forth. Finding words, adding words and forgetting a range of words as well as single words, and is not only quite a bit faster than a link list, but does not even require a link to other words. All words would be stored in Auxiliary memory along with its CodeFieldAddress (CFA) of the location in main memory for the description of that word.

I intentionally left out the twists as I didn't want to go into too much detail as they are not needed in Forth. A couple of the things added to the insertion routine was the ability to sort numbers in numerical order that are a part of a string. Most sorts will sort strings with numbers like this: Test1, Test11, Test111, Test2, Test 22, Test3 ... etc. My version will return: Test1, Test2, Test3, Test11, Test22, Test111 ... etc

The other twist I added, was the ability to return the previous word, if a word is not found. Can't begin to tell how powerful this was in programming a disassembler.

And a couple of last things it does is keep an original word order as well as a sorted list, and it can sort ascending as well as descending order. I don't think I have come across any other sort that can accomplish everything listed.

That wasn't supposed to look like bragging. I just wanted to mention what all could be done with this insertion sort that I find a little hard to do with other sorts. The bragging part is still yet to come.

Not counting the little bit of overhead to access Auxiliary memory, the actual sort routine and all its twists worked out to less than 512 bytes. 499 to be exact.

Sorry if I seem to be a little excited to find another use for my twisted version of a insertion sort. I use it in pretty much in everything from disk catalogs to databases to disassemblers, and now Forth. Although it may not be the fastest out there, it certainly is one of the most diverse compared to other sorts and at such a small memory footprint.

BigEd · Post by **BigEd** » Mon Jun 01, 2020 5:57 am

This does seem like an interesting idea, but I'm not enough of a Forth user (let alone an expert) to comment.

Having said that... I thought one idea in Forth is the idea of redefinition, such that the same word might appear several times in the dictionary, with the newest one used. (I have no idea whether earlier definitions are still accessible, or whether they remain in use by words defined since they were but before the next redefinition.)

GARTHWILSON · Post by **GARTHWILSON** » Mon Jun 01, 2020 7:32 am

BigEd wrote:

I thought one idea in Forth is the idea of redefinition, such that the same word might appear several times in the dictionary, with the newest one used. (I have no idea whether earlier definitions are still accessible, or whether they remain in use by words defined since they were but before the next redefinition.)

True. If you re-define a word, words previously compiled with the old definition keep using it, but the new definition is the one that gets used in subsequently defined words. Also, in multiple vocabularies, you could have the same word with different meanings in different contexts, just as "ball" has different meanings in dance, bearings, football, etc., and which one you use in a new definition depends on which vocabulary you're using. It's kind of like different files with the same name in different folders on a hard disc.

IamRob · Post by **IamRob** » Mon Jun 01, 2020 6:02 pm

I have seen that done a few times using the same word for a different meaning. Using the directory analogy, different directories can have a file with the same name, but it is the directory that differentiates which same-name file to go to.

My question is then, how does Forth differentiate if you want to use an older word of the same-name or a newer word?

I am also not sure of the benefit of having multiple words with the same name, but the insert-sort program can handle numbers after a word, so same-name words can just have a 1,2,3,4 etc appended after it, which the sort program can handle.

And on a similar question, when trying to create a "colon word" and if an error happens when Forth tries to compile that word, the smudge bit is set and one can't FORGET that word or do anything with it. The solution was to either unSMUDGE the last word that was one trying to create, then FORGET IT, or, to FORGET a previous word that also FORGET'S the word with the error.

Is the word with the error still editable? I haven't yet figured out how or seen any info on it. Can someone provide a description how to edit?

GARTHWILSON · Post by **GARTHWILSON** » Mon Jun 01, 2020 7:46 pm

The topic "Fig Forth Vocabulary describes how the vocabularies work—although I can't say I've taken the time to really understand it.

IamRob wrote:

My question is then, how does Forth differentiate if you want to use an older word of the same name or a newer word?

Normally the search begins with the latest and works back. You can involve the old definition of a word in the new definition because the new one, which is still being compiled, still has the smudge bit in the header set, so FIND skips over it. RECURSE however does not use FIND, but uses LATEST NAME> , instead.

Quote:

I am also not sure of the benefit of having multiple words with the same name, but the insert-sort program can handle numbers after a word, so same-name words can just have a 1,2,3,4 etc appended after it, which the sort program can handle.

A prime example would be the FORTH versus the ASSEMBLER vocabularies, where you would have AND in both, and they do different things. I have not implemented different vocabularies in my own Forths, partly, I suppose, because of laziness to understand the above, or just that I've had higher priorities in my Forth development because my own onboard assembler uses the addressing mode stuck to the mnemonic so the assembler doesn't have to do any parsing. So I have for example AND_ZP, AND#, AND_ABS, AND_ZP,X, etc., which will never be confused with the AND of Forth.

Quote:

And on a similar question, when trying to create a "colon word" [called a "colon definition," or "secondary' —GW] and if an error happens when Forth tries to compile that word, the smudge bit is set and one can't FORGET that word or do anything with it. The solution was to either unSMUDGE the last word that was one trying to create, then FORGET IT, or, to FORGET a previous word that also FORGET'S the word with the error.

If the amount of memory taken by an incomplete word whose compilation was aborted by an error is not a problem, you can just ignore it.

Quote:

Is the word with the error still editable? I haven't yet figured out how or seen any info on it. Can someone provide a description how to edit?

I suppose you could write the tools to do that, but if they take just as much memory as you would otherwise waste, there's probably no benefit.

I have often had several versions of a word with the same name as I try things in my development. I might make a small change, try it, decide I don't like it, and FORGET it, which then leaves me with the last previous version. Or, I might be unsure that I want to get rid of it yet, do another, and another, and later do the FORGET two or three times and get back to the original, or even FORGET that one too and re-do it with the ideas I was finally satisfied with.

JimBoyd · Post by **JimBoyd** » Mon Jun 01, 2020 8:01 pm

IamRob wrote:

My question is then, how does Forth differentiate if you want to use an older word of the same-name or a newer word?

As Garth already stated, words defined with the older word still use the older word ( they are already compiled with the address of the older word's code field). As for newer words, they use the redefinition because when Forth searches for a word in a vocabulary, it starts with the most recent entry and stops the search as soon as a matching name is found.

Quote:

And on a similar question, when trying to create a "colon word" and if an error happens when Forth tries to compile that word, the smudge bit is set and one can't FORGET that word or do anything with it. The solution was to either unSMUDGE the last word that was one trying to create, then FORGET IT, or, to FORGET a previous word that also FORGET'S the word with the error.

The intent is not to make the word unforgettable, but to render it unable to be found so you don't accidentally execute a word that is not complete.

Quote:

Is the word with the error still editable? I haven't yet figured out how or seen any info on it. Can someone provide a description how to edit?

Although I vaguely recall a Forth by Charles Moore, the inventor of Forth, that decompiled words for editing and recompiling, normally the editing is done on a source file.

JimBoyd · Post by **JimBoyd** » Mon Jun 01, 2020 8:06 pm

I started to reply because I hadn't seen anyone else do so. I see Garth posted a reply before I could finish mine. His is also more complete.

JimBoyd · Post by **JimBoyd** » Mon Jun 01, 2020 8:16 pm

GARTHWILSON wrote:

I have often had several versions of a word with the same name as I try things in my development. I might make a small change, try it, decide I don't like it, and FORGET it, which then leaves me with the last previous version. Or, I might be unsure that I want to get rid of it yet, do another, and another, and later do the FORGET two or three times and get back to the original, or even FORGET that one too and re-do it with the ideas I was finally satisfied with.

Here is a neat trick for that. Suppose your test word(s) is named TEST . The following will delete every non-smudged version of TEST until there are none. In Fleet Forth FORGET only forgets a word if it can be found in the CURRENT vocabulary. It does not search parent vocabularies.

Code: Select all

FORGET TEST >IN OFF

GARTHWILSON · Post by **GARTHWILSON** » Mon Jun 01, 2020 8:59 pm

That is a neat trick. I also have EXISTS? which is for interpretation mode and just puts a YES or a NO in the display. You would do for example:

Code: Select all

EXISTS?  FOOBAR

and get your answer.

NEED is similar, but would more normally be used during compilation. It leaves a flag on the stack, with the intention that if the word is needed, ie, not already there, action would be taken to compile it.

IamRob · Post by **IamRob** » Tue Jun 02, 2020 4:01 am

I guess I have used a same-name word a few times without realizing it.

I use "IT" when creating new words and once the bugs have all been worked out, then I just FORGET IT.

Without realizing it, I did have multiple ITs sometimes.

I have not tried creating multiple vocabularies yet. But have read a little about it.

I think having a word automatically forgotten when an error is encountered would be more productive than having a dead word that just takes up space. Will have to look into that.

JimBoyd · Post by **JimBoyd** » Tue Jun 02, 2020 8:39 pm

But then you wouldn't have any data available to see what went wrong. As an example, suppose you wanted to add a new control structure to your Forth. It compiles successfully, but it's not quite right. When you test it in a new word, that word fails to compile. Now, wouldn't you rather have the data available to use tools such as DUMP and SEE to try to find out what went wrong?
Also, some Forths have a word, EMPTY , that will remove all words added since Forth was last loaded. Loaded as in, the Forth language program is called in from removable media and run on the computer.

Redesigning Forth

Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth

Re: Redesigning Forth