various PETTIL design considerations

GARTHWILSON · Post by **GARTHWILSON** » Fri May 30, 2014 7:41 pm

If you FORGET the word, will the previous version be FINDable again?

Brad R · Post by **Brad R** » Fri May 30, 2014 7:53 pm

How about this, instead:

Store the CFA of the new word you're creating in a variable, e.g. NEWEST-CFA. (Is this HERE in your system?)
FIND the address of a pre-existing head, or add a new one. Duplicates (same name) are never created.
If it's a new head, set the smudge bit.
Either way, remember the address of that head in a variable, e.g. NEWEST-HEAD.
When closing the definition, store NEWEST-CFA in NEWEST-HEAD. Clear the smudge bit if it's set.

That uses two variables instead of one, but doesn't change the compiler logic.
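
The steps above can be modeled in a few lines. This is a minimal sketch, not PETTIL code: the `Head` class, the `heads` dictionary and the function names are all invented for illustration, and addresses are plain integers.

```python
# Toy model of the two-variable scheme. Every name here is illustrative.

class Head:
    """One dictionary head: a name, the CFA it points at, and a smudge flag."""
    def __init__(self, name):
        self.name = name
        self.cfa = None
        self.smudged = False


heads = {}  # name -> Head, so duplicate names are never created


def open_definition(name, here):
    """Begin a definition; returns the (NEWEST-CFA, NEWEST-HEAD) pair."""
    head = heads.get(name)
    if head is None:
        head = Head(name)
        head.smudged = True   # a new head stays hidden until the definition closes
        heads[name] = head
    return here, head


def close_definition(newest_cfa, newest_head):
    """Point the head at the new body and clear the smudge bit if it was set."""
    newest_head.cfa = newest_cfa
    newest_head.smudged = False


def find(name):
    """Return the CFA of a visible word, or None."""
    head = heads.get(name)
    if head is None or head.smudged:
        return None
    return head.cfa


cfa, head = open_definition("FOO", 0x1000)
hidden_while_open = find("FOO")          # None: the new head is smudged
close_definition(cfa, head)
cfa2, head2 = open_definition("FOO", 0x1010)
old_during_redefinition = find("FOO")    # still the first body, 0x1000
close_definition(cfa2, head2)
```

In this model the head's old CFA is overwritten when the redefinition closes, so nothing remembers the earlier body. That is exactly the case the FORGET question is probing.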

chitselb · Post by **chitselb** » Fri May 30, 2014 8:58 pm

GARTHWILSON wrote:

If you FORGET the word, will the previous version be FINDable again?

Wups. I forgot about FORGET. Thanks for reminding me. I'm dispensing with the One True Head aspect and from now on each body can have only one head, but a word can have as many bodies and heads as it wants. Smudge bit will tell FIND if this head is the active definition. Blog updated.

chitselb · Post by **chitselb** » Sat May 31, 2014 8:46 pm

Brad R wrote:

I'd say it depends on what you want to use vocabularies for. There's no rule that says vocabularies have to be kept in multiple linked lists. You could, for example, add a single byte to the header of each word, indicating what vocabulary that word belongs to. (If you make that byte part of the name, the vocabulary match is done as part of the name comparison.) That would let you have 256 vocabularies, which is enough for any Forth application I've ever seen.

This idea is great! Thank you. I'm going to do it.

BL WORD COUNT (get the word's buffer address and the length)
add 1 to the length
append the context vocabulary identifier byte to the end of the word I'm searching for
set the $20 bit on the length byte (meaning "this word is a member of a vocabulary")

In the symbol table, it's still 2 bytes of CFA, then length/flags, followed by the name. For a vocabulary member, the length includes 1+ for the vocabulary byte and the length will have the $20 bit set. But I'll only scan $1F AND (up to 31) bytes of name. No vocabulary member can have a name longer than 30, and no core word name can be longer than 31.

This vastly simplifies the FIND code vs. chaining vocabularies together in a linked list. They can be hashed like any other word. Their (fake) 32+ length will cause them to float to the top of the hash thread they belong to.
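
A minimal sketch of how the search key might be built, assuming the layout described in this post ($20 marks a vocabulary member, the low five bits hold the byte count). The function name `make_key` and the error handling are assumptions made for illustration.

```python
VOCAB_FLAG = 0x20    # "this word is a member of a vocabulary", per this post
LENGTH_MASK = 0x1F   # low five bits: number of bytes to compare


def make_key(name, vocab_id=None):
    """Return the length/flags byte followed by the bytes FIND would compare."""
    body = name.encode("ascii")
    if vocab_id is None:
        # Core word: the name alone, up to 31 characters.
        if len(body) > 31:
            raise ValueError("core names are limited to 31 characters")
        return bytes([len(body)]) + body
    # Vocabulary member: one extra byte, so the name itself is limited to 30.
    if len(body) > 30:
        raise ValueError("vocabulary member names are limited to 30 characters")
    body += bytes([vocab_id])
    return bytes([len(body) | VOCAB_FLAG]) + body


core_key = make_key("EDITOR")
member_key = make_key("EDIT", vocab_id=1)
```

Because the flag bit sits above the length bits, every vocabulary member has a length byte of 32 or more, which is the "fake 32+ length" effect mentioned above.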

The body of a vocabulary will be
jsr dovocab (it's direct-threaded)
.word parentvocabulary (or 0 for core)
.byt ident (unique identifier e.g. 1=editor; 2=assembler; 3=first user defined vocabulary...)

So for the editor, something like this --
body of EDITOR
jsr dovocab (direct-threaded code so no CFA. The return address tells us where to find the parameters)
.word 0 (editor is a member of core)
.byt 1 (identifier appended to all editor vocabulary members)

The head of the EDITOR vocabulary would be
(CFA of EDITOR) 2 bytes
6, "EDITOR"
The head of EDIT would be
(CFA of EDIT)
5 | $20, "EDIT", 1

Nested vocabularies will be possible. I doubt I'll implement ONLY/ALSO and to get into a vocabulary that's a grandchild of core would require setting CONTEXT to the one in the middle (child of core, parent of grandchild) first.

Quote:

Of course, implementing it that way does not speed compilation in the slightest. I do know of people who have broken their dictionary into multiple vocabularies in order to speed dictionary search. But I tend to think of that as an incidental benefit of certain implementations. The purpose of vocabularies (IMHO) is to give you multiple, independent namespaces.

I'm not doing this multiple vocabularies thing because I already break up the dictionary into 16 roughly equal sized chunks, and I agree with this statement about the purpose of vocabularies.

chitselb · Post by **chitselb** » Sun Jun 01, 2014 5:47 am

Wow! Preliminary results are in. After moving all of the compiler and outer interpreter words to the end of memory (but leaving behind things that might be useful to the application developer, such as NUMBER and WORD) the memory footprint of the core language bodies is a measly 5577 bytes. The operating system takes up 1024 bytes from $0000-$03FF. That leaves 26167 bytes for application code on a 32K PET. Not bad.
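
The memory budget works out as follows. This is only the arithmetic from the paragraph above; the constant names are made up.

```python
RAM_BYTES = 32 * 1024   # 32K PET
OS_BYTES = 1024         # low memory reserved for the operating system
CORE_BYTES = 5577       # preliminary size of the core language bodies

free_bytes = RAM_BYTES - OS_BYTES - CORE_BYTES
print(free_bytes)       # 26167
```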

Update 2014-06-03 file sizes of an actual build
-rw-rw-r-- 1 chitselb chitselb 5593 Jun 3 17:34 pettil-core.obj
-rw-rw-r-- 1 chitselb chitselb 2285 Jun 3 17:34 pettil.sym
-rw-rw-r-- 1 chitselb chitselb 3235 Jun 3 17:34 pettil-tdict.obj

COLD is a big chunk of pettil-core.obj. Since it is run-once code, I parked it at HERE and force a FORGET (which sorts the symbol table).

GARTHWILSON · Post by **GARTHWILSON** » Sun Jun 01, 2014 7:05 am

You will be able to do a lot with that.

chitselb · Post by **chitselb** » Mon Jun 02, 2014 3:51 pm

I have to leave a path in and out from BASIC mode anyway, to accommodate the PET's need to drop to BASIC if the user hits the STOP key during tape I/O. The approach I took is to swap out the bottom half of zero page with a buffer when going back and forth. This leaves the CHRGET routine intact and otherwise makes the PET happy to be in BASIC.

I'm pondering the wisdom of making it possible for the developer to blend all four languages (BASIC, Forth, Sweet16, and native 6502). BASIC code could set up string variables, do things that are more easily done in BASIC (floating point), and use SYS or USR() to get back to PETTIL. I could either start with TXTTAB above the core dictionary, or put the core dictionary above STREND, depending on whether the environment called for adding BASIC or adding Forth. In either case, PETTIL's symbol table, compiler/interpreter bodies, and editor buffers would be capped at FRETOP, or FRETOP-(some arbitrary size) if BASIC strings are expected to grow.

Label   Hex        Decimal  Description
TXTTAB  0028-0029  40-41    Pointer: Start of BASIC Text
VARTAB  002A-002B  42-43    Pointer: Start of BASIC Variables
ARYTAB  002C-002D  44-45    Pointer: Start of BASIC Arrays
STREND  002E-002F  46-47    Pointer: End of BASIC Arrays (+1)
FRETOP  0030-0031  48-49    Pointer: Bottom of String Storage
FRESPC  0032-0033  50-51    Utility String Pointer
MEMSIZ  0034-0035  52-53    Pointer: Highest Address Used by BASIC
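
A minimal sketch of reading these pointers out of a memory image and deriving the cap described above. The addresses come from the table; `read_pointers`, `pettil_ceiling` and the sample FRETOP value are assumptions made for illustration, and the 6502's little-endian byte order is assumed for the two-byte pointers.

```python
import struct

# Zero-page addresses from the table above (each pointer is two bytes).
BASIC_POINTERS = {
    "TXTTAB": 0x28,
    "VARTAB": 0x2A,
    "ARYTAB": 0x2C,
    "STREND": 0x2E,
    "FRETOP": 0x30,
    "FRESPC": 0x32,
    "MEMSIZ": 0x34,
}


def read_pointers(memory):
    """Decode each pointer as a little-endian 16-bit value."""
    return {
        name: struct.unpack_from("<H", memory, addr)[0]
        for name, addr in BASIC_POINTERS.items()
    }


def pettil_ceiling(pointers, string_reserve=0):
    """Upper bound for the transient areas: FRETOP minus an optional reserve."""
    return pointers["FRETOP"] - string_reserve


# Made-up memory image, purely for demonstration.
memory = bytearray(256)
struct.pack_into("<H", memory, 0x30, 0x7F00)   # hypothetical FRETOP
pointers = read_pointers(memory)
ceiling = pettil_ceiling(pointers, string_reserve=256)
```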

GARTHWILSON · Post by **GARTHWILSON** » Mon Jun 02, 2014 7:42 pm

Quote:

I'm pondering the wisdom of making it possible for the developer to blend all four languages (BASIC, Forth, Sweet16, and native 6502).

Forth frequently has an onboard assembler (I wrote my own in an evening) to enable on-the-fly assembly of primitives and runtimes. That should enable the Sweet16 ability too. My assembler is not suitable for whole applications, but works fine for the pieces of assembly I might want here and there in an otherwise Forth application, and you can write a piece of assembly and then assemble and try it immediately (in as little as a fraction of a second) just like you can a new Forth word. My assembler takes between three and four KB, but if I were trying to be as thrifty with memory as you are, I would change the approach a bit and save more than a K.

chitselb · Post by **chitselb** » Mon Jun 02, 2014 8:02 pm

I'll most likely be using the Ragsdale assembler as modified by Scott Ballantyne for Blazin' Forth, and add my own Sweet16 words. Unless something better comes along. I can't for the life of me imagine a use case where I'd want to mix BASIC into the soup, other than for the floating point. The figForth I found uses the floating point. The extracted source and binaries for figForth are in a subdirectory on the PETTIL github repo. As it stands, the Compiler, Assembler and Editor can get as big as they want to become, because it all goes away when applications are deployed.

chitselb · Post by **chitselb** » Wed Jun 11, 2014 4:21 am

REHASH is working; it takes just under 10 seconds to crunch all 265 symbols. What REHASH does is copy them from the symbol table to PAD, sorted by size, then copy them back to the symbol table sorted by Pearson hash code. Since the copy-sort proceeds in ascending address order, they wind up back in the symbol table sorted by size within hash. Now FIND can bail out early once it gets past the length it is looking for. And I fixed NUMBER too. The outer interpreter is working once more, and I'm very happy with that.
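
The two-pass trick relies on the second pass being stable. Here is a minimal sketch using Python's stable `sorted`; the permutation table `PERM` and the reduction to 4 bits are assumptions, since the thread does not show PETTIL's actual Pearson table.

```python
# Hypothetical 256-entry permutation table (167 is odd, so this is a bijection).
PERM = [(i * 167 + 13) % 256 for i in range(256)]


def pearson4(name):
    """Pearson hash of the name, reduced to a 0..15 pile number."""
    h = 0
    for byte in name.encode("ascii"):
        h = PERM[h ^ byte]
    return h & 0x0F


def rehash(names):
    """Pass 1 orders by length; pass 2 orders by hash and keeps ties in place."""
    by_length = sorted(names, key=len)       # symbol table -> PAD
    return sorted(by_length, key=pearson4)   # PAD -> symbol table


words = ["DUP", "SWAP", "DROP", "OVER", "ROT", "+", "EMIT", "CR", "HERE", "!"]
table = rehash(words)
```

Within each pile the names come out shortest first, which is what lets a lookup stop as soon as it meets a longer name than the one it wants.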

I am considering adding two words, sort of cousins of CMOVE and CMOVE> to do a double memory move, but I'm not clear on what the stack diagram would look like yet. I think I need five elements, the ( from to howmany ) for the region being CUT or PASTEd, and the upper and lower limits of the buffer, to keep things inside the lines. I'm sure this problem has been solved a few times before. I get nervous about code smell whenever I see more than four arguments on the stack and begin to wonder if I'm factoring improperly. Maybe something like...
: CUT ( bufferstart buffersize start target howmany -- )
: PASTE ( bufferstart buffersize start target howmany -- )

?

chitselb · Post by **chitselb** » Fri Jun 13, 2014 1:43 am

GARTHWILSON wrote:

If you FORGET the word, will the previous version be FINDable again?

This problem, as it turns out, is difficult. What FORGET needs to also do is unsmudge any symbols that have a CFA between old HERE and new HERE, if they are smudged, and if they have the highest CFA (most recently defined) for that symbol. Not just the word I'm FORGETting. I am momentarily stumped. I might somehow be able to take advantage of the fact that all copies of any redefined symbols will be in the same pile (either SYMNEW or 1/16 of SYMTAB).

My present strategy is to have a symbol in the symbol table for each time a word is defined or redefined, with an associated CFA address. The length byte has three flag bits. The Immediate bit (or $80) still conforms to its normal FIG meaning. Vocabularies are represented by setting a Vocabulary flag bit (or $40) in the symbol length byte and appending a single byte (I'm calling it the Vocabulary ID) to the symbol when it is created, and to the name when it is sought. I use the Smudge bit (or $20) in the symbol length byte to indicate that a symbol is inactive.

Smudged symbols are not FINDable, either because
a) the word has been redefined, or
b) the word is currently inactive (e.g. the LATEST unclosed colon definition)

By those rules, what follows is NOT a redefinition of DUP. Both instances are unique, because they exist in two isolated vocabulary namespaces. Core DUP is findable because it is in the parent (FORTH) vocabulary of EDITOR.

EDITOR DEFINITIONS
: DUP DUP ;

What it looks like in the symbol table:

The original DUP
[CFA of DUP] [03] DUP

This clone of DUP in the editor vocabulary
[CFA of Editor DUP] [44] DUP [01]
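
A minimal sketch of packing a symbol table entry with the three flag bits described above ($80 Immediate, $40 Vocabulary, $20 Smudge). The function `encode_symbol`, its parameters, the sample CFAs and the little-endian CFA are assumptions made for illustration.

```python
IMMEDIATE = 0x80
VOCABULARY = 0x40
SMUDGE = 0x20
LENGTH_MASK = 0x1F


def encode_symbol(cfa, name, vocab_id=None, immediate=False, smudged=False):
    """Return CFA (2 bytes), length/flags byte, then name (plus vocabulary ID)."""
    body = name.encode("ascii")
    flags = 0
    if vocab_id is not None:
        body += bytes([vocab_id])   # the Vocabulary ID counts toward the length
        flags |= VOCABULARY
    if immediate:
        flags |= IMMEDIATE
    if smudged:
        flags |= SMUDGE
    if len(body) > LENGTH_MASK:
        raise ValueError("name too long for a 5-bit length field")
    return cfa.to_bytes(2, "little") + bytes([len(body) | flags]) + body


core_dup = encode_symbol(0x0500, "DUP")                 # hypothetical CFA
editor_dup = encode_symbol(0x2000, "DUP", vocab_id=1)   # hypothetical CFA
```

The third byte of `core_dup` is $03 and the third byte of `editor_dup` is $44, matching the two entries shown above.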

Right now, FORGET will FIND the token following it in the input stream, and then it just drops the dictionary pointer back to the CFA of that word. After that, the entire symbol table gets copied down to PAD (ordered by length) and then copied back up to SYMTAB (ordered by Pearson hash, a value 0..15 calculated to balance the dictionary into roughly equal-sized piles). Finally, SYMNEW and SYMTAIL get set to the end of that area of memory. That operation is called REHASH and can be invoked by the user without doing FORGET. FORGET is run initially at the end of startup, to FORGET COLD and create the initial SYMTAB with an empty SYMNEW area.

SYMNEW is the area where newly created symbols (since the last REHASH) are appended. SYMTAIL is where the next new one goes. This SYMNEW area is checked first by FIND, before it looks in the proper size-ordered pile stored at SYMTAB.
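
A minimal sketch of that lookup order: SYMNEW first, then the one length-ordered pile the name hashes to, stopping early once the entries get longer than the name. `toy_hash` is a stand-in for the Pearson hash, smudge bits are left out, and searching SYMNEW newest-first is an assumption.

```python
def toy_hash(name):
    """Stand-in for the Pearson hash: any function returning 0..15 works here."""
    return sum(name.encode("ascii")) & 0x0F


def build_piles(symbols):
    """Distribute (name, cfa) pairs into 16 piles, shortest names first."""
    piles = [[] for _ in range(16)]
    for name, cfa in sorted(symbols, key=lambda s: len(s[0])):
        piles[toy_hash(name)].append((name, cfa))
    return piles


def find(name, symnew, piles):
    """Return the CFA for name, or None if it is not in either area."""
    for sym_name, cfa in reversed(symnew):     # assumption: newest entry wins
        if sym_name == name:
            return cfa
    for sym_name, cfa in piles[toy_hash(name)]:
        if len(sym_name) > len(name):          # past our length: give up early
            break
        if sym_name == name:
            return cfa
    return None


piles = build_piles([("DUP", 0x0500), ("DROP", 0x0510), ("SWAP", 0x0520)])
symnew = [("SQUARE", 0x2000)]
```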

chitselb · Post by **chitselb** » Fri Jun 13, 2014 11:44 pm

I'm also feeling stuck on implementing defining words. CREATE works, and all the words it creates have JSR DOCREATE in the code field. I'd like to have constants, colon definitions, etc... share that, and it's easy enough to do, but it doesn't quite get me to the generalized parent definer. From what I can tell, there are two flavors:

: foo CREATE ... put stuff in the parameter field ;CODE
JMP dofoo

: bar CREATE ... parameter field builder stuff DOES> HIGH LEVEL BARSTUFF ;

In CREATE I started with ,CFA to set up the code field with a page-alignment NOP at $xxFC (if needed) followed by $20 C, [ docreate ] LITERAL , and want to do something like:
: CONSTANT CREATE , [ doconstant ] LITERAL LATEST CFA! ;
: CFA! ( goaddress nfa -- )
NAME> ( goaddress cfa )
HERE SWAP DP !
SWAP ,CFA DP ! ;

but it's just so inelegant.

barrym95838 · Post by **barrym95838** » Sat Jun 14, 2014 1:17 am

I haven't quite reached that area in my translation of camelForth from the '430 to the 'm32. Have you tried checking out some more modern Forths than FIG to get some answers? I don't know how familiar you are with other processors, but there are Forth implementations out there for the 6809, pdp-11, and MSP430 that are quite readable (much more so than the x86 for me), and they can be a rich source of information and inspiration.

Mike

chitselb · Post by **chitselb** » Sat Jun 14, 2014 4:23 am

Sometimes I try to grab too much design all at once, instead of delivering smaller pieces and then recognizing some bit of synergy that lets me refactor it within a bigger picture frame. I've had exposure to Java, Ruby, C++, Delphi and some other object-oriented languages, and it's easy to muddle up ideas about classes and objects with Forth defining word behavior.

So I went back to rules, goals and objectives. Forth defining words are characterized by having a single piece of shared code that all instances of it will reuse. All colon definitions reuse "enter" and "exit" (semicolon). All constants reuse that piece of code that pushes the parameter field contents onto the stack, etc...

1. shared/reusable code connected to parent ("common child runtime")
2. minimize per child word memory requirements, ideally to just a JSR in the code field + PFA info
3. all shared/reused common child runtime code goes in core dictionary
4. defining word ("parent compile time code") is part of the compiler and disappears along with the symbol table
5. Page alignment within colon definitions (and DOES> segments) must be respected, but variables, strings, and other defining words need not consider page alignment because their contents aren't targets of JMP(indirect)
I kind of like the symmetry of <BUILDS ... DOES> and I'm pretty sure it was in Loeliger, so...

Current plan -
There's a hidden (headerless) word "(CREATE)" that does the heavy lifting for all the parent/defining words, including CREATE, but this word takes an address as an argument. CREATE cannot do this, because "standards". The address that (CREATE) expects will be compiled into the dictionary code field as the target of the JSR. Since colon definitions would be very upset if this JSR instruction were at $xxFC, (CREATE) will stuff a NOP in there if DP is exactly four bytes shy of the top of the page. Like so:
(CREATE) ( addr -- ; scans ahead in the input stream, constructs a symbol table entry linked to a new code field in the dictionary )

: CREATE ( == ; -- addr ) [ 'docreate ] LITERAL (CREATE) ;
: VARIABLE ( == ; -- addr ) [ 'docreate ] LITERAL (CREATE) 2 ALLOT ; ( or 0 , if you prefer )
: CONSTANT ( n == ; -- n ) [ 'doconst ] LITERAL (CREATE) , ;
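
A minimal sketch of the code-field step inside (CREATE), modeled on a flat 64K byte array. The function name and the sample addresses are assumptions; the opcodes ($20 for JSR, $EA for NOP) are standard 6502. A JSR placed at $xxFC would leave the parameter field starting at $xxFF, so its first two-byte cell would straddle a page boundary. That is presumably what the padding avoids, given the JMP (indirect) remark above.

```python
JSR = 0x20
NOP = 0xEA


def compile_code_field(memory, dp, target):
    """Lay down [NOP] JSR target at dp; return (cfa, new_dp)."""
    cfa = dp
    if dp & 0xFF == 0xFC:            # four bytes shy of the next page
        memory[dp] = NOP             # pad so the parameter field starts at $xx00
        dp += 1
    memory[dp] = JSR
    memory[dp + 1] = target & 0xFF   # low byte first (little-endian)
    memory[dp + 2] = target >> 8
    return cfa, dp + 3


memory = bytearray(0x10000)
cfa_a, dp_a = compile_code_field(memory, 0x1200, 0x0456)   # no padding needed
cfa_b, dp_b = compile_code_field(memory, 0x12FC, 0x0456)   # padded with a NOP
```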

CREATE VARIABLE CONSTANT <BUILDS DOES> : ; ;CODE CODE END-CODE 2VARIABLE 2CONSTANT $ "

Those last two are for strings. The quote character isn't really a defining word, but a state-smart immediate word that scans ahead in the input stream until it encounters a close-quote, and either leaves the address of PAD (which now has a counted string stored there) or encloses a counted string in the dictionary at compile time and leaves its address on the stack at runtime.

" HELLO, WORLD" COUNT TYPE
^ should work inside or outside a colon definition

20 $ FOO " THIS IS FOO" FOO $!
^ strings would work something like this

: $ ( maxlen == ; -- addr ) 1+ [ 'docreate ] LITERAL (CREATE) ALLOT ;

Is it a necessary corollary to rule 4 that no spawning/creation of children can occur once the kids can no longer be named? Does this design render ;CODE useless? What sort of metacreation options would be cool?

chitselb · Post by **chitselb** » Sat Jun 14, 2014 5:28 am

chitselb wrote:

GARTHWILSON wrote:

If you FORGET the word, will the previous version be FINDable again?

This problem, as it turns out, is difficult. What FORGET needs to also do is unsmudge any symbols that have a CFA between old HERE and new HERE, if they are smudged, and if they have the highest CFA (most recently defined) for that symbol. Not just the word I'm FORGETting. I am momentarily stumped. I might somehow be able to take advantage of the fact that all copies of any redefined symbols will be in the same pile (either SYMNEW or 1/16 of SYMTAB).

My present strategy is to have a symbol in the symbol table for each time a word is defined or redefined, with an associated CFA address. The length byte has three flag bits. The Immediate bit (or $80) still conforms to its normal FIG meaning. Vocabularies are represented by setting a Vocabulary flag bit (or $40) in the symbol length byte and appending a single byte (I'm calling it the Vocabulary ID) to the symbol when it is created, and to the name when it is sought. I use the Smudge bit (or $20) in the symbol length byte to indicate that a symbol is inactive.

FORGET is getting really expensive now in terms of wall clock. REHASH running at the end takes 10 seconds. This isn't going to be very fast either. Might take about thirty seconds total? "The ugly has to go somewhere." -- a developer I used to work with. But maybe this:

At the beginning of FORGET
find the next word in the input stream.  Its CFA will become the new DP
iterate the entire symbol table (outer loop)   BEGIN   0= UNTIL
    Is this word active?  (SMUDGE bit clear)
    And its CFA is between new DP and old DP?
    Then it is being forgotten.  Does it have any ancestors?
    set ancestor CFA = 0
    iterate the entire symbol table (inner loop)   BEGIN   0= UNTIL
        Is this word inactive?  (SMUDGE bit set)
        And does it have the same name as the word being forgotten?
        And its CFA is between ancestor CFA and new DP?
        Then it's a candidate for most-recent ancestor.
            Update ancestor CFA / ancestor NFA
    next symbol in inner loop
    If ancestor CFA = 0?
        There were no ancestors
    Else
        toggle smudge bit (activate) the most recent ancestor, the one with the highest CFA
    Endif
next symbol in outer loop
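
The same idea as a runnable sketch. It assumes the symbol name already includes any vocabulary ID, that forgotten symbols are dropped from the table, and that a smudged symbol below the new DP is always an older definition (an unclosed colon definition is ignored). The `Symbol` class and `forget` function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str               # assumed to include the vocabulary ID, if any
    cfa: int
    smudged: bool = False   # True means inactive (not FINDable)


def forget(symbols, new_dp):
    """Drop symbols at or above new_dp and reawaken their newest ancestors."""
    forgotten_active = [s for s in symbols if not s.smudged and s.cfa >= new_dp]
    survivors = [s for s in symbols if s.cfa < new_dp]
    for gone in forgotten_active:
        # Inner loop: inactive survivors with the same name are the candidates.
        ancestors = [s for s in survivors if s.smudged and s.name == gone.name]
        if ancestors:
            max(ancestors, key=lambda s: s.cfa).smudged = False
    return survivors


table = [
    Symbol("FOO", 0x1000, smudged=True),   # first FOO, later redefined
    Symbol("FOO", 0x1100, smudged=True),   # second FOO, later redefined
    Symbol("BAR", 0x1200),
    Symbol("FOO", 0x1300),                 # current FOO
]
remaining = forget(table, 0x1300)          # like FORGET FOO
active_foo = [s.cfa for s in remaining if s.name == "FOO" and not s.smudged]
```

Like the nested loops above, this does a full scan per forgotten symbol, so the cost grows with the square of the table size in the worst case.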
