various PETTIL design considerations
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: screen editor design
If you FORGET the word, will the previous version be FINDable again?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: screen editor design
How about this, instead:
- Store the CFA of the new word you're creating in a variable, e.g. NEWEST-CFA. (Is this HERE in your system?)
- FIND the address of a pre-existing head, or add a new one. Duplicates (same name) are never created.
- If it's a new head, set the smudge bit.
- Either way, remember the address of that head in a variable, e.g. NEWEST-HEAD.
- When closing the definition, store NEWEST-CFA in NEWEST-HEAD. Clear the smudge bit if it's set.
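The steps above can be modelled in a few lines. This is only a sketch in Python, not Forth: the `Head` and `Dictionary` classes and their method names are made up for illustration, and `here` stands in for whatever address the system would store in NEWEST-CFA.

```python
class Head:
    """One dictionary head per name (hypothetical model, not a real header layout)."""
    def __init__(self, name):
        self.name = name
        self.cfa = None       # filled in when the definition is closed
        self.smudged = True   # a brand-new head stays hidden until then


class Dictionary:
    def __init__(self):
        self.heads = {}          # name -> Head; duplicates are never created
        self.newest_cfa = None   # plays the role of NEWEST-CFA
        self.newest_head = None  # plays the role of NEWEST-HEAD

    def begin_definition(self, name, here):
        self.newest_cfa = here
        head = self.heads.get(name)
        if head is None:         # new name: add a smudged head
            head = Head(name)
            self.heads[name] = head
        self.newest_head = head

    def end_definition(self):
        # Point the head at the new code and make sure it is visible.
        self.newest_head.cfa = self.newest_cfa
        self.newest_head.smudged = False

    def find(self, name):
        head = self.heads.get(name)
        if head is None or head.smudged:
            return None
        return head.cfa
```

While a redefinition is still open, `find` keeps returning the old CFA. Once it is closed, the old CFA is overwritten, so in this sketch nothing is left behind that FORGET could use to bring the earlier version back.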
Because there are never enough Forth implementations: http://www.camelforth.com
Re: screen editor design
GARTHWILSON wrote:
If you FORGET the word, will the previous version be FINDable again?
Re: screen editor design
Brad R wrote:
I'd say it depends on what you want to use vocabularies for. There's no rule that says vocabularies have to be kept in multiple linked lists. You could, for example, add a single byte to the header of each word, indicating what vocabulary that word belongs to. (If you make that byte part of the name, the vocabulary match is done as part of the name comparison.) That would let you have 256 vocabularies, which is enough for any Forth application I've ever seen.
BL WORD COUNT (get the word's buffer address and the length)
add 1 to the length
append the context vocabulary identifier byte to the end of the word I'm searching for
set the $20 bit on the length byte (meaning "this word is a member of a vocabulary")
In the symbol table, it's still 2 bytes of CFA, then length/flags, followed by the name. For a vocabulary member, the length includes 1+ for the vocabulary byte and the length byte will have the $20 bit set. But I'll only scan (length AND $1F) bytes of name, i.e. up to 31. No vocabulary member can have a name longer than 30, and no core word name can be longer than 31.
This vastly simplifies the FIND code vs. chaining vocabularies together in a linked list. They can be hashed like any other word. Their (fake) 32+ length will cause them to float to the top of the hash thread they belong to.
The body of a vocabulary will be
jsr dovocab (it's direct-threaded)
.word parentvocabulary (or 0 for core)
.byt ident (unique identifier e.g. 1=editor; 2=assembler; 3=first user defined vocabulary...)
So for the editor, something like this --
body of EDITOR
jsr dovocab (direct-threaded code so no CFA. The return address tells us where to find the parameters)
.word 0 (editor is a member of core)
.byt 1 (identifier appended to all editor vocabulary members)
The head of the EDITOR vocabulary would be
(CFA of EDITOR) 2 bytes
6, "EDITOR"
The head of EDIT would be
(CFA of EDIT)
5 | $20, "EDIT", 1
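For illustration, here is how the search key and the head bytes described in this post could be built. It is a Python sketch rather than PETTIL code: the function names are invented, vocabulary ID 0 is assumed to mean "core", and the CFA is assumed to be stored low byte first as is usual on the 6502.

```python
VOCAB_FLAG = 0x20    # "this word is a member of a vocabulary" (as in this post)
LENGTH_MASK = 0x1F   # only this many bytes of name are scanned


def search_key(name, vocab_id=0):
    """Build the length byte plus name bytes that FIND would compare.

    vocab_id 0 is assumed here to mean a core word with no appended byte.
    """
    raw = name.encode("ascii")
    if vocab_id == 0:
        if len(raw) > 31:
            raise ValueError("core word names are limited to 31 characters")
        return bytes([len(raw)]) + raw
    if len(raw) > 30:
        raise ValueError("vocabulary member names are limited to 30 characters")
    length = len(raw) + 1                      # count the vocabulary byte
    return bytes([length | VOCAB_FLAG]) + raw + bytes([vocab_id])


def head(cfa, name, vocab_id=0):
    """Two bytes of CFA (little-endian assumed) followed by the search key."""
    return cfa.to_bytes(2, "little") + search_key(name, vocab_id)
```

With these definitions, `search_key("EDIT", 1)` yields a length byte of $25 followed by "EDIT" and the byte 1, and masking the length byte with $1F gives the 5 bytes to scan.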
Nested vocabularies will be possible. I doubt I'll implement ONLY/ALSO, and getting into a vocabulary that's a grandchild of core would require setting CONTEXT to the one in the middle (child of core, parent of grandchild) first.
Quote:
Of course, implementing it that way does not speed compilation in the slightest. I do know of people who have broken their dictionary into multiple vocabularies in order to speed dictionary search. But I tend to think of that as an incidental benefit of certain implementations. The purpose of vocabularies (IMHO) is to give you multiple, independent namespaces.
Re: screen editor design
Wow! Preliminary results are in. After moving all of the compiler and outer interpreter words to the end of memory (but leaving behind things that might be useful to the application developer, such as NUMBER and WORD) the memory footprint of the core language bodies is a measly 5577 bytes. The operating system takes up 1024 bytes from $0000-$03FF. That leaves 26167 bytes for application code on a 32K PET. Not bad.
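The arithmetic behind that figure, written out as a small Python check. The function and variable names are invented for illustration; the three numbers are the ones quoted above.

```python
def free_bytes(ram_bytes, reserved_low_bytes, core_bytes):
    """Bytes left for application code after the fixed costs are subtracted."""
    return ram_bytes - reserved_low_bytes - core_bytes


# 32K PET, 1024 bytes used by the operating system, 5577 bytes of core bodies
available = free_bytes(32 * 1024, 1024, 5577)
print(available)
```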
Update 2014-06-03: file sizes of an actual build
-rw-rw-r-- 1 chitselb chitselb 5593 Jun 3 17:34 pettil-core.obj
-rw-rw-r-- 1 chitselb chitselb 2285 Jun 3 17:34 pettil.sym
-rw-rw-r-- 1 chitselb chitselb 3235 Jun 3 17:34 pettil-tdict.obj
COLD is a big chunk of pettil-core.obj. Being run-once code, I parked it at HERE and force a FORGET (which sorts the symbol table)
Last edited by chitselb on Tue Jun 03, 2014 9:39 pm, edited 2 times in total.
- GARTHWILSON
Re: screen editor design
You will be able to do a lot with that.
Re: screen editor design
I have to leave a path in and out from BASIC mode anyway, to accommodate the PET's need to drop to BASIC if the user hits the STOP key during tape I/O. The approach I took is to swap out the bottom half of zero page with a buffer when going back and forth. This leaves the CHRGET routine intact and otherwise makes the PET happy to be in BASIC.
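A toy model of that swap, in Python for illustration only. It assumes "bottom half" means $00-$7F and that the save buffer is the same size; both the constant and the function name are hypothetical.

```python
HALF_ZERO_PAGE = 0x80  # assumption: "bottom half" means $00-$7F


def swap_low_zero_page(memory, save_buffer):
    """Exchange the low half of zero page with a same-sized buffer, in place."""
    for i in range(HALF_ZERO_PAGE):
        memory[i], save_buffer[i] = save_buffer[i], memory[i]
```

Because it is an exchange rather than a copy, calling it once on the way into BASIC and once on the way back restores both sides.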
I'm pondering the wisdom of making it possible for the developer to blend all four languages (BASIC, Forth, Sweet16, and native 6502). BASIC code could set up string variables, do things that are more easily done in BASIC (floating point), and use SYS or USR() to get back to PETTIL. I could either start with TXTTAB above the core dictionary, or put the core dictionary above STREND, depending on whether the environment called for adding BASIC or adding Forth. In either case, PETTIL's symbol table, compiler/interpreter bodies, and editor buffers would be capped at FRETOP, or FRETOP-(some arbitrary size) if BASIC strings are expected to grow.
Label   Hex        Decimal  Description
TXTTAB  0028-0029  40-41    Pointer: Start of BASIC Text
VARTAB  002A-002B  42-43    Pointer: Start of BASIC Variables
ARYTAB  002C-002D  44-45    Pointer: Start of BASIC Arrays
STREND  002E-002F  46-47    Pointer: End of BASIC Arrays (+1)
FRETOP  0030-0031  48-49    Pointer: Bottom of String Storage
FRESPC  0032-0033  50-51    Utility String Pointer
MEMSIZ  0034-0035  52-53    Pointer: Highest Address Used by BASIC
- GARTHWILSON
Re: screen editor design
Quote:
I'm pondering the wisdom of making it possible for the developer to blend all four languages (BASIC, Forth, Sweet16, and native 6502).
Re: screen editor design
I'll most likely be using the Ragsdale assembler as modified by Scott Ballantyne for Blazin' Forth, and add my own Sweet16 words. Unless something better comes along. I can't for the life of me imagine a use case where I'd want to mix BASIC into the soup, other than for the floating point. The figForth I found uses the floating point. The extracted source and binaries for figForth are in a subdirectory on the PETTIL github repo. As it stands, the Compiler, Assembler and Editor can get as big as they want to become, because it all goes away when applications are deployed.
Re: screen editor design
REHASH is working; it takes just under 10 seconds to crunch all 265 symbols. What REHASH does is copy them from the symbol table to PAD, sorted by size, then copy them back to the symbol table sorted by Pearson hash code. Since the copy-sort proceeds in ascending address order, they wind up back in the symbol table sorted by size within hash. Now FIND can bail out early once it gets past the length it is looking for. And I fixed NUMBER too. The outer interpreter is working once more, and I'm very happy with that.
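A sketch of the two-pass idea in Python, for illustration. The permutation table is made up (the thread does not give PETTIL's table), the full 8-bit hash is used as the sort key, and the table is modelled as a plain list of names rather than the real symbol layout. The point it shows is that a stable second pass keeps the size order from the first pass inside each hash value.

```python
# Hypothetical permutation of 0..255; the real Pearson table is not given here.
PEARSON_TABLE = [(i * 167 + 13) & 0xFF for i in range(256)]


def pearson(name):
    """8-bit Pearson hash: fold each byte through the permutation table."""
    h = 0
    for byte in name.encode("ascii"):
        h = PEARSON_TABLE[h ^ byte]
    return h


def rehash(names):
    """Pass 1 sorts by size, pass 2 stably sorts by hash."""
    by_size = sorted(names, key=len)
    return sorted(by_size, key=pearson)  # sorted() is stable


def find(table, name):
    """Linear scan that bails out once it is past the hash or length sought."""
    target = pearson(name)
    for entry in table:
        h = pearson(entry)
        if h < target:
            continue
        if h > target or len(entry) > len(name):
            return False
        if entry == name:
            return True
    return False
```

For example, `table = rehash(["DUP", "DROP", "SWAP", "OVER", "ROT"])` followed by `find(table, "SWAP")` returns True, while `find(table, "NOSUCHWORD")` returns False.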
I am considering adding two words, sort of cousins of CMOVE and CMOVE> to do a double memory move, but I'm not clear on what the stack diagram would look like yet. I think I need five elements, the ( from to howmany ) for the region being CUT or PASTEd, and the upper and lower limits of the buffer, to keep things inside the lines. I'm sure this problem has been solved a few times before. I get nervous about code smell whenever I see more than four arguments on the stack and begin to wonder if I'm factoring improperly. Maybe something like...
: CUT ( bufferstart buffersize start target howmany -- )
: PASTE ( bufferstart buffersize start target howmany -- )
?
Re: screen editor design
GARTHWILSON wrote:
If you FORGET the word, will the previous version be FINDable again?
My present strategy is to have a symbol in the symbol table for each time a word is defined or redefined, with an associated CFA address. The length byte has three flag bits. The Immediate bit (or $80) still conforms to its normal FIG meaning. Vocabularies are represented by setting a Vocabulary flag bit (or $40) in the symbol length byte and appending a single byte (I'm calling it the Vocabulary ID) to the symbol when it is created, and to the name when it is sought. I use the Smudge bit (or $20) in the symbol length byte to indicate that a symbol is inactive.
Smudged symbols are not FINDable, either because
a) the word has been redefined, or
b) the word is currently inactive (e.g. the LATEST unclosed colon definition)
By those rules, what follows is NOT a redefinition of DUP. Both instances are unique, because they exist in two isolated vocabulary namespaces. Core DUP is findable because it is in the parent (FORTH) vocabulary of EDITOR.
Code:
EDITOR DEFINITIONS
: DUP DUP ;
What it looks like in the symbol table:
The original DUP
[CFA of DUP] [03] DUP
This clone of DUP in the editor vocabulary
[CFA of Editor DUP] [44] DUP [01]
SYMNEW is the area where newly created symbols (since the last REHASH) are appended. SYMTAIL is where the next new one goes. This SYMNEW area is checked first by FIND, before it looks in the proper size-ordered pile stored at SYMTAB.
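The flag bits and the search order can be modelled like this. It is a Python sketch, not PETTIL code: the `Symbol` class, the use of vocabulary ID 0 for core, and the two plain lists standing in for SYMNEW and SYMTAB are all assumptions made for illustration.

```python
from dataclasses import dataclass

IMMEDIATE = 0x80
VOCABULARY = 0x40
SMUDGE = 0x20


@dataclass
class Symbol:
    cfa: int
    name: str
    vocab_id: int = 0        # assumption: 0 means the core (FORTH) vocabulary
    immediate: bool = False
    smudged: bool = False

    def length_byte(self):
        """Flags OR'ed with the length, counting the vocabulary ID byte if any."""
        length = len(self.name) + (1 if self.vocab_id else 0)
        flags = 0
        if self.immediate:
            flags |= IMMEDIATE
        if self.vocab_id:
            flags |= VOCABULARY
        if self.smudged:
            flags |= SMUDGE
        return flags | length


def find(symnew, symtab, name, vocab_id):
    """Check the recently added symbols first, then the sorted table."""
    for area in (symnew, symtab):
        for sym in area:
            if not sym.smudged and sym.name == name and sym.vocab_id == vocab_id:
                return sym
    return None
```

Here a core DUP has a length byte of $03 and an editor DUP (vocabulary ID 1) has $44, matching the two entries shown above.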
Re: screen editor design
I'm also feeling stuck on implementing defining words. CREATE works, and all the words it creates have JSR DOCREATE in the code field. I'd like to have constants, colon definitions, etc... share that, and it's easy enough to do, but it doesn't quite get me to the generalized parent definer. From what I can tell, there are two flavors:
: foo CREATE ... put stuff in the parameter field ;CODE
JMP dofoo
: bar CREATE ... parameter field builder stuff DOES> HIGH LEVEL BARSTUFF ;
In CREATE, I started with ,CFA to set up the code field: a page-alignment NOP at $xxFC (if needed), followed by $20 C, [ docreate ] LITERAL , and I want to do something like:
: CONSTANT CREATE , [ doconstant ] LITERAL LATEST CFA! ;
: CFA! ( goaddress nfa -- )
NAME> ( goaddress cfa )
HERE SWAP DP !
SWAP ,CFA DP ! ;
but it's just so inelegant.
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: screen editor design
I haven't quite reached that area in my translation of camelForth from the '430 to the 'm32. Have you tried checking out some more modern Forths than FIG to get some answers? I don't know how familiar you are with other processors, but there are Forth implementations out there for the 6809, PDP-11, and MSP430 that are quite readable (much more so than the x86 for me), and they can be a rich source of information and inspiration.
Mike
Re: screen editor design
Sometimes I try to grab too much design all at once, instead of delivering smaller pieces and then recognizing some bit of synergy that lets me refactor it within a bigger picture frame. I've had exposure to Java, Ruby, C++, Delphi and some other object-oriented languages, and it's easy to muddle up ideas about classes and objects with Forth defining word behavior.
So I went back to rules, goals and objectives. Forth defining words are characterized by having a single piece of shared code that all instances of it will reuse. All colon definitions reuse "enter" and "exit" (semicolon). All constants reuse that piece of code that pushes the parameter field contents onto the stack, etc...
1. shared/reusable code connected to parent ("common child runtime")
2. minimize per child word memory requirements, ideally to just a JSR in the code field + PFA info
3. all shared/reused common child runtime code goes in core dictionary
4. defining word ("parent compile time code") is part of the compiler and disappears along with the symbol table
5. Page alignment within colon definitions (and DOES> segments) must be respected, but variables, strings, and other defining words need not consider page alignment because their contents aren't targets of JMP(indirect)
6. I kind of like the symmetry of <BUILDS ... DOES> and I'm pretty sure it was in Loeliger, so...
There's a hidden (headerless) word "(CREATE)" that does the heavy lifting for all the parent/defining words, including CREATE, but this word takes an address as an argument. CREATE cannot do this, because "standards". The address that (CREATE) wants to see will be compiled into the dictionary code field as the target of the JSR. Since colon definitions would be very upset if this JSR instruction was at $xxFC, (CREATE) will stuff a NOP in there if DP is exactly four bytes shy of the top of page. Like so:
(CREATE) ( addr -- ; scans ahead in the input stream, constructs a symbol table entry linked to a new code field in the dictionary )
: CREATE ( == ; -- addr ) [ 'docreate ] LITERAL (CREATE) ;
: VARIABLE ( == ; -- addr ) [ 'docreate ] LITERAL (CREATE) 2 ALLOT ; ( or 0 , if you prefer )
: CONSTANT ( n == ; -- n ) [ 'doconst ] LITERAL (CREATE) , ;
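The page-alignment step inside (CREATE) could look roughly like the following. This is a Python model for illustration only, with an invented function name and a bytearray standing in for memory. A JSR placed at $xxFC would end at $xxFE and leave the first parameter-field cell at $xxFF, straddling the page, and the NMOS 6502's JMP (indirect) does not fetch a pointer across a page boundary correctly, which is presumably the concern.

```python
NOP = 0xEA  # 6502 NOP opcode
JSR = 0x20  # 6502 JSR absolute opcode


def compile_code_field(memory, dp, target):
    """Lay down JSR target at dp, padding with a NOP if dp is at $xxFC.

    Returns the new dictionary pointer, i.e. the start of the parameter field.
    """
    if dp & 0xFF == 0xFC:        # exactly four bytes shy of the top of page
        memory[dp] = NOP
        dp += 1
    memory[dp] = JSR
    memory[dp + 1] = target & 0xFF   # low byte first
    memory[dp + 2] = target >> 8
    return dp + 3
```

Starting from $12FC, the NOP pushes the JSR to $12FD and the parameter field begins cleanly at $1300. From any other address no padding is added.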
CREATE VARIABLE CONSTANT <BUILDS DOES> : ; ;CODE CODE END-CODE 2VARIABLE 2CONSTANT $ "
Those last two are for strings. The quote character isn't really a defining word, but a state-smart immediate word that scans ahead in the input stream until it encounters a close-quote, and either leaves the address of PAD (which now has a counted string stored there) or encloses a counted string in the dictionary at compile time and leaves its address on the stack at runtime.
" HELLO, WORLD" COUNT TYPE
^ should work inside or outside a colon definition
20 $ FOO " THIS IS FOO" FOO $!
^ strings would work something like this
: $ ( maxlen == ; -- addr ) 1+ [ 'docreate ] LITERAL (CREATE) ALLOT ;
Is it a necessary corollary to rule 4 that no spawning/creation of children can occur once the kids can no longer be named? Does this design render ;CODE useless? What sort of metacreation options would be cool?
Last edited by chitselb on Sat Jun 14, 2014 5:40 am, edited 1 time in total.
Re: screen editor design
chitselb wrote:
GARTHWILSON wrote:
If you FORGET the word, will the previous version be FINDable again?
My present strategy is to have a symbol in the symbol table for each time a word is defined or redefined, with an associated CFA address. The length byte has three flag bits. The Immediate bit (or $80) still conforms to its normal FIG meaning. Vocabularies are represented by setting a Vocabulary flag bit (or $40) in the symbol length byte and appending a single byte (I'm calling it the Vocabulary ID) to the symbol when it is created, and to the name when it is sought. I use the Smudge bit (or $20) in the symbol length byte to indicate that a symbol is inactive.
Code:
At the beginning of FORGET
    find the next word in the input stream. Its CFA will become the new DP
iterate the entire symbol table (outer loop) BEGIN 0= UNTIL
    Is this word active? (SMUDGE bit clear)
    And its CFA is between new DP and old DP?
        It is being forgotten. Does it have any ancestors?
        set ancestor CFA = 0
        iterate the entire symbol table (inner loop) BEGIN 0= UNTIL
            Is this word inactive? (SMUDGE bit set)
            And its CFA is between ancestor CFA and new DP?
                It's a candidate for most-recent ancestor.
                Update ancestor CFA / ancestor NFA
            next symbol in inner loop
        Reawaken the one with highest CFA, if there is one
        If ancestor CFA = 0?
            There were no ancestors
        Else
            toggle smudge bit (activate) most recent ancestor
        Endif
    next symbol in outer loop
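The same loop structure as a runnable Python model, for illustration only. Two things here are assumptions that the pseudocode does not spell out: an ancestor must have the same name as the word being forgotten, and every entry at or above the new DP is dropped from the table afterwards. The `Sym` class and function name are invented, and vocabularies are ignored.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    name: str
    cfa: int
    smudged: bool = False


def forget(table, new_dp, old_dp):
    """Reawaken the most recent older version of each forgotten word."""
    for sym in table:                                   # outer loop
        if sym.smudged or not (new_dp <= sym.cfa < old_dp):
            continue                                    # not being forgotten
        ancestor = None
        for cand in table:                              # inner loop
            if (cand.smudged
                    and cand.name == sym.name           # assumption: same name
                    and cand.cfa < new_dp
                    and (ancestor is None or cand.cfa > ancestor.cfa)):
                ancestor = cand                         # highest CFA so far
        if ancestor is not None:
            ancestor.smudged = False                    # activate it
    # Assumption: entries at or above the new DP leave the table.
    return [s for s in table if s.cfa < new_dp]
```

With FOO defined at $1000, redefined at $1200 and again at $1300, forgetting back to $1250 reactivates the $1200 version and leaves the $1000 version smudged.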