various PETTIL design considerations

chitselb · Post by **chitselb** » Sat Jun 14, 2014 6:52 am

Since the symbols are always in the same thread (=1/16 of the dictionary, or SYMNEW...SYMTAIL), and all ancestors/descendants will have matching Pearson hash values, if I do the reawakening thing after REHASH then I only have to scan within that thread on the inner loop. It becomes:

Code: Select all

REHASH   (this empties SYMNEW list)
loop all hashes 0..15
    outer loop iterate all words until we hit a shorter word (at start of next thread) 
        inner loop iterate all words within this thread (same end-of-thread detection test)
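The reason every definition of a given name lands in the same thread is that the thread number depends only on the name. A minimal Python sketch of that idea follows, assuming an 8-bit Pearson hash folded to its low 4 bits. The permutation table is a stand-in (PETTIL's actual table is not shown in this thread), and `thread_of` is a hypothetical helper name.

```python
# Stand-in permutation of 0..255 (167 is odd, so i*167+13 mod 256 is a bijection).
# This is an assumption for illustration, not PETTIL's real Pearson table.
TABLE = [(i * 167 + 13) % 256 for i in range(256)]

def pearson8(name: bytes) -> int:
    """Classic 8-bit Pearson hash: h = T[h XOR byte] for each byte of the name."""
    h = 0
    for b in name:
        h = TABLE[h ^ b]
    return h

def thread_of(name: str) -> int:
    """Fold the hash to 4 bits to pick one of 16 dictionary threads (assumed fold)."""
    return pearson8(name.encode("ascii")) & 0x0F
```

Because `thread_of` looks only at the name, a redefinition of FOO hashes to the same thread as every older FOO, which is what allows the inner loop to stay within one thread.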

EDIT: well no, that wasn't it exactly, because REHASH removed all traces of forgotten symbols. But during REHASH, after each copy-by-length-from-symtab-to-pad iteration, it needs to:
1. identify each active symbol being forgotten
2. deactivate it
3. search for its youngest living ancestor
4. if found, activate it
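The four steps above can be sketched in Python. This is only a model under stated assumptions: `Symbol`, `forget_from`, and the use of a code address as the "age" of a definition are all hypothetical, and the real symbol table is a packed binary structure, not a list of objects.

```python
from dataclasses import dataclass

@dataclass
class Symbol:
    name: str
    addr: int            # code address; a higher address means defined later (assumed)
    active: bool = True  # plays the role of the smudge bit

def forget_from(symbols, fence):
    """Forget every symbol at or above `fence` and return the survivors.

    For each active symbol being forgotten: deactivate it, search the
    survivors for its youngest living ancestor (same name, highest
    address), and activate that ancestor if one exists.
    """
    survivors = [s for s in symbols if s.addr < fence]
    for dead in symbols:
        if dead.addr >= fence and dead.active:
            dead.active = False
            ancestors = [s for s in survivors if s.name == dead.name]
            if ancestors:
                max(ancestors, key=lambda s: s.addr).active = True
    return survivors
```

Forgetting the newest of three FOO definitions reactivates the second one and leaves the oldest inactive. Forgetting again wakes the oldest.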

Brad R · Post by **Brad R** » Tue Jun 17, 2014 12:44 am

chitselb wrote:

3. all shared/reused common child runtime code goes in core dictionary
4. defining word ("parent compile time code") is part of the compiler and disappears along with the symbol table

Just a brief comment here -- your #4 makes perfect sense given your goals and requirements, but it's not how the CamelForth model works, so looking at CamelForth as an example may not be helpful. Offhand I'm not aware of any Forths, other than metacompiled Forths, that work the way you describe. Your idea of passing the runtime code address as an argument to (CREATE) looks like a good solution.

(As a side note, metacompiled Forths may face the problem that the compile-time code only exists on the host machine, and the run-time code can only execute on the target machine -- e.g. if the target machine is a different CPU. I've yet to see a really elegant solution to this.)

chitselb · Post by **chitselb** » Tue Jun 17, 2014 6:36 am

Status:
FORGET / REHASH are pretty much working. There's a teensy bug still for defining a word twice and forgetting it, but I'm very close. The ancestors and descendants thing with the active/inactive (smudge) bit was the way to go, given the limited knowledge available in the symbol table. The automatic memory management is in there too. Halfway through REHASH, AUTOMEM runs and parks the symbol table about 512 bytes away from the ceiling, and moves the virtual memory buffers too if they're present. I'm pretty happy with it, but pretty appalled at how creeping featuritis has taken over.

CREATE works. : works for straight line code. I'm puzzling over the ?>MARK ?<RESOLVE stuff in the compiler, trying to figure out what is going on. CONSTANT, VARIABLE, 2VARIABLE, etc... are all good. I'm almost at the point where I get to finish debugging the editor and getting it to LOAD screens. Speed is off the hook. I tried a few things vs. Blazin' Forth on the C=64 (both in VICE, but it's allegedly cycle accurate).

What's vexing me now is coming from where I'd least expect it, EXPECT. Here's the code

Everything is good, unless the user just hits return. I'd like the line to be ignored, but there's a space in the buffer and SPAN is 1, so FIND tries to find it, and can't, and then sends it off to NUMBER, which also has a bug because hitting return pushes a 0 on the stack. Grrr... I'm looking for the least dodgy way to fix it. Invoking -TRAILING before handing off to INTERPRET might be the fix.

Code: Select all

expect   ( buffer maxsize -- )
    jmp expectvector
expectvector
    ldy #1
    jsr locals  ; this takes Y args off the split stack and parks them in zero page at n, n+2, n+4, etc... leaving TOS alone
    stx storex
    dey
expect01
    iny
    cpy tos
    bcs expect02
    jsr CHRIN  ; standard Commodore ROM routine at $FFCF, inputs a line of text and returns the next character
    cmp #$0d
    beq expect02
    sta (n),y
    bne expect01
expect02
    tya
    ldy span+3  ; the +3 gets past the JSR USERVAR code, since it's DTC
    sta (up),y
    iny
    lda #0
    sta (up),y
    ldx storex
    jmp pops

EDIT: Fixed! INTERPRET checks for WORD returning a zero-length result (which it does) and skips past FIND and NUMBER
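A rough Python model of that fix, for illustration only. `interpret_line` is a hypothetical name, a dict lookup stands in for FIND, and `int()` stands in for NUMBER. The point is just that an empty token never reaches either of them.

```python
def interpret_line(line, dictionary, stack):
    """Interpret one input line; a zero-length token is skipped, not looked up."""
    for token in line.split(" "):
        if len(token) == 0:        # WORD returned a zero-length result
            continue               # skip past FIND and NUMBER entirely
        if token in dictionary:    # stand-in for FIND
            dictionary[token](stack)
        else:                      # stand-in for NUMBER
            stack.append(int(token))
    return stack
```

With this check, a line consisting of a single space (what the buffer holds after a bare RETURN) leaves the stack untouched instead of pushing a stray 0.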

chitselb · Post by **chitselb** » Tue Jun 17, 2014 9:08 pm

new blog post

BigEd · Post by **BigEd** » Wed Jun 18, 2014 9:05 am

Thanks - got this on my RSS feeds - which is probably invisible to you! My reader doesn't even tell me how many other subscribers there are, so I can't share that with you. Appreciate the bloggy writeups.
Cheers
Ed

chitselb · Post by **chitselb** » Thu Jun 19, 2014 4:32 am

Thanks, BigEd, coming from you it means a lot. When I first attempted this (writing a Forth for PET) in 1981, there was no way I could have pulled off these kinds of results. The support of my "invisible friends" (read: all the smart guys here and on c.l.f. Oh, and Elizabeth too) and spending a year (1984) working in Forth were both essential experience.

chitselb · Post by **chitselb** » Thu Jun 19, 2014 5:22 am

hm... separate loop stack... hm...

chitselb · Post by **chitselb** » Sun Jun 22, 2014 4:19 am

I have the documentation problem solved. Using TiddlyWiki5 and a Ruby script. The script grinds through the assembler source which looks like this:

Code: Select all

;--------------------------------------------------------------
#if 0
name=@
stack=( addr -- 16b )
tags=forth-79,nucleus
!!! pronunciation:"fetch"
  16b is the value at addr.
#endif
fetch
    ldy #0
    lda (tos),y
    pha
    iny
    lda (tos),y
    tay
    pla
    jmp put

;--------------------------------------------------------------

On the tiddlywiki, it looks like this, and that "code" button toggles the code on and off. All "live" from the most recent build.

[Image no longer available: http://chitselb.com/files/wikiview.png]

The same Ruby script also generates symbols, unless a word carries a 'nosymbol' tag. That tag is for words like BRANCH, which just pollute the output of WORDS; such a word gets a wiki entry with code, but nothing in the symbol table on the running system.
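For readers who want to try the same approach, here is a minimal Python sketch of a parser for the `#if 0` ... `#endif` doc blocks shown above. It is not the actual Ruby script; `parse_doc_blocks` and `symbol_names` are hypothetical names, and it assumes the assembler label is the first non-comment line after `#endif`.

```python
def parse_doc_blocks(source: str):
    """Collect each `#if 0 ... #endif` block plus the label that follows it."""
    entries, current, in_block = [], None, False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "#if 0":
            current, in_block = {"text": []}, True
        elif stripped == "#endif" and in_block:
            in_block = False
        elif in_block:
            key, sep, value = stripped.partition("=")
            if sep and key in ("name", "stack", "tags", "flags"):
                current[key] = value          # metadata line such as name=@
            else:
                current["text"].append(line)  # raw wiki markup
        elif current is not None and stripped and not stripped.startswith(";"):
            current["label"] = stripped       # first code line after #endif
            current["tags"] = current.get("tags", "").split(",")
            entries.append(current)
            current = None
    return entries

def symbol_names(entries):
    """Names destined for the symbol table: everything not tagged nosymbol."""
    return [e["name"] for e in entries if "nosymbol" not in e["tags"]]
```

Every entry still produces a wiki page. Only `symbol_names` filters on the tag, which matches the behaviour described for BRANCH.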

Brad R · Post by **Brad R** » Sun Jun 22, 2014 12:29 pm

I've been using ROBODoc for some time now. It was the first automated documentation tool I found that could easily support Forth and assembler (and almost any other language). I also find its syntax more friendly, and more human-readable, than some of the other tools I looked at. (I admit I haven't looked at TiddlyWiki.)

chitselb · Post by **chitselb** » Tue Jun 24, 2014 4:00 pm

Brad, that ROBODoc link is a Tiddlywiki. This looks like a very well-organized and versatile documentation tool. I went with dumping raw wiki markup in between #if 0 and #endif in my assembler source, which doesn't give me as many options on the output side. The Ruby script generates the binary symbol table (containing a code field address, a length/flags byte, and the name of each word), a text file with just the list of word names (I feed that to the Pearson hash cruncher tool), and a JSON file for the tiddlywiki itself.

Here's the code for building the docs for PETTIL from a json file. The json file has title, tags, and text for each tiddler (say it three times fast). That's it. I had to remove one line from the tiddlywiki.info file (the tiddlyweb plugin) to get it hostable from a file section.

Code: Select all

echo . . . . Building docs/tiddlypettil.html
mkdir -p ./build/tiddlypettil/tiddlers
cp ./docs/statictiddlers/tiddlywiki.info ./build/tiddlypettil/
cp ./docs/statictiddlers/*.tid ./build/tiddlypettil/tiddlers/
cd ./build/tiddlypettil/
tiddlywiki --load ../pettil.json >/dev/null
tiddlywiki --rendertiddler $:/core/save/all tiddlypettil.html text/plain >/dev/null
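As an illustration of the JSON side, a minimal Python sketch that emits an array of tiddlers with title, tags, and text. The entry dictionaries and the `tiddlers_json` name are hypothetical, and writing tags as one space-separated string is an assumption about what TiddlyWiki expects, not something stated in this thread.

```python
import json

def tiddlers_json(entries):
    """Render doc entries as a JSON array of tiddlers (title, tags, text)."""
    tiddlers = [
        {
            "title": e["name"],
            "tags": " ".join(e["tags"]),  # assumed: space-separated tag string
            "text": e["text"],
        }
        for e in entries
    ]
    return json.dumps(tiddlers, indent=2)
```

The resulting string would be written to a file such as `pettil.json` before the `tiddlywiki --load` step.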

I just rewrote the outer interpreter/compiler shell and am debugging that. I had been using Scott Ballantyne's outer interpreter from Blazin' Forth, but there was a lot of weird stuff. His has a word RUN in it? His code for ']' had the compiler while the code for 'INTERPRET' didn't, which seemed like a good idea until I tried compiling a multi-line colon definition and it didn't compile anything after the first line of input.

I am also pondering <BUILDS DOES> , specifically
http://www.vintagecomputer.net/fjkraan/ ... _00-09.pdf and this http://www.vintagecomputer.net/fjkraan/ ... -index.pdf . The second paper is all about <BUILDS DOES>

a) Why was <BUILDS renamed into CREATE ?
b) Given a two-dictionary model (compiler/interpreter/editor/assembler/symbols all in a transient dictionary, headless core dictionary at bottom of memory) what's a good way to set up <BUILDS ... DOES> such that compile-time code disappears along with the transient dictionary, while child-word runtime behavior goes in the core?
c) Who ever uses <BUILDS DOES> for anything?
d) Does <BUILDS have to be the very first thing in the definition of the builder? It was in every example I could find. Then it could be coded as (rewinding the dictionary pointer and starting the definition's CFA over)

Code: Select all

LATEST NAME> DP ! 'dodoes CFA,

otherwise it becomes verbose and clunky
e) If I'm calling it <BUILDS (not CREATE), and it's no longer in the standard, can I play fast and loose with how it operates?

more on e)
I wanted a word ?: to work like the ternary operator in Ruby/C/Java.

Code: Select all

a = true  ? 'a' : 'b' #=> "a"
b = false ? 'a' : 'b' #=> "b"

The advantage of ?: is it compiles in just six bytes ( ?: affirmative noway )

Usage like

Code: Select all

( args flag ) ?: AFFIRMATIVE NOWAY MORECODE

performs identically to

Code: Select all

( args flag ) IF AFFIRMATIVE ELSE NOWAY THEN MORECODE

which has a bigger compiled footprint because of branching

Code: Select all

: ?:   ( == ) ( flag -- )
    <BUILDS ['] , ['] , DOES> 0= -2 * + @ EXECUTE ( somehow move IP beyond both words ) ; IMMEDIATE

does> would see the address in the dictionary immediately following ?: which is where the trueword is stored.
I want

Code: Select all

['] , ['] ,

to go in the transient (compile time) dictionary and

Code: Select all

0= -2 * + @ EXECUTE ;

to go in the core (runtime)

This would not work if <BUILDS performs a CREATE operation. There would be a JSR DOCREATE immediately following ?: in the core, and a new symbol table entry for TRUEWORD. The corollary to this statement is that words using <BUILDS need to perform their own CREATE. That's where I play with the conventional meaning of <BUILDS

Code: Select all

: CONSTANT   <BUILDS  CREATE ,   DOES> @ ;

And this is where I get all confused, between code fields (the JSR at the beginning of a definition) and execution tokens (the two-byte address of a code field). To distill it all down to a simple question, I suppose that would be "What if CREATE and <BUILDS were different, and <BUILDS (in the typical use case) had to invoke CREATE ?"

Brad R · Post by **Brad R** » Thu Jun 26, 2014 1:26 pm

I have not read your linked PDFs, but will comment based on my own knowledge/opinion.

chitselb wrote:

a) Why was <BUILDS renamed into CREATE ?

This is half history and half speculation on my part: the original Fig-Forth <BUILDS DOES> was rather inefficient; it compiled an additional cell into the parameter field of each "child" definition. Some clever programmer figured out how to dispense with that. Having done so, someone noticed that <BUILDS and CREATE were identical except in what they compiled into the code field, and since the new DOES> changes the code field, CREATE could be used wherever <BUILDS was used. So for economy, they got rid of the redundant <BUILDS. (If you haven't read it already, my writeup on DOES> is at http://www.bradrodriguez.com/papers/moving3.htm .)

I should add that I have since found a situation -- compiling directly to Flash memory -- where CREATE cannot be used with DOES>, and I have begun putting <BUILDS (new version, not the old version) back into my own Forth implementations.

Quote:

b) Given a two-dictionary model (compiler/interpreter/editor/assembler/symbols all in a transient dictionary, headless core dictionary at bottom of memory) what's a good way to set up <BUILDS ... DOES> such that compile-time code disappears along with the transient dictionary, while child-word runtime behavior goes in the core?

The dividing line is DOES>, an IMMEDIATE word that runs at compile time. Right now it commonly compiles the execution token of (;CODE) , compiles a machine code fragment, and then the compiler continues through the remaining code:

... foo foo DOES> bar bar ... is compiled to
|foo xt|foo xt|(;CODE) xt|JSR DODOES|bar xt|bar xt|

...foo foo ;CODE assembly-code ... is compiled to
|foo xt|foo xt|(;CODE) xt|assembly code|

(;CODE) is performed while executing the parent word; its action is to stuff the following address into the code field of the newly created child word, and then exit the parent word. So the child word ends with a code field pointing to the JSR DODOES, and the action of JSR DODOES is to stack the child word's parameter field address, and start the thread that follows the JSR DODOES.

What you'd need to do is have DOES> lay down the xt of (;CODE) and a pointer to the permanent dictionary, and then switch to the permanent dictionary to continue compilation:

... foo foo DOES> bar bar ... is compiled to
(transient) |foo xt|foo xt|(;CODE) xt|adrs of X|
(permanent) X: |JSR DODOES|bar xt|bar xt|

So now (;CODE), instead of stuffing the following address into the child's code field, fetches the contents of the following address, and stuffs that into the child's code field.

I hope that's clear. The revised ;CODE is left as an exercise for the student.
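To make the difference concrete, here is a toy Python model of the two (;CODE) behaviours. Memory is a dict of address to cell, a code field is modelled as just the address it jumps to, and all function names and addresses are hypothetical.

```python
def semicolon_code_classic(mem, ip, child_cfa):
    """Classic (;CODE): the JSR DODOES fragment sits right after the (;CODE)
    cell, so the address following (;CODE) goes into the child's code field."""
    mem[child_cfa] = ip           # ip = address just past the (;CODE) cell

def semicolon_code_split(mem, ip, child_cfa):
    """Revised (;CODE): the cell after (;CODE) holds the address of X in the
    permanent dictionary; fetch it and store that in the child's code field."""
    mem[child_cfa] = mem[ip]      # one extra fetch compared to the classic form
```

For example, with the parent compiled in a transient dictionary at `0x6000` and `X` at `0x0500` in the permanent dictionary, the classic version leaves the child pointing at `0x6004`, which dangles once the transient dictionary is discarded. The split version leaves it pointing at `0x0500`.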

Quote:

c) Who ever uses <BUILDS DOES> for anything?

Lots of Forth programmers. Any time you have a common action that needs to be applied to different data, <BUILDS DOES> can provide economy and simplification. I suspect most people first encounter the use of <BUILDS DOES> when looking at the code for Forth assemblers. (See, for example, my MSP430 assembler.) I sometimes tell people that Forth is an object-oriented language, with the restriction that each object can have only one method (the DOES> code).
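The "object with exactly one method" remark can be illustrated with a loose Python analogy. This is not how any Forth implements it; `define`, `constant`, and `two_constant` are hypothetical names, a list stands in for the child's parameter field, and a closure stands in for the shared DOES> code.

```python
def define(does):
    """Build a defining word from its run-time (DOES>) action."""
    def parent(*data):
        pfa = list(data)           # like CREATE ... , laying down the child's data
        return lambda: does(pfa)   # child word: hands its own PFA to the shared code
    return parent

# Roughly  : CONSTANT  CREATE ,  DOES> @ ;
constant = define(lambda pfa: pfa[0])
# Roughly  : 2CONSTANT  CREATE , ,  DOES> 2@ ;  (cell order simplified here)
two_constant = define(lambda pfa: (pfa[0], pfa[1]))
```

Each child carries different data but runs the same single action, which is the economy being described.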

Quote:

d) Does <BUILDS have to be the very first thing in the definition of the builder? It was in every example I could find. Then it could be coded as (rewinding the dictionary pointer and starting the definition's CFA over)

Code: Select all

LATEST NAME> DP ! 'dodoes CFA,

otherwise it becomes verbose and clunky

Neither <BUILDS nor CREATE must be the first thing in the parent. It may, for example, be necessary to perform some action on the dictionary before creating the new dictionary header. When the logic of the code allows it, it's customary to put <BUILDS or CREATE first, simply to let the reader know that it's a defining word, but that's not a requirement.

Quote:

e) If I'm calling it <BUILDS (not CREATE), and it's no longer in the standard, can I play fast and loose with how it operates?

Yes you can. Bear in mind that I've resurrected <BUILDS for my own purposes, so there will be at least one competing alternative, but as far as the Standard is concerned, you can do whatever you want. (I would love to see <BUILDS returned to the Standard, but I'd need 80-bit floating point to express the odds of that happening.)

More to follow...

Brad R · Post by **Brad R** » Thu Jun 26, 2014 1:51 pm

chitselb wrote:

Usage like

Code: Select all

( args flag ) ?: AFFIRMATIVE NOWAY MORECODE

performs identically to

Code: Select all

( args flag ) IF AFFIRMATIVE ELSE NOWAY THEN MORECODE

which has a bigger compiled footprint because of branching

Code: Select all

: ?:   ( == ) ( flag -- )
    <BUILDS ['] , ['] , DOES> 0= -2 * + @ EXECUTE ( somehow move IP beyond both words ) ; IMMEDIATE

does> would see the address in the dictionary immediately following ?: which is where the trueword is stored.
I want

Code: Select all

['] , ['] ,

to go in the transient (compile time) dictionary and

Code: Select all

0= -2 * + @ EXECUTE ;

to go in the core (runtime)

First, you want to use ' instead of ['] inside your ?: word. What you have written will put the xt of , on the parameter stack, twice, and won't compile anything between <BUILDS and DOES>.

Second, you only use <BUILDS or CREATE when you are defining a new Forth word. What you are doing is essentially a new control structure; look at the source code for IF THEN or BEGIN WHILE REPEAT for inspiration. Following the custom of naming the run-time action with parentheses, you want something like this:

Code: Select all

: ?:  ['] (?:) ,   ' ,   ' ,  ; IMMEDIATE

First ['] gets the xt of the following word given at compile time, namely (?:) , and that is compiled into the dictionary. Then ' reads a word name from the input stream and gets its xt, and that is compiled into the dictionary. And again for the second word name in the input stream. (Note that in ANS Standard Forth you'd do the first step with POSTPONE (?:) ) This word can reside in your transient dictionary.

The run-time action is in the word (?:) which must reside in your permanent dictionary. It will look something like this:

Code: Select all

: (?:)  ( f -- )   0= NEGATE 2*  R@ + @ EXECUTE  R> 4 + >R ;

This assumes conventional use of the return stack in a direct- or indirect-threaded Forth. 0= NEGATE 2* (or 0= NEGATE CELLS in a Standard Forth, with a two's-complement environmental dependency) gives you 0 or 2. R@ + @ EXECUTE will perform AFFIRMATIVE or NOWAY while leaving nothing of its own on the stack -- that is important if AFFIRMATIVE or NOWAY consume or return stack values. Finally R> 4 + >R (in Standard Forth, R> 2 CELLS + >R) skips IP over AFFIRMATIVE and NOWAY. (I give Standard Forth examples to illustrate proper form, but this source code will never be a Standard Forth application word -- it is too dependent upon the specifics of the implementation.)

Note: I haven't tested that code; it's just off the top of my head.
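The intended run-time behaviour can be checked with a toy threaded-code interpreter in Python. This is a model of the semantics only: `run` is a hypothetical name, a thread is a list of word names rather than two-byte cells, and the flag test treats any nonzero value as true.

```python
def run(thread, words, stack):
    """Tiny threaded-code inner interpreter; `thread` is a list of word names."""
    ip = 0
    while ip < len(thread):
        token = thread[ip]
        ip += 1
        if token == "(?:)":
            flag = stack.pop()
            chosen = thread[ip] if flag != 0 else thread[ip + 1]
            words[chosen](stack)   # EXECUTE exactly one of the two inline words
            ip += 2                # then skip IP past both of them
        else:
            words[token](stack)
    return stack
```

A thread such as `["(?:)", "AFFIRMATIVE", "NOWAY", "MORECODE"]` occupies three cells for the whole conditional, runs one of the two words depending on the flag, and always continues at MORECODE.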

chitselb · Post by **chitselb** » Thu Jul 03, 2014 5:47 am

Brad,

Thanks, that really cleared things up for me. <BUILDS always creates a word, and that's that.

Here's what the ?: code turned out to be:

Code: Select all

;--------------------------------------------------------------
#if 0
name=?:
stack=( "name1" "name2" -- )
tags=control,compiler,unimplemented
flags=immediate
Immediate word that compiles its own runtime word (?:) and two branches. The first branch is the true
branch, and the second is the false branch.  One of those is executed by (?:) at runtime.

Used in the form

```
( flag ) ?: this that

is equivalent to

if this else that then

: ?:   ( "name1" "name2" -- )
    ?comp compile (?:) ' , ' , ; immediate
```
#endif
_querycolon
#include "enter.i65"
	.word _qcomp
#include "page.i65"
	.word _compile
	.word pquerycolon
#include "pad.i65"
	.word _tick
#include "page.i65"
	.word _comma
#include "page.i65"
	.word _tick
#include "page.i65"
	.word _comma
#include "page.i65"
	.word exit

;--------------------------------------------------------------
#if 0
name=(?:)
stack=( flag -- )
tags=control,inner,nosymbol
The runtime of [[?:]]
#endif
pquerycolon
    ldy #5
    lda tos
    ora tos+1                   ; evaluate the flag
    beq pquerycolon01           ; false?
    dey
    dey                         ; true
pquerycolon01
    lda (ip),y
    sta tos+1
    dey
    lda (ip),y
    sta tos                     ; branch CFA to TOS
    lda ip+1
    pha
    lda ip
    pha                         ; preserve mainline IP
#include "toforth.i65"
    .word execute               ; perform one of the branches
#include "page.i65"
    .word to6502
    pla
    sta ip
    pla
    sta ip+1                    ; restore mainline IP
    lda #6
    jmp pad                     ; skip past both branches

chitselb · Post by **chitselb** » Thu Jul 03, 2014 6:50 am

It's time to write the code to redirect INTERPRET's input stream from the keyboard to the blocks. I've sort of painted myself into a corner by committing to a plan that treats a BLOCK (of data or code) a bit differently from a BLOCK containing a "screen" (of code, if we're interpreting it). Misinterpreting INTERPRET is the blog post if anyone wants to

a) try to talk some sense into me about just using vanilla blocks and resist even more of the Feeping Creaturitis that seems to be going on here
b) help me do it

chitselb · Post by **chitselb** » Fri Jul 04, 2014 4:07 pm

I found this thread, "The most elegant Forth interpreter" and it got me thinking, what if I dispense with TIB #TIB SPAN >IN and just had WORD ? If I vector it, using JMP indirect (someuservariable) I can point it to one of a few conformant handlers. One to fetch tokens from the keyboard, another to fetch tokens from the 1000 PETSCII screen codes and 25 bits of linewrap I call a "screen", another to fetch tokens from 1024 byte classic ASCII blocks, and another to read sequential files.

Other than losing Forth-83 compliance (TIB #TIB SPAN >IN are required words) what would be wrong with that?
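A minimal Python sketch of the vectored-WORD idea, with one handler per input source behind a single entry point. Everything here is hypothetical: the class, the handler names, and the use of Python iterators in place of a JMP-indirect through a user variable. Only the keyboard and ASCII-block sources are modelled.

```python
def keyboard_source(line: str):
    """Handler: tokens from one line of keyboard input."""
    return iter(line.split())

def block_source(block: bytes):
    """Handler: tokens from a 1024-byte ASCII block (space padded)."""
    return iter(block.decode("ascii").split())

class Interpreter:
    def __init__(self):
        self.word_vector = None   # plays the role of the user variable

    def word(self) -> str:
        """WORD: next token from whichever handler is vectored in, or ''."""
        return next(self.word_vector, "")
```

The interpreter loop only ever calls `word()`, so switching from keyboard to blocks is a matter of storing a different handler in `word_vector`. An empty string signals end of input, in the same spirit as WORD returning a zero-length result.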
