Fleet Forth design considerations

JimBoyd · Post by **JimBoyd** » Sat Feb 06, 2021 1:39 am

It has been mentioned that on some Forth systems a zero is stored in the last byte of a screen of source, and possibly in the text input buffer, to signal termination of compiling or interpreting. Since I've seen no mention of this in the Forth-83 standard, I will describe how Fleet Forth interprets/compiles text.
Here is Fleet Forth's QUIT loop:

Code: Select all

: QUIT  ( -- )
   [COMPILE] [
   BEGIN
         RP! CR QUERY INTERPRET
         STATE @ 0=
      CS-DUP UNTIL  // CONTINUE
      ."  OK"
   AGAIN ; -2 ALLOT

Notice that QUIT doesn't directly clear BLK to zero. QUERY does:

Code: Select all

HEX
: QUERY  ( -- )
   'TIB IS TIB  TIB 50 EXPECT
   SPAN C@ DUP #TIB !
   1+ CHARS +!  BLK OFF >IN OFF ;

'TIB is a metacompiler macro that evaluates to $2A7, the starting address for the text input buffer. QUERY sets the system value TIB (actually a constant, the system is in RAM) to the correct address and reads in characters with EXPECT . The value returned by SPAN is stored in #TIB and CHARS is updated. CHARS is used to keep track of the number of characters sent to a line for words that use text formatting. BLK and >IN are zeroed.
EXPECT stores text up to 80 characters or until a carriage return is encountered. It does not store the carriage return. It does not store a zero as an end of string marker.
INTERPRET , which interprets or compiles depending on the STATE , uses WORD to parse the text stream. WORD uses 'STREAM to return the address of the text stream.

Code: Select all

: 'STREAM  ( -- ADR U )
   BLK @ ?DUP
   IF
      BLOCK B/BUF
   ELSE
      TIB #TIB @
   THEN
   DUP >IN @ UMIN DUP NEGATE
   UNDER+ UNDER+ ;

'STREAM returns the address of the current position in the text stream, as indicated by the offset >IN , and the number of bytes of text stream yet to parse. When the text stream is exhausted (when >IN is equal to or greater than the size of the text stream) 'STREAM returns a text stream size of zero. 'STREAM uses UMIN to return the unsigned minimum of the text stream size and >IN thus treating both sizes as unsigned numbers.
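The clamping step can be tried out on any standard Forth. This is only a minimal sketch, not Fleet Forth source: U-MIN and REST are hypothetical names, and /STRING stands in for the UNDER+ UNDER+ pair.

```forth
\ Sketch only: U-MIN and REST are hypothetical names, not Fleet Forth words.
: U-MIN  ( u1 u2 -- u )        \ unsigned minimum of two numbers
   2DUP U< IF DROP ELSE NIP THEN ;

: REST  ( adr len offset -- adr' len' )
   OVER U-MIN                  \ clamp the offset to the stream length
   /STRING ;                   \ advance adr and shrink len by the clamped offset

\ S" HELLO" 2 REST TYPE   displays LLO
\ S" HELLO" 9 REST        leaves a length of zero: the stream is exhausted
```

Because the offset is clamped with an unsigned minimum, an offset past the end can never produce a negative remaining length.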
WORD uses >HERE to copy the parsed text to HERE. >HERE takes an address and a count and returns HERE . It places the count at HERE followed by the text at the given address and then a trailing blank. If the text stream is exhausted, WORD supplies an address just past the text stream and a count of zero to >HERE . >HERE places the zero at HERE and the trailing blank.
When INTERPRET executes FIND with the address of HERE , FIND finds an entry in the Forth vocabulary with a count of zero and a blank for its name. This word is an alias for EXIT which is immediate. Executing EXIT or its alias is all that is needed to exit INTERPRET and resume the quit loop.
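The same idea, that an exhausted text stream yields a zero-length token and that this alone ends the loop, can be sketched in standard Forth. This is an approximation under stated assumptions: it uses Forth-2012's PARSE-NAME rather than WORD and a null-named dictionary entry, and COUNT-WORDS is a hypothetical name.

```forth
\ Sketch only: counts the blank-delimited tokens left in the input source.
: COUNT-WORDS  ( "ccc" -- n )
   0
   BEGIN
      PARSE-NAME NIP           \ length of the next token
   WHILE                       \ a zero length means the input is exhausted
      1+
   REPEAT ;

\ Typing   COUNT-WORDS ONE TWO THREE   on a line by itself leaves 3.
```

No terminating zero byte is needed in the source text; the parser's own length bookkeeping ends the loop.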
Loading a block is similar. Here is the definition of LOAD

Code: Select all

: LOAD  ( U -- )
   0 SWAP LINELOAD ;

and LINELOAD

Code: Select all

: LINELOAD  ( LINE# BLK# -- )
   ?DUP 0=
   ABORT" CAN'T LOAD 0"
   BLK 2@ 2>R
   BLK !  C/L * >IN !
   INTERPRET  2R> BLK 2! ;

Note: BLK is a regular variable (not a user variable) but it has two cells. >IN is actually a constant that points to the second cell. It behaves just like a variable with its data just after BLK's data.
LOAD first saves the values of BLK and >IN , by way of LINELOAD , and gives them new values. INTERPRET is then executed. Once again, when INTERPRET executes the alias for EXIT it exits. The original values of BLK and >IN are restored and interpretation resumes at the terminal.
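The two-cell layout and the save/restore pattern can be sketched in standard Forth. The names SRC-BLK, SRC->IN and WITH-SAVED-SOURCE are hypothetical and only illustrate the idea; they are not Fleet Forth words.

```forth
\ Sketch only: hypothetical names, standard Forth, not Fleet Forth source.
CREATE SRC-BLK  0 , 0 ,              \ two adjacent cells
SRC-BLK CELL+ CONSTANT SRC->IN       \ a constant holding the second cell's address

: WITH-SAVED-SOURCE  ( xt -- )
   SRC-BLK 2@ 2>R                    \ save both cells at once
   EXECUTE                           \ xt may change either cell
   2R> SRC-BLK 2! ;                  \ restore both cells
```

Keeping the two values adjacent means a single 2@ and 2! pair saves and restores the whole input position.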
The system doesn't see individual lines in a source block. Those are for readability. The system effectively "sees" the screen as a 1024 byte string.

EVALUATE interprets/compiles strings. It takes an address and a count. The string to be evaluated, like a screen or the terminal input buffer, does not need a terminating zero. Here is the source:

Code: Select all

: EVALUATE  ( ADR U -- )
   BLK 2@ 2>R
   TIB #TIB @ 2>R
   #TIB ! (IS) TIB
   BLK OFF >IN OFF
   INTERPRET
   2R> #TIB ! (IS) TIB
   2R> BLK 2! ;

As with LOAD and LINELOAD , EVALUATE saves the original values of BLK and >IN . It also saves the values of TIB and #TIB . It sets TIB to the address of the string to be evaluated and sets #TIB to the count. It clears BLK and >IN .
Once INTERPRET exits, EVALUATE restores the values that were saved.
EVALUATE , like LOAD , is nestable. An evaluated string can load a block and a screen (text in a block) can evaluate a string. They can both be mutually nested.
The inclusion of EVALUATE is why QUERY sets TIB to the proper address.
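Nesting can be demonstrated with the standard EVALUATE found on ANS Forth systems. This is a small sketch, and SUM-TEXT and OUTER-TEXT are hypothetical names chosen for illustration.

```forth
\ Sketch only: an evaluated string that itself evaluates another string.
: SUM-TEXT    ( -- adr u )  S" 2 3 +" ;
: OUTER-TEXT  ( -- adr u )  S" SUM-TEXT EVALUATE 10 *" ;

\ OUTER-TEXT EVALUATE   leaves 50 on the stack when BASE is decimal
```

The inner EVALUATE finishes and restores the outer string as the input source, so the outer text continues with 10 * where it left off.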

Loading source from a file is easy enough in Fleet Forth. Here is one way to implement ANSI Forth's INCLUDED in Fleet Forth:

Code: Select all

: INCLUDED  ( ADR U -- )
   DR# DUP DUP OPEN IOERR
   BEGIN
      DR# CHKIN IOERR
      -2 BUFFER DUP B/BUF (EXPECT)
      CLRCHN
      STATUS >R
      SPAN @ EVALUATE
      ." ."
      R> DONE? OR
   UNTIL
   DR# CLOSE ;

The first line opens a file on the current disk drive and checks for an I/O error. This disk is not a disk for Forth blocks.
The first line after BEGIN redirects input from this disk drive and checks for an I/O error.
BUFFER is used to return the address of a block buffer. Since it doesn't read from external storage, as long as no blocks are in the buffer this is perfectly safe. It would also be safe if blocks were being accessed from a drive other than the one with the sequential file on it.
Each line is read in with EXPECT's vector, (EXPECT) . It can be used because each line is terminated with a carriage return ($0D).
CLRCHN clears the I/O back to the defaults until the next line is read. A decimal point is displayed for each line loaded just in case it is a large file.
Each line is evaluated until the end of file is reached or until the user decides this file doesn't need to finish loading (perhaps it was the wrong one).
Only when loading a file is there something new in the data. A carriage return terminates each line. In this case the lines are real. The file could be quite large and it would be difficult (on a C64) to read in the entire file as one really long line. As is, this version of INCLUDED can only handle files with lines that are no longer than 1024 bytes.
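For comparison, the same read-a-line-then-evaluate loop can be sketched with the ANS Forth file words. This is not the Fleet Forth code above: EVAL-FILE, LINE-BUF and MAX-LINE are hypothetical names, the 256 byte line limit is an assumption, and the file is left open if an evaluated line throws an error.

```forth
\ Sketch only: line-by-line loading with the ANS Forth File wordset.
256 CONSTANT MAX-LINE                  \ assumed longest line
CREATE LINE-BUF  MAX-LINE 2 + ALLOT    \ READ-LINE needs two extra characters

: EVAL-FILE  ( adr u -- )
   R/O OPEN-FILE THROW >R              \ keep the fileid on the return stack
   BEGIN
      LINE-BUF MAX-LINE R@ READ-LINE THROW   \ ( len flag )
   WHILE                               \ flag is false at end of file
      LINE-BUF SWAP EVALUATE           \ interpret this line
   REPEAT
   DROP                                \ discard the final zero length
   R> CLOSE-FILE THROW ;
```

As in the version above, the line buffer size sets the longest line that can be loaded.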

JimBoyd · Post by **JimBoyd** » Thu Feb 18, 2021 1:37 am

Based on a discussion in another thread, I forked a new version of Fleet Forth to modify the assembler and source so the assembler's mnemonics and control flow words are no longer comma terminated. These three macros are a holdover from the Ragsdale assembler which I also incorporated in my assembler.

Code: Select all

: BOT  ( -- N )  0 ,X ;
: SEC  ( -- N )  2 ,X ;
: RP)  ( -- N )  101 ,X ;

This definition of SEC will conflict with the name of the mnemonic SEC, if the comma is removed.
I personally don't use these "macros"; however, I have seen RPN Forth assembly code that is rife with them.
I'm going to remove these three words from my assembler and adjust the editor tool I'm writing to replace each before replacing the comma terminated version of the assembler mnemonics.

Code: Select all

BOT  -->  0 ,X
SEC  -->  2 ,X
RP)  -->  101 ,X

I've gotten used to the comma terminated version of the assembler words and I'm still not sure which version to go with but it's still early.

GARTHWILSON · Post by **GARTHWILSON** » Thu Feb 18, 2021 1:59 am

I use TOS (top of stack, i.e., data stack) and NOS (next on stack) which I got elsewhere and I can't remember where anymore. I have 3OS (third on stack) too, plus TOS_LO, TOS_HI, etc. to specify which byte of each stack cell. SEC (second cell on stack?) would conflict with SEt Carry even if you have a separate assembler vocabulary (which I don't).

JimBoyd · Post by **JimBoyd** » Thu Feb 18, 2021 2:17 am

I don't know why Ragsdale named the word BOT that performs 0 ,X . Like you, I think TOS would be a more appropriate name. Likewise NOS rather than SEC . Personally I never have used them. I just use 2 ,X mnemonic 3 ,X mnemonic. I left BOT SEC and RP) in my assembler for portability. For the version of my assembler without comma terminated mnemonics and control flow, I'm just leaving them out.

JimBoyd · Post by **JimBoyd** » Sun Mar 21, 2021 9:01 pm

Although Fleet Forth's editor is based on the editor presented in Forth Dimensions volume 3 issue 3, there are some differences.
WHERE is in the kernel, takes no parameters, and is used by (ABORT") .
(SPREAD) is named SPREAD because it's not just used by other words, I also use it directly.
Some of the words not directly used by the programmer are different or have different names.
The version of MATCH in the article returns a flag and a cursor offset while the version in Fleet Forth returns the flag on top.

In both versions, MATCH does not skip over spaces. The search string might contain spaces. MATCH takes the address and length of a string to search and an address and length of a string to find. MATCH starts at the beginning of the string to search and progresses through it one byte at a time until it finds a match or it has made it through the string.
There is a cursor variable, R# . When editing a screen, the starting address and length for the string to be searched are calculated using the value of R# and the offset returned by MATCH is added to R# .
Here is the source for Fleet Forth's MATCH

Code: Select all

: MATCH  ( ADR1 LEN1 ADR2 LEN2 -- OFFSET FLAG )
   2OVER BOUNDS
   ?DO
      2DUP I TEXT=
      IF
         NIP ROT I SWAP - +
         TUCK U< 0=
         UNLOOP EXIT
      THEN
   LOOP
   2DROP NIP FALSE ;

TEXT= is a code word that takes an address and a count (length) and another address. It returns TRUE if the strings at both addresses are identical over the length specified by the count.
As long as a match is not found and the search space is not exhausted, the DO LOOP executes five words that are all code words (primitives): 2DUP , I , TEXT= , ?BRANCH ( compiled by IF ), and (LOOP) ( compiled by LOOP ). The words in the IF THEN structure calculate the cursor offset, which points just past the matched text. A TRUE flag is returned only if the cursor offset is not larger than LEN1, otherwise a FALSE flag is returned. The loop parameters are discarded and MATCH exits at this point. If the DO LOOP runs to completion, the search string was not found and a cursor offset equal to LEN1 and a FALSE flag are returned.
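The same stack effect can be approximated on an ANS Forth system with the standard word SEARCH . This is a sketch and not the Fleet Forth definition: MATCH-SKETCH is a hypothetical name, and SEARCH replaces the explicit DO LOOP with TEXT= .

```forth
\ Sketch only: returns the offset just past the match and a flag,
\ or LEN1 and FALSE when the string is not found.
: MATCH-SKETCH  ( adr1 len1 adr2 len2 -- offset flag )
   DUP >R                      \ remember len2
   2OVER 2SWAP SEARCH          \ ( adr1 len1 adr3 len3 flag )
   IF
      DROP NIP                 \ ( adr1 adr3 )
      SWAP -  R> +  TRUE       \ (adr3 - adr1) + len2
   ELSE
      2DROP NIP                \ ( len1 )
      R> DROP  FALSE
   THEN ;
```

Searching DUPDUPDUP for DUP with this sketch gives an offset of 3 and a true flag, which corresponds to a cursor placed just after the first DUP .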
Another difference is how MATCH is used. In both versions, R# is used to find the line and position on the current line to start the search. In both versions, the search performed by the word TILL is limited to the current line.
There are two other search words. F finds the next occurrence of the search string in the screen. S finds all occurrences of the search string in a range of screens. Well, all but pathological cases. Consider this screen:

Code: Select all

0 FH LIST 
SCR# 8100 
0: 
1: 
2: 
3: 
4: 
5: 
6: 
7:    DUPDUPDUP
8: 
9: 
A: 
B: 
C: 
D: 
E: 
F: 
 OK

When searching for the string DUP , all three are found.

Code: Select all

1 FH S DUP 
   DUP^DUPDUP                                                     7 8100 
   DUPDUP^DUP                                                     7 8100 
   DUPDUPDUP^                                                     7 8100  
OK

Notice that I did not say "the word DUP" but rather "the string DUP" . These search words know nothing of Forth words or space delimited parsing. They are editing tools looking for one string within another.
When searching for the string DUPDUP , only one occurrence is found.

Code: Select all

1 FH S DUPDUP 
   DUPDUP^DUP                                                     7 8100  
OK
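The counts in these two sessions follow from resuming each search just past the previous match. A sketch in standard Forth shows the effect; COUNT-MATCHES is a hypothetical name, SEARCH and /STRING are ANS String words, and the search string is assumed to be non-empty.

```forth
\ Sketch only: count non-overlapping occurrences of string 2 in string 1.
: COUNT-MATCHES  ( adr1 len1 adr2 len2 -- n )
   2>R  0 ROT ROT              \ ( n adr len )   R: adr2 len2
   BEGIN
      2R@ SEARCH               \ ( n adr' len' flag )
   WHILE
      ROT 1+ ROT ROT           \ count the hit
      2R@ NIP /STRING          \ resume just past the match
   REPEAT
   2DROP  2R> 2DROP ;
```

With DUPDUPDUP as the text, searching for DUP counts 3 and searching for DUPDUP counts 1, because only three characters remain after the first DUPDUP .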

With Fleet Forth's editor and the one in the article, F uses the word SEEK ( spelled (SEEK) in the article ) to find the search string. While both versions search from the current cursor position, the version in the article uses the cursor variable, R# , to calculate the address of the cursor position and the remaining length of that line. If the string is not found, the next line is searched from the beginning, each line searched until the string is found or there are no more lines to search. Fleet Forth's version of SEEK uses the cursor variable, R# , to calculate the address of the cursor position and the remaining length of the entire screen.
The version of S in the article also searches for the search string one line at a time whereas the version in Fleet Forth searches the entire screen.

GARTHWILSON · Post by **GARTHWILSON** » Sun Mar 21, 2021 10:58 pm

That's neat! The only additional thing I could wish for is wildcards, which I have on my HP-71 hand-held computer but I haven't had much use for in the way I use Forth. By "wildcards," I mean for example that the search string could specify that it has to start with one thing and end with another but not specify what's in the middle, or you could specify for example that you only want the string if it's at the beginning of a line, or other such things. That's something even my wonderful MultiEdit editor on the PC doesn't have.

JimBoyd · Post by **JimBoyd** » Thu Mar 25, 2021 2:27 am

That sounds interesting but I haven't needed wildcards so I didn't try to implement them. The editor really is quite simple. Here is my find and replace function.

Code: Select all

: FR
   BEGIN F R AGAIN ; -2 ALLOT

Unlike S , F ( find) and R ( replace) are limited to one screen at a time. If F can't find the string it aborts. I'm changing that to QUIT so it will not clear the data and auxiliary stacks.
It is used like this:

Code: Select all

FR <STRING1>^<STRING2>

As long as I didn't do anything to alter the contents of the find and insert buffers I can perform the same search and replace on other screens just by typing FR .
The C64 has a screen editor. If I scroll up to a line and hit the enter key, the entire line is read in by EXPECT as if I had typed it. This made it easy to add a screen editor for screens as opposed to just a line editor.

Code: Select all

HEX
EDITOR DEFINITIONS
: XED  ( N -- )
   CREATE C, DOES>  ( -- )
   C@ C/L * R# !
   0 TEXT C/L PAD C!
   IBUF BUFMOVE (R)
   QUIT ; -2 ALLOT

 0 XED 0:    1 XED 1:    2 XED 2:
 3 XED 3:    4 XED 4:    5 XED 5:
 6 XED 6:    7 XED 7:    8 XED 8:
 9 XED 9:   0A XED A:   0B XED B:
0C XED C:   0D XED D:   0E XED E:
0F XED F:

I modified LIST to always show the line numbers in hexadecimal so I only need sixteen XED child words.
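The defining-word pattern used by XED can be shown in isolation. The following is a simplified sketch in standard Forth: LINE-START and CHARS/LINE are hypothetical names, the line width of 64 is an assumption, and the child words only return the line's offset instead of editing the screen.

```forth
\ Sketch only: a defining word whose children each remember a line number.
DECIMAL
64 CONSTANT CHARS/LINE               \ assumed width of one screen line

: LINE-START  ( n "name" -- )
   CREATE ,                          \ store the line number in the child
   DOES>  ( -- offset )
   @ CHARS/LINE * ;                  \ offset of that line within the screen

 7 LINE-START L7:
15 LINE-START LF:
```

Each child costs only a header and one cell of data, which is why sixteen of them are cheap to define.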

Code: Select all

HEX
VOCABULARY EDITOR
VARIABLE R#
: LINE  ( LINE# -- LADR CNT )
   0F OVER U< ABORT" OFF SCREEN"
   C/L *  SCR @ BLOCK +  C/L ;
: LIST  ( SCR -- )
   RB EDITOR  R# OFF DUP SCR !
   CR ." SCR# " U. HEX  10 0
   DO
      CR I 1 .R ." : "
      I LINE -TRAILING QTYPE
      DONE? ?LEAVE
   LOOP
   CR ;

IamRob · Post by **IamRob** » Thu Mar 25, 2021 2:57 am

I haven't seen much opportunity to use wildcards yet, either. Even under ML programming, the only use I have for wildcards is when cataloging a disk. And even then, I combine the wildcard <search by name> with, <search by type>, <search by date>, <search by filesize>, etc...

As of yet, I don't have many volume or disk utilities in Forth.

JimBoyd · Post by **JimBoyd** » Sat Apr 03, 2021 12:17 am

I discussed the structure of Fleet Forth's vocabularies here.
In a recent modification to my metacompiler, I took advantage of that structure. The following words are defined exactly the same in the main ( host ) assembler vocabulary and the target assembler vocabulary:

Code: Select all

TABLE     MODE      .A        X)
)Y        #         )         MEM
,X        ,Y        Z         OFFSET
VS        CS        0=        0<
>=        NOT

The body of a Fleet Forth VOCABULARY has three fields: a pointer to the link field of the latest word in the vocabulary, a pointer to the vocabulary's parent vocabulary, and a link in the VOC-LINK chain. The first word defined in a vocabulary has a link field with a value of zero.
To avoid redefining the words shown above, I made sure they were the first words defined in the original ASSEMBLER vocabulary and modified the metacompiler's ASSEMBLER vocabulary before it had any words.

Code: Select all

SCR# 51 
// TIE IN WITH FLEET FORTH ASSEMBLER
TARGET VOCABULARY ASSEMBLER
LATEST NAME> >BODY
HOST ASSEMBLER ' NOT >LINK SWAP!

The metacompiler's assembler is chained to the main assembler at this point and the first word defined in the metacompiler's ASSEMBLER vocabulary will be linked to the word NOT in the original ASSEMBLER vocabulary; therefore the words shown above are searched as part of the metacompiler's vocabulary. Since Fleet Forth's vocabularies do not have a fake name in the vocabulary, but a link to the parent vocabulary, a search in the metacompiler's assembler does not immediately continue in the main Forth vocabulary. The search progresses from the metacompiler's assembler to the target Forth vocabulary on down to the host ( or main) Forth vocabulary.

Code: Select all

ASSEMBLER \ metacompiler's assembler vocabulary
FORTH     \ target Forth vocabulary
SHADOW    \ for constants that "shadow" variables in the target
META      \ metacompiler aliases for C@ @ C! ! etc.
FORTH     \ original Forth vocabulary

JimBoyd · Post by **JimBoyd** » Mon Apr 12, 2021 11:39 pm

In addition to .S to display the contents of the data stack and .AS to display the contents of the auxiliary stack, Fleet Forth also has .RS to display the contents of the return stack. These three words can aid in tracking down errors.
Here is the source for Fleet Forth's ABORT , the word executed by (ABORT") .

Code: Select all

: ABORT  ( -- )
   SINGLE ERR SP! AP!
   ['] LIT (IS) WHERE
   QUIT ; -2 ALLOT

SINGLE switches off multitasking, setting the deferred word PAUSE to execute the word NOOP , a no op.
SP! and AP! clear the data stack and the auxiliary stack.
The line

Code: Select all

   ['] LIT (IS) WHERE

re-enables WHERE .
-2 ALLOT reclaims the memory used by EXIT , as there is no returning from QUIT .
ERR is a deferred word normally set to the no op, NOOP . To help with troubleshooting, ERR can be set to a word like (ERR) .

Code: Select all

: (ERR)
   CR ." DATA: " .S
   CR ."  AUX: " .AS
   CR ."  RET:"  .RS ;

This will display all three stacks when an error is encountered.
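On systems with Forth-2012's DEFER and IS , the same kind of switchable hook can be sketched as follows. ERR-HOOK, NO-OP, SHOW-DATA and FAIL are hypothetical names used only for this illustration; they are not the Fleet Forth words.

```forth
\ Sketch only: a deferred diagnostic hook that defaults to doing nothing.
: NO-OP  ( -- ) ;
DEFER ERR-HOOK                       \ hypothetical stand-in for ERR
' NO-OP IS ERR-HOOK                  \ default: no diagnostics

: SHOW-DATA  ( -- )  CR ." DATA: " .S ;

: FAIL  ( -- )                       \ stand-in for the start of ABORT
   ERR-HOOK  CR ." ABORTED" ;

' SHOW-DATA IS ERR-HOOK              \ switch diagnostics on
```

The hook runs before any stacks are cleared, so it still sees the state that led to the error.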

I recently redefined .RS . The old version worked well enough. Here is a test case:

Code: Select all

: INNER
   1 0  DO  .RS  LOOP ;
: OUTER
   1 0  DO  INNER  LOOP ;
: WRAPPER
   OUTER ;

Executing WRAPPER produced something like the following:

Code: Select all

WRAPPER
5807 .RS
7FFF
8001
580B
5823 INNER
7FFF
8001
5827
5839 OUTER
21A0 EXECUTE
21EB INTERPRET
 OK

This does show the nesting to the word .RS , but it is not accurate and I desired that accuracy for some of my Forth experiments.
The address 21EB is not in INTERPRET , but in QUIT . Do-colon places the address 21EB on the return stack when QUIT executes INTERPRET , just as the address 21A0 is placed on the return stack when WRAPPER is executed. The improved .RS uses the word AFIND to show which word contains a given address.

Code: Select all

AFIND  ( ADDR -- ADDR LINK1 LINK2 )

ADDR is the address in question. LINK1 is the closest link address to ADDR from below and LINK2 is the closest link address to ADDR from above.
For each value on the return stack, .RS displays it as an unsigned number. If, as an address, it is within the dictionary, the name of the word containing that address is displayed.
Here is the result of using the new .RS with the above test case (I've added lowercase comments to the session log):

Code: Select all

WRAPPER 
5807 INNER
7FFF
8001
580B INNER      \ one of the DO LOOP parameters in INNER
5823 OUTER
7FFF
8001
5827 OUTER      \ one of the DO LOOP parameters in OUTER
5839 WRAPPER
21A0 INTERPRET
21EB QUIT
 OK

Although the address 580B is in the word INNER , it is not placed there by executing another word. It is the first DO LOOP parameter placed on the return stack. It is the branch address used by any LEAVE or ?LEAVE ( of which there are none in this test ) and used by LOOP and +LOOP when a DO LOOP terminates through LOOP or +LOOP.

JimBoyd · Post by **JimBoyd** » Fri Apr 16, 2021 11:11 pm

I made a change to Fleet Forth's LINELOAD , which is used by LOAD . LINELOAD takes a line number and a screen number as parameters. It loads a given screen starting at the line specified.
I read somewhere (I don't remember where) a recommendation that LOAD should set BASE to decimal prior to loading a screen. I've given this some thought and cannot really see a downside. As it is, I have been in the habit of specifying the number base at the start of a screen. With this modification, I will not have to specify the number base if I am using decimal for a particular screen.
Here is the source for Fleet Forth's LINELOAD and LOAD

Code: Select all

: LINELOAD  ( LINE# SCR# -- )
   ?DUP 0=
   ABORT" CAN'T LOAD 0"
   RB DECIMAL
   BLK 2@ 2>R
   BLK !  C/L * >IN !
   INTERPRET  2R> BLK 2! ;
: LOAD  ( U -- )
   0 SWAP LINELOAD ;

This line:

Code: Select all

   RB DECIMAL

sets the base to decimal and also causes LINELOAD and LOAD to revert to the number base in use prior to loading a screen.

GARTHWILSON · Post by **GARTHWILSON** » Sat Apr 17, 2021 12:56 am

I think it's a bit short-sighted for someone to say decimal should be the default. In my own uses, it makes sense to keep it in hex most of the time, and change to decimal or binary only temporarily and in limited places.

JimBoyd · Post by **JimBoyd** » Sat Apr 17, 2021 2:12 am

The idea isn't so much to have decimal as a default, but to avoid having no default. This change doesn't change anything for screens where I still set the base to hexadecimal on line one since having LOAD set a particular base doesn't preclude changing the base while loading the screen. I usually use line zero for a comment that the word INDEX displays and line one to set the base. On the screens that will compile the same in hexadecimal or decimal, it is tempting to not specify the base. Without a default of some kind, it may be necessary to specify the number base for each screen since they can be loaded individually for testing purposes.
With the other change, RB causes LOAD to restore base to the value it had prior to loading a screen. Base can be changed in a given screen (possibly multiple times). After a screen is loaded, base is restored to what it was before I loaded that screen.
As an example, suppose I'm working in binary to check out some information that makes more sense to view in binary and I have some bit manipulation words defined on screen 250. No base was set in the source because that screen will load just fine in hexadecimal or decimal. It just won't load correctly in binary. While working in binary, I can load these words by typing:
#250 LOAD ( or $FA LOAD ) without leaving binary and I'm right back in binary after the screen loads.
I suppose this change to LOAD has to do with my recent change to NUMBER? . Before that change, if I was working in binary and wanted to load screen number 250, I would have to change to decimal (or hexadecimal) to load it, since that would be more convenient and easier to remember than typing:
11111010 LOAD

There is one caveat: If there is an error while loading a screen, Fleet Forth will be left in whatever base was in use until I next change base. This is no different than before. RB can't do anything about that.
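On a system with the ANS exception words, the base could also be restored when an error occurs. The sketch below is a different technique from RB : it uses CATCH and THROW around the standard EVALUATE , and EVALUATE-DECIMAL is a hypothetical name.

```forth
\ Sketch only: evaluate a string in decimal, restoring BASE even on error.
: EVALUATE-DECIMAL  ( adr u -- )
   BASE @ >R  DECIMAL          \ save the caller's base, switch to decimal
   ['] EVALUATE CATCH          \ 0 on success, otherwise the error code
   R> BASE !                   \ restore the base in both cases
   THROW ;                     \ re-raise any error after restoring
```

The string is always interpreted in decimal, and the caller's base comes back whether or not the evaluated text succeeds.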

GARTHWILSON · Post by **GARTHWILSON** » Sat Apr 17, 2021 2:20 am

I guess it would be a little strange to have screens numbered in something other than decimal.

JimBoyd · Post by **JimBoyd** » Sat Apr 17, 2021 2:36 am

Funny you should say that. Part of the reason I chose decimal as the default is that I'd been thinking a little too much in hexadecimal when working on Fleet Forth. When I would print a range of screens to a print dump file, I was in the habit of printing them in hexadecimal.
