It has been mentioned that on some Forth systems a zero is stored in the last byte of a screen of source, and possibly in the text input buffer, to signal termination of compiling or interpreting. Since I've seen no mention of this in the Forth-83 standard, I will describe how Fleet Forth interprets/compiles text.
Here is Fleet Forth's QUIT loop:
HEX
: QUERY ( -- )
'TIB IS TIB TIB 50 EXPECT
SPAN C@ DUP #TIB !
1+ CHARS +! BLK OFF >IN OFF ;
'TIB is a metacompiler macro that evaluates to $2A7, the starting address for the text input buffer. QUERY sets the system value TIB (actually a constant, the system is in RAM) to the correct address and reads in characters with EXPECT . The value returned by SPAN is stored in #TIB and CHARS is updated. CHARS is used to keep track of the number of characters sent to a line for words that use text formatting. BLK and >IN are zeroed. EXPECT stores text up to 80 characters or until a carriage return is encountered. It does not store the carriage return. It does not store a zero as an end of string marker. INTERPRET , which interprets or compiles depending on the STATE , uses WORD to parse the text stream. WORD uses 'STREAM to return the address of the text stream.
: 'STREAM ( -- ADR U )
BLK @ ?DUP
IF
BLOCK B/BUF
ELSE
TIB #TIB @
THEN
DUP >IN @ UMIN DUP NEGATE
UNDER+ UNDER+ ;
'STREAM returns the address of the current position in the text stream, as indicated by the offset >IN , and the number of bytes of text stream yet to parse. When the text stream is exhausted (when >IN is equal to or greater than the size of the text stream) 'STREAM returns a text stream size of zero. 'STREAM uses UMIN to return the unsigned minimum of the text stream size and >IN thus treating both sizes as unsigned numbers. WORD uses >HERE to copy the parsed text to HERE. >HERE takes an address and a count and returns HERE . It places the count at HERE followed by the text at the given address and then a trailing blank. If the text stream is exhausted, WORD supplies an address just past the text stream and a count of zero to >HERE . >HERE places the zero at HERE and the trailing blank.
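To make the clamping step concrete, here is a minimal sketch in standard Forth rather than Fleet Forth source. The names U-MIN and REMAINING are made up for illustration, and /STRING stands in for the UNDER+ UNDER+ arithmetic; it assumes a system with the standard String word set.

```forth
\ Unsigned minimum; named U-MIN here so it does not clash with a system's UMIN.
: U-MIN ( u1 u2 -- u )  2DUP U< IF DROP ELSE NIP THEN ;

\ Given a text stream (addr len) and a parse offset, return what is left.
: REMAINING ( addr len offset -- addr' len' )
  OVER U-MIN      \ clamp the offset to the stream length, as unsigned numbers
  /STRING ;       \ advance addr and shrink len by the clamped offset

: DEMO1 ( -- )   S" DUP SWAP" 4 REMAINING TYPE ;   \ types SWAP
: DEMO2 ( -- u ) S" DUP" 99 REMAINING NIP ;        \ 0: stream exhausted
: DEMO3 ( -- u ) S" DUP" -1 REMAINING NIP ;        \ 0: -1 is a huge unsigned offset
```

DEMO3 is the case that motivates the unsigned comparison: with a signed minimum, an offset of -1 would win the comparison and the address would move backward instead of the count dropping to zero.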
When INTERPRET executes FIND with the address of HERE , FIND finds an entry in the Forth vocabulary with a count of zero and a blank for its name. This word is an alias for EXIT which is immediate. Executing EXIT or its alias is all that is needed to exit INTERPRET and resume the quit loop.
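The zero-length result at the end of the text stream can also be observed from ordinary code. In standard Forth, BL WORD returns a counted string with a count of zero once the parse area is exhausted. A small sketch (COUNT-WORDS and DEMO are made-up names, and this is not Fleet Forth source):

```forth
\ Count the blank-delimited tokens left in the current parse area.
\ The loop ends when WORD returns a zero count, i.e. the stream is exhausted.
: COUNT-WORDS ( -- n )
  0                      \ running count
  BEGIN
    BL WORD C@           \ count byte of the next token; 0 at end of stream
  WHILE
    1+
  REPEAT ;

\ The evaluated string carries no terminator, only an address and a count.
: DEMO ( -- n )  S" COUNT-WORDS DUP SWAP DROP" EVALUATE ;
DEMO .    \ 3
```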
Loading a block is similar. Here is the definition of LOAD
Note: BLK is a regular variable (not a user variable) but it has two cells. >IN is actually a constant that points to the second cell. It behaves just like a variable with its data just after BLK's data. LOAD first saves the values of BLK and >IN , by way of LINELOAD , and gives them new values. INTERPRET is then executed. Once again, when INTERPRET executes the alias for EXIT it exits. The original values of BLK and >IN are restored and interpretation resumes at the terminal.
The system doesn't see individual lines in a source block. Those are for readability. The system effectively "sees" the screen as a 1024 byte string.
EVALUATE interprets/compiles strings. It takes an address and a count. The string to be evaluated, like a screen or the terminal input buffer, does not need a terminating zero. Here is the source:
As with LOAD and LINELOAD , EVALUATE saves the original values of BLK and >IN . It also saves the values of TIB and #TIB . It sets TIB to the address of the string to be evaluated and sets #TIB to the count. It clears BLK and >IN .
Once INTERPRET exits, EVALUATE restores the values that were saved. EVALUATE , like LOAD , is nestable. An evaluated string can load a block and a screen (text in a block) can evaluate a string. They can both be mutually nested.
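Nesting is easy to check on any system with a standard EVALUATE . The sketch below uses made-up word names and only shows the observable behavior, not how Fleet Forth saves and restores its pointers:

```forth
: DOUBLE-IT ( n -- 2n )  S" 2 *" EVALUATE ;        \ inner EVALUATE
: DEMO ( -- n )  S" 21 DOUBLE-IT 1+" EVALUATE ;    \ outer EVALUATE
DEMO .    \ 43
```

The 1+ after DOUBLE-IT is only reached if the outer string's position was restored when the inner EVALUATE finished.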
The inclusion of EVALUATE is why QUERY sets TIB to the proper address.
Loading source from a file is easy enough in Fleet Forth. Here is one way to implement ANS Forth's INCLUDED in Fleet Forth:
: INCLUDED ( ADR U -- )
DR# DUP DUP OPEN IOERR
BEGIN
DR# CHKIN IOERR
-2 BUFFER DUP B/BUF (EXPECT)
CLRCHN
STATUS >R
SPAN @ EVALUATE
." ."
R> DONE? OR
UNTIL
DR# CLOSE ;
The first line opens a file on the current disk drive and checks for an I/O error. This disk is not a disk for Forth blocks.
The first line after BEGIN redirects input from this disk drive and checks for an I/O error. BUFFER is used to return the address of a block buffer. Since it doesn't read from external storage, as long as no blocks are in the buffer this is perfectly safe. It would also be safe if blocks were being accessed from a drive other than the one with the sequential file on it.
Each line is read in with EXPECT's vector, (EXPECT) . It can be used because each line is terminated with a carriage return ($0D). CLRCHN resets the I/O to the defaults until the next line is read. A decimal point is displayed for each line loaded just in case it is a large file.
Each line is evaluated until the end of file is reached or until the user decides this file doesn't need to finish loading; perhaps it was the wrong one.
Only when loading a file is there something new in the data. A carriage return terminates each line. In this case the lines are real. The file could be quite large and it would be difficult (on a C64) to read in the entire file as one really long line. As is, this version of INCLUDED can only handle files with lines that are no longer than 1024 bytes.
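For comparison, the same line-at-a-time idea can be sketched with the standard File word set. This is not the Fleet Forth definition above: INCLUDE-LINES , LINE-BUF and MAX-LINE are made-up names, and the 256 byte line limit is an arbitrary assumption.

```forth
256 CONSTANT MAX-LINE                 \ assumed upper bound on line length
CREATE LINE-BUF  MAX-LINE 2 + ALLOT   \ READ-LINE may use two extra bytes

: INCLUDE-LINES ( c-addr u -- )
  R/O OPEN-FILE THROW >R              \ keep the file id on the return stack
  BEGIN
    LINE-BUF MAX-LINE R@ READ-LINE THROW   ( u2 flag )
  WHILE                               \ flag is false at end of file
    LINE-BUF SWAP EVALUATE            \ the line has no terminator, just a count
  REPEAT
  DROP                                \ discard the zero count left at end of file
  R> CLOSE-FILE THROW ;
```

Because there is a single line buffer, this sketch is not nestable: a file loaded this way must not load another file the same way. It also leaves the file open if an evaluated line throws.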
Based on a discussion in another thread, I forked a new version of Fleet Forth to modify the assembler and source so the assembler's mnemonics and control flow words are no longer comma terminated. These three macros are a holdover from the Ragsdale assembler which I also incorporated in my assembler.
: BOT ( -- N ) 0 ,X ;
: SEC ( -- N ) 2 ,X ;
: RP) ( -- N ) 101 ,X ;
This definition of SEC will conflict with the name of the mnemonic SEC, if the comma is removed.
I personally don't use these "macros"; however, I have seen RPN Forth assembly code that is rife with them.
I'm going to remove these three words from my assembler and adjust the editor tool I'm writing to replace each before replacing the comma terminated version of the assembler mnemonics.
I use TOS (top of stack, i.e., data stack) and NOS (next on stack) which I got elsewhere and I can't remember where anymore. I have 3OS (third on stack) too, plus TOS_LO, TOS_HI, etc. to specify which byte of each stack cell. SEC (second cell on stack?) would conflict with SEt Carry even if you have a separate assembler vocabulary (which I don't).
I don't know why Ragsdale named the word BOT that performs 0 ,X . Like you, I think TOS would be a more appropriate name. Likewise NOS rather than SEC . Personally I never have used them. I just use 2 ,X mnemonic 3 ,X mnemonic. I left BOT SEC and RP) in my assembler for portability. For the version of my assembler without comma terminated mnemonics and control flow, I'm just leaving them out.
Although Fleet Forth's editor is based on the editor presented in Forth Dimensions volume 3 issue 3, there are some differences. WHERE is in the kernel, takes no parameters, and is used by (ABORT") . (SPREAD) is named SPREAD because it's not just used by other words, I also use it directly.
Some of the words not directly used by the programmer are different or have different names.
The version of MATCH in the article returns a flag with the cursor offset on top, while the version in Fleet Forth returns the cursor offset with the flag on top.
In both versions, MATCH does not skip over spaces. The search string might contain spaces. MATCH takes the address and length of a string to search and an address and length of a string to find. MATCH starts at the beginning of the string to search and progresses through it one byte at a time until it finds a match or it has made it through the string.
There is a cursor variable, R# . When editing a screen, the starting address and length for the string to be searched are calculated using the value of R# and the offset returned by MATCH is added to R# .
Here is the source for Fleet Forth's MATCH
: MATCH ( ADR1 LEN1 ADR2 LEN2 -- OFFSET FLAG )
2OVER BOUNDS
?DO
2DUP I TEXT=
IF
NIP ROT I SWAP - +
TUCK U< 0=
UNLOOP EXIT
THEN
LOOP
2DROP NIP FALSE ;
TEXT= is a code word that takes an address and a count (length) and another address. It returns TRUE if the strings at both addresses are identical over the length specified by the count.
As long as a match is not found and the search space is not exhausted, the DO LOOP executes five words that are all code words (primitives): 2DUP , I , TEXT= , ?BRANCH ( compiled by IF ), and (LOOP) ( compiled by LOOP ). The words in the IF THEN structure calculate the cursor offset. A TRUE flag is returned only if the cursor offset is not larger than LEN1; otherwise a FALSE flag is returned. The loop parameters are discarded and MATCH exits at this point. If the DO LOOP runs to completion, the search string was not found, and a cursor offset equal to LEN1 and a FALSE flag are returned.
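For readers without TEXT= , here is a rough standard Forth sketch with the same stack effect. It is not the Fleet Forth source: MATCH-SKETCH is a made-up name, COMPARE from the String word set replaces TEXT= , and instead of checking the offset after the fact it simply stops at the last position where the search string still fits.

```forth
: MATCH-SKETCH ( adr1 len1 adr2 len2 -- offset flag )
  2 PICK OVER U< IF              \ search string longer than the text?
    2DROP NIP FALSE EXIT         \ yes: offset = len1, no match
  THEN
  2 PICK OVER - 1+ 0 DO          \ I runs over every position where it fits
    3 PICK I + OVER              ( adr1 len1 adr2 len2 adr1+I len2 )
    2OVER COMPARE 0= IF          \ identical over len2 bytes?
      NIP NIP NIP I +            \ offset = end of the match
      TRUE UNLOOP EXIT
    THEN
  LOOP
  2DROP NIP FALSE ;              \ not found: offset = len1

: DEMO1 ( -- offset flag )  S" ABCDEF" S" CD" MATCH-SKETCH ;   \ 4 TRUE
: DEMO2 ( -- offset flag )  S" ABCDEF" S" XY" MATCH-SKETCH ;   \ 6 FALSE
```

As in the original, the offset on a successful match is measured to the end of the matched text, which is where the cursor ends up.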
Another difference is how MATCH is used. In both versions, R# is used to find the line and position on the current line to start the search. In both versions, the search performed by the word TILL is limited to the current line.
There are two other search words. F finds the next occurrence of the search string in the screen. S finds all occurrences of the search string in a range of screens. Well, all but pathological cases. Consider this screen:
1 FH S DUP
DUP^DUPDUP 7 8100
DUPDUP^DUP 7 8100
DUPDUPDUP^ 7 8100
OK
Notice that I did not say "the word DUP" but rather "the string DUP" . These search words know nothing of Forth words or space delimited parsing. They are editing tools looking for one string within another.
When searching for the string DUPDUP , only one occurrence is found.
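The counts in that example follow from resuming each search just past the previous match. A small standard Forth sketch reproduces them; COUNT-HITS is a made-up name, SEARCH from the String word set stands in for the editor's own search words, and the search string is assumed to be non-empty.

```forth
: COUNT-HITS ( adr1 len1 adr2 len2 -- n )
  2>R  0 ROT ROT               ( n adr len ) ( R: adr2 len2 )
  BEGIN
    2R@ SEARCH                 ( n adr' len' flag )
  WHILE
    R@ /STRING                 \ resume just past this match; R@ is len2
    ROT 1+ ROT ROT             ( n+1 adr'' len'' )
  REPEAT
  2DROP 2R> 2DROP ;

: DEMO1 ( -- n )  S" DUPDUPDUP" S" DUP"    COUNT-HITS ;   \ 3
: DEMO2 ( -- n )  S" DUPDUPDUP" S" DUPDUP" COUNT-HITS ;   \ 1
```

In DEMO2 the first match consumes six characters, leaving only DUP , so the overlapping second occurrence is never seen.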
With Fleet Forth's editor and the one in the article, F uses the word SEEK ( spelled (SEEK) in the article ) to find the search string. While both versions search from the current cursor position, the version in the article uses the cursor variable, R# , to calculate the address of the cursor position and the remaining length of that line. If the string is not found, the next line is searched from the beginning, each line searched until the string is found or there are no more lines to search. Fleet Forth's version of SEEK uses the cursor variable, R# , to calculate the address of the cursor position and the remaining length of the entire screen.
The version of S in the article also searches for the search string one line at a time whereas the version in Fleet Forth searches the entire screen.
That's neat! The only additional thing I could wish for is wildcards, which I have on my HP-71 hand-held computer but I haven't had much use for in the way I use Forth. By "wildcards," I mean for example that the search string could specify that it has to start with one thing and end with another but not specify what's in the middle, or you could specify for example that you only want the string if it's at the beginning of a line, or other such things. That's something even my wonderful MultiEdit editor on the PC doesn't have.
That sounds interesting but I haven't needed wildcards so I didn't try to implement them. The editor really is quite simple. Here is my find and replace function.
Unlike S , F ( find) and R ( replace) are limited to one screen at a time. If F can't find the string it aborts. I'm changing that to QUIT so it will not clear the data and auxiliary stacks.
It is used like this:
As long as I didn't do anything to alter the contents of the find and insert buffers I can perform the same search and replace on other screens just by typing FR .
The C64 has a screen editor. If I scroll up to a line and hit the enter key, the entire line is read in by EXPECT as if I had typed it. This made it easy to add a screen editor for screens as opposed to just a line editor.
I haven't seen much opportunity to use wildcards yet, either. Even under ML programming, the only use I have for wildcards is when cataloging a disk. And even then, I combine the wildcard <search by name> with <search by type>, <search by date>, <search by filesize>, etc...
As of yet, I don't have many volume or disk utilities in Forth.
I discussed the structure of Fleet Forth's vocabularies here.
In a recent modification to my metacompiler, I took advantage of that structure. The following words are defined exactly the same in the main ( host ) assembler vocabulary and the target assembler vocabulary:
TABLE MODE .A X)
)Y # ) MEM
,X ,Y Z OFFSET
VS CS 0= 0<
>= NOT
The body of a Fleet Forth VOCABULARY has three fields: a pointer to the link field of the latest word in the vocabulary, a pointer to the vocabulary's parent vocabulary, and a link in the VOC-LINK chain. The first word defined in a vocabulary has a link field with a value of zero.
To avoid redefining the words shown above, I made sure they were the first words defined in the original ASSEMBLER vocabulary and modified the metacompiler's ASSEMBLER vocabulary before it had any words.
SCR# 51
// TIE IN WITH FLEET FORTH ASSEMBLER
TARGET VOCABULARY ASSEMBLER
LATEST NAME> >BODY
HOST ASSEMBLER ' NOT >LINK SWAP!
The metacompiler's assembler is chained to the main assembler at this point and the first word defined in the metacompiler's ASSEMBLER vocabulary will be linked to the word NOT in the original ASSEMBLER vocabulary; therefore the words shown above are searched as part of the metacompiler's vocabulary. Since Fleet Forth's vocabularies do not have a fake name in the vocabulary, but a link to the parent vocabulary, a search in the metacompiler's assembler does not immediately continue in the main Forth vocabulary. The search progresses from the metacompiler's assembler to the target Forth vocabulary on down to the host ( or main) Forth vocabulary.
ASSEMBLER \ metacompiler's assembler vocabulary
FORTH \ target Forth vocabulary
SHADOW \ for constants that "shadow" variables in the target
META \ metacompiler aliases for C@ @ C! ! etc.
FORTH \ original Forth vocabulary
In addition to .S to display the contents of the data stack and .AS to display the contents of the auxiliary stack, Fleet Forth also has .RS to display the contents of the return stack. These three words can aid in tracking down errors.
Here is the source for Fleet Forth's ABORT , the word executed by (ABORT") .
: ABORT ( -- )
SINGLE ERR SP! AP!
['] LIT (IS) WHERE
QUIT ; -2 ALLOT
SINGLE switches off multitasking, setting the deferred word PAUSE to execute the word NOOP , a no op. SP! and AP! clear the data stack and the auxiliary stack.
The line
['] LIT (IS) WHERE
re-enables WHERE . -2 ALLOT reclaims the memory used by EXIT , as there is no returning from QUIT . ERR is a deferred word normally set to the no-op, NOOP . To help with troubleshooting, ERR can be set to a word like (ERR) .
This does show the nesting to the word .RS , but it is not accurate and I desired that accuracy for some of my Forth experiments.
The address 21EB is not in INTERPRET , but in QUIT . Do-colon places the address 21EB on the return stack when QUIT executes INTERPRET , just as the address 21A0 is placed on the return stack when WRAPPER is executed. The improved .RS uses the word AFIND to show which word contains a given address.
ADDR is the address in question. LINK1 is the closest link address to ADDR from below and LINK2 is the closest link address to ADDR from above.
For each value on the return stack, .RS displays it as an unsigned number. If, as an address, it is within the dictionary, the name of the word containing that address is displayed.
Here is the result of using the new .RS with the above test case (I've added lowercase comments to the session log):
WRAPPER
5807 INNER
7FFF
8001
580B INNER \ one of the DO LOOP parameters in INNER
5823 OUTER
7FFF
8001
5827 OUTER \ one of the DO LOOP parameters in OUTER
5839 WRAPPER
21A0 INTERPRET
21EB QUIT
OK
Although the address 580B is in the word INNER , it is not placed there by executing another word. It is the first DO LOOP parameter placed on the return stack. It is the branch address used by any LEAVE or ?LEAVE ( of which there are none in this test ) and used by LOOP and +LOOP when a DO LOOP terminates through LOOP or +LOOP.
I made a change to Fleet Forth's LINELOAD , which is used by LOAD . LINELOAD takes a line number and a screen number as parameters. It loads a given screen starting at the line specified.
I read somewhere (I don't remember where) a recommendation that LOAD should set BASE to decimal prior to loading a screen. I've given this some thought and cannot really see a downside. As it is, I have been in the habit of specifying the number base at the start of a screen. With this modification, I will not have to specify the number base if I am using decimal for a particular screen.
Here is the source for Fleet Forth's LINELOAD and LOAD
I think it's a bit short-sighted for someone to say decimal should be the default. In my own uses, it makes sense to keep it in hex most of the time, and change to decimal or binary only temporarily and in limited places.
The idea isn't so much to have decimal as a default, but to avoid having no default. This change doesn't change anything for screens where I still set the base to hexadecimal on line one since having LOAD set a particular base doesn't preclude changing the base while loading the screen. I usually use line zero for a comment that the word INDEX displays and line one to set the base. On the screens that will compile the same in hexadecimal or decimal, it is tempting to not specify the base. Without a default of some kind, it may be necessary to specify the number base for each screen since they can be loaded individually for testing purposes.
With the other change, RB causes LOAD to restore base to the value it had prior to loading a screen. Base can be changed in a given screen (possibly multiple times). After a screen is loaded, base is restored to what it was before I loaded that screen.
As an example, suppose I'm working in binary to check out some information that makes more sense to view in binary and I have some bit manipulation words defined on screen 250. No base was set in the source because that screen will load just fine in hexadecimal or decimal. It just won't load correctly in binary. While working in binary, I can load these words by typing: #250 LOAD ( or $FA LOAD ) without leaving binary and I'm right back in binary after the screen loads.
I suppose this change to LOAD has to do with my recent change to NUMBER? . Before that change, if I was working in binary and wanted to load screen number 250, I would have to change to decimal (or hexadecimal) to load it, since that would be more convenient and easier to remember than typing: 11111010 LOAD
There is one caveat: If there is an error while loading a screen, Fleet Forth will be left in whatever base was in use until I next change base. This is no different than before. RB can't do anything about that.
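The save, default, and restore pattern can be sketched in standard Forth apart from LOAD itself. WITH-DECIMAL and SHOW-250 are made-up names, and this is not how Fleet Forth's LOAD or RB is written; it only shows the behavior described above.

```forth
DECIMAL

\ Run xt with BASE set to decimal, then put BASE back the way it was.
: WITH-DECIMAL ( i*x xt -- j*x )
  BASE @ >R          \ remember the caller's base
  DECIMAL EXECUTE    \ the executed word is free to change BASE again
  R> BASE ! ;        \ reached only if xt returns normally

: SHOW-250 ( -- )  250 . ;
2 BASE !  ' SHOW-250 WITH-DECIMAL    \ prints 250
BASE @ DECIMAL .                     \ prints 2
```

Like RB , this does nothing for the error case. On a system with the standard Exception word set, replacing EXECUTE with CATCH and putting THROW after the restore would cover it, but that is a different design from the one described here.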
Funny you should say that. Part of the reason I chose decimal as the default is that I'd been thinking a little too much in hexadecimal when working on Fleet Forth. When I would print a range of screens to a print dump file, I was in the habit of printing them in hexadecimal.