Note: some of the quoted text is from the old General Discussions thread.
The point is compactness and uniformity of representation.
You bring up a very good example. What I had in mind was processors that have very small amounts of RAM or can only execute ROM code (Harvard architecture). In fact, the fast BASIC interpreter I proposed is similar to a headlerless token-threaded Forth.
I know it's taboo in Forth, but there's nothing that says the code is supposed to be readable after being de-compiled, and we are after all talking about internal optimizations. If the user wants it readable, he'll have to look at the source code I provide that shows what it looks like before these optimizations are carried out.
I'd much rather have the compiled code be efficient than easy to decompile. I don't even bother with SEE. However, if a word is too complicated for SEE, you could always add view fields to the head. Typically, a view field indicates where (i.e. which block) the definition starts, but you could use the view field to tell SEE how to handle difficult situations instead. If you didn't want to add a view field to every definition, you could use a "view field present" bit, implemented like the "immediate" bit in a definition.
For example, in my Forth, the definition name is stored as a counted string, except that bit 7 of the count byte indicates whether or not the definition is immediate, which allows definition name of up to 127 characters. If bit 6 indicated whether or not there was a view field, that would still allow a 63 character name, which would be more than plenty.
How much optimization would you have the program itself do automatically as it compiles?
None! Well...almost none. Currently, no optimizations are performed (e.g. EXIT always compiles an RTS). The main reason is practical: I didn't want to keep rewriting optimization handling in the compiler while I was still weighing implementation decisions (e.g. speed, space). I also like the idea of having leaving the decision about where and when to optimize up to the programmer.
The reason I am writing my Forth is to have a practical 6502 ANS Forth for my own use. Optimization has never been a priority. Obviously, I've gotten sidetracked as I've learned about the uniqueness of STC. A couple of my goals are: (1) a "lean", no-frills Forth, and (2) anything written in standard words should not have be to modified if that is at all possible, even if it is the result is not optimally efficient.
Right now, a standard word name (e.g. UNTIL) is always unoptimized. For common optimizations, I have defined special words, with similar, but non-standard names. For example, IF currently compiles a BNE followed by a JMP (the Z flag is prepared first, of course), which allows the branch distance to be more than half a page. A different word is used for compiling a BEQ only, an optimization that can almost always be made.
I could use two "execution tokens" (to use an ANS Forth term) for words that could be always be optimized, one for interpretation state and one for compilation state, but I have decided not to, because an ANS Forth seems (to me) to be much cleaner with just a single "execution token" for each word. (The standard is a little too ambiguous for my liking about how multiple execution tokens can be handled in a standard way.)
Once again, the colorForth concept is a more elegant solution for STC than ANS Forth. The concept can be adapted (i.e. altered) and extended by using additional colors. For example, a word would be executed if yellow, compiled without optimization (i.e. a JSR) if green, compiled and space optimized if brown, compiled and speed optimized (i.e. inline assembly inserted) if orange. Thus it is clear what a word does based on its name (e.g. DROP drops a cell from the data stack), and whether it is executed or compiled with what, if any, type of optimization based on its color. (I should also point out that, for example, you don't necessarily have to restrict brown words to those which can be specifically space optimized. If a JSR is the most space efficient way a word could be compiled, a brown word could just default to the green word behavior.) This you allows you to easily optimize one section for speed, and another for space, with high legibility. (In ANS Forth, you could maintain a "optimization state" variable, and define words that switch between the space, speed, and no optimization states, but with colors it is far more clear where and when optimizations occur.)
If detecting longer optimizable sequences such as DUP 0< IF isn't convenient, you could use the color gray to tell the compiler to start looking for an optimizable sequence (e.g. if DUP 0< IF were all gray or something). You could use the color pink if you encountered some other special scenario that isn't covered by any the above, and so on.
Anyway, a 6502 colorForth is another possible project. However, I'd like it to be suitable for use in a monochrome enviroment. There are a couple of advantages to this approach: (1) a hobbyist machine might well use a monochrome display or be limited in the number of colors it displays, and (2) code that is not colored will be much simpler to print, photocopy, cut-and-paste, or post to a message board. I have some vague ideas about how to accomplish this, but they are definitely not ready for public consumption at this point. I would like to implement the ideas above (and there are some other things I might do differently than colorForth as well) at some point, but this is way down the list of 6502 projects at the present time.
Getting back to the ANS Forth that I am actually writing, the optimizations that I really am considering handling automatically are: turning a JSR RTS sequence into a JMP, compiling BEQ rather than BNE JMP whenever possible with backward branches, and handling 0= IF and 0= UNTIL by turning a BEQ into a BNE and vice versa.