I'm proposing we continue here our discussion from
6502.org Forum Index -> General Discussions -> How to make a BASIC compiler?
at http://www.6502.org/forum/viewtopic.php?p=3335#3335
since we kind of hijacked pbug's thread and quickly turned it into Forth material that is definitely beyond those who are new to Forth.
I'd like to hear more. What ideas do you have for how the subroutine-threaded Forth would compile new words into its own dictionary? (I can more-or-less imagine it from my experience with ITC, but I'd probably waste a lot of time re-inventing wheels if I had to implement it myself at this point.) How much optimization would you have the program itself do automatically as it compiles?
cont. from BASIC compiler
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: cont. from BASIC compiler
GARTHWILSON wrote:
I'd like to hear more. What ideas do you have for how the subroutine-threaded Forth would compile new words into its own dictionary? (I can more-or-less imagine it from my experience with ITC, but I'd probably waste a lot of time re-inventing wheels if I had to implement it myself at this point.) How much optimization would you have the program itself do automatically as it compiles?
A more interesting variation of STC is outright native code compilation, where the compiled code is kept in an ITC-like representation (almost exact, in fact!), but a bit more detailed. From this, interesting things like type inference and common subexpression elimination can be done -- the end result of this form of compilation is machine language that can rival what is produced by C or Oberon compilers, for example.
So the gamut is pretty wide. STC is itself just an enabler for the more sophisticated code compilation techniques.
- GARTHWILSON
Quote:
I am not sure I understand the question. Compiling STC is just as easy as ITC or DTC. A subroutine call is compiled $6C C, address , . Numeric constants are compiled similarly (JSR dolit, DW literal) OR you can directly inline the constant (DEX, DEX, LDA #<value, STA 0,X, LDA #>value, STA 1,X). etc.
I guess that's partly what I'm getting at. For example, do you use some kind of number value in the header to compare against a user variable to tell whether to compile the JSR or to straightline the code, based on the size/speed balance chosen at compile time? Of course some would always be straightlined, like INX INX instead of JSR DROP, but there would be a large middle ground where the choice is not necessarily automatic, like choosing between a 7-byte straightlining and a 3-byte JSR with its extra 12 clocks of JSR-RTS overhead. And then in the numeric constant example you gave, the straightlining is not the same as just copying the subroutine's code right there, since the straightlined dolit code no longer has to use the return stack to figure out where to get the data to put on the stack. You can just do the LDA# as you said.
BTW, 6C is JMP(abs). JSR is op code 20.
GARTHWILSON wrote:
I guess that's partly what I'm getting at. For example, do you use some kind of number value in the header to compare against a user variable to tell whether to compile the JSR or to straightline the code
Quote:
Of course some would always be straightlined, like INX INX instead of JSR DROP, but there would be a large middle ground where the choice is not necessarily automatic, like choosing between a 7-byte straightlining and a 3-byte JSR with its extra 12 clocks
I could easily make an interpretable version:
Code:
: DROP DROP ;
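One hypothetical way to implement the header field Garth asks about: store each word's inline code image in its header and compare its length against a user-settable threshold. A Python sketch (the dictionary layout, names, and the 7-byte SWAP stand-in are all made up for illustration):

```python
# Hypothetical sketch of the size/speed decision: each header carries the
# word's inline code image (or None), and a user variable sets the size
# threshold above which a JSR is compiled instead.

# dictionary: name -> (code address, inline code bytes or None)
words = {
    "DROP": (0x8000, bytes([0xE8, 0xE8])),   # INX INX: 2 bytes, always inline
    "SWAP": (0x8010, bytes([0xB5] * 7)),     # stand-in for a 7-byte sequence
}

inline_threshold = 4  # user variable: inline anything up to this many bytes

def compile_word(name, code):
    addr, inline = words[name]
    if inline is not None and len(inline) <= inline_threshold:
        code += inline                     # straightline the code
    else:
        code.append(0x20)                  # JSR abs
        code += addr.to_bytes(2, "little")
    return code

# With a 4-byte threshold, DROP (2 bytes) is inlined; SWAP (7 bytes) gets a JSR.
drop_code = compile_word("DROP", bytearray())
swap_code = compile_word("SWAP", bytearray())
```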
Re: cont. from BASIC compiler
Note: some of the quoted text is from the old General Discussions thread.
None! Well...almost none. Currently, no optimizations are performed (e.g. EXIT always compiles an RTS). The main reason is practical: I didn't want to keep rewriting optimization handling in the compiler while I was still weighing implementation decisions (e.g. speed, space). I also like the idea of leaving the decision about where and when to optimize up to the programmer.
kc5tja wrote:
The point is compactness and uniformity of representation.
You bring up a very good example. What I had in mind was processors that have very small amounts of RAM or can only execute ROM code (Harvard architecture). In fact, the fast BASIC interpreter I proposed is similar to a headerless token-threaded Forth.
GARTHWILSON wrote:
I know it's taboo in Forth, but there's nothing that says the code is supposed to be readable after being de-compiled, and we are after all talking about internal optimizations. If the user wants it readable, he'll have to look at the source code I provide that shows what it looks like before these optimizations are carried out.
I'd much rather have the compiled code be efficient than easy to decompile. I don't even bother with SEE. However, if a word is too complicated for SEE, you could always add view fields to the head. Typically, a view field indicates where (i.e. which block) the definition starts, but you could use the view field to tell SEE how to handle difficult situations instead. If you didn't want to add a view field to every definition, you could use a "view field present" bit, implemented like the "immediate" bit in a definition.
For example, in my Forth, the definition name is stored as a counted string, except that bit 7 of the count byte indicates whether or not the definition is immediate, which allows definition name of up to 127 characters. If bit 6 indicated whether or not there was a view field, that would still allow a 63 character name, which would be more than plenty.
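The count-byte layout described above could be packed and unpacked like this (a Python sketch of the bit assignments, not actual code from the Forth in question):

```python
# Sketch of the header count byte dclxvi describes: bit 7 marks an immediate
# word, bit 6 marks "view field present", bits 0-5 hold the name length,
# limiting names to 63 characters.

IMMEDIATE = 0x80
HAS_VIEW  = 0x40

def make_count_byte(name, immediate=False, has_view=False):
    if len(name) > 0x3F:
        raise ValueError("name limited to 63 characters")
    return len(name) | (IMMEDIATE if immediate else 0) | (HAS_VIEW if has_view else 0)

def parse_count_byte(b):
    """Return (name length, immediate?, view field present?)."""
    return b & 0x3F, bool(b & IMMEDIATE), bool(b & HAS_VIEW)

b = make_count_byte("UNTIL", immediate=True)
# parse_count_byte(b) recovers (5, True, False)
```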
GARTHWILSON wrote:
How much optimization would you have the program itself do automatically as it compiles?
The reason I am writing my Forth is to have a practical 6502 ANS Forth for my own use. Optimization has never been a priority. Obviously, I've gotten sidetracked as I've learned about the uniqueness of STC. A couple of my goals are: (1) a "lean", no-frills Forth, and (2) anything written in standard words should not have to be modified if that is at all possible, even if the result is not optimally efficient.
Right now, a standard word name (e.g. UNTIL) is always unoptimized. For common optimizations, I have defined special words, with similar, but non-standard names. For example, IF currently compiles a BNE followed by a JMP (the Z flag is prepared first, of course), which allows the branch distance to be more than half a page. A different word is used for compiling a BEQ only, an optimization that can almost always be made.
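For illustration, the two IF variants might emit byte sequences like these (a Python sketch; the flag-preparation and flag-dropping code is omitted, and the layout is hypothetical):

```python
# The standard IF compiles BNE over a JMP so the forward branch can exceed
# the 6502's 8-bit branch range; the optimized variant compiles a bare BEQ.

def compile_long_if(code, target):
    code += bytes([0xD0, 0x03])           # BNE +3: skip the JMP if flag true
    code.append(0x4C)                     # JMP abs to the false branch
    code += target.to_bytes(2, "little")

def compile_short_if(code, offset):
    code += bytes([0xF0, offset & 0xFF])  # BEQ rel: false branch within range

long_form, short_form = bytearray(), bytearray()
compile_long_if(long_form, 0x9000)   # 5 bytes
compile_short_if(short_form, 0x10)   # 2 bytes
```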
I could use two "execution tokens" (to use an ANS Forth term) for words that could always be optimized, one for interpretation state and one for compilation state, but I have decided not to, because an ANS Forth seems (to me) to be much cleaner with just a single "execution token" for each word. (The standard is a little too ambiguous for my liking about how multiple execution tokens can be handled in a standard way.)
Once again, the colorForth concept is a more elegant solution for STC than ANS Forth. The concept can be adapted (i.e. altered) and extended by using additional colors. For example, a word would be executed if yellow, compiled without optimization (i.e. a JSR) if green, compiled and space optimized if brown, compiled and speed optimized (i.e. inline assembly inserted) if orange. Thus it is clear what a word does based on its name (e.g. DROP drops a cell from the data stack), and whether it is executed or compiled with what, if any, type of optimization based on its color. (I should also point out that, for example, you don't necessarily have to restrict brown words to those which can be specifically space optimized. If a JSR is the most space efficient way a word could be compiled, a brown word could just default to the green word behavior.) This allows you to easily optimize one section for speed, and another for space, with high legibility. (In ANS Forth, you could maintain an "optimization state" variable, and define words that switch between the space, speed, and no optimization states, but with colors it is far more clear where and when optimizations occur.)
If detecting longer optimizable sequences such as DUP 0< IF isn't convenient, you could use the color gray to tell the compiler to start looking for an optimizable sequence (e.g. if DUP 0< IF were all gray or something). You could use the color pink if you encountered some other special scenario that isn't covered by any of the above, and so on.
Anyway, a 6502 colorForth is another possible project. However, I'd like it to be suitable for use in a monochrome environment. There are a couple of advantages to this approach: (1) a hobbyist machine might well use a monochrome display or be limited in the number of colors it displays, and (2) code that is not colored will be much simpler to print, photocopy, cut-and-paste, or post to a message board. I have some vague ideas about how to accomplish this, but they are definitely not ready for public consumption at this point. I would like to implement the ideas above (and there are some other things I might do differently than colorForth as well) at some point, but this is way down the list of 6502 projects at the present time.
Getting back to the ANS Forth that I am actually writing, the optimizations that I really am considering handling automatically are: turning a JSR RTS sequence into a JMP, compiling BEQ rather than BNE JMP whenever possible with backward branches, and handling 0= IF and 0= UNTIL by turning a BEQ into a BNE and vice versa.
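Two of those automatic optimizations could be prototyped as a simple peephole pass. This sketch works on a symbolic instruction list rather than raw 6502 bytes, ignores the flag-drop code a real IF would also compile, and uses ZEROEQ as a stand-in name for 0=:

```python
# Peephole sketch: JSR x / RTS becomes JMP x (tail call), and a JSR to 0=
# immediately before a conditional branch is removed by inverting the branch.

def peephole(ins):
    out = []
    for op in ins:
        if op[0] == "RTS" and out and out[-1][0] == "JSR":
            out[-1] = ("JMP", out[-1][1])   # JSR x / RTS -> JMP x
        elif op[0] in ("BEQ", "BNE") and out and out[-1] == ("JSR", "ZEROEQ"):
            out.pop()                       # drop the call to 0= ...
            out.append(("BNE" if op[0] == "BEQ" else "BEQ", op[1]))  # ...flip branch
        else:
            out.append(op)
    return out

prog = [("JSR", "ZEROEQ"), ("BEQ", "exit"), ("JSR", "DUP"), ("RTS", None)]
# peephole(prog) -> [("BNE", "exit"), ("JMP", "DUP")]
```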
Re: cont. from BASIC compiler
GARTHWILSON wrote:
How much optimization would you have the program itself do automatically as it compiles?
In the future, I intend to optimize out common sequences, compiling "45 +" to a simple ADD EAX,45 instruction, for example. These simple optimizers work on a very simple principle: if the compiler has just laid down code to push a literal on the stack, then (a) the literal is already in memory, and (b) we know precisely what pattern of opcodes to look for and where to find them. So a word like + would first check to see if a literal was laid down first, and if so, it would "back up" the dictionary pointer, then re-compile the appropriate ADD reg,imm opcode.
ColorForth uses this technique to very, very good effect. It still doesn't produce the greatest code, but it works fantastically, and still produces obnoxiously compact code. It's also very fast, because it completely eliminates the stack accesses that would normally occur otherwise.
Since the x86 does support indexed addressing modes, I also intend to detect + @ and + ! and have them compile to a simple MOV instruction that takes advantage of the CPU's indexed addressing modes, with or without a literal in front of the +.
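The back-up-the-dictionary-pointer trick kc5tja describes can be sketched like so (symbolic x86-flavored output; the mnemonics are illustrative, not real encodings):

```python
# Sketch of literal folding: if the newest code pushed a literal, + backs up
# the dictionary pointer and re-compiles an add-immediate instead.

code = []
last_literal = None   # remembered only while the literal is the newest code

def lit(n):
    global last_literal
    code.append(("PUSH_LIT", n))   # compile code to push a literal
    last_literal = n

def plus():
    global last_literal
    if last_literal is not None and code and code[-1] == ("PUSH_LIT", last_literal):
        code.pop()                                   # "back up" the dict pointer
        code.append(("ADD_EAX_IMM", last_literal))   # 45 + -> ADD EAX,45
    else:
        code.append(("CALL", "PLUS"))                # no literal: plain call
    last_literal = None

lit(45)
plus()
# code -> [("ADD_EAX_IMM", 45)]
```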
Quote:
I could use two "execution tokens" (to use an ANS Forth term) for words that could always be optimized, one for interpretation state and one for compilation state, but I have decided not to, because an ANS Forth seems (to me) to be much cleaner with just a single "execution token" for each word. (The standard is a little too ambiguous for my liking about how multiple execution tokens can be handled in a standard way.)
ANSI Forth really is intended to model the Forth-83 standard, and run in an F83-like runtime environment.
Quote:
Once again, the colorForth concept is a more elegant solution for STC than ANS Forth. The concept can be adapted (i.e. altered) and extended by using additional colors.
Quote:
For example, a word would be executed if yellow, compiled without optimization (i.e. a JSR) if green, compiled and space optimized if brown, compiled and speed optimized (i.e. inline assembly inserted) if orange.
Re: cont. from BASIC compiler
kc5tja wrote:
There are some ambiguities with ANSI Forth using this method too. I do not recommend using it unless you're willing to break some amount of ANSI compatibility.
kc5tja wrote:
ColorForth can also use fonts too.
kc5tja wrote:
It is not clear from your text, but I'd like to point out that this is not how ColorForth (Chuck Moore's version) works. I'm assuming this is back-of-the-envelope design work for your own interpretation of how a color-Forth would be used.
colorForth was tailored to a specific machine, and it is neither a standard nor a model (as FIG-Forth is), so I don't see a reason not to do things differently on a different processor like the 6502. On the x86 family, a lot of common words can be compiled as a single (or two) inline instruction, as you have alluded to above. This isn't the case on the 6502.
Anyway, what I was referring to above was really just one of the concepts of colorForth -- colored words. In a traditional indirect or direct threaded Forth, for the most part, words are compiled between : and ; (or ;CODE) except between [ and ], and executed otherwise, so it is very easy to keep track of whether a word will be executed or compiled. In fact, easier than that, since [ and ] are not used within most colon definitions.
With subroutine threading, inline assembly is not only available without penalty, but frequently useful. Now you're switching between execution and compilation more often, and already it's a bit more complicated to keep track of whether a word is executed or compiled. Enter color as an elegant solution. Now it's clear whether a word is executed or compiled, and you don't even have to look at its surrounding context anymore.
If you wish to write a Forth that emphasizes compilation options, you could do so with a traditional Forth, and keep an optimization state variable as I suggested. But then you have a lot to keep track of when you're trying to follow the code, plus the code will be more cluttered with optimization directives that don't have anything to do with what the word being defined actually does. The concept of colored words lends itself very nicely to being extended for this situation. The function of the code and the optimization being performed will be visually distinct. You look at the word names to see the function of the code, and you look at the colors to see the optimization being performed.
In fact, I myself wouldn't add all those extra colors I've suggested, certainly not at first. But subroutine threading has many optimization options available, and using additional colors for this or for other things opens up a variety of possibilities.
Re: cont. from BASIC compiler
dclxvi wrote:
Code:
AND LDA #$35
BNE MATH
OR LDA #$15
BNE MATH
XOR LDA #$55
BNE MATH
MINUS JSR NEGATE
PLUS CLC
LDA #$75
MATH STA MATH2
JSR MATH1
MATH1 LDA 2,X
MATH2 ADC 0,X
STA 2,X
INX
RTS
That's clever the way the INX instruction before the RTS allows the same section of code to handle the math for both the low and high bytes while dropping the second stack parameter; however, the following doesn't seem to work.
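To see why the trick works, here is a small Python simulation of the PLUS path (zero page modeled as a dict, X-indexed data stack with the low byte of the top cell at 0,X): the MATH1 body runs once via the JSR and once more when its own RTS returns to the instruction after the JSR, which is MATH1 itself.

```python
# Simulating PLUS from dclxvi's code: low byte on pass one, high byte (with
# carry) on pass two, and the two INXes drop one cell from the data stack.

def plus(zp, x):
    """Simulate the 16-bit add of the top two data-stack cells."""
    carry = 0  # CLC
    for _ in range(2):                        # MATH1 executes twice
        total = zp[x + 2] + zp[x] + carry     # LDA 2,X ; ADC 0,X
        carry = total >> 8
        zp[x + 2] = total & 0xFF              # STA 2,X
        x += 1                                # INX
    return x                                  # second RTS returns to the caller

# Push $01FF (next-on-stack) then $0002 (top): low bytes at even offsets.
zp = {0: 0x02, 1: 0x00, 2: 0xFF, 3: 0x01}
x = plus(zp, 0)
# x -> 2, and the new top cell at zp[2], zp[3] is $0201 ($01FF + $0002)
```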
dclxvi wrote:
Code:
ONEMINUS LDA 0,X
BNE ONEM1
DEC 1,X
ONEM1 DEC 0,X
RTS
ABS LDA 1,X
BNE ABS1
NEGATE JSR ONEMINUS
INVERT JSR INV1
INV1 LDA #$FF
EOR 0,X
STA 0,X
INX
ABS1 RTS
It looks like the result is being dropped.
Shouldn't it be this?
Code:
ONEMINUS LDA 0,X
BNE ONEM1
DEC 1,X
ONEM1 DEC 0,X
RTS
ABS LDA 1,X
BNE ABS1
NEGATE JSR ONEMINUS
INVERT DEX
DEX
JSR INV1
INV1 LDA #$FF
EOR 2,X
STA 2,X
INX
ABS1 RTS
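A quick simulation of this corrected INVERT suggests it does behave as intended: the DEX DEX offsets the two INXes, so X (the stack depth) is preserved and both bytes of the top cell get inverted.

```python
# Simulating the corrected INVERT: after DEX DEX, the INV1 body runs twice
# (same JSR fall-through trick), EORing first the low then the high byte
# with $FF, and the two INXes restore X. Zero page modeled as a dict.

def invert(zp, x):
    x -= 2                     # DEX, DEX
    for _ in range(2):         # INV1 runs twice: once via JSR, once via RTS
        zp[x + 2] ^= 0xFF      # LDA #$FF ; EOR 2,X ; STA 2,X
        x += 1                 # INX
    return x                   # second RTS returns to INVERT's caller

zp = {0: 0x34, 1: 0x12}        # top cell = $1234 at 0,X / 1,X with X = 0
x = invert(zp, 0)
# x -> 0 and the top cell -> $EDCB ($1234 EOR $FFFF)
```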