PostPosted: Wed Sep 08, 2004 7:51 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
I'm proposing we continue here our discussion from
6502.org Forum Index -> General Discussions -> How to make a BASIC compiler?
at http://www.6502.org/forum/viewtopic.php?p=3335#3335
since we kind of hijacked pbug's thread and quickly turned it into Forth material that is definitely beyond those who are new to Forth.

I'd like to hear more. What ideas do you have for how the subroutine-threaded Forth would compile new words into its own dictionary? (I can more-or-less imagine it from my experience with ITC, but I'd probably waste a lot of time re-inventing wheels if I had to implement it myself at this point.) How much optimization would you have the program itself do automatically as it compiles?

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Sep 08, 2004 8:52 pm 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
I'd like to hear more. What ideas do you have for how the subroutine-threaded Forth would compile new words into its own dictionary? (I can more-or-less imagine it from my experience with ITC, but I'd probably waste a lot of time re-inventing wheels if I had to implement it myself at this point.) How much optimization would you have the program itself do automatically as it compiles?


I am not sure I understand the question. Compiling STC is just as easy as ITC or DTC. A subroutine call is compiled with $6C C, address , . Numeric constants are compiled similarly (JSR dolit, DW literal), or you can directly inline the constant (DEX, DEX, LDA #<value, STA 0,X, LDA #>value, STA 1,X), etc.
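As a sketch of what that produces (hypothetical word names FOO, BAR, and BAZ; a typical 6502 Forth with the data stack indexed by X), a definition like : FOO BAR BAZ ; would compile to nothing more than:

Code:
FOO   JSR BAR
      JSR BAZ
      RTS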

A more interesting variation of STC is outright native code compilation, where the compiled code is kept in an ITC-like representation (almost exactly, in fact!), but a little more detailed. From this, interesting things like type inference, common subexpression elimination, etc. can be done -- the end result of this form of compilation is machine language that can rival what is produced by C or Oberon compilers, for example.

So the gamut is pretty wide. STC is itself just an enabler for the more sophisticated code compilation techniques.


PostPosted: Wed Sep 08, 2004 9:34 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Quote:
I am not sure I understand the question. Compiling STC is just as easy as ITC or DTC. A subroutine call is compiled with $6C C, address , . Numeric constants are compiled similarly (JSR dolit, DW literal), or you can directly inline the constant (DEX, DEX, LDA #<value, STA 0,X, LDA #>value, STA 1,X), etc.

I guess that's partly what I'm getting at. For example, do you use some kind of number value in the header to compare against a user variable to tell whether to compile the JSR or to straightline the code, based on the size/speed balance chosen at compile time? Of course some would always be straightlined, like INX INX instead of JSR DROP, but there would be a large middle ground where the choice is not necessarily automatic, like choosing between a 7-byte straightlining and a 3-byte JSR with its extra 12 clocks of JSR-RTS overhead. And then in the numeric constant example you gave, the straightlining is not the same as just copying the subroutine's code right there, since the straightlined dolit code no longer has to use the return stack to figure out where to get the data to put on the stack. You can just do the LDA # as you said.
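To illustrate the difference (same zero-page data-stack convention with X as the stack pointer; 1234 is just an example literal), the two forms of pushing a literal might look like:

Code:
; threaded form: the data follows the JSR, and dolit must pull the
; return address off the return stack to find it
      JSR DOLIT
      DW  1234
; straightlined form: no return-stack work at all
      DEX
      DEX
      LDA #<1234
      STA 0,X
      LDA #>1234
      STA 1,X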

BTW, 6C is JMP(abs). JSR is op code 20.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Sep 09, 2004 5:33 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
I guess that's partly what I'm getting at. For example, do you use some kind of number value in the header to compare against a user variable to tell whether to compile the JSR or to straightline the code


This is implementation specific. In my Forth cross-compiler for the x86 platform, I use no such thing. All non-immediate words are JSRed to. Immediate words are used to compile primitives inline, through direct injection of machine language code. Doing it this way keeps the complexity of the compiler very manageable.
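A sketch of that approach transplanted to the 6502 (hypothetical; FTS/Forth itself targets x86): DROP becomes an immediate word whose compile-time action is to lay the two INX opcodes ($E8) into the dictionary:

Code:
: DROP   ( compile INX INX inline )
  $E8 C,  $E8 C,  ; IMMEDIATE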

Quote:
Of course some would always be straightlined, like INX INX instead of JSR DROP, but there would be a large middle ground where the choice is not necessarily automatic, like choosing between a 7-byte straightlining and a 3-byte JSR with its extra 12 clocks


This is why native code compilers are highly, highly, highly platform dependent. I do not have a colon-definition or interpretable implementation of DROP, for example, in FTS/Forth. DROP is an *immediate* word, which compiles the appropriate x86 code directly inline.

I could easily make an interpretable version:

Code:
: DROP DROP ;


It's far easier to make a colon-definition out of an inline definition than the reverse. My advice: don't even try it. The generation of native code should always be from higher-level to lower-level, never the reverse. One way is to create a set of immediates for each Forth primitive, then create a set of colon-definitions from it. Another way is to just keep everything as an array of word references, and let the compiler produce native code just-in-time (see http://citeseer.ist.psu.edu/ertl92new.html for details on how such a system works). Note that the former is not nearly as flexible (but nonetheless can still do basic peephole optimization with some introduction of complexity), while the latter is enormously complex. However, you get what you pay for -- a RAFTS-like approach *will* produce the best code possible, always.


PostPosted: Fri Sep 10, 2004 7:38 am 

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
Note: some of the quoted text is from the old General Discussions thread.

kc5tja wrote:
The point is compactness and uniformity of representation.


You bring up a very good example. What I had in mind was processors that have very small amounts of RAM or can only execute ROM code (Harvard architecture). In fact, the fast BASIC interpreter I proposed is similar to a headerless token-threaded Forth.

GARTHWILSON wrote:
I know it's taboo in Forth, but there's nothing that says the code is supposed to be readable after being de-compiled, and we are after all talking about internal optimizations. If the user wants it readable, he'll have to look at the source code I provide that shows what it looks like before these optimizations are carried out.


I'd much rather have the compiled code be efficient than easy to decompile. I don't even bother with SEE. However, if a word is too complicated for SEE, you could always add view fields to the head. Typically, a view field indicates where (i.e. which block) the definition starts, but you could use the view field to tell SEE how to handle difficult situations instead. If you didn't want to add a view field to every definition, you could use a "view field present" bit, implemented like the "immediate" bit in a definition.

For example, in my Forth, the definition name is stored as a counted string, except that bit 7 of the count byte indicates whether or not the definition is immediate, which allows definition names of up to 127 characters. If bit 6 indicated whether or not there was a view field, that would still allow a 63-character name, which would be more than plenty.
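A sketch of the resulting count-byte layout (the bit-6 view flag being the proposal here, not something already implemented):

Code:
; bit 7      immediate flag
; bit 6      view-field-present flag (proposed)
; bits 5-0   name length, 0-63 characters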

GARTHWILSON wrote:
How much optimization would you have the program itself do automatically as it compiles?


None! Well...almost none. Currently, no optimizations are performed (e.g. EXIT always compiles an RTS). The main reason is practical: I didn't want to keep rewriting optimization handling in the compiler while I was still weighing implementation decisions (e.g. speed, space). I also like the idea of leaving the decision about where and when to optimize up to the programmer.

The reason I am writing my Forth is to have a practical 6502 ANS Forth for my own use. Optimization has never been a priority. Obviously, I've gotten sidetracked as I've learned about the uniqueness of STC. A couple of my goals are: (1) a "lean", no-frills Forth, and (2) anything written in standard words should not have to be modified if that is at all possible, even if the result is not optimally efficient.

Right now, a standard word name (e.g. UNTIL) is always unoptimized. For common optimizations, I have defined special words, with similar, but non-standard names. For example, IF currently compiles a BNE followed by a JMP (the Z flag is prepared first, of course), which allows the branch distance to be more than half a page. A different word is used for compiling a BEQ only, an optimization that can almost always be made.
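A sketch of the two compiled forms (labels hypothetical), assuming the Z flag has already been prepared:

Code:
; standard IF: branch distance can exceed the -128..+127 reach of BEQ
      BNE OVER1
      JMP FALSEPART
OVER1 ...
; optimized variant: a single relative branch when the target is in range
      BEQ FALSEPART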

I could use two "execution tokens" (to use an ANS Forth term) for words that could always be optimized, one for interpretation state and one for compilation state, but I have decided not to, because an ANS Forth seems (to me) to be much cleaner with just a single "execution token" for each word. (The standard is a little too ambiguous for my liking about how multiple execution tokens can be handled in a standard way.)

Once again, the colorForth concept is a more elegant solution for STC than ANS Forth. The concept can be adapted (i.e. altered) and extended by using additional colors. For example, a word would be executed if yellow, compiled without optimization (i.e. a JSR) if green, compiled and space optimized if brown, compiled and speed optimized (i.e. inline assembly inserted) if orange. Thus it is clear what a word does based on its name (e.g. DROP drops a cell from the data stack), and whether it is executed or compiled, with what, if any, type of optimization, based on its color. (I should also point out that, for example, you don't necessarily have to restrict brown words to those which can be specifically space optimized. If a JSR is the most space efficient way a word could be compiled, a brown word could just default to the green word behavior.) This allows you to easily optimize one section for speed and another for space, with high legibility. (In ANS Forth, you could maintain an "optimization state" variable and define words that switch between the space, speed, and no-optimization states, but with colors it is far more clear where and when optimizations occur.)

If detecting longer optimizable sequences such as DUP 0< IF isn't convenient, you could use the color gray to tell the compiler to start looking for an optimizable sequence (e.g. if DUP 0< IF were all gray or something). You could use the color pink if you encountered some other special scenario that isn't covered by any of the above, and so on.

Anyway, a 6502 colorForth is another possible project. However, I'd like it to be suitable for use in a monochrome environment. There are a couple of advantages to this approach: (1) a hobbyist machine might well use a monochrome display or be limited in the number of colors it displays, and (2) code that is not colored will be much simpler to print, photocopy, cut-and-paste, or post to a message board. I have some vague ideas about how to accomplish this, but they are definitely not ready for public consumption at this point. I would like to implement the ideas above (and there are some other things I might do differently than colorForth as well) at some point, but this is way down the list of 6502 projects at the present time.

Getting back to the ANS Forth that I am actually writing, the optimizations that I really am considering handling automatically are: turning a JSR RTS sequence into a JMP, compiling BEQ rather than BNE JMP whenever possible with backward branches, and handling 0= IF and 0= UNTIL by turning a BEQ into a BNE and vice versa.
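The first of those is straightforward to sketch: wherever a definition ends with a call (hypothetical label LASTWORD), the JSR/RTS pair collapses to a JMP, saving a byte and the 12 clocks of JSR-RTS overhead:

Code:
      JSR LASTWORD    ; last call in a definition
      RTS
; becomes
      JMP LASTWORD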


PostPosted: Fri Sep 10, 2004 4:19 pm 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
How much optimization would you have the program itself do automatically as it compiles?


Again, this is compiler specific. In my current compiler, I do absolutely no optimization at all, except tail-call optimization, because I lack general-purpose control flow words (I simply don't need them with tail-call optimization).

In the future, I intend to optimize out common sequences, like "45 +", to compile a simple ADD EAX,45 instruction. These simple optimizers work on a very simple principle: if the compiler just laid down code to push a literal on the stack, then (a) the literal is already in memory, and (b) we know precisely what pattern of opcodes to look for and where to find them. So a word like + would check to see whether a literal was just laid down, and if so, it would "back up" the dictionary pointer, then re-compile the appropriate ADD reg,imm opcode.
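Translated to the 6502 (the example here is x86; this sketch assumes the usual zero-page data stack indexed by X), fusing 45 + would back the dictionary pointer up over the pushed literal and compile an add into the top-of-stack cell instead:

Code:
      CLC
      LDA 0,X       ; low byte of the top of stack
      ADC #45
      STA 0,X
      LDA 1,X       ; propagate the carry into the high byte
      ADC #0
      STA 1,X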

ColorForth uses this technique to very, very good effect. It still doesn't produce the greatest code, but it works fantastically, and still produces obnoxiously compact code. It's also very fast, because it completely eliminates the stack accesses that would normally occur otherwise.

Since the x86 does support indexed addressing modes, I also intend to detect + @ and + ! and have them compile to a simple MOV instruction that takes advantage of the CPU's indexed addressing modes, with or without a literal in front of the +.

Quote:
I could use two "execution tokens" (to use an ANS Forth term) for words that could always be optimized, one for interpretation state and one for compilation state, but I have decided not to, because an ANS Forth seems (to me) to be much cleaner with just a single "execution token" for each word. (The standard is a little too ambiguous for my liking about how multiple execution tokens can be handled in a standard way.)


There are some ambiguities with ANSI Forth using this method too. I do not recommend using it unless you're willing to break some amount of ANSI compatibility.

ANSI Forth really is intended to model the Forth-83 standard, and to run in an F83-like runtime environment.

Quote:
Once again, the colorForth concept is a more elegant solution for STC than ANS Forth. The concept can be adapted (i.e. altered) and extended by using additional colors.


ColorForth can also use fonts too. I cannot tell you how many times people have complained about ColorForth because it relies on something that, due to color-blindness, alienates a significant portion of the human population. Chuck Moore used colors because it was trivial to implement, and, well, he's not color-blind. For those who want to explore the concepts of ColorForth but want to use something other than colors, fonts and typestyles can be just as easily used to represent the different kinds of words.

Quote:
For example, a word would be executed if yellow, compiled without optimization (i.e. a JSR) if green, compiled and space optimized if brown, compiled and speed optimized (i.e. inline assembly inserted) if orange.


It is not clear from your text, but I'd like to point out that this is not how ColorForth (Chuck Moore's version) works. I'm assuming this is back-of-the-envelope design work for your own interpretation of how a color-Forth would be used.


PostPosted: Fri Sep 10, 2004 10:35 pm 

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
kc5tja wrote:
There are some ambiguities with ANSI Forth using this method too. I do not recommend using it unless you're willing to break some amount of ANSI compatibility.


I agree. Comparing the Forth-83 standard to the ANS standard, there was significant improvement in terms of not relying on implementation-specific things like threading, two's complement math, etc. It would take a similar sort of improvement so that it doesn't rely on the single execution token (which can be immediate or not immediate) implementation, but I don't see why this sort of improvement couldn't be made.

kc5tja wrote:
ColorForth can also use fonts too.


Or both. I personally would use colors rather than fonts since I can remember yellow = execute, green = compile much more easily than this font = execute, that font = compile. The choice of fonts vs. colors affects only the editor, not the underlying representation. At best, the editor can use a deferred word named something like COLOR-TYPE for displaying words in "color", and the user can easily switch between fonts and colors at any time. At worst, you'd have two editors: one for colors, one for fonts -- and even that isn't all that bad.

kc5tja wrote:
It is not clear from your text, but I'd like to point how this is not how ColorForth (Chuck Moore's version) works. I'm assuming this is back-of-the-envelop designwork for your own interpretation of how a color-Forth would be used..


I was concerned that it might not be clear that I was suggesting something different from how colorForth actually behaves, so I am glad you pointed this out. (I tried to choose colors that colorForth wasn't using.) What I was suggesting was a Forth that used concepts from colorForth, which would be different from colorForth, but would bear a resemblance to it. It is completely hypothetical at this point. I haven't yet come up with a clever name for this hypothetical Forth, and in the notes I have jotted down, I refer to it as a colorForth variant. I'll definitely have to come up with a different name at some point so as to avoid confusion with the extant colorForth implementation.

colorForth was tailored to a specific machine, and it is neither a standard nor a model (as FIG-Forth is), so I don't see a reason not to do things differently on a different processor like the 6502. On the x86 family, a lot of common words can be compiled as a single (or two) inline instruction, as you have alluded to above. This isn't the case on the 6502.

Anyway, what I was referring to above was really just one of the concepts of colorForth -- colored words. In a traditional indirect or direct threaded Forth, for the most part, words are compiled between : and ; (or ;CODE) except between [ and ], and executed otherwise, so it is very easy to keep track of whether a word will be executed or compiled. In fact, easier than that, since [ and ] are not used within most colon definitions.

With subroutine threading, inline assembly is not only available without penalty, but frequently useful. Now you're switching between execution and compilation more often, and already it's a bit more complicated to keep track of whether a word is executed or compiled. Enter color as an elegant solution. Now it's clear whether a word is executed or compiled, and you don't even have to look at its surrounding context anymore.

If you wish to write a Forth that emphasizes compilation options, you could do so with a traditional Forth, and keep an optimization state variable as I suggested. But now you have a lot to keep track of when you're trying to follow the code, plus the code will be more cluttered with optimization directives that don't have anything to do with what the word being defined actually does. The concept of colored words lends itself very nicely to being extended for this situation. The function of the code and the optimization being performed will be visually distinct: you look at the word names to see the function of the code, and you look at the colors to see the optimization being performed.

In fact, I myself wouldn't add all those extra colors I've suggested, certainly not at first. But subroutine threading has many optimization options available, and using additional colors for this or for other things opens up a variety of possibilities.


PostPosted: Thu Jun 16, 2022 1:45 am 
Joined: Fri May 05, 2017 9:27 pm
Posts: 851
dclxvi wrote:
Code:
AND   LDA #$35
      BNE MATH
OR    LDA #$15
      BNE MATH
XOR   LDA #$55
      BNE MATH
MINUS JSR NEGATE
PLUS  CLC
      LDA #$75
MATH  STA MATH2
      JSR MATH1
MATH1 LDA 2,X
MATH2 ADC 0,X
      STA 2,X
      INX
      RTS



That's clever the way the INX instruction before the RTS allows the same section of code to handle the math for both the low and high bytes while dropping the second stack parameter; however, the following doesn't seem to work.

dclxvi wrote:
Code:
ONEMINUS LDA 0,X
         BNE ONEM1
         DEC 1,X
ONEM1    DEC 0,X
         RTS
ABS      LDA 1,X
         BNE ABS1
NEGATE   JSR ONEMINUS
INVERT   JSR INV1
INV1     LDA #$FF
         EOR 0,X
         STA 0,X
         INX
ABS1     RTS



It looks like the result is being dropped.
Shouldn't it be this?
Code:
ONEMINUS LDA 0,X
         BNE ONEM1
         DEC 1,X
ONEM1    DEC 0,X
         RTS
ABS      LDA 1,X
         BNE ABS1
NEGATE   JSR ONEMINUS
INVERT   DEX
         DEX
         JSR INV1
INV1     LDA #$FF
         EOR 2,X
         STA 2,X
         INX
ABS1     RTS


