How to make a BASIC compiler?
I want to make a BASIC compiler: write the source and build it on a PC.
but how to make it ?!?!
-
Nightmaretony
- In Memoriam
- Posts: 618
- Joined: 27 Jun 2003
- Location: Meadowbrook
- Contact:
I have a BASIC compiler partially done lying around, and have been trying to finish it lately. I used flex as the lexical analyzer and Bison as the parser, constructing a variant of ANSI standard BASIC which does not use line numbers, and has some support for objects (really structures with associated functions and subroutines). I would suggest picking up a book on compiler construction, and be prepared for a lot of study, as there is a lot to know about the topic.
Rex.
Hallo Rex,
rex wrote:
I have a basic compiler partially done lying around,... I used flex as the lexical analyzer, and Bison as the parser, ....
I'm working on a Pascal compiler, but one written in Pascal. The idea is that, once compiled on the PC, the user can improve and compile it on the native machine.
But I must confess there is a little snag in it: the executable so far is over 200 KB, and I don't think it will ever be small enough to run on a C64, 8032, Apple ][ or any other 6502 machine. The Apple2GS or a C64/128 with SuperCPU (= 65816) could cope with it.
___
/ __|__
/ / |_/ Groetjes, Ruud
\ \__|_\
\___| URL: www.baltissen.org
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
> Then there is the rub; compiled languages produce bloated code. If you
> want the tightest smallest code possible, write your compiler in
> assembly, thanks.
Oooooh, now I have to resist the temptation to write several pages about why I quit collecting languages when I found Forth, which will become more compact than assembly as the program gets pretty big.
For small stuff assembly will win every time. For example, if the assembly program takes 500 bytes, you might as well stay with it, because you're not going to get much of a Forth kernel with less than perhaps 5KB. But an assembly program of 40KB might be replaced with a Forth program half that size or better, including the Forth kernel, and will take a lot less time to write and debug. Forth Inc. claims a minimum compactness ratio of 4 to 1 over C, and that program development is typically close to 10 times as fast. When you get to real-time OS's, Forth will take a fraction as much memory and run a lot faster than other popular ones. Unlike the case with some really small processors, Forth really works very well with the 6502 also, including multitasking, not to mention the tools to do object-oriented programming it had from the beginning, even before it was called that.
Nightmaretony wrote:
Then there is the rub; compiled languages produce bloated code. If you want the tightest smallest code possible, write your compiler in assembly, thanks.
I've been looking at adding a 6502 backend to the compiler I've built, and lots of the common tricks folks use in assembly can be incorporated. However, handling things like argument passing and locally scoped variables in a general manner on the 6502 is proving a challenge. If anyone has any ideas on this topic I am all ears.
I'm currently considering the following. The first two bytes of call arguments are placed in A and X (or Y; I haven't yet decided which indirect mode I'll be using more of). For calls requiring more than two bytes, a data stack will be used, pointed to by a zero-page word value. Call arguments and local variables will be maintained on this stack in a manner typical for HLLs. Call return addresses, and occasional temporary values, will still be placed on the hardware stack. Any comments? Also under consideration is an option to generate code that exclusively uses the hardware stack.
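To make the idea concrete, here is a toy Python model of that convention (the class and names are made up for illustration; this is a sketch, not generated code): the first two argument bytes ride in A and X, and anything beyond that goes on a software data stack addressed through a zero-page pointer.

```python
# Toy model (hypothetical names) of the proposed calling convention:
# first two argument bytes in A and X, the rest on a software data
# stack whose pointer lives in a zero-page word.

class Machine:
    def __init__(self):
        self.a = 0          # accumulator: first argument byte
        self.x = 0          # X register: second argument byte
        self.zp_sp = 0x80   # zero-page word acting as data-stack pointer
        self.ram = [0] * 256

    def call(self, args):
        """Set up a call with the given list of argument bytes."""
        self.a = args[0] if len(args) > 0 else 0
        self.x = args[1] if len(args) > 1 else 0
        for b in args[2:]:          # remaining bytes go on the data stack
            self.zp_sp -= 1
            self.ram[self.zp_sp] = b

m = Machine()
m.call([1, 2, 3, 4])   # A=1, X=2, bytes 3 and 4 pushed to the data stack
```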
(I see I posted concurrent with Garth. I have to agree with him on Forth, though it's a strange paradigm to get used to).
> I'm currently considering the following. The first two bytes of call
> arguments are placed in A and X (or Y; I haven't yet decided which
> indirect mode I'll be using more of). For calls requiring more than two
> bytes, a data stack will be used, which will be pointed to by a zero page
> word value.
Based on experience with Forth, I'd say use X for the index, so you can use ZP,X and (ZP,X) addressing modes. The occasional use of the abs,Y and (ZP),Y addressing modes is usually not related to the data stack in ZP.
> (I see I posted concurrent with Garth. I have to agree with him on
> Forth, though its a strange paradigm to get used to).
It definitely has a way of turning your thinking inside out. I suppose it had to be that way to shake off the limitations of other languages while remaining simple.
Hallo Rex,
> .... For calls requiring more than two bytes, a data stack will be used,
> which will be pointed to by a zero page word value. Call arguments
> and local variables will be maintained on this stack in a manner typical
> for HLLs.
I think this describes the way my compiler treats data. All data, even if it is only one or two bytes.
Nightmaretony:
> Then there is the rub; compiled languages produce bloated code. If
> you want the tightest smallest code possible, write your compiler in
> assembly, thanks.
FYI: My compiler produces macros as output. My assembler translates these macros to ML. Ergo: the more efficient the macro, the more efficient the result.
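The macros-to-ML pipeline can be sketched in a few lines of Python (an illustration only, not Ruud's actual macro set; the macro names are made up, though the opcode bytes are real 6502 encodings):

```python
# Sketch of a compile-to-macros pipeline: the compiler emits macro
# names, and the assembler expands each macro into machine-language
# bytes.  The more efficient the expansion, the more efficient the
# resulting program.

MACROS = {
    # hypothetical macro names -> 6502 opcode bytes
    "PUSH_A":  [0x48],   # PHA
    "POP_A":   [0x68],   # PLA
    "RETURN":  [0x60],   # RTS
}

def assemble(macro_list):
    """Expand a list of macro names into a flat list of ML bytes."""
    out = []
    for name in macro_list:
        out.extend(MACROS[name])
    return out

code = assemble(["PUSH_A", "POP_A", "RETURN"])
```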
Garth:
> But an assembly program of 40KB might be replaced with a Forth
> program half that size or better, including the Forth kernel,....
This 'kernel' is in fact an interpreter I think. Then what about speed?
If you need speed: you need ML, real or through a compiler.
If size matters: you need real ML or Garth's idea.
If you have less time to develop something: you need an HLL.
I think it is the choice between size, speed and complexity that decides what you are going to do.
> Garth:
>> But an assembly program of 40KB might be replaced with a Forth
>> program half that size or better, including the Forth kernel,....
>
>This 'kernel' is in fact an interpreter I think. Then what about speed?
There are at least three ways to implement Forth. The way that's most memory-efficient results in word definitions being lists of actual addresses, not tokens like interpreted BASIC gives. This is called "indirect threaded." The subroutine threading that Samuel Falvo and one or two others here are interested in compiles JSR's, straightlining code if the subroutine is short enough that a JSR or its delay is not warranted. This method is much faster, at the expense of program size.
Since a mere list of addresses is not executable by itself, the inner loop in indirect-threaded Forth does add some overhead, which is why indirect-threaded is not as fast. But what we call the interpreter is a part that takes text from an input stream, whether keyboard, source code text file, or something similar, looks up the words, and executes them without compiling. If you have a system where the user is not supposed to have direct access to this kind of thing, you could compile a Forth system with no interpreter. That makes it more memory-efficient, because the words' headers can be left out. The headers are just there so the compiled words can be found in subsequent interpretation or compilation, which the user won't be doing.
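The shape of that address-list scheme can be sketched in a few lines of Python (a toy model, not any real Forth's internals: "machine code" here is a Python function, and a colon definition is a list of other words):

```python
# Sketch of an indirect-threaded word and its inner loop.  A word is
# either a primitive (a Python function standing in for machine code)
# or a list of other words; the loop walks the list and runs each one.

stack = []

def lit5():  stack.append(5)    # primitive: push 5
def lit7():  stack.append(7)    # primitive: push 7
def plus():  stack.append(stack.pop() + stack.pop())   # primitive: +

def execute(word):
    if callable(word):          # code field points at "machine code"
        word()
    else:                       # a list of addresses: run each in turn
        for w in word:
            execute(w)

twelve = [lit5, lit7, plus]     # like : TWELVE 5 7 + ;
execute(twelve)                 # leaves 12 on the stack
```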
Forth will of course always be slower than a pure ML program, but competes well with other high-level languages. On something like the 6502, it will generally (but not always) be slightly slower than C; but there are processors that run Forth as their ML, and in that case, they're blazingly fast, doing more Forth MIPS than MHz. If using a high-level language results in a program that's too big for the 6502's address space but the Forth equivalent fits with plenty of room to spare, the choice is not too difficult. In commercial products, switching to Forth commonly reduces the hardware requirements, lowering the cost.
You seldom need everything to be just as fast as possible though. You typically will only have various small parts that require maximum speed. Forth makes it easy to write those small parts in assembly. In fact, you can first write them in high-level Forth to test the idea, then once you get it going, change the definition to assembly language without having to change any of the code that calls the particular word you just re-wrote in assembly for maximum execution speed.
As a bonus, you can service interrupts in high-level Forth with zero overhead. Mike is working on getting my article on that subject posted.
GARTHWILSON wrote:
Based on experience with Forth, I'd say use X for the index, so you can use ZP,X and (ZP,X) addressing modes. The occasional use of the abs,Y and (ZP),Y addressing modes is usually not related to the data stack in ZP.
Though, I really must wonder -- why in the world are you choosing Pascal? Why not use Oberon? The language is much simpler, and fantastically more modular, and 100x as type safe. Strongly recommended.
Quote:
It definitely has a way of turning your thinking inside out. I suppose it had to be that way to shake off the limitations of other languages while remaining simple.
Ruud wrote:
Garth:
> But an assembly program of 40KB might be replaced with a Forth
> program half that size or better, including the Forth kernel,....
This 'kernel' is in fact an interpreter I think. Then what about speed?
The 'kernel' of Forth is a compiler, and has been since the late 60s. The interpreter is only used to issue calls to top-level words (or low-level words if you're interactively debugging something). And I do mean "call", in the machine language sense of the term.
When Forth sees ":", it interprets and calls the body of ":". What most people don't realize is that ":" is the compiler for Forth. Therefore, everything between a : and a ; is compiled.
The representation of the compiled software can range from a vector of addresses (where each address represents another Forth word), all the way up to natively generated machine language. I often have to wonder why BASIC interpreters of old didn't represent their programs in this manner -- it sure would have made them run so very much faster.
If you are primarily interested in the compactness of code, you can compile it to a direct threaded representation. Given : a b c d ;, the Forth compiler will produce the following result in memory:
Code:
a: jmp do_colon
dw b, c, d, exit
Obviously, a word implemented in pure machine language need not start with JMP do_colon. Direct threading has the advantage of producing substantially compact programs (literally all operations, regardless of their level of abstraction, are represented in precisely two bytes on a 6502, and can be represented in as little as 3 bytes on a 65816).
The speed of a direct-threaded representation is obviously not nearly as fast as raw machine language. If you're really looking for break-neck speed, you can come very close by employing subroutine threading instead, which is almost exactly like direct threading, except that it doesn't require the use of an "inner interpreter" (a misnomer if there ever was one), and it allows for certain optimizations (e.g., converting the last JSR of a subroutine to JMP, and inlined assembly language sequences for commonly used things).
Code:
a:
jsr b
jsr c
jmp d
kc5tja wrote:
I'm getting pretty sick and tired of all the mis-information that exists about Forth.
kc5tja wrote:
It's also a fallacy that if a language is interpreted, it must necessarily be slow.
kc5tja wrote:
The representation of the compiled software can range from a vector of addresses (where each address represents another Forth word), all the way up to natively generated machine language. I often have to wonder why BASIC interpreters of old didn't represent their programs in this manner -- it sure would have made them run so very much faster.
I've been jotting down ideas for a fast interpreted BASIC that I haven't started writing yet. In terms of major 6502 projects, right now it's behind at least my write-up of 6502 decimal mode (including undocumented behavior), which I am currently working on (but isn't getting any closer to being completed now that I'm writing long-winded forum posts), and finishing my functional, but incomplete and undocumented, 6502 subroutine threaded Forth.
My BASIC was partly inspired by Apple (Integer) BASIC, which was written by Steve Wozniak. Wozniak used ideas from HP BASICs (he had worked at HP), and of course HP was well-known for their stack-oriented RPN calculators. "Integer" BASIC is really not all that far away from the address interpreter you suggest. It uses a sophisticated parser which does all of its syntax checking at parse time rather than execution time. It also converts constants at parse time to their two's complement representation rather than storing them as ASCII. Therefore, unlike many BASICs (e.g. EhBASIC and Applesoft) you never got a SYNTAX ERROR at run time, and you weren't converting constants (e.g. the statement PRINT 99) or line numbers (e.g. the statement IF A < B THEN 10) each and every time they were encountered.
There were other things that could have been handled at parse time, but weren't, such as checking for too many parentheses in an expression, and determining whether you needed the address of a variable so you could store a value in it (e.g. a LET or an INPUT statement) or if only the contents of the variable were needed (e.g. PRINT A). (Determining the latter at parse time would have sped up expression evaluation considerably.)
To use Wozniak's terms, the tokenized program contained "nouns" and "verbs". Nouns were constants (numbers), string literals, and variable names, and verbs were basically everything else. In Forth terms, an example of a compiled literal (in assembly) is DW LIT,10. LIT would be the verb and 10 would be the noun.
The BASIC program was stored in infix order. All verbs took some action (although in some cases the action was to do nothing). Different tokens could represent the same keyword (verb). For example, there were three tokens that would list as PRINT. One token was for PRINTing numbers, one for strings, and one for CRs (i.e. when PRINT had no arguments). In Forth, these three actions have three different names: . (dot) TYPE and CR, but in BASIC they all had the name PRINT. Which token to use was determined at parse time based on syntax and context, so the parser was quite clever. Interestingly, even delimiters such as commas and semicolons were verbs. So a statement like PRINT 12+34;56 would be stored as:
Code:
DB PRINT_number_token
DB literal_number_token
DW 12
DB plus_token
DB literal_number_token
DW 34
DB semicolon_print_number_token
DB literal_number_token
DW 56
Incidentally, there were other examples of multiple tokens for a single keyword. An equals sign (=) had one token for a numeric comparison (e.g. IF A=1 THEN PRINT), a different token for a string comparison (e.g. IF A$="A" THEN PRINT), an assignment statement (e.g. LET A=1), etc. There are over a dozen tokens that LIST as a comma, since the comma in a POKE 0,0 statement takes a different action than the comma in a NEXT A,B statement, etc.
You may have already guessed one idea I've got: use a sophisticated parser AND a sophisticated LIST. The program would be entered in infix order, stored in postfix order, and LISTed in infix order. Thus, the parser would convert infix to postfix, and LIST would convert postfix back to infix. By moving precedence checking from the (inner) interpreter to the parser and to LIST, the interpreter will be very fast (and Forth-like).
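The infix-to-postfix half of that parser is the classic shunting-yard algorithm. Here is a toy Python sketch of it (numbers and two operators only; the real parser would of course know the whole BASIC grammar), with a tiny postfix evaluator standing in for the fast inner interpreter:

```python
# Toy shunting-yard sketch of the parser's infix -> postfix conversion,
# plus a minimal postfix evaluator (the "fast, Forth-like" inner
# interpreter: no precedence checking left to do at run time).

PREC = {"+": 1, "*": 2}

def to_postfix(tokens):
    out, ops = [], []
    for t in tokens:
        if t.isdigit():
            out.append(t)
        else:                     # operator: pop higher/equal precedence
            while ops and PREC[ops[-1]] >= PREC[t]:
                out.append(ops.pop())
            ops.append(t)
    while ops:
        out.append(ops.pop())
    return out

def run_postfix(tokens):
    stack = []
    for t in tokens:
        if t.isdigit():
            stack.append(int(t))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if t == "+" else a * b)
    return stack.pop()

pf = to_postfix(["12", "+", "34", "*", "2"])
# pf -> ["12", "34", "2", "*", "+"]; run_postfix(pf) -> 80
```

A LIST command would do the reverse walk, turning the stored postfix back into the infix the user typed.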
In Forth, to insert code, you have to recompile everything after the insertion point, since the addresses of any subsequent words will change. In BASIC, I'd like to retain the ability to insert a line painlessly, so I am planning to start with tokens rather than addresses. It's possible to write a token interpreter that compares favorably to an addresses interpreter, in terms of speed.
I've gathered enough other ideas (e.g. I like the Acorn assembler idea, but I'm planning to try a one-pass assembler instead) that intrigue me that I plan to write this BASIC at some point. Obviously, the plans I've mentioned are subject to change once I start writing and discover what the tradeoffs are.
kc5tja wrote:
The disadvantage of this method is that it consumes slightly more space. But, this is a classic size/speed trade-off issue, and the ratio isn't that bad in practice.
The code definition example: CODE z NOP, NEXT, END-CODE
Indirect threading (6 bytes):
Code:
z: DW z+2
NOP
JMP NEXT
Direct threading:
Code:
z: NOP
JMP NEXT
Subroutine threading:
Code:
z: NOP
RTS
The colon definition example: : z a b c d ;
Indirect threading (12 bytes):
Code:
z: DW DO_COLON
DW a
DW b
DW c
DW d
DW EXIT
Direct threading:
Code:
z: JSR DO_COLON
DW a
DW b
DW c
DW d
DW EXIT
Subroutine threading:
Code:
z: JSR a
JSR b
JSR c
JMP d
Indirect threading: 14 bytes
Direct threading: 15 bytes
Subroutine threading: 15 bytes
For anything longer, yes, subroutine threading will take the most space, but good Forth programming practice dictates that definitions should be short, so the space penalty of subroutine threading is smaller than you might think.
You can also save space (and increase speed) with subroutine threading in some cases by inlining assembly, rather than writing a single-use code definition. In subroutine threading, with no NEXT routine, the Y register is usually free (I've even used it as a loop counter, to get a terrific benefit in both speed and space).
When I decided to write a subroutine threaded Forth, I thought I was going to get clobbered in terms of space. But that hasn't been the case so far. Now, if I could just bring myself to use postfix assembly with subroutine threading, rather than the more-legible (at least to me) prefix assembly...
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Since this discussion subject has quickly evolved so much, we really ought to move it to the Forth area; but since I'm responding to two different things here, I don't know what to call the subject if I move it there myself now, so we're still here.
> When I decided to write a subroutine threaded Forth, I thought I was
> going to get clobbered in terms of space. But that hasn't been the case
> so far.
Interesting. Can you give some idea of how much space you're finding you need for a kernel, compared to what would be needed for the same thing in indirect-threaded?
> Now, if I could just bring myself to use postfix assembly with
> subroutine threading, rather than the more-legible (at least to me)
> prefix assembly...
You don't have to. The quick and dirty way to make a prefix Forth assembler which I used in my Forth was to just make mnemonics that had the address mode built in, like LDA_ABS, LDA(ZP,X), etc., and comma-in the operand. The mnemonics just compile the appropriate machine-language op code byte. They don't need to even be in a separate vocabulary because they don't conflict with anything else in Forth. For example, AND is always in Forth but if you want the assembly AND, it'll have an address mode associated with it, like AND_ZP or AND#. The branches were a minor challenge, but still not complicated because the assembler in Forth is not normally used for large applications, only writing primitives, runtimes, maybe ISRs, and things like that. I am working on making my branching method more user-friendly though. When I thought of this, I was able to initially write the assembler in an evening. It made me wonder why I didn't do it years earlier. It has been so handy.
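The mode-in-the-mnemonic trick translates almost directly into any host language. Here is a toy Python rendering of the idea (the function-per-mnemonic structure is an illustration, not Garth's actual Forth code, but the opcode bytes are the real 6502 encodings):

```python
# Sketch of a prefix assembler in the style described above: each
# mnemonic carries its addressing mode and simply appends the right
# 6502 opcode byte, followed by its operand bytes (little-endian for
# absolute addresses).

output = []

def emit(*bytes_):     output.extend(bytes_)
def LDA_IMM(operand):  emit(0xA9, operand)                   # LDA #nn
def LDA_ABS(addr):     emit(0xAD, addr & 0xFF, addr >> 8)    # LDA nnnn
def AND_IMM(operand):  emit(0x29, operand)                   # AND #nn
def RTS():             emit(0x60)                            # RTS

LDA_ABS(0x1234)   # assembles to AD 34 12
AND_IMM(0x0F)     # assembles to 29 0F
RTS()             # assembles to 60
```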
dclxvi wrote:
Getting laughed at for using Forth comes with the territory, unfortunately.
Quote:
True, but most interpreted languages (e.g. BASIC) are/were optimized for space, not for speed.
J, for example, doesn't even tokenize its source code, and K stores subroutines as strings.
Quote:
well, it's not interpreted either, so this isn't a perfect example), and common misconceptions like this arise.
Quote:
Early BASIC interpreters were written with two dominant restrictions: (a) a small amount of ROM to work with (maybe 8k, maybe only 4k), and (b) a small amount of RAM (sometimes as little as 1k, though 4k was more common). With minimal space as the driving force, the result was simple parsers that don't optimize (much) and tokenized programs. Native code did eventually happen -- with BASIC compilers.
If BASIC designers had concentrated instead on truly optimizing their storage methods, they (a) could have drastically reduced the average storage requirements for a BASIC program, and (b) could have made it run substantially faster.
Quote:
rather than storing them as ASCII. Therefore, unlike many BASICs (e.g. EhBASIC and Applesoft) you never got a SYNTAX ERROR at run time, and you weren't converting constants (e.g. the statement PRINT 99) or line numbers (e.g. the statement IF A < B THEN 10) each and every time they were encountered.
Quote:
You may have already guessed one idea I've got: use a sophisticated parser AND a sophisticated LIST. The program would be entered in infix order, stored in postfix order, and LISTed in infix order. Thus, the parser would convert infix to postfix, and LIST would convert postfix back to infix. By moving precedence checking from the (inner) interpreter to the parser and to LIST, the interpreter will be very fast (and Forth-like).
Quote:
In Forth, to insert code, you have to recompile everything after the insertion point, since the addresses of any subsequent words will change. In BASIC, I'd like to retain the ability to insert a line painlessly, so I am planning to start with tokens rather than addresses. It's possible to write a token interpreter that compares favorably to an addresses interpreter, in terms of speed.
However, with that said, in my experience, recompiling software in Forth is so fast that the application I just beckoned has already been compiled before my finger leaves the ENTER key (assuming your Forth compiler's dictionary lookups are sufficiently efficient). Therefore, the issue of addresses, literally, is a complete non-issue for me.
Quote:
Another Forth misconception! In terms of space, it is not necessarily true that subroutine threading > direct threading > indirect threading.
Quote:
For anything longer, yes, subroutine threading will take the most space, but good Forth programming practice dictates that definitions should be short, so the space penalty of subroutine threading is smaller than you might think.
Quote:
You can also save space (and increase speed) with subroutine threading in some cases by inlining assembly, rather than writing a single-use code definition.