How to make a BASIC compiler?
I want to make a BASIC compiler: write the source and build it on a PC.
but how to make it ?!?!
-
Nightmaretony
- In Memoriam
- Posts: 618
- Joined: 27 Jun 2003
- Location: Meadowbrook
- Contact:
I have a BASIC compiler partially done lying around, and have been trying to finish it lately. I used flex as the lexical analyzer and Bison as the parser, constructing a variant of ANSI standard BASIC which does not use line numbers, and has some support for objects (really structures with associated functions and subroutines). I would suggest picking up a book on compiler construction, and be prepared for a lot of study, as there is a lot to know about the topic.
Rex.
Hallo Rex,
rex wrote:
I have a basic compiler partially done lying around,... I used flex as the lexical analyzer, and Bison as the parser, ....
I'm working on a Pascal compiler, but one written in Pascal. The idea is that, once compiled on the PC, the user can improve and compile it on the native machine.
But I must confess there is a little snag in it: the executable so far is over 200 KB, and I don't think it will ever be small enough to run on a C64, 8032, Apple ][ or any other 6502 machine. The Apple2GS or a C64/128 with SuperCPU (= 65816) could cope with it.
___
/ __|__
/ / |_/ Groetjes, Ruud
\ \__|_\
\___| URL: www.baltissen.org
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
> Then there is the rub; compiled languages produce bloated code. If you
> want the tightest smallest code possible, write your compiler in
> assembly, thanks.
Oooooh, now I have to resist the temptation to write several pages about why I quit collecting languages when I found Forth, which will become more compact than assembly as the program gets pretty big.
For small stuff assembly will win every time. For example, if the assembly program takes 500 bytes, you might as well stay with it, because you're not going to get much of a Forth kernel with less than perhaps 5KB. But an assembly program of 40KB might be replaced with a Forth program half that size or better, including the Forth kernel, and will take a lot less time to write and debug. Forth Inc. claims a minimum compactness ratio of 4 to 1 over C, and that program development is typically close to 10 times as fast. When you get to real-time OS's, Forth will take a fraction as much memory and run a lot faster than other popular ones. Unlike the case with some really small processors, Forth really works very well with the 6502 also, including multitasking, not to mention the tools to do object-oriented programming it had from the beginning, even before it was called that.
Nightmaretony wrote:
Then there is the rub; compiled languages produce bloated code. If you want the tightest smallest code possible, write your compiler in assembly, thanks.
I've been looking at adding a 6502 backend to the compiler I've built, and lots of the common tricks folks use in assembly can be incorporated. However, handling things like argument passing and locally scoped variables in a general manner on the 6502 is proving a challenge. If anyone has any ideas on this topic I am all ears.
I'm currently considering the following. The first two bytes of call arguments are placed in A and X (or Y; I haven't yet decided which indirect mode I'll be using more of). For calls requiring more than two bytes, a data stack will be used, pointed to by a zero-page word value. Call arguments and local variables will be maintained on this stack in a manner typical for HLLs. Call return addresses, and occasional temporary values, will still be placed on the hardware stack. Any comments? Also under consideration is an option to generate code that exclusively uses the hardware stack.
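To make the idea concrete, here is a toy Python model of that convention (the class and names are made up for illustration; this is a sketch, not generated code): the first two argument bytes ride in A and X, and anything beyond that goes on a software data stack addressed through a zero-page pointer.

```python
# Toy model (hypothetical names) of the proposed calling convention:
# first two argument bytes in A and X, the rest on a software data
# stack whose pointer lives in a zero-page word.

class Machine:
    def __init__(self):
        self.a = 0          # accumulator: first argument byte
        self.x = 0          # X register: second argument byte
        self.zp_sp = 0x80   # zero-page word acting as data-stack pointer
        self.ram = [0] * 256

    def call(self, args):
        """Set up a call with the given list of argument bytes."""
        self.a = args[0] if len(args) > 0 else 0
        self.x = args[1] if len(args) > 1 else 0
        for b in args[2:]:          # remaining bytes go on the data stack
            self.zp_sp -= 1
            self.ram[self.zp_sp] = b

m = Machine()
m.call([1, 2, 3, 4])   # A=1, X=2, bytes 3 and 4 pushed to the data stack
```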
(I see I posted concurrent with Garth. I have to agree with him on Forth, though it's a strange paradigm to get used to).
> I'm currently considering the following. The first two bytes of call
> arguments are placed in A and X (or Y; I haven't yet decided which
> indirect mode I'll be using more of). For calls requiring more than two
> bytes, a data stack will be used, which will be pointed to by a zero page
> word value.
Based on experience with Forth, I'd say use X for the index, so you can use ZP,X and (ZP,X) addressing modes. The occasional use of the abs,Y and (ZP),Y addressing modes is usually not related to the data stack in ZP.
> (I see I posted concurrent with Garth. I have to agree with him on
> Forth, though its a strange paradigm to get used to).
It definitely has a way of turning your thinking inside out. I suppose it had to be that way to shake off the limitations of other languages while remaining simple.
Hallo Rex,
> .... For calls requiring more than two bytes, a data stack will be used,
> which will be pointed to by a zero page word value. Call arguments
> and local variables will be maintained on this stack in a manner typical
> for HLLs.
I think this describes the way my compiler treats data. All data, even if it is only one or two bytes.
Nightmaretony:
> Then there is the rub; compiled languages produce bloated code. If
> you want the tightest smallest code possible, write your compiler in
> assembly, thanks.
FYI: My compiler produces macros as output. My assembler translates these macros to ML. Ergo: the more efficient the macro, the more efficient the result.
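The macros-to-ML pipeline can be sketched in a few lines of Python (an illustration only, not Ruud's actual macro set; the macro names are made up, though the opcode bytes are real 6502 encodings):

```python
# Sketch of a compile-to-macros pipeline: the compiler emits macro
# names, and the assembler expands each macro into machine-language
# bytes.  The more efficient the expansion, the more efficient the
# resulting program.

MACROS = {
    # hypothetical macro names -> 6502 opcode bytes
    "PUSH_A":  [0x48],   # PHA
    "POP_A":   [0x68],   # PLA
    "RETURN":  [0x60],   # RTS
}

def assemble(macro_list):
    """Expand a list of macro names into a flat list of ML bytes."""
    out = []
    for name in macro_list:
        out.extend(MACROS[name])
    return out

code = assemble(["PUSH_A", "POP_A", "RETURN"])
```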
Garth:
> But an assembly program of 40KB might be replaced with a Forth
> program half that size or better, including the Forth kernel,....
This 'kernel' is in fact an interpreter I think. Then what about speed?
If you need speed: you need ML, real or through a compiler.
If size matters: you need real ML or Garth's idea.
If you have less time to develop something: you need an HLL.
I think it is the choice between size, speed and complexity that decides what you are going to do.
> Garth:
>> But an assembly program of 40KB might be replaced with a Forth
>> program half that size or better, including the Forth kernel,....
>
>This 'kernel' is in fact an interpreter I think. Then what about speed?
There are at least three ways to implement Forth. The way that's most memory-efficient results in word definitions being lists of actual addresses, not tokens like interpreted BASIC gives. This is called "indirect threaded." The subroutine threading that Samuel Falvo and one or two others here are interested in compiles JSR's, straightlining code if the subroutine is short enough that a JSR or its delay is not warranted. This method is much faster, at the expense of program size.
Since a mere list of addresses is not executable by itself, the inner loop in indirect-threaded Forth does add some overhead, which is why indirect-threaded is not as fast. But what we call the interpreter is a part that takes text from an input stream, whether keyboard, source code text file, or something similar, looks up the words, and executes them without compiling. If you have a system where the user is not supposed to have direct access to this kind of thing, you could compile a Forth system with no interpreter. That makes it more memory-efficient, because the words' headers can be left out. The headers are just there so the compiled words can be found in subsequent interpretation or compilation, which the user won't be doing.
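The shape of that address-list scheme can be sketched in a few lines of Python (a toy model, not any real Forth's internals: "machine code" here is a Python function, and a colon definition is a list of other words):

```python
# Sketch of an indirect-threaded word and its inner loop.  A word is
# either a primitive (a Python function standing in for machine code)
# or a list of other words; the loop walks the list and runs each one.

stack = []

def lit5():  stack.append(5)    # primitive: push 5
def lit7():  stack.append(7)    # primitive: push 7
def plus():  stack.append(stack.pop() + stack.pop())   # primitive: +

def execute(word):
    if callable(word):          # code field points at "machine code"
        word()
    else:                       # a list of addresses: run each in turn
        for w in word:
            execute(w)

twelve = [lit5, lit7, plus]     # like : TWELVE 5 7 + ;
execute(twelve)                 # leaves 12 on the stack
```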
Forth will of course always be slower than a pure ML program, but competes well with other high-level languages. On something like the 6502, it will generally (but not always) be slightly slower than C; but there are processors that run Forth as their ML, and in that case, they're blazingly fast, doing more Forth MIPS than MHz. If using a high-level language results in a program that's too big for the 6502's address space but the Forth equivalent fits with plenty of room to spare, the choice is not too difficult. In commercial products, switching to Forth commonly reduces the hardware requirements, lowering the cost.
You seldom need everything to be just as fast as possible though. You typically will only have various small parts that require maximum speed. Forth makes it easy to write those small parts in assembly. In fact, you can first write them in high-level Forth to test the idea, then once you get it going, change the definition to assembly language without having to change any of the code that calls the particular word you just re-wrote in assembly for maximum execution speed.
As a bonus, you can service interrupts in high-level Forth with zero overhead. Mike is working on getting my article on that subject posted.
GARTHWILSON wrote:
Based on experience with Forth, I'd say use X for the index, so you can use ZP,X and (ZP,X) addressing modes. The occasional use of the abs,Y and (ZP),Y addressing modes is usually not related to the data stack in ZP.
Though, I really must wonder -- why in the world are you choosing Pascal? Why not use Oberon? The language is much simpler, and fantastically more modular, and 100x as type safe. Strongly recommended.
Quote:
It definitely has a way of turning your thinking inside out. I suppose it had to be that way to shake off the limitations of other languages while remaining simple.
Ruud wrote:
Garth:
> But an assembly program of 40KB might be replaced with a Forth
> program half that size or better, including the Forth kernel,....
This 'kernel' is in fact an interpreter I think. Then what about speed?
The 'kernel' of Forth is a compiler, and has been since the late 60s. The interpreter is only used to issue calls to top-level words (or low-level words if you're interactively debugging something). And I do mean "call", in the machine language sense of the term.
When Forth sees ":", it interprets and calls the body of ":". What most people don't realize is that ":" is the compiler for Forth. Therefore, everything between a : and a ; is compiled.
The representation of the compiled software can range from a vector of addresses (where each address represents another Forth word), all the way up to natively generated machine language. I often have to wonder why BASIC interpreters of old didn't represent their programs in this manner -- it sure would have made them run so very much faster.
If you are primarily interested in the compactness of code, you can compile it to a direct threaded representation. Given : a b c d ;, the Forth compiler will produce the following result in memory:
Code:
a: jmp do_colon
dw b, c, d, exit
Obviously, a word implemented in pure machine language need not start with JMP do_colon. Direct threading has the advantage of producing substantially compact programs (literally all operations, regardless of their level of abstraction, are represented in precisely two bytes on a 6502, and can be represented in as little as 3 bytes on a 65816).
The speed of a direct-threaded representation is obviously not nearly as fast as raw machine language. If you're really looking for break-neck speed, you can come very close by employing subroutine threading instead, which is almost exactly like direct threading, except that it doesn't require the use of an "inner interpreter" (a misnomer if there ever was one), and it allows for certain optimizations (e.g., converting the last JSR of a subroutine to JMP, and inlined assembly language sequences for commonly used things).
Code:
a:
jsr b
jsr c
jmp d
kc5tja wrote:
I'm getting pretty sick and tired of all the mis-information that exists about Forth.
kc5tja wrote:
It's also a fallacy that if a language is interpreted, it must necessarily be slow.
kc5tja wrote:
The representation of the compiled software can range from a vector of addresses (where each address represents another Forth word), all the way up to natively generated machine language. I often have to wonder why BASIC interpreters of old didn't represent their programs in this manner -- it sure would have made them run so very much faster.
I've been jotting down ideas for a fast interpreted BASIC that I haven't started writing yet. In terms of major 6502 projects, right now it's behind at least my write-up of 6502 decimal mode (including undocumented behavior), which I am currently working on (but isn't getting any closer to being completed now that I'm writing long-winded forum posts), and finishing my functional, but incomplete and undocumented, 6502 subroutine threaded Forth.
My BASIC was partly inspired by Apple (Integer) BASIC, which was written by Steve Wozniak. Wozniak used ideas from HP BASICs (he had worked at HP), and of course HP was well-known for their stack-oriented RPN calculators. "Integer" BASIC is really not all that far away from the address interpreter you suggest. It uses a sophisticated parser which does all of its syntax checking at parse time rather than execution time. It also converts constants at parse time to their two's complement representation rather than storing them as ASCII. Therefore, unlike many BASICs (e.g. EhBASIC and Applesoft) you never got a SYNTAX ERROR at run time, and you weren't converting constants (e.g. the statement PRINT 99) or line numbers (e.g. the statement IF A < B THEN 10) each and every time they were encountered.
There were other things that could have been handled at parse time, but weren't, such as checking for too many parentheses in an expression, and determining whether you needed the address of a variable so you could store a value in it (e.g. a LET or an INPUT statement) or if only the contents of the variable were needed (e.g. PRINT A). (Determining the latter at parse time would have sped up expression evaluation considerably.)
To use Wozniak's terms, the tokenized program contained "nouns" and "verbs". Nouns were constants (numbers), string literals, and variable names, and verbs were basically everything else. In Forth terms, an example of a compiled literal (in assembly) is DW LIT,10. LIT would be the verb and 10 would be the noun.
The BASIC program was stored in infix order. All verbs took some action (although in some cases the action was to do nothing). Different tokens could represent the same keyword (verb). For example, there were three tokens that would list as PRINT. One token was for PRINTing numbers, one for strings, and one for CRs (i.e. when PRINT had no arguments). In Forth, these three actions have three different names: . (dot) TYPE and CR, but in BASIC they all had the name PRINT. Which token to use was determined at parse time based on syntax and context, so the parser was quite clever. Interestingly, even delimiters such as commas and semicolons were verbs. So a statement like PRINT 12+34;56 would be stored as:
Code:
DB PRINT_number_token
DB literal_number_token
DW 12
DB plus_token
DB literal_number_token
DW 34
DB semicolon_print_number_token
DB literal_number_token
DW 56
Incidentally, there were other examples of multiple tokens for a single keyword. An equals sign (=) had one token for a numeric comparison (e.g. IF A=1 THEN PRINT), a different token for a string comparison (e.g. IF A$="A" THEN PRINT), an assignment statement (e.g. LET A=1), etc. There are over a dozen tokens that LIST as a comma, since the comma in a POKE 0,0 statement takes a different action than the comma in a NEXT A,B statement, etc.
You may have already guessed one idea I've got: use a sophisticated parser AND a sophisticated LIST. The program would be entered in infix order, stored in postfix order, and LISTed in infix order. Thus, the parser would convert infix to postfix, and LIST would convert postfix back to infix. By moving precedence checking from the (inner) interpreter to the parser and to LIST, the interpreter will be very fast (and Forth-like).
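The infix-to-postfix half of that parser is the classic shunting-yard algorithm. Here is a toy Python sketch of it (numbers and two operators only; the real parser would of course know the whole BASIC grammar), with a tiny postfix evaluator standing in for the fast inner interpreter:

```python
# Toy shunting-yard sketch of the parser's infix -> postfix conversion,
# plus a minimal postfix evaluator (the "fast, Forth-like" inner
# interpreter: no precedence checking left to do at run time).

PREC = {"+": 1, "*": 2}

def to_postfix(tokens):
    out, ops = [], []
    for t in tokens:
        if t.isdigit():
            out.append(t)
        else:                     # operator: pop higher/equal precedence
            while ops and PREC[ops[-1]] >= PREC[t]:
                out.append(ops.pop())
            ops.append(t)
    while ops:
        out.append(ops.pop())
    return out

def run_postfix(tokens):
    stack = []
    for t in tokens:
        if t.isdigit():
            stack.append(int(t))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if t == "+" else a * b)
    return stack.pop()

pf = to_postfix(["12", "+", "34", "*", "2"])
# pf -> ["12", "34", "2", "*", "+"]; run_postfix(pf) -> 80
```

A LIST command would do the reverse walk, turning the stored postfix back into the infix the user typed.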
In Forth, to insert code, you have to recompile everything after the insertion point, since the addresses of any subsequent words will change. In BASIC, I'd like to retain the ability to insert a line painlessly, so I am planning to start with tokens rather than addresses. It's possible to write a token interpreter that compares favorably to an addresses interpreter, in terms of speed.
I've gathered enough other ideas (e.g. I like the Acorn assembler idea, but I'm planning to try a one-pass assembler instead) that intrigue me that I plan to write this BASIC at some point. Obviously, the plans I've mentioned are subject to change once I start writing and discover what the tradeoffs are.
kc5tja wrote:
The disadvantage of this method is that it consumes slightly more space. But, this is a classic size/speed trade-off issue, and the ratio isn't that bad in practice.
The code definition example: CODE z NOP, NEXT, END-CODE
Indirect threading (6 bytes):
Code:
z: DW z+2
NOP
JMP NEXT
Direct threading:
Code:
z: NOP
JMP NEXT
Subroutine threading:
Code:
z: NOP
RTS
The colon definition example: : z a b c d ;
Indirect threading (12 bytes):
Code:
z: DW DO_COLON
DW a
DW b
DW c
DW d
DW EXIT
Direct threading:
Code:
z: JSR DO_COLON
DW a
DW b
DW c
DW d
DW EXIT
Subroutine threading:
Code:
z: JSR a
JSR b
JSR c
JMP d
Indirect threading: 14 bytes
Direct threading: 15 bytes
Subroutine threading: 15 bytes
For anything longer, yes, subroutine threading will take the most space, but good Forth programming practice dictates that definitions should be short, so the space penalty of subroutine threading is smaller than you might think.
You can also save space (and increase speed) with subroutine threading in some cases by inlining assembly, rather than writing a single-use code definition. In subroutine threading, with no NEXT routine, the Y register is usually free (I've even used it as a loop counter, to get a terrific benefit in both speed and space).
When I decided to write a subroutine threaded Forth, I thought I was going to get clobbered in terms of space. But that hasn't been the case so far. Now, if I could just bring myself to use postfix assembly with subroutine threading, rather than the more-legible (at least to me) prefix assembly...
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Since this discussion subject has quickly evolved so much, we really ought to move it to the Forth area; but since I'm responding to two different things here, I don't know what to call the subject if I move it there myself now, so we're still here.
> When I decided to write a subroutine threaded Forth, I thought I was
> going to get clobbered in terms of space. But that hasn't been the case
> so far.
Interesting. Can you give some idea of how much space you're finding you need for a kernel, compared to what would be needed for the same thing in indirect-threaded?
> Now, if I could just bring myself to use postfix assembly with
> subroutine threading, rather than the more-legible (at least to me)
> prefix assembly...
You don't have to. The quick and dirty way to make a prefix Forth assembler which I used in my Forth was to just make mnemonics that had the address mode built in, like LDA_ABS, LDA(ZP,X), etc., and comma-in the operand. The mnemonics just compile the appropriate machine-language op code byte. They don't need to even be in a separate vocabulary because they don't conflict with anything else in Forth. For example, AND is always in Forth but if you want the assembly AND, it'll have an address mode associated with it, like AND_ZP or AND#. The branches were a minor challenge, but still not complicated because the assembler in Forth is not normally used for large applications, only writing primitives, runtimes, maybe ISRs, and things like that. I am working on making my branching method more user-friendly though. When I thought of this, I was able to initially write the assembler in an evening. It made me wonder why I didn't do it years earlier. It has been so handy.
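The mode-in-the-mnemonic trick translates almost directly into any host language. Here is a toy Python rendering of the idea (the function-per-mnemonic structure is an illustration, not Garth's actual Forth code, but the opcode bytes are the real 6502 encodings):

```python
# Sketch of a prefix assembler in the style described above: each
# mnemonic carries its addressing mode and simply appends the right
# 6502 opcode byte, followed by its operand bytes (little-endian for
# absolute addresses).

output = []

def emit(*bytes_):     output.extend(bytes_)
def LDA_IMM(operand):  emit(0xA9, operand)                   # LDA #nn
def LDA_ABS(addr):     emit(0xAD, addr & 0xFF, addr >> 8)    # LDA nnnn
def AND_IMM(operand):  emit(0x29, operand)                   # AND #nn
def RTS():             emit(0x60)                            # RTS

LDA_ABS(0x1234)   # assembles to AD 34 12
AND_IMM(0x0F)     # assembles to 29 0F
RTS()             # assembles to 60
```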
dclxvi wrote:
Getting laughed at for using Forth comes with the territory, unfortunately.
Quote:
True, but most interpreted languages (e.g. BASIC) are/were optimized for space, not for speed.
J, for example, doesn't even tokenize its source code, and K stores subroutines as strings.
Quote:
well, it's not interpreted either, so this isn't a perfect example), and common misconceptions like this arise.
Quote:
Early BASIC interpreters were written with two dominant restrictions: (a) a small amount of ROM to work with (maybe 8k, maybe only 4k), and (b) a small amount of RAM (sometimes as little as 1k, though 4k was more common). With minimal space as the driving force, the result was simple parsers that don't optimize (much) and tokenized programs. Native code did eventually happen -- with BASIC compilers.
If BASIC designers had concentrated instead on truly optimizing their storage methods, they (a) could have drastically reduced the average storage requirements for a BASIC program, and (b) could have made it run substantially faster.
Quote:
rather than storing them as ASCII. Therefore, unlike many BASICs (e.g. EhBASIC and Applesoft) you never got a SYNTAX ERROR at run time, and you weren't converting constants (e.g. the statement PRINT 99) or line numbers (e.g. the statement IF A < B THEN 10) each and every time they were encountered.
Quote:
You may have already guessed one idea I've got: use a sophisticated parser AND a sophisticated LIST. The program would be entered in infix order, stored in postfix order, and LISTed in infix order. Thus, the parser would convert infix to postfix, and LIST would convert postfix back to infix. By moving precedence checking from the (inner) interpreter to the parser and to LIST, the interpreter will be very fast (and Forth-like).
Quote:
In Forth, to insert code, you have to recompile everything after the insertion point, since the addresses of any subsequent words will change. In BASIC, I'd like to retain the ability to insert a line painlessly, so I am planning to start with tokens rather than addresses. It's possible to write a token interpreter that compares favorably to an addresses interpreter, in terms of speed.
However, with that said, in my experience, recompiling software in Forth is so fast that the application I just beckoned has already been compiled before my finger leaves the ENTER key (assuming your Forth compiler's dictionary lookups are sufficiently efficient). Therefore, the issue of addresses, literally, is a complete non-issue for me.
Quote:
Another Forth misconception! In terms of space, it is not necessarily true that subroutine threading > direct threading > indirect threading.
Quote:
For anything longer, yes, subroutine threading will take the most space, but good Forth programming practice dictates that definitions should be short, so the space penalty of subroutine threading is smaller than you might think.
Quote:
You can also save space (and increase speed) with subroutine threading in some cases by inlining assembly, rather than writing a single-use code definition.