I can see how this could be confusing if not described well.
With such clear and descriptive variable names as STATE, how could anybody get confused? :-P
Anyway, I think I see what's going on here. Let me know if I've got any major errors in the following summary of my understanding of the Forth part of this.
Essentially there are two main interpreters in Forth. The first is the runtime interpreter (let's call it "interpreter I," unless there's a more standard name for it) which reads lexical tokens, looks them up in a dictionary (I am eliding parsing of numeric literals for the moment), and immediately executes the routines to which they refer. Then there is the "compiler" interpreter ("interpreter C") which also reads lexical tokens and looks them up in the dictionary, but then checks whether they're marked "immediate" or not. If the word is marked immediate, it's executed just as with interpreter I; if not, a reference to the word is instead appended to a buffer that interpreter C is using to build a new word to be added to the dictionary.
So when interpreter I reads a ":", interpreter C is started, which reads the next lexical token as a name for the new word, allocates a buffer somewhere in which to build the word, and then continues parsing as described for interpreter C above. Eventually it reaches a ";" and, upon reading that, does whatever it needs to do to finish up the generation of the new word and then returns control to interpreter I.
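To check my own understanding, here is a toy sketch in Python of the behaviour described above. It is not how any real Forth is built: the class and method names are made up, and I've modelled the two "interpreters" as one loop with a flag (`compiling`), which I gather is roughly the job STATE does.

```python
# Toy model of "interpreter I" and "interpreter C"; all names are hypothetical.
class ToyForth:
    def __init__(self):
        self.stack = []
        self.compiling = False       # stands in for STATE
        self.immediate = {';'}       # words that run even while compiling
        self.words = {               # the dictionary: name -> Python callable
            '+': lambda: self.stack.append(self.stack.pop() + self.stack.pop()),
            '*': lambda: self.stack.append(self.stack.pop() * self.stack.pop()),
            'dup': lambda: self.stack.append(self.stack[-1]),
            ':': self.colon,
            ';': self.semicolon,
        }

    def colon(self):
        self.new_name = next(self.tokens)   # next token names the new word
        self.body = []                      # buffer in which the word is built
        self.compiling = True               # hand over to "interpreter C"

    def semicolon(self):
        body = self.body                    # capture this definition's buffer

        def run():
            for step in body:
                step()

        self.words[self.new_name] = run     # register the finished word
        self.compiling = False              # back to "interpreter I"

    def interpret(self, text):
        self.tokens = iter(text.split())    # shared lexical analysis
        for tok in self.tokens:
            word = self.words.get(tok)
            if word is None:                # not in the dictionary: a number
                n = int(tok)
                word = lambda n=n: self.stack.append(n)
            if self.compiling and tok not in self.immediate:
                self.body.append(word)      # compile: append a reference
            else:
                word()                      # interpret: execute immediately
        return self.stack


print(ToyForth().interpret(': square dup * ; 7 square'))  # [49]
```

Note that ":" and ";" are just dictionary entries here; the only special treatment ";" gets is its membership in the `immediate` set.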
(As a side note, this idea of multiple interpreters all sharing lexical analysis and data about the current system but interpreting identical series of lexical tokens in various different ways is central to Lisp, too. Central even to Lisp without macros, I mean; macros add a whole new level to this of course.)
Presumably it would be possible to add further interpreters that could also take over reading of the input stream and do their own thing, if they wanted to. (This may or may not require some changes to the core of the interpretation system.) One could use such a technique to write an interpreter for an assembly language compatible with Forth's lexical analyzer.
But in your case you've decided to embed assembly in a different way, by having words read and executed by interpreter I, which set up and use their own data areas to build and register new Forth words. And "C," and "," (used to generate instructions and finalize the new word, respectively?) are nothing special to the interpreter but just words to be called like any other.
This is very interesting, and I realize now that a similar technique is available to me in Python. Currently I seem to be heading towards the "separate interpreter" approach, where I build up a program in the language and then apply an interpreter function to it, e.g.,
Code:
# `ORG`, `LDX` etc. are just variable names, bound in this environment.
count = 4
init = [ ORG, 0x300, LDX, count ]
loop = [ 'loop:', DEX, BNE, 'loop', RTS ]
obj = assemble(init + loop)
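A minimal sketch of what such an `assemble()` might look like, just to make the idea concrete. Everything here is an assumption for illustration: the `Op` class, the `walk()` helper, the operand kinds ('imm', 'rel'), and the convention that a string ending in ':' defines a label. Only the four 6502 opcodes needed for the example are included, and it returns the raw bytes without recording the origin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:                      # hypothetical token type for mnemonics
    name: str
    opcode: Optional[int] = None
    operand: Optional[str] = None   # None, 'imm' (immediate) or 'rel' (branch)


ORG = Op('ORG')                # directive: sets the location counter
LDX = Op('LDX', 0xA2, 'imm')   # LDX #value
DEX = Op('DEX', 0xCA)
BNE = Op('BNE', 0xD0, 'rel')
RTS = Op('RTS', 0x60)


def walk(program):
    """Yield (address, op, arg); a label definition is yielded with op=None."""
    pc = 0
    items = iter(program)
    for item in items:
        if isinstance(item, str):            # 'loop:' defines a label here
            yield pc, None, item.rstrip(':')
        elif item is ORG:
            pc = next(items)                 # next item is the new address
        else:
            arg = next(items) if item.operand else None
            yield pc, item, arg
            pc += 2 if item.operand else 1


def assemble(program):
    steps = list(walk(program))
    # Pass 1: collect label addresses (so forward references resolve).
    labels = {arg: pc for pc, op, arg in steps if op is None}
    # Pass 2: emit bytes.
    out = bytearray()
    for pc, op, arg in steps:
        if op is None:
            continue
        out.append(op.opcode)
        if op.operand == 'imm':
            out.append(arg & 0xFF)
        elif op.operand == 'rel':
            # Branch offset is relative to the address after the 2-byte branch.
            out.append((labels[arg] - (pc + 2)) & 0xFF)
    return bytes(out)


count = 4
init = [ORG, 0x300, LDX, count]
loop = ['loop:', DEX, BNE, 'loop', RTS]
print(assemble(init + loop).hex())  # a204cad0fd60
```

Because the whole program exists as a list before `assemble()` runs, making it two-pass is easy: pass 1 only has to find the labels.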
But an alternative would be to do the assembly on the fly using the Python interpreter itself to drive this execution:
Code:
count = 4
obj = (Assembly()       # Object constructor.
       .org(0x300)      # Methods return `self` to allow
       .ldx(count))     # method call chaining.
(obj.label('loop')      # You can resume assembly after interspersing
    .dex()              # other host language code.
    .bne('loop')
    .rts())
A notable difference here is that this latter approach seems a lot less readable in Python than it is in Forth. And of course, like Forth, this approach seems limited to being a one-pass assembler. (Though I suppose you could instead treat this as an intermediate object and ask the user to call .assemble() on it to get the assembly output.)
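Here is a rough sketch of that "intermediate object" idea, under my own assumptions: the `Assembly` class, its `fixups` list, and the private helpers are all hypothetical, and only the opcodes from the example are implemented. Rather than a true second pass, it emits a placeholder for each branch and patches it in `.assemble()` once every label is known, which is enough to allow forward references.

```python
class Assembly:
    """Hypothetical chained-call assembler with deferred branch fixups."""

    def __init__(self):
        self.origin = 0
        self.code = bytearray()
        self.labels = {}    # label name -> address
        self.fixups = []    # (index of branch operand byte, label name)

    def _emit(self, *data):
        self.code.extend(data)
        return self         # returning self is what allows chaining

    def org(self, addr):
        self.origin = addr  # sketch: only meaningful before code is emitted
        return self

    def label(self, name):
        self.labels[name] = self.origin + len(self.code)
        return self

    def ldx(self, value):
        return self._emit(0xA2, value & 0xFF)

    def dex(self):
        return self._emit(0xCA)

    def rts(self):
        return self._emit(0x60)

    def bne(self, name):
        # Remember where the operand byte lives; fill it in later.
        self.fixups.append((len(self.code) + 1, name))
        return self._emit(0xD0, 0x00)

    def assemble(self):
        out = bytearray(self.code)
        for pos, name in self.fixups:
            # Offset is relative to the address just past the operand byte.
            next_addr = self.origin + pos + 1
            out[pos] = (self.labels[name] - next_addr) & 0xFF
        return bytes(out)


count = 4
obj = Assembly().org(0x300).ldx(count)
obj.label('loop').dex().bne('loop').rts()
print(obj.assemble().hex())  # a204cad0fd60
```

The cost is that nothing is final until `.assemble()` is called, so host-language code running between the chained calls can't inspect the finished bytes.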
In that case, for highly complex processors, there's almost no point in doing your own sections of assembly language, because there'd be nothing to be gained.
Well, perhaps not for optimization, but there are other good reasons to have in-line assembly, such as calling a routine written in another language or, perhaps, direct access to hardware.