Embedding/Manipulating Assembler in Higher Level Languages
In this post, Garth Wilson discusses a little about how he embeds 6502 assembly language into his Forth programs. I responded, mentioning that I do this in Python and was interested in hearing more about his techniques. This started getting off-topic for that particular thread (which was actually about avoiding assembly by embedding machine language), so I'm starting this thread to continue the discussion of the assembly ideas.
Unlike the other thread, this one is not limited to embedding and/or manipulating assembly in just FORTH; discussion of doing this in any higher-level language is welcome.
The next two posts are Garth's post linked above, quoted by me, and my follow-up post.
Curt J. Sampson - github.com/0cjs
Re: The real power of [ and ]
GARTHWILSON wrote:
IamRob wrote:
I don't have a built in assembler in Forth, mostly because I have trouble grasping the RPN of assembly.
Code: Select all
: ADC# 69 C, ; : AND# 29 C, ; : ASL_A 0A C, ;
: ADC_ABS 6D C, ; : AND_ABS 2D C, ; : ASL_ABS 0E C, ;
: ADC_ZP 65 C, ; : AND_ZP 25 C, ; : ASL_ZP 06 C, ;
: ADC(ZP) 72 C, ; : AND(ZP) 32 C, ; : ASL,X 1E C, ;
: ADC,X 7D C, ; : AND,X 3D C, ; : ASL_ZP,X 16 C, ;
: ADC,Y 79 C, ; : AND,Y 39 C, ;
: ADC_ZP,X 75 C, ; : AND_ZP,X 35 C, ;
: ADC(ZP,X) 61 C, ; : AND(ZP,X) 21 C, ;
: ADC(ZP),Y 71 C, ; : AND(ZP),Y 31 C, ;
<etc.>
which shows most of the '02 addressing modes. So you just put the mnemonic down followed by the operand (if any) with a comma (or C-comma, or in the case of the '816, a "3C," also, for the few times you need a three-byte operand). There is no parsing, even though the mnemonic precedes the operand. It is up to you to specify the addressing mode along with the mnemonic, and to use the right word to compile the operand (whether comma or C-comma). So you'll do for example,
Code: Select all
LDA_ABS FOOBAR ,
Later I wrote a word "comp_op", laid down by macro OP_CODE, to shorten the op-code words by five bytes each (this is from the Forth assembly source-code, not high-level Forth):
Code: Select all
HEADER "ADC#", NOT_IMMEDIATE
OP_CODE $69
so that after the header (formed by the HEADER macro), there are only the two bytes of CFA (pointing to comp_op) laid down in the code field by the OP_CODE macro, followed by the one byte of op code which the macro also lays down, and nothing more, ie, no nest, unnest, or lit, and only one byte (not two) for the op code number itself. comp_op also takes care of the semicolon's job, so it's all there, without adding more bytes.
For the places in the kernel where you'd want to jump to in various primitives you would write, I have constants like this:
Code: Select all
8F34 CONSTANT POP 8F74 CONSTANT SET-TRUE
8F32 CONSTANT POP2 8F86 CONSTANT SET-FALSE
8F7E CONSTANT POP2-FALSE 8F38 CONSTANT CPUSH
8F6C CONSTANT POP2-TRUE 8F3B CONSTANT PUSH
8F30 CONSTANT POP3 8F84 CONSTANT PUSH-FALSE
8F6A CONSTANT POP3-TRUE 8F72 CONSTANT PUSH-TRUE
8F7C CONSTANT POP3-FALSE 8F3D CONSTANT PUT
8F6E CONSTANT POP-TRUE 8F42 CONSTANT NEXT
8F80 CONSTANT POP-FALSE 00FD CONSTANT XSAVE
00F0 CONSTANT N C98D CONSTANT setirq
4 CONSTANT tempA FE CONSTANT irq?
Note that no separate assembler vocabulary is needed, because for example assembly's AND will not conflict with Forth's AND because assembly's word always comes with the addressing mode, whether #, _ABS, or whatever, as part of the name.
There's a little more to it, particularly in the matter of branching and labels, but it'll take me more time to describe that, which I'll do later if you're interested. For now, this should give you the ideas to get started. Your assembly source code then won't look so different from how it does for a normal assembler.
Re: Embedding/Manipulating Assembler in Higher Level Languag
GARTHWILSON wrote:
There's a little more to it, particularly in the matter of branching and labels, but it'll take me more time to describe that, which I'll do later if you're interested.
Code: Select all
def test_call_push(m):
    p = 0x406
    cont = 0x40d                # Address of continuation below
    m.deposit(p, [
        I.LDA, MB(cont-1),      # 0406: 6502 puts return addr -1 on stack
        I.PHA,                  # 0408:
        I.LDA, LB(cont-1),      # 0409:
        I.PHA,                  # 040b:
        I.RTS,                  # 040c: jump to continuation below
        I.LDA, 0xA5,            # 040d: continuation
        I.RTS,                  # 040f: return to call()
    ])
    m.call(p, trace=1)
    assert R(pc=cont+2, a=0xA5) == m.regs
One idea I'm kicking around is to continue to use a list, use string elements for defining and referencing labels, maybe add some pseudo-ops, and feed the list into an assembler function. And one might allow 16-bit values in the list and change the LDA processing to pick an appropriate addressing mode.
So what does your system handle now, and what are your thoughts on how you'd expand it to have more functionality?
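As a rough illustration of the list-plus-labels idea, here is a minimal Python sketch. It is only one possible shape under stated assumptions: the names `assemble`, `label`, `lo` and `hi` are hypothetical and not part of any existing code, and it uses small tuples rather than bare strings to mark label definitions and references. It makes two passes over the list, so forward references work.

```python
# Hypothetical sketch: none of these helper names come from the posts above.
LDA_IMM, PHA, RTS = 0xA9, 0x48, 0x60   # standard 6502 opcodes

def label(name):
    """Mark the current position with a name; emits no bytes."""
    return ("label", name)

def lo(name, offset=0):
    """Reference the low byte of (label address + offset)."""
    return ("lo", name, offset)

def hi(name, offset=0):
    """Reference the high byte of (label address + offset)."""
    return ("hi", name, offset)

def assemble(origin, items):
    """Two passes over the list: collect label addresses, then emit bytes."""
    symbols, addr = {}, origin
    for item in items:                      # pass 1: assign addresses
        if isinstance(item, tuple) and item[0] == "label":
            symbols[item[1]] = addr
        else:
            addr += 1                       # every other item is one byte
    out = []
    for item in items:                      # pass 2: resolve references
        if isinstance(item, int):
            out.append(item & 0xFF)
        elif item[0] == "lo":
            out.append((symbols[item[1]] + item[2]) & 0xFF)
        elif item[0] == "hi":
            out.append(((symbols[item[1]] + item[2]) >> 8) & 0xFF)
        # ("label", name) items fall through and emit nothing
    return bytes(out), symbols

# The same routine as the test above, without hand-counting the address.
code, symbols = assemble(0x406, [
    LDA_IMM, hi("cont", -1), PHA,
    LDA_IMM, lo("cont", -1), PHA,
    RTS,
    label("cont"),
    LDA_IMM, 0xA5,
    RTS,
])
```

With this, the continuation address is computed from the list layout instead of being typed in by hand, and the list itself remains ordinary Python data until `assemble` is called.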
Curt J. Sampson - github.com/0cjs
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Embedding/Manipulating Assembler in Higher Level Languag
I'll keep this brief since it's not in the Forth forum section. In Forth, I have INLINE and END_INLINE to bracket a section of assembly-language programming in an otherwise high-level definition (ie, routine), per Bruce's recommendation 16 years ago from F-PC in the topic "Mixing assembly into a colon definition." Usually I just write a primitive to call from a secondary though, rather than writing some assembly language into the middle of a secondary. (For non-Forthers: "Primitive" just means the Forth word is defined in assembly language, while "secondary" just means it's defined in terms of other Forth words, possibly including both primitives and other secondaries. Either way, they are what would be called "routines" or "subroutines" in other languages.)
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Embedding/Manipulating Assembler in Higher Level Languag
GARTHWILSON wrote:
I'll keep this brief since it's not in the Forth forum section.
I should also emphasise that I'm not talking about inline assembler here (i.e., assembly code interspersed with and executed as part of the host language code, though some uses of that may be related) but actually the embedding of assembly code in a higher-level language in the manner of what's usually called an embedded domain-specific language; i.e., usually using the host language syntax or something that can easily fit with that. This is typically used to produce reified objects that can then be manipulated as necessary.
For example, in your post I quoted above you used LDA_ABS FOOBAR (note that there's no trailing comma there!) to define an assembly language fragment, and then at that point it can be assembled in a variety of ways, and presumably various things could be done with it afterwards, if I'm understanding your explanation correctly. ("It is up to you...to use the right word to compile the operand (whether comma or C-comma).")
And in my Python example above, I use a pseudo-assembly syntax to generate a list of bytes, which I then deposit and call. That is not inline at all; if there is a way to do inline assembly in Python, I certainly don't use it, and it wouldn't even work since my Python interpreter is running on an x86_64 processor. (The 6502 code above is run in a simulator.)
GARTHWILSON wrote:
Usually I just write a primitive to call from a secondary though, rather than writing some assembly language into the middle of a secondary.
Curt J. Sampson - github.com/0cjs
- GARTHWILSON
Re: Embedding/Manipulating Assembler in Higher Level Languag
cjs wrote:
For example, in your post I quoted above you used LDA_ABS FOOBAR (note that there's no trailing comma there!) to define an assembly language fragment, and then at that point it can be assembled in a variety of ways, and presumably various things could be done with it afterwards, if I'm understanding your explanation correctly. ("It is up to you...to use the right word to compile the operand (whether comma or C-comma).")
Quote:
GARTHWILSON wrote:
Usually I just write a primitive to call from a secondary though, rather than writing some assembly language into the middle of a secondary.
If it's in the target system, they're compiled (not really assembled, unless you're building a subroutine-threaded-code Forth, where basically it all ends up in assembly language) by the Forth system that's running. Many secondaries will need to be in the kernel though, and unless you have a metacompiler (which I have used in the past), their source code will be processed by the assembler (or meta assembler or cross assembler) that you're using to form the kernel.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Embedding/Manipulating Assembler in Higher Level Languag
cjs wrote:
...I'm not talking about inline assembler here (i.e., assembly code interspersed with and executed as part of the host language code, though some uses of that may be related) but actually the embedding of assembly code in a higher-level language in the manner of what's usually called an embedded domain-specific language; i.e., usually using the host language syntax or something that can easily fit with that. This is typically used to produce reified objects that can then be manipulated as necessary.
Re: Embedding/Manipulating Assembler in Higher Level Languag
BigEd wrote:
Is it that the assembly text produces a lump of binary data which can then be used in the program? That is, it's not to be executed at the point it's described, but it's to be stored, passed around, eventually put together with other code and then executed?
The second part is the key here (and is where the "manipulation" comes in): with what Garth and I have been showing, the pieces of assembly are first-class objects within the host language. So, for example, if I need ten NOPs I can trivially generate them using the host language: in Python, [I.NOP] * 10. A slightly more sophisticated example from my post above is the use of host language variables and functions within the assembly code, such as addr = 0x400; asm = [ I.LDA, LB(addr-1) ], which takes an instruction and the low byte of a calculated value to give you a list of binary values, [ 0xA9, 0xFF ].
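To make the "first-class objects" point concrete, here is a small self-contained Python sketch. The `I`, `LB` and `MB` definitions below are hypothetical stand-ins for the helpers used in the test above (their real definitions are not shown in this thread); `I.LDA` is assumed to be the immediate-mode opcode, which is consistent with the [ 0xA9, 0xFF ] result quoted above.

```python
# Hypothetical stand-ins for the I, LB and MB helpers used in the test above.
class I:
    LDA = 0xA9   # LDA immediate (assumed, matching the 0xA9 quoted above)
    NOP = 0xEA
    RTS = 0x60

def LB(word):
    """Low byte of a 16-bit value."""
    return word & 0xFF

def MB(word):
    """Most significant (high) byte of a 16-bit value."""
    return (word >> 8) & 0xFF

addr = 0x400
asm = [I.LDA, LB(addr - 1)]          # operand computed by the host language
padding = [I.NOP] * 10               # ten NOPs from ordinary list arithmetic
routine = asm + padding + [I.RTS]    # fragments concatenate like any other list
```

Because each fragment is just a list, anything the host language can do with lists (slicing, concatenation, generating fragments in a loop or function) applies to the assembly code as well.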
The most obvious way of "embedding" assembly code in a language is to simply write code in strings, e.g. A$ = "LDA #$33" in BASIC. But with just plain old strings like that, you essentially end up writing a separate interpreter to interpret those strings, and getting the kind of power with this that you see in the examples above is a huge amount of work, rather than most of that coming for nearly free when you use an embedded DSL in a reasonably powerful language. (As Alan Perlis points out in Epigram 34, "The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.")
Currently, my embedded "assembly code" (and Garth's, I think) is pretty darn simple, hardly more than assigning names to integers. So the question I'm interested in investigating is: how does one move beyond that to start handling things such as, say, labels? (Hand-counting entries in a list and then setting cont = 0x40d based on that, as in the test in my previous post, is clearly neither reliable nor scalable.) The temptation is to start writing interpreters for those lists (essentially, writing an assembler), but go too far in that direction and you end up with just another assembler that you're calling from your host language, rather than having an actual embedded assembly language.
Curt J. Sampson - github.com/0cjs
Re: Embedding/Manipulating Assembler in Higher Level Languag
Embedding assembler into BBC Basic is "a thing" that's been there for a long time. You have to construct the 2-pass assembler using a BASIC FOR loop and the code generated can be stored and run from a byte array, or saved to disc, etc.
You can use BASIC procedures to create macros too and some quite complex systems have been built using it.
Some systems I built used the BASIC assembler to assemble and save code which was then loaded and run from disc by a 2nd program - mostly due to memory constraints, but this was really not that different from using a system that might have had a separate editor, assembler and loader - but the editor and assembler were the BASIC environment.
The BCPL system for the BBC Micro also had a separate relocatable assembler that allowed you to edit and assemble code which could then reference BCPL global variables and be called from BCPL as if it was a BCPL function. Like BBC BASIC, this was all self-hosted on the BBC Micro. I'm working on something similar for my own Ruby BCPL system - again, self hosted.
Doing it in C using cc65 and as65 is not hard either, but does require a separate system to edit, compile/assemble and link the code to produce a binary to run on the target.
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
- GARTHWILSON
Re: Embedding/Manipulating Assembler in Higher Level Languag
I wrote in the other topic,
Quote:
I was about to reply to Curt's last post (I saved my reply in a text file, so I have it all in its original form even though he moved it), but if you like, we can start a new topic
and you wrote above,
Quote:
I have no objection to heavy discussion of FORTH in this topic [...]
so here it is (but yes, it is very much Forth, continuing on with the matter of the tiny macro assembler):
cjs wrote:
GARTHWILSON wrote:
There's a little more to it, particularly in the matter of branching and labels, but it'll take me more time to describe that, which I'll do later if you're interested.
I'd definitely be interested in hearing about this. [...]
One idea I'm kicking around is to continue to use a list, use string elements for defining and referencing labels, maybe add some pseudo-ops, and feed the list into an assembler function. [...]
So what does your system handle now, and what are your thoughts on how you'd expand it to have more functionality?
Rather than posting a long listing of code, it'd be better if I give it more explanation than source code. In fact, maybe I'll try to make it a web page for my site, with diagrams and other helps.
As for labels: My Forth assembler is not intended for doing whole applications, only Forth primitives, runtimes, subroutines, ML ISRs, etc., where there's really only local use of labels and they're close enough to branch to. Going further with descriptive labels for things like long jumps is possible, but not as easy.
Branching backwards was the easiest part, since you can use HERE to put an address on the stack during assembly and not even use a real label of any kind; then if a BNE for example needs to branch back to it, I would write BNE GO_BACK, where BNE only lays down the op code (with no concern for an operand), and GO_BACK takes the top data-stack item (which is an address in this case) and figures out the branch distance from that and the current dictionary pointer, and lays it down in the assembled code. If you need more than the one thing on the stack, or need to refer back to the same place from more than one branch, the normal SWAP, DUP, etc. words are available of course.
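For readers more at home in Python than Forth, the backward-branch arithmetic described above can be sketched roughly as follows. This is an analogy under stated assumptions, not a translation of the Forth source: the `Asm` class and its method names are hypothetical, with `here()` standing in for HERE, `c()` for C-comma, and a Python list for the data stack. The offset is measured from the byte after the operand, as 6502 relative branches require.

```python
BNE, DEX = 0xD0, 0xCA            # standard 6502 opcodes

class Asm:
    """Hypothetical one-pass assembler buffer, loosely modelled on the text above."""
    def __init__(self, origin):
        self.origin = origin
        self.code = bytearray()
        self.stack = []          # stands in for the Forth data stack

    def here(self):
        """Like HERE: the address of the next byte to be laid down."""
        return self.origin + len(self.code)

    def c(self, byte):
        """Like C-comma: lay down one byte."""
        self.code.append(byte & 0xFF)

    def mark(self):
        """Push the current address so a later branch can return to it."""
        self.stack.append(self.here())

    def go_back(self):
        """Lay down a branch operand that jumps back to the marked address."""
        target = self.stack.pop()
        offset = target - (self.here() + 1)   # relative to the byte after the operand
        if not -128 <= offset <= 127:
            raise ValueError("branch target out of range")
        self.c(offset)                        # stored as two's complement

a = Asm(0x0400)
a.mark()          # loop target at $0400
a.c(DEX)          # $0400: DEX
a.c(BNE)          # $0401: BNE opcode only, no operand yet
a.go_back()       # $0402: operand computed from the marked address
```

Here the operand comes out as $FD (that is, -3), which takes the branch from $0403 back to $0400.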
Doing forward branching with a one-pass assembler is slightly more of a challenge. What I did for that is that if you have for example BNE 1$, the 1$ (and the rest through 9$ and 0$, ten of them) are Forth words with an associated variable, and the 1$ records the place where a branch distance operand needs to get filled in later when the place to branch to is finally found. Then 1$: (with the colon) is another word that figures out the branch distance to fill in back where the 1$ (without the colon) was encountered. Besides the labels not being descriptive, it's somewhat less than ideal in that if you have multiple places that need to branch to the same instruction, you'll need multiple labels, meaning sometimes I'll have something like:
Code: Select all
1$: 2$: LDA_ZP,X TOS C,
The code for them is this:
Code: Select all
: SET_BR ( var_addr -- ) HERE SWAP ! 0 C, ;
: RESOLVE_BR ( addr -- ) @ HERE OVER - 1- SWAP C! ;
WSIZE VAR 0‡ : 0$ 0‡ SET_BR ; : 0$: 0‡ RESOLVE_BR ;
WSIZE VAR 1‡ : 1$ 1‡ SET_BR ; : 1$: 1‡ RESOLVE_BR ;
WSIZE VAR 2‡ : 2$ 2‡ SET_BR ; : 2$: 2‡ RESOLVE_BR ;
<etc.>
(The special character there, ‡, was chosen only for the unlikelihood of ever conflicting with anything else I do. Make it anything you want.)
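The same set-then-resolve pattern can be sketched in Python for comparison. This is an illustrative analogue only: `set_br`, `resolve_br` and the `pending` dictionary are hypothetical names, and the dictionary plays the role of the per-label variables (0‡, 1‡, ...) above. As in the Forth version, a placeholder byte is laid down first and patched once the destination is reached.

```python
BNE, NOP, RTS = 0xD0, 0xEA, 0x60   # standard 6502 opcodes

code = bytearray()
pending = {}          # label name -> index of the operand byte awaiting a value

def set_br(name):
    """Like SET_BR: remember where the operand lives, lay down a placeholder."""
    pending[name] = len(code)
    code.append(0)

def resolve_br(name):
    """Like RESOLVE_BR: patch the remembered operand with the forward distance."""
    operand = pending.pop(name)
    offset = len(code) - operand - 1    # measured from the byte after the operand
    if offset > 127:
        raise ValueError("forward branch out of range")
    code[operand] = offset

code.append(BNE); set_br("1$")     # BNE 1$
code.append(NOP)
code.append(NOP)
resolve_br("1$")                   # 1$:
code.append(RTS)
```

As with the Forth words, each label here can only have one outstanding forward reference at a time; supporting several branches to the same label would need a list of pending operand positions per name.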
So then you can have for example,
Code: Select all
\ Unfortunately there's no INC (DP,X) instruction to use below.
CREATE INCR PRIMITIVE ( addr -- )
LDA(ZP,X) 0 C, INA STA(ZP,X) 0 C, BNE 1$ INC_ZP,X 0 C,
LDA(ZP,X) 0 C, INA STA(ZP,X) 0 C, 1$: JMP POP ,
CREATE DECR PRIMITIVE ( addr -- )
LDA(ZP,X) 0 C, DEA STA(ZP,X) 0 C, CMP# FF C, BNE 1$ INC_ZP,X 0 C,
LDA(ZP,X) 0 C, DEA STA(ZP,X) 0 C, 1$: JMP POP ,
In future improvements, I tentatively plan to use software buffers to set up things like local variables and arrays, constants (and in this case, label names with associated addresses), even code, and when I'm done with a buffer, I can delete it and reclaim the memory. Since it's at the opposite end of RAM from the main code I'm working on, deleting a buffer does not affect that area where the dictionary pointer is advancing as you compile or assemble things. Also, remaining buffers can get scooted around to close up holes in buffer memory, so memory never gets fragmented. I wrote words for buffers like this in Forth a few years ago, and wrote it up here. It was absolutely the worst I've ever done insofar as stack gymnastics go, and could definitely benefit from local variables to remedy that, which I need the buffers for. Chicken and egg. I will probably re-write the words in assembly language for performance before putting them to serious work, since I'll want them for a lot more than just running my assembler.
IamRob wrote:
In ProForth one creates a code word like this:
Code: Select all
: TEST ;CODE [ HEX 20 C, FC58 , A0 , ... etc ] ;
I had to do something similar with the metacompiler I started out with at work when I first started in 6502 Forth. As the source code got longer as the project progressed, the metacompiler would sometimes, without explanation or error messages, just skip over a line or two of source code. I had to pick through the .hex code output to find the bug. (I find bugs in every commercial piece of software I use!) The supplier didn't want to fix it. Anyway, I had to do some hand assembly and comma-in the hex numbers.
However, note that the ; above is not necessary, since the primitive will need to end with something like JMP NEXT anyway. Also, once you set up the header and make the CFA point to the parameter field (ie, you're making a primitive), you don't need to turn STATE on at all, and then you won't need the [ and ] either. ; is an immediate word that lays down the unnest (or EXIT) and turns STATE off.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Embedding/Manipulating Assembler in Higher Level Languag
Garth, that was an excellent reply, just the kind of thing I was looking for. (Though I did have to go and do a bit of research to remember what Forth immediate words are. It has been many decades since I have typed even a word of Forth.)
GARTHWILSON wrote:
As for labels: My Forth assembler is not intended for doing whole applications, only Forth primitives, runtimes, subroutines, ML ISRs, etc., where there's really only local use of labels and they're close enough to branch to.
Yes, that's pretty much exactly where I'm at, too, though perhaps with one caveat. My unit test system loads an entire assembly (which is potentially huge) into the simulator's memory and also returns the assembly's symbol table as a Python object. So I am often using those symbols in new code I assemble and add to the loaded assembly.
From looking at what you've written, it seems that your assembly system is working very much the way the standard Forth word compiler works: you start with CREATE ____ PRIMITIVE to start building a new word and then non-immediate words get compiled, immediate words to deal with things like branch targets get executed, and then , finishes off the whole thing. Is that right?
And though your examples use this along the lines of inline assembler, it seems to me that one could of course use a different dictionary and produce code for another system, in words that one would save and then load into the other system. So you could do this assembly in Forth running on a modern PC, producing words for a 6502 system, right? (Isn't this essentially how a Forth metacompiler works?)
Also, do you have any way to mix Forth and assembly compilation within a single word, or does a word have to be one or the other? Can assembly code call into Forth code?
Curt J. Sampson - github.com/0cjs
- GARTHWILSON
Re: Embedding/Manipulating Assembler in Higher Level Languag
cjs wrote:
Garth, that was an excellent reply, just the kind of thing I was looking for. (Though I did have to go and do a bit of research to remember what Forth immediate words are.
Yes; "immediate" just means that when the word is encountered during compilation, it normally gets executed rather than compiled. (I say "normally" because there is also a way to compile immediate words.) That's one of the beauties of Forth. Even though the compiler is only something like a hundred bytes long (I haven't counted them), you can write a new word, even a word that defines other words, or forms a new structure of your own invention, and define both its compile-time behavior and its runtime behavior.
I won't go into that much here though, because when we assemble an assembly-language section in Forth, we're usually not in compile mode, ie, flag variable STATE is usually off (ie, the variable just contains a 0), meaning the input stream of source code gets executed, not compiled.
I can see how this could be confusing if not described well. I'll give it a try. When STATE is ON (ie, non-0), words encountered in the input stream get compiled, not executed (unless they're immediate). When STATE is OFF (ie, contains 0), words encountered are interpreted, ie, looked up in the dictionary to find their addresses to run right now. Assembly happens this way, ie, with STATE off. Take the example case of LDA#. If STATE were ON, the two-byte address of LDA# would get compiled (in ITC or DTC Forth, not STC). This is not what we want. We want it to actually do its job at that time, which is to lay down the machine-language op code, in this case $A9. In RPN assemblers (unlike mine), an LDA word would have more to do, to take what you put on the stack and determine which addressing mode to use, and lay down the correct op code and the correct length of operand (if any), or even report an error if for example you specified an addressing mode that the 6502 does not support for a particular instruction, like INC (ZP,X).
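For non-Forthers, the STATE distinction described above can be modelled with a few lines of Python. This is a deliberately tiny toy under stated assumptions, not how any real Forth is implemented: the `words` table, `handle` function and `compiled` list are hypothetical names, and a real system would compile addresses rather than names.

```python
code = bytearray()     # machine code laid down so far
compiled = []          # words compiled into the current high-level definition
state = False          # models STATE: False = interpreting, True = compiling

def lda_imm():
    """Models the job of LDA#: lay down the op code $A9."""
    code.append(0xA9)

# name -> (behaviour, is_immediate)
words = {"LDA#": (lda_imm, False)}

def handle(token):
    """Process one word from the input stream according to STATE."""
    action, immediate = words[token]
    if state and not immediate:
        compiled.append(token)   # compiling: store a reference for later
    else:
        action()                 # interpreting (or immediate): run it right now

handle("LDA#")     # STATE off: the op code is assembled immediately
state = True
handle("LDA#")     # STATE on: only a reference to LDA# is compiled
```

After this runs, `code` holds the single byte $A9 from the first call, while the second call has only added "LDA#" to `compiled`. That second outcome is the case the text above says we do not want while assembling.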
Quote:
It has been many decades since I have typed even a word of Forth.)
My work goes in waves, and there are long periods where I don't do any more than just the basics myself; so sometimes I have to look stuff up again too.
Quote:
From looking at what you've written, it seems that your assembly system is working very much the way the standard Forth word compiler works: you start with CREATE ____ PRIMITIVE to start building a new word and then non-immediate words get compiled, immediate words to deal with things like branch targets get executed, and then , finishes off the whole thing. Is that right?
For non-Forthers: Vocabularies let you have different meanings of words, depending on context. An example that may be silly but makes the point is that "ball" has different meanings depending on whether the context is baseball, Times Square, dance, or bearings.
Quote:
And though your examples use this along the lines of inline assembler, it seems to me that one could of course use a different dictionary and produce code for another system, in words that one would save and then load into the other system. So you could do this assembly in Forth running on a modern PC, producing words for a 6502 system, right? (Isn't this essentially how a Forth metacompiler works?)
Yes; that would be a metacompiler. Forming a new system with a metacompiler works really neat if it works right. Unfortunately the only one I've used (around 1990) had some pretty serious bugs.
Quote:
Also, do you have any way to mix Forth and assembly compilation within a single word, or does a word have to be one or the other?
You can embed some assembly language in an otherwise high-level definition, like this:
Code: Select all
: FOOBAR ( n1 n2 - n3 n4 n5 f )
BEGIN
<do_stuff_in_Forth>
<do_more_in_Forth>
WHILE
<do_something_else_in_Forth>
INLINE
<do_some_assembly_language>
<do_some_more_assembly_language>
END_INLINE
<do_some_more_in_Forth>
REPEAT ;
I've written the INLINE and END_INLINE words per Bruce Clark's suggestion from F-PC, but have not used them yet, and I suspect it's rare. More common would be to just write the assembly-language part as a primitive and refer to that in the Forth word.
Quote:
Can assembly code call into Forth code?
You can do basically anything you want; but my initial reaction to this one is one of avoidance. The closest I've come is that in my '816 Forth's system of prioritized very-low-overhead interrupt support, the interrupts that need the greatest performance in their servicing are of course serviced in assembly language, but then if there's a condition that needs special attention like to tell the user that there's a problem and ask him how he wants to handle it, it passes it on to a different ISR that handles it in high-level Forth.
I am mostly writing from a standpoint of indirect-threaded code (ITC) or direct-threaded code (DTC) Forth. I have not worked with subroutine-threaded code (STC) Forth, which is where basically everything is converted to machine language, and can be a really hot performer, especially if the compiler's optimizer is intelligent enough to understand your goal and rework the approach to the way that's best suited for how the particular microprocessor works. In that case, for highly complex processors, there's almost no point in doing your own sections of assembly language, because there'd be nothing to be gained. I took the following from Forth Inc.'s website. Their SwiftForth optimizer takes the following Forth source code for the common Forth word DIGIT
and turns out this assembly language:
with fewer instructions in machine language than in the Forth version! No one has written this kind of thing for the 6502 or '816 though, probably because there's just not the market to justify the number of programming man-hours to make such a product for 65xx, and also because 65xx (especially '816) assembly language is simple enough to write by hand. My '816 version is hardly any longer.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Embedding/Manipulating Assembler in Higher Level Languag
GARTHWILSON wrote:
I can see how this could be confusing if not described well.
Anyway, I think I see what's going on here. Let me know if I've got any major errors in the following summary of my understanding of the Forth part of this.
Essentially there are two main interpreters in Forth. The first is the runtime interpreter (let's call it "interpreter I," unless there's a more standard name for it) which reads lexical tokens, looks them up in a dictionary (I am eliding parsing of numeric literals for the moment), and immediately executes the routines to which they refer. Then there is the "compiler" interpreter ("interpreter C") which reads lexical tokens and looks them up in the dictionary, but then checks to see if they're marked "immediate" or not. If the word is marked immediate it's executed just as with interpreter I, but if not a reference to the word is instead appended to a buffer it's using to build a new word to be added to the dictionary.
So when interpreter I reads a ":", interpreter C is started, which reads the next lexical token as a name for the new word, allocates a buffer somewhere in which to build the word, and then continues parsing as described for interpreter C above. Eventually it reaches a ";" and, upon reading that, does whatever it needs to do to finish up the generation of the new word and then returns control to interpreter I.
(As a side note, this idea of multiple interpreters all sharing lexical analysis and data about the current system but interpreting identical series of lexical tokens in various different ways is central to Lisp, too. Central even to Lisp without macros, I mean; macros add a whole new level to this of course.)
Presumably it would be possible to add further interpreters that could also take over reading of the input stream and do their own thing, if they wanted to. (This may or may not require some changes to the core of the interpretation system.) One could use such a technique to write an interpreter for an assembly language compatible with Forth's lexical analyzer.
But in your case you've decided to embed assembly in a different way, by having words read and executed by interpreter I, which set up and use their own data areas to build and register new Forth words. And "C," and "," (used to generate instructions and finalize the new word, respectively?) are nothing special to the interpreter but just words to be called like any other.
This is very interesting, and I realize now that a similar technique is available to me in Python. Currently I seem to be heading towards the "separate interpreter" approach, where I build up a program in the language and then apply an interpreter function to it, e.g.,
Code: Select all
# `ORG`, `LDX` etc. are just variable names, bound in this environment.
count = 4
init = [ ORG, 0x300, LDX, count ]
loop = [ 'loop:', DEX, BNE, 'loop', RTS ]
obj = assemble(init + loop)
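Here is one way the list-based approach could be made to run. It is a sketch under assumptions, not the code I actually have: the `Op` class, the `ORG` sentinel and the `assemble` function are hypothetical, only the four instructions used above are defined, LDX is treated as immediate mode, ORG is expected to come first, and branches may only go backwards to labels already seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    opcode: int     # 6502 opcode byte
    operand: str    # "none", "byte" or "branch"


ORG = object()                      # directive marker, followed by an address
LDX = Op("LDX", 0xA2, "byte")       # immediate mode only in this sketch
DEX = Op("DEX", 0xCA, "none")
BNE = Op("BNE", 0xD0, "branch")
RTS = Op("RTS", 0x60, "none")


def assemble(program):
    """Single pass over the token list; returns (origin, machine code)."""
    origin = 0
    out = bytearray()
    labels = {}
    items = iter(program)
    for item in items:
        if item is ORG:
            origin = next(items)
        elif isinstance(item, str):              # 'name:' defines a label
            labels[item.rstrip(":")] = origin + len(out)
        else:
            out.append(item.opcode)
            if item.operand == "byte":
                out.append(next(items) & 0xFF)
            elif item.operand == "branch":
                target = labels[next(items)]
                # Offset is relative to the address after the offset byte.
                offset = target - (origin + len(out) + 1)
                if not -128 <= offset <= 127:
                    raise ValueError("branch out of range")
                out.append(offset & 0xFF)
    return origin, bytes(out)


count = 4
init = [ORG, 0x300, LDX, count]
loop = ['loop:', DEX, BNE, 'loop', RTS]
obj = assemble(init + loop)
```

Because the program is an ordinary list, the host language can build, slice and concatenate fragments before anything is assembled.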
Code: Select all
count = 4
obj = (Assembly()       # Object constructor.
    .org(0x300)         # Methods return `self` to allow
    .ldx(count))        # method call chaining.
(obj.label('loop')      # You can resume assembly after interspersing
    .dex()              # other host language code.
    .bne('loop')
    .rts())
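A minimal sketch of what such a chainable object might look like follows. The `Assembly` class here is hypothetical and covers only the four instructions in the example; `ldx` is assumed to mean immediate mode, and `bne` can only branch to a label that has already been defined.

```python
class Assembly:
    """Hypothetical chainable 6502 assembler; every method returns self."""

    def __init__(self):
        self.origin = 0
        self.code = bytearray()
        self.labels = {}

    def _pc(self):
        # Address at which the next byte will be placed.
        return self.origin + len(self.code)

    def org(self, address):
        self.origin = address
        return self

    def label(self, name):
        self.labels[name] = self._pc()
        return self

    def ldx(self, value):
        self.code += bytes([0xA2, value & 0xFF])     # LDX #value
        return self

    def dex(self):
        self.code.append(0xCA)
        return self

    def bne(self, name):
        # Offset is relative to the address after the two-byte branch.
        offset = self.labels[name] - (self._pc() + 2)
        if not -128 <= offset <= 127:
            raise ValueError("branch out of range")
        self.code += bytes([0xD0, offset & 0xFF])
        return self

    def rts(self):
        self.code.append(0x60)
        return self


count = 4
obj = Assembly().org(0x300).ldx(count)
# Any other host-language code can run here before assembly resumes.
obj.label('loop').dex().bne('loop').rts()
machine_code = bytes(obj.code)
```

Unlike the list version, each call emits its bytes immediately, which is closer in spirit to Forth words executing as they are read.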
Quote:
In that case, for highly complex processors, there's almost no point in doing your own sections of assembly language, because there'd be nothing to be gained.
Curt J. Sampson - github.com/0cjs
- GARTHWILSON
Re: Embedding/Manipulating Assembler in Higher Level Languag
It's simpler than that, and a new word gets formed directly where it will reside, without first going to a buffer or other data area. I'll try to find a concise way to describe it though, so we don't lose any remaining readers; maybe a flowchart. I could just post the code which is pretty short but still requires ability to read Forth, and then they won't use the ideas to apply to other languages either.
cjs wrote:
GARTHWILSON wrote:
I can see how this could be confusing if not described well.
With such clear and descriptive variable names as STATE, how could anybody get confused?
It's the compiler's state of being on or off.
Quote:
Quote:
In that case, for highly complex processors, there's almost no point in doing your own sections of assembly language, because there'd be nothing to be gained.
Well, perhaps not for optimization, but there are other good reasons to have in-line assembly, such as calling a routine written in another language or, perhaps, direct access to hardware.
Perhaps a nod to the routines written in other languages; but high-level Forth itself does not keep you from directly accessing hardware, only processor flags. You can try ideas in high-level Forth and then re-write the words as primitives (ie, in assembly language) without changing any of the code that uses them.
For a simple example, here're my words to initialize my 8-bit D/A and then write to it. It's a DAC0808 converter with an 8-bit parallel input and no clocking, and 150ns typical settling time. It's connected to VIA #2's port A, and initializing only involves making all 8 bits of VIA2PA outputs, by writing $FF (all 1's) to Data Direction Register A (DDRA). Then writing to the D/A converter only requires writing to Port A (PA) itself. First purely in Forth:
Code: Select all
: INIT_D/A FF VIA2DDRA C! ; ( -- )
: WR_D/A VIA2PA C! ; ( c -- )
and now in assembly for my workbench computer's onboard one-pass assembler:
Code: Select all
CODE INIT_D/A ( -- )
LDY# FF C,
STY_ABS VIA2DDRA ,
JMP NEXT ,
CODE WR_D/A ( byte -- )
LDY_DP,X TOS_LO C,
STY_ABS VIA2PA ,
JMP POP ,
I use Y there because my '816 Forth keeps the index registers in 8-bit almost full time and the accumulator in 16-bit, and here I want an 8-bit operation. In the 6502 Forth, I just use A instead of Y. TOS_LO is top of stack (top data-stack cell), low byte. In the DP,X (or ZP,X) addressing modes, TOS_LO is just a 0, so you get 0,X.
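For anyone wanting to try the same style in another language, here is a rough Python transliteration of INIT_D/A. It is a sketch only: the `Target` class is hypothetical, and the values given for VIA2DDRA, NEXT and the starting address are placeholders rather than the real addresses on the workbench computer. The opcodes ($A0 for LDY immediate, $8C for STY absolute, $4C for JMP absolute) are the standard 6502 ones.

```python
class Target:
    """Hypothetical memory image with a Forth-style dictionary pointer."""

    def __init__(self, here):
        self.memory = bytearray(0x10000)
        self.here = here              # next free address, like Forth's HERE

    def c_comma(self, value):         # C,  lays down one byte
        self.memory[self.here] = value & 0xFF
        self.here += 1

    def comma(self, value):           # ,   lays down 16 bits, low byte first
        self.c_comma(value)
        self.c_comma(value >> 8)

    # One word per mnemonic-plus-addressing-mode; each only lays down
    # its opcode. The caller supplies the operand with c_comma or comma.
    def ldy_imm(self):
        self.c_comma(0xA0)

    def sty_abs(self):
        self.c_comma(0x8C)

    def jmp(self):
        self.c_comma(0x4C)


VIA2DDRA = 0x4803    # placeholder, not the real I/O address
NEXT = 0x0210        # placeholder address of the inner interpreter

t = Target(here=0x0300)              # placeholder start address
start = t.here
t.ldy_imm(); t.c_comma(0xFF)         # LDY# FF C,
t.sty_abs(); t.comma(VIA2DDRA)       # STY_ABS VIA2DDRA ,
t.jmp();     t.comma(NEXT)           # JMP NEXT ,
body = bytes(t.memory[start:t.here])
```

There is no parsing and no intermediate buffer: each call writes straight into the place where the code will live and advances the pointer.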
Note that you can use almost any characters in Forth names, including <>/?!@#$%^&*+\|;:' and characters that aren't on the keyboard, so you could have a word with the name ±½°F or Ω/" (ohms per inch) for example. The Greek characters are especially useful in engineering. There are only a few you wouldn't want to put in a name, like <NUL>, <CR>, <LF>, <FF>, and sometimes ones that can be used as delimiters like " and ).
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Embedding/Manipulating Assembler in Higher Level Languag
I just figured out how switching parsers works; it's explained on the Wikipedia Forth page at Parsing words and comments. Basically, words are able to consume the current input stream from the point just after they are executed, do what they will, and then when the main parser resumes it will continue after the text consumed by the parsing word.
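As I understand it, the mechanism can be sketched in Python like this. The `Interpreter` class and its method names are hypothetical; the point is only that a word such as "(" is an ordinary dictionary entry that, when executed, moves the shared input position past the text it consumes.

```python
class Interpreter:
    """Hypothetical toy interpreter whose words can consume input text."""

    def __init__(self, text):
        self.text = text
        self.pos = 0                  # shared position in the input stream
        self.stack = []
        self.words = {"(": self.paren_comment, "+": self.add}

    def next_token(self):
        n = len(self.text)
        while self.pos < n and self.text[self.pos].isspace():
            self.pos += 1
        start = self.pos
        while self.pos < n and not self.text[self.pos].isspace():
            self.pos += 1
        return self.text[start:self.pos] or None

    def parse_until(self, delimiter):
        """Consume input up to and including the delimiter."""
        end = self.text.find(delimiter, self.pos)
        if end == -1:
            end = len(self.text)
        consumed = self.text[self.pos:end]
        self.pos = min(end + len(delimiter), len(self.text))
        return consumed

    def paren_comment(self):
        self.parse_until(")")         # a parsing word: swallow the comment

    def add(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

    def run(self):
        while (token := self.next_token()) is not None:
            if token in self.words:
                self.words[token]()
            else:
                self.stack.append(int(token))
        return self.stack


result = Interpreter("2 ( this text is never looked up ) 3 +").run()
```

The main loop never sees the words inside the comment, because by the time it asks for the next token the position has already been moved past them.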
GARTHWILSON wrote:
It's simpler than that, and a new word gets formed directly where it will reside, without first going to a buffer or other data area.
Well, that's fair enough. Essentially, the "buffer" being filled is the target area of memory where the word will reside, then. Pretty much the same general idea.
Quote:
...but high-level Forth itself does not keep you from directly accessing hardware, only processor flags.
Sure; for simple things where the equivalent of BASIC's PEEK(), POKE, IN() and OUT will do the trick, that's fine. But I was thinking of things like using the Apple II disk controller directly, where timing constraints are critical (certain loops must be a specific number of cycles), or switching in a different bank of memory and copying something from it (where the Forth interpreter may no longer be mapped into the address space). For these sorts of things it may be possible to do them only with machine code.
Quote:
Note that you can use almost any characters in Forth names...
Yup, I'm used to that. Lisp is similar; in the expression (- 5 3), the minus sign is just another variable name that happens to be bound to a particular function. You could even have fun confusing people, if that turns your crank, by using (let ((+ -)) (+ 5 3)) to do that same subtraction.
(The reader is in fact just an ordinary Lisp function, so you're free to replace it with something else if you like using, I don't know, BEGIN and END instead of ( and ), or even want a totally different syntax. Perhaps some people would find more pleasing a reader that interprets ' + ' - let 5 3 + as my let subtraction example above.)
BTW, just out of curiosity, what character set and encoding are you using on your system with those Greek characters and whatnot, and how do you deal with displaying them?
Curt J. Sampson - github.com/0cjs