So since everybody keeps talking about how trivial it is to write an assembler in Forth, I tried it myself. Introducing "A Typist's 65c02 Assembler in Forth" (https://github.com/scotws/tasm65c02), a cross-assembler in gforth with labels and a modified syntax aimed at ten-finger typists. This is the first BETA version (and my first time on GitHub).
Background
I need the practice in more complex Forth, and I'm probably going to have to write my own 65816 assembler at some point anyway, so I thought I'd start out with something I know. The program is brute force in its approach: Whereas Forth assemblers traditionally aim for the smallest possible memory footprint, this one assumes lots of RAM and processing power, but should make adapting it to a different processor easy. Also, the syntax tries to be far more user-friendly.
Syntax
Because of the focus on using as little memory as possible, Forth assemblers tend to have strange syntaxes. Take this well-known version from William Ragsdale (1982, http://www.forth.org/fd/FD-V03N5.pdf):
Code:
.A ROL,
1 # LDY,
DATA ,X STA,
DATA ,Y CMP,
6 X) ADC,
POINT )Y STA,
VECTOR ) JMP,
That's bizarre, with brackets closing that were never opened and commas on the wrong side of letters. I always felt that this was too alien and too hard to read. Actually, I've never liked conventional assembler syntax anyway, because all those "$" and "(" require shift keys and make typing hard (or at least slower) for ten-finger typists. So if I was going to have to roll my own syntax, I decided I might as well aim for a "typist friendly" variant. Hence the name.
Everything (well, almost everything) in the Typist's Assembler is lower-case and the addressing modes are added to the opcode after a dot (the "tail"), with Absolute Mode being the "tailless" version. This gives us:
Code:
mode                     conventional      Typist's
implied                  dex                    dex
accumulator              inc                    inc.a
absolute                 lda $1000         1000 lda
immediate                lda #$00            00 lda.#
absolute x indexed       lda $1000,x       1000 lda.x
absolute y indexed       lda $1000,y       1000 lda.y
absolute indirect        jmp ($1000)       1000 jmp.i
indexed indirect         jmp ($1000,x)     1000 jmp.xi
zero page                lda $10             10 lda.z
zero page x indexed      lda $10,x           10 lda.zx
zero page y indexed      lda $10,y           10 lda.zy
zero page indirect       lda ($10)           10 lda.zi
zp indirect x indexed    lda ($10,x)         10 lda.zxi
zp indirect y indexed    lda ($10),y         10 lda.ziy
relative                 bne $2000         2000 bne
There is one special case: Because AND is also a Forth word, its Absolute Addressing opcode gets a dot as "and." We don't need dollar signs because Forth uses HEX and DECIMAL and whatnot. Note the "i" for indirect mode mirrors the placement of the bracket in the conventional syntax. A small loop example:
Code:
        lda #$00                 00 lda.#
        tax                         tax
loop1:                        -> loop1
        sta $1000,x            1000 sta.x
        dex                         dex
        bne loop1             loop1 bne
Operand comes before opcode as usual in Forth; formatting aligns the opcode body in a "column". The alignment is the part that takes the most getting used to so far -- at some point I'll bite the bullet and set up the correct vi functions to automate this. Note that "lda.#" doesn't violate the "no uppercase" rule because it is lowercase on a German keyboard. YKMV.
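To make the "operand first, then opcode word" idea concrete, here is a minimal gforth sketch of how such mnemonics could be built with CREATE ... DOES>. It is not the code from tasm65c02: the names image, lc, byte,, addr,, op1, op2, op3 and oprel are all made up for this illustration, the location counter simply starts at 0, and only the five opcodes needed for the loop above are defined. Redefining "->" may produce a "redefined" warning in some gforth versions.

```forth
\ Hypothetical sketch; none of these helper names come from tasm65c02.
hex
create image 100 allot          \ staging area for the machine code
variable lc  0 lc !             \ location counter, starts at address 0

: byte,  ( c -- )  image lc @ + c!  1 lc +! ;          \ append one byte
: addr,  ( u -- )  dup 0ff and byte,  8 rshift byte, ; \ 16 bits, low byte first

\ Defining words: store the opcode byte, and build a mnemonic that
\ emits it followed by the right number of operand bytes.
: op1    ( opc "name" -- )  create c,  does> c@ byte, ;
: op2    ( opc "name" -- )  create c,  does> c@ byte, byte, ;
: op3    ( opc "name" -- )  create c,  does> c@ byte, addr, ;
\ Branches take a target address and emit the signed distance from
\ the address of the instruction that follows the branch.
: oprel  ( opc "name" -- )  create c,  does> c@ byte,  lc @ 1+ -  0ff and byte, ;

: ->  ( "name" -- )  lc @ constant ;   \ a backward label is just a constant

0a9 op2 lda.#    0aa op1 tax    0ca op1 dex
09d op3 sta.x    0d0 oprel bne

\ The loop from the text, in Typist's syntax:
        00 lda.#
           tax
-> loop1
      1000 sta.x
           dex
     loop1 bne

image lc @ dump
```

With these definitions the dump should show the nine bytes A9 00 AA 9D 00 10 CA D0 FA. The branch offset FA is -6, the distance from the address after the bne (9) back to loop1 (3).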
Labels
Forth assemblers have historically tried to avoid labels because of the space thing. As Brad Rodriguez puts it in his (absolutely invaluable, don't-try-this-at-home-without-it) articles on Forth assemblers (http://www.bradrodriguez.com/papers/tcjassem.txt):
Quote:
(F)orth assemblers favor label-free, structured assembly code for a pragmatic reason: in Forth, it's simpler to create assembler structures than labels! The structures commonly included in Forth assemblers are intended to resemble the programming structures of high-level Forth.
Except that I really like labels. It turns out that "backward references" (like the one in the loop above) are trivial, but that for "single-pass assemblers" like this one, "forward references" are a major pain. I actually ended up reading a book on this (Assemblers and Loaders, David Salomon 1993, http://www.davidsalomon.name/assem.adve ... semAd.html) and then figuring out how to do single-linked lists in Forth.
That part was not quite as trivial.
The result is not optimal, but this is as far as I can go with my current skill level. Backward references just get the label introduced with "->" as above. Forward references have to distinguish between jumps and branches. They are prefixed with a special command, either "j>" or "b>" (yes, those are upper case, but they are the easiest to see).
Code:
j> frog jsr
        nop
b> dogs bra
        nop
-> dogs
        brk
-> frog
        inc
   dogs bra
(Yes, I know that code won't run.) Note that once the label is defined, we revert to "normal" use of labels. What happens under the hood is that each j> and b> adds an entry in that label's list and a dummy value in the assembled machine code. When the label is reached, that list is unwound, the dummy values are replaced, and the definition is replaced by a simple, new one. I tried to write the code in such a way that expanding the system for 65816 references will hopefully be fairly easy.
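The list-and-patch mechanism described above can be sketched in a few lines of gforth. This is a simplified illustration under stated assumptions, not the code in tasm65c02: the words pending and resolve are hypothetical (here a forward label has to be declared with pending first, and resolve stands in for what "->" does for a forward label), only the j> case with a 16-bit absolute operand is handled, and the helper names image, lc, byte,, addr, and patch are made up as well. The final constant definition will typically print a "redefined" warning, which is the point: the list head is shadowed by a plain value.

```forth
\ Hypothetical sketch of forward-reference backpatching with a linked list.
hex
create image 100 allot          \ staging area for the machine code
variable lc  0 lc !             \ location counter, starts at address 0

: byte,  ( c -- )  image lc @ + c!  1 lc +! ;
: addr,  ( u -- )  dup 0ff and byte,  8 rshift byte, ;
: jsr    ( addr -- )  20 byte, addr, ;
: nop    ( -- )  0ea byte, ;

\ An unresolved label is one cell: the head of a list of nodes.
\ Each node is two cells: link to the next node, image offset to patch.
: pending  ( "name" -- )  create 0 , ;

: j>  ( "name" -- dummy )
   ' >body                \ address of the label's list head
   here  over @ ,         \ new node, link field = old head
   lc @ 1+ ,              \ site field = where the operand will land
   swap !                 \ the new node becomes the head
   0 ;                    \ dummy operand for the opcode word to emit

\ Overwrite the two operand bytes at image offset "site" with the
\ current location, low byte first.
: patch  ( site -- )
   image +  lc @ 0ff and over c!  lc @ 8 rshift swap 1+ c! ;

: resolve  ( "name" -- )
   >in @  ' >body @                        \ save parse position, get head
   begin ?dup while  dup cell+ @ patch  @ repeat
   >in !                                   \ rewind to parse the name again
   lc @ constant ;                         \ shadow the list with a constant

pending frog
j> frog jsr        \ emits 20 00 00 for now
        nop
resolve frog       \ frog is now 4; the operand above gets patched
   frog jsr        \ ordinary backward reference from here on

image lc @ dump
```

After this runs, the dump should show 20 04 00 EA 20 04 00. A b> counterpart would work the same way but patch a single relative-offset byte instead of a two-byte address.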
Usage
In gforth, you load the program and then the file you want to assemble.
Code:
include tasm65c02.fs
include example.fs
Note the .fs file type -- as far as the machine is concerned, this is a Forth file, not assembler. If everything goes right, you end up with ( addr u ) as the location and size of the machine code on the stack. DUMP will print this on the screen, and there is a SAVE <filename> assembler command you can use. There are a few more commands; see MANUAL.txt and the files example.fs and rom.fs for details.
Problems
The syntax breaks the standard for just about every assembler ever, and the "column" alignment can be fussy without editor support (pending). The forward reference code in its current state is inelegant. There is currently no way to include external assembler files. The program is fairly large for what it does (13 kb with comments), and relies on the hardware for speed. The program uses some gforth-specific Forth words such as NEXTNAME that are not available in ANSI Forth.
Conclusion
Except for the part with the forward references and list structures (which I assume real computer people learn in college), writing the assembler with this brute-force approach was in fact downright trivial, just as advertised. I can see how somebody who actually knows what he or she is doing can write one in a few hours. I would strongly recommend that anybody who is interested in Forth try this themselves, even if it is only for the "wow" effect. Though I've just started dogfooding the Typist's Assembler, I can already say that programming 65c02 assembler in Forth (which is what this amounts to) is a whole different game. Normal "macro" functions seem primitive, and having to put all the commands on separate lines feels restricting. My appreciation for Forth has grown enormously.
(Thanks again to Brad for the great article which gave me the push to try this, even if I ended up doing things differently.)