Fixed it, thanks again for the help. Not happy with the code, though, because this is how it works now:
The assembler source code is actually a Forth program, and as such single-pass by definition. This is fine when we define labels before we use them:
Code:
-> mylabel
Here "mylabel" is simply a variable with the right address, which "->" defines (if you're new to Forth, yes, that's a legal instruction, because "special characters" aren't special). But when we use an undefined forward reference, the Forth interpreter has no way of knowing that "mylabel" is a label, and throws an error when it can't find an instruction with that name in the dictionary.
So we need a way to tell the assembler that this is a label whose address we don't know yet, and to have it reserve some space for it, depending on which instruction we're using. The classic single-pass assembler way of doing this is with linked lists, one for each unknown label, that are resolved later when the label is finally defined.
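As a rough illustration of that list-per-label idea, here is a toy model in Python rather than Forth. It is not code from the assembler: every name in it is made up for the example, the origin is assumed to be address 0, and only a 16-bit absolute JMP is handled.

```python
# Toy single-pass assembler state; all names are hypothetical.
code = bytearray()   # assembled output, origin assumed to be address 0
labels = {}          # label name -> resolved address
pending = {}         # label name -> positions of 16-bit gaps still to patch

def jmp(name):
    """Emit JMP absolute ($4C) to a label that may not be defined yet."""
    code.append(0x4C)
    if name in labels:
        code.extend(labels[name].to_bytes(2, "little"))
    else:
        pending.setdefault(name, []).append(len(code))
        code.extend(b"\x00\x00")      # dummy operand, fixed up later

def define(name):
    """Model of "->": bind the label here and resolve every waiting reference."""
    addr = len(code)
    labels[name] = addr
    for gap in pending.pop(name, []):
        code[gap:gap + 2] = addr.to_bytes(2, "little")

jmp("bottomlink")      # forward reference, operand unknown for now
code.append(0xEA)      # NOP
define("bottomlink")   # the label lands at address 4, the gap gets patched
code.append(0xEA)      # NOP
```

Once the label is defined its list is dropped, so any later reference to it takes the ordinary "already known" path.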
Right now, this works by having a special instruction for every kind of branch or jump.
Code:
b> bottomlink bra
j> bottomlink jmp
j> bottomlink jsr
jl> bottomlink jmp.l \ JML
jl> bottomlink jsr.l \ JSL
bl> bottomlink bra.l \ BRL
-> bottomlink nop
For instance, B> triggers a common routine ("addlabel") that either creates a new list of unresolved references or adds a new link to an existing one. It also saves a link to the word ("subroutine") that will later replace a dummy value in the list with the real address ("dummy>rel" in this case). The BRA instruction saves its own opcode and reserves one byte for the offset, while JMP.L (JML) reserves three bytes along with its opcode. When "->" finally comes around with the actual label, each dummy replacement word is triggered and the address or offset is calculated.
(Actually, the assembler currently doesn't save a link to the replacement routines but an offset into a jump table with the addresses of those routines, which is an extra level of indirection that obviously has to go but was useful while I was figuring out what the hell I was doing.)
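A sketch of that mechanism, again as a Python model and not the real Forth code: only the names "addlabel" and "dummy>rel" come from the assembler, and their actual internals aren't shown here, so everything below is an assumption about the general shape. Each link stores the gap position together with the routine that will later fill it.

```python
code = bytearray()   # origin assumed to be address 0, bank 0
labels = {}
unresolved = {}      # label name -> list of (gap position, patch routine)

def dummy_to_rel(gap, target):
    # 8-bit offset, counted from the byte after the one-byte operand
    offset = target - (gap + 1)
    if not -128 <= offset <= 127:
        raise ValueError("branch out of range")
    code[gap] = offset & 0xFF

def dummy_to_long(gap, target):
    # 24-bit absolute address, little-endian
    code[gap:gap + 3] = target.to_bytes(3, "little")

def addlabel(name, patch, size):
    # start a new list or extend the existing one, then leave the gap
    unresolved.setdefault(name, []).append((len(code), patch))
    code.extend(bytes(size))

def bra_fwd(name):      # stands in for: b> name bra
    code.append(0x80)
    addlabel(name, dummy_to_rel, 1)

def jml_fwd(name):      # stands in for: jl> name jmp.l
    code.append(0x5C)
    addlabel(name, dummy_to_long, 3)

def define(name):       # stands in for: -> name
    target = len(code)
    labels[name] = target
    for gap, patch in unresolved.pop(name, []):
        patch(gap, target)

bra_fwd("bottomlink")
jml_fwd("bottomlink")
define("bottomlink")
code.append(0xEA)       # nop
```

Note how the kind of reference is stated twice in this model as well, once by picking the function and once by the opcode it emits.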
These "label prefixes" were only mildly annoying when it was just B> and J> for the 65c02. Now that we have the 65816 with JL> and BL> as well, this is getting to be a bit much. Also, we're defining the type of branch or jump twice: BRA.L (BRL) should know that it is a long branch without any "BL>". The current setup violates DRY - "don't repeat yourself".
So I'm considering a different solution. First, add a "reserve this label" instruction so that the assembler knows the next name is a label that is going to be used later.
Code:
reserve bottomlink
bottomlink bra
bottomlink jmp
bottomlink jsr
bottomlink jmp.l \ JML
bottomlink jsr.l \ JSL
bottomlink bra.l \ BRL
-> bottomlink nop
(PRELABEL, ANTICIPATE, AWAIT or STASH might be better than RESERVE -- EXPECT is used by Gforth, unfortunately -- but those are details.) This will also get rid of the pesky ">" character, which is a pain to type. Then, when it comes time to replace the dummy values with the actual offset or address, have the assembler go back one byte from the gap we left and re-examine the opcode to find out what kind of branch/jump instruction this is. If it's $82, we have BRA.L (BRL), and we need to call the routine that calculates a 16-bit offset; if it's $4C, we're dealing with a vanilla jump instruction and just store the 16-bit address. We can do this without a second pass.
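To make the opcode lookup concrete, here is how it might look in a small Python model (not Forth, and not the assembler's code; the function names are hypothetical, the opcode values are the standard 65816 ones, and the origin is assumed to be address 0). The list now holds nothing but gap positions, and the byte just before each gap decides how it gets patched.

```python
# opcode -> (operand size in bytes, True if the operand is a relative offset)
OPERAND = {
    0x80: (1, True),    # BRA
    0x82: (2, True),    # BRA.L (BRL)
    0x4C: (2, False),   # JMP
    0x20: (2, False),   # JSR
    0x5C: (3, False),   # JMP.L (JML)
    0x22: (3, False),   # JSR.L (JSL)
}

code = bytearray()
reserved = {}    # label name -> gap positions waiting for the address

def reserve(name):
    """Model of RESERVE: announce a label that will be defined later."""
    reserved[name] = []

def emit(opcode, name):
    """Emit a branch/jump to a reserved label, leaving a zeroed gap."""
    size, _ = OPERAND[opcode]
    code.append(opcode)
    reserved[name].append(len(code))
    code.extend(bytes(size))

def define(name):
    """Model of "->": look one byte back from each gap to pick the fixup."""
    target = len(code)
    for gap in reserved.pop(name):
        size, relative = OPERAND[code[gap - 1]]
        # relative offsets count from the first byte after the operand
        value = target - (gap + size) if relative else target
        code[gap:gap + size] = value.to_bytes(size, "little", signed=relative)

reserve("bottomlink")
emit(0x82, "bottomlink")   # bottomlink bra.l
emit(0x4C, "bottomlink")   # bottomlink jmp
define("bottomlink")       # -> bottomlink
code.append(0xEA)          # nop
```

The table is the only place that knows operand sizes and addressing kinds, which is the DRY point: the instruction no longer needs a prefix to repeat what its opcode already says. The sketch leaves out labels that are already defined and range-error reporting beyond what `to_bytes` raises on its own.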
I'm sure there's a more hardcore Forth way of doing this -- maybe when a word is not found in the dictionary, interrupt the error sequence, check if the next word is something like BRA, BRA.L, or JMP, and if yes, define it as a new label, for instance. This seems a bit too clever at the moment for my skill level, though. If anybody sees a different solution, I'd be most grateful.