secamline wrote:
Ok, thanks!
Code:
lda label
...
label:
...
If you want a two-pass assembler, the way to deal with this is to assign values to labels during the first pass. If you see a label being used that hasn't been defined yet, assume that it's 16-bit.
There's another catch:
Code:
lda label
...
label = 10
On the first pass, the LDA doesn't know what label is and assumes it's 16-bit. On the second, it's known to be small. The assembler might be tempted to use the zero page version of LDA here, but that would make the instruction one byte shorter and could change the value of every label after it, invalidating everything that has been done so far. The assembler has to remember that the label had been assumed to be 16-bit and stick with that assumption after its value is discovered.
Your method of replacing the text of a label with the text of its value could make this difficult. My previous assemblers always had a struct for each label, with flags carrying information like this: is the value known? was the value assumed to be 16-bit? On the first pass, a use of an undefined label creates a struct with the value marked as unknown. When the label gets defined later, that struct gets updated instead of a new one being created. On the second pass, all labels that are used must have known values; using one that's undefined or unknown is an error.
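A minimal Python sketch of that kind of label record, to make the flags concrete. All the names here (Label, SymbolTable, use, define, lookup_pass2, operand_size) are made up for illustration; this is not code from any of those assemblers, just one way the idea could look.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Label:
    name: str
    value: Optional[int] = None   # None means "value not known yet"
    assumed_16bit: bool = False   # set when used before its value was known


class SymbolTable:
    def __init__(self) -> None:
        self.labels: Dict[str, Label] = {}

    def use(self, name: str) -> Label:
        """Pass 1: reference a label, creating an 'unknown' record if needed."""
        label = self.labels.setdefault(name, Label(name))
        if label.value is None:
            # Forward reference: commit to a 16-bit operand from now on.
            label.assumed_16bit = True
        return label

    def define(self, name: str, value: int) -> Label:
        """Fill in the existing record instead of creating a second one."""
        label = self.labels.setdefault(name, Label(name))
        label.value = value
        return label

    def lookup_pass2(self, name: str) -> Label:
        """Pass 2: every label that is used must have a known value."""
        label = self.labels.get(name)
        if label is None or label.value is None:
            raise KeyError(f"undefined label: {name}")
        return label


def operand_size(label: Label) -> int:
    """Operand size in bytes: 1 for zero page, 2 for absolute."""
    if label.assumed_16bit or label.value is None or label.value > 0xFF:
        return 2   # stick with the pass 1 assumption even if the value is small
    return 1
```

With this, `lda label` followed later by `label = 10` still gets a two-byte operand on the second pass, because the record remembers that it was assumed to be 16-bit.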
My current assembler has to take a more complicated approach, which Garth described earlier in the thread: allow instructions to change size when more information becomes available, and keep repeating passes until things stop changing. I had to do it this way, as this assembler is for an extension of the 6502 that has branches with one- and two-byte offsets. You really want short forward branches to use the small offset, and it's not possible to insist that the destination label be defined before the instruction that uses it.
It's possible to set up a situation where these changes oscillate and the assembler never reaches stability. To avoid that, each decision can be changed in only one direction: I start by assuming that every operand is one byte, and if a pass sees a value that won't fit in a byte, that instruction gets switched to a two-byte operand. It can never switch back. That has worked well.
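Roughly, the grow-only scheme can be sketched like this in Python. The instruction model is a made-up simplification and not the real encoding of that CPU: a branch is assumed to be one opcode byte plus a one- or two-byte offset, the short offset is assumed to be a signed byte measured from the end of the short form, and the `relax` function and the "label" / "fill" / "branch" item kinds are names invented for the example.

```python
def relax(program):
    """Repeat layout passes until no branch needs to grow.

    program is a list of (kind, arg) tuples:
      ("label", name)   defines name at the current address
      ("fill", count)   occupies count bytes (stands in for other code)
      ("branch", name)  branch to name, offset of 1 or 2 bytes
    Returns (wide, addresses, passes).
    """
    wide = [False] * len(program)   # every operand starts out as one byte
    passes = 0
    while True:
        passes += 1
        # Lay everything out using the current size decisions.
        addresses, starts, pc = {}, [], 0
        for i, (kind, arg) in enumerate(program):
            starts.append(pc)
            if kind == "label":
                addresses[arg] = pc
            elif kind == "fill":
                pc += arg
            else:  # branch: opcode byte plus offset bytes
                pc += 1 + (2 if wide[i] else 1)
        # Widen branches that no longer fit. Nothing is ever narrowed,
        # so the sizes cannot oscillate and the loop has to terminate.
        changed = False
        for i, (kind, arg) in enumerate(program):
            if kind != "branch" or wide[i]:
                continue
            offset = addresses[arg] - (starts[i] + 2)
            if not -128 <= offset <= 127:
                wide[i] = True
                changed = True
        if not changed:
            return wide, addresses, passes


program = [
    ("branch", "mid"),   # only just fits in a byte at first
    ("fill", 125),
    ("branch", "end"),   # too far, grows on the first pass
    ("label", "mid"),
    ("fill", 200),
    ("label", "end"),
]
wide, addresses, passes = relax(program)
```

In this example the second branch grows on the first pass, which pushes `mid` one byte further away and forces the first branch to grow on the next pass. A third pass changes nothing, so the loop stops. Since a branch can only grow once, the number of passes is bounded by the number of branches plus one.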
But as I said, I only did it this way because I had to. A simpler assembler for a simpler CPU can get away with two passes, insisting that the label has to be defined first if you want zero page.