Questions about 6502 assemblers

BigEd · Post by **BigEd** » Thu Sep 10, 2020 2:07 pm

Oh, I see. Yes, if you implement your assembler as a series of transformations, it might well make sense to transform all values into decimal strings. (There's an amusingly extreme historical precedent, IBM's 63-pass Fortran compiler.)

secamline · Post by **secamline** » Thu Sep 10, 2020 7:11 pm

Ok, thanks!
I just realized that I may encounter some problems with the labels.
Let's say I'm trying to assemble this piece of code:

Code: Select all

lda label
...
label:
...

lda could take either a 1-byte or a 2-byte operand, depending on whether label is a zero-page address or not.
I have to determine the address of "label" to know the length of the instruction. But to do so, I have to know the length of the instruction.
It's kind of problematic.

drogon · Post by **drogon** » Thu Sep 10, 2020 7:33 pm

secamline wrote:

Ok, thanks!
I just realized that I may encounter some problems with the labels.
Let's say I'm trying to assemble this piece of code:

Code: Select all

lda label
...
label:
...

lda could take either a 1-byte or a 2-byte operand, depending on whether label is a zero-page address or not.
I have to determine the address of "label" to know the length of the instruction. But to do so, I have to know the length of the instruction.
It's kind of problematic.

It's an interesting and very "edge" case though - and even ca65 gives up on this:

Code: Select all

000000r 1               
000000r 1               	.org	$FC
0000FC  1  AD FF 00     	lda	label
0000FF  1  00           label:	.byte	0
000100  1               
000100  1               	.org	$FD
0000FD  1  AD 00 01     	lda	label2
000100  1  00           label2:	.byte	0
000100  1

However it did print a warning:

Code: Select all

foo.s(3): Warning: Didn't use zeropage addressing for 'label'

This makes me think that it did 2 passes and then a 'sanity check', but didn't follow up with a 3rd pass for a final resolution.

-Gordon

GARTHWILSON · Post by **GARTHWILSON** » Thu Sep 10, 2020 8:11 pm

When you write the code, you'll normally put ZP assignments before the code that uses them; so that code will know, on first pass, that they're only 8-bit addresses.

secamline · Post by **secamline** » Thu Sep 10, 2020 8:33 pm

So if an instruction other than a branching condition happens to contain a label, I can just treat it as an absolute address. What a relief!
By the way, I've already started rewriting the code from scratch. I replaced the regex dictionary with a new "Instruction" class that parses the opcode, the operand (or the label if we only want to get the instruction's size) and the addressing mode (represented by an arbitrary numerical constant). Then I guess I can simply use the opcode and the addressing mode to fetch the actual instruction number in a list or a dictionary.
I hope I've done things the right way this time.

GARTHWILSON · Post by **GARTHWILSON** » Thu Sep 10, 2020 10:42 pm

secamline wrote:

So if an instruction other than a branching condition happens to contain a label, I can just treat it as an absolute address. What a relief!

Yes, the vast majority of the cases should be pretty simple in that regard. If there really is no way for the assembler to know, on first pass, whether to do the absolute or ZP version, and it happens to get it wrong, it will find phase errors when it does the second pass, and it should do a third pass to resolve them (or as many passes as necessary).

Quote:

By the way, I've already started rewriting the code from scratch. I replaced the regex dictionary with a new "Instruction" class that parses the opcode, the operand (or the label if we only want to get the instruction's size) and the addressing mode (represented by an arbitrary numerical constant). Then I guess I can simply use the opcode and the addressing mode to fetch the actual instruction number in a list or a dictionary.

I'm not sure what you mean here. The op code is the two-digit hex number that identifies both the instruction and the addressing mode. Maybe you meant "parses the mnemonic," rather than the op code? The mnemonic is the three-letter abbreviation for the instruction. (Other processors use mnemonics of other lengths, or even of varying lengths, like PIC's BSF, CALL, MOVLW, CLRWDT, etc.) The parser will need to be able to figure out the addressing mode from things like # (for immediate), parentheses (for indirects), and a comma (for indexing). It will also need to handle operands that have to be computed, such as adding to an array's base address:

Code: Select all

        LDA  table + $10

where it needs to get the address of the table and then add 10 (hex) to it to get the operand; or

Code: Select all

        LDY  # SomeAddress & $FF

where it needs to get the low byte of an address.
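A minimal sketch of that syntax-driven step, in Python (the thread never says which language secamline's assembler is written in, so Python is an assumption). The function name `classify_operand` and the mode names are hypothetical. It looks only at punctuation and leaves the zero-page versus absolute choice for later, once the operand's value is known.

```python
import re


def classify_operand(operand: str) -> str:
    """Guess the addressing-mode family from operand punctuation alone."""
    text = operand.strip()
    if text == "":
        return "implied"                 # e.g. CLC, RTS
    if text.upper() == "A":
        return "accumulator"             # e.g. ASL A
    if text.startswith("#"):
        return "immediate"               # e.g. LDY # SomeAddress & $FF
    if re.fullmatch(r"\(.+,\s*[xX]\s*\)", text):
        return "indexed_indirect"        # e.g. LDA ($20,X)
    if re.fullmatch(r"\(.+\)\s*,\s*[yY]", text):
        return "indirect_indexed"        # e.g. LDA ($20),Y
    if re.fullmatch(r"\(.+\)", text):
        return "indirect"                # e.g. JMP ($FFFC)
    if re.search(r",\s*[xX]$", text):
        return "indexed_x"               # e.g. LDA table,X
    if re.search(r",\s*[yY]$", text):
        return "indexed_y"               # e.g. LDA table,Y
    return "direct"                      # zero page or absolute, decided later
```

With this, `# SomeAddress & $FF` classifies as immediate and `table + $10` as direct. Evaluating the expressions themselves is a separate step.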

It will also need to be able to figure out assembler directives, like EQUates, ORiGin (i.e., what address the subsequent code will start at), laying down strings or data bytes or words, reserving a specified number of bytes for data not yet known, etc.
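As a rough illustration of the data-laying directives, here is a Python sketch that builds a sparse memory image. The directive names `org`, `db` and `dw` are the ones secamline mentions later in the thread; `ds` (reserve bytes) and the function name `run_directives` are hypothetical, and operands are assumed to be already evaluated to integers.

```python
from typing import Dict, List, Tuple


def run_directives(lines: List[Tuple[str, List[int]]]) -> Dict[int, int]:
    """Return a sparse memory image {address: byte} for org/db/dw/ds lines."""
    image: Dict[int, int] = {}
    pc = 0                                   # current assembly address
    for name, args in lines:
        name = name.lower()
        if name == "org":
            pc = args[0]                     # later output starts here
        elif name == "db":
            for value in args:
                image[pc] = value & 0xFF     # one byte per argument
                pc += 1
        elif name == "dw":
            for value in args:
                image[pc] = value & 0xFF             # low byte first (little-endian)
                image[pc + 1] = (value >> 8) & 0xFF  # then the high byte
                pc += 2
        elif name == "ds":
            pc += args[0]                    # reserve space, emit nothing
        else:
            raise ValueError(f"unknown directive: {name}")
    return image
```

For example, `run_directives([("org", [0xFFFC]), ("dw", [0x8000])])` places `$00` at `$FFFC` and `$80` at `$FFFD`, which is how a reset vector pointing at `$8000` would be stored.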

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Sep 11, 2020 3:37 am

secamline wrote:

Ok, thanks!
I just realized that I may encounter some problems with the labels.
Let's say I'm trying to assemble this piece of code:

Code: Select all

lda label
...
label:
...

lda could take either a 1-byte or a 2-byte operand, depending on whether label is a zero-page address or not.
I have to determine the address of "label" to know the length of the instruction. But to do so, I have to know the length of the instruction.
It's kind of problematic.

The usual advice is for the programmer to declare zero page locations before writing any instructions that reference them. Otherwise, almost all assemblers take the safe path and assume the location is an absolute address.

John West · Post by **John West** » Fri Sep 11, 2020 8:58 am

secamline wrote:

Ok, thanks!

Code: Select all

lda label
...
label:
...

If you want a two pass assembler, the way to deal with this is to assign values to labels during the first pass. If you see a label that hasn't been defined yet being used, assume that it's 16 bit.

There's another catch:

Code: Select all

    lda label
    ...
    label = 10

On the first pass, the LDA doesn't know what label is and assumes it's 16 bit. On the second, it's known to be small. The assembler might be tempted to use the zero page version of LDA here, but that would potentially change the value of other labels, invalidating everything that has been done so far. The assembler has to remember that the label had been assumed to be 16 bit and stick with that assumption after its value is discovered.

Your method of replacing the text of a label with the text of its value could make this difficult. My previous assemblers always have a struct for labels, with flags carrying information like this: is the value known? was the value assumed to be 16 bit? On the first pass, a use of an undefined label will create a struct with the value marked as unknown. When it gets defined later, that struct gets updated instead of creating a new one. On the second pass, all labels that are used must have known values; using one that's undefined or unknown is an error.
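The label record described above might look something like this in Python. This is only a sketch under assumptions: the names `Label`, `SymbolTable`, `use`, `define` and `operand_size` are invented for illustration and are not taken from any assembler in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Label:
    name: str
    value: Optional[int] = None    # None means "not known yet"
    assumed_16bit: bool = False    # set when used before being defined


class SymbolTable:
    def __init__(self) -> None:
        self.labels: Dict[str, Label] = {}

    def use(self, name: str) -> Label:
        """Look a label up; a use before definition creates an 'unknown' record."""
        label = self.labels.get(name)
        if label is None:
            label = Label(name, assumed_16bit=True)
            self.labels[name] = label
        return label

    def define(self, name: str, value: int) -> None:
        """Record a value, updating an existing record rather than replacing it."""
        label = self.labels.get(name)
        if label is None:
            self.labels[name] = Label(name, value=value)
        else:
            label.value = value    # the assumed_16bit flag is kept as it was

    def operand_size(self, name: str) -> int:
        """Operand bytes: 2 if the label was ever assumed 16 bit, or is large."""
        label = self.use(name)
        if label.assumed_16bit or label.value is None or label.value > 0xFF:
            return 2
        return 1
```

Because `define` never clears the flag, `lda label` followed later by `label = 10` keeps its two-byte operand on the second pass, while a label defined before its first use can still get the zero-page form.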

My current assembler has to take a more complicated approach, which Garth described earlier in the thread: allow instructions to change size when more information becomes available, and keep repeating passes until things stop changing. I had to do it this way, as this assembler is for an extension of the 6502 that has branches with one and two byte offsets. You really want short forward branches to use the small offset, and it's not possible to insist that the destination label be defined before the instruction that uses it.

It's possible to set up a situation where these changes oscillate and it will never reach stability. To avoid that, each decision can be changed in only one direction: I start by assuming that every operand is one byte, and if one pass sees a value that won't fit in a byte, that instruction gets switched to a two byte operand. But it can never switch back. That has worked well.
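A toy Python sketch of that grow-only rule, not John West's actual code. It assumes every instruction is one opcode byte plus a one- or two-byte operand, and that the program is a list of hypothetical `("label", name)` and `("op", name)` tuples, where an op simply references a label.

```python
def relax(program, origin=0):
    """Repeat passes until sizes stop changing; operands may only grow."""
    wide = [False] * len(program)          # start by assuming 1-byte operands
    passes = 0
    while True:
        passes += 1
        # Lay the program out using the sizes decided so far.
        addresses = {}
        pc = origin
        for i, (kind, name) in enumerate(program):
            if kind == "label":
                addresses[name] = pc
            else:
                pc += 3 if wide[i] else 2  # opcode byte + operand bytes
        # Widen any operand whose value no longer fits in a byte.
        changed = False
        for i, (kind, name) in enumerate(program):
            if kind == "op" and not wide[i] and addresses[name] > 0xFF:
                wide[i] = True             # one-way switch: never narrowed again
                changed = True
        if not changed:
            return addresses, wide, passes
```

Each pass either widens at least one operand or stops, so the number of passes is bounded by the number of instructions. With a single `lda label` at origin `$FE` immediately followed by `label:`, the first pass puts the label at `$100`, the operand is widened, and the second pass settles with the label at `$101`.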

But as I said, I only did it this way because I had to. A simpler assembler for a simpler CPU can get away with two passes, insisting that the label has to be defined first if you want zero page.

Druzyek · Post by **Druzyek** » Fri Sep 11, 2020 1:00 pm

Quote:

It's possible to set up a situation where these changes oscillate and it will never reach stability. To avoid that, each decision can be changed in only one direction: I start by assuming that every operand is one byte, and if one pass sees a value that won't fit in a byte, that instruction gets switched to a two byte operand. But it can never switch back. That has worked well.

Another idea is to allow an operand to change from one byte to two bytes then back to one byte as needed but limit the number of passes the assembler can do. If it reaches 10 passes or however many you're willing to wait and the operand is still oscillating, you could fix it at 2 bytes.
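Here is one way the pass-limit idea could be sketched in Python, under the same toy model of one opcode byte plus a one- or two-byte operand. The names `layout` and `settle` are hypothetical, and each op carries a function that computes its operand value from the label addresses. Once the cap is hit, this sketch pins the undecided operands at two bytes and then only lets sizes grow, which is one possible reading of "fix it at 2 bytes".

```python
def layout(program, wide, origin):
    """Assign an address to every label using the current operand sizes."""
    addresses, pc = {}, origin
    for (kind, arg), is_wide in zip(program, wide):
        if kind == "label":
            addresses[arg] = pc
        else:
            pc += 3 if is_wide else 2      # opcode byte + operand bytes
    return addresses


def settle(program, origin=0, max_passes=10):
    """Resize freely for up to max_passes passes, then allow growth only."""
    wide = [False] * len(program)
    previous = wide
    for _ in range(max_passes):
        addresses = layout(program, wide, origin)
        wanted = [kind == "op" and arg(addresses) > 0xFF for kind, arg in program]
        if wanted == wide:
            return wide, addresses         # sizes are stable
        previous, wide = wide, wanted
    # Cap reached: pin at two bytes anything either of the last two passes widened.
    wide = [a or b for a, b in zip(previous, wide)]
    while True:
        addresses = layout(program, wide, origin)
        grown = [w or (kind == "op" and arg(addresses) > 0xFF)
                 for w, (kind, arg) in zip(wide, program)]
        if grown == wide:
            return wide, addresses
        wide = grown
```

A contrived case that oscillates is `lda $102 - end` directly followed by `end:` at origin 0: with a one-byte operand the value is `$100`, with a two-byte operand it is `$FF`. Written as `[("op", lambda a: 0x102 - a["end"]), ("label", "end")]`, it never settles on its own and ends up pinned at two bytes.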

secamline · Post by **secamline** » Sun Sep 13, 2020 4:10 pm

So far I'm back to where I stopped when using regex: the assembler works with all addressing modes and with decimal, hex, and binary values, and the labels are resolved correctly. You've all submitted many ideas, so where should I start? I guess I should implement the basic "org", "db" and "dw" pseudo-instructions.
