All times are UTC




PostPosted: Thu Sep 10, 2020 2:07 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10802
Location: England
Oh, I see. Yes, if you implement your assembler as a series of transformations, it might well make sense to transform all values into decimal strings. (There's an amusingly extreme historical precedent, IBM's 63-pass Fortran compiler.)


PostPosted: Thu Sep 10, 2020 7:11 pm 

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
Ok, thanks!
I just realized that I may encounter some problems with the labels.
Let's say I'm trying to assemble this piece of code:
Code:
lda label
...
label:
...

lda could take either a 1-byte or a 2-byte operand (making it a 2-byte or a 3-byte instruction), depending on whether label is a zero-page address or not.
I have to determine the address of "label" to know the length of the instruction. But to do so, I have to know the length of the instruction.
It's kind of problematic.


PostPosted: Thu Sep 10, 2020 7:33 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1413
Location: Scotland
secamline wrote:
Ok, thanks!
I just realized that I may encounter some problems with the labels.
Let's say I'm trying to assemble this piece of code:
Code:
lda label
...
label:
...

lda could take either a 1-byte or a 2-byte operand (making it a 2-byte or a 3-byte instruction), depending on whether label is a zero-page address or not.
I have to determine the address of "label" to know the length of the instruction. But to do so, I have to know the length of the instruction.
It's kind of problematic.


It's an interesting and very "edge" case though - and even ca65 gives up on this:

Code:
000000r 1               
000000r 1                  .org   $FC
0000FC  1  AD FF 00        lda   label
0000FF  1  00           label:   .byte   0
000100  1               
000100  1                  .org   $FD
0000FD  1  AD 00 01        lda   label2
000100  1  00           label2:   .byte   0
000100  1               


However it did print a warning:

Code:
foo.s(3): Warning: Didn't use zeropage addressing for 'label'


Which makes me think that it did 2 passes and then a 'sanity check', but didn't do a 3rd pass for a final resolution.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Thu Sep 10, 2020 8:11 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8440
Location: Southern California
When you write the code, you'll normally put ZP assignments before the code that uses them; so that code will know, on first pass, that they're only 8-bit addresses.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Sep 10, 2020 8:33 pm 

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
So if an instruction other than a branch instruction happens to contain a label, I can just treat it as an absolute address. What a relief!
By the way, I've already started rewriting the code from scratch. I replaced the regex dictionary with a new "Instruction" class that parses the opcode, the operand (or the label if we only want to get the instruction's size) and the addressing mode (represented by an arbitrary numerical constant). Then I guess I can simply use the opcode and the addressing mode to fetch the actual instruction number in a list or a dictionary.
I hope I've done things the right way this time.


PostPosted: Thu Sep 10, 2020 10:42 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8440
Location: Southern California
secamline wrote:
So if an instruction other than a branch instruction happens to contain a label, I can just treat it as an absolute address. What a relief!

Yes, the vast majority of the cases should be pretty simple in that regard. If there really is no way for the assembler to know, on first pass, whether to do the absolute or ZP version, and it happens to get it wrong, it will find phase errors when it does the second pass, and it should do a third pass to resolve them (or as many passes as necessary).
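The "repeat until there are no phase errors" idea can be sketched in a few lines of Python. This is only an illustration under assumptions that are not from the thread: `toy_pass`, `find_phase_errors` and `assemble_until_stable` are hypothetical names, and the toy program format knows only labels and a direct-mode `lda` that takes 2 bytes when its operand is already known to be in zero page and 3 bytes otherwise.

```python
def toy_pass(program, origin, known):
    """One pass over a toy program; returns the label addresses it computed.

    program is a list of ("label", name) or ("lda", name) tuples (hypothetical format).
    known holds the label addresses found by the previous pass.
    """
    pc = origin
    symbols = {}
    for kind, name in program:
        if kind == "label":
            symbols[name] = pc
        else:
            # lda: 2 bytes if the operand is already known to be in zero page, else 3.
            value = known.get(name)
            pc += 2 if value is not None and value < 0x100 else 3
    return symbols


def find_phase_errors(previous, current):
    """Labels whose address differs between two consecutive passes."""
    return sorted(name for name in current if previous.get(name) != current[name])


def assemble_until_stable(program, origin, max_passes=8):
    """Repeat passes until no label moves; returns (symbols, passes_used)."""
    symbols = toy_pass(program, origin, {})
    for pass_no in range(2, max_passes + 1):
        new_symbols = toy_pass(program, origin, symbols)
        if not find_phase_errors(symbols, new_symbols):
            return new_symbols, pass_no
        symbols = new_symbols
    raise RuntimeError("labels still moving after %d passes" % max_passes)


# Forward reference at $FC: pass 1 assumes absolute, pass 2 finds a phase error.
forward = [("lda", "label"), ("label", "label")]
symbols, passes = assemble_until_stable(forward, 0xFC)
print(hex(symbols["label"]), passes)   # prints 0xfe 3
```

With the origin at $FD instead, the label lands at $100 in both passes, so the loop stops after pass 2 and the instruction stays absolute.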

Quote:
By the way, I've already started rewriting the code from scratch. I replaced the regex dictionary with a new "Instruction" class that parses the opcode, the operand (or the label if we only want to get the instruction's size) and the addressing mode (represented by an arbitrary numerical constant). Then I guess I can simply use the opcode and the addressing mode to fetch the actual instruction number in a list or a dictionary.

I'm not sure what you mean here. The op code is the two-digit hex number that specifies both the instruction and the addressing mode. Maybe you meant "parses the mnemonic," rather than the op code? The mnemonic is the three-letter abbreviation for the instruction. (Other processors have differing numbers of letters, or even varying numbers of letters, like PIC's BSF, CALL, MOVLW, CLRWDT, etc.) The parser will need to be able to figure out the addressing mode from things like # (for immediate), parentheses (for indirects), and a comma (for indexing). It will also need to handle operands that have to be computed, such as adding an offset to an array's base address:
Code:
        LDA  table + $10
where it needs to get the address of the table and then add 10 (hex) to it to get the operand; or
Code:
        LDY  # SomeAddress & $FF
where it needs to get the low byte of an address.
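As a rough sketch of both jobs (spotting the addressing mode from punctuation, and computing an operand such as `table + $10`), something like the following could work. Everything here is an assumption for illustration: the function names, the mode names, and the deliberately tiny expression grammar (left to right, no precedence, no parentheses) are not from any existing assembler.

```python
import re

# Hex ($10), binary (%1010), decimal, symbol names, and a few binary operators.
TOKEN = re.compile(r"\s*(\$[0-9A-Fa-f]+|%[01]+|\d+|[A-Za-z_]\w*|[-+&|])")


def classify_operand(operand):
    """Guess the addressing-mode family from punctuation alone."""
    text = operand.replace(" ", "").upper()
    if text == "":
        return "implied"
    if text == "A":
        return "accumulator"
    if text.startswith("#"):
        return "immediate"
    if text.startswith("("):
        if text.endswith(",X)"):
            return "indexed_indirect"    # (zp,X)
        if text.endswith("),Y"):
            return "indirect_indexed"    # (zp),Y
        if text.endswith(")"):
            return "indirect"            # (addr)
        raise ValueError("unbalanced parentheses in %r" % operand)
    if text.endswith(",X"):
        return "indexed_x"
    if text.endswith(",Y"):
        return "indexed_y"
    return "direct"                      # zero page or absolute, decided by the value


def eval_expr(text, symbols):
    """Evaluate a flat expression strictly left to right."""
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("cannot parse %r" % text[pos:])
        tokens.append(match.group(1))
        pos = match.end()
    if len(tokens) % 2 == 0:
        raise ValueError("expected: operand [operator operand]...")

    def operand(tok):
        if tok.startswith("$"):
            return int(tok[1:], 16)
        if tok.startswith("%"):
            return int(tok[1:], 2)
        if tok.isdigit():
            return int(tok)
        if tok in "+-&|":
            raise ValueError("operator %r where a value was expected" % tok)
        return symbols[tok]              # KeyError if the label is undefined

    value = operand(tokens[0])
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        rhs = operand(tok)
        if op == "+":
            value += rhs
        elif op == "-":
            value -= rhs
        elif op == "&":
            value &= rhs
        elif op == "|":
            value |= rhs
        else:
            raise ValueError("expected an operator, got %r" % op)
    return value


print(hex(eval_expr("table + $10", {"table": 0x2000})))   # prints 0x2010
```

One known limitation of classifying by punctuation: an expression that merely starts with a parenthesis would be mistaken for an indirect operand, so a real parser has to decide how it wants to treat that case.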

It will also need to be able to figure out assembler directives, like EQUates, ORiGin (i.e., what address the subsequent code will start at), laying down strings or data bytes or words, reserving a specified number of bytes for data not yet known, etc.
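A minimal sketch of how such directives might be dispatched is shown below. The spellings `org`, `equ`, `db`, `dw` and `ds`, the `apply_directive` function, and the dictionary used as an output image are all assumptions chosen for the example; real assemblers differ in both names and behavior.

```python
def apply_directive(name, args, pc, output, symbols, label=None):
    """Handle one directive and return the new program counter.

    output maps address -> byte value; symbols maps label name -> value.
    args are already-evaluated integers (db also accepts str for text).
    """
    name = name.lower()
    if name == "org":                    # set the address of the code that follows
        return args[0]
    if name == "equ":                    # label equ value
        if label is None:
            raise ValueError("equ needs a label")
        symbols[label] = args[0]
        return pc
    if name == "db":                     # data bytes and strings
        for arg in args:
            data = arg.encode("ascii") if isinstance(arg, str) else bytes([arg & 0xFF])
            for byte in data:
                output[pc] = byte
                pc += 1
        return pc
    if name == "dw":                     # 16-bit words, low byte first as the 6502 expects
        for arg in args:
            output[pc] = arg & 0xFF
            output[pc + 1] = (arg >> 8) & 0xFF
            pc += 2
        return pc
    if name == "ds":                     # reserve space without emitting anything
        return pc + args[0]
    raise ValueError("unknown directive %r" % name)


image, symbols = {}, {}
pc = apply_directive("org", [0x8000], 0, image, symbols)
pc = apply_directive("db", ["Hi", 0], pc, image, symbols)
pc = apply_directive("dw", [0x1234], pc, image, symbols)
print(hex(pc))   # prints 0x8005
```

Keeping the directive handler separate from instruction encoding means the same function can run in every pass: the first pass only cares about the returned program counter, while the last pass also keeps the bytes.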

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Fri Sep 11, 2020 3:37 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8190
Location: Midwestern USA
secamline wrote:
Ok, thanks!
I just realized that I may encounter some problems with the labels.
Let's say I'm trying to assemble this piece of code:
Code:
lda label
...
label:
...

lda could take either a 1-byte or a 2-byte operand (making it a 2-byte or a 3-byte instruction), depending on whether label is a zero-page address or not.
I have to determine the address of "label" to know the length of the instruction. But to do so, I have to know the length of the instruction.
It's kind of problematic.

The usual advice is for the programmer to declare zero page locations before writing any instructions that reference them. Otherwise, almost all assemblers take the safe path and assume the location is an absolute address.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Fri Sep 11, 2020 8:58 am 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 298
secamline wrote:
Ok, thanks!
Code:
lda label
...
label:
...


If you want a two pass assembler, the way to deal with this is to assign values to labels during the first pass. If you see a label that hasn't been defined yet being used, assume that it's 16 bit.

There's another catch:
Code:
    lda label
    ...
    label = 10

On the first pass, the LDA doesn't know what label is and assumes it's 16 bit. On the second, it's known to be small. The assembler might be tempted to use the zero page version of LDA here, but that would potentially change the value of other labels, invalidating everything that has been done so far. The assembler has to remember that the label had been assumed to be 16 bit and stick with that assumption after its value is discovered.

Your method of replacing the text of a label with the text of its value could make this difficult. My previous assemblers always have a struct for labels, with flags carrying information like this: is the value known? was the value assumed to be 16 bit? On the first pass, a use of an undefined label will create a struct with the value marked as unknown. When it gets defined later, that struct gets updated instead of creating a new one. On the second pass, all labels that are used must have known values; using one that's undefined or unknown is an error.
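The label record described above might look roughly like this in Python. It is a sketch, not the poster's actual code: the `Label` and `SymbolTable` names, the field names, and the choice to return the operand size in bytes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Label:
    name: str
    value: Optional[int] = None      # None means "not known yet"
    assumed_16bit: bool = False      # True once it was used before being defined


class SymbolTable:
    def __init__(self):
        self.labels = {}

    def reference(self, name):
        """An instruction uses a label; return the operand size in bytes."""
        label = self.labels.get(name)
        if label is None:
            # First pass, forward reference: create the record and assume 16 bits.
            label = self.labels[name] = Label(name, assumed_16bit=True)
        if label.assumed_16bit or label.value is None or label.value > 0xFF:
            return 2
        return 1

    def define(self, name, value):
        """A label gets its value; an existing record is updated, not replaced."""
        label = self.labels.get(name)
        if label is None:
            self.labels[name] = Label(name, value)
        elif label.value is None or label.value == value:
            label.value = value      # assumed_16bit is deliberately left alone
        else:
            raise ValueError("label %r defined twice" % name)

    def lookup(self, name):
        """Second pass: every label that is used must have a known value."""
        label = self.labels.get(name)
        if label is None or label.value is None:
            raise KeyError("undefined label %r" % name)
        return label.value


table = SymbolTable()
print(table.reference("label"))   # prints 2 (not defined yet, so assumed 16 bit)
table.define("label", 10)
print(table.reference("label"))   # prints 2 (the earlier assumption sticks)
```

Because `define` updates the existing record instead of creating a new one, the `assumed_16bit` flag survives into the second pass, which is what keeps the instruction size from changing after the value turns out to be small.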

My current assembler has to take a more complicated approach, which Garth described earlier in the thread: allow instructions to change size when more information becomes available, and keep repeating passes until things stop changing. I had to do it this way, as this assembler is for an extension of the 6502 that has branches with one and two byte offsets. You really want short forward branches to use the small offset, and it's not possible to insist that the destination label be defined before the instruction that uses it.

It's possible to set up a situation where these changes oscillate and it will never reach stability. To avoid that, each decision can be changed in only one direction: I start by assuming that every operand is one byte, and if one pass sees a value that won't fit in a byte, that instruction gets switched to a two byte operand. But it can never switch back. That has worked well.
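The one-direction rule can be sketched as follows. This is a toy model under stated assumptions, not the poster's assembler: `relax` is a hypothetical name, the program is a list of `("label", name)`, `("use", name)` and `("skip", nbytes)` tuples, and every `use` is one opcode byte plus a one-byte or two-byte operand holding the label's address.

```python
def relax(program, origin=0):
    """Grow operands from one byte to two until nothing changes.

    Returns (symbols, wide, passes). wide[i] is True when item i needs a
    two-byte operand; a flag is only ever switched from False to True.
    """
    wide = [False] * len(program)
    passes = 0
    while True:
        passes += 1
        # Lay out addresses using the current size decisions.
        pc = origin
        symbols = {}
        for i, (kind, arg) in enumerate(program):
            if kind == "label":
                symbols[arg] = pc
            elif kind == "skip":
                pc += arg
            else:
                pc += 3 if wide[i] else 2
        # Widen any operand that does not fit in a byte; never narrow one.
        changed = False
        for i, (kind, arg) in enumerate(program):
            if kind == "use" and not wide[i] and symbols[arg] > 0xFF:
                wide[i] = True
                changed = True
        if not changed:
            return symbols, wide, passes


program = [("use", "a"), ("use", "b"), ("label", "a"), ("skip", 2), ("label", "b")]
symbols, wide, passes = relax(program, origin=0xFB)
print(hex(symbols["a"]), hex(symbols["b"]), passes)   # prints 0x101 0x103 3
```

In this run the first pass widens only the reference to `b`; that pushes `a` to $100, so the second pass widens the reference to `a` as well. Since each flag can flip at most once, the loop cannot run for more passes than there are instructions plus one.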

But as I said, I only did it this way because I had to. A simpler assembler for a simpler CPU can get away with two passes, insisting that the label has to be defined first if you want zero page.


PostPosted: Fri Sep 11, 2020 1:00 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Quote:
It's possible to set up a situation where these changes oscillate and it will never reach stability. To avoid that, each decision can be changed in only one direction: I start by assuming that every operand is one byte, and if one pass sees a value that won't fit in a byte, that instruction gets switched to a two byte operand. But it can never switch back. That has worked well.
Another idea is to allow an operand to change from one byte to two bytes then back to one byte as needed but limit the number of passes the assembler can do. If it reaches 10 passes or however many you're willing to wait and the operand is still oscillating, you could fix it at 2 bytes.
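A sketch of that pass-limit variant is below, again as a toy model with hypothetical names (`layout`, `assemble_with_pass_limit`). Here each `("use", expr)` item carries a function that computes the operand value from the symbol table, so that an oscillating case can actually be constructed; sizes may change in both directions until `max_passes` is reached, after which any operand that still wants to change is pinned at two bytes.

```python
def layout(program, origin, wide):
    """Assign label addresses using the current operand-size decisions."""
    pc = origin
    symbols = {}
    for i, (kind, arg) in enumerate(program):
        if kind == "label":
            symbols[arg] = pc
        else:                            # "use": opcode plus 1- or 2-byte operand
            pc += 3 if wide[i] else 2
    return symbols


def assemble_with_pass_limit(program, origin=0, max_passes=10):
    """Returns (symbols, wide, passes)."""
    wide = [kind == "use" for kind, _ in program]   # start with two-byte operands
    pinned = set()                                  # operands frozen at two bytes
    passes = 0
    while True:
        passes += 1
        symbols = layout(program, origin, wide)
        changed = False
        for i, (kind, arg) in enumerate(program):
            if kind != "use" or i in pinned:
                continue
            needs_wide = arg(symbols) > 0xFF
            if needs_wide != wide[i]:
                changed = True
                if passes >= max_passes:
                    wide[i] = True       # still unsettled: give up and pin it wide
                    pinned.add(i)
                else:
                    wide[i] = needs_wide
        if not changed:
            return symbols, wide, passes


# The operand shrinks as the label moves up, so the size flips on every pass.
oscillating = [("use", lambda s: 0x1FF - s["end"]), ("label", "end")]
symbols, wide, passes = assemble_with_pass_limit(oscillating, origin=0xFD)
print(hex(symbols["end"]), wide[0], passes)   # prints 0x100 True 11
```

Pinning at two bytes is always safe in this model because a small value still fits in a two-byte operand; the cost is at most one wasted byte per pinned instruction.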


PostPosted: Sun Sep 13, 2020 4:10 pm 

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
So far I'm back to where I stopped when using regex: the assembler works with all addressing modes and with decimal, hex and binary values, and the labels are resolved correctly. You've all submitted many ideas, so where should I start? I guess I should implement the basic "org", "db" and "dw" pseudo-instructions.

