Posted: Mon Sep 07, 2020 1:32 pm

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
Hi!
I recently started a fun little project: writing a 6502 assembler in Python.
The assembler itself isn't the real goal, though; it's really all about testing my knowledge of the processor.
As of now it works pretty much the way it's supposed to. However, as with pretty much everything in programming, just because it works doesn't mean it's good.
I feel like my implementation of the assembler is awkward.
Of course I could just take a look at the source code of an existing assembler, but that would be cheating!
I'd rather ask for some advice here.

So here's how the assembler works for now.
The instruction set is stored in a dictionary. Each key is a regex string and each value is a tuple containing the opcode of the instruction and its number of operand bytes.
Here's an example:
Code:
"^STA \$([0-9A-F]{2})$" : (133,1)

It corresponds to the STA ZPG instruction.
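
For illustration, here is a minimal sketch of how a table like this could drive the assembly of one line. The second table entry and the function name `assemble_line` are assumptions made for the example; they are not taken from the attached source.

```python
import re

# Hypothetical two-entry table in the style described above:
# regex pattern -> (opcode, number of operand bytes).
INSTRUCTIONS = {
    r"^STA \$([0-9A-F]{2})$": (0x85, 1),   # STA zero page (133 decimal)
    r"^STA \$([0-9A-F]{4})$": (0x8D, 2),   # STA absolute
}

def assemble_line(line):
    """Return the machine-code bytes for one normalized (uppercase) line."""
    for pattern, (opcode, size) in INSTRUCTIONS.items():
        match = re.match(pattern, line)
        if match:
            value = int(match.group(1), 16)
            # The 6502 stores multi-byte operands low byte first.
            return bytes([opcode]) + value.to_bytes(size, "little")
    raise ValueError(f"unrecognized instruction: {line!r}")

print(assemble_line("STA $10").hex())     # prints "8510"
print(assemble_line("STA $1234").hex())   # prints "8d3412"
```

The number of hex digits in the operand is what selects zero page or absolute here, which is why every constant has to be normalized to hex first.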

And here's how the 6502 code is assembled:
  1. First the program is loaded from an external file. Everything is converted to uppercase. Extra tabs, spaces and empty lines are removed, and each line (i.e. instruction) is stored in a list.
  2. Two passes are then performed. The first one converts binary and decimal constants to hexadecimal. The labels' addresses are stored in a list and the labels are removed.
  3. During pass 2, labels used in instructions are replaced by their offset in branching instructions and by their absolute address in other instructions.
  4. Finally, each instruction is converted to its corresponding opcode and operands.

As stated before, I have no idea if I'm doing this the right way.
If you could help me, that'd be wonderful!
Thanks!


Attachments:
6502 assembler.zip [4 KiB]
Downloaded 81 times


Last edited by secamline on Mon Sep 07, 2020 3:49 pm, edited 1 time in total.
Posted: Mon Sep 07, 2020 2:18 pm

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Having hex constants for instruction operands is rather limiting. When you write big programs you need to be able to reference named constants (like peripheral addresses, bits in registers, etc.) and variables. For example, this is the setup routine for a 65C22 VIA timer:
Code:
                                             .global viaTimerEnable
                                             .extern VIAAV
                             viaTimerEnable:
00:0000' 08                :                 php                     ; Suspend interrupts if enabled
00:0001' 78                :                 sei
00:0002' A001              :                 ldy     #1              ; Install interrupt handler
                                             repeat
00:0004' B9????            :                  lda    VIAAV,y
00:0007' 99????            :                  sta    OLDVIAAV,y
00:000A' B9????            :                  lda    VECTOR,y
00:000D' 99????            :                  sta    VIAAV,y
00:0010' 88                :                  dey
00:0011' 10F1              :                 until mi

00:0013' A980              :                 lda     #VIA_ACR_T1C1   ; Set Timer 1 for continuous interrupts

Portable 65xx Assembler [20.01]

00:0015' 1C2B7F            :                 trb     VIAA_ACR
00:0018' A940              :                 lda     #VIA_ACR_T1C0
00:001A' 0C2B7F            :                 tsb     VIAA_ACR
00:001D' A99A              :                 lda     #<TIMER_1KHZ    ; Set the latches for 1 MSec period
00:001F' 8D267F            :                 sta     VIAA_T1LL
00:0022' A939              :                 lda     #>TIMER_1KHZ
00:0024' 8D277F            :                 sta     VIAA_T1LH

00:0027' A9C0              :                 lda     #VIA_IER_SET|VIA_IER_T1
00:0029' 8D2E7F            :                 sta     VIAA_IER        ; Enable the interrupt

00:002C' 9C????            :                 stz     MILLIS+0        ; Clear the counter
00:002F' 9C????            :                 stz     MILLIS+1
00:0032' 28                :                 plp                     ; Restore flags
00:0033' 60                :                 rts


There are only a couple of simple instructions in that routine.

A more conventional approach is to break the line into tokens and process them to extract labels, opcodes, addressing modes and expressions.
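
As a rough sketch of the first step of that approach, the function below splits a source line into its three fields. It assumes labels end with a colon and comments start with a semicolon; the name `split_line` is made up for the example, and a real tokenizer would go on to break the operand into finer tokens.

```python
def split_line(line):
    """Split a source line into (label, mnemonic, operand).

    Missing fields are returned as None.
    """
    # Drop the comment. This is naive: a ';' inside a string literal
    # would also be treated as the start of a comment.
    code = line.split(";", 1)[0].strip()
    label = mnemonic = operand = None

    fields = code.split(None, 1)
    if fields and fields[0].endswith(":"):
        label = fields[0][:-1]
        code = fields[1] if len(fields) > 1 else ""
        fields = code.split(None, 1)

    if fields:
        mnemonic = fields[0].upper()
        if len(fields) > 1:
            operand = fields[1].strip()
    return label, mnemonic, operand

print(split_line("loop:  lda  VIAAV,y   ; copy the vector"))
# prints ('loop', 'LDA', 'VIAAV,y')
```

Because the operand comes back as a string rather than a fixed hex pattern, it can hold a name or an expression such as `MILLIS+1`.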

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Posted: Mon Sep 07, 2020 2:26 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Good point - but I think it's still possible to use a regex, if you prefer, to figure out the addressing mode and instruction, and isolate whatever expression needs to be evaluated.

Another idea is to use some dictionary to match on the opcode and a regex to match on the addressing mode.
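
A hedged sketch of that second idea: one small list of addressing-mode patterns, plus a dictionary keyed on mnemonic and mode. The mode names, the five-entry opcode table and the function `classify` are assumptions for the example. Telling zero page from absolute needs the operand's value, so it is left out here.

```python
import re

# Addressing-mode patterns, tried in order. Group 1 captures the operand
# expression, which is left unevaluated here.
MODE_PATTERNS = [
    ("implied",    re.compile(r"^$")),
    ("immediate",  re.compile(r"^#(.+)$")),
    ("indirect_x", re.compile(r"^\((.+),\s*X\)$", re.IGNORECASE)),
    ("indirect_y", re.compile(r"^\((.+)\),\s*Y$", re.IGNORECASE)),
    ("indirect",   re.compile(r"^\((.+)\)$")),
    ("indexed_x",  re.compile(r"^(.+),\s*X$", re.IGNORECASE)),
    ("indexed_y",  re.compile(r"^(.+),\s*Y$", re.IGNORECASE)),
    ("direct",     re.compile(r"^(.+)$")),
]

# A deliberately tiny opcode table: (mnemonic, mode) -> opcode.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("LDA", "indexed_y"): 0xB9,   # absolute,Y
    ("STA", "indexed_y"): 0x99,   # absolute,Y
    ("JMP", "indirect"):  0x6C,
    ("RTS", "implied"):   0x60,
}

def classify(mnemonic, operand):
    """Return (opcode, mode, expression) for one instruction."""
    operand = operand.strip()
    for mode, pattern in MODE_PATTERNS:
        match = pattern.match(operand)
        if match:
            break
    else:
        raise ValueError(f"cannot parse operand: {operand!r}")
    key = (mnemonic.upper(), mode)
    if key not in OPCODES:
        raise ValueError(f"no opcode in table for {key}")
    expression = match.group(1) if pattern.groups else None
    return OPCODES[key], mode, expression

print(classify("lda", "VIAAV,y"))   # prints (185, 'indexed_y', 'VIAAV')
```

The regexes only decide the shape of the operand; whatever sits inside the captured group can then be handed to an expression evaluator.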

easy6502 does something like this (but it's a very weak assembler)


Posted: Mon Sep 07, 2020 3:48 pm

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
BitWise wrote:
Having hex constants for instruction operands is rather limiting. When you write big programs you need to be able to reference named constants (like peripheral addresses, bits in

It does support labels as operands, maybe I didn't get what you meant?
I'll attach the source code in my first message, this should make things easier.


Posted: Mon Sep 07, 2020 7:53 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
secamline wrote:
  3. During pass 2, labels used in instructions are replaced by their offset in branching instructions and by their absolute address in other instructions.

Just an opinion, but making branch targets a special case as you are doing seems to be awkward. Ultimately, all values generated during assembly are addresses or constants—branch targets are addresses. More logically, to me, the offset that becomes a branch instruction's operand should be calculated during the second pass when the instruction is assembled. At that time, the assembler would check the offset and emit a "branch out of range" diagnostic if it is outside of -128/+127.
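
A minimal sketch of that calculation, assuming the assembler knows the address of the branch instruction and the address of its target when it emits the operand. The function name `branch_offset` is made up for the example.

```python
def branch_offset(branch_address, target_address):
    """Return the operand byte for a relative branch at branch_address."""
    # The offset is relative to the address of the instruction that
    # follows the two-byte branch.
    offset = target_address - (branch_address + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"branch out of range: offset {offset}")
    return offset & 0xFF   # two's-complement encoding of the offset

# The "until mi" branch at $0011 back to $0004 in the listing earlier
# in this thread assembles to 10 F1.
print(f"{branch_offset(0x0011, 0x0004):02X}")   # prints "F1"
```

With this in place the symbol table holds nothing but addresses and constants, and the branch is only special at the moment its operand byte is emitted.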

Incidentally, we've had some discussion around here about the use of regular expressions in parsing assembly language instructions. It's not something I would implement. A 6502 assembly language instruction consists of a maximum of three significant fields (label, mnemonic and operand—only the mnemonic is mandatory in all cases) and is readily parsed with a simple LL(k) procedure. Given the very predictable structure of 6502 instructions and the uniform three-character size of instruction mnemonics, there would be no need for use of regex processing.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Sep 07, 2020 8:37 pm

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
Then branching instructions would still be a special case during pass 2, right?
The reason I chose to use regex was because I didn't want to use a bunch of if-elif-else commands.


Posted: Mon Sep 07, 2020 9:29 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
secamline wrote:
And here's how the 6502 code is assembled:
  1. First the program is loaded from an external file. Everything is converted to uppercase. Extra tabs, spaces and empty lines are removed, and each line (i.e. instruction) is stored in a list.
  2. Two passes are then performed. The first one converts binary and decimal constants to hexadecimal. The labels' addresses are stored in a list and the labels are removed.
  3. During pass 2, labels used in instructions are replaced by their offset in branching instructions and by their absolute address in other instructions.
  4. Finally, each instruction is converted to its corresponding opcode and operands

A list (.lst) file can be tremendously helpful in debugging certain things. Be sure to leave the spaces and empty lines there, or at least the correct line numbers, so that when you see an error message in the list file, you can get the exact line number to go back to in the source-code file to fix something. The comments could be stripped out, but I wouldn't do it.

I recently wasted a lot of time on a bug caused by my assembler (C32) not being case-sensitive. I knew it was not case-sensitive, so to differentiate two labels, continuing in the vein of an earlier project done with a case-sensitive compiler, I preceded the lower-case one with an underscore. When I accidentally left the underscore out, the assembler referenced the wrong label (which did exist as a valid label without the _). It did not generate an error message saying the label did not exist, because it did in fact exist, in the other case. I recommend allowing case sensitivity.

There are also many situations where you'll need more than two passes because there are phase errors in the second pass that need one or more additional passes to resolve. A phase error is where the address of something changes between the passes. For this reason, you also probably cannot discard the labels in the second pass. The need for more than two passes will usually (or always?) come from sections of conditional assembly where for example all the conditions are not known yet on first pass, so what gets assembled on the second pass may be of a length different from that of the first pass, and addresses will change. I would recommend making it keep doing passes until it gets no phase errors. There are some really slick things you can do with conditional assembly in macros that may require quite a lot of passes. The extra assembly time is not a problem with the speed of modern PCs.
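
The "keep doing passes until nothing moves" idea can be sketched as below. This is a toy model, not a real assembler: the program is a list of labels and of instructions whose size depends on whether the symbol they reference is known to lie in zero page, which is one classic way an address can change between passes. All names are assumptions for the example.

```python
def run_pass(program, origin, old_symbols):
    """One pass: assign label addresses, sizing each instruction from
    the symbol values found by the previous pass."""
    symbols = {}
    pc = origin
    for kind, name in program:
        if kind == "label":
            symbols[name] = pc
        else:  # kind == "ref": an instruction that references a symbol
            value = old_symbols.get(name)   # None if not known yet
            # Assume absolute (3 bytes) unless the symbol is already
            # known to be in zero page (2 bytes).
            pc += 2 if value is not None and value < 0x100 else 3
    return symbols

def assemble(program, origin=0, max_passes=10):
    """Repeat passes until no address changes (no phase errors)."""
    symbols = {}
    for pass_number in range(1, max_passes + 1):
        new_symbols = run_pass(program, origin, symbols)
        if new_symbols == symbols:
            return symbols, pass_number
        symbols = new_symbols
    raise RuntimeError("addresses did not stabilize")

program = [("ref", "data"), ("ref", "data"), ("label", "data")]
print(assemble(program, origin=0xF8))   # prints ({'data': 252}, 3)
```

Here pass 1 places `data` at $FE using 3-byte instructions, pass 2 shrinks them and moves it to $FC, and pass 3 confirms that nothing moved. The pass limit is a guard against inputs that never settle.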

I have a few more suggestions at http://wilsonminesco.com/AssyDefense/as ... uests.html

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Sep 08, 2020 10:02 pm

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
I don't know much about conditional assembly. What kind of conditions can be implemented?


Posted: Tue Sep 08, 2020 10:40 pm

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 336
secamline wrote:
I don't know much about conditional assembly. What kind of conditions can be implemented?


Any conditions you like. It's just an "if" that works at assemble time, and the condition can be any expression that the assembler can evaluate.

One use is to include different code depending on a configuration value:
Code:
Commodore64 = 1
AppleII = 2

platform = Commodore64

plotPoint:
    .if platform == Commodore64
        ; insert Commodore 64 pixel-plotting code here
    .else if platform == AppleII
        ; insert Apple II pixel-plotting code here
    .endif


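Or you could use it to conditionally insert a byte so that the pointer used by an indirect JMP doesn't straddle a page boundary (on the NMOS 6502, JMP ($xxFF) fetches the high byte of the target from $xx00 instead of from the next page):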
Code:
    .if *%256 == 255
    .byte 0
    .endif
jumpTarget:
    .word destination

    JMP (jumpTarget)


There are all sorts of possibilities.
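
On the Python side, the core of this can be quite small. The sketch below is only an illustration: it handles `.if`, a bare `.else` and `.endif` (not the `.else if` form used above), does no error checking, and leans on Python's `eval` for the condition, so it should only be fed trusted source. The function name `select_lines` is made up.

```python
def select_lines(lines, symbols):
    """Return the lines that survive .if/.else/.endif processing."""
    output = []
    stack = []   # one flag per open .if: is its current branch active?
    for line in lines:
        text = line.strip()
        if text.startswith(".if "):
            condition = eval(text[4:], {"__builtins__": {}}, dict(symbols))
            stack.append(bool(condition))
        elif text == ".else":
            stack[-1] = not stack[-1]
        elif text == ".endif":
            stack.pop()
        elif all(stack):   # every enclosing branch must be active
            output.append(line)
    return output

source = [
    ".if platform == Commodore64",
    "    ; Commodore 64 code",
    ".else",
    "    ; Apple II code",
    ".endif",
    "    rts",
]
symbols = {"Commodore64": 1, "AppleII": 2, "platform": 1}
print(select_lines(source, symbols))
# prints ['    ; Commodore 64 code', '    rts']
```

Keeping a stack rather than a single flag is what lets nested conditions work: a line is kept only if every enclosing branch is active.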

My advice for your assembler would be to forget that you ever heard of regular expressions. They're seductive for getting something very simple going quickly, but you'll run into their limits just as quickly. Tokenize the source, and put some effort into a proper expression parser.


Posted: Wed Sep 09, 2020 7:58 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Not wanting to contradict, but: if you're writing your assembler in a language like Python or JavaScript, you can probably get expression evaluation for free just by evaluating a string. Both the easy6502 and OPC assemblers do this, I think. A little care might be needed to manage namespaces, but it's cheap and cheerful. Macros come out very simply too.
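
In Python that might look like the sketch below. It is an illustration of the idea only, not how easy6502 or the OPC assembler actually do it. The function name and the symbol values are assumptions, `$` hex literals are rewritten to Python's `0x` form first, and assembler-specific operators such as the `<` and `>` byte selectors are not handled.

```python
import re

def evaluate(expression, symbols):
    """Evaluate an operand expression with Python's own parser."""
    # Rewrite $28BE-style hex literals into Python's 0x28BE form.
    python_expr = re.sub(r"\$([0-9A-Fa-f]+)", r"0x\1", expression)
    # An empty __builtins__ keeps names like open() out of reach. This
    # narrows the namespace but is NOT a sandbox for untrusted input.
    return eval(python_expr, {"__builtins__": {}}, dict(symbols))

# Hypothetical symbol values for the example.
symbols = {"VIA_IER_SET": 0x80, "VIA_IER_T1": 0x40, "MILLIS": 0x0200}

print(hex(evaluate("VIA_IER_SET|VIA_IER_T1", symbols)))   # prints "0xc0"
print(hex(evaluate("MILLIS+1", symbols)))                 # prints "0x201"
print(hex(evaluate("$28BE >> 8", symbols)))               # prints "0x28"
```

An undefined name raises `NameError`, which the assembler can catch on the first pass to mark a forward reference.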


Posted: Wed Sep 09, 2020 8:54 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
secamline wrote:
I don't know much about conditional assembly. What kind of conditions can be implemented?

I suppose conditional assembly is most valuable in macros, and sometimes to optimize the code they generate, based on what's in the parameter list in the line that invoked the macro, or possibly based on assembler variables, i.e., variables you have in the assembler itself, not in the code that it produces for the target computer to run. ("Assembler" is the tool. The language is "assembly language.")

In my 65816 Forth's assembly-language source, I have a macro that forms the words' headers. It looks at a couple of assembler variables to see if you have locally or globally turned off headers, and if you have, it won't generate the header. So suppose you want to assemble the whole thing headerless. (That saves a lot of memory, but it's only for end products that just run a pre-compiled application and won't have to do any compilation of their own which requires being able to find the various words by searching the linked list of headers.) You can, near the beginning of the main source-code file, set the OMIT_HEADERS assembler variable to a non-0 value, and not have to change anything in the hundreds of places a header is laid down in the output code.

Here's a simple one, for when I want to align the next byte, i.e., make it start on an even address:
Code:
        IF      $ & 1         ; If next addr is odd,
                DFB     0     ; add a 0 byte before you
        ENDI                  ; lay down the link field.
(This is part of the HEADER macro too.)

Here's a simple-ish macro with conditional assembly to condense up to four lines of assembly language to put a 16-bit literal into a two-byte variable. A typical usage would be:
Code:
        PUT2  $28BE, in, FOOBAR
and in this case it would assemble the code:
Code:
        LDA  #$BE
        STA  FOOBAR
        LDA  #$28
        STA  FOOBAR+1
It makes the source code a lot more concise.

However, if the two bytes were the same, like $2828, it would be more efficient to LDA only once. If either byte was 0, it would be more efficient to use STZ. You can have conditional assembly in the macro to watch for these things, like this:
Code:
PUT2:   MACRO  num, preposition, addr
        IF  num != 0
            IF  {num & $FF} != {num >> 8}
                LDA  #num & $FF
                STA  addr
                LDA  #{num >> 8}
                STA  addr + 1
            ELSE
                LDA  #num & $FF
                STA  addr
                STA  addr + 1
            ENDI
        ELSE
            STZ  addr
            STZ  addr + 1           
        ENDI
        ENDM
 ;-------------
Each time the macro is invoked, it may lay down two, three, or four instructions, depending on the conditions.
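
For readers more at home in Python, the same decision logic can be sketched as a function that returns the lines the macro would lay down. This is only a model of the conditions above, with a made-up function name. It is not how C32 implements macros.

```python
def put2(num, addr):
    """Return the instructions PUT2 would emit to store the 16-bit
    constant num at addr (low byte) and addr+1 (high byte)."""
    low, high = num & 0xFF, num >> 8
    if num == 0:
        # Both bytes are zero: two STZ instructions, no load needed.
        return [f"STZ {addr}", f"STZ {addr}+1"]
    if low == high:
        # Same value in both bytes: load once, store twice.
        return [f"LDA #${low:02X}", f"STA {addr}", f"STA {addr}+1"]
    return [f"LDA #${low:02X}", f"STA {addr}",
            f"LDA #${high:02X}", f"STA {addr}+1"]

print(put2(0x28BE, "FOOBAR"))
# prints ['LDA #$BE', 'STA FOOBAR', 'LDA #$28', 'STA FOOBAR+1']
print(len(put2(0x2828, "FOOBAR")), len(put2(0, "FOOBAR")))   # prints "3 2"
```

The length of the returned list is exactly what changes the addresses of everything that follows the macro call.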

If the accumulator were not available in a particular part of the code where you invoke the macro, but Y was available, you might add a "using_Y" parameter at the end of the parameter field, and the macro could be written to watch for "using_Y" (or "using_X") and lay down LDY...STY or LDX...STX instead of LDA...STA.

Now suppose the number constant might be changing, and you've given it a name. Further, suppose that that constant's value depends on something elsewhere in the code, possibly making the length of the code that the macro lays down change between pass 1 and pass 2, changing also all the addresses after it. Then you'll need at least one more pass to resolve it and get rid of the phase errors.

Posted: Thu Sep 10, 2020 5:24 am

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
Alright, I'll take all of your advice into consideration. First things first: I will start from scratch and write a real parser. By the way, should I convert numerical values into decimal?


Posted: Thu Sep 10, 2020 6:19 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
secamline wrote:
Btw should I convert numerical values into decimal?

Why would you? The $28BE value in my example above needs to be split into a low byte and a high byte, and left in hex, that is really easy to check. (That number, BTW, is 2^16/(2π), to specify a radian for trig operations in scaled-integer arithmetic.) Hex is better for visualizing address boundaries for address-decoding logic, instructions that might be affected by a page boundary, individual bits' set/clear status in an I/O IC's interrupt-enable register or control registers (thinking of the 6522), and much more. Each base has its purpose in source code, and numbers in the list file should be left as written in the source-code file. (They'll all be in hex in the machine-language output, though, which is usually an Intel Hex file, a Motorola S19 file, binary, or something like that.) I do a lot of embedded stuff where there's very little human I/O, so I use hex more. Someone who's doing a lot of financial or office software is more likely to use mostly decimal.

Posted: Thu Sep 10, 2020 7:21 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
secamline wrote:
Btw should I convert numerical values into decimal?

I find this a slightly confusing question... within the assembler code, you will be dealing with expressions and with values. There's no base to a value - it's a number. The constants you find in expressions might be in decimal or in hex, if your assembler supports both, and maybe in binary too. But when you evaluate an expression, you deal with the values of constants and variables and you end up with a value.

Eventually your assembler will output something. In the listing, it's convenient to list the addresses and byte values in hex, together with the original source lines and their comments. In the binary or hex output, those byte values will be binary bytes, or hex representations.

Or, possibly, I misunderstand the query!
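
To make the "a value has no base" point concrete, here is a small Python sketch, assuming the common `$` prefix for hex and `%` for binary. The function name `parse_literal` is made up. The base matters only when reading the source text and again when printing the listing.

```python
def parse_literal(text):
    """Convert a numeric literal ($hex, %binary or decimal) to an int."""
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text, 10)

# Three spellings, one value.
print(parse_literal("$0A") == parse_literal("%00001010") == parse_literal("10"))
# prints "True"

# The base reappears only when the value is formatted for the listing.
value = parse_literal("%00010001")
print(f"${value:02X}")   # prints "$11"
```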


Posted: Thu Sep 10, 2020 1:27 pm

Joined: Mon Sep 07, 2020 10:02 am
Posts: 10
Location: France
Sorry, my questions are poorly phrased.
Basically, in the first version of my assembler I converted all literals into hex, so this piece of code:
Code:
start:
lda #$0A
sta %00010001
label:
inx
bne label
jmp start

would be converted into this:
Code:
lda #$0A
sta $11
inx
bne $FD
jmp $0000

I chose hexadecimal because it made it easier to tell zero-page and absolute addressing modes apart in the regular expressions.
The program could easily distinguish "sta $FF" and "sta $0100" as two different instructions. I've seen some assemblers that actually don't differentiate the two.
Since I won't be using regex for the next version, I was wondering if I could convert all values to decimal. So the above code would be converted to this:
Code:
lda #10
sta 17
inx
bne 253
jmp 0

This would make it easier to figure out which bytes to output after the opcode: I'd simply need to parse the operand string into an int, without the need to change the base.
Of course I would still display error messages with the original base and display the listing in hex.
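
Once the operand is an int, choosing between zero page and absolute no longer depends on how the number was written. A minimal sketch for STA only follows; the function name `encode_sta` is an assumption, while the two opcodes are the standard 6502 ones.

```python
STA_ZERO_PAGE = 0x85
STA_ABSOLUTE = 0x8D

def encode_sta(address):
    """Return the bytes for STA, picking the addressing mode from the
    operand's value rather than from the number of digits typed."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError(f"address out of range: {address}")
    if address <= 0xFF:
        return bytes([STA_ZERO_PAGE, address])
    # Absolute operands are stored low byte first.
    return bytes([STA_ABSOLUTE, address & 0xFF, address >> 8])

print(encode_sta(17).hex())       # prints "8511"
print(encode_sta(0x0100).hex())   # prints "8d0001"
```

One trade-off of choosing by value: writing `$0011` no longer forces absolute addressing, so a separate syntax would be needed for that.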

