Yet another assembler...

barnacle · Post by **barnacle** » Fri Nov 01, 2019 8:24 am

Largely for my own amusement, I am writing a 6502 assembler and I'm trying to incorporate as many 'optional' features as possible, so that it will build as large a code base as possible with a minimum of text changes.

Thus, it will accept (e.g.) db and byte as synonyms, and inputs to db/byte include hex, decimal, binary in any of the obvious flavours - 0xnnnn, 0nnnnh, $nnnn, 0bnnnn, 0nnnnb and so on, and also 'c' characters or "strings" (slight bug there regarding commas still to be resolved!).
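As a rough illustration of accepting those literal flavours, here is a minimal sketch in Python (used for brevity). The function name `parse_number` and the order in which the forms are tried are assumptions, not the assembler's actual code. The order matters because some spellings are ambiguous: `0b1h` could be read as a malformed binary literal or as hex B1 with an `h` suffix, and this sketch chooses the latter.

```python
def parse_number(text: str) -> int:
    """Parse one numeric literal: 0xNN, 0NNh, $NN, 0bNN, 0NNb, decimal or 'c'."""
    s = text.strip()
    # Single character in single quotes, e.g. 'A'
    if len(s) == 3 and s[0] == "'" and s[2] == "'":
        return ord(s[1])
    t = s.lower()
    if t.startswith("$"):
        return int(t[1:], 16)
    if t.startswith("0x"):
        return int(t[2:], 16)
    # The hex suffix is tried before the binary prefix so that 0b1h is hex B1.
    if t.endswith("h"):
        return int(t[:-1], 16)
    if t.startswith("0b") and len(t) > 2:
        return int(t[2:], 2)
    if t.endswith("b"):
        return int(t[:-1], 2)
    return int(t, 10)
```

Anything that fits none of the forms raises `ValueError`, which a real assembler would turn into a diagnostic carrying the source line number.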

The intent is truly old school; the initial version will accept only 6502N code, with 65C02 options next on the list, and it produces Intel hex, binary blob, and a list file as default outputs - absolute positions, no relocatable code (that'll get some downvotes!). This means it's easy to use for e.g. single board computers without an OS.
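For reference, an Intel hex data record is a colon followed by a byte count, a 16-bit address, a record type, the data, and a checksum that is the two's complement of the sum of all preceding bytes. A minimal sketch of producing such records follows; the names `ihex_record` and `ihex_dump` and the 16-byte line width are illustrative assumptions.

```python
def ihex_record(address: int, data: bytes, rectype: int = 0x00) -> str:
    """Build one Intel hex record (type 00 = data, type 01 = end of file)."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rectype]) + data
    checksum = (-sum(body)) & 0xFF          # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def ihex_dump(origin: int, image: bytes, width: int = 16) -> str:
    """Split an absolute binary image into data records plus the EOF record."""
    lines = [ihex_record(origin + i, image[i:i + width])
             for i in range(0, len(image), width)]
    lines.append(ihex_record(0, b"", 0x01))
    return "\n".join(lines)
```

For example, `ihex_record(0x0030, bytes([0x02, 0x33, 0x7A]))` gives `:0300300002337A1E`, and the end-of-file record is always `:00000001FF`.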

However, what I can't find - and I've read so many documents on this - is how a zero page address is defined. At the moment I have followed what seems to be the default, automatically choosing zero page if the evaluated expression is both less than 0x100 and the evaluation is trusted; otherwise, it's an absolute.

Any thoughts/preferences regarding this behaviour?

Neil

BigEd · Post by **BigEd** » Fri Nov 01, 2019 9:09 am

It's normal, but not ideal, to detect ZP as you say. Some assemblers have some syntax to help - but of course every assembler is different. It looks like ca65 has some idea of the 'size' of a value, and a byte-sized value is an appropriate ZP address. Other heuristics too:
https://cc65.github.io/doc/ca65.html#ss5.2

John West · Post by **John West** » Fri Nov 01, 2019 9:58 am

barnacle wrote:

However, what I can't find - and I've read so many documents on this - is how a zero page address is defined. At the moment I have followed what seems to be the default, automatically choosing zero page if the evaluated expression is both less than 0x100 and the evaluation is trusted; otherwise, it's an absolute.

Any thoughts/preferences regarding this behaviour?

That's very common, in my experience. There is one trap to watch out for though - if you have a two pass assembler, and a symbol that's defined after it is used, the first pass will have to assume that it's 16 bit. If it later turns out to fit in 8 bits, you can't change your mind without invalidating every label after that first use.

The assembler for my 65020 does things the complicated way - it will do as many passes as it needs for the symbol values to become stable. It starts out assuming one byte for every symbol. If that assumption is incorrect, it will expand to two bytes and trigger another pass. It will never shrink a two byte value to one byte, so it's guaranteed to converge eventually. This allows forward branches to use one byte offsets most of the time, and two bytes when necessary. It was a lot easier to implement than I feared it might be.
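The grow-only idea can be sketched roughly as follows. This is an illustrative Python model, not the 65020 assembler's code: the program is reduced to hypothetical `("op", symbol)`, `("skip", n)` and `("label", name)` items, an `op` occupies two bytes with a one-byte operand or three with a two-byte operand, and the size is tracked per instruction rather than per symbol. Every referenced label is assumed to be defined somewhere in the program.

```python
def assemble_sizes(program, origin=0):
    """Repeat layout passes until no operand needs widening; never shrink."""
    wide = set()      # indices of instructions already widened to 2-byte operands
    passes = 0
    while True:
        passes += 1
        # Lay out the program using the current size assumptions.
        pc, symbols = origin, {}
        for i, (kind, arg) in enumerate(program):
            if kind == "label":
                symbols[arg] = pc
            elif kind == "op":
                pc += 3 if i in wide else 2
            elif kind == "skip":
                pc += arg
        # Widen any short operand whose value no longer fits in one byte.
        grew = False
        for i, (kind, arg) in enumerate(program):
            if kind == "op" and i not in wide and symbols[arg] > 0xFF:
                wide.add(i)
                grew = True
        if not grew:
            return symbols, wide, passes
```

Because `wide` only ever gains members, the loop can run at most one pass per instruction plus one, which is the convergence guarantee described above.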

barnacle · Post by **barnacle** » Fri Nov 01, 2019 1:38 pm

This is the approach I've taken. My expression evaluation is looking at both literal values and symbol values, and the symbol value, if not yet defined, returns an 'untrusted' flag as part of its return value. That flag is propagated all the way up the expression chain, so when I eventually get the value for the target address I know whether I can trust it. (I can also set a symbol as untrusted, if it uses a forward reference).

Only if I can trust it do I decide whether it fits in a zero page address, otherwise it's a three-byte absolute instruction. That means no more than two passes, at the possible risk of a slightly bigger code than might be required.
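To make the flag propagation concrete, here is a minimal sketch in Python rather than C. The names `Value`, `lookup`, `add` and `operand_size` are hypothetical, and only addition is shown; the other operators would combine the flags the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    number: int
    trusted: bool = True


def lookup(symbols: dict, name: str) -> Value:
    """Return a symbol's value; an undefined symbol yields an untrusted placeholder."""
    if name in symbols:
        return Value(symbols[name], True)
    return Value(0, False)


def add(a: Value, b: Value) -> Value:
    """A result is trusted only if both inputs are trusted."""
    return Value(a.number + b.number, a.trusted and b.trusted)


def operand_size(v: Value) -> int:
    """One byte (zero page) only for a trusted value below 0x100, else two bytes."""
    return 1 if v.trusted and 0 <= v.number < 0x100 else 2
```

With `symbols = {"zp_ptr": 0x20}`, the expression `zp_ptr + 1` gets a one-byte operand, while any expression involving a not-yet-defined symbol falls back to the two-byte absolute form.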

As an aside: HIGH vs HI vs > vs LOW vs LO vs < ?

And logic: not yet implemented by the expression parser, but I think &, ^, and | with the first two having the same precedence as * and / and the latter the same precedence as + and - ?
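A small precedence-climbing evaluator shows what that proposal would mean in practice. This is only a sketch under stated assumptions: decimal literals only, no unary operators, no error handling, and the names `PRECEDENCE`, `APPLY` and `evaluate` are made up for the illustration.

```python
import re

# Proposed grouping: & and ^ bind like * and /, | binds like + and -.
PRECEDENCE = {"*": 2, "/": 2, "&": 2, "^": 2, "+": 1, "-": 1, "|": 1}
APPLY = {
    "*": lambda a, b: a * b, "/": lambda a, b: a // b,
    "&": lambda a, b: a & b, "^": lambda a, b: a ^ b,
    "+": lambda a, b: a + b, "-": lambda a, b: a - b,
    "|": lambda a, b: a | b,
}


def evaluate(text: str) -> int:
    tokens = re.findall(r"\d+|[-+*/&^|()]", text)
    pos = 0

    def primary() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression(1)
            pos += 1                      # skip the closing ")"
            return value
        return int(tok)

    def expression(min_prec: int) -> int:
        nonlocal pos
        left = primary()
        # ")" has no entry in PRECEDENCE, so it ends the loop.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = expression(PRECEDENCE[op] + 1)   # left-associative
            left = APPLY[op](left, right)
        return left

    return expression(1)
```

One consequence worth noting: under this grouping `4 + 6 & 3` evaluates as `4 + (6 & 3)`, whereas C would treat it as `(4 + 6) & 3`, so anyone porting expressions from C-like tools would need parentheses.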

Neil

Druzyek · Post by **Druzyek** » Fri Nov 01, 2019 2:27 pm

John West wrote:

There is one trap to watch out for though - if you have a two pass assembler, and a symbol that's defined after it is used, the first pass will have to assume that it's 16 bit. If it later turns out to fit in 8 bits, you can't change your mind without invalidating every label after that first use...It will never shrink a two byte value to one byte, so it's guaranteed to converge eventually.

This is where I stopped on my assembler project. Hmm, interesting. Do you think it happens often in practice that an 8 bit symbol becomes 16 bit but could be shrunk back to 8 bit after other symbols are resolved? My plan is to do that but fail after a certain number of passes. Another way might be to monitor the sizes of symbols and try to detect the situation where the size oscillates indefinitely. Your way seems reliable and maybe that should be an option or the backup method.

Quote:

As an aside: HIGH vs HI vs > vs LOW vs LO vs < ?

Just my 2 cents but how about < and > since those can be wrapped in a HI or LO macro? Depending on how you have your macros set up, it might not accept < as a macro name that can be mapped to LO.

What language are you using for the assembler? In Python it was really easy to set up functions like the following and let users add their own functions, which is about as powerful as an assembler could ever get (assuming of course you're not averse to having to include a custom Python file in your project source.)

Code:

def left(arg1,arg2): return arg1[0:int(arg2)]
def right(arg1,arg2): return arg1[-int(arg2):]
def hi(arg1): return (int(arg1)>>8)&0xFF
def lo(arg1): return int(arg1)&0xFF
def concat(arg1,arg2): return arg1+arg2
def substr(arg1,arg2,arg3): return arg1[int(arg2):int(arg3)]
def lower(arg1): return arg1.lower()
def upper(arg1): return arg1.upper()
def to_int(arg1): return int(float(arg1))

#text name of function, number of arguments, function
commandlist=[('left',2,left),
            ('right',2,right),
            ('hi',1,hi),
            ('lo',1,lo),
            ('concat',2,concat),
            ('substr',3,substr),
            #('lower',1,lower),
            #Alternately, define the function inline
            ('lower',1,lambda arg1: arg1.lower()),
            ('upper',1,upper),
            #Built in functions
            ('int',1,to_int),
            ('float',1,float),
            #These change type and need to be handled in the main program
            ('alpha',1,0),
            ('str',1,0),
            ('char',1,0),

            #ADD CUSTOM FUNCTIONS HERE:
            #example(x,y) = x+2*y
            ('example',2,lambda x,y:int(x)+int(y)*2)]

John West · Post by **John West** » Fri Nov 01, 2019 4:03 pm

Druzyek wrote:

Do you think it happens often in practice that an 8 bit symbol becomes 16 bit but could be shrunk back to 8 bit after other symbols are resolved?

In practice I would expect it to never happen. In theory, I'm not sure if it's even possible. I remember having a discussion about it with friends back in the day, but don't remember the conclusion. Most of my reason for doing it this way is so I don't have to think about it.

dmsc · Post by **dmsc** » Fri Nov 01, 2019 4:50 pm

Hi!

John West wrote:

Druzyek wrote:

Do you think it happens often in practice that an 8 bit symbol becomes 16 bit but could be shrunk back to 8 bit after other symbols are resolved?

In practice I would expect it to never happen. In theory, I'm not sure if it's even possible. I remember having a discussion about it with friends back in the day, but don't remember the conclusion. Most of my reason for doing it this way is so I don't have to think about it.

Assuming the address rolls over at 16 bits:

Code:

   org $FFFD
   lda X
X:
   brk

Or, if your assembler uses more than 16 bits of addresses, try:

Code:

   org $FFFD
   lda X & $FFFF
X:
   brk
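To see why these snippets never settle if sizes are allowed to shrink, here is a small Python model of the layout. It assumes `lda` is one opcode byte plus a one- or two-byte operand and that addresses wrap at 16 bits; the function names are hypothetical.

```python
def layout(operand_bytes: int, origin: int = 0xFFFD) -> int:
    """Address of label X when `lda X` sits at origin with the given operand size."""
    return (origin + 1 + operand_bytes) & 0xFFFF   # opcode byte plus operand


def needed_bytes(address: int) -> int:
    """One byte if the address is in zero page, otherwise two."""
    return 1 if address < 0x100 else 2


def simulate(passes: int) -> list:
    """Re-pick the operand size on every pass, allowing it to shrink as well as grow."""
    size, history = 1, []
    for _ in range(passes):
        size = needed_bytes(layout(size))
        history.append(size)
    return history
```

With a one-byte operand X lands at $FFFF, which needs two bytes; with a two-byte operand X wraps to $0000, which fits in one. So `simulate(4)` alternates 2, 1, 2, 1. A grow-only rule stops at the two-byte form, which is still valid code because absolute addressing can reach $0000.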

barnacle · Post by **barnacle** » Fri Nov 01, 2019 6:16 pm

Druzyek wrote:

Quote:

As an aside: HIGH vs HI vs > vs LOW vs LO vs < ?

Just my 2 cents but how about < and > since those can be wrapped in a HI or LO macro? Depending on how you have your macros set up, it might not accept < as a macro name that can be mapped to LO.

What language are you using for the assembler? In Python it was really easy to set up functions like the following and let users add their own functions, which is about as powerful as an assembler could ever get (assuming of course you're not averse to having to include a custom Python file in your project source.)

Macros? I wasn't, to be honest, even thinking about including macros. Like I said, old school...

So old school it's written in C. I can just about find my way through a Python script, but not well enough to write an assembler in it.

Neil

GARTHWILSON · Post by **GARTHWILSON** » Sat Nov 02, 2019 8:44 am

I have a request list for anyone who writes an assembler, at http://wilsonminesco.com/AssyDefense/as ... uests.html .

About defining ZP addresses & addressing modes: This topic is relevant: "Assembler that automatically select what to put in ZP"

Rather than stop at two passes, make it keep going until there are no more phase errors. I had a valid situation 20 years ago that required about 30 passes. The amount of time the assembler takes to do the job is not a problem with modern PCs' speed. I don't remember the situation, but it was not 8- versus 16-bit addresses, but rather that there were many forward chained references that depended on each other. Variables should normally be declared before they're encountered anyway, meaning it should be known the first time whether they're in ZP or not.

Quote:

Macros? I wasn't, to be honest, even thinking about including macros. Like I said, old school...

I don't know what you're thinking 'old school' is, but I was introduced to macros in about 1987 by a neighbor who was into digital more than I was, and who had been using them for years. I quickly became a macro junkie.

barnacle · Post by **barnacle** » Sat Nov 02, 2019 9:25 pm

One of those YMMV things, I guess - I've used macros in C, but as far as I can recall *never* in assembler, for any processor. I can see the appeal of a straightforward define/replace text option but I do like to see on the list file exactly what I'm getting. Then again, I'm not intending to create relocatable code.

Old school for me is mid seventies, loading assembler and source code from cassette tape.

Neil

p.s. your 'if you write an assembler' page is open in another window. Those points with which I agree I am implementing.

drogon · Post by **drogon** » Sat Nov 02, 2019 10:25 pm

The first macro assembler I used was PMA - Prime Macro Assembler in 1980. I don't recall what macro assembler I used on the Apple II, but on the BBC Micro, with its 2-pass assembler built into BASIC, you just called a BASIC function/procedure to implement a macro...

Macros, even at the simplest level, are very useful, especially if you're inlining repetitive code. So for example in a VM I'm playing with, many of the instructions I'm interpreting require a copy to take place, so I have a macro:

Code:

.macro  pushAB
        lda     regA+0
        sta     regB+0
        lda     regA+2
        sta     regB+2
.endmacro

This simply copies a value from "regA" to "regB" which could be done via subroutine, but I care more for speed than code density.

Another example - more for the 65816 - is switching modes:

Code:

; n816: e6502
;       Enter Native 65816, or emulated 6502 modes.
;********************************************************************************

.macro  n816
        clc
        xce
.endmacro

.macro  e6502
        sec
        xce
.endmacro

Parametrised macros can be very powerful indeed.

Block move (negative) in the 65816:

Code:

; bmn
;       Block move macro

.macro  bmn     len,from,to
        lda     #len-1
        ldx     #(from & $FFFF)
        ldy     #(to   & $FFFF)
        mvn     (from & $FF0000),(to & $FF0000)
.endmacro

and so on. I don't think I'd be without a macro assembler these days.
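For anyone wondering how little machinery a parametrised macro needs at its simplest, here is a sketch of whole-word textual substitution in Python. It is an assumption about one possible approach, not how any particular assembler does it, and the name `expand_macro` is made up.

```python
import re


def expand_macro(params, body, args):
    """Return the macro body with each formal parameter replaced by its argument."""
    if len(params) != len(args):
        raise ValueError("macro expects %d arguments, got %d" % (len(params), len(args)))
    if not params:
        return list(body)
    mapping = dict(zip(params, args))
    # \b keeps 'to' from matching inside a longer identifier such as 'total'.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]
```

Calling it with the parameters `len, from, to`, the four body lines of `bmn` and the arguments `$100, src, dst` yields `lda     #$100-1` as the first line. The expanded lines would then go through the normal assembly path, which is also what lets the listing show them.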

Of course, writing the assembler is best left to those who know.

Especially when it comes to temporary or local labels that you might need at times...

Cheers,

-Gordon

GARTHWILSON · Post by **GARTHWILSON** » Sun Nov 03, 2019 2:29 am

barnacle wrote:

but I do like to see on the list file exactly what I'm getting.

Every macro assembler I've seen shows exactly what you're getting from the macros unless you tell it not to.

Quote:

Old school for me is mid seventies, loading assembler and source code from cassette tape.

There is some sort of romance in that, like that of steam engines.

Quote:

your 'if you write an assembler' page is open in another window. Those points with which I agree I am implementing.

Well, as I say there, choice is good, and that's what you said in the head post, that you wanted to allow different ways of doing things. Not everyone will want to do things the same way. Also, if they're transferring code written for another assembler, it's nice if it works with minimal modification.

hmn · Post by **hmn** » Sun Nov 03, 2019 7:51 am

barnacle wrote:

... but I do like to see on the list file exactly what I'm getting.

GARTHWILSON wrote:

Every macro assembler I've seen shows exactly what you're getting from the macros unless you tell it not to.

I recently evaluated a couple, and found that this cannot be taken for granted. Out of the five I have looked at:

- xa (xa65) does not do list files at all
- crasm prints only the assembled bytes of an expanded macro in hex (no inlined source or disassembly)
- acme does the same but only shows the first couple of bytes followed by an ellipsis
- ca65 (cc65) only has listing output for its relocatables, so you only see placeholder addresses ("rr rr")

GARTHWILSON · Post by **GARTHWILSON** » Sun Nov 03, 2019 8:05 am

hmn wrote:

barnacle wrote:

... but I do like to see on the list file exactly what I'm getting.

GARTHWILSON wrote:

Every macro assembler I've seen shows exactly what you're getting from the macros unless you tell it not to.

I recently evaluated a couple, and found that this cannot be taken for granted. Out of the five I have looked at:

- xa (xa65) does not do list files at all
- crasm prints only the assembled bytes of an expanded macro in hex (no inlined source or disassembly)
- acme does the same but only shows the first couple of bytes followed by an ellipsis
- ca65 (cc65) only has listing output for its relocatables, so you only see placeholder addresses ("rr rr")

I've used three commercial ones, and they all show, in the .lst file, exactly what the macro expansion produces:

2500AD
Cross-32 (C32) originally from Universal Cross Assemblers
MPASM from Microchip (for PIC microcontrollers, not 65xx)

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Nov 03, 2019 7:49 pm

GARTHWILSON wrote:

I've used three commercial ones, and they all show, in the .lst file, exactly what the macro expansion produces:

2500AD
Cross-32 (C32) originally from Universal Cross Assemblers
MPASM from Microchip (for PIC microcontrollers, not 65xx)

The assembler in the Kowalski simulator also shows the results of macro expansion in the listing file. In fact, all of the 6502-family assemblers I've used do that. In my opinion, a symbolic assembler that doesn't give you the option of displaying all the gory details in the listing is not a real assembler.
