All times are UTC




Post subject: Yet another assembler...
PostPosted: Fri Nov 01, 2019 8:24 am 
Joined: Mon Jan 19, 2004 12:49 pm
Posts: 834
Location: Potsdam, DE
Largely for my own amusement, I am writing a 6502 assembler and I'm trying to incorporate as many 'optional' features as possible, so that it will build as large a code base as possible with a minimum of text changes.

Thus, it will accept (e.g.) db and byte as synonyms, and inputs to db/byte include hex, decimal, and binary in any of the obvious flavours (0xnnnn, 0nnnnh, $nnnn, 0bnnnn, 0nnnnb and so on), as well as 'c' characters or "strings" (there is a slight bug regarding commas still to be resolved!).
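
To make the literal formats concrete, here is a rough Python sketch of a parser for them (the assembler itself is written in C). The name parse_number and the order in which clashing notations are tried are assumptions for illustration only; for instance, this sketch treats 0b11h as hex because the h suffix is checked before the 0b prefix.

```python
def parse_number(text):
    """Parse one numeric literal in the notations listed above.

    Hypothetical helper: the tie-breaking order between notations is an
    assumption of this sketch, not the behaviour of any real assembler.
    """
    raw = text.strip()
    t = raw.lower()
    if len(raw) == 3 and raw[0] == "'" and raw[2] == "'":
        return ord(raw[1])             # 'c' character literal (case preserved)
    if t.startswith("0x"):
        return int(t[2:], 16)          # 0xnnnn
    if t.startswith("$"):
        return int(t[1:], 16)          # $nnnn
    if t.endswith("h"):
        return int(t[:-1], 16)         # 0nnnnh, tried before 0b so 0b11h is hex
    if t.startswith("0b") and len(t) > 2:
        return int(t[2:], 2)           # 0bnnnn
    if t.endswith("b"):
        return int(t[:-1], 2)          # 0nnnnb
    return int(t, 10)                  # plain decimal
```

Anything that fits none of the patterns raises ValueError from int(), which a real assembler would turn into a diagnostic.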

The intent is truly old school; the initial version will accept only 6502N code, with 65C02 options next on the list, and it produces Intel hex, binary blob, and a list file as default outputs - absolute positions, no relocatable code (that'll get some downvotes!). This means it's easy to use for e.g. single board computers without an OS.
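
For readers unfamiliar with the Intel hex output mentioned above, a short Python sketch of the record layout follows. It covers only data records (type 00) and the end-of-file record (type 01); the function names ihex_record and ihex_dump are made up for this example and say nothing about how the C assembler does it.

```python
def ihex_record(address, data, rectype=0x00):
    """Build one Intel HEX record: ':' count, 16-bit address, type, data, checksum."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rectype]) + bytes(data)
    checksum = (-sum(body)) & 0xFF     # two's complement of the sum of all bytes
    return ":" + (body + bytes([checksum])).hex().upper()


def ihex_dump(origin, image, width=16):
    """Yield data records for a contiguous binary image, then the EOF record."""
    for offset in range(0, len(image), width):
        yield ihex_record(origin + offset, image[offset:offset + width])
    yield ihex_record(0, b"", 0x01)    # end-of-file record


if __name__ == "__main__":
    # Three bytes at $8000, e.g. LDA #$00 followed by BRK.
    for line in ihex_dump(0x8000, bytes([0xA9, 0x00, 0x00])):
        print(line)
```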

However, what I can't find - and I've read so many documents on this - is how a zero page address is defined. At the moment I have followed what seems to be the default, automatically choosing zero page if the evaluated expression is both less than 0x100 and the evaluation is trusted; otherwise, it's an absolute.

Any thoughts/preferences regarding this behaviour?

Neil


PostPosted: Fri Nov 01, 2019 9:09 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
It's normal, but not ideal, to detect ZP as you say. Some assemblers have some syntax to help - but of course every assembler is different. It looks like ca65 has some idea of the 'size' of a value, and a byte-sized value is an appropriate ZP address. Other heuristics too:
https://cc65.github.io/doc/ca65.html#ss5.2


PostPosted: Fri Nov 01, 2019 9:58 am 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 324
barnacle wrote:
However, what I can't find - and I've read so many documents on this - is how a zero page address is defined. At the moment I have followed what seems to be the default, automatically choosing zero page if the evaluated expression is both less than 0x100 and the evaluation is trusted; otherwise, it's an absolute.

Any thoughts/preferences regarding this behaviour?


That's very common, in my experience. There is one trap to watch out for though - if you have a two pass assembler, and a symbol that's defined after it is used, the first pass will have to assume that it's 16 bit. If it later turns out to fit in 8 bits, you can't change your mind without invalidating every label after that first use.

The assembler for my 65020 does things the complicated way - it will do as many passes as it needs for the symbol values to become stable. It starts out assuming one byte for every symbol. If that assumption is incorrect, it will expand to two bytes and trigger another pass. It will never shrink a two byte value to one byte, so it's guaranteed to converge eventually. This allows forward branches to use one byte offsets most of the time, and two bytes when necessary. It was a lot easier to implement than I feared it might be.
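
A minimal Python sketch of that grow-only scheme, for illustration. This is not the 65020 assembler's code: the toy program format (a list of 'label', 'op' and 'pad' items) and the name assemble_sizes are invented here, and every instruction is assumed to be one opcode byte plus a one- or two-byte operand.

```python
def assemble_sizes(program, origin=0):
    """Return (labels, operand_sizes, passes) for a toy program.

    program is a list of ('label', name), ('op', name) or ('pad', n) items;
    the representation is invented for this sketch.
    """
    # Start out assuming every operand fits in one byte.
    sizes = {i: 1 for i, item in enumerate(program) if item[0] == "op"}
    passes = 0
    while True:
        passes += 1
        # Lay out the code using the current size assumptions.
        labels, pc = {}, origin
        for i, (kind, arg) in enumerate(program):
            if kind == "label":
                labels[arg] = pc
            elif kind == "op":
                pc += 1 + sizes[i]      # opcode byte + operand bytes
            else:                       # 'pad': n bytes of data
                pc += arg
        # Grow any operand that no longer fits; never shrink one.
        grew = False
        for i, (kind, arg) in enumerate(program):
            if kind == "op" and sizes[i] == 1 and labels[arg] > 0xFF:
                sizes[i] = 2
                grew = True
        if not grew:
            return labels, sizes, passes
```

Because a size can only go from 1 to 2, the loop runs at most one pass per instruction plus one. With origin $FB and the program [('op','far'), ('op','near'), ('label','near'), ('pad',1), ('label','far')], growing the first operand pushes near over the page boundary, so the sketch needs three passes.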


PostPosted: Fri Nov 01, 2019 1:38 pm 

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 834
Location: Potsdam, DE
This is the approach I've taken. My expression evaluation is looking at both literal values and symbol values, and the symbol value, if not yet defined, returns an 'untrusted' flag as part of its return value. That flag is propagated all the way up the expression chain, so when I eventually get the value for the target address I know whether I can trust it. (I can also set a symbol as untrusted, if it uses a forward reference).

Only if I can trust it do I decide whether it fits in a zero page address; otherwise it's a three-byte absolute instruction. That means no more than two passes, at the possible risk of slightly bigger code than strictly required.
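
In Python terms (the real thing is in C), the idea looks roughly like the sketch below. The names Value, lookup, add and operand_bytes are hypothetical, and only addition is shown; every other operator would combine the flags the same way.

```python
from dataclasses import dataclass


@dataclass
class Value:
    number: int
    trusted: bool = True


def lookup(symbols, name):
    """An undefined (forward) symbol yields a placeholder marked untrusted."""
    if name in symbols:
        return symbols[name]
    return Value(0, trusted=False)


def add(a, b):
    """Any untrusted input makes the whole result untrusted."""
    return Value(a.number + b.number, a.trusted and b.trusted)


def operand_bytes(value):
    """Zero page only when the value is trusted and fits in one byte."""
    return 1 if value.trusted and 0 <= value.number < 0x100 else 2
```

So zp_var+1 with zp_var already defined as $20 gets a one-byte operand, while later+1 with later not yet defined falls back to two bytes even though the placeholder value happens to be small.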

As an aside: HIGH vs HI vs > vs LOW vs LO vs < ?

And logic: not yet implemented by the expression parser, but I think &, ^, and | with the first two having the same precedence as * and / and the latter the same precedence as + and - ?
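
To show what that grouping would mean in practice, here is a small precedence-climbing evaluator in Python. It is only a sketch under assumptions: the names are invented, it handles decimal and $hex literals and parentheses, and it has no unary operators or error handling. Note that with these levels 6 ^ 3 + 1 evaluates as (6 ^ 3) + 1, which differs from C, where + binds tighter than ^.

```python
import re

# Binding strength as proposed above: & and ^ with * and /, | with + and -.
PRECEDENCE = {"*": 2, "/": 2, "&": 2, "^": 2, "+": 1, "-": 1, "|": 1}
APPLY = {
    "*": lambda a, b: a * b, "/": lambda a, b: a // b,
    "&": lambda a, b: a & b, "^": lambda a, b: a ^ b,
    "+": lambda a, b: a + b, "-": lambda a, b: a - b,
    "|": lambda a, b: a | b,
}
TOKEN = re.compile(r"\s*(\$[0-9A-Fa-f]+|\d+|[-+*/&^|()])")


def evaluate(text):
    """Evaluate an integer expression by precedence climbing."""
    tokens = TOKEN.findall(text)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression(1)
            pos += 1                     # skip the closing ')'
            return value
        return int(tok[1:], 16) if tok.startswith("$") else int(tok)

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = expression(PRECEDENCE[op] + 1)   # makes operators left-associative
            left = APPLY[op](left, right)
        return left

    return expression(1)
```

Changing the grouping later only means editing the PRECEDENCE table.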

Neil


PostPosted: Fri Nov 01, 2019 2:27 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
John West wrote:
There is one trap to watch out for though - if you have a two pass assembler, and a symbol that's defined after it is used, the first pass will have to assume that it's 16 bit. If it later turns out to fit in 8 bits, you can't change your mind without invalidating every label after that first use...It will never shrink a two byte value to one byte, so it's guaranteed to converge eventually.
This is where I stopped on my assembler project. Hmm, interesting. Do you think it happens often in practice that an 8 bit symbol becomes 16 bit but could be shrunk back to 8 bit after other symbols are resolved? My plan is to do that but fail after a certain number of passes. Another way might be to monitor the sizes of symbols and try to detect the situation where the size oscillates indefinitely. Your way seems reliable, and maybe that should be an option or the backup method.

Quote:
As an aside: HIGH vs HI vs > vs LOW vs LO vs < ?
Just my 2 cents but how about < and > since those can be wrapped in a HI or LO macro? Depending on how you have your macros set up, it might not accept < as a macro name that can be mapped to LO.

What language are you using for the assembler? In Python it was really easy to set up functions like the following and let users add their own functions, which is about as powerful as an assembler could ever get (assuming of course you're not averse to having to include a custom Python file in your project source.)

Code:
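# Helper functions callable from source; arguments are converted with int() where a number is needed
def left(arg1,arg2): return arg1[0:int(arg2)]
# Slice from an explicit start index so that right(s,0) gives '' instead of the whole string
def right(arg1,arg2): return arg1[max(len(arg1)-int(arg2),0):]
def hi(arg1): return (int(arg1)>>8)&0xFF    # high byte of a 16-bit value
def lo(arg1): return int(arg1)&0xFF         # low byte
def concat(arg1,arg2): return arg1+arg2
def substr(arg1,arg2,arg3): return arg1[int(arg2):int(arg3)]
def lower(arg1): return arg1.lower()
def upper(arg1): return arg1.upper()
def to_int(arg1): return int(float(arg1))   # accepts '3.7' as well as '3'; truncates toward zero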

#text name of function, number of arguments, function
commandlist=[('left',2,left),
            ('right',2,right),
            ('hi',1,hi),
            ('lo',1,lo),
            ('concat',2,concat),
            ('substr',3,substr),
            #('lower',1,lower),
            #Alternately, define the function inline
            ('lower',1,lambda arg1: arg1.lower()),
            ('upper',1,upper),
            #Built in functions
            ('int',1,to_int),
            ('float',1,float),
            #These change type and need to be handled in the main program
            ('alpha',1,0),
            ('str',1,0),
            ('char',1,0),

            #ADD CUSTOM FUNCTIONS HERE:
            #example(x,y) = x+2*y
            ('example',2,lambda x,y:int(x)+int(y)*2)]
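
A table in this shape might be consulted along the following lines. This is a guess at one way to do it, not the code from the actual project: call_function and demo_table are names invented for the example, and entries whose function slot is 0 are simply rejected here rather than handled by the main program.

```python
def call_function(table, name, args):
    """Look up `name` in a (name, argcount, function) table and apply it."""
    for entry_name, argcount, func in table:
        if entry_name == name:
            if len(args) != argcount:
                raise ValueError(f"{name} expects {argcount} argument(s)")
            if not callable(func):
                raise NotImplementedError(f"{name} is handled elsewhere")
            return func(*args)
    raise KeyError(name)


# A small table in the same shape as commandlist above.
demo_table = [("hi", 1, lambda a: (int(a) >> 8) & 0xFF),
              ("lo", 1, lambda a: int(a) & 0xFF),
              ("str", 1, 0)]
```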


PostPosted: Fri Nov 01, 2019 4:03 pm 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 324
Druzyek wrote:
Do you think it happens often in practice that an 8 bit symbol becomes 16 bit but could be shrunk back to 8 bit after other symbols are resolved?


In practice I would expect it to never happen. In theory, I'm not sure if it's even possible. I remember having a discussion about it with friends back in the day, but don't remember the conclusion. Most of my reason for doing it this way is so I don't have to think about it :-)


PostPosted: Fri Nov 01, 2019 4:50 pm 

Joined: Mon Sep 17, 2018 2:39 am
Posts: 136
Hi!

John West wrote:
Druzyek wrote:
Do you think it happens often in practice that an 8 bit symbol becomes 16 bit but could be shrunk back to 8 bit after other symbols are resolved?


In practice I would expect it to never happen. In theory, I'm not sure if it's even possible. I remember having a discussion about it with friends back in the day, but don't remember the conclusion. Most of my reason for doing it this way is so I don't have to think about it :-)


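Assuming the address wraps around at 64K, it can happen: if 'lda X' is assembled as zero page (2 bytes), X lands at $FFFF and needs an absolute operand; assembled as absolute (3 bytes), X wraps to $0000 and fits in zero page again: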

Code:
   org $FFFD
   lda X
X:
   brk


Or, if your assembler uses more than 16 bits of addresses, try:
Code:
   org $FFFD
   lda X & $FFFF
X:
   brk
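
The first example can be simulated in a few lines of Python to compare a "pick whatever fits" rule with a grow-only rule. The function names are invented for this sketch, and it models only the single lda X instruction at $FFFD with 16-bit address wrap.

```python
def operand_size(target):
    """One operand byte for a zero page target, two otherwise."""
    return 1 if target < 0x100 else 2


def trace_sizes(grow_only, max_passes=6):
    """Operand sizes chosen for `lda X` at $FFFD on successive passes."""
    size, history = 1, []
    for _ in range(max_passes):
        history.append(size)
        x = (0xFFFD + 1 + size) & 0xFFFF     # address of X, wrapping at 64K
        wanted = operand_size(x)
        new_size = max(size, wanted) if grow_only else wanted
        if new_size == size:
            break                            # stable: no further pass needed
        size = new_size
    return history
```

Here trace_sizes(False) never settles and runs until max_passes, while trace_sizes(True) stops after the operand has grown once, leaving an absolute-mode lda of address $0000, which is legal if slightly wasteful.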


PostPosted: Fri Nov 01, 2019 6:16 pm 

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 834
Location: Potsdam, DE
Druzyek wrote:

Quote:
As an aside: HIGH vs HI vs > vs LOW vs LO vs < ?
Just my 2 cents but how about < and > since those can be wrapped in a HI or LO macro? Depending on how you have your macros set up, it might not accept < as a macro name that can be mapped to LO.

What language are you using for the assembler? In Python it was really easy to set up functions like the following and let users add their own functions, which is about as powerful as an assembler could ever get (assuming of course you're not averse to having to include a custom Python file in your project source.)


Macros? I wasn't, to be honest, even thinking about including macros. Like I said, old school...

So old school it's written in C. I can just about find my way through a Python script, but not well enough to write an assembler in it.

Neil


PostPosted: Sat Nov 02, 2019 8:44 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8508
Location: Southern California
I have a request list for anyone who writes an assembler, at http://wilsonminesco.com/AssyDefense/as ... uests.html .

About defining ZP addresses & addressing modes: This topic is relevant: "Assembler that automatically select what to put in ZP"

Rather than stop at two passes, make it keep going until there are no more phase errors. I had a valid situation 20 years ago that required about 30 passes. The amount of time the assembler takes to do the job is not a problem with modern PCs' speed. I don't remember the situation, but it was not 8- versus 16-bit addresses, but rather that there were many forward chained references that depended on each other. Variables should normally be declared before they're encountered anyway, meaning it should be known the first time whether they're in ZP or not.

Quote:
Macros? I wasn't, to be honest, even thinking about including macros. Like I said, old school...

I don't know what you're thinking 'old school' is, but I was introduced to macros in about 1987 by a neighbor who was into digital more than I was, and who had been using them for years. I quickly became a macro junkie.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sat Nov 02, 2019 9:25 pm 

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 834
Location: Potsdam, DE
One of those YMMV things, I guess - I've used macros in C, but as far as I can recall *never* in assembler, for any processor. I can see the appeal of a straightforward define/replace text option but I do like to see on the list file exactly what I'm getting. Then again, I'm not intending to create relocatable code.

Old school for me is mid seventies, loading assembler and source code from cassette tape.

Neil

p.s. your 'if you write an assembler' page is open in another window. Those points with which I agree I am implementing :)


PostPosted: Sat Nov 02, 2019 10:25 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1465
Location: Scotland
The first macro assembler I used was PMA - Prime Macro Assembler - in 1980. I don't recall what macro assembler I used on the Apple II, but on the BBC Micro, with its 2-pass assembler built into BASIC, you just called a BASIC function/procedure to implement a macro...

Macros, even at the simplest level, are very useful, especially if you're in-lining repetitive code. So for example in a VM I'm playing with, many of the instructions I'm interpreting require a copy to take place, so I have a macro:

Code:
.macro  pushAB
        lda     regA+0
        sta     regB+0
        lda     regA+2
        sta     regB+2
.endmacro


This simply copies a value from "regA" to "regB" which could be done via subroutine, but I care more for speed than code density.

Another example - more for the 65816 is switching modes:

Code:
; n816: e6502
;       Enter Native 65816, or emulated 6502 modes.
;********************************************************************************

.macro  n816
        clc
        xce
.endmacro

.macro  e6502
        sec
        xce
.endmacro


Parametrised macros can be very powerful indeed.

Block move (negative) on the 65816:

Code:
; bmn
;       Block move macro

.macro  bmn     len,from,to
        lda     #len-1
        ldx     #(from & $FFFF)
        ldy     #(to   & $FFFF)
        mvn     (from & $FF0000),(to & $FF0000)
.endmacro


and so on. I don't think I'd be without a macro assembler these days.
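
For anyone wondering how little machinery a parametrised macro needs inside an assembler, here is a rough Python sketch of whole-word parameter substitution. It is an illustration only and not how ca65 or any other real assembler implements macros; expand_macro and its arguments are invented names.

```python
import re


def expand_macro(params, body, args):
    """Return the body lines with each parameter name replaced by its argument."""
    if len(args) != len(params):
        raise ValueError("wrong number of macro arguments")
    if not params:
        return list(body)                # nothing to substitute
    mapping = dict(zip(params, args))
    # Match parameter names only as whole words, in a single pass per line.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]


if __name__ == "__main__":
    body = ["lda #len-1", "ldx #(from & $FFFF)", "ldy #(to   & $FFFF)"]
    for line in expand_macro(["len", "from", "to"], body, ["256", "src", "dst"]):
        print(line)
```

The expanded lines can then be fed through the normal assembly path and echoed to the listing, so the list file shows exactly what was generated.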

Of course, writing the assembler is best left to those who know ;-) Especially when it comes to temporary or local labels that you might need at times...

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Sun Nov 03, 2019 2:29 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8508
Location: Southern California
barnacle wrote:
but I do like to see on the list file exactly what I'm getting.

Every macro assembler I've seen shows exactly what you're getting from the macros unless you tell it not to.

Quote:
Old school for me is mid seventies, loading assembler and source code from cassette tape.

There is some sort of romance in that, like that of steam engines. :)

Quote:
your 'if you write an assembler' page is open in another window. Those points with which I agree I am implementing :)

Well, as I say there, choice is good, and that's what you said in the head post, that you wanted to allow different ways of doing things. Not everyone will want to do things the same way. Also, if they're transferring code written for another assembler, it's nice if it works with minimal modification.



PostPosted: Sun Nov 03, 2019 7:51 am 

Joined: Sun May 07, 2017 3:59 pm
Posts: 20
barnacle wrote:
... but I do like to see on the list file exactly what I'm getting.

GARTHWILSON wrote:
Every macro assembler I've seen shows exactly what you're getting from the macros unless you tell it not to.

I recently evaluated a couple, and found that this cannot be taken for granted. Out of the five I have looked at:

- xa (xa65) does not do list files at all
- crasm prints only the assembled bytes of an expanded macro in hex (no inlined source or disassembly)
- acme does the same but only shows the first couple of bytes followed by an ellipsis
- ca65 (cc65) only has listing output for its relocatables, so you only see placeholder addresses ("rr rr")


PostPosted: Sun Nov 03, 2019 8:05 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8508
Location: Southern California
hmn wrote:
barnacle wrote:
... but I do like to see on the list file exactly what I'm getting.

GARTHWILSON wrote:
Every macro assembler I've seen shows exactly what you're getting from the macros unless you tell it not to.

I recently evaluated a couple, and found that this cannot be taken for granted. Out of the five I have looked at:

- xa (xa65) does not do list files at all
- crasm prints only the assembled bytes of an expanded macro in hex (no inlined source or disassembly)
- acme does the same but only shows the first couple of bytes followed by an ellipsis
- ca65 (cc65) only has listing output for its relocatables, so you only see placeholder addresses ("rr rr")

I've used three commercial ones, and they all show, in the .lst file, exactly what the macro expansion produces:

  • 2500AD
  • Cross-32 (C32) originally from Universal Cross Assemblers
  • MPASM from Microchip (for PIC microcontrollers, not 65xx)



PostPosted: Sun Nov 03, 2019 7:49 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8380
Location: Midwestern USA
GARTHWILSON wrote:
I've used three commercial ones, and they all show, in the .lst file, exactly what the macro expansion produces:

  • 2500AD
  • Cross-32 (C32) originally from Universal Cross Assemblers
  • MPASM from Microchip (for PIC microcontrollers, not 65xx)

The assembler in the Kowalski simulator also shows the results of macro expansion in the listing file.  In fact, all of the 6502-family assemblers I've used do that.  In my opinion, a symbolic assembler that doesn't give you the option of displaying all the gory details in the listing is not a real assembler.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Wed Aug 21, 2024 2:14 am, edited 1 time in total.
