All times are UTC




PostPosted: Sat Jan 19, 2008 1:52 am 
Joined: Tue Dec 30, 2003 10:35 am
Posts: 42
65xx assemblers differentiate between immediate and address operands by the # character, rather than the instruction's mnemonic:
Code:
lda #10 ; A = the number 10
lda 10  ; A = byte at address 10

Since # is not required in other common computer languages or general writing, it is easy to forget. The value at the unintended address might only rarely cause problems, making the bug hard to track down. My solution is to have the assembler track whether an expression is an address or a numeric value and give a warning when a number is used where an address is expected:
Code:
const = 10

lda #const       ; OK
lda 10           ; warning
lda const        ; warning

Sometimes a numeric constant must be used as an address, for example a hardware I/O register. To solve this, I added a special address constant named "ADDR":
Code:
ADDR = 0         ; treated specially by patched ca65, normally by others
ioreg = ADDR+$FE ; ioreg is an address, not just a number

lda ioreg        ; OK
lda 10+ADDR      ; OK

To allow source compatibility with other assemblers, you must define ADDR as shown. My ca65 extension knows to treat the assignment specially, while other assemblers will just treat it as a normal constant of zero. Suggestions on a better name are welcome, since this is likely to clash.

To reduce unnecessary warnings, I made a few other tweaks:
Code:
const  = $12
zpaddr = ADDR+$12
addr   = ADDR+$123
label:

lda <12      ; OK, often used for quick nameless temporaries
lda $12,x    ; OK, indexed modes always accept numeric expressions
lda const,y  ; OK
sta const    ; OK, since sta never accepts immediate anyway
sta $1234    ; OK
lda <zpaddr  ; OK, since you might want to emphasize it's zero page
lda <addr    ; warning, since addr is more than 8 bits
lda addr*2   ; warning, not an addr
lda #<label  ; OK, result of < operator is numeric

For normal assembly coding, where the assembler is given the task of putting things in memory and assigning addresses, about the only changes needed to avoid warnings with this extension are where hardware addresses are defined, like I/O locations.

Here are the modified sources, a readme, and test file: cc65-2.11.0-addr-type.tgz

I just coded it today, and wasn't very familiar with ca65 before this, so expect problems. Right now I mainly want to experiment with the implementation to find the best design. Please add your ideas about this feature and how to improve it.


PostPosted: Sat Jan 19, 2008 5:08 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8422
Location: Southern California
I've never had this problem, although it might just be because my programming background is different. Who knows. Anyway, it might be best to make such warnings some kind of option. If they help someone, terrific; but I personally would turn off the option.


PostPosted: Sat Jan 19, 2008 6:34 am
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
If you come from an Intel or Zilog background, this will definitely bite you. That being said, I would:

* Make this an option that you can turn off, and,
* Add an operator to the equate expression for when we explicitly want an address.

E.g.:

Code:
const = 10
aLongAddr = l:$DEFACE
anAddress = a:$CDEF
aDirectPage = z:$AD

This way, we can leave the option on, and be able to give the assembler enough information to know that $DEFACE is *intended* to be an address, not a constant.

Just an idea.


PostPosted: Sat Jan 19, 2008 7:12 am
Joined: Tue Dec 30, 2003 10:35 am
Posts: 42
Quote:
Make this an option that you can turn off

Currently it's --addr-type to turn it on, off by default.

Quote:
Add an operator to the equate expression for when we explicitly want an address.

Spelled ADDR+ (or +ADDR if you like a suffix). Introducing new syntax makes one's source dependent on the new feature, while adding the zero constant ADDR allows full compatibility with other assemblers. As mentioned, the name of this symbol could use some work.

EDIT: Another way to describe this in terms of the C language is that a literal is like an integral constant, where you can't do *0x12. ADDR is like a char* with the value 0, so if you do *(0x12+ADDR) then you can access address 0x12. This way, I don't need to implement an equivalent to a typecast (char*) 0x12, I just implement the magical constant ADDR.

I also realized that in non-65816 mode, I can allow things like LDA $1234 (or any literal constant >= $100) without warning, since you wouldn't otherwise be meaning to load that as an immediate. This would further reduce the already few changes required to accommodate this feature.


PostPosted: Sat Jan 19, 2008 6:49 pm
Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
GARTHWILSON wrote:
I've never had this problem, although it might just be because my programming background is different.


That was my first thought too. I can't recall ever having left off the #.

blargg wrote:
Introducing new syntax makes one's source dependent on the new feature, while adding the zero constant ADDR allows full compatibility with other assemblers.


There's probably a 50% chance that an assembler will use DB and DW instead of .byte and .word, so assembles-on-any-old-dog is a bit of a pipe dream. If one's assembler is extensible, or has good macro support, or the source code is available, it may be possible to support multiple syntaxes without too much trouble. Still, IMO just write the assembly and set up the syntax so that it's as clear as possible. EQUs are usually all together at the beginning of the source code file, so that's relatively painless as far as syntax conversion goes.

Besides, unless you comment what the purpose of ADDR+ is in every source file it's used in, it could still cause confusion. Someone might wonder if it's an I/O address, or if it's for some sort of weird zero page mapping. Without a comment, no one's going to remember that it's for assembler warnings, since most assemblers don't do that. If you use a really long, obnoxious name that won't conflict with anything else, like say, A_D_D_R_E_S_S_0_0_0_0+ you can do an ordinary find-and-replace-with-nothing with any old text editor before passing your source code on, to make it more generic.


PostPosted: Sat Jan 19, 2008 7:34 pm
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
That's true only for the first time someone sees ADDR+ notation. You'll find that as its popularity spreads, more and more folks will cease to insert comments.

Contrast typical C code found for, say, GEM applications in 1985 against today's modern C code. You'll find substantially fewer comments of the sort, "This macro makes XYZ easier" or "This is for ANSI compatibility."


PostPosted: Sat Jan 19, 2008 8:50 pm
Joined: Tue Dec 30, 2003 10:35 am
Posts: 42
The main reason to avoid new syntax, at least right now, is that code will continue to work on the standard ca65 at the very least, with the patched ca65 serving as an optional lint-like assembler.

Not to be picky, but isn't it difficult to be sure that one has never made this error in all 65xx coding? After all, the point of this is that the error can easily go unnoticed if the inadvertent address supplies the proper value most of the time.

This feature might be most beneficial to new 65xx programmers, as one pretty quickly gets in the habit of # (though I occasionally make the error and notice sometime later). My goal is to make the cost of using this warning low enough that catching those few missing # makes it worthwhile to anyone.

Anyway, I eliminated the need for ADDR in virtually all non-65816 code by not warning when a literal address is >= $100.
Code:
lda $1234       ; OK on 6502, warning on 65816
lda $1234+ADDR  ; OK on all

Here's the updated full patch: cc65-2.11.0-addr-type2.tgz


PostPosted: Sat Jan 19, 2008 10:39 pm
Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
blargg wrote:
Not to be picky, but isn't it difficult to be sure that one has never made this error in all 65xx coding? After all, the point of this is that the error can easily go unnoticed if the inadvertent address supplies the proper value most of the time.


Is it possible that I've made that error at some point? Sure. But it's so rare that I don't ever remember doing so, and there was never one of those moments where I thought "I wish the assembler would check for this sort of thing". I make my share of typos, logic errors, and other silly mistakes, but forgetting the # isn't a common one for me.

Would I be willing to bet a case of beer that if you looked through all the 6502 assembly source code I've ever written, including things I've never tested, you wouldn't find that error anywhere? Yes I would. Granted, that means I might be springing for a case of beer, but I figure I've got a sporting chance here. :)

I've entered 6502 code as raw hex (short, hand assembled routines, or a patch as I'm testing something, etc.) from time to time and I'll type something like A5 (LDA zp) instead of A9 (LDA imm). It's the same effect, but since it's object code, not source code, there's no # to be missing. This is usually pretty easy to spot and fix, because it almost always behaves bizarrely. The odds are 255 to 1 against the value in the memory location being the correct one. Plus, the first thing I'm thinking in that case is "did I enter the right opcode?"

Don't get me wrong, I agree that if you actually do make this sort of mistake and forget a # in the middle of some long routine, it can be difficult to find the problem.

blargg wrote:
My goal is to make the cost of using this warning low enough that catching those few missing # makes it worthwhile to anyone.


And therein lies the rub. Having to state that this label is an address, and that label is an address, and this other label is an address, and that other label is an address, and so on seems kinda tedious to me.

Hmm...come to think of it, maybe you could add a couple of new directives to say "here's a list of addresses", then do something along the lines of (I don't know the ca65 directives/syntax off the top of my head, so just play along :)):

Code:
DEBUG_ADDR =   1
           .if DEBUG_ADDR
           .addrs
           .endif
IOBASE     =   $8000
IO1        =   IOBASE+0
IO2L       =   IOBASE+1
IO2H       =   IOBASE+2
IO3        =   IOBASE+3
           .if DEBUG_ADDR
           .endaddrs
           .endif


To (hopefully) keep the standard ca65 (and other assemblers) from choking on the new .addrs and .endaddrs directives, just change DEBUG_ADDR to 0. If that doesn't work, you'll still only have to comment out two lines total, no matter how many labels are addresses. That should cover a lot of cases. Or you could do the reverse and make any label an address, unless it's specifically designated as a constant within .consts and .endconsts directives.


PostPosted: Sat Jan 19, 2008 10:58 pm
Joined: Tue Dec 30, 2003 10:35 am
Posts: 42
Quote:
Having to state that this label is an address, and that label is an address, and this other label is an address, and that other label is an address, and so on seems kinda tedious to me.

I must have utterly failed in my presentation, because for the most part it's automatic. With the second patch, the only place that requires any work on the programmer's part is when coding literal memory addresses for non-indexed reads (and for non-65816 targets, only addresses < $100). Otherwise it's fully automatic.
Quote:
Hmm...come to think of it, maybe you could add a couple of new directives to say "here's a list of addresses"

This will work with either patch:
Code:
IOBASE     =   $8000+ADDR
IO1        =   IOBASE+0
IO2L       =   IOBASE+1
IO2H       =   IOBASE+2
IO3        =   IOBASE+3

And again, this is only necessary on the 65816; for other members, my second patch makes it unnecessary since a 16-bit value would never have been intended as an immediate value anyway.
Quote:
Or you could do the reverse and make any label an address, unless it's specifically designated as a constant within .consts and .endconsts directives.

Terminology clarification: when I see "label", I think of something with a colon after it, which is automatically an address with my patch. Do you mean simply "named value" when you say "label"?
Quote:
Is it possible that I've made that error at some point? Sure. But it's so rare that I don't ever remember doing so, and there was never one of those moments where I thought "I wish the assembler would check for this sort of thing".

I have to concede that it's fairly obscure. I guess it originally bothered me that the assembler has most of the semantic information to catch this error, but throws it away.


PostPosted: Sun Jan 20, 2008 3:35 am
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
dclxvi wrote:
And therein lies the rub. Having to state that this label is an address, and that label is an address, and this other label is an address, and that other label is an address, and so on seems kinda tedious to me.


Is it any more problematic than having to type "T *" (for any T you desire) in C or C++ every time?


PostPosted: Tue Jan 22, 2008 1:29 am
Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
blargg wrote:
I must have utterly failed in my presentation, because for the most part it's automatic. With the second patch, the only place that requires any work on the programmer's part is when coding literal memory addresses for non-indexed reads (and for non-65816 targets, only addresses < $100). Otherwise it's fully automatic.


Speaking of mistakes, I really should read the thread more carefully. :) Let's see...if you're just marking EQUs whose value is an address < 256, to distinguish them from EQU constants < 256, there are at worst only 256 zp addresses. Some addresses are defined in terms of others (e.g. PTRH = PTRL+1; that was a really bad example in my previous post). So by marking a handful or two of zero page addresses specially, you should be able to at least catch some common cases of missing #s...okay, I'll agree that's a reasonable claim.

So, this would be assuming that:

Code:
buffer1  =   $1000    ; address
buf1_len =   384      ; constant
         LDA #buffer1
         LDY #>buffer1
         STA ptr
         STY ptr+1
         LDA #buf1_len
         LDY #>buf1_len
         STA count
         STY count+1


would give warnings because the LDAs don't have a < (lo byte) operator, thus missing #s with a < operator would be LDA <buffer1 and LDA <buf1_len (a warning according to the original post) rather than LDA buffer1 and LDA buf1_len (without the < operator). Otherwise, wouldn't you have to mark buffer1 as being an address?

By the way, if that assumption is correct (and if it isn't, never mind me here), would LDX #-2 give a warning since the -2 using 16-bit arithmetic is $FFFE, or is something like that (it's clearly a constant) handled differently? (The point here is not to catch a missing #, but whether you have to use something like #<-2 instead of #-2). A couple of places where this is used from time to time are with loops that count up, rather than down, and with functions that return a -1 ($FF), 0, or 1. On the 65816, LDX #-10 might represent either $F6 or $FFF6 depending on whether 8-bit or 16-bit immediate data is being assembled. Truncation of the high byte(s) for 8-bit immediate data is kinda convenient for that.
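
To make the truncation concrete, here is a small Python sketch (not ca65 code) of how a negative constant reduces to an 8-bit or 16-bit immediate. The helper name immediate_bytes is made up for illustration, and a real assembler may range-check the value before truncating it:

```python
def immediate_bytes(value, width_bits):
    """Truncate a signed constant to an unsigned immediate of the given width.

    Hypothetical helper for illustration only; it models two's-complement
    truncation and nothing else.
    """
    mask = (1 << width_bits) - 1
    return value & mask


print(f"${immediate_bytes(-2, 8):02X}")     # -2 as an 8-bit immediate
print(f"${immediate_bytes(-2, 16):04X}")    # -2 as a 16-bit immediate
print(f"${immediate_bytes(-10, 8):02X}")    # -10 as an 8-bit immediate
print(f"${immediate_bytes(-10, 16):04X}")   # -10 as a 16-bit immediate
```

The same source text (#-10) therefore yields a different operand depending only on the register width being assembled for, which is why a check that looks at the 16-bit value of the expression could misfire on small negative constants.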

There are some other situations that are more complicated on the 65816, involving the D register. One example is with the TSC-TCD technique (see the parameter passing article on the wiki), where an instruction like LDA 5 can be used to access parameters on the stack (there are instances of this in the example code of the wiki article), so you really do mean LDA 5 and not LDA #5. So wouldn't you have to convert them all to ADDR+5 etc. (or use something like FRAME = ADDR+0 and convert them all to FRAME+5 etc.) for missing # detection?

blargg wrote:
Terminology clarification: when I see "label", I think of something with a colon after it, which is automatically an address with my patch. Do you mean simply "named value" when you say "label"?


I've been sloppy with terms here. Usually by "label" I mean anything in the leftmost column (except comments), thus in:

Code:
; comment
LABEL1 =     123
LABEL2 LDA   #0
       TAY
LABEL3
LABEL4 .byte $AB


LABEL1, LABEL2, LABEL3, and LABEL4 are all labels. However, LABEL2, LABEL3, and LABEL4 are all clearly addresses, so in this thread, without explicitly saying so, I have been using "label" to refer to the case of LABEL1, since that is an ambiguous case, as it's not specified as an address or as a constant.

kc5tja wrote:
Is it any more problematic than having to type "T *" (for any T you desire) in C or C++ every time?


No, but given the choice between an assembler where you have to indicate types and an assembler where you don't, I'd choose the latter. That's just my personal preference. It doesn't make the program any smaller or run any faster; the object code is going to be the same in either case. It's just there as a sanity check for the programmer. Typed assemblers have always struck me as overkill. Again, IMO.


PostPosted: Tue Jan 22, 2008 6:20 am
Joined: Tue Dec 30, 2003 10:35 am
Posts: 42
Quote:
Let's see...if you're just marking EQUs whose value is an address < 256, to distinguish them from EQU constants < 256, there are at worst only 256 zp addresses. Some addresses are defined in terms of others (e.g. PTRH = PTRL+1; that was a really bad example in my previous post).

This is assuming you manually allocate zero-page, rather than just use .zeropage and let the assembler do it. For a list of addresses, I do like your idea of an "address list" mode.
Quote:
So, this would be assuming that [...] would give warnings [...]

No warnings, actually (some errors if registers are in 8-bit mode, but that's on any assembler). I disabled checking for using an address as an immediate, since that can be fairly common. Here's a summary of the operator result types and the cases where a warning is issued. The rules have changed slightly since the first post, and I didn't state them as precisely as this before:
Code:
addr <n        where n < $100 (including negative)
val  <n        where n >= $100

addr addr+val
addr addr+addr
addr val+addr
val  val+val

addr addr-val
val  addr-addr
val  val-addr

val  all other operators

Warning if all of the following hold:
- Operand is val
- Non-indexed addressing mode
- Instruction also supports immediate mode
- 65816 target OR operand < $100 (including negative)

So for the stack-pointer-in-direct-page-register example, you'd just do LDA <5 to use direct-page addressing mode. You'd only need ADDR if you wanted absolute addressing mode for something like LDA $1234+ADDR (again, +ADDR is only necessary on 65816, since only there could you ever have reasonably meant to load A with a 16-bit immediate value).
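
The rules above can be sketched in a few lines of Python. This is only an illustrative model, not the actual patch (which lives in ca65's C sources); the names Expr, lit, add, sub, low_byte and should_warn are invented here, and treating val-val as a value is an assumption taken from the "all other operators" line:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    kind: str    # "addr" or "val"
    value: int


def lit(n):
    """A bare numeric literal or plain constant: always a value."""
    return Expr("val", n)


ADDR = Expr("addr", 0)   # models the special zero-valued address constant


def add(a, b):
    # addr+val, val+addr and addr+addr are addresses; val+val is a value.
    kind = "addr" if "addr" in (a.kind, b.kind) else "val"
    return Expr(kind, a.value + b.value)


def sub(a, b):
    # Only addr-val stays an address; addr-addr and val-addr are values.
    # val-val is assumed to be a value as well.
    kind = "addr" if (a.kind, b.kind) == ("addr", "val") else "val"
    return Expr(kind, a.value - b.value)


def low_byte(a):
    # <n is an address when n < $100 (including negative), else a value.
    kind = "addr" if a.value < 0x100 else "val"
    return Expr(kind, a.value & 0xFF)


def should_warn(operand, indexed, has_immediate_form, is_65816):
    """Warn only if every condition in the list above holds."""
    return (operand.kind == "val"
            and not indexed
            and has_immediate_form
            and (is_65816 or operand.value < 0x100))


# lda 10           -> warning
print(should_warn(lit(10), False, True, False))
# lda 10+ADDR      -> OK
print(should_warn(add(lit(10), ADDR), False, True, False))
# lda $1234        -> OK on 6502, warning on 65816
print(should_warn(lit(0x1234), False, True, False),
      should_warn(lit(0x1234), False, True, True))
# lda <5           -> OK, direct-page access
print(should_warn(low_byte(lit(5)), False, True, True))
```

Carrying the kind alongside the value is what lets IOBASE = $8000+ADDR propagate to IO1 = IOBASE+0 and friends without marking each one.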


PostPosted: Tue Jan 22, 2008 6:44 am
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
You know, much of this issue comes from the simple fact that LDA tells you nothing. Contrast this with RISC CPUs, whose assemblers quite often have very precise opcode/mnemonic mappings. If memory serves me correctly, a PowerPC assembler would have something like:

* lw - load word into register
* lh - load half-word into register
* lb - load byte into register
* lwu - same as lw, but unsigned (meaning, zero-extended to full register width)
* lhu
* lbu
* lwi - load word using an indexed addressing mode.
* lhi
* lbi
* lwui
* lhui
* lbui

Anyway, I think you get the picture.

The idea of "addressing modes" really only makes sense on architectures that are TOTALLY orthogonal (e.g., the PDP-11). On any other CPU, they're no more than intellectual conveniences.

Therefore, maybe it's high time that the 6502 and 65816, perhaps the most non-orthogonal architectures on the face of the planet today, just drop the idea of "addressing modes" altogether, and instead switch to explicit mnemonics for everything. Truly, this would make writing assemblers a *LOT* easier.

For example, to multiply two 16-bit numbers into a 32-bit number:

Code:
.export unsignedMultiply_by_
.proc unsignedMultiply_by_
    ; Multiply two unsigned 16-bit integers to produce a 32-bit result.
    ;
    ; Inputs:
    multiplicand = multiplier+2     ; One term
    multiplier = rpc+2              ; The other term
    ;
    ; Returns:
    result = multiplicand+2         ; 32-bit result

    rpc = regA+2
    pha
    regA = regX+2
    phx
    regX = regD+2
    phd
    regD = multiplicandHi+2
   
    pha
    multiplicandHi = 1

    tsc
    tcd

    stzd result                     ; STZD = STZ dp
    stzd result+2
    stzd multiplicandHi

    ldxiw 16                        ; LDXIW = LDX #nnnn
nextBit:                            ; LDXIB = LDX #nn
    lsrd multiplier                 ; it'd do away with .a8/.i16 etc.
    bcc skipAddition
    clc
    ldad multiplicand               ; LDAD is LDA dp
    adcd result                     ; LDADI is LDA (dp)
    stad result                     ; LDADIY is LDA (dp),y
    ldad multiplicandHi             ; LDADL is LDA [dp]
    adcd result+2                   ; LDADLY is LDA [dp],y
    stad result+2                   ; LDAS is LDA x,S
skipAddition:                       ; LDASIY is LDA (x,S),y
    asld multiplicand               ; you get the idea.
    rold multiplicandHi
    dex
    bne nextBit

    pla
    pld
    plx
    pla
    rts
.endproc
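
For anyone who wants to check the arithmetic without a 65816 handy, the shift-and-add loop in that routine can be modelled in Python. This is just a sketch of the algorithm, not a translation of the stack-frame handling; the function name unsigned_multiply is illustrative:

```python
def unsigned_multiply(multiplicand, multiplier):
    """Shift-and-add multiply of two unsigned 16-bit integers, 32-bit result."""
    assert 0 <= multiplicand <= 0xFFFF and 0 <= multiplier <= 0xFFFF
    result = 0
    for _ in range(16):                 # LDXIW 16 ... DEX / BNE nextBit
        carry = multiplier & 1          # LSRD shifts the low bit into carry
        multiplier >>= 1
        if carry:                       # BCC skipAddition when carry is clear
            # Two 16-bit ADCs chained through carry form a 32-bit add.
            result = (result + multiplicand) & 0xFFFFFFFF
        multiplicand <<= 1              # ASLD / ROLD: 32-bit left shift
    return result


print(hex(unsigned_multiply(0xFFFF, 0xFFFF)))
```

The multiplicand is widened to 32 bits (multiplicandHi in the routine) because it is shifted left once per bit of the multiplier.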


PostPosted: Tue Jan 22, 2008 8:45 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8422
Location: Southern California
Quote:
maybe it's high time that the 6502 and 65816...just drop the idea of "addressing modes" altogether, and instead switch to explicit mnemonics for everything. Truly this would make writing assemblers a *LOT* easier.

That's why my Forth assembler is that way. It made it much easier. There is for example
LDA#
LDA_ZP
LDA_ABS
LDA_ABS,X
etc., and all these do is lay down the op code. It's up to the programmer to comma-in the operand after laying down the op code, so there's no parsing at all. It does not have to look ahead. The initial version of the assembler took part of an evening to write, and that was mostly just entering the op codes for all the mnemonics. Since for example there's no three-letter AND (it's AND#, etc.) I didn't even have to put them in a separate vocabulary to avoid conflicts with Forth's words like AND. More than one instruction can be put on a line, for example,
Code:
DEX  DEX
LDA# 4F C,   STA_ZP TOS_LO C,
INC_ABS VARIABLE  ,   BNE 1$

The whole assembler is small enough to keep in memory all the time.
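
As a rough illustration of why this style needs no parser, here is the idea sketched in Python rather than Forth. The class and method names are invented for the sketch, the operand addresses are placeholders, and the opcode values are the standard 6502 ones:

```python
# One mnemonic per addressing mode, each mapping straight to one opcode byte.
OPCODES = {
    "DEX": 0xCA,
    "LDA#": 0xA9,
    "LDA_ZP": 0xA5,
    "LDA_ABS": 0xAD,
    "LDA_ABS,X": 0xBD,
    "STA_ZP": 0x85,
    "INC_ABS": 0xEE,
}


class TinyAssembler:
    def __init__(self):
        self.code = bytearray()

    def op(self, mnemonic):
        """Lay down the opcode byte; no operand parsing, no look-ahead."""
        self.code.append(OPCODES[mnemonic])

    def c_comma(self, byte):
        """Like Forth's C, : append one operand byte."""
        self.code.append(byte & 0xFF)

    def comma(self, word):
        """Like Forth's , with 16-bit cells: append a little-endian word."""
        self.code += bytes((word & 0xFF, (word >> 8) & 0xFF))


asm = TinyAssembler()
asm.op("DEX")
asm.op("DEX")
asm.op("LDA#")
asm.c_comma(0x4F)
asm.op("STA_ZP")
asm.c_comma(0x10)       # 0x10 is a placeholder for TOS_LO
asm.op("INC_ABS")
asm.comma(0x0300)       # 0x0300 is a placeholder for VARIABLE
print(asm.code.hex(" "))
```

Since the mnemonic alone selects the opcode, a forgotten # cannot silently change the addressing mode; the trade-off is that the programmer must supply the right operand size by hand.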


PostPosted: Tue Jan 22, 2008 6:57 pm
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
My Forth assembler (now) does this too. However, I was talking about adopting this practice for more conventional assemblers. The fact is, having to constantly type "C," or "," after operands is pretty cumbersome.

