Warning of LDA const instead of LDA #const

Programming the 6502 microprocessor and its relatives in assembly and other languages.
User avatar
dclxvi
Posts: 362
Joined: 11 Mar 2004

Post by dclxvi »

blargg wrote:
This is assuming you manually allocate zero-page, rather than just use .zeropage and let the assembler do it. For a list of addresses, I do like your idea of an "address list" mode.
I was once among the many 6502 programmers who were in the habit of defining zero page usage with EQUs rather than DS (uninitialized storage, it just advances the ORG address) with an ORG on the zero page. Bad habit or not, the former is pretty common.

However, for zero page locations that are input/output parameters of ROM routines it makes sense to use EQUs, e.g.

Code: Select all

ROMSUB1PARM1 = $FD
ROMSUB1PARM2 = $FE
ROMSUB1      = $F000
ROMSUB2PARM1 = $FB
ROMSUB2      = $F123
blargg wrote:
So for the stack-pointer-in-direct-page-register example, you'd just do LDA <5 to use direct-page addressing mode.
That's less typing, but it still has to be done numerous times. Starting from the "Example Implementing strtok()" section in the wiki (just after the tsc-tcd concept is introducted), by my quick count, there are 9 instances (LDA, CMP, etc. but not STA/STX/STY) where you'd have to do this. All but one block of assembly code (which only contains two JSRs and several .word directives) from that point in the article contains at least one instance. And these are intentionally short and simple examples. Most procedures you write are going to have at least one instance.

Another example is that address-address can be either a constant or an address, e.g.

Code: Select all

     LDX #0
LOOP LDA STR,X
     JSR SUB
     INX
     CPX #STR_END-STR  ; address - address = constant
     BCC LOOP
     RTS
STR  .byte "some string"
STR_END
vs.:

Code: Select all

DP_BASE = $1000     ; address
VAR1    = DP_BASE   ; address
VAR2    = DP_BASE+2 ; address

SUB1
;
; Set up VAR1 for later
; The D register doesn't point at DP_BASE yet
;
     ASL
     STA VAR1          ; uses abs (or long) addressing here
     RTS
SUB2
;
; Set the D register to DP_BASE (e.g. for speed)
;
     LDA #DP_BASE
     TCD
;
; Start by copying VAR1 to VAR2
;
     LDA VAR1-DP_BASE ; address - address = dp address
     STA VAR2-DP_BASE
That's not the only way to do either of those things, but I think both examples are reasonable approaches.
kc5tja wrote:
Therefore, maybe it's high time that the 6502 and 65816, perhaps the most non-orthogonal architectures on the face of the planet today, just drop the idea of "addressing modes" all-together, and instead switch to explicit mnemonics for everything. Truely, this would make writing assemblers a *LOT* easier.
You can make a case for either. I would argue against putting the zp/abs (or zp/abs/long on the 65816) distinction into the instruction mnemonic, even if the ,X ,Y (,X) (),Y etc. are. It's not as big a deal on the 65816, but on the 6502/65C02, if someone's example is written for abs and I want to use zp (or vice versa), I'd prefer to just make the single change of LABEL = $1000 to LABEL = $00 rather than edit numerous instructions. Yeah, this could be handled with a boatload of macros (one for each instruction) that check if the argument (label) is > $FF, for those variables that can be either abs or zp. But that seems kinda one step forward, one step back to me.

Speaking of macros, they can be simpler without the distinction in the instruction, e.g.

Code: Select all

MOV2A MAC    ; 6502/65C02 (.n are the macro parameters)
      LDA .1
      STA .2
      LDA 1+.1
      STA 1+.2
      EOM
This can handle any combination of abs, zp, abs,X, zp,X, and abs,Y. When the distinction is in the instruction, with really good macro capability, the macro could be invoked with something like MOV2A AX VAR1 AX VAR2 and then the macro would append the AX to the LDA and STA to form LDAAX and STAAX, but that's still a more complicated macro.

The assembler only has to be written once. I'm not sure that making the assembler itself easy to write is all that important.
kc5tja wrote:

Code: Select all

    ldxiw 16                        ; LDXIW = LDX #nnnn
nextBit:                            ; LDXIB = LDX #nn
    lsrd multiplier                 ; it'd do away with .a8/.i16 etc.
Some assemblers avoid the .a16/.a8 nonsense by using # to indicate 8-bit immediate data and ## to indicate 16-bit immediate data, e.g.

Code: Select all

LDA #0  ; assembles A9 00
LDA ##0 ; assembles A9 00 00
GARTHWILSON wrote:
That's why my Forth assembler is that way. It made it much easier.

(snip)

The whole assembler is small enough to keep in memory all the time.
Um, 6502 assemblers usually are pretty simple in Forth in any case.

I wrote an ugly 6502 prefix assembler (I should've just made it postfix and I should've implemented labels differently) which is both a simplification and an expansion of the FIG-Forth 6502 assembler. There are colon definitions for JMP and JSR, and only 4 defining words: one for addressing modes, one for implied instructions, one for relative branch instructions, and one for the remaining instructions. It includes branch range checking, and ORG capability, i.e. ORG need not equal HERE (it was intended as a cross assembler for the NMOS 6502). It doesn't include forced zp or abs, though that would be really simple to add (probably only a few dozen bytes or so). It only takes 775 bytes of dictionary space and 690 bytes of head space (heads are separated in this particular indirect-threaded Forth). A significant percentage of it consists of using the defining words to define the instructions (ADC, AND, etc.) It's not as simple as what you're doing, but as poorly designed as it is, it was still pretty easy to do.
fachat
Posts: 1123
Joined: 05 Jul 2005
Location: near Heidelberg, Germany
Contact:

Post by fachat »

For what ir's worth, the xa64 assembler (shamless self plug :-)
has address segments for labels. So it knows whether a label
is in zeropage, text (code), data or bss (uninitialized).
IIRC it actually issues warnings in certain types of situations
(e.g. adding addresses, but not substracting addresses, which is
useful)

segments are started with pseudo opcodes (e.g. ".text", ".zero")
and assigning addresses to labels works using the "*" as "PC"
in the assignement or by using the label name in front of the
opcode. Not using "*" results in an absolute value.

e.g.

Code: Select all

.text
dataval=15
label1=*
label2 LDA #<dataddr1
       LDX #dataval
.data
dataddr1 .byt 1,2,4

just my .02€
André

User avatar
BitWise
In Memoriam
Posts: 996
Joined: 02 Mar 2004
Location: Berkshire, UK
Contact:

Post by BitWise »

There have been a number of assemblers that have used this approach in the past. Lee's site has a listing of a KIM-1 program that uses suffixed instructions and the original source of Microchess does as well - although not for absolute instructions.
http://www.themotionstore.com/leeedavis ... us/21.html
http://www.floodgap.com/retrobits/ckb/s ... essasm.txt
I did some work for Acorn in the late 1980's (a simple database system squeezed into a 16K ROM) and used their macro assembler which also worked this way. It certainly compiled very quickly compared to other assemblers on the BBC but I can't say I was all that keen on suffixed opcodes. Whilst they make the job of determining the addressing mode easier for the assember (especially for indirect instructions) I think the original notation expresses the addressing action of the more clearly.

At some time or other I probably have missed a # out of a immediate instruction but such mistakes are pretty easy to spot and I'm not sure all the additional baggage needed to allow the assembler to spot it is really worth it. There are plenty of other ways I could make mistakes that break my code that the assembler can't spot, does this one really need special treatment?

I can see the value of using suffixed opcodes in a Forth assembler but why have you opted to have address arguments after the operand rather than before (as in normal Forth code)? I would have thought that something like ...

Code: Select all

: LDAIM, ( n -- ) A9 C, C, ; IMMEDIATE

FF LDAIM,
... would have been more in the spirit of Forth.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
fachat
Posts: 1123
Joined: 05 Jul 2005
Location: near Heidelberg, Germany
Contact:

Post by fachat »

BitWise wrote:
At some time or other I probably have missed a # out of a immediate instruction but such mistakes are pretty easy to spot and I'm not sure all the additional baggage needed to allow the assembler to spot it is really worth it. There are plenty of other ways I could make mistakes that break my code that the assembler can't spot, does this one really need special treatment?
xa65 uses the pseudo opcodes to put the code into different sections of
the binary "o65" file, which is relocatable - so the assembler needs to
have an idea which labels are addresses and which are not.

The checks are a nice byproduct

André
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

BitWise wrote:
I can't say I was all that keen on suffixed opcodes. Whilst they make the job of determining the addressing mode easier for the assember (especially for indirect instructions) I think the original notation expresses the addressing action of the more clearly.
In my experience, I positively have to disagree with you. I find LDA (dp,X) and LDADX dp to be completely isomorphic.

The only disadvantage is if you use a macro like:

foo .macro
lda .1
sta .2
.endm

then you could conceivably use something like, foo (dp,X) (dp),Y and have it "just work."

However, I have never actually encountered a need for this level of support.

If you really want to discuss what was more descriptive of what was going on, this syntax would have excelled:

a := @dp+X
y+@dp := a

where @ is a monadic operator that means "the contents of" (kind of like * in C). To qualify things like @2 for direct page, absolute, or long, perhaps you would use different operators (e.g., * for absolute) or use type-casts (e.g., long(@2)).

Hmm...I guess I'll have to give that a whirl now to see what that would look like:

Code: Select all

.export unsignedMultiply_by_
.proc unsignedMultiply_by_
    ; Multiply two unsigned 16-bit integers to produce a 32-bit result.
    ;
    ; Inputs:
    multiplicand = multiplier+2     ; One term
    multiplier = rpc+2              ; The other term
    ;
    ; Returns:
    result = multiplicand+2         ; 32-bit result

    push(a, x, y, d)
    rpc = regA+2
    regA = regX+2
    regX = regD+2
    regD = multiplicandHi+2
   
    push(0)
    multiplicandHi = 1

    ; If you use parsing rules similar to the J programming language,
    ; you could write the following like this:  d := a := s
    a := s
    d := a

    @(long(result)) := 0
    x := 16
    loop {
      @multiplier /= 2
      if(carry) {
        @(long(result)) := @(long(multiplicand)) + @(long(result))
      }
      @(long(multiplicand)) *= 2
      x := x - 1
    until(zero)

    pull(a, d, x, a)
    return
.endproc
I will have to play with this concept a bit more.
Quote:
I can see the value of using suffixed opcodes in a Forth assembler but why have you opted to have address arguments after the operand rather than before (as in normal Forth code)?
Laziness to some extent. Writing:

Code: Select all

: opc   create , does> @ c, ;
was simpler than:

Code: Select all

: opc0   create c, does> @ c, ;
: opc1   create c, c, does> dup @ c, 1+ @ c, ;
: opc2   create c, , does> dup @ c, 1+ @ ,abs ;
: opc3   create c, , does> dup @ c, 1+ @ ,long ;
: opcR   create c, does> @ c, ,pc-rel8 ;
then making sure everything lined up nicely.
User avatar
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California
Contact:

Post by GARTHWILSON »

Quote:
but why have you opted to have address arguments after the operand rather than before (as in normal Forth code)?
Because I used a commercially available Forth metacompiler for a couple of years which did the assembly in postfix, and I absolutely hated it. However that might have been because I had already done so much "regular" assembly programming. After all my time on postfix assembly, it still looks like Greek to me. The visual factoring is not there, and making sense of it does not come as naturally. I like Forth a lot, but not postfix assembly.

BTW, instead of using

Code: Select all

: LDAIM, ( n -- ) A9 C, C, ; IMMEDIATE
which takes 22 bytes in IDT Forth including the header, nest and unnest, the address of LITERAL's runtime, and the alignment byte, (or 20 bytes without the second C, ), you can use one of Forth's strengths and make a defining word COMP_OP (compile op code) whose use when you're compiling the assembler would look like

Code: Select all

COMP_OP  LDA#  A9
The defining word scans the input stream for the name (LDA# in this case), lays down the header, followed by the CFA which points to COMP_OP's runtime comp_op, followed by the one-byte op code which is also found by scanning ahead in the input stream. When you're assembling, comp-op lays down the A9. It is similar to a one-byte CONSTANT but compiles the A9 instead of putting it on the stack. It only takes three bytes plus the header. This eliminates nest, unnest, lit, 00A9, and C, (plus a second C, if you still want postfix assembly) making the assembler faster and a couple thousand bytes shorter.
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

I know this is wildly off-topic now, but I couldn't resist. Garth, your approach is fundamentally correct, but way too complicated. Use CREATE and DOES> -- it factors everything out for you.

Code: Select all

: COMP   CREATE , DOES> C@ C, ;

HEX
A9 COMP LDA#
In fact, I took this one step farther in my assembler's opcode table:

Code: Select all

: ~    CREATE DUP , 1+ DOES> C@ C, ;
: ||   ~ ~ ~ ~  ~ ~ ~ ~ ;

0
||  brk,  ora(,x), cop,   ora,s,     tsb,      ora,    asl,    ora[],   ( 0x )
||  php,  ora#,    asla,  phd,       tsb@,     ora@,   asl@,   ora@l,
||  bpl,  ora(),y, ora(), ora(,s),y, trb,      ora,x,  asl,x,  ora[],y, ( 1x )
||  clc,  ora@,y,  inca,  tcs,       trb@,     ora@,x, asl@,x, ora@l,x,
||  jsr@, and(,x), jsr@l, and,s,     bit,      and,    rol,    and[],   ( 2x )
||  plp,  and#,    rola,  pld,       bit@,     and@,   rol@,   and@l,
||  bmi,  and(),y, and(), and(,s),y, bit,x,    and,x,  rol,x,  and[],y, ( 3x )
||  sec,  and@,y,  deca,  tsc,       bit@,x,   and@,x, rol@,x, and@l,x,
|| ...etc...
Makes life MUCH easier. :-)
Last edited by kc5tja on Wed Jan 23, 2008 8:52 pm, edited 1 time in total.
User avatar
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California
Contact:

Post by GARTHWILSON »

True, but using the assembly-language runtime makes it run faster than doing it purely in Forth with >DOES. Hmmm... I think there's an assembly equivalent for >DOES , but it has been so long since I've used it that I can't remember. Time for review. My work hasn't taken me down this particular road in a few years.
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

You posted before I could finish my edit. Re-read my post above; I've added something to it.
User avatar
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California
Contact:

Post by GARTHWILSON »

Ah yes, even shorter.
debounce
Posts: 27
Joined: 23 Nov 2004
Location: London, UK

Post by debounce »

kc5tja wrote:
That's true only for the first time someone sees ADDR+ notation. You'll find that as its popularity spreads,
I wouldn't have said anything but I believe in voting for something you want to keep.

FWIW I prefer the old syntax, and not just for inertia's sake. Addressing modes are an 'intellectual convenience' I'd like to keep, as they are a mnemonic in themselves. (All-alphabetic mnemonics won't save you from composing an illegal combination.)

If I had to have something then instead of a paranoid ADDR+ notation I'd rather have an optional @ prefix that can be stripped out for older assemblers and merely suppresses the warning, like if((foo=bar)).
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

debounce wrote:
FWIW I prefer the old syntax, and not just for inertia's sake. Addressing modes are an 'intellectual convenience' I'd like to keep, as they are a mnemonic in themselves.
That is their syntactic point -- the idea is to make assembly language a bit more like a high-level language.
Quote:
(All-alphabetic mnemonics won't save you from composing an illegal combination.)
I vehemently disagree. If you know you're accessing direct page, you know a priori that the 'd' appears in the suffix. It becomes utterly automatic. Currently, the instruction LDA 5 is ambiguous in any 6502 or 65816 assembler. Is it direct-page? Well, it's not really an issue on the 6502, because direct page is zero page. But on the 65816, it matters immensely.

From inertia, any absolute address less than 256 is typically assembled as a direct page reference. If you want otherwise, you need to remember to prefix the address with (in ca65's case) A: (e.g., LDA A:5) to force an absolute addressing mode. There is a similar prefix for forcing a long-address too. Conversely, there is also a Z: prefix that forces direct-page as well. Many assemblers won't share this precise syntax, but the fact that they exist at all stands testament to the inherent ambiguity of the assembly syntax.

The RISC approach, where you have one opcode, one mnemonic, eliminates outright this ambiguity. Therefore, there is zero need for such prefixes, and utterly no need for assemblers to "guess" what addressing mode you really wanted to use in their absence. The only disadvantage is that it makes macros a wee bit less flexible.
Quote:
optional @ prefix that can be stripped out for older assemblers and merely suppresses the warning, like if((foo=bar)).
How can an optional @ prefix be stripped out? Can you give an example of how this would work?
User avatar
dclxvi
Posts: 362
Joined: 11 Mar 2004

Post by dclxvi »

GARTHWILSON wrote:
Hmmm... I think there's an assembly equivalent for >DOES , but it has been so long since I've used it that I can't remember.
I believe the word you are thinking of here is ;CODE (note the semicolon).
debounce
Posts: 27
Joined: 23 Nov 2004
Location: London, UK

Post by debounce »

kc5tja wrote:
debounce wrote:
FWIW I prefer the old syntax, and not just for inertia's sake. Addressing modes are an 'intellectual convenience' I'd like to keep, as they are a mnemonic in themselves.
That is their syntactic point -- the idea is to make assembly language a bit more like a high-level language.
Not sure what you're saying here. You mean "the idea [of bracket-and-comma] is..." or "The idea [of all-alphabetic] is..."? If the latter, then reducing mnemonics to a simple label for an opcode byte (and a width specifier for the following operand) pushes the assembly language closer to machine code. As evidenced by the Forth examples quoted earlier.
kc5tja wrote:
Quote:
(All-alphabetic mnemonics won't save you from composing an illegal combination.)
I vehemently disagree. If you know you're accessing direct page, you know a priori that the 'd' appears in the suffix. It becomes utterly automatic. Currently, the instruction LDA 5 is ambiguous in any 6502 or 65816 assembler. Is it direct-page? Well, it's not really an issue on the 6502, because direct page is zero page. But on the 65816, it matters immensely.

From inertia, any absolute address less than 256 is typically assembled as a direct page reference. If you want otherwise, you need to remember to prefix the address with (in ca65's case) A: (e.g., LDA A:5) to force an absolute addressing mode. There is a similar prefix for forcing a long-address too. Conversely, there is also a Z: prefix that forces direct-page as well. Many assemblers won't share this precise syntax, but the fact that they exist at all stands testament to the inherent ambiguity of the assembly syntax.
That's ambiguity, and I completely agree. What I meant was non-orthogonality. INC abs,Y is an illegal instruction (the only one that's bitten me.) It's just as easy to assume INCAY abs exists.
Quote:
The RISC approach, where you have one opcode, one mnemonic, eliminates outright this ambiguity. Therefore, there is zero need for such prefixes, and utterly no need for assemblers to "guess" what addressing mode you really wanted to use in their absence. The only disadvantage is that it makes macros a wee bit less flexible.
And the mnemonics aren't such great mnemonics any more. Some people work better with brackets and commas.
kc5tja wrote:
Quote:
optional @ prefix that can be stripped out for older assemblers and merely suppresses the warning, like if((foo=bar)).
How can an optional @ prefix be stripped out? Can you give an example of how this would work?
The @ prefixes an otherwise bare operand of an instruction to indicate it is intended as an absolute address. Any addressing mode notation # () [] , A: Z: is enough to show the intention (or non-intention) and @ is forbidden. Where @ is allowed and the assembler does not generate a warning (depending on the sophistication of its heuristics), @ is optional. @ has nothing to do with ADDR notation, but there's no reason why the two couldn't be used in conjunction (though I'd rather there was a shorter and less shouty symbol than "ADDR+".)

Assuming the assembler only implements @ notation:

Code: Select all

        
        LDY     const   ; give warning
        LDY     @const  ; opcode AC or A4
        LDA     <addr   ; give warning as < is not an addressing mode specifier
        LDA     @<addr  ; opcode A5 as <addr less than 256
        LDA     A:addr  ; opcode AD
        LDA     #<addr  ; opcode A9
        STA     const   ; opcode 8D, unambiguous
        STA     @const  ; opcode 8D
@ notation is appropriate when the problem is framed as "# being left off instructions." ADDR notation is an appropriate response to "constants being used as addresses" (in which case STA const should give a warning.) I prefer @ notation as ADDR doesn't cover the case of an address being used as an address when it should have been used as a constant.

We'll see, I suppose, which syntax wins out.
debounce
Posts: 27
Joined: 23 Nov 2004
Location: London, UK

With tongue in cheek...

Post by debounce »

In the meantime, if you're stuck with an 'old' assembler, you could always implement the new syntax yourself:

Code: Select all

LDAIM   MAC
        .byte   $A9
        .byte   .1
        EOM

LDAAB   MAC
        .byte   $A5
        .word   .1
        EOM
...
:D
Post Reply