Re: What assembly syntax is good to support in an assembler?

Posted: Sat Oct 31, 2020 12:10 pm
by BillG
John West wrote:
cjs wrote:
One of the features I use and value most in the macroassembler AS is named local labels. Any label that starts with a period is considered local to the scope between the two nearest global labels. I find things like JMP .loop and BEQ .exit to be a lot more readable and reliable than using the equivalent + and - local label syntax.
I like that one.
My cross assembler supports local labels in the form of a two-digit number. A reference is the label followed by 'b' or 'f', selecting the instance before or following the referencing line.

Discussion here:

viewtopic.php?p=76604#p76604
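
A minimal sketch of how the numeric local labels described above could be resolved. It is written in Python purely for illustration and is not taken from BillG's assembler. The function name `resolve_numeric_local`, the column-1 label rule, and the choice to skip the referencing line itself when searching are all assumptions.

```python
import re

def resolve_numeric_local(lines, ref_index, ref):
    """Return the index of the line that defines a reference such as '10b' or '10f'.

    Assumptions (not from the original post): a local label is a two-digit
    number starting in column 1, and the referencing line itself is not searched.
    """
    m = re.fullmatch(r"(\d\d)([bf])", ref)
    if m is None:
        raise ValueError(f"not a local label reference: {ref!r}")
    label, direction = m.groups()
    if direction == "b":
        candidates = range(ref_index - 1, -1, -1)        # nearest earlier line first
    else:
        candidates = range(ref_index + 1, len(lines))    # nearest later line first
    for i in candidates:
        if re.match(rf"{label}\b", lines[i]):
            return i
    raise LookupError(f"no definition found for {ref!r}")

SOURCE = [
    "10      lda  table,x",
    "        beq  10f",
    "        inx",
    "        bne  10b",
    "10      rts",
]

forward = resolve_numeric_local(SOURCE, 1, "10f")    # the later '10'
backward = resolve_numeric_local(SOURCE, 3, "10b")   # the earlier '10'
```

Because the same number can be reused, the direction suffix is what disambiguates the two definitions of 10 here.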

Re: What assembly syntax is good to support in an assembler?

Posted: Sat Oct 31, 2020 1:46 pm
by cjs
John West wrote:
For my assembler, I'm toying with the idea of introducing nested scopes, with C-style { } or Pascal-ish .begin and .end. Symbols defined inside a scope are only visible within that scope. But I haven't implemented it yet, and have no experience with how usable (or unusable) it will be. If it doesn't work out, your suggestion is my next best choice.
I probably wasn't clear enough when I mentioned that, "I find this to be orthogonal to larger-scale .proc scoping." Those are the explicit, nestable scopes that you're talking about. And, as I said, I appreciate and use both.

Re: What assembly syntax is good to support in an assembler?

Posted: Sat Oct 31, 2020 5:50 pm
by BigDumbDinosaur
cjs wrote:
One of the features I use and value most in the macroassembler AS is named local labels...

The HCD65 assembler that was part of Commodore's DEVPAK assembly language development software for the C-128 supported local labels (and conditional assembly) as you described. A local label was defined by appending a dollar sign to the label name, e.g., EXIT$. Its scope was bounded by the nearest global labels.

Quote:
Any label that starts with a period is considered local to the scope between the two nearest global labels.

That's how local labels are implemented in the Kowalski assembler.
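
A minimal sketch of the quoted scoping rule, in Python for illustration only. It is not the code of AS or of the Kowalski assembler. The names `scan_labels` and `resolve`, and the rule that anything in column 1 is a label, are assumptions.

```python
def scan_labels(lines):
    """Collect label definitions; locals are keyed by their enclosing global."""
    symbols = {}        # "global" or ("global", ".local") -> line index
    scope_of_line = []  # enclosing global label for every source line
    scope = None
    for index, line in enumerate(lines):
        code = line.split(";", 1)[0]            # ignore trailing comments
        if code[:1].strip():                    # assumption: column-1 text is a label
            label = code.split()[0].rstrip(":")
            if label.startswith("."):
                symbols[(scope, label)] = index  # local: tied to the current global
            else:
                scope = label                    # a global label opens a new scope
                symbols[label] = index
        scope_of_line.append(scope)
    return symbols, scope_of_line

def resolve(symbols, scope_of_line, ref_index, name):
    """Look up a reference made on line ref_index."""
    key = (scope_of_line[ref_index], name) if name.startswith(".") else name
    return symbols[key]

SOURCE = [
    "strlen:  ldy #0",
    ".loop:   lda (ptr),y",
    "         beq .exit",
    "         iny",
    "         bne .loop",
    ".exit:   rts",
    "strcpy:  ldy #0",
    ".loop:   lda (src),y",
    "         sta (dst),y",
    "         beq .exit",
    "         iny",
    "         bne .loop",
    ".exit:   rts",
]

symbols, scopes = scan_labels(SOURCE)
```

Both routines reuse .loop and .exit without clashing, because each reference is looked up under the global label that precedes it.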

Quote:
As for the whole brouhaha about whether one should ever consider a new assembler syntax, while it's reasonable to point out the downsides of a new syntax, starting out your comment with "I don't understand why someone would..." is saying essentially "I am ignorant as to why someone would...."...Consider what your reaction would be to, "I don't understand why in this day and age someone would design a new system based around a 6502 processor."

I'm not sure your analogy holds water here. It would be like questioning one's decision to purchase an automobile with a manual transmission instead of an automatic one (the implication being stick shift is inferior technology, something most long-haul truckers would not agree with). In this case, the OP is, to belabor an analogy, proposing a new way to operate the clutch and shift gears that is different than found in most automobiles.

A standard exists for the 6502 assembly language that was promulgated by the original manufacturer of the device. As part of that original standard, assembler directives (pseudo-ops) were published and a reference assembler was made available to developers to use and study. The current manufacturer of 6502 technology (WDC) continues to promulgate that standard with suitable enhancements to accommodate the modern incarnations of the 6502, especially the 65C816. The enhancements that WDC has published to the 6502 assembly language do not obsolete or otherwise render the MOS Technology standard invalid and in fact, the assembler that is included with WDC's development software will assemble NMOS 6502 code and the commonly-used pseudo-ops of that era without error.

Given that, any assembler that significantly deviates from the published standard is going to create porting headaches for anyone who tries to use it. I use the Kowalski assembler for my projects and while it is close to the MOS/WDC standard, it deviates in some areas that have given me trouble at times (e.g., use of @ instead of % to indicate a bit-wise value). It would be better if the assembler were 100 percent compliant, but monkeying with code written in Microsoft's version of C++ is not something that interests me.¹ In any case, being the old codger I am, my time has become too valuable to me to fix other people's code. :D

————————————————————
¹Since I wrote this, 8BIT has extensively modified the Kowalski assembler to be almost 100 percent compliant to the MOS Technology standard. His modifications also added the ability to assemble 65C816-unique instructions and support enhanced operand sizes.

Re: What assembly syntax is good to support in an assembler?

Posted: Sat Oct 31, 2020 5:55 pm
by GARTHWILSON
John West wrote:
cjs wrote:
One of the features I use and value most in the macroassembler AS is named local labels. Any label that starts with a period is considered local to the scope between the two nearest global labels. I find things like JMP .loop and BEQ .exit to be a lot more readable and reliable than using the equivalent + and - local label syntax.
I like that one. For my assembler, I'm toying with the idea of introducing nested scopes, with C-style { } or Pascal-ish .begin and .end. Symbols defined inside a scope are only visible within that scope. But I haven't implemented it yet, and have no experience with how usable (or unusable) it will be. If it doesn't work out, your suggestion is my next best choice. It won't allow nesting, but it's more readable than anything else I've seen.

I might use : instead of . though. Words starting with . are always directives to me.

The 2500AD assembler I used in the 1980's used the $ after the name to mean local, which I like more. (I see BDD wrote the same thing about another assembler too while I was writing this.)

You can use macros for the scopes though, and get rid of the labels altogether. I show how in my article about it at http://wilsonminesco.com/StructureMacros/, with longer programming examples starting about 60% of the way down the page in my article on simple multitasking methods at http://wilsonminesco.com/multitask/, and more information on how the insides work in the related page of my 6502 stacks treatise at http://wilsonminesco.com/stacks/pgmstruc.html . Here's another example usage, this one being from a project I did for work several years ago using a PIC16 microcontroller (I hope the lines don't wrap on your monitor):

Code:

                                                        ; When we come here the first time after selection, NEW_KEY_FLAG was
TREATMENT:                                              ; cleared in MENU_CASE_PRELUDE, and MENU_ITEM_STATE was cleared in
        MOVF    MENU_ITEM_STATE, W                      ; MENU_TASK case 0.  If TREATMENT was already selected, then NEW_KEY_FLAG
        CASE                                            ; will remain set if a new keypress was made.
            CASE_OF_  5                                 ; Case 5 is for running the treatment.
                GOTO  TREATING                          ; (CALL, RETURN)  I'm separating out this routine.  Case 5 is first, to
            END_OF_                                     ; reduce the number of instructions taken each time the task is called up
                                                        ; after a treatment is actually started.

            CASE_OF_  0                                 ; Case 0 is for when we first arrive here from the main menu.  Display
                DISP_ROM_STR  CLR,  Sel_a_Rx_STR        ; "Rx #1  (Use < > )"  (but replace < and > with the arrow characters).
                INCF  MENU_ITEM_STATE, F                ; Transition to state 1, offering the choice of Rx #1.
                CLRF  MENU_ITEM_STATE_L2
                GOTO  trm1                              ; Go about 100 lines down, to test the validity of the Rx.
            END_OF_                                     ; (The _ after the END_OF prevents wasting memory on a GOTO <END_CASE>.)


            CASE_OF_  6                                 ; Case 6 is for exiting, to time the "Exiting" message.
                MOVF  MENU_ITEM_STATE_L2, W
                IF_ZERO                                 ; This is kind of like a "CASE_OF 0", and, further down, "CASE_OF 1".
                    DISP_ROM_STR  CLR,  Exiting_STR     ; Display, "Exiting".
                    CALL  _1_SEC_DISP                   ; Put the current time plus one second in LCD_TARGET_TM.
                    INCF  MENU_ITEM_STATE_L2, F
                ELSE_
                    IF_FLAG_VAR  NEW_KEY_FLAG, IS_CLEAR
                        CALL  TM_2_LCD_TARGET_TM?       ; How long left to see "Exiting"?  Take current time minus LCD_TARGET_TM.
                        RETURN_IF_ACCb_NEG              ; If it's negative, we must still give more time to see msg, so just exit.
                    END_IF
                                                        ; If we've reached the target time, or if a key was pressed,
                    CLRF  LCD_TARGET_TM                 ; make sure nothing else could think LCD_TARGET_TM is in use,
                    CLRF  MENU_STATE                    ; set _main_ menu state to 0, (MENU_TASK will clear MENU_ITEM_STATE)
                END_IF                                  ; and proceed to clear key status and exit.

                CLR_FLAG  NEW_KEY_FLAG
                RETURN
            END_OF_                                     ; The _ at the end of END_OF_ eliminates the jump to END_CASE, since it was
        END_CASE                                        ; immediately preceded by an unconditional RETURN, & END_CASE is next anyway.

                                                        ; We're waiting for the 1st keypress to select a Rx.
        RETURN_IF_FLAG_VAR  NEW_KEY_FLAG, IS_CLEAR      ; If no key was pressed (which is the usual situation), just exit.
        CLR_FLAG  NEW_KEY_FLAG                          ; A key has been pressed.

        MOVF    NEW_KEY, W                              ; We only watch for <--, -->, HOME, END, NO, and YES.  Other keys ignored.
        CASE
            CASE_OF_   RAW_LEFT_KEY
                DECF   MENU_ITEM_STATE, F               ; Decrement with left_arrow key, but
                BTFSC  STATUS, Z                        ; don't let it get below 1.  If it did,
                INCF   MENU_ITEM_STATE, F               ; put it back to 1.
            END_OF


            CASE_OF_   RAW_RIGHT_KEY
                MOVF   MENU_ITEM_STATE, W               ; Check MENU_ITEM_STATE
                SUBLW  4                                ; against 4 (the max allowable).
                BTFSS  STATUS, Z                        ; If it's not already there,
                INCF   MENU_ITEM_STATE, F               ; you can increment it.
            END_OF


            CASE_OF_   RAW_HOME_KEY
                PUT  1, IN, MENU_ITEM_STATE
            END_OF


            CASE_OF_  RAW_END_KEY
                PUT  4, IN, MENU_ITEM_STATE
            END_OF


            CASE_OF_  RAW_NO_KEY
                PUT  6, IN, MENU_ITEM_STATE             ; This gets trapped above, to time the "Exiting" message.
                RETURN
            END_OF_                                     ; The _ at the end of END_OF_ eliminates the jump to END_CASE, to save a
                                                        ; program-memory word since there's an unconditional RETURN right before it.

            CASE_OF_  RAW_YES_KEY
                IF_FLAG_VAR  Rx_VALID_FLAG, IS_SET      ; If not valid Rx, YES key gets ignored. For checking, the decryp-
                    CALL   INSTALL_DECRYPT_ARRAY        ; tion was already done when the left or right arrow was pressed.

                    DISP_ROM_STR  CLR,  Treating_STR    ; Display "Treating" briefly before asking for AGL.
                    CALL   _4_SEC_DISP                  ; Tell it to display for 4 seconds before asking for initial AGL.
                                                        ; (No delay loops, but it puts current time + 4 sec in LCD_TARGET_TM.)
                    CALL   XFER_HI_TM_2_ACCb            ; We'll ask for first AGL right away (but it will wait until the
                    MOVF   ACCbLO, W                    ; "Treatment begun" message has shown for a few seconds first).
                    MOVWF  NEXT_AGL_TM                  ; We'll forgo the macro usage here since we're copying the time
                    MOVWF  TREATMENT_BEG_TM             ; to two variables, and we can avoid re-loading bytes this way.

                    MOVF   ACCbHI, W
                    MOVWF  NEXT_AGL_TM+1
                    MOVWF  TREATMENT_BEG_TM+1

                    _8V_UP                              ; Make the 8V pulse output high for 5 or 10ms.  (DELAY's parameter is the
                        DELAY  1                        ; number of ms if interrupts are off, with a 2ms resolution so input is
                    _8V_DN                              ; rounded up.  Here, interrupts are on though, and I'm seeing about 7ms.)

                    CLRF   AGL_IMG_TO_STORE             ; (This will get changed 3 lines down to reflect Rx number.)
                    CALL   SET_TREATMENT_BOUNDARY       ; Store a fake AGL with Rx#=0 to mark the beginning of another treatment.
                                                        ; Only the first byte of the fake AGL is significant; rest are random.
                    CLRF   MENU_ITEM_STATE_L2
                    COPY   MENU_ITEM_STATE, TO, AGL_IMG_TO_STORE     ; Before overwriting MENU_ITEM_STATE, store Rx# for STORE_AGL.
                    PUT    5, IN, MENU_ITEM_STATE       ; This will make the rest get handled (or re-routed) above.  The "ELSE_"
                END_IF                                  ; would be that YES was pressed on an invalid Rx; but that does nothing,
                RETURN                          ; so we don't have to specify.  (With an unconditional RETURN right before the
            END_OF_                             ; END_OF, we can put the _ after END_OF to keep it from assembling GOTO END_CASE.)
            RETURN                              ; All other keys are ignored.
        END_CASE                                ; But if <--, -->, HOME, or END were pressed, we will continue here to display the
                                                ; Rx number, followed by either "Rx is invalid", "Rx is blank", or the Rx itself.
                                                ; Invalid ones won't be stored here, but a crook may try to store by different way.
 < snip >

(This is from the semi-medical realtime multitasking project I told a little bit about at http://anycpu.org/forum/viewtopic.php?p=563#p563 . And yes, that's what assembly language looks like when a macro junkie does it. :D )

Re: What assembly syntax is good to support in an assembler?

Posted: Sat Oct 31, 2020 6:03 pm
by BigDumbDinosaur
GARTHWILSON wrote:
The 2500AD assembler I used in the 1980's used the $ after the name to mean local, which I like more. (I see BDD wrote the same thing about another assembler too while I was writing this.)

I preferred the trailing dollar sign as well, as the dot-label form looks too much like a pseudo-op to me.

Re: What assembly syntax is good to support in an assembler?

Posted: Sat Oct 31, 2020 6:42 pm
by BigEd
Quote:
A standard exists for the 6502 assembly language that was promulgated by the original manufacturer of the device. ... (WDC) continues to promulgate that standard with suitable enhancements ...
In practice, that particular syntax does not act as a standard: we have a fragmented world. There might be family resemblances, and each of us might have preferences - we might even wish there were a standard - but usage has gone beyond that point. Just as I write 'colour' and you write 'color' and neither of us is any more correct than the other.

Re: What assembly syntax is good to support in an assembler?

Posted: Sun Nov 01, 2020 3:25 am
by cjs
BigDumbDinosaur wrote:
Quote:
Any label that starts with a period is considered local to the scope between the two nearest global labels.
That's how local labels are implemented in the Kowalski assembler.
That seems to be the common way to do it, though I must mention that I'm not entirely happy about that. I regularly run into circumstances where I wish the local labels had a slightly broader scope, such as routines with an additional entry point in the middle. I've not really figured out a better way to handle that. (Which is precisely why we need research on this sort of thing!)
GARTHWILSON wrote:
The 2500AD assembler I used in the 1980's used the $ after the name to mean local, which I like more. (I see BDD wrote the same thing about another assembler too while I was writing this.)
My instinct is that searching is easier with the dot-prefix form, especially when you're searching for things like "the next global symbol." I also find the dot form a bit cleaner looking, but that could be put down to arbitrary personal preference (though there's no question that it's less "ink on the page").
BigDumbDinosaur wrote:
I preferred the trailing dollar sign as well, as the dot-label form looks too much like a pseudo-op to me.
I don't see how; the WDC pseudo-ops don't start with a dot. Or do you prefer "non-standard" pseudo-ops?

--------------------------------------------------------------
The rest of this can be skipped if you're already convinced that trying out a different assembler syntax is not always an entirely useless endeavour.
BigDumbDinosaur wrote:
I'm not sure your analogy holds water here. It would be like questioning one's decision to purchase an automobile with a manual transmission instead of an automatic one (the implication being stick shift is inferior technology, something most long-haul truckers would not agree with).
Even granting that you meant generic manual transmission rather than specifically "stick shift" (which is only one form of manual transmission, and usually assumes a particular clutching system; compare with e.g. sequential manual transmissions), your transmission analogy seems to support my argument. You're quite correct that various forms of manual and automatic transmission have different good and bad points in various situations, which is why there are various options currently in use.
Quote:
In this case, the OP is, to belabor an analogy, proposing a new way to operate the clutch and shift gears that is different than found in most automobiles.
I'm not totally buying this analogy, either, but even if we accept it, there are very good reasons to do exactly this. I've found, for example, that often one can achieve significant insight and understanding of something only from actually designing and implementing some versions of it different from "the way it's done." And frequently enough, this kind of research does produce better ways of doing things.
Quote:
A standard exists for the 6502 assembly language that was promulgated by the original manufacturer of the device. As part of that original standard, assembler directives (pseudo-ops) were published....
Sure. And not only does that standard entirely lack features important to many (such as any form of label scoping), but it also has some questionable design decisions. For example, it makes "bare" operands sometimes immediate (JMP $1234, loading the program counter with the given value) and sometimes a memory reference (LDA $1234, loading a register with the value stored at the given address; you must prepend a # for immediate with that instruction). That's actually been an annoying source of errors in my programming, and certainly is a design decision worth examining.
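
The inconsistency described above can be seen in a toy model, sketched here in Python for illustration only. The `execute` function, the `cpu` dictionary and the memory contents are hypothetical; only the treatment of the operands follows the standard syntax.

```python
def execute(mnemonic, operand, cpu, memory):
    """Toy model of how the standard syntax treats a bare operand."""
    immediate = operand.startswith("#")
    value = int(operand.lstrip("#").lstrip("$"), 16)
    if mnemonic == "JMP":
        cpu["PC"] = value                                  # bare operand used as-is
    elif mnemonic == "LDA":
        cpu["A"] = value if immediate else memory[value]   # bare operand is an address
    else:
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return cpu

cpu = {"A": 0, "PC": 0}
memory = {0x1234: 0x56}          # hypothetical memory contents

execute("JMP", "$1234", cpu, memory)   # PC becomes $1234 itself
execute("LDA", "$1234", cpu, memory)   # A becomes the byte stored at $1234
execute("LDA", "#$12", cpu, memory)    # A becomes $12 only because of the '#'
```

The same text, $1234, ends up in the register directly in one case and is dereferenced in the other, which is the source of the slips cjs mentions.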
Quote:
Given that, any assembler that significantly deviates from the published standard is going to create porting headaches for anyone who tries to use it.
That's simply not true. It may create porting headaches for any situation you can imagine, but I can imagine situations where this would reduce porting headaches, such as when one's trying to write a single program targeting multiple microprocessors. (I anticipate you may object that this could never be the case; since I don't here want to get into a very lengthy discussion of my ideas relating to this, I'm just going to suggest that if this is your initial reaction you either accept that some people have had ideas that you haven't had, or spend some time making a sincere attempt to figure out how such a situation could arise.)

Re: What assembly syntax is good to support in an assembler?

Posted: Sun Nov 01, 2020 3:41 am
by GARTHWILSON
cjs wrote:
My instinct is that searching is easier with the dot-prefix form, especially when you're searching for things like "the next global symbol."

For that, I like MultiEdit's [1] "Condense display" mode where it shows only the lines that have something in column 1; then I start labels of only local interest in column 2.  The matter of searching for a label that has a space before it was brought up earlier; but with the condensed display feature, you don't have to be looking for a specific one.

Quote:
I also find the dot form a bit cleaner looking, but that could be put down to arbitrary personal preference

The reason I don't prefer that is that if you have a list of things some starting with dots and some not, and you have them vertically aligned, they don't look aligned.  It kind of looks like someone was sloppy about vertical alignment.  Personal preference again; but it emphasizes the need to accommodate different preferences instead of forcing the user to do it a certain way.

Quote:
such as when one's trying to write a single program targeting multiple microprocessors.

For that, I could recommend the C32 assembler, since you pay $99 once and you get a great macro assembler that's good for dozens of processors, and they give you the information to adapt it to even a new processor of your own design if you like.

[1] Edit, 8/9/25:  I just found out, from the Wikipedia article, that MultiEdit is defunct, due to the heart-attack death of the man who really made it go, and that even the website has been gone since Aug 2022.

Re: What assembly syntax is good to support in an assembler?

Posted: Sun Nov 01, 2020 4:14 am
by cjs
GARTHWILSON wrote:
cjs wrote:
I also find the dot form a bit cleaner looking, but that could be put down to arbitrary personal preference
The reason I don't prefer that is that if you have a list of things some starting with dots and some not, and you have them vertically aligned, they don't look aligned.
That could be considered an advantage for local symbols, at least in assemblers where symbols must start in the first column, since then your local symbols look "indented" as compared to the global ones.

It's also potentially an argument for not starting pseudo-ops with a dot, though I suppose if that appearance of indentation bothers you, you could just start the pseudo-ops one column earlier. (There are probably stronger arguments against dot-prefixed pseudo-ops; I don't quite understand what the purpose of the dot-prefix is, actually. It doesn't distinguish between whether data is emitted to the object image or not, and introduces the question of whether macro names should start with a dot or not, or vary based on what the macro does. Fortunately for me, dot-prefix versus no-dot pseudo-ops doesn't bother me enough to be a factor in my choice of assembler.)
Quote:
Quote:
such as when one's trying to write a single program targeting multiple microprocessors
For that, I could recommend the C32 assembler, since you pay $99 once and you get a great macro assembler that's good for dozens of processors, and they give you the information to adapt it to even a new processor of your own design if you like.
Right. The option I went for is AS, which is free and open source, and probably supports more CPUs than any other 8-bit-focused assembler out there, as well as being well-supported.

But I don't see using multiple assemblers as a huge problem, either; even with AS I end up using macros to define consistent sets of commands; e.g., AS (quite reasonably) uses DB when assembling for Intel-style architectures but BYT when assembling for Motorola-style (including MOS/CBM/WDC architectures), so I define a DB macro for the latter. The bigger issue can come with syntactic things such as how radixes are represented; fortunately I can convince AS to accept $1A2B even for Intel.
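
Where an assembler cannot be persuaded to accept the other radix notation, a small preprocessing pass is one way around it. This is a sketch in Python, not something cjs describes doing. The helper `to_intel_hex` is hypothetical, and it ignores string literals, which a real tool would have to skip.

```python
import re

def to_intel_hex(operand):
    """Rewrite Motorola-style $1A2B constants as Intel-style 1A2Bh."""
    def repl(match):
        digits = match.group(1).upper()
        if digits[0] in "ABCDEF":
            digits = "0" + digits     # Intel-style hex constants must start with a digit
        return digits + "h"
    # A lone '$' (the location counter in some assemblers) has no digits after it
    # and is therefore left untouched.
    return re.sub(r"\$([0-9A-Fa-f]+)", repl, operand)

converted = [to_intel_hex(text) for text in ("$1A2B", "$FF", "table+$10", "$+2")]
```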

Re: What assembly syntax is good to support in an assembler?

Posted: Tue Nov 03, 2020 5:44 am
by dclxvi
cjs wrote:
I regularly run into circumstances where I wish the local labels had a slightly broader scope, such as routines with an additional entry point in the middle.
An approach I've found to work pretty well is:

Code:

MIN
   LDA 1,X
   CMP 0,X
   BCC .2
.1 LDA 0,X
.2 INX
   RTS
MAX
   LDA 1,X
   CMP 0,X
   BCC MIN.1
   INX
   RTS
i.e. there are four global labels: MIN, MIN.1, MIN.2, and MAX. Labels that start with "." (or whatever character indicates a local label) just get treated as though they were appended to the most recent global label.

You can also do stuff like this:

Code:

OUTSPACES.1
.1 JSR OUTSPACE
OUTSPACES
   SEC
   SBC #1
   BCS .1
   RTS
Even though the OUTSPACES label is not defined at that point, it becomes the most recent global at the OUTSPACES.1 line. The .1 before the JSR is optional but I like to be able to look down the left side and see all the local labels, particularly if they are branch destinations. Then the implementation details don't get exposed to the caller, and JSR OUTSPACES is used instead of something like JSR OUTSPACES.ENTRY
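
The rule used in the two examples above can be sketched as follows, in Python for illustration only. The function name `qualify_labels` is hypothetical, and the handling of a label such as OUTSPACES.1 (its part before the first dot becomes the most recent global) is inferred from the description rather than taken from dclxvi's assembler.

```python
def qualify_labels(labels):
    """Expand '.local' labels to 'GLOBAL.local', in source order."""
    current_global = ""
    qualified = []
    for label in labels:
        if label.startswith("."):
            qualified.append(current_global + label)   # append to the latest global
        else:
            qualified.append(label)
            # 'OUTSPACES.1' makes 'OUTSPACES' the most recent global label.
            current_global = label.split(".", 1)[0]
    return qualified

min_max = qualify_labels(["MIN", ".1", ".2", "MAX"])
outspaces = qualify_labels(["OUTSPACES.1", ".1", "OUTSPACES"])
```

In the second call the explicit OUTSPACES.1 and the following .1 expand to the same name, which matches the remark that the .1 before the JSR is optional.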

Or you might find this more readable:

Code:

_OUTSPACES
.1 JSR OUTSPACE
.2 SEC
   SBC #1
   BCS .1
   RTS
OUTSPACES = _OUTSPACES.2
Speaking of not exposing implementation details, you might prefer that local labels can only be accessed by the global.local syntax if the local label is specifically declared to be global/public, since the vast majority of labels will be referenced as .local or global instead of global.local

Re: What assembly syntax is good to support in an assembler?

Posted: Sun Nov 08, 2020 7:14 pm
by beethead
John West wrote:
cjs wrote:
One of the features I use and value most in the macroassembler AS is named local labels. Any label that starts with a period is considered local to the scope between the two nearest global labels. I find things like JMP .loop and BEQ .exit to be a lot more readable and reliable than using the equivalent + and - local label syntax.
I like that one. For my assembler, I'm toying with the idea of introducing nested scopes, with C-style { } or Pascal-ish .begin and .end. Symbols defined inside a scope are only visible within that scope. But I haven't implemented it yet, and have no experience with how usable (or unusable) it will be. If it doesn't work out, your suggestion is my next best choice. It won't allow nesting, but it's more readable than anything else I've seen.
I might use : instead of . though. Words starting with . are always directives to me.
I implemented the C-style { } code blocks in my own assembler and found it was useful in reading actual source code as opposed to looking at an assembly listing. Using { } can get messy if you put more than one statement on a line which I'm not a fan of.

Code:

{LDX #$00 : LOOP INX : CPX #$40 : BNE LOOP} // Count to 64
This seems more of a feature for C programmers who are accustomed to using it rather than .SCOPE/.ENDSCOPE just like using '// comment' rather than '; comment'. C syntax with assembly instructions? Imagine dropping the statement delimiter ':'.

Code:

{LDX #$00 LOOP INX CPX #$40 BNE LOOP} // Count to 64
The local label rule of having scope between two global labels is standard with most assemblers but I added a directive setting for how many global labels to go past in searching for a local label. I've also considered adding an extra parameter to a local label reference to temporarily allow searching beyond the limit like JMP .OUTTERLOOP,3 which would search up to 3 global labels before and after the current address.
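
A sketch of how such a widened search could work, in Python for illustration only. It is not beethead's implementation. The names `scan`, `find_local` and `parse_reference`, the default reach of 0, and the choice to try nearer scopes first (earlier before later at equal distance) are all assumptions.

```python
def scan(lines):
    """Number the scopes and record which local labels each one defines."""
    scope = 0                    # scope 0 is the code before the first global label
    scope_of_line = []
    locals_by_scope = {}         # scope number -> {".name": line index}
    for index, line in enumerate(lines):
        code = line.split(";", 1)[0]
        if code[:1].strip():                     # assumption: column-1 text is a label
            label = code.split()[0].rstrip(":")
            if label.startswith("."):
                locals_by_scope.setdefault(scope, {})[label] = index
            else:
                scope += 1                       # each global label starts a new scope
        scope_of_line.append(scope)
    return scope_of_line, locals_by_scope

def parse_reference(operand):
    """Split '.DONE,2' into ('.DONE', 2); a missing count means 0."""
    name, _, reach = operand.partition(",")
    return name, int(reach) if reach else 0

def find_local(lines, ref_index, name, reach=0):
    """Find a local label, crossing at most `reach` global labels either way."""
    scope_of_line, locals_by_scope = scan(lines)
    home = scope_of_line[ref_index]
    for distance in range(reach + 1):            # nearest scopes are tried first
        for scope in (home - distance, home + distance):
            found = locals_by_scope.get(scope, {}).get(name)
            if found is not None:
                return found
    raise LookupError(f"{name} not found within {reach} global labels")

SOURCE = [
    "        ORG $8000",
    "START:",
    "        LDX #$00",
    ".NEXTCHAR:",
    "        LDA HELLOTEXT,X",
    "        BEQ .DONE,2",
    "        JSR $FFD2",
    "        INX",
    "        JMP .NEXTCHAR",
    "HELLOTEXT:",
    "        .BYTE 'Hello World',0",
    ".DONE:",
    "        RTS",
]

name, reach = parse_reference(".DONE,2")
done_line = find_local(SOURCE, 5, name, reach)
```

With the default reach of 0, the same reference fails, because .DONE sits one global label (HELLOTEXT) beyond the scope of START. The directive is indented in this sample so that it is not mistaken for a local label.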

.Dot directives is also a tradition I've kept in my own assembler as well as following Garth Wilson's suggestion of dropping the '.' but I've noticed a move towards the exclamation point '!' for directives more recently. For listing purposes it might be handy to recognize which prefix or lack of prefix is used to decide which column to align a statement. For example prefixed .BYTE would align with the label field and non-prefixed ORG would align with the operator.

Code:

        ORG $8000
START:
        LDX #$00
.NEXTCHAR:
        LDA HELLOTEXT,X
        BEQ .DONE,2 ; search within 2 global labels
        JSR $FFD2
        INX
        JMP .NEXTCHAR
HELLOTEXT:
.BYTE 'Hello World',0
.DONE:
        RTS
After writing that I just noticed the confusion of directive .BYTE and local label .DONE. I'm more used to using @ to prefix a local label but I can see why '.' is used since the label is a member of START. Or I could change to !BYTE.

Just goes to show it's all about personal preferences.

Re: What assembly syntax is good to support in an assembler?

Posted: Sun Nov 08, 2020 9:08 pm
by GARTHWILSON
beethead wrote:
I implemented the C-style { } code blocks in my own assembler and found it was useful in reading actual source code as opposed to looking at an assembly listing. Using { } can get messy if you put more than one statement on a line which I'm not a fan of.

Code:

{LDX #$00 : LOOP INX : CPX #$40 : BNE LOOP} // Count to 64
This seems more of a feature for C programmers who are accustomed to using it rather than .SCOPE/.ENDSCOPE just like using '// comment' rather than '; comment'. C syntax with assembly instructions? Imagine dropping the statement delimiter ':'.

Code:

{LDX #$00 LOOP INX CPX #$40 BNE LOOP} // Count to 64
I don't think I've seen any commercial assemblers that allow putting more than one instruction on a line, except, of course, the many instructions that can be hidden in a macro, and that the assemblers that are often part of Forth kernels allow everything Forth does. I do like to group certain things in lines and columns, for example:

Code:

     CLC
     LDA FOO     ADC BAR     STA FOO      ; Add the low byte of a pair of 3-byte variables,
     LDA FOO+1   ADC BAR+1   STA FOO+1    ; then the mid byte,
     LDA FOO+2   ADC BAR+2   STA FOO+2    ; then the high byte.
It makes the code more compact, and also improves visual factoring. There is probably no need for delimiters like : between instructions, and no need for { and } anywhere.
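
One way an assembler could split such a line without delimiters is to start a new instruction at every known mnemonic. This is a sketch in Python for illustration only, not how any assembler mentioned in this thread works. The `MNEMONICS` subset and `split_instructions` are hypothetical, and it assumes no label or operand shares a name with a mnemonic.

```python
MNEMONICS = {"CLC", "LDA", "ADC", "STA"}   # tiny subset, for illustration only

def split_instructions(line):
    """Split one source line into instructions at each known mnemonic."""
    code = line.split(";", 1)[0]           # drop the trailing comment
    instructions = []
    for token in code.split():
        if token.upper() in MNEMONICS or not instructions:
            instructions.append([token])   # a mnemonic opens a new instruction
        else:
            instructions[-1].append(token) # anything else is an operand
    return [" ".join(parts) for parts in instructions]

parts = split_instructions(
    "     LDA FOO+1   ADC BAR+1   STA FOO+1    ; then the mid byte,")
```

The assumption about name clashes is the weak point: a label called ADC would be read as an instruction, which is one argument for keeping an explicit delimiter.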

Re: What assembly syntax is good to support in an assembler?

Posted: Mon Nov 09, 2020 12:51 am
by BillG
GARTHWILSON wrote:
I don't think I've seen any commercial assemblers that allow putting more than one instruction on a line
The CP/M assembler permits it:

Code:

    rrc  !  rrc  !  rrc  !  rrc  ; Shift upper nybble to lower

Re: What assembly syntax is good to support in an assembler?

Posted: Mon Nov 23, 2020 10:14 pm
by johnwbyrd
Before deciding what syntax is "good", you have to decide the purpose of your assembler.

If you are designing an assembler to please yourself, or to engage your own specific creative vision, then it technically doesn't matter what anyone else's opinion is. You can design the syntax that is the most entertaining to you.

From an engineering perspective, this problem is easiest and most fun to solve. When there are no rules, you can write your own rules.

If you are designing an assembler for others, then the 65xx series presents a particular problem for assembler authors. Namely, there are so many variants of 6502 assembly extant, that none of them can reasonably be called standard. All 6502 assemblers follow the MCS6500 Programmers Guide, but beyond that it's a free-for-all. Most 6502 assemblers in practical use, have tons of rules and special cases added to them, especially with regard to macro processing.

Generally, this practice results from the broad assumption that the 65xx is a unique and magical processor somehow, and that traditional assemblers can't describe the 6502's particular flavor of hotness. Therefore, assembler authors would have you believe, the 6502 requires exclusive, processor-specific assembly-language abstractions to unlock its magic. This incorrect but broadly accepted conceit has created a 6502 Tower of Babel, where you can never be assured of running your particular assembly code on another assembler. And so, dozens of 6502 assembler authors have created mutually incompatible implementations of the same basic concepts.

It's a much HARDER problem to make your assembler capable of reading OTHER assembly language formats. That would require reading and understanding each spec, and probably poring through old assembler source code, when such source is even available out there. And for some reason, no assembler author ever has a taste for this particular problem, even though solving it would make it much likelier that your new assembler is used by programmers other than yourself.

The route I've taken with llvm is a compromise. The llvm-mc tool uses GNU assembler format, and more so than any other assembler in the world, GNU assembler is very much the devil that everyone knows. Both gcc and llvm speak it fluently. However, llvm allows the llvm developer to instance AsmParserVariants, for parsing assembly code written in a variety of different formats. This leaves the door open for llvm to read assembly language written for ca65, xa, as65, etc., in the future. Entertainingly, implementing a particular assembly variant in llvm means that llvm could convert that assembler variant to GNU assembler format as a side effect.

Re: What assembly syntax is good to support in an assembler?

Posted: Mon Nov 23, 2020 10:44 pm
by BigDumbDinosaur
johnwbyrd wrote:
If you are designing an assembler for others, then the 65xx series presents a particular problem for assembler authors.

It's only a problem for assembler authors who don't know the 6502 assembly language standard.

Quote:
Namely, there are so many variants of 6502 assembly extant that none of them can reasonably be called standard. All 6502 assemblers follow the MCS6500 Programmers Guide, but beyond that it's a free-for-all. Most 6502 assemblers in practical use have tons of rules and special cases added to them, especially with regard to macro processing.

Note that the MOS Technology language standard explicitly states that pseudo-ops and macros are not part of the standard. Nevertheless, many 6502 assemblers have adopted the pseudo-ops that were defined in the MOS Technology reference assembler. At the time I used it (late 1970s), the reference assembler did not have a macro language, so there was no reference to be followed.

As for the syntax applied to instruction operands, the standard is unambiguous. # means immediate mode, < means take the LSB, > means take the MSB, etc. The MOS Technology reference assembler recognized %, @ and $ as radix symbols for binary, octal and hexadecimal, respectively. WDC's assembler does the same, with some additional operators to handle 24-bit values. The MOS assembly language "borrowed" its syntax from Motorola's MC6800 assembler, right down to many of the mnemonics being the same and doing essentially the same thing.
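
To make those operand rules concrete, here is a minimal Python sketch of how an assembler might decode them. It covers only the symbols named above (`#`, `<`, `>`, `%`, `@`, `$`). The function names `parse_number` and `parse_operand`, the mode names, and the `VECTOR` symbol are hypothetical, and expressions and indexed or indirect modes are left out.

```python
# Radix prefixes named above: % binary, @ octal, $ hexadecimal.
RADIX = {"%": 2, "@": 8, "$": 16}


def parse_number(text):
    """Convert a numeric literal, with an optional radix prefix, to an int."""
    if text[0] in RADIX:
        return int(text[1:], RADIX[text[0]])
    return int(text, 10)  # no prefix means decimal


def parse_operand(text, symbols=None):
    """Return (mode, value) for a small subset of operand forms.

    Handles an optional '#' (immediate), then an optional '<' or '>'
    (low or high byte), then a literal or a name looked up in `symbols`.
    """
    symbols = symbols or {}
    mode = "address"  # hypothetical name for any non-immediate operand
    if text.startswith("#"):
        mode, text = "immediate", text[1:]
    selector = None
    if text[0] in "<>":
        selector, text = text[0], text[1:]
    value = symbols[text] if text in symbols else parse_number(text)
    if selector == "<":
        value &= 0xFF                # least significant byte
    elif selector == ">":
        value = (value >> 8) & 0xFF  # most significant byte of a 16-bit value
    return mode, value


print(parse_operand("#%00001111"))                    # prints ('immediate', 15)
print(parse_operand("#>VECTOR", {"VECTOR": 0x1234}))  # prints ('immediate', 18)
```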

Quote:
Generally, this practice results from the assumption that the 65xx is a unique and magical processor somehow, and that traditional assemblers can't describe the 6502's particular holiness.

Actually, no. Much of the Tower of Babel nature of amateur-written 6502 assemblers comes from their authors not knowing the MOS Technology language standard. Such authors are the same ones who often improperly refer to the conversion of source code to object code as "compiling."

Quote:
The route I've taken with llvm is a compromise. The llvm-mc tool uses GNU assembler format, and more so than any other assembler in the world, GNU assembler is very much the devil that everyone knows.

Except the 6502 assembly language was around long before Richard Stallman developed the GNU concept. Any tools designed to assemble code for the 6502 should be using the MOS Technology standards, not those of an unrelated set of programming tools.