Which assembler could I possibly use ?

jgharston · Post by **jgharston** » Thu Aug 07, 2025 3:57 pm

Virtual1 wrote:

GARTHWILSON wrote:

I have a list of feature requests for if someone writes an assembler, here.

Hilariously, I think my Excel assembler breaks every single one of your requests

To amend to the list, I'd add:
Make sure a comma or a semicolon in a quoted string doesn't fool the assembler into thinking you're going on to the next parameter in a line or a comment.

In my PDP11 assembler, correctly evaluating things like `EQUS "Thu,12 Apr; Birthday",0 ; default date` caused the fiddliest code.
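A rough illustration of the problem, as a Python sketch rather than anything taken from a real assembler: the hypothetical `split_operands` below walks the operand field character by character and only treats a comma or semicolon as special when it is outside a double-quoted string. It assumes the mnemonic has already been removed and that strings cannot contain escaped quotes.

```python
def split_operands(field):
    """Split an operand field into (operands, comment).

    Assumptions for this sketch: only double quotes delimit strings,
    and there is no escape sequence for a quote inside a string.
    """
    operands, current, in_string = [], [], False
    comment = ""
    for i, ch in enumerate(field):
        if ch == '"':
            in_string = not in_string          # toggle on every quote
            current.append(ch)
        elif ch == ',' and not in_string:      # real operand separator
            operands.append("".join(current).strip())
            current = []
        elif ch == ';' and not in_string:      # real start of comment
            comment = field[i + 1:].strip()
            break
        else:                                  # ordinary character, or , ; inside a string
            current.append(ch)
    operands.append("".join(current).strip())
    return operands, comment


ops, comment = split_operands('"Thu,12 Apr; Birthday",0 ; default date')
print(ops)      # ['"Thu,12 Apr; Birthday"', '0']
print(comment)  # default date
```

A naive `field.split(',')` would instead cut the string in two at "Thu" and treat everything after "Apr" as a comment.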

GARTHWILSON · Post by **GARTHWILSON** » Thu Aug 07, 2025 4:43 pm

jgharston wrote:

To amend to the list, I'd add:
Make sure a comma or a semicolon in a quoted string doesn't fool the assembler into thinking you're going on to the next parameter in a line or a comment.

Excellent point. I added it.

Virtual1 · Post by **Virtual1** » Sun Aug 17, 2025 5:23 pm

BigEd wrote:

I'm a bit late to the party, but welcome Virtual1, well done in constructing and thanks for sharing your unorthodox assembler.

Now that I've been working with it for a while (and with an Apple IIe emulator instead of an Apple ][), I've made some improvements to it:
- labels can have simple additions and subtractions on them
- added option to make object generation address different than assembly addresses (useful for making self-relocating code)
- added "compressed" compile output (8 bytes per line) for faster uploading (remember to add "end" below your assembly so you get the last line)
- in addition to the conditional formatting that points out line errors, NO compressed compile will be generated if there are any errors anywhere in the program
- in addition to the 8 and 16 bit private labels you can define independently on each sheet/program, you can add global 8 and 16 bit labels to the 6502 sheet
- made a few minor bug fixes

I've been using the one workbook for several programs. To add another program, just copy an existing sheet and delete the code/comments/labels.
DO NOT delete or insert lines, or select wide areas and delete, since there are MANY hidden columns containing important formulas you may accidentally delete.

I realize not everyone is using the 6502 on the Apple II. It should work equally well on any other platform though, just customize the global labels to your platform.

I've expanded the sheets to around 1,000 rows, which is good enough for me. There's no reason they can't be longer though. Excel might slow down a bit but should be able to handle it.

Feel free to review and critique the included programs. I've been "out of circulation" for decades and am both uneducated AND rusty, and would love to hear any comments you have.

I'm specifically looking for comments on CHECKSUMMER.BIN. The only checksummers I ever recall running across in the (distant!) past were simple 8-bit summers and XOR'ers. My 16 bit checksum/digest code looks good, but it's a bit like crypto - it's incredibly easy to think you got it right but be very mistaken.
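For reference, here is a small Python sketch of the simple schemes mentioned above next to one well-known 16-bit checksum, Fletcher-16. This is not the algorithm in CHECKSUMMER.BIN (which isn't shown in this thread); it is only a comparison point, and the function names are made up for the example. It shows one classic weakness of plain sums and XORs: they can't see bytes that have swapped places.

```python
def sum8(data):
    """8-bit additive checksum: sum of all bytes, modulo 256."""
    return sum(data) & 0xFF


def xor8(data):
    """8-bit XOR checksum: exclusive-or of all bytes."""
    total = 0
    for b in data:
        total ^= b
    return total


def fletcher16(data):
    """Fletcher-16: two running sums modulo 255, packed into 16 bits.

    The second sum accumulates the first, which makes the result
    depend on byte order as well as byte values.
    """
    sum1 = sum2 = 0
    for b in data:
        sum1 = (sum1 + b) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


a, b = b"AB", b"BA"                      # same bytes, swapped order
print(sum8(a) == sum8(b))                # True: the swap goes unnoticed
print(xor8(a) == xor8(b))                # True: the swap goes unnoticed
print(fletcher16(a) == fletcher16(b))    # False: the swap is detected
print(hex(fletcher16(b"abcde")))         # 0xc8f0
```

Swapped bytes, runs of zeros, and single-bit flips are all useful test cases to throw at any home-grown 16-bit digest.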

I'm also looking for new 6502 challenges to stretch my legs on. If you have something you'd like to see written (like HEXDUMP.BIN) or have a binary program you'd like to see disassembled and properly labeled and commented (like UNCRUNCH.BIN) just let me know.

cjs · Post by **cjs** » Thu Sep 04, 2025 11:29 am

Virtual1 wrote:

...as assembling by hand is a big pain.

Oh, yes. Don't do that. The key is just to forget about assembly language entirely and when you want to, e.g., return from a subroutine, just think "$60" and deposit that directly. Doing it this way removes so much hassle. :-)

Quote:

I wrote an assembler... in Excel.

Excellent. Excel is, after all, what we functional programmers call a "zeroth-order functional programming language." :-) So, functional programming for the win?

GARTHWILSON wrote:

I have a list of feature requests for if someone writes an assembler, here.

I think that ASL satisfies most of these. The major exception is, "Do not require labels to start in column 1." Unfortunately that one leaves you stuck in several ways. In theory if you're supposed to put a colon after every name that defines a symbol, you need to write `foo: equ 3`, which no assemblers do. So you end up with needing the colon sometimes, not needing it sometimes, depending. Which then brings in further problems, such as ` lda equ 3`. Is `lda` a symbol, or is it an instruction? So maybe you disallow any symbol names that are also instruction names? Beyond the mess that's already introducing, what about macros? Maybe you also disallow symbol names that conflict with macro names? This is just a start on the issues that can happen.

I agree with most of the other stuff, and one of the things I love about ASL is that its syntax is flexible enough, and has enough options, that I can assemble code from almost any other assembler with minimal to no changes.

BigDumbDinosaur wrote:

The late Jon Postel once said something to the effect of “be liberal in what you accept, and conservative in what you emit.”

Yup. Which sounded like a good idea at the time (and still sometimes is), but in many cases it turned out to be a terrible idea, not just making it harder to write servers that work with common clients (some of which were rather too liberal about what they sent, but never got fixed because "it worked," causing the rest of the world to have to support that forever), but has even introduced security bugs that would never have appeared had servers been very conservative about what they would accept.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Thu Sep 04, 2025 4:00 pm

cjs wrote:

BigDumbDinosaur wrote:

The late Jon Postel once said something to the effect of “be liberal in what you accept, and conservative in what you emit.”

Yup. Which sounded like a good idea at the time (and still sometimes is), but in many cases it turned out to be a terrible idea, not just making it harder to write servers that work with common clients (some of which were rather too liberal about what they sent, but never got fixed because "it worked," causing the rest of the world to have to support that forever), but has even introduced security bugs that would never have appeared had servers been very conservative about what they would accept.

The classic example of what you describe is Sendmail.

Eric Allman got the idea that Sendmail should be extra-tolerant of E-mail address forms, which led to massive complication in parsing addresses...which (surprise!) has led to a history of security problems. The RFCs governing SMTP actually were fairly conservative in defining what constitutes an acceptable E-mail address, but Allman apparently caved in to pressure to relax his parsing algorithm so all sorts of cruft could be processed.

GARTHWILSON · Post by **GARTHWILSON** » Thu Sep 04, 2025 9:51 pm

cjs wrote:

The major exception is, "Do not require labels to start in column 1." Unfortunately that one leaves you stuck in several ways. In theory if you're supposed to put a colon after every name that defines a symbol, you need to write `foo: equ 3`, which no assemblers do. So you end up with needing the colon sometimes, not needing it sometimes, depending. Which then brings in further problems, such as ` lda equ 3`. Is `lda` a symbol, or is it an instruction? So maybe you disallow any symbol names that are also instruction names? Beyond the mess that's already introducing, what about macros? Maybe you also disallow symbol names that conflict with macro names? This is just a start on the issues that can happen.

I'll have to see what I can do to make it more clear. Anything that starts in column 1 (other than a semicolon) should be considered a label, regardless of whether it's followed by a colon or not; but you should still use a colon to make searches easier. A label not starting in column 1 should definitely have the colon. Without it, the assemblers I've used would give an error message saying your "foo" is not defined. Regardless, the colon always means a symbol is being defined; so there's no ambiguity there.
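To make the rule concrete, here is a minimal Python sketch of it. The function name `find_label` is made up for the example, and it assumes fields are separated by whitespace; it is an illustration of the rule as stated, not code from any particular assembler.

```python
def find_label(line):
    """Return the label defined on this source line, or None.

    Rule sketched here:
      * a token starting in column 1 (other than ';') is a label,
        with or without a trailing colon;
      * an indented first token is a label only if it ends in a colon.
    """
    if not line.strip() or line.lstrip().startswith(';'):
        return None                      # blank line or comment line
    first = line.split()[0]
    if not line[0].isspace():            # token starts in column 1
        return first.rstrip(':')
    if first.endswith(':'):              # indented, but marked by a colon
        return first[:-1]
    return None                          # indented, no colon: not a label


print(find_label("foo equ 3"))           # foo
print(find_label("foo: equ 3"))          # foo
print(find_label("    foo: equ 3"))      # foo
print(find_label(" lda equ 3"))          # None
```

In the last case `lda` is left to be treated as an instruction, so the line fails later as a bad instruction rather than silently defining a symbol.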

Virtual1 · Post by **Virtual1** » Thu Sep 04, 2025 10:44 pm

GARTHWILSON wrote:

I'll have to see what I can do to make it more clear. Anything that starts in column 1 (other than a semicolon) should be considered a label, regardless of whether it's followed by a colon or not; but you should still use a colon to make searches easier. A label not starting in column 1 should definitely have the colon. Without it, the assemblers I've used would give an error message saying your "foo" is not defined. Regardless, the colon always means a symbol is being defined; so there's no ambiguity there.

In mine, anything in column 1 that starts with a $ is an address (ORG), and anything else is a label.

barnacle · Post by **barnacle** » Fri Sep 05, 2025 5:55 am

Not a criticism, just an observation...

https://xkcd.com/927/

I think the best we can hope for in writing something like an assembler for a processor this old is to do it the way we like, ideally which can either accept other standards directly or accept them via bulk text replacement, and see if it catches on.

Remember that in most cases we're making changes beyond the original 1975 design to provide something useful for _us_; if it's useful for someone else, that's a bonus.

Neil

cjs · Post by **cjs** » Fri Sep 05, 2025 5:59 am

GARTHWILSON wrote:

Anything that starts in column 1 (other than a semicolon) should be considered a label, regardless of whether it's followed by a colon or not; but you should still use a colon to make searches easier. A label not starting in column 1 should definitely have the colon.

Ah, that tokens starting in column 1 are always labels was what was not clear. With that further restriction, yes, I think your suggestion makes sense.

And I just did a quick check on ASL and it turns out it works exactly the way you suggest. Anything starting in column 1 is a label (with or without a colon suffix), and you can start a label in any column if you append a colon.

BTW, I don't find the "add colon for convenience in searching" thing all that convincing, since you can also simply search for the label anchored to the start of the line (i.e., type /^foo instead of /foo), but obviously this would be different for those that do use indented labels.
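The difference is easy to see outside an editor too. The sketch below uses Python's `re` module as a stand-in for the editor's search; the three-line source text and the helper name `count_hits` are invented for the example. The `\b` word boundary is an extra refinement on top of the plain `^foo` anchor.

```python
import re

# Made-up listing: one definition of foo, one reference, one similar label.
SOURCE = """\
foo      clc
         jsr foo
foobar   rts
"""


def count_hits(pattern, text):
    """Count matches, with ^ anchored at the start of every line."""
    return len(re.findall(pattern, text, flags=re.MULTILINE))


print(count_hits(r"foo", SOURCE))      # 3: definition, reference, and foobar
print(count_hits(r"^foo", SOURCE))     # 2: column-1 only, but foobar still matches
print(count_hits(r"^foo\b", SOURCE))   # 1: just the definition of foo
```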

GARTHWILSON · Post by **GARTHWILSON** » Fri Sep 05, 2025 6:13 am

cjs wrote:

BTW, I don't find the "add colon for convenience in searching" thing all that convincing, since you can also simply search for the label anchored to the start of the line (i.e., type /^foo instead of /foo), but obviously this would be different for those that do use indented labels.

I just tried it in my MultiEdit professional programmers' text editor, and it didn't work. But yes, if your editor has a way to specify that you only want results that start in column 1, then you wouldn't need the colon (if the label starts in column 1). Maybe MultiEdit has a way to do it that I have not discovered yet.

cjs · Post by **cjs** » Fri Sep 05, 2025 7:16 am

GARTHWILSON wrote:

I just tried it in my MultiEdit professional programmers' text editor, and it didn't work.

Well, given that the commands I gave were for vi, I wouldn't expect it always to work unchanged in other text editors. But I had a quick look at Multi-Edit's Wikipedia page, and it looks reasonably powerful, so I would be surprised if it couldn't search for typical things such as start/end of line and start/end of word. (In fact, I'd assume that when you searched for label bar, you were searching for it using a word boundary so it doesn't match foobar:.)

I'd imagine that there are also ways to search specifically for labels; Vim offers gd and gD to search for a local or global definition matching the label under the cursor. (Which I need to get around to redefining for assembly syntax when I'm editing assembly language files.) But now that I have your idea on how labels should probably be done, I know better what I need to search for.

teamtempest · Post by **teamtempest** » Fri Sep 05, 2025 3:09 pm

I think there may be at least one other way to allow labels with or without a colon suffix to unambiguously start anywhere at all on a line without being mistaken for a pseudo opcode, macro or mnemonic.

My assembler simply checks the first text field on an input line to see what it is. Is it an already known pseudo opcode, macro name or processor mnemonic? If it's none of those, it must be a label.

A colon suffix is optional and is generally ignored. However, if for some reason a label must have the same name as an existing processor mnemonic, a colon suffix will force the assembler to recognize it as a label.
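A minimal Python sketch of that lookup order follows. The tables are a tiny illustrative subset, the function name `classify_first_field` is made up, comment handling is left out, and names are folded to upper case for the lookup as a simplification. It shows the idea only, not the actual assembler's code.

```python
# Tiny illustrative tables; a real assembler would have the full sets.
MNEMONICS = {"LDA", "STA", "JSR", "RTS", "CLC"}
PSEUDO_OPS = {"EQU", "ORG"}
MACROS = set()                      # filled in as macros are defined


def classify_first_field(line):
    """Classify the first whitespace-delimited field of a source line."""
    fields = line.split()
    if not fields:
        return ("blank", None)
    first = fields[0]
    if first.endswith(':'):         # a colon forces "label", even for LDA:
        return ("label", first[:-1].upper())
    name = first.upper()
    if name in PSEUDO_OPS:
        return ("pseudo-op", name)
    if name in MACROS:
        return ("macro", name)
    if name in MNEMONICS:
        return ("mnemonic", name)
    return ("label", name)          # anything unknown must be a label


print(classify_first_field("    lda #3"))       # ('mnemonic', 'LDA')
print(classify_first_field("    loop lda #3"))  # ('label', 'LOOP')
print(classify_first_field("lda: rts"))         # ('label', 'LDA')
```

Note that the column a field starts in never enters into the decision.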

The current version of my assembler does fail to meet Garth's case sensitivity requirement. It doesn't care about the case used in any input file, but internally all names are converted to upper case. That's my preferred way of avoiding having to worry about being picky to match case. But I'm thinking that maybe I'll put a flag in the next version to make case sensitivity an option (not the default, though!).

As for comment lines, of course a semi-colon in the first column signals a comment line. But so does a '#', '*' or a double slash '//' starting in the first column. Actually, any line whose first non-whitespace characters are either ';' or '//' is a comment line. '#' and '*' have to be followed by whitespace to be recognized as starting a comment line if not in the first column.

The assembler does accept '*=' as a pseudo opcode to assign a value to the program counter, for those used to that convention. Just not in the first column of a line.
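The comment-line rules above can be written down as a short Python sketch. The function name `is_comment_line` is made up, the line is assumed to have its trailing newline already stripped, and a lone indented '#' or '*' with nothing after it is treated as not a comment, which is an arbitrary choice for this example.

```python
def is_comment_line(line):
    """Return True if the whole line is a comment under the rules above."""
    # Any of these starting in the first column marks a comment line.
    if line.startswith((';', '#', '*', '//')):
        return True
    stripped = line.lstrip()
    # ';' and '//' count anywhere, as long as they come first on the line.
    if stripped.startswith((';', '//')):
        return True
    # An indented '#' or '*' must be followed by whitespace.
    if stripped[:1] in ('#', '*'):
        return len(stripped) > 1 and stripped[1].isspace()
    return False


print(is_comment_line("* a banner comment"))   # True
print(is_comment_line("    // indented"))      # True
print(is_comment_line("    # indented"))       # True
print(is_comment_line("    *=$1000"))          # False: program counter assignment
print(is_comment_line("*=$1000"))              # True: '*' in column 1 is a comment
```

The last two lines show why '*=' only works when it is not in the first column.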

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Sep 05, 2025 5:26 pm

cjs wrote:

BTW, I don't find the "add colon for convenience in searching" thing all that convincing, since you can also simply search for the label anchored to the start of the line...

In my code, each function has a brief comment block that is the function’s name, purpose, calling convention, e.g.:

Code: Select all

;===============================================================================
;
;pbrget: LOAD PARTITION BOOT RECORD
;
;	———————————————————————————————————————————————————————————————————————
;	Synopsis: This function loads a partition’s PBR.
;
;	          § All parameters are 32-bit, little-endian pointers to data.
;	            Pointers cannot be null — there is no check for this.

...etc...

The function name (pbrget, in this case) is followed by a colon so I can find the function in the assembly listing without having the search bring up JSR pbrget instructions. However, the label does not have the colon when used in the code proper—a colon is superfluous in the Kowalski assembler, e.g.:

Code: Select all

pbrget   clc                   ;assume no error (entry point)
;
;—————————————————————————————————————————————————————————
;
;LOCAL STACK DEFINITIONS
;
.sfbase  .set 0                ;base stack index
.sfidx   .set .sfbase          ;workspace index

...etc...

Every assembler I have ever used has required that labels and symbols start at column 1, and that instructions, equates, etc., start at column 2 or later. I have never viewed those requirements as inconvenient, as I am a bit OCD when it comes to how I organize my source code.

Incidentally, I never leave lines blank in my source files—otherwise blank lines get a semicolon, as in the above examples. The purpose of doing so is to make it clear to me that nothing is missing from the file. If I see a blank line, I am immediately left wondering if I accidentally deleted something that was supposed to stay.

gilhad · Post by **gilhad** » Fri Sep 05, 2025 7:00 pm

I sometimes indent blocks of code if that part is of an "and now something totally different" nature, and in that case I also indent the local labels in that part.

Last time it was inside an interrupt for drawing VGA lines, where timing was critical and each tick was carefully counted and balanced, but in one place I compute the number of the current visible line and if it is larger than 200, I leave the drawing part, and if it is exactly 200 I execute some hook and then leave.

So the whole part, after the first comparison is done, is visually indented, as the timing is NOT critical here and should NOT be done in case of changes.

cjs · Post by **cjs** » Fri Sep 05, 2025 7:29 pm

BigDumbDinosaur wrote:

Incidentally, I never leave lines blank in my source files—otherwise blank lines get a semicolon, as in the above examples. The purpose of doing so is to make it clear to me that nothing is missing from the file.

If you accidentally deleted something and yet everything still builds and the automated tests pass, it can't have been all that important, right? :-) (I do admit that others are probably not as keen on testing as I am, and don't have the thousand or so unit tests that I have for my personal pile of 8-bit code.)

But even if it was something completely untested and unused, you'll easily see the deletion when you review the Git commit.

I find that the blank lines are very useful because you can use paragraph moves ({ and } in Vim) to easily move forward and back routine by routine.

Quote:

In my code, each function has a brief comment block that is the function’s name, purpose, calling convention...

Which reminds me: one thing I would add to Garth's list would be to accept UTF-8 input, at least in comments. (Ideally, accept non-ASCII letters in symbol names as well, as ASL does.) I use this per-routine comment block thing as well, but in the interest of being concise (which makes code more readable to those who understand the conventions), I use various symbols to indicate register usage: ♠ for input/output registers (and locations), ♣ for registers destroyed, and ♡ for registers preserved. E.g.:

Code: Select all

;---------------------------------------------------------------------
;   ♣AY,pristrP ♡X pristr: print inline string
;
;   Using `prchar`, print the $00-terminated string from memory immediately
;   after the call to this subroutine. (This requires a JSR.) There is no
;   limit on the size of the string.
;
;   This requires one word of zero-page storage, pristrP, for only the
;   duration of each call to this routine. It may destroy additional
;   registers from the `prchar` call.
;
;   This is adapted from the versions by Ross Archer and Mike Barry
;   from http://6502.org/source/io/primm.htm

pristr      pla                 ; LSB of return addr/string start -1
            sta  pristrP
            ...

This is especially useful to keep in-line comments short:

Code: Select all

prcharB     equ  $01B1  ; ♠B ♣A ♡* Send character in B to SIOA.
                        ; Sent character is left in A.

I guess the Kowalski assembler does accept Unicode, since I see you use em-dashes (\u2014) in your header comments.
