desirable assembler features

kc5tja · Post by **kc5tja** » Wed Nov 25, 2009 6:04 pm

That construct will also work for assemblers which use software-managed virtual memory, aka "random access" files. For example, nothing prevents the above code from working when REL files are used as an intermediate code representation prior to writing out a PRG file.

I know this for a fact because similar problems occur when writing IFF files (especially prior to AmigaOS 2.0, when iff.library didn't exist). In the simplest IFF exporters, you have to serialize data to disk, then seek back and update each chunk's size field afterwards. Crude, but quite effective, and very memory tight. The tradeoff, of course, is that it's relatively slow (up to two seeks per chunk written).
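A minimal sketch of that seek-back technique, assuming the usual IFF chunk layout (4-byte ID, 32-bit big-endian size, data, optional pad byte to keep chunks even-aligned). The `write_chunk` helper and the `NAME` chunk are hypothetical names made up for illustration, not part of any exporter mentioned here.

```python
import io
import struct

def write_chunk(f, chunk_id, payload_writer):
    """Stream one IFF-style chunk, then patch its size field afterwards."""
    f.write(chunk_id)                  # 4-byte chunk ID
    size_pos = f.tell()
    f.write(struct.pack(">I", 0))      # placeholder size, fixed up below
    start = f.tell()
    payload_writer(f)                  # serialize data without buffering it
    size = f.tell() - start
    if size % 2:
        f.write(b"\x00")               # pad byte is not counted in the size
    resume = f.tell()
    f.seek(size_pos)                   # seek 1: back to the size field
    f.write(struct.pack(">I", size))
    f.seek(resume)                     # seek 2: forward to carry on writing
    return size

buf = io.BytesIO()
write_chunk(buf, b"NAME", lambda f: f.write(b"hello"))
```

The two `seek` calls are the "up to two seeks per chunk" cost; the payoff is that the payload never has to be held in memory just to learn its length.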

On a completely off-topic tangent (feel free to ignore), I often wonder why files employing HDLC framing (with byte-stuffing, of course) to isolate items haven't caught on. It seems like it'd work very well with sequential file access, which was the norm back in the early to mid-80s. Frame bloat is statistically minor (around 5% seems average, based on PPP experience), so while it's a valid concern (especially with files containing a lot of $7E, $7C, and $7D bytes), it can't possibly be a significant reason.
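For readers who haven't met byte-stuffing, here is a rough sketch of framing one record. It assumes the PPP-style octet-stuffing variant ($7E flag, $7D escape, escaped byte XORed with $20); the post doesn't say which variant is meant, and the checksum a real HDLC frame carries is left out. The function names are invented for the example.

```python
FLAG, ESC = 0x7E, 0x7D

def hdlc_frame(payload: bytes) -> bytes:
    """Wrap payload in flag bytes, escaping any flag/escape bytes inside it."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])   # two bytes out for one byte in
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)

def hdlc_unframe(frame: bytes) -> bytes:
    """Inverse of hdlc_frame for a single, complete frame."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("not a complete frame")
    out = bytearray()
    body = iter(frame[1:-1])
    for b in body:
        if b == ESC:
            b = next(body) ^ 0x20           # undo the escape
        out.append(b)
    return bytes(out)
```

Because the flag byte can never appear inside a frame body, a reader can find item boundaries with a purely sequential scan, which is the property that would suit tape or SEQ-style files.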

Although, today, COBS (Consistent Overhead Byte Stuffing) might make more sense than HDLC. The disadvantage is that it requires a bit of buffering prior to serialization, but the buffer size is bounded and well-known. It generates a worst-case byte-stuffing overhead somewhere close to 0.5%, even with pathologically constructed data.
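A small sketch of COBS, to show where the bounded buffer comes from: the encoder has to hold up to 254 non-zero bytes before it knows what code byte to emit in front of them. This follows the commonly published form of the algorithm; the function names are illustrative only.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()                 # at most 254 buffered bytes
    for b in data:
        if b == 0:
            out.append(len(block) + 1)  # code byte = distance to the zero
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:       # full block: code 0xFF, no implied zero
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)          # final block
    out += block
    return bytes(out)

def cobs_decode(enc: bytes) -> bytes:
    """Inverse of cobs_encode."""
    out = bytearray()
    i = 0
    while i < len(enc):
        code = enc[i]
        out += enc[i + 1:i + code]
        i += code
        if code != 0xFF and i < len(enc):
            out.append(0)               # restore the zero the code stood for
    return bytes(out)
```

Since the encoded output never contains $00, a single zero byte can then serve as the record separator in a sequential file.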

BitWise · Post by **BitWise** » Wed Nov 25, 2009 6:14 pm

My assembler uses XML as its object format (so much easier to debug, and with modern machines disk space is not really an issue - I was going to compress it on writing but it's not really worth it). The code generated for the previous sample looks like this (when the white space is added back in):

Code:

<?xml version="1.0"?>
<module target="65XX" endian="little" name="MOS.obj">
	<section name=".code" addr="0000FFFC" size="2">0004</section>
	<section name=".code" addr="00000400" size="10">A9008D03A09A90000400</section>
</module>

The linker rearranges this, fixes up references, checks for overlaps etc. and produces binary, hex, S19 or whatever (it was designed to be very easy to plug in a new output format).
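To make the overlap check concrete, here is a rough sketch that reads the absolute module shown above with Python's standard `xml.etree.ElementTree`. It is not BitWise's linker: the helper names are made up, and it assumes `addr` is hexadecimal and `size` is a decimal byte count (which is at least consistent with the sample, where `size="10"` goes with 20 hex digits).

```python
import xml.etree.ElementTree as ET

# The absolute-addressed module from the listing above.
OBJ = """<?xml version="1.0"?>
<module target="65XX" endian="little" name="MOS.obj">
  <section name=".code" addr="0000FFFC" size="2">0004</section>
  <section name=".code" addr="00000400" size="10">A9008D03A09A90000400</section>
</module>"""

def load_sections(xml_text):
    """Return a list of (address, data) pairs for each absolute section."""
    sections = []
    for s in ET.fromstring(xml_text).iter("section"):
        addr = int(s.get("addr"), 16)              # assumed hexadecimal
        data = bytes.fromhex(s.text.strip())
        if len(data) != int(s.get("size")):        # assumed decimal byte count
            raise ValueError(f"size mismatch in section at {addr:04X}")
        sections.append((addr, data))
    return sections

def find_overlaps(sections):
    """Return start addresses of sections that begin inside an earlier one."""
    overlaps = []
    prev_end = 0
    for addr, data in sorted(sections, key=lambda s: s[0]):
        if addr < prev_end:
            overlaps.append(addr)
        prev_end = max(prev_end, addr + len(data))
    return overlaps

sections = load_sections(OBJ)
print(find_overlaps(sections))
```

As a sanity check on the sample itself, the two bytes at $FFFC read as a little-endian word give $0400, the address of the other section.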

When the compiled code is relocatable, the output contains the expressions to be evaluated at link time, like this:

Code:

<?xml version="1.0"?>
<module target="65XX" endian="little" name="test.obj">
	<section name=".code">EAEAEAEAEA18EAEAEA580148656C6C6F070148656C6C6F0718EAEAEA5802576F726C64<byte>
			<ext>ExtLab</ext>
		</byte>02576F726C64<byte>
			<ext>ExtLab</ext>
		</byte>4B0A4C<word>
			<val sect=".code">45</val>
		</word>4C<word>
			<ext>ExtLab</ext>
		</word>6C<word>
			<val sect=".code">45</val>
		</word>6C<word>
			<ext>ExtLab</ext>
		</word>20<word>
			<val sect=".code">45</val>
		</word>20<word>
			<ext>ExtLab</ext>
		</word>4C<word>
			<val sect=".code">45</val>
		</word>5C<word>
			<and>
				<ext>ExtLab</ext>
				<val>65535</val>
			</and>
		</word>
		<byte>
			<shr>
				<and>
					<ext>ExtLab</ext>
					<val>16711680</val>
				</and>
				<val>16</val>
			</shr>
		</byte>6C<word>
			<val sect=".code">45</val>
		</word>6C<word>
			<ext>ExtLab</ext>
		</word>20<word>
			<val sect=".code">45</val>
		</word>22<word>
			<and>
				<ext>ExtLab</ext>
				<val>65535</val>
			</and>
		</word>
		<byte>
			<shr>
				<and>
					<ext>ExtLab</ext>
					<val>16711680</val>
				</and>
				<val>16</val>
			</shr>
		</byte>69... Big chunk of data removed ...EC</section>
	<section name=".data">0102030B41424348656C6C6F20576F726C640D0A<byte>
			<and>
				<val sect=".data">22</val>
				<val>255</val>
			</and>
		</byte>
		<byte>
			<and>
				<shr>
					<val sect=".data">22</val>
					<val>8</val>
				</shr>
				<val>255</val>
			</and>
		</byte>010002000300<word>
			<val sect=".data">31</val>
		</word>0800010000000200000003000000<word>
			<and>
				<val sect=".data">44</val>
				<val>65535</val>
			</and>
		</word>
		<byte>
			<shr>
				<and>
					<val sect=".data">44</val>
					<val>16711680</val>
				</and>
				<val>16</val>
			</shr>
		</byte>
		<word>
			<and>
				<ext>ExtLab</ext>
				<val>65535</val>
			</and>
		</word>
		<byte>
			<shr>
				<and>
					<ext>ExtLab</ext>
					<val>16711680</val>
				</and>
				<val>16</val>
			</shr>
		</byte>
		<word>
			<and>
				<val sect=".code">45</val>
				<val>65535</val>
			</and>
		</word>
		<byte>
			<shr>
				<and>
					<val sect=".code">45</val>
					<val>16711680</val>
				</and>
				<val>16</val>
			</shr>
		</byte>
	</section>
	<section name=".code" addr="00001005" size="2">9000</section>
	<section name=".code" addr="0000E000" size="265">90... Big chunk of data removed ...E0</section>
	<gbl>GblLab<val sect=".code">45</val>
	</gbl>
</module>
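As an illustration of how such link-time expressions could be resolved, here is a short sketch using Python's `xml.etree.ElementTree`. Several things are assumptions rather than facts about BitWise's linker: that `<val sect="...">` holds a decimal offset added to the section's final base address, that plain `<val>` is a decimal constant, that `<ext>` is looked up in a table of resolved external symbols, and that the result is truncated to the width of the enclosing `<byte>` or `<word>`. The base address and the value given to `ExtLab` are invented for the example.

```python
import xml.etree.ElementTree as ET

OPS = {
    "and": lambda a, b: a & b,
    "shr": lambda a, b: a >> b,
}

def evaluate(node, bases, externals):
    """Recursively evaluate one expression element."""
    if node.tag == "val":
        value = int(node.text)
        sect = node.get("sect")
        return value + bases[sect] if sect else value   # assumed: offset + base
    if node.tag == "ext":
        return externals[node.text]
    left, right = (evaluate(child, bases, externals) for child in node)
    return OPS[node.tag](left, right)

def fixup(xml_fragment, bases, externals):
    """Turn one <byte> or <word> fixup into little-endian bytes."""
    node = ET.fromstring(xml_fragment)
    width = {"byte": 1, "word": 2}[node.tag]
    value = evaluate(node[0], bases, externals)
    mask = (1 << (8 * width)) - 1                       # assumed: truncate
    return (value & mask).to_bytes(width, "little")

bases = {".code": 0x8000}           # hypothetical final section address
externals = {"ExtLab": 0x123456}    # hypothetical resolved external

low_word = fixup("<word><and><ext>ExtLab</ext><val>65535</val></and></word>",
                 bases, externals)
bank = fixup("<byte><shr><and><ext>ExtLab</ext><val>16711680</val></and>"
             "<val>16</val></shr></byte>", bases, externals)
local = fixup('<word><val sect=".code">45</val></word>', bases, externals)
```

With these made-up inputs, the `<word>`/`<byte>` pair that follows each long-addressing opcode in the listing comes out as the low 16 bits and the bank byte of `ExtLab`.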

The whole development package was designed as a central core of framework classes for building any assembler, linker or librarian with a derived customised layer that adds knowledge of a particular device family and code syntax.

GARTHWILSON · Post by **GARTHWILSON** » Wed Nov 25, 2009 6:23 pm

Quote:

Completely non-standard. Using your assembler would mean editing many source files in which the directive *=*+n is used.

The $ is the only one I've ever heard of for this; so whether it's original or not, it seems standard to me. Is there an advantage to "*=*" other than that someone used it long ago?

Then later I wrote,

Quote:

Ah yes, I forgot about the * .

I guess I was looking at it cockeyed the first time around, thinking not of the star but of the equals sign flanked by stars, and it looked foreign to me. I have no trouble with the star, although the assemblers I have used would still give the directive above as "ORG *+n", not "*=*+n".

Since WDC owns the 6502 intellectual property, I looked up what they have to say about it in their programming manual. Page 366 says,

Quote:

6502 assemblers have been wildly inconsistent in their syntax, and early 65802 assemblers have not set standards either. This book describes syntax recommended by the designers of the 65816, as implemented in the ORCA/M assembler. Others, however, do and will differ.

and it gives a couple of examples, but not about the star. Page 78 says,

Quote:

The assembly syntax used in this book is that recommended by the Western Design Center in their data sheet (see Appendix F [which I don't find --gw]). The assembler actually used is the ProDOS ORCA/M assembler for the Apple // computer, by Byteworks, Inc. Before learning how to code the 65816, a few details about some of the assembler directives need to be explained.

Full-line comments are indicated by starting the line with an asterisk or a semicolon.

kc5tja · Post by **kc5tja** » Wed Nov 25, 2009 6:23 pm

Wow. This illustrates so perfectly why XML is wholesale unsuitable for representing pretty much anything at all. You claim that disk space is no longer an issue, and that's true. However, it still requires a huge effort to read in and parse that monstrosity. There's a reason why Google has moved away from XML as a data representation format, despite the huge storage and unfathomable network bandwidth available to them.

XML is best applied when you have data with an unknown set of attributes for each datum, and where it's usually a sparse description. (And, even then, ProtoBuf and IFF files both accomplish the same thing in a purely binary format, with substantially tighter encoding and easier parsing.) Binary file formats definitely do not qualify for this class of file.

The debuggability of XML is undebatable, relying on the "power of plain text." However, I'd never want to use this in any production environment. I would much rather have tools that take a binary format, expand it into an editable text form, and then, if I make changes, recompile that representation back into binary form.

BTW, one nasty side effect of the XML posted above is that it causes the forum to render the page so wide as to be borderline unusable to me.

UPDATE: It looks like the large quantity of binary data that resulted in horrible formatting has been removed. Many thanks!

BitWise · Post by **BitWise** » Wed Nov 25, 2009 6:36 pm

kc5tja wrote:

Wow. This illustrates so perfectly why XML is wholesale unsuitable for representing pretty much anything at all. You claim that disk space is no longer an issue, and that's true. However, it still requires a huge effort to read in and parse that monstrosity. There's a reason why Google has moved away from XML as a data representation format, despite the huge storage and unfathomable network bandwidth available to them.

XML is best applied when you have data with an unknown set of attributes for each datum, and where it's usually a sparse description. (And, even then, ProtoBuf and IFF files both accomplish the same thing in a purely binary format, with substantially tighter encoding and easier parsing.) Binary file formats definitely do not qualify for this class of file.

The debuggability of XML is undebatable, relying on the "power of plain text." However, I'd never want to use this in any production environment. I would much rather have tools that take a binary format, expand it into an editable text form, and then, if I make changes, recompile that representation back into binary form.

BTW, one nasty side effect of the XML posted above is that it causes the forum to render the page so wide as to be borderline unusable to me.

UPDATE: It looks like the large quantity of binary data that resulted in horrible formatting has been removed. Many thanks!

There are good ways of processing XML and there are bad ways. I work with it every day and I get to see a lot of bad XML code. (I design protocols for transferring complex financial transactions between investment banks).

The code used to marshal and unmarshal these files isn't that big or complicated, and if I wanted to make it readable I could write a simple XSLT script (more XML!) and render it in HTML.

kc5tja · Post by **kc5tja** » Wed Nov 25, 2009 7:02 pm

BitWise wrote:

The code used to marshal and unmarshal these files isn't that big or complicated

This might be true if you're using an existing XML library to do this (which if you're using Java, you almost certainly are), so now the question boils down to, "Do you include the complexity of the library in your assessment?"

Quote:

and if I wanted to make it readable I could write a simple XSLT script (more XML!) and render it in HTML.

There's a reason why I won't touch XSLT with a 100m pole. Yes, that's close to 300 feet, give or take 10%. I've had the misfortune of looking at some XSLT before, and it was a life-changing experience. For the worse.

As a point of comparison, I work for a rather large social networking company, where XML, JSON, and other tools are in everyday use as well.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Nov 25, 2009 9:19 pm

kc5tja wrote:

BDD, you're right that *=*-N was an error, primarily because MADS processed source listings sequentially, and wrote out blocks of code using sequential or PRG files (I forgot if it supported tape output, not ever having used a datasette with my C64). These files lacked random-access seek capability, so going backwards in the binary image was effectively forbidden by the DOS. The assembler design reflects this.

MADS wrote SEQuential files that were MOS hex loader code. A separate loader was used to create a binary image, which of course had to be saved from within an M/L monitor. However, someone hacked MADS and got it to write a PRoGram file directly to disk, bypassing the loader. That version quickly made the rounds.

You're right about no random access capability with PRG and SEQ files. If you wanted random access, you had to use RELative files or roll your own with direct disk reads and writes.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Nov 25, 2009 9:31 pm

GARTHWILSON wrote:

Since WDC owns the 6502 intellectual property, I looked up what they have to say about it in their programming manual...

Actually, that isn't "their" manual. The authors wrote it back in the 1980s around the time the Apple ][GS came on the scene. I believe the ORCA/M assembler was a derivative from a Z80 or 8080 assembler and thus didn't (still doesn't) conform to the "traditional" MOS Technology syntax.

Quote:

...and it gives a couple of examples, but not about the star. Page 78 says,

Quote:

The assembly syntax used in this book is that recommended by the Western Design Center in their data sheet (see Appendix F...

...[which I don't find --gw]).

That's because Appendix F never existed in the data sheet.

The assembler recommendations in WDC's data sheet differ from what the ORCA/M assembler recognizes. In that regard, the '816 programming manual available on WDC's site needs a significant overhaul, especially since WDC's own assembler differs in some respects from the manual.

Quote:

Full-line comments are indicated by starting the line with an asterisk or a semicolon.

I'd bet someone completely new to 65xx assembly language would be scratching his head over that one, especially right after he discovered that * in compliant 65xx assemblers refers to the program counter.

teamtempest · Post by **teamtempest** » Thu Nov 26, 2009 5:07 am

kc5tja wrote:

Note that using $ for the current location pseudovariable effectively precludes its use from identifying hexadecimal constants. While a sufficiently complicated parser can be made to detect isolated $ characters from prefix $ operators (absolutely requiring at least an LL-parser to do this), it opens up such errors as $1 being syntactically valid when you meant to type $+1. This is an amazingly common error to make on keyboards scanned at 60Hz (e.g., the C64's keyboard) if you're a fast typer like I am.

I'm not well-versed in parser theory (heck, completely unversed), but I can't say I find it terribly difficult to distinguish '$' used either as a reference to the (pseudo) program counter or as a prefix radix indicator. If a radix indicator it must be immediately followed by legal hex characters, if a program counter reference it must be immediately followed by anything else. I use regular expressions to perform token matching and check for the longer possibility (radix indicator) before the shorter in the sequence of tokens I try to match. Doesn't seem that hard.
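A minimal sketch of that "longer possibility first" idea with Python's `re` module. The token names and the tiny operator set are invented for the example and are not taken from teamtempest's assembler.

```python
import re

# Alternatives are tried left to right, so the hex-literal pattern
# (the longer possibility) is listed before the bare '$'.
TOKEN = re.compile(r"""
    (?P<HEX>\$[0-9A-Fa-f]+)   # '$' immediately followed by hex digits
  | (?P<PC>\$)                # '$' followed by anything else
  | (?P<DEC>\d+)
  | (?P<OP>[-+*/])
  | (?P<WS>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Split an operand expression into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("$+1"))   # program counter, operator, decimal constant
print(tokenize("$1"))    # a single hex literal
```

The second call shows the failure mode under discussion: `$1` is perfectly valid input, so no tokenizer can tell that `$+1` was intended.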

You're right that '$+1' mis-typed as '$1' would be mis-interpreted by my assembler. It's hard to see how to prevent that kind of thing in general, though. Like many computer-related things, it's what's actually there, not what you meant to put there, that gets acted upon.

Because I'm agnostic about a lot of these kinds of issues, my assembler also accepts '*' as a reference to the program counter (when the parser is looking for an operand, anyway - '*' means multiplication when it wants an operator). It's actually the form I normally use myself.

And it accepts '*=' as an alias for 'ORG'. I put that in after trying to assemble 'EHBasic' (it was easier than modifying the source). Since I'd seen a lot of code along the lines of '* = *+n' I was actually more worried about the lack of a space before the '=' sign being a hindrance to code portability.

And when I did that, the previous behavior of "if first non-space char of input line is ';' or '*' then comment line" became an issue. So I changed that to "if first non-space char of input line is ';' or char in first column is '*' then comment line". Gotta put in a space or two before any '*=', or avoid the ambiguity altogether by using '$=' (also legal!).
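The before-and-after rule can be written down directly. This is only a sketch of the rule as quoted, with hypothetical function names, not code from the assembler itself.

```python
def is_comment_line_old(line: str) -> bool:
    """Old rule: first non-space character is ';' or '*'."""
    return line.lstrip().startswith((";", "*"))

def is_comment_line(line: str) -> bool:
    """New rule: ';' as first non-space character, or '*' in column one."""
    return line.startswith("*") or line.lstrip().startswith(";")

for src in ["* a full-line comment", "  ; also a comment",
            "  *=$1000", "*=$1000"]:
    print(repr(src), is_comment_line_old(src), is_comment_line(src))
```

Under the new rule an indented `*=$1000` is treated as a directive, while the same text starting in column one is still swallowed as a comment, which is why the leading space matters.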

Whew! So many ripple effects from such little changes.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Thu Nov 26, 2009 4:41 pm

teamtempest wrote:

Whew! So many ripple effects from such little changes.

All the more reason to stick with standards that were established long ago. Imagine if someone who purported to be a traffic engineer and wasn't familiar with current practice decided that if a traffic signal displays green that means stop and blue means go. That's what's going on when folks who write roll-your-own assemblers do things like using $ to reference the program counter or * to demarcate the start of a comment.

kc5tja · Post by **kc5tja** » Thu Nov 26, 2009 4:50 pm

The use of "*" to start a comment dates back at least to Motorola's 6800 microprocessor (which predates the 6502). The 6800/6809 assemblers still use it to this day. Motorola's official 68000 and ColdFire assemblers do as well, but most people wised up and switched to using ';' like the rest of the world.

(Interestingly, their PowerPC assemblers use '#' for comments, presumably because they thought everyone coding for PowerPC would be doing so on some flavor of Unix.)

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Nov 27, 2009 4:43 am

kc5tja wrote:

(Interestingly, their PowerPC assemblers use '#' for comments, presumably because they thought everyone coding for PowerPC would be doing so on some flavor of Unix.)

That being the case, you'd have to wonder why the /* comment */ style was chosen by Dennis Ritchie when he was designing C. Evidently he wasn't conversant with the notion of using a semicolon to start a comment, and instead chose it as a statement terminator (which was perpetuated in most dialects of Business BASIC).

kc5tja · Post by **kc5tja** » Fri Nov 27, 2009 6:58 am

He chose /* */ because C's predecessor, B, used them (of course, this might not count, since Ritchie also influenced B). B's use of /* */ was influenced by BCPL's //-style comments (which returned to prominence thanks to C++) and by Pascal's (* *) style comments (used when keyboards lacked { } keys) to implement multi-line comments. In other words, Ritchie got sick of typing // all the time.

Apparently, there is a conspiracy theory that suggests /* */ was used to make IBM mainframe users' lives miserable. It seems that typing /* on an IBM terminal would achieve the same effect as typing CTRL-D on Unix.

Moreover, when C was written, the use of "#" for comments appeared primarily only in shell scripts; most other languages at the time used other conventions. It wasn't until awk scripts (and later, Perl, and still later, Python and Ruby) that # became useful as a comment elsewhere in a Unix environment.

The semicolon as a statement separator was already in use by a number of languages at the time as well, including BCPL, ALGOL, and Pascal.

Ahh...computer language archaeology.

BitWise · Post by **BitWise** » Fri Nov 27, 2009 10:21 am

kc5tja wrote:

Apparently, there is a conspiracy theory that suggests /* */ was used to make IBM mainframe users' lives miserable. It seems that typing /* on an IBM terminal would achieve the same effect as typing CTRL-D on Unix.

I don't remember having any problems with /* on 3270 terminals for the seven years IBM made me use them for mail, time recording and expenses. Sounds like pure conspiracy.

paulrsm · Post by **paulrsm** » Fri Dec 11, 2009 6:08 am

I wrote SALP (Structured Assembly Language Preprocessor) for 6502 code. This let me leverage an existing assembler rather than writing my own. SALP was written in SALP; the first version was transformed by hand, and after that it was used to develop itself. The commands are:

Code:

*! IF <condition> {length} 
*! ELSE {length}
*! ENDIF

*! LOOP
*! AGAIN <condition> {length}
*! WHILE <condition> {length}
*! UNTIL <condition> {length}
*! FOREVER {length}

*! FILE path name

*! CPU 6502|65C02|65802|65816

If your macros are powerful enough, you can use them to create structures. See "StructurE: The Complete Toolkit For Structuring Assembly Language Programs" by Kurt M. Schindler, Logical Solutions, 1989, ISBN 0685269450.

See also the Motorola M68000 Family Resident Structured Assembler Reference Manual (1983) at: http://www.easy68k.com/paulrsm/doc/m68kmasm.txt
