desirable assembler features
That construct will also work for assemblers which use software-managed virtual memory, aka "random access" files. For example, nothing prevents the above code from working when using REL files as an intermediate code representation prior to writing out a PRG file.
I know this for a fact because similar problems occur when writing IFF files (especially prior to AmigaOS 2.0, when iff.library didn't exist). In the simplest IFF exporters, you have to serialize data to disk, then seek back and update each chunk's size field afterwards. Crude, but quite effective, and very frugal with memory. The tradeoff, of course, is that it's relatively slow (up to two seeks per chunk written).
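The "serialize, then seek back and patch the size" approach can be sketched roughly like this. It's a minimal Python illustration, not code from any real IFF exporter: an in-memory io.BytesIO stands in for the disk file, and the FORM type "TEST" and the "NAME" chunk contents are made up. IFF chunk sizes are big-endian 32-bit values that exclude the pad byte added after odd-length chunks, and the two seeks per chunk are marked in the comments.

```python
import io
import struct

def write_chunk(stream, chunk_id, write_body):
    """Write one IFF-style chunk: ID, placeholder size, body, optional pad byte,
    then seek back and patch in the real size."""
    if len(chunk_id) != 4:
        raise ValueError("IFF chunk IDs are exactly four bytes")
    stream.write(chunk_id)
    size_pos = stream.tell()
    stream.write(b"\x00\x00\x00\x00")        # placeholder: the size isn't known yet
    body_start = stream.tell()
    write_body(stream)                        # body (possibly nested chunks) goes straight out
    size = stream.tell() - body_start
    if size % 2:
        stream.write(b"\x00")                 # IFF pads odd-sized chunks to an even length
    end_pos = stream.tell()
    stream.seek(size_pos)                     # seek 1: back to the size field
    stream.write(struct.pack(">I", size))     # big-endian size, excluding the pad byte
    stream.seek(end_pos)                      # seek 2: return to the end and carry on
    return size

def form_body(s):
    s.write(b"TEST")                          # hypothetical FORM type, for illustration only
    write_chunk(s, b"NAME", lambda t: t.write(b"hello"))

buf = io.BytesIO()
write_chunk(buf, b"FORM", form_body)
```

Nesting falls out naturally: the inner chunk patches its own size before the outer FORM measures its body, so only the chunk currently being written needs to be remembered.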
On a completely off-topic tangent (feel free to ignore), I often wonder why files employing HDLC framing (with byte-stuffing, of course) to isolate items haven't caught on. It seems like it'd work very well with sequential file access, which was the norm back in the early to mid-80s. Frame bloat isn't statistically significant (around 5% seems average based on PPP experience), so while it's a valid concern (especially with files containing a lot of $7E, $7C, and $7D bytes), it can't possibly be a significant reason.
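To make the idea concrete, here is a rough Python sketch of HDLC-style byte stuffing applied to items in a sequential file. It follows the PPP asynchronous framing rules (RFC 1662) with no control-character escaping, so only $7E (flag) and $7D (escape) are reserved; a variant that also reserved $7C, as mentioned above, would simply add it to the escape test. The function names are hypothetical.

```python
FLAG = 0x7E   # frame delimiter
ESC = 0x7D    # control escape
XOR = 0x20    # an escaped byte is sent as ESC followed by (byte ^ 0x20)

def hdlc_frame(payload: bytes) -> bytes:
    """Wrap one item in flags, byte-stuffing any FLAG/ESC bytes inside it."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ XOR])
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)

def hdlc_split(stream: bytes) -> list:
    """Recover items from a purely sequential byte stream; no seeking needed.
    Empty frames (back-to-back flags) are skipped."""
    frames, current, escaped = [], bytearray(), False
    for b in stream:
        if b == FLAG:
            if current:
                frames.append(bytes(current))
            current, escaped = bytearray(), False
        elif b == ESC:
            escaped = True
        else:
            current.append(b ^ XOR if escaped else b)
            escaped = False
    return frames

encoded = hdlc_frame(b"abc") + hdlc_frame(b"\x7e\x01\x7d")
items = hdlc_split(encoded)
```

The reader never needs to know an item's length in advance, which is exactly what a forward-only SEQ/PRG file wants.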
Although, today, COBS (Consistent Overhead Byte Stuffing) might make more sense than HDLC. The disadvantage is that it requires a bit of buffering prior to serialization, but the buffer size is bounded and well-known. It generates a worst-case byte-stuffing overhead somewhere close to 0.5%, even with pathologically constructed data.
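A minimal Python sketch of COBS, for comparison. Each code byte gives the distance to the next zero, and runs are capped at 254 data bytes, so the worst-case cost is one extra byte per 254 (about 0.4%). This sketch builds the whole output in memory for simplicity; a streaming encoder would only need to buffer up to 254 bytes before it can emit each code byte. A single $00 then delimits items in the file.

```python
def cobs_encode(data: bytes) -> bytes:
    """COBS-encode data so the result contains no 0x00 bytes."""
    out = bytearray([0])          # placeholder for the first code byte
    code_idx, code = 0, 1
    for i, b in enumerate(data):
        if b != 0:
            out.append(b)
            code += 1
        if b == 0 or code == 0xFF:               # zero found, or 254-byte run full
            out[code_idx] = code                 # patch the pending code byte
            code, code_idx = 1, len(out)
            if b == 0 or i + 1 < len(data):
                out.append(0)                    # placeholder for the next code byte
    if code_idx < len(out):
        out[code_idx] = code
    return bytes(out)

def cobs_decode(enc: bytes) -> bytes:
    """Reverse cobs_encode (the 0x00 delimiter must already be stripped)."""
    out, i = bytearray(), 0
    while i < len(enc):
        code = enc[i]
        if code == 0 or i + code > len(enc):
            raise ValueError("malformed COBS block")
        block = enc[i + 1:i + code]
        if 0 in block:
            raise ValueError("zero byte inside COBS data")
        out += block
        i += code
        if code != 0xFF and i < len(enc):
            out.append(0)                        # a non-full block implies a zero
    return bytes(out)

# Usage: write items to a "file" separated by 0x00, then read them back.
records = [b"abc", b"\x00\x01", b""]
stream = b"".join(cobs_encode(r) + b"\x00" for r in records)
decoded = [cobs_decode(f) for f in stream.split(b"\x00")[:-1]]
```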
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
My assembler uses XML as its object format (it's so much easier to debug, and with modern machines disk space is not really an issue; I was going to compress it on writing, but it's not really worth it). The code generated for the previous sample looks like this (when the white space is added back in):
Code: Select all
<?xml version="1.0"?>
<module target="65XX" endian="little" name="MOS.obj">
<section name=".code" addr="0000FFFC" size="2">0004</section>
<section name=".code" addr="00000400" size="10">A9008D03A09A90000400</section>
</module>
The linker rearranges this, fixes up references, checks for overlaps, etc., and produces binary, hex, S19 or whatever format is needed (it was designed to make plugging in a new output format very easy).
The whole development package was designed as a central core of framework classes for building any assembler, linker or librarian, with a derived, customised layer that adds knowledge of a particular device family and code syntax.
When the compiled code is relocatable, the output contains the expressions to be evaluated at link time, like this:
Code: Select all
<?xml version="1.0"?>
<module target="65XX" endian="little" name="test.obj">
<section name=".code">EAEAEAEAEA18EAEAEA580148656C6C6F070148656C6C6F0718EAEAEA5802576F726C64<byte>
<ext>ExtLab</ext>
</byte>02576F726C64<byte>
<ext>ExtLab</ext>
</byte>4B0A4C<word>
<val sect=".code">45</val>
</word>4C<word>
<ext>ExtLab</ext>
</word>6C<word>
<val sect=".code">45</val>
</word>6C<word>
<ext>ExtLab</ext>
</word>20<word>
<val sect=".code">45</val>
</word>20<word>
<ext>ExtLab</ext>
</word>4C<word>
<val sect=".code">45</val>
</word>5C<word>
<and>
<ext>ExtLab</ext>
<val>65535</val>
</and>
</word>
<byte>
<shr>
<and>
<ext>ExtLab</ext>
<val>16711680</val>
</and>
<val>16</val>
</shr>
</byte>6C<word>
<val sect=".code">45</val>
</word>6C<word>
<ext>ExtLab</ext>
</word>20<word>
<val sect=".code">45</val>
</word>22<word>
<and>
<ext>ExtLab</ext>
<val>65535</val>
</and>
</word>
<byte>
<shr>
<and>
<ext>ExtLab</ext>
<val>16711680</val>
</and>
<val>16</val>
</shr>
</byte>69... Big chunk of data removed ...EC</section>
<section name=".data">0102030B41424348656C6C6F20576F726C640D0A<byte>
<and>
<val sect=".data">22</val>
<val>255</val>
</and>
</byte>
<byte>
<and>
<shr>
<val sect=".data">22</val>
<val>8</val>
</shr>
<val>255</val>
</and>
</byte>010002000300<word>
<val sect=".data">31</val>
</word>0800010000000200000003000000<word>
<and>
<val sect=".data">44</val>
<val>65535</val>
</and>
</word>
<byte>
<shr>
<and>
<val sect=".data">44</val>
<val>16711680</val>
</and>
<val>16</val>
</shr>
</byte>
<word>
<and>
<ext>ExtLab</ext>
<val>65535</val>
</and>
</word>
<byte>
<shr>
<and>
<ext>ExtLab</ext>
<val>16711680</val>
</and>
<val>16</val>
</shr>
</byte>
<word>
<and>
<val sect=".code">45</val>
<val>65535</val>
</and>
</word>
<byte>
<shr>
<and>
<val sect=".code">45</val>
<val>16711680</val>
</and>
<val>16</val>
</shr>
</byte>
</section>
<section name=".code" addr="00001005" size="2">9000</section>
<section name=".code" addr="0000E000" size="265">90... Big chunk of data removed ...E0</section>
<gbl>GblLab<val sect=".code">45</val>
</gbl>
</module>
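For anyone wondering how a linker might consume the relocatable form above, here is a rough Python sketch using the standard xml.etree.ElementTree module. The semantics are my reading of the sample, not a documented format: I'm assuming `<val sect="...">` is a decimal offset from that section's final base address, `<val>` alone is a decimal literal (65535 and 16711680 suggest decimal), `<ext>` names an external symbol resolved from a symbol table, `<and>`/`<shr>` are binary operators, and `<byte>`/`<word>` emit one or two bytes little-endian, per `endian="little"`. The base address 0x1000 and the ExtLab value 0x12ABCD are invented for the example.

```python
import xml.etree.ElementTree as ET

def evaluate(expr, section_bases, symbols):
    """Recursively evaluate one link-time expression element (assumed semantics)."""
    if expr.tag == "val":
        value = int(expr.text)
        sect = expr.get("sect")
        return value + section_bases[sect] if sect else value
    if expr.tag == "ext":
        return symbols[expr.text.strip()]           # external symbol from other modules
    args = [evaluate(child, section_bases, symbols) for child in expr]
    if expr.tag == "and":
        return args[0] & args[1]
    if expr.tag == "shr":
        return args[0] >> args[1]
    raise ValueError(f"unsupported expression element <{expr.tag}>")

def link_section(section, section_bases, symbols):
    """Turn a relocatable <section> into bytes: hex text is copied through, and
    each <byte>/<word> child is evaluated and emitted little-endian."""
    out = bytearray.fromhex((section.text or "").strip())
    for item in section:
        width = {"byte": 1, "word": 2}[item.tag]
        value = evaluate(item[0], section_bases, symbols)
        if not 0 <= value < (1 << (8 * width)):
            raise ValueError(f"value ${value:X} does not fit in a {item.tag}")
        out += value.to_bytes(width, "little")
        out += bytearray.fromhex((item.tail or "").strip())   # hex following the element
    return bytes(out)

sample = """<module target="65XX" endian="little" name="test.obj">
<section name=".data">0102<word>
<val sect=".data">2</val>
</word>
<byte>
<shr>
<and>
<ext>ExtLab</ext>
<val>16711680</val>
</and>
<val>16</val>
</shr>
</byte>EA</section>
</module>"""

section = ET.fromstring(sample).find("section")
image = link_section(section, {".data": 0x1000}, {"ExtLab": 0x12ABCD})
```

This sketch rejects values that don't fit rather than truncating them; a real linker might warn instead.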
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
- GARTHWILSON
- Forum Moderator
- Posts: 8774
- Joined: 30 Aug 2002
- Location: Southern California
Quote:
Completely non-standard. Using your assembler would mean editing many source files in which the directive *=*+n is used.
Quote:
Ah yes, I forgot about the * .
Since WDC owns the 6502 intellectual property, I looked up what they have to say about it in their programming manual. Page 366 says,
Quote:
6502 assemblers have been wildly inconsistent in their syntax, and early 65802 assemblers have not set standards either. This book describes syntax recommended by the designers of the 65816, as implemented in the ORCA/M assembler. Others, however, do and will differ.
Quote:
The assembly syntax used in this book is that recommended by the Western Design Center in their data sheet (see Appendix F [which I don't find --gw]). The assembler actually used is the ProDOS ORCA/M assembler for the Apple // computer, by Byteworks, Inc.. Before learning how to code the 65816, a few details about some of the assembler directives need to be explained.
Full-line comments are indicated by starting the line with an asterisk or a semicolon.
Wow. This illustrates so perfectly why XML is wholesale unsuitable for representing pretty much anything at all. You claim that disk space is no longer an issue, and that's true. However, it still requires a huge effort to read in and parse that monstrosity. There's a reason why Google has moved away from XML as a data representation format, despite their huge storage and unfathomable network bandwidth accessible to them.
XML is best applied when you have data with an unknown set of attributes for each datum, and where it's usually a sparse description. (And, even then, ProtoBuf and IFF files both accomplish the same thing in a purely binary format, with substantially tighter encoding and easier parsing.) Binary file formats definitely do not qualify for this class of file.
The debuggability of XML is undebatable, relying on the "power of plain text." However, I'd never want to use this in any production environment. I would much rather have tools that take a binary format and expand it into an editable text form, then, if I make changes, compile that representation back into binary form.
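That workflow (binary on disk, text only while a human is editing) can be sketched in a few lines of Python. The record format here is entirely made up for illustration: each record is a 1-byte type, a 2-byte little-endian length, and the payload, and the type names CODE/RELOC/SYMBOL are hypothetical.

```python
import struct

# Hypothetical record types for a made-up binary object format.
RECORD_NAMES = {1: "CODE", 2: "RELOC", 3: "SYMBOL"}
RECORD_IDS = {name: rid for rid, name in RECORD_NAMES.items()}

def to_text(blob: bytes) -> str:
    """Expand binary records into one editable 'NAME HEX' line each."""
    lines, pos = [], 0
    while pos < len(blob):
        rid, length = struct.unpack_from("<BH", blob, pos)   # 3-byte header
        pos += 3
        payload = blob[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated record")
        pos += length
        lines.append(f"{RECORD_NAMES[rid]} {payload.hex().upper()}")
    return "\n".join(lines) + "\n"

def from_text(text: str) -> bytes:
    """Compile edited text back into the binary form (';' starts a comment)."""
    out = bytearray()
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()
        if not line:
            continue
        name, _, hexdata = line.partition(" ")
        payload = bytes.fromhex(hexdata)
        out += struct.pack("<BH", RECORD_IDS[name], len(payload)) + payload
    return bytes(out)

blob = from_text("CODE A9008D03A0 ; LDA #$00 / STA $A003\nSYMBOL 4D4F53\n")
```

The binary stays compact and trivially parseable; the text form only exists while someone is looking at it.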
BTW, one nasty side effect of the XML posted above is that it causes the forum to render the page so wide as to be borderline unusable to me.
UPDATE: It looks like the large quantity of binary data that resulted in horrible formatting has been removed. Many thanks!
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
kc5tja wrote:
Wow. This illustrates so perfectly why XML is wholesale unsuitable for representing pretty much anything at all. You claim that disk space is no longer an issue, and that's true. However, it still requires a huge effort to read in and parse that monstrosity. There's a reason why Google has moved away from XML as a data representation format, despite their huge storage and unfathomable network bandwidth accessible to them.
XML is best applied when you have data with an unknown set of attributes for each datum, and where it's usually a sparse description. (And, even then, ProtoBuf and IFF files both accomplish the same thing in a purely binary format, with substantially tighter encoding and easier parsing.) Binary file formats definitely do not qualify for this class of file.
The debugability of XML is undebatable, relying on the "power of plain text." However, I'd never want to use this in any production environment. I would much rather have tools which took a binary format, expanded it into a text file editable form, then if I make changes, to recompile the representation back into binary form.
BTW, one nasty side effect of the XML posted above is that it causes the forum to render the page so wide as to be borderline unusable to me.
UPDATE: It looks like the large quantity of binary data that resulted in horrible formatting has been removed. Many thanks!
The code used to marshal and unmarshal these files isn't that big or complicated, and if I wanted to make it readable I could write a simple XSLT script (more XML!) and render it in HTML.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
BitWise wrote:
The code used to marshall and un-marshall these files isn't that big or complicated
Quote:
and if I wanted to make it readable I could write a simple XSLT script (more XML!) and render it in HTML.
As a point of comparison, I work for a rather large social networking company, where XML, JSON, and other tools are in everyday use as well.
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
kc5tja wrote:
BDD, you're right that *=*-N was an error, primarily because MADS processed source listings sequentially, and wrote out blocks of code using sequential or PRG files (I forgot if it supported tape output, not ever having used a datasette with my C64). These files lacked random-access seek capability, so going backwards in the binary image was effectively forbidden by the DOS. The assembler design reflects this.
You're right about no random access capability with PRG and SEQ files. If you wanted random access, you had to use RELative files or roll your own with direct disk reads and writes.
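Here's a rough Python sketch of why *=*-N can't work against an append-only PRG file while *=*+N can. The class name and the zero fill byte are assumptions; the PRG layout itself (a 2-byte little-endian load address followed by the data) is the standard C64 one. An io.BytesIO stands in for the disk file, but the writer never seeks.

```python
import io

class SequentialPrgWriter:
    """Write a C64 PRG image (2-byte little-endian load address, then the data)
    to an append-only stream, so '*=' can only ever move forward."""

    def __init__(self, stream, load_address, fill=0x00):
        self.stream = stream
        self.pc = load_address
        self.fill = fill
        stream.write(load_address.to_bytes(2, "little"))   # PRG header: load address

    def emit(self, data: bytes):
        self.stream.write(data)
        self.pc += len(data)

    def org(self, address):
        """Handle '*=address': pad forward gaps, refuse to go backwards."""
        if address < self.pc:
            raise ValueError(
                f"*= ${address:04X} is behind ${self.pc:04X}; a sequential file can't seek back")
        self.emit(bytes([self.fill]) * (address - self.pc))

buf = io.BytesIO()
w = SequentialPrgWriter(buf, 0xC000)
w.emit(b"\xa9\x00")      # LDA #$00
w.org(w.pc + 2)          # *=*+2 : reserve two bytes
w.emit(b"\x60")          # RTS
```

With a REL file (or an in-memory image written out at the end) the backwards case becomes possible, which is the point made earlier in the thread.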
x86? We ain't got no x86. We don't NEED no stinking x86!
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
GARTHWILSON wrote:
Since WDC owns the 6502 intellectual property, I looked up what they have to say about it in their programming manual...
Quote:
...and it gives a couple of examples, but not about the star. Page 78 says,
...[which I don't find --gw]).
Quote:
The assembly syntax used in this book is that recommended by the Western Design Center in their data sheet (see Appendix F...
Quote:
Full-line comments are indicated by starting the line with an asterisk or a semicolon.
x86? We ain't got no x86. We don't NEED no stinking x86!
- teamtempest
- Posts: 443
- Joined: 08 Nov 2009
- Location: Minnesota
kc5tja wrote:
Note that using $ for the current location pseudovariable effectively precludes its use from identifying hexadecimal constants. While a sufficiently complicated parser can be made to detect isolated $ characters from prefix $ operators (absolutely requiring at least an LL-parser to do this), it opens up such errors as $1 being syntactically valid when you meant to type $+1. This is an amazingly common error to make on keyboards scanned at 60Hz (e.g., the C64's keyboard) if you're a fast typer like I am.
You're right that '$+1' mistyped as '$1' would be misinterpreted by my assembler. It's hard to see how to prevent that kind of thing in general, though. Like many computer-related things, it's what's actually there, not what you meant to put there, that gets acted upon.
Because I'm agnostic about a lot of these kinds of issues, my assembler also accepts '*' as a reference to the program counter (when the parser is looking for an operand, anyway - '*' means multiplication when it wants an operator). It's actually the form I normally use myself.
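The context rules described above ('$' as hex prefix or current location, '*' as program counter or multiplication) can be sketched as a small tokenizer. This is a hypothetical Python illustration of the idea, not teamtempest's actual code; it only handles decimal and hex numbers, symbols, + - * / and parentheses. Note that it reproduces the hazard kc5tja describes: "$1" silently becomes the constant 1.

```python
import string

def tokenize_operand(text: str):
    """Tokenize an operand expression, resolving '$' and '*' from context."""
    tokens, i = [], 0
    want_operand = True                   # a value is expected at the start and after an operator
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c == "$":
            j = i + 1
            while j < len(text) and text[j] in string.hexdigits:
                j += 1
            if j > i + 1:
                tokens.append(("NUM", int(text[i + 1:j], 16)))   # "$1F": hex constant
            else:
                tokens.append(("PC", "$"))                        # bare "$": current location
            want_operand, i = False, j
        elif c == "*":
            if want_operand:
                tokens.append(("PC", "*"))    # value expected: program counter
                want_operand = False
            else:
                tokens.append(("OP", "*"))    # operator expected: multiplication
                want_operand = True
            i += 1
        elif c in "+-/(":
            tokens.append(("OP", c))
            want_operand, i = True, i + 1
        elif c == ")":
            tokens.append(("OP", c))
            want_operand, i = False, i + 1
        elif c.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUM", int(text[i:j])))
            want_operand, i = False, j
        elif c.isalpha() or c == "_":
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(("SYM", text[i:j]))
            want_operand, i = False, j
        else:
            raise ValueError(f"unexpected character {c!r} in operand")
    return tokens
```

For example, "**2" tokenizes as PC, multiply, 2, while "$+1" and "$1" produce quite different token streams.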
And it accepts '*=' as an alias for 'ORG'. I put that in after trying to assemble 'EHBasic' (it was easier than modifying the source). Since I'd seen a lot of code along the lines of '* = *+n' I was actually more worried about the lack of a space before the '=' sign being a hindrance to code portability.
And when I did that, the previous behavior of "if first non-space char of input line is ';' or '*' then comment line" became an issue. So I changed that to "if first non-space char of input line is ';' or char in first column is '*' then comment line". Gotta put in a space or two before any '*=', or avoid the ambiguity altogether by using '$=' (also legal!).
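The revised comment rule, plus '*=' and '$=' as ORG aliases, might look something like this. It's a hypothetical Python sketch of the rules as described in the post, not the assembler's real line parser: ';' as the first non-space character always starts a comment, '*' only does so in column 1, and an indented '*=' or '$=' (with optional spaces around '=') is treated as ORG.

```python
def classify_line(line: str):
    """Classify one source line using the rules described above."""
    stripped = line.strip()
    if not stripped:
        return ("blank", "")
    if stripped.startswith(";"):          # ';' as first non-space char: comment
        return ("comment", line)
    if line.startswith("*"):              # '*' counts as a comment only in column 1
        return ("comment", line)
    for pc_symbol in ("*", "$"):          # '*=' and '$=' are aliases for ORG
        if stripped.startswith(pc_symbol):
            rest = stripped[1:].lstrip()
            if rest.startswith("="):
                return ("org", rest[1:].strip())
    return ("statement", stripped)
```

So "*=$1000" starting in column 1 is a comment, while the same text indented by a space or two sets the origin, which is exactly the portability wrinkle mentioned above.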
Whew! So many ripple effects from such little changes.
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
teamtempest wrote:
Whew! So many ripple effects from such little changes.
x86? We ain't got no x86. We don't NEED no stinking x86!
The use of "*" to start a comment dates back at least to Motorola's 6800 microprocessor (which predates the 6502). The 6800/6809 assemblers still use it to this day. Motorola's official 68000 and ColdFire assemblers do as well, but most people wised up and switched to using ';' like the rest of the world.
(Interestingly, their PowerPC assemblers use '#' for comments, presumably because they thought everyone coding for PowerPC would be doing so on some flavor of Unix.)
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
; or * or #
kc5tja wrote:
(Interestingly, their PowerPC assemblers use '#' for comments, presumably because they thought everyone coding for PowerPC would be doing so on some flavor of Unix.)
x86? We ain't got no x86. We don't NEED no stinking x86!
He chose /* */ because C's predecessor, B, used them (of course, this might not count, since Ritchie also influenced B). B's use of /* */ was influenced by BCPL's //-style comments (which returned to prominence thanks to C++) and by Pascal's (* *) style comments (used when keyboards lacked { } keys) to implement multi-line comments. In other words, Ritchie got sick of typing // all the time.
Apparently, there is a conspiracy theory that suggests /* */ was used to make IBM mainframe users' lives miserable. It seems that typing /* on an IBM terminal would achieve the same effect as typing CTRL-D on Unix.
Moreover, when C was written, "#" comments appeared mainly in shell scripts; most other languages at the time used other conventions. It wasn't until awk scripts (and later Perl, and still later Python and Ruby) that # caught on as a comment marker elsewhere in a Unix environment.
The semicolon as a statement separator was already in use by a number of languages at the time as well, including BCPL, ALGOL, and Pascal.
Ahh...computer language archaeology.
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
kc5tja wrote:
Apparently, there is a conspiracy theory that suggests /* */ was used to make IBM mainframe users lives miserable. It seems that typing /* on an IBM terminal would achieve the same effect as typing CTRL-D on Unix.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
I wrote SALP (Structured Assembly Language Preprocessor) for 6502 code. This let me leverage an existing assembler rather than writing my own. SALP was written in SALP; the first version was transformed by hand, and after that it was used to develop itself. The commands are:
Code: Select all
*! IF <condition> {length}
*! ELSE {length}
*! ENDIF
*! LOOP
*! AGAIN <condition> {length}
*! WHILE <condition> {length}
*! UNTIL <condition> {length}
*! FOREVER {length}
*! FILE path name
*! CPU 6502|65C02|65802|65816
If your macros are powerful enough, you can use them to create structures. See "StructurE: The Complete Toolkit For Structuring Assembly Language Programs" by Kurt M. Schindler, Logical Solutions, 1989, ISBN 0685269450.
See also the Motorola M68000 Family Resident Structured Assembler Reference Manual (1983) at: http://www.easy68k.com/paulrsm/doc/m68kmasm.txt
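To show the general technique a structured preprocessor like this uses, here is a rough Python sketch. The semantics are assumed, not SALP's documented behaviour: conditions are taken to be 6502 flag names (EQ, NE, CS, CC, MI, PL, VS, VC), the optional {length} field is ignored (a real tool presumably uses it to choose between a short branch and a longer form, since 6502 branches only reach about 127 bytes), generated labels like _S0 are hypothetical, and only IF/ELSE/ENDIF and LOOP/UNTIL/FOREVER are handled. Because each directive starts with "*!" in column 1, an ordinary assembler would presumably just see it as a comment line.

```python
import itertools

# Assumed condition names -> the branch taken when the condition is FALSE.
SKIP_BRANCH = {
    "EQ": "BNE", "NE": "BEQ", "CS": "BCC", "CC": "BCS",
    "MI": "BPL", "PL": "BMI", "VS": "BVC", "VC": "BVS",
}

def expand_structured(lines):
    """Expand a subset of '*!' directives into plain 6502 branches and labels."""
    out, stack = [], []
    labels = (f"_S{n}" for n in itertools.count())   # hypothetical generated-label names
    for line in lines:
        if not line.startswith("*!"):
            out.append(line)                         # ordinary source passes straight through
            continue
        words = line[2:].split()
        keyword = words[0].upper()
        if keyword == "IF":
            skip = next(labels)
            out.append(f"    {SKIP_BRANCH[words[1].upper()]} {skip}")
            stack.append(["IF", skip, None])
        elif keyword == "ELSE":
            if not stack or stack[-1][0] != "IF" or stack[-1][2] is not None:
                raise ValueError("ELSE without matching IF")
            end = next(labels)
            out += [f"    JMP {end}", stack[-1][1]]  # jump over the ELSE part, place skip label
            stack[-1][2] = end
        elif keyword == "ENDIF":
            if not stack or stack[-1][0] != "IF":
                raise ValueError("ENDIF without matching IF")
            _, skip, end = stack.pop()
            out.append(end or skip)
        elif keyword == "LOOP":
            top = next(labels)
            out.append(top)
            stack.append(["LOOP", top, None])
        elif keyword in ("UNTIL", "FOREVER"):
            if not stack or stack[-1][0] != "LOOP":
                raise ValueError(f"{keyword} without matching LOOP")
            _, top, _ = stack.pop()
            if keyword == "UNTIL":                   # repeat while the condition is false
                out.append(f"    {SKIP_BRANCH[words[1].upper()]} {top}")
            else:
                out.append(f"    JMP {top}")
        else:
            raise ValueError(f"*! {keyword} is not handled by this sketch")
    if stack:
        raise ValueError("unterminated structure")
    return out

source = ["    LDA FLAG", "*! IF EQ", "    LDX #1", "*! ELSE", "    LDX #2", "*! ENDIF",
          "*! LOOP", "    DEX", "*! UNTIL EQ"]
expanded = expand_structured(source)
```

A stack of open structures is all that's needed, which is why this kind of preprocessor can sit in front of an existing assembler so easily.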