I finally got around to finishing the assembler part of the machine language monitor. My original plan was to adapt an older assembler I had written for the 65C02. However, due to the substantially greater number of addressing modes available with the '816, as well as the possibility of 16 bit immediate mode operands, the 'C02 code proved to be inadequate. It turned out that writing a new assembler from scratch was less time-consuming than trying to rework the older code.
The core of the assembly process is a set of data tables that translate mnemonics and addressing mode symbology to actual machine instructions and vice versa. Two of the tables hold compressed mnemonics in opcode order (i.e., the first entry in each table corresponds to BRK). This is the old "compress three ASCII characters into 15 bits" trick, so that BRK becomes 1C D8 (big endian order), with the 1C value being byte zero in the MSB table and the D8 value being byte zero in the LSB table. Here's a subset of the two tables:
Code:
; encoded mnemonics MSB...
;
mnetabhb .byte $1c,$84,$24,$84,$ad,$84,$15,$84,$8a,$84,$15,$8a,$ad,$84,$15,$84
         .byte $1c,$84,$84,$84,$ac,$84,$15,$84,$23,$84,$53,$a9,$ac,$84,$15,$84
         .byte $5d,$13,$5d,$13,$1a,$13,$9c,$13,$8b,$13,$9c,$8b,$1a,$13,$9c,$13
;
;
; encoded mnemonics LSB...
;
mnetablb .byte $d8,$c4,$22,$c4,$06,$c4,$1a,$c4,$62,$c4,$1a,$4a,$06,$c4,$1a,$c4
         .byte $5a,$c4,$c4,$c4,$c6,$c4,$1a,$c4,$48,$c4,$c8,$28,$c6,$c4,$1a,$c4
         .byte $26,$ca,$1a,$ca,$aa,$ca,$1a,$ca,$62,$ca,$1a,$4a,$aa,$ca,$1a,$ca
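For anyone who wants to play with the packing outside the monitor, here is a rough Python sketch of the encoding. It is not the monitor's code. The $3F bias and the left-justified layout (bit 0 unused) are inferred from the table values, being the combination that turns BRK into 1C D8, and the function names are made up for illustration.

```python
BIAS = 0x3F  # assumed bias: 'A' -> 2 ... 'Z' -> 27 (inferred from the table data)


def pack_mnemonic(mnemonic):
    """Pack a three-letter mnemonic into (msb, lsb), 5 bits per character."""
    text = mnemonic.upper()
    if len(text) != 3 or not all("A" <= ch <= "Z" for ch in text):
        raise ValueError("mnemonic must be three letters A-Z")
    word = 0
    for ch in text:
        word = (word << 5) | (ord(ch) - BIAS)
    word <<= 1  # left-justify the 15 bits; bit 0 is unused
    return word >> 8, word & 0xFF


def unpack_mnemonic(msb, lsb):
    """Recover the three-letter mnemonic from its two table bytes."""
    word = (msb << 8) | lsb
    return "".join(chr(((word >> shift) & 0x1F) + BIAS) for shift in (11, 6, 1))


if __name__ == "__main__":
    msb, lsb = pack_mnemonic("BRK")
    print(f"{msb:02X} {lsb:02X}")
    print(unpack_mnemonic(0x84, 0xC4))
```

Under that assumption, the first five byte pairs of the two tables unpack to BRK, ORA, COP, ORA and TSB, which is the '816 opcode order from $00 through $04.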
A third table, also arranged in opcode order, contains addressing mode and instruction size data, with a one-to-one correspondence to the mnemonic tables. I used the following format for that table:
Code:
xxxxxxxx
||||||||
|||+++++---> addressing mode
|++--------> operand size (0-3)
+----------> 1: accept 8 or 16 bit operand
Additional tables point to the symbology that is recognized by the assembler, for example "(),X" or "(,S),Y". The index into these tables is derived from the addressing mode bits of the above.
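The bit layout of the addressing mode byte can be pulled apart with a few masks and shifts. The Python below is only a sketch of that idea, not the monitor's code. The function and field names are hypothetical, and the sample value $A5 is invented for the demonstration rather than taken from the real table.

```python
def decode_mode_byte(value):
    """Split an addressing mode table byte into its three fields."""
    if not 0 <= value <= 0xFF:
        raise ValueError("table entries are single bytes")
    return {
        "mode": value & 0x1F,                # bits 0-4: addressing mode index
        "operand_size": (value >> 5) & 0x03,  # bits 5-6: operand size, 0-3
        "accepts_16bit": bool(value & 0x80),  # bit 7: 8 or 16 bit operand allowed
    }


if __name__ == "__main__":
    # $A5 is an invented sample value, not an entry from the actual table.
    print(decode_mode_byte(0xA5))
```

The "mode" field is what would serve as the index into the symbology tables.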
If an immediate mode instruction's addressing mode data has bit 7 set, the assembler will assemble a 16 bit operand if one is entered. For example, LDA #$34 would assemble as A9 34, as would be expected. LDA #$1234 will assemble as A9 34 12. However, LDA #$0034 will assemble as A9 34, even though it was entered in 16 bit format. If a 16 bit operand is desired even though the MSB is zero, it can be forced by coding LDA ^#$34 or LDA ^#$0034, either of which will assemble as A9 34 00. Immediate mode instructions such as REP and SEP do not have bit 7 set in their addressing mode data, so an instruction such as SEP #$1234 will cause an error.
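The operand sizing rules above boil down to a small decision, sketched here in Python. This is an illustration of the described behavior, not the monitor's code. The function name and parameters are hypothetical, the operand is assumed to be already parsed into a number, and treating a forced ^ on an instruction without bit 7 as an error is my assumption, since only the SEP #$1234 case is spelled out.

```python
def assemble_immediate(opcode, value, accepts_16bit, force_16bit=False):
    """Return the bytes for an immediate mode instruction.

    accepts_16bit mirrors bit 7 of the addressing mode byte;
    force_16bit stands in for the ^ prefix on the operand.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("operand out of range")
    # A 16 bit operand is emitted only when the MSB is non-zero or ^ was used,
    # so #$0034 still assembles as a single byte.
    wide = force_16bit or value > 0xFF
    if wide and not accepts_16bit:
        # Assumption: a forced 16 bit operand is rejected here as well.
        raise ValueError("instruction does not accept a 16 bit operand")
    operand = value.to_bytes(2 if wide else 1, "little")
    return bytes([opcode]) + operand


if __name__ == "__main__":
    LDA_IMM = 0xA9
    for value, force in ((0x34, False), (0x1234, False), (0x0034, False), (0x34, True)):
        print(assemble_immediate(LDA_IMM, value, True, force).hex(" ").upper())
```

Calling it with the SEP opcode ($E2), accepts_16bit=False and a value of $1234 raises the error, matching the SEP #$1234 case.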
It's entirely possible that a more efficient method exists to do the translation, but this one does work, and the data tables consume 816 bytes in all. The disassembler uses the same set of tables to do its work. The assembler code itself consumes 355 bytes, not counting support subroutines, all of which are used in more than one place in the monitor.