whartung wrote:
ElEctric_EyE wrote:
Yes, thanks BigEd and Charles R. Bond..
Excellent write-up by Charles on his CBA65. Hashing mnemonics is the most interesting chapter in the .pdf. I'll read this .pdf many times as I would like to eventually tackle an assembler/disassembler for the 65Org16.b. There are many clues here.
I read it, it's a novel idea.
I also read up on it and concluded it's mostly wasted effort. He (C.R. Bond) has to maintain an ASCII list of the mnemonics in a table, as well as the hash table, plus the hash code itself. After having generated the hash from the presumed mnemonic, he still has to cross-check it against the ASCII list (albeit with a single comparison), since an invalid mnemonic could conceivably produce a seemingly valid hash.
When all is said and done, all that has been accomplished is determining whether a mnemonic in the source code is a valid one. His technique doesn't address (!) the requirement of associating that mnemonic with a valid addressing mode and the number of bytes that must be present in the operand, if an operand is required. As many mnemonics can be used with multiple addressing modes, further resolution is required at assembly time to determine if the entered instruction can, in fact, be assembled as written.
Furthermore, while his "perfect" hashing does work for the NMOS 6502, it doesn't directly apply to the CMOS derivatives, which map some previously invalid opcodes onto new instructions or onto existing instructions with new addressing modes. The result is that there isn't a consistent pattern to the relationships between mnemonics and the corresponding opcodes like there was in the NMOS 6502 parts. This is especially the case with the 65C816, which, unlike its 8 bit cousins, has no invalid opcodes, has instructions stashed in "odd places," and therefore requires a different method of resolving mnemonics and corresponding addressing modes and operands.
As I said earlier, I used tables (totaling 850 bytes) in my POC's M/L monitor to assemble and disassemble machine instructions, as I was not able to develop any hashing method for the '816 that would work with consistency. Three of the tables are 256 bytes each, corresponding to the 256 possible opcodes of the '816 (i.e., the first entry in each table corresponds to the BRK instruction and the last entry corresponds to an SBC $BBHHLL,X instruction, where BBHHLL is a 24 bit address). Two of the three contain the actual mnemonics in a binary format that produces a unique 16 bit encoding for each mnemonic. I separated the LSBs (least significant bytes) and MSBs (most significant bytes) into two tables so a simple technique can be used to search for a mnemonic. A short excerpt of the two mnemonic tables follows (taken directly from the POC's BIOS ROM assembly listing):
Code:
10222 ; encoded W65C816S instruction mnemonics MSB...
10223 ;
10224 mnetabhb .byte >mne_brk ; $00 BRK
10225 .byte >mne_ora ; $01 ORA (dp,X)
10226 .byte >mne_cop ; $02 COP
10227 .byte >mne_ora ; $03 ORA dp,S
10228 .byte >mne_tsb ; $04 TSB dp
10229 .byte >mne_ora ; $05 ORA dp
10230 .byte >mne_asl ; $06 ASL dp
10231 .byte >mne_ora ; $07 ORA [dp]
...
10497 ; encoded W65C816S instruction mnemonics LSB...
10498 ;
10499 mnetablb .byte <mne_brk ; $00 BRK
10500 .byte <mne_ora ; $01 ORA (dp,X)
10501 .byte <mne_cop ; $02 COP
10502 .byte <mne_ora ; $03 ORA dp,S
10503 .byte <mne_tsb ; $04 TSB dp
10504 .byte <mne_ora ; $05 ORA dp
10505 .byte <mne_asl ; $06 ASL dp
10506 .byte <mne_ora ; $07 ORA [dp]
...
The bytes corresponding to the symbolic names (e.g., MNE_ASL) shown above are as follows:
Code:
1040 6D04 mne_asl =$6d04 ;ASL
1049 64C6 mne_brk =$64c6 ;BRK
1058 8C08 mne_cop =$8c08 ;COP
1079 14E0 mne_ora =$14e0 ;ORA
1118 1D2A mne_tsb =$1d2a ;TSB
The encoding method takes advantage of the fact that all 65xx mnemonics consist of three alphabetic characters and that, disregarding case, any character in the Roman alphabet can be encoded in only 5 bits. Hence it is possible to represent each mnemonic with a unique 15 bit value, which can be generated with the following code (again, taken directly from the assembly listing for the POC's ROM):
Code:
5929 EA2F 64 53 stz mnepck ;clear encoded...
5930 EA31 64 54 stz mnepck+s_byte ;mnemonic workspace (s_byte = 1)
5931 ;
5932 ;
5933 ; encode mnemonic...
5934 ;
5935 EA33 A0 03 ldy #s_mnemon ;chars needed for mnemonic
5936 ;
5937 EA35 20 ED F2 .0000040 jsr getcharw ;get a char from buffer, strip whitespace
5938 EA38 D0 0A bne .0000060 ;gotten
5939 ;
5940 EA3A C0 03 cpy #s_mnemon ;any input at all? (s_mnemon = mnemonic size)
5941 EA3C 90 03 bcc .0000050 ;yes
5942 ;
5943 EA3E 4C 55 E9 jmp monce ;no, abort assembly
5944 ;
5945 EA41 4C C1 EB .0000050 jmp monasc10 ;incomplete mnemonic — error
5946 ;
5947 EA44 38 .0000060 sec
5948 EA45 E9 3F sbc #a_mnecvt ;ASCII to binary factor (a question mark)
5949 EA47 A2 05 ldx #n_shfenc ;shifts required to encode (5 bits to a char)
5950 ;
5951 EA49 4A .0000070 lsr a ;shift out a bit...
5952 EA4A 66 54 ror mnepck+s_byte ;into...
5953 EA4C 66 53 ror mnepck ;encoded mnemonic
5954 EA4E CA dex
5955 EA4F D0 F8 bne .0000070 ;next bit
5956 ;
5957 EA51 88 dey
5958 EA52 D0 E1 bne .0000040 ;get next char
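The result, which is left in the location MNEPCK, is a 16 bit value in which the three 5 bit character codes occupy bits 1 through 15 and bit 0 is always zero. The conversion is not case-sensitive, of course, since upper and lower case letters differ only in a bit that the 5 bit shift discards. A simple loop and compare function can then be used to see if the encoded mnemonic is in the tables.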
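For anyone who would rather follow the packing in a high-level language, here is a rough Python model of the loop above, plus the loop-and-compare search. It is only a sketch, not code from the POC ROM: the function names pack_mnemonic and find_opcodes are made up for illustration, the constants are read off the object bytes in the listing ($3F and 5), and the tables cover only the eight opcodes shown in the excerpts.

```python
# Python model of the packing loop (illustrative; not code from the POC ROM).
A_MNECVT = 0x3F  # ASCII-to-binary factor, "?" (from the SBC #a_mnecvt line)
N_SHFENC = 5     # shifts per character (from the LDX #n_shfenc line)
S_MNEMON = 3     # characters per mnemonic

def pack_mnemonic(text):
    """Return the 16 bit packed form of a three-letter mnemonic."""
    if len(text) != S_MNEMON or not (text.isascii() and text.isalpha()):
        raise ValueError("mnemonic must be three ASCII letters")
    packed = 0
    for ch in text:
        code = (ord(ch) - A_MNECVT) & 0xFF        # SEC / SBC #a_mnecvt
        for _ in range(N_SHFENC):
            bit = code & 1                        # LSR A shifts a bit out...
            code >>= 1
            packed = (bit << 15) | (packed >> 1)  # ...ROR/ROR shifts it in
    return packed

# Excerpt tables for opcodes $00-$07, built from the values in the listing.
MNE = {"BRK": 0x64C6, "ORA": 0x14E0, "COP": 0x8C08, "TSB": 0x1D2A, "ASL": 0x6D04}
ORDER = ["BRK", "ORA", "COP", "ORA", "TSB", "ORA", "ASL", "ORA"]
MNETABHB = [MNE[m] >> 8 for m in ORDER]    # MSB table
MNETABLB = [MNE[m] & 0xFF for m in ORDER]  # LSB table

def find_opcodes(packed):
    """Loop-and-compare search: every opcode whose table entry matches."""
    lsb, msb = packed & 0xFF, packed >> 8
    return [op for op in range(len(MNETABLB))
            if MNETABLB[op] == lsb and MNETABHB[op] == msb]
```

With these definitions, pack_mnemonic("asl") and pack_mnemonic("ASL") both give $6D04, which matches the mne_asl equate, and find_opcodes(pack_mnemonic("ORA")) returns opcodes 1, 3, 5 and 7. That last result is the point made earlier: a mnemonic match alone does not settle which addressing mode is meant.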
It is possible, using the same tables, to disassemble an instruction into the corresponding ASCII mnemonic. The first code fragment uses the opcode to get the encoded form of the mnemonic:
Code:
8722 F3EC A6 55 ldx opcode ;instruction opcode is mnemonic table index
8723 ;
8724 ;
8725 ; decode mnemonic & addressing info...
8726 ;
8727 F3EE BD E1 F9 lda mnetablb,x ;packed mnemonic LSB
8728 F3F1 85 53 sta mnepck ;working storage LSB
8729 F3F3 BD E1 F8 lda mnetabhb,x ;packed mnemonic MSB
8730 F3F6 85 54 sta mnepck+s_byte ;working storage MSB
...
With the encoded mnemonic stored at MNEPCK, simple code converts it back to ASCII and displays it:
Code:
8803 ; display mnemonic...
8804 ;
8805 F457 A0 03 ldy #s_mnemon ;size of ASCII mnemonic
8806 ;
8807 F459 A9 00 .0000040 lda #0 ;initialize char
8808 F45B A2 05 ldx #n_shfenc ;shifts to execute
8809 ;
8810 F45D 06 53 .0000050 asl mnepck ;shift encoded mnemonic
8811 F45F 26 54 rol mnepck+s_byte
8812 F461 2A rol a
8813 F462 CA dex
8814 F463 D0 F8 bne .0000050
8815 ;
8816 F465 69 3F adc #a_mnecvt ;convert to ASCII &...
8817 F467 48 pha ;stash
8818 F468 88 dey
8819 F469 D0 EE bne .0000040 ;continue with mnemonic
8820 ;
8821 F46B A0 03 ldy #s_mnemon
8822 ;
8823 F46D 68 .0000060 pla ;get mnemonic byte
8824 F46E 20 9A E3 jsr bsouta ;print it (BIOS console display sub)
8825 F471 88 dey
8826 F472 D0 F9 bne .0000060
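The decode direction can be modelled the same way. The sketch below is Python, not ROM code, and unpack_mnemonic is a made-up name. It pulls five bits at a time off the top of the packed value, which yields the third character first, so a list reversal stands in for the PHA/PLA pair in the listing.

```python
# Python model of the display loop (illustrative; not code from the POC ROM).
A_MNECVT = 0x3F  # binary-to-ASCII factor, "?" (from the ADC #a_mnecvt line)
N_SHFENC = 5     # shifts per character
S_MNEMON = 3     # characters per mnemonic

def unpack_mnemonic(packed):
    """Convert a 16 bit packed mnemonic back to its ASCII form."""
    stack = []
    for _ in range(S_MNEMON):
        code = 0
        for _ in range(N_SHFENC):
            top = (packed >> 15) & 1        # ASL/ROL shifts the top bit out...
            packed = (packed << 1) & 0xFFFF
            code = (code << 1) | top        # ...ROL A shifts it into the char
        stack.append(chr(code + A_MNECVT))  # ADC #a_mnecvt, then PHA
    # The characters come off in reverse order, hence the stack in the listing.
    return "".join(reversed(stack))
```

Feeding it the equates from the earlier excerpt, $6D04 comes back as ASL and $64C6 as BRK.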
The third 256 byte table contains flags that identify the addressing mode that corresponds to each opcode. Again, a short excerpt follows:
Code:
10772 ; instruction addressing modes & sizes...
10773 ;
10774 ; xxxxxxxx
10775 ; ||||||||
10776 ; ||||++++———> addressing mode index...
10777 ; ||||
10778 ; |||| Index Mode
10779 ; |||| ———————————————————————————————————
10780 ; |||| 0000 dp, abs, absl, implied or A
10781 ; |||| 0001 #
10782 ; |||| 0010 dp,X, abs,X or absl,X
10783 ; |||| 0011 dp,Y or abs,Y
10784 ; |||| 0100 (dp) or (abs)
10785 ; |||| 0101 [dp] or [abs]
10786 ; |||| 0110 [dp],Y
10787 ; |||| 0111 (dp,X) or (abs,X)
10788 ; |||| 1000 (dp),Y
10789 ; |||| 1001 dp,S
10790 ; |||| 1010 (dp,S),Y
10791 ; |||| 1111 sbnk,dbnk (block move)
10792 ; |||| —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
10793 ; |||| A = accumulator
10794 ; |||| abs = absolute
10795 ; |||| absl = absolute long
10796 ; |||| dbnk = destination bank
10797 ; |||| dp = direct (zero) page
10798 ; |||| S = stack relative
10799 ; |||| sbnk = source bank
10800 ; |||| ———————————————————————————————————
10801 ; ||||
10802 ; ||++———————> binary-encoded operand size
10803 ; |+—————————> 1: relative branch instruction
10804 ; +——————————> 1: variable operand size...
10805 ;
10806 ; —————————————————————————————————————————————————————————————
10807 ; Variable operand size refers to an immediate mode instruction
10808 ; that can accept either an 8 or 16 bit operand. During instr-
10809 ; uction assembly, an 8 bit operand can be forced to 16 bits by
10810 ; preceding the operand field with a pipe (|), e.g., LDA |#$01,
10811 ; which will assemble as $A9 $01 $00. LDA #$0001 will assemble
10812 ; as $A9 $01, whereas LDA |#$0001 will assemble as $A9 $01 $00.
10813 ; —————————————————————————————————————————————————————————————
10814 ;
10815 mnetabam .byte ops0 | am_nam ; $00 BRK
10816 .byte ops1 | am_indx ; $01 ORA (dp,X)
10817 .byte ops1 | am_nam ; $02 COP
10818 .byte ops1 | am_stk ; $03 ORA dp,S
10819 .byte ops1 | am_nam ; $04 TSB dp
10820 .byte ops1 | am_nam ; $05 ORA dp
10821 .byte ops1 | am_nam ; $06 ASL dp
10822 .byte ops1 | am_indl ; $07 ORA [dp]
In the above table, OPS0, OPS1, etc., are the operand size in bytes (OPS0 means no operand, OPS1 is a one byte operand, etc.) and the UNIX pipe symbol is the assembler's logical OR operator. The AM_... symbols are the addressing mode indices, which are defined as follows (another excerpt):
Code:
0992 ; addressing mode translation...
0993 ;
0994 0000 am_nam =%0000 ;(0) no symbol
0995 0001 am_imm =%0001 ;(1) #
0996 0002 am_adrx =%0010 ;(2) dp,X or addr,X
0997 0003 am_adry =%0011 ;(3) dp,Y or addr,Y
0998 0004 am_ind =%0100 ;(4) (dp) or (addr)
0999 0005 am_indl =%0101 ;(5) [dp] or [addr]
1000 0006 am_indly =%0110 ;(6) [dp],Y
1001 0007 am_indx =%0111 ;(7) (dp,X) or (addr,X)
1002 0008 am_indy =%1000 ;(8) (dp),Y
1003 0009 am_stk =%1001 ;(9) dp,S
1004 000A am_stky =%1010 ;(10) (dp,S),Y
1005 000B am_rsrva =%1011 ;(11) reserved
1006 000C am_rsrvb =%1100 ;(12) reserved
1007 000D am_rsrvc =%1101 ;(13) reserved
1008 000E am_rsrvd =%1110 ;(14) reserved
1009 000F am_move =%1111 ;(15) MVN/MVP sbnk,dbnk
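To make the bit layout concrete, here is a small Python sketch that splits one entry of the addressing mode table into its fields. It follows the layout given in the table's header comment. The actual values of the OPS symbols are not shown in the excerpts, so the sketch assumes OPS1 is simply 1 shifted into bits 4 and 5, and decode_mode_byte is a hypothetical name.

```python
# Field extraction for one mnetabam entry (illustrative Python, not ROM code).
AM_NAMES = {
    0x0: "am_nam", 0x1: "am_imm", 0x2: "am_adrx", 0x3: "am_adry",
    0x4: "am_ind", 0x5: "am_indl", 0x6: "am_indly", 0x7: "am_indx",
    0x8: "am_indy", 0x9: "am_stk", 0xA: "am_stky", 0xF: "am_move",
}

def decode_mode_byte(flags):
    """Split a table entry into the fields named in the header comment."""
    mode = flags & 0x0F
    return {
        "mode": AM_NAMES.get(mode, "reserved"),  # bits 0-3: mode index
        "operand_size": (flags >> 4) & 0x03,     # bits 4-5: operand bytes
        "relative_branch": bool(flags & 0x40),   # bit 6
        "variable_size": bool(flags & 0x80),     # bit 7
    }

# ASSUMPTION: OPS1 places the value 1 in bits 4-5; the real equate is not shown.
OPS1 = 1 << 4
AM_INDX = 0x7
entry_ora_indx = OPS1 | AM_INDX  # models the $01 ORA (dp,X) entry
```

Under that assumption, decode_mode_byte(entry_ora_indx) reports mode am_indx with a one byte operand and both flag bits clear.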
Other tables are used to translate between the symbolic form of an operand, e.g., the ($01,S),Y part of an LDA ($01,S),Y instruction, and the opcode that corresponds to that particular addressing mode, $B3 in this case (the resulting machine code would be $B3 $01). In the case of the '816, such translation is somewhat complicated by the irregular syntax of some instructions, such as MVN SBNK,DBNK, where DBNK is actually the second byte in the three byte assembled instruction; that is, the assembled code has SBNK and DBNK reversed in memory. The stack relative instructions are another such case, due to the presence of the ,S in the operand syntax. To resolve all syntax matters, I use one table that contains the ASCII form of each possible operand syntax and a corresponding look-up table to relate the syntactical form to the addressing mode index bits. The tables are bidirectional, so they can be used to disassemble instructions as well.
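The block move operand swap is easy to get backwards, so here is a tiny Python illustration of the byte order. The opcodes ($54 for MVN, $44 for MVP) are the standard 65C816 values; the function name assemble_block_move is invented for this sketch and has nothing to do with the monitor's actual code.

```python
# Byte order of an assembled block move (illustrative Python, not ROM code).
BLOCK_MOVE_OPCODES = {"MVN": 0x54, "MVP": 0x44}

def assemble_block_move(mnemonic, sbnk, dbnk):
    """Assemble MVN/MVP sbnk,dbnk; the banks swap places in memory."""
    opcode = BLOCK_MOVE_OPCODES[mnemonic.upper()]
    for bank in (sbnk, dbnk):
        if not 0 <= bank <= 0xFF:
            raise ValueError("bank must fit in one byte")
    # Source is written first in the source code, destination first in memory.
    return bytes([opcode, dbnk, sbnk])
```

So MVN $01,$02 (source bank $01, destination bank $02) comes out as $54 $02 $01.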
As the '816 can use 24 bit operands with many instructions (e.g., LDA $123456), some code shenanigans were required to deal with such situations. In general, the rule is "least fit," meaning the assembler uses the opcode for the smallest operand size into which the operand will fit. For example, if I code JMP $000012, the assembler will assemble the instruction as $4C $12 $00 (JMP $0012), since JMP can accept either a two or three byte address and the entered address, $12, can be minimally resolved to two bytes. Coding JMP $012345 will result in $5C $45 $23 $01 (JMP $012345). Note the $5C opcode, which is a jump to another bank instruction.
Immediate mode instructions that can accept either 8 or 16 bit operands are also subjected to the "least fit" rule, so coding LDA #$0012 assembles to $A9 $12, even though the apparent intention is to assemble the instruction with a 16 bit operand. As "least fit" in this case can alter the program in an undesirable way, I devised a syntactical method to force the assembler to assemble a 16 bit operand, even though the MSB is zero. If the programmer precedes the operand with a | (UNIX pipe) the operand will be assembled as 16 bits. Hence LDA |#$12 will result in $A9 $12 $00 (LDA #$0012). This sort of chicanery is necessary because, unlike the above JMP examples, as well as other instructions that can use 24 bit addresses as operands, there is no opcode difference when assembling 8 bit immediate operands vs. 16 bits. The resolution comes from how the M and X bits in the status register are conditioned at run time.
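The "least fit" rule and the pipe override boil down to a couple of size tests. The Python sketch below covers only the two cases discussed here, JMP and immediate LDA. It is a simplified model rather than the monitor's algorithm, and the function names and the force_16 parameter are made up for illustration.

```python
# "Least fit" operand sizing for two cases (illustrative Python, not ROM code).
def assemble_jmp(address):
    """Use JMP abs ($4C) if the address fits in 16 bits, else JMP long ($5C)."""
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    lo, hi, bank = address & 0xFF, (address >> 8) & 0xFF, address >> 16
    if bank == 0:
        return bytes([0x4C, lo, hi])    # smallest form that fits
    return bytes([0x5C, lo, hi, bank])  # operand is stored low byte first

def assemble_lda_immediate(value, force_16=False):
    """LDA # ($A9): 8 bit operand unless the value or a leading | needs 16."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("operand must fit in 16 bits")
    if force_16 or value > 0xFF:
        return bytes([0xA9, value & 0xFF, value >> 8])
    return bytes([0xA9, value])         # same opcode, shorter operand
```

Here assemble_jmp(0x000012) yields $4C $12 $00, assemble_lda_immediate(0x0012) yields $A9 $12, and assemble_lda_immediate(0x12, force_16=True), which models LDA |#$12, yields $A9 $12 $00.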
Anyhow, this ran on a bit longer than originally intended and wasn't meant to be a diatribe on assembler algorithms. My point was that C.R. Bond's assembler is simplistic in the sense that going outside of the realm of the NMOS 6502 will produce complications that his code would not readily handle. This statement, however, is not meant to denigrate his work in any way.