6502.org

Posted: **Fri Dec 04, 2015 1:32 am**

It very quickly became apparent that for writing test cases for my new 65816 emulator the little Forth single-pass assembler I wrote, as nice as it is, is not going to be enough. To save my sanity, I'll need something more powerful like a classical two-pass assembler. Since I've gotten used to the Typist's Assembler Notation, it became clear that I'd either have to adapt an existing assembler or write my own.

But here's the problem: Classical two-pass assemblers are basically little compilers, which involve stuff like parsers and tokens and lexers and abstract syntax trees and whatnot. I'm sure all you real computer science types just whip these up off the top of your heads, but for us hobbyists, the dragon book even sounds scary. Adapting other people's code means understanding their parsers, etc, which sucks. It's just not easy to tinker with them.

At this point, Samuel A. Falvo II came to the rescue with a reference to "nanopass compilers" (http://kestrelcomputer.github.io/kestre ... -compiler/), based on a paper by Sarkar et al. (http://www.cs.indiana.edu/~dyb/pubs/nano-jfp.pdf). I didn't even get past the abstract before I was writing an assembler based on lots and lots of very small, easy to understand passes.

So here is the BETA of such an assembler for our favorite MPUs. The design goal - apart from the actual assembly thing - was to create something that would be very easy for a hobbyist to mess around with, which is why it is a "Tinkerer's Assembler" (https://github.com/scotws/tinkasm). It currently consists of more than 20 passes (more or less, depending on the processor), each aiming to be simple. The code itself is written not only in Python, a very widespread language with famously easy to understand code, but in "primitive" Python - no objects, no functional programming, no maps or filters, few list comprehensions. It uses IF/ELSE and TRY/EXCEPT to show the logic even where it is horribly inefficient. The code starts at the beginning, goes to the end, and then quits (which all really annoys pylint, by the way). It loads no external files and only standard "batteries included" external libraries.
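To make the "many tiny passes" idea concrete, here is a minimal sketch in the same deliberately primitive Python style. The pass names, the `;` comment character, and the sample source lines are assumptions for illustration and are not taken from tinkasm itself.

```python
# Hypothetical mini-pipeline: every pass takes a list of lines and returns
# a new list of lines. None of these function names come from tinkasm.

def pass_strip_comments(lines):
    """Drop everything after a ';' on each line (assumed comment marker)."""
    result = []
    for line in lines:
        if ';' in line:
            line = line.split(';')[0]
        result.append(line)
    return result

def pass_strip_whitespace(lines):
    """Remove leading and trailing whitespace from each line."""
    result = []
    for line in lines:
        result.append(line.strip())
    return result

def pass_drop_empty(lines):
    """Remove lines that are empty after the earlier passes."""
    result = []
    for line in lines:
        if line != '':
            result.append(line)
    return result

def run_passes(lines):
    """Run the passes one after the other, top to bottom."""
    lines = pass_strip_comments(lines)
    lines = pass_strip_whitespace(lines)
    lines = pass_drop_empty(lines)
    return lines

source = ["        lda.# 00   ; load zero",
          "",
          "loop:   INX",
          "        ; comment only"]

print(run_passes(source))
```

Each pass does one obvious thing, so a tinkerer can add, remove, or reorder passes without having to understand a parser.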

(Dummy code example with vim syntax highlighting for Typist's Assembler)

The downside is that as a pure assembler, for obvious reasons, it sort of sucks. With our small file sizes, speed doesn't matter too much, but still. This is not the program you want to use for raw speed. Also, the current version is still pretty basic. For example, there are macros, but they don't have parameters yet (that comes next). It's also still missing the more advanced macro functions such as IF/THEN/ELSE. The lack of a real parser means that the math functions are rather primitive, pretty much limited to one operator and two operands, but for most assembler stuff, that might be enough.
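As a rough idea of what "one operator and two operands" can look like without a real parser, here is a small sketch. It assumes whitespace-separated terms and decimal numbers; both are simplifications chosen for this example and do not describe how tinkasm actually handles math.

```python
def evaluate_simple(expression):
    """Evaluate 'number' or 'number operator number' (whitespace separated).

    Assumption for this sketch: all numbers are decimal integers.
    """
    words = expression.split()

    if len(words) == 1:
        return int(words[0])

    if len(words) != 3:
        raise ValueError("expected 'operand operator operand'")

    left = int(words[0])
    operator = words[1]
    right = int(words[2])

    if operator == '+':
        return left + right
    elif operator == '-':
        return left - right
    elif operator == '*':
        return left * right
    elif operator == '/':
        return left // right    # integer division, addresses are integers
    else:
        raise ValueError("unknown operator: " + operator)

print(evaluate_simple("2 + 3"))
print(evaluate_simple("10 / 4"))
```

Anything beyond this, such as nested parentheses or operator precedence, is where a real expression parser starts to pay off.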

As an aid to tinkering, to see what happens step by step, the assembler can be told to produce human-readable snapshots after every pass. I've attached the stdout for the little test program above, which amounts to almost 1,000 lines (the "frog" file name has no meaning, it's my version of foobar, I just realized I forgot to change it). It can also be told to print a listing file (which I'm still experimenting with) and a hexdump in ASCII.

So now I have a little-tested assembler to test my little-tested emulator with. This, ah, might not be the most traditional procedure.

I'll probably be playing around with the assembler first for obvious reasons (a rewrite of Tali Forth for the 65c02 might be in order). The good news is that I'm rapidly running out of ways to procrastinate and might have to actually get some real stuff done now ...

Posted: **Fri Dec 04, 2015 4:34 pm**

Quote:

Adapting other people's code means understanding their parsers, etc, which sucks.

Not to hijack your thread (oh, actually to hijack your thread), but I've been tinkering with writing a parser in what I hope is a fairly understandable step-by-step fashion. It doesn't have much to do with the 6502 per se, but it could be adapted as part of an assembler.

https://anexpressionparser.wordpress.com/

Posted: **Fri Dec 04, 2015 6:58 pm**

You could also do what I have done in my assembler: I use Flex and Bison.
This makes life so much easier. The lex file defines the tokens and the bison file defines the grammar.

Here are my lex and bison files for my 6502/65C02 macro assembler with scripting.

Lex


%option case-insensitive yylineno noyywrap
 
%{
#pragma warning(disable:4996)
#pragma warning(disable:6011)
#pragma warning(disable:6387)

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#include "opcodes.h"
#include "pasm64.h"
#include "pasm64.tab.h"
#include "symbol.h"
#include "node.h"
#include "str.h"

int inDoLoop = 0;
int inMacroDef = 0;

#define OP(op)              { yylval.iValue = _ ## op; return OPCODE; }
#define OP_REL(op)          { yylval.iValue = _ ## op ## 0 + yytext[3] - '0'; return OPCODE; }
#define OP_ILLEGAL(op)      { if (AllowIllegalOpCpodes) OP(op) else REJECT; }
#define OP_65C02(op)        { if (CPUMode == cpu_65C02) OP(op) else REJECT; }
#define OP_65C02_REL(op)    { if (CPUMode == cpu_65C02) OP_REL(op) else REJECT; }
#define INT(off, base)      { yylval.iValue = (int) strtol (yytext + off, NULL, base); \
                                return INTEGER; \
                            }
#define INT_1BYTE           { int outlen; \
                                char* tmpStr = SantizeString(yytext, &outlen); \
                                yylval.iValue = (int)tmpStr[1]; \
                                free(tmpStr); \
                                return INTEGER; \
                            }
%}

%x C_COMMENT

ES                          (\\(['"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))
LOGOR                       [|][|]
BITOR                       [|]
LOGAND                      [&][&]
BITAND                      [&]
STRVALUE                    (\"([^"\\\n]|{ES})*\")+

%%

"ORA"                       OP(ora)
"AND"                       OP(and)
"EOR"                       OP(eor)
"ADC"                       OP(adc)
"SBC"                       OP(sbc)
"CMP"                       OP(cmp)
"CPX"                       OP(cpx)
"CPY"                       OP(cpy)
"DEC"                       OP(dec)
"DEC"[ \t]*"A"              OP(dec)
"DEX"                       OP(dex)
"DEY"                       OP(dey)
"INC"                       OP(inc)
"INC"[ \t]*"A"              OP(inc)
"INX"                       OP(inx)
"INY"                       OP(iny)
"ASL"                       OP(asl)
"ASL"[ \t]*"A"              OP(asl)
"ROL"                       OP(rol)
"ROL"[ \t]*"A"              OP(rol)
"LSR"                       OP(lsr)
"LSR"[ \t]*"A"              OP(lsr)
"ROR"                       OP(ror)
"ROR"[ \t]*"A"              OP(ror)
"LDA"                       OP(lda)
"STA"                       OP(sta)
"LDX"                       OP(ldx)
"STX"                       OP(stx)
"LDY"                       OP(ldy)
"STY"                       OP(sty)
"TAX"                       OP(tax)
"TXA"                       OP(txa)
"TAY"                       OP(tay)
"TYA"                       OP(tya)
"TSX"                       OP(tsx)
"TXS"                       OP(txs)
"PLA"                       OP(pla)
"PHA"                       OP(pha)
"PLP"                       OP(plp)
"PHP"                       OP(php)
"BPL"                       OP(bpl)
"BMI"                       OP(bmi)
"BVC"                       OP(bvc)
"BVS"                       OP(bvs)
"BCC"                       OP(bcc)
"BCS"                       OP(bcs)
"BNE"                       OP(bne)
"BEQ"                       OP(beq)
"BRK"                       OP(brk)
"RTI"                       OP(rti)
"JSR"                       OP(jsr)
"RTS"                       OP(rts)
"JMP"                       OP(jmp)
"BIT"                       OP(bit)
"CLC"                       OP(clc)
"SEC"                       OP(sec)
"CLD"                       OP(cld)
"SED"                       OP(sed)
"CLI"                       OP(cli)
"SEI"                       OP(sei)
"CLV"                       OP(clv)
"NOP"                       OP(nop)
"SLO"                       OP_ILLEGAL(slo)
"RLA"                       OP_ILLEGAL(rla)
"SRE"                       OP_ILLEGAL(sre)
"RRA"                       OP_ILLEGAL(rra)
"SAX"                       OP_ILLEGAL(sax)
"LAX"                       OP_ILLEGAL(lax)
"DCP"                       OP_ILLEGAL(dcp)
"ISC"                       OP_ILLEGAL(isc)
"ANC"                       OP_ILLEGAL(anc)
"ANC2"                      OP_ILLEGAL(anc2)
"ALR"                       OP_ILLEGAL(alr)
"ARR"                       OP_ILLEGAL(arr)
"XAA"                       OP_ILLEGAL(xaa)
"LAX2"                      OP_ILLEGAL(lax2)
"AXS"                       OP_ILLEGAL(axs)
"SBC2"                      OP_ILLEGAL(sbc2)
"AHX"                       OP_ILLEGAL(ahx)
"SHY"                       OP_ILLEGAL(shy)
"SHX"                       OP_ILLEGAL(shx)
"TAS"                       OP_ILLEGAL(tas)
"LAS"                       OP_ILLEGAL(las)
"BRA"                       OP_65C02(bra)
"PHX"                       OP_65C02(phx)
"PHY"                       OP_65C02(phy)
"PLX"                       OP_65C02(plx)
"PLY"                       OP_65C02(ply)
"STZ"                       OP_65C02(stz)
"TRB"                       OP_65C02(trb)
"TSB"                       OP_65C02(tsb)
"STP"                       OP_65C02(stp)
"WAI"                       OP_65C02(wai)
"BBR"[0-7]                  OP_65C02_REL(bbr)
"BBS"[0-7]                  OP_65C02_REL(bbs)
"RMB"[0-7]                  OP_65C02_REL(rmb)
"SMB"[0-7]                  OP_65C02_REL(smb)

".BYTE"                     return BYTE;
".DB"                       return BYTE;
".DCB"                      return BYTE;
".WORD"                     return WORD;
".DW"                       return WORD;
".DCW"                      return WORD;
".DS"                       return DS;
".EQU"                      return EQU;
"NOT"                       return NOT;
">="                        return GE;
"<="                        return LE;
"=="                        return EQ;
"!="                        return NE;
"<>"                        return NE;
"<<"                        return SHIFT_LEFT;
">>"                        return SHIFT_RIGHT;
{LOGOR}                     return OR;
{BITOR}                     return BIT_OR;
{LOGAND}                    return AND;
{BITAND}                    return BIT_AND;
".REPEAT"                   return REPEAT;
".UNTIL"                    return UNTIL;
".END"                      return END;
".ENDIF"                    return ENDIF;
".IF"                       return IF;
".ELSE"                     return ELSE;
".PRINT"                    return PRINT;
"\?"                        return PRINT;
".PRINTALL"                 return PRINTALL;
"\?\?"                      return PRINTALL;
".FOR"                      return FOR;
".NEXT"                     return NEXT;
".STEP"                     return STEP;
".TO"                       return TO;
".DOWNTO"                   return DOWNTO;
".STR"                      return STR;
".STRING"                   return STR;
".ORG"                      return ORG;
".SECTION"                  return SECTION;
".ENDSECTION"               return ENDSECTION;
".SECT"                     return SECTION;
".ENDS"                     return ENDSECTION;

".WHILE"                    { if (inDoLoop > 0) { inDoLoop--; return ENDDO; } else { inDoLoop = 0; return WHILE; }}
".WEND"                     { return WEND; }
".DO"                       { inDoLoop++; return DO; }
".MACRO"                    { inMacroDef++; return MACRO; }
".MAC"                      { inMacroDef++; return MACRO; }
".ENDMACRO"                 { inMacroDef--; return ENDMACRO; }
".ENDM"                     { inMacroDef--; return ENDMACRO; }
".REGX"                     { return REGX; }
".REGY"                     { return REGY; }
".VAR"                      { return VAR;  }
".6502"[ \t]*"ON"           { CPUMode = cpu_6502; }
".65C02"[ \t]*"ON"          { CPUMode = cpu_65C02; }
".ILLEGAL"[ \t]*"ON"        { AllowIllegalOpCpodes = TRUE; }
".ILLEGAL"[ \t]*"OFF"       { AllowIllegalOpCpodes = FALSE; }
".WARN"[ \t]*"ON"           { NoWarnings = FALSE; }
".WARN"[ \t]*"OFF"          { NoWarnings = TRUE; }
".C64"                      { OutFileFormat = c64; }
[@][0-9]+                   {
                                if (!inMacroDef)
                                    REJECT;
                                yylval.strValue = Strdup(yytext); return SYMBOL;
                            }
[\'].[\']                   INT_1BYTE
[\'][\\].[\']               INT_1BYTE
$[0-9A-Fa-f]*               INT(1, 16)
[\'][\\]x[0-9A-Fa-f]+[\']   INT(3, 16)
[0-9]*                      INT(0, 10)
%[0-1]*                     INT(1, 2)
[\'][\\][0-7]{3}[\']        INT(2, 8)

{STRVALUE}                  { yylval.strValue = Strdup(yytext); return STRING_LITERAL; }
"*"[ \t]*[=]                { unput('='); return PCASSIGN; }
"*"[ \t]*".EQU"             { unput('='); return PCASSIGN; }
"X"                         { return 'X'; }
"Y"                         { return 'Y'; }
[A-Za-z_][A-Za-z0-9_.]*:?   { yylval.strValue = Strdup(yytext); return SYMBOL; }
"/*"                        { BEGIN(C_COMMENT); }
<C_COMMENT>"*/"             { BEGIN(INITIAL); }
<C_COMMENT>.|[\n]           { /* ignore comments */ }
;[^\n]*                     { /* ignore comments */ }
[/][/][^\n]*                { /* ignore comments */ }
"~"                         { return '~'; }
"^"                         { return '^'; }
[-<>=+*/#,();=\n]           { return *yytext; }
[ \t]+                      { /* ignore white space */ }
.                           { yyerror("syntax error"); }

%%

Bison


%no-lines

%{

// ***********************************************************************
// Author           : Paul Baxter
// Created          : 02-23-2015
//
// copyright (c) 2015 Paul Baxter
//
// Last Modified By : Paul
// Last Modified On : 11-4-2015
// ***********************************************************************

#pragma warning(disable:4065)
#pragma warning(disable:4996)

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <limits.h>
#include <errno.h>

#include "pasm64.h"
#include "opcodes.h"
#include "node.h"
#include "genlist.h"
#include "pasm64.tab.h"
#include "symbol.h" 
#include "str.h" 

%}

%union 
{
    int iValue;                 /* integer value */
    char* strValue;             /* string */
    char* sIndex;               /* symbol table pointer */
    parseNode *nPtr;            /* node pointer */
};

%token <iValue> INTEGER
%token <iValue> OPCODE
%token <sIndex> SYMBOL MACROPARAM
%token <strValue> STRING_LITERAL

%token WHILE ENDDO REPEAT UNTIL IF PRINT PRINTALL EQU ORG PCASSIGN 
%token END DO MACRO ENDMACRO ENDIF WEND STATEMENT EXPRLIST STR
%token FOR NEXT TO DOWNTO STEP NOT
%token BYTE WORD LOBYTE HIBYTE DS
%token REGX REGY VAR
%token SECTION ENDSECTION

%nonassoc ELSE UMINUS '~'

%left SHIFT_LEFT SHIFT_RIGHT
%left OR AND GE LE EQ NE '>' '<' 
%left BIT_OR BIT_AND '^'
%left '+' '-'
%left '*' '/'

%type <nPtr> stmt_list stmt 
%type <nPtr> opcode regloopexpr
%type <nPtr> macrodef macrocall expr_list symbol_list
%type <nPtr> symbol_assign symbol_value var_def pc_assign 
%type <nPtr> expr subexpr ifexpr loopexpr
%type <nPtr> section endsection
%%

program 
    : program stmt                      { ex($2);                                       }
    | /* NULL */
    ;
        
stmt    
    : opcode '\n'                       { $$ = $1;                                      }
    | symbol_value '\n'                 { $$ = $1;                                      }
    | symbol_value opcode '\n'          { ex($1); $$ = $2;                              }
    | symbol_assign '\n'                { $$ = $1;                                      }
    | pc_assign '\n'                    { $$ = $1;                                      }
    | ifexpr '\n'                       { $$ = $1;                                      }
    | loopexpr '\n'                     { $$ = $1;                                      }
    | regloopexpr '\n'                  { $$ = $1;                                      }
    | macrodef '\n'                     { $$ = $1;                                      }
    | macrocall '\n'                    { $$ = $1;                                      }
    | section '\n'                      { $$ = $1;                                      }
    | endsection '\n'                   { $$ = $1;                                      }
    | var_def '\n'                      { $$ = $1;                                      }
    | '\n'                              { $$ = opr(STATEMENT, 0);                       }
    ;

stmt_list
    : stmt                              { $$ = $1;                                      }
    | stmt_list stmt                    { $$ = opr(STATEMENT, 2, $1, $2);               }
    ;

section
    : SECTION SYMBOL                    { $$ = opr(SECTION, 1, id($2));                 }
    ;

endsection
    : ENDSECTION                        { $$ = opr(ENDSECTION, 0);                      }
    ;

ifexpr 
    : IF subexpr stmt_list ELSE stmt_list ENDIF                                 { $$ = opr(IF, 3, $2, $3, $5);                  }
    | IF subexpr stmt_list ENDIF                                                { $$ = opr(IF, 2, $2, $3);                      }
    ;

loopexpr
    : REPEAT '\n' stmt_list UNTIL subexpr                                       { $$ = opr(REPEAT, 2, $3, $5);                  }
    | DO stmt_list ENDDO subexpr                                                { $$ = opr(DO, 2, $2, $4);                      }
    | WHILE subexpr '\n' stmt_list WEND                                         { $$ = opr(WHILE, 2, $2, $4);                   }   
    | FOR symbol_assign TO subexpr '\n' stmt_list NEXT SYMBOL                   { $$ = opr(FOR, 4, $2, $4, $6, id($8));         }
    | FOR symbol_assign TO subexpr STEP subexpr '\n' stmt_list NEXT SYMBOL      { $$ = opr(FOR, 5, $2, $4, $8, id($10), $6);    }
    ;

regloopexpr
    : FOR REGX '=' subexpr TO subexpr '\n' stmt_list NEXT 'X'                   { $$ = opr(REGX, 4, $4, $6, $8, con(1, 0));     }
    | FOR REGX '=' subexpr DOWNTO subexpr '\n' stmt_list NEXT 'X'               { $$ = opr(REGX, 4, $4, $6, $8, con(-1,0));     }
    | FOR REGY '=' subexpr TO subexpr '\n' stmt_list NEXT 'Y'                   { $$ = opr(REGY, 4, $4, $6, $8, con(1,0));      }
    | FOR REGY '=' subexpr DOWNTO subexpr '\n' stmt_list NEXT 'Y'               { $$ = opr(REGY, 4, $4, $6, $8, con(-1,0));     }
    ;

expr_list
    : subexpr                           { $$ = opr(EXPRLIST, 1, $1);                    }
    | STRING_LITERAL                    { $$ = opr(EXPRLIST, 1, str($1));               }
    | expr_list ',' subexpr             { $$ = opr(EXPRLIST, 2, $1, $3);                }
    | expr_list ',' STRING_LITERAL      { $$ = opr(EXPRLIST, 2, $1, str($3));           }
    ;
          
macrodef
    : MACRO SYMBOL stmt_list ENDMACRO   { $$ = opr(MACRO, 2, macroid($2), $3);          }
    ;

macrocall
    : SYMBOL expr_list                  { $$ = macroex($1, $2);                         }
    ;

symbol_list
    : SYMBOL                            { $$ = opr(EXPRLIST, 1, id($1));                }
    | symbol_list ',' SYMBOL            { $$ = opr(EXPRLIST, 2, $1, id($3));            }
    | SYMBOL '=' subexpr                { $$ = opr(EXPRLIST, 3, id($1), $3);            }
    | symbol_list ',' SYMBOL '=' subexpr{ $$ = opr(EXPRLIST, 3, $1, id($3), $5);        }
    ;

var_def
    : VAR symbol_list                   { $$ = opr(VAR, 1, $2);                         }
    ;

symbol_assign
    : SYMBOL '=' subexpr                { $$ = opr('=', 2, id($1), $3);                 }
    | SYMBOL EQU subexpr                { $$ = opr(EQU, 2, id($1), $3);                 }
    ;
 
pc_assign
    : PCASSIGN '=' subexpr              { $$ = opr(PCASSIGN, 1, $3);                    }
    | PCASSIGN EQU subexpr              { $$ = opr(PCASSIGN, 2, $3);                    }
    ;

symbol_value
    : SYMBOL                            {                                                
                                            SymbolTablePtr sym = LookUpSymbol($1);
                                            if (sym && sym->ismacroname)
                                            {
                                                $$ = macroex($1, NULL);
                                            }
                                            else
                                            {
                                                $$ = opr('=', 2, id($1), con(PC, TRUE));
                                            }
                                        }
    ;

opcode
    : OPCODE                            { $$ = opcode($1, i, 0);                        }
    | OPCODE '#' subexpr                { $$ = opcode($1, I, 1, $3);                    }
    | OPCODE expr                       { $$ = opcode($1, a, 1, $2);                    }
    | OPCODE expr ',' 'X'               { $$ = opcode($1, ax, 1, $2);                   }
    | OPCODE expr ',' 'Y'               { $$ = opcode($1, ay, 1, $2);                   }
    | OPCODE '(' subexpr ')'            { $$ = opcode($1, ind, 1, $3);                  }
    | OPCODE '(' subexpr ',' 'X' ')'    { $$ = opcode($1, aix, 1, $3);                  }
    | OPCODE '(' subexpr ')' ',' 'Y'    { $$ = opcode($1, zpiy, 1, $3);                 }
    | OPCODE expr ',' subexpr           { $$ = opcode($1, zr, 2, $2, $4);               }
    | ORG subexpr                       { $$ = opr(ORG, 1, $2);                         }
    | DS subexpr                        { $$ = opr(DS, 1, $2);                          }
    | BYTE expr_list                    { $$ = data(1, $2);                             }
    | WORD expr_list                    { $$ = data(2, $2);                             }
    | STR expr_list                     { $$ = data(0, $2);                             }
    | PRINT                             { $$ = opr(PRINT, 0);                           }
    | PRINT expr_list                   { $$ = opr(PRINT, 1, $2);                       }
    | PRINTALL                          { $$ = opr(PRINTALL, 0);                        }
    | PRINTALL expr_list                { $$ = opr(PRINTALL, 1, $2);                    }
    ;

subexpr
    : expr                              { $$ = $1;                                      }
    | '*'                               { $$ = con(PC, TRUE);                           }
    | '(' subexpr ')'                   { $$ = $2;                                      }
    ;

expr
    : INTEGER                           { $$ = con($1, FALSE);                          }
    | SYMBOL                            { $$ = id($1);                                  }
    | '-' subexpr %prec UMINUS          { $$ = opr(UMINUS, 1, $2);                      }
    | '~' subexpr %prec UMINUS          { $$ = opr('~', 1, $2);                         }
    | '<' subexpr %prec UMINUS          { $$ = opr(LOBYTE, 1, $2);                      }
    | '>' subexpr %prec UMINUS          { $$ = opr(HIBYTE, 1, $2);                      }
    | NOT subexpr %prec UMINUS          { $$ = opr(NOT, 1, $2);                         }
    | subexpr OR subexpr                { $$ = opr(OR, 2, $1, $3);                      }
    | subexpr AND subexpr               { $$ = opr(AND, 2, $1, $3);                     }
    | subexpr SHIFT_LEFT subexpr        { $$ = opr(SHIFT_LEFT, 2, $1, $3);              }
    | subexpr SHIFT_RIGHT subexpr       { $$ = opr(SHIFT_RIGHT, 2, $1, $3);             }
    | subexpr '<' subexpr               { $$ = opr('<', 2, $1, $3);                     }
    | subexpr '>' subexpr               { $$ = opr('>', 2, $1, $3);                     }
    | subexpr GE subexpr                { $$ = opr(GE, 2, $1, $3);                      }
    | subexpr LE subexpr                { $$ = opr(LE, 2, $1, $3);                      }
    | subexpr NE subexpr                { $$ = opr(NE, 2, $1, $3);                      }
    | subexpr EQ subexpr                { $$ = opr(EQ, 2, $1, $3);                      }
    | subexpr BIT_AND subexpr           { $$ = opr(BIT_AND, 2, $1, $3);                 }
    | subexpr BIT_OR subexpr            { $$ = opr(BIT_OR, 2, $1, $3);                  }
    | subexpr '^' subexpr               { $$ = opr('^', 2, $1, $3);                     }
    | subexpr '+' subexpr               { $$ = opr('+', 2, $1, $3);                     }
    | subexpr '-' subexpr               { $$ = opr('-', 2, $1, $3);                     }
    | subexpr '*' subexpr               { $$ = opr('*', 2, $1, $3);                     }
    | subexpr '/' subexpr               { $$ = opr('/', 2, $1, $3);                     }
    ;

%%

Posted: **Sat Dec 05, 2015 10:39 am**

Thanks for the suggestions - at some point I'm going to write a "normal" assembler for Typist's Assembler Notation, in a more powerful language. Doing it this way instantly became an idée fixe after I read that paper, though, so I had to get it out of my system. Yes, I feel much better now.

A "conventional" assembler will be a bit down the road, though, because after the fun I had learning Forth, I'm considering learning another "non-mainstream" language for the insights, and that would be what I'll write it in for practice. Possibly a strongly functional one -- Scheme would seem the obvious choice because of all the available literature, and the next challenge would be to write my own version for the 65816. I'm obviously not going to be writing a 16-bit version of Clojure anytime soon, for example. (I did spend a week looking at Haskell, at one point, before I became aware that completing the Ritual of Kolinahr is a prerequisite.)

But first things first. So little time, so many projects.

Posted: **Sat Dec 05, 2015 5:44 pm**

Nanopass compilers - I like it! Historical note: "FORTRAN was provided for the IBM 1401 by an innovative 63-pass compiler that ran in only 8k of core. It kept the program in memory and loaded overlays that gradually transformed it, in place, into executable form"
http://ibm-1401.info/1401-FORTRAN-Illustrated.html
The overlay space had room for just 300 instructions, the average pass using just half that:
https://www.cs.sjsu.edu/~mak/archive/CS ... mpiler.pdf

Posted: **Sat Dec 05, 2015 7:11 pm**

BigEd wrote:

Nanopass compilers - I like it! Historical note: "FORTRAN was provided for the IBM 1401 by an innovative 63-pass compiler that ran in only 8k of core. It kept the program in memory and loaded overlays that gradually transformed it, in place, into executable form"
http://ibm-1401.info/1401-FORTRAN-Illustrated.html
The overlay space had room for just 300 instructions, the average pass using just half that:
https://www.cs.sjsu.edu/~mak/archive/CS ... mpiler.pdf

Only 63 passes? Why, that is positively blazing performance!

Posted: **Sat Dec 05, 2015 7:16 pm**

scotws wrote:

A "conventional" assembler will be a bit down the road...

Once you leap the hurdle of parsing the input the rest isn't too difficult. Most of what an assembler does is routine math and string comparison, along with the requisite I/O. I've already written a library of string and math functions for the 65C816. Some of the string functions implicitly generate pointers that allow one to perform an insertion sort, which if used as the assembler runs, facilitate building up the symbol table.
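The idea of keeping the symbol table sorted by inserting each symbol at its proper place as the assembler runs can be sketched in a few lines. This is only an illustration in Python using the standard `bisect` module; the function names and sample symbols are made up, and it says nothing about how the 65C816 library mentioned above is actually implemented.

```python
import bisect

def add_symbol(table, name, value):
    """Insert (name, value) so that the table stays sorted by name."""
    bisect.insort(table, (name, value))

def find_symbol(table, name):
    """Binary search for name; return its value or None if it is unknown."""
    # (name,) sorts just before any (name, value) tuple with the same name
    index = bisect.bisect_left(table, (name,))
    if index < len(table) and table[index][0] == name:
        return table[index][1]
    return None

symbols = []
add_symbol(symbols, "loop", 0x8010)
add_symbol(symbols, "start", 0x8000)
add_symbol(symbols, "end", 0x8020)

print(symbols)
print(find_symbol(symbols, "loop"))
```

Because the table is always in order, lookups during assembly can use a binary search instead of scanning every entry.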

Posted: **Wed Dec 09, 2015 5:18 pm**

scotws wrote:

It very quickly became apparent that for writing test cases for my new 65816 emulator the little Forth single-pass assembler I wrote, as nice as it is, is not going to be enough.

I'm rather surprised to read that. The assembler that I wrote for my FORTH system has proved to be one of the most powerful assemblers that I have used (once you get used to the reverse syntax). Being label-less, structured programming is a breeze. With the full power of FORTH behind it, any manner of address/data computation is possible. Conditional assembly, macros, the FORTH assembler has it all.

The only pain is when it comes to multiple condition tests. A 6502 branch instruction can only test one condition flag at a time, and the label-less IF, WHILE, etc. statements are the same. The workaround for this is either to use a flag variable (byte) to combine the results of the different condition tests or to manually calculate the destination address of each branch instruction and code the conditional Bxx instructions as appropriate.

Of course, my FORTH system uses separate code, data and name spaces so it is possible to write an entire program in assembly language and save it as a stand-alone program.

Posted: **Thu Dec 10, 2015 2:07 pm**

theGSman wrote:

I'm rather surprised to read that. The assembler that I wrote for my FORTH system has proved to be one of the most powerful assemblers that I have used (once you get used to the reverse syntax).

I probably should have been more specific -- my Forth assembler is insanely powerful given its size, and returning to Python seems downright clunky. The problem is that I'm using labels a lot, and that gets to be a pain. It's less a problem that it's Forth, but that it's a single-pass assembler. A two- or multi-pass assembler makes stuff easier.
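Why a second pass makes labels painless can be shown with a toy example: the first pass only counts bytes and records where each label lands, so forward references are already known when the second pass emits code. The tiny instruction set (only `jmp` and `nop`), the `name:` label syntax, and the function name are assumptions for this sketch.

```python
def two_pass(lines, origin):
    """Assemble a toy program that knows only 'jmp <label>' and 'nop'."""
    labels = {}

    # Pass 1: walk the source, track the address, remember label positions
    address = origin
    for line in lines:
        words = line.split()
        if words[0].endswith(':'):
            labels[words[0][:-1]] = address
        elif words[0] == 'jmp':
            address = address + 3       # opcode plus 16-bit address
        elif words[0] == 'nop':
            address = address + 1
        else:
            raise ValueError("unknown instruction: " + line)

    # Pass 2: every label is known now, even ones defined further down
    code = []
    for line in lines:
        words = line.split()
        if words[0].endswith(':'):
            continue
        elif words[0] == 'jmp':
            target = labels[words[1]]
            code.append(0x4C)           # JMP absolute
            code.append(target & 0xFF)  # low byte first (little-endian)
            code.append(target >> 8)
        elif words[0] == 'nop':
            code.append(0xEA)           # NOP
    return code

program = ["jmp end", "nop", "end:", "nop"]
print([hex(byte) for byte in two_pass(program, 0x8000)])
```

A single-pass assembler would reach `jmp end` before it has seen `end:`, which is exactly the situation that needs back-patching or label-less control structures.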

Also, we're talking about test routines for a 65816 emulator, and so it's probably of more use to the world if those are in a more generally accepted format. Typist's Assembler is bad enough in that respect, but since it is actually simpler to use than the traditional format, I'm sure you can whip up a conversion program that turns "lda.z 10" into "LDA $10" with little trouble (I should probably do that at some point). Converting the Forth code to traditional 65* format is probably more an exercise left for the AI crowd.
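A conversion of the "lda.z 10" to "LDA $10" kind might look roughly like the sketch below. Only the `.z` case is confirmed by the example in the post; the other suffixes in the table, the treatment of a missing suffix, and the assumption that operands are plain hex digits are guesses made for illustration.

```python
# Output templates per mode suffix. Only 'z' is taken from the post's
# example; every other entry is an assumption for this sketch.
FORMATS = {
    '':  '{} ${}',       # assumed: no suffix means absolute
    'z': '{} ${}',       # zero page: lda.z 10 -> LDA $10
    '#': '{} #${}',      # assumed: immediate
    'x': '{} ${},X',     # assumed: absolute indexed by X
    'y': '{} ${},Y',     # assumed: absolute indexed by Y
}

def convert_line(line):
    """Convert one 'mnemonic.suffix operand' line to traditional syntax."""
    words = line.split()

    if len(words) == 1:
        return words[0].upper()         # implied mode, such as 'nop'

    if len(words) != 2:
        raise ValueError("cannot convert: " + line)

    if '.' in words[0]:
        mnemonic, suffix = words[0].split('.', 1)
    else:
        mnemonic = words[0]
        suffix = ''

    if suffix not in FORMATS:
        raise ValueError("unknown mode suffix: " + suffix)

    return FORMATS[suffix].format(mnemonic.upper(), words[1].upper())

print(convert_line("lda.z 10"))
print(convert_line("lda.# ff"))
```

Labels, directives, and comments are left out here; a real converter would need to pass those through as well.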

Posted: **Thu Dec 10, 2015 7:35 pm**

scotws wrote:

theGSman wrote:

I'm rather surprised to read that. The assembler that I wrote for my FORTH system has proved to be one of the most powerful assemblers that I have used (once you get used to the reverse syntax).

I probably should have been more specific -- my Forth assembler is insanely powerful given its size, and returning to Python seems downright clunky. The problem is that I'm using labels a lot, and that gets to be a pain. It's less a problem that it's Forth, but that it's a single-pass assembler. A two- or multi-pass assembler makes stuff easier.

You may want to read Henry Baker's paper on "Comfy", which is a one-pass 6502 assembler written in Lisp, and which does not require labels for branches. It may well be that some of the ideas there map to Forth.

Posted: **Fri Dec 11, 2015 7:46 am**

scotws wrote:

I probably should have been more specific -- my Forth assembler is insanely powerful given its size, and returning to Python seems downright clunky. The problem is that I'm using labels a lot, and that gets to be a pain. It's less a problem that it's Forth, but that it's a single-pass assembler. A two- or multi-pass assembler makes stuff easier.

I think that is your real problem. A true Forth assembler is label-less because it has assembly language equivalents to the Forth flow control words: namely IF, ELSE, THEN, BEGIN, WHILE, UNTIL, REPEAT, AGAIN, (the commas distinguish these words from their high level equivalents). Use of these structures means that labels should almost never be needed. Of course, you can always use CONSTANT or VARIABLE to identify a memory location or piece of data and you can still use HERE (or in my case, CP @) to mark a place in the code.
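The reason such structured words need no labels is that the forward branch is back-patched from a stack: the "if" word leaves a hole for the branch offset and the "then" word fills it in. Here is a small sketch of that mechanism in Python. The names `if_` and `then_` and the choice to pass the branch opcode directly are simplifications for this example and are not how any particular Forth assembler spells it.

```python
code = []       # assembled bytes so far
fixups = []     # stack of operand positions still waiting for an offset

def emit(byte):
    code.append(byte)

def if_(branch_opcode):
    """Emit a branch with a placeholder offset and remember where it is."""
    emit(branch_opcode)
    fixups.append(len(code))    # position of the offset byte
    emit(0)                     # placeholder, patched by then_()

def then_():
    """Patch the most recent open branch so that it lands here."""
    operand = fixups.pop()
    offset = len(code) - (operand + 1)  # relative to the next instruction
    if offset > 127:
        raise ValueError("branch out of range")
    code[operand] = offset

emit(0xA9)      # LDA #$00
emit(0x00)
if_(0xD0)       # BNE: skip the body when the zero flag is clear
emit(0xA9)      # LDA #$01
emit(0x01)
then_()

print([hex(byte) for byte in code])
```

Because the pending branches live on a stack, nested structures resolve in the right order without any names being invented for the branch targets.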

Another advantage of the Forth assembler is that you are not limited to one instruction per line so you can arrange your code in logical segments. You can also use indentation to make the control structures easier to identify.

scotws wrote:

I'm sure you can whip up a conversion program that turns "lda.z 10" into "LDA $10" with little trouble (I should probably do that at some point). Converting the Forth code to traditional 65* format is probably more an exercise left for the AI crowd.

One of the beautiful things I discovered about the 6502 when I first wrote the Forth assembler is that the same bits were used to define the addressing mode for all instructions. The assembler can automatically distinguish between zero page and absolute addressing modes (DUP 256 U< IF ...). Other modes are just as easily identified: #, #<, #>, for immediate mode, ,X ,Y for absolute (or zero page) indexed, ,X) for indexed indirect and ),Y for indirect indexed. Detecting illegal addressing modes and out of range branches also proved to be a snap.
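As a rough Python illustration of that regularity: on the 6502, the eight instructions ORA, AND, EOR, ADC, STA, LDA, CMP and SBC share the opcode layout aaabbb01, where the three bbb bits select the addressing mode. The sketch below covers only that group, picks zero page or absolute from the operand size in the spirit of `DUP 256 U< IF`, and rejects the one illegal combination in the group. The function name `encode` and the mode strings are assumptions made for this example, not the attached assembler's interface.

```python
# Base opcodes with the mode bits cleared (pattern aaa00001)
BASE = {"ORA": 0x01, "AND": 0x21, "EOR": 0x41, "ADC": 0x61,
        "STA": 0x81, "LDA": 0xA1, "CMP": 0xC1, "SBC": 0xE1}

# The three mode bits (bbb) for this instruction group
MODE_BITS = {"(zp,x)": 0, "zp": 1, "#": 2, "abs": 3,
             "(zp),y": 4, "zp,x": 5, "abs,y": 6, "abs,x": 7}

def encode(mnemonic, mode, operand):
    """Return the bytes for one instruction of the aaabbb01 group."""
    if mode == "direct":
        # Choose zero page or absolute from the size of the operand
        if operand < 256:
            mode = "zp"
        else:
            mode = "abs"
    if mnemonic == "STA" and mode == "#":
        raise ValueError("STA has no immediate mode")
    opcode = BASE[mnemonic] | (MODE_BITS[mode] << 2)
    if mode.startswith("abs"):
        # 16-bit operand, low byte first
        return [opcode, operand & 0xFF, (operand >> 8) & 0xFF]
    else:
        return [opcode, operand & 0xFF]

print(encode("LDA", "direct", 0x10))      # zero page form
print(encode("LDA", "direct", 0x1234))    # absolute form
```

Other instruction groups use different bit assignments and support fewer modes, so a full assembler needs one small table per group.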

Although I wrote my Forth assembler in assembly language, it can just as easily (even more easily) be written in Forth. This might be your starting point if your existing assembler lacks the features that I have listed.

Posted: **Tue Dec 15, 2015 7:10 am**

@scotws, since your Forth assembler seems to be less than ideal, I have converted my code into Forth and attached it here for you. It is untested (so E&OE) but it should work for almost any version of Forth. If you can alter where HERE points to then you can create stand-alone programs (otherwise, write new HERE , and C, words). It is only 6502 based but there should be no problem extending it for 65C02 or 658xx instructions.

Attachment: Forth6502ASM.txt (4.7 KiB)

I have also converted your sample code above to show how it would look if written for the Forth assembler. You must admit that the control structures make it easier to see what the code is doing. Since the Forth assembler is intended primarily for creating low level code words only, it had no provision for in-lining strings (notice how that didn't faze me one bit). Of course, the IO would have to be tailored to your system.

Code:

( macros )
: string" >TIB @ DUP TIB + SWAP #TIB SWAP - ( a n )
   BEGIN
      1- DUP 0>= ( a n f )
      IF
        OVER C@ DUP '" <> ( a n c f )
        IF 
           C, SWAP 1+ SWAP
        ELSE 
           2DROP 1+ TIB - >TIB ! EXIT
        THEN
      ELSE
        DROP TIB - >TIB ! EXIT
      THEN
   AGAIN
;
: string0" string" 0 C, ;
: stringlf" string" LF C, ;
: L, ( l h -- ) SWAP , , ;
: lazy NOP, NOP, ;

$8000 CONSTANT vga_base
vga_base 1+ CONSTANT vga1
vga1 1+ CONSTANT vga2

5 CONSTANT done?

0 #, LDA, TAX,
BEGIN,
   BEGIN,
      done? #, CMP,
      vga_base ,X STA,
      INX, 0=,
   UNTIL,

   " tests/includeme.f" FLOAD
   NATIVE,
   AXY16,

   0 #, PHE,
   PLA,
   0 #, CMP, 0=,
UNTIL,

( data ) HERE
   ( byte ) 1 C, 2 C, 3 C, 4 C,
   ( word ) $1000 , $2000 , $3000 ,
   ( long ) $01aaaa. L, $02bbbb. L,

( strings ) HERE
   string" The cake is a lie!"
   string0" Terminated by a zero"
   stringlf" Terminated by LF"

CONSTANT strings CONSTANT data

Posted: **Wed Dec 16, 2015 12:54 pm**

Thanks for that -- I'll take a longer look at your assembler once some of the other projects running in parallel here have settled down (assembler, emulator, rewrite of Tali Forth, argh!). I've gotten so used to the Typist's syntax in both Forth and "normal" versions by now that I'd see about rewriting that part, but for a pure Forth assembler, structured programming would seem to make more sense than labels (though to be fair, I learned a lot about Forth by implementing those forward references). Thanks again!

Posted: **Wed Apr 20, 2016 4:28 pm**

Dogfooding the assembler for Liara Forth showed some features are really missing. So I've added the ability to do math terms by pushing anything in curly braces through a sanitizer and then the Python 3 eval function (which, yes, is evil). Also, more clever handling of the MVN and MVP instructions (I now split up the terms into two lines with a dummy opcode for the second line, run them through the next steps to handle variables and math, and then put them together again). The current line directive is now ".*" (dot star). What doesn't work yet are math terms and modifiers in instructions such as .byte, but the real world is calling and so that'll have to wait a few more days.
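A minimal sketch of the "sanitize, then eval" idea, assuming that earlier passes have already replaced variables with decimal numbers. The whitelist, the regular expressions and the name `evaluate_braces` are assumptions for illustration and are not copied from the assembler's source.

```python
import re

# Only digits, arithmetic and bit operators, parentheses and spaces may pass
SAFE_TERM = re.compile(r"[0-9+\-*/%()<>&|^~ ]+")
BRACES = re.compile(r"\{([^{}]*)\}")

def evaluate_braces(line):
    """Replace every { ... } math term in a line by its integer value."""
    def replace(match):
        term = match.group(1).strip()
        if SAFE_TERM.fullmatch(term) is None:
            raise ValueError("unsafe math term: " + term)
        # No builtins and no names are visible to the evaluated expression
        value = eval(term, {"__builtins__": {}}, {})
        return str(int(value))
    return BRACES.sub(replace, line)

print(evaluate_braces("lda.# { 1 + 1 }"))
print(evaluate_braces(".byte { 20 ** 2 }"))
```

The whitelist keeps names and function calls away from eval, but it does not limit how expensive a term is to compute, so very large exponents would still need a separate check.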

Posted: **Fri Apr 22, 2016 7:15 pm**

By being a horrible parent and letting my kid play Minecraft for far too long, I've changed the code so that modifiers and math terms now work in data directives, e.g.:

Code:

.byte .lsb { 1 + 1 } .msb { 20 ** 2 }

Needs far more testing, but that really is as far as I'm going to get today before social services starts calling.
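To make the example above concrete, here is a rough sketch of how the modifiers might be applied once the math terms have been reduced to numbers. It assumes that .lsb keeps the lowest byte and .msb keeps bits 8 to 15 of the value, and that a modifier applies to the next number only. The function names are hypothetical and not taken from the assembler.

```python
def apply_modifier(name, value):
    """Reduce a number to a single byte."""
    if name == ".lsb":
        return value & 0xFF
    elif name == ".msb":
        return (value >> 8) & 0xFF
    else:
        raise ValueError("unknown modifier: " + name)

def byte_directive(line):
    """Turn '.byte .lsb 2 .msb 400' into a list of byte values."""
    tokens = line.split()
    if len(tokens) == 0 or tokens[0] != ".byte":
        raise ValueError("not a .byte directive: " + line)
    result = []
    modifier = None
    for token in tokens[1:]:
        if token == ".lsb" or token == ".msb":
            modifier = token          # applies to the next number
        else:
            value = int(token)
            if modifier is not None:
                value = apply_modifier(modifier, value)
                modifier = None
            if value < 0 or value > 255:
                raise ValueError("does not fit in a byte: " + token)
            result.append(value)
    return result

# { 1 + 1 } is 2 and { 20 ** 2 } is 400 once the math pass has run
print(byte_directive(".byte .lsb 2 .msb 400"))
```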


Introducing a Tinkerer's Assembler for the 6502/65c02/65816
