I've released my initial version of the Ronald Mak Pascal compiler on github.com. Professor Mak developed the referenced compiler as part of his book on compilers and interpreters: "Writing Compilers and Interpreters - An Applied Approach", John Wiley & Sons, New York, NY, 1991.
Professor Mak has given me permission to release the M65C02A version of the compiler under an open source license. I've elected to release the M65C02A version of the compiler under the GPLv3 license. Professor Mak sent me the original source for the 8086 version of the compiler, which required some minor changes to make it compatible with my Visual C++ 6.0 development environment on Windows XP, and the Eclipse C++ SDK development environment on Linux (Mint).
Uncharacteristically, the README.md file included in the repository is very sparse at this time. I'll add to it in future updates, but at present, it simply states that the compiler is a simple recursive descent Pascal compiler for the M65C02A and 8086 processors. The book referenced above, also available for C++ and Java, provides a very good description of the theory and practice used in developing recursive descent compilers and interpreters. My contributions are (1) the translation of the Mak 8086 code generator elements into equivalent M65C02A code generator elements; (2) addition of the reference logic for global (level 1) and local variables of nested procedures and functions (level > 1); (3) correction of the integer division function for the 8086 compiler; and (4) some simple optimizations in the array index computations. (I had optimized the structured programming logic jumps, but lost those optimizations in a failure of my source code version control. I will add those optimizations back in soon.)
As provided in the repository, the M65C02A code generator elements are implemented in a brute force manner, and interspersed with the 8086 code generator elements. This was done to ensure that I had properly understood the 8086 instruction mappings and created the appropriate M65C02A instruction mappings. Professor Mak makes extensive use of the capabilities of the target assembler to implement some of the addressing modes used by the 8086 version of the compiler. In particular, the string substitution facilities of the Microsoft assembler are used to create base pointer relative symbolic variable references such as STATIC_LINK EQU <WORD PTR [BP+4]>.
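The correspondence between the two forms of symbolic reference can be sketched in a few lines of Python. This is only an illustration of the mapping visible in the assembler output (a text equate on the 8086 side, a plain signed displacement on the M65C02A side), not code taken from the compiler. The function name `equate_to_displacement` and the regular expression are assumptions made for the example.

```python
import re

# Hypothetical helper: matches a MASM text equate of the form
#   NAME EQU <WORD PTR [BP+n]>   or   NAME EQU <WORD PTR [bp-n]>
_EQUATE = re.compile(
    r"^\s*(\w+)\s+EQU\s+<\s*WORD\s+PTR\s+\[\s*BP\s*([+-])\s*(\d+)\s*\]\s*>\s*$",
    re.IGNORECASE,
)


def equate_to_displacement(line: str) -> str:
    """Rewrite an 8086 base-pointer text equate as a signed displacement.

    The ".EQ" output form mirrors the directive seen in the M65C02A
    listing; only the BP-relative WORD PTR pattern is handled here.
    """
    match = _EQUATE.match(line)
    if match is None:
        raise ValueError(f"not a BP-relative text equate: {line!r}")
    name, sign, digits = match.groups()
    return f"{name} .EQ {sign}{int(digits)}"
```

For example, `equate_to_displacement("STATIC_LINK EQU <WORD PTR [BP+4]>")` gives `STATIC_LINK .EQ +4`.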
Mapping these features to the M65C02A core has not been too difficult, but it has meant that I've had to add some conditional tests in the code generator to ensure that absolute addressing is used when referencing variables declared at level 1, i.e. local variables of the main Pascal program, and base-relative addressing is used when referencing variables declared at any level greater than 1. Luckily, Professor Mak provided a number of Pascal programs that exercise the features of the language to test the code generation of the compiler, and I've made extensive use of these programs to test that the M65C02A code generator matches the 8086 code generator.
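The level test described above can be sketched as follows. This is a minimal Python illustration, not the compiler's own source. The function name `variable_operand` is hypothetical, and the `offset,X` operand spelling is placeholder notation chosen because the procedure prologue copies the stack pointer into X (`tsx.w`, standing in for `mov bp,sp`); the syntax the compiler actually emits may differ.

```python
def variable_operand(label: str, level: int, offset: int = 0) -> str:
    """Pick an operand form for a variable reference based on nesting level.

    level 1  -> absolute addressing through the variable's label
    level >1 -> base-relative addressing through a signed frame offset
    (The ",X" suffix is illustrative notation, not confirmed assembler syntax.)
    """
    if level < 1:
        raise ValueError("nesting level must be 1 or greater")
    if level == 1:
        # Variables of the main program live at fixed addresses.
        return label
    # Variables of nested procedures and functions live in a stack frame.
    return f"{offset:+d},X"
```

With this sketch, `variable_operand("limit_005", 1)` yields the bare label, while a hypothetical local at frame offset -6 in a level 2 procedure yields `-6,X`.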
An example of the assembler output of the compiler, as currently constituted, is provided in the following code block for the Sieve of Eratosthenes program:
Code:
; 1: PROGRAM eratosthenes(output);
.STACK 1024 ; Set stack size
.CODE ; place in CODE segment
STATIC_LINK .EQ +4 ;--- STATIC_LINK EQU <WORD PTR [bp+4]>
RETURN_VALUE .EQ -4 ;--- RETURN_VALUE EQU <WORD PTR [bp-4]>
HIGH_RETURN_VALUE .EQ -2 ;--- HIGH_RETURN_VALUE EQU <WORD PTR [bp-2]>
; 2:
; 3: CONST
; 4: max = 1000;
; 5:
; 6: VAR
; 7: sieve : ARRAY [1..max] OF BOOLEAN;
; 8: i, j, limit, prime, factor : INTEGER;
; 9:
; 10: BEGIN
_pc65_main .PROC
phx.w ;--- push bp
tsx.w ;--- mov bp,sp
; 11: limit := max DIV 2;
lda.w #1000 ;--- mov ax,1000
pha.w ;--- push ax
lda #2 ;--- mov ax,2
pha.w ;--- mov cx,ax
;--- pop ax
;--- cwd
jsr _idiv ;--- idiv cx
adj #4
sta.w limit_005 ;--- mov WORD PTR limit_005,ax
; 12: sieve[1] := FALSE;
psh.w #sieve_002 ;--- lea ax,WORD PTR sieve_002
;--- push ax
lda #1 ;--- mov ax,1
dec.w a ;--- sub ax,1
;--- mov dx,2
;--- imul dx
asl.w a
;--- pop dx
clc ;--- add dx,ax
adc.w 0,S
sta.w 0,S ;--- push dx
lda #0 ;--- mov ax,0
;--- pop bx
sta.w (0,S) ;--- mov WORD PTR [bx],ax
adj #2
; 13:
; 14: FOR i := 2 TO max DO
lda #2 ;--- mov ax,2
sta.w i_003 ;--- mov WORD PTR i_003,ax
L_008
lda.w #1000 ;--- mov ax,1000
cmp.w i_003 ;--- cmp WORD PTR i_003,ax
bge L_009 ;--- jle L_009
jmp L_010 ;--- jmp L_010
L_009
; 15: sieve[i] := TRUE;
psh.w #sieve_002 ;--- lea ax,WORD PTR sieve_002
;--- push ax
lda.w i_003 ;--- mov ax,WORD PTR i_003
dec.w a ;--- sub ax,1
;--- mov dx,2
;--- imul dx
asl.w a
;--- pop dx
clc ;--- add dx,ax
adc.w 0,S
sta.w 0,S ;--- push dx
lda #1 ;--- mov ax,1
;--- pop bx
sta.w (0,S) ;--- mov WORD PTR [bx],ax
adj #2
inc.w i_003 ;--- inc WORD PTR i_003
jmp L_008 ;--- jmp L_008
L_010
dec.w i_003 ;--- dec WORD PTR i_003
As can be seen, the extended instruction set of the M65C02A maps very well onto the 8086 register set that Professor Mak used to implement the Pascal virtual machine. Many of the other example programs make extensive use of base pointer relative addressing, but this simple program uses only absolute addressing for its level 1 variables and stack-relative addressing for intermediate values. Between them, the base-relative (M65C02A XTOS register) and stack-relative addressing modes are very much needed to support an efficient mapping of the 8086 to the M65C02A.
One optimization that I've added to the compiler is evident where the load effective address and push register instruction sequence of the 8086 ISA is replaced by the psh #imm16 instruction of the M65C02A ISA. Another is evident where the asl.w instruction is used instead of an imul instruction to calculate the index into the array sieve_002. The adj #imm instruction is used to clean up the stack after procedure calls, or to remove pointers from the top of the stack without affecting registers or memory.
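The array index arithmetic behind the asl.w optimization can be checked with a short Python sketch. It assumes 2-byte elements (which is what the multiply by 2 in the 8086 comments implies) and a 16-bit address space; the function name `element_address` and its parameters are hypothetical. The base address 0x0212 used in the usage note is read from the operand bytes of the `psh.w_imm16 sieve_002` line in the assembler output, assuming little-endian operands.

```python
def element_address(base: int, index: int, low: int = 1, shift: int = 1) -> int:
    """Address of array[index] for elements of size 2**shift bytes.

    Mirrors the generated sequence: subtract the lower bound, shift left
    in place of a multiply, then add the array's base address.
    """
    if index < low:
        raise ValueError("index below the array's lower bound")
    scaled = (index - low) << shift   # same result as (index - low) * 2**shift
    return (base + scaled) & 0xFFFF   # wrap to a 16-bit address
```

For the sieve array, `element_address(0x0212, 1)` is 0x0212 and `element_address(0x0212, 1000)` is 0x09E0, which sits two bytes below the 0x09E2 encoded for i_003 in the listing.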
Also included in the repository is a partial assembler specific to the M65C02A instructions used by the compiler. I am using AWK to implement the assembler. I've not yet included a "linker" for the standard library functions, and I will require an ASCII hexadecimal to binary translator to complete the process. The following code block is an example of the assembler's output for the code block provided above:
Code:
0000: ; _pc65_main
0000: ABDA ; phx.w_imp
0002: ABBA ; tsx.w_imp
0004: ABA9E803 ; lda.w_imm16 1000
0008: AB48 ; pha.w_imp
000A: A902 ; lda_imm 2
000C: AB48 ; pha.w_imp
000E: 20FFFF ; jsr_abs _idiv
0011: 6204 ; adj_imm 4
0013: AB8DE609 ; sta.w_abs limit_005
0017: AB541202 ; psh.w_imm16 sieve_002
001B: A901 ; lda_imm 1
001D: AB3A ; dec.w_A
001F: AB0A ; asl.w_A
0021: 18 ; clc_imp
0022: AB6300 ; adc.w_sp 0
0025: AB8300 ; sta.w_sp 0
0028: A900 ; lda_imm 0
002A: BB8300 ; sta.w_spI 0
002D: 6202 ; adj_imm 2
002F: A902 ; lda_imm 2
0031: AB8DE209 ; sta.w_abs i_003
0035: ; L_008
0035: ABA9E803 ; lda.w_imm16 1000
0039: ABCDE209 ; cmp.w_abs i_003
003D: AB5003 ; bge_rel L_009
0040: 4C6400 ; jmp_abs L_010
0043: ; L_009
0043: AB541202 ; psh.w_imm16 sieve_002
0047: ABADE209 ; lda.w_abs i_003
004B: AB3A ; dec.w_A
004D: AB0A ; asl.w_A
004F: 18 ; clc_imp
0050: AB6300 ; adc.w_sp 0
0053: AB8300 ; sta.w_sp 0
0056: A901 ; lda_imm 1
0058: BB8300 ; sta.w_spI 0
005B: 6202 ; adj_imm 2
005D: ABEEE209 ; inc.w_abs i_003
0061: 4C3500 ; jmp_abs L_008
0064: ; L_010
0064: ABCEE209 ; dec.w_abs i_003
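The ASCII hexadecimal to binary step mentioned earlier could look something like the following Python sketch. It is not part of the repository: the function name `listing_to_bytes` is hypothetical, and the parser assumes every line has the `ADDR: HEXBYTES ; comment` shape shown in the listing, with label-only lines carrying no hex field and addresses running contiguously from 0000. No linking is attempted, so operand bytes are copied through exactly as the assembler wrote them.

```python
def listing_to_bytes(listing: str) -> bytes:
    """Convert an 'ADDR: HEX ; comment' listing into a flat binary image."""
    image = bytearray()
    for line in listing.splitlines():
        # Drop the comment field, then split the address from the hex bytes.
        head = line.partition(";")[0]
        addr_text, sep, hex_text = head.partition(":")
        hex_text = hex_text.strip()
        if not sep or not hex_text:
            continue  # blank line or label-only line: nothing to emit
        address = int(addr_text.strip(), 16)
        if address != len(image):
            raise ValueError(f"address gap or overlap at {address:04X}")
        image += bytes.fromhex(hex_text)
    return bytes(image)
```

Fed the first four lines of the listing, the function returns the eight bytes AB DA AB BA AB A9 E8 03. Writing the result to disk is then a matter of opening a file in binary mode and calling `write` on it.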