6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 7:53 pm

All times are UTC




Post new topic Reply to topic  [ 4 posts ] 
Author Message
PostPosted: Sat Nov 12, 2016 5:34 pm 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I've released my initial version of the Ronald Mak Pascal compiler on github.com. Professor Mak developed the referenced compiler as part of his book on compilers and interpreters: "Writing Compilers and Interpreters - An Applied Approach.", John Wiley & Sons, New York, NY, 1991.

Professor Mak has given me permission to release the M65C02A version of the compiler under an open source license. I've elected to release the M65C02A version of the compiler under the GPLv3 license. Professor Mak sent me the original source for the 8086 version of the compiler, which required some minor changes to make it compatible with my Visual C++ 6.0 development environment on Windows XP, and the Eclipse C++ SDK development environment on Linux (Mint).

Uncharacteristically, the README.md file included in the repository is very sparse at this time. I'll add to it in future updates, but at present, it simply states that the compiler is a simple recursive descent Pascal compiler for the M65C02A and 8086 processors. The text/book referenced above, also available for C++ and Java, provides a very good description of the theory and practice used in developing recursive descent compilers and interpreters. My contributions are (1) the translation of the Mak 8086 code generator elements into equivalent M65C02A code generator elements; (2) addition of the reference logic for global (level 1) and local variables of nested procedures and functions (level > 1); (3) correction of the integer division function for the 8086 compiler; and (4) some simple optimizations in the array index computations. (I had optimized the structure programming logic jumps, but lost those optimizations in a failure of my source code version control. Will add those optimizations back in soon.)

As provided in the repository, the M65C02A code generator elements are implemented in a brute force manner, and interspersed with the 8086 code generator elements. This was done to ensure that I had properly understood the 8086 instruction mappings and created the appropriate M65C02A instruction mappings. Professor Mak makes extensive use of the capabilities of the target assembler to implement some of the addressing modes used by the 8086 version of the compiler. In other words, the string substitution facilities of the Microsoft assembler are used to create base pointer relative symbolic variable references: STATIC_LINK EQU <WORD PTR [BP+4]>.

Mapping these features to the M65C02A core has not been too difficult, but it has meant that I've had to add some conditional tests in the code generator to ensure that absolute addressing is used when referencing variables declared at level 1, i.e. local variables of the main Pascal program, and base-relative addressing is used when reference variables declared at any level greater than 1. Luckily, Professor Mak provided a number of Pascal programs that exercise the features of the language to test the code generation of the compiler, and I've made extensive use of these programs to test that the M65C02A code generator matches the the 8086 code generator.

An example of the assembler output of the compiler, as currently constituted, is provided in the following code block for the Sieve of Eratosthenes program:

Code:
;    1: PROGRAM eratosthenes(output);
     .STACK  1024                   ; Set stack size

     .CODE                          ; place in CODE segment

STATIC_LINK         .EQ    +4       ;--- STATIC_LINK         EQU    <WORD PTR [bp+4]>
RETURN_VALUE        .EQ    -4       ;--- RETURN_VALUE        EQU    <WORD PTR [bp-4]>
HIGH_RETURN_VALUE   .EQ    -2       ;--- HIGH_RETURN_VALUE   EQU    <WORD PTR [bp-2]>

;    2:
;    3: CONST
;    4:     max = 1000;
;    5:
;    6: VAR
;    7:     sieve : ARRAY [1..max] OF BOOLEAN;
;    8:     i, j, limit, prime, factor : INTEGER;
;    9:
;   10: BEGIN

_pc65_main    .PROC

     phx.w                          ;---    push    bp
     tsx.w                          ;---    mov     bp,sp
;   11:     limit := max DIV 2;
     lda.w #1000                    ;---    mov     ax,1000
     pha.w                          ;---    push    ax
     lda #2                         ;---    mov     ax,2
     pha.w                          ;---    mov     cx,ax
                                    ;---    pop     ax
                                    ;---    cwd
     jsr _idiv                      ;---    idiv    cx
     adj #4
     sta.w limit_005                ;---    mov     WORD PTR limit_005,ax
;   12:     sieve[1] := FALSE;
     psh.w #sieve_002               ;---    lea     ax,WORD PTR sieve_002
                                    ;---    push    ax
     lda #1                         ;---    mov     ax,1
     dec.w a                        ;---    sub     ax,1
                                    ;---    mov     dx,2
                                    ;---    imul    dx
     asl.w a
                                    ;---    pop     dx
     clc                            ;---    add     dx,ax
     adc.w 0,S
     sta.w 0,S                      ;---    push    dx
     lda #0                         ;---    mov     ax,0
                                    ;---    pop     bx
     sta.w (0,S)                    ;---    mov     WORD PTR [bx],ax
     adj #2
;   13:
;   14:     FOR i := 2 TO max DO
     lda #2                         ;---    mov     ax,2
     sta.w i_003                    ;---    mov     WORD PTR i_003,ax
L_008
     lda.w #1000                    ;---    mov     ax,1000
     cmp.w i_003                    ;---    cmp     WORD PTR i_003,ax
     bge L_009                      ;---    jle     L_009
     jmp L_010                      ;---    jmp     L_010
L_009
;   15:         sieve[i] := TRUE;
     psh.w #sieve_002               ;---    lea     ax,WORD PTR sieve_002
                                    ;---    push    ax
     lda.w i_003                    ;---    mov     ax,WORD PTR i_003
     dec.w a                        ;---    sub     ax,1
                                    ;---    mov     dx,2
                                    ;---    imul    dx
     asl.w a
                                    ;---    pop     dx
     clc                            ;---    add     dx,ax
     adc.w 0,S
     sta.w 0,S                      ;---    push    dx
     lda #1                         ;---    mov     ax,1
                                    ;---    pop     bx
     sta.w (0,S)                    ;---    mov     WORD PTR [bx],ax
     adj #2
     inc.w i_003                    ;---    inc     WORD PTR i_003
     jmp L_008                      ;---    jmp     L_008
L_010
     dec.w i_003                    ;---    dec     WORD PTR i_003

As can be seen, the extended instruction set of the M65C02A maps very well onto the 8086 register set that Professor Mak used to implement the Pascal virtual machine. Extensive use is made of base pointer relative addressing in many of the other example programs, but this simple program uses absolute addressing for level 1 variables, and the stack. Thus, the base-relative (M65C02A XTOS register) and stack relative addressing modes are very much needed to support an efficient mapping of the 8086 to the M65C02A.

One optimization that I've added to the compiler is evident where the load effective address and push register instruction sequence of the 8086 ISA is replaced by the psh #imm16 instruction of the M65C02A ISA. Another is evident where the asl.w instruction is used instead of an imul instruction to calculate the index into the array sieve_002. The adj #imm instruction is used to clean up the stack after procedure calls, or to remove pointers from the top of the stack without affecting registers or memory.

Also included in the repository is a partial assembler specific to the M65C02A instructions used by the compiler. I am using AWK to implement the assembler. I've not yet included a "linker" for the standard library functions, annd I will require a ASCII hexadecimal to binary translator to complete the process. The following code block is an example of the assembler's output for the code block provided above:
Code:
0000:               ; _pc65_main
0000: ABDA          ;         phx.w_imp     
0002: ABBA          ;         tsx.w_imp     
0004: ABA9E803      ;         lda.w_imm16   1000
0008: AB48          ;         pha.w_imp     
000A: A902          ;         lda_imm       2
000C: AB48          ;         pha.w_imp     
000E: 20FFFF        ;         jsr_abs       _idiv
0011: 6204          ;         adj_imm       4
0013: AB8DE609      ;         sta.w_abs     limit_005
0017: AB541202      ;         psh.w_imm16   sieve_002
001B: A901          ;         lda_imm       1
001D: AB3A          ;         dec.w_A       
001F: AB0A          ;         asl.w_A       
0021: 18            ;         clc_imp       
0022: AB6300        ;         adc.w_sp      0
0025: AB8300        ;         sta.w_sp      0
0028: A900          ;         lda_imm       0
002A: BB8300        ;         sta.w_spI     0
002D: 6202          ;         adj_imm       2
002F: A902          ;         lda_imm       2
0031: AB8DE209      ;         sta.w_abs     i_003
0035:               ; L_008
0035: ABA9E803      ;         lda.w_imm16   1000
0039: ABCDE209      ;         cmp.w_abs     i_003
003D: AB5003        ;         bge_rel       L_009
0040: 4C6400        ;         jmp_abs       L_010
0043:               ; L_009
0043: AB541202      ;         psh.w_imm16   sieve_002
0047: ABADE209      ;         lda.w_abs     i_003
004B: AB3A          ;         dec.w_A       
004D: AB0A          ;         asl.w_A       
004F: 18            ;         clc_imp       
0050: AB6300        ;         adc.w_sp      0
0053: AB8300        ;         sta.w_sp      0
0056: A901          ;         lda_imm       1
0058: BB8300        ;         sta.w_spI     0
005B: 6202          ;         adj_imm       2
005D: ABEEE209      ;         inc.w_abs     i_003
0061: 4C3500        ;         jmp_abs       L_008
0064:               ; L_010
0064: ABCEE209      ;         dec.w_abs     i_003

_________________
Michael A.


Last edited by MichaelM on Tue Nov 15, 2016 2:07 am, edited 2 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat Nov 12, 2016 6:00 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
What a gift! Many thanks for making this and sharing it!


Top
 Profile  
Reply with quote  
PostPosted: Thu Jun 17, 2021 3:06 am 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
The M65C02A Pascal compiler, PC65, has been updated in some minor ways over the past few years. The primary changes in the code generator were (1) that stack-relative addressing was changed to 1-based instead of 0-based as originally implemented, and (2) resolving indirect addressing of pointers on the stack is now performed using the FORTH VM's IP register indirect addressing mode instead of stack-relative indirect addressing.

The AWK-based assembler has been replaced with a Python-based assembler, PyAsm65. In addition to supporting all of the M65C02A instructions, the PyAsm65 assembler provides a number of peephole optimizations that can be applied to the assembler file produced by the Pascal compiler. The peephole optimizations currently performed are as follows:

    pho_ldaImmPha_to_pshImm(source)
    pho_ldaImmSta_to_Stz(source)
    pho_StackAdd_to_DirectAdd(source)
    pho_StackCmp_to_DirectCmp(source)
    pho_optimizeBooleanTest(source, balanced)
    pho_optimizeBooleanTest2(source)
    pho_ConvertAdc_to_Inc(source)
    pho_optimize1DArrayLoad(source)
    pho_optimize1DArrayWrite(source)
    pho_ReduceConstantQuotient(source)
    pho_ReduceVarImmProduct(source)
    pho_ReduceImmVarProduct(source)

Most of these peephole optimizations are performed on the Sieve.asm source file. Some examples of the optimizations are provided below:
pho_ReduceConstantQuotient reduces the following quotient of constants:
Code:
(  30)                  ; ;   11:     limit := max DIV 2;
(  31) 0223 ABA9E803    ;    lda.w #1000
(  32) 0227 AB48        ;    pha.w
(  33) 0229 A902        ;    lda #2
(  34) 022B AB48        ;    pha.w
(  35) 022D 203104      ;    jsr _idiv
(  36) 0230 C204        ;    adj #4
(  37) 0232 AB8D160D    ;    sta.w limit_005

into the following code:
Code:
(  31)                  ; ;   11:     limit := max DIV 2;
(  37) 0225 ABA9F401    ;    lda.w #500

pho_optimize1DArrayWrite optimizes 1D array assignment:
Code:
(  38)                  ; ;   12:     sieve[1] := FALSE;
(  39) 0236 ABE24205    ;    psh.w #sieve_002
(  40) 023A A901        ;    lda #1
(  41) 023C AB3A        ;    dec.w a
(  42) 023E AB0A        ;    asl.w a
(  43) 0240 18          ;    clc
(  44) 0241 CB7501      ;    adc.w 1,S
(  45) 0244 CB9501      ;    sta.w 1,S
(  46) 0247 A900        ;    lda #0
(  47) 0249 8B6B        ;    pli.s
(  48) 024B AB8300      ;    sta.w 0,I++

into the following code:
Code:
(  39)                  ; ;   12:     sieve[1] := FALSE;
(  41) 022D A901        ;    lda #1
(  42) 022F AB3A        ;    dec.w a
(  43) 0231 AB0A        ;    asl.w a
(  44) 0233 ABA8        ;    tay.w
(  47) 0235 A900        ;    lda #0
(  49) 0237 AB999B04    ;    sta.w sieve_002,Y

This set of peephole optimizations, when applied to Sieve.asm, result in significant improvements in the size and runtime of the Sieve benchmark: runtime is approximately 50% faster. Examples of the other peephole optimizations currently implemented in PyAsm65 can be found be comparing the Sieve.prn and Sieve2.prn files.

Note: the 1D array assignment and load / read peephole optimizations can be further improved. As currently implemented, these optimizations are not currently checking to see if the index is a constant or a variable. If the index is a constant, then a further optimization can be performed so that the dec and asl instructions could be replaced with a ldy #imm instruction. That optimization will be added to these peephole optimization functions in the near future.

_________________
Michael A.


Top
 Profile  
Reply with quote  
PostPosted: Mon Jul 05, 2021 7:39 pm 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Some results with the PC65 Pascal compiler and PyAsm65 6502/65C02/M65C02A assembler.

For a Pascal implementation of the Sieve of Eratosthenes, the following table (from this thread) provides various timings with and without peephole optimization (provided with the PyAsm65 assembler):
Code:
       | Time (s) | CPI  | Avg Inst Len | Total Cycles | Num Inst | sec/cycle (s) |    MIPS    |
Sieve  |   0.777  | 3.53 |     2.53     |   370,905    | 105,047  |  0.00000209   | 135,158.80 | no peephole
Sieve2 |   0.374  | 3.88 |     2.96     |   187,977    |  48,495  |  0.00000199   | 129,590.55 | peephole
Sieve3 |   0.361  | 4.30 |     3.06     |   139,759    |  32,509  |  0.00000258   |  89,935.19 | hand-optimized, 8-bit boolean
Sieve4 |   0.306  | 4.32 |     3.07     |   139,186    |  32,222  |  0.00000220   | 105,331.56 | Sieve3 + optimized branches
Also, a Fibonacci sequence computation program in Pascal was also generated. The same I/O functions required to implement the Sieve program were used. Three different programs were constructed. The first computes the maximum (signed) 16-bit Fibonacci number in "main", the second uses a function to calculate the maximum Fibonacci number with a single call from "main", and the third uses a recursive function to calculate the maximum Fibonacci number. In the case of the functions, a global array (in "main") is used to hold all of the representable Fibonacci numbers.

Unlike the Sieve program, which used a 1-based array, the Fibonacci programs used a 0-based array to store the Fibonacci numbers. This meant that the array index did not have to be decremented. This made the 1D array peephole optimization routines for the Sieve benchmark not work for the Fibonacci programs. Thus, some additional 1D array index calculation, load, and store peephole optimizers were added plus other optimizers that dealt with variables on the stack rather than in absolutely addressed memory.

The results for the Fibonacci programs are shown in the following table:
Code:
                 | Time (s) |  CPI |  AIL | # Cycles | # Inst | Tc (us) |     MIPS    |
Fibonacci        | 0.017424 | 3.62 | 2.03 |   6846   |  1890  |  2.545  | 109,469.829 | peephole, balanced
Fibonacci2       | 0.017375 | 3.62 | 2.03 |   6846   |  1890  |  2.538  | 108,776.978 | peephole, not balanced
Fibonacci3       | 0.017631 | 3.62 | 2.04 |   7090   |  1959  |  2.487  | 111,112.370 | no peephole optimization
Fibonacci_funct  | 0.019328 | 3.49 | 1.94 |   6900   |  1976  |  2.801  | 102,232.455 | peephole, balanced
Fibonacci_funct2 | 0.019074 | 3.49 | 1.94 |   6898   |  1974  |  2.765  | 103,491.122 | peephole, not balanced
Fibonacci_funct3 | 0.021689 | 3.47 | 1.98 |   8112   |  2336  |  2.674  | 107,705.856 | no peephole optimization
Fibonacci_recurs | 0.024695 | 3.37 | 2.08 |   9986   |  2962  |  2.473  | 119,944.767 | peephole, balanced
Fibonacci_recurs2| 0.024487 | 3.40 | 2.08 |   9856   |  2898  |  2.485  | 118,346.095 | peephole, not balanced
Fibonacci_recurs3| 0.029769 | 3.62 | 2.14 |  12705   |  3507  |  2.343  | 117,805.138 | no peephole optimization
The Pascal code, the assembler, and Pascal / Assembler print files for these programs can be found in the TestPgms subdirectory of the PyAsm65 github repository.

Edit: added cross-link for Sieve performance table.

_________________
Michael A.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 4 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 21 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron