Compiler writing

Programming the 6502 microprocessor and its relatives in assembly and other languages.
Dan Moos
Posts: 277
Joined: 11 Mar 2017
Location: Lynden, WA

Compiler writing

Post by Dan Moos »

What resources exist to learn how to write a compiler that folks might recommend? C specifically.

I should mention, I'm not doing this because I need a compiler, so I don't need to be directed towards existing ones. This is for fun and education.

A quality book, or YouTube series would be awesome.
User avatar
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland
Contact:

Re: Compiler writing

Post by drogon »

Dan Moos wrote:
What resources exist to learn how to write a compiler that folks might recommend? C specifically.

I should mention, I'm not doing this because I need a compiler, so I don't need to be directed towards existing ones. This is for fun and education.

A quality book, or YouTube series would be awesome.
There are a billion books on the subjects - people have written PhDs on it and it's often a standard part of a Comp. Sci degree (in the UK), so there is plenty to find and a million ways to go about doing it..

Start here: https://compilers.iecc.com/crenshaw/

Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
User avatar
BitWise
In Memoriam
Posts: 996
Joined: 02 Mar 2004
Location: Berkshire, UK
Contact:

Re: Compiler writing

Post by BitWise »

Aho’s Dragon book is an often referenced guide to compiler writing
https://docs.google.com/open?id=0B1Mogs ... WR5NWVTSVE
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: Compiler writing

Post by BillG »

If you want something which leans toward practice rather than theory, I liked this book back in the day:

https://holub.com/compiler/
unclouded
Posts: 81
Joined: 24 Feb 2015

Re: Compiler writing

Post by unclouded »

This one is billed "A Compiler Writing Journey" and covers the step by step implementation of a C compiler with buildable examples at the end of each chapter:

https://github.com/DoctorWkt/acwj
User avatar
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England
Contact:

Re: Compiler writing

Post by BigEd »

Not a book, but this is a classic:
Let's Build a Compiler, by Jack Crenshaw

via this discussion which has many other pointers, and which headlines this series by Nora Sandler:
Writing a C Compiler:
Quote:
I’ve been working on my own C compiler, nqcc for the past several weeks, using Abdulaziz Ghuloum’s An Incremental Approach to Compiler Construction as a roadmap. I really like Ghuloum’s approach: you start by compiling a tiny, trivial subset of your source language all the way down to assembly. Then you add new language features, one step at a time.
Dan Moos
Posts: 277
Joined: 11 Mar 2017
Location: Lynden, WA

Re: Compiler writing

Post by Dan Moos »

Thanks all. Lots to look at here!
resman
Posts: 154
Joined: 12 Dec 2015
Location: Lake Tahoe
Contact:

Re: Compiler writing

Post by resman »

I saw that one of my profs, Hank Dietz, from college has posted the notes to his compiler class. A pretty quick intro into compilers: http://aggregate.org/EE380/notes.pdf

It's interesting to note that Terrence Parr worked with Prof. Dietz about the time Terrence started ANTLR.
johnwbyrd
Posts: 89
Joined: 01 May 2017

Re: Compiler writing

Post by johnwbyrd »

The Dragon book, mentioned above, seems to be well respected by older programmers, but it may be a little long in the tooth. I suggest "Engineering: A Compiler" by Cooper and Torczon.
Dan Moos
Posts: 277
Joined: 11 Mar 2017
Location: Lynden, WA

Re: Compiler writing

Post by Dan Moos »

For now I've got an old copy of the Dragon Book on the way. Everything in my research seems to imply that, although it is very dated in the languages it refers to, the concepts are described in a manner that makes it still very useful.

My hope is to create "enough" of a C compiler such that I can create games on my homebrew.

I have no need for standards compliance (I would do my best to be so though)
I have no need for total completeness (I imagine it would slowly gain features as I needed them)
I have no illusions about having even a fraction of the standard library.

Basically, I just want to have most of the flow control statements, the types and operators that I need, and the ability to have local variables. And the ability to inline assembly when the limits of my feeble compiler are hit.

I'll probably end up with a fairly limited "C like" language, and that is fine! Not caring about portability probably will be freeing in some ways!
User avatar
BitWise
In Memoriam
Posts: 996
Joined: 02 Mar 2004
Location: Berkshire, UK
Contact:

Re: Compiler writing

Post by BitWise »

C is a tricky language to compile properly. Its grammar has some ambiguities that mean you must look ahead several tokens to determine what syntax rule should be followed.

I can't remember the specifics but I think type casts (e.g. '(t) x' -- the (t) looks like a sub expression until you find work out that t is a type) and array definitions in types (e.g. char *x[] -- x is an array of char pointers) are examples. A lot of compilers use LALR(n) based code generators (e.g. yacc, bison, antlr) to handle this.

I've been searching for a good language to write a native 6502 compiler for. I've yet to find one. Early languages like BCPL and B are inherently word based and untyped so you end up doing a lot of address computations to access memory.
PASCAL and MODULA-2 have complicated stack requirements for nested procedures. BASIC and LISP are normally interpreted on 8-bit machines. FORTH can be either compiled or interpreted and works quite well but its like 'Marmite' and you either love it or hate it.

I wonder if some of these languages would be better compiled to pseudo-code and run on a virtual machine rather than native code. The 6502 works well for intepreters.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
Chromatix
Posts: 1462
Joined: 21 May 2018

Re: Compiler writing

Post by Chromatix »

Have you considered Lua?
User avatar
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland
Contact:

Re: Compiler writing

Post by drogon »

BitWise wrote:
BASIC and LISP are normally interpreted on 8-bit machines.
BASIC is a curios one in many respects. FORTRAN too, surprisingly. One way to compile FORTRAN is a one-pass compiler that generates an intermediate code plus a symbol table - a block of variable storage, if you like, followed by an interpreter for this intermediate code. I have such a system on my PDP-8.

BASIC - often 'tokenised' then "detokenised" for the LIST or editing command (With note that some Forths do this too) However some BASICS only tokenise the keywords, so you end up with

Code: Select all

PRINT 3.14
being stored as a token for PRINT, then the textual representation of 3.14 which has to subsequently be parsed at run-time.

Some years back I wrote what I considered my "ideal" BASIC and I took it a stage further - I turned everything into a token. So the above was turned into 2 tokens - one for keyword and one for a constant - stored in floating point binary representation a symbol table. The token was an index into the symbol table. That made run-time fast as there were never any numbers to parse or variables to create storage for. It also made LIST hard because you now had a binary representation of a number to print out (So I cheated and also stored the textual representation for nothing other than the LIST command. I even tokenised comments.

Back to compilers - what I did was essentially a one-pass compiler with the symbol table and some run-time fixups (scan for goto/gosub/proc/func targets) there is also a "just in time" compiler for expression evaluation - I cache the RPN generated by the shunting yard expression evaluator so turning that into a 2-pass compiler would not be that hard - and it's on my "to-do" list once I'm happy with my BCPL system on my '816 board. (Although plan A there is to translate the existing tokeniser/sym. table builder into BCPL then have the run-time in ASM, however after doing a lot more with BCPL I'm now looking at compiling directly into code for the BCPL Cintcode VM)

Cheers,

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
User avatar
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: Compiler writing

Post by barrym95838 »

drogon wrote:
Some years back I wrote what I considered my "ideal" BASIC and I took it a stage further - I turned everything into a token. So the above was turned into 2 tokens - one for keyword and one for a constant - stored in floating point binary representation a symbol table. The token was an index into the symbol table. That made run-time fast as there were never any numbers to parse or variables to create storage for. It also made LIST hard because you now had a binary representation of a number to print out (So I cheated and also stored the textual representation for nothing other than the LIST command. I even tokenised comments.
I am involved in a detailed analysis of WozBASIC, which was the first 6502 BASIC interpreter. Woz hand-assembled it, so there are several obvious non-linear patches and a couple of minor bugs, but his philosophy is quite evident: tokenize as much as you reasonably can, including constants. He even has several different tokens that execute differently but LIST the same, like PRINT a blank line, PRINT a variable, PRINT a string literal, etc. His comma tokens are even more diverse. The run-time subroutines for each one are generally short and simple (sometimes just an RTS), but the tokenizer and the program LISTer are quite complex. He has numerous eccentric coding methods, but the result is compact and very efficient. It's a shame that he was too busy to add floats, a situation that I hope to rectify some day, assuming I am able to complete my analysis and extend his code without disrespecting his legacy.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)
User avatar
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland
Contact:

Re: Compiler writing

Post by drogon »

barrym95838 wrote:
drogon wrote:
Some years back I wrote what I considered my "ideal" BASIC and I took it a stage further - I turned everything into a token. So the above was turned into 2 tokens - one for keyword and one for a constant - stored in floating point binary representation a symbol table. The token was an index into the symbol table. That made run-time fast as there were never any numbers to parse or variables to create storage for. It also made LIST hard because you now had a binary representation of a number to print out (So I cheated and also stored the textual representation for nothing other than the LIST command. I even tokenised comments.
I am involved in a detailed analysis of WozBASIC, which was the first 6502 BASIC interpreter. Woz hand-assembled it, so there are several obvious non-linear patches and a couple of minor bugs, but his philosophy is quite evident: tokenize as much as you reasonably can, including constants. He even has several different tokens that execute differently but LIST the same, like PRINT a blank line, PRINT a variable, PRINT a string literal, etc. His comma tokens are even more diverse. The run-time subroutines for each one are generally short and simple (sometimes just an RTS), but the tokenizer and the program LISTer are quite complex. He has numerous eccentric coding methods, but the result is compact and very efficient. It's a shame that he was too busy to add floats, a situation that I hope to rectify some day, assuming I am able to complete my analysis and extend his code without disrespecting his legacy.
My float issue in LIST was that one day I entered a constant (I forget what, but lets say 10) and when I LISTed the program saw 10.00000001, so good luck with printing floats in the listing :-)

Sounds like a worthwhile project though.

Cheers,

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Post Reply