6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 3:57 pm

All times are UTC




Post new topic Reply to topic  [ 37 posts ]  Go to page 1, 2, 3  Next
Author Message
 Post subject: Compiler writing
PostPosted: Sat May 02, 2020 4:31 pm 
Offline

Joined: Sat Mar 11, 2017 1:56 am
Posts: 276
Location: Lynden, WA
What resources exist to learn how to write a compiler that folks might recommend? C specifically.

I should mention, I'm not doing this because I need a compiler, so I don't need to be directed towards existing ones. This is for fun and education.

A quality book, or YouTube series would be awesome.


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sat May 02, 2020 5:25 pm 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Dan Moos wrote:
What resources exist to learn how to write a compiler that folks might recommend? C specifically.

I should mention, I'm not doing this because I need a compiler, so I don't need to be directed towards existing ones. This is for fun and education.

A quality book, or YouTube series would be awesome.


There are a billion books on the subjects - people have written PhDs on it and it's often a standard part of a Comp. Sci degree (in the UK), so there is plenty to find and a million ways to go about doing it..

Start here: https://compilers.iecc.com/crenshaw/

Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sat May 02, 2020 8:53 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Aho’s Dragon book is an often referenced guide to compiler writing
https://docs.google.com/open?id=0B1MogsyNAsj9elVzQWR5NWVTSVE

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sat May 02, 2020 11:20 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
If you want something which leans toward practice rather than theory, I liked this book back in the day:

https://holub.com/compiler/


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sun May 03, 2020 5:08 am 
Offline

Joined: Tue Feb 24, 2015 11:07 pm
Posts: 81
This one is billed "A Compiler Writing Journey" and covers the step by step implementation of a C compiler with buildable examples at the end of each chapter:

https://github.com/DoctorWkt/acwj


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sun May 03, 2020 5:23 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Not a book, but this is a classic:
Let's Build a Compiler, by Jack Crenshaw

via this discussion which has many other pointers, and which headlines this series by Nora Sandler:
Writing a C Compiler:
Quote:
I’ve been working on my own C compiler, nqcc for the past several weeks, using Abdulaziz Ghuloum’s An Incremental Approach to Compiler Construction as a roadmap. I really like Ghuloum’s approach: you start by compiling a tiny, trivial subset of your source language all the way down to assembly. Then you add new language features, one step at a time.


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sun May 17, 2020 10:16 pm 
Offline

Joined: Sat Mar 11, 2017 1:56 am
Posts: 276
Location: Lynden, WA
Thanks all. Lots to look at here!


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Tue May 19, 2020 2:40 pm 
Offline

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 145
Location: Lake Tahoe
I saw that one of my profs, Hank Dietz, from college has posted the notes to his compiler class. A pretty quick intro into compilers: http://aggregate.org/EE380/notes.pdf

It's interesting to note that Terrence Parr worked with Prof. Dietz about the time Terrence started ANTLR.


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sat May 23, 2020 8:56 pm 
Offline

Joined: Mon May 01, 2017 7:13 am
Posts: 83
The Dragon book, mentioned above, seems to be well respected by older programmers, but it may be a little long in the tooth. I suggest "Engineering: A Compiler" by Cooper and Torczon.


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sun May 24, 2020 9:34 pm 
Offline

Joined: Sat Mar 11, 2017 1:56 am
Posts: 276
Location: Lynden, WA
For now I've got an old copy of the Dragon Book on the way. Everything in my research seems to imply that, although it is very dated in the languages it refers to, the concepts are described in a manner that makes it still very useful.

My hope is to create "enough" of a C compiler such that I can create games on my homebrew.

I have no need for standards compliance (I would do my best to be so though)
I have no need for total completeness (I imagine it would slowly gain features as I needed them)
I have no illusions about having even a fraction of the standard library.

Basically, I just want to have most of the flow control statements, the types and operators that I need, and the ability to have local variables. And the ability to inline assembly when the limits of my feeble compiler are hit.

I'll probably end up with a fairly limited "C like" language, and that is fine! Not caring about portability probably will be freeing in some ways!


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Sun May 24, 2020 10:17 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
C is a tricky language to compile properly. Its grammar has some ambiguities that mean you must look ahead several tokens to determine what syntax rule should be followed.

I can't remember the specifics but I think type casts (e.g. '(t) x' -- the (t) looks like a sub expression until you find work out that t is a type) and array definitions in types (e.g. char *x[] -- x is an array of char pointers) are examples. A lot of compilers use LALR(n) based code generators (e.g. yacc, bison, antlr) to handle this.

I've been searching for a good language to write a native 6502 compiler for. I've yet to find one. Early languages like BCPL and B are inherently word based and untyped so you end up doing a lot of address computations to access memory.
PASCAL and MODULA-2 have complicated stack requirements for nested procedures. BASIC and LISP are normally interpreted on 8-bit machines. FORTH can be either compiled or interpreted and works quite well but its like 'Marmite' and you either love it or hate it.

I wonder if some of these languages would be better compiled to pseudo-code and run on a virtual machine rather than native code. The 6502 works well for intepreters.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Mon May 25, 2020 9:55 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Have you considered Lua?


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Mon May 25, 2020 2:35 pm 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
BitWise wrote:
BASIC and LISP are normally interpreted on 8-bit machines.


BASIC is a curios one in many respects. FORTRAN too, surprisingly. One way to compile FORTRAN is a one-pass compiler that generates an intermediate code plus a symbol table - a block of variable storage, if you like, followed by an interpreter for this intermediate code. I have such a system on my PDP-8.

BASIC - often 'tokenised' then "detokenised" for the LIST or editing command (With note that some Forths do this too) However some BASICS only tokenise the keywords, so you end up with

Code:
PRINT 3.14


being stored as a token for PRINT, then the textual representation of 3.14 which has to subsequently be parsed at run-time.

Some years back I wrote what I considered my "ideal" BASIC and I took it a stage further - I turned everything into a token. So the above was turned into 2 tokens - one for keyword and one for a constant - stored in floating point binary representation a symbol table. The token was an index into the symbol table. That made run-time fast as there were never any numbers to parse or variables to create storage for. It also made LIST hard because you now had a binary representation of a number to print out (So I cheated and also stored the textual representation for nothing other than the LIST command. I even tokenised comments.

Back to compilers - what I did was essentially a one-pass compiler with the symbol table and some run-time fixups (scan for goto/gosub/proc/func targets) there is also a "just in time" compiler for expression evaluation - I cache the RPN generated by the shunting yard expression evaluator so turning that into a 2-pass compiler would not be that hard - and it's on my "to-do" list once I'm happy with my BCPL system on my '816 board. (Although plan A there is to translate the existing tokeniser/sym. table builder into BCPL then have the run-time in ASM, however after doing a lot more with BCPL I'm now looking at compiling directly into code for the BCPL Cintcode VM)

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Mon May 25, 2020 4:48 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
drogon wrote:
Some years back I wrote what I considered my "ideal" BASIC and I took it a stage further - I turned everything into a token. So the above was turned into 2 tokens - one for keyword and one for a constant - stored in floating point binary representation a symbol table. The token was an index into the symbol table. That made run-time fast as there were never any numbers to parse or variables to create storage for. It also made LIST hard because you now had a binary representation of a number to print out (So I cheated and also stored the textual representation for nothing other than the LIST command. I even tokenised comments.

I am involved in a detailed analysis of WozBASIC, which was the first 6502 BASIC interpreter. Woz hand-assembled it, so there are several obvious non-linear patches and a couple of minor bugs, but his philosophy is quite evident: tokenize as much as you reasonably can, including constants. He even has several different tokens that execute differently but LIST the same, like PRINT a blank line, PRINT a variable, PRINT a string literal, etc. His comma tokens are even more diverse. The run-time subroutines for each one are generally short and simple (sometimes just an RTS), but the tokenizer and the program LISTer are quite complex. He has numerous eccentric coding methods, but the result is compact and very efficient. It's a shame that he was too busy to add floats, a situation that I hope to rectify some day, assuming I am able to complete my analysis and extend his code without disrespecting his legacy.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
 Post subject: Re: Compiler writing
PostPosted: Mon May 25, 2020 6:29 pm 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
barrym95838 wrote:
drogon wrote:
Some years back I wrote what I considered my "ideal" BASIC and I took it a stage further - I turned everything into a token. So the above was turned into 2 tokens - one for keyword and one for a constant - stored in floating point binary representation a symbol table. The token was an index into the symbol table. That made run-time fast as there were never any numbers to parse or variables to create storage for. It also made LIST hard because you now had a binary representation of a number to print out (So I cheated and also stored the textual representation for nothing other than the LIST command. I even tokenised comments.

I am involved in a detailed analysis of WozBASIC, which was the first 6502 BASIC interpreter. Woz hand-assembled it, so there are several obvious non-linear patches and a couple of minor bugs, but his philosophy is quite evident: tokenize as much as you reasonably can, including constants. He even has several different tokens that execute differently but LIST the same, like PRINT a blank line, PRINT a variable, PRINT a string literal, etc. His comma tokens are even more diverse. The run-time subroutines for each one are generally short and simple (sometimes just an RTS), but the tokenizer and the program LISTer are quite complex. He has numerous eccentric coding methods, but the result is compact and very efficient. It's a shame that he was too busy to add floats, a situation that I hope to rectify some day, assuming I am able to complete my analysis and extend his code without disrespecting his legacy.


My float issue in LIST was that one day I entered a constant (I forget what, but lets say 10) and when I LISTed the program saw 10.00000001, so good luck with printing floats in the listing :-)

Sounds like a worthwhile project though.

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 37 posts ]  Go to page 1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: Miles J., nollkolltroll and 18 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: