6502.org • View topic - Crenshaw - Let's Build a Compiler

View unanswered posts | View active topics

Board index » 6502.org Users Forum » Programming

All times are UTC

Crenshaw - Let's Build a Compiler

Page 1 of 7

[ 103 posts ]

Go to page 1, 2, 3, 4, 5 ... 7 Next

Previous topic | Next topic

Author

Message

barnacle

Post subject: Crenshaw - Let's Build a Compiler

Posted: Sun Sep 15, 2024 9:23 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

I've been playing with the code examples he gives. It's a little tricky because (a) he uses Pascal, which I don't, and (b) he builds for a 68000 while my eventual target will be a 65c02.

Nonetheless, my C version can so far correctly identify all that needs to be done for (one-digit number only!) basic arithmetic:

Code:

> 1*((2+3)/(4-5))
Executing Crenshaw...
   Total = 1
   Push Total
   Total = 2
   Push Total
   Total = 3
   Total = (popped) TOS + Total
   Push Total
   Total = 4
   Push Total
   Total = 5
   Total = (popped) TOS - Total
   Total = (popped) TOS / Total
   Total = (popped) TOS * Total

Next steps: skip spaces, multi-digit numbers, and variables. In the first instance I'm aiming at a Tiny Basic.

Neil

p.s. note that those '=' are assignments, not comparison tests.

Top

BigEd

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Sun Sep 15, 2024 9:45 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England

Seems like a good start! (Hope you don't mind, I thought it worth posting an index thread nearby)

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Sun Sep 15, 2024 11:13 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

A good thought, thanks.

Of course, the translation of these text instructions to actual 65c02 code is another question. At the moment, the basic takes a mere 86k of x86 code from C. I think it can be improved, as all it does is correctly insert text into the memory (and this).

Neil

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Tue Sep 24, 2024 8:50 pm

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

A thought: how deep would you make a calculation stack? It's obviously dependent on the number of brackets in the expression you're trying to evaluate, but beyond that... for basic arithmetic, Crenshaw's method doesn't go beyond one level, absent brackets.

Code:

> (17*43)/((344/3)-2)
Executing Crenshaw...
   Expr = 17
   Push Expr
0011 0000 0000 0000 0000 0000 0000 0000 
.    ^
   Expr = 43
   Expr = (popped) TOS * Expr
0011 0000 0000 0000 0000 0000 0000 0000 
^
   Push Expr
02db 0000 0000 0000 0000 0000 0000 0000 
.    ^
   Expr = 344
   Push Expr
02db 0158 0000 0000 0000 0000 0000 0000 
.    .    ^
   Expr = 3
   Expr = (popped) TOS / Expr
   Push Expr
02db 0072 0000 0000 0000 0000 0000 0000 
.    .    ^
   Expr = 2
   Expr = (popped) TOS - Expr
   Expr = (popped) TOS / Expr
Result = 6 

The stack pushes then increments, or decrements and pops. The caret points to the top of stack - i.e. next place to write. Values in hex.

I guesstimated twenty levels, implemented sixteen, and haven't been past three... (I'm still trying to prove to myself whether the stacked outputs can be interleaved with return addresses; I'm not at all sure yet. If they can, so much the better.)

Neil

Top

GARTHWILSON

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Wed Sep 25, 2024 12:52 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

The "Enough stack space?" page of the 6502 stacks treatise at http://wilsonminesco.com/stacks/enoughstackspace.html might help; also the "Stacks potpourri" page, at http://wilsonminesco.com/stacks/potpourri.html .

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Wed Sep 25, 2024 6:14 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

Thanks Garth, I'd seen them before.

Code:

> (1+(1+(1+(1+(1+(1+(1+(1+(1+(1+(1)))))))))))
Executing Crenshaw...
   Expr = 1
   Push Expr
0001 0000 0000 0000 0000 0000 0000 0000 
.    ^
   Expr = 1
   Push Expr
0001 0001 0000 0000 0000 0000 0000 0000 
.    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0000 0000 0000 0000 0000 
.    .    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0001 0000 0000 0000 0000 
.    .    .    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0001 0001 0000 0000 0000 
.    .    .    .    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0001 0001 0001 0000 0000 
.    .    .    .    .    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0001 0001 0001 0001 0000 
.    .    .    .    .    .    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    .    ^
   Expr = 1
   Push Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    .    .    ^
   Expr = 1
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    ^
   Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
^
Result = 11 

As you can see, this is just sticking an int (so two bytes) on the stack at each iteration (I only display the first eight words of the stack, but the pointer keeps going

)

The same calculation without the brackets just uses the top position, as expected.

Code:

(((((((((((1)))))))))))

doesn't use the stack at all.

Neil

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Wed Sep 25, 2024 11:07 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

But it looks like the program stack is, um, fifty-five deep at least, according to the trace :mrgreen:

Code:

> (((((((((((1)))))))))))
Executing Crenshaw...
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
GetNum
GetChar
   Expr = 1
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Result = 1 

Top

Yuri

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Thu Sep 26, 2024 9:58 am

Joined: Tue Feb 28, 2023 11:39 pm
Posts: 257
Location: Texas

Haven't see the Crenshaw stuff yet, though my original introduction to compilers was through the book:
"Writing Compilers and Interpreters"
Ronald Mak

(Second Edition)

Amazon Link

(There's a Third Edition, but he changes from C to Java for the host language)

In any event, doing the basic lexing for my stuff has usually followed the rough pseudo code of:

Code:

lex(file)
  while not eof(file) do
    ch = readch(file);
    switch (ch)
      case is digit: parse_a_number(file);
      case is letter: parse_a_keyword_or_ident(file);
      case is quote: parse_a_string(file);
      default: parse_special(file);
    end
  end
end

Or something along those lines.

Then for basic parsing I've generally written a recursive decent parser. However, given the limited stack of a 6502 I'm contemplating if a shift-reduce parser might be easier to work with memory management wise. (Though considerably harder to write)

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Fri Sep 27, 2024 5:58 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

This tiny basic doesn't have anything as exciting as a file system, though it might get something in the future. It's simple enough that something very similar will apply as yours; the main thing I wanted from Crenshaw was the expression parser.

It treats input a line at a time: if the line starts with a positive number it stuffs it into memory; if not, and its a valid command, it tries to execute the command (e.g. run, list, print).

Once it's running, it executes lines - one instruction only per line - in order. I'm looking at only a few control flow constructs: for/to/next, goto, gosub, do/while, if/then. The usual tiny basic 26 variables and one string... we shall see how it goes; I've never written a basic before.

Neil

Top

dmsc

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Sat Sep 28, 2024 4:18 pm

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138

Hi!

Yuri wrote:

Then for basic parsing I've generally written a recursive decent parser. However, given the limited stack of a 6502 I'm contemplating if a shift-reduce parser might be easier to work with memory management wise. (Though considerably harder to write)

The FastBasic compiler uses a recursive-descent parser, but implemented as a PEG parser, so each character in the input is one token and it does not need a lexer. This makes the parser faster, and it uses less memory as you don't need to store the input tokens as you parse the source directly.

In FastBasic, each parser call uses 2 bytes of stack for the return address and 2 bytes for the parsing state (the input and output pointers), as it parses lines of up to 255 bytes generating up to 255 bytes of code maximum. With this constraints, it is very difficult to find sequences that overflow the stack, this is an example from the test-suite that gives an error, because the parser calls itself after each ','

Code:

data y()=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,
53,54,55

Limiting stack usage also keeps the parsing fast, as you avoid the worst exponential behaviours.

To make the parser code smaller, the constructs are not written directly in assembler, the full PEG parser is stored as a state machine that is interpreted by the 6502 parser.

Have Fun!

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Tue Oct 01, 2024 8:01 pm

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

Damn, I broke something:

Code:

 
3*4=12, but
(3)*(4)=16

Something in the Factor routine, it would seem, but I can't see it yet. :mrgreen:

Neil

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Tue Oct 01, 2024 8:26 pm

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

Code:

int16_t Factor (void)
{
#if TRACE == 1
   printf("Factor\n");
#endif
   char buf[80];
   int16_t factor;
   if (Look == '(')
   {
      Match ('(');
      factor = Expression ();
      Match (')');
   }
   else
   {
      factor = GetNum();
   }
   return factor;
}
   

int16_t Term (void)
{
#if TRACE == 1
   printf ("Term\n");
#endif
   value = Factor();
   while ((Look == '*') || (Look == '/'))
   {
      int16_t k;
      switch (Look)
      {
         case '*':
            Match ('*');
            printf ("value = %d\n", value);
            k = Factor();
            printf ("value = %d, k = %d\n", value, k);
            value *= k;
            break;
         case '/':
            printf ("div\n");
            Match ('/');
            value /= Factor();
            break;
         default:
            // we can't get here
            break;
      }
   }
   return value;
}

gives the following:

Code:

Tiny Basic 65c02
> (3)*(4)
GetChar
Expression
Term
Factor
Match '('
GetChar
Expression
Term
Factor
GetNum
GetChar
Match ')'
GetChar
Match '*'
GetChar
value = 3
Factor
Match '('
GetChar
Expression
Term
Factor
GetNum
GetChar
Match ')'
GetChar
GetChar
value = 4, k = 4
Result = 16 
barnacle@barnacle-Latitude-7480:~/Documents/Tiny basic$ ./tiny
Tiny Basic 65c02
> 3*4
GetChar
Expression
Term
Factor
GetNum
GetChar
Match '*'
GetChar
value = 3
Factor
GetNum
GetChar
GetChar
value = 3, k = 4
Result = 12 

So for some reason, calling Factor() is messing around with value...

Neil

Top

Yuri

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Wed Oct 02, 2024 12:50 am

Joined: Tue Feb 28, 2023 11:39 pm
Posts: 257
Location: Texas

Doesn't look like it's from the section of code you posted.

Hard to say without the rest of the code to look at.

I see a buffer declared in your Factor() function, but it isn't being used.

What's the function look like for parsing a number? Is that maybe overrunning something and writing off in to the stack somewhere?

What does the Expression() function look like?

What is setting/advancing Look? (I assume that's a character?)

Top

Yuri

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Wed Oct 02, 2024 12:55 am

Joined: Tue Feb 28, 2023 11:39 pm
Posts: 257
Location: Texas

Another thought, are you trying to compile and run this on a 6502 or a more modern computer? Allocating an 80 byte buffer on the 6502's stack like that is a monumental amount of memory.

Thought 2: I don't see 'value' declared in either of these functions. Did you make this a global variable?

Top

barnacle

Post subject: Re: Crenshaw - Let's Build a Compiler

Posted: Wed Oct 02, 2024 5:32 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 983
Location: Potsdam, DE

Thanks Yuri, you nailed it with Thought 2... a missing definition for value, which should be local. Not at all sure where it was picking up a global, though, but there is another 'value' in Expression(), which is recursively called.

The start of term() should have been

Code:

int16_t Term (void)
{
#if TRACE == 1
   printf ("Term\n");
#endif
   uint16_t value = Factor();
   while ((Look == '*') || (Look == '/'))
   {
...

buffer[] is left over from some now excised code generation (and now excised itself). Translating from forty-year-old Pascal to C isn't always obvious.

I'll post all the code soon when I've got it happy and working; it still needs a lot of clearing up. But Dr Crenshaw was working to Pascal's features, which don't always sit well with C's... Look is a global char which holds the character currently being considered; GetChar both loads that char and advances to the next character. Match checks whether Look is the character it's supposed to be and complains if it isn't - there's an amount of error handling which I need to do differently from Dr C - he just crashed back to the OS on an error but I don't necessarily want to do that.

Expression is of the same form as Term but checks for addition/subtraction operations:

Code:

int16_t Expression (void)
{
   int16_t value;
   if (IsAddOp(Look))
   {
      value = 0;
   }
   else
   {
      value = Term();
   }
   while (IsAddOp(Look))
   {
      switch (Look)
      {
         case '+':
            Match ('+');
            value += Term();
            break;
         case '-':
            Match ('-');
            value -= Term();
            break;
         default:
            // we can't get here
            break;
      }
   }
   return value;
}

(Yes, I do use lots of strictly not-necessary braces; I tend to use the MISRA model (and have to, for work)).

Dr C's code is remarkably short and very clever due largely to his use of recursion. I like it, once I've got my head round it - the tricky parts are translation to C and keeping track of his 'replace references to xxx with yyy' - it's easy to miss one. The expression parser is three routines of perhaps a dozen active lines each, and a handful of helpers.

Neil

Top

Page 1 of 7

[ 103 posts ]

Go to page 1, 2, 3, 4, 5 ... 7 Next

Board index » 6502.org Users Forum » Programming

All times are UTC

Who is online

Users browsing this forum: No registered users and 10 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum