Crenshaw - Let's Build a Compiler

Programming the 6502 microprocessor and its relatives in assembly and other languages.
Post Reply
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Crenshaw - Let's Build a Compiler

Post by barnacle »

I've been playing with the code examples he gives. It's a little tricky because (a) he uses Pascal, which I don't, and (b) he builds for a 68000 while my eventual target will be a 65c02.

Nonetheless, my C version can so far correctly identify all that needs to be done for (one-digit number only!) basic arithmetic:

Code: Select all

> 1*((2+3)/(4-5))
Executing Crenshaw...
	Total = 1
	Push Total
	Total = 2
	Push Total
	Total = 3
	Total = (popped) TOS + Total
	Push Total
	Total = 4
	Push Total
	Total = 5
	Total = (popped) TOS - Total
	Total = (popped) TOS / Total
	Total = (popped) TOS * Total
Next steps: skip spaces, multi-digit numbers, and variables. In the first instance I'm aiming at a Tiny Basic.

Neil

p.s. note that those '=' are assignments, not comparison tests.
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by BigEd »

Seems like a good start! (Hope you don't mind, I thought it worth posting an index thread nearby)
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

A good thought, thanks.

Of course, the translation of these text instructions to actual 65c02 code is another question. At the moment, the basic takes a mere 86k of x86 code from C. I think it can be improved, as all it does is correctly insert text into the memory (and this).

Neil
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

A thought: how deep would you make a calculation stack? It's obviously dependent on the number of brackets in the expression you're trying to evaluate, but beyond that... for basic arithmetic, Crenshaw's method doesn't go beyond one level, absent brackets.

Code: Select all

> (17*43)/((344/3)-2)
Executing Crenshaw...
	Expr = 17
	Push Expr
0011 0000 0000 0000 0000 0000 0000 0000 
.    ^
	Expr = 43
	Expr = (popped) TOS * Expr
0011 0000 0000 0000 0000 0000 0000 0000 
^
	Push Expr
02db 0000 0000 0000 0000 0000 0000 0000 
.    ^
	Expr = 344
	Push Expr
02db 0158 0000 0000 0000 0000 0000 0000 
.    .    ^
	Expr = 3
	Expr = (popped) TOS / Expr
	Push Expr
02db 0072 0000 0000 0000 0000 0000 0000 
.    .    ^
	Expr = 2
	Expr = (popped) TOS - Expr
	Expr = (popped) TOS / Expr
Result = 6 
The stack pushes then increments, or decrements and pops. The caret points to the top of stack - i.e. next place to write. Values in hex.

I guesstimated twenty levels, implemented sixteen, and haven't been past three... (I'm still trying to prove to myself whether the stacked outputs can be interleaved with return addresses; I'm not at all sure yet. If they can, so much the better.)

Neil
User avatar
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by GARTHWILSON »

The "Enough stack space?" page of the 6502 stacks treatise at http://wilsonminesco.com/stacks/enoughstackspace.html might help; also the "Stacks potpourri" page, at http://wilsonminesco.com/stacks/potpourri.html .
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

Thanks Garth, I'd seen them before.

Code: Select all

> (1+(1+(1+(1+(1+(1+(1+(1+(1+(1+(1)))))))))))
Executing Crenshaw...
	Expr = 1
	Push Expr
0001 0000 0000 0000 0000 0000 0000 0000 
.    ^
	Expr = 1
	Push Expr
0001 0001 0000 0000 0000 0000 0000 0000 
.    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0000 0000 0000 0000 0000 
.    .    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0001 0000 0000 0000 0000 
.    .    .    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0001 0001 0000 0000 0000 
.    .    .    .    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0001 0001 0001 0000 0000 
.    .    .    .    .    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0001 0001 0001 0001 0000 
.    .    .    .    .    .    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    .    ^
	Expr = 1
	Push Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    .    .    ^
	Expr = 1
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    .    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
.    ^
	Expr = (popped) TOS + Expr
0001 0001 0001 0001 0001 0001 0001 0001 
^
Result = 11 
As you can see, this is just sticking an int (so two bytes) on the stack at each iteration (I only display the first eight words of the stack, but the pointer keeps going :) )

The same calculation without the brackets just uses the top position, as expected.

Code: Select all

(((((((((((1)))))))))))
doesn't use the stack at all.

Neil
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

But it looks like the program stack is, um, fifty-five deep at least, according to the trace :mrgreen:

Code: Select all

> (((((((((((1)))))))))))
Executing Crenshaw...
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
Match
GetChar
Expression
Term
Factor
GetNum
GetChar
	Expr = 1
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Match
GetChar
Result = 1 
User avatar
Yuri
Posts: 371
Joined: 28 Feb 2023
Location: Texas

Re: Crenshaw - Let's Build a Compiler

Post by Yuri »

Haven't see the Crenshaw stuff yet, though my original introduction to compilers was through the book:
"Writing Compilers and Interpreters"
Ronald Mak

(Second Edition)

Amazon Link

(There's a Third Edition, but he changes from C to Java for the host language)


In any event, doing the basic lexing for my stuff has usually followed the rough pseudo code of:

Code: Select all

lex(file)
  while not eof(file) do
    ch = readch(file);
    switch (ch)
      case is digit: parse_a_number(file);
      case is letter: parse_a_keyword_or_ident(file);
      case is quote: parse_a_string(file);
      default: parse_special(file);
    end
  end
end
Or something along those lines.

Then for basic parsing I've generally written a recursive decent parser. However, given the limited stack of a 6502 I'm contemplating if a shift-reduce parser might be easier to work with memory management wise. (Though considerably harder to write)
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

This tiny basic doesn't have anything as exciting as a file system, though it might get something in the future. It's simple enough that something very similar will apply as yours; the main thing I wanted from Crenshaw was the expression parser.

It treats input a line at a time: if the line starts with a positive number it stuffs it into memory; if not, and its a valid command, it tries to execute the command (e.g. run, list, print).

Once it's running, it executes lines - one instruction only per line - in order. I'm looking at only a few control flow constructs: for/to/next, goto, gosub, do/while, if/then. The usual tiny basic 26 variables and one string... we shall see how it goes; I've never written a basic before.

Neil
dmsc
Posts: 153
Joined: 17 Sep 2018

Re: Crenshaw - Let's Build a Compiler

Post by dmsc »

Hi!
Yuri wrote:
Then for basic parsing I've generally written a recursive decent parser. However, given the limited stack of a 6502 I'm contemplating if a shift-reduce parser might be easier to work with memory management wise. (Though considerably harder to write)
The FastBasic compiler uses a recursive-descent parser, but implemented as a PEG parser, so each character in the input is one token and it does not need a lexer. This makes the parser faster, and it uses less memory as you don't need to store the input tokens as you parse the source directly.

In FastBasic, each parser call uses 2 bytes of stack for the return address and 2 bytes for the parsing state (the input and output pointers), as it parses lines of up to 255 bytes generating up to 255 bytes of code maximum. With this constraints, it is very difficult to find sequences that overflow the stack, this is an example from the test-suite that gives an error, because the parser calls itself after each ','

Code: Select all

data y()=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,
53,54,55
Limiting stack usage also keeps the parsing fast, as you avoid the worst exponential behaviours.

To make the parser code smaller, the constructs are not written directly in assembler, the full PEG parser is stored as a state machine that is interpreted by the 6502 parser.

Have Fun!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

Damn, I broke something:

Code: Select all

 
3*4=12, but
(3)*(4)=16
Something in the Factor routine, it would seem, but I can't see it yet. :mrgreen:

Neil
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

Code: Select all

int16_t Factor (void)
{
#if TRACE == 1
	printf("Factor\n");
#endif
	char buf[80];
	int16_t factor;
	if (Look == '(')
	{
		Match ('(');
		factor = Expression ();
		Match (')');
	}
	else
	{
		factor = GetNum();
	}
	return factor;
}
	

int16_t Term (void)
{
#if TRACE == 1
	printf ("Term\n");
#endif
	value = Factor();
	while ((Look == '*') || (Look == '/'))
	{
		int16_t k;
		switch (Look)
		{
			case '*':
				Match ('*');
				printf ("value = %d\n", value);
				k = Factor();
				printf ("value = %d, k = %d\n", value, k);
				value *= k;
				break;
			case '/':
				printf ("div\n");
				Match ('/');
				value /= Factor();
				break;
			default:
				// we can't get here
				break;
		}
	}
	return value;
}
gives the following:

Code: Select all

Tiny Basic 65c02
> (3)*(4)
GetChar
Expression
Term
Factor
Match '('
GetChar
Expression
Term
Factor
GetNum
GetChar
Match ')'
GetChar
Match '*'
GetChar
value = 3
Factor
Match '('
GetChar
Expression
Term
Factor
GetNum
GetChar
Match ')'
GetChar
GetChar
value = 4, k = 4
Result = 16 
barnacle@barnacle-Latitude-7480:~/Documents/Tiny basic$ ./tiny
Tiny Basic 65c02
> 3*4
GetChar
Expression
Term
Factor
GetNum
GetChar
Match '*'
GetChar
value = 3
Factor
GetNum
GetChar
GetChar
value = 3, k = 4
Result = 12 
So for some reason, calling Factor() is messing around with value...

Neil
User avatar
Yuri
Posts: 371
Joined: 28 Feb 2023
Location: Texas

Re: Crenshaw - Let's Build a Compiler

Post by Yuri »

Doesn't look like it's from the section of code you posted.

Hard to say without the rest of the code to look at.

I see a buffer declared in your Factor() function, but it isn't being used.

What's the function look like for parsing a number? Is that maybe overrunning something and writing off in to the stack somewhere?

What does the Expression() function look like?


What is setting/advancing Look? (I assume that's a character?)
User avatar
Yuri
Posts: 371
Joined: 28 Feb 2023
Location: Texas

Re: Crenshaw - Let's Build a Compiler

Post by Yuri »

Another thought, are you trying to compile and run this on a 6502 or a more modern computer? Allocating an 80 byte buffer on the 6502's stack like that is a monumental amount of memory.

Thought 2: I don't see 'value' declared in either of these functions. Did you make this a global variable?
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Crenshaw - Let's Build a Compiler

Post by barnacle »

Thanks Yuri, you nailed it with Thought 2... a missing definition for value, which should be local. Not at all sure where it was picking up a global, though, but there is another 'value' in Expression(), which is recursively called.

The start of term() should have been

Code: Select all

int16_t Term (void)
{
#if TRACE == 1
	printf ("Term\n");
#endif
	uint16_t value = Factor();
	while ((Look == '*') || (Look == '/'))
	{
...
buffer[] is left over from some now excised code generation (and now excised itself). Translating from forty-year-old Pascal to C isn't always obvious.

I'll post all the code soon when I've got it happy and working; it still needs a lot of clearing up. But Dr Crenshaw was working to Pascal's features, which don't always sit well with C's... Look is a global char which holds the character currently being considered; GetChar both loads that char and advances to the next character. Match checks whether Look is the character it's supposed to be and complains if it isn't - there's an amount of error handling which I need to do differently from Dr C - he just crashed back to the OS on an error but I don't necessarily want to do that.

Expression is of the same form as Term but checks for addition/subtraction operations:

Code: Select all

int16_t Expression (void)
{
	int16_t value;
	if (IsAddOp(Look))
	{
		value = 0;
	}
	else
	{
		value = Term();
	}
	while (IsAddOp(Look))
	{
		switch (Look)
		{
			case '+':
				Match ('+');
				value += Term();
				break;
			case '-':
				Match ('-');
				value -= Term();
				break;
			default:
				// we can't get here
				break;
		}
	}
	return value;
}
(Yes, I do use lots of strictly not-necessary braces; I tend to use the MISRA model (and have to, for work)).

Dr C's code is remarkably short and very clever due largely to his use of recursion. I like it, once I've got my head round it - the tricky parts are translation to C and keeping track of his 'replace references to xxx with yyy' - it's easy to miss one. The expression parser is three routines of perhaps a dozen active lines each, and a handful of helpers.

Neil
Post Reply