BigDumbDinosaur wrote:
I’m quite fuzzy on how nested parentheses get evaluated, which is kind of key to a lot of language implementations...except, maybe, Forth.
Took a long while for me to wrap my head around that concept with compilers/languages as well. I'll take a stab at an explanation though:
With the compiler/interpreter that Crenshaw is describing is recursive decent parser, which is just a kinda fancy way of saying that you're relying on the nature of a high level language's way to effectively have a new local state with each recursive call to a function.
The magic of recursive decent is that it works backwards through the grammar if you will. That is to say, we start with a call with the thing with the least precedence and work our way down to the thing with the most precedence; effectively allowing the high precedence thing to take a stab at parsing what we need first.
I'll limit this to just +, * and () for the sake of simplicity, but we start of basically with this:
Code:
int parseAdditive(){ return 0; }
Now the parseAdditive() function doesn't really know what token the lexer just found, nor does it really need to care at this point. What it does know, is that it needs a left hand part of an additive expression, and that left hand part needs to follow the rules for multiplicative, so it asks the parseMultiplicative() to figure out what that left hand side is:
Code:
int parseMultiplicative() { return 0; }
int parseAdditive()
{
return parseMultiplicative(); // Don't know what's there yet, but we'll make the higher precedence figure that out.
}
At this point parseMultiplicative() also has no clue what's going on really, but it knows that it
also needs a left hand side, so it asks parseFactor() to give it an item to work with:
Code:
int parseFactor() { return 0; }
int parseMultiplicative()
{
return parseFactor(); // Hey I need a left hand side please, thanks!
}
int parseAdditive()
{
return parseMultiplicative(); // Don't know....
}
Now at this point parseFactor() kind has no one else to go to but the lexer itself, so it looks at the token to figure out what to do:
Code:
int parseFactor()
{
int rval = 0;
switch (currentToken)
{
case IDENT: // We have a variable, look it up! That's our left hand side!
rval = getSymValue(tokenLiteral);
accept(IDENT);
break;
case NUMBER: // We have a number, this is our left hand side!
rval = atoi(tokenLiteral);
accept(NUMBER);
break;
case '(': // Oh hey, it's a paren, the human is telling me there's a nested thing here
accept('('); // Eat the '(', we don't really need it any more.
rval = parseAdditive(); // Recursively call the chain
expect(')'); // Make sure there is a closing ')' or we give out a syntax error
break;
default:
reject(token); // This isn't an expected token, report an error.
break;
}
return rval;
}
The magic happens in that parseFactor() and the fact that on each entrance into the recursive call to parseAdditive(), we'll get a fresh new state on the stack for those calls that holds the state of what we're parsing at that particular moment. If you follow this hopefully you can see that it doesn't matter how many nested parens you have, it will still work out. (Stack overflow potentials not withstanding)
To complete the other two functions, they roughly work out as:
Code:
int parseMultiplicative()
{
int result = parseFactor();
while (currentToken == '*')
{
accept('*'); // Eat the '*' we don't really need it any more
int rightSide = parseFactor();
result *= rightSide;
}
return result;
}
int parseAdditive()
{
int result = parseFactor();
while (currentToken == '+')
{
accept('+'); // Eat the '+' we don't really need it any more
int rightSide = parseMultiplicative();
result += rightSide;
}
return result;
}
To
really simplify this problem consider this:
Code:
int parseFactor()
{
int rval = 0;
switch (current)
{
case NUMBER:
rval = atoi(tokenLiteral);
accept(NUMBER);
break;
case '(':
accept('(');
rval = parseFactor(); // Recursively call ourselves
expect(')');
break;
default:
reject(current); // Syntax error
break;
}
return rval;
}
In the above we've really simplified our grammar down to just accepting a number with any amount of parens surrounding it.