Parsing polynomials with the Pratt algorithm
polynomials-pratt-algorithm
Using a Pratt parser I aim to parse expressions like this x^2+y^2-1, x=1, y=1 and xy, x=2, y=3. Parsing mathematical expressions it's not hard but it already contains some interesting behaviour such as associativity between operators x+y*z is equal to x+(y*z) and not (x+y)*z. For no particular reason I decided to put the variable assignments after the polynomial. The expression x^2+y^2-1, x=1, y=1 is to be interpreted as you would with this pseudo code
x=1
y=1
print (x*2+y^2+1)
For the full source code see here
The Pratt algorithm allows to create a "general purpose parser" in which is possible to plug in all the parser rules needed for your language or grammar. In my implementation I decided to build a parsed with fixed parsing rules by adding them into the constructors. But as you can see in the Java example (see references) it's easy to create a generic parser and plug the rules by subclassing it.
The important stuff happens in the ParseExpression method. The first time it is called with precedence = 0 and let's see how it works with the polynomial x+y*z:
- advance to the next token, which is
x - retrieve the corresponding prefix denotation
VarDenotation. The prefix denotation cannot be null if the polynomial expressions is correct since we are at the beginning of the expression so the only valid tokens are exactly the ones associated with a prefix denotation, for example a polynomial cannot start with a '*'. - use the denotation to parse the token,
VarDenotationreturn aVariableExpr. Remember that a denotation can call ParseExpression, so here we have a kind of recursion. - we check if the next token
+as an higher precedence with respect of the current precedence parameters. Since the precedence of theBinaryOperatorDenotationof+is1we enter the while - advance to
+ - get the infix denotation
- use the denotation to parse the token. The
parsemethod on an infix denotation needs the previous expression, the token and the parser to callParseExpressionagain. It's easy to see the in order to parse anAddExprwe need the expression before the+and the expression after. - the
BinaryOperatorDenotationwill call theParseExpressionmethod withprecedence=1 - advance to
y - retrieve
VarDenotation - parse
yto aVariableExpr PeekPrecedence()will return the precedence related to theBinaryOperatorDenotationof '*' which is2so we enter again the while again. This reflects the fact that*has a higher precedence of+and entering the while means that theVariableExprofywill be a parameter of the*instead of+as one would expect.- advance to
* - get the infix denotation
- the parse of the
*call parse withprecedence=2. - (we're almost over) advance to
z - parse the
VarDenotation PeekPrecedence()returns0since we are at the end of the expression- we skip the while and returns the
zparselect - we're back in the binary operator related to
*which returns theProductExprrepresentingy*z - we're back in the
ParseExpressionwithprecedence=1 - we're back in the binary operator related to
+which returns the finalAddExprrepresentingx+(y*z) - we're back in the
ParseExpressionwithprecedence=0and we return the result.
public IExpr ParseExpression(int precedence = 0)
{
var token = Advance();
var prefix = GetPrefixDenotation(token.Type);
if (prefix is null) throw new ParseError($"Could not parse \"{token.Lexeme}\" at column {token.Column}");
var left = prefix.Parse(this, token);
while (precedence < PeekPrecedence())
{
token = Advance();
var infix = GetInfixDenotation(token.Type);
left = infix.Parse(this, left, token);
}
return left;
}
References
I came across the Pratt parsing algorithm while reading https://craftinginterpreters.com/, which is a great resource on interpreters and compilers. From the same author, a java implementation of a Pratt parser http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/.
More formal resources are the paper from Pratt in which he explains the algorithm for the first time https://dl.acm.org/doi/10.1145/512927.512931 and a thesis which is more approachable http://vandevanter.net/mlvdv/publications/a-formalization-and-proof-o.html