Parsing polynomials with the Pratt algorithm
polynomials-pratt-algorithm
Using a Pratt parser I aim to parse expressions like this x^2+y^2-1, x=1, y=1
and xy, x=2, y=3
. Parsing mathematical expressions it's not hard but it already contains some interesting behaviour such as associativity between operators x+y*z
is equal to x+(y*z)
and not (x+y)*z
. For no particular reason I decided to put the variable assignments after the polynomial. The expression x^2+y^2-1, x=1, y=1
is to be interpreted as you would with this pseudo code
x=1
y=1
print (x*2+y^2+1)
For the full source code see here
The Pratt algorithm allows to create a "general purpose parser" in which is possible to plug in all the parser rules needed for your language or grammar. In my implementation I decided to build a parsed with fixed parsing rules by adding them into the constructors. But as you can see in the Java example (see references) it's easy to create a generic parser and plug the rules by subclassing it.
The important stuff happens in the ParseExpression method. The first time it is called with precedence = 0
and let's see how it works with the polynomial x+y*z
:
- advance to the next token, which is
x
- retrieve the corresponding prefix denotation
VarDenotation
. The prefix denotation cannot be null if the polynomial expressions is correct since we are at the beginning of the expression so the only valid tokens are exactly the ones associated with a prefix denotation, for example a polynomial cannot start with a '*'. - use the denotation to parse the token,
VarDenotation
return aVariableExpr
. Remember that a denotation can call ParseExpression, so here we have a kind of recursion. - we check if the next token
+
as an higher precedence with respect of the current precedence parameters. Since the precedence of theBinaryOperatorDenotation
of+
is1
we enter the while - advance to
+
- get the infix denotation
- use the denotation to parse the token. The
parse
method on an infix denotation needs the previous expression, the token and the parser to callParseExpression
again. It's easy to see the in order to parse anAddExpr
we need the expression before the+
and the expression after. - the
BinaryOperatorDenotation
will call theParseExpression
method withprecedence=1
- advance to
y
- retrieve
VarDenotation
- parse
y
to aVariableExpr
PeekPrecedence()
will return the precedence related to theBinaryOperatorDenotation
of '*' which is2
so we enter again the while again. This reflects the fact that*
has a higher precedence of+
and entering the while means that theVariableExpr
ofy
will be a parameter of the*
instead of+
as one would expect.- advance to
*
- get the infix denotation
- the parse of the
*
call parse withprecedence=2
. - (we're almost over) advance to
z
- parse the
VarDenotation
PeekPrecedence()
returns0
since we are at the end of the expression- we skip the while and returns the
z
parselect - we're back in the binary operator related to
*
which returns theProductExpr
representingy*z
- we're back in the
ParseExpression
withprecedence=1
- we're back in the binary operator related to
+
which returns the finalAddExpr
representingx+(y*z)
- we're back in the
ParseExpression
withprecedence=0
and we return the result.
public IExpr ParseExpression(int precedence = 0)
{
var token = Advance();
var prefix = GetPrefixDenotation(token.Type);
if (prefix is null) throw new ParseError($"Could not parse \"{token.Lexeme}\" at column {token.Column}");
var left = prefix.Parse(this, token);
while (precedence < PeekPrecedence())
{
token = Advance();
var infix = GetInfixDenotation(token.Type);
left = infix.Parse(this, left, token);
}
return left;
}
References
I came across the Pratt parsing algorithm while reading https://craftinginterpreters.com/, which is a great resource on interpreters and compilers. From the same author, a java implementation of a Pratt parser http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/.
More formal resources are the paper from Pratt in which he explains the algorithm for the first time https://dl.acm.org/doi/10.1145/512927.512931 and a thesis which is more approachable http://vandevanter.net/mlvdv/publications/a-formalization-and-proof-o.html