Parsing polynomials with F# and FParsec
Parsing a polynomial expression is a simple but interesting example of parsing. We need to handle operator precedence and associativity, for example
* has a higher precedence than
+ and the exponentiation is right associative
x^y^z=x^(y^z) while subtraction is left associative
x-y-z=(x-y)-z. Moreover a straighforward definition of the grammar is left recursive and a recursive descent parser does not allow to parse such grammar. We need to arrange our grammar in order to avoid the left recursion.
In this post we will build a console app that:
- Ask the user for a polynomial expression such as
- Ask the user for the value of all the variables that appear in the polynomial.
- Print the value of the polynomial for the provided variables values.
The solution is very concise and simple thanks to the power of F# and the parser combinator library FParsec. Using simple and small building block we will build a recursive descent parser for our polynomial grammar. Let’s start with choosing a representation for the polynomials.
We use a discriminated union to define a type to represent a polynomial:
type Expression = | Add of Expression * Expression // x+y or z+1 | Const of double // 1 or 10.5 | Negative of Expression // -x | Pow of Expression * Expression // x^2 | Product of Expression * Expression // x*y | Subtract of Expression * Expression // x-1 or x-y | Variable of char // x or y
Note that the type is recursive since we want to represent complex polynomials such as
x*(y+1) which corresponds to
Product(Variable(x),Add(Variable(y),Const(1))). We could define a type for
Pow to have a better distinction between the base and the exponent, in this example the distinction is purely positional: the first
Expression is the base, the second one is the exponent.
In order to evaluate a polynomial we need:
- the polynomial itself which is an instance of Expression
- the values for all the variables in the polynomial which we choose to represent as a Map from
eval functions is straightforward to implement
let rec eval e (x:Map<char,float>) = match e with | Add (e1,e2) -> eval e1 x + eval e2 x | Const c -> c | Negative e1 -> - eval e1 x | Pow (b,exp) -> Math.Pow(eval b x,eval exp x) | Product (e1,e2) -> eval e1 x * eval e2 x | Subtract (e1,e2) -> eval e1 x - eval e2 x | Variable v -> x.[v]
We need a couple of helper functions:
- to list all the variable in an
- to ask the user of the console app the values for each one of the variables
let getVariables e = let rec impl e = match e with | Add (e1,e2) -> impl e1 + impl e2 | Const _ -> set  | Negative e1 -> impl e1 | Pow (b,exp) -> impl b + impl exp | Product (e1,e2) -> impl e1 + impl e2 | Subtract (e1,e2) -> impl e1 + impl e2 | Variable v -> set [v] (impl e) |> Seq.toList let askVariableValues l = let rec impl l (m:Map<char,float>) = match l with |  -> m | h::t -> printfn "Please enter value for %c" h let value = Console.ReadLine() let m' = m.Add(h,float value) impl t m' impl l (Map<_,_>([||]))
I present the grammar starting with the highest precedence productions, usually the grammars are presented in the opposite order but we will implement the functions in this order because we need to define a parser using only parsers we defined before the current one. The only exception is the Primary production which needs to use the Expression production, FParsec allow us to handle this pretty easily we’ll see how.
Variable = [a-zA-Z] // corresponds to letter parser Number = [0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)? // pfloat Primary = Variable | Number | '(' Expression ')' Unary = '-' Primary | Primary Power = Unary ('^' Unary)* Multiplication = Power ( ('*'|'/') Power )* Addition = Multiplication ( ('+'|'-') Multiplication)* Expression = Addition
Let’s start with analyzing the type of our parser, we start with a string such as
x-1 and we want an istance of
Expression. We cannot be sure that the input string will be a correct polynomial expression, for example we could feed into the parser
++ which is not a polynomial. A possible type for our parser is
string -> Expression option. This is too far away from the type that FParsec uses for the parser which is
type Parser<'TResult, 'TUserState> = CharStream<'TUserState> -> Reply<'TResult>
Parser type from FParsec is more sofisticated than our candidate. It supports a user state which allows to implement context sensitive parsers but it’s beyond the scope of the post and not needed to parse the polynomial grammar. The
The CharStream class provides a unified interface for efficiently reading UTF‐16 chars from a binary stream or an in‐memory char buffer (e.g. a string). It is optimized for the use in backtracking parser applications and supports arbitrary char‐based seeking, even for streams larger than the addressable memory (on 32‐bit platforms).
as stated in the official docs. The
Reply contains a status indicating success or failure, a possible null error and a result in case of success.
expression parser we want will have the type
Parser<Expression, unit> indicating that it will produce an instance of
Expression on success and that the user state is unused.
The first two productions are implemented with predefined parsers found in FParsec
let pvariable = letter |>> Variable let pconst = pfloat |>> Const
letter parser, parse a letter as the name suggest and return the letter itself which is feed into the
Variable constructor. The
pfloat parser works in the exacts same way but it returns a
|>> operator has the following signature
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
It takes a parser that produces an
'a on success and applies the function
('a -> 'b) to the result effectively producing a new parser of type
Parser<'b,'u>. It is lifting a function from the types
'a,'b to the ‘Parser’ types.
To define Primary we need a way to handle the reference to a parser which is not yet defined. FParsec provides a createParserForwardedToRef function
let expression, expressionRef = createParserForwardedToRef()
expression forwards all the call to
expressionRef which in turns is a mutable reference that we correctly fill later in the development. We are ready to define the primary parser using two helper functions and a couple of combinators
let pop = skipChar '(' let pcp = skipChar ')' let pprimary = choice [ pvariable; pconst; between pop pcp expression ]
skipChar parser matches a
char and ignore the value. We don’t need the ‘(‘ or the ‘)’ we just need to know if we encounter one. Then we use two different combinators
between. The first one tries to apply, in order, any of the parsers in the supplied list. The second one is for parsing in sequence
pcp and keeping the result of
The Unary production is
let punary = choice [ skipChar '-' >>. pprimary |>> Negative ; pprimary]
We are building with the already known parsers and combinators except for
>>. which parse in sequence its left operand and its right operand and keeps the result of the right one. The Power production is slightly more complicated but FParsec has already a combinator we can use to build our parser, the
val chainr1: Parser<'a,'u> -> Parser<('a -> 'a -> 'a),'u> -> Parser<'a,'u>
exactly what we need, quoting the documentation
The parser chainr1 p op parses one or more occurrences of p separated by op (in EBNF: p (op p)*)
Let’s recall the Power production rule
Power = Unary ('^' Unary)*
This means that:
pin the EBNF definition.
- we need the
opin the EBNF definition.
We know its type:
Parser<Expression -> Expression -> Expression, unit>, we know we need to parse an ‘^’ character and we have the
Pow constructor of
Expression of type
Expression * Expression -> Expression. FParsec has several parser for chars we choose the
skipChar which match a char and discard the result, since again we need to know that we found an ‘^’ but we don’t need it. Our
skipChar ‘^’ has type
Parser<unit>, how could we obtain the desired type? With another combinator! The operator
val (>>%): Parser<'a,'u> -> 'b -> Parser<'b,'u>
The parser p »% x applies the parser p and returns the result x. docs
And we define
let curry f = fun x -> fun y -> f(x,y) let powerOp = skipChar '^' >>% (curry Pow) let pexpr = chainr1 punary powerOp
It’s important to note that we choose
chainr1 which is right associative. Similar reasoning allows us to define all the other production in the grammar. Only the last one needs a special treatment since it is
expressionRef to addition we know that all calls to expression are going to be forwared to
paddition as we want. The full code listing (only 83 lines!) is here. What do you think? Reach me out if you want and let me know!