Sunday, September 25, 2011

Earley Parsing in Haskell

After a bit of a struggle I can run simple parsing examples using my Earley algorithm implementation [1] in Haskell. At this stage the implementation is likely to be flawed, with respect to correctness and likely performance. Regardless, the point I wanted to prove to myself is that it is perfectly plausible to define context-free grammars and their interpretation in plain Haskell, without resorting to code generation, or giving up full grammar analysis. I suspect it is a well-known method but it was a pleasure to discover. See for yourself:

{-# OPTIONS -XDoRec -XDeriveDataTypeable #-}
module Example where
import Text.Earley
import Data.Dynamic
import Control.Applicative
data E = Nat Int
| Add E E deriving (Show, Typeable)
grammar :: Grammar (Rule Char E)
grammar = do
nat <- rule "NAT"
[ fmap (\_ -> 0) (term '0')
, fmap (\_ -> 1) (term '1')
]
rec expr <- rule "EXPR"
[ fmap Nat $ var nat
, pure (\x _ y -> Add x y)
<*> var expr
<*> term '+'
<*> var expr
]
return expr
view raw CFGExample.hs hosted with ❤ by GitHub


Aycock and Horspool's paper [2] was my source for the algorithm. I admit I did not get very far, pretty much ignoring (for now) the bulk of their paper and the automaton construction, and focusing on their review of the basics. I also had to adjust things a bit to fit Haskell's purity, and devised a slightly modified (and possibly faulty) interaction of completer and predictor.

Earley's algorithm is beautiful. Very simple, fully general (all CFGs!), cubic in worst-case but close to linear on practical grammars, and, perhaps most importantly, incremental (think completion in an IDE). A highly recommended exercise.

[1] https://github.com/toyvo/haskell-earley
[2] http://webhome.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf