ha4: Earley Parsing in Haskell

After a bit of a struggle I can run simple parsing examples using my Earley algorithm implementation [1] in Haskell. At this stage the implementation is likely to be flawed, with respect to correctness and likely performance. Regardless, the point I wanted to prove to myself is that it is perfectly plausible to define context-free grammars and their interpretation in plain Haskell, without resorting to code generation, or giving up full grammar analysis. I suspect it is a well-known method but it was a pleasure to discover. See for yourself:

	{-# OPTIONS -XDoRec -XDeriveDataTypeable #-}

	module Example where

	import Text.Earley
	import Data.Dynamic
	import Control.Applicative

	data E = Nat Int
	\| Add E E deriving (Show, Typeable)

	grammar :: Grammar (Rule Char E)
	grammar = do
	nat <- rule "NAT"
	[ fmap (\_ -> 0) (term '0')
	, fmap (\_ -> 1) (term '1')
	]
	rec expr <- rule "EXPR"
	[ fmap Nat $ var nat
	, pure (\x _ y -> Add x y)
	<*> var expr
	<*> term '+'
	<*> var expr
	]
	return expr

view raw CFGExample.hs hosted with ❤ by GitHub

Aycock and Horspool's paper [2] was my source for the algorithm. I admit I did not get very far, pretty much ignoring (for now) the bulk of their paper and the automaton construction, and focusing on their review of the basics. I also had to adjust things a bit to fit Haskell's purity, and devised a slightly modified (and possibly faulty) interaction of completer and predictor.

Earley's algorithm is beautiful. Very simple, fully general (all CFGs!), cubic in worst-case but close to linear on practical grammars, and, perhaps most importantly, incremental (think completion in an IDE). A highly recommended exercise.

[1] https://github.com/toyvo/haskell-earley
[2] http://webhome.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf

Sunday, September 25, 2011

Earley Parsing in Haskell