This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{-# OPTIONS -XDoRec -XDeriveDataTypeable #-} | |
module Example where | |
import Text.Earley | |
import Data.Dynamic | |
import Control.Applicative | |
data E = Nat Int | |
| Add E E deriving (Show, Typeable) | |
grammar :: Grammar (Rule Char E) | |
grammar = do | |
nat <- rule "NAT" | |
[ fmap (\_ -> 0) (term '0') | |
, fmap (\_ -> 1) (term '1') | |
] | |
rec expr <- rule "EXPR" | |
[ fmap Nat $ var nat | |
, pure (\x _ y -> Add x y) | |
<*> var expr | |
<*> term '+' | |
<*> var expr | |
] | |
return expr |
Aycock and Horspool's paper [2] was my source for the algorithm. I admit I did not get very far, pretty much ignoring (for now) the bulk of their paper and the automaton construction, and focusing on their review of the basics. I also had to adjust things a bit to fit Haskell's purity, and devised a slightly modified (and possibly faulty) interaction of completer and predictor.
Earley's algorithm is beautiful. Very simple, fully general (all CFGs!), cubic in worst-case but close to linear on practical grammars, and, perhaps most importantly, incremental (think completion in an IDE). A highly recommended exercise.
[1] https://github.com/toyvo/haskell-earley
[2] http://webhome.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf