ha4: Combinatory Regular Expressions in Haskell

Thursday, July 14, 2011

Combinatory Regular Expressions in Haskell

Combinatory parsing fascinates me. It is an area that has open problems, but also a lot of solved problems that make great exercises. A while ago I tried to define a combinatory regular expression matcher in F#, and was puzzled at how complex it turned out to be. As usual, marginal understanding was to blame.

Now I turned to Haskell for a change. I do think it helped, to learn more both about the problem and about Haskell. I hope to backport this exercise to the ML world, in the meanwhile here is the gist:

10 comments:

AnonymousJuly 16, 2011 at 12:10 PM
That's funny, I was thinking on the same problem recently, and it developed into the regex-applicative package. https://github.com/feuerbach/regex-applicative

I will also evaluate your approach and let you know what I think.
ReplyDelete
Replies
AnonymousJuly 17, 2011 at 6:29 AM
I've taken a look at this. The problem here is that you don't "merge" the states back. Thus, the code suffers from a combinatorial explosion.

Example:
test n = run (compile $ length <$> (many $ string "a" <|> string "aa")) $ replicate n 'a'

Test> test 20
Just 20
(0.13 secs, 18891204 bytes)
Test> test 22
Just 22
(0.38 secs, 44580336 bytes)
Test> test 24
Just 24
(1.05 secs, 112208144 bytes)
Test> test 25
Just 25
(1.82 secs, 179842856 bytes)
Test> test 26
Just 26
(2.50 secs, 289945732 bytes)
Test> test 27
Just 27
(4.44 secs, 467622444 bytes)

(The code was compiled with -O2)
ReplyDelete
Replies
UnknownJuly 17, 2011 at 9:16 AM
Yes, another good test it fails is

https://gist.github.com/1087579
ReplyDelete
Replies
UnknownJuly 17, 2011 at 2:37 PM
Hi Roman,

I am still thinking about this problem and trying it out in both ML and Haskell.

I have also tried out your implementation in regex-applicative.

Unfortunately it looks like your implementation (as well as all my attempts so far) have exponential behavior on testTNFA benchmark in the above gist. I will be glad to be proven otherwise since I might be measuring it wrong.

The testTNFA benchmark comes from this article:

http://swtch.com/~rsc/regexp/regexp1.html

Is inevitable for our a combinatory implementation to have this exponential behavior on testTNFA I wonder..

Otherwise your work looks really cool!
ReplyDelete
Replies
AnonymousJuly 17, 2011 at 6:24 PM
Anton,

how did you measure it?

I ran that test and regex-applicative passed it. The details are here: https://github.com/feuerbach/regex-applicative/wiki/Exptest
ReplyDelete
Replies
UnknownJuly 18, 2011 at 2:24 PM
Hi!

The same test you link to with n=1000 takes minutes to compute on my machine in GHCI.. Is it because I am missing -O2 or something? Frankly I have no idea how to benchmark Haskell.

But you are absolutely right, not merging the states as I do in the gist above won't do at all. Now I am curious how you merge them, will read into your as soon as I get some time for that.

Thanks,

A
ReplyDelete
Replies
UnknownJuly 18, 2011 at 2:33 PM
I have GHC 6.12.3 btw and I had to relax the constraint on `base` to build regex-applicative, in case this is relevant..
ReplyDelete
Replies
UnknownJuly 19, 2011 at 8:35 AM
Silly me, the time must be polynomial not exponential so yes that warrants a pass.
ReplyDelete
Replies
AnonymousJuly 19, 2011 at 12:13 PM
Right. Ideally, it would be linear, and I'm working on that at the moment. Still, it's clearly not exponential.

If you have tested the package with some old packages, please let me know the exact versions and I'll relax the constraints in the next release.
ReplyDelete
Replies
UnknownJuly 19, 2011 at 6:41 PM
I have implemented state sharing, now instead of exponential I also get polynomial behavior. Micro-benchmarks seem only 10-100x slower than your regex-applicative, which makes me quite happy. Thanks!
ReplyDelete
Replies

Add comment