Wednesday, September 28, 2011

Transforming Large Unions in F#

Writing compilers is not really a niche activity. If you think about it, every programmer is doing precisely that: defining a specific language for a given domain and then explaining its semantics to the computer. F#, ML, and Haskell programmers know that union types are a life-saver for dealing with languages systematically. These types do, however, sometimes grow large and become less and less pleasant to work with.

If you have to deal with a 30-case recursive union type, or two or three mutually recursive union types of 20-30 cases each, consider writing a transform function: a shallow map over the immediate children. This simple addition lets your more complex transformations handle only the cases they really care about, leaving the boilerplate traversal to transform. The gist below provides an extended example.



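Here is a minimal sketch of the idea; the Expression type, its case names, and the getFreeIdentifiers function below are made up for illustration and are not WebSharper's actual definitions:

type Id = string

type Expression =
    | Constant of int
    | Identifier of Id
    | Application of Expression * Expression
    | Lambda of Id * Expression
    | Let of Id * Expression * Expression
    | IfThenElse of Expression * Expression * Expression

// transform: a shallow map over the immediate sub-expressions.
// It does not recurse on its own; callers decide how to recurse.
let transform (f: Expression -> Expression) (expr: Expression) =
    match expr with
    | Constant _ | Identifier _ -> expr
    | Application (x, y) -> Application (f x, f y)
    | Lambda (v, body) -> Lambda (v, f body)
    | Let (v, value, body) -> Let (v, f value, f body)
    | IfThenElse (c, t, e) -> IfThenElse (f c, f t, f e)

// A transformation that only pattern-matches the cases it cares about,
// delegating the boilerplate traversal to transform in the catch-all.
let rec getFreeIdentifiers (expr: Expression) : Set<Id> =
    match expr with
    | Identifier v -> Set.singleton v
    | Lambda (v, body) -> Set.remove v (getFreeIdentifiers body)
    | Let (v, value, body) ->
        Set.union (getFreeIdentifiers value)
                  (Set.remove v (getFreeIdentifiers body))
    | _ ->
        // Collect free identifiers from the immediate children.
        let free = ref Set.empty
        expr
        |> transform (fun child ->
            free := Set.union !free (getFreeIdentifiers child)
            child)
        |> ignore
        !free

Note that transform itself is not recursive: each transformation decides where and how to recurse, which is what keeps traversal order explicit.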
There is, of course, a downside to using such shortcuts. The compiler no longer forces you to re-read and mentally check all transformations when you change the Expression definition, for example by adding new cases. If you add LetRecursive, the code will still compile but will be broken because of the catch-all pattern in getFreeIdentifiers.

Some readers will note that these transformations can be thought of as cata-, ana-, para-, or some-such morphisms and can be treated directly as such. Others will perhaps suggest using reflection or meta-programming to derive transform, or even more complex transforms, automatically. As always, you should use what works for you.

As for me, I find that I am not yet comfortable with climbing the abstraction ladder much further. It seems to inevitably involve some loss of control. It helps a lot to have explicit control over doing top-down, bottom-up, or mixed traversals, as is evident in plain recursive function definitions (see the sketch below). The technique also scales easily to languages with an expression-statement distinction. Most importantly, my brain does not explode when I look at such code the way it does when dealing with monstrous morphisms.
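For instance, with such a transform in hand, choosing between a bottom-up and a top-down rewrite is just a matter of where the recursive call goes (again using the hypothetical Expression type from the sketch above):

// Bottom-up: rewrite the children first, then the resulting node.
let rec bottomUp (rewrite: Expression -> Expression) (expr: Expression) =
    rewrite (transform (bottomUp rewrite) expr)

// Top-down: rewrite the current node first, then recurse into its children.
let rec topDown (rewrite: Expression -> Expression) (expr: Expression) =
    transform (topDown rewrite) (rewrite expr)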

To summarize, factoring out the mapping over immediate sub-expressions helps a lot to reduce boilerplate in compiler code while keeping that code relatively easy to read and understand. The technique, although presented in F#, applies equally well to any language with algebraic datatypes.

Is all this practical? We have found it very useful in WebSharper, the F#->JavaScript platform my company is developing. There we have union types representing JavaScript syntax, a simplified optimisation-friendly JavaScript-like language we call Core, and F# itself, since, starting from version 2.4, we have had to use our own quotation type in an attempt to avoid reflection.

Tuesday, September 27, 2011

A 30-40% Faster and Leaner WebSharper 2.4 is Coming

We are currently finalizing the 2.4 release of WebSharper. This release is mainly concerned with bug fixes and quality-of-implementation improvements. The exciting part is that we are seeing a 30-35% reduction in generated code size and a 40% reduction in compilation time.

Getting there was rather tedious. One key refactoring was replacing System.Reflection with Mono.Cecil as the main workhorse for processing F# assemblies. To use Mono.Cecil, and also to avoid some of the nasty reflection bugs in F# itself, I had to rewrite quotation metadata parsing by peeking at the F# compiler sources. I also had to refactor most of the data structures to stop assuming the availability of System.Reflection.
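For a rough idea of what this looks like, here is a minimal sketch (not WebSharper's actual code) of inspecting an assembly with Mono.Cecil instead of System.Reflection:

open Mono.Cecil

// Read an assembly from disk with Mono.Cecil and enumerate its methods,
// without loading the assembly into the running process.
let listMethods (path: string) =
    let assembly = AssemblyDefinition.ReadAssembly(path)
    [ for ty in assembly.MainModule.Types do
        for m in ty.Methods do
            yield sprintf "%s.%s" ty.FullName m.Name ]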

To get better code generation, I revised our Core language for simplicity. Both Core and the associated optimizer are now simpler, and compiler optimizations are more focused, targeting the common use cases.

In particular, local tail-call elimination combined with uncurrying finally allows for automated unrolling of recursive functional code into JavaScript loops. The following code, for example, is now safe for stack use:


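A representative sketch (not necessarily the exact snippet from the original post): a tail-recursive accumulator loop that can now be compiled into a plain JavaScript loop.

// A tail-recursive accumulator loop: with local tail-call elimination
// and uncurrying, code written in this style compiles to a plain
// JavaScript loop instead of consuming stack frames.
let sum (xs: list<int>) =
    let rec loop acc xs =
        match xs with
        | [] -> acc
        | x :: rest -> loop (acc + x) rest
    loop 0 xs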

I am thankful to my colleagues for their effort in building the F# community site, FPish. Its codebase provided an invaluable resource for testing and optimizing the compiler on a large project.

I feel that I have learned a lot on this project. It turns out that reading about compilers is not quite the same as writing compilers! You only appreciate this truth after trying. I hardly discovered anything you might call a research contribution, but there are some practical little techniques specialized to F#/.NET that made my life easier and might help you too. Stay tuned for discussions of an approach to maintain and traverse large F# union types, considerations for designing data structures, the memoizing fixpoint pattern, large F# codebase organization ideas, and other things along those lines.

Monday, September 26, 2011

Firefox fails on nested closures at 16 layers

How deep do you think you can nest your closures in JavaScript? Yes, we know there is no tail recursion and probably no inlining either; we know we should not do this, but sometimes we still do. Before today I expected something like 64-128 levels of nesting to work out fine. In fact, Chrome chokes only at level 512 or so, due to stack overflow - fine.

EDIT: There is Bug #517781 reported 2009-09-20, and STILL NOT ADDRESSED. Please upvote it if you can.

But Firefox... at level 16! ReferenceError: x16 is not defined



This is on 6.0.2. Shame.

Sunday, September 25, 2011

Earley Parsing in Haskell

After a bit of a struggle I can run simple parsing examples using my Earley algorithm implementation [1] in Haskell. At this stage the implementation is likely flawed, both in correctness and in performance. Regardless, the point I wanted to prove to myself is that it is perfectly plausible to define context-free grammars and their interpretation in plain Haskell, without resorting to code generation or giving up full grammar analysis. I suspect it is a well-known method, but it was a pleasure to discover. See for yourself:



Aycock and Horspool's paper [2] was my source for the algorithm. I admit I did not get very far, pretty much ignoring (for now) the bulk of their paper and the automaton construction, and focusing on their review of the basics. I also had to adjust things a bit to fit Haskell's purity, and devised a slightly modified (and possibly faulty) interaction of completer and predictor.

Earley's algorithm is beautiful. It is very simple, fully general (all CFGs!), cubic in the worst case but close to linear on practical grammars, and, perhaps most importantly, incremental (think completion in an IDE). A highly recommended exercise.

[1] https://github.com/toyvo/haskell-earley
[2] http://webhome.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf