Friday, December 30, 2011

Solving F# Async.StartChild Leak: Futures

I discovered a memory leak in Async.StartChild; here I discuss a workaround based on a Future abstraction.

I noticed what appears to be a memory leak in the F# standard library function Async.StartChild. This happened in the context of a socket server, where I attempted to perform socket reading and writing in parallel. Memory use slowly grows, and the memory profiler points to some CancellationTokenSource-related objects not being released.

As a non-leaking alternative, I used my own abstractions. The basic idea is to use synchronizable events. Unfortunately, Event is already used in F# to mean something different, so I will use the word Future instead. If you know F# events, the key problem is that subscribing to events after they happen is meaningless; for example, this code produces nothing:

let test () =
    let e = Event<_>()
    e.Trigger(1)
    e.Publish.Add(printf "%i")

In contrast, Future objects retain the value. For simplicity, I allow subscribing and triggering only once. In addition, the sample includes Executor code. I found by experimentation that running short-lived coordination tasks on a single thread instead of the ThreadPool is beneficial. Enjoy:

(*
Copyright (c) 2008-2011 IntelliFactory
GNU Affero General Public License Usage The code
is free software: you can redistribute it and/or
modify it under the terms of the GNU Affero
General Public License, version 3, as published by
the Free Software Foundation.
The code is distributed in the hope that it will
be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero
General Public License for more details at
<http://www.gnu.org/licenses/>.
If you are unsure which license is appropriate for
your use, please contact IntelliFactory at
<http://intellifactory.com/contact>.
See this blog for the discussion:
<http://tinyurl.com/fsharp-futures>
*)
#if INTERACTIVE
#else
namespace IntelliFactory.Examples
#endif
open System
open System.Threading
open System.Threading.Tasks
open System.Collections.Concurrent
type private FutureState<'T> =
| Computed of 'T
| Created
| Finalized
| Waiting of ('T -> unit)
[<Sealed>]
type Future<'T>() =
let root = obj ()
let transact f = lock root f ()
let mutable state : FutureState<'T> = Created
let await f =
transact <| fun () ->
match state with
| Computed x -> state <- Finalized; (fun () -> f x)
| Created -> state <- Waiting f; ignore
| Finalized -> invalidOp "Future is finalized."
| Waiting f -> invalidOp "Future is already waited on."
let provide value =
transact <| fun () ->
match state with
| Computed x -> invalidOp "Future is already provided."
| Created -> state <- Computed value; ignore
| Finalized -> invalidOp "Future is finalized."
| Waiting f -> state <- Finalized; (fun () -> f value)
let event = Async.FromContinuations(fun (k, _, _) -> await k)
member this.Await = event
member this.Provide(value) = provide value
[<Sealed>]
type Executor(?maxTaskCount, ?logError) =
let logError = defaultArg logError ignore
let mailbox =
let n = defaultArg maxTaskCount 128
new BlockingCollection<_>(ConcurrentQueue(), n)
let work () =
let mutable loop = true
while loop do
match mailbox.Take() with
| None -> loop <- false
| Some exec -> try exec () with e -> logError e
let task =
Task.Factory.StartNew(work,
TaskCreationOptions.LongRunning)
member this.Dispose() =
mailbox.Add(None)
mailbox.CompleteAdding()
task.Wait()
task.Dispose()
mailbox.Dispose()
member this.Fork(job: Async<'T>) =
let f = Future()
let work () =
Async.StartWithContinuations(job,
f.Provide, logError, logError)
this.Schedule(work)
f.Await
member this.Schedule(task) = mailbox.Add(Some task)
member this.TaskCount = mailbox.Count
interface IDisposable with
member this.Dispose() = this.Dispose()
#if INTERACTIVE
let test () =
use e = new Executor()
let task =
async {
let read = e.Fork(async { return stdin.ReadLine() })
do stdout.WriteLine("Waiting for input..")
return! read
}
Async.RunSynchronously task
#endif
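
For the socket scenario that motivated this, usage looks roughly like the following sketch, where readLoop and writeLoop are placeholders for the real socket operations and Fork plays the role that Async.StartChild used to play:

// Sketch: readLoop and writeLoop stand in for real socket I/O.
let serve (executor: Executor) (readLoop: Async<unit>) (writeLoop: Async<unit>) =
    async {
        // Fork starts the write loop immediately and returns an awaitable,
        // much like Async.StartChild, but without the leak I observed.
        let awaitWrites = executor.Fork(writeLoop)
        do! readLoop
        return! awaitWrites
    }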

Symbol Interning in WebSharper 2.4

When WebSharper compiles F# to JavaScript, it preserves namespace, module, and class nesting to make it easy to navigate the compiled code from a JavaScript shell. Roughly speaking, an identifier A.B.C.D.E in F# can be found by typing A.B.C.D.E in JavaScript.

This poses a challenge: as you can imagine, emitting long qualified identifiers everywhere is not a good idea for compact code generation. To save space WebSharper 2.4 does class/module interning. The basic idea is to say L=Microsoft.FSharp.Core.ListModule once and then say L.ofArray at every use site.

An example of this in action can be seen below:

namespace Website
module Client =
open IntelliFactory.WebSharper
open IntelliFactory.WebSharper.Html
[<JavaScript>]
let SayHello () =
JavaScript.Alert("HELLO!")
[<JavaScript>]
let MakeHelloDiv () =
SayHello ()
Div [ Text "HELLO THERE" ]
module Controls =
open IntelliFactory.WebSharper
type HelloControl() =
inherit Web.Control()
[<JavaScript>]
override this.Body = Client.MakeHelloDiv() :> _
(function()
{
var Global=this,Runtime=this.IntelliFactory.Runtime,Website,Client,WebSharper,Html,Default,List,alert;
Runtime.Define(Global,{
Website:{
Client:{
MakeHelloDiv:function()
{
Client.SayHello();
return Default.Div(List.ofArray([Default.Text("HELLO THERE")]));
},
SayHello:function()
{
return alert("HELLO!");
}
},
Controls:{
HelloControl:Runtime.Class({
get_Body:function()
{
return Client.MakeHelloDiv();
}
})
}
}
});
Runtime.OnInit(function()
{
Website=Runtime.Safe(Global.Website);
Client=Runtime.Safe(Website.Client);
WebSharper=Runtime.Safe(Global.IntelliFactory.WebSharper);
Html=Runtime.Safe(WebSharper.Html);
Default=Runtime.Safe(Html.Default);
List=Runtime.Safe(WebSharper.List);
return alert=Runtime.Safe(Global.alert);
});
Runtime.OnLoad(function()
{
});
}());
(function(){var $$=this,$=this.IntelliFactory.Runtime,a,b,c,d,e,f,g;$.Define($$,{Website:{Client:{MakeHelloDiv:function(){b.SayHello();return e.Div(f.ofArray([e.Text("HELLO THERE")]));},SayHello:function(){return g("HELLO!");}},Controls:{HelloControl:$.Class({get_Body:function(){return b.MakeHelloDiv();}})}}});$.OnInit(function(){a=$.Safe($$.Website);b=$.Safe(a.Client);c=$.Safe($$.IntelliFactory.WebSharper);d=$.Safe(c.Html);e=$.Safe(d.Default);f=$.Safe(c.List);return g=$.Safe($$.alert);});$.OnLoad(function(){});}());

Thursday, December 29, 2011

F# Frustration: More Async Memory Leaks

Can anyone explain why this code leaks memory?

let leak1 () =
    let thread () =
        async {
            let x = Array.create 1024 0uy
            return ()
        }
        |> Async.Start
    async {
        while true do
            do thread ()
    }
    |> Async.RunSynchronously

let leak2 () =
    let a1 () = async.Return()
    let a2 () =
        async {
            let! complete = Async.StartChild(a1 ())
            return! complete
        }
        |> Async.Start
    let rec loop (k: int) =
        if k > 0 then
            do a2 ()
            loop (k - 1)
    loop 100000
    stdout.Write("PRESS ANY KEY") // >200 Mb memory use, retained
    System.Console.ReadLine()
    |> ignore

It looks like using Async.StartAsTask makes it complete in constant space.
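
For reference, a minimal sketch of that variant (assuming the same a1 as above; Async.AwaitTask ships with newer FSharp.Core, and on older versions a small adapter over Task.ContinueWith is needed):

let a1 () = async.Return()

// StartAsTask-based variant of a2: start the child as a Task and await it.
let a2' () =
    async {
        let task = Async.StartAsTask(a1 ())
        return! Async.AwaitTask task
    }
    |> Async.Start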

Friday, December 23, 2011

Hacking Type Classes in F#

A recent FPish discussion focused on some hacks available in F# to write code that resembles using Haskell type classes. I particularly enjoyed the comments by Gustavo Leon and Loic Denuziere.

To cut a long story short, before compiling to .NET, F# expands methods declared inline and does overload resolution. This was intended to support flexible operator overloading, but it opens the door for interesting hacks. Even code that generalizes over higher kinds, and therefore cannot exist at the .NET level, can with these hacks exist at the inline F# level.

Although these remain hacks (ugly code, unreadable inferred types, fragility to changes when type inference stops working, difficulty in scaling to more complicated examples), I cannot help but enjoy them. Here is a writeup that infers JSON serialization automatically over lists at the inline level:

type Json =
| Array of list<Json>
| Double of double
| False
| Integer of int64
| Null
| Object of list<string*Json>
| String of string
| True
static member inline ToJson x : Json =
((^a or ^b) :
(static member ToJson : ^a * ^b -> Json) (Null, x))
static member ToJson(_: Json, x: int) = Integer (int64 x)
static member ToJson(_: Json, x: string) = String x
static member inline ToJson(_: Json, x: list<_>) =
Array (List.map Json.ToJson x)
static member inline FromJson(x: Json) =
((^a or ^b) : (static member FromJson : ^a * ^b -> ^a)
(Unchecked.defaultof<_>, x))
static member FromJson(_: int, x: Json) =
match x with
| Integer x -> int x
| _ -> invalidArg "x" "Conversion failed."
static member FromJson(_: string, x: Json) =
match x with
| String x -> x
| _ -> invalidArg "x" "Conversion failed."
static member inline FromJson(_: list<_>, x: Json) =
match x with
| Array xs -> [for x in xs -> Json.FromJson x]
| _ -> invalidArg "x" "Conversion failed."
type Person =
{
age: int
name: string
}
static member ToJson(_: Json, x: Person) =
Object [
"age", Integer (int64 x.age);
"name", String x.name
]
static member FromJson(_:Person, j: Json) =
match j with
| Object ["age", Integer age; "name", String name] ->
{ age = int age; name = name }
| _ ->
failwith "Conversion failed."
let test =
let p = {name="Vladimir Putin"; age=59}
let r : list<Person> = Json.FromJson(Json.ToJson [p; p; p])
r

Thursday, December 15, 2011

Making Async 5x Faster

In this article I discuss why F# Async is a good thing for writing concurrent software on .NET and show how to implement your own Async specialized for low-concurrency use. As a sample application, I look at a simple CML-style blocking channel. 30-50 lines of custom async and thread pool implementation increase the throughput from 100-400K to 1M messages a second.

Concurrency? Again?

It is hard to believe that after so many man-years of computer science research anyone would still have questions about how to write concurrent software. But I do, if only out of ignorance.

My specific problem right now is writing a decently scalable web server. It is really nothing more than developing a sensible bridge API between a web application and the OS, with a bit of parsing thrown in. All the heavy lifting is already done by IO completion ports, available to .NET code through the System.Net.Sockets.Socket and SocketAsyncEventArgs types.

Blocking and Non-Blocking IO

In a web server, there are two flows of data: one flow comes from the client via the server to the application, and the other goes in the reverse direction. The imperative programming answer to this is the System.IO.Stream object, which your app uses to read and write. It is easy to use, but there are problems.

The first obvious problem is that writes cannot really be allowed to block. If you attempt to write some data, the client socket may not be ready. In fact, it may not become ready for a while. Making the thread wait for it would be lovely, but .NET threads are expensive. Not a good idea if one is to cope with thousands of concurrent clients.

Fine, how about non-blocking IO? It seems to solve the problem, as application code resumes immediately. But what then, where does the data go? We need a buffer of unbounded size somewhere to hold the data. What happens here is that the application receives no feedback when it produces data too quickly. A pathological case would be generating an 8 GB file.

Reading from a System.IO.Stream poses even more problems. What if the data is not there yet? Blocking wastes a thread; not blocking makes the application retry, which requires programming arbitrary timeouts or busy-waits. In light of this it seems much more reasonable to invert control and have the server push data to the application as it arrives.

First-Class Continuations

From now on let us consider designing a writer that does not block but also does not require an unbounded buffer, throttling the data producer if the consumer does not keep up. This is an instance of the classic tradeoff between asynchronous and synchronous channels.

Ideally the language would support efficient first-class continuations. Then one would reify the continuation and shove it somewhere, to resume computation later when the client is ready:

(call-with-current-continuation save-for-later)

This, as far as I understand, is indeed how John Reppy's Concurrent ML was originally implemented. CML first-class channels are unbuffered: you can read and write on them, but both writers and readers block until they are matched. Sounds good, but is this not back to blocking threads again?

Enter Async

The answer greatly depends on what you define as a thread. CML used light-weight threads built on first-class continuations, benchmarking at millions of synchronizations per second. Without first-class continuations, what can .NET do? The answer is F# async: we can model threads as chains of callbacks, then they become first-class and capturable.

As a first cut, let us forget about Async. Just a simple channel that accepts continuation functions with its Read and Write operations. If a match is available, the continuation is invoked immediately, otherwise it is queued for later. Locking is designed to minimize the critical section region:

[<Sealed>]
type Channel<'T>() =
    let readers = Queue()
    let writers = Queue()
    member this.Read ok =
        let task =
            lock readers <| fun () ->
                if writers.Count = 0 then
                    readers.Enqueue ok
                    None
                else
                    Some (writers.Dequeue())
        match task with
        | None -> ()
        | Some (value, cont) ->
            spawn cont
            ok value
    member this.Write(x: 'T, ok) =
        let task =
            lock readers <| fun () ->
                if readers.Count = 0 then
                    writers.Enqueue(x, ok)
                    None
                else
                    Some (readers.Dequeue())
        match task with
        | None -> ()
        | Some cont ->
            spawn ok
            cont x
    member inline this.Read() =
        Async.FromContinuations(fun (ok, _, _) ->
            this.Read ok)
    member inline this.Write x =
        Async.FromContinuations(fun (ok, _, _) ->
            this.Write(x, ok))

Note that the code assumes a spawn : (unit -> unit) -> unit function that forks the computation off to a thread pool. It is just a form of trampolining: one could invoke the continuation immediately, but that makes it a non-tail call and can lead to stack overflow. You could probably work around the need to do this.
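
For the vanilla version, a minimal spawn could simply queue the work on the .NET thread pool; later in the article a dedicated single-threaded pool takes its place:

// One possible spawn: queue the continuation on the .NET thread pool.
let spawn (work: unit -> unit) =
    System.Threading.ThreadPool.QueueUserWorkItem(fun _ -> work ())
    |> ignore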

Benchmarking Message-Passing

Now let us check out the implementation with F# Async. Note that it is very easy to write adapters for the Read and Write methods on the channel to give them an Async signature. F# async syntax also makes it very easy to define new "threads" - that is exactly what they are conceptually. Although async code may jump between real .NET threads, it is logically a single thread and allows you, for example, to safely use single-threaded data structures.

For the benchmark we define two async chains: one pushes a series of numbers on the channel and another one reads them and computes their sum. Then we run them in parallel.

let test (n: int) =
    let chan = Channel()
    let rec writer (i: int) =
        async {
            if i = 0 then
                return! chan.Write 0
            else
                do! chan.Write i
                return! writer (i - 1)
        }
    let rec reader sum =
        async {
            let! x = chan.Read()
            if x = 0
            then return sum
            else return! reader (sum + x)
        }
    Async.Start(writer n)
    let clock = System.Diagnostics.Stopwatch()
    clock.Start()
    let r = Async.RunSynchronously(reader 0)
    stdout.WriteLine("Hops per second: {0}",
        float n / clock.Elapsed.TotalSeconds)
    r

Great. I have been getting 100K, 200K, and sometimes, if I am very lucky, up to 400K hops per second on this benchmark. That sounds like a good enough result, and it might be for some applications. However, it looks like the benchmark is using all 4 of my cores. In fact, they seem to spend a jolly lot of time coordinating an essentially sequential computation. More on this later.

Async is The Answer

Performance issues aside for the moment, the take home message I want you to get is this: Async is the answer for how to write concurrent code on .NET. The writer interface I talked about earlier would essentially be a form of channel passing byte array segments. The key is coordination: Async makes it easy to write and coordinate code, and does not block .NET-level threads.

But.. How About Performance?

Sceptics would say Async is a performance hog here. Just think of all the callbacks it allocates, the GC pressure, the thread hopping and work stealing, and what not! SocketAsyncEventArgs examples out there seem to offer plenty of evidence that one can design socket servers that do not allocate at all, somehow chaining the components together with preallocated callbacks.

I thought a lot about this and even tried to write some of those optimized solutions. Upon reflection, the criticism is probably valid, but it is more a comment about the implementation than the concept. Async provides a general, easy to read, understand and proofread way to orchestrate concurrent code: I simply do not see a viable general alternative. Attempting to specialize code by hand seems to yield spaghetti that is impossible to maintain and fix. If you know a competitive way to structure such code sensibly, I will be very happy to hear about it.

For Real Performance Junkies

For real performance junkies like me, the lesson is this: if you are unhappy with the performance of beautiful Async code, before rewriting it to spaghetti consider specializing the Async implementation. In fact, try to get the F# compiler to expand the code to spaghetti instead of doing it manually. F# is not as good as GHC at this, but its inline facility does seem to help here.

What exactly do we want from Async here? Not much. Throw cancellation and exception handling out the window. Throw thread hopping and parallelism out the window. Assume that all our callbacks execute quickly and without exceptions. Inline everything in hope that F# will reduce at least some of the intermediate lambdas at compile time, and lower GC pressure. With this in mind I got something like this:

type Async<'T> = ('T -> unit) -> unit

[<Sealed>]
type Async() =
    static let self = Async()
    member inline this.Return(x: 'T) : Async<'T> =
        fun f -> f x
    member inline this.ReturnFrom(x: Async<'T>) = x
    member inline this.Bind
        (x: Async<'T1>, f: 'T1 -> Async<'T2>) : Async<'T2> =
        fun k -> x (fun v -> f v k)
    static member inline Start(x: Async<unit>) =
        Pooling.Pool.Spawn(fun () -> x ignore)
    static member inline RunSynchronously(x: Async<'T>) : 'T =
        let res = ref Unchecked.defaultof<_>
        let sem = new System.Threading.SemaphoreSlim(0)
        Pooling.Pool.Spawn(fun () ->
            x (fun v ->
                res := v
                ignore (sem.Release())))
        sem.Wait()
        !res
    static member inline FromContinuations
        (f : ('T -> unit) *
             (exn -> unit) *
             (System.OperationCanceledException -> unit)
             -> unit) : Async<'T> =
        fun k -> f (k, ignore, ignore)

let async = Async()

The idea is that since we are not using all the features of F# async, our specialized implementation will be a better fit. Incidentally, the same applies to the thread pool. Instead of scheduling many tasks, let us just run one thread (you can have N if you want) dedicated to running these short-lived callbacks in a tight loop. Forget work stealing, cancellation, and so on: our objective is to maximize the performance of the mostly-sequential processes. Here is my humble attempt:

open System.Collections.Concurrent
open System.Threading.Tasks

type Pool private () =
    let queue = new BlockingCollection<_>(ConcurrentBag())
    let work () =
        while true do
            queue.Take()()
    let long = TaskCreationOptions.LongRunning
    let task = Task.Factory.StartNew(work, long)
    static let self = Pool ()
    member private this.Add f = queue.Add f
    static member Spawn(f: unit -> unit) = self.Add f

Drumroll..

With these changes, I managed to get up to 1M hops per second. A lot better. The key observation is that having a single thread in the pool is critical. As more threads fight for work, performance quickly decreases: the process is essentially sequential and likes to run as such. Inlining in async also seems to help, but just a little. Having a simpler async helps too, but the vanilla one can be tolerated.

Is this result relevant to web servers? I hope so. It seems to me that webservers run mostly-sequential processes, albeit a lot of them in parallel. Dumping all those on a shared pool would make the work jump around a lot and probably be detrimental to throughput. Is specializing Async necessary to help here? Probably not. I will be glad to hear of ways to set thread or core affinity or something like that to get similar improvements.

In conclusion, Async is the right tool for concurrent code, but you might want to tailor it a bit to your application's needs.

Full code of the article:

(*
Copyright (c) 2008-2011 IntelliFactory
GNU Affero General Public License Usage The code
is free software: you can redistribute it and/or
modify it under the terms of the GNU Affero
General Public License, version 3, as published by
the Free Software Foundation.
The code is distributed in the hope that it will
be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero
General Public License for more details at
<http://www.gnu.org/licenses/>.
If you are unsure which license is appropriate for
your use, please contact IntelliFactory at
http://intellifactory.com/contact.
See this blog for the discussion:
http://t0yv0.blogspot.com/2011/12/making-async-5x-faster.html
*)
#if INTERACTIVE
#else
namespace IntelliFactory.Examples
#endif
open System
open System.Collections.Concurrent
open System.Collections.Generic
open System.Threading
open System.Threading.Tasks
[<Sealed>]
type Pool private () =
let queue = new BlockingCollection<_>(ConcurrentBag())
let work () = while true do queue.Take()()
let long = TaskCreationOptions.LongRunning
let task = Task.Factory.StartNew(work, long)
static let self = Pool()
member private this.Add f = queue.Add f
static member Spawn(f: unit -> unit) = self.Add f
[<AutoOpen>]
module FastAsync =
type Async<'T> = ('T -> unit) -> unit
[<Sealed>]
type Async() =
member inline this.Return(x: 'T) : Async<'T> =
fun f -> f x
member inline this.ReturnFrom(x: Async<'T>) = x
member inline this.Bind
(x: Async<'T1>, f: 'T1 -> Async<'T2>) : Async<'T2> =
fun k -> x (fun v -> f v k)
static member inline Start(x: Async<unit>) =
Pool.Spawn(fun () -> x ignore)
static member inline RunSynchronously(x: Async<'T>) : 'T =
let res = ref Unchecked.defaultof<_>
use sem = new SemaphoreSlim(0)
Pool.Spawn(fun () ->
x (fun v ->
res := v
ignore (sem.Release())))
sem.Wait()
!res
static member inline FromContinuations
(f: ('T -> unit) *
(exn -> unit) *
(OperationCanceledException -> unit) -> unit)
: Async<'T> =
fun k -> f (k, ignore, ignore)
let async = Async()
[<Sealed>]
type Channel<'T>() =
let readers = Queue()
let writers = Queue()
member this.Read ok =
let task =
lock readers <| fun () ->
if writers.Count = 0 then
readers.Enqueue ok
None
else
Some (writers.Dequeue())
match task with
| None -> ()
| Some (value, cont) ->
Pool.Spawn cont
ok value
member this.Write(x: 'T, ok) =
let task =
lock readers <| fun () ->
if readers.Count = 0 then
writers.Enqueue(x, ok)
None
else
Some (readers.Dequeue())
match task with
| None -> ()
| Some cont ->
Pool.Spawn ok
cont x
member inline this.Read() =
Async.FromContinuations(fun (ok, _, _) ->
this.Read ok)
member inline this.Write x =
Async.FromContinuations(fun (ok, _, _) ->
this.Write(x, ok))
module Main =
let test (n: int) =
let chan = Channel()
let rec writer (i: int) =
async {
if i = 0 then
return! chan.Write 0
else
do! chan.Write i
return! writer (i - 1)
}
let rec reader sum =
async {
let! x = chan.Read()
if x = 0
then return sum
else return! reader (sum + x)
}
Async.Start(writer n)
let clock = System.Diagnostics.Stopwatch()
clock.Start()
let r = Async.RunSynchronously(reader 0)
stdout.WriteLine("Hops per second: {0}",
float n / clock.Elapsed.TotalSeconds)
r
[<EntryPoint>]
let main args =
test 1000000
|> printfn "Result: %i"
0
#if INTERACTIVE
#time
Main.test 1000000
#endif

Wednesday, December 7, 2011

Coq Trivia: Dependent Pattern-Matching and Inversion

I found some more time to study Coq. One place where I stumble very frequently is case analysis of value-indexed inductive types. There are often cases that lead to contradiction. Other cases intuitively imply some variables to be equal. The vanilla match construct gives you no help, leaving you stranded. In proof mode, the inversion tactic helps a lot. However, the generated proofs are huge and hard to navigate. Another option is to use dependent pattern matching directly (which is what inversion generates for you).

I was eager to figure this out because there are times when proof mode is too confusing for me as a beginner, or I seem to lack the control to define precisely what I want. In particular, if you do code extraction, you do not want to use the convoluted code that inversion generates. But up until today I had little success. Here is the first, and very modest, success story: retrieving the head of a non-empty length-indexed list, and discharging the Nil case with dependent pattern-matching:

(* Consider lists indexed by length. *)
Inductive List (t : Type) : nat -> Type :=
| Nil : List t 0
| Cons : forall n, t -> List t n -> List t (S n).
(* Every non-empty list has a head element. *)
Definition Head : forall t n, List t (S n) -> t.
intros t n list. inversion list. assumption. Defined.
(* In the proof, "inversion" tactic saved the day, eliminating the
"Nil" case. However, the body of the proof term is not very
pretty. Consider: *)
Print Head.
(*
Head =
fun (t : Type) (n : nat) (list : List t (S n)) =>
let X :=
match list in (List _ n0) return (n0 = S n -> t) with
| Nil =>
fun H : 0 = S n =>
let H0 :=
eq_ind 0 (fun e : nat => match e with
| 0 => True
| S _ => False
end) I (S n) H in
False_rect t H0
| Cons n0 X X0 =>
fun H : S n0 = S n =>
let H0 :=
let H0 :=
f_equal (fun e : nat => match e with
| 0 => n0
| S n1 => n1
end) H in
eq_rect n (fun n1 : nat => t -> List t n1 -> t)
(fun (X1 : t) (_ : List t n) => X1) n0 (eq_sym H0) in
H0 X X0
end in
X (eq_refl (S n))
: forall (t : Type) (n : nat), List t (S n) -> t
*)
(* Whoah, what a proof! Can we do better manually? Yes, if we find a
way to discharge the "Nil" case. Dependent pattern-matching to the
rescue: *)
Definition HeadManual t n (list: List t (S n)) : t :=
match list in List _ k return match k with
| O => unit
| S _ => t
end with
| Nil => tt
| Cons _ x _ => x
end.
(* This worked out because "O" case never matches, that is, we
exploited the contradiction and were allowed to return unit. The
automated proof was more explicit about it, deriving an arbitrary
value from "0 = S n" which implies "False". This example is very
simple and in a sense we were lucky here. For more involved worked
examples see this CPDT chapter:
http://adam.chlipala.net/cpdt/html/DataStruct.html *)

Monday, December 5, 2011

Stack-Allocated Lists

Don Syme and several others sent tweets to correct my claims about boxing in F#. I made the mistake of assuming that structs cannot be used as interface implementations without being boxed and allocated on the heap. As it turns out, .NET constraints allow structs to implement interfaces without being boxed. Out of sheer curiosity, I attempted to (ab)use this facility to systematically remove heap allocation. I decided to write a simple functional program that would not use the heap at all, performing all computation on the stack.

I went for encoding lists and implementing list reversal:

type IList<'T> =
abstract member Tail : #IListConsumer<'T,'R> -> 'R
abstract member Head : 'T
abstract member IsEmpty : bool
and IListConsumer<'T,'R> =
abstract member Consume : #IList<'T> -> 'R
[<Struct>]
type Nil<'T> =
interface IList<'T> with
member this.Head = failwith "Empty"
member this.IsEmpty = true
member this.Tail k = failwith "Empty"
[<Struct>]
type Cons<'T,'L when 'L :> IList<'T>>(head: 'T, tail: 'L) =
interface IList<'T> with
member this.Head = head
member this.IsEmpty = false
member this.Tail k = k.Consume tail
let reverse list =
let rec rev acc list =
match list with
| [] -> acc
| x :: xs -> rev (x :: acc) xs
rev [] list
[<Struct>]
type ListSum(sum: int) =
member this.Consume(list: #IList<int>) =
if list.IsEmpty then sum else
list.Tail (ListSum(sum + list.Head))
interface IListConsumer<int,int> with
member this.Consume list = this.Consume list
[<Struct>]
type ListPrinter<'T> =
interface IListConsumer<'T,int> with
member this.Consume list =
if list.IsEmpty then 0 else
stdout.WriteLine list.Head
list.Tail this
type Lists =
static member Reverse<'R,'T,'L1,'L2,'K when 'L1 :> IList<'T>
and 'L2 :> IList<'T>
and 'K :> IListConsumer<'T,'R>>
(acc: 'L1) (list: 'L2) (k: 'K) =
if list.IsEmpty
then k.Consume acc
else list.Tail (ListReverser(list.Head, acc, k))
static member Sum (list: #IList<int>) =
ListSum(0).Consume(list)
static member ReverseSum (list: #IList<int>) =
Lists.Reverse (Nil ()) list (ListSum(0))
static member ReversePrint list =
Lists.Reverse (Nil ()) list (ListPrinter())
and [<Struct>] ListReverser<'T,'R,'L,'K when 'L :> IList<'T>
and 'K :> IListConsumer<'T,'R>>
(head: 'T, acc: 'L, k: 'K) =
interface IListConsumer<'T,'R> with
member this.Consume tail = Lists.Reverse (Cons (head, acc)) tail k
#time
let r = ref 0
!r
for i in 1 .. 10000000 do
let list = Cons(1, Cons(2, Cons(3, Cons (4, Cons (5, Nil())))))
r := !r + Lists.ReverseSum list
for i in 1 .. 10000000 do
let list = [1; 2; 3; 4; 5]
r := !r + List.sum (List.rev list)


This method for getting rid of heap allocation would not work in C as this particular use of generics implies runtime code generation. As far as I understand, the CLR here generates new code paths for every possible list length.

In my little benchmark, stack-allocated lists of 5 elements are reversed a bit slower than heap-allocated lists. But it was fun to try, and FSI reports zero GC usage.

Friday, November 11, 2011

Optimizing the Heck Out of F#: HTTP Request Parsing

As part of the WebSharper web server effort, I have been writing an HTTP request parser. Tuning the parser for the common simple case (a small, correct HTTP request) has improved performance 8-fold, from 30K to 250K sample requests parsed every second on a single core of my Core i3. Let me review what I have learned from this.

Indexing


Accessing array elements goes through a bounds check. Unmanaged C++ code clearly wins here. C# has unsafe regions, but F# does not. So what can we do in F# to be competitive? The only option left is using bulk operations from the standard library. The BCL is not at all helpful here - it is not clear from the documentation which functions drop the check. Also, many operations one would want are simply missing.

For an example where it matters, I was not able to match the performance of System.Array.FindIndex with any F# code I wrote to do the same job.
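
To make this concrete, here is a hedged sketch with hypothetical helpers (not the parser's code): the hand-written loop stands in for the kind of F# scan that did not keep up with the bulk BCL call.

// Hand-rolled scan: every data.[i] access is bounds-checked.
let findByte (value: byte) (data: byte[]) (offset: int) (count: int) =
    let mutable i = offset
    let mutable found = -1
    while found < 0 && i < offset + count do
        if data.[i] = value then found <- i
        i <- i + 1
    found

// Bulk alternative: let the BCL do the scanning.
let findByteBulk (value: byte) (data: byte[]) (offset: int) (count: int) =
    System.Array.IndexOf(data, value, offset, count)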

I imagine this is a killer problem for numerical computing. With unavoidable bounds checking, one really cannot hope to design numerical code in safe managed .NET that would match Fortran routines.

Specialization


Generic code gets a staggering performance hit when certain simple operations such as equality do not get specialized to the simple value type you are using. Polymorphism has a cost. Inline F# functions sometimes help here. But it is unfortunate there is no flag to monomorphise some code. MLton users, I envy you here.
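
As a small, hedged illustration of the kind of helper this affects (hypothetical code, not from the parser):

// Without inline, the v = x comparison goes through generic equality.
let countEq (x: 'T) (xs: 'T[]) =
    let mutable n = 0
    for v in xs do
        if v = x then n <- n + 1
    n

// With inline, the body is expanded at each call site, so for an int[]
// the equality can be specialized to a direct integer comparison.
let inline countEqFast (x: 'T) (xs: 'T[]) =
    let mutable n = 0
    for v in xs do
        if v = x then n <- n + 1
    n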

Value Types


Using value types such as structs and enums reduces the GC pressure. Note, however, that they still get boxed sometimes. For example, if a struct implements an interface, code that expects an interface implementation will receive the struct boxed. To avoid this, the consuming code has to be specialized to the struct type.
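
A hedged sketch of the difference, using made-up types (the same trick is explored at length in the Stack-Allocated Lists post above):

type ICounter =
    abstract Next : unit -> int

[<Struct>]
type Counter(n: int) =
    interface ICounter with
        member this.Next() = n + 1

// Accepting the interface type boxes a Counter argument at the call site.
let nextBoxed (c: ICounter) = c.Next()

// Accepting any #ICounter lets the struct stay unboxed; this is the kind
// of specialization to structs referred to above.
let nextUnboxed (c: #ICounter) = c.Next()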

Mutation


If we care about every bit of performance, mutation matters. However, I found myself wasting lots of time trying to wrap my head around a problem thinking about it in terms of mutation. Clearly, the diagnosis is premature optimization. What I found more helpful is writing a purely functional solution and then transforming it to eliminate allocations and introduce mutation.

Note also that the GC is good enough in most cases. One cannot afford to allocate on the heap for every byte, but allocating short-lived objects does not matter much until you need to do it 100K times a second.

Profiling


Profiling is a life-saver. I used the SlimTune profiler this time. My first discovery was that using System.Collections.Specialized.NameValueCollection for headers is really expensive. It spends a lot of time computing case-insensitive hash values for the header keys. What a bother, especially when the application does not look into the headers. I settled for queuing the headers instead and exposing them as a sequence.
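
A sketch of that decision (the Request type and member names here are illustrative, not the server's actual API):

open System.Collections.Generic

// Headers are queued as raw pairs; no case-insensitive hashing on the hot path.
type Request() =
    let headers = Queue<string * string>()
    member this.AddHeader(key: string, value: string) =
        headers.Enqueue((key, value))
    // Exposed as a sequence; callers that need lookups can build their own map.
    member this.Headers : seq<string * string> =
        headers :> seq<_>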

The profiler helps to spend your time effectively - optimizing what really matters.

Specifics of HTTP Request Parsing


The problem is rather simple: HTTP requests keep arriving and need to be parsed and forwarded to the responder thread. In the keep-alive scenario many requests arrive on the same socket. If there is pipelining, they all come at once.

What I wanted to solve here is parsing the requests incrementally, so that if half of a request arrives we say OK and suspend in mid-air until more data is available.

Iteratees are the general solution here. However, iteratees allocate on the heap, and F#, unlike Haskell, does not do any program transformation magic to simplify them. For this reason it seems that they are not the ideal solution, at least at the byte level.

What I ended up doing instead with incomplete requests is re-parsing. The parsing logic is expressed over a TextReader-like interface. Parser return codes are Done, Error, or Waiting. If the parser says Waiting, I keep the data in the buffer. If it succeeds, the data is discarded. Errors cannot be recovered from.

To some extent micro-parsers can be combined without using the heap. The trick here is to use mutation to return the result on success. Since the return code is an enum, I can join parsers with `&&&`:

parseMethod r req
&&& skipChar r ' '
&&& parseUntil r ' ' &req.uri
&&& parseVersion r req
&&& parseHeaders r req

In case of an early error, parsing does not stop, but there is no reason to care since most requests are well-formed.
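
To make the trick concrete, here is a minimal hedged sketch; the encoding and parser below are illustrative guesses, not the original code (the real parsers write their results into request fields by reference rather than into a position ref):

// One possible encoding: Done has all bits set and Error is zero, so
// r1 &&& r2 is Done only when both are, and an Error anywhere dominates.
type ParseResult =
    | Error   = 0
    | Waiting = 1
    | Done    = 3

// Hypothetical micro-parser over a string with a mutable position.
let skipChar (input: string) (pos: int ref) (c: char) =
    if !pos >= input.Length then
        ParseResult.Waiting
    elif input.[!pos] = c then
        pos := !pos + 1
        ParseResult.Done
    else
        ParseResult.Error

// Combining with &&&: Done only if every piece is Done.
let parseGet (input: string) =
    let pos = ref 0
    skipChar input pos 'G'
        &&& skipChar input pos 'E'
        &&& skipChar input pos 'T'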

To work with a TextReader-like interface and avoid allocation, I use a constant-space ring buffer that acts as a limited-size queue for bytes. Most servers limit the size of the request head to 8192 bytes, and this is what I do as well. The buffer provides its own TextReader that assumes ASCII encoding.

The most rewarding optimization was adding custom methods to the buffer and its reader to make parseUntil and r.ReadLine possible. Instead of going byte-by-byte through several layers of indirection, I switched to System.Array.IndexOf. A ring buffer needs to do it at most twice per operation.
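
A sketch of just the scanning step (illustrative, not the server's actual buffer type): the occupied region of the ring wraps around at most once, so at most two IndexOf calls cover it.

// The occupied region is [start, start + count) modulo the array length,
// so it consists of at most two contiguous segments.
type RingBuffer(capacity: int) =
    let data = Array.zeroCreate<byte> capacity
    let mutable start = 0
    let mutable count = 0
    /// Offset from the logical start of the first occurrence of b, or -1.
    member this.IndexOf(b: byte) =
        let firstLen = min count (capacity - start)
        let i = System.Array.IndexOf(data, b, start, firstLen)
        if i >= 0 then i - start
        else
            let rest = count - firstLen
            if rest = 0 then -1
            else
                let j = System.Array.IndexOf(data, b, 0, rest)
                if j >= 0 then firstLen + j else -1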

Monday, November 7, 2011

An F# Web Server From Sockets and Up

I have implemented a simple web server in F#. The idea was to try to marry .NET asynchronous socket operations with F# async. Result: F# async seems to be the right tool for the job of web server implementation: it makes asynchronous programming intuitive without adding too much performance overhead. The server executes 3500 keep-alive or 1000 normal requests per second on my Core i3 machine, compared to 2500/500 requests per second using IIS or System.Net.HttpListener.

Asynchronous Socket Operations


Working with sockets in .NET is done with the Socket class. From the MSDN documentation, the recommended approach is to use the asynchronous methods such as AcceptAsync, SendAsync and ReceiveAsync. These methods register callbacks to be executed when data arrives or ships through the socket. As a result of the callback approach, no threads are blocked by slow connections.

Sockets and F# Async


Unfortunately, the default interface is not very intuitive. The example code is atrocious. Since the operations are callback-based, this seems like a good match for F# async. I went for the first mapping that came to mind:

open System.Net.Sockets
/// Thrown when sockets encounter errors.
exception SocketIssue of SocketError
/// Performs AcceptAsync.
val Accept : Socket -> Async<Socket>
/// Performs ReceiveAsync.
val Receive : Socket -> System.ArraySegment<byte> -> Async<int>
/// Performs SendAsync.
val Send : Socket -> System.ArraySegment<byte> -> Async<unit>


Implementing this interface is easy - it is just a matter of working through the boilerplate: creating a SocketAsyncEventArgs object, registering a callback, calling the method, checking the result for errors. I was able to express all of it in a single helper method:

open System.Net.Sockets

type A = System.Net.Sockets.SocketAsyncEventArgs
type B = System.ArraySegment<byte>

exception SocketIssue of SocketError with
    override this.ToString() =
        string this.Data0

/// Wraps the Socket.xxxAsync logic into F# async logic.
let inline asyncDo (op: A -> bool) (prepare: A -> unit)
                   (select: A -> 'T) =
    Async.FromContinuations <| fun (ok, error, _) ->
        let args = new A()
        prepare args
        let k (args: A) =
            match args.SocketError with
            | System.Net.Sockets.SocketError.Success ->
                let result = select args
                args.Dispose()
                ok result
            | e ->
                args.Dispose()
                error (SocketIssue e)
        args.add_Completed(System.EventHandler<_>(fun _ -> k))
        if not (op args) then
            k args

/// Prepares the arguments by setting the buffer.
let inline setBuffer (buf: B) (args: A) =
    args.SetBuffer(buf.Array, buf.Offset, buf.Count)

let Accept (socket: Socket) =
    asyncDo socket.AcceptAsync ignore (fun a -> a.AcceptSocket)

let Receive (socket: Socket) (buf: B) =
    asyncDo socket.ReceiveAsync (setBuffer buf)
        (fun a -> a.BytesTransferred)

let Send (socket: Socket) (buf: B) =
    asyncDo socket.SendAsync (setBuffer buf) ignore


Optimizations


It seems that the common optimization paths include pooling sockets, pooling SocketAsyncEventArgs, and pooling buffers to prevent memory fragmentation. The last point is the most interesting. The socket code itself is written in unmanaged C, and passing data between garbage-collected .NET code and C code is done by pinning the .NET array used as a buffer. A pinned array is never relocated by the garbage collector, so the C code has no trouble finding it. Having a lot of pinned arrays to work around makes the garbage collector's job harder - memory gets fragmented.

To avoid fragmentation issues, instead of allocating a lot of small arrays I allocate one huge array and then lease sections of it to be used as buffers by the socket operations.
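
A minimal sketch of the leasing idea (a simplification; a real pool also needs a policy for exhaustion and for returning buffers reliably):

open System
open System.Collections.Concurrent

// One large array leased out as fixed-size segments.
type BufferPool(bufferSize: int, bufferCount: int) =
    let data = Array.zeroCreate<byte> (bufferSize * bufferCount)
    let free = ConcurrentBag<ArraySegment<byte>>()
    do
        for i in 0 .. bufferCount - 1 do
            free.Add(ArraySegment<byte>(data, i * bufferSize, bufferSize))
    member this.Take() =
        match free.TryTake() with
        | true, segment -> segment
        | _ -> failwith "Buffer pool exhausted"
    member this.Return(segment: ArraySegment<byte>) =
        free.Add segment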

I have not yet tried pooling `Socket` or `SocketAsyncEventArgs` objects in a similar manner.

Benchmarks


For benchmarking I have used the Apache Bench (ab) tool running on Arch Linux inside a VirtualBox VM. All benchmarks involved dynamically generating and serving a "HELLO, WORLD" document on my Core i3 laptop, with ab -k -c 1000 -n 10000:

Server                     Keep-alive r/s    Regular r/s
F# WebServer               3500              1000
Haskell warp/wai GHC 7     3500              3500
IIS                        2500              500
System.Net.HttpListener    ?                 500
node.js (Windows)          800               400
node.js (Linux)            ?                 3000

I do not feel very good about these numbers, in particular because I have seen claims of Haskell WARP doing 90000 r/s on only slightly faster hardware (8-core Core i5). It may be that I am hitting VirtualBox networking overhead or I have not built the Haskell code with proper flags.

But for what they are worth, the numbers seem to indicate that F# async is a good enough foundation for web servers with performance in the IIS league. It does not need to be faster, it just needs to be good enough. The real advantage is that F# async code is tremendously easier to read and write than explicit callback code.

EDIT: Please do take the benchmarks with a grain of salt. They are far from comprehensive or correctly done.

Wednesday, September 28, 2011

Transforming Large Unions in F#

Writing compilers is not really a niche activity. If you think about it, every programmer is doing precisely that: defining a specific language for a given domain and then explaining its semantics to the computer. F#, ML and Haskell programmers know that union types are a life-saver for dealing with languages systematically. These types do, however, sometimes grow large, becoming less and less nice to work with.

If a 30-case recursive union type, or 2-3 co-recursive 20-30 case union types, is something you have to deal with, consider writing a transform function, a shallow map over the immediate children. This simple addition helps your more complex transformations focus on the cases they really need to focus on, leaving the boilerplate work to transform. The gist below provides an extended example.

(*
We start designing a representation for some functional language.
Keeping it simple, we use strings to represent identifiers (to be
changed later by something more fancy), define a few primitives,
constants, and finally expressions.
*)
type Id = string
type Constant =
| Bool of bool
| Double of double
| Int of int
| String of string
type Primitive =
| Add
| Divide
| Multiply
| Remove
type Expression =
| Apply of E * E
| Call of Primitive * list<E>
| Constant of Constant
| If of E * E * E
| Lambda of Id * E
| Let of Id * E * E
| Var of Id
and private E = Expression
(*
The compiler will have to do a quite a few transformations on this
data structure. Let us start with something simple, like getting the
set of free identifiers in an expression. Trying to keep it as simple
as possible, we just write a recursive function again:
*)
let rec getFreeIdentifiers (expr: Expression) =
let (!) = getFreeIdentifiers
let (!!) = List.map (!)
match expr with
| Apply (f, x) ->
!f + !x
| Constant _ ->
Set.empty
| Call (_, xs) ->
Set.unionMany !!xs
| If (c, t, e) ->
!c + !t + !e
| Let (id, value, body) ->
Set.union !value (Set.remove id !body)
| Lambda (id, body) ->
Set.remove id !body
| Var id ->
Set.singleton id
(*
Let us now consider a transformation of type `E -> E`. For example,
note that in our untyped language `Apply (Lambda (x, y), z)` is
semantically equivalent to `Let (x, z, y)`.
As an aside, this kind of reasoning is quite dangerous! You better be
sure you are not fooling yourself by imagining a semantic equivalence
where there is none. Using proof assistants such as Coq is a great
help.
We can normalize our representation by writing a transformation that
finds all reducible expressions and converts them to `Let`
expressions:
*)
let rec removeRedexes (expr: Expression) =
let (!) = removeRedexes
let (!!) = List.map (!)
match expr with
| Apply (f, x) ->
match !f, !x with
| Lambda (id, body), value -> Let (id, x, body)
| f, x -> Apply (f, x)
| Constant _ ->
expr
| Call (p, xs) ->
Call (p, !!xs)
| If (c, t, e) ->
If (!c, !t, !e)
| Let (id, value, body) ->
Let (id, !value, !body)
| Lambda (id, body) ->
Lambda (id, !body)
| Var _ ->
expr
(*
The above code is clear, but there is a maintainability problem. When
the `Expression` definition is changing over time, when it has 30 or
so cases, or when we write a lot of transformations, pattern-matching
becomes very, very tedious. Note also that in both examples only a
few cases were interesting, that is, relevant to the algorithm, and
the rest were processed in a generic fashion.
The practical solution is to extract the common pattern into a single
function that maps over the expression's immediate children. It
greatly reduces the boilerplate code.
*)
let transform (!) expr =
let (!!) = List.map (!)
match expr with
| Apply (f, x) -> Apply (!f, !x)
| Call (p, xs) -> Call (p, !!xs)
| Constant _ -> expr
| If (c, t, e) -> If (!c, !t, !e)
| Lambda (v, b) -> Lambda (v, !b)
| Let (v, x, y) -> Let (v, !x, !y)
| Var _ -> expr
(*
The `transform` function has to be touched every time `Expression`
definition changes, and is proportional to `Expression` in length.
But then it is very straightforward, and other transformations become
much shorter, for example:
*)
let rec removeRedexesShort (expr: Expression) =
let (!) = removeRedexesShort
match expr with
| Apply (f, x) ->
match !f, !x with
| Lambda (id, body), value -> Let (id, x, body)
| f, x -> Apply (f, x)
| _ ->
transform (!) expr
(*
A truly lazy programmer can define a `fold` over immediate
sub-expressions in terms of `transform`, so that `fold` does not have
to be touched when `Expression` changes:
*)
let fold (f: 'S -> E -> 'S) (z: 'S) (expr: E) : 'S =
let r = ref z
let g expr =
r := f !r expr
expr
transform g expr
|> ignore
!r
(*
The simple `fold` helps to collect free identifiers in
accumulator-fashion, again ignoring boring cases and focusing only on
what is relevant:
*)
let rec getFreeIdentifiersShort (expr: E) =
let rec free (acc: Set<Id>) (expr: E) =
let (!) = getFreeIdentifiersShort
match expr with
| Let (id, value, body) ->
let set = free acc body
free (Set.remove id set) value
| Lambda (id, body) ->
Set.remove id (free acc body)
| Var id ->
Set.add id acc
| _ ->
fold free acc expr
free Set.empty expr


There is, of course, a downside to using such shortcuts. The compiler no longer forces you to re-read and mentally check all transformations when you change the Expression definition, for example by adding new cases. If you add LetRecursive, the code will still compile but will be broken because of the catch-all pattern in getFreeIdentifiersShort.

Some readers will note that these transformations can be thought of as cata-, ana-, para-, or some-such morphisms and can be treated directly as such. Others will perhaps suggest using reflection or meta-programming to automatically derive transform, or even more complex transforms. As always, you should use what works for you.

As for me, I find that I am not yet comfortable with climbing the abstraction ladder much further. It seems to inevitably involve some loss of control. It helps a lot to have explicit control over doing top-down, bottom-up, or mixed traversals, as evident in recursive function definitions. The technique also scales easily to languages with expression-statement distinction. Most importantly, my brain does not explode when I look at such code the way it does when dealing with monstrous morphisms.

To summarize, factoring out mapping over immediate sub-expressions helps a lot to reduce boilerplate in compiler code while keeping this code relatively easy to read and understand. The technique, although presented in F#, applies equally well to any language with algebraic datatypes.

Is all this practical? We found it very useful in WebSharper, the F#->JavaScript platform my company is developing. There we have union types to represent JavaScript syntax, a simplified, optimisation-friendly JavaScript-like language we call Core, and F# itself, as starting from version 2.4 we had to use our own quotation type in an attempt to avoid reflection.

Tuesday, September 27, 2011

A 30-40% Faster and Leaner WebSharper 2.4 is Coming

We are currently finalizing the 2.4 release of WebSharper. This release is mainly concerned with bug fixes and quality-of-implementation improvements. The exciting part is that we are witnessing about 30%-35% generated code size reduction, and a 40% reduction in compilation time.

Getting there was rather tedious. One key refactoring was replacing System.Reflection with Mono.Cecil as the main workhorse for processing F# assemblies. To use Mono.Cecil, and also to avoid some of the nasty reflection bugs in F# itself, I had to rewrite quotation metadata parsing by having a peek at the F# compiler sources. I also had to refactor most of the data structures to stop assuming System.Reflection availability.

To get better code generation, I revised our Core language for simplicity. Both Core and the associated optimizer are now simpler, and compiler optimizations are more focused, targeting the common use cases.

In particular, local tail-call elimination combined with uncurrying finally allows for automated unrolling of recursive functional code into JavaScript loops. The following code, for example, is now safe for stack use:

[<JavaScript>]
let Factorial (n: int) =
    let rec fac n acc =
        match n with
        | 0 | 1 -> acc
        | n -> fac (n - 1) (acc * n)
    fac n 1


I am thankful to my colleagues for their effort in building the F# community site, FPish. This codebase provided an invaluable resource for testing and optimizing the compiler on a large project.

I feel that I have learned a lot on this project. It turns out that reading about compilers is not quite the same as writing compilers! You only appreciate this truth after trying. I hardly discovered anything you might call a research contribution, but there are some practical little techniques specialized to F#/.NET that made my life easier and might help you too. Stay tuned for discussions of an approach to maintain and traverse large F# union types, considerations for designing data structures, the memoizing fixpoint pattern, large F# codebase organization ideas, and other things along those lines.

Monday, September 26, 2011

Firefox fails on nested closures at 16 layers

How deep do you think you can nest your closures in JavaScript? Yes, we know there is no tail-recursion, probably no inlining either, we know we should not do this, but sometimes we still do. Before today I expected something like 64-128 levels of nesting to work out fine. In fact, Chrome chokes only at level 512 or so, due to stack overflow - fine.

EDIT: There is Bug #517781 reported 2009-09-20, and STILL NOT ADDRESSED. Please upvote it if you can.

But Firefox... at level 16! ReferenceError: x16 is not defined

/// generated JavaScript for k=16
function bind(x,f) {return f(x)}
function test(x0) {
var k0;
k0 = function (x1) {
var k1;
k1 = function (x2) {
var k2;
k2 = function (x3) {
var k3;
k3 = function (x4) {
var k4;
k4 = function (x5) {
var k5;
k5 = function (x6) {
var k6;
k6 = function (x7) {
var k7;
k7 = function (x8) {
var k8;
k8 = function (x9) {
var k9;
k9 = function (x10) {
var k10;
k10 = function (x11) {
var k11;
k11 = function (x12) {
var k12;
k12 = function (x13) {
var k13;
k13 = function (x14) {
var k14;
k14 = function (x15) {
var k15;
k15 = function (x16) {
var k16;
k16 = function (x17) {
return x16 + 1;
};
return bind(x16,k16)
};
return bind(x15,k15)
};
return bind(x14,k14)
};
return bind(x13,k13)
};
return bind(x12,k12)
};
return bind(x11,k11)
};
return bind(x10,k10)
};
return bind(x9,k9)
};
return bind(x8,k8)
};
return bind(x7,k7)
};
return bind(x6,k6)
};
return bind(x5,k5)
};
return bind(x4,k4)
};
return bind(x3,k3)
};
return bind(x2,k2)
};
return bind(x1,k1)
};
return bind(x0,k0)
}


This is on Firefox 6.0.2. Shame.

Sunday, September 25, 2011

Earley Parsing in Haskell

After a bit of a struggle I can run simple parsing examples using my Earley algorithm implementation [1] in Haskell. At this stage the implementation is likely to be flawed with respect to both correctness and performance. Regardless, the point I wanted to prove to myself is that it is perfectly plausible to define context-free grammars and their interpretation in plain Haskell, without resorting to code generation or giving up full grammar analysis. I suspect it is a well-known method, but it was a pleasure to discover. See for yourself:

{-# OPTIONS -XDoRec -XDeriveDataTypeable #-}
module Example where
import Text.Earley
import Data.Dynamic
import Control.Applicative
data E = Nat Int
| Add E E deriving (Show, Typeable)
grammar :: Grammar (Rule Char E)
grammar = do
nat <- rule "NAT"
[ fmap (\_ -> 0) (term '0')
, fmap (\_ -> 1) (term '1')
]
rec expr <- rule "EXPR"
[ fmap Nat $ var nat
, pure (\x _ y -> Add x y)
<*> var expr
<*> term '+'
<*> var expr
]
return expr


Aycock and Horspool's paper [2] was my source for the algorithm. I admit I did not get very far, pretty much ignoring (for now) the bulk of their paper and the automaton construction, and focusing on their review of the basics. I also had to adjust things a bit to fit Haskell's purity, and devised a slightly modified (and possibly faulty) interaction of completer and predictor.

Earley's algorithm is beautiful. Very simple, fully general (all CFGs!), cubic in the worst case but close to linear on practical grammars, and, perhaps most importantly, incremental (think completion in an IDE). A highly recommended exercise.

[1] https://github.com/toyvo/haskell-earley
[2] http://webhome.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf

Thursday, August 25, 2011

Minimalistic Unit Testing in F#

So you want to do some unit testing in F#. Should you really use a framework?

The blessing and the curse of it is that there are so many frameworks out there that choosing one takes about as much time as creating one. Besides, what should a unit testing framework do anyway? There are frameworks offering real features such as randomized (QuickCheck) or exhaustive (SmallCheck) testing. On the other hand, frameworks like NUnit and xUnit do not seem to offer much - just bells and whistles such as result reporting and IDE integration. Is it worth it?

To roll our own, throw out everything but the bare bones. No reflection - all tests first class. No randomized / exhaustive testing - simple tests do not need that. Simply spend a few lines to define nice syntax, and start testing. Compile the test assembly to an executable, and run it on every build.

For example:

let mutable section = ""
let mutable status = 0
let mutable count = 0
exception Fail of string
let run (name: string) test =
try
test ()
count <- count + 1
with
| Fail reason ->
status <- 1
stderr.WriteLine("FAIL: {0}.{1} -- {2}", section, name, reason)
| e ->
status <- 1
stderr.WriteLine("FAIL: {0}.{1}", section, name)
stderr.WriteLine(e)
type Test(name: string) =
member this.Delay(f : unit -> unit) = run name f
member this.Zero() = ()
let Section name =
section <- name
let Throws<'T when 'T :> exn> f =
try
f ()
raise (Fail (sprintf "Does not throw: %O" typeof<'T>))
with
| :? 'T ->
()
let ( =? ) a b =
if a <> b then
raise (Fail (sprintf "Expected %A and got %A." a b))
let ( <>? ) a b =
if a = b then
raise (Fail (sprintf "Unexpected %A." a))
let runTests () =
Section "Arithmetic"
Test "addition" {
1 + 1 =? 2
1 + 0 =? 0
}
[<EntryPoint>]
let main args =
runTests ()
if status = 0 then
stdout.WriteLine("OK, {0} tests passed.", count)
status

Thursday, July 14, 2011

Combinatory Regular Expressions in Haskell

Combinatory parsing fascinates me. It is an area that has open problems, but also a lot of solved problems that make great exercises. A while ago I tried to define a combinatory regular expression matcher in F#, and was puzzled at how complex it turned out to be. As usual, marginal understanding was to blame.

Now I turned to Haskell for a change. I do think it helped me learn more both about the problem and about Haskell. I hope to backport this exercise to the ML world; in the meanwhile, here is the gist:

{-# OPTIONS -XGADTs #-}
-- This is an excercise in constructing a combinator-based regular
-- expression matcher for self-education.
--
-- Special thanks to Roman Cheplyaka (@shebang) for pointing out to me
-- the need for state reduction to eliminate exponential complexity on
-- certain benchmarks.
module Regex (Regex, SM, token, compile, run, char, string) where
-- Regular expressions are parameterized by the token and return
-- types. For convenience I use GADTs to capture different ways regex
-- shapes as constructors, and compile them later.
--
-- There is a very curious situation with the Kleene star. I included
-- it initially, and indeed it seems that in ML it would be necessary.
-- However Haskell admits a definition:
--
-- star :: Regex t r -> Regex t [r]
-- star x = s where s = Choice (Zip (:) x s) (Empty [])
--
-- It is a lot less efficient than a custom case would be, but it
-- works as expected. `Control.Applicative` contains `many` that is
-- the generalized version of `star`.
import Control.Applicative
import Control.Monad.State
import Data.Char
import Data.List
import qualified Data.Set as S
data Regex t r where
Empty :: r -> Regex t r
Fail :: Regex t r
Map :: (a -> b) -> Regex t a -> Regex t b
Zip :: (a -> b -> r) -> Regex t a -> Regex t b -> Regex t r
Choice :: Regex t r -> Regex t r -> Regex t r
Token :: (t -> Maybe r) -> Regex t r
instance Functor (Regex t) where
fmap = Map
instance Applicative (Regex t) where
pure = Empty
(<*>) = Zip ($)
instance Alternative (Regex t) where
empty = Fail
(<|>) = Choice
-- In order to recognize state sharing I need to compare
-- states. Physical equality is not available in Haskell, so instead I
-- mark them with fresh identifiers.
newtype Id = Id Integer deriving (Eq, Ord, Show)
data Gen = Gen Integer Integer
type Fresh a = State Gen a
initial :: Gen
initial = Gen 1 0
gen :: Gen -> (Id, Gen)
gen (Gen a b) = (Id b, Gen a (a + b))
split :: Gen -> (Gen, Gen)
split (Gen a b) = (Gen (2 * a) b, Gen (2 * a) (a + b))
fresh :: Fresh Id
fresh = do (id, next) <- gets gen
modify (const next)
return id
-- State trees may be infinite and therefore I need to sometimes split
-- ID generation.
par :: Fresh a -> Fresh b -> Fresh (a, b)
par a b = do (g0, g1) <- gets split
let (g2, g3) = split g0
(x, _) = runState a g2
(y, _) = runState b g3
modify (const g1)
return (x, y)
-- State machines represent compiled regular expressions. The machine
-- is either in an final state (accept or reject), or is waiting for
-- the next token. Continuations are provided for no token (end of
-- stream) and every possible next token. Waiting states are marked,
-- so that forked states can be reduced by eliminating duplicates.
data SM t r where
Accept :: r -> SM t r
Reject :: SM t r
Expect :: Id -> SM t r -> (t -> SM t r) -> SM t r
Fork :: SM t r -> SM t r -> SM t r
instance Functor (SM t) where
fmap f (Accept x) = Accept (f x)
fmap f Reject = Reject
fmap f (Expect i z e) = Expect i (fmap f z) (fmap f . e)
fmap f (Fork a b) = Fork (fmap f a) (fmap f b)
-- Identical state reduction happens here. Backup values implement
-- greedy matching (see machine runner below).
cut :: Maybe r -> SM t r -> (Maybe r, SM t r)
cut backup sm = (backup' `mplus` backup, sm') where
(sm', (backup', _)) = runState (mk sm) (Nothing, S.empty)
jn a@(Accept _) _ = return a
jn a b@(Accept x) = do let up (Nothing, s) = (Just x, s)
up x = x
modify up
return a
jn Reject x = return x
jn x Reject = return x
jn x y = return $ Fork x y
mk (Fork a b) = do ax <- mk a
bx <- mk b
jn ax bx
mk sm@(Expect id _ _) = do let f (_, s) = S.member id s
let up (x, s) = (x, S.insert id s)
seen <- gets f
if seen
then return Reject
else do modify up
return sm
mk sm = return sm
-- Compilation uses CPS with accept and reject continuations.
compile :: Regex t r -> SM t r
compile rx = fst $ runState (c (Accept id) Reject rx) initial
c :: SM t (a -> r) -> SM t r -> Regex t a -> Fresh (SM t r)
c y n (Empty x) = return $ fmap ($x) y
c y n Fail = return $ n
c y n (Map f a) = c (fmap (\r x -> r (f x)) y) n a
c y n (Zip f a b) = do let k r x y = r (f y x)
bm <- c (fmap k y) (fmap (\x _ -> x) n) b
c bm n a
c y n (Choice a b) = do (am, bm) <- c y n a `par` c y n b
return $ Fork am bm
c y n (Token f) = do i <- fresh
let e t = p t $ f t
p t (Just x) = fmap ($x) y
p t Nothing = push t n
return $ Expect i n e where
push :: t -> SM t r -> SM t r
push t (Expect _ _ f) = f t
push t (Fork a b) = Fork (push t a) (push t b)
push _ x = x
advance :: SM t r -> SM t r
advance (Expect _ z _) = z
advance (Fork a b) = Fork (advance a) (advance b)
advance x = x
-- Finally, the machine runner function.
run :: SM t r -> [t] -> Maybe r
run sm ts = r Nothing sm ts where
r _ (Accept x) _ = Just x
r b Reject _ = b
r b sm (t:ts) = r b' (push t sm') ts where (b', sm') = cut b sm
r b sm [] = r b' (advance sm') [] where (b', sm') = cut b sm
string :: Eq t => [t] -> Regex t [t]
string [] = Empty []
string (c:cs) = Zip (:) (char c) (string cs)
token :: (t -> Maybe r) -> Regex t r
token = Token
char :: Eq t => t -> Regex t t
char c = Token (\x -> if x == c then Just c else Nothing)

Thursday, April 28, 2011

Pretty-Printed Code Blocks in LaTeX

I came across PLT Scribble, which appears to be the best tool available today for writing documentation. You write text with some Scheme in it, and it generates HTML and LaTeX. Without much ado, I could use PLT Racket to define a lexer for F# that pretty-prints code blocks. It worked beautifully for HTML, but in LaTeX I hit a problem: how do I define a macro that typesets its argument with whitespace preserved and a monospace font, while keeping other commands as-is?

Note that verbatim does not work: it typesets inner commands verbatim instead of applying them to, say, make keywords blue. There are also alltt and fancyvrb, but they do not work either: somehow they change the reader too much. Scribble generates commands of the form “\char’146” that LaTeX accepts in default mode but fails to accept inside alltt or Verbatim from fancyvrb.

I knew then that I had no choice but to roll my own.

Enter 1970s text-processing technology, without proper documentation either, because I do not currently have a copy of the TeX book handy. Anyhow, here is what I got after more time than I care to admit:

% ``\Listing{...}'' command typesets its argument as a code block -
% with monospace font and whitespace preserved. It is an
% environment-like command accepting multiple paragraphs.
\def\Listing{\bigskip\begingroup%
\setlength{\parskip}{0pc}\addtolength{\parindent}{1pc}%
\tt\scriptsize\obeylines\obeyspaces\PreserveWhitespace\ListingBody}
% ``\ListingBody'' is an auxiliary command needed to make Listing work.
% It inserts the argument and closes the group opened by Listing.
\long\def\ListingBody#1{#1\endgroup\smallskip}
% \PreserveWhitespace makes leading spaces appear as-is. To make
% leading spaces appear, we have to set space to be an active
% character that expands to a non-breakable space. The command for
% this is ``\let =\ ``, but unfortunately lifting this command to a
% macro does not quite work because space is not active during the
% definition of the macro - it needs to be made active using
% ``\catcode''. Therefore we define our macro in a little block of
% code where space is made active:
\begingroup
\catcode32=\active
\gdef\PreserveWhitespace{\catcode32=\active\let =\ }
\endgroup

Friday, February 18, 2011

Grokking HOAS in F#

It is 3 AM in Budapest and, quite typically, I cannot sleep since my mind is actively engaged in remembering all the functional programming tricks I have read about. So, tonight it is HOAS, or Higher Order Abstract Syntax. If you have ever implemented a De Bruijn-indexed AST for Scheme and spent a few days catching indexing bugs, you surely would appreciate the beauty of this:

type Prim =
    | Add
    | Sub
    | Mul
    | Div
    | Eq
    | Not

type Value =
    | Bool of bool
    | Int of int
    | Lambda of (list<Expr> -> Expr)

and Expr =
    | Apply of Expr * list<Expr>
    | Call of Prim * list<Expr>
    | Const of Value
    | If of Expr * Expr * Expr
    | Let of Expr * (Expr -> Expr)
    | LetRec of (Lazy<Expr> -> Expr * Expr)

let Op prim =
    match prim with
    | Add ->
        fun [Int x; Int y] -> Int (x + y)
    | Sub ->
        fun [Int x; Int y] -> Int (x - y)
    | Mul ->
        fun [Int x; Int y] -> Int (x * y)
    | Div ->
        fun [Int x; Int y] -> Int (x / y)
    | Eq ->
        function
        | [Int x; Int y] -> Bool (x = y)
        | [Bool x; Bool y] -> Bool (x = y)
    | Not ->
        fun [Bool x] -> Bool (not x)

let (|Binary|_|) (expr: Expr) =
    match expr with
    | Call (p, [x; y]) -> Some (p, x, y)
    | _ -> None

let rec Eval (expr: Expr) : Value =
    match expr with
    | Apply (f, xs) ->
        match Eval f with
        | Lambda f ->
            Eval (f xs)
    | Call (p, xs) ->
        Op p (List.map Eval xs)
    | Const x ->
        x
    | If (x, y, z) ->
        match Eval x with
        | Bool true -> Eval y
        | Bool false -> Eval z
    | Let (x, f) ->
        Eval (f (Const (Eval x)))
    | LetRec f ->
        let rec x = lazy (fst pair)
        and body = snd pair
        and pair = f x
        Eval body

let rec Fac x =
    if x = 0 then 1 else x * Fac (x - 1)

let Fac10 =
    let i x = Const (Int x)
    let ( =? ) a b = Call (Eq, [a; b])
    let ( *? ) a b = Call (Mul, [a; b])
    let ( -? ) a b = Call (Sub, [a; b])
    let ( ^^ ) f x = Apply (f, [x])
    LetRec <| fun fac ->
        let fac =
            (fun [x] ->
                let (Lazy fac) = fac
                If (x =? i 0, i 1, x *? (fac ^^ (x -? i 1))))
            |> Lambda
            |> Const
        (fac, fac ^^ i 10)

Fac 10
|> printfn "%A"

Eval Fac10
|> printfn "%A"

Wednesday, February 16, 2011

Where F#/.NET Falls Short: Higher Kinds

Every programmer thinks their favorite language has a problem. I am no exception. I used to think that F# is missing an ML module system or Haskell-style typeclasses, but I came to realize that the complaint is really about a feature called higher-kinded polymorphism. Without this feature, there is a lot of very beautiful, general, and type-safe code in OCaml and Haskell that cannot be expressed in type-safe F#. It can be expressed, but only by accepting a tradeoff:

  • Sacrifice generality – this is what workflow builders do. In effect, F# has no true monads, because you cannot easily write code generalized over an arbitrary monad.
  • Sacrifice type safety by using downcasts and relying on convention.

Option #1 is by far the most popular, but very detrimental to the F# ecosystem, as it encourages duplication of specific code that could have otherwise been expressed by a single, general definition.
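
To make the duplication concrete, here is a small sketch of my own (not from the original post): the same sequence function written twice, once against a hypothetical OptionBuilder workflow and once against async. With higher-kinded polymorphism these two definitions could collapse into a single one parameterized by the monad; without it, every computation type needs its own copy.

// Illustration only: a hypothetical OptionBuilder workflow, plus two
// structurally identical sequence functions that F# cannot unify.
type OptionBuilder() =
    member this.Bind(x, f) = Option.bind f x
    member this.Return(x) = Some x

let option = OptionBuilder()

// Turn a list of optional values into an optional list.
let sequenceOption (xs: list<option<'T>>) : option<list<'T>> =
    List.foldBack (fun x acc ->
        option {
            let! x = x
            let! acc = acc
            return x :: acc
        }) xs (Some [])

// The very same code, re-typed for Async; only the builder changes.
let sequenceAsync (xs: list<Async<'T>>) : Async<list<'T>> =
    List.foldBack (fun x acc ->
        async {
            let! x = x
            let! acc = acc
            return x :: acc
        }) xs (async { return [] })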

Option #2 is controversial. I have attempted a few encodings, but I personally have not yet found a practical one, and in practice I usually fall back to Option #1.

To illustrate, suppose we have this Haskell code:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

instance Functor [] where
    fmap = map

The Functor type class is parameterized over f, which is itself generic. This is exactly the piece that .NET and F# are lacking. If one really insists on writing code that is generic over every possible Functor, and not just a concrete Functor instance, one ends up having to rely on downcasts and convention:

type Box<'T1,'T2> = { Value : obj }

module List =
    let Box (list: list<'T>) : Box<list<obj>,'T> = { Value = list }
    let Unbox (x: Box<list<obj>,'T>) : list<'T> = x.Value :?> _

type IFunctor<'T> =
    abstract member Map : ('T1 -> 'T2) -> Box<'T,'T1> -> Box<'T,'T2>

let ListFunctor =
    {
        new IFunctor<list<obj>> with
            member this.Map f x =
                List.Box << List.map f << List.Unbox <| x
    }

As you can see, this approach (1) has a run-time cost and (2) gets annoying very quickly as the complexity of the code grows.
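
To see the cost in action, a usage sketch (mine, not part of the original gist) looks like this; the Box and Unbox round-trip at every call is exactly the run-time overhead mentioned above:

// Usage sketch: mapping over a list through the boxed functor value.
let incremented : list<int> =
    List.Unbox (ListFunctor.Map ((+) 1) (List.Box [1; 2; 3]))
// incremented = [2; 3; 4]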

It is sad.

Thursday, February 10, 2011

The Sad State of fshtmldoc – Can We Do Better?

If you have ever tried to generate HTML documentation for the API of F#-produced assemblies, you have likely tried fshtmldoc, a tool that is part of the F# PowerPack. It does a few things well and a couple of things not so well, but there is a single overarching problem with this tool: it uses System.Reflection.

Why is System.Reflection evil? In fact it is not; it is simply an API designed for a purpose vastly different from ours (generating an HTML API report). The side effects of using it are that the runtime needs to load the documented assemblies and all of their references, which is slow and error-prone (fshtmldoc often crashes because it cannot find a reference), and that the runtime cannot unload these assemblies afterwards. So using it from an MSBuild task, for example, is not an option, as it can make your Visual Studio session leak memory.

Is there an alternative? A non-alternative is to use C#-centric tools such as monodoc or SandCastle. These tools do not know anything about F# constructs, and are likely to produce confusing output for code that is highly functional: full of function types, unions, and records.

Enter Mono.Cecil, the alternative reflection and bytecode-manipulation library. It addresses all of the above problems. Could we base a documentation tool on Cecil? Yes. All that needs to be done is to detect F# constructs from metadata.
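
As a rough illustration of the idea (my sketch, not the actual tool), Cecil lets you inspect types without loading anything into the runtime; checking for CompilationMappingAttribute below is just one simple heuristic for spotting F#-compiled types, not the full detection logic:

// Sketch only: enumerate the types of an assembly with Mono.Cecil and
// flag the ones that carry F# compilation metadata. Nothing here is
// loaded into the runtime, so it can run safely inside a build task.
open Mono.Cecil

let describeAssembly (path: string) =
    let assembly = AssemblyDefinition.ReadAssembly(path)
    [
        for t in assembly.MainModule.Types do
            let isFSharp =
                t.CustomAttributes
                |> Seq.exists (fun a ->
                    a.AttributeType.Name = "CompilationMappingAttribute")
            yield t.FullName, isFSharp
    ]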

The good news is that I have almost completed this; the remaining tasks are fairly minor. I am testing it out on a bunch of our IntelliFactory assemblies. More coming soon.

Wednesday, February 9, 2011

Home-made Regular Expressions in F#: Thompson NFA

Russ Cox re-discovered and popularized an old (originating in a 1968 paper by Thompson) technique for implementing regular expressions: http://swtch.com/~rsc/regexp/regexp1.html. The article offers a wonderfully simple to understand presentation, and I just could not resist trying it in F#.

First, the regular expressions themselves: the source language can be nicely described with a union type, encompassing the empty string, choice, concatenation, the Kleene star, and a token predicate. The algorithm is general enough to accept the Filter case, which generalizes over character classes and the like.

type Regex<'T> =
| Choose of Regex<'T> * Regex<'T>
| Concat of Regex<'T> * Regex<'T>
| Empty
| Filter of ('T -> bool)
| Star of Regex<'T>
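
For instance, "one or more decimal digits" can be written directly as data (an illustrative value of mine, not part of the original listing), with Filter playing the role of a character class:

// Illustration: a digit, and one-or-more digits, as Regex<char> values.
let digit : Regex<char> = Filter (fun c -> System.Char.IsDigit c)
let digits : Regex<char> = Concat (digit, Star digit)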

The first step is to compile this definition to a non-deterministic finite automaton (NFA). As a first cut, the state of an NFA could look like this:

module NFA =

    type State<'T> =
        | Match
        | Branch of State<'T> * State<'T>
        | Read of ('T -> bool) * State<'T>

When writing the compiler function, however, I realized that I would want two more features for the NFA state. First, compiling the Kleene star makes it hard to tie the knot, making one wish the language were Haskell. As a workaround, I simulate mutable union fields with a helper record. Second, it is nice to be able to compare state values so that binary search trees can be used later in the game. To do this, I label every state with an integer and use it for comparison:

module NFA =

    [<CustomEquality>]
    [<CustomComparison>]
    type State<'T> =
        | Match
        | Branch of Pair<'T>
        | Read of int * ('T -> bool) * State<'T>

        member this.Key =
            match this with
            | Match -> 0
            | Branch x -> x.k
            | Read (i, _, _) -> i

        override this.Equals other =
            match other with
            | :? State<'T> as o -> this.Key = o.Key
            | _ -> false

        override this.GetHashCode() =
            this.Key

        interface IComparable with
            member this.CompareTo other =
                compare (hash other) this.Key

    and Pair<'T> =
        {
            mutable k : int
            mutable x : State<'T>
            y : State<'T>
        }

The compiler looks like this:

module NFA =

    let Compile regex =
        let k = ref 0
        let next () =
            incr k
            !k
        let rec compile regex matched =
            match regex with
            | Choose (a, b) ->
                let y = compile b matched
                let x = compile a matched
                Branch { k = next (); x = x; y = y }
            | Concat (a, b) ->
                matched
                |> compile b
                |> compile a
            | Empty ->
                matched
            | Filter f ->
                Read (next (), f, matched)
            | Star x ->
                let p = {
                    k = 0
                    x = Unchecked.defaultof<_>
                    y = matched
                }
                let r = Branch p
                p.x <- compile x r
                p.k <- next ()
                r
        compile regex Match

The key insight is that the NFA corresponds to a deterministic (DFA) machine. Whenever our machine encounters choice (Branch), it follows both options simultaneously. So then, the state of this machine (DFA state) is a set of NFA states. You can easily write a transition function that takes an input token and produces a transformation on the DFA state. Moreover, this function can be cached by token and input state. Once caching is introduced, we obtain a simple yet fascinating, lazily-constructed DFA machine.

The definition of the DFA state is not surprising:

module DFA =
    open System.Collections.Generic

    type State<'T> =
        {
            IsMatch : bool
            States : Set<NFA.State<'T>>
            Transitions : Dictionary<'T,State<'T>>
        }

In addition to token-based transitions, I also cache IsMatch information which can be obtained by traversing the NFA states and testing if any of them contains a Match case.

Here is the sketch of the transition function:

module DFA =

    let rec nextState current token =
        match current.Transitions.TryGetValue token with
        | true, state -> state
        | _ ->
            let rec loop current next =
                if Set.isEmpty current then getState next else
                let min = Set.minElement current
                let current = Set.remove min current
                match min with
                | NFA.Branch br ->
                    loop (current.Add(br.x).Add(br.y)) next
                | NFA.Read (_, f, n) ->
                    if f token
                    then loop current (next.Add n)
                    else loop current next
                | NFA.Match ->
                    loop current next
            let next = loop current.States Set.empty
            current.Transitions.[token] <- next
            next
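
The sketch above leans on a getState helper that is not shown. A hedged reconstruction of mine (not the author's code) might look like the following; a real version would also memoize by the state set so that identical sets share a single DFA state:

module DFA =
    open System.Collections.Generic

    // Reconstruction only: build a DFA state from a set of NFA states,
    // computing IsMatch by looking for a Match case and starting with
    // an empty transition cache.
    let getState (states: Set<NFA.State<'T>>) : State<'T> =
        {
            IsMatch =
                states
                |> Set.exists (fun s ->
                    match s with
                    | NFA.Match -> true
                    | _ -> false)
            States = states
            Transitions = Dictionary()
        }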

What remains is to write an interpreter for the DFA. I will post it in the next article when it is more tested. Perhaps I will also simplify the above definitions a bit. It is typical for me to find that the first code that I write for any problem is much more complicated than necessary.

Monday, February 7, 2011

WebSharper B5 with Extensions

We have finally released the next beta (5) of WebSharper that can run the samples I have been demonstrating on the blog. In addition, the download page now lets you play with an exciting set of extensions: Google Maps, Visualization, Info Vis Toolkit, Protovis, Raphael, jQuery Mobile and jQuery UI. Check it out: http://www.websharper.com/

Friday, January 28, 2011

WebSharper Sitelets Cont’d: Developing a Blog Service

The previous article might have left you with the erroneous impression that WebSharper sitelets only support a static set of URLs. This is not the case. In fact, the user-defined Action type can be an arbitrary data structure that maps to a URL. This comes in particularly handy when designing a blogging website, since the set of blogs and their URLs is not statically bounded. Something like this would do:

type Id = int
type Action =
| ShowRecentBlogs
| CreateBlog
| ReadBlog of Id
| UpdateBlog of Id
| DeleteBlog of Id

Now you would need to write a router, which is roughly a bijection between Action values and URLs. A boring way to do so might look like this:

let Router : Router<Action> =
    let route = function
        | GET (_, SPLIT_BY '/' []) ->
            Some ShowRecentBlogs
        | GET (_, SPLIT_BY '/' ["create"]) ->
            Some CreateBlog
        | GET (_, SPLIT_BY '/' ["read"; INT id]) ->
            Some (ReadBlog id)
        | GET (_, SPLIT_BY '/' ["update"; INT id]) ->
            Some (UpdateBlog id)
        | GET (_, SPLIT_BY '/' ["delete"; INT id]) ->
            Some (DeleteBlog id)
        | _ ->
            None
    let link x =
        let uri =
            match x with
            | ShowRecentBlogs -> "/"
            | CreateBlog -> "/create"
            | ReadBlog x -> sprintf "/read/%i" x
            | UpdateBlog x -> sprintf "/update/%i" x
            | DeleteBlog x -> sprintf "/delete/%i" x
        Some (Uri(uri, UriKind.Relative))
    Router.New route link

Writing routers by hand gives you full control: you can inspect an Http.Request value when routing and generate any URL when linking. However, writing the above code is not much fun. If you do not particularly care about the exact URLs, there is a much shorter way to get there:

let Router : Router<Action> = Router.Infer()

The Infer function inspects the Action type by reflection to derive a router automatically. It is smart enough to handle records, unions, most scalar types, lists, and options. In our case it derives the following mapping:

ShowRecentBlogs <=> /ShowRecentBlogs
CreateBlog <=> /CreateBlog
ReadBlog $x <=> /ReadBlog/$x
UpdateBlog $x <=> /UpdateBlog/$x
DeleteBlog $x <=> /DeleteBlog/$x
A smart thing to do would be to alter it just a little bit, for example by making ShowRecentBlogs available at the root. This can be done by combining a simple table router with the inferred router:

let Router : Router<Action> =
    Router.Table [ShowRecentBlogs, "/"]
    <|> Router.Infer()

This is much better: 80% of the benefit with 20% of the work.

Now that URL routing is out of the way, you can finally get down to the blog service itself. I present the most interesting thing first, the controller:

let Controller =
    let handle = function
        | ShowRecentBlogs ->
            let blogs = Model.WithBlogs <| fun db -> db.GetRecentBlogs()
            View.ShowRecentBlogs blogs
        | CreateBlog ->
            View.CreateBlog ()
        | ReadBlog id ->
            let blog = Model.WithBlogs <| fun db -> db.ReadBlog id
            match blog with
            | Some blog ->
                View.ReadBlog blog
            | None ->
                Content.NotFound
        | UpdateBlog id ->
            let blog = Model.WithBlogs <| fun db -> db.ReadBlog id
            match blog with
            | None -> Content.Redirect Action.CreateBlog
            | Some blog -> View.UpdateBlog blog
        | DeleteBlog id ->
            let blog =
                Model.WithBlogs <| fun db ->
                    let blog = db.ReadBlog id
                    if blog.IsSome then
                        db.DeleteBlog id
                        |> ignore
                    blog
            match blog with
            | Some blog ->
                View.BlogDeleted blog
            | None ->
                Content.ServerError
    { Handle = handle }

The controller simply handles actions (Action values) and returns responses (Content values), no surprises here. Together with a router, the controller forms a sitelet, a self-contained website ready to go by itself or be embedded in a bigger application:

let Main : Sitelet<Action> =
{
Controller = Controller
Router = Router
}

For the model in this example, you can use the simplest thing that can possibly work in a development environment, namely a static mutable variable (see the listing at the end of the article). Now, views are quite straightforward as well. The only thing of interest is the use of WebSharper Formlets (client-side form combinators) to build the UI for editing and creating the blogs. I build a simple control that renders the form and talks to the model directly over RPC, redirecting to the homepage when done:

module Client =
open IntelliFactory.WebSharper
open IntelliFactory.WebSharper.Formlet
[<Rpc>]
let CreateBlog blog =
Model.WithBlogs <| fun db ->
db.CreateBlog { blog with Date = DateTime.UtcNow }
[<Rpc>]
let UpdateBlog blog =
Model.WithBlogs <| fun db ->
db.UpdateBlog { blog with Date = DateTime.UtcNow }
[<Inline "document.location = $location">]
let Redirect (location: string) = X<unit>
type BlogControl(homeLocation: string, blog: option<Blog>) =
inherit Web.Control()
new () = new BlogControl("?", None)
[<JavaScript>]
override this.Body =
let (id, title, summary, text) =
match blog with
| None -> (0, "", "", "")
| Some b -> (b.Id, b.Title, b.Summary, b.Text)
let titleForm =
Controls.Input title
|> Enhance.WithTextLabel "Title"
|> Validator.IsNotEmpty "Required."
let summaryForm =
Controls.TextArea summary
|> Enhance.WithTextLabel "Summary"
|> Validator.IsNotEmpty "Summary is required."
let textForm =
Controls.TextArea text
|> Enhance.WithTextLabel "Text"
|> Validator.IsNotEmpty "Text is required"
Formlet.Yield (fun title summary text ->
{
Id = id
Date = DateTime.UtcNow
Title = title
Summary = summary
Text = text
})
<*> titleForm
<*> summaryForm
<*> textForm
|> Enhance.WithLabelConfiguration {
Layout.LabelConfiguration.Default with
VerticalAlign = Layout.VerticalAlign.Top
}
|> Enhance.WithSubmitAndResetButtons
|> Enhance.WithFormContainer
|> Formlet.Run (fun newBlog ->
match blog with
| None -> ignore (CreateBlog newBlog)
| Some _ -> ignore (UpdateBlog newBlog)
Redirect homeLocation)

Time to try it out!

And here is the complete code listing:

module My.Blogs
open System
open System.Collections.Generic
open System.Web
open IntelliFactory.WebSharper.Sitelets
type Id = int
type Html = string
type Blog =
{
Id : Id
Title : string
Date : DateTime
Summary : Html
Text : Html
}
type Action =
| ShowRecentBlogs
| CreateBlog
| ReadBlog of Id
| UpdateBlog of Id
| DeleteBlog of Id
module Model =
type Blogs =
{
GetRecentBlogs : unit -> seq<Blog>
CreateBlog : Blog -> Id
UpdateBlog : Blog -> bool
ReadBlog : Id -> option<Blog>
DeleteBlog : Id -> bool
}
let private data =
let d = Dictionary()
d.[0] <-
{
Id = 0
Title = "Blog-1"
Date = System.DateTime.Now
Summary = "summary.."
Text = "text.."
}
d.[1] <-
{
Id = 1
Title = "Blog-2"
Date = System.DateTime.Now
Summary = "summary.."
Text = "text.."
}
d :> IDictionary<_,_>
let WithBlogs action =
lock data <| fun () ->
let d = data
action {
GetRecentBlogs = fun blog ->
d.Values
|> Seq.toArray
|> Array.rev
:> seq<_>
CreateBlog = fun blog ->
let k = d.Count
d.[k] <- { blog with Id = k }
k
UpdateBlog = fun blog ->
if d.ContainsKey blog.Id then
d.[blog.Id] <- blog
true
else
false
ReadBlog = fun id ->
match d.TryGetValue id with
| true, blog -> Some blog
| _ -> None
DeleteBlog = fun id ->
d.Remove id
}
module Client =
open IntelliFactory.WebSharper
open IntelliFactory.WebSharper.Formlet
[<Rpc>]
let CreateBlog blog =
Model.WithBlogs <| fun db ->
db.CreateBlog { blog with Date = DateTime.UtcNow }
[<Rpc>]
let UpdateBlog blog =
Model.WithBlogs <| fun db ->
db.UpdateBlog { blog with Date = DateTime.UtcNow }
[<Inline "document.location = $location">]
let Redirect (location: string) = X<unit>
type BlogControl(homeLocation: string, blog: option<Blog>) =
inherit Web.Control()
new () = new BlogControl("?", None)
[<JavaScript>]
override this.Body =
let (id, title, summary, text) =
match blog with
| None -> (0, "", "", "")
| Some b -> (b.Id, b.Title, b.Summary, b.Text)
let titleForm =
Controls.Input title
|> Enhance.WithTextLabel "Title"
|> Validator.IsNotEmpty "Required."
let summaryForm =
Controls.TextArea summary
|> Enhance.WithTextLabel "Summary"
|> Validator.IsNotEmpty "Summary is required."
let textForm =
Controls.TextArea text
|> Enhance.WithTextLabel "Text"
|> Validator.IsNotEmpty "Text is required"
Formlet.Yield (fun title summary text ->
{
Id = id
Date = DateTime.UtcNow
Title = title
Summary = summary
Text = text
})
<*> titleForm
<*> summaryForm
<*> textForm
|> Enhance.WithLabelConfiguration {
Layout.LabelConfiguration.Default with
VerticalAlign = Layout.VerticalAlign.Top
}
|> Enhance.WithSubmitAndResetButtons
|> Enhance.WithFormContainer
|> Formlet.Run (fun newBlog ->
match blog with
| None -> ignore (CreateBlog newBlog)
| Some _ -> ignore (UpdateBlog newBlog)
Redirect homeLocation)
module View =
open IntelliFactory.Html
open IntelliFactory.WebSharper.Formlet
let ( => ) a b =
A [HRef b] -< [Text a]
let Page title makeBody =
PageContent <| fun ctx ->
{ Page.Default with
Title = Some title
Body =
[
UL [
LI ["Home" => ctx.Link ShowRecentBlogs]
LI ["Post" => ctx.Link CreateBlog]
]
Div (makeBody ctx)
]
}
let ShowRecentBlogs (blogs: seq<Blog>) =
Page "News" <| fun ctx ->
[
yield H1 [Text "News"]
for b in blogs do
yield A [HRef (ctx.Link (ReadBlog b.Id))]
-< [H2 [Text b.Title]]
yield P [Text b.Summary]
yield
UL [
LI [P ["Edit" => ctx.Link (UpdateBlog b.Id)]]
LI [P ["Delete" => ctx.Link (DeleteBlog b.Id)]]
]
]
let ReadBlog (blog: Blog) =
Page blog.Title <| fun ctx ->
[
H1 [Text blog.Title]
P [Text blog.Summary]
P [Text blog.Text]
UL [
LI [P ["Edit" => ctx.Link (UpdateBlog blog.Id)]]
LI [P ["Delete" => ctx.Link (DeleteBlog blog.Id)]]
]
]
let BlogDeleted (blog: Blog) =
Page "Blog Deleted" <| fun ctx ->
[
P [
Text "Successfully deleted blog: "
Text blog.Title
]
]
let CreateBlog () =
Page "New Post" <| fun ctx ->
let home = ctx.Link Action.ShowRecentBlogs
[
new Client.BlogControl(home, None)
]
let UpdateBlog (blog: Blog) =
Page "Update" <| fun ctx ->
let home = ctx.Link Action.ShowRecentBlogs
[
new Client.BlogControl(home, Some blog)
]
let Controller =
let handle = function
| ShowRecentBlogs ->
let blogs = Model.WithBlogs <| fun db -> db.GetRecentBlogs()
View.ShowRecentBlogs blogs
| CreateBlog ->
View.CreateBlog ()
| ReadBlog id ->
let blog = Model.WithBlogs <| fun db -> db.ReadBlog id
match blog with
| Some blog ->
View.ReadBlog blog
| None ->
Content.NotFound
| UpdateBlog id ->
let blog = Model.WithBlogs <| fun db -> db.ReadBlog id
match blog with
| None -> Content.Redirect Action.CreateBlog
| Some blog -> View.UpdateBlog blog
| DeleteBlog id ->
let blog = Model.WithBlogs <| fun db ->
let blog = db.ReadBlog id
if blog.IsSome then
db.DeleteBlog id
|> ignore
blog
match blog with
| Some blog ->
View.BlogDeleted blog
| None ->
Content.ServerError
{ Handle = handle }
let Router : Router<Action> =
Router.Table [ShowRecentBlogs, "/"]
<|> Router.Infer()
let Main : Sitelet<Action> =
{
Controller = Controller
Router = Router
}

Friday, January 21, 2011

WebSharper sitelets: building a two-page website

Let me show the simplest possible self-contained example involving WebSharper sitelets that are coming with the 2.1 release. You define a website with two pages (and two actions):

type Action =
| Home
| AboutUs

Let us quickly mash up a template for the website using F#-embedded HTML combinators. A template is just a function taking the body and decorating it:

module View =

    let ( => ) a b =
        A [HRef b] -< [Text a]

    let Page title body =
        PageContent <| fun ctx ->
            { Page.Default with
                Title = Some title
                Body =
                    H1 [Text title] ::
                    UL [
                        LI ["Home" => ctx.Link Home]
                        LI ["AboutUs" => ctx.Link AboutUs]
                    ] ::
                    body }

Two things to notice:

  1. F# lets you define your own syntax, and the example makes liberal use of that (=>).
  2. Instead of generating URLs by hand, you ask the context to create a link to the action. This ensures safety: renaming Home action makes the project stop compiling, and moving it to a different URL keeps the links correct.

Now, you define the sitelets:

module Site =

    let HomePage =
        View.Page "Home" [
            Div [Text "Welcome to our website!"]
        ]
        |> Sitelet.Content "/" Home

    let AboutUsPage =
        View.Page "About Us" [
            Div [Text "TODO: describe us."]
        ]
        |> Sitelet.Content "/about" AboutUs

    let Main =
        Sitelet.Sum [
            HomePage
            AboutUsPage
        ]

HomePage and AboutUsPage are single-page sitelets, with a single URL and a single Action. They are combined into a website by the Sum operator.

Now, a little bit of administrative boilerplate:

type Website() =
    interface IWebsite<Action> with
        member this.Actions = []
        member this.Sitelet = Site.Main

[<assembly: WebsiteAttribute(typeof<Website>)>]
do ()

And you are done! Let’s browse:


So far so good. The pages have the expected URLs and the menu links work.

The above could have been accomplished by any reasonable web framework. Let us push the limit and spice it up a bit now. Let us add a few lines of F# that will actually compile to JavaScript:

module Client =
    open IntelliFactory.WebSharper.Html

    [<JavaScript>]
    let Button label =
        Button [Text label]
        |>! OnClick (fun button _ ->
            button.Text <- "CLICKED")

    type ButtonControl(label: string) =
        inherit Web.Control()

        new () = new ButtonControl("unlabelled")

        [<JavaScript>]
        override this.Body = Button label :> _

So there is a button that changes its title when clicked. AND there is a control. Now, while the Button lives entirely on the client side (it constructs DOM nodes, in fact), the Control executes a quantum leap: its constructor executes on the server, and its body on the client. But that means you can use the Control to glue the server and the client together. Let us update the AboutUs page:

let AboutUsPage =
    View.Page "About Us" [
        Div [Text "TODO: describe us."]
        Div [new Client.ButtonControl("Click me!")]
    ]
    |> Sitelet.Content "/about" AboutUs

That’s it. The user will now see a clickable, JavaScript-enabled, F#-implemented button, right where you expect it. No script tags to worry about, no dependency chasing, no “ondocumentready” worries, it just works:


Below is the complete listing. As soon as the 2.1 release becomes public, you will be able to enjoy running it yourself. Stay tuned!

namespace WebSharperSiteletsProject

open System
open System.IO
open System.Web
open IntelliFactory.Html
open IntelliFactory.WebSharper
open IntelliFactory.WebSharper.Sitelets

type Action =
    | Home
    | AboutUs

module View =

    let ( => ) a b =
        A [HRef b] -< [Text a]

    let Page title body =
        PageContent <| fun ctx ->
            { Page.Default with
                Title = Some title
                Body =
                    H1 [Text title] ::
                    UL [
                        LI ["Home" => ctx.Link Home]
                        LI ["AboutUs" => ctx.Link AboutUs]
                    ] ::
                    body }

module Client =
    open IntelliFactory.WebSharper.Html

    [<JavaScript>]
    let Button label =
        Button [Text label]
        |>! OnClick (fun button _ ->
            button.Text <- "CLICKED")

    type ButtonControl(label: string) =
        inherit Web.Control()

        new () = new ButtonControl("unlabelled")

        [<JavaScript>]
        override this.Body = Button label :> _

module Site =

    let HomePage =
        View.Page "Home" [
            Div [Text "Welcome to our website!"]
        ]
        |> Sitelet.Content "/" Home

    let AboutUsPage =
        View.Page "About Us" [
            Div [Text "TODO: describe us."]
            Div [new Client.ButtonControl("Click me!")]
        ]
        |> Sitelet.Content "/about" AboutUs

    let Main =
        Sitelet.Sum [
            HomePage
            AboutUsPage
        ]

type Website() =
    interface IWebsite<Action> with
        member this.Actions = []
        member this.Sitelet = Site.Main

[<assembly: WebsiteAttribute(typeof<Website>)>]
do ()

Thursday, January 20, 2011

Can WebSharper beat ASP.NET MVC at the routing game?

At work we have been having a lot of fun designing WebSharper sitelets lately. A quick recap: WebSharper is our web development framework that is all about cross-compiling F# to neatly interoperating client-side JavaScript and server-side .NET assemblies - a GWT of sorts. Sitelets are the recently introduced server-side abstraction, the new kid on the block where the bully is of course the notorious ASP.NET MVC.

ASP.NET MVC operation is all about controllers and actions: a request comes in, is matched to a controller/action, and then gets handled. At first glance it appears that controllers are classes and actions are methods. But NO! By introducing something they consider an improvement, the MVC team decoupled actions from methods and insists now that both controllers and actions are strings:

http://haacked.com/archive/2008/08/29/how-a-method-becomes-an-action.aspx

If you now want to construct a link to a particular action in a view, you have to know it by its string name. And what if it changes? Bummer, the project compiles fine and then fails at runtime.

This is particularly sad given that they used to offer a safer way:

http://weblogs.asp.net/scottgu/archive/2007/12/03/asp-net-mvc-framework-part-2-url-routing.aspx 


While ASP.NET MVC fans wait for this feature to come back, we can do better with F#. The basic idea is that we model everything first-class: an action is just a value of a user-defined Action data type, and a controller is a value as well. Consider:

type Action =
| HomePage
| AboutUs
| BlogHomePage
| Blog of BlogId

Now, a controller is simply a function of type Action -> Response. And a website? A website, or a sitelet, as we call it in WebSharper, is a controller coupled with a router, where a router is a bijection from locations to actions.
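
In other words, the whole controller can be a single pattern match over Action. The sketch below is illustrative only; the View functions and the Handle record field are placeholders here, not the exact WebSharper API:

// Illustrative sketch: the controller is just a function over Action.
let Controller =
    let handle action =
        match action with
        | HomePage -> View.HomePage ()
        | AboutUs -> View.AboutUs ()
        | BlogHomePage -> View.BlogHomePage ()
        | Blog id -> View.Blog id
    { Handle = handle }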

While the controller is fun to write, the router is not. One tedious way to write it is to write two F# functions, one pattern-matching a location and returning an Action, and the other doing the reverse. For actions that can be listed in a finite list, a less tedious way is to write a table with a 1:1 correspondence between locations and actions. But the easiest is to automate it all:

Router.Infer<'T>() : Router<'T>

This little function will automatically infer locations such as “/HomePage”, “/AboutUs”, “/Blog/123”, and so on. With all the tedious work done, it is time to write the controller!

And how about safe links in the view? Nothing is easier:

A [Href (ctx.Link (Blog 5))] [Text "Next blog"]

And if anything happens to change, the F# compiler will make sure the links are still correct.

Sitelets offer a lot more than this, such as easy embedding of F#-defined client-side (JavaScript) controls or the pre-generation of the static part of the web application as a set of HTML and resource files. But none of it can beat the fun of designing routing the functional way.