Friday, December 30, 2011

Solving F# Async.StartChild Leak: Futures

I discovered a memory leak in Async.StartChild and here discuss a workaround based on a Future abstraction.

I noticed what appears to be a memory leak in F# standard library function Async.StartChild. This happened in a context of a socket server, where I attempted to perform socket reading and writing in parallel. It seems that memory use slowly grows and memory profiler points to some CancellationTokenSource-related objects not being released.

As a non-leaking alternative, I used my own abstractions. The basic idea is to use synchronizable events. Unfortunately Event is already used in F# to mean something different, so I will use the word Future instead. If you know F# events, the key problem is that subscribing to events after they happen is meaningless, for example this code procuces nothing:

let test () =
    let e = Event<_>()
    e.Trigger(1)
    e.Publish.Add(printf "%i")

In contrast, Future objects retain the value. For simplicity, I allow subscribing and triggering only once. In addition, the sample includes Executor code. I found by experimentation that running short-lived coordination tasks on a single thread instead of the ThreadPool is beneficial. Enjoy:

(*
Copyright (c) 2008-2011 IntelliFactory
GNU Affero General Public License Usage The code
is free software: you can redistribute it and/or
modify it under the terms of the GNU Affero
General Public License, version 3, as published by
the Free Software Foundation.
The code is distributed in the hope that it will
be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero
General Public License for more details at
<http://www.gnu.org/licenses/>.
If you are unsure which license is appropriate for
your use, please contact IntelliFactory at
<http://intellifactory.com/contact>.
See this blog for the discussion:
<http://tinyurl.com/fsharp-futures>
*)
#if INTERACTIVE
#else
namespace IntelliFactory.Examples
#endif
open System
open System.Threading
open System.Threading.Tasks
open System.Collections.Concurrent
type private FutureState<'T> =
| Computed of 'T
| Created
| Finalized
| Waiting of ('T -> unit)
[<Sealed>]
type Future<'T>() =
let root = obj ()
let transact f = lock root f ()
let mutable state : FutureState<'T> = Created
let await f =
transact <| fun () ->
match state with
| Computed x -> state <- Finalized; (fun () -> f x)
| Created -> state <- Waiting f; ignore
| Finalized -> invalidOp "Future is finalized."
| Waiting f -> invalidOp "Future is already waited on."
let provide value =
transact <| fun () ->
match state with
| Computed x -> invalidOp "Future is already provided."
| Created -> state <- Computed value; ignore
| Finalized -> invalidOp "Future is finalized."
| Waiting f -> state <- Finalized; (fun () -> f value)
let event = Async.FromContinuations(fun (k, _, _) -> await k)
member this.Await = event
member this.Provide(value) = provide value
[<Sealed>]
type Executor(?maxTaskCount, ?logError) =
let logError = defaultArg logError ignore
let mailbox =
let n = defaultArg maxTaskCount 128
new BlockingCollection<_>(ConcurrentQueue(), n)
let work () =
let mutable loop = true
while loop do
match mailbox.Take() with
| None -> loop <- false
| Some exec -> try exec () with e -> logError e
let task =
Task.Factory.StartNew(work,
TaskCreationOptions.LongRunning)
member this.Dispose() =
mailbox.Add(None)
mailbox.CompleteAdding()
task.Wait()
task.Dispose()
mailbox.Dispose()
member this.Fork(job: Async<'T>) =
let f = Future()
let work () =
Async.StartWithContinuations(job,
f.Provide, logError, logError)
this.Schedule(work)
f.Await
member this.Schedule(task) = mailbox.Add(Some task)
member this.TaskCount = mailbox.Count
interface IDisposable with
member this.Dispose() = this.Dispose()
#if INTERACTIVE
let test () =
use e = new Executor()
let task =
async {
let read = e.Fork(async { return stdin.ReadLine() })
do stdout.WriteLine("Waiting for input..")
return! read
}
Async.RunSynchronously task
#endif
view raw AsyncUtils.fs hosted with ❤ by GitHub

Symbol Interning in WebSharper 2.4

When WebSharper compiles F# to JavaScript it preserves namespaces, module and class nesting to make it easy to navigate the compiled code from JavaScript shell. Roughly speaking, A.B.C.D.E identifier in F# can be found by typing A.B.C.D.E in JavaScript.

This poses a challenge: as you can imagine, emitting long qualified identifiers everywhere is not a good idea for compact code generation. To save space WebSharper 2.4 does class/module interning. The basic idea is to say L=Microsoft.FSharp.Core.ListModule once and then say L.ofArray at every use site.

An example of this in action can be seen below:

namespace Website
module Client =
open IntelliFactory.WebSharper
open IntelliFactory.WebSharper.Html
[<JavaScript>]
let SayHello () =
JavaScript.Alert("HELLO!")
[<JavaScript>]
let MakeHelloDiv () =
SayHello ()
Div [ Text "HELLO THERE" ]
module Controls =
open IntelliFactory.WebSharper
type HelloControl() =
inherit Web.Control()
[<JavaScript>]
override this.Body = Client.MakeHelloDiv() :> _
view raw Client.fs hosted with ❤ by GitHub
(function()
{
var Global=this,Runtime=this.IntelliFactory.Runtime,Website,Client,WebSharper,Html,Default,List,alert;
Runtime.Define(Global,{
Website:{
Client:{
MakeHelloDiv:function()
{
Client.SayHello();
return Default.Div(List.ofArray([Default.Text("HELLO THERE")]));
},
SayHello:function()
{
return alert("HELLO!");
}
},
Controls:{
HelloControl:Runtime.Class({
get_Body:function()
{
return Client.MakeHelloDiv();
}
})
}
}
});
Runtime.OnInit(function()
{
Website=Runtime.Safe(Global.Website);
Client=Runtime.Safe(Website.Client);
WebSharper=Runtime.Safe(Global.IntelliFactory.WebSharper);
Html=Runtime.Safe(WebSharper.Html);
Default=Runtime.Safe(Html.Default);
List=Runtime.Safe(WebSharper.List);
return alert=Runtime.Safe(Global.alert);
});
Runtime.OnLoad(function()
{
});
}());
view raw Website.dll.js hosted with ❤ by GitHub
(function(){var $$=this,$=this.IntelliFactory.Runtime,a,b,c,d,e,f,g;$.Define($$,{Website:{Client:{MakeHelloDiv:function(){b.SayHello();return e.Div(f.ofArray([e.Text("HELLO THERE")]));},SayHello:function(){return g("HELLO!");}},Controls:{HelloControl:$.Class({get_Body:function(){return b.MakeHelloDiv();}})}}});$.OnInit(function(){a=$.Safe($$.Website);b=$.Safe(a.Client);c=$.Safe($$.IntelliFactory.WebSharper);d=$.Safe(c.Html);e=$.Safe(d.Default);f=$.Safe(c.List);return g=$.Safe($$.alert);});$.OnLoad(function(){});}());

Thursday, December 29, 2011

F# Frustration: More Async Memory Leaks

Can anyone explain why this code leaks memory?

let leak1 () =
let thread () =
async {
let x = Array.create 1024 0uy
return ()
}
|> Async.Start
async {
while true do
do thread ()
}
|> Async.RunSynchronously
let leak2 () =
let a1 () = async.Return()
let a2 () =
async {
let! complete = Async.StartChild(a1 ())
return! complete
}
|> Async.Start
let rec loop (k: int) =
if k > 0 then
do a2 ()
loop (k - 1)
loop 100000
stdout.Write("PRESS ANY KEY") // >200 Mb memory use, retained
System.Console.ReadLine()
|> ignore
view raw MemoryLeak.fs hosted with ❤ by GitHub

Looks like using Async.StartAsTask makes it complete in constant space.

Friday, December 23, 2011

Hacking Type Classes in F#

Recent FPish FPish discussion focused on some hacks available in F# to write code that resembles using Haskell type classes. I particularly enjoyed the comments by Gustavo Leon and Loic Denuziere.

To cut the long story short, before compiling to .NET F# expands methods delcared inline and does overload resolution. This was intended to support flexible operator overloading, but opens up the door for interesting hacks. Even code that generalizes over higher kinds and therefore cannot exist at .NET level can with these hacks exist at the inline F# level.

Although these remain hacks (ugly code, unreadable inferred types, fragility to changes when type inference stops working, difficulty in scaling to more complicated examples), I cannot help but enjoy it. Here is a writeup that infers JSON serialization automatically over lists at the inline level:

type Json =
| Array of list<Json>
| Double of double
| False
| Integer of int64
| Null
| Object of list<string*Json>
| String of string
| True
static member inline ToJson x : Json =
((^a or ^b) :
(static member ToJson : ^a * ^b -> Json) (Null, x))
static member ToJson(_: Json, x: int) = Integer (int64 x)
static member ToJson(_: Json, x: string) = String x
static member inline ToJson(_: Json, x: list<_>) =
Array (List.map Json.ToJson x)
static member inline FromJson(x: Json) =
((^a or ^b) : (static member FromJson : ^a * ^b -> ^a)
(Unchecked.defaultof<_>, x))
static member FromJson(_: int, x: Json) =
match x with
| Integer x -> int x
| _ -> invalidArg "x" "Conversion failed."
static member FromJson(_: string, x: Json) =
match x with
| String x -> x
| _ -> invalidArg "x" "Conversion failed."
static member inline FromJson(_: list<_>, x: Json) =
match x with
| Array xs -> [for x in xs -> Json.FromJson x]
| _ -> invalidArg "x" "Conversion failed."
type Person =
{
age: int
name: string
}
static member ToJson(_: Json, x: Person) =
Object [
"age", Integer (int64 x.age);
"name", String x.name
]
static member FromJson(_:Person, j: Json) =
match j with
| Object ["age", Integer age; "name", String name] ->
{ age = int age; name = name }
| _ ->
failwith "Conversion failed."
let test =
let p = {name="Vladimir Putin"; age=59}
let r : list<Person> = Json.FromJson(Json.ToJson [p; p; p])
r
view raw JsonClasses.fs hosted with ❤ by GitHub

Thursday, December 15, 2011

Making Async 5x Faster

In this article I discuss why F# Async is a good thing for writing concurrent software on .NET and show how to implement your own Async specialized for low-concurrency use. As a sample application, I look at a simple CML-style blocking channel. 30-50 lines of custom async and threadpool implementation increase the throughput from 100-400 K to 1M messages a second.

Concurrency? Again?

It is hard to believe that after so many man-years of computer science research anyone would still have quetions about how to write concurrent software. But I do, if only out of ignorance.

My specific problem right now is writing a decently scalable web server. It is really nothing more than developing a sensible bridge API between a web application and the OS, with a bit of parsing thrown in. All heavy lifting is already done by IO completion ports available to .NET code through System.Net.Socket and SocketAsyncEventArgs types.

Blocking and Non-Blocking IO

In a web server, there are two flows of data: one flow comes from the client via the server to the application, and the other goes in the reverse direction. Imperative programming answer to this the System.IO.Stream object which your app uses to read and write. It is easy to use, but there are problems.

The first obvious problem is that writes cannot really block. If you attempt to write some data, the client socket may not be ready. In fact, it can not become ready for a while. Making the thread wait for it would be lovely, but .NET threads are expensive. Not a good idea if one is to cope with thousands of concurrent clients.

Fine, how about Non-Blockig IO? It seems to solve the problem as application code resumes immediately. But what then, where does the data go? We need a buffer of unbounded size somewhere to hold the data. What happens here is the application receives no feedback when it produces data too quickly. A pathological case would be generating an 8G file.

Reading from a System.IO.Stream poses even more problems. What if the data is not there yet? Blocking wastes a thread, not blocking makes the application retry but requires programming arbitrary timeouts or busy-waits. In light of this it seems much more reasonable to invert control and have the server write data to the application instead as it arrives.

First-Class Continuations

From now on let us consider designing a writer that does not block but also does not require an unbounded buffer, throttling the data producer if the consumer does not keep up. This is an instance of the classic tradeoff between asynchronous and synchronous channels.

Ideally the language would support efficient first-class continuations. Then one would reify the continuation and shove it somewhere, to resume computation later when the client is ready:

(call-with-current-continuation save-for-later)

This, as far as I understand, is indeed how John Reppy's Concurrent ML was originally implemented. CML first-class channels are unbuffered: you can read and write on them, but both writers and readers block until they are matched. Sounds good, but is this not back to blocking threads again?

Enter Async

The answer greatly depends on what you define as a thread. CML used light-weight threads built on first-class continuations, benchmarking at millions of synchronizations per second. Without first-class continuations, what can .NET do? The answer is F# async: we can model threads as chains of callbacks, then they become first-class and capturable.

As a first cut, let us forget about Async. Just a simple channel that accepts continuation functions with its Read and Write operations. If a match is available, the continuation is invoked immediately, otherwise it is queued for later. Locking is designed to minimize the critical section region:

[<Sealed>]
type Channel<'T>() =
let readers = Queue()
let writers = Queue()
member this.Read ok =
let task =
lock readers <| fun () ->
if writers.Count = 0 then
readers.Enqueue ok
None
else
Some (writers.Dequeue())
match task with
| None -> ()
| Some (value, cont) ->
spawn cont
ok value
member this.Write(x: 'T, ok) =
let task =
lock readers <| fun () ->
if readers.Count = 0 then
writers.Enqueue(x, ok)
None
else
Some (readers.Dequeue())
match task with
| None -> ()
| Some cont ->
spawn ok
cont x
member inline this.Read() =
Async.FromContinuations(fun (ok, _, _) ->
this.Read ok)
member inline this.Write x =
Async.FromContinuations(fun (ok, _, _) ->
this.Write(x, ok))
view raw Channel.fs hosted with ❤ by GitHub

Note: that the code assumes a spawn : (unit -> unit) -> unit function that forks of the computation on a thread pool. It is just a form of trampolining: one could call it immediately but that makes it a non-tail call and can lead to stack overflow. You could probably work around the need to do this..

Benchmarking Message-Passing

Now let us check out the implementation with F# Async. Note that it is very easy to write adapters to Read and Write methods on the channel to give them an Async signature. F# Async syntax also makes it very easy to define new "threads" - these are exactly what they are conceptually. Although async code may jump between real .NET threads, it is logically a single thread and allows, for example, to safely use single-threaded data structures.

For the benchmark we define two async chains: one pushes a series of numbers on the channel and another one reads them and computes their sum. Then we run them in parallel.

let test (n: int) =
let chan = Channel()
let rec writer (i: int) =
async {
if i = 0 then
return! chan.Write 0
else
do! chan.Write i
return! writer (i - 1)
}
let rec reader sum =
async {
let! x = chan.Read()
if x = 0
then return sum
else return! reader (sum + x)
}
Async.Start(writer n)
let clock = System.Diagnostics.Stopwatch()
clock.Start()
let r = Async.RunSynchronously(reader 0)
stdout.WriteLine("Hops per second: {0}",
float n / clock.Elapsed.TotalSeconds)
r
view raw Test.fs hosted with ❤ by GitHub

Great. I have been getting 100, 200, sometimes if I am very lucky up to 400 K hops on this benchmark. That sounds like a good enough result at it might be for some applications. However, it looks like the benchmark is using all 4 of my cores. In fact, they seem to spend a jolly lot of time coordinating an essentially sequential computation.. More on this later.

Async is The Answer

Performance issues aside for the moment, the take home message I want you to get is this: Async is the answer for how to write concurrent code on .NET. The writer interface I talked about earlier would essentially be a form of channel passing byte array segments. The key is coordination: Async makes it easy to write and coordinate code, and does not block .NET-level threads.

But.. How About Performance?

Sceptics would say Async is a performance hog here. Just think of all the callbacks it allocates, the GC pressure, the thread hopping and work stealing, and what not! SocketAsyncEventArgs examples out there seem to offer plenty of evidence that one can design socket servers that do not allocate at all, somehow chaining the components together with preallocated callbacks.

I thought a lot about this and even tried to write some of those optimized solutions. Upon reflection, the criticism is probably valid but it is more a comment about the implementation, not the concept. Async provides a general, easy to read, understand and proofread way to orchestrate concurrent code: I simply do not see a viable general alternative. Attempting to specialize code by hand seems to yield impossible to maintain and fix spaghetti. If you know a competitive way to structure such code sensibly I will be very happy to hear about it.

For Real Performance Junkies

For real performance junkies like me, the lesson is this: if you are unhappy with the peformance of beautiful Async code, before rewriting it to spaghetti consider specializing Async implementaion. In fact, try to get the F# compiler expand the code to spaghetti instead of doing it manually. F# is not as good as GHC at this, but its inline facility does seem to help in this.

What exactly do we want from Async here? Not much. Throw cancellation and exception handling out the window. Throw thread hopping and parallelism out the window. Assume that all our callbacks execute quickly and without exceptions. Inline everything in hope that F# will reduce at least some of the intermediate lambdas at compile time, and lower GC pressure. With this in mind I got something like this:

type Async<'T> = ('T -> unit) -> unit
[<Sealed>]
type Async() =
static let self = Async()
member inline this.Return(x: 'T) : Async<'T> =
fun f -> f x
member inline this.ReturnFrom(x: Async<'T>) = x
member inline this.Bind
(x: Async<'T1>, f: 'T1 -> Async<'T2>) : Async<'T2> =
fun k -> x (fun v -> f v k)
static member inline Start(x: Async<unit>) =
Pooling.Pool.Spawn(fun () -> x ignore)
static member inline RunSynchronously(x: Async<'T>) : 'T =
let res = ref Unchecked.defaultof<_>
let sem = new System.Threading.SemaphoreSlim(0)
Pooling.Pool.Spawn(fun () ->
x (fun v ->
res := v
ignore (sem.Release())))
sem.Wait()
!res
static member inline FromContinuations
(f : ('T -> unit) *
(exn -> unit) *
(System.OperationCanceledException -> unit)
-> unit) : Async<'T> =
fun k -> f (k, ignore, ignore)
let async = Async()
view raw Async.fs hosted with ❤ by GitHub

The idea is that since we are not using all the features of F# async, our specialized implementation will be a better fit. Incidentally, the same applies to the thread pool. Instead of scheduling many tasks let us just run one (you can have N if you want) threads dedicated to running these short-lived callbacks in a tight loop. Forget work stealing, cancellation, and so on: our objective is to maximize the peformance of the mostly-sequential processes. Here is my humble attempt:

open System.Collections.Concurrent
open System.Threading.Tasks
type Pool private () =
let queue = new BlockingCollection<_>(ConcurrentBag())
let work () =
while true do
queue.Take()()
let long = TaskCreationOptions.LongRunning
let task = Task.Factory.StartNew(work, long)
static let self = Pool ()
member private this.Add f = queue.Add f
static member Spawn(f: unit -> unit) = self.Add f
view raw ThreadPool.fs hosted with ❤ by GitHub

Drumroll..

With these changes, I managed to get up to 1M hops. A lot better. The key observation is that having a single thread in the pool is critical. As more threads fight for work, performance quickly decreases: the process is essentially sequential and likes to run as such. Inlining in async also seems to help but just a little. Having a simpler async helps too but the vanilla one can be tolerated.

Is this result relevant to web servers? I hope so. It seems to me that webservers run mostly-sequential processes, albeit a lot of them in parallel. Dumping all those on a shared pool would make the work jump around a lot and probably be detrimental to throughput. Is specializing Async necessary to help here? Probably not. I will be glad to hear of ways to set thread or core affinity or something like that to get similar improvements.

In conclusion, Async is the right tool for concurrent code, but you might want to tailor it a bit to your application's need.

Full code of the article:

(*
Copyright (c) 2008-2011 IntelliFactory
GNU Affero General Public License Usage The code
is free software: you can redistribute it and/or
modify it under the terms of the GNU Affero
General Public License, version 3, as published by
the Free Software Foundation.
The code is distributed in the hope that it will
be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU Affero
General Public License for more details at
<http://www.gnu.org/licenses/>.
If you are unsure which license is appropriate for
your use, please contact IntelliFactory at
http://intellifactory.com/contact.
See this blog for the discussion:
http://t0yv0.blogspot.com/2011/12/making-async-5x-faster.html
*)
#if INTERACTIVE
#else
namespace IntelliFactory.Examples
#endif
open System
open System.Collections.Concurrent
open System.Collections.Generic
open System.Threading
open System.Threading.Tasks
[<Sealed>]
type Pool private () =
let queue = new BlockingCollection<_>(ConcurrentBag())
let work () = while true do queue.Take()()
let long = TaskCreationOptions.LongRunning
let task = Task.Factory.StartNew(work, long)
static let self = Pool()
member private this.Add f = queue.Add f
static member Spawn(f: unit -> unit) = self.Add f
[<AutoOpen>]
module FastAsync =
type Async<'T> = ('T -> unit) -> unit
[<Sealed>]
type Async() =
member inline this.Return(x: 'T) : Async<'T> =
fun f -> f x
member inline this.ReturnFrom(x: Async<'T>) = x
member inline this.Bind
(x: Async<'T1>, f: 'T1 -> Async<'T2>) : Async<'T2> =
fun k -> x (fun v -> f v k)
static member inline Start(x: Async<unit>) =
Pool.Spawn(fun () -> x ignore)
static member inline RunSynchronously(x: Async<'T>) : 'T =
let res = ref Unchecked.defaultof<_>
use sem = new SemaphoreSlim(0)
Pool.Spawn(fun () ->
x (fun v ->
res := v
ignore (sem.Release())))
sem.Wait()
!res
static member inline FromContinuations
(f: ('T -> unit) *
(exn -> unit) *
(OperationCanceledException -> unit) -> unit)
: Async<'T> =
fun k -> f (k, ignore, ignore)
let async = Async()
[<Sealed>]
type Channel<'T>() =
let readers = Queue()
let writers = Queue()
member this.Read ok =
let task =
lock readers <| fun () ->
if writers.Count = 0 then
readers.Enqueue ok
None
else
Some (writers.Dequeue())
match task with
| None -> ()
| Some (value, cont) ->
Pool.Spawn cont
ok value
member this.Write(x: 'T, ok) =
let task =
lock readers <| fun () ->
if readers.Count = 0 then
writers.Enqueue(x, ok)
None
else
Some (readers.Dequeue())
match task with
| None -> ()
| Some cont ->
Pool.Spawn ok
cont x
member inline this.Read() =
Async.FromContinuations(fun (ok, _, _) ->
this.Read ok)
member inline this.Write x =
Async.FromContinuations(fun (ok, _, _) ->
this.Write(x, ok))
module Main =
let test (n: int) =
let chan = Channel()
let rec writer (i: int) =
async {
if i = 0 then
return! chan.Write 0
else
do! chan.Write i
return! writer (i - 1)
}
let rec reader sum =
async {
let! x = chan.Read()
if x = 0
then return sum
else return! reader (sum + x)
}
Async.Start(writer n)
let clock = System.Diagnostics.Stopwatch()
clock.Start()
let r = Async.RunSynchronously(reader 0)
stdout.WriteLine("Hops per second: {0}",
float n / clock.Elapsed.TotalSeconds)
r
[<EntryPoint>]
let main args =
test 1000000
|> printfn "Result: %i"
0
#if INTERACTIVE
#time
Main.test 1000000
#endif

Wednesday, December 7, 2011

Coq Trivia: Dependent Pattern-Matching and Inversion

I found some more time to study Coq. One place where I stumble very frequently is case analysis of value-indexed inductive types. There are often cases that lead to contradiction. Other cases intuitively imply some variables to be equal. Vanilla match constructs gives you no help, leaving you stranded. In proof mode, inversion tactic helps a lot. However, generated proofs are huge and hard to navigate. Another option is to use dependent pattern matching directly (which is what inversion generates for you).

I was eager to figure this out because there are times when proof mode is too confusing for me as a beginner or I seem to lack control defining precisely what I want. In particular, if you do code extraction, you do not want to use inversion-generated convoluted code. But up until today I had little success. Here is the first, and very modest, success story: retrieving the head of a non-empty length-indexed list, and discharging the Nil case with dependent pattern-matching:

(* Consider lists indexed by length. *)
Inductive List (t : Type) : nat -> Type :=
| Nil : List t 0
| Cons : forall n, t -> List t n -> List t (S n).
(* Every non-empty list has a head element. *)
Definition Head : forall t n, List t (S n) -> t.
intros t n list. inversion list. assumption. Defined.
(* In the proof, "inversion" tactic saved the day, eliminating the
"Nil" case. However, the body of the proof term is not very
pretty. Consider: *)
Print Head.
(*
Head =
fun (t : Type) (n : nat) (list : List t (S n)) =>
let X :=
match list in (List _ n0) return (n0 = S n -> t) with
| Nil =>
fun H : 0 = S n =>
let H0 :=
eq_ind 0 (fun e : nat => match e with
| 0 => True
| S _ => False
end) I (S n) H in
False_rect t H0
| Cons n0 X X0 =>
fun H : S n0 = S n =>
let H0 :=
let H0 :=
f_equal (fun e : nat => match e with
| 0 => n0
| S n1 => n1
end) H in
eq_rect n (fun n1 : nat => t -> List t n1 -> t)
(fun (X1 : t) (_ : List t n) => X1) n0 (eq_sym H0) in
H0 X X0
end in
X (eq_refl (S n))
: forall (t : Type) (n : nat), List t (S n) -> t
*)
(* Whoah, what a proof! Can we do better manually? Yes, if we find a
way to discharge the "Nil" case. Dependent pattern-matching to the
rescue: *)
Definition HeadManual t n (list: List t (S n)) : t :=
match list in List _ k return match k with
| O => unit
| S _ => t
end with
| Nil => tt
| Cons _ x _ => x
end.
(* This worked out because "O" case never matches, that is, we
exploited the contradiction and were allowed to return unit. The
automated proof was more explicit about it, deriving an arbitrary
value from "0 = S n" which implies "False". This example is very
simple and in a sense we were lucky here. For more involved worked
examples see this CPDT chapter:
http://adam.chlipala.net/cpdt/html/DataStruct.html *)

Monday, December 5, 2011

Stack-Allocated Lists

Don Syme and several others sent tweets to correct my claims about boxing in F#. I made the mistake of assuming that structs cannot be used as interface implementations without being boxed and allocated on the heap. As it turns out, .NET constraints allow structs to implement interfaces without being boxed. Out of sheer curiosity, I attempted to (ab)use this facility to systematically remove heap allocation. I decided to write a simple functional program that would not use the heap at all, performing all computation on the stack.

I went for encoding lists and implementing list reversal:

type IList<'T> =
abstract member Tail : #IListConsumer<'T,'R> -> 'R
abstract member Head : 'T
abstract member IsEmpty : bool
and IListConsumer<'T,'R> =
abstract member Consume : #IList<'T> -> 'R
[<Struct>]
type Nil<'T> =
interface IList<'T> with
member this.Head = failwith "Empty"
member this.IsEmpty = true
member this.Tail k = failwith "Empty"
[<Struct>]
type Cons<'T,'L when 'L :> IList<'T>>(head: 'T, tail: 'L) =
interface IList<'T> with
member this.Head = head
member this.IsEmpty = false
member this.Tail k = k.Consume tail
let reverse list =
let rec rev acc list =
match list with
| [] -> acc
| x :: xs -> rev (x :: acc) xs
rev [] list
[<Struct>]
type ListSum(sum: int) =
member this.Consume(list: #IList<int>) =
if list.IsEmpty then sum else
list.Tail (ListSum(sum + list.Head))
interface IListConsumer<int,int> with
member this.Consume list = this.Consume list
[<Struct>]
type ListPrinter<'T> =
interface IListConsumer<'T,int> with
member this.Consume list =
if list.IsEmpty then 0 else
stdout.WriteLine list.Head
list.Tail this
type Lists =
static member Reverse<'R,'T,'L1,'L2,'K when 'L1 :> IList<'T>
and 'L2 :> IList<'T>
and 'K :> IListConsumer<'T,'R>>
(acc: 'L1) (list: 'L2) (k: 'K) =
if list.IsEmpty
then k.Consume acc
else list.Tail (ListReverser(list.Head, acc, k))
static member Sum (list: #IList<int>) =
ListSum(0).Consume(list)
static member ReverseSum (list: #IList<int>) =
Lists.Reverse (Nil ()) list (ListSum(0))
static member ReversePrint list =
Lists.Reverse (Nil ()) list (ListPrinter())
and [<Struct>] ListReverser<'T,'R,'L,'K when 'L :> IList<'T>
and 'K :> IListConsumer<'T,'R>>
(head: 'T, acc: 'L, k: 'K) =
interface IListConsumer<'T,'R> with
member this.Consume tail = Lists.Reverse (Cons (head, acc)) tail k
#time
let r = ref 0
!r
for i in 1 .. 10000000 do
let list = Cons(1, Cons(2, Cons(3, Cons (4, Cons (5, Nil())))))
r := !r + Lists.ReverseSum list
for i in 1 .. 10000000 do
let list = [1; 2; 3; 4; 5]
r := !r + List.sum (List.rev list)


This method for getting rid of heap allocation would not work in C as this particular use of generics implies runtime code generation. As far as I understand, the CLR here generates new code paths for every possible list length.

My little benchmark has stack-allocated lists of 5 elements each being reversed a bit slower than heap-allocated lists. But it is fun to try. FSI reports zero GC usage.