ha4: An F# Web Server From Sockets and Up

Monday, November 7, 2011

I have implemented a simple web server in F#. The idea was to marry .NET asynchronous socket operations with F# async. The result: F# async seems to be the right tool for implementing a web server. It makes asynchronous programming intuitive without adding too much performance overhead. The server handles 3500 keep-alive or 1000 regular requests per second on my Core i3 machine, compared to 2500 keep-alive and 500 regular requests per second for IIS, and 500 regular requests per second for System.Net.HttpListener.

Asynchronous Socket Operations

Working with sockets in .NET is done with the Socket class. From the MSDN documentation, the recommended approach is to use the asynchronous methods such as AcceptAsync, SendAsync and ReceiveAsync. These methods register callbacks to be executed when data arrives or ships through the socket. As a result of the callback approach, no threads are blocked by slow connections.

Sockets and F# Async

Unfortunately, the default interface is not very intuitive. The example code is atrocious. Since the operations are callback-based, they seem like a good match for F# async. I went for the first mapping that came to mind:
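The listing is not reproduced in this copy. As a minimal sketch of what such a mapping could look like, assuming one Async-returning operation per callback-based method (the interface name IAsyncSocket and its member names are hypothetical, not taken from the original code):

```fsharp
open System
open System.Net.Sockets

/// Hypothetical shape of the mapping: each callback-based socket
/// operation becomes a function that returns an Async value.
type IAsyncSocket =
    /// Waits for an incoming connection and yields the accepted socket.
    abstract AsyncAccept : unit -> Async<Socket>
    /// Receives into the given buffer section and yields the byte count.
    abstract AsyncReceive : ArraySegment<byte> -> Async<int>
    /// Sends the contents of the given buffer section.
    abstract AsyncSend : ArraySegment<byte> -> Async<unit>
```

Buffers are passed as ArraySegment<byte> so that a caller can hand over a section of a larger array instead of allocating a new one per operation.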

Implementing this interface is easy - it is just working around the boilerplate: creating a SocketAsyncEventArgs object, registering a callback, calling the method, checking the result for errors. I was able to express all of it in a single helper method:
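The helper itself is not reproduced in this copy. A minimal sketch of the idea, assuming Async.FromContinuations as the bridge between the Completed callback and F# async (the names asyncDo, AsyncAccept and AsyncReceive are illustrative, not the original code):

```fsharp
open System
open System.Net.Sockets

/// Wraps one SocketAsyncEventArgs-based operation as an F# async.
/// `start` begins the operation (for example a call to ReceiveAsync),
/// `prepare` configures the arguments, and `select` reads the result.
let asyncDo (start: SocketAsyncEventArgs -> bool)
            (prepare: SocketAsyncEventArgs -> unit)
            (select: SocketAsyncEventArgs -> 'T) : Async<'T> =
    Async.FromContinuations(fun (ok, error, _cancel) ->
        let args = new SocketAsyncEventArgs()
        prepare args
        let finish () =
            // Read everything needed before disposing the arguments.
            let outcome =
                if args.SocketError = SocketError.Success then
                    Choice1Of2 (select args)
                else
                    Choice2Of2 (SocketException(int args.SocketError) :> exn)
            args.Dispose()
            match outcome with
            | Choice1Of2 value -> ok value
            | Choice2Of2 e -> error e
        args.Completed.Add(fun _ -> finish ())
        // A false return value means the operation completed synchronously
        // and the Completed event will not be raised.
        if not (start args) then finish ())

// Illustrative extension members built on the helper.
type Socket with
    member this.AsyncAccept() : Async<Socket> =
        asyncDo (fun a -> this.AcceptAsync(a)) ignore (fun a -> a.AcceptSocket)
    member this.AsyncReceive(buffer: ArraySegment<byte>) : Async<int> =
        asyncDo (fun a -> this.ReceiveAsync(a))
                (fun a -> a.SetBuffer(buffer.Array, buffer.Offset, buffer.Count))
                (fun a -> a.BytesTransferred)
```

With these extensions, a receive inside an async block reads as `let! count = socket.AsyncReceive(buffer)`. The sketch allocates a fresh SocketAsyncEventArgs for every operation and does no pooling.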

Optimizations

It seems that the common optimization paths include pooling sockets, pooling SocketAsyncEventArgs, and pooling buffers to prevent memory fragmentation. The last point is the most interesting. The socket code itself is written in unmanaged C, and data is passed between garbage-collected .NET code and C code by pinning the .NET array used as a buffer. A pinned array is never relocated by the garbage collector, so the C code has no trouble finding it. Having many pinned arrays to work around makes the garbage collector's job harder: memory gets fragmented.

To avoid fragmentation issues, instead of allocating a lot of small arrays I allocate one huge array and then lease sections of it to be used as buffers by the socket operations.
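As a minimal sketch of this leasing scheme, assuming fixed-size sections and a simple lock-protected free list (the BufferPool type and its Lease and Return members are hypothetical names, not the original code):

```fsharp
open System
open System.Collections.Generic

/// Hypothetical buffer pool: one large array split into fixed-size sections.
type BufferPool(sectionSize: int, sectionCount: int) =
    // A single backing array shared by all socket operations.
    let storage : byte[] = Array.zeroCreate (sectionSize * sectionCount)
    // Offsets of the sections that are currently free.
    let free = Stack<int>()
    do for i in 0 .. sectionCount - 1 do free.Push(i * sectionSize)

    /// Leases a free section; fails if the pool is exhausted.
    member this.Lease() : ArraySegment<byte> =
        lock free (fun () ->
            if free.Count = 0 then failwith "buffer pool exhausted"
            ArraySegment<byte>(storage, free.Pop(), sectionSize))

    /// Returns a previously leased section to the pool.
    member this.Return(section: ArraySegment<byte>) =
        lock free (fun () -> free.Push(section.Offset))
```

A connection handler would call Lease before a receive and Return once the data has been consumed, so the same sections are reused across requests.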

I have not yet tried pooling `Socket` or `SocketAsyncEventArgs` objects in a similar manner.

Benchmarks

For benchmarking I used the Apache Bench (ab) tool running on Arch Linux inside a VirtualBox VM. All benchmarks involved dynamically generating and serving a "HELLO, WORLD" document on my Core i3 laptop, with ab -k -c 1000 -n 10000:

Server	Keep-alive r/s	Regular r/s
F# WebServer	3500	1000
Haskell warp/wai GHC 7	3500	3500
IIS	2500	500
System.Net.HttpListener	?	500
node.js (Windows)	800	400
node.js (Linux)	?	3000

I do not feel very good about these numbers, in particular because I have seen claims of Haskell WARP doing 90000 r/s on only slightly faster hardware (8-core Core i5). It may be that I am hitting VirtualBox networking overhead or I have not built the Haskell code with proper flags.

But for what they are worth, the numbers seem to indicate that F# async is a good enough foundation for web servers with performance in the IIS league. It does not need to be faster, it just needs to be good enough. The real advantage is that F# async code is tremendously easier to read and write than explicit callback code.

EDIT: Please do take the benchmarks with a grain of salt. They are far from comprehensive or correctly done.

9 comments:

Ryan, November 8, 2011 at 11:43 AM
Anton, why don't you join our effort on fracture? We already have a SocketAsyncEventArgs pool, which should help with both memory and the waste of constantly adding and removing event handlers. That should improve the F# performance, especially in the areas of memory and GC shred.

Ryan, November 8, 2011 at 11:46 AM
I hadn't even considered using Async.FromContinuations. This is what I had tried, though there exists a gap between calling the async method and hooking up the event handler: https://github.com/fsharp/fsharpx/commit/69594bcb69c532edeb67cd8cd1f6d14992f7fe4c

Dave Thomas, November 8, 2011 at 12:24 PM
There may be a slight performance deficit with that. We need to be careful: the whole point of SAEA is to avoid the fragmentation of AsyncResult objects. Micro-optimisations are key in high-usage areas.

Dave Thomas, November 8, 2011 at 12:27 PM
Ryan, Anton, just for reference: Fracture roughly matches the keep-alive rate with its Hello World response implementation, and that's with header and body parsing too.

Unknown, November 8, 2011 at 12:38 PM
@ryan, one (bad) reason is that I am pathetically unable to work with
other people :) Also we have slightly different requirements for the
webserver than Fracture currently has (such as fewer dependencies,
.NET 3.5).

Concerning Async.FromContinuations, my first guess is that it does not
really matter whether you use that or your approach. It probably has
roughly the same performance. I will give it a try.

@dave, concerning fragmentation, do SAEA objects get pinned or just
the buffers? If only the buffers, is there any reason to pool these
SAEA objects? Is it just to avoid GC pressure for allocating
short-lived objects?

Good to know about Fracture. I did expect it to have about the same
performance or slightly faster (within 10x). I am not really doing
anything better, if anything I do fewer optimizations, and less work.
But I do parse the headers and the body.

Dave Thomas, November 8, 2011 at 1:04 PM
The pool of SAEA objects is mainly there to share preallocated buffers and callbacks into the infrastructure.

Going forward, the plan is to put in an iteratee-based parser (if Ryan gets a move on :-) ) and a pipeline-based processing mechanism.

Unknown, November 8, 2011 at 1:11 PM
Right. I guess I share the buffers but not the callbacks. For all I care, let GC gen-0 do the job :)

Good luck with the iteratee parsing, can be fun.

For the use cases I envision in WebSharper it would be overkill though, I think we are fine with conventional parsing - pushing request head into a MemoryStream, and another Stream for the body if necessary.

I do give the hosted applications a chance to write the response inside an async loop. On the request side, I cannot think of an app that would want to consume the request asynchronously.

Ed Korsberg, November 9, 2011 at 9:59 AM
First off, thank you for posting this. I am still learning F# but so far many of the example programs I have seen do not seem particularly different from other .NET approaches. We all know that VB.NET and C# are so similar that there are plenty of tools that transform VB<->C#

My comment, after briefly reviewing the F# code above, is that the equivalent code could be written in C# and would look surprisingly similar, in my opinion, though I admit I have not tried this. Would you agree with my assessment of this VB<->C#<->F# transliteration?

Unknown, November 9, 2011 at 10:24 AM
Hi Ed, I am by no means an expert on C#/VB, so I am not sure I am qualified to answer this. For what it is worth, I think that while C# and VB make it *possible* to express exactly the same computational process, F# async makes for a syntax that is the easiest to read and to write.

F# async syntax is about as close as it gets to being perfect for this task. See, for example, the main loop of the server in this gist. This nice sequential-looking code de-sugars to callback code with error handling and Disposable pattern support.

In old C# this was a pain to write. The new C# has some sort of async syntax that I am not very familiar with that might get close. I have not heard of VB having anything like this.