Saturday, March 14, 2009

the ultimate language

Periodically, a manager will mandate that I perform my work using some particular language or technique (buzzword). I usually comply by writing (adapting) a translator from SCM to that language -- Aubrey Jaffer

Friday, March 13, 2009

ANN: open lecture series in algorithms in Kyiv

The lecture series will be based on Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein, with supplementary materials from MIT-OCW.

Instructors: Oleg Smirnov, Ivan Veselov

The schedule is to be confirmed yet.

First lecture: Saturday, March 14, 17:00

Venue: G-Club (courtesy of GlobalLogic), Bozhenko 86D, Kyiv, Ukraine

Mailing list (russian): kiev-clrs/googlegroups.com

Sunday, March 8, 2009

Pandoc. Can we use it ouside of Haskell?

Pandoc is great. It a Markdown implementation on steroids, supporting a number of different input/output formats besides HTML, such as TeX. And yet it is Markdown: the source still looks great, just like plain email! As an extra bonus, Pandoc, being written in Haskell and compiled to native code, works a lot faster than many alternative markup language processors.

The problem is.. It is a Haskell library. If it is indeed the great software it seems to be, one would want to use it from scripting languages (PHP, Python, Perl), OCaml, Schemes, Java, .NET and so on.

The first step towards that is to bind it to C.

Good news: yes, it is possible, and Pandoc DLL or Shared Object builds on both Windows and Linux. It is callable from other languages (currently C and PLT Scheme).

Bad news: the binary is large, as GHC compiles everything, including its runtime, into it. The binding is untested, poorly configured and very incomplete. But, well, it is open-source and will get better with time. You can see for yourself: libpandoc

Saturday, March 7, 2009

learning to coax haskell with phantom types

The problem: write a function maxBits that will return 8 for Word8, 16 for Word16, 32 for Word32, and so on. Actually this is a function from types to values. The best I came up with is below:
maxBits :: (Bits a, Bounded a) => a -> Int 
maxBits x = bitSize m where 
    m = maxBound
    _ = x == m
Note the last line. It constrains x and m to be of the same type. More generally, it could be done for example thus:
sameType :: a -> a -> ()
sameType = undefined

maxBits :: (Bits a, Bounded a) => a -> Int 
maxBits x = bitSize m where 
    m = maxBound
    _ = sameType x m
With the maxBits we can do the following, which was what I was after.
toBits :: (Bounded a, Bits a) => a -> [Bool]
toBits x = [testBit x i | i <- [0 .. maxBits x - 1]] 

ofBits :: Bits a => [Bool] -> a
ofBits bits = foldr f 0 bits where
    f True n  = shiftL n 1 + 1
    f False n = shiftL n 1

propWord8Bits :: Int -> Bool
propWord8Bits x = y == ofBits (toBits y) where
    y :: Word8
    y = fromIntegral (x `mod` 256)

propWord16Bits :: Int -> Bool
propWord16Bits x = y == ofBits (toBits y) where
    y :: Word16
    y = fromIntegral (x `mod` 256)

propWord32Bits :: Int -> Bool
propWord32Bits x = y == ofBits (toBits y) where
    y :: Word16
    y = fromIntegral (x `mod` 256)
However, from the theoretical viewpoint, I am rather uncertain I understand the picture completely. Suppose there are uninhabited types:
data A
data B
Let's give them some properties:
class TShow a where
  showType :: string
But this is rejected by the compiler as showType does not mention a... Is there a way around? Yes. For example:
data Shown a = Shown String

class TShow a where
  showType :: Shown a

instance TShow A where
  showType = Shown "A"

instance TShow B where
  showType = Shown "B"
But when we want to write a function tshow that accepts a type as an argument, we have to either use showType and pass that argument in the return type (yes, lovely Haskell permits passing information through the return type), or make believe the type is inhabited, but to do it well we need to constrain the type of showType, which is tricky:
tshow :: TShow a => a -> String
tshow a = s where
    shown   = showType
    Shown s = shown                           
    constrain :: a -> Shown a -> ()
    constrain = undefined
    _ = constrain a shown
Then, we can use tshow like this:
tshow (undefined :: A)
Does it makes sense? Hm, not really. Passing the type through the return type makes more sense:
let (Shown x) = (showType :: Shown A) in x
is a better design.. So something like this should do instead:
data MaxBits a = MaxBits Int

maxBits :: (Bits a, Bounded a) => MaxBits a 
maxBits = r m where 
    m  = maxBound
    r :: Bits a => a -> MaxBits a
    r m = MaxBits (bitSize m)
I am still having this cognitive difficulty coaxing GHC to type expressions like maxBits. It does not seem natural. Maybe I am missing something beautiful right under my nose?

Friday, March 6, 2009

F#: Haskellers beware!

Equational reasoning may fail you. F# land is the land where nuclear missiles are launched all the time, and puny gnomes carry wands of death.
#light

let g = new System.Random()
let s = Seq.init_infinite (fun _ -> g.NextDouble())

/// You thought 'a seq is Haskell [a]? Not quite! It does not memoize:
printfn "%A" (Seq.hd s, Seq.hd s) (* Surprise! fst and snd differ! *)


type X = 
    static member X = g.NextDouble()

/// You thought X is a value? Not quite! It is a property (think function):
printfn "%A" (X.X, X.X) (* Surprise! fst and snd differ! *)

Thursday, March 5, 2009

F# interface implementation - is `unit` special?

#light

type 'a A = 
  interface
    abstract member A : unit -> 'a
  end

let x = { new int  A with member x.A() = 1 }  // ALLOWED
let y = { new unit A with member x.A() = () } // ERROR?

Cyrillic typesetting in LaTeX on Arch

Write foo.tex:
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage[russian]{babel}
\usepackage{ucs}

\begin{document}
Привет, мир!
\end{document}

Then pdflatex foo.tex.

Arch package required: texlive-langcyrillic.

Wednesday, March 4, 2009

F# Quirks

A feature or a bug?

An innocent-looking pair of parentheses ends up influencing the runtime representation of discriminated union types.

I would prefer not to have this. The two representations seem conceptually redundant, it would be better to just pick the faster one.

Given this foo.fs:

#light

open Microsoft.FSharp.Reflection

type A1 = A1 | B1 of (A1 * A1)

type A2 = A2 | B2 of A2 * A2


printfn "%A" ((FSharpType.GetUnionCases(typeof<A1>)
              |> Array.map (fun x -> x.GetFields())));;

printfn "%A" ((FSharpType.GetUnionCases(typeof<A2>)
              |> Array.map (fun x -> x.GetFields())));;

Run fsi --quiet --exec foo.fs to produce this:

[|[||]; [|Microsoft.FSharp.Core.Tuple`2[FSI_0001.Foo+A1,FSI_0001.Foo+A1] B11|]|]
[|[||]; [|FSI_0001.Foo+A2 B21; FSI_0001.Foo+A2 B22|]|]

I found this code while testing a library that produced different results depending on the presence or absence of the parentheses.

Update: A feature! Thanks to Zedd. I should know my specs and my OCaml:

Objective Caml version 3.10.2
# type c = C of int * int;;
type c = C of int * int
# match C(1, 2) with | C x -> x;;
The constructor C expects 2 argument(s), but is here applied to 1 argument(s)
# type c = C of (int * int);;
type c = C of (int * int)
# match C(1, 2) with | C x -> x;;
- : int * int = (1, 2)
#