Sunday, July 5, 2009

posting formulas in blogger

Posting math in Blogger posts can be tricky.

Luckily, images can be inlined in HTML:

<img src="data:image/png;base64,iVBORw0KGgoAAAA...

This markup is understood by the major browsers, including Internet Explorer from version 8 on (earlier IE versions do not support data: URIs).

Knowing that, one can set up a toolchain that extracts LaTeX math declarations from source, processes them with TeX, and embeds the resulting PNGs in the HTML output.
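The embedding step at the end of such a toolchain can be sketched in a few lines of Python. This is only an illustration, not the code I actually use: the function name `png_to_img_tag` and the `alt` parameter are made up for this example, and the PNG bytes are assumed to come from whatever renders the TeX.

```python
import base64
from html import escape


def png_to_img_tag(png_bytes: bytes, alt: str = "") -> str:
    """Return an <img> element that carries the PNG inline as a data: URI.

    Hypothetical helper for illustration; png_bytes is assumed to be the
    output of a separate TeX-to-PNG rendering step.
    """
    # Base64 output is plain ASCII, so it is safe inside an HTML attribute.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    # Escape the alt text so TeX source containing < or " cannot break the tag.
    return '<img src="data:image/png;base64,{}" alt="{}">'.format(
        encoded, escape(alt, quote=True)
    )
```

Called with the bytes of a rendered PNG and the TeX source as `alt`, this returns a single tag that can be pasted straight into the post body, with no separate image upload.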

For example, the following:

$$\begin{align}
  \left(1+x\right)^n  =& 1 + nx + \frac{n\left(n-1\right)}{2!}x^2 +\\
  +& \frac{n\left(n-1\right)\left(n-2\right)}{3!}x^3 +\\
  +& \frac{n\left(n-1\right)\left(n-2\right)\left(n-3\right)}{4!}x^4 +\\
  +& \dots
\end{align}$$

Becomes:

\begin{align}
  \left(1+x\right)^n  =& 1 + nx + \frac{n\left(n-1\right)}{2!}x^2 +\\
  +& \frac{n\left(n-1\right)\left(n-2\right)}{3!}x^3 +\\
  +& \frac{n\left(n-1\right)\left(n-2\right)\left(n-3\right)}{4!}x^4 +\\
  +& \dots
\end{align}

Unfortunately, Pandoc does not do this inlining, nor does it interface with TeX. This is a design decision, as Pandoc tries to be zero-dependency.

The solution I came up with involves hacking the texvc OCaml program that ships with MediaWiki, the software behind Wikipedia. I now have it do XML processing, replacing code of the form [EQ]\frac{1}{2}[/EQ] with an embedded image. The whole toolchain is unfortunately still quite ugly, and looks like this:

pandoc --standalone --no-wrap --gladtex "$@" \
    | xmllint --dropdtd --recover - 2>/dev/null \
    | texmi \
    | tidy --show-body-only true - 2>/dev/null \
    | pandoc -f html -t html --no-wrap

So the first step converts from markdown to HTML with [EQ]-style mathematics (--gladtex). The output is then run through xmllint so that the XML parser in the next stage does not choke on it, texmi renders the mathematics into HTML or inline images, and finally tidy makes sure the XML output is valid HTML. The last line is only necessary because of Blogger's non-standard whitespace handling.
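To make the substitution step more concrete, here is a rough Python sketch of the idea behind it. It is not the modified texvc code: it matches the literal [EQ]...[/EQ] form used in this post with a regular expression instead of doing real XML processing, and `replace_equations` and `placeholder_render` are names invented for the example. The placeholder renderer does not call TeX at all; it only marks where a real renderer would plug in.

```python
import re
from html import escape
from typing import Callable

# Non-greedy match so that two equations on one line stay separate;
# DOTALL lets an equation span several lines.
EQ_PATTERN = re.compile(r"\[EQ\](.*?)\[/EQ\]", re.DOTALL)


def replace_equations(html: str, render: Callable[[str], str]) -> str:
    """Replace every [EQ]...[/EQ] span with whatever render() returns."""
    # A function is used as the replacement so that backslashes in the
    # rendered markup are not interpreted as regex escape sequences.
    return EQ_PATTERN.sub(lambda match: render(match.group(1)), html)


def placeholder_render(tex: str) -> str:
    """Hypothetical stand-in for the real TeX-to-image step."""
    return '<span class="math">{}</span>'.format(escape(tex))


if __name__ == "__main__":
    print(replace_equations(r"Half is [EQ]\frac{1}{2}[/EQ].", placeholder_render))
```

Passing the renderer in as an argument keeps the text substitution separate from the expensive part, so the same loop works whether the renderer emits HTML or an inline image.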

I will try to improve on this as time permits.
