Rendering TeX online: A smooth workflow for publishing mathematics and science on the web

Problem

  1. Science and mathematics have traditionally been published in journals and letters. Individual scientists and mathematicians devote a significant amount of time to rendering their ideas and results in “camera-ready” formats. Doing this efficiently and effectively is a skill that requires significant tool support. The tool of choice for many is LaTeX/TeX. These tools give the fine-grained control required to express complex ideas, and their syntax, semantics and libraries represent decades of evolution under the close attention of skilled practitioners.
  2. The web is now the fastest, most effective way for humans to share ideas and information.
  3. LaTeX/TeX on the web is terrible.

The first two points are uncontroversial and I think I can convince you of the third.

In this article I explain my solution to the problem of expressing maths and science on the web. It is the system I use for this site and it is a huge improvement on how I used to do things.

How I used to do things

There were three technologies I used to rely on for getting my work into “web-ready” form:

  • converters such as latex2html and hyperlatex;
  • JavaScript libraries;
  • web services such as texify.

All of them suffer from the same problem: they don’t support the packages I use. Here are the package definitions for some recent work of mine, after removing pdf-specific things such as geometry and bookmarks:

% document settings
\usepackage[usenames,dvipsnames]{color}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{ifthen}
\usepackage{subfig}
\usepackage{listings,setspace}
\usepackage{tabulary}
\usepackage{supertabular}
\usepackage{mathpartir}

% listing settings
\lstset{                  % general command to set parameter(s)
  basicstyle=\ttfamily,   % print whole listing in tt font
  keywordstyle=,          % nothing happens
  identifierstyle=,       % nothing happens
  commentstyle=,          % nothing happens
  stringstyle=,           % nothing happens
  breaklines=true,        % sets automatic line breaking
  mathescape=true,        % allow $...$ maths inside listings
  showstringspaces=false, % no special string spaces
  literate={->}{{$\rightarrow\;\;$}}2 {\\}{{$\lambda$}}1 {|>}{{$\triangleright$}}1
   {forall}{{$\forall$}}1 {exists}{{$\exists$}}1
   {(-)}{{$\circleddash$}}1
}

As well as these packages, I often make heavy use of xypic. Putting aside the custom listings setup, were I to try latex2html/hyperlatex with this list of packages I would hit lots of incompatibilities, because these tools support only a subset of what a typical LaTeX distribution (such as TeX Live) provides. It is hard enough working out how to render what I want in LaTeX without also worrying about which subset of LaTeX will work. For the parts that don’t work, you need to create workarounds. Eek! The same applies to the JavaScript libraries and the web services (like texify).

Many of these tools are also buggy, or at least behave differently from TeX Live. Sometimes this is because they are trying to render TeX in HTML, but sometimes it is just … well, I don’t know. If you have used TeX for any length of time, you know that debugging it is hard and you need every trick you can get your hands on. Opaque, arm’s-length tools like texify make this harder. For example, here is the very first thing I tried to typeset for an article on type system proofs:

\begin {align*}
\llbracket \forall t. \sigma \rrbracket & = \forall t. \llbracket \sigma \rrbracket \\
\llbracket t \rrbracket      & = t  \\
\llbracket \tau \rightarrow \nu \rrbracket & = \llbracket \tau \rrbracket \rightarrow \llbracket \nu \rrbracket \\
\llbracket T\ \tau_1 \cdots \tau_n \rrbracket & = T\ \llbracket \tau_1 \rrbracket \cdots \llbracket \tau_n \rrbracket
\end {align*}

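Using texify, I unexpectedly got:

[Image: texify’s rendering of the equations above]

while my local LaTeX gave me what I wanted:

[Image: the same equations rendered by my local LaTeX install]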

What went wrong with texify? I am sure there is a reasonable explanation, but I learnt TeX, not texify. I have enough trouble with TeX on my own machine.

WordPress

In Australia, when you think maths, you think Terence Tao. Terence has a blog that includes reams of LaTeX and renders it very nicely. It uses what is probably the best way to get TeX online at the moment: WordPress. If it is good enough for Terence, it should be good enough for me. However, WordPress suffers from the same problem as the tools above: it supports only three packages. They certainly cover a lot of ground, but one I use quite a lot, mathpartir, is not amongst them.

Just generate a pdf

There is an obvious solution: I could publish everything as pdf. There are some problems with that, though. Firstly, people don’t seem to “browse” pdfs on the web the way they browse web pages. Sites that aggregate interesting reading material, such as reddit, even warn people when a link points to a pdf instead of an HTML page.

More importantly, pdfs are really difficult to read on mobile devices. Because the page size is fixed, if your screen is smaller than the page you have a lot of scrolling to do.

Finally, pdfs are static, while web pages are living documents. Commenting on articles has become a huge part of discourse on the web. In my opinion, embedding comments in an article is superior to having a separate thread, and it is something you can’t do with pdfs.

Summing up my requirements

  • I need to use all my existing tools and packages, including the macros I have built up over the years.
  • I need to generate HTML pages including comments.

My solution

In short, my solution is to statically generate HTML documents using nanoc, using my local TeX install to generate images for any material that does not render easily in HTML. The result is a static website that can be hosted anywhere. I use web services for dynamic page content such as comments.

Take this page as an example. On my computer is a file called online_tex.md containing the source for this article. That file is mostly Markdown-formatted text, but it includes an align* environment containing the LaTeX shown above. When nanoc processes the page, it finds that block and passes it to my local LaTeX distribution to be rendered as an image. I can use any local package and any custom command or environment, and I choose the font (in this case mathpazo, to match the Palatino font of the article). I get exactly the LaTeX rendering I want, as an image. My nanoc code takes care of putting that image in the right place and, in the generated page, replacing the LaTeX from online_tex.md with a link to the image file.

At this point I have not really improved on pdfs, since the page is static; in particular, there are no comments. nanoc takes care of making the site look dynamic (for example, it generates a list of articles), but I need an external provider for comments. As you can see below, I use Disqus, a web service that lets me embed a comment thread in an otherwise static page.

Advantages

  • You can host your pages anywhere: no databases, no content management system, no extra requirements.
  • You have all the LaTeX packages and commands/environments you are used to, and they work exactly as they always have.
  • Running nanoc is a lot like running LaTeX: you see all of LaTeX’s output, which helps when things go wrong.
  • The workflow with nanoc is very similar to working with LaTeX. With your source in one window, you run a compile command and your browser is automatically updated with the new page. This is exactly how I work using TextMate/Skim on my Mac, and how I have seen others work with Emacs/TeXShop/etc.

Disadvantages

  • People commenting on your article don’t have access to the same system. They can use, for example, texify, so I don’t think they are any worse off than they were before.
  • Comments on your site depend on a third party.
  • You have to learn how to use nanoc. I found this much simpler than learning the alternatives though.

The Code

The system I describe is not available out of the box anywhere; I had to customise nanoc and write a few scripts on my machine to make it work. Here I describe what I did, in case you want to do it too.

TeX Renderer

You will need to write a script that takes any file, embeds it in some LaTeX scaffolding (which includes all the packages, commands and environments you want to use) and runs TeX/LaTeX over the result. I do this with a script called texit, which I keep in my ~/bin directory. I won’t show you mine, since I wrote it while experimenting with some Haskell libraries and it is quite ridiculous. To start off with, though, you can create a simple texit script with no scaffolding that calls pdflatex directly; note that this only works if the file you pass it is already a complete LaTeX document:

#!/usr/bin/env bash

pdflatex "$1"

The script must be called texit, because our nanoc code calls it by that name, and the directory you keep it in (~/bin in my case) must be on your PATH. This version generates a full page, which you then need to crop down to a small image. We will use pdfcrop for this; make sure it is on your PATH too, and update texit to use it:

#!/usr/bin/env bash

# pdflatex writes its output to the current directory, so crop it from there
pdflatex "$1"
pdfcrop "$(basename "$1" .tex).pdf" "$2"
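Both versions above pass the file straight to pdflatex, so they only work if the file is already a complete document; the snippets the nanoc filter in the next section writes out (a bare environment or a $$…$$ span) are not. Here is a minimal sketch of a texit that does add scaffolding, written in Ruby to match the nanoc code. The preamble is an assumption standing in for your own packages, macros and fonts, and the argument convention (a path with or without .tex, then the output pdf) is inferred from how the filter calls texit. The \pagestyle{empty} line matters: without it, pdfcrop would keep the page number in the cropped image.

```ruby
#!/usr/bin/env ruby
# texit (sketch): wrap a LaTeX snippet in a preamble, run pdflatex, crop the result.
# Usage: texit SNIPPET[.tex] OUTPUT.pdf
require 'tmpdir'

# Assumed preamble: replace with your own packages, macros and fonts.
PREAMBLE = <<~'TEX'
  \documentclass{article}
  \usepackage{amsmath}
  \usepackage{amssymb}
  \usepackage{mathpazo}
  \pagestyle{empty}
TEX

# Wrap a bare snippet (an environment or $$...$$) in a complete document.
def wrap_in_scaffolding(snippet, preamble = PREAMBLE)
  "#{preamble}\\begin{document}\n#{snippet}\n\\end{document}\n"
end

# Render the snippet in +source+ to a cropped pdf at +output_pdf+.
# Returns true only if both pdflatex and pdfcrop succeed.
def texit(source, output_pdf)
  tex_path = source.end_with?('.tex') ? source : "#{source}.tex"
  snippet = File.read(tex_path)
  output_pdf = File.expand_path(output_pdf)
  Dir.mktmpdir do |dir|
    doc = File.join(dir, 'snippet.tex')
    File.write(doc, wrap_in_scaffolding(snippet))
    # Keep the .aux/.log/.pdf files in the temporary directory, and fail
    # instead of stopping to wait for input if the snippet has an error.
    ok = system('pdflatex', '-interaction=nonstopmode',
                "-output-directory=#{dir}", doc)
    ok && system('pdfcrop', File.join(dir, 'snippet.pdf'), output_pdf)
  end
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  exit(texit(ARGV[0], ARGV[1]) ? 0 : 1)
end
```

Running pdflatex in a temporary directory also avoids the basename juggling of the bash version, since nothing is written to the directory nanoc runs in.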

Running texit on a file now produces a small pdf of the rendered LaTeX, but we need a web-ready image format, so make sure you have ImageMagick installed before moving on to the next part.
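The filter in the next section rasterises each cropped pdf with ImageMagick using -density 500 followed by -resize 20%. The arithmetic behind that pair is worth spelling out. -density sets the resolution at which the pdf is rasterised (it comes before the input file because it is a read setting), and the resize then scales width and height to 20%, so the final image ends up at an effective 500 × 0.2 = 100 pixels per inch. My reading, which the original does not state, is that rendering large and scaling down is meant to give smoother edges than rendering at 100 dpi directly. The hypothetical helpers below simply make the argument list and the arithmetic explicit:

```ruby
# Hypothetical helpers spelling out the ImageMagick step used by the texit filter.

# Argument list for ImageMagick's convert. -density comes before the input file
# because it controls the resolution at which the pdf is read (rasterised).
def convert_command(pdf, image, density: 500, resize_percent: 20, quality: 100)
  ['convert', '-density', density.to_s, pdf,
   '-resize', "#{resize_percent}%", '-quality', quality.to_s, image]
end

# Pixels per inch of the original pdf that survive in the final image.
def effective_dpi(density, resize_percent)
  density * resize_percent / 100.0
end

# Example: system(*convert_command('snippet.pdf', 'snippet.jpg')) runs the conversion.
```

For example, effective_dpi(500, 20) is 100.0; raising the resize to 40% would give 200.0, producing images with twice the pixel dimensions.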

nanoc filter

The final piece of the puzzle is a nanoc filter that looks for LaTeX in a source file, passes it off to texit and then converts the result with ImageMagick. Doing this naively, however, slows site compilation down terribly: as you build up content you end up compiling hundreds of little TeX snippets, which takes a very long time. The code below names each TeX snippet after a hash of its contents and only regenerates the image if the contents have changed. Caching (memoising) the LaTeX images this way makes site compilation very fast. It also means that while you are drafting a document you can recompile the page after every change and get quick feedback. The filter looks for two kinds of LaTeX delimiter:

  • \begin and \end, allowing you to embed any latex environment
  • double dollar signs ($$), allowing you to put maths inline.
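To see exactly what these two delimiters pick up, here is a small standalone sketch that applies the same regular expressions as the filter below to a sample string. find_snippets is a hypothetical helper, not part of nanoc; like the filter, it handles environments first, so $$ inside an environment is not matched a second time. Note that the environment pattern assumes environments of the same name are not nested, and the $$ pattern does not span lines.

```ruby
# The two patterns used by the texit filter.
ENV_PATTERN    = /\\begin\{(.*?)\}(.+?)\\end\{\1\}/m # \begin{env} ... \end{env}, may span lines
INLINE_PATTERN = /\$\$(.+?)\$\$/                      # $$ ... $$ on a single line

# Hypothetical helper: list the snippets the filter would send to texit.
# Environments are removed first, just as the filter replaces them first.
def find_snippets(text)
  envs = []
  rest = text.gsub(ENV_PATTERN) { |m| envs << m; '' }
  inline = rest.scan(INLINE_PATTERN).map { |captures| '$$' + captures[0] + '$$' }
  envs + inline
end

sample = "Intro $$x^2$$ and\n\\begin{align*}\na &= b\n\\end{align*}\nend $$y$$"
p find_snippets(sample)
# prints ["\\begin{align*}\na &= b\n\\end{align*}", "$$x^2$$", "$$y$$"]
```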
# texit filter
require 'digest/md5'
require 'fileutils'

module Nanoc3::Filters
  class Texit < Nanoc3::Filter
    identifiers :texit

    def run(content, params={})
      # digests of the images this page still needs; any others are deleted below
      req_images = []
      image_dir = File.join(File.dirname(@item.reps[0].raw_path), 'images', '')
      FileUtils.mkdir_p(image_dir)
      # any environments: \begin{env} ... \end{env}
      after_envs = content.gsub(/\\begin\{(.*?)\}(.+?)\\end\{\1\}/m) { |m|
        digest = render_snippet(m, image_dir, req_images)
        "<div class='math'><img src='images/" + digest + ".jpg'/></div>"
      }
      # any inline math: $$ ... $$
      after_inline = after_envs.gsub(/\$\$(.+?)\$\$/) { |m|
        digest = render_snippet(m, image_dir, req_images)
        "<span class='math'><img src='images/" + digest + ".jpg'/></span>"
      }
      # remove images (and their .tex/.pdf sources) that are no longer needed
      Dir.entries(image_dir).each { |d|
        path = image_dir + d
        next if File.directory?(path)
        File.delete(path) unless req_images.include?(File.basename(d, '.*'))
      }
      after_inline
    end

    private

    # Render +snippet+ to images/<md5>.jpg via texit and ImageMagick, unless that
    # image already exists (the cache), and return the digest.
    def render_snippet(snippet, image_dir, req_images)
      digest = Digest::MD5.hexdigest(snippet)
      image_loc = image_dir + digest
      req_images << digest
      unless File.exist?(image_loc + ".jpg")
        File.open(image_loc + ".tex", 'w') { |f| f.write(snippet) }
        system("texit", image_loc, image_loc + ".pdf")
        system("convert", "-density", "500", image_loc + ".pdf",
               "-resize", "20%", "-quality", "100", image_loc + ".jpg")
      end
      digest
    end
  end
end
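The content-hash caching described earlier does not depend on nanoc. A minimal sketch of the same idea as a standalone helper follows; cached_image and its block interface are hypothetical, with the block standing in for the texit and convert calls, and the MD5-of-the-snippet naming taken from the filter.

```ruby
require 'digest/md5'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: return the image path for +snippet+, yielding
# (snippet, path) to render it only if no image for that content exists yet.
def cached_image(snippet, image_dir, ext = '.jpg')
  FileUtils.mkdir_p(image_dir)
  digest = Digest::MD5.hexdigest(snippet) # same naming scheme as the filter
  path = File.join(image_dir, digest + ext)
  yield(snippet, path) unless File.exist?(path)
  path
end

Dir.mktmpdir do |dir|
  renders = 0
  # stands in for texit + convert
  fake_render = ->(tex, path) { renders += 1; File.write(path, tex) }
  3.times { cached_image('$$x^2$$', dir, &fake_render) }
  puts renders # prints 1: rendered once, then served from the cache
end
```

Because the file name is derived from the content, editing a formula produces a new image and leaves the old one behind, which is why the filter also deletes images a page no longer references.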

This code is a work in progress, so expect the filter above to be slightly out of date (I will update it with major changes); you can get the latest version from the repository of this site. The code should go in a .rb file in nanoc’s lib directory (I use the existing default.rb file) so that nanoc’s rules can see it. The final step is to run the filter on every file that might contain embedded LaTeX, which in my case means my articles, hence the following rule in my nanoc Rules file. The filter :texit line runs the filter over the whole file, generating all the required images. Note that it runs before filter :kramdown, so by the time the Markdown processor sees the page the LaTeX has already been replaced by image tags and cannot be mangled by Markdown’s handling of backslashes and underscores.

compile '/articles/*/' do
  filter :erb
  filter :texit
  filter :kramdown
  layout 'article'
  layout 'default'
end
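The cleanup loop in the filter above is still marked as needing work. One way to make it easier to test is to pull it out into a helper that, given the digests a page still uses, deletes every other file in its image directory. This sketch assumes, as the filter does, that each page has its own images/ directory and that every file in it is named <digest>.<extension>; prune_images is a hypothetical name. Using File.basename(name, '.*') strips an extension of any length, rather than only three-letter ones.

```ruby
require 'set'

# Hypothetical helper: delete every file in +image_dir+ whose name, minus its
# extension, is not one of +required_digests+. Subdirectories are left alone.
# Returns the names of the deleted files.
def prune_images(image_dir, required_digests)
  keep = Set.new(required_digests)
  deleted = []
  Dir.children(image_dir).sort.each do |name|
    path = File.join(image_dir, name)
    next if File.directory?(path)
    next if keep.include?(File.basename(name, '.*'))
    File.delete(path)
    deleted << name
  end
  deleted
end
```

Inside the filter, the loop would then become a single call: prune_images(image_dir, req_images).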