Jon Zelner Social epidemiology, infectious diseases, and statistics

Autogenerating research reports using Make, Knitr and Latex

Although I know I’m decidedly late to the party, I’ve recently rediscovered the incredible usefulness of Make, particularly as part of a reproducible scientific workflow.

In my own work, which runs towards the computational, the easy reproducibility you get from make is critical. The usual reason is being able to prove you did what you said you did, but it also lets me make changes easily when I’ve stepped away from a project for a while and forgotten how some of the layers work, or when it’s time to do revisions and I can’t remember how something was done.

Reducing the friction involved in updating all the outputs downstream from some change, whether it’s fixing a data-cleaning issue or using a better algorithm for some step of the analysis, is good for every aspect of a project.

To this end, this post shares a simple example of a Makefile that uses knitr, watchman and make to update a PDF file on the fly as changes are saved to the input knitr file.

When doing this, I personally like to work with emacs in one window and Skim next to it (ideally using my favorite Mac tiling window manager, Amethyst, to make full use of the desktop). Skim will automatically detect changes to the output PDF and silently reload them. This gives you something like WYSIWYG editing for LaTeX, with a minimum of fuss.

This is territory I’ve covered before for generating Jekyll posts in markdown using knitr and watchmedo, but having tried the makefile/watchman way, I think I’m going to go ahead and take this approach to my blog posts as well:

filestem := myfile
knitrfile := $(filestem).Rnw
texfile := $(filestem).tex
pdffile := $(filestem).pdf

# These targets name commands, not files, so make should always run them.
.PHONY: all tex pdf watch unwatch

all: tex pdf

tex: $(texfile)

pdf: $(pdffile)

# Knit the Rnw source into a tex file; reruns only when the Rnw file changes.
$(texfile): $(knitrfile)
	$(info ************  CONVERTING KNITR TO TEX  ************)
	Rscript -e "library(knitr); knit('$(knitrfile)')"

# Compile the tex file into the PDF.
$(pdffile): $(texfile)
	$(info ************  COMPILING PDF  ************)
	pdflatex $(texfile)

# Recipe lines must start with a tab, not spaces.
watch:
	$(info ************  WATCHING FOR CHANGES  ************)
	watchman watch "$(shell pwd)"
	watchman -- trigger "$(shell pwd)" remake '*.Rnw' -- make pdf

unwatch:
	$(info ************  CANCELLING WATCH  ************)
	watchman watch-del "$(shell pwd)"

What’s great about the makefile approach is that you can just change the filestem at the top and re-use the file (or some version of it) for each new knitr/pdf document or whatever output format you like.

Parts of this example about using watchman are shamelessly lifted from this post about the use of make for front-end web development.

This particular makefile assumes that the input knitr LaTeX (Rnw) file, the tex generated by knitr, and the output pdf all share one common filestem, stored in the variable filestem. To use this makefile, just put it in the directory containing your Rnw file and edit the first line to have the appropriate name.
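If editing the filestem for every document gets tedious, the same shared-stem assumption can be written as pattern rules instead. This is only a sketch of that alternative, not part of the makefile above; it assumes GNU make and that every Rnw file in the directory should be knit and compiled the same way.

```makefile
# Sketch: pattern rules, so any <stem>.Rnw in the directory can be built
# without editing the makefile.

# Keep the intermediate tex file instead of letting make delete it
# after the PDF is built.
.PRECIOUS: %.tex

# Knit <stem>.Rnw into <stem>.tex; $< expands to the Rnw prerequisite.
%.tex: %.Rnw
	Rscript -e "library(knitr); knit('$<')"

# Compile <stem>.tex into <stem>.pdf; $< expands to the tex prerequisite.
%.pdf: %.tex
	pdflatex $<
```

With these rules, running make report.pdf would build the PDF from a file called report.Rnw (a made-up name here), and make would work out the intermediate tex step on its own.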

You can use this makefile to generate the pdf as a one-off, just using make pdf. You can also start a watchman task on the directory containing the makefile using make watch. Watchman will then wait for changes to any Rnw files in the directory and call make pdf whenever a change is detected. Whenever you want watchman to stop watching the input file, just execute make unwatch within the same directory as your makefile, which will cancel the watchman trigger for the current directory.

(Another option for building a pdf from knitr input is to use the knit2pdf function in knitr, but I prefer to manage the latex build on my own, since this makes it easier to manage bibtex, etc.)
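As a rough illustration of what managing the build by hand can look like, here is a sketch of a PDF rule that also runs bibtex. The file name references.bib is a placeholder I made up, and the sequence assumes the document actually cites something from it; this is the common pdflatex, bibtex, pdflatex, pdflatex cycle rather than anything specific to knitr.

```makefile
filestem := myfile
texfile := $(filestem).tex
pdffile := $(filestem).pdf
# Hypothetical bibliography file; use whatever your \bibliography{} names.
bibfile := references.bib

# Rebuild the PDF when either the tex file or the bibliography changes.
$(pdffile): $(texfile) $(bibfile)
	# First pass writes the citation keys to the aux file.
	pdflatex $(texfile)
	# bibtex reads myfile.aux and writes the formatted bibliography.
	bibtex $(filestem)
	# Two more passes pull in the bibliography and settle cross-references.
	pdflatex $(texfile)
	pdflatex $(texfile)
```

Because the bib file is listed as a prerequisite, editing a reference is enough to make the PDF out of date, which is the kind of control that is harder to get from a single knit2pdf call.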

Aside from facilitating WYSIWYG editing, watchman is also useful as a kind of local, easy-to-configure continuous integration tool for re-running analyses on the fly as they’re modified.
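The same watch-and-trigger pattern can be pointed at analysis code instead of a document. The sketch below is hypothetical throughout: analysis.R, results.rds, the trigger name rerun and the target name watch-analysis are all invented for illustration, and it assumes that running the script is what writes the results file.

```makefile
# Hypothetical names: analysis.R is the script, results.rds is its output.
# The rule reruns the script only when analysis.R is newer than the results.
results.rds: analysis.R
	Rscript analysis.R

.PHONY: watch-analysis

# Ask watchman to call make whenever any R file in this directory changes.
watch-analysis:
	watchman watch "$(shell pwd)"
	watchman -- trigger "$(shell pwd)" rerun '*.R' -- make results.rds
```

Since make checks timestamps before doing anything, a trigger that fires on an unrelated R file costs almost nothing: the analysis only reruns when its own prerequisites have changed.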

This is just one of those things that’s obvious only once you know about it and personally saves me a lot of irritation on a daily basis. It’s also just a very small example of how central make has become for managing the inherent complexity in most research projects and how useful it is for documentation and accountability.