Tools for making a paper
Will Lowe (2013-03-02 00:02)
Since it seems to be the fashion, here’s a post about how I make my academic papers. Actually, who am I trying to kid? This is also about how I make slides, letters, memos and “Back in 10 minutes” signs to pin on the door. Nevertheless it’s for making academic papers that I’m going to recommend this particular set of tools.
I use the word make deliberately because I’m thinking of ‘academic paper’ broadly, as the sum of its words, analyses, tables, figures, and data. In this sense, papers can contain their own replication materials and when they do it should be possible in a single movement to rerun a set of analyses and reconstruct the paper that reports them.
To get anywhere near that goal, I use a mix of latex, its newer bibliography system biblatex, and the R package knitr. Also, I use a Mac, though that won’t make very much difference to the exposition.
Here’s how this currently works…
Since I write in latex I have texlive installed, in the gargantuan but friendly form of Mactex. To actually write things I use the TeXShop editor that comes with it, but only after mapping the default font to something non-proportional. (What were they thinking?)
My basic paper template starts like this
\documentclass[11pt,a4paper]{report}
\usepackage[utf8]{inputenc}
%% fonts
\usepackage[charter]{mathdesign}
\usepackage[scaled=.95]{inconsolata}
%% page margins, inter-paragraph space and no chapters
\usepackage[margin=1.1in]{geometry}
\setlength{\parskip}{0.5em}
\renewcommand{\thesection}{\arabic{section}}
%% bibliography
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage[style=apa,natbib=true,backend=biber]{biblatex}
\DeclareLanguageMapping{american}{american-apa}
\addbibresource{biblio.bib}
%% for memisc
\usepackage{booktabs}
\usepackage{dcolumn}
%% define a dark blue
\usepackage{color}
\definecolor{darkblue}{rgb}{0,0,.5}
%% hyperlinks to references
\usepackage{hyperref}
\hypersetup{colorlinks=true, linkcolor=darkblue, citecolor=darkblue,
filecolor=darkblue, urlcolor=darkblue}
\author{Will Lowe\\Universität Mannheim \and Coauthor}
\title{Because We Can: Studying Twitter in Political
Science\thanks{Paper presented at some conference or other.}}
\date{March 2013}
\begin{document}
\maketitle
\begin{abstract}
What the paper is all about
\end{abstract}
Pretty vanilla stuff for a latex person, but still there are a few things to note:
Input encoding
UTF-8. Always, for everything. Paper, bibliography, data, and
code. This file is in UTF-8 because I don’t want to live in the late
20th century any more, and I don’t want to have to get all {\"u}
about the perfectly respectable (and in my part of the world
ubiquitous) u-with-an-umlaut ü or its non-ASCII brethren.
Motto: If it’s common enough to get its own key on the keyboard then it’s not a candidate for an escape sequence.
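If you inherit files from coauthors, it can be worth checking that they really are UTF-8 before knitting. Here is a minimal sketch, assuming iconv is installed; is_utf8 is a made-up helper name and paper.Rnw is a placeholder file name:

```shell
#!/bin/bash
# is_utf8 is a hypothetical helper, not part of any tool mentioned here.
# iconv exits non-zero when the input contains a byte sequence that is
# not valid UTF-8, so a clean round trip means the file decodes cleanly.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Report on the paper's source files; files that do not exist are skipped.
for f in paper.Rnw biblio.bib; do
  [ -e "$f" ] || continue
  if is_utf8 "$f"; then
    echo "$f: valid UTF-8"
  else
    echo "$f: NOT valid UTF-8"
  fi
done
```

Pure ASCII files pass this check too, since ASCII is a subset of UTF-8.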
Bibliography
I use biblatex, not bibtex, for very much the same reasons as I insist
on UTF-8. Try it. You’ll like it. It’s better put together, behaves
well with Unicode, and doesn’t require any changes in your .bib
files.
If you happen to use Bibdesk (also bundled with Mactex) to edit your bibliography, you may want to add the extra biblatex fields like DOI, as described here.
Here I’ve loaded the excellent APA style. (All the lines in that block are required.) I’ve also switched on natbib emulation so I can use the good old citep, citet, etc. citation commands I grew up with, under the new biblatex regime.
Preparing for R
The booktabs and dcolumn packages serve to style and digit-align latex tables respectively. They’re here so that the R package memisc that auto-generates all my tables can use them. Because nobody still writes data tables by hand, do they? Do they?
Now it’s time to set up the R parts. I use knitr to embed R in documents and so should you. Think of it as a non-cryptic Sweave that isn’t just a massive Perl script and always knows where its style files are. Here I set an important default in an uncached chunk:
<<set-options, echo=FALSE, cache=FALSE>>=
opts_knit$set(stop_on_error=2L)
@
Why? Because when your R code fails - and if you’re writing paper and code together it will fail at some point - then without further guidance knitr will just keep on trucking. This is not necessarily a good thing. Either the nasty error that replaces your desired output happens to compile as latex, in which case there is nothing to tell you that Figure 3, your pride, your joy, and the product of many hours getting your head into the ggplot2 zone, is simply missing from the final pdf. Or it happens not to compile as latex, which will give you the mistaken impression there is something wrong with your document rather than with your code. Now, these are occupational hazards of mixing document and code, but since I’m doing just that I can at least ensure that the code stops when it’s broken. And that’s what this knitr option does.
Notice that, despite eschewing Sweave, the chunks of R are still wrapped in the aesthetically-challenged noweb syntax, bristling with angle brackets and at signs. Other syntax is possible with knitr, but it’s probably safest to stick with noweb. Also, it doesn’t confuse the old timers.
Speaking of whom… a couple of observations for those who already have lots of documents set up for Sweave. First, my sympathies - it must have been horrible. Second, be careful, because knitr’s option syntax is not quite the same as Sweave’s now that it’s all going through R. Some of the changes are listed here where there’s also a function to turn the one into the other. Happily, you’ll find that knitr makes more sense.
Next I load some R packages. Here it’s the very handy memisc, which I use mostly for its wide-ranging toLatex function, and the apsrtable package for typesetting regression tables.
<<include=FALSE>>=
library(memisc)
library(apsrtable)
@
Here include=FALSE means that nothing that happens in this chunk will make it into the paper, including any start up messages or exciting news about which functions overwrite which other ones. (Thanks here to Matheiu from the comments.)
If you find yourself wanting to suppress the output of some R functions but not others then wrap your noisier functions in suppressMessages.
It’s about time for a table. I like to use the document itself to control the formatting of the table, perhaps because I can never remember how to get ctable to do what I want, so my typical tables tend to look as follows, with the R code wedged into the middle:
\begin{table}[htbp]
\caption{A fascinating table}
\begin{center}
<<results='asis', echo=FALSE>>=
tab <- HairEyeColor # the data: a three way table
toLatex(ftable(tab))
@
\end{center}
\label{tab:mytable}
\end{table}
Here, by the way, is one more reason to use memisc’s toLatex rather than xtable: memisc can typeset a flat table. It also restricts itself to returning a tabular environment and leaves the whole surrounding table business to me.
The results of this chunk are set to be ‘asis’ so that nothing untoward happens to the generated latex table code on the way into the document.
Similarly, my typical figure looks like this:
\begin{figure}[htbp]
\begin{center}
<<echo=FALSE>>=
mosaicplot(HairEyeColor)
@
\caption{A fascinating plot}
\label{plot:fascinating}
\end{center}
\end{figure}
Unlike with Sweave, it’s not necessary to say that the code chunk is going to be a figure. Just make the plot and it will get inserted. By default it will take up the width of the text.
For my sins I find myself writing about regression models. Sometimes I cannot avoid having to show their coefficients in a big table. R packages for turning regression output from several models into nicely formatted latex tables include apsrtable, memisc, and stargazer. You can see an example in another post. Here’s an example using apsrtable and some random attitude data that comes with R:
\begin{table}[htbp]
\caption{A fascinating regression table}
\label{lm:fascinating}
\begin{center}
<<results='asis', echo=FALSE>>=
m1 <- lm(rating ~ complaints + privileges + learning + raises + critical,
data=attitude)
m2 <- lm(rating ~ complaints + privileges + learning, data=attitude)
apsrtable(m1, m2, Sweave=TRUE)
@
\end{center}
\end{table}
In this package Sweave=TRUE ensures the regression tabular environment doesn’t get wrapped in its own table.
The last part of the document just pushes out the reference list and shuts up shop:
\printbibliography
\end{document}
Save this document with suffix ‘.Rnw’ and it’s ready to go.
I mentioned that I write in TeXShop, which has the notion of compilation engines. For example, there’s one for ordinary latex that calls pdflatex, and one for XeLaTeX which uses that instead. Once defined, these engines all live on a button in the main interface. Compilation is then a matter of pressing it or remembering that Apple-T does the same thing.
There isn’t a built-in engine for knitr, but it’s easy to make one. The engine itself is just a shell script. Here’s my belt-and-braces version that believes you are on a unix machine but doubts that your paths are set up properly:
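#!/bin/bash
# Make sure R and TeX can be found even if the PATH is minimal
export PATH=$PATH:/usr/texbin:/usr/local/bin
# Knit the .Rnw file ($1); run latexmk only if knitting succeeded
if Rscript -e "library(knitr); knit('$1')"; then
  latexmk -pdf "${1%.*}"
fi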
In brief, this tries to run the R code in double quotes on the first argument ($1) which is the name of the .Rnw file. If this succeeds then the transformation from latex+R to pure latex must have been successful, so we can call latexmk on the resulting file. latexmk runs latex, then biber, then latex, then latex again, then… until all the citations are cited, the contents are tabled, and all the cross references are happy again.
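The two bits of shell that do the work in the engine are easy to try on their own. This is only a sketch: strip_ext is a made-up helper name, and true and false stand in for a knit that succeeds or fails.

```shell
#!/bin/bash
# strip_ext is a hypothetical helper that isolates the "${1%.*}" expansion
# used in the engine. "%.*" removes the shortest trailing match of ".*",
# which is to say only the last extension.
strip_ext() {
  printf '%s\n' "${1%.*}"
}

strip_ext "paper.Rnw"       # prints: paper
strip_ext "my.draft.Rnw"    # prints: my.draft

# "if" branches on the exit status of the command: zero means success.
if true;  then echo "knit succeeded: run latexmk"; fi
if false; then echo "never printed"; fi
```

So for paper.Rnw the engine hands latexmk the bare name paper, and latexmk adds the .tex extension itself.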
To get TeXShop to treat this file as an engine, save it as ~/Library/TeXShop/Engines/Knitr.engine and don’t forget to make it executable.
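If you would rather do the saving and the chmod in one go, something like the following should work. It is a sketch, not part of TeXShop: install_engine and knitr-engine.sh are names I made up, and the default directory is simply the one given above.

```shell
#!/bin/bash
# install_engine is a hypothetical helper. It copies a script into an
# engines directory under the name Knitr.engine and marks it executable.
# $1 is the script to install; $2 optionally overrides the target directory.
install_engine() {
  src="$1"
  dir="${2:-$HOME/Library/TeXShop/Engines}"
  mkdir -p "$dir" &&
    cp "$src" "$dir/Knitr.engine" &&
    chmod +x "$dir/Knitr.engine"
}

# Usage, assuming the engine script was saved as knitr-engine.sh:
#   install_engine knitr-engine.sh
```

TeXShop may need a restart before the new engine shows up in its menu.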
My paper writing process then consists of writing words and code, and compiling intermittently to see where I am. When I’m happy with the result I can open up an R session and type
library(knitr)
purl("myfile.Rnw")
and get the R code extracted from the surrounding paper in a file called ‘myfile.R’. That, along with any files or data that are called in the course of the document, constitutes the replication materials.
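The same extraction can be run from the shell, in the style of the engine script. A minimal sketch, assuming Rscript is on the PATH and knitr is installed; extract_code is a made-up wrapper name:

```shell
#!/bin/bash
# extract_code is a hypothetical wrapper around knitr's purl(), called the
# same way the engine calls knit(). $1 is the name of the .Rnw file.
extract_code() {
  Rscript -e "library(knitr); purl('$1')"
}

# Usage:
#   extract_code myfile.Rnw
```

Like the engine, this pastes the file name straight into the R code, so a name containing a single quote would break it.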