# Tools for making a paper

Since it seems to be the fashion, here’s a post about how I make my academic papers.
Actually, who am I trying to kid? This is also about how I make slides, letters, memos and “Back in 10 minutes” signs to pin on the door. Nevertheless it’s for making academic papers that I’m going to recommend this particular set of tools.

I use the word ‘make’ deliberately because I’m thinking of ‘academic paper’ broadly, as the sum of its words, analyses, tables, figures, and data. In this sense, papers can contain their own replication materials, and when they do it should be possible in a single movement to rerun a set of analyses and reconstruct the paper that reports them.

To get anywhere near that goal, I use a mix of latex, its newer bibliography system biblatex, and the R package knitr. Also, I use a Mac, though that won’t make very much difference to the exposition.

Here’s how this currently works…

Since I write in latex I have texlive installed, in the gargantuan but friendly form of Mactex. To actually write things I use the TeXShop editor that comes with it, but only after mapping the default font to something non-proportional. (What were they thinking?)

My basic paper template starts like this

\documentclass[11pt,a4paper]{report}
\usepackage[utf8]{inputenc}

%% fonts
\usepackage[charter]{mathdesign}
\usepackage[scaled=.95]{inconsolata}

%% page margins, inter-paragraph space and no chapters
\usepackage[margin=1.1in]{geometry}
\setlength{\parskip}{0.5em}
\renewcommand{\thesection}{\arabic{section}}

%% bibliography
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage[style=apa,natbib=true,backend=biber]{biblatex}
\DeclareLanguageMapping{american}{american-apa}

%% for memisc
\usepackage{booktabs}
\usepackage{dcolumn}

%% define a dark blue
\usepackage{color}
\definecolor{darkblue}{rgb}{0,0,.5}

\usepackage{hyperref}

\author{Will Lowe\\Universität Mannheim \and Coauthor}
\title{Because We Can: Studying Twitter in Political Science\thanks{Paper presented at some conference or other.}}
\date{March 2013}

\begin{document}
\maketitle

\begin{abstract}
What the paper is all about
\end{abstract}

Pretty vanilla stuff for a latex person, but still there are a few things to note:

Input encoding: UTF-8. Always, for everything: paper, bibliography, data, and code. This file is in UTF-8 because I don’t want to live in the late 20th century any more, and I don’t want to have to get all {\"u} about the perfectly respectable (and in my part of the world ubiquitous) u-with-an-umlaut ü or its non-ASCII brethren.

Motto: If it’s common enough to get its own key on the keyboard then it’s not a candidate for an escape sequence.

Bibliography: I use biblatex, not bibtex, for very much the same reasons as I insist on UTF-8. Try it. You’ll like it. It’s better put together, behaves well with Unicode, and doesn’t require any changes in your .bib files.

If you happen to use Bibdesk (also bundled with Mactex) to edit your bibliography, you may want to add the extra biblatex fields like DOI, as described here.
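To make that concrete, here is the shape of a biblatex-friendly entry with a DOI field and an unescaped UTF-8 author name. (The key, author, journal, and DOI are all invented for illustration.)

```bibtex
@article{mueller2013example,
  author       = {Müller, Jürgen},
  title        = {A Hypothetical Article about Umlauts},
  journaltitle = {Journal of Invented Examples},
  year         = {2013},
  volume       = {1},
  pages        = {1--10},
  doi          = {10.0000/example.0001}
}
```

Note the ü written directly, with no escape sequence in sight.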

Here I’ve loaded the excellent APA style. (All the lines in that block are required.) I’ve also switched on natbib emulation so I can use the good old \citep, \citet, etc.\ citation commands I grew up with, under the new biblatex regime.
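With natbib emulation switched on, those commands work exactly as they always did. Assuming a hypothetical bibliography key mueller2013, the usual forms are:

```latex
\citet{mueller2013}            % Müller (2013)
\citep{mueller2013}            % (Müller, 2013)
\citep[see][p.~7]{mueller2013} % (see Müller, 2013, p. 7)
\citeauthor{mueller2013}       % Müller
```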

Preparing for R: The booktabs and dcolumn packages serve to style and digit-align latex tables respectively. They’re here so that the R package memisc, which auto-generates all my tables, can use them. Because nobody still writes data tables by hand. Do they?
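For the curious, this is roughly the shape of table those two packages produce: booktabs supplies the horizontal rules, and dcolumn’s D column type aligns numbers on the decimal point. A minimal hand-written sketch, with invented numbers:

```latex
\begin{tabular}{l D{.}{.}{2}}
\toprule
            & \multicolumn{1}{c}{Estimate} \\
\midrule
Intercept   & 14.38 \\
Complaints  & 0.75  \\
\bottomrule
\end{tabular}
```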

Now it’s time to set up the R parts. I use knitr to embed R in documents, and so should you. Think of it as a non-cryptic Sweave that isn’t just a massive Perl script and always knows where its style files are. Here I set an important default in an uncached chunk:

<<set-options, echo=FALSE, cache=FALSE>>=
opts_knit$set(stop_on_error=2L)
@

Why? Because when your R code fails – and if you’re writing paper and code together it will fail at some point – then without further guidance knitr will just keep on trucking. This is not necessarily a good thing. Either the nasty error that replaces your desired output happens to compile as latex, in which case there is nothing to tell you that Figure 3 – your pride, your joy, and the product of many hours getting your head into the ggplot2 zone – is simply missing from the final pdf. Alternatively, it does not compile as latex, which will give you the mistaken impression that there is something wrong with your document rather than with your code. These are occupational hazards of mixing document and code, but since I’m doing just that I can at least ensure that the code stops when it’s broken. And that’s what this knitr option does.

Notice that, despite eschewing Sweave, the chunks of R are still wrapped in the aesthetically challenged noweb syntax, bristling with angle brackets and at signs. Other syntaxes are possible with knitr, but it’s probably safest to stick with noweb. Also, it doesn’t confuse the old timers.

Speaking of whom… a couple of observations for those who already have lots of documents set up for Sweave. First, my sympathies – it must have been horrible. Second, be careful, because knitr’s option syntax is not quite the same as Sweave’s now that it’s all going through R. Some of the changes are listed here, where there’s also a function to turn the one into the other. Happily, you’ll find that knitr makes more sense.

Next I load some R packages. Here it’s the very handy memisc, which I use mostly for its wide-ranging toLatex function, and the apsrtable package for typesetting regression tables.
<<loadpackages,include=FALSE>>=
library(memisc)
library(apsrtable)
@

Here include=FALSE means that nothing that happens in this chunk will make it into the paper, including any start-up messages or exciting news about which functions overwrite which other ones. (Thanks here to Mathieu from the comments.) If you find yourself wanting to suppress the output of some R functions but not others, wrap your noisier functions in suppressMessages.

It’s about time for a table. I like to use the document itself to control the formatting of the table, perhaps because I can never remember how to get ctable to do what I want, so my typical tables tend to look as follows, with the R code wedged into the middle:

\begin{table}[htbp]
\caption{A fascinating table}
\begin{center}
<<tab-fascinating,results='asis',echo=FALSE>>=
tab <- HairEyeColor  # the data: a three-way table
toLatex(ftable(tab))
@
\end{center}
\label{tab:mytable}
\end{table}

Here, by the way, is one more reason to use memisc’s toLatex rather than xtable: memisc can typeset a flat table. It also restricts itself to returning a tabular environment and leaves the whole surrounding table business to me. The results of this chunk are set to 'asis' so that nothing untoward happens to the generated latex table code on the way into the document.

Similarly, my typical figure looks like this:

\begin{figure}[htbp]
\begin{center}
<<plot-fascinating,echo=FALSE>>=
mosaicplot(HairEyeColor)
@
\caption{A fascinating plot}
\label{plot:fascinating}
\end{center}
\end{figure}

Unlike Sweave, it’s not necessary to say that the code chunk is going to be a figure. Just make the plot and it will get inserted. By default it will take up the width of the text.

For my sins I find myself writing about regression models. Sometimes I cannot avoid having to show their coefficients in a big table. R packages for turning regression output from several models into nicely formatted latex tables include apsrtable, memisc, and stargazer.
You can see an example in another post. Here’s an example using apsrtable and some random attitude data that comes with R:

\begin{table}[htbp]
\caption{A fascinating regression table}
\label{lm:fascinating}
\begin{center}
<<lm-fascinating,results='asis',echo=FALSE>>=
m1 <- lm(rating ~ complaints + privileges + learning + raises + critical,
         data=attitude)
m2 <- lm(rating ~ complaints + privileges + learning, data=attitude)
apsrtable(m1, m2, Sweave=TRUE)
@
\end{center}
\end{table}

In this package, Sweave=TRUE ensures the regression tabular environment doesn’t get wrapped in its own table. The last part of the document just pushes out the reference list and shuts up shop:

\printbibliography
\end{document}

Save this document with the suffix '.Rnw' and it’s ready to go.

I mentioned that I write in TeXShop, which has the notion of compilation engines. For example, there’s one for ordinary latex that calls pdflatex, and one for XeLaTeX which uses that instead. Once defined, these engines all live on a button in the main interface. Compilation is then a matter of pressing it, or remembering that Apple-T does the same thing. There isn’t a built-in engine for knitr, but it’s easy to make one. The engine itself is just a shell script. Here’s my belt-and-braces version, which believes you are on a unix machine but doubts that your paths are set up properly:

#!/bin/bash
export PATH=$PATH:/usr/texbin:/usr/local/bin

if (Rscript -e "library(knitr); knit('$1')")
then
  latexmk -pdf "${1%.*}"
fi
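Incidentally, the `${1%.*}` in that last command is plain shell parameter expansion: it strips the shortest trailing `.suffix` from the first argument, so latexmk gets the basename of the `.tex` file that knitr just produced. A quick sketch:

```shell
#!/bin/bash
# ${var%.*} removes the shortest match of '.*' from the end of $var
f="myfile.Rnw"
echo "${f%.*}"   # prints: myfile
```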


In brief, this tries to run the R code in double quotes on the first argument ($1), which is the name of the .Rnw file. If this succeeds then the transformation from latex+R to pure latex must have been successful, so we can call latexmk on the resulting file. latexmk runs latex, then biber, then latex, then latex again, then… until all the citations are cited, the contents are tabled, and all the cross references are happy again. To get TeXShop to treat this file as an engine, save it as ~/Library/TeXShop/Engines/Knitr.engine and don’t forget to make it executable.

My paper-writing process then consists of writing words and code, and compiling intermittently to see where I am. When I’m happy with the result I can open up an R session and type

library(knitr)
purl("myfile.Rnw")

and get the R code extracted from the surrounding paper in a file called 'myfile.R'. That, along with any files or data that are called in the course of the document, constitutes the replication materials.

### 21 Comments

1. Pete says

   Thanks for sharing. I’ve started learning latex, so it’s most welcome! Any chance you could include an example pdf/screenshot of how your standard template looks? Cheers

2. Will says

   Sure. Here’s a draft paper that will be presented at the ECPR Joint Sessions in Mainz. It exercises all the constructions above, except for the regression table.

3. ctpfaff says

   Hey, thanks for the nice article. You will love the Open-Science-Paper repository, which you can find on github: https://github.com/cpfaff/Open-Science-Paper The Open-Science-Paper document produces nice-looking documents in scientific paper format and offers a lot of options to modify the look and feel of the paper. Just read into the documentation.

   • Will says

     That’s a very nice template. Perhaps a little bit ‘decomposed’ for my preferred writing style – I need all my sections to be in a file in front of me or I forget what I’m talking about. I look forward to seeing what you make of the presentation version.
     (Also, I had no idea that ‘widows’ were ‘Hurenkinder’ in German typography!)

4. ctpfaff says

   Thanks. Well, this division into separate files is one of the strengths of this document. It helps me to focus on a section and to keep the overview when the document grows. And there is another important benefit of splitting up documents: you can comment out all the chapters you are not working on, which speeds up the document compilation. I think “Hurenkinder” is a very old German word for widows. Today it is more common to say “Witwe”, but I thought it was funny and used the older term.

5. Mathieu says

   Very nice overview! Just a comment about the suppressMessages command. I think you should be safe with the following instead (include=FALSE evaluates the chunk but does not include anything in the LaTeX file):

   <<loadpackages,include=FALSE>>=
   library(memisc)
   library(apsrtable)
   @

   This makes your code more standard. Mathieu.

   • Mathieu says

     Mmmmh… Nice try, but epic fail! The header of the chunk should include: loadpackages, include = FALSE. Mathieu.

   • Will says

     Quite right about the include. I’ve taken the liberty of adjusting your comment to say what you wanted it to. Hope you don’t mind.

   • Will says

     It seems to be slightly annoying to get latex to work properly with Greek. Nevertheless I’ve answered that stackoverflow question.
     To answer your question about the template above, here’s a version of the file in the question that compiles with XeLaTeX (which comes with your latex distribution) and knitr:

\documentclass[a4paper]{article}
\usepackage{fontspec}
\setmainfont{Times New Roman}
\setsansfont{Arial}
\newfontfamily\greekfont[Script=Greek]{Linux Libertine O}
\newfontfamily\greekfontsf[Script=Greek]{Linux Libertine O}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{greek}

\title{Sweave Example 1}
\author{George Dontas}

\begin{document}
\maketitle

Ελληνικό κείμενο

In this example we embed parts of the examples from the \texttt{kruskal.test} help page into a \LaTeX{} document:

Αυτό είναι κείμενο στα Ελληνικά

<<echo=TRUE>>=
data(airquality)
kruskal.test(Ozone ~ Month, data = airquality)
@

which shows that the location parameter of the Ozone distribution varies significantly from month to month. Finally we include a boxplot of the data:

\begin{center}
<<echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}

\end{document}

     and a slightly adjusted engine file that will work with it:

#!/bin/bash
export PATH=$PATH:/usr/texbin:/usr/local/bin

if (Rscript -e "library(knitr); knit('$1')")
then
  latexmk -xelatex "${1%.*}"
fi
fi

6. RogerVV says

Hi, Thanks for your blog. I wonder if you have any recommendations about how to use Latex and reproducible files with other collaborators without Latex exposure. At this point this final barrier is the only thing that keeps me from using Latex in my workflow. Even though I dislike Microsoft’s word, its tracking changes feature is still incredibly helpful.

• Will says

As far as I know there is absolutely no way to embed analyses (or frankly anything) programmatically in the middle of an existing Word document. (A quick glance at the documentation of the POI project indicates what a nasty tangle of data structures a Word document really is underneath.)

It is, on the other hand, quite easy to generate Word documents after the relevant materials have been inserted, thinking of the format as PDF rather than as source. A good friendly starting format is Markdown, which is much less forbidding than LaTeX but for most reporting purposes equally capable. I sketch out how to use pandoc and R to do reproducible research and then generate Word documents in another post. RStudio has similar capacity for html reports built in, so try that first. Of course, once you’re in Word you’re not getting out again without a fight. That is the design of the product.

Also, getting your collaborators to work on a shared Markdown document doesn’t get you your track changes. But it all depends what you use track changes for. If it’s to make comments, then both Markdown and LaTeX have comments anyway, which a good editor will highlight as such. If it’s for accepting changes and updating the document, then the industrial strength solution is version control. For me that’s been less than ideal and I’ve only ever had one collaborator (from computer science) who was happy to use it. However, I’ve found that Dropbox itself tends to have enough roll back capacity for what my collaborators tend to want to do, so this problem has turned out to be more theoretical than practical.

Ultimately, I think the two greatest practical hindrances to reproducible research are Word and Excel. One can only really try to work around them. As long as academics think of research as primarily putting words in order on a page then a wordprocessor will seem like an ideal solution.

• Yihui says

Just wait for a couple of months. We are working on this issue 🙂

• I was writing a 15 page grant with an anthropologist PI who had never used latex. I set up the main grant and put it in https://www.sharelatex.com/ even with some pretty crazy hacks (to meet the template), and she was not only able to edit it but to create PDFs with no problems. Latex is pretty intuitive as long as someone’s set it all up (you’d never guess the commands) and you have a good color-highlighting editor.

7. This is great – I tweeted a link to your article (@polscireplicate)! I’m doing a reproducibility course on coursera.com at the moment with short videos and tutorials (https://www.coursera.org/course/repdata) on knitr, R markdown etc. I hope tutorials like yours and the coursera course encourage more researchers to spend some time on learning and using such software.

8. mark says

Hi Will,
Thanks for your post. I am trying to recreate the workflow and have just today installed MacTex and am able to create the basic header format, i.e.

\documentclass[11pt,a4paper]{report}
\usepackage[utf8]{inputenc}

\begin{abstract}
What the paper is all about
\end{abstract}
\end{document}

I save with utf8 encoding and it can typeset (by pressing the typeset button in TexShop) quite happily and generates a pdf.
I then add the set-options between \end{abstract} and \end{document} as below:
<<set-options, echo=FALSE, cache=FALSE>>=
opts_knit$set(stop_on_error=2L)
@

And on typesetting, TexShop has the guile to give me an error saying Missing $ inserted and refers me to the line that starts opts_

I am completely new to this type of work flow. Have you any idea what this error might be due to or where I might look to try and figure it out? I should add that I do have R and knitr version 3.0.2 installed on my machine via RStudio. Thanks. Mark

• Will says

It sounds like you tried to compile with TeXShop’s ‘LaTeX’ engine rather than the ‘knitr’ engine that this post shows how to make. Maybe toggle the engine-list pulldown next to the ‘Typeset’ button and select the knitr one.

• mark says

Hi Will, thank you, that’s got it!

9. mark says

Sorry, 3.0.2 is the R build; the knitr version is 1.7.
