I just finished teaching Computer-Assisted Content Analysis at the IQMR summer school at Syracuse. With three lectures and three labs, the problem every year is getting the right R packages onto people's machines. In particular, anything that involves compilation - and when you're using quanteda, readtext, and stm, that's lots of …

Last summer, my trusty henchpeople from the Department of Politics and I ran an intensive six-week summer course for incoming freshmen on data science ('POL245', for locals).

This post sketches out how I think course infrastructure should work, and provides some practical details of how we arranged things. Most …

TheyWorkForYou is a great website for keeping up with British politics and one of the many fine things mySociety does to make democracy in the UK more transparent.

There's also an API, accessible via http and wrapped up for a few languages. However, R is not amongst them, so I …

Pretty regularly - usually in the middle of one of those interminable fixed-vs-random effects discussions - someone will pipe up that "Of course, for Bayesians this random vs fixed effect distinction makes no sense because all parameters are random".

To the extent it can be made to make sense, the claim is …

Some people think it isn't rational to vote. Usually the argument is as follows: the probability of being pivotal, that is: the probability that your vote will 'decide' the winner, shrinks rapidly as the number of voters increases. So if you vote in the hope of determining an outcome, then …
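
To put rough numbers on the pivotality claim, here is a minimal sketch under a deliberately simple assumption that is not taken from the post: each of an even number of other voters independently backs candidate A with probability `p`, and your vote is pivotal exactly when they split evenly. The function name `pivotal_prob` is made up for illustration.

```r
# Probability that n_others voters split exactly evenly, assuming each one
# independently votes for candidate A with probability p (a toy binomial
# model, used here only for illustration).
pivotal_prob <- function(n_others, p = 0.5) {
  stopifnot(n_others %% 2 == 0)  # an exact tie needs an even number of others
  dbinom(n_others / 2, size = n_others, prob = p)
}

# Compare a dead-even electorate with one that leans very slightly to A
n <- c(10, 1000, 100000)
data.frame(n_others     = n,
           evenly_split = pivotal_prob(n),
           slight_lean  = pivotal_prob(n, p = 0.51))
```

Under this toy model the probability falls slowly when the electorate is exactly balanced and very quickly once there is even a small lean, which is the case the argument usually has in mind.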

A: You know I like the idea of using logic and logical deduction to understand how thinking should be done. This idea that beliefs are, or at least should be, the conclusions of deductive arguments is very clear and elegant. But I do worry...

B: You worry?  Tell me about …

Sometimes a bit of R code needs to know what operating system it's running on. Here's a short account of where you can find this information and a little function to wrap the answer up neatly.

Operating systems are a platform issue, so let's start with the constants in the …
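
As a rough illustration of the kind of wrapper described here, the sketch below combines `Sys.info()`, `.Platform$OS.type` and `R.version$os`, all of which are base R. The function name `get_os` and the labels it returns are assumptions made for this example, not necessarily what the post itself uses.

```r
# Return a short lowercase label for the operating system R is running on.
# The labels ("macos", "linux", "windows") are a choice made for this sketch.
get_os <- function() {
  info <- Sys.info()  # may be NULL on platforms where it is not implemented
  if (!is.null(info)) {
    sysname <- info[["sysname"]]
    os <- switch(sysname,
                 Darwin  = "macos",
                 Linux   = "linux",
                 Windows = "windows",
                 tolower(sysname))  # anything else: pass the name through
  } else {
    # Fall back on constants fixed when R was built
    os <- .Platform$OS.type  # "unix" or "windows"
    if (grepl("^darwin", R.version$os)) {
      os <- "macos"
    } else if (grepl("linux", R.version$os)) {
      os <- "linux"
    }
  }
  os
}
```

Calling `get_os()` returns a single string, which is convenient for `switch()` statements in code that needs platform-specific branches.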

A little while back a New York Times article discussed the consequences for college admission of saying undiplomatic things in social media. Apparently colleges monitor, or at least check up on, the social media presence of their potential applicants to see whether they're the right kind of person for the …

Hey Mac OSX users with Java 1.8 installed. Did R just request a Java 1.6 installation and then promptly crash your session? If so, read on...

The Problem

A few days ago I was attempting to use the mallet package for topic models and I found that typing …

Bye.

Significance

Most undergraduate methods textbooks give the impression that there is only one form of statistical inference. It involves defining a stochastic model of the data generation process and for each interesting parameter constructing a statistic whose distribution under some 'null' hypothesis is known. After the observations are made …

A couple of days ago the Monkey Cage featured Ben Lauderdale's one-dimensional scaling model of US State of the Union addresses. In this post, I replicate the analysis with a closely related model, ask what the scaled dimension actually means, and consider what things would look like if we added …

I've been playing around with the R package texreg for creating combined regression tables for multiple models. It's not the only package to do that - see the R to LaTeX packages overview for a review - but it's often handy to be able to generate ascii art, latex, and html …

Perhaps you tried to open some application or mount some DMG on your Mac and encountered the following alarming message

"[Application] is damaged and can't be opened. You should move it to the trash."

Perhaps it is indeed damaged. But more likely it is just not signed by its developer …

If you are planning to attend the European Political Science Association (EPSA) meeting in Barcelona next week you might find a searchable online programme helpful (scraped out of the original pdf).

Making available replication materials for the research you do is A Good Thing. It's also work, and it's quite easy to never get around to. Certainly I claim no special virtue in this department so I am always happy when there's an institutional stick to prod my better nature in …

There are now quite a few R packages to turn cross-tables and fitted models into nicely formatted latex. In a previous post I showed how to use one of them to display regression tables on the fly. In this post I summarise what types of R object each of the …

Since it seems to be the fashion, here's a post about how I make my academic papers. Actually, who am I trying to kid? This is also about how I make slides, letters, memos and "Back in 10 minutes" signs to pin on the door. Nevertheless it's for making academic …

Inspired by Preis et al.'s article Quantifying the advantage of looking forward, recently published in Scientific Reports (one of Nature publishing group's journals), I wondered if similar big-data web-based research methods might address a question even bigger than how much different countries wonder about next year. How about the …

You've got a pdf file and you'd like to view it with whatever the system viewer is. As usual, that requires something special for Windows and something general for the rest of us. Here goes...

openPDF <- function(f) {
os <- .Platform$OS.type
if (os=="windows")
shell.exec(normalizePath(f))
else …

At least five R packages will turn your regression models into pretty latex tables: texreg, xtable, apsrtable, memisc, and stargazer.  This is very nice if you happen to be a latex document or its final reader, but it's not so great if you're making those models to start with.

What …

Dean Burnett writes a column in the Guardian, sometimes about science but more entertainingly on pseudo-, wannabe-, and not-actually- science. Most of the time this is good BS-shovelling fun and I recommend it. Unfortunately today we get some ill-considered overreach under the guise of shovelling.

The subject is a silly equation …

R's formula interface is sweet but sometimes confusing. ANOVA is seldom sweet and almost always confusing. And random (a.k.a. mixed) versus fixed effects decisions seem to hurt people's heads too. So, let's dive into the intersection of these three.

I'm aware that there are lots of packages for …

A little while ago I got a query about the calculation of the logit policy scales from Lowe et al. (2011). I thought it might be useful to repeat the answer slightly more publicly, in case anybody else was wondering. The pesky constants in that paper confuse people. Anyway, here's …

You have an SQLite database, perhaps as part of some replication materials, and you want to query it from R. You might want to be able to say:

results <- runsql("select * from mytable order by date")


and get the results back as an R object. Here's a function to do …
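
One way such a function might look, as a minimal sketch that assumes the DBI and RSQLite packages are installed. The `dbname` argument and its default file name are placeholders invented for this example, and the post's own function may well differ.

```r
library(DBI)

# Run a query against an SQLite file and return the result as a data frame.
# 'dbname' and its default value are hypothetical; point it at your own file.
runsql <- function(sql, dbname = "replication.sqlite") {
  con <- dbConnect(RSQLite::SQLite(), dbname)
  on.exit(dbDisconnect(con), add = TRUE)  # close the connection even on error
  dbGetQuery(con, sql)
}
```

Opening and closing the connection inside the function keeps each call self-contained, at the cost of reconnecting on every query. That trade-off is usually fine for replication-sized databases.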

Perhaps you are trying to add your nice new object as data for an R package. But wait. It has [gasp] foreign letters in its dimnames, so 'R CMD check' will certainly complain.

What you need is something to turn R's natural Unicode-processing goodness into a relic from the early …
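
For illustration only, here is one base-R way to get a pure-ASCII version of such strings, by replacing every non-ASCII character with a `\uXXXX`-style escape. The function name `to_ascii_escapes` is made up for this sketch, and this is not necessarily the approach the post goes on to describe.

```r
# Replace each non-ASCII character with a literal \uXXXX (or \UXXXXXXXX)
# escape sequence, leaving ASCII characters untouched.
# NA values are not handled specially in this sketch.
to_ascii_escapes <- function(x) {
  vapply(x, function(s) {
    cps   <- utf8ToInt(enc2utf8(s))           # Unicode code points
    chars <- intToUtf8(cps, multiple = TRUE)  # one string per character
    esc   <- ifelse(cps > 0xFFFF,
                    sprintf("\\U%08x", cps),  # beyond the 4-hex-digit range
                    sprintf("\\u%04x", cps))
    paste0(ifelse(cps < 128L, chars, esc), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Example: clean up the column names of a small matrix
m <- matrix(1:2, nrow = 1, dimnames = list("row", c("Z\u00fcrich", "Oslo")))
colnames(m) <- to_ascii_escapes(colnames(m))
```

The escaped names are plain ASCII text, so they no longer display as the original letters. Whether that is acceptable depends on how the data will be used.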

A couple of months ago somebody asked how to convert a new dictionary file so that it would run in the Yoshikoder. The format of the original file had two parts: the first half looked like:

%
1   funct
2   pronoun
3   ppron
4   i
5   we
...


which assigns identifiers to …

Perhaps you have a file written in Markdown with embedded R of the kind that RStudio makes so nice and easy but you'd like a range of output formats to keep your collaborators happy. Say latex, pdf, html and MS Word. Here's what you might do

