Configuring Azure and RStudio for text analysis

I just finished teaching Computer-Assisted Content Analysis at the IQMR summer school at Syracuse. With three lectures and three labs, the problem every year is getting the right R packages onto people’s machines. In particular, anything that involves compilation – and when you’re using quanteda, readtext, and stm, that’s lots of things – is going to be trouble. Over the years, R and the various operating systems it has to live in have got a lot better about this, but ultimately the best solution is… not to do it at all. That is, to run everything on somebody else’s computers, excuse me, ‘in the cloud’. When students access an appropriately provisioned RStudio Server through their browsers, they’re good to go from Lab one.

This post is about how to set all that up.
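
To give a flavour of the payoff: the compilation-heavy packages get installed once on the server, and every student account sees them immediately. A minimal sketch, assuming you run it with write access to a shared site library, and using the package list the labs assume:

```r
# Run once on the RStudio Server machine with write access to the site
# library, so every student account shares the compiled packages.
# The package list is just the one the labs use.
install.packages(c("quanteda", "readtext", "stm"))
```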

Continue reading “Configuring Azure and RStudio for text analysis”

Packaging the TheyWorkForYou API

TheyWorkForYou is a great website for keeping up with British politics and one of the many fine things mySociety does to make democracy in the UK more transparent.

There’s also an API, accessible via HTTP and wrapped up for a few languages. However, R is not amongst them, so I wrote twfy. You can install it from CRAN.
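
Getting started is the usual CRAN incantation. One caveat about the sketch below: the API needs a personal key from TheyWorkForYou, and the wrapper calls are left as comments because their exact names here are assumptions modelled on the API’s own method names rather than confirmed exports.

```r
install.packages("twfy")  # on CRAN, as noted above
library(twfy)

# TheyWorkForYou issues personal API keys; the calls below are assumptions
# based on the API's method names, so they stay commented out.
# set_api_key("your-key")
# mps <- getMPs()
```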

It was my first proper API package and a bit of a learning experience. If you want to hear more about that, read on.
Continue reading “Packaging the TheyWorkForYou API”

More SOTU Scaling

A couple of days ago the Monkey Cage featured Ben Lauderdale’s one-dimensional scaling model of US State of the Union addresses. In this post, I replicate the analysis with a closely related model, ask what the scaled dimension actually means, and consider what things would look like if we added another one.

The technical details are all at the bottom of the post if you want to try this at home.
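
For a feel of what one-dimensional scaling involves, here is a generic sketch using quanteda’s wordfish implementation. To be clear about the assumptions: this is a near relative of these models, not necessarily the one used in this post or in Lauderdale’s analysis, and the SOTU corpus is taken from the quanteda.corpora package purely for convenience.

```r
# A generic one-dimensional text scaling sketch, not the post's exact model.
library(quanteda)             # tokenising and document-feature matrices
library(quanteda.textmodels)  # textmodel_wordfish
# data_corpus_sotu ships with quanteda.corpora (a GitHub-only package):
# remotes::install_github("quanteda/quanteda.corpora")
sotu <- quanteda.corpora::data_corpus_sotu
dfm_sotu <- dfm(tokens(sotu, remove_punct = TRUE))
wf <- textmodel_wordfish(dfm_sotu)
head(wf$theta)  # estimated document positions on the single dimension
```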

Continue reading “More SOTU Scaling”

Call them what you will

I’ve been playing around with the R package texreg for creating combined regression tables for multiple models. It’s not the only package to do that – see here for a review – but it’s often handy to be able to generate ASCII-art, LaTeX, and HTML versions of the same table using almost identical syntax. Also, the ASCII-art-creating screenreg function allows me to bypass the PDF construction cycle I previously described here. The coefficient plots from plotreg are pretty cool too.

This post is about making the variables listed in those combined regression tables more readable. That is particularly important when data comes from variable-mangling statistical software or from co-authors whose idea of a descriptive name could pass for an online banking password. Even R will cheerfully mash up your carefully chosen variable names through formulas, factors, and interactions. So for work people are going to see, variables should have sensible names.

First I’ll walk through the existing texreg machinery for renaming, omitting, and reordering variables, and then propose a hopefully more intuitive implementation. I’ll demonstrate all this using screenreg on a classic data set on job prestige.
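
To make that concrete before the walkthrough, here is roughly the shape of the thing. The Duncan occupational prestige data from carData stands in for whichever job prestige data set the post actually uses; custom.coef.names is part of the existing texreg renaming machinery.

```r
library(texreg)
# Duncan is a stand-in for the post's classic job prestige data set.
data(Duncan, package = "carData")
m1 <- lm(prestige ~ income, data = Duncan)
m2 <- lm(prestige ~ income + education, data = Duncan)
# screenreg prints the ascii-art version of the combined table;
# custom.coef.names renames coefficients in the order they appear as rows.
screenreg(list(m1, m2),
          custom.coef.names = c("(Intercept)", "Income", "Education"))
```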
Continue reading “Call them what you will”

Which political science journals will have a data policy?

Making replication materials available for the research you do is A Good Thing. It’s also work, and it’s quite easy to never get around to it. Certainly I claim no special virtue in this department, so I am always happy when there’s an institutional stick to prod my better nature in the right direction. One such institutional prod comes from academic journals and their data policies. If you have to give them your replication data before they’ll publish your paper, then you probably will. What sorts of journals have data policies?
Continue reading “Which political science journals will have a data policy?”

Tools for making a paper

Since it seems to be the fashion, here’s a post about how I make my academic papers.
Actually, who am I trying to kid? This is also about how I make slides, letters, memos, and “Back in 10 minutes” signs to pin on the door. Nevertheless, it’s for making academic papers that I’m going to recommend this particular set of tools.

I use the word ‘make’ deliberately because I’m thinking of ‘academic paper’ broadly, as the sum of its words, analyses, tables, figures, and data. In this sense, papers can contain their own replication materials, and when they do, it should be possible in a single movement to rerun a set of analyses and reconstruct the paper that reports them.

To get anywhere near that goal, I use a mix of LaTeX, its newer bibliography system biblatex, and the R package knitr. Also, I use a Mac, though that won’t make very much difference to the exposition.
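
In outline, the build step is small enough to show. A minimal sketch, assuming a knitr document called paper.Rnw (the file name is illustrative) and latexmk available to handle the biblatex/biber reruns:

```r
# knit runs the embedded R chunks and writes paper.tex ...
knitr::knit("paper.Rnw")
# ... and latexmk reruns latex and biber as many times as biblatex needs.
system("latexmk -pdf paper.tex")
```

Rerunning those two lines is the ‘single movement’ mentioned above: analyses, tables, figures, and bibliography all regenerate together.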

Here’s how this currently works…
Continue reading “Tools for making a paper”

Quantifying the international search for meaning

Inspired by Preis et al.’s article ‘Quantifying the advantage of looking forward’, recently published in Scientific Reports (one of Nature Publishing Group’s journals), I wondered whether similar big-data, web-based research methods might address a question even bigger than how much different countries wonder about next year. How about the meaning of life? Who is searching for clarification about the meaning of life? And how is that related to the more obvious life task of getting richer?
Continue reading “Quantifying the international search for meaning”