I just finished teaching Computer-Assisted Content Analysis at the IQMR summer school at Syracuse. With three lectures and three labs, the problem every year is getting the right R packages onto people’s machines. In particular, anything that involves compilation – and when you’re using quanteda, readtext, and stm, that’s lots of things – is going to be trouble. Over the years, R and the various operating systems it has to live in have got a lot better about this, but ultimately the best solution is… not to do it at all. That is, to run everything on somebody else’s computers, excuse me, ‘in the cloud’. When students access an appropriately provisioned RStudio Server through their browsers they’re good to go from Lab one.
This post is about how to set all that up.
Last summer, I and my trusty henchpeople from the Department of Politics ran an intensive six-week summer course for incoming freshmen on data science (‘POL245’, for locals).
This post sketches out how I think course infrastructure should work, and provides some practical details of how we arranged things. Most of our structures worked pretty well. Some didn’t, so I also try to say what we’ll do differently next time. Some of this is, inevitably, specific to POL245, but maybe there’s something you can take away if you ever teach such a course yourself.
Let’s start with what course infrastructure is for.
TheyWorkForYou is a great website for keeping up with British politics and one of the many fine things mySociety does to make democracy in the UK more transparent.
There’s also an API, accessible via HTTP and wrapped up for a few languages. However, R is not amongst them, so I wrote twfy. You can install it from CRAN.
It was my first proper API package and a bit of a learning experience. If you want to hear more about that, read on.
Pretty regularly – usually in the middle of one of those interminable fixed-vs-random effects discussions – someone will pipe up that “Of course, for Bayesians this random vs fixed effect distinction makes no sense because all parameters are random”.
To the extent it can be made to make sense, the claim is false. It’s also unhelpful because it’s pretty much guaranteed to confuse and put off people who have better things to do than pay attention to arguments in statistics.
But on the off chance you have a moment for one of those, let me try to disentangle things.
Some people think it isn’t rational to vote. Usually the argument is as follows: the probability of being pivotal, that is: the probability that your vote will ‘decide’ the winner, shrinks rapidly as the number of voters increases. So if you vote in the hope of determining an outcome, then the probability of that happening is small enough for it not to be worthwhile trying.
Let’s leave aside the virtues of this argument and consider a hypothetical against vaccination.
A: You know I like the idea of using logic and logical deduction to understand how thinking should be done. This idea that beliefs are, or at least should be, the conclusions of deductive arguments is very clear and elegant. But I do worry…
B: You worry? Tell me about your worries.
Sometimes a bit of R code needs to know what operating system it’s running on. Here’s a short account of where you can find this information and a little function to wrap the answer up neatly.
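As a rough illustration of the idea, here is a minimal sketch of such a wrapper. It is not necessarily the function from the post: the name `guess_os` and the choice of return labels are assumptions made for this example. It relies only on base R's `Sys.info()`, `.Platform$OS.type`, and `R.version$os`.

```r
# Hypothetical helper (name and labels are illustrative assumptions).
# Returns a lower-case label such as "windows", "osx", or "linux".
guess_os <- function() {
  info <- Sys.info()  # may be NULL on platforms where it is not implemented
  if (!is.null(info)) {
    os <- info[["sysname"]]          # e.g. "Windows", "Darwin", "Linux"
  } else {
    # Fall back to coarser information that is always available
    os <- .Platform$OS.type          # "unix" or "windows"
    if (grepl("^darwin", R.version$os)) os <- "Darwin"
    if (grepl("linux", R.version$os)) os <- "Linux"
  }
  os <- tolower(os)
  # Report macOS under a friendlier label than its kernel name
  if (os == "darwin") "osx" else os
}

guess_os()
```

Trying `Sys.info()` first and falling back to `.Platform` and `R.version` is one reasonable ordering; code that only needs to tell Windows from everything else could check `.Platform$OS.type` alone.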
A little while back a New York Times article discussed the consequences for college admission of saying undiplomatic things on social media. Apparently colleges monitor, or at least check up on, the social media presence of their potential applicants to see whether they’re the right kind of person for the school. Inevitably, students scrub, curate, or simply hide their accounts in response.
Leaving aside the possible rights and wrongs of this behaviour, we might ask: how does the college identify a social media account as belonging to one of their potential students? The general answer is that social media like Facebook and Google plus have a ‘real names policy’. And the answer to why they have that is, allegedly, that people behave more civilly towards one another when they are not hidden behind an anonymising username. One may doubt that is the only reason, given the value of the personal information thereby acquired. Nevertheless it seems to be widely believed, despite some awkward evidence to the contrary, that this works.
How does it work, if it does?
Hey Mac OS X users with Java 1.8 installed: did R just request a Java 1.6 installation and then promptly crash your session? If so, read on…