Positions, Categories, and Text

Much of political science traffics in positions of one form or another. I work on statistical models for extracting positions using text as data; these are mostly applied to legislative politics (but not always, as you’ll see below). For example, Lowe 2008 took apart the Wordscores text scaling method and reconstructed it, showing that it is both one step of an iterative estimator for correspondence analysis and an approximation to a reasonable statistical ideal point model. Lowe and Benoit 2011 showed how a cunning re-parameterisation of the Wordfish text scaling model can be used to generate position uncertainty without bootstrapping. But since we don’t really believe the model assumptions, Lowe and Benoit 2013 compared position estimates and a variety of uncertainty measures, including several types of bootstrapping, to the estimates of human raters and offered some practical suggestions for representing uncertainty.
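To make the Lowe 2008 connection concrete, here is a minimal simulation sketch (my own illustration, not code from the paper): generate a document-term matrix from a Wordfish-style Poisson model, then score the documents with one-dimensional correspondence analysis. The first CA dimension tracks the latent positions closely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a Wordfish-style document-term matrix:
# counts[i, j] ~ Poisson(exp(psi_j + beta_j * theta_i))
n_docs, n_words = 20, 50
theta = np.linspace(-2.0, 2.0, n_docs)   # true document positions
beta = rng.normal(0.0, 1.0, n_words)     # word discrimination parameters
psi = rng.normal(3.0, 0.5, n_words)      # word frequency parameters
counts = rng.poisson(np.exp(psi + np.outer(theta, beta)))

# Correspondence analysis: SVD of the standardised residual matrix
P = counts / counts.sum()
r = P.sum(axis=1)                        # row (document) masses
c = P.sum(axis=0)                        # column (word) masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(S, full_matrices=False)
doc_scores = U[:, 0] / np.sqrt(r)        # first-dimension document scores

# CA scores correlate very highly with the true theta (sign is arbitrary)
corr = np.corrcoef(doc_scores, theta)[0, 1]
print(abs(corr))
```

The sign of the recovered dimension is arbitrary, as in any scaling model, so it is the absolute correlation that matters.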

I’ve also worked on extracting positions from texts that are already coded into categories. Lowe et al. 2011 presented a model-free way to turn category counts from the Comparative Manifestos Project into party positions using an insight from psychophysics. We also make positions available on a number of more targeted dimensions, in case you’d like to use them for your research. Benoit et al. 2012 showed the importance of using a more targeted scale rather than a generic left-right measure.
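The scale from Lowe et al. 2011 boils down to an empirical logit of category counts: the log ratio of mentions on one side of an issue to mentions on the other, with a small smoothing constant to keep zero counts finite. A minimal sketch with illustrative counts rather than real Comparative Manifestos Project data:

```python
import math

def logit_scale(right: float, left: float) -> float:
    """Empirical logit position from counts of 'right' and 'left'
    category mentions; the 0.5 smoothing keeps zero counts finite."""
    return math.log(right + 0.5) - math.log(left + 0.5)

# A manifesto with 30 'right' and 10 'left' mentions sits clearly
# right of centre; equal (or all-zero) counts score exactly zero.
print(logit_scale(30, 10))
print(logit_scale(0, 0))
```

One attraction of the log-ratio form is that position depends on the balance of emphasis, not the total amount of it, so long and short manifestos are treated comparably.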

Most recently I’ve been thinking about a unified statistical framework that encompasses pretty much all the methods of position estimation from text that you’ve ever heard of. That’s useful not only because it answers some outstanding questions about dimensionality and offers some new models and some fun visualisation methods, but also because it should stop the flow of papers doing ‘compare and contrast’ by applying different text scaling models to their favourite data. Or maybe it won’t. In any case, you can hear more about this at APSA 2013 in Chicago.

I’ve been interested in statistical approaches to manual content analysis for a while, from early think pieces (Lowe 2004, 2006) to applied work. For example, Sullivan and Lowe 2009 analysed the independence rhetoric of Chen Shui-Bian in his speeches as president as a function of audience preferences and his own agenda. And moving to social media, Theocharis et al. 2013 was a comparative study of social movements, mobilisation, and grievances as seen through Twitter in Spanish, Greek, and US samples.

Most statistical work in content analysis obsesses over rater reliability measures, but I’m less interested in discovering whether coders are any good and more interested in representing the sorts of mistakes they make, because I’d like to use that to correct subsequent inferences. Fortunately for me, much of the German Longitudinal Election Study lives upstairs, so I have no shortage of hand-coded media data to work on.

Event Data and International Relations

I like to work with event data, the automatically generated output of information extraction systems that tells you who did what to whom in as much temporal detail as newswires can provide. King and Lowe 2003 compared the performance of an automated event data extraction system to human coders (Lowe and King 2003 is the short version).

I’m also interested in doing sensible things with event data once it’s coded. In a paper that was going to go to APSA 2012 until that was rained off, and will now be at EPSA 2013, I show how the Goldstein conflict-cooperation scores that are traditionally assigned to international events before doing time series analysis can be recovered from raw event counts automatically with an unfolding-type ideal point model. By ‘recovered’, I mean: you give me the events and their counts and I give you back a scale value for each temporal unit that correlates highly with the average Goldstein score. (For weekly data on the Balkans in the 1990s, ‘highly’ means >0.9.) If this sounds like another scaling application, that’s because it is. In the same paper I clarify what you can, should, and probably should not do with event data in the light of a proper dynamic measurement model.
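For reference, the ‘average Goldstein score’ baseline that the model recovers is just a count-weighted mean of per-category scores within each temporal unit. A minimal sketch with made-up category names and score values (not the real Goldstein scale):

```python
# Illustrative conflict-cooperation scores per event category
# (hypothetical values, not the actual Goldstein scale)
scores = {"consult": 1.0, "agree": 6.5, "demand": -5.0, "clash": -9.0}

# Event counts per category for two hypothetical weeks
weekly_counts = [
    {"consult": 4, "agree": 1, "demand": 2},
    {"clash": 3, "demand": 5},
]

def weekly_score(counts: dict) -> float:
    """Count-weighted average category score for one temporal unit."""
    total = sum(counts.values())
    return sum(scores[cat] * n for cat, n in counts.items()) / total

for week in weekly_counts:
    print(weekly_score(week))
```

The model's point is that you can hand back a scale value per week that tracks this series without fixing the category scores in advance; the weighted average is just the traditional benchmark it is checked against.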

I also work a lot on the data side of international relations. For example, Carey et al. 2013 introduces a searchable database of militias and other pro-government armed groups worldwide since 1981. Turning to their opponents, Jenne et al. 2007 looks at the determinants of ethnic group secession. And for a more legal perspective, I designed the database of implementations of the Rome Statute that you see on the ICC’s legal tools website, with the help of the Human Rights Law Centre at Nottingham.