Constants in Logit scales

A little while ago I got a query about the calculation of the logit policy scales from Lowe et al. (2011). I thought it might be useful to repeat the answer slightly more publicly, in case anybody else was wondering. The pesky constants in that paper confuse people. Anyway, here’s the question:

In the article you give the formula as log(R+.5)-log(L+.5). I had assumed that in the formula that ‘R’ and ‘L’ were the total number of sentences on each ‘side’ of a policy scale and so consequently .5 is added to the total number of sentences in all the manifesto categories assigned to each side of a policy scale. However I was reading an article [… where they] seem to add .5 to each of the categories assigned to a policy scale and then also divide by the number of items used in the scale (their approximate formula [without proper subscripting] is: p = [(log(p_1 +0.5)-log(p_2+.5)+ \ldots +(log(p_3+.5)-log(p_4+.5)]/3) where p is a manifesto category). Consequently I’m slightly worried that I’ve misinterpreted how you calculate your scale

OK, so the way to think about this scale is as follows…

You have a known assignment of categories into left and right sides of an issue. (In the limiting case you have just one category on each side.) Call this a dimension. By assigning categories to a dimension as we do in the paper, we are asserting that sentences in the grouped categories are equivalent for the purposes of making or taking a position on that dimension.

Now, for each case you have counts from each category involved on either side of the dimension. Summarise the counts of all left categories as L, all right categories as R, and R+L as T.

If you had an explicit statistical model of what makes a party take a position in a country-year on this dimension, e.g. involving some variables X, then you might sensibly fit a logistic regression to this data. For each case i, that would assume that [R_i, L_i] \sim \text{Binomial}(\pi_i, T_i) where \pi_i is the probability of seeing a sentence coded into one of the categories on the right side of the dimension and
\text{logit}(\pi_i) = \text{log} \frac{\pi_i}{1-\pi_i} = X_i \beta
for some coefficients \beta. But if, on the other hand, you had no idea what should be in X, or wanted to stay as close to the data as possible, then you might take the ‘empirical’ logit straight from the observed proportions: \text{logit}(\hat{\pi}_i) = \text{log} \frac{R_i/T_i}{L_i/T_i} = \text{log} \frac{R_i}{L_i}. Adding a constant term, say a=0.5, to each R_i and L_i gives \text{log} \frac{R_i + a}{L_i + a}, which smooths all the empirical proportions, and hence the empirical logits. This not only gives you an estimate when one of the counts is zero, but more relevantly smooths everything slightly towards 0 (the low count ones more than the others). This works the same way as an X-based model, which would smooth the predicted logits towards whatever the model form was expecting – maybe towards a country or party mean or some more complicated function – rather than towards zero.
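To see the smoothing at work, here is a minimal sketch in Python. The function name `empirical_logit` and the counts are made up for illustration; only the formula log(R+a) - log(L+a) with a=0.5 comes from the text above. The first three cases share the same 3:1 ratio at different totals, so you can watch the low count case get pulled hardest towards zero, and the last case has a zero count where the raw logit is undefined.

```python
import math

def empirical_logit(right, left, a=0.5):
    """Smoothed empirical logit: log(R + a) - log(L + a)."""
    return math.log(right + a) - math.log(left + a)

# Hypothetical (R, L) sentence counts, not taken from any real manifesto.
cases = [(3, 1), (30, 10), (300, 100), (5, 0)]

for r, l in cases:
    # The unsmoothed logit is undefined as soon as either count is zero.
    raw = math.log(r / l) if r > 0 and l > 0 else float("nan")
    print(f"R={r:3d} L={l:3d} raw={raw:6.3f} smoothed={empirical_logit(r, l):6.3f}")
```

The raw logit is the same for the first three cases, while the smoothed one moves closer to it as the total count grows.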

Now that it’s clear what a is doing, you can ask whether it is sensible to smooth each category separately and then combine, or to smooth the resulting aggregate. That rather depends. The smoothing towards zero effect is going to be stronger if it is done in multiple places rather than just to the aggregate. (By the way, the division by the number of categories that went on one side or the other will simply rescale the measure.) If you were interested in the individual category ‘positions’ as well as the dimensions you are asserting that they form, you might want good estimates of them too, in which case smoothing each separately might be a good idea. However, from a statistical estimation or mean squared error perspective there is no particular reason to think that the optimal estimates of individual categories combined in the optimal fashion into dimensions should give you the optimal estimate of position on that dimension (think of all those multilevel model justifications). For this it depends on how sure you are that the categories really are equivalent.
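The two strategies can be put side by side in a small sketch. The category pairs and their counts here are hypothetical, and pairing one right category with one left category is my reading of the formula in the question, not something the paper prescribes. Note that the two numbers differ partly because of the extra smoothing and partly because an average of log ratios is simply a different quantity from the log ratio of the summed counts.

```python
import math

def smoothed_logit(right, left, a=0.5):
    """Smoothed empirical logit: log(R + a) - log(L + a)."""
    return math.log(right + a) - math.log(left + a)

# Hypothetical counts for three (right, left) category pairs on one dimension.
pairs = [(4, 1), (6, 2), (2, 1)]

# Strategy 1: sum the counts on each side, then smooth once at the end.
R = sum(r for r, _ in pairs)
L = sum(l for _, l in pairs)
aggregate = smoothed_logit(R, L)

# Strategy 2: smooth each pair separately, then average the logits.
per_category = sum(smoothed_logit(r, l) for r, l in pairs) / len(pairs)

print(f"smooth the aggregate:      {aggregate:.3f}")
print(f"smooth each, then average: {per_category:.3f}")
```

With these counts the per-category version lands closer to zero than the aggregate version, which is the stronger smoothing described above.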

In any case, the ‘right’ answer is clearly to fit an explicit model you can stand behind – maybe a nice random effects model with year and party as crossed factors – and then take the posterior estimates from that as your positions. That’s closest to the ‘smooth at the end’ strategy, but you can see what you’re doing a bit better, the category equivalence assumptions are front and centre, and there are no smoothing factors to worry about.
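For concreteness, one way such a model might be written down is sketched here. The indices j (party) and t (year), the intercept \mu, and the normal distributions on the effects are my assumptions for illustration, not a specification from the paper.

```latex
\begin{align*}
R_{jt} &\sim \text{Binomial}(T_{jt}, \pi_{jt}) \\
\text{logit}(\pi_{jt}) &= \mu + u_j + v_t \\
u_j &\sim \text{Normal}(0, \sigma_u^2) \quad \text{(party effects)} \\
v_t &\sim \text{Normal}(0, \sigma_v^2) \quad \text{(year effects)}
\end{align*}
```

The position for party j in year t would then be the posterior estimate of \text{logit}(\pi_{jt}), which gets shrunk towards \mu + u_j + v_t, more strongly when T_{jt} is small, rather than towards zero.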
