## On the use and abuse of weasels in science journalism

Dean Burnett writes a column in the Guardian, sometimes about science but more entertainingly about pseudo-, wannabe-, and not-actually-science. Most of the time this is good BS-shovelling fun and I recommend it. Unfortunately, today we get some ill-considered overreach under the guise of shovelling.

The subject is a silly equation purporting to define how depressing any day of the year is, and thereby to identify the most depressing one. It is sufficiently silly that it doesn’t deserve a link, has no redeeming features and if you’ve not read it yet you’re just lucky. He’s right. It is nonsense.

His arguments against it are more interesting, if a bit alarming. To put it bluntly: if they were sound they would sink all regression models about any social issue of any interest to anybody ever.

## Constants in Logit scales

A little while ago I got a query about the calculation of the logit policy scales from Lowe et al. (2011). I thought it might be useful to repeat the answer slightly more publicly, in case anybody else was wondering. The pesky constants in that paper confuse people. Anyway, here’s the question:

In the article you give the formula as $\log(R+.5)-\log(L+.5)$. I had assumed that in the formula ‘R’ and ‘L’ were the total number of sentences on each ‘side’ of a policy scale and so consequently .5 is added to the total number of sentences in all the manifesto categories assigned to each side of a policy scale. However I was reading an article [… where they] seem to add .5 to each of the categories assigned to a policy scale and then also divide by the number of items used in the scale (their approximate formula [without proper subscripting] is: $p = [(\log(p_1+.5)-\log(p_2+.5)) + \ldots + (\log(p_3+.5)-\log(p_4+.5))]/3$ where $p$ is a manifesto category). Consequently I’m slightly worried that I’ve misinterpreted how you calculate your scale.
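
To make the two readings in the question concrete, here is a minimal Python sketch. The function names and the sentence counts are made up for illustration and are not taken from the paper or the article mentioned; the sketch only shows that the two calculations generally give different numbers, without settling which one is intended.

```python
import math

def logit_scale(right, left):
    """log(R + .5) - log(L + .5) for two sentence counts."""
    return math.log(right + 0.5) - math.log(left + 0.5)

def mean_of_category_logits(pairs):
    """Second reading: one logit per pair of categories, then average."""
    pairs = list(pairs)
    return sum(logit_scale(r, l) for r, l in pairs) / len(pairs)

# Hypothetical sentence counts for three right/left category pairs.
right_counts = [10, 0, 5]
left_counts = [2, 3, 5]

# First reading: sum the categories on each side, then add .5 once per side.
summed = logit_scale(sum(right_counts), sum(left_counts))

# Second reading: add .5 to every category and divide by the number of pairs.
averaged = mean_of_category_logits(zip(right_counts, left_counts))

print(summed, averaged)
```

With these invented counts the first reading is positive and the second is negative, so the choice between them is not cosmetic.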


## Unicode in R packages (not)

Perhaps you are trying to add your nice new object as data for an R package. But wait. It has [gasp] foreign letters in its dimnames, so ‘R CMD check’ will certainly complain.

What you need is something to turn R’s natural Unicode-processing goodness into a relic from the early days of computing, without inadvertently aliasing any words that differ only in their non-ASCII characters. Here’s a handy iconv-invoking function to do that…

## A conversion to Yoshikoder format

A couple of months ago somebody asked how to convert a new dictionary file so that it would run in the Yoshikoder. The format of the original file had two parts: the first half looked like:

%
1	funct
2	pronoun
3	ppron
4	i
5	we
...


which assigns identifiers to category labels, and a second half that looked like:

%
a	1	10
abandon*	125	127	130	131	137
abdomen*	146	147
abilit*	355
...

in which words or wildcarded patterns were assigned to categories via their identifiers. How to get it into Yoshikoder-readable format?
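
As a rough sketch of the first step only, the two-part file can be read into a mapping from category labels to patterns. This assumes tab-separated fields and a lone `%` line opening each half, as in the excerpts above; the function name `parse_dictionary` and the toy sample are hypothetical, and writing the result out in Yoshikoder's own format is not attempted here.

```python
from collections import defaultdict

def parse_dictionary(text):
    """Return {category label: [patterns]} from the two-part format."""
    labels = {}                    # identifier -> category label
    patterns = defaultdict(list)   # category label -> patterns
    section = 0                    # number of '%' separator lines seen so far
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "%":
            section += 1
            continue
        fields = line.split("\t")  # assumption: fields are tab-separated
        if section == 1:           # first half: identifier, then label
            labels[fields[0]] = fields[1]
        elif section == 2:         # second half: pattern, then identifiers
            for ident in fields[1:]:
                # Fall back to the raw identifier if no label was declared.
                patterns[labels.get(ident, ident)].append(fields[0])
    return dict(patterns)

# Toy input: the labels echo the excerpt, the assignments are invented.
sample = "%\n1\tfunct\n2\tpronoun\n%\na\t1\nabandon*\t2\nabilit*\t1\t2\n"
print(parse_dictionary(sample))
```

Grouping patterns under their category labels first keeps the parsing separate from whatever output format is needed afterwards.
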