Software
Let’s start with some R packages. There are more on my GitHub pages.
twfy
twfy is a light wrapper around TheyWorkForYou’s API. You can find it here: https://conjugateprior.github.io/twfy.
cbn
cbn contains replication materials for Caliskan, Bryson and Narayanan (2017), adds some new bootstrap statistics, and allows you to do similar analysis on your own word vectors. You can find it here: https://conjugateprior.github.io/cbn.
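If you haven't met the analysis being replicated, here is a minimal sketch of the WEAT effect size as I understand it from Caliskan, Bryson and Narayanan (2017): the mean association of target set X with attribute set A (over B), minus the same for target set Y, divided by the standard deviation of the associations across all targets. It is plain Python, not code from cbn, it uses made-up 2-d vectors rather than real word vectors, and it assumes the sample standard deviation.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus its mean similarity to B."""
    return (statistics.mean(cosine(w, a) for a in A)
            - statistics.mean(cosine(w, b) for b in B))


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, scaled by the
    standard deviation of association over X and Y together (sample sd assumed)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Toy vectors, purely illustrative: X leans towards A, Y towards B.
X = [(1.0, 0.1), (0.9, 0.2)]
Y = [(0.1, 1.0), (0.2, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
print(weat_effect_size(X, Y, A, B))
```

Swapping X and Y flips the sign of the effect size, which is a quick sanity check on your own vectors.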
qss.student
qss.student bundles the problem sets from K. Imai’s A First Course in Quantitative Social Science (2016). It is designed for your students to install if you are teaching with this book. You can find it here: https://conjugateprior.github.io/qss.student.
events
events is for manipulating event data of the kind generated by KEDS/Tabari. Go to the events page on CRAN.
Update 2020: These days you can do most of this just as easily with a combination of dplyr and tidyr.
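The bread-and-butter operation, aggregating events by dyad and time period, is small in any language. Here is a minimal Python sketch, assuming the events have already been parsed into (date, source, target, code) tuples; that layout, the actor codes and the event codes below are all hypothetical and not the format the events package reads.

```python
from collections import Counter
from datetime import date


def monthly_dyad_counts(events):
    """Count events for each directed (source, target) dyad in each calendar month.

    events is an iterable of (date, source, target, code) tuples; this
    layout is an assumption made for illustration."""
    counts = Counter()
    for when, source, target, _code in events:
        counts[(source, target, when.year, when.month)] += 1
    return counts


# Hypothetical events: actor and event codes are illustrative only.
sample = [
    (date(2001, 1, 5), "USA", "RUS", "042"),
    (date(2001, 1, 20), "USA", "RUS", "190"),
    (date(2001, 2, 1), "RUS", "USA", "042"),
]
print(monthly_dyad_counts(sample)[("USA", "RUS", 2001, 1)])  # prints 2
```

Keeping the dyad directed (USA to RUS is not RUS to USA) matches how source and target are usually kept separate in this kind of data.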
austin
austin is an R package for doing things with words. Right now it allows you to scale texts in the style of Wordscores and Wordfish. Go to the austin homepage.
Update 2020: Don’t forget to check out quanteda, which has integrated a lot of this code.
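To give a feel for what Wordscores-style scaling does, here is a minimal Python sketch of the raw-score step as I understand the method: each word gets the average of the reference texts' scores, weighted by how much more often it appears in one reference text than another, and a new text gets the mean score of its scored words. This is not austin's or quanteda's code, it uses naive whitespace tokenisation, and it skips the rescaling step usually applied to raw scores. The example texts and scores are made up.

```python
from collections import Counter


def relative_freqs(text):
    """Share of the text's tokens taken up by each word type."""
    tokens = text.lower().split()
    return {w: c / len(tokens) for w, c in Counter(tokens).items()}


def word_scores(reference_texts, reference_scores):
    """Score each word as the reference scores averaged with weights
    proportional to the word's relative frequency in each reference text."""
    freqs = [relative_freqs(t) for t in reference_texts]
    scores = {}
    for w in set().union(*freqs):
        f = [fr.get(w, 0.0) for fr in freqs]
        total = sum(f)
        scores[w] = sum(fi / total * a for fi, a in zip(f, reference_scores))
    return scores


def score_text(text, scores):
    """Raw score of a new ('virgin') text: the mean score of its tokens,
    counting only words that were scored from the reference texts."""
    known = [w for w in text.lower().split() if w in scores]
    if not known:
        raise ValueError("text shares no words with the reference texts")
    return sum(scores[w] for w in known) / len(known)


refs = ["tax cut tax cut market", "welfare spend welfare public market"]
s = word_scores(refs, [1.0, -1.0])
print(round(score_text("tax cut market", s), 3))  # prints 0.667
```

Note that "market", which is equally common in both reference texts, ends up with a score of zero and so pulls any text containing it towards the middle.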
Resha
Resha is an R package that re-implements Harun Reşit Zafer’s stemming tool for Turkish. He says it’s “less aggressive than Snowball”. You can find it here: https://github.com/conjugateprior/Resha.
Java
Now for some Java applications that I haven’t worked on for a looong time. As far as I can tell, though, they all still work. You probably came looking for the Yoshikoder. It’s right here:
Yoshikoder
Yoshikoder is a cross-platform multilingual content analysis program. Go to the Yoshikoder homepage.
JFreq
JFreq counts words, quickly. If you have a lot of documents that need to be preprocessed and turned into a word frequency matrix in a hurry without filling up your disk, this might be the software for you. Go to the JFreq homepage.
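The reason a word frequency matrix need not fill up your disk is that most words don't occur in most documents, so you only have to store the non-zero cells. Here is a minimal Python sketch of that idea, emitting (document, word, count) triples; it is not JFreq's code, and the triplet layout and the simple regex tokeniser are my assumptions, not JFreq's output format.

```python
import re
from collections import Counter


def sparse_word_counts(docs):
    """Count words in each document, keeping only non-zero cells.

    Returns (vocab, triples): vocab maps each word to a column index in the
    order words are first met (alphabetical within each document), and
    triples holds (doc_index, word_index, count) for every word that occurs."""
    vocab = {}
    triples = []
    for d, text in enumerate(docs):
        counts = Counter(re.findall(r"\w+", text.lower()))
        for word, n in sorted(counts.items()):
            j = vocab.setdefault(word, len(vocab))
            triples.append((d, j, n))
    return vocab, triples


vocab, triples = sparse_word_counts(["The cat sat", "the cat, the dog"])
print(vocab)    # prints {'cat': 0, 'sat': 1, 'the': 2, 'dog': 3}
print(triples)  # prints [(0, 0, 1), (0, 1, 1), (0, 2, 1), (1, 0, 1), (1, 3, 1), (1, 2, 2)]
```

Written out one triple per line, the file grows with the number of distinct words per document rather than with documents times vocabulary.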
YKConverter
The YKConverter is a utility that tries to extract the text from documents in various formats (HTML, Word, PDF, PowerPoint, Excel, encoded text) and save it as UTF-8 encoded plain text. Go to the YKConverter homepage.
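For the HTML case, the basic job can be sketched with nothing but Python's standard library: parse the markup, keep the text, drop scripts and styles, and write the result out as UTF-8. This is an illustration of the idea, not how the YKConverter works, and the other formats (Word, PDF and so on) would need third-party libraries that this sketch doesn't attempt.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping scripts and styles."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &eacute; arrives as é
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of html, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_file(src, dest, source_encoding="utf-8"):
    """Read an HTML file in source_encoding and save its text as UTF-8."""
    text = html_to_text(Path(src).read_text(encoding=source_encoding))
    Path(dest).write_text(text, encoding="utf-8")


print(html_to_text("<p>Caf&eacute; &amp; bar</p>"))  # prints Café & bar
```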
Re-encoder
The Re-encoder takes a folder full of text files in one file encoding and switches them into another one. It needs a better name. Go to the Re-encoder homepage.
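The core of the job is short in most languages. Here is a minimal Python sketch, assuming you want re-encoded copies in a new folder rather than overwriting the originals; the function name, the folder names in the usage line and the default "*.txt" pattern are my own choices, not the Re-encoder's.

```python
from pathlib import Path


def reencode_folder(src_dir, dest_dir, from_encoding, to_encoding="utf-8", pattern="*.txt"):
    """Write a re-encoded copy of every file in src_dir matching pattern into dest_dir.

    Working on bytes leaves line endings untouched; a file that is not valid
    in from_encoding raises UnicodeDecodeError rather than being silently mangled."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.glob(pattern)):
        text = path.read_bytes().decode(from_encoding)
        (dest / path.name).write_bytes(text.encode(to_encoding))
        converted.append(path.name)
    return converted


# Hypothetical usage:
# reencode_folder("texts-latin1", "texts-utf8", "latin-1")
```

Be aware that some encodings, Latin-1 among them, will happily decode any bytes at all, so a wrong guess about the source encoding produces garbled text rather than an error.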
Python
There’s also this ancient Python tutorial, slightly updated so it runs on Python 3. It’s older code, but it checks out.
Content Analysis in Python
A brief demonstration of how easy it is to do basic content analysis in Python. Not really software but not really a tutorial either. Perhaps it will be useful or inspiring to someone.
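In the same spirit, here is a minimal sketch of dictionary-based counting, the kind of thing the Yoshikoder does: each category is a list of patterns, a trailing * matches any ending, and we count the tokens matching each category. This is not the tutorial's code or the Yoshikoder's matching rules; the dictionary, the text and the use of fnmatch-style wildcards are all illustrative assumptions.

```python
import re
from fnmatch import fnmatchcase


def apply_dictionary(text, dictionary):
    """Count how many tokens in text match each category's patterns.

    dictionary maps a category name to a list of lowercase patterns; '*'
    matches any run of characters, so 'econom*' matches 'economy' and
    'economic'. A token is counted at most once per category."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        category: sum(1 for tok in tokens if any(fnmatchcase(tok, p) for p in patterns))
        for category, patterns in dictionary.items()
    }


# An illustrative two-category dictionary.
dictionary = {"economy": ["econom*", "tax*", "jobs"], "health": ["health*", "hospital*"]}
text = "The economy needs jobs; taxes and the economic outlook matter. Hospitals too."
print(apply_dictionary(text, dictionary))  # prints {'economy': 4, 'health': 1}
```

Dividing each count by the number of tokens gives category proportions, which are easier to compare across documents of different lengths.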
Third-Party Software
VBPro
VBPro is Mark Miller’s classic free content analysis software. I am simply hosting the latest version and cannot answer questions about it. Please address questions to Mark: mmarkmiller@mac.com.
Download vbpro.zip (for Windows only).