JCA is a suite of tools for counting words, applying content analysis dictionaries, examining keywords and categories in their local document context (KWIC), and a few related tasks.
It is available in three forms: jca, which is both a Java library and a set of associated command line tools, and rjca, which is an R package. You probably want the R package.
- Creates word frequency matrices in sparse formats (LDA-C or Matrix Market) with minimal memory usage. Supports stop word, currency, and number removal, and stemming.
- Reads in content analysis dictionaries in Yoshikoder, Lexicoder, LIWC, VBPro, and Wordstat formats. Supports multi-word pattern matching.
- Creates concordances (keywords in context) for categories, dictionaries, words or phrases in text or HTML.
- Turns text documents into single files suitable for Mallet.
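To make the first item above concrete, here is a rough sketch of what a sparse word frequency matrix in Matrix Market coordinate format looks like. It is written in Python purely for illustration and is not how JCA does it: the function names `tokenize` and `to_matrix_market`, the simple regular-expression tokenizer, and the tiny stop word set are all assumptions made for this example, and stemming and currency or number removal are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def to_matrix_market(docs, stopwords=frozenset()):
    """Return (vocabulary, Matrix Market text) for a list of document strings.

    Rows are documents and columns are vocabulary words, both 1-indexed,
    as the Matrix Market coordinate format requires.
    """
    # One Counter per document, skipping stop words.
    counts = [Counter(t for t in tokenize(d) if t not in stopwords) for d in docs]
    # Sorted vocabulary gives a stable column order.
    vocab = sorted(set().union(*counts))
    col = {word: j for j, word in enumerate(vocab, start=1)}
    # Only non-zero cells are stored: (row, column, count).
    entries = [
        (i, col[word], n)
        for i, doc_counts in enumerate(counts, start=1)
        for word, n in sorted(doc_counts.items())
    ]
    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{len(docs)} {len(vocab)} {len(entries)}",  # rows, columns, non-zeros
    ]
    lines += [f"{i} {j} {n}" for i, j, n in entries]
    return vocab, "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ["the cat saw the other cat", "the dog saw"]
    vocab, mm = to_matrix_market(docs, stopwords={"the"})
    print(vocab)
    print(mm, end="")
```

Because only the non-zero cells are written out, the file size grows with the number of distinct words per document rather than with the full document-by-vocabulary grid, which is why sparse formats suit word frequency data.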
The rjca package drives the JCA tools and offers some convenience functions for dealing with the output. Here’s the package vignette that works through some of them.
Everything here is open source and distributed under the GNU General Public License (GPL).
If you’d like to refer to this software in written work, you can use one of these:
Lowe W. (2015) ‘rjca: An R package to drive Java Content Analysis tools’. R software version 0.2, URL https://github.com/conjugateprior/rjca
Lowe W. (2015) ‘jca: Java Content Analysis tools’. Java software version 0.2.4.1, URL https://github.com/conjugateprior/jca