Blog

Unicode in R packages (not)

Perhaps you are trying to add your nice new object as data for an R package. But wait. It has [gasp] foreign letters in its dimnames, so ‘R CMD check’ will certainly complain.

What you need is something to turn R’s natural Unicode-processing goodness into a relic from the early days of computing, without inadvertently aliasing any words that differ only in their non-ASCII characters. Here’s a handy iconv-invoking function to do that…
Continue reading Unicode in R packages (not)

A conversion to Yoshikoder format

A couple of months ago somebody asked how to convert a new dictionary file so that it would run in the Yoshikoder. The format of the original file had two parts: the first half looked like:

%								
1	funct							
2	pronoun							
3	ppron							
4	i							
5	we							
...

which assigns identifiers to category labels, and a second half that looked like:

%								
a	1	10
abandon*	125	127	130	131	137
abdomen*	146	147			
abilit*	355
...

in which words or wildcarded patterns were assigned to categories via their identifiers. How do you get it into a Yoshikoder-readable format?
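As a rough illustration of the first step, here is a minimal Python sketch that reads the two-part layout shown above into a mapping from category label to patterns. It assumes the fields are tab-separated and that each half starts with a line containing only `%`; the function name `parse_dictionary` and the sample text are made up for this example, and identifiers with no label in the first half are kept as-is. Writing the result out in Yoshikoder's own dictionary format is not shown here.

```python
from collections import defaultdict


def parse_dictionary(text):
    """Return {category label: [patterns]} from the two-part format.

    Assumes tab-separated fields and a '%' line opening each half.
    """
    labels = {}                   # identifier -> category label
    patterns = defaultdict(list)  # category label -> words or wildcard patterns
    section = 0                   # number of '%' lines seen so far

    for raw in text.splitlines():
        # Drop the empty fields left behind by trailing tabs.
        fields = [f.strip() for f in raw.split("\t") if f.strip()]
        if not fields:
            continue
        if fields[0] == "%":
            section += 1
        elif section == 1 and len(fields) >= 2:
            # First half: identifier followed by its category label.
            labels[fields[0]] = fields[1]
        elif section == 2:
            # Second half: a pattern followed by the identifiers it belongs to.
            word, ids = fields[0], fields[1:]
            for cat_id in ids:
                # Fall back to the raw identifier when no label is known.
                patterns[labels.get(cat_id, cat_id)].append(word)
    return dict(patterns)


# A shortened sample built from the lines quoted above.
sample = (
    "%\t\t\t\n"
    "1\tfunct\t\t\n"
    "2\tpronoun\t\t\n"
    "%\t\t\t\n"
    "a\t1\t10\n"
    "abdomen*\t146\t147\t\t\n"
)

result = parse_dictionary(sample)
```

With the categories collected like this, each label and its list of patterns can be written out one category at a time in whatever form the target program expects.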
Continue reading A conversion to Yoshikoder format