ACTIV-ES: a comparable, cross-dialect corpus of ‘everyday’ Spanish from Argentina, Mexico, and Spain

The first release of the ACTIV-ES Spanish dialect corpus based on TV/film transcripts is now available here:

It includes 3,460,172 tokens in total (Argentina: 1,103,039; Mexico: 976,192; Spain: 1,380,941) and comes in running-text and word-list (1- to 5-gram) formats. Each format has both a plain-text and a part-of-speech-tagged version.
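
To make the word-list format concrete, here is a minimal sketch of how 1- to 5-gram counts could be derived from running text. It is only an illustration: the function name `ngram_counts`, the whitespace tokenization, and the sample sentence are assumptions, not the pipeline used to build the corpus.

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every n-gram of length 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical sample line; real transcripts would need proper tokenization.
tokens = "no sé qué decir".split()
counts = ngram_counts(tokens)
for ngram, freq in sorted(counts.items()):
    print(freq, ngram)
```

A line shorter than five tokens simply yields no n-grams for the longer window sizes.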

For more information about the development and evaluation of this resource, you can download our paper from the Ninth International Conference on Language Resources and Evaluation (LREC 2014) here:

Overcoming an IMDbPY installation issue on Ubuntu 11.04

IMDbPY is a Python module for searching and retrieving information from IMDb. To install IMDbPY on Ubuntu, you’ll need to download the module here. Then extract the module and, from the extracted directory, run the installer as root:

$ sudo python setup.py install

You may get an error complaining about the ‘gcc’ compiler. I did, even though a quick:

$ which gcc

returns a live ‘gcc’ compiler on my box. The trick I found here is to install ‘python-dev’ through your Ubuntu package manager.

$ sudo apt-get install python-dev

Then you should be able to rerun the earlier module installation without errors. Fire up Python and import the module to make sure:

$ python
>>> import imdb

Things should be fine!
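
A likely reason ‘python-dev’ helps (an assumption on my part, since the exact error text is not reproduced above) is that building a module's C extension needs the Python header files, which that package provides, and not just a working ‘gcc’. The sketch below checks whether ‘Python.h’ is present for the running interpreter; the helper name `python_headers_present` is made up for illustration.

```python
import os
import sysconfig

def python_headers_present():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))

if python_headers_present():
    print("Python headers found; C extensions should be able to compile.")
else:
    # Assumption: on Ubuntu the headers come from the python-dev package.
    print("Python.h not found; try installing python-dev.")
```

If the check reports missing headers even though ‘which gcc’ succeeds, the compiler itself is probably not the problem.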