ACTIV-ES: a comparable, cross-dialect corpus of ‘everyday’ Spanish from Argentina, Mexico, and Spain

The first release of the ACTIV-ES Spanish dialect corpus based on TV/film transcripts is now available here: https://github.com/francojc/activ-es

It includes 3,460,172 total tokens (Argentina: 1,103,039; Mexico: 976,192; Spain: 1,380,941) and comes in running text and word list (1- to 5-gram) formats. Each format has both a plain text and a part-of-speech tagged version.

For more information about the development and evaluation of this resource, you can download our paper from the Ninth International Conference on Language Resources and Evaluation (LREC 2014) here: https://www.academia.edu/6962707/ACTIV-ES_a_comparable_cross-dialect_corpus_of_everyday_Spanish_from_Argentina_Mexico_and_Spain
[Figure: plot by country, year, and genre]

Install vislcg3 tools on Mac OS X

Here are instructions for installing the vislcg3 constraint grammar tools on a Mac.

1. Install the Xcode developer tools (App Store)

2. Install cmake and boost. I use Homebrew, but I imagine you could use MacPorts or Fink.

3. Install ICU. This takes a few steps:
A. Download the package here: http://download.icu-project.org/files/icu4c/4.8.1/icu4c-4_8_1-src.tgz (or the latest version) and decompress it:

$ gunzip -d < icu4c-4_8_1-src.tgz | tar -xvf -

Then run:

$ cd icu/source/

It's a good idea to make sure the build scripts are executable, so run:

$ chmod +x runConfigureICU configure install-sh

B. Now run the runConfigureICU script like so:

$ ./runConfigureICU MacOSX

C. You'll then make and make install, and you should be golden:

$ make
$ sudo make install

4. Now it's time to get to vislcg3.
A. Download the files from the svn repository:

$ svn co http://beta.visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3

Then move into the main directory:

$ cd vislcg3/

B. Configure the build, which also checks that the dependencies are in place:

$ ./cmake.sh

C. Run make and make install to finalize this thing.

$ make
$ sudo make install

D. Now check to see that it's in your path:

$ which vislcg3

And if you get a path to the binary, you're ready to go!
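
If one of the steps above fails, a common cause is that a build tool is not on your PATH. Here is a minimal sketch for checking that up front. It is not part of the official vislcg3 instructions: the missing_tools function is a made-up helper, and the list of commands (cmake, make, svn) is my assumption about what the steps above call directly.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print each named command that is
# NOT found on the PATH, one per line. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed prerequisites: make (Xcode tools, step 1), cmake (step 2),
# and svn (used in step 4A).
missing=$(missing_tools cmake make svn)
if [ -n "$missing" ]; then
    echo "Missing before building vislcg3:" $missing
else
    echo "All build tools found."
fi
```

The same function works after the install too: missing_tools vislcg3 prints nothing once the binary is on your PATH.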

Setting up RStudio server with Apache2 proxy

I just set up a server instance of RStudio on our Language Lab server (running Ubuntu 11.04). I tried following the instructions here, but I was a bit confused about where to add the proxy configuration. It turns out you need to add it to your /etc/apache2/sites-enabled/000-default file. So fire up the terminal and type …


$ sudo pico /etc/apache2/sites-enabled/000-default

Once in the file, scroll down to the bottom and add …

<Proxy *>
Allow from localhost
</Proxy>

ProxyPass / http://localhost:8787/
ProxyPassReverse / http://localhost:8787/

Place these lines just before the closing </VirtualHost> tag.

This will allow you to connect to RStudio on the standard HTTP port (80), at http://yourserver.com/, rather than on port 8787.
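
The proxy directives only work if Apache's proxy modules are loaded, and the change only takes effect once Apache is restarted. Here is a minimal sketch of those steps, assuming a stock Ubuntu Apache2 install where the a2enmod helper is available. The run and enable_rstudio_proxy function names and the DRY_RUN variable are made up for this example. By default it only prints the commands, so you can look them over first.

```shell
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN is 1 (the
# default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_rstudio_proxy() {
    # Load mod_proxy and mod_proxy_http, needed to ProxyPass to an http:// backend.
    run sudo a2enmod proxy proxy_http
    # Check the edited configuration for syntax errors before restarting.
    run sudo apache2ctl configtest
    # Restart Apache so the new modules and directives take effect.
    run sudo service apache2 restart
}

enable_rstudio_proxy
```

Set DRY_RUN=0 in the environment to actually execute the commands instead of printing them.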

You can get creative and add a custom directory, so you can access the RStudio server at http://yourserver.com/rstudio. Just replace the proxy configuration above with:

<Proxy *>
Allow from localhost
</Proxy>

ProxyPass /rstudio/ http://localhost:8787/
ProxyPassReverse /rstudio/ http://localhost:8787/
RedirectMatch permanent ^/rstudio$ /rstudio/

You can replace “rstudio” with whatever name you want.
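
Since the name shows up in several places, it is easy to miss one when renaming. As a small sketch, a shell function can print the block for any name. rstudio_proxy_conf is a made-up helper, "rlab" is just an example name, and port 8787 is taken from the configuration above.

```shell
#!/bin/sh
# rstudio_proxy_conf (hypothetical helper): print the proxy directives
# for a given URL prefix, defaulting to "rstudio".
rstudio_proxy_conf() {
    name=${1:-rstudio}
    # \$ keeps a literal dollar sign (the regex end anchor) in the output.
    cat <<EOF
<Proxy *>
Allow from localhost
</Proxy>

ProxyPass /$name/ http://localhost:8787/
ProxyPassReverse /$name/ http://localhost:8787/
RedirectMatch permanent ^/$name\$ /$name/
EOF
}

# Example: print the block for http://yourserver.com/rlab/
rstudio_proxy_conf rlab
```

The output can then be pasted into the 000-default file in place of the block above.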

One more piece of useful information: if you plan on having multiple users access the server at the same time, and you want each of them to have a separate session, you will need to add them as users at the system level. The following documentation gives some instructions.

Deploying RStudio Server for Classrooms / FAQ / FAQs – RStudio Support.
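
On Ubuntu, adding system-level accounts usually comes down to running adduser once per person. Here is a minimal sketch, assuming that is the mechanism you want (the documentation linked above may suggest other approaches). The add_rstudio_users function and the usernames are made up for this example, and the function only prints the commands, so nothing changes until you run them yourself.

```shell
#!/bin/sh
# add_rstudio_users (hypothetical helper): print one adduser command per
# username given. Nothing is executed; paste the output into a terminal.
add_rstudio_users() {
    for user in "$@"; do
        echo "sudo adduser $user"
    done
}

# Placeholder usernames; substitute your own.
add_rstudio_users student1 student2
```

Each user can then log in to RStudio with their own system username and password and gets their own session.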