ACTIV-ES: a comparable, cross-dialect corpus of ‘everyday’ Spanish from Argentina, Mexico, and Spain

The first release of the ACTIV-ES Spanish dialect corpus based on TV/film transcripts is now available here: https://github.com/francojc/activ-es

It includes 3,460,172 total tokens (Argentina: 1,103,039; Mexico: 976,192; Spain: 1,380,941) and comes in running-text and word-list (1- to 5-gram) formats. Each format has both a plain-text and a part-of-speech-tagged version.
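For readers unfamiliar with word lists of this kind, here is a minimal sketch of 1- to 5-gram counting in Python. It assumes that "1:5 gram" means every sequence of one to five consecutive tokens, and it splits on whitespace only. The function name `ngram_counts` and the sample line are hypothetical; the corpus's own tokenization and file layout may differ.

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every n-gram of length 1..max_n in a list of tokens."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the sequence.
        for i in range(len(tokens) - n + 1):
            counts[n][" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical sample line, not taken from the corpus.
tokens = "no sé qué te pasa no sé".split()
counts = ngram_counts(tokens)

# Print the two most frequent n-grams at each length.
for n in sorted(counts):
    print(n, counts[n].most_common(2))
```

The same function could be run over each country's running-text file to compare n-gram frequencies across dialects.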

For more information about the development and evaluation of this resource, you can download our paper from the Ninth International Conference on Language Resources and Evaluation (LREC 2014) here: https://www.academia.edu/6962707/ACTIV-ES_a_comparable_cross-dialect_corpus_of_everyday_Spanish_from_Argentina_Mexico_and_Spain
[Figure: plot by country, year, and genre]

A special lecture by: Dr. Adam Ussishkin at WFU, Thursday March 1st @ 4pm in Greene Hall 162

Our WFU Interdisciplinary Linguistics Minor
announces a special lecture by

Dr. Adam Ussishkin
University of Arizona

Assoc. Professor of Linguistics & Cognitive Science


Psycholinguistics of under-studied languages: the case of subliminal speech priming in Maltese


Early and automatic processing of linguistic stimuli is fairly well-studied for resource-heavy languages such as English (cf. work on visual masked priming by Forster and Davis 1984, Forster et al. 2003, among many others), whereas psycholinguistic studies on languages with few resources are much rarer. In this talk, I first describe the creation of the first online language corpus of Maltese, a Semitic language for which few electronic resources exist. Next, I discuss the application of the corpus to a psycholinguistic question and investigate the psycholinguistic reality of the consonantal root, a building block of Semitic languages. This investigation is carried out using the relatively novel subliminal speech priming technique.

Thursday March 1st @ 4pm in Greene Hall 162

Differences among languages: True untranslatability

via Differences among languages: True untranslatability.

ROMAN JAKOBSON, a linguist, is credited with the notion that languages differ not so much in what they can express as what they must express. The common trope that language X has no word for Y is usually useless (it usually means language X uses several words instead of one for Y). But languages do differ significantly in what they force speakers to express, something Lera Boroditsky talks about often in support of the “linguistic relativity” hypothesis.

I was thinking of this today when, on the subway, I saw a young man whose shoulder bag bore six red buttons, with “I am loved” written in white, identical except that each was in a different language. (I later learned that this is an old campaign that began with the Helzberg Diamond company.)

What struck me was that three of the buttons identified him as female: soy amada (Spanish), io sono amata (Italian) and sou amada (Portuguese). In each, the past participle of “to love” (amar/amare) must agree with the loved thing, and the -a is a feminine ending. The young chap should have had soy amado etc. The poor button-makers had to pick one or the other, and chose feminine.

The German forced no such choice: a man or a woman can say Ich bin geliebt, as the young commuter’s pin did. And Russian doesn’t require it either, but the translation is menya lyubyat, “they love me”.  

And Russian (more than most languages) forces a bunch of other distinctions on English speakers. The average verb of motion requires you to express whether you’re going by vehicle or on foot, in one direction or multidirectionally, and in the past tense, makes you include an ending for your own gender. So “I went” would, in one Russian word (khodila, say), express “I [a female] went [by foot] [and I came back].” If you don’t want to express all of that, tough luck. You have to. Jakobson himself was Russian. Perhaps his native language led him to the insight above; learning the English verb go might have had the Russian wondering “that’s it? By what means? There and back, or what? We would never put up with this in Russian.”

When most people tell you some very unusual word “can’t be translated”, they usually mean words like those in “Relationship words that aren’t translatable into English”: shockingly specific single words in other languages like mamihlapinatapei, which is apparently Yagan for “the wordless yet meaningful look shared by two people who desire to initiate something, but are both reluctant to start.” But of course mamihlapinatapei is translatable into English. It’s “the wordless yet meaningful look shared by two people who desire to initiate something, but are both reluctant to start.” Needing several words for one isn’t the same as untranslatability.

What really can’t be translated properly is “go” into Russian, or “loved” into Spanish, not because the English words are too specific but because they’re too vague. Those languages force you to say much more, meaning the poor Helzberg Diamond people can’t make a single button reading “I am loved” in Spanish for both men and women. The traditional idea of “can’t be translated” has the facts exactly backwards. Who knew that the truly untranslatable words were those that say the least?