The number of languages spoken around the world is declining rapidly. Researchers say about half of the world's 6,000 languages will be extinct by the end of the century.
While linguists have sounded an alarm, relatively few research dollars have been directed to reversing this global trend. One effort is a collaboration between computer scientists in the United States and indigenous peoples in Chile, to ensure a future for languages at risk.
Mapudungun is a language spoken by several hundred thousand Mapuche Indians in Chile. The language is not in any immediate danger of extinction, but the Mapuche fear that over time Spanish, the national language, will take over and Mapudungun will fade away.
To prevent that, computer scientists from Carnegie Mellon University's Language Technologies Institute in Pittsburgh, Pennsylvania, are developing new machine translation tools with the Mapuche. Institute director Jaime Carbonell says the initiative, backed by a $5 million grant from the National Science Foundation, is important for both cultural and scientific reasons. "The minority language or the home language encodes much of the culture. So, you do not want your cultural belief, your cultural tradition [or] your heritage to die with the language. And it often does. If a language disappears the culture that is promulgated [through] that language usually disappears as well. And so we want to preserve it for cultural reasons.
"We also want to preserve linguistic diversity in order to study the range of language that the human mind is capable of producing," he says. "As a scientific study, if the number of things that we can study is reduced by a half or a quarter then there is much less that we can conclude about how the human mind works, about how languages originate, about how they evolve."
The first task, he says, is to understand how the minority language works. "Its words, its dictionary, its grammar rules or its syntax and structure and the way it is used, the situation of that language within the culture to convey ideas, to convey traditions and so forth. That codification of the language if you do it electronically it can also be used for education in that language so for children who learn that language at best only in their house and don't know how to write it."
Mr. Carbonell says hundreds of hours of Mapudungun have been recorded and transcribed.
Carbonell: "We have been building a lexicon which is the first step in building a full dictionary. We are starting to build a grammar of that language leading to translation between Spanish and Mapudungun and Mapudungun and Spanish. That's the next step."
Skirble: "This is very labor intensive."
Carbonell: "Unfortunately, yes."
Indigenous speakers fluent in Mapudungun and Spanish meticulously enter words and their translations into a computer program. Later, when the user types in a sentence, two languages appear on the screen with lines connecting the translated words. Jaime Carbonell says these linguistic maps help researchers understand the basic grammar and syntax of the minority language. "It is clear that all languages use the concept of words. It is clear that all languages use the principle of what's called head, meaning that a phrase or sentence or clause typically has one word or concept that dominates the other. For example, in a noun phrase the main noun is the head. In a clause the main verb is a head.
"The existence of those principles is useful because it means we save work, but almost all the work goes into not the few common points, but into the much richer diversity of points that are not in common."
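The word-linking display described above can be illustrated with a minimal sketch. It assumes a small bilingual lexicon entered by hand and links each source word to the first unused target word the lexicon lists for it. The function name `align_words`, the toy lexicon entries and the placeholder token `xyz` are illustrative assumptions, not the institute's software or data.

```python
def align_words(source_tokens, target_tokens, lexicon):
    """Link source words to target words using a hand-built lexicon.

    Returns (links, unmatched): links is a list of
    (source_index, target_index) pairs, and unmatched lists the
    source positions for which the lexicon offered no usable match.
    """
    links = []
    unmatched = []
    used_targets = set()
    for i, src in enumerate(source_tokens):
        candidates = lexicon.get(src, set())
        for j, tgt in enumerate(target_tokens):
            if j not in used_targets and tgt in candidates:
                links.append((i, j))
                used_targets.add(j)
                break
        else:
            # No link found: these are the cases a researcher would inspect.
            unmatched.append(i)
    return links, unmatched


# Toy Mapudungun-to-Spanish entries, for illustration only.
toy_lexicon = {
    "ruka": {"casa"},
    "mapu": {"tierra"},
    "che": {"gente", "persona"},
}

# "xyz" stands in for a word the lexicon does not yet cover.
links, unmatched = align_words(["mapu", "ruka", "xyz"], ["casa", "tierra"], toy_lexicon)
print(links)      # index pairs that would be drawn as connecting lines
print(unmatched)  # source positions with no mapping
```

The unmatched positions are where, as Carbonell notes, most of the work goes: the points the two languages do not have in common.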
Jaime Carbonell says the hard work begins at the point where words or phrases between the languages don't match.
Carbonell: "And when the mapping fails, it's very interesting. It is usually because the language has some unique feature that the minority language or new language does not have. Usually field linguists have already discovered [that feature], but the [field linguists] have provided one example. And we want a large number of examples to find out in general how to handle this unique feature. One example is that some languages have singular, dual and plural. They use a different word when there are two or many of something. How do you map that into [the] English language that has only singular or plural or how do you map the other way around that divide the world into two or three. Those are some of the interesting challenges that you have."
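The singular, dual and plural example can be sketched in a few lines. Going from a three-way system to a two-way one loses information, and going back is ambiguous unless the actual count is known. The enum and function names here are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Number3(Enum):
    """Grammatical number in a language with a dual."""
    SINGULAR = 1
    DUAL = 2
    PLURAL = 3


class Number2(Enum):
    """Grammatical number in a language such as English."""
    SINGULAR = 1
    PLURAL = 2


def to_two_way(number):
    """Collapse dual into plural; the 'exactly two' meaning is lost."""
    if number is Number3.SINGULAR:
        return Number2.SINGULAR
    return Number2.PLURAL


def to_three_way(number, count=None):
    """Return the set of possible three-way values.

    A two-way plural is ambiguous between dual and plural unless the
    actual count is known from context.
    """
    if number is Number2.SINGULAR:
        return {Number3.SINGULAR}
    if count is None:
        return {Number3.DUAL, Number3.PLURAL}
    return {Number3.DUAL} if count == 2 else {Number3.PLURAL}


print(to_two_way(Number3.DUAL))
print(to_three_way(Number2.PLURAL))
print(to_three_way(Number2.PLURAL, count=2))
```

A real system would need many examples to learn when context supplies the missing count, which is why the researchers want more than the single example a field linguist may have recorded.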
Skirble: "How good can a machine translation be?"
Carbonell: "The simple answer is that it can be good enough to be understood in general. Machine translation today is not as good as expert human translation. There is machine translation technology in a narrow domain that if you are translating only text about a subject matter you know ahead of time, and you put in all the possible structures that occur in that subject matter, then the translations within that domain can be as good as the best human translators. But that's only because a huge amount of work has gone into perfecting that narrow subject matter. Across the board we cannot get that high quality translation. If something is crucially important then you have a human bilingual touch up the machine translation."
Skirble: "And, this gives a certain dignity to the language [at risk]."
Carbonell: "Absolutely. Imagine having spoken a language [in which] the only things that are transmitted are inside the family and now having a medical text appear in that language or web pages appearing in that language or even something as mundane as a manual for how to run your automobile appearing in that language. It is not just a matter of dignity, but a matter of saying my language is good enough for everything. Whereas before the implicit message is that it is good enough for things totally unrelated to the larger society."
Computer scientists at Carnegie Mellon University are also working with indigenous peoples in Alaska, Peru and Colombia to preserve native languages there. Jaime Carbonell says his institute trains native peoples to collect and enter the data, which, he says, both reduces the cost of software development and empowers the indigenous community to direct how the new computer translation tools will be used.