Thursday, January 8, 2009

The mismatch between haplogroup and language

Language is an important part of culture, identity, and everyday life, so it makes sense that we want to conflate language with culture and ethnicity. However, linguistic and genetic boundaries almost never line up exactly: certainly not in modern times, and relatively rarely even in ancient times (as far as can be determined). Don Ringe recently did a fascinating guest post on Language Log on the linguistic diversity of aboriginal Europe. He notes specifically the quick spread of Indo-European languages, with the result that "while most Europeans’ linguistic ancestors were speakers of PIE, many or even most of their biological ancestors at the same time depth were speakers of non-IE languages already residing in Europe."

Nowhere is this more evident than in the modern United States. With the exception of the 1% or so who speak an indigenous American language, virtually all of the 300 million American citizens speak English (even if only as a second language). Even immigrants who learn English poorly or not at all tend to have children who are at least bilingual, if not monolingual in English. This same kind of dominant-language spread most likely occurred in ancient Europe, and indeed all over the world. Yet anthropologists often, and linguists occasionally, try to tie genetic groups to particular languages.

One famous example of this is Greenberg's three language groups in the Americas. While most linguists who study indigenous American languages allow around 70-80 language families (at the low end; many people insist on many more), Greenberg claimed on the basis of his language comparisons that there are no more than three "stocks" in the Americas: Eskimo-Aleut, Na-Dene, and Amerind. While the first two are recognized families, "Amerind" lumps together the rest of the 80 or so language families spoken from Canada to Chile. Most linguists, especially historical linguists, object to Greenberg's style of classification because it relies on shallow, wide surveys of languages rather than narrow, in-depth analysis. Much of Greenberg's evidence for relatedness comes solely from the frequency of /n/ in first-person markers. Much of the rest of his data lists as cognates words that have different numbers of morphemes or are clearly borrowings.

Greenberg's theory is often "supported" by those who point to the three genetic groups in the Americas, which roughly correspond to Greenberg's three linguistic families. The error here is in thinking that because two peoples belong to the same genetic group, they speak the same language (or even related languages). Now, it may be the case that Chapakuran languages are related to Algic languages, but if they are, the relation is so distant that we will never find evidence of a link unless we develop time travel. Glottochronology (which relies on assumptions and rates of change rejected by all but the most die-hard language lumpers) predicts that after 15,000 years, two related languages will share about 6-7% of their vocabulary, approximately the same as chance resemblance. Mark Rosenfelder discusses this at length in his article "How likely are chance resemblances between languages?"
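For the curious, here is a rough sketch of the arithmetic behind that kind of prediction, assuming the classic Swadesh-style model in which each language independently keeps a fixed share of a core word list per millennium. The 0.86 retention rate and the function names are assumptions for illustration, not figures from this post, and the exact percentage at 15,000 years depends heavily on the rate chosen. The point is only that the expected overlap decays exponentially until it is indistinguishable from chance.

```python
import math


def shared_fraction(years, retention_per_millennium=0.86):
    """Expected fraction of a core word list that two sister languages still share.

    retention_per_millennium is an assumed constant (hypothetical default),
    which is exactly the assumption critics of glottochronology reject.
    """
    millennia = years / 1000.0
    # Each language independently retains r**t of the original list,
    # so the expected overlap between the two is r**(2t).
    return retention_per_millennium ** (2 * millennia)


def divergence_years(shared, retention_per_millennium=0.86):
    """Inverse of shared_fraction: years since the split implied by an overlap."""
    return 1000.0 * math.log(shared) / (2 * math.log(retention_per_millennium))


if __name__ == "__main__":
    for years in (1000, 5000, 10000, 15000):
        print(f"{years:>6} years: {shared_fraction(years):.1%} shared")
```

Changing the assumed rate by a few points moves the result a great deal at this time depth, which is part of why the method is of so little use for deep relationships.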

None of this is to say that equating haplogroups with linguistic stocks is necessarily incorrect, just that the claim is not falsifiable. It may well be that there was a single Proto-World language from which all languages are descended. However, because of the great time depth at issue, this is a matter for faith, not for science.


NW said...

Do we even have enough data to do rigorous analyses of American Indian languages? I've never looked into them, but have always gotten the impression that the data was scarce or nonexistent for most languages. As a side question, how much should we care if two languages are related to one another? If we assume a proto-world language, then they ALL are related, we just don't know the branches that they took (not meant to sound flippant).

Ryan Denzer-King said...

While American languages are certainly underdocumented compared to, say, Indo-European languages, linguists have millions of pages of field notes covering every extant family and virtually every language (extant or not) at least for North America, if not Central and South. Obviously it's a question of whether we should care, but we can say that about anything. Why should we care about linguistics at all? There's no objective reason, I just happen to find it fascinating.