Language is an important part of culture, identity, and everyday life, so it is understandable that we tend to conflate language with culture or ethnicity. However, linguistic and genetic boundaries almost never line up exactly: certainly not in modern times, and relatively rarely even in ancient times (as far as can be determined). Don Ringe recently wrote a fascinating guest post on Language Log on the linguistic diversity of aboriginal Europe. He notes in particular the rapid spread of Indo-European languages, with the result that "while most Europeans’ linguistic ancestors were speakers of PIE, many or even most of their biological ancestors at the same time depth were speakers of non-IE languages already residing in Europe."
Nowhere is this more evident than in the modern United States. With the exception of the 1% or so who speak an indigenous American language, virtually all of the 300 million American citizens speak English (even if only as a second language). Even immigrants who learn English poorly or not at all tend to have children who are at least bilingual, if not monolingual in English. The same kind of spread of a dominant language most likely occurred in ancient Europe, and indeed all over the world. Yet anthropologists often, and linguists occasionally, try to tie genetic groups to particular languages.
One famous example is Joseph Greenberg's division of the languages of the Americas into three groups. While most linguists who study indigenous American languages recognize around 70-80 language families at the low end (many insist on considerably more), Greenberg claimed, on the basis of his language comparisons, that there are no more than three "stocks" in the Americas: Eskimo, Na-Dene, and Amerind. While the first two are recognized families, "Amerind" lumps together the remaining 80 or so language families spoken from Canada to Chile. Most linguists, especially historical linguists, object to Greenberg's style of classification because it relies on broad but shallow surveys of many languages rather than narrow, in-depth analysis. Much of his evidence for relatedness rests on nothing more than the frequency of /n/ in first-person markers. Much of the rest consists of proposed cognates that have different numbers of morphemes, or that are clearly borrowings.
Greenberg's theory is often "supported" by those who point to the three genetic groups in the Americas, which roughly correspond to Greenberg's three linguistic families. The error here is in thinking that because two peoples belong to the same genetic group, they must speak the same language (or even related languages). Now, it may be that Chapakuran languages are related to Algic languages, but if they are, the relationship is so distant that we will never find evidence of a link unless we develop time travel. Glottochronology (which relies on assumptions and rates of change rejected by all but the most die-hard language lumpers) predicts that after 15,000 years, two related languages will share only about 6-7% of their vocabulary, which is approximately the level expected from chance resemblance alone. Mark Rosenfelder discusses this at length in his article "How likely are chance resemblances between languages?"
This is not to say that equating haplogroups with linguistic stocks is necessarily incorrect, only that it is not falsifiable. It may well be that there was a single Proto-World language from which all languages are descended. However, because of the great time depth at issue, this is a matter for faith, not for science.