Saturday, December 17, 2011

thinking scientifically about language

I just submitted my grades for my fall LING 101 course, and I'm busy preparing for my spring LING 101 course, so the question of how to get people to think scientifically about language has been much on my mind recently. I've found that one of the most useful entry-level questions is "How is human language different from other forms of animal communication?" The media loves animal language stories, and so the uniqueness and complexity of human language is one of the first things I cover. The important question in discussing such topics is not "Is human language unique?", but "What concrete properties of human language distinguish it from animal communication?" We're doing science, and so we want to point to specific criteria to distinguish the two; we want a theory of human language that predicts specific empirical facts. This is not how the general public usually thinks about language (or about anything; critical thinking is far removed from the natural pattern of human cognition).

Another topic I've always wanted to cover in more detail is speech perception. Often when I tell non-linguists that I work on how we perceive speech sounds and assign them to various categories, I get blank stares. Certainly before I was in linguistics I gave no thought to speech perception. When I lived in Italy in elementary school it was inconceivable to me that Italian speakers couldn't understand English; my first hypothesis (admittedly quickly discarded) was that when someone spoke English they simply heard nothing. We think of language as magic: direct communication from one mind to another. It takes a bit of work to transition into the type of thinking that evaluates the creation of sound by the human vocal tract and analyses how these sounds are transmitted as vibrations through the air, and then perceived by the human auditory and perceptual apparati (yes, I know that's not the proper Latinate plural). The question of how we distinguish a bilabial nasal from an alveolar nasal is not a natural one to ask, but it's an important question for linguists.

These are some of the basic concepts I'm planning to use in my 101 class next semester. If anyone has suggestions for other concepts useful for introducing people to the scientific study of language, I'd be glad to hear about them in the comments.

Saturday, December 10, 2011

nominal tense

I read a headline the other day that gave me pause: "Cleveland to demolish serial killer's home". The reading I got initially was that someone charged with murder was living in a house, and the city of Cleveland was getting ready to tear it down, perhaps as additional punishment for the man's heinous crime. Of course, in reality the article was about the demolition of the house where the serial killer had lived and disposed of the bodies of his victims. It would be rather strange to demolish a home just because a criminal had formerly occupied it, but it makes perfect sense to demolish a home that had been used as a crime scene and tomb. I think it was "home" that threw me off -- this calls to mind homey connotations for me, rather than simply referring to an inhabited structure.

Another source of ambiguity is that English has no nominal tense. (There are numerous theoretical reasons to distinguish nominal "tense" from relative-to-utterance-time markers on verbs, but I'll stick with the term here since it's descriptively useful, especially in languages that use the same affixes on nouns and verbs.) In English, when I say "my house", it could mean a number of different things depending on context. I could say "I like my house", meaning the one I currently occupy, or I could say "My house was small", discussing the one I grew up in. To overtly signify that the house in question either no longer exists or is no longer attached to me, I could use "former". Some languages (such as Wakashan languages), on the other hand, have tense affixes that attach to nouns as well as verbs. The most natural translation in English is usually something like "my former house", with the ambiguity between whether the house is former because it no longer exists or still exists but is no longer in my possession. If we all spoke Nitinaht, maybe the headline would have been less ambiguous. Or maybe not.

Saturday, December 3, 2011

some, or all?

I ran into some difficulty during a LING 101 lecture the other day. I was talking about entailments, and focusing specifically on superset/subset relations. I started with some simple examples: "I eat bacon" entails "I eat meat", because bacon is a type of meat. I then moved on to what I considered to be essentially identical statements. One of these was "John hates music" entails "John hates country music". Here I started getting blank looks. Several people didn't understand why this was the case, since John could hate some other type of music. After a second of musing, I found the problem: mass and bare plural nouns in English. If I say "John hates music", this can mean one of two things. The first is what I had in mind: that John hates all music. On this reading, "John hates music" entails "John hates country music", because country music is a subset of all music. However, there is another reading for "John hates music": that there is some type of music that John hates. On this reading "John hates music" does not entail "John hates country music", because John's hating black metal could satisfy the "some music" reading of "John hates music" without satisfying "John hates country music". Generic plurals (and mass nouns like "music") have a funny way of interacting with verbs in ambiguous ways, a fact that has led Mark Liberman to propose a voluntary ban on generic plurals to express statistical differences between populations.
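For the programmatically inclined, the two readings can be checked mechanically. What follows is only a toy sketch in Python: the three-genre universe and all of the function names are my own assumptions for illustration, not a standard semantics library. It treats a "scenario" as the set of genres John hates, and says that a premise entails a conclusion if the conclusion holds in every scenario where the premise holds.

```python
from itertools import combinations

# Toy universe of genres (hypothetical; any finite set would do).
GENRES = frozenset({"country", "black metal", "jazz"})
COUNTRY = frozenset({"country"})


def all_scenarios(genres):
    """Every possible set of genres that John might hate."""
    items = sorted(genres)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def hates_all(hated, kind):
    """Universal reading: John hates every genre in `kind`."""
    return kind <= hated


def hates_some(hated, kind):
    """Existential reading: John hates at least one genre in `kind`."""
    return bool(kind & hated)


def entails(premise, conclusion):
    """True if the conclusion holds in every scenario where the premise holds."""
    return all(conclusion(h) for h in all_scenarios(GENRES) if premise(h))


# "John hates music" (all music) vs. "John hates country music"
universal_reading = entails(
    lambda h: hates_all(h, GENRES), lambda h: hates_all(h, COUNTRY)
)

# "John hates music" (some music) vs. "John hates country music"
existential_reading = entails(
    lambda h: hates_some(h, GENRES), lambda h: hates_all(h, COUNTRY)
)

print(universal_reading, existential_reading)
```

The scenario in which John hates only black metal is the counterexample that makes the existential reading fail, which is exactly the objection my students raised.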

Saturday, November 19, 2011


Since I began working in theoretical linguistics several years ago, I've been struck by a specific usage of the word "crucially". In common parlance, we typically use "crucial" to mean "absolutely necessary" or "the best course of action". We might say "It's crucial that we arrive before midnight", perhaps because the road closes at that time. But I'd say the adverbial form is less common. COCA returns 15344 hits for "crucial" and 417 for "crucially", for a ratio of 37:1 in favor of the adjective. On the other hand, "quick" returns 33060 hits, and "quickly" returns 61284, showing the adverbial form is significantly more common, with a ratio of 2:1 in favor of the adverb. In scientific parlance, on the other hand, "crucially" is typically used to indicate a piece of data that shows beyond a reasonable doubt that the argument goes through. As an abstract example, let's say we want to show that the numeral "one" is more common than the numeral "two" -- in all languages. We compare a number of languages, and all but language X show "one" with a higher frequency than "two". This goes against our argument, unless we can show a specific reason why we would expect "two" to be more frequent in language X in a way that does not predict this in the other languages. We might say "Crucially, the numeral 'two' in language X forms a part of the common idiom 'blah blah blah'". This crucial piece of data shows that language X does not form a counterexample.
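The ratios above are simple division. Here is a minimal Python sketch that reproduces them from the COCA counts quoted in this post (the variable names are mine and purely illustrative):

```python
# COCA hit counts as quoted in the post.
coca_hits = {
    "crucial": 15344,
    "crucially": 417,
    "quick": 33060,
    "quickly": 61284,
}

# Adjective-to-adverb ratio for crucial/crucially.
crucial_ratio = coca_hits["crucial"] / coca_hits["crucially"]

# Adverb-to-adjective ratio for quickly/quick.
quick_ratio = coca_hits["quickly"] / coca_hits["quick"]

print(f"crucial : crucially = {crucial_ratio:.1f} : 1")
print(f"quickly : quick     = {quick_ratio:.1f} : 1")
```

Rounded to whole numbers, these are the 37:1 and 2:1 figures given above.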

I was interested in seeing if this use of crucially (or rather, the overwhelming commonality of using "crucially" when relating an argument) was specific to theoretical linguistics, or if other fields also present arguments this way. To examine this, I did a text search for "crucially" in the New England Journal of Medicine (non-social science), Natural Language & Linguistic Theory (theoretical linguistics), International Journal of American Linguistics (less theoretical linguistics), Political Behavior (non-linguistic social science), and Philosophical Issues (non-science). NEJM returned 19 articles in the past 10 years, for a rate of around 2/yr. NLLT returned 198 in the past 28 years, for a rate of around 7/yr. IJAL returned 23 for the past 18 years, for a rate of around 1/yr. PB returned 10 for 1979-2007, a rate of less than 1 every 2 years. PI returned 212 for 1991-1998, a rate of over 30/yr. My inability to verify how all of these journals and web sites conduct text searches makes it impossible to draw any conclusions from these numbers, but it does seem that compared to at least some other fields, theoretical linguistics uses the word "crucially" more often. (I'm for the most part leaving aside the issue of whether every article uses "crucially" in the sense I'm talking about; however, I did hand-check a number of the articles, which did indeed use it in the argumental sense I described above. Additionally, "crucially" is fairly rare in standard language, as evidenced by the COCA search.) The PI numbers I believe are ridiculously inflated; it looks like the results I got were for any issue of the journal that contained the word "crucially", rather than searching within the individual articles.
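For anyone who wants to check the arithmetic, here is a rough Python sketch of the per-year rates. The article counts are the ones quoted above; for the two journals where I give a year range rather than a number of years, I'm assuming the span is simply the end year minus the start year, which is what the quoted rates imply. The dictionary layout and names are just for illustration.

```python
# (articles containing "crucially", number of years searched)
# The PB and PI spans are assumed to be end year minus start year.
journal_hits = {
    "NEJM": (19, 10),
    "NLLT": (198, 28),
    "IJAL": (23, 18),
    "PB": (10, 2007 - 1979),
    "PI": (212, 1998 - 1991),
}

# Articles per year for each journal.
rates = {name: hits / years for name, (hits, years) in journal_hits.items()}

# Print from most to least frequent.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:5s} {rate:5.1f} articles/yr")
```

Sorting by rate puts PI first and NLLT second, which is why the PI figure stands out as the suspicious one.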

Saturday, November 12, 2011

English is hard

I ran across an interesting spelling pronunciation the other day (I have the sense that it was on Not Always Right, but I've been unable to find it). A woman ordering quiche asked for kwɪki rather than kiʃ. These types of spelling pronunciations are not uncommon for low-frequency words, where what counts as low-frequency varies according to dialect and context. Of course, English pronunciation rules don't get you from kiʃ to kwɪki. Since English has borrowed heavily from a number of languages (viz., French, Latin, and Greek), we have to figure out the source of a word before we can come up with a reasonable pronunciation. In the case of "quiche", we have to realize that the word originates in French, in which case we will probably know that "qui" is [ki] and that "ch" in French loans is typically a postalveolar fricative. It seems, however, that this woman thought the word was Greek; word-final "che" is not uncommon in terms borrowed from Greek (e.g., synecdoche), and is pronounced [ki]. "Qui" as [kwɪ] is typical for English, though unusual for foreign loans. These types of errors are consequences of borrowing from so many different sources, and even more so of having a non-phonetic orthography.

Saturday, November 5, 2011

Rocket Languages

Over the summer I was asked if I would review Rocket Languages, a language learning company that sells online and physical media for language learning. I'm not typically the language course type (I prefer to buy grammars and dictionaries and dig through them with no hope of ever speaking the language conversationally), but I had a lot of fun poking around the online course I was given access to (the Premium online version of the Beginning German course). I found the site setup easy to navigate, with a simple table-of-contents style interface to choose lessons from. I've checked out sites in the past that make it virtually impossible to do one thing at a time and then come back to the content later, so this was a plus for me. As always, I wanted more overt grammatical content (one of the reasons I've never tried Rosetta Stone), but overall there was a decent balance between learning conversational phrases and looking at things like verb conjugations (biased towards the former, as with most popular language courses). There's also a handy "My Vocabulary" section where you can save words you find interesting or difficult to memorize for later reference. The feature I was able to use the least (because of my own busy schedule) is probably also the most exciting: the site has a community section where people can post about their language learning experiences. I think this is a great feature of Rocket Languages, and one I haven't yet encountered elsewhere (though surely it has been done before). The only way to learn a language is to use it, so the forum feature is in my opinion a necessary component of an online course, even though many lack it.

In summary: if what you're looking for is an online language course, I can recommend Rocket Languages more than most. Note that for those who are looking for CDs, these are included at the higher levels. As with any online course, you're not going to be a fluent speaker just because you completed the course, but the forum feature goes some way toward encouraging learners to actually use the language rather than just reading about it. While this is no replacement for oral conversation, it's definitely a step up from just reading and listening on your own. For those who are turned off by the high price tag ($299.95 for physical media, $149.95 for online), they have a promotion through November 7 where you can gain access to the online version for $99.95.

Saturday, March 12, 2011

time adverbials

I don't remember how I stumbled across it, but I found the SFMTA page a couple weeks ago, and was interested in a completely ordinary construction on their home page, namely that their railway "today carries over 200 million customers per year". If we take the more narrow meaning of "today" as the day of the utterance, this sentence is nonsensical. You can't have a certain number of customers per year carried on a single day. But of course that's not how we actually interpret the sentence. Here we use "today" in a more general, perhaps not quite metaphorical sense, simply to mean the current relevant time. That could be limited to the actual day in question, but it could also be a month, a year, or a millennium -- we could also say "today the planet has significantly lower oxygen levels than during the Cretaceous", and we'd be talking about the current geological era. This type of coercion is extremely common in English, though it isn't possible in every language. We're quite willing to reinterpret the semantics of a verb or time adverbial in order to make the sentence interpretable. The predicate "reach the summit" is a classic example of what is typically called an achievement, a predicate that happens all at once. You've either reached the summit or you haven't; there's nothing in between. But we're perfectly happy to say "they reached the summit in twenty minutes", because we attach a preparatory process that consists of the stages coming immediately before actually reaching the summit (hiking, climbing, etc.).

Saturday, February 26, 2011


The suffix -kin seems to have a somewhat variable distribution across speakers. For some it appears to be a productive suffix that means "little" or "baby": if you have a wug, a wugkin would be a small or juvenile wug. For other speakers, like myself, -kin is not productive, and in fact I barely noticed the fact that it has any sort of smallish connotation until this was pointed out to me. How can this be the case for a single language like English? Because we're all exposed to differing dialects, registers, and ultimately, types and amounts of data. Someone growing up as an only child on a rural farm during the 13th century is certain to be exposed to less language than a child from a family of fourteen growing up in modern day New York. And even leaving aside such differences, we all hear slightly different corpora from slightly different speakers as we grow up. (This is, of course, the source of language change, especially when speaker communities have little interaction with each other.)

Perception also plays an important role. Words ending in -kin have a fairly fixed distribution in English. Despite what I said in the previous paragraph, I'd be surprised if words containing it varied significantly across regions, registers, or dialects (and of course this could be empirically tested). When I was acquiring English, I perceived the suffix as being rare enough that I did not generalize it to other forms, unlike very common suffixes like -er for 'one who...', e.g., farmer 'one who farms'. It also may be due to the fact that if you ask me for a word ending in -kin, the first one I go to is "munchkin", which is certainly not "a small munch". So why is this interesting? I find it interesting because it shows that even when exposed to (essentially) the same data, speakers will or won't generalize different patterns. Patterns that are pervasive in a language (75% seems to be the magic number in Artificial Language Learning tasks) are for the most part generalized by all speakers: everyone who speaks English has the suffix -er and is willing to use it on novel forms. Patterns that derive from a previously productive affix but are now opaque are almost never generalized: the prefix "with-" as in "withstand" indicating "against" is, I would wager, never used with novel forms. The interesting patterns are those that occur in a small but semi-regular subdomain of English, as with the Germanic "strong verb" pattern of strike/struck generalizing to sneak/snuck, or the case of -kin. (Firefox won't recognize "snuck", but it gets millions of ghits, including an entry on ...) Speakers have to decide when confronted with such data whether these cases are simply a handful of random exceptions, or if there is some regular pattern that applies to only a small subset of lexemes.

Saturday, February 12, 2011

out of proportion to

I was reading something recently and the expression "out of proportion to" caught my eye. Somehow the preposition "to" seemed odd, but at the time I couldn't figure out what I would rather use. Since then I've decided that I probably use "out of proportion with", though I'm much more likely to rephrase the entire sentence so that I can use "disproportionate(ly)". So I thought I'd do a little searching and see which is more common. Sure enough, the "to" version is significantly more common, with 8.7M ghits versus only 1.5M for "with". Those seem to be the only prepositions possible, both from my own intuitions and looking around on the interwebs. COCA gets 161 hits for "to" versus a mere 21 for "with". I did turn up one more preposition that I hadn't thought of, and didn't find from random google searches: "out of proportion from". COCA gives two hits ("Pain out of proportion from injury" and "privileges and a scale of living that were not only far out of proportion from what we had experienced back in the United States"), and google gives ~95k hits, only about 30% of which seem to be genuine "out of proportion P" constructions, so this usage seems to be rather rare. Not sure why I felt "to" was odd, and honestly I'm not even sure that if I used the construction I wouldn't use "to", but I find this sort of variation in prepositional choices interesting.

Saturday, January 29, 2011

an eggcorn and a spelling pronunciation

In this post I just wanted to quickly document two items I came across recently.

The first is the substitution of "upmost" for "utmost". This fits the classic definition of an eggcorn: mistaking a particular turn of phrase for a phonologically similar word or phrase that makes more intuitive sense. When we talk about something "of the utmost importance", we mean something of the highest import, something that should be at the top of our list. Thus it makes perfect sense that some people would reanalyze "utmost" as "upmost", especially given that the stops are in coda position next to a bilabial /m/, making the phonetic distinction between the two probably very slim. This substitution seems to be fairly common; I got almost 5M ghits, and the top one was an article called "Don't Confuse 'Utmost' with 'Upmost'", hosted on a site related to grammar tips. COCA only returns 8 results, not all relevant, but given that "upmost" is most likely to occur in speech, and that transcribers may simply hear "utmost" since that is the standard, the true number of occurrences is most likely significantly higher.

The spelling pronunciation I came across recently is "half to". While not an eggcorn ("half to" makes no more intuitive sense than "have to", in fact I'd say it makes less sense), I still find this interesting. Most likely the writer here is thinking of the fact that the word 'have' contains a /v/, and since the /v/ in "have to" is devoiced (obligatorily, at least for me), "half to" more accurately represents the phrase phonetically. Voicing the /v/ in "have to" sounds quite archaic to me, and primes constructions like "I still have homework to do" much more than the relevant meaning "I am required to X". Unfortunately constructions like "one and a half to two" and "half to death" make it almost impossible to turn up genuine results of this online. A similar situation obtains with "supposed to": if you tell me you're "suppo[zd] to do" something, my first thought is that someone's making a supposition about you, rather than giving you a requirement. The devoicing here is so necessary in my idiolect that voicing the final cluster sounds like hypercorrection to me. The spelling "suppose to" again seems very common: almost 7M ghits, with several grammar sites warning against this "mistake". COCA actually turns up some instances that seem to be genuine as well. This type of phonological reduction is common with set phrases, and I'm guessing is assimilation in voicing to the following /t/.

Saturday, January 15, 2011

pragmatic ambiguity?

Last weekend was the annual LSA meeting, and so I drove to Pittsburgh, PA to spend a few days carousing with linguists. On the PA Turnpike there are a number of tunnels through the mountains in central PA. Naturally, you should have your headlights on when driving through these tunnels (though they are somewhat lit). Signs just before the tunnels instruct you to do so: "Turn on headlights". However, I was more puzzled by the signs after such tunnels: "Headlights on?" I knew how to answer the question: "Yes." But why was it being asked? Clearly the designers thought it was an obvious question to ask, but I was more confused. Were they making sure I still had my headlights on, because I was going to be going through another tunnel soon? This doesn't seem right, because I'm pretty sure the signs appeared after every tunnel, including the last one coming through the mountains. But in that case it would seem they're asking to make sure I remember to turn them off. This seems odd because daytime running lights are a common safety feature on newer vehicles, and in fact some areas of the country require you to drive with your lights on all the time, since it increases the visibility of your car. So I'd be surprised if they were reminding me to turn off my lights. However, I can't really think of any other options. It seems insane to say that they're just calling my attention to the state of my lights so that I can adjust them as I see fit. What else is there?

Saturday, January 1, 2011

heritage languages

Many Americans don't give a thought to what their heritage language is. No doubt this is partly because we are titans of assimilation and monolingualism (and I don't mean that entirely as a bad thing -- it's no doubt the reason why it's not ridiculous to speak of "Americans" in a country with dozens of ethnic groups spanning almost 4 million square miles). I'm only the third generation born in the U.S. on my maternal side, and I know exactly three phrases in German (excluding things I learned outside my family), and one of them is "Gesundheit". I think another reason we don't think about heritage languages is that we have so many of them. I don't know if I know a single second-generation (or greater) American who has ancestors from a single ethnic group. Depending on which line I'm tracing, my "heritage language" might be German or Scottish or Irish (or maybe even Italian or Hungarian -- old census and shipping records aren't the clearest).

In another sense, we all speak our "heritage language", if we define heritage language to refer to the language of the culture we identify with. While the U.S. has no official language, English has become the de facto lingua franca, with 82% speaking it natively and up to 96% fluently. Insofar as English is the national language of the U.S., and insofar as I consider myself an American, English is in some sense my heritage language. But I think for many people that use the word "heritage", it means a lot more than this. It's not just about genes or cultural affiliation, but about self-identification. If a German child is adopted by Lakota parents and grows up speaking Lakota without knowing a word of German, is Lakota his heritage language any less than his parents'? I don't doubt that there are many who would say "yes", but it's a touchy subject trying to foist heritage languages on others.

It's not by accident that I chose Lakota in the above example. The concept of a heritage language is of utmost importance for people who are losing their language. For native peoples of the Americas (I'm choosing North America because that's where I live and those are the languages I've studied the most), language is much more bound up in culture than for Europeans. European traditions, religions, and politics have been translated and adapted so many times that I have met very few people of European descent who identify strongly with the language they speak. While nuances are of course lost in translation, I would wager that few would say that the ideas in Machiavelli's The Prince would be lost if we lost the original Italian printing. On the other hand, there is a very strong feeling among American language speakers that losing their language means losing their culture, and means losing unique ways of looking at the world. (NB: there are other American language speakers who feel equally strongly for the opposite view.) Thus the concept of a heritage language is a very important one.

These are just musings. You may disagree with some or all of them. That's fine. One final thought: given that languages evolve, where does our "heritage language" begin and end? If English is John's heritage language, is Middle English? Old English? Proto-Indo-European?