Saturday, November 17, 2007


After reading a Language Log post today on Low Attachment, I fortuitously heard this warning on a Celebrex commercial:

"People taking other NSAID's or the elderly should consult their doctor."

What, may I ask, are the elderly prescribed for? The correct parsing is, of course, [[people taking other NSAID's] or [the elderly]], but our syntax really wants to interpret this as people taking [[other NSAID's] or [the elderly]], however much our knowledge of semantics forbids this interpretation. I now know, thanks to Arnold Zwicky, that this is because of our attachment (if you will) to Low Attachment. That is, we want to attach that second constituent to the closest phrase-level category. In this example, that means interpreting "the elderly" as a second object of the verb "taking," as opposed to interpreting it as a second subject of the VP "should consult."
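To make the two readings concrete, here is a rough sketch that writes each bracketing as a nested tuple. The node labels and the helper names (leaves, depth_of) are my own simplifications for illustration, not a standard notation. The point is only that both structures yield the same string of words, while "the elderly" sits much deeper in the low-attachment reading.

```python
# Two bracketings of the same subject, written as (label, child, child, ...) tuples.
# Labels and structure are simplified assumptions for illustration only.

# Low attachment: "the elderly" is coordinated with the object of "taking".
low = ("NP", "people",
       ("VP", "taking",
        ("NP", ("NP", "other NSAID's"), "or", ("NP", "the elderly"))))

# High attachment: "the elderly" is coordinated with the whole subject.
high = ("NP",
        ("NP", "people", ("VP", "taking", ("NP", "other NSAID's"))),
        "or",
        ("NP", "the elderly"))


def leaves(tree):
    """Return the words of a tree from left to right, skipping node labels."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def depth_of(tree, word, depth=0):
    """Return how many nodes deep `word` is embedded, or None if it is absent."""
    if isinstance(tree, str):
        return depth if tree == word else None
    for child in tree[1:]:
        found = depth_of(child, word, depth + 1)
        if found is not None:
            return found
    return None


print(leaves(low) == leaves(high))
print(depth_of(low, "the elderly"), depth_of(high, "the elderly"))
```

Nothing here models why hearers prefer one structure; it just shows that the ambiguity lives entirely in the bracketing and not in the words.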

Monday, November 12, 2007

epenthetic consonants

An epenthetic sound is one that has no (historical) phonemic basis, and usually no orthographic basis, but is pronounced anyway. Usually it is something we pronounce without meaning to, as a way of easing from one sound to another more fluidly, for instance, saying "for instants." Check your pronunciation; this is certainly how I pronounce it, but the other day I actually saw it written that way. The t-insertion is a natural result of trying to go from the voiced alveolar nasal stop /n/ to the voiceless alveolar fricative /s/. The /t/ is somewhere in between -- it retains the stop manner of articulation from the /n/, but acquires the voiceless and oral features of the /s/.

"Pumpkin" is another example of epenthesis. The "p" does not exist historically. Underlying the word is "pumkin," but the "p" jumps in there, just as the t did, keeping the stop articulation of the m and the voiceless, oral qualities of the k. There's something about those (nasal stop)(voiceless non-nasal sound) clusters that's just hard to pronounce. When I see someone write out "for instants," I immediately make a value judgment about the person's intellect, but I don't do the same with "pumpkin" because "pumkin" isn't a word at all. Maybe in a few hundred years "for instance" will have completely transitioned to "for instants."

The epenthetic /p/ in pumpkin also explains the alternative pronunciation "punkin." If the underlying form is "pumkin," the speaker has two options, since the (nasal)(oral) transition is so difficult: one is to insert the epenthetic /p/, the other is to assimilate the nasal consonant to the position of the following stop, hence "punkin," where the n represents a velar nasal. I find it interesting that we have no qualms about pronouncing /n/ as alveolar or velar, yet /m/ we want to pronounce bilabial to the extent that we'll insert epenthetic consonants in "pumpkin" and "hamster."
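The two repair strategies can be sketched as a pair of toy rules. Everything below is an assumption made for illustration: the feature tables cover only the handful of sounds mentioned in this post, ordinary letters stand in for phonemes, and "ng" stands in for the velar nasal.

```python
# Toy feature tables covering only the sounds discussed in this post.
NASAL_PLACE = {"m": "bilabial", "n": "alveolar", "ng": "velar"}
VOICELESS_STOP_AT = {"bilabial": "p", "alveolar": "t", "velar": "k"}
NASAL_AT = {place: nasal for nasal, place in NASAL_PLACE.items()}
OBSTRUENT_PLACE = {"p": "bilabial", "t": "alveolar", "s": "alveolar", "k": "velar"}


def epenthesize(nasal, obstruent):
    """Insert a voiceless oral stop made at the nasal's place of articulation."""
    return nasal + VOICELESS_STOP_AT[NASAL_PLACE[nasal]] + obstruent


def assimilate(nasal, obstruent):
    """Replace the nasal with the nasal made at the obstruent's place."""
    return NASAL_AT[OBSTRUENT_PLACE[obstruent]] + obstruent


print(epenthesize("n", "s"))  # the cluster in "instance"
print(epenthesize("m", "k"))  # the cluster in "pumkin"
print(assimilate("m", "k"))   # the cluster in "punkin"
```

The epenthetic stop always borrows its place from the nasal and its voicelessness from what follows, which is why the same rule gives a t after n and a p after m.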

Thursday, November 8, 2007

in the light of

As someone who has used the phrase "in light of" his whole life, I was a little surprised to come across the variant "in the light of" the other day. I think it was in one of my students' papers, so I didn't really give it much thought, just corrected it and moved on. However, I found another instance of it today in Geoffrey Poole's Syntactic Theory. He, too, uses "in the light of," and I'm reasonably sure he's a native speaker (I would imagine writing a book on syntax in English would be rather difficult otherwise). So obviously it's not just a mistake; people say this.

What I'm puzzling over is whether their way is "correct" or not. Obviously we don't talk much about what is "correct" in descriptive linguistics, so by correct I mean the original historical phrase. Where is this phantom article coming from? For me this is an idiomatic phrase. While I can understand what it means by looking at its constituents, really I don't break it down linguistically when I use it. I'm not thinking of knowledge shedding light on some topic, I'm just thinking "in light of" = "given this evidence." It could be that for some people who do a little less reanalysis, the article makes more sense, whether it was there to begin with or not. Wiktionary lists "in light of" but not "in the light of," though I can't really say that proves anything. Still, we could theoretically take that as showing the statistical preponderance of the version without the article: the fact that someone took the time to make a page for that version and not the arthrous one means it's likely that more people use the anarthrous version.

Tuesday, November 6, 2007

possession in Blackfoot

I just finished a handout for a presentation I'm giving tomorrow on possession in Blackfoot, so I thought I would share some of the main points here. Blackfoot has an interesting way of marking verbs and nouns, part of which is the fact that they are marked essentially the same. The prefix nit- can indicate 1st person on a verb, or can indicate what in English would be indicated by the possessive pronoun "my." Another interesting quality of the language is that prefixes mark the person (1st, 2nd, or 3rd), while a suffix indicates the plural (with a separate suffix for each person). The 1st person prefix can range from n- to ni- to nit- to nits-, though I won't get into the variation here (and even if I did I wouldn't have the knowledge to explain all of it). 2nd person is the same, but with an initial k- instead of n-. Third person is generally marked by o-.

Then we have the plural suffixes: -(i)nan(a) for 1st exclusive, -(i)nun(a) for 1st inclusive, -oau(a) for 2nd, and -oauai (also -auai, oai, oaiau) for 3rd. The initial vowel in parentheses signifies that it is only realized after a consonant. The final vowel in parentheses signifies nothing consistent, merely that speakers often drop it. While a noun or verb can have only a prefix, it cannot have only one of these plural suffixes. Another concept that might not be immediately apparent to IE-speakers is the 1st person inclusive/exclusive distinction. Many (unrelated) Native American languages have a distinction semantically and morphologically for the difference between "we including you" and "we excluding you." "We" always conveys the speaker and various unspecified third persons, but in these languages there is a morphological distinction to indicate whether or not it also includes the addressee.

I won't get into too much more detail, but I will give some examples:
niksistanan - our (excl.) mother [ni-ksist-anan]
kiksistanun - our (incl.) mother [ki-ksist-anun]
kiksistoau - your (pl.) mother [ki-ksist-oau]
oksistoauai - their mother [o-ksist-oauai]
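For the programmers in the audience, the prefix-stem-suffix pattern can be sketched as plain concatenation. This is only an illustration: the function name possessed is made up, the affix shapes are copied verbatim from the bracketed segmentations above, and nothing here models the prefix variation or the parenthesized vowels. The one rule it does encode is the restriction mentioned earlier, that a plural suffix cannot appear without a person prefix.

```python
def possessed(stem, prefix=None, plural_suffix=None):
    """Join person prefix, stem, and plural suffix into one possessed form.

    A prefix may appear alone, but a plural suffix requires a prefix.
    """
    if plural_suffix is not None and prefix is None:
        raise ValueError("a plural suffix needs a person prefix")
    return (prefix or "") + stem + (plural_suffix or "")


# Affix shapes taken directly from the segmented examples in this post.
print(possessed("ksist", "ni", "anan"))   # our (excl.) mother
print(possessed("ksist", "ki", "anun"))   # our (incl.) mother
print(possessed("ksist", "ki", "oau"))    # your (pl.) mother
print(possessed("ksist", "o", "oauai"))   # their mother
```

Note that the same prefix ki- shows up for both "our (incl.)" and "your (pl.)", so it is the suffix alone that tells the two apart.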

Tuesday, October 30, 2007

object-control languages

Sentences like "I want to win the game" pose(d) a unique challenge for syntacticians. The somewhat clear facts are that the sentence is composed of the subject "I," the verb "want," and the clausal complement to the verb "to win the game." The problem is that "to win the game" doesn't itself have a subject. It is quite obvious to the native speaker (and probably most non-native speakers as well) that the sentence means that I want myself to win the game, that is, the subject of the subordinate C(omplementizer) P(hrase) is the same as the subject of the main CP. But how do we show that?

Generative syntax posits the existence of a semantically full but phonetically vacuous subject called PRO (read: "big pro"). This gives us a subject for the subordinate CP. In this case PRO is coindexed with "I," so that "I want to win the game" means something like "I want that I win the game" (and in fact this kind of relative, finite subordinate clause is exactly how you would express such a statement in most Balkan languages, e.g., Greek, Albanian, Romanian, Bulgarian, and many others). One can also have a non-indexed PRO that simply means "someone" or "something," e.g., "To break a leg is painful" would be diagrammed as "PRO to break a leg is painful," where PRO is simply unindexed and refers to some unspecified person.

The real reason I'm posting about this, though, is that I was fascinated to learn that English is one of very few languages where the object, not just the subject, can control PRO. In a sentence like "I want to win the game," PRO is subject-controlled, that is, PRO is the same as the subject of the main CP. However, we can also have sentences like "I persuaded Bill to go to college" (I persuaded Bill PRO to go to college), where PRO is object-controlled, i.e., PRO is the same as the object of the main CP. I'll have to do some research and see if I can find any other languages where this is allowed.
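One way to picture the difference is as a lexical property of the main verb. The sketch below is hypothetical throughout: the CONTROL table and the resolve_pro function are invented for illustration, and the table lists only the two verbs used in this post.

```python
# Hypothetical lexicon: which argument of the main verb controls PRO.
CONTROL = {"want": "subject", "persuade": "object"}


def resolve_pro(verb, subject, obj=None):
    """Return the main-clause argument that PRO is coindexed with."""
    if CONTROL[verb] == "subject":
        return subject
    if obj is None:
        raise ValueError("an object-control verb needs an object")
    return obj


print(resolve_pro("want", "I"))              # I want PRO to win the game
print(resolve_pro("persuade", "I", "Bill"))  # I persuaded Bill PRO to go to college
```

The unindexed PRO of "To break a leg is painful" has no controlling verb at all, so it falls outside this little table.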

Monday, October 29, 2007

folk etymology

I picked up a flyer today on computer loans while I was at the bank. It proudly proclaims that "MFCU can even disperse you loan funds directly to the Bookstore." What they meant, of course, is that the bank can disburse the funds to the bookstore. However, considering the much, much higher token frequency of "disperse" as compared to "disburse," it's not hard to understand how the error was made, especially when one takes into account that the unaspirated voiceless p in disperse is virtually indistinguishable from the voiced b in disburse to the average English speaker.

The title of this entry refers to the process by which words change in response to semantic opacity. In other words, a compound containing an unknown word that resembles a known word that fits the context undergoes change so that the phrase makes more sense to speakers. One of the more useful examples Wikipedia offers is "chaise lounge" from "chaise longue," the latter a French phrase which literally means "long chair." To English speakers used to lounging around on these pieces of furniture, it made more sense that this would be called a "chaise lounge." While disperse and disburse have fairly different meanings when taken very narrowly, they are not so different as to prevent some sort of folk etymology from working its magic here. After all, disperse has the sense of giving out or releasing something, and disburse is the act of giving out or releasing money for a specific purpose. That combined with the infrequency of the verb "disburse" in everyday speech will most likely end in tragedy for the latter verb.

Sunday, October 28, 2007

morphological reanalysis

Morphological reanalysis is the treatment of some set phrase as a single morpheme for purposes of stress assignment, pluralization, etc. For instance, "passer by" is semantically transparent, and theoretically composed of two morphemes. A passer by is one who passes by. Thus the plural would be "passers by," because you have two people passing by. However, morphological reanalysis would treat "passer by" as a single morpheme (which is why you might see it as "passer-by" or "passerby"), and this could lead to the plural "passer-bys" or some such. I would consider that "incorrect" by my prescriptive rules of grammar, but linguistically it's a perfectly natural phenomenon.

I do a lot of morphological reanalysis concerning A(djective) N(oun) or N N phrases. For instance, I pronounce "honey mustard" with the accent on the first word, because I treat the entire thing as one item, as opposed to "honey MUSTARD," which is how it would be said if I perceived myself to literally be talking about mustard of the honey variety. The same goes for "Six Flags," the amusement park. For me this is "SIX Flags," whereas counting the number of banners on some medieval castle would probably yield "SIX FLAGS" or "six FLAGS" with stress most likely on both words, but if I had to pick one it would probably be the second.

Some people are extremely reluctant to reanalyze something morphologically (or morphologically reanalyze something if we want to reanalyze that phrase morphologically), whereas some do it on every set phrase. I tend to be in the latter category; there are only a few set AN or NN phrases that I pronounce as if I'm merely describing the head noun, rather than treating the entire phrase as a single constituent for stress purposes. This often leads to clashes with other people who do so on a much more limited scale. The difference in pronunciation obviously sounds odd to both parties, and both think they are in the right. So, what phrases do you morphologically reanalyze, and which do you not?

Saturday, October 27, 2007

hot-buttered popcorn

At the movies today I was struck by an ad for "hot-buttered popcorn" in the lobby. Hot-buttered popcorn? As in popcorn which has been hot-buttered? While I would never use it, I can see hyphenating two adjectives even when one isn't necessarily describing the second. For instance, if I'm looking at a brick house which has been painted yellow, I would describe it as a "yellow brick house," but I wouldn't necessarily fault someone for writing "yellow-brick house" even though to me that would be a house composed of bricks which are yellow, not just a brick house which has been painted yellow. But when you have two adjectives describing a noun, and the second is a past participle of a verb, it seems to me very odd to hyphenate them in this way. What is it to "hot-butter" something? How would you diagram a sentence like that? Would you have an adjective phrase simultaneously containing two adjectives? It would seem exceedingly strange to treat "hot-buttered" as a single adjective, yet that seems to be the only thing the authors intended in hyphenating it, unless (perhaps more likely) they simply don't understand what hyphenating two words means. I'm guessing this is what happened, and that thus the phrase "hot-buttered" is completely meaningless, that the authors really intended "hot, buttered" and are unaware of certain conventions of punctuation.

Thursday, October 25, 2007

I believe you that it's terrible

Let me start out by saying I don't expect the title of this post to be well-formed for everyone. Feel free to comment if you view it as ill-formed. But at least for me, this is indeed well-formed, and has a specifically different meaning from "I believe that it's terrible." The verb "believe" can take several different types of complements: a simple NP such as "you," a clausal complement such as "that it's terrible," or a PP like "in you." We're only concerned with the first two here. I'd say most of the time you would pick just one of these possibilities, but at least some of the time I'll come out with something like the above "I believe you that it's terrible." Semantically it's pretty straightforward: a combination of the sentences "I believe you" and "I believe (that) it's terrible." But how would we parse this syntactically?

In today's binary branching G&B X'-theory trees, I think the idea that the V "believe" takes two complements, "you" and "that it's terrible" (and two complements of different types, mind you) would be pretty much rejected out of hand. I suppose I would probably parse it as VP --> V' --> V' CP, where that second V' is "believe you" and the CP is "that it's terrible." Still, this doesn't seem quite right because "believe you" is a constituent, as it should be, but "believe that it's terrible" is not. It's as if the V "believe" is simultaneously taking two complements, either of which can be dropped, but as far as I know there's no way to represent that in X' theory.
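The constituency problem can be checked mechanically. Below is a rough sketch, assuming the parse I proposed (VP --> V' --> V' CP) written as nested tuples; the helper names words and constituents are my own, and the CP is left unanalyzed as a single chunk.

```python
# The proposed parse, as (label, child, child, ...) tuples.
tree = ("VP",
        ("V'",
         ("V'", ("V", "believe"), ("NP", "you")),
         ("CP", "that it's terrible")))


def words(node):
    """Return the words under a node from left to right."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node[1:]:
        found.extend(words(child))
    return found


def constituents(node):
    """Return the word string spanned by every node in the tree."""
    if isinstance(node, str):
        return set()
    spans = {" ".join(words(node))}
    for child in node[1:]:
        spans |= constituents(child)
    return spans


spans = constituents(tree)
print("believe you" in spans)
print("believe that it's terrible" in spans)
```

Because every node spans a contiguous string, no binary-branching tree over this word order can make both "believe you" and "believe that it's terrible" constituents at once.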

Just to wrap up, I'll clarify what the difference between "I believe that it's terrible" and "I believe you that it's terrible" is, for those who would never use this construction. The former is simply a statement of belief, without reference to the interlocutor or anyone else, whereas the latter is stressing the fact that I believe what the other person is saying, in addition to restating whatever it is that he's saying. "I believe that he won" is merely saying that I think it is the case that he won, whereas "I believe him that he won" is expressing that he told me he won, and I believe him.

Wednesday, October 24, 2007


Lee Mickelson brought up an interesting point in his comment on yesterday's post: double contractions. He seemed to think it was strange, but I would guess that it's fairly widespread, based on my own usage and that of other commenters. I was searching for a syntactic basis for allowing or disallowing a double contraction, when I realized that of course contractions have little or nothing to do with syntax or morphology, and everything to do with phonetics and frequency of use. Joan Bybee brings up the effect of token frequency on phonological and morphological reduction (as well as many other aspects of language) in her wonderful book Phonology and Language Use. Her point is that most exceptions, oddities, contractions, etc. in language are due to an extremely high frequency of use. An example is that we tend to reduce words like every, camera, memory, and family, but not words like mammary, artillery, or homily, even though phonologically these sets of words are very similar.

We have contractions because we get tired of saying "can not" over and over again, or perhaps more exactly because people five hundred years ago did. This phonological reduction affects words like "not," "have," "would," "will," and not many others. However, there are certain rules for contractions. For instance, we would not contract "Yes they are" to "Yes they're," because the verb is stressed. Lee's suggestion was that it seemed a bit extreme that he allowed something like "couldn't've" as well-formed. At first I saw no reason why this should be strange; it's merely an extended case of reduction with two words that are often reduced, as in "didn't" or "I've." But obviously there is a limit to how many words we are willing to contract onto the modal verb, because I don't think anyone would ever say "I'dn't've done that if I were you." My first instinct would be to say that the modal needs to be the stem for the contraction, but that's clearly not the case, since we use "I'd" all the time, e.g., "I'd have gone to the store" for "I would have gone to the store."

My question is (and I don't have an answer), what are the rules for this? Do they exist at all? As hard as I looked, I couldn't find any reason to alternate between "He's not going" and "He isn't going." They have exactly the same meaning/perlocutionary force/anything you can measure. So what exactly are the rules for contractions?

Tuesday, October 23, 2007

VP Deletion

So, here's the first post. If this actually gets a lot of readers I may invest in an actual domain/web site, but as for now I thought I'd start on a free blogging site. I'd love to have other contributors, so if you're a fellow linguist let me know if you'd like to contribute. Really I'd like to expand this into a web site dedicated to teaching and learning linguistics, so if there's anything you as the reader don't understand or would like to know more about, let me know so I can post about it. Okay, on to the post.

A commercial last night caught my attention, one I had seen many times before but had never really paid attention to. It's a commercial for a birth control pill that promises shorter periods, and involves two girls texting each other. One says, "You mean I could have been at the beach?" The other replies, "You could." Not to say that I would process this as ill-formed, but it's definitely marked in my grammar. I would definitely say "you could have" in this context. The process at work here is of course VP deletion, whereby a verb phrase is deleted when it can be filled in through contextual clues, e.g., "You could (have been at the beach)."

Now, I'm not an expert on syntax, but as of now I've been taught that the modal verb occupies the I node, and each auxiliary constitutes the head of its own VP node, i.e., I' --> I (modal) VP, VP --> V (aux) VP, VP --> V' --> etc. In this case "could" would be the ultimate constituent in the I node, "have" would be the auxiliary in the V node of that first VP, and "been at the beach" would be the second VP. (In case it isn't clear from my vague notation, the first and second instances of VP are the same VP, and the third and fourth instances of VP are the same VP, i.e., the first of each pair shows how it is dominated, the second of each pair shows what it in turn dominates. If this was totally unclear, in the future I can try to make illustrations for syntax trees.)

So we have two VP's, the first of which dominates the second: the first VP is "have been at the beach," while the second is just "been at the beach." Obviously it is this first, superior VP that the girl in the commercial is deleting. "You could (have been at the beach)." My personal grammar allows that as being well-formed, but would generate "You could have (been at the beach)." So it seems my grammar has a preference for deleting the lowest VP possible. This works for longer and longer VP chains as well. For instance, in response to "I must have been sleeping when you called," I would probably say "You must have been." Again the lowest VP is deleted, though I would also accept and probably say from time to time "You must have" or "You must." Though it makes sense, I had never thought about the fact that VP deletion can target any VP, which is especially striking when you have these chains of auxiliaries creating chains of VP's.
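Since each auxiliary heads its own VP, the possible deletions can be listed mechanically: keep the subject and modal, then cut the chain after any auxiliary. Here is a small sketch under that assumption; the function name ellipsis_options is made up for illustration, and it says nothing about which option a given speaker prefers.

```python
def ellipsis_options(subject, modal, auxiliaries):
    """List the responses left by deleting each VP, from the lowest to the highest."""
    options = []
    # Keeping every auxiliary deletes only the lowest VP; keeping none deletes the highest.
    for kept in range(len(auxiliaries), -1, -1):
        options.append(" ".join([subject, modal] + auxiliaries[:kept]))
    return options


print(ellipsis_options("You", "could", ["have"]))
print(ellipsis_options("You", "must", ["have", "been"]))
```

A chain of n auxiliaries under a modal always gives n + 1 candidate responses, which is why the "must have been sleeping" example has three.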

What about your grammar? Can you select any VP, or do you have to pick the highest, or lowest? For those who aren't syntax-minded, this translates to, would you say "you must," "you must have," or "you must have been" in response to the statement "I must have been sleeping"? My guess is that most if not all people would recognize all three as well-formed, but which do you prefer?