Saturday, September 18, 2010


English is mildly notorious for its non-compositional compounds. By this I don't necessarily mean that compound words or phrases have nothing to do semantically with their components, but rather that the relation between the components is somewhat unstructured: there is no single fixed relation between X and Y in a compound X-Y. One well-worn example is the difference in the semantic relation between the two components in "olive oil" and "baby oil". You make olive oil by squeezing olives until the oil runs out of them. This is not how you make baby oil. In fact, the first word in "baby oil" bears a completely different relation to the second word than the first word in "olive oil" does. In "olive oil" the first word indicates the source of the primary component, the oil. (Compounds in English and some other languages are right-headed, meaning that the component on the right gives you the category and basic sense of the compound: "olive oil" is a type of oil, not a type of olive.) In "baby oil", on the other hand, the first word tells you something about how the oil is intended to be used. You can see the same difference in "spring water" and "holy water". Holy water may in fact be spring water (I'm not sure if churches typically use bottled water, tap water, or some specially sourced water for this), but "holy water" means something different because it indicates what the water is going to be used for, rather than where it came from.

One thing I hadn't thought much about until recently is that it's not just N+N compounds that behave in this peculiar way. For instance, there's good reason to be afraid of baseball-sized hail, but no real reason to fear a family-sized bag of candy. Like the N+N examples above, these adjectival compounds can express completely different types of relations. Hail that is baseball-sized is the size of a baseball, but a bag of candy that is family-sized is not the size of a family; rather, it's a bag of a size appropriate for a family.

English is not the only language that has these types of unpredictable compounds. Blackfoot has some as well. One of my favorites is the word for horse: ponokáómitaa literally means 'elk-dog', where ponoka is 'elk' and ómitaa is a bound form of the root for 'dog'. Presumably this stems from the association of horses, when they were first encountered a few hundred years ago, with the general ungulate form and size of an elk, and with the beast-of-burden function of a dog, which the Blackfeet used to haul travois and other equipment.


GamesWithWords said...

Interestingly, people can produce novel compounds that are nonetheless correctly understood. There's been a lot of interest among psycholinguists over the years in this topic. I believe Lila Gleitman did some of the early work. There was just a paper in the last issue (I believe) of Cognitive Science looking at the role of prosody in determining the interpretation of the compound (the short story is that there appears to be a role in some cases, but there's a lot of variance left to be explained).

Anonymous said...

I love the thing about the horse.

'Holy water' isn't a fair comparison because it's adjective+noun, not compound noun. Also, I wouldn't define it with reference to purpose, but rather as 'water that has undergone the process of being made holy'.

In a sense, you can think of spring water as water that has undergone the process of being in a spring, but I would be the first to agree that that is a very tenuous sense indeed.

Unknown said...

So-called non-compositional compounds are likely to be a universal phenomenon, found in many languages. I have no idea whether the same holds in synthetic languages such as German or Russian. But in Chinese, yes! It's hard to predict the relation between the two components in a word. Here is an example:
pi2xie2 - leather shoes
pao3xie2 - track shoes
nan2xie2 - men's shoes
liang2xie2 - sandals
(Numbers here stand for tones within a syllable)

Each word has two morphemes, which mean:
xie2: shoes
pi2: leather
pao3: run
nan2: male
liang2: cool or cold

Obviously, the structures are semantically different. In "pi2xie2", the first morpheme (or we may say "zi", a single character) indicates the material the shoes are made of. In "pao3xie2", "pao3" indicates the kind of sport the shoes are used for. In "nan2xie2", "nan2" is the gender of the person who will wear the shoes. In "liang2xie2", meanwhile, "liang2" simply tells you that when you put this kind of shoe on in summer, you won't feel so hot.

There is a lot to say about this kind of compound word. You can say both nan2xie2 (men's shoes) and nv3xie2 (women's shoes). You can say pi2xie2 (leather shoes), bu4xie2 (cloth shoes) and even mu4xie2 (wooden shoes). You can say pao3xie2 (running shoes), but you can't say tiao4xie2 (jumping shoes) or ti1xie2 (kicking shoes). You can say liang2xie2 (sandals) but not nuan3xie2 (warm shoes).

It's quite interesting, and there is more to discover if you have the time.