Toward a Typology of Semantic Transparency: The …...The typology of semantic transparency of...
Transcript of Toward a Typology of Semantic Transparency: The …...The typology of semantic transparency of...
Toward a Typology of Semantic Transparency: The Case of French Compounds
by
Yves Stephen Bourque
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of French University of Toronto
© Copyright by Yves Bourque 2014
ii
Toward a Typology of Semantic Transparency:
The Case of French Compounds
Yves Bourque
Doctor of Philosophy
Department of French Studies University of Toronto
2014
Abstract
This thesis proposes an extension to existing models of semantic transparency in compounding
by incorporating previously unexplored features into the concept. Although the term semantic
transparency is widely used in research on multi-word lexemes, the concept is often viewed as
simply a matter of semantic compositionality, which is to say that transparency is determined
solely by the meaning of individual constituents. While this approach offers a number of
advantages, it is nevertheless insufficient on two accounts: one, it groups together a number of
compositional compounds that are otherwise semantically distinct (e.g. ice breaker ~ ice cube ~
ice age), and two, it offers no clear means to order or rank partially compositional compounds
(e.g. firefly ~ butterfly ~ barfly). This thesis therefore argues for a more holistic approach to
semantic transparency, one that views the concept as both scalar and multi-faceted, the result of
which is a more granular model capable of further distinguishing between several different
compound types.
The typology of semantic transparency of compounds proposed in this work consists of four
basic factors supported by a dataset consisting of more than 1,000 French NN and N à N
compounds extracted from Wiktionary. The first factor, headedness, touches on features
iii
pertaining to a compound’s semantic head, for which concepts such as canonical position and
strong/weak centricity are advanced. Second, compositionality is formalized according to a four-
way configuration based on how individual constituents contribute meaning to the whole. The
third factor, semantic homogeneity, relates to the degree of shared meaning between
analogically similar compounds. Finally, the implicit semantic relations within compounds are
explored. Consequently, fifteen basic associations are proposed and evaluated using the
compound data collected from Wiktionary. Together, these four factors yield a typology
involving sixteen possible transparency profiles, each of which is ordered according to the
relative weight of its semantic features. It is believed that this typology offers a far richer
conceptual space within which to further the discussion of semantic transparency as it pertains to
complex constructions such as compounds.
iv
Acknowledgments
Although a thesis is often solitary work, it is by no means an independent one. I therefore wish
to thank the numerous people who have contributed to this work along the way.
I must first express my deepest gratitude to my advisor, Anne-Marie Brousseau, whose
mentorship has provided me with the guidance I needed to complete this project. Her
willingness to share her expertise and knowledge has contributed tremendously to my work and
I will forever be indebted to her for her unwavering support and direction over the past several
years.
I would also like to extend my thanks to my committee members, Yannick Portebois and Yves
Roberge. They have never shied from raising difficult questions or offering their pointed
criticism. Their remarks have led to many productive and fruitful discussions regarding the
direction of my work and I am grateful for their valuable input and insight. My sincerest thanks
as well to my external examinor, Jean-Pierre Koenig, for his detailed and constructive feedback.
This thesis would also not have been possible without the generous support of the French
Department, as well as the Ontario Graduate Scholarships I was fortunate to receive.
It is important to know that pursuing graduate studies means that you get to meet a lot of like-
minded people along the way. I am grateful to have met (in alphabetical order) Alisha, Maud,
Ritu, Ruth-Ellen, Simona, and Vince, with whom I have commiserated, deliberated, and
celebrated. I would also like to acknowledge Muriel and Erich for their companionship and
encouragement over these many years.
To my families, my heartfelt thanks for their steadfast support and encouragement.
Finally, Mary Jane, to whom I extend my eternal gratitude for having stuck with me all these
years. It was a long and difficult road and I couldn’t have traveled it without you.
v
Table of Contents
Acknowledgments .............................................................................................................................. iv
Table of Contents ................................................................................................................................. v
List of Tables ......................................................................................................................................... x
List of Figures .................................................................................................................................... xiii
List of Appendices ............................................................................................................................ xiv
Introduction ...................................................................................................................... 1 Chapter 1
1.1 Framework ............................................................................................................................................ 4 1.2 Object of Study ..................................................................................................................................... 5 1.3 Data: Wiktionary ................................................................................................................................. 6 1.4 Database: Polylexical.com ............................................................................................................... 7 1.5 Organization ......................................................................................................................................... 8
On Semantic Transparency ....................................................................................... 10 Chapter 2
2.1 Semantic Transparency: Preliminaries ..................................................................................... 11 2.1.1 A Brief Word on Transparency vs. Opacity ...................................................................................... 11 2.1.2 A Paucity of Description ............................................................................................................................ 12
2.2 Semantic Transparency: Experimental Studies ..................................................................... 15 2.2.1 Studies of Semantic Transparency and Compounding ................................................................ 17
2.3 Semantic Transparency: Definitions and Models .................................................................. 26 2.3.1 Some Definitions .......................................................................................................................................... 27 2.3.2 Transparency as a Continuum ............................................................................................................... 31 2.3.3 Explicit Semantic Transparency Clines .............................................................................................. 34
2.4 Semantic Transparency: A Working Definition ...................................................................... 40 2.4.1 Transparency vs. Compositionality ..................................................................................................... 40 2.4.2 Semantic Transparency Defined ........................................................................................................... 46
2.5 Summary .............................................................................................................................................. 48
French Nominal Compounds and Data Collection ............................................. 50 Chapter 3
3.1 Compounding ..................................................................................................................................... 50 3.1.1 Defining the Compound ............................................................................................................................ 51 3.1.1.1 Compounding Criteria ......................................................................................................................................... 56 3.1.1.2 Compounding in French ..................................................................................................................................... 60
vi
3.1.1.3 Which Compounds Should We Investigate? .............................................................................................. 62 3.2 Data: French Nominal Compounds ............................................................................................. 67 3.2.1 The Wiktionary Database ......................................................................................................................... 69
3.3 Selecting Compounds and Cleaning up the Data .................................................................... 71 3.3.1 Which Compounds to Include? .............................................................................................................. 72 3.3.2 Reducing the initial dataset ..................................................................................................................... 74
3.4 Labeling the Entries ......................................................................................................................... 75 3.4.1 Automatically Assigning Lexical Categories ..................................................................................... 76 3.4.2 Which Lexical Category? ........................................................................................................................... 79 3.4.3 Cleaning Up the Remaining Data ........................................................................................................... 86 3.4.4 Gender and Number ................................................................................................................................... 88
3.5 Summary .............................................................................................................................................. 90
Compound Meaning: Features and Factors ......................................................... 91 Chapter 4
4.1 Centricity ............................................................................................................................................. 91 4.1.1 Endocentric Compounds .......................................................................................................................... 93 4.1.2 Head position ................................................................................................................................................. 95 4.1.3 Coordinated Compounds .......................................................................................................................... 99 4.1.4 Exocentric Compounds .......................................................................................................................... 103 4.1.4.1 Exocentric by Trope .......................................................................................................................................... 107
4.1.5 Summary ....................................................................................................................................................... 112 4.2 Semantic Compositionality ......................................................................................................... 113 4.2.1 Definition and Approach ........................................................................................................................ 114 4.2.2 Metaphor and Metonymy ...................................................................................................................... 117 4.2.2.1 Combining Compositionality and Centricity ........................................................................................... 121
4.2.3 Summary ....................................................................................................................................................... 125 4.3 Semantic Homogeneity ................................................................................................................ 125 4.3.1 Semantic Reliability Index .................................................................................................................... 135 4.3.1.1 How Does the SRI Fit in? ................................................................................................................................. 143
4.4 Summary ........................................................................................................................................... 145
Compound Relations ................................................................................................. 147 Chapter 5
5.1 Studies on the Semantics of Compounds ............................................................................... 150 5.1.1 Early studies ................................................................................................................................................ 151 5.1.2 Recent Developments in Compound Relations ............................................................................ 160 5.1.3 Summary ....................................................................................................................................................... 167
vii
5.2 Retained Semantic Relations ..................................................................................................... 168 5.2.1 Interpreting Compounds ....................................................................................................................... 171 5.2.2 Presentation Format ................................................................................................................................ 177 5.2.2.1 Hypernymy ............................................................................................................................................................ 179 5.2.2.2 Coordination ......................................................................................................................................................... 180 5.2.2.3 Similarity ................................................................................................................................................................ 185 5.2.2.4 Function .................................................................................................................................................................. 188 5.2.2.5 Possession .............................................................................................................................................................. 190 5.2.2.6 Part ........................................................................................................................................................................... 194 5.2.2.7 Location .................................................................................................................................................................. 198 5.2.2.8 Composition .......................................................................................................................................................... 200 5.2.2.9 Source ...................................................................................................................................................................... 202 5.2.2.10 Cause and Production .................................................................................................................................... 204 5.2.2.11 Topic ...................................................................................................................................................................... 206 5.2.2.12 Time ....................................................................................................................................................................... 207 5.2.2.13 Use .......................................................................................................................................................................... 208 5.2.2.14 Purpose and Proper Function .................................................................................................................... 210
5.3 Summary ........................................................................................................................................... 215
Compound Relations: Application Results ......................................................... 217 Chapter 6
6.1 NN Compounds ................................................................................................................................ 217 6.1.1 Relations ....................................................................................................................................................... 217 6.1.2 Residual NN Compounds ....................................................................................................................... 224 6.1.2.1 Idiosyncratic and Partially Unrelated Compounds .............................................................................. 225 6.1.2.2 Nouns and Adjectives ....................................................................................................................................... 227 6.1.2.3 Classificatory Relation ...................................................................................................................................... 229 6.1.2.4 NN Compounds Involving Nominalizations ............................................................................................ 229
6.1.3 NN Compounds: Conclusion ................................................................................................................. 232 6.2 N à N Compounds ........................................................................................................................... 233 6.2.1 The Preposition À ..................................................................................................................................... 233 6.2.2 Results for N à N Compounds .............................................................................................................. 241 6.2.3 N à N: Residual Data ................................................................................................................................ 249 6.2.3.1 Idiosyncratic and Semantically Unrelated N à N .................................................................................. 249 6.2.3.2 N à N Compounds Involving Nominalizations ....................................................................................... 251
6.3 Summary ........................................................................................................................................... 252
Putting It All Together .............................................................................................. 254 Chapter 7
viii
7.1 Semantic Transparency: A Definition Revisited .................................................................. 254 7.2 Semantic Transparency: A First Pass ...................................................................................... 257 7.2.1 Primary Factors ......................................................................................................................................... 257 7.2.2 Semantic Relations ................................................................................................................................... 261 7.2.2.1 Relation Types ..................................................................................................................................................... 261 7.2.2.2 Ordering Subordinate Relations .................................................................................................................. 266 7.2.2.3 Source of the Relation ....................................................................................................................................... 269 7.2.2.4 Reversibility .......................................................................................................................................................... 271 7.2.2.5 Frequency .............................................................................................................................................................. 272
7.2.3 Summary ....................................................................................................................................................... 274 7.3 The Semantic Transparency of French Compounds .......................................................... 276 7.3.1 Canonical Endocentric Compounds .................................................................................................. 276 7.3.1.1 Strongly Endocentric, Fully Compositional: passage piétons and boîte à outils ...................... 276 7.3.1.2 Strongly Endocentric, Weakly Compositional: mot-‐clé and piano à queue ............................... 280 7.3.1.3 Strongly Endocentric, Partially Compositional: bateau-‐mouche ................................................... 280 7.3.1.4 Weakly Endocentric: valse-‐hésitation and poire à poudre ................................................................ 281
7.3.2 Non-‐Canonical Endocentric Compounds ........................................................................................ 282 7.3.2.1 Strongly Endocentric, Fully Compositional: bracelet-‐montre ......................................................... 283 7.3.2.2 Strongly Endocentric, Weakly Compositional: reine-‐marguerite .................................................. 284 7.3.2.3 Strongly Endocentric, Partially Compositional: aube-‐vigne ............................................................. 284 7.3.2.4 Weakly Endocentric: vidéo-‐lynchage ......................................................................................................... 285
7.3.3 Exocentric Compounds .......................................................................................................................... 285 7.3.3.1 Fully Compositional Exocentric: ballon-‐panier and pied à boule ................................................... 286 7.3.3.2 Weakly Compositional: radio-‐trottoir and cage à écureuil ............................................................... 288 7.3.3.3 Partially Compositional: chat-‐château and soda à pâte ..................................................................... 290 7.3.3.4 Non-‐Compositional: cap-‐mouton and sagne à tamis ........................................................................... 290
7.4 Summary ........................................................................................................................................... 291
Conclusion ..................................................................................................................... 294 Chapter 8
8.1 Contributions of the Thesis ........................................................................................................ 294 8.2 Remarks on the Wiktionary Data ............................................................................................. 300 8.3 Polylexical.com ............................................................................................................................... 300 8.4 Future Perspectives ...................................................................................................................... 302 8.4.1 Sense Extension ......................................................................................................................................... 303 8.4.2 Conceptual Classes ................................................................................................................................... 304 8.4.3 Frequency ..................................................................................................................................................... 305
ix
8.4.4 Family Size ................................................................................................................................................... 307 8.4.5 Testing the Typology and Closing Remarks .................................................................................. 308
References ......................................................................................................................................... 310
Appendices ........................................................................................................................................ 329
x
List of Tables
Table 2.1. Dressler’s hierarchy of morphotactic transparency (Dressler 1985: 330-331) ............ 35
Table 3.1. Gross’s (1988) typology of French nominal compounds. ........................................... 61
Table 3.2. Major classes from Mathieu-Colas’s typology retained for the present study. ........... 73
Table 3.3. Fradin’s (2009: 420) categorial distribution for French compounds. .......................... 78
Table 4.1. Types of exocentric compounds in Bauer (2010) ...................................................... 106
Table 4.2. Possible combinations of compound features. ........................................................... 122
Table 4.3. Number of templates and tokens found in the data. .................................................. 136
Table 4.4. List of papier-N compounds taken from LPR2010 under the entry for papier. ........ 138
Table 4.5. List of pompe à N compounds taken from LPR2010 under the entry for pompe. ..... 138
Table 4.6. NN compounds with average template SRI based on the left-most constituent. ....... 140
Table 4.7. NN compounds with average template SRI based on the right-most constituent ..... 140
Table 4.8. N à N compounds with average template SRI based on the left-most constituent .... 142
Table 4.9. N à N compounds with average template SRI based on the right-most constituent. 142
Table 5.1. Lees’s (1960) grammatical relations of nominal compounds. ................................... 153
Table 5.2. Adams’s (1973) compound classes. ........................................................................... 155
Table 5.3. Downing’s (1977) minimal compound relationships. ............................................... 156
Table 5.4. Warren’s (1978) semantic classes. ............................................................................. 157
Table 5.5. Levi’s (1978) Recoverably Deletable Predicates. ...................................................... 159
Table 5.6. Lauer’s (1995) preposition based treatment of compounds. ...................................... 161
xi
Table 5.7. Vanderwende’s (1994) classification schema of noun sequences. ............................ 162
Table 5.8. Jackendoff’s (2010) 14 Basic Functions. ................................................................... 164
Table 5.9. Arnaud’s (2003) high-level relations for French NN compounds. ............................ 166
Table 5.10. Summary of the number of semantic relations present in the literature. ................. 167
Table 5.11. Logico-semantic relations retained in this work. ..................................................... 170
Table 5.12. A comparison of compound interpretations across five different works. ................ 174
Table 5.13. Bauer’s five main types of coordinated compounds. ............................................... 182
Table 5.14. Compounds listed as dd or “N2 has N1” in Arnaud (2003). ................................... 191
Table 5.15. The SOURCE, PRODUCTION and CAUSE relations compared. ..................................... 203
Table 5.16. A comparison of PURPOSE and USE. ......................................................................... 211
Table 5.17. Summary of the relations retained in the present work. .......................................... 215
Table 6.1. Results of relations analysis for NN compounds. ...................................................... 218
Table 6.2. Results of judgement tests in Arnaud (2003). ............................................................ 219
Table 6.3. Summary of Knittel (2010) relations for the preposition à. ...................................... 240
Table 6.4. Results of Compound Relations for N à N compounds. ............................................ 241
Table 6.5. Relative frequency of relations across compound types. ........................................... 253
Table 7.1. Distribution of features for the French compounds collected from Wiktionary. ....... 260
Table 7.2. Number of compounds in the data for each subordinate relation. ............................. 272
Table 7.3. +canon, +strong, fully compositional compounds ordered according to relations. ... 277
Table 7.4. SRI values for compounds within recurring N-X templates. ..................................... 278
xii
Table 7.5. Strongly endocentric, fully compositional NN compounds, ordered by relation. ..... 283
Table 7.6. Exocentric NN compounds involving basic semantic relations. ............................... 286
Table 7.7. Exocentric N à N compounds involving basic semantic relations. ........................... 287
Table 7.8. Summary of compounds and their features, ordered by transparency. ...................... 292
Table 8.1. Transparency configurations, from most to least transparent. ................................... 297
xiii
List of Figures
Figure 2.1. The relationship between compositionality and transparency. ................................... 45
Figure 2.2. Representation of Cruse’s (1986) continuum of semantic transparency. ................... 32
Figure 2.3. Levi’s (1978) continuum of derivational transparency as applied to compounds. ..... 36
Figure 2.4. Ambiguous pairs in Libben’s (1998) typology of semantic transparency ................. 39
Figure 2.5. A continuum of semantic transparency. ..................................................................... 47
Figure 3.1. The results screen for the compound café-filtre in Mathieu-Colas’s database. .......... 89
Figure 4.1. Distribution of compounds according to features related to the head. ..................... 112
Figure 4.2. The relationship between compositionality and transparency. ................................. 114
Figure 4.3. Possible configurations for semantic compositionality. ........................................... 116
Figure 4.4. Distribution of features for endocentric compounds. ............................................... 124
Figure 4.5. Relationship between compound features and the semantic reliability index. ......... 144
Figure 7.1. A typology of semantic transparency of compounds. .............................................. 259
Figure 7.2. Difference between coordinated and hypernymic compounds. ................................ 264
Figure 8.1. Search interface for Polylexical.com. ....................................................................... 301
xiv
List of Appendices
Appendix A. Sample SRI calculations for various N-X templates............................................ 329
Appendix B. Comparison of compound relations in the literature............................................ 331
Appendix C. Partial schema of Wiktionary’s database structure............................................... 335
Appendix D. NN French compounds retained from Wiktionary............................................... 336
Appendix E. N à N French compounds retained from Wiktionary........................................... 350
1
Chapter 1
Introduction
Compounds are, in many ways, liminal objects located somewhere between the lexicon and
syntax: they are functionally similar to words, yet they consist of words themselves. In this
regard, compounds pose several descriptive challenges, many of which involve issues touching
on a number of different domains. One such issue has to do with meaning construal. Because
compounds, like phrases, typically consist of units that, in isolation, are both meaningful and
denotational, one might expect their sense to be determined according to basic principles of
compositionality, which is to say that the meaning of the whole is a function of its parts. At first
glance, this assumption seems well-founded: speakers often have little trouble understanding
unfamiliar compounds, even when they are provided with limited contextual support.
Wisniewski and Clancy (2004), for instance, conducted a survey of more than 700 novel
combinations in several magazine and newspaper articles and found that less than 15% of these
items were preceded elsewhere in the text by both the modifier and the head (cited in Storms
and Wisniewski 2005). Furthermore, the analysis of a random sample of these compounds (296)
revealed that only 11% of them were accompanied by some sort of definition (also cited in
Storms and Wisniewski 2005). These findings suggest that authors are not typically concerned
that the use of compounds will baffle their readers, presumably because these constructions are
easy to understand. Of course, this is not to say that such novel compounds are completely
independent of context (they are, after all, used within a larger body of content), but rather that
they are naturally occurring items that typically pose few processing challenges to speakers.
If we turn our attention, however, to established compounds, we notice that many of them are
not so easily understood. Let us consider, for instance, the following compounds:
(1) a. honeybee ‘a bee that makes honey’
b. honeydew ‘a sweet, sticky substance exuded by plants and insects’
c. honeymoon ‘a romantic vacation following a wedding’
2
Assuming that a speaker is unfamiliar with all three constructions, we might say that, intuitively,
the compound in (1a) is easier to understand than those in (1b) and (1c). This observation has
led some researchers to talk about the semantic transparency of compounds, a property of multi-
word lexemes related to how clearly their meaning emerges from their constituents (Sandra
1990, Zwisterlood 1994, Libben 1998). Consequently, honeybee in (1a) is viewed as
semantically transparent, while honeydew and honeymoon in (1b) and (1c) respectively, are
considered semantically opaque. This distinction is in fact what underlies a great many works on
the semantics of compounds and largely serves as an effective means of establishing which of
these constructions are harder to understand than others: all things being equal, a semantically
transparent compound is easier to interpret than a semantically opaque one.
While the matter may seem rather unambiguous, a close examination of different, yet related
compounds raises a number of questions. First, is a binary opposition sufficient? In other words,
are compounds either transparent or opaque, or might we benefit from a more granular approach
to the concept? This question stems from the fact that many compounds involving the same
lexemes show considerable differences at the level of meaning construal. The following
compounds, all of which contain the word fly, are good examples of this variation:
(2) a. housefly ‘a fly typically found in houses’
b. firefly ‘a nocturnal beetle that emits light’
c. butterfly ‘an insect with large, colourful wings’
d. barfly ‘a person who spends much time in a bar’
e. gadfly ‘an annoying person’
Without going into too much detail, we may state that the compounds in (2) above differ in
significant ways according to how the meaning of their constituents contribute to the meaning of
the whole. If we consider housefly transparent and gadfly opaque, what should we say about
firefly, butterfly, and barfly? Are they to be treated as either of these types or are they perhaps
located somewhere in between?
A second question that we may want to ask is if transparency is strictly an intuitive concept or if
it can be formalized? In other words, can we apply the labels transparent and opaque
consistently across compounds using a strict set of criteria? If such a formalization is possible,
what factors and features should such a model include? Revisiting the compounds in (2) above,
3
we notice that in some cases meaning involves a metonymy (e.g. firefly) and in other cases a
metaphor (e.g. barfly and gadfly). We may also notice that some compounds contain an element
with no bearing on the meaning of the whole (e.g. butter in butterfly). Can these observations be
incorporated into a model of semantic transparency and can such a model be used not only to
distinguish between compounds, but also to rank them?
Third, compounds differ from phrases and many other multi-word constructions in that many of
them are semantically incomplete, so to speak. This is especially true of binomial constructions,
where constituents are connected together by some unexpressed predicate. In housefly in (2a)
and barfly in (2d), the relation might be said to be locative, while in firefly in (2b), the relation is
one of production (albeit metaphorically). To my knowledge, although these implicit relations
have been widely researched (among others, Lees 1968, Adams 1973, Warren 1978, Levi 1978),
very little has been said regarding their role in a compound’s degree of semantic transparency. If
a formal model of transparency were indeed possible, how should semantic relations factor into
it?
Given these questions and premises, the aim of this work is to explore the dual-concept of
semantic transparency and opacity as it pertains to compounds. It is the contention of this thesis
that transparency is a scalar phenomenon that may be formalized according to several different
factors. The result of this formalization is a typology capable of classifying compounds
according to their degree of semantic transparency. Such a typology could potentially inform a
number of different fields, including psycholinguistics and computational linguistics, where
issues in lexical storage and processing, as well as automatic interpretation and translation, are
closely tied to the perceived semantic transparency of complex units such as compounds.
It is worth noting that the model developed in this work builds on a rich and varied body of
research on compounding, semantic transparency, and compositionality, as well as theories in
lexical processing. Although a number of formal models of transparency have been proposed
over the years, this thesis aims to show that further developments are in fact possible and that by
extending these models, we may introduce a far richer conceptual space within which to discuss
the semantics of compounds.
4
1.1 Framework
This thesis represents work in lexical semantics: the focus throughout this work is first and
foremost on the meaning of compounds from the perspective of traditional approaches in lexical
semantics (Cruse 1986). Because compounds are in fact complex units that share a number of
features with other types of lexical units, they must also be examined within a morphological
framework. Consequently, the theoretical framework adopted in this work is largely founded on
a lexicalist approach to morphology, which is to say that word-formation is a component of the
morphological module and is governed by rules or principles not shared with those of syntax
(Chomsky 1970). Compounds are viewed here as morphological units and are treated according
to existing principles in lexicalist morphology (Allen 1978). Moreover, this framework has also
served as the foundation for a great deal of research on compounding (among others, Adams
1973, Bauer 1978, Fabb 1998, Scalise and Bisetto 2009). The choice of this particular
morphological framework, however, is not necessarily crucial to the stated goals of this project.
Given that the emphasis is primarily on the semantic features of compounds, other frameworks,
such as distributed morphology (Halle and Marantz 1994) or the minimalist program (Chomsky
1995, Hale and Keyser 2002), are also viable options for this study.
Because semantic transparency also involves a number of cognitive processes, a good deal of
the background research used to support the typology of transparency proposed in this thesis
comes from work in both cognitive linguistics and psycholinguistics. On the one hand, the
processing of compounds is intimately related to how lexical items are stored and retrieved
(Frauenfelder and Schreuder 1992, Marslen-Wilson et al. 1994, Chialant and Caramazza 1995),
and on the other, how different concepts are combined and reconciled (Murphy 1988,
Wisniewski 1997, Gagné and Spalding 2007). Moreover, compounds often involve tropes, such
as metaphor and metonymy, which make use of several well-known cognitive processes (Lakoff
and Johnson 1980). Finally, experimental studies in psycholinguistics have sought to quantify
how speakers process compounds, often based on a number of different factors (Sandra 1990,
Zwisterlood 1994, Libben et al. 2003, Frisson et al. 2008). In the course of developing a
typology of semantic transparency, the work carried out in the following chapters will touch on
these and other research endeavours involving compounds.
5
1.2 Object of Study
Although much of the work on semantic transparency conducted in this thesis is based on
research on English compounding, the typology proposed in Chapter 7 is largely founded on a
study of French nominal compounds possessing the structure NN and N à N. That said, none of
the features or factors explored are specific to French compounding, which should allow for a
typology that remains applicable to compounds in other languages.
All matters pertaining to compounding are explored in greater detail in Chapter 3, but we may
say here that the decision to focus on French compounds is based on the following facts. First,
although well-represented in the literature, French compounds have not received a great deal of
attention from the perspective of semantic transparency; while research on compound
transparency is by no means exclusive to English, the majority of the work on this topic has
focused on constructions in that language. Second, compounding in French displays behaviour
not observed in English, namely in its variable head position (cf. Scalise and Fábregas 2010).
Third, French compounds, like other Romance languages, makes extensive use of prepositions
in otherwise binomial constructions. These prepositional linking units, typically absent from
English compounds, provide additional material with which to discuss transparency effects.
It is important to note that the study of semantic transparency offered in this work is based on
the premise that the concept should be examined from the perspective of the listener, which is to
say as a factor in interpretation and not in coining or formation. This position also assumes that
transparency is first and foremost a synchronic matter. Although the compounds under
investigation here have all entered the French lexicon at different stages of the language,
transparency is taken to be relevant only at the moment of interpretation. In other words, while it
may very well be true that a particular compound coined in the 18th century is less transparent
than another coined in the latter half of the 20th century—and that it may have once been more
transparent than it is now—this fact is not what may affect its interpretation. Rather, it is the
semantic drift that occurs, regardless of its origins, that poses a problem to the listener. Speakers
are often unaware of a compound’s etymology and even when they are, it is unclear that their
interpretation of it is based on their knowledge of its origins. This is not to say that one should
ignore etymology or the original motivation for the construction, but simply that for the
6
purposes of formalizing semantic transparency synchronically, all compounds are to be treated
equally, irrespective of when they were first introduced into the language.
Another point that needs to be made is that the compounds retained for this work come from a
wide range of French varieties. This is in part related to the source used to collect the data
(described in the following section), but also in the difficulties of ascertaining just what
constitutes an entry used by one group, but not another. The focus here is simply on compounds
that clearly belong to French, which is to say that they are composed of French constituents and
that they are themselves French lexemes. It is clear, however, that one group of speakers may
prefer the use of a specific type of compound over another, which might introduce interference
at the level of interpretation if a particular compound involves uncommon features (e.g.
marginal relation, atypical preposition, etc.). We may state now, however, that this thesis, with
its focus on the interpretative aspect of compounds, explores the concept of transparency within
an “ideal speaker-listener” paradigm (Chomsky 1965). In other words, semantic transparency is
formalized with a speaker’s competence in mind, which assumes that he or she is familiar with a
compound’s constituents on the one hand and able to make use of his or her grammar to
establish meaning on the other. This approach is especially crucial when discussing
compositional compounds as these items, while potentially stored as single units, nevertheless
allow for decomposition to occur. Thus, features that focus on speaker dependent factors, such
as variational differences between items or personal idiolect, are not taken into account as the
key criterion is that individual constituents belong to the “ideal” speaker’s lexicon.
1.3 Data: Wiktionary
The source of the data used in this thesis is Wiktionary, an online, user-generated dictionary
developed by the Wikimedia Foundation. All French compounds retained in the present work,
as well as those included in the corresponding online database (see Section 1.4), were extracted
from a February 2011 snapshot of Wiktionary’s French repository. Chapter 3 provides
additional information regarding the methods employed to collect the data used in this project.
Because Wiktionary’s entries and content is entirely managed by its users, it generally lacks the
methodological rigour found in traditional lexicographic works. Some may therefore object to
the use of Wiktionary as a source of data. The decision to use this resource is in part based on its
7
openness and freely accessible database. Chapter 3 goes into more detail regarding some of the
reasons that motivated this particular choice. It is important, however, to keep the following
point in mind: the compounds under investigation remain compounds regardless of the source.
In other words, whether the compounds chou-fleur or oiseau-mouche are taken from Wiktionary
or from traditional dictionaries such as Larousse or Le Petit Robert, they remain compounds in
French and are thus perfectly valid items with which to conduct an analysis of transparency.
Moreover, because Wiktionary does not adhere to traditional lexicographic methods, it may also
contain entries that have yet to be included in other works. It therefore offers the opportunity to
include in the analysis new or uncommon compounds. Any issues that may arise because of this
particular choice of resource are in fact largely mitigated by the fact that the focus here is on
transparency from the point of view of interpretation: even if some of the compounds retained
are infrequent or marginal, they remain valid if they possess a sense, which may then be
evaluated according to the same features used for established constructions.
1.4 Database: Polylexical.com
Another, secondary objective of this thesis is to provide other researchers with the tagged and
labeled data used for this study so that it may support future work on French compounding. All
compounds under investigation are therefore labeled with several features, such as lexical
categories and headedness, and can be found in the appendices (Appendix D and Appendix E).
Due to the limited nature of this method of presentation, however, I have also made available a
more complete version of the annotated data online at www.polylexical.com. The database
hosted on this site is searchable according to most of the parameters and features described in
this work, although only the NN and N à N compounds are fully annotated. All other types (e.g.
AN, NA, VN, N Prep N, etc.) may be queried according to their parts of speech, number, and
gender. Chapter 3 goes into greater detail regarding the methods used to label the contents of the
database, while Chapter 8 offers a brief look at the search interface I created to query the data.
8
1.5 Organization
This thesis is organised in the following manner.
Chapter 2 consists of a review of the literature on semantic transparency and focuses on the role
this concept has played in both experimental and theoretical research. By looking at previous
work on transparency, we find that, although many definitions show considerable overlap, there
is no universally accepted description of the concept. Moreover, the term is often used with little
explanation or description, which leads to several questions regarding the types of constructions
under investigation or the exact nature of the concept’s application. The chapter closes with a
working definition of semantic transparency and offers a brief look at the fundamental aspects
behind the typology proposed in the closing chapters of this work.
Chapter 3 presents the methodological underpinnings of this research, which begins with a brief
overview of compounding and its role in word formation. Although the focus here is largely on
NN French compounds, other combinations are also discussed, including those consisting of
adjectives, verbs, and prepositions. The second half of the chapter offers a thorough explanation
of the methods used to collect the data that serves as the basis of this work. More than 10,000
items were extracted from Wiktionary, all of which were tagged for part of speech, number, and
gender using software developed specifically for this project. In the end, only NN and N à N
compounds were retained for the analysis, which consist of 729 and 319 items respectively.
In Chapter 4, the focus is on three features relevant to compounds, namely headedness,
compositionality, and semantic homegeneity. First, it examines compound headedness, a well-
established and widely analyzed aspect of compounding. Based on an examination of the French
data, several observations are made regarding headedness, which are then formalized for
transparency. Second, the notion of semantic compositionality is explored, which is shown to
vary according to sense extension. Third, semantic homogeneity is discussed. It is in this section
that the semantic reliability index (SRI) is proposed, a numerical value meant to represent how
closely a given compound patterns semantically with other, similar compounds. It is argued that
the SRI may be used to further distinguish between otherwise identical compounds.
Chapters 5 and 6 focus on the unexpressed relations held between a compound’s elements.
Chapter 5 opens with an in-depth look at previous research on the topic and offers a detailed
9
comparison of sixteen different works on compound relations. Based on this research, fifteen
basic relations are proposed and described in detail with support from the French compounds
retained from Wiktionary. Chapter 6 examines the result of the application of these relations,
including their frequency and distribution across types, and subsequently discusses the
compounds that elude this type of analysis.
Finally, in Chapter 7, I offer a synthesis of the features retained. Semantic transparency is
formalized as the interplay between the features discussed in chapters 4 and 5, each of which
may be weighted according to their impact on a compound’s meaningfulness out of context. It is
here that a typology of semantic transparency is proposed. The remainder of the chapter re-
examines the data in light of this typology and offers a final ranking of compounds, the result of
which is a more granular approach to the concept of semantic transparency and one that better
reflects the numerous factors involved in how a compound’s meaning is established.
The thesis concludes in Chapter 8 with a look to the future and other potential avenues worth
investigating if the typology proposed in this work is to be both improved and expanded upon.
While several other features and factors are discussed, it is clear that the next step should be to
test the typology with speakers, a measure that would highlight any flaws in the proposed
model, as well as offer insight into how it could be improved.
10
Chapter 2
On Semantic Transparency
As was briefly mentioned in the introductory chapter, the concept of semantic transparency is
discussed or alluded to in a wide range of works in semantics, morphology, and lexicology, and
in recent years, has in fact been at the centre of a number of experimental studies in
psycholinguistics. Unlike many other linguistic concepts, however, there does not exist one
standard definition of semantic transparency. In fact, while many of the definitions formulated
over the years share some degree of similarity, often overlapping in crucial ways, it is not
uncommon to come across contradictory or confusing descriptions of the phenomenon. The goal
of this chapter is to propose a working definition of the concept that will serve to better orient
the work that will follow. To this end, this chapter will endeavour to highlight the theoretical
similarities and differences found in previous works on semantic transparency, thus serving to
build a case for my own definition of the concept.
Although a great deal of the work reported on in this chapter is from the field of
psycholinguistics, it should be noted that the discussion of this research is not meant to anchor
the present thesis in that field. Rather, it is meant to show that, on the one hand, semantic
transparency is a linguistic concept with psychological corollaries and, on the other hand, the
experimental work conducted on compound processing can serve to lend support to a theoretical
model of transparency. On occasion, it will be necessary to speculate on some of the cognitive
and psychological aspects behind the model advanced in the present work, but this is done with
the understanding that these assertions are hypothetical in nature.
In Section 2.1, I first make a few points regarding the use of the terms transparency and opacity,
as well as lay out some of the chief problems related to their usage. In Section 2.2, I discuss
some of the experimental work done on compounding in which transparency has been a key
feature. Section 2.3 focuses on the various definitions and hierarchies of semantic transparency
found in the literature. This is then followed in Section 2.4 by my own proposal of a working
11
definition of semantic transparency, which will serve to frame the approaches and proposals
argued for in the following chapters.
2.1 Semantic Transparency: Preliminaries
Although semantic transparency is at the heart of a considerable number of research projects,
the term is often used casually, as if it were a concept that requires no explanation. As this
section will show, however, the lack of detail that accompanies the use of the term often raises a
number of questions as to the exact nature of the phenomenon.
2.1.1 A Brief Word on Transparency vs. Opacity
The literature is occasionally split on the terms to be used: should we talk about semantic
transparency or semantic opacity? Although Cruse (1986) uses both the term semantic
transparency (or more precisely, “semantically transparent”) and semantic opacity, he clearly
favours the latter in his discussion of the concept. He reserves semantic transparency only for
those expressions that are said to be the binary complements of the semantically opaque ones.
Similarly, Gross (1996) prefers the term opacity to transparency, though he also uses the term
transparency throughout his work on fixed expressions. Again, the terms are used as
complementary antonyms. A priori, nothing seems to hinge, however, on the selection of one
term over the other: fundamentally, it is a matter of perspective.
The fact of the matter is that both terms appear frequently across a variety of works covering a
number of different languages. One must, however, choose a label for the concept, and it would
seem that transparency is the preferred designation: the literature favours transparency over
opacity, even if both terms are in fact used interchangeably. The reader is invited to consult the
bibliography at the end of the present work to confirm that this is in fact the case. Furthermore,
the choice of transparency as the umbrella term is also justifiable based on the fact that opacity
tends toward the absolute. In other words, calling something opaque leaves little room for
degree or nuance. The term opacity should be reserved only for those lexical items or phrases
that cannot be understood without a priori knowledge of their meaning. Simplex words, for
instance, are opaque: it is the essence of Saussure’s arbitrariness of the sign. Compounds,
however, may allow for meaning to be construed, which renders them transparent, albeit not all
12
to the same degree. It is with this in mind that I therefore favour the term semantic transparency
when discussing the concept as whole, that is to say, when referring to the phenomenon as a
feature of complex expressions. The terms transparency and opacity will be treated as
complementary antonyms, such that if an expression is said to be transparent, it cannot be
opaque, and vice-versa. This does not, however, prohibit the use of the term transparency in a
graded manner, whereas two expressions may be said to exhibit differing degrees of
transparency. The same cannot be said for opacity, as the use of the word will be reserved for
absolute cases, where meaning is not apparent given the construction’s form.
2.1.2 A Paucity of Description
In their chapter on morphology, Dirven and Verspoor (2004) offer the following assertion when
comparing compounds and syntactic groups: “On the whole, compounds are like simple words,
but in spite of their idiosyncratic meaning, the meaning of a compound is to a large extent
transparent” (57). Although the word transparent also appears earlier in the book, it is not
defined until much later and even then, the explanation is perhaps too brief to provide a full
account of the matter. According to Dirven and Verspoor, for a complex word or construction to
be transparent, its parts must be “recognizable in a larger unit” (2004: 222). While this may be a
necessary condition of transparency, it cannot under most circumstances be a sufficient one. In
the compound éléphant blanc (eng. white elephant: ‘an object or scheme with little use or value’),
for instance, the constituents are quite obviously recognizable, yet one would be hard pressed to
claim that it is also transparent. The above characterization of transparency, however, is the
closest Dirven and Verspoor get to defining the concept.
Many of the earlier references to semantic transparency in the literature are made in a similar
offhand manner, that is to say without much, if any, explanation of how the term is being used
or what it means in a given context. Often, the use of the term is such that the reader is required
to simply view it as a very basic label for something that is tacitly understood or for which
meaning has long been ascribed. While there is arguably something intuitive about the use of the
term, it remains surprising that its formalization hasn’t received more attention. For instance,
Herbst (1996), working on collocations, states that “the transparency of a combination does
indeed depend on the meaning attributed to its constituents” (386), but nothing more is said
about how this dependency is evaluated. Similarly, Downing (1977), in her work on the
13
formation of novel compounds, claims that “because compounds are considerably more
transparent semantically than novel monomorphemes, compounds are ideally suited to serve as
ad-hoc names” (837). While her statement may in fact be uncontroversial, very little is said
about just what makes a compound inherently transparent.
While great strides have been made in recent years to further expand on the concept, most
evidently in research surrounding semantic transparency as a factor in language processing, it is
still surprisingly common to find the term used as a label without any clear account of its
application. Jarema et al. (1999), for instance, designate their compounds and their constituents
as either transparent or opaque, but do not explain how these labels are assigned. Nor do
Dohmes et al. (2004) offer any explanation or description of the methodology adopted for the
categorization of their stimuli, which is done using the terms semantically transparent and
semantically opaque. Kehayia et al. (1999), in their study of the processing of Greek and Polish
compounds, examine what they claim are “transparent compounds that are fully compositional
in meaning,” but provide no further commentary on how this fact was established. Of course, it
is not immediately clear that the absence of a definition is in fact a problem, given that there is
genuinely something intuitive about the labels “semantically transparent” or “semantically
opaque.” Issues may arise, however, when attempting to replicate these studies or to apply these
labels independently. How can we be sure that transparency for one author means the same
thing for another? Even when the term is defined, questions relating to its application may still
be raised. For instance, Sandra (1990), though he defines semantically transparent compounds
as those whose “meaning is related in an obvious way to their constituent meanings” (531), does
not make clear just how obvious this relationship should be. For opaque compounds, this
relationship is said to be “obscure,” but again, at what point does a constituent’s meaning go
from obvious to obscure? Sandra in fact measured semantic transparency by asking individuals
to provide paraphrases for a series of compounds and used the frequency with which their
constituents were present in the paraphrase as an indicator of transparency. He does not,
however, specify what the frequency cut-off was.
Another case of terminological ambiguity can be found in Spalding and Gagné’s (2007) study of
novel compounds, in which they claim that their two-word expressions were all transparent
(“All target combinations were novel, transparent, modifier-noun combinations” (32)), but for
which they offer no explanation as to what this transparency entails. The absence of a definition
14
of the concept is significant for two reasons. First, they use novel compounds heretofore
unknown to English speakers (e.g. chocolate book, wool basket, tire tree). These combinations
therefore do not possess definitions, at least not in the traditional sense. In other words,
transparency is evaluated according to the meaning assigned by the researchers. In the case of
novel compounds, where meaning is largely determined by the coiner, it seems far more
appropriate to talk about meaning predictability rather than transparency (i.e. how likely is it
that a chocolate book means ‘a book about chocolate’ and not ‘a book that looks like chocolate’,
cf. Štekauer 2005). The second potential issue is in fact related to the first and has to do with
how Spalding and Gagné (2007) assigned meaning to their test items. In short, meaning was
established using data collected in an earlier study (Gagné et al. 2005), in which participants
were asked to choose between two pre-established definitions for each compound instead of
providing their own. Definitions were then labeled as either dominant or subdominant based on
the participants’ preferences. But just how were these two particular definitions chosen? As
Gagné et al. (2005) state: “Two definitions were constructed for each item. These definitions
represented what we thought (based on our intuition as well as input from research assistants
and students in our laboratory) would be the two most likely interpretations for the item” (208).
While their strategy may not be entirely misguided, it does have the potential to skew
transparency effects given the limited options available to participants. For instance, the two
provided definitions for paper stand were either ‘a stand for paper’ or ‘a stand made out of
paper’. It should come as no surprise that participants were strongly drawn toward the first
definition (85% chose ‘a stand for paper’ as the dominant meaning). What if instead of ‘a stand
made out of paper’, Gagné et al. had offered participants ‘a stand for selling paper’ (cf.
newspaper stand)? Would the results have been any different? Moreover, in some cases, the
dominant-subdominant distinctions were in fact borderline. For instance, when presented with
the novel compound wool basket, only 51% of participants preferred ‘a basket for wool’ over ‘a
basket made out of wool.’ Once again, because novel compounds do not possess established
designata, interpretation will often vary between speakers. As Gagné et al.’s (2005) work shows,
some combinations may not elicit any preference or may only show a clearly dominant sense
when presented alongside a far less probable one.
To be sure, there are a number of similar studies where the terms transparency and opacity are
used alongside some form of explanation or definition (Marslen-Wilson et al. 1994, Zwitserlood
15
1994, Libben et al. 2003, Feldman et al. 2004). The descriptions offered by these researchers,
while seldom identical, do in fact show a great deal of common ground across the various
interpretations. My contention, however, is that semantic transparency, especially in the case of
compounding, remains an insufficiently developed concept. The next few sections will look
more closely at the manner in which many researchers have broached the topic and will serve to
emphasize some of the key elements at issue when discussing semantic transparency.
2.2 Semantic Transparency: Experimental Studies
A number of experimental studies conducted over the years have looked at how
morphologically complex words are both accessed and stored. Two early theories emerged from
this research: either complex words are stored as whole units and thus accessed once their word
boundaries have been identified (so-called full storage theories, cf. Rubin et al. 1979,
Butterworth 1983), or they are perceived as amalgamations of morphemes and treated as such,
which is to say that they are first decomposed into their smaller constituents before lexical
access is performed (so-called decomposition theories, cf. Taft and Forster 1975, Taft 1981). In
an attempt to reconcile these two extreme positions, a third, arguably more defensible proposal
has also been advanced, one that argues for a theory of dual access in which complex words
trigger full storage access and decomposition simultaneously. The mental lexicon would thus
consist of both complex words and bound morphemes, as well as a mechanism allowing for
some degree of interaction between the two. Consequently, a number of models were developed
to explain how and when these units are accessed. In the Augmented Addressed Morphology
model (AAM, Caramazza et al. 1988, Chialant and Caramazza 1995), for instance, only
unfamiliar words are guaranteed to be decomposed because whole word activation is taken to be
quicker when the word is known, whereas in the Morphological Race Model (MRM,
Frauenfelder and Schreuder 1992, Schreuder and Baayen 1995), the successful processing
method is based on a number of factors such as lexical frequency and affix productivity.
Although these particular models focus on affixal morphology, the fundamental issues related to
lexical access via either decomposition or full access remain relevant for compounding.
Another factor said to influence how morphologically complex words are processed is semantic
transparency. In their seminal article on access representation, Marslen-Wilson et al. (1994)
looked at whether semantic transparency had an effect on lexical decision tasks for derived
16
words. They contrasted between transparent derivations (e.g. unhappy, punishment, friendly)
and opaque ones (e.g. department, casualty, release), stating that “a morphologically complex
word is semantically transparent if its meaning is synchronically compositional” (Marslen-
Wilson et al. 1994: 5). Their stimuli were also controlled for phonological alternations (i.e.
chaste / chastity). Using a series of cross-modal priming tests in which participants first heard a
prime, then saw a related probe on screen, their results led them to conclude that opaque
complex words are stored as single entries, while transparent ones are accessed via
decomposition. This conclusion is supported by significant priming effects for semantically
related pairs such as punishment (prime) / punish (target) and little effect for unrelated pairs
such as casualty (prime) / casual (target). A number of studies have since added additional
support to a theory of lexical processing that includes semantic transparency as a factor (Roelofs
and Baayen 2002, Feldman and Pastizzo 2003, Rastle and Merkx 2011), though a number of
other factors such as family size (Schreuder and Baayen 1997, Bertram et al. 2000, Juhasz and
Berkowitz 2011), word frequency (Baayen and Lieber 1996, Hay 2003, Ford et al. 2010), and
even the type of tasks involved (Feldman et al. 2004) are also said to be at play. Researchers
have also looked at the effects of semantic transparency in idiom interpretation (Flores d’Arcais
1993, Tabossi et al. 2008, Libben and Titone 2008).
The studies conducted on derived words have all treated semantic transparency as a
phenomenon based on compositionality (this distinction will be addressed in Section 2.4.1).
Bearing in mind that affixes generally comprise of a closed class with considerable meaning
predictability at the output of word formation (Aronoff 1976), such an approach is arguably
appropriate in derivational morphology1. The same, however, cannot be said of compounds
given the fact that the meaning of the whole may be related to those of its components in a
number of different ways (cf. milkman ~ snowman ~ garbage man). Libben (2006) thus
suggests that compounds offer a unique means to further the discussion on computation vs.
storage in language processing because they involve a number of factors absent elsewhere.
1 This is not to say that affixes are without their own analytical issues. A few examples of morphological and
semantic issues related to affixes, as noted by Aronoff (1976), are homonymous affixes (e.g. baker vs. cooker) and stems for which it is difficult to assign meaning (e.g. –mit in permit, remit, submit, etc.).
17
Libben (1998) makes clear just what it is about compounds that makes them ideal candidates for
the study of lexical processing:
Compound words present a paradox for models of morphological representation and processing. On the one hand compounding is a very productive morphological process, so that in a language such as English, the probability of encountering a novel compound form (e.g., SLUSHFOAM) is very high. Because such forms are easily comprehended and because this comprehension can only be achieved through the meanings of the compound’s constituents, these forms seem to be ideal candidates for routinized morphological decomposition. On the other hand, however, compounds are perhaps the multimorphemic forms that are most sensitive to semantic drift and thus frequently show high degrees of semantic opacity. It is this opacity that would thwart a routinized morphological decomposition procedure.” (Libben 1998, 34-35)
The question is therefore whether compounds such as milkman and snowman are stored as
unique entries, and thus accessed as such, or if their meaning is computed via the entries no
doubt stored for their components (/snoʊ/+/mæn/ and /mIlk/+/mæn/ respectively). For a number
of researchers, part of the answer to this question lies in how semantically transparent or opaque
these constructions are. The following section will look at some of the studies that have focused
on semantic transparency as a key component of compound access and processing.
2.2.1 Studies of Semantic Transparency and Compounding
The interpretation of a novel compound can be said to rely extensively on a speaker’s semantic
knowledge of the compound’s constituents–in most cases, speakers seem to be aware that
compounds consist of discrete units and that meaning composition may apply. Gleitman and
Gleitman (1971) showed that when presented with somewhat novel three-word compounds,
speakers consistently exploited the meaning of their constituents in order to provide the
interviewer with a paraphrase. For instance, the median definitions provided by each of their
three groups for the compound glass-bird house were as follows: i) a house which is lived in by
glass-birds, ii) a bird-house that’s made of glass, or iii) a house made of glass for birds. While
the proposed paraphrases sometimes violate the accepted rules and principles that govern
compounds (i.e. embedded head in (ii) and (iii)2), it is clear that the participants attempted to
2 Gleitman and Gleitman (1971) state that these incorrect paraphrases are due to errors of stress. Because English
compounds are usually stressed on the left-hand constituent, participants should be able to identify the internal organisation of the complex compound. The hyphen indicates how the stimuli was presented to the participants,
18
define novel compounds by means of semantic compositionality, often bridging gaps with extra-
linguistic knowledge even if this required that concepts be applied in unintuitive ways (e.g. a
bird made out of glass). We might therefore ask ourselves if compounds are always decomposed
(i.e. are their internal constituents always available to the speaker?) and if the meaning of its
constituents is always a factor in its interpretation? These questions are in fact related to the
concept of semantic transparency, though perhaps only laterally, because a widely held position
seems to be that the semantic transparency of a morphologically complex word is based on
whether the meaning of its constituents is present in the meaning of the whole.
Ryder (1994) conducted similar research on novel two-word compounds. In the surveys Ryder
administered to participants, she asked them to provide paraphrases for a variety of novel NN
English compounds. These novel constructions were all based on existing patterns in the
language. Some of the novel compounds Ryder looked at were constructed using frequent
“core” words, that is to say words that figure prominently in a number of English compounds
(e.g. board, box, man, etc.); others reflected looser recurring associations (e.g. X + LOCATION, X
+ CONTAINER, ANIMAL + ANIMAL, etc.). Ryder discovered that when presented with highly
probable and frequent associations, speakers offered a highly homogenous set of responses. For
instance, all participants paraphrased bean-garden with some variant of ‘garden containing
beans,’ while few participants offered the same type of definition for table-field or elephant jar.
The latter compounds, also cases of the X + LOCATION pattern according to Ryder, were
paraphrased based on other relationships such as similarity (e.g. ‘field as flat as a table’; ‘jar
shaped like an elephant’). Her results suggest that speakers are keenly aware of the semantics
held between a given compound’s elements and that prior knowledge is a factor in its
interpretation. Ryder also found a correlation between how homogeneous participants’
responses were based on how semantically established the pattern was for attested compounds.
Thus, 90% of responses to novel ANIMAL + ANIMAL compounds followed the ‘animal X like
animal Y’ frame, which reflects nearly all attested compounds of this type (e.g. zebra fish,
catfish, bull moose, etc.). Furthermore, when participants were asked to paraphrase novel
that is, with internal stress on GLASS. The result, in any case, is that some participants failed to identify the head of the embedded compound.
19
compounds based on substituted core words, their responses were often related to existing
compounds. For example, most participants interpreted milk-woman as ‘a woman who delivers
milk,’ no doubt basing their interpretation on the established compound milkman. Not
surprisingly, when the meaning of the existing compound was less obvious, participants
provided far more heterogeneous paraphrases (e.g. needlelizard based on needlefish). Her
findings suggest that speakers, when faced with novel compounds, will in some cases engage in
metalinguistic operations and attempt to establish meaning using their knowledge of existing
compounds. There are, however, a few shortcomings to Ryder’s work, due in large part to her
rather ambiguous stance on transparency and headedness. According to Ryder, “most
established compounds are fairly semantically transparent” (1994: 146), a view that has her treat
a number of dissimilar compounds as analogous (e.g. fireant and firefly, on which she bases the
novel compound fire-spider). The result is that not all participants base their responses on the
same established compound when attempting to interpret a novel variant, but with no clear
reason as to why (i.e. is fireant more or less transparent than firefly?). Moreover, Ryder’s core
words—the recurring word for a given pattern—were not controlled for their position within the
compound. Thus, compounds such as fishpond and catfish are said to be based on the same core
word (i.e. FISH), yet its role within each respective compound differs significantly (i.e. a catfish
is a fish, but a fishpond is a pond). Nevertheless, Ryder’s work offers support for a view of
compound processing that involves both internal word recognition and prior compound
knowledge. Her work will therefore be discussed in greater detail in Chapter 4, some of which
will be adapted in the context of the present research.
Gleitman and Gleitman’s (1971) and Ryder’s (1994) studies were both qualitative in nature:
they solicited responses from their subjects, the results of which were then examined for
patterns and error types. Most of the research on semantic transparency or compositionality,
however, has focused on lexical decision tasks that rely heavily on the reaction times of
participants. These quantitative experiments usually involve a priming element, related in some
fashion to the target word (or non-word) that participants are asked to evaluate. Results from
these experiments are unfortunately varied. While there is evidence that semantic transparency
plays some role in compound interpretation, a number of studies have produced results that
weaken many claims of transparency effects.
20
Taft and Forster (1976), following their earlier research on the processing of derived words
(Taft and Forster 1975), conducted a study on compounds in order to determine how speakers
process compound non-words. Using a lexical decision task, they found that complex words are
accessed primarily via their initial constituents. For instance, non-word compounds that
contained a real word in initial position (e.g. footmilge) took longer to classify as non-words
than those containing a real word in final position (e.g. trowbreak). Moreover, reaction times to
non-word compounds in which both constituents were also non-words (e.g. mowdflisk) were no
faster that ones containing a real word in final position only (e.g. trowbreak). Perhaps even more
revealing, however, is that the non-word compounds that produced the slowest reaction times
were those that consisted of two real words (e.g. dustworth, taxbrief). Participants identified
these items as non-words, but did so more slowly than for compounds containing at least one
non-word. Based on these results, Taft and Forster argued for a theory of compounding in which
decomposition is not only largely obligatory and automatic, but also governed by the lexical
status of the first constituent. They do not, however, speculate as to how this approach would
apply to non-words containing real words and for which meaning is either plausible (e.g.
chaircloth) or implausible (e.g. chairbird). Monsell (1985) obtained similar results for pseudo-
compounds, but also found that either constituent could be primed, regardless of its position in
the target, lending further support to automatic decomposition during lexical processing.
Sandra (1990), looking to establish whether all compounds are indeed automatically
decomposed during lexical processing, tested semantic priming effects for opaque (e.g.
buttercup, milky way) and transparent (e.g. butter dish, campfire) Dutch compounds, as well as
for pseudo-compounds in which one constituent looks like a lexeme but isn’t (e.g. boycott; cf.
cranberry morphs in Aronoff 1976). According to Sandra, because opaque and pseudo-
compounds have more in common with simplex words than they do complex ones, it is likely
that they are stored and accessed as single units. Conversely, transparent compounds should be
more susceptible to facilitation effects from priming because both constituents are present in the
meaning of the whole. As was mentioned earlier, Sandra does not go into great detail about what
exactly a transparent or opaque compound entails, but it is understood that an opaque compound
is one where the constituents have no bearing on the meaning of the whole, whereas they both
do for transparent ones (see Section 2.1.2 for comments on how this was established). In
Sandra’s first experiment, compounds were preceded by either semantically related or unrelated
21
primes. For instance, buttercup (opaque) could be primed by bread (targeting the constituent
butter), while toadstool could be primed by table (targeting the constituent stool). The
compounds used were divided into two groups based on the targeted constituent (i.e. initial or
final). For opaque compounds, as well as pseudo-compounds, primes were only semantically
related to the target constituent in isolation. Results revealed that there was no significant
facilitation effect for either opaque or pseudo-compounds regardless of the position of the
targeted constituent or the relatedness of the prime. In other words, opaque and pseudo-
compounds showed no facilitation effects to constituent priming. Sandra therefore concluded
that these particular compounds were not subjected to decomposition during access. The results
from a similar test conducted with transparent compounds (e.g. butter dish, campfire) showed
that prime type was a significant factor in reaction times. Participants reacted faster when the
prime was semantically related to the target than when it was unrelated. In a third experiment,
only the final constituent of both transparent and opaque compounds was targeted. Sandra found
that prime type (related-unrelated) and compound type (transparent-opaque) interaction showed
only borderline significance. He therefore suggests that this weak result was due to inter-item
variability, that is to say that not all transparent compounds were equally transparent. This leads
him to mention an often neglected, yet fundamental distinction:
This might be related to a difference between the notions ‘transparency’ and ‘compositionality’. Whereas the former notion refers to the relationship between compound and constituent meanings, the latter refers to the possibility of determining the whole-word meaning from the constituent meanings. (Sandra 1990: 550)
Once again, this distinction is most likely an important one and will be discussed in greater
detail in Section 2.4.1.
Similarly to Sandra (1990), Zwitserlood (1994) also looked at priming effects for Dutch
compounds using a series of lexical decision tasks. He classified his compounds as fully
transparent, partially opaque, or fully opaque. Like Sandra, Zwitserlood assessed compound
transparency using a pre-test that asked subjects to rate the semantic relatedness of a
compound’s constituents to the meaning of the whole. Subjects used a scale of 1 (= very
unrelated) to 5 (=very related) to evaluate the items. Only those constructions that received a
mean rating above four were deemed transparent. These rated compounds served as primes (e.g.
church organ) and were followed by targets identical to one of the prime’s constituents (e.g.
22
either church or organ). Results showed that both transparent and opaque compounds primed
their constituents, suggesting that speakers do in fact access, at some level, the components of
even the most lexicalized compounds. Zwitserlood also discovered that semantically transparent
compounds primed their second constituent more than their first, a result that contrasts with Taft
and Forster’s (1976) earlier findings for non-word compounds. He suggests that headedness
might prove to be a factor in compound interpretation, given that Dutch compounds are right-
headed. Furthermore, when Zwitserlood used semantically related targets (as Sandra 1990 did),
he found significant priming effects for both constituents of transparent and partially opaque
compounds, but none for fully opaque and pseudo-compounds. His results support Sandra’s
(1990) findings for pseudo-compounds, but not for partially opaque ones. This is largely due to
a difference in nomenclature: what Sandra calls an opaque compound, Zwitserlood calls a
partially opaque compound (e.g. a compound such as jailbird is considered opaque in Sandra,
but partially opaque in Zwitserlood). If Zwitserlood’s results are correct, they suggest that
speakers do in fact view compounds in which one constituent retains its meaning differently
from those in which neither component contributes to the meaning of the whole. What
Zwitserlood failed to control for, however, is headedness for partially opaque compounds (i.e.
those for which the head is the meaningful constituent versus those for which only the modifier
retains its meaning). Other researchers have since looked to explore the effects of headedness on
compound processing.
Jarema et al. (1999), following Libben’s (1998) proposal of a typology of semantic transparency
(see Section 2.3.3 for a detailed discussion of his approach), tested participant reaction times for
French compounds in order to address a previously ignored factor in compound interpretation,
which is to say headedness. While French compounds are primarily left-headed, there are a
number of right-headed cases. This fact allows for the priming of constituents based not only on
their position serially, but also according to the position of the head. They were thus able to test
for effects of constituent transparency along a wider distribution of parameters. They tested the
following five combinations: TT, TOL, OTR, TO, OO3, where T stands for transparent, O for
3 The examples Jarema et al. (1999) give for each combination are as follows: TT (haricot vert, “green bean”), TO
(argent liquide, “cash”), OO (éléphant blanc, “white elephant”), OTL (garçon manqué “tomboy”), and OTR (grasse matinée, “sleep in”).
23
opaque, and the subscript letters indicates headedness (Left or Right). Their experiment
consisted of a lexical decision task where either of a compound’s constituents, along with
unrelated lexemes, served as primes. Jarema et al. found that both initial and final constituents
showed significant priming effects, regardless of their transparency rating, but that the effect
was more pronounced for the initial constituent. They argue that these results reflect French’s
tendency for left-headed compounds. Of the four compound types tested, only OTR differed in
reaction time: whereas all other combinations saw a lower mean reaction time when the first
constituent was primed, OTR compounds were recognized faster when their final constituent
was primed. Again, Jarema et al. take this particular result as additional evidence that
morphological headedness plays a role in compound interpretation. Interestingly enough, they
did not test any right-headed TT compounds (e.g. auto-école, radio-taxi). If priming the
transparent head does in fact improve recognition reaction times, these compounds should
pattern like their OTR stimuli. Finally, Jarema et al. also found that comparing participant
reaction times based on the transparency of constituents (i.e. TT and TO versus OO and OTL for
initial constituent priming and TT and OTL versus OO and TO for final constituent priming)
revealed no significant effects. They do not offer any explanation for the absence of effect, but
they nevertheless claim that transparency is a factor in compound processing based on results
obtained for Bulgarian constructions also included in the same paper. They found that for
Bulgarian compounds, priming the second constituent of a right-headed TO compound showed
weaker priming effects than for its first constituent, which Jarema et al. took as evidence that
headedness affects semantic transparency at some level.
Libben et al. (2003), looking at English compounds, also found a similar effect in their own
series of lexical decision tests. Overall, reaction times for TT (e.g. car-wash) and OT (e.g.
strawberry) compounds patterned together, as did those for OO (e.g. hogwash) and TO (e.g.
jailbird) compounds. These results are somewhat contradicted in a study by Kehayia et al.
(1999), however, where they found that priming either constituent improved word recognition
times for both Greek and Polish compounds, but that priming the initial constituent showed a
greater overall effect. Their results also conflict with the findings in both Jarema et al. (1999)
and Libben et al. (2003) because, as in Bulgarian and English, Greek and Polish compounds are
primarily right-headed. Regardless, these cross-linguistic studies lend some support to a theory
of compounding in which the transparency of the head plays a role in how the speaker processes
24
compounds: reaction times during lexical decision tasks seem to pattern together based on the
transparency of the compound’s head. One key issue, however, is that headedness in both
Jarema et al. (1999) and Libben et al. (2003) is defined purely in terms of lexical category, a
consequence of claiming that the head of a given compound can in fact be opaque (i.e. car-wash
and garçon manqué are traditionally labeled as exocentric in nature4). Thus, in both cases,
semantic transparency is understood as a consequence of constituent meaning. This might also
explain, to some extent, the differing results obtained in Kehayia et al. (1999) as they only
looked at “transparent compounds that [were] fully compositional in meaning” (371).
Dohmes et al. (2004), instead of using lexical priming tests, used a picture naming task to
determine if constituent meaning had any morphological priming effects in German compounds.
In two related experiments, participants were asked to name pictures after having first been
presented with either related or unrelated compounds (distractors). For instance, the participant
might see either wildente ‘wild duck’ (transparent), zeitungsente ‘false report’ (opaque), or
honigwabe ‘honeycomb’ (unrelated), which was then followed by a picture of a duck (target =
de. ente). In all instances, semantically transparent and opaque compounds revealed nearly
identical facilitatory effects, that is to say, reaction times to the picture naming task were
reduced when participants were presented with compounds that contained the target morpheme,
regardless of its meaning within the compound. Dohmes et al. argue that semantic transparency
plays only a minor role in compound comprehension, but concede that this may not remain true
during actual language production. They also suggest that the absence of a transparency effect
may also be due to the fact that the target constituent was always in head position, arguing that
competition at the lemma level could generate interference during the picture naming task. They
therefore conducted a third experiment with compounds that contained the target morpheme at
onset and found that although transparent compounds showed a slight improvement in reaction
times over opaque compounds, the difference remained marginal overall. Based on these results,
Dohmes et al. maintain that semantic transparency is not a factor in compound processing.
4 The compound [[garçon]N [manqué]A]N is a noun, which is why Jarema et al. (1999) treat it as left headed.
According to traditional approaches to centricity, however, the head must be a hypernym of the compound, which is why it may also be regarded as exocentric.
25
Pollatsek and Hyönä (2005) employed yet another type of experiment on compound processing:
a reading task in which Finnish compounds were embedded into sentences and participants were
monitored using both eye and head tracking cameras. Transparent compounds were compared to
opaque compounds and were rated using a pre-test similar to those in Sandra (1990) and
Zwitserlood (1994). It is worth noting, however, that opaque compounds were either opaque at
the whole-word level (OO) or at the first constituent only (OT). Compounds were matched for
frequency, but first constituent frequency was manipulated (either low or high). Participants
were asked to read the sentences that appeared on screen, which they were then asked to
paraphrase. Pollatsek and Hyönä found that the frequency of the first constituent had a
significant effect on gaze duration: participants stared longer at the compound with a low
frequency first constituent. Gaze duration analysis also showed that transparency seemed to
have little to no effect on participants’ traversal of the sentence. Because these compounds were
embedded in very different sentence frames, they conducted a second test in which transparent
and opaque compounds, matched for frequency, were inserted into the same sentence. Although
gaze duration for transparent compounds was slightly lower than for opaque ones, the difference
was not significant. Pollatsek and Hyönä interpret these results as evidence that semantic
transparency plays little role in compound processing. A series of similar sentence reading
experiments conducted by Frisson et al. (2008) produced similar results for English compounds.
Their stimuli involved all possible permutations for constituent transparency (i.e. TT, OT, TO,
OO) and were embedded into sentences, which participants were asked to read. Overall, the
semantic transparency of the compounds’ constituents had no significant effect on gaze
duration.
Studies that have used experimental data to evaluate whether semantic transparency is a factor
in compound processing have produced mixed results. While studies involving lexical decision
tasks show evidence that constituent transparency does affect recognition times (Taft and
Forster 1976, Sandra 1990, Zwitserlood 1994, Jarema et al. 1999, among others), others show
limited support for transparency effects (Dohmes et al. 2004, Pollatsek and Hyönä 2005, Frisson
et al. 2008). While the results of these studies are not truly homogeneous, they do suggest that
speakers are aware of a compound’s internal structure, even in cases where constituent meaning
may be obscure. Overall, however, compounds comprising of opaque constituents are not
typically treated in the same fashion as those with transparent constituents. Moreover, there
26
seems to be some degree of interplay between transparency and headedness, though, here again,
the evidence is mixed. While Jarema et al. (1999) and Libben (2003) found that a compound’s
head plays a greater role in participant’s recognition times, Kehayia et al. (1999) did not see
similar effects with Greek and Polish compounds. As for those studies in which semantic
transparency was said to be irrelevant during compound processing, two were based on reading
tests using eye-tracking devices (Pollatsek and Hyönä 2005, Frisson et al. 2008). These varied
findings may stem from differences in the experimental paradigms used or in the types of
constructions examined (i.e. existing compounds, non-words, pseudo-compounds), or even in
the languages under investigation.
Another explanation for the wide range of results obtained in the above studies comes from the
rather liberal approach to transparency adopted by a number of authors. They do not all view
semantic transparency in the same way and many do not explicitly lay out what it is that makes
a compound either transparent or opaque. In most cases, constituents are said to bear the lion’s
share of compound transparency, yet little is said about just how much meaning must be
retained for a constituent to be transparent. This is largely due to the absence of a rigorous
definition of the concept. The remainder of this chapter will attempt to address this issue by
proposing a working definition of semantic transparency, one that will allow for a more
thorough definition to be advanced.
2.3 Semantic Transparency: Definitions and Models
Despite some of the criticism offered in the previous sections regarding the rather limited
description of transparency in the literature, a number of definitions of the concept have in fact
been proposed. Some of these definitions are explicit, while others must be surmised from a
variety of peripheral indications mentioned by the authors. On occasion, an author will offer
more than just a definition of semantic transparency and will also provide the reader with a
means to interpret the varying degrees of transparency exhibited by complex constructions.
Some of these hierarchies take on the form of clines comprised of discrete points on a linear
scale, while others are presented as a continuum for which transparency is a scalar phenomenon.
The following sections will explore a few of the definitions that have been proposed in the
literature, as well as the various models advanced to classify compounds and other complex
constructions according to their degree transparency.
27
2.3.1 Some Definitions
The list of definitions that follows is by no means exhaustive, but it does highlight some of the
similarities, as well as some of the differences, between them. Many of these definitions have
been applied to derived words, some to idioms, and others to compounds. I’ve chosen to present
them together so as to show the degree of overlap that exists between them, despite their
applications to different types of complex constructions. It is my position that although a
unifying definition may in fact be possible, I will concentrate solely on compounds when later
formulating a working definition of transparency5.
(3) For derived words:
a. “A morphologically complex word is semantically transparent if its meaning is
synchronically compositional” (Marslen-Wilson et al. 1994: 5).
b. “Semantically transparent words can be fully understood given the meaning of the affix
and the meaning of the base” (Baayen and Lieber 1996: 283).
c. “A morphologically complex word is semantically transparent if its meaning is
compositional” (Roelofs and Baayen 2002: 132).
d. “For both complex and compound words, those that retain the meaning of the base
morpheme are semantically transparent relative to opaque or partially transparent
relatives whose meanings tend to be more remotely related to that of the base” (Feldman
et al. 2004: 18).
(4) For idioms:
a. “In transparent idioms, [. . .] the literal meaning is available, whereas in an opaque
idiom [. . .] the literal interpretation is no longer available or has never been or is not
even possible” (Flores D’arcais 1993: 80).
5 Some definitions are in fact formulated around the term opacity and not transparency. See Section 2.1.1 for
additional information regarding this distinction.
28
b. “In these [compositional and transparent] idioms, there are one-to-one semantic
relations between the idiom’s words and components of the idiom’s meaning”
(Glucksberg 1993: 17).
c. “[An idiom’s] opacity (or transparency)–the ease with which the motivation for the use
(or some plausible motivation–it needn't be etymologically correct) can be recovered”
(Nunberg et al. 1994: 498).
d. “A transparent expression is an expression for which we understand its meaning. A
difficult (if not impossible) expression to understand is opaque6” (Svensson 2004: 98,
my translation).
Alternatively: “If, when presented with an expression, a language user understands it
without any problems, without any other previous knowledge than understanding the
separate words that make up the expression, then it is transparent” (Svensson 2008: 84).
(5) For compounds:
a. “A given sequence is said to be opaque when the meaning of the whole cannot be
reconstructed from the meaning of its constituting elements7” (Gross 1996: 155, my
translation).
b. “[T]ransparent compounds, whose meaning is related in an obvious way to their
constituent meanings” (Sandra 1990: 531).
c. “The meaning of a fully transparent compound is synchronically related to the meaning
of its composite words” (Zwitserlood 1994: 344).
d. “[T]he meanings of each of the constituents are transparently represented in the
meaning of the compound as a whole” (Libben et al. 2003: 50). Alternatively,
6 “Une expression transparente est une expression dont on comprend le sens. Une expression difficile (voire
impossible) à comprendre est opaque” (Svensson 2004: 98). 7 “Une séquence donnée est dite opaque quand, à partir des sens des éléments composants, on ne peut pas
reconstituer le sens de l’ensemble” (Gross 1996: 155).
29
“semantically transparent because the meaning of the entire string can be derived from
the combination of the meanings of its constituents” (Libben et al. 2003: 51).
e. “A compound word is usually defined as transparent when the meaning of the
compound word is consistent with the meanings of the constituents (e.g., carwash). In
contrast, a compound word is defined as semantically opaque, when its meaning cannot
be constructed by directly combining the meanings of the individual constituents (e.g.,
pineapple)” (Pollatsek and Hyönä 2005: 262).
f. “[T]he meaning of both constituents is transparently related to the meaning of the
compound word as a whole” (Frisson et al. 2008: 87).
Most of the above definitions are short, succinct characterizations of transparency. They also
share a number of similarities, even across all three expression types: transparency is in nearly
all cases based on a meaning of the whole ~ meaning of the parts relationship. In fact, of the 14
definitions retained, only two seem to differ in any significant way (i.e. Flores d’Arcais in 4a
and Nunberg et al. in 4c)8, both of which are for idioms. Moreover, every description implies
that transparency is related in some way to comprehension. While these recurring themes are by
far and large the most dominant elements of the above definitions, we see a number of other
common factors at play as well, namely synchrony (3a, 4c) and compositionality (3a, 3c, 4d).
Synchrony seems like a reasonably plausible factor for transparency if we are interested in
addressing it from the perspective of comprehension. If this were not the case, we would then be
required to take into account constituent meanings that may have long since fallen out of usage,
as well as their etymology. This diachronic approach seems misguided, however, given that
these factors would depend crucially on what could conceivably be understood as historical
knowledge, possessed by only a small subset of a given linguistic community. Transparency is
therefore best interpreted as a synchronic phenomenon. As for compositionality, while it
undoubtedly plays some role in transparency, it should not be taken as the deciding factor in the
concept, as I will argue in Section 2.4.1.
8 This is not meant as an accurate reflection of the distribution of definitions as these were selected based on the
needs and scope of my stated goals and do not account for all descriptions used in the literature.
30
Despite the high level of uniformity present in the above descriptions, we do find a number of
peculiarities that show just how ambiguous the concept can be. In some cases, the definitions
are in fact tautologies (i.e. a complex expression is transparent when it is transparent, as in 4d
and 4e). In others, key words are used rather loosely, as in Nunberg et al. (1994) for idioms,
where motivation is said to be a factor of transparency. This motivation is best understood as the
speaker’s ability to “wholly recover the rationale for the figuration it involves” (Nunberg et al.
1994: 496) and is reminiscent of Saussure’s distinction between arbitrary sign and motivated
symbol. This is the approach also adopted by Svensson (2004) in her work on idioms and is a
criterion largely applied to the expression ex post facto, that is to say after the speaker has
learned its meaning. This factor may prove to be applicable to compounds as well. Taking the
compound oiseau-mouche (eng. hummingbird) as an example, one can see how motivation could
apply: even if the speaker were not able to find any semantically plausible relation between the
constituents, he or she might be able to do so once the meaning of the compound were revealed
to him. This will be touched upon again in chapters 4 and 7.
One of the aspects of semantic transparency that is seldom discussed explicitly, however, is the
inherent ambiguity present in compounds. It is in this regard that Frisson et al. (2008) offer a
more sensible and mitigated description of semantic transparency. While they agree that the
degree to which a compound is transparent is closely related to the relationship between the
meaning of its constituents and the meaning of the whole, they note that the concept remains an
inexact and ambiguous phenomenon9:
Indeed, compound words can differ in their type of opacity, as there exist compound words composed of two constituents for which either the first, the second, or both, constituents can be opaque. [. . .] However, we should quickly note that transparency is a relative concept, as even the meaning of transparent compounds cannot be unambiguously computed from the meanings of the constituents. (Frisson et al. 2008: 87-88)
For compounds, this ambiguity is largely due to the lack of predication between constituents. As
Pollastek and Hyönä (2005) plainly state: “Usually, for most transparent compound words, the
9 It is worth noting that Frisson et al. (2008) adopt the typology of semantic transparency first proposed in Libben
(1998), that is to say one based on the transparency of the compound’s constituents. This will be discussed in further detail in Section 2.3.3.
31
meaning of the word cannot be uniquely computed from the constituents, as a carwash could be
some sort of device that washes with a car; instead the meaning is usually a highly plausible
combination of the constituent meanings” (262)10. These relations must, however, be taken into
account when discussing the transparency of compounds as they no doubt affect how these
constructions are interpreted and have in the past been an important part of research on
compounding (Lees 1968, Levi 1978, Ryder 1994, Adams 2001, Lieber 2004, Jackendoff 2009).
In fact, it is precisely these relations that have proven to be the most challenging aspect of
automatic meaning generation for compounds (Lauer 1995, Rosario and Hearst 2001, Séaghdha
2008). Arnaud et al. (2008) have also argued that a loss of transparency can occur for N-N
compounds when “the relationship between the pre-modifier noun and the modified noun is no
longer explicit,” which means that “the reader/hearer must infer the relationship from the
context or [that] it must be stored in memory” (112). It is precisely the “relative” nature of
transparency that makes the use of the label in many works seem incomplete or possibly even
insufficient. The fact that the meaning of many so-called transparent compounds cannot be
unequivocally established shows that a more in depth look at the concept is not without its
merits. In fact, based on the factors above, one could reasonably argue that there are few (if any)
fully transparent compounds and that those traditionally labeled as such are just far more
transparent than others. It is precisely this broad characterization of semantic transparency, as it
is applied to compounds, that motivates the present research.
2.3.2 Transparency as a Continuum
If we return to Cruse’s work in lexical semantics, we find that he goes to great lengths to
emphasize the importance of viewing semantic transparency (or opacity as he prefers to call it)
as a concept based on degrees, that is to say, as “a continuum of degrees of opacity” (39), with
“fully transparent” at one end and “to some degree opaque” at the other. Cruse avoids using the
terms “completely opaque” and “not completely opaque” as end-points, reasoning that it allows
for a more satisfactory grouping of elements with significant similarities. While I argued in an
earlier section that the term “completely transparent” might prove to be infelicitous, there are
10
This is not necessarily the case for synthetic or verbal-nexus compounds in which the non-head is an argument of the head (e.g. truck driver, snow removal, etc.). These types are also present in French and will be discussed in chapters 6 and 7.
32
undoubtedly compounds (and arguably idioms) for which the term “completely opaque” could
be used accurately, their semantic characteristics being similar to those of simplex forms (e.g.
éléphant blanc, compère-loriot). As for the region between these two poles, Cruse claims that
the continuum possesses a “somewhat indeterminate transitional zone between opacity and
transparency” (40). This nebulous area of semi-opaqueness revolves around semantic indicators,
which, according to Cruse, can be either full or partial. Full indicators are constituents that are
uniform in their meaning, both within and without the complex expression (Cruse offers black-
and -bird in blackbird as examples of full indicators). Partial indicators, on the other hand, are
constituents that are said to only retain some of their meaning within a given complex
expression (Cruse gives -house in greenhouse as an example). The degree of semantic opacity
of an expression is therefore in part derived from the number, as well as the nature, of its
constituent indicators. Cruse also claims that the discrepancy of the combined contribution of an
expression’s indicators and its global meaning also factors into the degree of opacity, though he
does admit, perhaps rightly so, that such a discrepancy is difficult to measure. In order to further
illustrate his approach to semantic opacity, I offer the following figure, along with examples
from French:
Figure 2.1. Representation of Cruse’s (1986) continuum of semantic transparency.
Because Cruse says nothing about the position of a multi-word lexeme’s indicators, one must
assume that the only requirement for it to be located in the “indeterminate transitional zone”
(40) labeled as semi-opaque in the range above is that its indicators be partial or null11. Even if
we were to set aside the position of the indicator as a factor, there remains the question of how
exactly one is to populate this section of the line if we distinguish between constituents whose
11
Cruse calls a constituent that does not contribute semantically to the meaning of the whole an “impure tally.” I will use “null indicator” so as to retain the term indicator for all possible values for a given constituent.
Fully Transparent Semi-Opaque To Some Degree Opaque
mot-clé clé-anglaise demi-clé
33
meaning contributes only partially to that of the whole and constituents that contribute nothing
semantically. A reasonable assumption would be to class the expressions containing partial
indicators left of centre on the scale and those containing semantically null indicators to the
right of centre. This approach, though justifiable, is not without its problems, however. Because
we are dealing with multi-part expressions, each of which may have a different value in terms of
its semantic contribution, the possible pairs are in fact numerous if we assume, as Cruse does,
that there are three levels of indicators: full, partial, and null (32 for 2-word lexemes).
(6) Fully transparent: [full + full]
Semi-opaque: [partial + partial]
[partial + full] OR [full + partial]
[partial + null] OR [null + partial]
[null + full] OR [full + null]
To some degree opaque: [null + null]
We are thus left with seven possible patterns to be dealt with at the semi-opaque level, but no
clear manner by which to classify them. This distribution also fails to account for the second
factor mentioned by Cruse when discussing the degree of opacity, that is to say the gap between
the combined meaning of the constituents and the actual meaning of the compound. Martin
(1997) discusses this particular property of compounds, stating that their meaning, while clearly
related to their parts, can occasionally go far beyond those parts in sometimes unpredictable
ways. For instance, the meaning of life guard certainly involves the meaning of its constituents,
but nothing about these elements indicates that it also involves bodies of water. Yet it seems
counter-intuitive to treat such a compound as opaque. Cruse clearly recognizes this, but avoids
factoring for this discrepancy, as it would only add to the difficulty of establishing a multi-word
lexeme’s position within his continuum.
Similarly to Cruse, Gross (1996) also claims that opacity is a scalar phenomenon, which can be
divided into three parts: completely opaque, partially opaque, or non-opaque12. His terminology
12
“[L]’opacité est un phénomène scalaire : elle peut être totale (la clé des champs), partielle (clé anglaise) ou inexistante (clé neuve)” (Gross 1996: 11).
34
differs from Cruse’s, however, and is arguably less susceptible to criticism as he avoids using
fully transparent and instead chooses to label such compounds as non-opaque. That said, we are
once again left to decipher the region occupying the continuum’s middle section, which consists
of constructions Gross views as “partially opaque.” Because Gross treats transparency as a
direct function of compositionality, we are presumably to populate this area with compounds in
which only one constituent imparts its meaning to the whole. This approach does not allow,
however, for partial semantic contributions by a compound’s constituents, as Cruse’s does. We
are then to assume that compounds such as sage-femme and mauvais oeil, in which only one
constituent contributes semantically to the whole, are indistinguishable in terms of transparency.
In both Gross’s and Cruse’s frameworks, compounds that differ from the perspective of
headedness may potentially be treated as having the same degree of transparency, a result that
seems not only counter-intuitive, but that may in fact be incorrect based on some of the
experimental work done on headedness and compounding (as discussed in Section 2.2.1).
Despite such criticism, I agree with both Cruse and Gross that semantic transparency is best
viewed as a scalar phenomenon, that complex expressions such as compounds exhibit
sufficiently different semantic features so as to be treated along a continuum and not in terms of
either/or labels. Granted, even in works where compounds or other similar constructions are
treated in a binary fashion (Marslen-Wilson et al. 1994, Dhomes et al. 2004), it is never
explicitly claimed that no middle ground between the two poles exists. The contention here is
that between transparent and opaque lies a graded spectrum of transparency that is best
represented in terms of degrees. What thus needs to be made explicit is the manner in which
these degrees are calculated.
2.3.3 Explicit Semantic Transparency Clines
Explicit clines of semantic transparency do in fact exist and have been applied to derivational
morphology, as well as compounding. One such cline comes from Dressler’s (1985) work on
morphotactic transparency. Although this hierarchy is, strictly speaking, a representation of
phonological transparency as applied to derivational morphology, it is significant because of its
recognition that transparency is best represented as a multi-leveled hierarchy. Dressler’s
approach shares much in common with strictly semantic based treatments of transparency,
which he defines as “a biunique relationship between meaning and form” (Dressler 1985: 329).
35
The result of his analysis is a ranking of derivational forms based on a variety of operational
rules, the final level representing the most opaque type of morphological operation (i.e.
suppletion):
Table 2.1. Dressler’s hierarchy of morphotactic transparency (Dressler 1985: 330-331)
I Intrinsic allophonic PRs excite$+ment
II PRs interfere, e.g. resyll. exis$t+ence
III Neutralizing PRs, e.g. flapping rid+er (am.)
IV MPRs (no fusion) velar softening electric+ity
V MPRs with fusion conclusion
VI MRs intervene, e.g. Great Vowel Shift decision
VII weak suppletion (no rules!) childr+en
VIII strong be, am, are, is, was
Legend: MR: morphological rule; PR: phonological rule; MPR: morphophonological rule
Dressler’s multi-level evaluation of a complex word’s degree of transparency, based on specific
phonological, morphological, and morphophonological rules, translates into a far more granular
approach to the concept. Thus, forms derived at level I are more transparent than those at level
II, which are more transparent than those at level III, and so forth. This hierarchy supposedly
served as the basis for Kopecka’s (2006) own cline in her work on motion verbs in French (i.e.
prefix + root, dé-rouler). Kopecka defines semantic transparency as “the extent [to which] each
constituent part of the derived word is semantically interpretable” (2006: 94) and organizes her
cline along the following labels:
(7) i. + transparent: the relation between form and meaning is perceptible and
comprehensible
ii. ± transparent: the relation between form and meaning is not clearly perceptible,
despite the formal link between the simple form and the derived form
iii. − transparent: the relation between form and meaning is lost
Here we see an explicit hierarchy of semantic transparency similar to those implicitly suggested
by Cruse (1986) and Gross (1996). Once again, however, there is the potential for an ambiguous
36
middle zone where a complex construction is considered more or less transparent, with little
way to further refine the distribution at this particular level. In Kopecka’s defence, the words
she includes under the rubric ± transparent are those she views as +form/−meaning, that is, that
are analyzable as prefix + root, but for which the meaning is not predictable (e.g. ac-céder ‘get
to’). This contrasts with those verbs she considers − transparent, which are −form/−meaning
(e.g. affluer ‘flow to’). In this way, her approach differs from the traditional means of assessing
semantic transparency, which usually consists of determining whether all, some, or none of a
complex form’s constituents contribute semantically to the whole. While Kopecka’s cline could
arguably prove useful if applied to fused compounds (e.g. vinaigre, plafond), it is doubtful that
it would be effective for traditional compounds as these are, by their very definition, analyzable
into distinct parts.
Levi (1978), in her work on complex nominals, talks about a “continuum of derivational
transparency,” and proposes a hierarchy consisting of five levels (64):
Figure 2.2. Levi’s (1978) continuum of derivational transparency as applied to compounds.
Although Levi’s continuum predates those discussed by Cruse (1986) and Gross (1996), it
distinguishes itself by recognizing headedness as a factor in a compound’s degree of
transparency. Setting aside for the moment the ambiguity of the description given in (b), what is
most striking about her cline is that exocentric compounds are considered more transparent than
the endocentric ones labeled partially idiomatic. If we rely solely on the semantic contribution
of a compound’s constituents, Levi’s approach may not only prove correct, but also consistent
Transparency
Opacity
a. derivable by regular syntactic processes (mountain village, family reunion) b. were once transparent, but have since become more opaque (grammar school, briefcase) c. exocentric (birdbrain, razorback) d. partially idiomatic (polka dot, monkey wrench) e. wholly idiomatic (honeymoon, fiddlesticks)
37
with those of other researchers: the examples she provides in (c) are compounds in which both
constituents conserve their meaning to some degree, while only one constituent (i.e. the head)
meaning is retained in those in (d). This strategy, however, can be criticized based on the fact
that it nestles endocentric compounds between exocentric ones (i.e. compounds in (a) (b) and (d)
are endocentric, while those in (c) and (e) are exocentric). Levi presumably does this in order to
group together compounds that are, on the surface, difficult to motivate semantically prior to
actually learning the meaning of the combination (those in (d) and (e)). After all, while a
monkey wrench is in fact a wrench, what role does monkey play in its interpretation? Her wholly
idiomatic compounds are thus those traditionally viewed as opaque, but remain distinct, in terms
of their transparency, from both compositional exocentric compounds and endocentric
compounds in which only the head contributes semantically to the meaning of the whole.
But what do we then do with compounds such as jailbird or cardshark? These are cases of
exocentric compounds where only the first constituent retains its meaning within the
compound. In Levi’s continuum, jailbird would be grouped with either the exocentric (c) or the
partially idiomatic (d) compounds. If we were to introduce additional levels to her hierarchy,
however, jailbird could be inserted either between (c) and (d), or between partially idiomatic
(d) and wholly idiomatic (e) compounds. Unfortunately, it is unclear which of these two options
is correct: between (c) and (d), we are emphasizing exocentricity as a factor of transparency;
between (d) and (e), we are emphasizing the fact that only one constituent retains its meaning.
Interestingly enough, if we were to treat exocentricity as a more fundamental element of
opacity and thus rearrange the continuum accordingly, only one insertion point would be
possible for compounds such as jailbird: (a) mountain village à (b) grammar school à (d)
monkey wrench à (c) birdbrain à jailbird à (e) honeymoon. Of course, the question is
whether birdbrain is more or less transparent than jailbird. Although this modification is based
on a number of assumptions regarding headedness as a factor in transparency, it is an approach
that has in fact been suggested elsewhere, most notably in Libben (1998).
The semantic transparency cline proposed by Libben (1998) focuses primarily on a given
compound’s constituents. Rather than simply looking at whether a compound’s constituents
impart their meaning to the whole, however, Libben also takes into account whether the
semantic head is present. On the one hand, he assigns a value of opaque (O) or transparent (T)
to each constituent based on its meaning within the compound, and on the other, he groups the
38
resulting permutations along what he calls componentiality (more traditionally known as
centricity). The resulting combinations are as follows (adapted from Libben 1998: 38):
(8) Componential (endocentric)
a. T-T blueberry
b. O-T strawberry
c. T-O shoehorn
(9) Non-componential (exocentric)
a. T-T bighorn
b. O-T yellowbelly
c. T-O jailbird
d. O-O hogwash
The assumption behind this approach is that endocentric compounds are inherently more
transparent than exocentric ones because their heads are hypernyms of the entity targeted by the
compound (i.e. a blueberry is a type of berry). According to Libben et al. (2003), endocentric
O-T compounds (chopstick) do in fact pattern with endocentric T-T (coalmine) for participant
reaction times and are generally more easily processed than exocentric T-O compounds
(cardshark). What their study suggests is that factoring in only the semantic contribution of a
given compound’s constituents is not sufficient when attempting to determine its degree of
semantic transparency. As Libben (1998) argues, morphological headedness also plays an
important role in the processing of compounds and should thus be included in any typology of
semantic transparency.
Libben’s cline is not without its problems, however. While grouping together compounds based
on their headedness does allow for a much richer typology, there remain ambiguities within
each cluster. For instance, he does not distinguish between O-T and T-O compounds in terms of
their respective degrees of transparency. Thus, it is unclear if shoehorn (T-O) is more or less
transparent than strawberry (O-T) given that they are both said to be componential (i.e.
endocentric). In the case of shoehorn, Libben argues that while it is not technically a horn, it
can be understood as ‘a horn for a shoe,’ presumably because the lexeme horn possesses a
somewhat marginal acceptation based on shape (‘a horn spoon or scoop’ in the OED entry II-
39
11a)13. The same problem arises for non-componential O-T (yellowbelly) and T-O (jailbird)
compounds. Part of the reason for these ambiguities is arguably because of English’s rather
rigid right-headedness for compounds. In French, however, although NN compounds are
mostly left-headed, there are a number of right-headed cases. This allows for O-T and T-O
compounds to be unequivocally endocentric (e.g. aube-vigne and bateau mouche, respectively).
The question, of course, is whether they should be treated as equally transparent. The following
figure illustrates the ambiguities related to Libben’s approach.
Figure 2.3. Ambiguous pairs in Libben’s (1998) typology of semantic transparency
One apparent way to account for these undetermined pairs (O-T and T-O) is simply to state that
they exhibit the same degree of semantic transparency. Again, this may or may not prove
correct, and will only be determined through further examination. Despite these questions,
however, Libben’s cline is most certainly a step in the right direction: it takes into account
factors other than the mere semantics of a compound’s constituents and recognizes that some of
its components may play a greater role than others in determining a compound’s degree of
transparency. I believe that this model can be further refined by integrating additional factors
into its framework.
13
This seemingly contradictory interpretation may in fact explain why Libben chooses the componential/non-componential dichotomy instead of the traditional endocentric/exocentric one.
ENDOCENTRIC T - T : stylo-feutre O - T : aube-vigne T - O : bateau mouche
Transparent
Opaque
f
EXOCENTRIC T - T : bec-figue O - T : chat-château T - O : trou-madame O - O : compère loriot
40
2.4 Semantic Transparency: A Working Definition
The formalization of semantic transparency with respect to compounds requires that a clear
description of the concept be put forward. This chapter will thus close with the proposal of a
working definition of transparency, which will then be expanded upon in Chapter 7 following a
thorough examination of the semantics of compounds. Before this definition may be formulated,
however, a word must first be said on the distinction between transparency and
compositionality, and on how the latter is understood in this thesis.
2.4.1 Transparency vs. Compositionality
A common thread throughout much of the work on semantic transparency is that the concept is
often taken to mean semantic compositionality. There are a number of cases where
compositionality, as applied to compounds, is defined similarly to semantic transparency (e.g.
“the meaning of compositional compounds can be successfully derived from the meaning of the
noun constituents” Girju et al. 2005: 488). This conflation of concepts is especially important
when discussing compounds, because some linguists believe that they are by their very nature
non-compositional constructions14. Does this then mean that transparency is merely an artefact
of compositionality?
The notion of semantic compositionality has long been applied to phrase generation in order to
account for a speaker’s ability to understand novel sentences or expressions, a concept that has
widely been called Frege’s Principle of Compositionality: “The meaning of a complex
expression is a function of the meanings of its parts and of the syntactic rules by which they are
combined” (Partee et al. 1990). Over the years, linguists and philosophers alike have often taken
issue with this principle, but accept that such a concept nevertheless does exist in some form or
another, for as Grandy (1990) sums it up, “in spite of the fact that we have no adequate
semantics for any natural language we feel that there MUST be compositional semantics for
14
In a post on his co-authored blog Language Log, Geoffrey K. Pullum criticized the Oxford Dictionaries organization for choosing squeezed middle as their Word of the Year 2011. The Oxford Dictionaries argued that the expression is a compound and thus eligible for Word of the Year status, but Pullum argued against the construction’s wordhood based on the fact that, according to him, its meaning is fully compositional (<http://languagelog.ldc.upenn.edu/nll/?p=3573>).
41
ALL natural languages, because otherwise people could not learn them” (557). At issue in the
present work is therefore not whether compositionality exists, but whether it differs from
transparency. My position is that we are dealing with two distinct, yet related concepts.
In its simplest form, semantic compositionality is “[understood] informally to mean that the
meaning of a complex syntactic expression is determined by its structure and the meanings of its
constituents” (Aronoff 2007: 803). For Weiskopf (2007), compositionality is in fact a somewhat
simpler operation, stating that “the default mode of semantic combination corresponding to
syntactic or morphological concatenation is set intersection” (162). Formally, compositionality
can be also be stated as a condition, as Katz (1973) does for traditional free-form constructions:
“For every syntactically complex constituent C of [sentence] S (including S itself) whose meaning is nonidiomatic, the set of semantic representations R assigned to C is a function of the sets of semantic representations assigned to the subconstituents that make up C and their grammatical relations in the sentence S” (Katz 1973: 357).
While the concept was originally applied to phrases, it has since been applied to a number of
more restricted constructions, such as collocations, idioms, and compounds. In many cases,
compositionality is a factor in establishing whether a particular construction is a member of a
particular group (i.e. Gibbs et al. 1989 for idioms, Tutin and Grossmann 2001 for collocations,
Weiskopf 2007 for compounds). It is also regarded as a fundamental component of derivational
morphology (Jackendoff 1974, Aronoff 1976, Lieber 1992, Bauer 2001b, Lieber 2004).
Compositionality is often referenced in terms of its applicability to a given expression and is
frequently discussed in its negative form. For instance, Gibbs et al. (1989) state that an
expression is “non-compositional” when “the figurative meaning of an idiom is not a function of
the meanings of its parts” (576). By the same token, we can thus assume that an expression is
compositional when its meaning is a function of the meanings of its parts. Regardless of how
one formulates compositionality, the term is frequently mentioned in a number of works
discussing semantic transparency (see definitions in Section 2.3.1 for example). It should come
as no surprise then, that some researchers have equated the concepts, viewing them as both
mutual and inseparable features of complex constructions.
This conflation of compositionality and transparency is often a result of overlapping definitions.
Roelofs and Baayen (2002), for instance, state that a morphologically complex word is
transparent if it is compositional. We see similar approaches in Marslen-Wilson (1994), Gross
42
(1996), and Libben et al. (2003). In the glossary to his work on French fixed expressions, Gross
defines compositionality as follows: “A given construction is said to be compositional when one
can deduce its meaning from that of its component elements linked by a specific syntactic
relation”15 (1996: 154, my translation). This definition is then followed with a note inviting the
reader to consult the entry for “Opacity,” in which Gross echoes his description of
compositionality, stating that a construction is opaque when its meaning cannot be derived from
the meaning of its parts. It is clear that in Gross’s view, non-compositionality and opacity are
not only related, but may in fact be indistinguishable concepts. There are two particular
components to Gross’s definition of compositionality: one is the semantic contribution of an
expression’s elements, the other is the syntactic relationship held between said elements. The
first is hardly controversial and is in fact the basis for many definitions of compositionality (i.e.
see the definitions mentioned earlier). The second component, however, is not so easily
characterized, especially for NN compounds. If two nouns in apposition are said to be in a
syntactic configuration, it is not usually clear what type of predication might follow from it. As
was mentioned in the previous section, many NN compounds can be interpreted by virtue of the
meaning of the individual constituents, yet the relational association between them is obscure
(e.g. wool basket = ‘basket for wool’ or ‘basket made of wool’, Spalding and Gagné 2007). This
stands in stark contrast to synthetic compounds in English (and to some extent, in French), in
which the relational component can be deduced from the verb-complement relationship still
evident in the deverbal head (e.g. a truck driver is a ‘driver of trucks’; Roeper and Siegel 1978,
Botha 1984). For NN compounds, however, the syntactic relation itself is insufficient to
establish their full meaning, which means that most primary compounds (i.e. those not based on
a deverbal head) must be treated as non-compositional constructions. While this point of view
may in fact coincide with a number of claims regarding compounds (i.e. a compound is a non-
compositional multi-word lexeme, as Langacker 2009 suggests), it fails to take into account the
fact that the semantics of many compounds remain closely linked to the meaning of their
constituents.
15
“Une construction donnée est dite compositionnelle quand on peut déduire son sens de celui de ses éléments composants reliés par une relation syntaxique spécifique” (Gross 1996: 154).
43
For the most part, compositionality is viewed as a binary property of complex constructions: in
the traditional Fregean view, a construction is either compositional or non-compositional. Thus,
according to the definitions discussed above, compositionality involves not only the meaning of
a construction’s parts, but also the rules or operations by which they are combined. If we treat
compositionality and transparency as identical concepts, then we must also view transparency as
a binary property: either a construction is transparent or it is opaque. If, on the other hand, we
view the two concepts as distinct, then we begin to allow for a more complex model of
transparency.
To be clear, a number of researchers clearly distinguish between compositionality and
transparency. Nunberg et al. (1994), in their seminal article on idioms, offer a definition of
compositionality that is distinct from that of transparency, describing the former as “the degree
to which the phrasal meaning, once known, can be analyzed in terms of the contributions of the
idiom parts” (498). This definition, however, is remarkably similar to some of those proposed
elsewhere for transparency, again showing just how blurred the line between the two concepts
is. Svensson (2004), in a more elaborate description of compositionality, defines the concept
along four distinct dichotomies, one of which is transparency - opacity. Although, she clearly
distinguishes between compositionality and transparency, she views the latter as a subset or
contributing factor to the former. This is quite different from a number of other approaches,
where compositionality is instead treated as a factor of transparency (Kehayia et al. 1999,
Pollatsek and Hyönä 2005, Tabossi et al. 2008). Svensson succinctly defines compositionality as
follows : “If all the words contribute to the meaning of the expression, we will say that it is
compositional”16 (Svensson 2004: 73). One will no doubt notice that if one were to replace the
word “compositional” with “transparent,” the description would be nearly indistinguishable
from many of those proposed by other researchers for transparency. Nevertheless, in Svensson’s
mind, the two concepts are distinct.
This approach, however, requires that we adopt a much narrower view of compositionality, one
that focuses solely on the meaning of a construction’s parts. Briefly, a compositional
16
“Si tous les mots contribuent au sens de l’expression, nous dirons qu’elle est compositionnelle” (Svensson 2004: 73).
44
construction is one for which all constituents contribute semantically to the meaning of the
whole (Svensson 2004, Girju et al. 2005). In this regard, compositionality may then be partial if
only one of its constituents retains its meaning within the whole. Although this particular use of
the term may stray slightly from traditional usage, it acknowledges that a compound’s
constituents may contribute meaning without necessarily rendering it transparent. This is best
illustrated using the compounds hammerhead and arrowhead: in both cases, the meaning of
their constituents contribute to the meaning of the whole (“a shark with a head shaped like a
hammer” and “the head of an arrow” respectively), but in the case of the former, crucial
information is absent (i.e. that a hammerhead is a shark). Within a narrow view of
compositionality, key properties of certain exocentric compounds (e.g. hammerhead, birdbrain,
and redcoat) are captured despite their relative opacity. Compositionality can thus be said to
“feed” into transparency.
An interesting consequence that arises from the treatment of compositionality and transparency
as two distinct, yet related attributes is that the relationship between the two can only be
bidirectional under certain conditions. If we return to Gross’s (1996) approach to the concepts,
we notice that what he is effectively stating is that a non-compositional expression is an opaque
expression. Can we therefore say that an opaque construction is non-compositional? While this
statement seems plausible in his framework, it is not in fact tenable if the two concepts are
treated independently. Intuitively, when compositionality is viewed as a factor of transparency,
we generate a series of unequal relationships that show an interesting pattern. To illustrate this
point, let us look at the possible permutations of the dichotomies, as well as the assumptions one
can make regarding the relationship held between them:
(4) a. a compositional expression can be either transparent or opaque
b. a non-compositional expression can be opaque, but not transparent
c. a transparent expression can be compositional, but not non-compositional
d. an opaque expression can be either compositional or non-compositional
While the assertions in (4b) and (4c) are largely hypothetical in nature, they nevertheless remain
intuitively plausible. Moreover, similar points have been made elsewhere (cf. Svensson 2004).
To summarize, a non-compositional expression cannot be transparent, while a transparent one
cannot be non-compositional. This polarity reflects many approaches to compositionality and
45
transparency, whether they are treated as distinct concepts or not. What perhaps distinguishes
my position from that of others, however, are the types of implications that are possible given
the observations above. Most importantly, compositionality never strictly implies transparency,
nor does non-transparency strictly imply compositionality. Figure 2.4 on the following page
illustrates what entailments are in fact possible.
Figure 2.4. The relationship between compositionality and transparency.
As we can see, only two logical implications are present, which are indicated by solid lines (i.e.
a transparent construction is necessarily compositional and a non-compositional construction is
necessarily non-transparent). The dotted lines indicate possible relationships between concepts.
For instance, a compositional construction may be transparent, but it is not necessarily so; it
might instead be non-transparent17. One need only think of classic cases of exocentric
compounds to see how this can be true (e.g. fr. rouge-gorge; eng. redcoat). This relationship is
also reflected in ad-hoc expressions created spontaneously based on some contextually
dependent information (i.e. apple juice seat in Downing 1977). A consequence of the approach
described above is that the terms compositional and opaque (or non-transparent) are thus
ambiguous with respect to each other: stating that an expression is compositional is ambiguous
with regards to its transparency; the same is true of opaque expressions and their
compositionality.
17
It is important to note that the terms transparent and non-transparent are not used here in an absolute sense, as was initially discussed in Section 2.1.1. More precisely, one should understand them directionally, which is to say “tends toward transparent” and “tends toward non-transparent.”
Compositional Transparent
Non-transparent Non-compositional
46
For our purposes here, we may state that by adopting a narrower view of compositionality that is
clearly distinct from transparency, we are in a better position to develop a richer account of how
form and meaning relate to each other. If the goal is to re-examine how compound transparency
may be formalized, this distinction offers the advantage of recognizing that some compounds
have a direct connection to the meaning of their constituents without being fully transparent.
This relationship will be revisited in Chapter 4 when I explore some of the features I believe
should be incorporated into a typology of transparency.
2.4.2 Semantic Transparency Defined
Although the majority of definitions or descriptions proposed for semantic transparency have
relied heavily on the relationship between the meaning of the whole and that of its component
elements, I will, for the time being, avoid stipulating this factor as a condition of the concept and
will instead formulate it in more general terms. It should be noted that while the following
definition is proposed with compounds in mind, it may in fact be applicable to other
morphologically complex constructions.
(10) a. For a complex lexical unit C, semantic transparency refers to the degree of semantic
interpretability of C
b. Semantic transparency is a property of C that:
i. is scalar (i.e. is not simply a +/− feature)
ii. is multi-faceted (i.e. based on a number of factors)
The definition in (10a) is based on the widely accepted view that semantic transparency is
related to comprehension of a lexical unit and emphasizes that its interpretability is a matter of
degrees. The stipulations in (10b), which are consequences of (10a), are considered defining
features of transparency. First, I maintain that semantic transparency cannot be viewed as a
binary feature of complex constructions, as has been tacitly held by some researchers. In other
words, it cannot be said that a compound is either transparent or opaque, but rather that it
exhibits some degree of transparency. Of course, any typology of semantic transparency will
almost inevitably take the form of some sort of cline, but with enough parameters, such a
typology could be sufficiently granular for the purposes of compound classification. Second,
transparency is said to be dependent on a number of different factors, such as compositionality
47
and headedness. This allows for factors to be weighted differently or identically, whatever the
case may be. If constituent meaning proves to be a leading factor in semantic transparency, it
will be treated as such. We may find, however, that individual constituent meaning is trumped
by an infrequent semantic relation held between them (Allen 1978) or that in some cases
exocentric compounds show a high degree of transparency.
It must be noted that while we may formalize semantic transparency as a linguistic concept, its
application may not always reflect the speaker’s perception of it. In other words, the perception
of transparency for a particular construction will no doubt vary from speaker to speaker. A
model of transparency might label salad dressing as transparent, but if the speaker is unfamiliar
with either of these words, he or she will most likely not perceive it as such. As Svensson (2008)
emphasizes, however, this variation does not necessarily preclude the possibility of “label[ing]
certain expressions as opaque for the majority of language users who have not learned the
meaning yet” (88). The above definition is meant to be compatible with this view.
Moreover, as was discussed in Section 2.1.1, the term opaque will be reserved for only those
expressions that are deemed truly opaque, that is as having the same transparency status as
simplex words. In all other cases, compounds will be said to show a certain degree of
transparency, which may vary greatly from one construction to the next. The term transparent,
when used, will be understood to mean “high degree of transparency.” Figure 2.5 illustrates how
these terms relate to each other as points on a continuum. The extremities of this scale represent
absolutes to the extent that nothing greater may be located beyond them (i.e. there is no
compound more opaque than an opaque compound).
Figure 2.5. A continuum of semantic transparency.
Of the stipulations in (10), those in (b) are the most central to my approach. What I am
suggesting is that, on the one hand, transparency may vary from one construction to another and
Transparent X degree of transparent Opaque
48
is therefore best represented as the continuum above. On the other hand, this variation results
from a number of different factors and features. The latter position is already present to some
degree in Libben’s typology of transparency (Libben 1998) and is also hinted at in some of the
experimental work on the role of transparency in compound interpretation (Jarema et al. 1999
and Libben et al. 2003 for compounds, Feldman et al. 2003 for derived words). Some studies
suggest that a speaker’s ability to access compounds is affected by a number of compounded
factors, such as frequency, transparency, and productivity (Dohmes et al. 2004). My contention
here is that to effectively evaluate the semantic transparency of a complex lexical unit, these
factors must be taken into account. Based on past research on compounding, I will argue in
Chapters 4 and 5 for a typology of semantic transparency that incorporates the following
semantic features:
(11) a. The position and the nature of a compound’s head
b. The semantic contribution of a compound’s elements
c. The unexpressed semantic relation held between a compound’s constituents
d. The degree of semantic similarity between related compounds
It is my contention that the features in (11a-c) represent what a typology of compound
transparency should, at a minimum, take into account. The property listed in (11d) is meant to
augment the relational feature in (11c). By incorporating several factors into existing models of
semantic transparency, we may propose a more granular typology of the concept, one that
allows for a classification of compounds that better reflects the numerous ways in which their
meaning may be composed and established.
2.5 Summary
In this chapter, I looked at a number of issues related to the term “semantic transparency” (or
“semantic opacity”) as it has been used alongside morphologically complex words, including
compounds. I have argued that usage of the term generally lacks specificity and often differs
across works on the subject. This limited description of the concept has led to a wide range of
claims regarding not only the processing of compounds, but also their classification, as well as
their status as lexical items. Many of the studies that claim to look at semantic transparency
often do so without first explaining just how compound A is more or less transparent than
49
compound B. This is precisely the focus of this thesis. In this chapter, I argued that transparency
and compositionality are two distinct, yet related concepts and that a compound can be
compositional without being fully transparent. This distinction, I believe, affords a much wider
and richer view of transparency, one that includes elements that have seldom factored into
previous discussions on the concept. The working definition in Section 2.4.2 is meant to reflect
this broader approach to semantic transparency and will be revisited and improved upon as I
explore the factors mentioned at the close of this chapter.
50
Chapter 3
French Nominal Compounds and Data Collection
This chapter focuses on what is arguably the most fundamental component of the present
research, namely compounds. I begin by discussing compounding and how it is understood in
the context of this thesis. This discussion also touches on French compounds, highlighting how
other researchers have treated them in the past. I then narrow the scope of the study of semantic
transparency by setting limits on the types of constructions under investigation. Finally, I outline
the methods used to collect the core data that will serve as the basis for the analysis presented in
the remainder of this work.
3.1 Compounding
Despite its long and rich history, compounding remains a somewhat controversial object of
study. While most researchers agree that there is such a thing as a compound, not everyone
agrees on what exactly it is or what it should even look like. Yet, as with any other research
effort, a description of the object of study, however limited it may be, must be provided if
anything meaningful is to be said on the topic. To this end, the following sub-sections will
explore not only how compounding has been defined by other researchers in the past, but also
the criteria that have been used to distinguish between compounds and phrases. Moreover,
because this thesis focuses on semantic transparency from the perspective of French
compounds, several typologies and approaches to compounding in French are examined. It is
important to note that the purpose of this discussion is not to establish once and for all what
constitutes a compound, but rather to set clear boundaries with which to limit the scope of the
present study.
51
3.1.1 Defining the Compound
Compounding is largely understood as the process by which a compound is formed. A survey of
introductory texts in morphology reveals a rather homogeneous—though perhaps simplified—
approach to both the operation and its result. Let us consider the following definitions taken
from a number of recent works in general morphology:
(12) a. “words formed by combining roots” (Carstairs-McCarthy 2002: 59).
b. “What we mean by ‘compounding’ is the construction of a complex lexical unit from
at least two bound or free lexical morphemes18” (Apothéloz 2002:18, my translation).
c. “The formation of a new lexeme by adjoining two or more lexemes is called
compounding” (Bauer 2003: 40).
d. “A compound is a lexeme which contains two (or more) stems and which does not
have any derivational affix which applies to the combination of stems” (Bauer 2004: 32).
e. “A derived form resulting from the combination of two or more lexemes” (Aronoff
and Fudeman 2005).
f. “[compounding] consists of the combination of two words” (Booij 2007: 75).
g. “Compounds are words that are composed of two (or more) bases, roots, or stems”
(Lieber 2010: 43)
Two major criteria can be retained from the descriptions or definitions above. First, a compound
is typically viewed as a lexical item. A corollary of this principle is that we can expect any
multi-word construction labeled as a compound to behave similarly to traditional lexemes.
Second, a compound is the product of composition between two (or more) otherwise
18
“[O]n entend par ‘composition’ la construction d’unité lexicale complexe au moyen [...] d’au moins deux morphèmes lexicaux libres ou liés” (Apothéloz 2002:18).
52
independent lexemes19. This criterion allows us to distinguish rather easily between derived
words and compounds (e.g. abaissement ~ abaisse-langue). What remains to be established,
however, is which of a language’s multi-word constructions are in fact compounds. This is
especially important when discussing compounds in French, as they often bear a significant
likeness to syntactic phrases (i.e. la robe de mariée ~ la robe de la mariée). Despite these
similarities, however, compounds are mostly viewed as distinct entities given that, as Bauer
(2001) states, they “[show] some phonological and/or grammatical isolation from normal
syntactic usage” (695). The task is therefore to determine what such “isolation” entails.
More detailed and complete definitions of compounds reveal just how complex the issue
actually is. In Bauer’s (1978) seminal work on compounding in English, French, and Dutch, he
describes the phenomenon as follows:
“[I]t can be said that a compound is a morphologically complex unit, made up of two words (lexemes) acting as a single word (lexeme). The words or (in most cases) potentially free formatives may themselves be further subdivided. The compound, it is claimed, shows a degree of phonological, morphological and semantic isolation. However, these points are better considered as tendencies than as rules, since there appear to be very few ‘rules’ in compounding that admit of no exceptions.” (54)
While more elaborate definitions of compounds, such as the one above, share a number of
similarities with the more cursory descriptions mentioned in (12), they often highlight the fact
that the concept is nuanced and not necessarily subject to easy circumscription. Bauer therefore
speaks of tendencies, which may be prudent, but this approach also has the potential to weaken
the conclusions one might draw based on a study of the phenomenon.
It should therefore come as no surprise that many researchers do attempt to restrict what types of
constructions may be considered instances of compounds. This narrow approach to the topic is
usually done for a number of reasons. On the one hand, many authors wish to consign as much
as they can to syntax, thus accounting for the fact that some constructions behave internally as
syntactic units. On the other hand, researchers want their frameworks to be consistent and easily
19
Apothéloz’s (2002) definition mentions bound morphemes (“morphèmes lexicaux liés”) as acceptable compound constituents. This is done to treat words such as bibliophile as compounds, traditionally referred to as neo-classical compounds (Scalise and Bisetto 2009). As stated in Section 3.2.1, these constructions will be ignored in this work for technical reasons.
53
duplicated, a goal more easily achieved with a much narrower view of compounding. Let us
take, for instance, Ten Hacken’s (1999: 41) definition of a compound, stated as follows:
(13) A compound is a structure [X Y]Z or [Y X]Z, such that:
• The denotation of Z is a subset of the denotation of Y;
• If S is a possible way of specifying Y, the denotation of Z is determined by a range S’s that are compatible with the semantics of X;
• X does not have independent access to the discourse
Ten Hacken establishes not only what structure a compound has, but also its semantics. The first
clause stipulates that a compound must have a semantic head, thereby denying compoundhood
for English constructions such as pickpocket and redcoat, which have long been treated as
compounds elsewhere (Allen 1978, Selkirk 1982, Scalise 1984, Lieber 1992). The second clause
refers to the possible relations that may hold between a compound’s constituents, the range of
which is restricted based on their semantics. The third clause of Ten Hacken’s definition
accounts for the non-head’s inability to pick out a reference without contextual support20.
Semantic criteria, however, is not limited to headedness. According to Langacker (2009: 54) a
compound is by definition non-compositional because its meaning is indeterminate, but this
principle also challenges the widely held belief that a compound’s meaning can in fact be
computed from its components (see Chapter 2 for a discussion of compositionality in
compounding). Alternatively, some definitions restrict membership based on the types of lexical
units involved. For example, Arnaud (2004) states that “a compound noun is a nominal lexical
unit that results from the union of two (or more, recursively) open class lexical units21” (329,
my translation). According to Arnaud’s definition, nominal constructions such as arrière-plan
and sans-abri must be rejected as compounds because they contain elements from a closed
lexical class, which would also include any construction containing a preposition as a linking
20
It has been observed that the non-head element of a compound does not typically have a reference: house in housefly, for instance, does not refer to a particular house. The only exceptions are proper names that are sufficiently specific so as to possess a referent independent of discourse, which explains why Chopin fan is acceptable, but *Frédéric fan is not. 21
“Un nom composé est une unité lexicale nominale résultant de l’assemblage de deux (ou récursivement de plusieurs) unités lexicales de classe ouverte” (Arnaud 2004: 329).
54
unit. Yet many of these constructions show a great deal of syntactic autonomy and are widely
considered to be compounds elsewhere (see, for instance, Amiot 2005 for a discussion of Prep-
N compounds).
The fact is that compounds are not necessarily definable according to a fixed set of features, nor
are these features likely to coincide cross-linguistically. Although there is general agreement
that a compound consists of two otherwise independent lexemes, such a definition is most likely
too broad to prove very effective in identifying instances of compounds. In this regard,
establishing compoundhood is largely a matter of testing candidates according to various
behavioural and functional criteria. Even this approach, however, is bound to encounter
difficulties given that compounding, as a linguistic operation, may actually be connected to
several different domains. Ten Hacken (1994) succinctly states the difficulties, as well as the
resulting issues, at hand: “Compounding has borderlines with affixation, with syntax, and with
the lexicon. For each of these borderlines, there are cases where the classification is not
straightforward. A definition will have to result in a decision for these borderline cases” (23).
Thus, the next few sections will look at some of the criteria advanced to facilitate this task.
Before moving on, however, a brief word must first be said on terminology. There exist
numerous labels for the multi-word combinations that defy many of the syntactic and semantic
constraints of the language. The exploration of such constructions usually falls under the
purview of phraseology (see Gries 2008 for an overview of the field). A number of categories
have been proposed, but not everyone agrees on what these categories are or which items should
be included in them. Polguère (2003), for instance, labels fruit de mer and nid de poule as
nominal locutions, while Mathieu-Colas (1996) calls them compounds. Of course, the
terminology one adopts largely stems from one’s preferred theoretical framework (Polguère, for
example, works within the Meaning-Text Theory, Mel’cuk et al. 1995). Granger and Paquot
(2008) propose three major categories, one of which groups together “referential phrasemes”
that includes seven types: lexical collocations, idioms, irreversible bi- and trinomials, similes,
compounds, grammatical collocations, and phrasal verbs. Compounds are said to “resemble
single words in that they carry meaning as a whole and are characterized by high degree of
inflexibility, viz. set order and non-interruptibility of their parts” (43). These characteristics
presumably set compounds apart from lexical collocations, where one constituent is said to
depend on the other, and from idioms which, according to Granger and Paquot, “are
55
characterized by their semantic non-compositionality” (43). These criteria are only partial
indicators of membership, however, as each type is in fact defined according to several different
features.
It is also useful to note that a multi-word expression does not need to be listed to be considered a
compound. Di Sciullo and Williams’s (1987) “hierarchy of listedness” places compounds
somewhere between words and phrases, stating that “many of the compounds are listed” (14).
While few researchers have in fact claimed that compounds, because they are lexemes, must be
listed, it still bears mentioning that such a criterion should only be given limited weight.
Listedness cannot be a required property of compounds because compounding is widely
considered one of the most morphologically productive processes in word formation (Downing
1977, Roeper and Siegel 1978, Libben 1998, Bauer 2001b). Thus, to claim that the result of
such a process must be listed would be to argue that compounding is in fact unproductive.
Downing (1977) emphasizes the productive nature of compounds and discusses what she calls
deictic compounds, pragmatic constructions made up on the spot and used to refer to a
temporary situation or condition. These nonce forms—which includes her now famous example,
apple juice seat—will most likely never be listed anywhere, yet they are examples of
compounds in the wild, so to speak. Allen (1978) made similar remarks, distinguishing between
lexicalized and non-lexicalized compounds and suggested that semantic non-compositionality,
among others, is a contributing factor to the lexicalization of a compound, which further
increases its chances of being listed.
All things being equal, the use of the term compound in this thesis will reflect its usage across
works anchored in lexicalist morphology. Many of the definitions offered at the beginning of
this section are good examples of such works (e.g. Bauer 2003, Aronoff and Fudman 2005,
Lieber 2010) and while they are all cases of introductory texts on morphology, the definitions
they propose are nevertheless both sufficiently similar and explicit that they may support the
discussion to follow. In sum, compounds are understood as lexemes that are themselves
composed of two or more lexemes. I use the term lexeme so as emphasize three points: 1) that
compounds are typically formed using free morphemes; 2) that these constituents can be either
simplex (e.g. panier à salade) or derived forms (e.g. assurance-emploi); and 3) that such
combinations function as a single unit (cf. Bauer 1978).
56
As for a working definition, I think it prudent to offer something as concise as possible, while
relying on tests or criteria to determine with certainty whether a particular multi-word
constructions should be treated as a compound. These criteria will follow in the next section. To
conclude this section, I thus propose the following brief, albeit representative definition of
compounds:
(14) A compound is a morphologically complex unit composed of at least two lexemes that
may or may not be fused, but which functions as a single lexeme.
The above definition not only closely echoes Bauer’s (1978) own description mentioned earlier,
but also strays very little from most other definitions. What remains to be done is to establish
where one should draw the line between freely constructed syntactic units (e.g. brown table = ‘a
table that is brown’) and morphologically complex units (e.g. blueberry = ‘a berry that is blue’).
The following section discusses some of the key ways in which this distinction may be made.
3.1.1.1 Compounding Criteria
As was briefly touched upon earlier, the work presented in this thesis is largely based on a
lexicalist view of morphology, which is to say that there is such a thing as the lexicon and that
word formation falls within the domain of morphology. This approach, which originated in
Chomsky (1970), has given rise to the Lexicalist (or Lexical) Integrity Hypothesis (henceforth
LIH). Briefly, this hypothesis draws a hard line between what belongs to syntax and what
belongs to morphology and places constraints on how these modules may interact (Halle 1973,
Aronoff 1976, Allen 1978). Lapointe (1980) frames the matter succinctly: “syntactic rules are
not allowed to refer to, and hence cannot directly modify, the internal morphological structures
of words” (222). Consequently, such a framework also prohibits simultaneous operations from
taking place across modules: “No deletion or movement transformations may involve categories
of both W[ord]-structure and S[entence]-structure” (Selkirk 1982: 70). The original lexicalist
position was eventually weakened to allow for a certain degree of interaction between
morphology and syntax, mostly in order to account for inflectional morphology’s dependence
across multiple units (Anderson 1982, also see Baker 1985 for additional evidence supporting a
weakened lexicalist position). Although the LIH has undergone several refinements over the
57
years (see Ackerman and LeSourd 1997, Lieber and Scalise 2007), the fundamentals of the
approach have largely remained intact.
Given the premise outlined above, we may state that if compounds are in fact words, and words
are considered “syntactic atoms” (DiSciullo and Williams 1987), then establishing what
constitutes a compound involves examining how it responds to syntactic based tests. Many such
tests have been proposed, most of which are based on criteria that reflect the behaviour and
functionality of morphological and syntactic items.
Before discussing the syntactic tests mentioned in the literature, however, it is important to note
that the identification of compounds has in fact involved criteria from several domains.
Orthographic markers such as spaces and hyphens have received some attention, but as
Mathieu-Colas (1994) shows, they exhibit too much variation to be of much use in discerning
between phrases and compounds. Several phonological criteria have also been proposed over
the years, but many of them have proven either inconsistent or incorrect. For example, the oft-
cited stress criterion for English compounds, popularized by Marchand (1960) and formalized in
Chomsky and Halle (1968) as the Compound Stress Rule, states that stress is located on the left-
most constituent of a compound, but on the right most constituent of a regular noun phrase. This
rule, however, has been shown to vary greatly between otherwise identical constructions (for
instance, apple pie is left-stressed, while apple cake is right-stressed, but both are typically
considered compounds; see Bauer 1998). This particular criterion is also language dependent: it
is not relevant for languages that lack lexical stress, such as French. As for morphological
criteria, Bauer (1978) discusses plurality marking as a means of identifying compounds, stating
that inflection is usually present on the head. Rosenberg (2007), however, recently showed that
while all markings are possible (head-marking, external marking, and double marking) for
French compounds, double marking is by far the most frequent inflectional operation used for
indicating number (e.g. chèques-restaurants, secteurs-clés, lourds-légers, etc.). A prevalent
semantic criterion is related to a compound’s denotation and is included in many definitions of
compounding. Because compounds, like words, are naming units, they typically refer to a single
concept. Compounds are therefore said to possess both “a stable referent [and] a unitary
meaning” (Gaeta and Ricca 2009: 39). Thus, a zebrafish does not denote a zebra on the one
hand and a fish on the other, but rather a fish with properties like those of a zebra. This
particular criterion, however, is typically only useful for constructions containing more than one
58
noun. Nevertheless, it is widely accepted that compounds denote a single thing or concept, as
evidenced by its frequent inclusion in definitions of compounding.
The most prominent type of test, however, involves verifying whether a particular construction
is resistant to syntactic manipulation. Firmly rooted in the lexicalist tradition, these tests are
abundant in the literature. For examples on how these tests have been used to establish
compoundhood, the reader is invited to consult, among others, Allen (1978), DiSciullo and
Williams (1987), Ten Hacken (1994), Bresnan and Mchombo (1995), Bauer (1998), and Lieber
and Scalise (2007). For the application of these tests on French compounds, one should consult,
among others, Barbaud (1971), Gross (1988), Riegel (1988) and (1991), Anscombre (1999), and
Arnaud (2003). It should be noted that in many cases, these tests target the non-head constituent
as it is usually a compound’s most syntactically isolated unit (Bauer 1998). Of the numerous
tests proposed to identify compounds, three in particular stand out, all of which rely on the same
principle, namely the syntactic atomicity of words. These tests are summarized below as criteria
and are illustrated using examples from French.
The first test is most often used for instances of compounds that might otherwise be considered
nominal phrases with an adjectival modifier, but it may also be used with constructions
containing nouns. Because compounds function as single units, it is not typically possible to
modify the individual constituents—instead the modification must apply to the entire
compound.
(15) Criterion 1: A compound’s constituents may not undergo modification
a. sage-femme *[sage [jeune femme]] → [jeune [sage femme]]
*[[très sage] femme]
b. bureau de poste *[[bureau climatisé] de poste] → [[bureau de poste] climatisé]
The second test involves coordinating the elements of a compound with those of others. While
phrases typically allow for units to be coordinated, compounds do not allow for coordination to
occur, where such an operation produces either an ungrammatical construction (as in 16a) or a
strange or incorrect interpretation (as in 16b).
59
(16) Criterion 2: A compound’s constituents may not undergo coordination
a. *son beau-frère et père → son beau-frère et (son) beau-père
b. ?des bancs de sable et de neige → des bancs de sable et des bancs de neige
While the coordination (16a) is unacceptable, the one in (16b) gives rise to a strange,
contradictory reading, namely that the bank is made of both sand and snow. It should be noted
that the coordination test may also produce ungrammatical constructions for phrases if the
elements being coordinated are not semantically related (e.g. *an artificial heart and island).
The third test verifies whether a constituent is in fact independent by attempting to refer to it
using an anaphoric pronoun. Although most phrasal elements may be referenced in this manner,
a compound’s constituents may not function as antecedents.
(17) Criterion 3: A compound’s constituents may not serve as a reference for an anaphoric
pronoun
a. *C’était un délicieux café crèmei même s’il y eni avait pas assez.
b. *La base de donnéesj est maintenant en ligne; ellesj appuieront votre recherche.
Many other tests have also been proposed under a variety of names, most of which involve
similar syntactic based constraints to those just discussed. Their purpose, however, remains the
same: to distinguish between a freely formed phrase and a compound. A few questions
regarding the use of such tests do arise, however. First, how many of these criteria should apply
before a decision may be made regarding the status of a particular construction? In other words,
if we limit ourselves to the three syntactic tests above, must a combination conform to all of
them in order for it to be considered a compound, or is one perhaps sufficient? Second, are these
tests truly conclusive? In other words, if a particular combination fails to meet a series of
stipulated criteria, does this guarantee that it isn’t actually a compound? Bauer (1998),
examining a set of seven frequently discussed tests for identifying NN compounds in English,
including those discussed above, concludes that “none of the possible criteria gives a reliable
distinction between two types of construction” (78). His study shows that some compounds fare
better with certain tests than others, but that the overall degree of correspondence may not be
substantial enough to be used to categorically deny or affirm compoundhood.
60
Regardless of these issues, syntactic criteria has arguably served as the prevailing method for
distinguishing between compounds and phrases. Of course, one must first establish which types
of constructions warrant testing. In English, the focus has largely been on pairs of appositional
nouns or nouns with an adjectival modifier. When considering other languages, however, the
focus may be shifted to other types of multi-word constructions. As we will see in the next
section, this is certainly the case for French.
3.1.1.2 Compounding in French
Darmester’s (1874) treatise Traité de la formation des mots composés dans la langue française
is considered one of the earliest works to exhaustively look a French word-formation. Although
he uses the term mot composé, it should be noted that his usage covers a broad range of
operations, some of which would no longer be considered instances of compounding. The four
major classes of compounds identified by Darmester are as follows:
(18) a. compounds by juxtaposition coffre-fort, blanc-bec, toujours
b. compounds with a particle malheureux, biscuit, non-pareil
c. true compounds chou-fleur, arrière-cour, portefeuille
d. compounds from other languages jurisprudence, acrobate, auberge
The classification above may give the impression that Darmester’s study was perhaps too broad
and insufficiently granular, but this is not in fact the case. The categories above represent his
major types, each of which includes many subtypes based on a diverse set of criteria. His
treatise offers a rich description of word-formation in French, but, given its 19th century origins,
it relies heavily on etymological evidence for its classification, often retaining as compounds
items that few speakers would ever submit to decomposition (e.g. biscuit, from bis-cuit =
‘galette cuit deux fois’).
More recent research on French compounds has instead focused on grouping together
compounds based on the lexical categories of their elements. Gross (1988), considering only
nominal compounds consisting of no more than two major lexical categories, proposes 26 types.
His typology is reproduced in the following table:
61
Table 3.1. Gross’s (1988) typology of French nominal compounds.
Type Example(s) Type Example(s)
N de N une pomme de terre, un coup de force V Conj V un va-et-vient
NAdj un cordon-bleu, un cercle vicieux VAdv un frappe-devant AdjN un blanc-bec, un grand ensemble à N un à-coup, un à-côté NN un café-filtre, un cheval-vapeur contre N un contre-projet, une contre-allée Npartprés un poisson volant, un chat-huant sur N du sur-place, le sur-moi N par N la preuve par neuf sans N un sans-culottes, un sans-abri
N en N un arc-en-ciel, une entrée en fonction arrière N une arrière-saison, un arrière-train
N à N une pelle à gâteau, une roue à aubes avant N un avant-projet, une avant-scène
N Prép N de la sculpture sur bois Prép Pro un chez-soi VN un gratte-papier, un crève-cœur Prép Adv un en-avant V Prép Inf un pince-sans-rire Adv Partprés un bien-pensant V Prép N un tire-au-flanc Adj Prép N un haut-de-forme Vimpér Pro un rendez-vous Numér N un trois-pièces, un dix-tonnes
Using Gross’s work as the foundation for his own study of French compounds, Mathieu-Colas
(1996) proposes what amounts to the most exhaustive attempt at cataloguing nominal
compounds using lexical categories. His typology includes 17 major classes alongside 8
complementary classes (for compounds containing more than two lexemes), each of which
contains numerous sub-classes. The typology is said to consist of over 700 compound types.
What Mathieu-Colas’s classification thus reveals is that nearly all lexical categories may be
combined to produce a nominal compound. No doubt due to its unwieldy nature, Mathieu-
Colas’s typology has not been widely adopted, though it does remain useful when distinguishing
between certain subsets of compounds (e.g. Adjective + Nominalized Participle = un mauvais
perdant ~ Adjective + Participle = un nouveau né). A number of smaller, more restrained
typologies of French compounds have been put forward in recent years (e.g. Zwanenburg 1992,
Corbin 1997, Brousseau and Nikiema 2001, Fradin 2009), many of which vary greatly in the
types of constructions they include. Moreover, classifying compounds based on the lexical
categories of their constituents remains pertinent for a number of different languages:
MorboComp, a multilingual database of compounds, contains 110 possible combinations based
on an analysis of 23 different languages (Scalise and Vogel 2010). Given its prevalence in the
62
literature, an approach based on lexical categories will also serve as the basis for my
investigation of French compounds.
3.1.1.3 Which Compounds Should We Investigate?
As was discussed in Section 3.1.1.1, a discussion of compounding first requires that we
distinguish between instances of morphological units and instances of freely formed phrasal
units. This step presumes, however, that some combinations may be either. Looking at the major
combination types present in French, we find that some are more likely to be acknowledged as
instances of compounds than others.
Just as in English, NN constructions are the most widely accepted type of compound for French
and other languages (for French, see, among others, Bauer 1978, Riegel 1988, Gross 1996,
Lesselingue 2003, Arnaud 2003, Takada 2008, Fradin 2009). Much has been said regarding
these types and the tests discussed in Section 3.1.1.1 lend support to treating the vast majority of
these combinations as compounds (see previous references). A second type of combination also
widely viewed as an instance of compounding in French is the VN construction, which is
typically understood as French’s synthetic compound (see, among others, Roussarie and
Villoing 2003, Villoing 2002, 2003, 2009; Rosenberg 2008, 2011). According to Corbin (1997),
these two types of constructions are the only true instances of compounds in French, all other
constructions being lexicalized phrases. But even the most widely of accepted types have been
denied compoundhood status by some: Noailly (1990) and Fradin (2003), for instance, argue for
a very restricted class of NN compounds, and DiSciullo and Williams (1987) consider most
nominal combinations, including VN constructions, syntactic words, that is to say, phrases
inserted in N position. Again, the question driving the debate rests, in most cases, upon the
morphological and syntactic distinction discussed earlier: which combinations are governed by
morphological principles and which obey syntactic rules? Despite the range of positions held on
the matter, there is nevertheless a sufficient body of work to support treating NN and VN
constructions as morphological objects and thus compounds.
Another major class of construction that raises questions regarding their status as compounds
are those involving adjectives. In French, adjectives may be preposed or postposed to the noun
63
they modify, a fact that manifests itself quite clearly in constructions typically viewed as
compounds, as the following examples show:
(19) a. AN rouge-gorge, belle-mère, petit fils
b. NA cordon bleu, maison blanche, accent aigu
The challenges posed by these types of construction are related to the question of where one
should draw the line between compound and freely formed phrase. After all, even if we wish to
use lexicalization as a possible criterion, we must still admit that they are first introduced as
phrasal units. Corbin (1992), who rejects any phrase-like combination, argues against AN and
NA compounds (see also Fradin 2003, Rosenberg 2007, Gaeta and Rica 2009). The
modification test discussed in the previous section, however, works well for these examples and
is therefore the typical method employed to label them as compounds (e.g. *un [cordon [bleu
foncé]]). Alternatively, we might also wish to assess these cases according to a set of semantic
criteria, such as compositionality. This approach is attractive, given how easily AN and NA
compounds allow for both literal and non-literal readings. Presumably, freely constructed AN
and NA phrases would only permit a literal reading. If we revisit the examples in (19) above, the
differences are quite clear:
(20) a. belle(-)mère = ‘beautiful mother’ or ‘mother-in-law’
b. cordon bleu = ‘a blue tie/string’ or ‘an excellent cook’
This approach, however, does pose a problem for studies in semantic transparency. Retaining
only those AN and NA constructions that possess a non-literal meaning will result in a study of
transparency that relies heavily on the least transparent constructions. Consequently, if this
criterion is deemed sufficient, it must also be applied across all types, discarding NN
combinations such as pause-café and auteur-compositeur in the process. Such a result is highly
undesirable as these types are arguably those that will most benefit from a typology of semantic
transparency, not to mention their status as prototypical compounds in French. Moreover, NA
and AN compounds differ significantly from other compound types in that the relationship
between their elements is seldom ambiguous. In other words, combinations involving an
adjective are largely all attributive in nature. Even in the most figurative or lexicalized instances,
the attributive relation held between the modifier and the head is typically retained. In cordon
64
bleu, for instance, bleu still modifies cordon (i.e. cordon qui est bleu), despite the fact that the
compound refers to an individual22. This stability greatly contrasts with constructions involving
two nouns, where the relation may be realized as a predicate with several possible values (Allen
1978).
Given these observations, as well as the overall difficulties associated with establishing
compoundhood for NA and AN combinations (see Van Goethem 2009 for a recent cross-
linguistic look at the many issues involved), these types are not included in this study. While
some may object to this methodological choice on the grounds that I am ignoring a potentially
rich set of data, the scope of this project must nevertheless remain narrow if it is to successfully
investigate multiple factors in semantic transparency. Compounds involving adjectives,
although no doubt relevant to the discussion, would only add additional complexity to an
already encumbered investigation.
Another type of French construction, not found in English but present in most Romance
languages23, involves two nouns linked by a preposition, such as lune de miel, moulin à vent,
arc-en-ciel, etc. These types have been treated as compounds by several different researchers
over the years (Giurescu 1975, Gross 1988, Anscombre 1990 and 1999, Bosredon et Tamba
1991, Mathieu-Colas 1996), but they have also had their share of detractors (Corbin 1997,
Rosenberg 2007, Booij 2007, Fradin 2009). Those who reject these types as compounds do so
for many of the same reasons that lead to the dismissal of AN and NA combinations, namely
that they are instantiated in the syntax. As Booij (2007) says:
“The structures N à N and N de N are instantiations of the syntactic structure [N PP]NP, a noun phrase consisting of a head N followed by a PP complement, and have developed into constructional idioms. Such phrases are functionally equivalent to compounds in Germanic languages, and that is why the mistake is made to consider them compounds.” (Booij 2007: 83)
22
The etymology of the construction confirms the attributive relation in which bleu modifies cordon: “Se dit figurément et par plaisanterie d'une cuisinière très-habile (Ac. 1835-1932); plaisanterie qui porte sur l'éminence du grade de cordon bleu et sur l'ancien tablier bleu des servantes” (TLFi). 23
For Spanish, see Rainer and Varela (1992); for Italian, see Scalise (1992).
65
According to Booij, this erroneous account of N Prep N constructions as compounds is largely a
matter of a misguided cross-equivalency, which is to say that because toolbox is a compound in
English, its French equivalent boîte à outils is also a compound, regardless of its structure.
Furthermore, both English and French are head-initial languages, a fact that secures toolbox’s
status as a compound given its right-headedness, but which calls into question boîte à outils’s
own status considering its distinctly phrase-like structure. While Booij’s original argument is no
doubt well-founded, it would also mean the rejection of AN and NA constructions as they are
also noun phrases. Yet he considers some AN English constructions compounds and discusses
the modification test as proof of this classification (e.g. [dark [blackboard]] and not *[[dark
black] board]]). If the modification test does in fact indicate compoundhood, then many N de N
and N à N constructions should be treated as compounds: compare, for instance, *[[boîte vide]
de conserve] and [[boîte de conserve] vide]. Moreover, as was shown in Section 3.1.1.1, these
types of constructions also typically fail both the coordination and the anaphora tests.
It is also worth noting that the N Prep N constructions usually taken to be instances of
compounds lack determination in the PP’s NP complement, a fact that is not typical of French
NPs24. Thus, there is a clear difference between otherwise identical pairs of nouns connected by
a preposition. The following examples from Cadiot (1997: 104) illustrate this point:
(21) a. bac (à + ?au) sable fin / bac (*à + au) sable mouillé
b. sac à dos / sac au dos
In (21a), Cadiot argues that the first construction instantiates a type, while the second a token. In
(21b), he discusses referentiality: dos is only referential in the definite construction. The contrast
between these constructions shows that the absence of a determiner has a clear semantic effect
on the whole. Whether this is truly evidence of compoundhood is unclear, but it arguably lends
support to the notion that N Prep N constructions without a determiner possess behavioural
qualities that set them apart from regular N PP phrases. Moreover, as many authors have already
24
A highly frequent and productive instance of bare nouns with PPs in French involves the partitive construction (e.g. un morceau de gâteau, un litre de lait, une pointe de pizza, etc.). Although the tests discussed earlier might suggest that in some cases they are compounds (e.g. un ?[morceau de [bon gateau]] ~ un [bon [morceau de gâteau]]), they are not typically viewed as such.
66
shown, many of the atomicity criteria advanced to determine compoundhood apply to these
constructions, which only further confuses the matter (e.g. *un sac à dos et à main).
As was stated at the outset, the purpose here is not to establish once and for all what constitutes
a compound, but rather to shed some light on the object of study itself so as limit the scope of
the analysis that will follow. Unsurprisingly, exploring how semantic transparency relates to
compounds requires that we first define a compound. The discussion has shown, however, that
such a definition is not without its challenges. Numerous compound types have been proposed
for French, but not everyone agrees on which of these constructions should be included in this
class.
As a means to skirt controversy, this work will primarily focus on attested NN compounds.
These constructions are not only the least likely to be confused with syntactic phrases, but also
the most widely studied type of compound in several different languages, including French.
Moreover, as previous research has shown, NN compounds do not overtly communicate the
relational association held between their constituents, a fact that distinguishes them from other
types, including N Prep N constructions where the linking unit is said to provide some relational
information (Cadiot 1997).
That said, in order to provide additional insight into the arguments and hypotheses presented in
the following chapters, I will also look at instances of N à N constructions involving bare nouns.
These will be treated here as compounds, despite the fact that this is not a widely held position.
The choice of N à N compounds over other N Prep N constructions is mainly due to two facts.
First, they have received a great deal of attention over the years and their treatment as
compounds is perhaps less controversial than it is for other N Prep N constructions (see, among
others, Anscombre 1990, 1999, Bosredon and Tamba 1991, Borillo 1996). Second, the
preposition à is widely considered more semantically restrictive than de, yet less so than other
prepositions. The analysis of N à N constructions will thus allow for a broader investigation of
semantic transparency while still retaining a narrow focus, especially as it relates to factors such
as headedness and semantic relations. Whether they are considered compounds may not in fact
have much of an impact on the observations and conclusions I make, as semantic transparency is
understood to apply to both syntactic and morphological objects.
67
The main criterion used in the selection of the compounds at the heart of this study is therefore
structural: pairs of appositional nouns with or without the preposition à between them.
Additional restrictions are introduced in the following sections, but these are primarily used to
ensure that the items under investigation are in fact composed of nouns. The data I will use to
support the hypotheses and theories explored in the following chapters come from Wiktionary
(this will be discussed in greater detail in Section 3.2.1). By relying on a single lexicographic
source, along with a purely structural criterion, I hope to avoid introducing unwanted bias into
the data. In other words, no personal judgments are made with regards to the types of
constructions examined, the result of which should be a data set containing compounds that span
the entire spectrum of semantic transparency.
3.2 Data: French Nominal Compounds
Online dictionaries are plentiful, but not all are readily accessible for research purposes. The
English language famously has WordNet (Fellbaum 1998), a lexical database that many would
consider the gold standard for electronic lexicographic research. WordNet has served as the
basis for many projects, primarily because of its rich ontology organized around groupings
called synsets. It has been used in works on word sense disambiguation (Li et al. 1995, Banerjee
and Pedersen 2002, Canas et al. 2003), data mining and text extraction (Tan et al. 2000,
Andreevskaia and Bergler 2006), as well as in research more closely related to the present work,
such as in the semantic analysis and automatic processing of compounds (Kim and Baldwin
2005, Costello et al. 2006). One of the key factors in the WordNet project’s appeal in secondary
research is its Application Programming Interface (API), a software layer that allows for anyone
to retrieve information from WordNet’s database from one’s own system. This API has, over the
years, been ported into a variety of programming languages (e.g. Java, PHP, Perl, etc.) and it is
precisely this degree of openness that has played a pivotal role in the development of a number
of third party tools25, making WordNet the de facto resource for lexicographic work in English.
French lexicography, despite its rich history (see Bavoux 2008 for a broad overview), has not
produced as successful a tool as WordNet for the French language. While there are many French
25
See related projects at <http://wordnet.princeton.edu/wordnet/related-projects>.
68
language dictionaries with a strong online presence, namely le Trésor de la langue française
informatisé (TLFi) and Le Petit Robert (Rey-Debove and Rey 2010), few offer a degree of
access similar to that of WordNet. This trend could be changing now that many French language
dictionaries are being developed as Web-only resources, such as Usito, first developed at the
University of Sherbrooke under the name Franqus (Cajolet-Laganière et al. 2010) and the Dire
Autrement project at the University of Ottawa (Hamel 2010). What most, if not all of these
online reference tools still lack, however, is free and open access to their lexical databases. This
effectively makes projects that target a specific subset of words or expressions, such as
compounds, difficult. In some cases, specialised online dictionaries have been developed to
address this need, such as with Mathieu-Colas’s (1995) collection of over 12,000 French
compounds. Access to the database, however, is limited to queries via the dictionary’s Web
interface. Even more problematic, the pre-defined manner of consultation of the data is crippled:
one can only view a maximum of 10 records per search and only a small amount of information
is provided for each entry (i.e. mainly plural and gender for a given compound). More recently,
the MorboComp group at the University of Bologna is working on a more expansive repository
of compounds via their CompoNet project, a multi-lingual database of compounds from over 20
different languages (Guevera et al. 2006). This resource is said to contain a great deal of
information on a variety of compounds, including labels for headedness, classification type, and
lexical categories, but at the time of writing, the project remains closed and its database
inaccessible to the general public.
There exists, however, an alternative online resource appropriate for lexicographic based
research. Wiktionary is one of the WikiMedia foundation’s projects, a sister site to the well-
known Wikipedia project. It functions in the same way as Wikipedia: it is an information
storehouse managed by the online community. Anyone can add a word to Wiktionary’s
database, modify an entry, improve a definition, reclassify a word, or even provide an
etymology. Of course, just like Wikipedia, Wiktionary is subject to some degree of vandalism
and inaccurate information, but this hasn’t stopped investigators from using the dictionary in a
variety of research projects (Zesch et al. 2008, Müller and Gurevych 2009, Navarro et al. 2009).
Wiktionary’s appeal is largely due to the impressive number of languages represented: the site
69
contains entries for 158 different languages, 22 of which contain over 100,000 articles26.
Perhaps more importantly, however, Wiktionary’s text is available under a Creative Commons
Attribution/Share-Alike Licence, which, among other things, means that the information it
contains can be used freely as long as attribution is stated. This openness is compounded by the
fact that all Wikimedia projects offer free access to an API with which to connect to its sites, as
well as frequent dumps of its databases so as to conduct local research using copies of the
information available online.
3.2.1 The Wiktionary Database
I downloaded an XML dump of the entire French language version of Wiktionary [version
20110204, Feb. 2nd 2011], which included nearly 2 million lexicographic entries. This XML file
weighed in at approximately 1.5 GB of data and needed to be converted into a much more
manageable format in order to be parsed. The file was therefore converted into a MySQL
database using the MWDumper Java application provided for exactly these purposes by the
WikiMedia foundation. Wiktionary’s UTF8 encoding was preserved so as to ensure that all
French accents and special characters would not be lost during the conversion.
Unfortunately, compounds are not identified as such in Wiktionary. The closest category to
compounds is “Locutions nominales en français,” but this class of items contains a wide range
of constructions from acronyms such as ADN and MMORPG to very long fixed expressions
such as la goutte d’eau qui fait déborder le vase and temps que les moins de 20 ans ne peuvent
pas connaître. The only real criterion for inclusion of a multiword lexeme seems to be its status
as a noun or that it be headed by one. There is also some degree of inconsistency with the use of
this particular label such that clé à molette is listed as a “Locutions nominales en français” and
clé à chaîne as “Nom commun en français.” Nevertheless, this category proved to be a first step
in identifying constructions that might be included in a database of French compounds.
Additional information needed to be added to the data dump in order to extract the words
labeled as “Locutions nominales en français,” mainly an SQL dump of the Categorylinks table.
This table allowed me to cross-reference individual pages with the Locutions category, which
26
See <http://en.Wiktionary.org/wiki/Wiktionary:Statistics> for additional statistics.
70
then allowed for a cross-referencing of the extracted words with the page revisions and any text
(definition, examples, etymology, etc.) associated with a particular word (see Appendix C for a
truncated version of Wiktionary’s database schema).
All words labeled as “Locutions nominales en français” were thus extracted from the
Wiktionary database, which produced 10,269 entries. This first group, however, only accounts
for part of the data. As I mentioned earlier, the dictionary presents a number of inconsistencies
(i.e. clé à chaîne is listed as a common noun, the same status assigned to simplex lexemes). A
quick search through the extracted locutions revealed that many common compounds were
missing from the data (e.g. moulin à vent, grand-père). Is there, then, an accurate way to
identify compounds in the Wiktionary database? Unfortunately, the answer is no. There are
however a few workarounds that allowed for the extraction of additional compounds: knowing
that Wiktionary encodes spaces in words as underscores and that hyphens are retained, I was
able to cross-reference these entries with the “Noms communs” category. This netted me an
additional 1,120 compounds separated by at least one space and 6,836 compounds containing a
hyphen. Unfortunately, there is no effective way to identify fused compounds (e.g. monsieur,
malpropre). This isn’t necessarily a problem as many of these fused compounds are sufficiently
lexicalized so as to not be decomposed by speakers. Of course, this is not true of all such
compounds, but given that Wiktionary’s typological schema does not allow for fused
compounds to be automatically identified, the decision was made to forego incorporating this
type into the present study. Regardless, the initial dataset thus contained 18,224 potential
compounds, a number that needed to be greatly reduced if any headway was to be made with the
corpus.
It is important to note that, despite its advantages, the decision to use Wiktionary to assemble
the compounds for this work still presents several issues that will no doubt raise concerns
regarding the validity of the data examined. As was mentioned in the previous section,
Wiktionary is a lexicographic resource driven by the public, which means that the information
included in its database may not always be based on the same methodological rigour adopted by
traditional dictionaries. The result is not only noise in the source data (e.g. mislabeled items,
incorrect information, etc.), but also the possibility of questionable entries making their way into
the final dataset. Such entries may take the form of novel words that have yet to gain widespread
usage or constructions that belong to only a small subset of French language speakers. While on
71
its face, such criticism is entirely warranted, it must be stressed that the purpose of the data is to
lend support to a study of semantic transparency that focuses on the interpretation of complex
units. In this regard, the data extracted from Wiktionary must simply meet two particular
criteria: one, that they may be said to belong to the French lexicon (i.e. consist of French words)
and two, that they possess meaning. Even in instances where an entry is deemed marginal or too
recent to be included in traditional lexicographic works, its semantic transparency may
nevertheless be assessed if it is said to possess a sense, regardless of its status within a broader
linguistic context.
Moreover, the French data extracted from Wiktionary cannot reliably be sorted according to
language variety, register, or style. In other words, there is no way to distinguish, for instance,
between compounds used in varieties spoken in Europe and those in Canada. While this
distinction is not taken into account in this work, it is understood, however, that the processes
behind compounding may differ between varieties of French, where one particular group may
make use of simple appositional nouns (e.g. fr. hex. arrêt-maladie), while another may prefer
variants involving a preposition (e.g. fr. can. congé de maladie). Once again, the most important
factor in the retention of entries is that the compound’s constituents (and not necessarily the
compounds themselves) be members of a standard French lexicon (i.e. be included in a standard
French lexicographical work). Consequently, the work presented in this thesis sets aside the
effects, if any, that language varieties may have on semantic transparency. Again, given that the
focus here is on the synchronic interpretation of compounds from a speaker independent
perspective, these differences are unlikely to have a considerable effect on the findings
presented in this work.
3.3 Selecting Compounds and Cleaning up the Data
In order to reduce the scope of this project and to ensure some measure of feasibility, not all of
the 18224 entries extracted from Wiktionary could be retained. The next step therefore consisted
of removing any undesirable candidates from this first list. This was done using Google
72
Refine27, version 2.5, a spreadsheet-like application that allows for quick and easy data
manipulation.
3.3.1 Which Compounds to Include?
Some decisions had to be made regarding the types of compounds that were to be included in
the final database. This project was never intended to be an exhaustive repository of compounds,
but rather a consistent compilation of a subset of multi-word lexemes to be used for the purposes
of research on the semantics of compounds. The data to be retained are therefore selected
according to the following criteria.
First, because this thesis’s object of study is the nominal compound, the principal criterion for
inclusion is a compound’s status as a noun. Second, only those compounds that are in fact
binary constructions, that is to say, compounds constructed with no more than two semantically
full constituents, are to be included in the database. This particular criterion is meant to allow
for compounds with constituents joined by a preposition (e.g. moulin à vent, haut de forme).
There are, however, a number of constructions in the initial dataset that follow a more syntactic
structure, namely as N PPs where a determiner is present (e.g. tout à la rue, base de l’économie,
voix dans le champ, etc.). Although these types are not the focus of my study of semantic
transparency, I will presume nothing regarding their status alongside compounds containing
bare nouns, that is to say, they will be treated as compounds for the purpose of populating the
database. The primary focus of this work, however, will be on compounds where the determiner
is in fact absent.
Furthermore, I chose to set aside compounds containing onomatopoeias (e.g. pan-pan),
acronyms (e.g. langage XML) or loan words (e.g. curriculum vitae). The latter type was
discarded because its constituents are not, in most cases, lexemes in French and are therefore
seldom meaning units for French speakers. The former cases will be ignored because, on the one
hand, onomatopoeias do not bear any meaning and, on the other, acronyms are in fact multi-
word lexemes themselves. The data also contained compounds constructed on single letters such
27
See: <http://code.google.com/p/google-refine/>. The project has since been renamed to Open Refine (<http://openrefine.org/>).
73
as h muet, v lingual and hauteur d’x. These entries were also discarded as, once again, it is
unclear that these constituents carry meaning in the same way that simple lexemes do.
Finally, compounds containing proper nouns were also rejected. One reason for their exclusion
is that proper names, unlike common nouns, have no real meaning: they can only serve as a
reference to something in the real world and in many cases rely heavily on extralinguistic
information. Most compounds containing proper nouns refer to a specific place or person (e.g.
cheval de Troie, acide de Bronsted). Not only do these compounds require that the speaker have
specific knowledge of these individuals and locations, but many are also used as labels for
particular entities. In other words, a compound such as Océan Atlantique is used specifically to
refer to an entity and cannot therefore be said to have meaning. This is in stark contrast to
compounds that contain only common nouns (e.g. beau-frère, mot de passe), which are used as
generic labels that may have both a denotation and a reference.
In summary, the set of compounds retained for the database portion of this project corresponds
roughly to classes VII through XV28 of Mathieu-Colas’s typology (1996: 72), as shown in the
following table:
Table 3.2. Major classes from Mathieu-Colas’s typology retained for the present study.
Class Construction Examples
VII Composés sur VERBES tire-bouchon, couche-tard, porte-à-faux
VIII Composés sur ADJECTIFS clair-obscur, haut de forme, franc-parler
IX Composés ADJECTIF + NOM beau-frère, mauvais perdant, haut-parleur
X Composés NOM + ADJECTIF trou noir, pigeon voyageur, cerf-volant
XI Composés NOM + NOM appareil-photo, maître cuisinier, sourd-muet
XII Composés NOM + de + X prise de sang, rond de cuir, dessous-de-plat
XIII Composés NOM + à + X brosse à dent, chair à canon, pomp à eau
XIV Composés NOM + en + X retour en arrière, arc-en-ciel, mise en garde
XV Composés NOM + AUTRES PRÉP + X preuve par neuf, vol sans escale, hockey sur glace
28
This distribution accounts for the majority of preserved compounds. The database, however, does contain a number of compounds that fall within the scope of other classes, such as those with numbers (e.g. un deux-roues and les dix commandements).
74
3.3.2 Reducing the initial dataset
The first step in reducing the number of entries was to remove the most easily identified
undesirable candidates. Because some of these entries contain similar constituents, there is some
degree of overlap. In total, 4,146 entries were discarded according to the following criteria:
i. 199 entries were not in fact compounds, but instead single lexeme that had been
mislabeled (e.g. jalon, durabilité). Some of these entries were simply single string
acronyms (e.g. BD, ARN), while a few were fused variants of hyphenated compounds
labeled as “Locutions nominales en français” in Wiktionary (e.g. basselisse,
bonnevoglie).
ii. 352 entries contained at least one acronym (e.g. ADN chimère, système DORIS) or a
single letter abbreviation (e.g. bombe H, J-pop). These entries were identified using
regular expressions29: any sequence of more than two uppercase letters, any isolated
single uppercase letter, or any entry containing a period was flagged. Unfortunately, it is
difficult, if not impossible to identify any lowercase acronyms. These would be removed
manually later.
iii. 82 entries contained at least one numeric character (e.g. 100 mètres, Web 2.0).
iv. 32 entries contained non-Latin characters (i.e. Arabic loanwords, Greek letters, etc.)
v. 3,481 entries containing proper nouns were removed. Though it can be argued that some
of these entries should be retained because they contained capitalized words without
being actual proper names (e.g. Casque bleu, homme d’État), many of these entries were
duplicated elsewhere in lowercase and were thus retained in their common noun variants
(i.e. casque bleu, homme d’état).
Once the entries above were discarded, the next step consisted of removing any compound
comprised of more than two lexical words (following the criteria outlined in section 3.3.1). Any
29
Regular expressions are sequences of symbols that allow for the selection of character strings based on specific patterns.
75
entry containing 5 words or more was immediately discarded as these compounds necessarily
consist of more than two semantically full words (326 entries). I then removed any 4-word entry
that followed a similar pattern (e.g. chien-guide d’aveugle, laine lavée à chaud), but kept those
entries in which two words were linked by a preposition followed by a determiner (e.g. histoire
de l’entreprise, vente à l’événement), as already discussed in section 3.3.1 (513 entries).
All 3-word entries that did not satisfy the previously mentioned criteria were also discarded.
Thus, words such as huile à broche and agence d’architecture were retained, while entries like
intervalle éclair-son and parabole semi-cubique were discarded. Distinct prefixes, that is to say
those that have not been fused (e.g. parabole semi-cubique), were treated as lexemes at this
stage and were therefore discarded if they resulted in 3 word constructions (490 entries)30.
When there was any doubt as to whether a lexeme was in fact a prefix, the Petit Robert 2010
was consulted.
Following these exclusions, the candidate list was thus reduced to 12,914 entries. There
remained compounds, however, that needed to be discarded because they either consisted of
foreign words or contained prepositions in positions that cast doubt on their classification as
nominal compounds. There were also at least 194 duplicate entries because of variants in
spelling (e.g. belle-mère ~ belle mère), but these redundancies were merged into a single entry
according to the form that Wiktionary treated as the base (i.e. whichever entry the others pointed
to).
The simplest way to identify the remaining undesirable candidates was to label each of the
remaining compounds’ constituents with the appropriate lexical categories, a process that was
largely automated, but that nevertheless required a considerable amount of manual input.
3.4 Labeling the Entries
In order to further expand on the information associated with the entries, as well as to remove
any additional undesirable candidates, each individual lexeme needed to have its lexical
30
Many prefixes are in fact fused to other lexemes, making it impossible to identify them automatically. Most of these were discarded later on a case by case basis.
76
category identified. This time consuming task was facilitated with the help of a computer
assisted labeling system developed expressly for this project.
3.4.1 Automatically Assigning Lexical Categories
Labeling the constituents of the nearly 13,000 entries remaining in my data would prove
difficult and very time-consuming. Fortunately, Wiktionary’s API allows for the direct
extraction of information associated with each one of its dictionary entries. Unfortunately, this
API does not allow for the targeted extraction of lexical information, such as lexical category,
gender, definition, etc. I therefore wrote a series of APIs able to request and extract specific
information from within the text of a given dictionary entry. A simplified version of this parser
is available to the public on my personal website31. It supports both English and French
Wiktionary databases natively, can easily be adapted for other languages, and has been designed
to work with either the Web version of Wiktionary or a local copy of its database.
Wiktionary’s lexical categories for French lexemes are labeled within pairs of curly brackets. If
a word has multiple entries of the same lexical category, then “num” is used to distinguish
between acceptations of a word. When an entry is in fact an inflected form, “flex” is used within
the tag. Notice that the language is also identified:
(22) a. {{-nom-|fr}} = French noun
b. {{-verb-|fr}} = French verb
c. {{-adj-|fr}} = French adjective
d. {{-adv-|num=1|fr}} = First acceptation for a French adverb
e. {{-flex-adj-|fr}} = Inflected French adjective
Because Wiktionary has adopted a very “loose” standard for tagging words (presumably, to
facilitate contributions by laypersons), there is often a considerable amount of variation in the
format of these tags. This variation required that the parser be flexible in its identification of
category tags. For example, the following pairs of labels are functionally equivalent, despite
their superficial differences:
31
<http://www.igrec.ca/projects/wikparser>
77
(23) a. {{-nom-flex-|fr}} = {{-flex-nom-|fr}}
b. {{-adj-|fr|num=1}} = {{-adj-|num=1|fr}}
The first API function is a simple aggregation of lexical categories for a given lexeme. In other
words, a lexeme is fed to the API, which then scrapes the Wiktionary entry for that lexeme and
returns every possible lexical category associated with that particular word. Because I was not
interested in the number of acceptations for a given word, the parser fuses any redundant labels
(i.e. {{-nom-|num=1|fr}}{{-nom-|num=2|fr}} returns {{-nom-|fr}}). This function was used to
tag the first word of a compound. Because the first word was treated in a context independent
fashion, very little could be done to improve the accuracy of the automatic labeling without
incorporating a probabilistic model (i.e. the function returns multiple lexical categories for many
words). This first pass therefore required that the results be refined manually.
The second lexeme in a two-word compound, however, does benefit from some context:
because we are strictly interested in nominal compounds, only a subset of lexical categories are
likely as the second word (W2) for a given first word’s (W1) category. For instance, if W1 is an
adjective, W2 will most likely be a noun.32 Because Mathieu-Colas’s (1996) typology is far too
granular for the development of an automatic labeling script (nearly 700 distinct categories that
allow for practically every possible combination of lexical categories), I instead relied on the
much smaller compound typology proposed in Fradin (2009). This classification is reproduced
in Table 3.3 on the following page. According to Fradin’s nomenclature, PSTPT stands for past
participle, PRSTPT for present participle, and PTCP for simply participle. The bolded letters
indicate the category of the resulting compound; the grey boxes are used to highlight the
combinations that generate nominal compounds33.
32 W2 could also be either a verb (e.g. un bas-voler) or an adverb (e.g. un haute-contre), but these combinations are far less frequent than A+N nominal compounds. The function, as described, is not meant to be 100% accurate, but rather to reduce the amount of time required to manually label a consituent’s lexical category. 33
The examples in the table are those supplied by Fradin (2009). Despite the fact that his only examples of Adv N compounds are fused lexemes, these combinations remain valid for my data as I have a number of similar entries separated by a hyphen (ex. arrière-plan, haut-parleur).
78
Table 3.3. Fradin’s (2009: 420) categorial distribution for French compounds.
N A V ADV
N
N prêtre-ouvrier poisson-chat jupe-culotte
N coffre-fort guerre-froide ------------------- N < PSTPT chassé-croisé roulé-boulé
V maintenir saupoudrer
*
A
N basse-cour, beaux-arts ------------------- N < Vinf franc-parler faux marcher
A aigre-doux ------------------- A < PTCP nouveau-né faux-fuyant
* *
V N brise-glace tire-bouchon
N gagne-petit pète-sec
V saisir-arrêter
N couche-tard passe-partout
ADV
N malchance malheur
A malpropre bienheureux ------------------- A < PRSTPT moins disant malvoyant
V maltraiter bienvenir
_
Regarding compounds joined by a linking unit (i.e. boule de neige, moulin à vent), Fradin
(2009) does not include them in his typology because “they are instantiation [sic] of the
syntactic structure [N PP]NP, a noun phrase consisting of a head followed by a PP complement”
(419). This analysis is, of course, also true of other compounds listed in the table (i.e. AN, NA,
AA < PTCP, V ADV), but he admits that setting these aside in his typology would entail a
revision of the notion of compounding, an undertaking he does not wish to tackle in his article.
As I stated earlier in this chapter, all of these constructions (including N Prep N) are treated as
compounds in the present study, following the works of, among others, Bauer (1978), Gross
(1988) and Mathieu-Colas (1996).
Taking these characteristics into account, the API function written to identify the lexical
categories of W2 in a binary compound is based on the following rules, where the label on the
left indicates the grammatical category of W1 and the list on the right the possible categories for
W2:
79
(24) a. N => {{-(flex)-nom-|fr}}, {{-(flex)-adj-|fr}}, {{-flex-verb-|fr}}
b. A => {{-(flex)-nom-|fr}}, {{-verb-|fr}}
c. V => {{-(flex)-nom-|fr}}, {{-(flex)-adj-|fr}}, {{-adv-|fr}}
d. Adv => {{-(flex)-nom-|fr}}
In the data extracted from Wiktionary, there are a number of compounds where W1 is in fact a
preposition (e.g. après-midi, hors-sujet). The bulk of these compounds seem to be Prep-N
combinations, but because Mathieu-Colas (1996) lists other possible categories for W2, the
following rule was added to the lexical categories function:
(25) Prép => {{-(flex)-nom-|fr}}, {{-(flex)-adj-|fr}}, {{-(flex)-verb-|fr}}
The API function simply compares a given word’s categories as listed in Wiktionary with the
possible set stated by the rule and returns the results of the intersection. For instance, the lexeme
CHIEN lists both noun and adjective as attested categories, but if W1’s category is A[djective],
the function returns only {{-nom-|fr}} as the category for CHIEN. A similar function with rules
specific to those compounds linked together by prepositions was also written and used to
automatically label compounds such as rouge à lèvres and table des matières. The process is
largely inspired by a simple probabilistic bi-gram model (Manning and Schütze 1999), although
no probabilities are in fact used. Despite the limited predictive functionality of these rules, this
approach was nevertheless able to improve considerably the accuracy of the automatic labeling,
which meant that cleaning up the results took far less time than it did for those of W1. Some
work still needed to be done to ensure that words hadn’t been mislabeled, however, as the
intersected values often produced two or more possibilities, but on average, this approach
resulted in a third of the lexical category groups for W2 (for NN) and W3 (for N à N) than for
W1.
3.4.2 Which Lexical Category?
One of the major methodological quandaries associated with the identification of the lexical
categories for a given compound’s constituents has to do with the independent status of some
lexemes. For instance, while it can be said that the French word haut is primarily an adjective, it
can also be used as a noun (e.g. “Perché sur le haut d’un arbre”). These two particular
acceptations can also be observed within compounds:
80
(26) a. haut de forme → A de N (= ‘un chapeau qui est haut de forme’)
b. haut de chausses → N de N (= ‘le haut de la chaussette’)
While the examples above may seem relatively uncontroversial to some, there are other cases
where the assignment of a lexical category is not nearly as simple. Mathieu-Colas (1996)
attempts to reduce the potential for confusion, and perhaps even disagreement, by labeling the
compound’s constituents according to the independent lexical categories of the individual
lexemes and not according to the roles they play within the compound, but states that the latter
will nevertheless be retained in some form or another (121). For instance, haut de chausses in
(26b) is labeled A=n de N according to his methodology: the category on the left of the equal
sign indicates the lexeme’s contextually independent category, while the label on the right
specifies the lexeme’s category within the compound. In the case of adjectives, only those that
cannot easily be nominalized are labeled as such without any secondary category (ex. clair-
obscur → AA; Mathieu-Colas 1996: 121); otherwise, they are labeled according to the process
described above. This approach, however, presumes that a lemma has a primary lexical category
and that the process of identifying this category is a straightforward one. It can be argued that
verifying the etymology of a compound’s constituents in order to confirm which lexical
category should be treated as primary is a difficult and lengthy task, one that can sometimes lead
to the assignment of labels according to the personal biases of the researcher.
Other cases, however, prove even more troublesome, such as in the case of inflected verbs that
behave as either nouns or adjectives within a given compound. In Table 3.3 presented earlier,
for instance, Fradin (2009) identifies such cases similarly to Mathieu-Colas (1996), that is to say
by labeling the lexeme with both its lexical category within the compound, as well as its
independent category:
(27) a. un nouveau né → Fradin (2009): A A < PTCP
→ Mathieu-Colas (1996): A (ou A=adv) / Pp
b. un faux-fuyant → Fradin (2009): A-A < PTCP
→ Mathieu-Colas (1996): A (ou A=adv) / Pprés
Setting aside Mathieu-Colas’s indecisiveness with regards to the lexical category of the first
constituents, the compounds in (27) clearly pose a certain number of methodological problems.
81
For one, the constituents’ grammatical status within each compound is not entirely clear. In
other words, are né in (27a) and fuyant in (27b) adjectives by way of participles (as they are for
Fradin), or are they simply participles (as they are for Mathieu-Colas)? If we turn to another
similar compound, premier venu, the situation becomes even thornier when we look up venu in
the digital version of Le Petit Robert 2010 (henceforth LPR2010). The dictionary assigns both
adjective and noun as possible lexical categories for venu, citing compounds such as nouveau
venu and premier venu as justification for these labels. Similarly, LPR2010 labels né as an
adjective, again citing compounds as evidence of this category. It is not at all clear, however, if
there is independent evidence for these categories outside of the compound (i.e. ?un venu, ?un
animal né est...). The fact that other dictionaries provide different lexical categories for the same
lexemes only exacerbates the situation (Antidote RX v.5, for instance, lists venu and né as past
participles only). In order to ensure that the identification of a lexeme’s lexical category is both
consistent and accurate, the methodological approach adopted must be based on a relatively
small set of rules that is easily replicated.
The method for labeling a compound’s constituents that has therefore been adopted for this
work is similar to some degree to that of both Mathieu-Colas and Fradin, but with a somewhat
more rigid set of principles. First, a compound’s individual constituents will be labeled
according to their status within the compound if such a lexical category is attested elsewhere for
that particular lexeme. By attested, I mean identified as a possible category in LPR2010. If a
particular lexical category is not possible for a lexeme, then it is labeled according to its
independent category. The compounds haut de forme and haut de chausse are therefore labeled
in my data as they are in (26).
There is, however an exception to this practice, related in some fashion to the examples given in
(27). These are mostly cases where participles could be labeled as either a noun or an adjective
(by definition, a participle is in fact an inflected verb that functions as either one of those
categories). The reason for this exception stems from the manner in which LPR2010 identifies
the lexical categories of lexemes. In short, past participle and present participle are not listed
categories in LPR2010 (but they are in the TLFi and Antidote RX, for instance). I will therefore
label them Pprés or Ppass according to the previous guidelines, that is to say unless their role
within the compound is clearly attested elsewhere. In the case of premier venu, for instance,
although LPR2010 lists N as a possible category, it does so by virtue of its presence in
82
compounds such as premier/dernier venu. For this reason, these compounds will be labeled as N
Ppass constructions because N is not truly an independent category for the lexeme venu. This
approach does not seem all that controversial given that compounds such as chasse-ennui and
garde-frontière are traditionally identified as V-N compounds, even though these particular
word forms are not listed as such in LPR2010. If we return to the examples in (27), reproduced
in (28) below, the compounds are then labeled as follows:
(28) a. un nouveau né → A Ppass
b. un faux-fuyant → A Pprés
This principle also allows for Pprés or Ppass to be used in cases where the first constituent is a
noun and the second is most likely a participial adjective, such as in the following constructions:
(29) a. menu déroulant, poisson volant
b. yack grognant, navire quittant
In the case of the compounds in (29a), the right-most constituents are listed in LPR2010 as
adjectives, which results in a label of N A for those compounds. This method is also supported
by the fact that there exist similar constructions in which the constituents agree in gender (e.g.
barre déroulante, soucoupe volante), which would indicate that they are simply adjectives and
should be labeled as such. In the case of the compounds in (29b), however, the second
constituents are not listed as lexemes in LPR2010. This would therefore allow for them to be
labeled as Pprés, but this may potentially result in some inconsistencies across the data. For
instance, while LPR2010 does not list grognant as a lexeme, it seems to be a perfectly
acceptable adjective because we can easily find cases where it agrees in gender within NPs (i.e.
a Google search for “la * grognante” generates over 16,000 results, such as in la chèvre
grognante, l’horloge grognante). In order to avoid generating irregularities in the data, these
lexemes will therefore be labeled as adjectives if they are attested elsewhere, such as on Google.
In the case of navire quittant, also in (29b), a Google search reveals that quittante is nearly non-
existent (i.e. the search string “la * quittante” returns 9 results). This compound will thus be
labeled N Pprés.
Although the above guidelines are a good starting point, a few more principles need to be added
to ensure that the labeling of lexical categories remains coherent across the data. There are a
83
number of noun initial two-word compounds where the second constituent could be labeled as
either a noun or an adjective (e.g. raton laveur, ver rongeur). Again, all decisions are based on
LPR2010. If a lexeme is identified as a noun in its heading, but not as an adjective, it is labeled
as a noun; if, however, LPR2010 lists both adjective and noun as possible lexical categories for
a lexeme, the adjective label is retained if it is plausible that its function within the compound is
adjectival in nature:
(30) a. raton laveur → LPR2010: laveur, euse � n. → N N
b. ver rongeur → LPR2010: rongeur, euse � adj. et n. → N A
c. centre automobile → LPR2010: automobile � adj. et n.f → N A
There are, however, a number of borderline cases, lexemes that are identified solely as nouns in
LPR2010, but that have adjective identified as a lexical category somewhere within the entry.
Many of these cases are said to be adjectives based solely on their usage within a compound:
(31) mouche piqueur LPR2010: piqueur, euse � n.
While such a case could be labeled according to its secondary lexical category, there are a few
reasons why I chose to label them according to my original principle, that is to say according to
the attested lexical categories identified via the entry’s heading (the compound in 31 being
treated as an instance of NN). First, as is the case in (31), there is no gender agreement between
the two lexemes (?mouche piqueuse), a morphosyntactic operation that would normally apply if
the second lexeme were in fact an adjective. This is not, however, an infallible indicator, as we
see in the case of baleine tueuse: although there is gender agreement, LPR2010 does not list
tueur as an adjective, neither in its heading, nor within its entry. In this case, tueuse is therefore
labeled as a noun, albeit as an inflected form; accordingly, the compound in (31) is also treated
as an instance of NN.
A second factor at play stems from the number of inconsistencies across entries in LPR2010,
incongruities that cast doubt on the status of the lexical categories only listed within the
dictionary article (and not in the heading). For one, there are a number of similar constructions
84
that LPR2010 labels simply as nouns in apposition and not fringe cases of NA compounds (as
opposed to the example in 31 for instance):
(32) a. éclair � n.m.
b. détaillant, ante � n
LPR2010 is also occasionally hesitant to assign a second lexical category within the entry when
there is room for debate, often opting for more than one possibility:
(33) a. pêcheur, euse � n.
b. surprise � n.f.
What the above examples show is that similar nouns are sometimes labeled differently within
LPR2010: in some constructions, lexemes are said to be adjectives even if they are not labeled
as such in the article heading (as in 31), while in others, they are said to be nouns apposed to one
another (32a-b). In other cases still, nouns are said to be either adjectives or apposed nouns
(33a-b). This disparity from one lexeme to another leads me to believe that it is justified to
identify a lexeme’s category based solely on those listed in the lexeme’s heading and not based
on those found within its entry.
85
To conclude this section, let us summarize the set of criteria used to label each compound’s
individual constituents:
(34) Criteria used to identify the lexical categories of a compound’s constituents
For a given compound AB:
i. A and B are assigned the lexical categories that best correspond to their roles within
the compound.
ii. Said lexical categories must be independently attested for both A and B outside of
the compound.
iii. If the lexical categories that best correspond to A and B’s role within the compound
is unattested outside of the compound, then they are labeled according to the most
prominent (i.e. the first) of their independently attested lexical categories as listed in
LPR2010.
iv. Attested lexical categories are only those listed in a lemma’s heading in LPR2010.
v. If A or B is a participle, it is labeled as such only under the following circumstances:
a. It is unclear whether the lexeme’s function within the compound is nominal
or adjectival or it is clearly neither; or
b. The lexical category listed in LPR2010 for the lexeme is motivated solely by
its presence within similar compounds; or
c. The lexeme is adjectival in nature and is neither listed as a lexeme in
LPR2010, nor attested elsewhere (i.e. cannot be found inflected on Google).
To further illustrate the method described above, the reader is asked to refer to the following
examples. Although this list is not meant to be exhaustive, it nevertheless gives a good idea of
the results obtained from the method described above.
(35) a. lave-vaisselle, garde robe, porte-clé → V-N
b. clair-obscur, douce-amère, grand largue → A-A
c. franc-parler, faux-ami, beaux-arts → A-N
d. menu déroulant, fait accompli, point cardinal → N A
e. raton laveur, avantage choc, carte mère → N N
h. bon d’achat, condamné à mort, barbe à papa → N Prép N
86
f. galant de nuit, fort en thème, haut en couleur → A Prép N
g. bon à tirer, prêt-à-porter, prêt-à-poster → A Prép V
h. navire quittant, chat huant, lanceur partant → N Pprés
i. laissé pour compte, achevé d’imprimer → Ppass Prép N
3.4.3 Cleaning Up the Remaining Data
Labeling each compound’s constituents for their lexical categories allowed for a second round
of candidate exclusion. According to the criteria outlined in Section 3.3.1, a number of
compounds were discarded for either being loanwords in their entirety (e.g. eng. fast food, viet.
banh canh, lat. ecce homo) or had at least one loanword as a constituent (e.g. mass-média,
a(c)qua-toffana). Compounds containing highly technical terms, and thus without any
corresponding entry in Wiktionary, were also removed (e.g. myxosomose des salmonides,
leucanie du roseau). Several compounds containing technical terms (e.g. acide cévadique)
remain in the data, however, because the constituents are listed items in Wiktionary and
therefore return a lexical category during the automatic labeling process. One such case are
compounds based on acide as the head noun (acide carboxylique, acide hypochlorique, acide
pneumique, etc.): of the 134 such compounds, 25 were retained. This selection was once again
done by consulting the corresponding entries in LPR2010. Any W2 adjective (e.g. carboxylique)
not listed in the dictionary was removed from the data. This was arguably an acceptable means
of reducing the number of technical compounds as it is likely that if such a lexeme is unlisted in
LPR2010, it is most likely not a widely used term and is therefore relatively unknown to all but
those who require the use of these highly technical terms. This was done only with the base
noun acide because it was such an egregious outlier in terms of patterning (the second highest
recurring noun in W1 position, coup, occurs 67 times or less than half of the occurrences
observed for acide). Other compounds still were discarded because they contained non-words,
that is to say words that can only be found within a given compound (e.g. stil de grain, tchic et
tchac, porte cochère). Finally, any remaining constructions containing single letters, in most
cases abbreviations, were also removed (e.g. p-acétylaminophénol, n-ième).
Furthermore, a number of entries listed as either nominal locutions or common nouns in
Wiktionary were not in fact nominal constructions at all. This mislabeling of expressions is
clearly evident from within Wiktionary entries themselves:
87
(36) a.
b.
It is obvious by looking at the example sentences provided by users for these particular entries
that the constructions are mislabeled and that the expressions in (36) are phrases rather than
nouns. These candidates and any similar entries were therefore also discarded.
There were also nearly 400 duplicate entries. The presence of these redundant compounds can
be traced back to variations in spelling, mostly because of a hyphenated form (e.g. écart-type ~
écart type, châtaigne de mer ~ châtaigne-de-mer, etc.). These entries were verified individually
against the data in Wiktionary. Any compound listed as a variant was discarded, but their
orthographic forms were retained as alternative spellings for the main entries. In some cases,
two variants were given two unrelated definitions in Wiktionary (e.g. bonne grâce ~ bonne-
grâce, boeuf carotte ~ boeuf-carotte, etc.). Both forms were retained as separate entries.
The data also contained a number of duplicate entries based on number, that is to say, many
entries were present in both their singular and plural forms (e.g. petite annonce ~ petites
annonces). A total of 115 inflected compounds were removed because an uninflected variant
was already present. This task was done using the clustering techniques available in Google
Refine, where similar strings of text were identified and grouped together. Each case was then
examined manually and any plural form considered redundant was removed from the data.
88
All in all, according to the above criteria, nearly 1,800 additional entries were set aside
following the labeling of lexical categories, which resulted in a total dataset of 10,410
compounds, all of which are stored in the database available online. There remained, however,
one final set of tags to be added to the data: gender and number. This was once again done using
software I wrote to extract the information from Wiktionary, but with an additional set of
functions capable of accessing Mathieu-Colas’s (1996) own database of compounds.
3.4.4 Gender and Number
In order to complete the basic set of information associated with each entry in the database (i.e.
all non-semantic related information), I needed to label each compound for both gender and
number. This information is included in Mathieu-Colas’s online database of compounds34.
Although the database of nominal compounds developed by Mathieu-Colas offers no way of
accessing its contents in bulk (only 10 entries at a time are available; see Figure 3.1 on the next
page for an example of the search output), the platform was built using a the very common
programming language PHP. This means that posting variables to its internal script is in fact
possible, which thus allows for users to submit values to its search engine outside of the
system’s Web enabled interface. I therefore wrote a small script that takes each entry in my
dataset and feeds it to the MC search engine, which returns an html page. The script then parses
the data and compares the information associated with the entry and determines whether the
given compound is masculine or feminine, as well as its number, which may have one of three
values: sing, pl, or invar.
34
<http://www-ldi.univ-paris13.fr/ODNC/moc.php>
89
Figure 3.1. The results screen for the compound café-filtre in Mathieu-Colas’s database.
The most interesting result to come out of this process, however, has to do with the very limited
overlap of entries between my dataset and that of Mathieu-Colas. Of the approximately 10,400
compounds remaining in my database, only 2,450 are also present in Mathieu-Colas’s data,
meaning the latter’s set could only be used to label a small fraction of the compounds contained
in my own (approximately 22%). This discrepancy is primarily due to the fact that Mathieu-
Colas’s database only contains hyphenated compounds. If we take this into account, the
resulting overlap is slightly improved: there are approximately 3,500 hyphenated compounds in
the data collected from Wiktionary, which results in a 70% correspondence between data sets.
Yet, the results also show that there are at least 9,000 compounds in Mathieu-Colas’s database
that are not present in my own. It is unfortunately difficult to determine what the cause of this
discrepancy is without gaining direct access to Mathieu-Cola’s original data.
Because only a mere 22% of my total compounds were also found in Mathieu-Colas’s database,
I had to modify my parser to extract the gender and number for each compound from
Wiktionary. Of the remaining 9,000 compounds, a little over 4,000 contained information on the
compound’s number in Wiktionary and approximately 7,000 contained information on its
gender. This meant that quite a few of the remaining entries needed to be labeled manually for
both number and gender. In the case of number, the remaining entries were labeled using pattern
matching for regularly inflected plural (i.e. -s and -x suffixes). As for the gender of the
90
unlabeled entries, knowledge of French compounds’ headedness helped tremendously. For
instance, N A and A N compounds can almost all be labeled for gender according to the nominal
constituent’s gender. As for N Prep N compounds, which accounted for more than a third of the
remaining entries, they are mostly left-headed (Bauer 1978), which meant that they could be
labeled automatically by extracting the gender of the first constituent from Wiktionary. The
remaining cases were all tagged manually.
3.5 Summary
After determining which types of compounds would best lend themselves to a study of semantic
transparency, I extracted more than 18,000 multi-word entries from Wiktionary’s database. The
number of entries was reduced according to several criteria. The result of this work is an online
database containing more than 10,000 nominal constructions composed of at least two lexemes
from several different lexical categories. Each entry contains the following information: lexical
category of its constituents, the gender and number of individual constituents (NN and N à N
compounds only), and the gender and number of the compound as a whole. The database is
searchable according to these features, but other parameters were added later so as to reflect the
research discussed in the following chapters. The reader is invited to visit www.polylexical.com
for a full listing of all compound types extracted. The entire dataset may also be downloaded as
a CSV file for personal use.
Of the thousands of constructions retained for the database, only a small subset are in fact
pertinent to the objectives of the present study. As stated at the close of Section 3.1.1.3, only NN
and N à N compounds are under investigation here. A query of the data retained from
Wiktionary reveals a total of 729 and 319 such types, respectively. These individual compounds
will therefore serve as the basis for the claims and hypotheses made in the remaining chapters of
this thesis, ultimately functioning as the foundation for the typology of semantic transparency
proposed at the close of this work.
91
Chapter 4
Compound Meaning: Features and Factors
Earlier in Chapter 2, I focused on how semantic transparency in compounding has traditionally
been understood and suggested that previous approaches could be improved upon. At its most
basic, semantic transparency is usually said to be a matter of compositionality: a compound is
semantically transparent if its meaning is the product of the meaning of its components. While
this position is not unsound, it fails to take into account a number of other characteristics that
have also been used elsewhere to formalize the semantics of compounding. Aware that other
factors are indeed at play, Libben (1998) expands upon traditional models by incorporating
headedness into the basic A+B = C view of compound semantics. My position on the matter is
that, although work like Libben’s represents a crucial and necessary step in the further
development of the concept, these approaches still fall short in their account of semantic
transparency. This chapter therefore seeks to complement previous models by introducing a
number of other factors that may prove useful in establishing the degree of semantic
transparency for a given compound (i.e. its degree of semantic interpretability).
This chapter is organised in the following manner. First, headedness (or centricity) is discussed
in Section 4.1 with an emphasis on how the head contributes meaning to the whole. Following
this is a brief discussion in Section 4.2 of compositionality and how the term is used in this
work. Finally, Section 4.3 explores semantic homogeneity, focusing on analogy and templates
as a possible means to further distinguish between individual compounds.
4.1 Centricity
Compounds, like other morphological and syntactic units, are typically headed constructions,
and identifying the head element is arguably the most crucial step in establishing meaning for a
given compound. According to Baroni et al.’s (2007) framework, head identification is in fact
the first step of this process:
92
“[I]n its simplest instantiation, interpreting a novel compound requires two interlocked processing steps:
i. identification of the head of the compound,
ii. interpretation of the contextually appropriate processing link between the head and its modifier, be it an argument relation, a property transfer or a conceptual hybridization.” (Baroni et al.: 265)
If this strategy is correct—and there is little reason to believe that it isn’t—then it suggests that
the head element is also an integral factor in a compound’s semantic transparency. This
approach to compound processing is largely backed up by a number of studies that show that
speakers are keenly aware of how a compound is organized internally and that the head element
is a dominant semantic marker for both novel and existing combinations. For instance, while
studies by Gleitman and Gleitman (1971), Ryder (1994), and Štekauer (2005) show that, on
occasion, a speaker will select the wrong head constituent when asked to produce paraphrases
for a novel compound (e.g. “clothes that are worn in the water” for clothes-water ; Ryder 1994:
189), the vast majority of participants correctly identify the language appropriate head element
when providing definitions. This fact is also supported by time-sensitive tasks: Libben et al.
(2003), who measured response times for lexical decision tasks involving compounds like
bedroom and cardshark, found that the latter (which they claim are opaque) produced longer
delays in response times than the former (which they claim are transparent). This difference may
also involve the headedness of the compound, as those producing longer response times had
heads that were unrelated to the meaning of the whole (i.e. a cardshark is not a literal shark).
Although headedness may be, in most cases, a relatively uncontroversial feature of compounds,
it is not without its quirks. The following sections will focus on how compounds may differ
from one another based on their centricity, which is to be understood as the property of a
compound to either possess a head or not.35 Following this discussion, the manner in which
centricity will factor into a typology of semantic transparency will then be made explicit.
35
As we will see in the following sections, this statement might lead one to believe that the issue is easily circumscribed. The notion of head, however, is not without its problems. This will be addressed throughout the remainder of Section 2.
93
4.1.1 Endocentric Compounds
The traditional view of compound centricity is that compounds either possess a head or they do
not. Following Bloomfield (1933), the former are typically called endocentric, while the latter
are called exocentric (or bahuvrihi in Sanskrit, Burrow 1955)36. I will use the term centricity to
refer to a compound’s status as either endocentric or exocentric.
In semantic terms, the head of a compound is the constituent that defines the conceptual class of
the whole compound. In other words, the head is the hypernym of the thing denoted by the
whole (Bauer 1978). Endocentric compounds have therefore been formalized by Allen (1978)
using an IS-A rule and may be tested in the following manner37:
(37) a. a table saw is a (*table / saw)
b. a credit card is a (*credit / card)
c. a truck driver is a (*truck / driver)
When applied to French compounds, this test reveals that, unlike English, the head in French
compounds is typically the left-most constituent (this will be discussed in greater detail in
Section 4.1.2):
(38) a. NN: un oiseau-mouche est (un oiseau / *une mouche)
b. N à N: un moulin à vent est (un moulin / *un vent)
c. N de N: un ver de terre est (un ver / *une terre)
In terms of formal features, the head element is also typically the constituent from which lexical
category is inherited. In the examples in (37) and (38) above, this is a trivial matter because both
constituents are nouns, but as the following compounds show, when the two constituents are of
different categories, the head does in fact determine the lexical category of the compound:
36
Although exocentricity is typically understood as “possessing a head external to the compound,” this position must be weakened slightly in order to account for cases involving sense extension. This will be explored in greater detail in Section 2.4. 37
In the literature, endocentric compounds are often described in far less rigid terms: AB is a kind of B. According to Arnaud (2008), this is a hyponymic test that sometimes produces less than desirable results (?a police car is a kind/type of car). That said, when the head prototypically denotes a highly general object, a hyponymic test can serve to better establish endocentricity (compare, for instance, lipstick is a stick ~ lipstick is a kind of stick).
94
(39) a. eng. [[black]A [board]N]N fr. [[coffre]N-[fort]A]N
b. eng. [[sea]N [green]A]A fr. [[vert]A [sapin]N]A
In French, as well as in many other romance languages, gender also typically percolates from
the head to the compound, but as we will see in Section 4.1.2, this behaviour may be altered
under certain conditions. For the moment, we may state that a compound is endocentric if one of
its constituents determines its conceptual class.
It is also worth noting that speakers may occasionally identify a constituent as the compound’s
head, thus interpreting it as endocentric, even though this may not in fact be the case. As Arnaud
(2008) points out, there may be disagreement between a speaker’s extra-linguistic knowledge
and scientific nomenclature. He uses watermelon as an example, explaining that it is not actually
a melon, but rather a melon-like fruit. This is similar to how speakers might interpret peanut,
which is not in fact a nut, but a legume. While Arnaud claims his informants hesitate when
asked if a watermelon is a melon or a melon-like fruit, suggesting that linguistic intuition may
influence head identification, it remains debatable whether this matters for most speakers
unfamiliar with the scientific taxonomy behind the labels. One can compare this with the lack of
consensus among laymen regarding the classification of tomatoes as either fruit or vegetables.
While analyzing the NN compounds in my data, I came across a number of cases, usually
involving plants, that show similar issues to those identified by Arnaud regarding centricity:
(40) houx-frelon, menthe-coq, laurier-tin
In all cases, the plant was given a name based on its appearance, but which was later revealed to
be of a different genus altogether. The following description from the fifth volume of Cours
complet d’agriculture shows how houx-frelon, for instance, received its erroneous label:
“Ce n’est point un houx ; la couleur et les épines dont les feuilles sont armées, lui ont mal à propos fait donner cette dénomination. Tournefort le place dans la seconde section de la première classe, qui comprend les herbes à fleur en grelot, dont le pistil devient un fruit mou ; et il l’appelle Ruscus, myrti-folius, aculeatus.” (Rozier et al. 1787, 531)
It seems reasonable to assume, however, that these types of compounds should be treated as
endocentric, as most native speakers unfamiliar with them would no doubt understand them as
such. In other words, to their knowledge, they would have interpreted the compound correctly,
95
thus understanding them as hyponyms of the constituent they assumed was the head (the
statement “a watermelon is a melon” would produce a “true” reading). In these particular
instances, it is far more advantageous to acknowledge that such distinctions are simply due to
differences in lexicon (i.e. technical vs. general) and that to the average speaker, compounds
such as peanut or watermelon are in fact endocentric and not exocentric.
4.1.2 Head position
In binary constructions such as those under investigation here, one element serves as the head,
while the other typically plays the role of either modifier or argument (Scalise and Bisetto
2009). Although Williams (1981) originally argued that the morphological head was the
rightmost element of a complex word (the so-called Right Hand Head Rule), it is now largely
accepted that the head is language dependent. The situation is no different for compounds.
Compounds in languages such as English, Dutch, and Chinese, for instance, are mostly right-
headed (Scalise and Guevera 2006, also see Lieber and Štekauer 2009 for a typology of
compounds in various languages), while those in most romance languages are left-headed
(Baroni et al. 2007 for Italian, Fradin 2009 for French, Rainer and Varela 1992 for Spanish)38.
In my data, French compounds clearly pattern with other Romance languages as 493 of the 564
(~ 87%) NN compounds with a clearly defined head39 are left-headed.
Although French compounds are typically left-headed, right-headed compounds are in fact
possible (see previous references for similar observations for other Romance languages). These
types are, however, exceptional and seem to be entirely consigned to the class of NN
constructions40:
38
According to a survey conducted by Bauer (2001a) for 36 different languages, there seems to be a slight overall preference for right-headed compounds. Likewise, Scalise and Fábregas’s (2010) investigation of compounds from 22 different languages shows a strong preference for right-headedness. 39
These numbers do not include compounds that incorporate a coordination of their elements. See Section 4.1.3 for a discussion of these particular types. 40
In the French compound data collected from Wiktionary, there are no cases of N à N right-headed compounds. This fact is discussed further in Chapter 6.
96
(41) a. une auto-école est (*une auto / une école)
b. un taupe-grillon est (*une taupe / un grillon)
c. un radio-taxi est (*une radio / un taxi)
Although it is not always clear why some compounds in French are right-headed, we may state
that in most cases, atypical centricity can be traced to one of the following sources:
(42) a. Calque from English quartier-maître (eng. quartermaster)
b. N1 with affixal functionality ciné-parc
c. N1 with adjectival functionality chef-lieu
The fact that French allows for a compound’s head to be either the left or right-most constituent
may produce atypical behaviour, mostly in how gender is assigned. Typically, an endocentric
compound acquires gender from its head constituent. Although most right-headed compounds
behave as they should, which is to say that gender is determined by the right-most constituent,
some show a difference in feature percolation that suggests that gender may, in some cases,
originate from the non-head constituent41. Compare, for instance, the right-headed compounds
in (43a-b) with those in (43c-d):
(43) a. [[taupe]F-[grillon]M]M lit. ‘mole-cricket’
b. [[vélo]M-[école]F]F lit. ‘bicycle-school’
c. [[bracelet]M-[montre]F]M lit. ‘strap-watch’ (wristwatch)
d. [[bateau]M-[école]F]M lit. ‘boat-school’
The compounds in (43a-b) behave as expected: gender percolates from the head, despite its
atypical position. This is in fact how most right-headed compounds in the data behave. In (43c-
d), however, gender appears to percolate from the left-most constituent, even though the head is
on the right. For (43c), it is entirely possible that this compound is in fact left-headed, which
corresponds to Arnaud’s (2003) treatment of it, but the centricity test described earlier makes
41
See Lieber (1980) for a formal description of feature percolation.
97
this interpretation untenable (i.e. *un bracelet montre est un bracelet)42. This fact is also
supported by the definitions found in most lexicographic works43. It is also worth noting that
plural is marked on both constituents (i.e. bracelets-montres), suggesting that this compound
might in fact be a case of coordination, but this analysis also seems incorrect given the failed IS-
A test described above. It therefore appears to be a case where some features percolate from an
element other than the head. The same may be said of bateau-école in (43d), although this
compound stands in stark contrast with the one in (43b), which, for all intents and purposes, it
should pattern with, yet doesn’t. The disparity between the two analogous compounds might be
explained in a number of ways. First, masculine is often considered the unmarked gender, as
evidenced by exocentric compounds that otherwise contain feminine nouns (e.g. rouge-gorge,
en-tête). Second, it is entirely possible that bateau-école (‘boat school’) is subject to influence
from the homonymous left-headed bateau-école (‘school boat’) because of its non-standard
head position. This seems all the more plausible given that there are a number of results on
Google for the string “une bateau-école,” which suggests that some speakers, aware that the
head is the right-most constituent, assign gender accordingly. That said, the difference in
occurrences is significant enough that it seems unlikely that gender should be said in this case to
percolate from the head constituent44.
Given that French favours left-headed compounds while also allowing rightward heads, but that
right-headed compounds occasionally behave in non-intuitive ways, it seems reasonable to hold
that head position is a factor in compound transparency. Following Scalise and Fábregas (2010),
who argue that head position should be understood as a tendency, and not an absolute, I propose
that endocentric compounds be further characterized according to a language’s canonical and
non-canonical head position45. It is important to add that this parameter is not necessarily fixed
42
It is worth noting that bracelet in this compound is in fact redundant as a watch is usually understood to possess a strap (compare with pocket watch). 43
“Montre montée sur un bracelet de cuir, de métal ou de matière plastique” (LPR2010). 44
Google search for “un bateau-école” = 423,000 results; Google search for “une bateau-école” = 977 results. 45
This statement does not preclude the fact that a language may favour neither position, but to my knowledge, no one has reported on such a fact. If such a language did exist, then head-position might be considered a neutral factor in semantic transparency.
98
for a given language, as specific compound types may favour a particular head position over
another. While left is the canonical head position for NN compounds in French, it is right for A-
N nominal compounds (e.g. beau-père, chauve-souris, longue-vue, etc.)46. We must therefore
also evaluate canonical head based on the dominant position for a given compound type.
The question, of course, is whether speakers are actually aware of this distinction and whether
transparency is at all affected by it? Though not very numerous, there are some studies that lend
support to head position as a factor in compound processing. Most notably, Jarema et al. (1999)
tested priming effects for a variety of compounds and found that reaction times for right-headed
French compounds differed from left-headed ones based on which constituent was primed.
Crucially, the priming effect observed for all compound types was greatest for the initial
constituent, regardless of its transparency, except for right-headed compounds, where priming
the final constituent resulted in a greater priming effect (see Section 2.2.1 for more details on
Jarema et al.’s experiment). Their results suggest that speakers are in fact sensitive to head
position during lexical decision tasks. These results, however, are mitigated by the fact that
Jarema et al.’s stimuli were composed of N-A and A-N compounds, which differ in terms of
canonical head position. A study in which the stimuli consist of constructions that typically
favour the same position for the head would perhaps produce different results, shedding
additional light on how speakers process compounds that follow atypical patterns.
Intuitively, it may be argued that compounds with non-canonical heads are nevertheless
unexpected, which may affect how speakers interpret them. In fact, a novel non-canonically
headed compound will most likely first be understood based on the element in canonical head
position and may only be reanalysed if the first interpretation is deemed impossible or unlikely.
I therefore posit that compounds with canonical heads are more transparent than those with non-
canonical heads, with the understanding that this stipulation remains hypothetical until tested.
Nevertheless, the data suggest that such a distinction should be considered as a possible factor in
a compound’s degree of transparency given that heads are subject to relatively strict positional
restrictions.
46
There are also left-headed A-N compounds, mostly involving colours (e.g. vert sapin, rouge sang). It is unclear if an exception regarding canonical head position should be made for these cases. This is certainly something worth investigating further.
99
4.1.3 Coordinated Compounds
Further complicating matters regarding centricity are compounds involving the coordination of
their elements. Traditionally called dvandva, these types of compounds are composed of two
lexemes that denote, to varying degrees, two things, aspects, or features of equal status, which,
under certain circumstances, might allow for both constituents to function as heads. They are
typically understood as a conjunction of elements (i.e. and), but may also involve a disjunction
(i.e. or). These particular types of compounds have received a number of different labels over
the years, most of which overlap in non-trivial ways: appositional (Jespersen 1956, Bauer
1978), coordinate or coordinative (Bisetto and Scalise 2005, Arcodia et al. 2010), co-
compounds (Wälchli 2005), copulative (Olsen 2001), additive (Marchand 1969). I will opt for
the term coordinated compound as it is the most neutral and is easily opposed to compounds that
involve an asymmetrical relationship (cf. Bisetto and Scalise 2005).
Of interest here is the fact that, although many compounds involving the coordination of their
elements are exocentric and thus possess no head (Bauer 2008a), a few might be said to possess
two heads. Take for instance, the following examples from French:
(44) auteur-compositeur, café-bar, chargeuse-pelleteuse
The compounds in (44) are all no doubt endocentric, but the anomaly here is that both
constituents seem to produce acceptable results for the IS-A test (as in 45a), which is typically
also true of their English analogues (45b):
(45) a. un auteur-compositeur est un (auteur/compositeur)
b. a singer-songwriter is a (singer/songwriter)
Arnaud (2008) calls these cases bi-centric, an apt label that reflects the opinion of some that
these types do in fact possess two heads (Bisetto and Scalise 2005, Scalise and Guevera 2006).
Bauer (2008a), in his typology of dvandva compounds, argues that these are in fact appositional
compounds (and not true dvandvas) by the very fact that they are headed. They are thus opposed
to other cases where a coordination of some sort is involved, but which are exocentric. This
distinction, however, is not always made, the result of which is a class of compounds
characterized by a great deal of variation (Olsen 2001, Bisetto and Scalise 2005).
100
Other than the results of the centricity test in (45) above, there is another indication that lends
further support to a bi-centric approach to coordinated compounds, namely that, in Romance
languages, both constituents are typically pluralized (46a-c). This evidence is somewhat
weakened, however, given the fact that their Germanic counterparts, among others, are typically
only inflected on the right-most constituent (46d-f):
(46) a. fr. auteur-compositeur, auteurs-compositeurs Zwanenburg (1992)
b. sp. poeta-pintor, poetas-pintores Rainer and Varela (1992)
c. it. actor-encenador, actores-encenadores Scalise (1992)
d. en. writer-director, writer-directors Olsen (2001)
e. nl. leerling-verpleegster, leerling-verpleegsters Booij (1992)
f. de. Linguiste-Psychologe, Linguist-Psychologen Olsen (2001)
All of the compounds in (46) are similar semantically, yet inflectional marking differs between
groups. This is in part due to the fact that Germanic languages don’t typically allow for
inflection within compounds, even when the non-head is plural in isolation (i.e. scissors,
scissor-sharpener). It would seem strange, then, to claim that one set is bi-centric, while the
other is not based solely on this particular difference. Perhaps an even stronger arguement
against inflection as a property of bi-centricity is that many French NN compounds are dually
marked for plural despite the lack of coordination between their elements (i.e. chou-fleur,
choux-fleurs).
There are indications, however, that while these types of compounds are no doubt semantically
headed, they may not in fact possess two heads. Most notably, in languages with nominal
gender, coordinated compounds nearly always inherit the gender of the constituent in canonical
head position in cases where gender differs between constituents. This behaviour, illustrated in
(47) below, suggests that they might actually be regular endocentric constructions:
(47) a. [[bain]M-[douche]F]M
b. [[grave]F-[ciment]M]F
Few coordinated compounds are actually mismatched for gender: 78 of the 105 endocentric NN
compounds in my data that involve some form of coordination contain elements of the same
gender. Most of the remaining compounds behave according to typical percolation conventions.
101
The few instances where coordinated compounds show a disparity in how gender percolates can
all be explained via other means. Consider the following examples:
(48) a. [[huppe]F-[col]M]M ‘un oiseau avec une huppe et un col’
b. [[radio]F-[réveil/gramophone/phonographe]M]M
The compound in (48a) is in fact exocentric and is therefore most likely acquiring gender from
the external head (i.e. oiseau, cf. rouge-gorge). In (48b), radio, because of its widespread usage
as a prefix, may not be available for feature percolation, although this is purely speculative.
Regardless of these few anomalies, gender percolation, as I showed in Section 4.1.2, is not
always an accurate indicator of head as some compounds seem to inherit the gender of their
non-head constituent when the head is in a non-canonical position. One might argue, however,
that some of these compounds are perhaps not truly coordinated and that the left-most
constituent is of far greater morphological—and perhaps semantic—importance than the other,
thus rendering them single-headed. This is precisely what Scalise and Fábregas (2010: 121)
argue for in the case of it. prete-operaio ‘priest-worker’, which, despite the apparent
coordination between its elements, is left-headed because prete has more “semantic” weight
than operaio. We may make similar claims regarding the compounds in (49) below, which bear
some resemblance to those in (47), but which show a greater degree of semantic asymmetry47:
(49) a. [[cotton]M-[poudre]F]M ‘cotton-powder’ = ‘cotton that serves as gunpowder’
b. [[mémoire]F-[tampon]M]F ‘memory-buffer’ = ‘memory that serves as a buffer’
One distinguishing characteristic between these sets of compounds, according to Arnaud (2008),
is that true bi-centric compounds typically involve co-hyponyms, which is why those in (49) are
likely to possess a single head (cf. child-soldier, girlfriend, etc.). Yet it is entirely possible for a
coordinated compound to involve co-hyponyms of different genders (such as those in 47 above,
as well as baladeur-radio, bistro-brasserie, location-financement) and which would therefore
have an effect on how the feature percolates. We may assume, however, that this is the result of
a very basic constraint imposed by a language such as French: a noun may only have one gender
47
These types of compounds will be taken up in Chapter 5 during the discussion of compound relations.
102
and it is most natural for a compound to inherit the gender of whichever constituent is in the
canonical head position whenever this feature differs between elements. Feature percolation is
therefore blocked as per Lieber’s (1980) Feature Percolation Conventions because nodes
acquire their features from the head stem in sister configurations.
Despite some of these inconsistencies, either constituent of endocentric coordinated compounds
may meet the criteria of semantic head, which, a priori, sets them apart from other compound
types (cf. Bisetto and Scalise’s 2005 typology of compounds). If identifying the head element is
a crucial step in compound interpretation, what can we say about compounds that might
technically have two heads? Unfortunately, to my knowledge, there are no studies on how
speakers interpret coordinated compounds48. It is undeniable that compounds like those in (46)
above are endocentric in nature, but it is unclear if the coordination of their elements and the
fact that their constituents are of equal status has an effect on transparency. As we will see in
Chapter 5, however, these types of compounds are in fact numerous and account for a non-
negligible part of French NN compounds (cf. Arnaud 2003).
I will posit that such compounds are largely no different than single-headed compounds in terms
of semantic transparency and that any additional processing they require involves relational
information, which is to say one based on coordination. This position does not, however,
prevent additional characteristics such as gender percolation from being incorporated into the
definition of headedness, thereby allowing for the model to account for instances where features
do not behave as expected. Moreover, coordinated compounds may also set themselves apart
according to the fact that both constituents contribute to the meaning of the whole—it is
impossible for a coordinated compound to contain a semantically unrelated element49. This
aspect is pertinent to the discussion of compositionality in Section 4.2. As for the particular
relational properties exhibited by these compounds (i.e. coordination), this will be a key
48
See Wisniewski (1996) for studies where participants made use of hybridization (i.e. the amalgamation of the two constituents) to interpret compounds with conceptually similar constituents. One of the two constituents, however, was typically identified as the head. 49
In Wälchli’s (2005) semantic classification of co-compound, the Ornamental class is said to involve compounds in which a constituent offers no semantic contribution to the whole (e.g. in Erza Mordvin, ve’e-sado = ‘village-hundred’ = ‘village’). I agree with Bauer (2008a), however, that such compounds are most likely best treated as non-coordinated.
103
component of next chapter’s discussion of the semantic tissue that ties together the elements of
compounds. In terms of headedness, such compounds will simply be treated as endocentric and
may therefore be opposed to similar cases that are clearly exocentric (cf. Bisetto and Scalise
2005, Scalise and Bisetto 2009).
4.1.4 Exocentric Compounds
As stated at the outset, not all compounds are endocentric. In the traditional view, a compound
is exocentric when it fails the IS-A test for either constituent, as in the examples below:
(50) a. eng. a redcoat is a (*red / *coat) fr. un rouge-gorge est un(e) (*rouge / *gorge)
b. eng. a birdbrain is a (*bird / *brain) fr. un cheval-vapeur est un(e) (*cheval / *vapeur)
The test reveals, quite convincingly, that not all compounds possess a head in the semantic
sense. Not only do these particular compounds fail the endocentricity test described in Section
4.1.1, they also fail the less stringent hyponymic test mentioned earlier. The following French
NN and N à N compounds show that even a weakened centricity test cannot save them from
their exocentric status:
(51) a. le ballon-panier est [une sorte de] (*ballon / *panier) = sport
b. un jambon-beurre est [une sorte de] (*jambon / *beurre) = sandwich
c. une barbe à papa est [une sorte de] (*barbe / * papa) = candy
d. un moulin à parole est [une sorte de] (*moulin / * parole) = person
One will notice, however, that these so-called headless compounds still seem to rely more
heavily on one constituent over the other. This is especially evident for N à N compounds,
where the leftmost element, despite its limited semantic contribution, seems to nevertheless
govern the whole construction. For one, there is the matter of lexical category, which, for some
exocentrics, may be determined by the lexeme in canonical head position, regardless of its
semantic contribution. For instance, a redcoat functions as a noun and not as an adjective,
despite it not being a type of coat. In the case of true bahavruhi compounds, this fact might be
explained using the unexpressed external head as the source of lexical category (i.e. a redcoat
refers to a person, which is why it is a noun). The problem, of course, is that it is difficult to
formalize feature percolation if said features come from a constituent that isn’t part of the word-
104
form. Furthermore, the constituent that would otherwise be in head position, is usually the locus
of morphosyntactic marking, such as inflection50 (e.g. airheads and not *airshead). Similarly, a
compound’s gender may be determined by the element in head position regardless of that of its
semantic class (e.g. [[barbe]F à papa]M]F est une [confiserie]M). Based largely on these
observations, Scalise and Guevara (2006) argue that compounds have, on the one hand, a
semantic head, which determines a compound’s semantic features (e.g. [± animate]) and on the
other, a formal head, which determines features such as lexical category. Accordingly, they
assess centricity along the following parameters:
“An endocentric compound has at least one formal head and at least one semantic head. If a compound has only one formal head and only one semantic head, then the two must coincide. If a compound realises any of the remaining possibilities, it will be considered to be exocentric.” (Scalise and Guevara 2006: 192)
Although distinguishing between a semantic and a formal head may be justified given that many
exocentric compounds retain the formal features of one of its constituents, it’s not entirely clear
if the line that Scalise and Guevara draw between endocentric and exocentric compounds is in
fact correct. The main problem is that according to their approach, an exocentric compound may
have a semantic head as long as it differs from its formal one, but this seems extremely unlikely
as it would render the test for a semantic head (i.e. IS-A) impossible. In other words, testing for
a semantic head requires that the hypernym be of the same lexical category as the compound,
otherwise the test produces infelicitous results (i.e. ?a [XAYA]N is a YA). Yet, Scalise and
Guevara’s stipulation produces sixteen51 possible configurations, four of which are exocentric
with a semantic head. While they admit that further research is required to determine which of
these permutations are possible, they do not include information regarding formal and semantic
heads in their list of exocentrics from Dutch, Chinese, and Italian, making it impossible to
ascertain if any of their data support some of the more unconventional configurations generated
by their proposal. I suspect that such cases do not in fact exist, but this remains, for the moment,
50
It should be noted that inflection is far more variable for French compounds (as well as other romance languages) than it is for English, but the fact remains that it is typically the constituent in head position that is marked for number and gender. 51
The high number of possible combinations is due to the fact that they allow for a compound to have more than one formal or semantic head.
105
speculation. Complicating matters further, Scalise and Fábregas (2010) add to Scalise and
Guevara’s original proposal by arguing in favour of a third type of head, which they call
morphological and which is responsible for the percolation of features such as gender. Although
they do not say so, the decision to include a third type of head seems related to Scalise and
Guevara’s (2006) definition of endocentric and exocentric compounds and is most likely
influenced by the fact that some compounds appear to inherit gender from the non-head
constituent. This was discussed in Section 4.1.2; the examples are repeated here for the
convenience of the reader:
(52) a. fr. [[bateau]M-[école]F]M
b. fr. [[bracelet]M-[montre]F]M
If gender were treated as a feature of the formal head52, then the compounds in (52) would
possess distinct formal and semantic heads, which would mean that they are exocentric in
nature, but this position is difficult to maintain. By arguing that a feature such as gender
percolates from a different type of head, Scalise and Fábregas are able to preserve the centricity
position held in Scalise and Guevara (2006). While I agree with most of their arguments
regarding formal and semantic features in compounds, I would argue that their overall
stipulation is much too strong, the result of which is the postulation of a number of
fundamentally incompatible configurations.
Because the aim here is to establish parameters with which to determine how easily a compound
will be understood, I am choosing to distinguish between endocentric and exocentric
compounds based solely on the presence or absence of a semantic head. Formal features, such as
lexical category (as well as gender), while no doubt pertinent, will not factor into a compound’s
centricity53, which I will formally define as follows:
52
The literature is in fact divided on whether gender percolates in the same fashion as lexical category (see Lieber 1989). 53
Nothing precludes future integration of additional features, such as formal head and morphological head, into a typology of semantic transparency. For instance, one could imagine that a compound whose formal, semantic, and morphological features coincide would be easier to understand than one whose features are distributed among its constituents. At present, I have chosen to set aside such factors in order to concentrate on a select number of features, which I hold to be of greater significance.
106
(53) A compound is endocentric if it possesses a semantic head, which is to be understood as
the constituent that determines the conceptual class of the compound. All other
compounds are exocentric.
The stipulation in (53) is not without its problems, however. Chief among them is that the
identification of the semantic head is not always an unexceptional process as factors such as
semantic drift or tropes may obscure meaning, which may in turn render centricity tests
inconclusive. Despite this drawback, however, my position regarding exocentricity remains in
line with most approaches and reflects the typology proposed by Bauer (2010)54, as shown in
the table below (English examples are from Bauer, while their French analogues are from my
data):
Table 4.1. Types of exocentric compounds in Bauer (2010)
Type of Exocentric English French
Bahuvrihi red-eye rouge-gorge
Synthetic pickpocket ---
Transpositional --- clair-obscur
Exocentric co-compounds blue-green bleu-vert
Metaphorical dust-bowl radio trottoir
The first type, for which Bauer retains the sanskrit term, are also known as possessive
compounds (bahuvrihi meaning ‘having much rice’, Burrow 1955) as they typically involve a
property possessed by the designatum. While Bauer’s example (i.e. red-eye) may not be the
most prototypical example of the possessive exocentric compound, it nevertheless emphasizes
that this particular type denotes a feature of some external, unexpressed head (i.e. a flight that
causes red eyes). The French example, rouge-gorge, is perhaps a better example as it refers to a
bird with a red throat (cf. redcoat, greybeard, etc.). This class also includes NN compounds in
which the attributive property of the association may involve other information, but which still
exemplifies an external head in relation to the compound (e.g. a hammerhead is a shark with a
54
Marchand (1969) offered a similar typology of exocentrics that also includes five types, none of which possess semantic heads.
107
head like a hammer). It is worth noting that these compounds are often viewed as instances
involving metonymy as they can evoke a part-whole type relationship (e.g. an airhead, where
head is taken to mean the person, cf. Bauer 2008b).
The second type is a highly frequent and widely studied compound with numerous endocentric
analogues (Roeper and Siegel 1978, Lieber 1980, Botha 1984) and involves constituents in a
head-argument relationship. These typically manifest themselves as VN compounds in French55.
As for Transpositional exocentrics, these are compounds whose lexical category differs from
those of its constituents. Bauer offers an example from Khmer, khɔh trǝw, which combines two
adjectives (‘wrong’ and ‘right’ respectively) to form a noun (‘morality’). French has a few such
cases, usually also involving conversions from adjectives into nouns (cf. clair-obscur, cinq à
sept, douce-amère). Exocentric co-compounds are coordinated compounds (Bauer 2008) for
which the designatum is understood as a combination of its constituents. Thus, blue-green is
neither blue, nor green, but is instead a colour with the properties of both. Bauer’s final
exocentric class involves compounds whose constituents, while not strictly compositional, are
nevertheless motivated on metaphorical grounds. The French compound radio-trottoir, a term
used according to Martin and Copeland (2003) in French speaking Africa to mean word-of-
mouth communication networks, is exocentric, yet the head (radio) can be understood
metaphorically. This last type, as well as traditional bahuvrihi compounds, are worth discussing
further, as they have, in recent years received considerable attention (Goossens 1995, Geeraerts
2002, Benczes 2005, 2006, Arnaud 2008).
4.1.4.1 Exocentric by Trope
As I briefly mentioned above, traditional bahuvrihi compounds are often called possessives
because they denote a characteristic possessed by the unexpressed head of the compound (Bauer
(1978). These are usually instances involving adjectives (as in 54a-b), but may also be
appositional nouns (as in 54c):
55
It is widely accepted that French synthetic compounds consist almost entirely of VN constructions (e.g. ouvre-bouteille, lave-vaisselle, essuie-glace, etc.) and are headed by a zero affix (Lieber 1992). In this regard, they are not actually exocentric compounds.
108
(54) a. greenshank = ‘a bird with a green shank’
b. rouge-gorge = ‘un oiseau avec la gorge rouge’
c. hammerhead = ‘a shark with a head like a hammer’
Some authors, like Štekauer (1998), argue that these are simply cases of ellipsis in which the
actual head element has been omitted. Others, like Bauer (2008b), believe that we may simply
be dealing with cases of metonymy, where the constituent in head position is understood as a
stand-in for the actual head56. Benczes (2005, 2006), in her extensive work on compounds
involving tropes, argues that, from a cognitive linguistics approach, these headless compounds
can be explained using conceptual metaphor and metonymy, which she says renders them far
less opaque than traditionally understood (cf. Warren 1978). This assertion is in line with Lakoff
and Johnson’s (1980) seminal work on metaphor, in which they go to great lengths to show that
tropes are a core component of language use and not simply figures of speech. Benczes
therefore proposes a typology of compounds based on the many ways that metonymy and
metaphor may interact within and without a compound. Examples of exocentric compounds due
to either metaphor57 (55a-b) or metonymy (55c-d) on the head constituent are as follows (from
Benczes 2006):
(55) a. a jailbird is a *bird → A PRISONER IS A CAGED BIRD
b. a bellybutton is a *button → THE UPPER BODY IS AN UPPER GARMENT
c. a loudmouth is a *mouth → PART FOR WHOLE
d. a gaslight is a light → PRODUCT FOR PRODUCER
The first three compounds in (55) fail the IS-A test and are therefore exocentric according to
most approaches (cf. Bauer 2010). Benczes (2006) argues, however, that in all instances, the
56
Metonymy is understood here, in cognitive linguistics terms, as CONCEPTUAL DOMAIN A FOR CONCEPTUAL DOMAIN B (Kövecses 2002). The transfer is usually based on some element of congruity between domains and often involves meronymic relations, such as WHOLE THING FOR A PART OF THE THING and PART OF A THING FOR THE WHOLE THING (see Radden and Kövecses 1999 for a detailed list of common metonymical relationships). 57
Following Lakoff and Johnson (1980), metaphor is understood as CONCEPTUAL DOMAIN A IS CONCEPTUAL DOMAIN B. Thus, the conceptual metaphor ARGUMENT IS WAR allows for the use of expressions such as “to attack someone’s point” or “to demolish his or her argument” because words that apply to one concept are transposed onto another.
109
head58 can be understood if the metaphor or metonymy is sufficiently established or widespread.
Speakers familiar with the metonymy MOUTH FOR PERSON may therefore have little difficulty in
establishing meaning for a compound such as loudmouth. Moreover, if a particular metonymical
reading becomes sufficiently widespread, the centricity test will no longer produce negative
results: gaslight in (55d), for instance, which Benczes includes in her study59, involves the
PRODUCT FOR PRODUCER metonymy (i.e. LIGHT FOR LAMP), yet does not fail the IS-A test (i.e. a
gaslight is a light). In this case, the trope is in fact sufficiently conventionalized that it has
become an acceptation of that particular lemma60. Similarly, Arnaud (2008) argues that in bird
sanctuary, the head is in fact metaphorical, but that it does not prevent the compound from
being interpreted as endocentric (i.e. a bird sanctuary is a sanctuary). This, of course, isn’t
always the case, as bird in jailbird is not typically used to refer to a person, at least not in a
manner that might render its presence in the compound transparent. Thus, the head of a
compound may be susceptible to varying degrees of tropic extension, which may or may not
blur the line between instances of endocentricity and exocentricity. The distinction between the
two poles may simply be a question of metaphoric or metonymic entrenchment.
Unfortunately, evaluating degrees of entrenchment is no simple task. As Arnaud (2008)
suggests, one might choose to rely on lexicographic sources to see if a particular trope is viewed
as an acceptation of a particular lemma, but this may not always yield convincing results. In
fact, one may be forced to treat as endocentric compounds which would otherwise be treated as
exocentric. Loudmouth, for instance, where mouth is understood as denoting a person, involves
a sense extension listed in the OED, although not without an indication that its usage involves
some shift in meaning: “4. In extended use: a person who speaks.” While it would seem that the
trope is well-known, it nevertheless seems strange to label loudmouth as endocentric, at least not
without some additional designation.
58
Benczes’s work is conducted within the cognitive framework championed by Langacker (1987)—she therefore uses the term “profile determinant” for what is typically called the head in most morphological frameworks. 59
Although Benczes (2006) doesn’t explicitly state that gaslight is exocentric, the focus of her work is on compounds traditionally understood as such. According to my definition of endocentricity, gaslight is endocentric. 60
“5. A body which emits illuminating rays” (OED). This usage has also been conventionalized in French: “I.A.3 Lumière: source de lumière” (LPR2010).
110
An examination of centricity across the compounds in my data reveals a number of cases where
the head involves either a metaphor (56a-b) or a metonymy (56c-d) on the head element, which
may or may not have an effect on how the compound is interpreted:
(56) a. pomme cajou lit. ‘apple cashew’ = ‘fruit of the cashew tree’
b. chou-palmiste lit. ‘cabbage-palm tree’ = ‘edible part (core) of palm tree’
c. blanc-seing lit. ‘blank-signature’ = ‘blank sheet of paper with a signature’
d. bec-figue lit. ‘beak-fig’ = ‘bird that eats figs’
Similar to the English compounds involving tropes mentioned earlier, the French compounds in
(56) all arguably fail the centricity test and are thus typically considered exocentric. In some
instances, however, the metaphor or metonymy may be sufficiently established so as to make
such judgements difficult: in (56a), for example, the metaphor expressed in pomme (‘apple’) is
well established and sufficiently prevalent to produce an endocentric reading (cf. pomme de pin
or fruit de mer). In contrast, it is unlikely that this is the case in (56b). Similarly, the metonymy
in (56c) is most likely not very familiar to most speakers, while the one in (56d) is perhaps more
so.
Although one might simply choose to maintain a hard line between endocentricity and
exocentricity using the IS-A test, it seems entirely justified to look to distinguish between
compounds in which the head retains no meaning at all (e.g. doughnut) and those that can be
motivated using operations such as metaphor and metonymy (e.g. loudmouth). I will argue,
however, that unless these tropic senses are not only widespread, but also narrow in scope, such
compounds cannot legitimately be said to be endocentric. For instance, while it may certainly be
the case that BODY PART FOR BODY is a common and widespread case of metonymy, it may not
be sufficiently narrow to allow for the accurate interpretation of the compound. Thus,
compounds like razorback or yellowtail, or greenshank, which all involve this particular
metonymic relationship, will remain, to some degree, opaque to the speaker as he or she has no
way of knowing what type of entity it denotes (a wild pig, a fish, and a bird respectively). In
cases like airhead or redhead, the trope might indeed be more circumscribed (i.e. HEAD FOR
PERSON), thus rendering them potential candidates for endocentricity via metonymy, but it is
difficult to argue that these compounds are endocentric in the same way as, say arrowhead or
shower head are.
111
I therefore propose that a compound may be either strongly or weakly endocentric, as well as
exocentric, depending on the head’s requirements in terms of sense extension. Although far
from perfect, the method used to determine the centricity is as follows:
(57) A compound C is
i. strongly endocentric if it passes the IS-A test
ii. weakly endocentric if it does not pass the IS-A test, but the head constituent involves
an established sense extension (i.e. listed in a lexicographic work)
iii. exocentric in all other cases
The compound pomme cajou, listed in (56) above, is therefore weakly endocentric. It is also
important to note that sense extensions that seemingly arise naturally or logically (i.e. that can
be motivated), but which require that meaning be established using information far beyond a
lexeme’s context-free usage will retain the traditional label of exocentric. This approach
arguably reflects the fact that such compounds do not provide crucial information or a means of
establishing the exact nature of the designatum.
The choice to simplify and reduce the role of tropes in my approach is partly due to the fact that
it’s not entirely clear just how significantly they contribute to compound interpretation, and thus
compound transparency. While it is clear that both metaphor and metonymy are multi-faceted
concepts in compound meaning (cf. Benczes 2006), attempting to include the numerous degrees
of possible interactions would quickly prove unwieldy and may not ultimately provide facets of
significant relevance to the matter of semantic transparency. Nothing, however, prevents others
from expanding on this approach. One might choose, for instance, to take into account the
hierarchy of metonymic vehicle proposed by Radden and Kövecses’s (1999), or Benczes’s
(2006) extensive typology of metaphor and metonymy when determining just how weak or
strong a given compound’s head actually is. Moreover, it may be the case that metaphor poses a
greater challenge than metonymy, or vice-versa. For the present moment, however, I believe
that distinguishing between weakly and strongly endocentric compounds is a first, yet crucial
step, in establishing what factors are involved in the semantic transparency of these types of
constructions.
112
4.1.5 Summary
Based on the properties of French compounds (as well as reported facts for other languages), we
may summarize issues related to headedness according to the hierarchy below. Canonical and
non-canonical labels refer to the predominant position of the head for a given compound type
(i.e. NN), while strong and weak indicate whether a compound involves some degree of sense
extension.
Figure 4.1. Distribution of compounds according to features related to the head.
Not included in the diagram are bi-centric compounds such as lecteur-graveur and artiste-
interprète. These types are necessarily canonical (as either constituent may be interpreted as the
head) and, while nothing prohibits them from being weakly endocentric, none was actually
found in the data. The fact that they are coordinated will instead be reflected using the relational
typology developed in the next chapter.
One question that the configuration above raises is whether head position or centricity strength
is the most influential aspect of compound transparency. In other words, is a weakly endocentric
canonical compound (e.g. chou-palmiste) more or less transparent than a strongly endocentric
non-canonical compound (e.g. auto-école)? This is not an easy question to answer. While
Jarema et al. (1999) showed that French speakers processed right-headed compounds differently
than they did left-headed ones, it is unclear what sort of effect the presence of tropes has on the
113
interpretation process, their research having been conducted using only the parameters
Transparent/Opaque. Adding to the challenge is that all tropes may not involve the same degree
of sense extension. My stance here is that head position is in fact a highly dominant feature of
compound transparency, more so than centricity strength.
The primary reason for this particular stance is that canonical head position is a necessary factor
in both compound formation and interpretation. Because compounds are inherently ambiguous
(i.e. they lack the necessary information for complete and explicit meaning construal), speakers
must rely on fundamental—and preferably immutable—factors in order to establish meaning for
a given combination. Headedness is one such factor. It is in fact unlikely that, upon
encountering a completely novel compound, that a French speaker, for instance, will first
attempt to interpret it as a right-headed construction without some indication that it should be
understood as such (e.g. presence of a neoclassical stem, cf. Dal and Amiot 2008). Semantic
transparency, being intimately tied to this process, will arguably favour systematicity above all
else. In instances where a compound type may involve either position (i.e. A-N or N-A),
establishing head position may be impossible without additional context (e.g. vert sapin as a
noun or as an adjective), thus reducing overall semantic transparency. Canonically headed
compounds that involve established tropes, on the other hand, merely require that the speaker
understand whether a metaphoric or metonymic reading is necessary for meaning composition.
It should also be noted that, despite their terminal nature on the chart above, exocentric
compounds are not all equal in terms of transparency. The focus of the next section will be on
constituent contribution, which will allow for exocentrics to be further contrasted within their
own class.
4.2 Semantic Compositionality
In the previous section, I explored the notion of headedness as it pertains to compounds and
argued that the presence or absence of a head, as well as the degree of its semantic contribution,
are crucial factors in a compound’s semantic transparency. The head constituent, however, is
only one of two components at play in a compound’s meaning: the non-head constituent must
also be incorporated into a discussion of semantic transparency. To this end, the following
sections explore compositionality as a factor of transparency.
114
4.2.1 Definition and Approach
Chapter 2 focused on previous approaches to semantic transparency and it was shown that the
concept was often—but not always—conflated with semantic compositionality. More
specifically, while some researchers view the two concepts as distinct, most believe that they are
simply two different labels for the same concept. In Section 2.4.1 of that chapter, I laid out my
arguments in favour of the former approach, which is to say that compositionality is distinct
from, yet related to transparency. Crucially, I argued that semantic compositionality should be
understood narrowly as referring to the meaning of a compound’s parts and their relationship to
the meaning of the whole (cf. Svensson 2004, Girju et al. 2005). In this regard, compositionality
is understood as a property of compounds that “feeds” into their semantic transparency. A
compositional compound is therefore a compound whose constituents contribute meaning to that
of the whole, regardless of its perceived transparency. The chief argument in favour of
distinguishing between the two concepts is that a compound may incorporate the meaning of its
constituents without truly being transparent. Conversely, it is unlikely that a semantically
opaque compound is compositional. This unidirectional implication was represented in a
diagram in Chapter 2 and is repeated below for the convenience of the reader. The dotted lines
indicate that the relationship between the two concepts is not one of entailment (i.e. a
compositional compound may be non-transparent, but it is not necessarily so).
Figure 4.2. The relationship between compositionality and transparency.
By distinguishing between compositionality and transparency, we are in fact able to
discriminate between a number of exocentric compounds, which would otherwise be grouped
Compositional Transparent
Non-transparent Non-compositional
115
together as one particular subset of opaque compounds. Thus, the compounds in (58a-b) below
may be differentiated from those in (58c-d) by virtue of their different degrees of
compositionality:
(58) a. année-lumière lit. ‘year-light’ ‘distance traveled by light in a year’
b. jambon-beurre lit. ‘ham-butter’ ‘sandwich containing (only) ham and butter’
c. bourg-épine lit. ‘village-thorn’ ‘shrubby plant with thorns’
d. chat-chateau61 lit. ‘cat-castle’ ‘instrument used to penetrate castle defences’
Although the stipulation above would allow for an exocentric compound to possess a constituent
in head position that contributes non-head meaning alongside a modifier that contributes no
meaning to the whole, no such case was found in the data. Such a case, if it did exist, would
remain partially compositional, just as in true exocentrics in which the non-head contributes
meaning, but not the constituent in head position (e.g. chat-château).
If we return to the initial description of semantic compositionality, we will notice a few
problems. First, in a binary construction, semantic compositionality is a four-way configuration,
but if neither constituent is accorded more weight than the other, this output is reduced to only
three distinct levels. Compositionality is therefore a cline that may be ordered from most
compositional to least compositional, as shown in (59) below:
(59) The meaning of a compound XY may include the meaning of:
a. both X and Y
b. (X, but not Y) or (Y, but not X)
c. neither X nor Y
This is in fact Cruse’s (1986) approach, although he does allow for partial semantic contribution
thanks to his use of the term “semantic indicator” (see Chapter 2, Section 2.3.2 for an overview
of his approach). For the configurations above, I will use the terms compositional (59a),
partially compositional (59b), and non-compositional (59c) to refer to each of these possible
61
“[M]achine au moyen de laquelle des ouvriers à couvert, ébranlaient les murailles et jetaient des ponts sur les fossés ou les remparts” (De Roujoux 1839: 48).
116
outputs. One problem with this somewhat simplistic approach, however, is that partial
compositionality refers to one of two possible configurations. This ambiguity may be eliminated
by factoring in centricity, as Libben (1998) does in his typology of semantic transparency.
Where Libben’s approach results in a seven-way configuration, I will argue in favour of a
reduced distribution that still allows for a clear hierarchy of compositionality based on
centricity. In the diagram below, possible configurations go from most (left) to least
compositional (right):
Figure 4.3. Possible configurations for semantic compositionality.
Non-compositional endocentric compounds are presumably impossible as their centricity
necessarily means that one constituent is a hyponym of the whole. This restriction does not
apply, however, to exocentric compounds as they may be composed of lexemes that contribute
no meaning to that of the whole (e.g. eng. rugrat; fr. compère-loriot). The following examples are
of endocentric compounds that are either compositional (60a) or partially compositional (60b) :
(60) a. stylo-bille lit. ‘pen-ball’ ‘ballpoint pen’
b. bateau-mouche lit. ‘boat-fly’ ‘boat for tourists in Paris’
Compound
Endocentric
Compositional Partially Compositional
Non-‐compositional
Exocentric
Compositional Partially compositional
Non-‐compositional
117
As it stands, compositionality and centricity may combine to produce a five-way configuration,
but once other factors are taken into account, such as tropes and sense extension, the number of
possible combinations increases greatly.
4.2.2 Metaphor and Metonymy
As was discussed in Section 2, metaphor and metonymy may sometimes have an effect on a
compound’s meaning by casting doubt on its centricity status when these tropes target the head
constituent. I argued that one way of addressing the problem was to rely on lexicographical
work to establish whether a particular lexeme’s usage was in fact conventional or if the trope
was perhaps not quite sufficiently established to produce absolute endocentrics. This led me to
argue for a weakly endocentric label for those compounds that might otherwise fail the
centricity test. Unsurprisingly, non-head constituents are also prone to sense extension within
compounds (Benczes 2005, 2006, Arnaud 2008), which may result in varying degrees of
compositionality and hence transparency. In the following examples, the compound’s modifier
in (61a) contributes meaning via metonymy, while in (61b) the same operation takes place via
metaphor:
(61) a. carte soleil lit. ‘card sun’ ‘(health) card with a picture of the sun on it’
b. voiture balai lit. ‘car broom’ ‘vehicle that “sweeps” up last place runners’
The examples above are both endocentric, yet, the presence of tropes alongside these
hypernymic heads may potentially reduce their interpretability. As Štekauer (2005) argues,
coining a complex word “using a non-established shifted (metaphorical) meaning [. . .] reduces
the meaning-predictability of a naming unit” (Štekauer 2005: xix). On the other hand, most of
these types of compounds are not opaque either, nor are they on the same level as compounds in
which the non-head contributes no meaning at all to that of the whole (e.g. bateau-mouche,
chou-croûte62, laurier-tin). The question, of course, is whether they are fully or partially
compositional. I will return to this question in a moment.
62
Diachronically, chou-croûte is not a compound, but synchronically, it has all the features of one: “Étym. Allem. Sauerkraut, de sauer, aigre, sur (voy. sur, adj.), et, Kraut, herbe, l'assimilation avec chou ayant altéré sauer” (Littré 1873, Vol. 1).
118
Also a factor in a compound’s compositionality is just how metaphor and metonymy may
interact with each other to produce meaning. Goossens (1995) calls “metaphtonymy” the
interplay between metaphor and metonymy, and argues that—following a study of various
British expressions—it is either integrated, which is to say that metonymy and metaphor are
combined, or cumulative, where one trope is derived from the other. According to Geeraerts
(2002), metonymy and metaphor may occur in multiword expressions in one of three ways: i)
consecutively, ii) in parallel, and iii) interchangeably. Likewise, Benczes’s (2006) framework
allows for metaphor and metonymy to emerge in a number of different configurations, which
results in a rather detailed and complex typology. Thus, according to Benczes, accounting for a
compound such as macarena page (“a webpage capitalising on a current fad, they are usually
full of fluff and have a short life expectancy”, 2006: 167) requires that a metaphorical
relationship between constituents first be established (i.e. a page that is like the macarena), and
which is then further expanded upon using metonymy on N1 (i.e. macarena for fad). An analysis
of the French compounds collected for my work reveals a number of similar examples, as
follows:
(62) a. singe-lion lit. ‘monkey-lion’ ‘lion tamarin’
b. oiseau-lyre lit. ‘bird-lyre’ ‘lyrebird’
b. effet papillon lit. ‘effect butterfly’ ‘butterfly effect’
In (62a), the relationship between the compound’s constituents involves a sub-part of each of
the element’s designatum: a lion tamarin is a tamarin whose mane resembles the mane of a lion.
The presence of metonymy in the head differs from those discussed in Section 4.1.4.1 as it only
arises within the context of the compound (i.e. when establishing the link between constituents),
which explains why the centricity of the whole unit is not affected (i.e. un singe-lion est un
singe). Similarly, the compound in (62b) also incorporates a trope that arises only when
establishing meaning, but in this case it involves a metaphor on the non-head: the lyrebird is a
bird whose tail resembles a lyre. Thus, metonymy is invoked via the whole-part trope for the
head, which is in turn connected to the non-head via physical resemblance. Although speakers
no doubt reduce the level of complexity for compounds involving parallel tropes (i.e. a tamarin
that looks like a lion), mixed tropes are not so easily parsed (i.e. *a bird that looks like a lyre).
Both types, however, require that the speaker establish just what part of the designatum is
involved in the trope.
119
Furthermore, because a compound ultimately functions as a single lexical unit, its meaning may
involve a trope at a global level. In (62c), for instance, we have a metaphor that applies to the
whole compound: a butterfly effect is an effect like the effect caused by a butterfly63. This type
of comprehensive metaphor is in fact much more difficult to assess in terms of compositionality
than a localized metaphor and may be closer to an idiom given that its meaning can only be
understood in the non-literal sense (Gibbs et al. 1989).
The number of potential combinations of tropes in a given compound makes it extremely
difficult not only to offer an exhaustive set of features that might affect semantic transparency,
but also to determine which of these combinations has the greatest impact. In its simplest
manifestation, we have nine possible configurations given the features metaphor, metonymy, or
literal64. That said, some of these complexities have already been taken care of with the
strong/weak endocentric distinction suggested earlier, but as I have shown, tropes may arise
internally without directly impacting the status of the head constituent. Moreover, as Benczes
(2006) has shown, both metaphor and metonymy may also relate to each other in additional
ways, which may then be applied to the whole compound and not just its parts. Similar to my
approach to centricity, where I distinguished between strong and weak endocentric compounds
based on a head constituent involving tropes, I will also distinguish between strong and weak
compositionality in a similar manner. This decision is unfortunately only a partial solution to the
challenges discussed above and may prove insufficient were these compounds to be evaluated
by speakers. The data, however, seems to suggest that once we’ve made a tropic distinction at
the level of centricity, the meaning of the modifier is far more “forgiving” at the level of
interpretation. In other words, where metonymy and metaphor made it difficult to test
headedness using the IS-A paraphrase, tropes on modifiers do not typically block a simple
predicative paraphrase, as the following examples show:
63
Compare with ‘an effect caused by something like a butterfly,’ which would be a metaphor solely on the non-head constituent. (“[T]he phenomenon whereby a very insignificant change in a complex system can significantly alter an anticipated course of events” OED.) 64
As follows: literal-literal, literal-metaphor, metaphor-metaphor, metaphor-literal, literal-metonymy, metonymy-metonymy, metonymy-literal, metonymy-metaphor, metaphor-metonymy. Configurational complexity would also increase if we were to account for tropes applied to the whole.
120
(63) a. Metonymy on Modifier: bleu horizon/ciel, jaune paille, pierre miel, pince crocodile,
porte papillon, rouge sang
H THAT RESEMBLES M ex. bleu qui ressemble à l’horizon
b. Metaphor on Modifier: client-cible, date butoir, poursuite-bâillon, site miroir, taux
plafond
H THAT SERVES AS M ex. client qui sert de cible
c. Metonymy on Head/Metaphor on Modifier: oiseau-lyre, poisson-sabre/épée, noctuelle
gamma, couleuvre à collier, serpent à lunette
H THAT HAS M AS A PART ex. oiseau qui a une lyre en tant que partie
Such paraphrases, while not entirely accurate, are largely sufficient to understand the meaning
of the compound. This should come as no surprise, given how fundamental metaphor and
metonymy are to language and how frequently they are used—this is in fact the premise behind
Lakoff and Johnson’s (1980) work. Moreover, as Benczes (2006) argues, if metaphor and
metonymy were truly significant hurdles to compound interpretation, one would have to ask
why they are so prevalent and why speakers continue to coin new ones that often involve major
sense extensions. In all cases, then, these compounds can be viewed as compositional, albeit to a
lesser degree than more literal instances. Thus the compounds in (63) above, which I call
weakly compositional, may be opposed to the following cases, where the modifier is understood
literally65:
(64) a. H THAT RESEMBLES M ex. hyppocampe-feuille, chou-fleur
b. H THAT SERVES AS M ex. avion-cargo, mémoire tampon
c. H THAT HAS M AS A PART ex. stylo-bille, montre-bracelet
My argument here is that compounds involving tropes are in fact compositional. It may be that
further distinctions are required to develop an even finer grained typology (i.e. which tropes are
present and whether one generates the other), but I will set aside these details so that my
65
Although it was argued that the compounds in (63) permit paraphrases using basic predication, they nevertheless involve some form of sense extension (e.g. metonymy: rouge qui ressemble à la couleur du sang). They may therefore be contrasted with the compounds in (64), which do not require such shifts (e.g. chou qui ressemble à une fleur).
121
typology might be as tractable as possible. The remaining step is to combine compositionality
and centricity so as to order them in terms of their effects on semantic transparency.
4.2.2.1 Combining Compositionality and Centricity
The main issue with an approach based on different features or parameters is that it is seldom
clear how each of these properties should be weighted. At the top level, it is largely
uncontroversial to distinguish between endocentric and exocentric compounds, but once we
begin to examine how the levels within endocentrics should be organised, the picture is far less
clear. In other words, does compositionality have a greater or lesser effect on transparency than
the position and the strength of the head? Unfortunately, research in this area is too limited to
offer any clear solutions. The most obvious way to address this problem, which is to compare
compounds for each of the possible configurations and to determine which ones are more
transparent than others, is unfortunately circular in its reasoning: compound A is more
transparent than compound B, which means feature A is more transparent than feature B, which
in turn shows that compound A is more transparent that compound B. Ideally, compounds
involving various combinations of features would undergo testing with native speakers, which
would then allow us to determine which features seem to have the greatest effect on
interpretation. That said, because this work is exploratory in nature—which is to say that its goal
is to propose a typology of features that might be used for future research—comparing
compounds that fit the possible feature configurations suggested should, to some extent, still
allow for these features to be weighted. To this end, Table 4.2 on the following page contains
compounds for each of the possible feature sets66.
66
A checkmark indicates the strong or positive parameter for a given feature. In the case of centricity, the distinction is between strongly and weakly endocentric compounds; for compositionality, fully compositional may be opposed to partially/weakly compositional.
122
Table 4.2. Possible combinations of compound features.
Centricity Canonical Compositional Compound Meaning
✓ ✓ ✓ stylo-bille ‘stylo avec une bille’
✓ ✕ ✓ radio-taxi ‘taxi qui utilise une radio’
✓ ✓ ✕ bateau mouche ‘bateau pour touriste’
✓ ✕ ✕ aube-vigne ‘vigne’
✕ ✓ ✓ pomme-cajou ‘fruit du cajou’
✕ ✕ ✓ vidéo-lynchage ‘enregistrement d’un acte répréhensible avec l’intention de le diffuser’
✕ ✓ ✕ --- ---
✕ ✕ ✕ --- ---
One thing to notice is that if a compound is weakly endocentric (i.e. the head involves a
metaphor or metonymy), it’s non-head element must contribute literally to the whole, as
evidenced by the absence of non-compositional combinations in the data. This may not hold for
a larger dataset, but for the moment, we may postulate that compositionality is highly dependent
on centricity.
Taking the compounds in the previous table and comparing them according to
opposing/alternate features results in pairs better suited to the evaluation of said features:
Contrasting Features
pomme-cajou radio-taxi Weak Centricity ~ Non Canonical Head Position
pomme-cajou bateau mouche Weak Centricity ~ Partially Compositional
radio-taxi bateau mouche Non Canonical Head Position ~ Partially Compositional
In the table above, the listed compounds may each be contrasted according to one negative or
weak feature. For instance, pomme-cajou is weakly endocentric, but both canonically headed
and compositional; radio-taxi, on the other hand, is non-canonically headed, but both strongly
endocentric and compositional. These two compounds, when opposed, allow us to compare the
semantic strength of the head relative to its position. If we judge radio-taxi to be less transparent
than pomme-cajou, then head position is most likely a stronger indicator of transparency than
the degree of endocentricity (in terms of the weak ~ strong distinction). The bolded compounds
in the table above are those I believe are most transparent within pairs, which suggests that the
123
degree of a compound’s transparency may be assessed according to the following hierarchy:
Head Position > Centricity Strength > Compositionality. This hierarchy also reflects the fact that
partial or weak compositionality cannot occur if the head is not understood literally. If we
compare compounds possessing a single positive or strong feature, two particular details arise,
as illustrated by the following table.
Compound Canonical Strong Endocentric Compositional
--- + − −
aube-vigne − + −
vidéo-lynchage − − +
First, the proposed hierarchy grants a left-headed, weakly endocentric and weakly or partially
compositional compounds greater transparency than it does other combinations, despite the fact
that no such compound is present in the data. This isn’t entirely problematic, given that if such a
compound existed, it would still rank lower than other canonically headed compounds.
Moreover, such a compound remains endocentric, albeit weakly, which explains how it might
be, all things being equal, more transparent than compounds like aube-vigne or vidéo-lynchage.
Second, the comparison also shows how tenuous the order is for centricity and compositionality,
as one could just as easily argue that non-compositionality has a greater effect on reducing
transparency than a figurative head constituent. As was mentioned earlier, the decision made
here is meant to illustrate how such an approach might work; further research might certainly
allow for properties to be weighted differently.
Also related to this last point is how counter-intuitive it may seem to argue that compositionality
is less significant in terms of transparency effects than head position. After all, traditional
approaches to semantic transparency have often emphasized the semantic contribution of a
compound’s elements over other indicators. Yet, if we compare radio-taxi (right-headed and
compositional) and bateau mouche (left-headed and partially compositional), it is possible to
argue that canonical headedness is a stronger factor in meaning construal than compositionality
alone. This is perhaps not so surprising after all, if we understand compounds as inherently
ambiguous lexical items: most provide just enough information to understand what they might
mean, but do not convey crucial information necessary to fully understand them. A compound
with its head in canonical position allows speakers to establish just what it is the item is
referring to. Conversely, a non-canonically headed compound, when encountered for the first
124
time, may lead to the incorrect identification of its hypernym (e.g. radio in the case of radio-
taxi) and thus to an erroneous interpretation. This was in fact the argument offered in Section
4.1.5, where it was said that head position was a greater factor in transparency than the
distinction between strong and weak centricity.
Given these observations, we might suggest the following (partial) hierarchy of features, in
which the corresponding compounds, from left to right, go from most transparent to least
transparent.
Figure 4.4. Distribution of features for endocentric compounds.
Given that no convincing examples were found for weakly endocentric compounds in which the
non-head element either involved a sense extension or contributed no meaning at all, we might
postulate that compositionality is constrained by the nature of the head itself. In other words, if
the head already involves a metaphor or metonymy, thus weakening its semantic transparency,
then the non-head constituent is likely to be understood literally. Interestingly, this constraint
may not be as strong for exocentric compounds as a few such cases were found in the data. The
compound trou-madame, for instance, which is a game in which players attempt to push small
balls into holes, is partially compositional, but only on the left-most constituent. It must be
noted, however, that such examples are not numerous, which suggests that compositionality
Endocentric
Canonical
Strongly Endocentric
Fully Compositional
stylo-bille
Weakly Compositional
carte soleil
Partially Compositional bateau mouche
Weakly Endocentric
Fully Compositional pomme-cajou
Weakly Compositional
???
Partially Compositional
???
125
stemming from the non-head may depend on the semantic nature of the head. The issue is
fundamentally empirical in nature, which may be better addressed following the examination of
a much larger dataset.
4.2.3 Summary
A compound is said to be fully compositional if all constituents retain (and therefore contribute)
their individual meanings to the meaning of the whole; a compound is partially compositional if
the non-head does not contribute meaning to the whole. In cases where an established trope
targets the non-head element, then we may say that the compound is weakly compositional.
Only when neither constituent retains meaning is a compound considered non-compositional.
Such a case is only possible for exocentric compounds. Although the relative importance of
each feature heretofore discussed is difficult to assess, it has been proposed that they may be
ordered in the following manner: Head Position > Centricity Strength > Compositionality. The
next section will explore how compounds with the same feature sets might be further contrasted
using similar lexicalized compounds as a point of comparison.
4.3 Semantic Homogeneity
Unlike other word formation processes, there are few restrictions on just what types of items
may be joined together to form a compound. Theories of word formation often invoke rules to
explain how new units are formed (cf. Aronoff 1976), most of which are highly productive
mechanisms that, based on some input, produce an output that is appropriate given the
parameters of the specified language. In other words, rules account for a language’s potential
words. While compounds are not free to involve just any item from the lexicon, the selectional
criteria that govern compounds are significantly less restrictive than they are for derivation.
At its simplest, compounding may be said to hinge on basic rewrite rules according to what
combinations of lexical categories are possible for a given language. This was Selkirk’s (1982)
approach, stating that, for instance, an adjectival compound in English consists of either a noun,
an adjective, or a preposition, followed by an adjective (i.e. A → {N, A, P} A) (16). As was
discussed in the previous chapter, similar rules may be stipulated for French (Zwanenburg 1992,
Fradin 2009). It is this particular fact that has allowed researchers to focus their efforts on
126
compounds involving only certain lexical categories (e.g. Noun-Noun, Verb-Noun, etc.). Once
we step away from the constraints imposed by lexical categories, however, there is little
preventing two words from being combined to form a compound. Of course, some words may
seem incompatible from a semantic perspective, but this sort of criterion often fails to truly
predict what might constitute a potential or impossible compound, especially those that are
coined spontaneously or that are context dependent (e.g. pumpkin bus, Downing 1977). One
might say that there are in fact few impossible compounds, which makes this particular type of
word-formation highly productive67. This is evident not only in the frequency of novel
compounds, but also in the variety of coined compounds used to name new products or
companies (Facebook, YouTube, SoundCloud, etc.).
Instead of talking about sub-categorization and base selection, many researchers instead
emphasize compounding as a process based on schemata. Such schemata are meant to give
compounds a fundamental frame from which one may analyze existing forms, as well as create
new ones. Examples are given below, from the least to the most specified:
(65) a. Ten Hacken (1999) [X Y]Z or [Y X]Z
b. Jackendoff (2010) [F (. . . , X1, . . . Y2, . . .)]
c. Booij (2010) [Xi Yj]Yk ↔ [SEMj with some relation R to SEMi]k
d. Bell and Schäfer (2013) λ B λ A λ y λ x [A(x) & R(x,y) & B(y)]
In all instances, a compound is said to consist of two unspecified words68 and a function (as in
65b) or a relation (as in 65c-d) that links the two units together. Just what these relations or
functions may be is a discussion that will take place in the next chapter. For the moment, what
we may observe given these schemata is that compounds are complex units for which the
constituting elements are largely underspecified. For instance, while a word such as widity is
impossible because the affix “-ityN” selects a [+latinate] adjective as its base (Scalise and
67
Morphological productivity is a vigorously debated concept that I have chosen not to address here, but it seems justified to say that compounding is in fact a productive means of word-formation, given how many new words that enter the lexicon are compounds. Tulloch (1991), for instance, lists 1,950 new English words, 621 of which are compounds, by far the largest category of new entries (cited in Bauer 2001b). 68
It is largely understood that compounding is a recursive process that may thus consist of an unlimited number of words (Selkirk 1982, Lieber 1992). Most schemata may easily be expanded to account for this.
127
Guevara 2005), no such constraint may be stipulated for compounds. If a language allows for
NN constructions, then presumably all nouns are available to the process of compound
formation.
If compounds are unrestricted, then, what rules might govern their creation? After all, as
Štekauer (2005) puts it, “[n]ew naming units do not come into existence in a vacuum or
accidentally” (43). While speakers no doubt coin compounds by selecting lexical items relevant
(i.e. semantically related) to the things they aim to denote, is the process influenced by other
factors? In Štekauer’s (2005) onomasiological approach to word-formation, both lexical creation
and interpretation are governed by a number of different factors, including extra-linguistic
reality and speech community. We may also add the lexicon itself to these factors: speakers may
call upon their knowledge of other, similar compounds in order to produce new items in a
process called analogical word-formation (Bauer 1983). I would like to emphasize that if
analogy is at play in the creation of new forms, it is also likely involved in the interpretation of
novel combinations.
Although not strictly limited to compounding, analogical word-formation involves the creation
of new words patterned on existing words in the lexicon. According to Booij (2010), at its
extreme point, a word formed via analogy is an opaque construction: “For these [analogical]
words we can indeed point to one particular compound as the model word for the formation of
the new compound, and the meaning of this new compound is not retrievable without knowing
the (idiomatic) meaning of the model compound” (94). Such extreme cases may be based on
either already opaque compounds or ones in which one or more constituents have undergone a
significant shift in meaning post-formation. The following examples from Ryder (1994) show
different examples of compounds formed via analogy, some of which might be viewed as more
reliant on the source construction than others.
(66) a. whitemail based on blackmail
b. ice legs based on sea legs
c. Iran-gate based on Watergate
In both (66a) and (66c), the source compounds are most likely synchronically opaque—it is
difficult to imagine that the hearer would understand, for instance, the meaning of whitemail if
he or she were not already familiar with blackmail. The pair of compounds in (66b) are perhaps
128
slightly less challenging, but again, it is entirely possible that understanding ice legs requires
that one know the meaning of sea legs.
The French data taken from Wiktionary contains a series of compounds that might also be said
to have been constructed on the basis of analogy69:
(67) a. seconde/minute/heure/jour/semaine/mois-lumière based on année-lumière
b. moto/vélo/bateau-école based on auto-école
Becker (1993), arguing in favour of analogical word-formation, states that “[a] compound like
firewoman ‘female fire fighter’ or frogwoman ‘female skin diver’ is not formed on the basis of
fire or frog and woman but on the basis of fireman and frogman. The constituents of firewoman
do not motivate the compound” (13). Becker refers to these particular cases as “replacive
compounds” and sets them apart from ruled-based “derivational compounds.” The key
distinction is that replacive compounds typically involve items that share a paradigmatic
relationship. For instance, airman might be the replacive result of seaman as the non-head
constituents are both items in a paradigm (i.e. land, air, sea). By this measure, the French
compounds in (67) above are all replacive in nature and thus analogical. It is not clear, however,
if this paradigmatic criterion is in fact a steadfast requirement of analogical compounds, as the
lexemes involved in many N-gate compounds do not necessarily participate in any sort of
paradigm.
Perhaps crucially, and as Bauer (1983) underlines, word-formation based on analogy is not,
under most circumstances, as productive as rule-based formation, though this does not mean that
analogical frames are incapable of generating a large number of new forms:
“That is, following Thompson (1975: 347), a distinction is drawn between productivity and analogy. This does not preclude the possibility, of course, that an analogical formation will provide the impetus for a series of formations: this is presumably what happened in the case of formations in –scape, based on landscape, then an analogical formation seascape giving eventually a productive series including [. . .] cloudscape, skyscape and waterscape.” (Bauer 1983: 96)
69
This assertion is based on the fact that only année-lumière is listed in dictionaries like LPR2010 and TLFi, indicating that it is most likely the basis for the others mentionned. Likewise, LPR2010 only lists auto-école.
129
Ryder’s Iran-gate example in (66) above offers another case of analogical compound formation
that has become highly productive: N-gate is now the standard frame with which to introduce
new words denoting a scandal70. This view is somewhat tempered, however, if one understands
that this productivity is most likely the result of the lexeme gate having acquired the meaning of
‘political scandal’, thus rendering these cases as simple compounds coined according to regular
compounding rules. Despite the existence of productive templates, Bauer (1983), still hesitates
to grant analogical word-formation a very prominent place on his cline of morphological
productivity, stating that “[t]he limiting case of productivity at the lower end is presented by
analogy, where only one new form may exist” (100). This is in fact quite true. While affixes
seldom produce only one possible entry in the lexicon, an existing word might only produce one
new form via analogical means. Of the examples offered earlier, sea legs is unlikely to generate
a large number of related compounds.
Just where exactly analogy might fit into a theory of word-formation is certainly an on-going
matter of debate. Derwing and Skousen (1989), who argue in favour of analogical word-
formation from a cognitive perspective, offer ten main points upon which analogical and rule-
based theories may be opposed. They argue that while a rule-based model may allow a speaker
to store fewer lexical items, an analogy-based model results in far lower computational loads.
Bauer (2001b) also offers a good overview of the advantages and disadvantages of a theory of
word formation based on analogy. Some of the objections are that analogy fails to predict
potential words and that it is not sufficiently restrictive. While there is no doubt that these
objections are tenable in the case of word-formation involving bases and affixes, it is unclear if
they remain as strong for compounding. Bauer (2001b) also lays out a number of arguments in
support of analogy, notably how it may account for irregularities in word-formation (i.e.
multiple, different nominalisations for the same verbal base). In the case of compounding,
analogy therefore explains compounds that are otherwise entirely opaque (e.g. whitemail and
greymail). Bauer concludes that the two mechanisms no doubt co-exist, each supplementing the
other. This stance is similar to the one held by Booij (2010), for whom word formation is both a
matter of analogy and schema.
70
Wikipedia, at the time of writing, lists more than one hundred such compounds. While some of these cases may not be widely used, their number no doubt lends support to the argument that analogy may prove to be productive.
130
Whether the process of analogical word-formation is more effective or better captures the
mechanisms in lexical creation than rules goes beyond the scope of this work. No matter what
the theoretical implications are, however, it is undeniable that some lexemes are created based
on existing forms in the language and that this process also applies to compounds. The question
is therefore whether analogy can be executed in reverse? In other words, if a compound can be
created analogically, might they also be interpreted this way, which is to say, might a hearer rely
on his or her knowledge and understanding of similar forms to interpret new ones? Van
Jaarsveld et al. (1994) describe this hypothetical process as follows:
“To determine whether some lexicalized compound is a suitable model, its semantic representation will have to be retrieved. The relation specified between the nouns in the lexicalized compound will be applied to the novel compound (that is the core of the analogous interpretation) and the result of this process will be evaluated with respect to meaningfulness. When the outcome of this evaluation process is unsatisfactory (according to some criterion), the process will be repeated for some other lexicalized compound. When the outcome is, however, satisfactory, the nouns of the novel compound will be related in the same way as the nouns for the lexicalized compound and interpretative processing of the novel compound will stop.” (116-117)
Similarly, Derwing and Skousen (1989), discussing Skousen’s (1989) parallel work, use the
terms supracontextual homogeneity and random selection in the context of interpretation via
analogical means, stating that “if the given context does not lead to a single, definitive solution
in the lexicon, a range of surrounding supracontexts is explored until a point of supracontextual
heterogeneity, explicitly defined, is reached; a random choice is then made from among the set
of possible analogical examples made available by the search” (64). These mechanisms describe
meaning resolution via an evaluation of similar forms present in the speaker’s lexicon.
Presumably, it is up to speakers to determine if a particular meaning is valid using other
(extralinguistic) means.
Unfortunately, experiments meant to verify to what extent analogy influences a speaker’s
processing of compounds have produced mixed results. Van Jaarsveld and Rattink (1988), for
instance, tested the effects of lexical frequency on compound processing by Dutch speakers and
concluded that the existence (and availability) of a lexicalized form could serve as the basis of
interpretation for a novel compound. Clark and Berman (1987), however, in their work on
children’s understanding and production of novel compounds in Hebrew found no such
influence in paraphrasing tasks: they obtained similar results for tests using compounds based
131
on low frequency heads (i.e. heads that did not appear frequently in lexicalized compounds) and
those based on high frequency heads, concluding “that knowledge of the pertinent lexical items,
and not the constructions they appear in, is more important for compounding.” (Clark and
Berman 1987: 560). Later experiments by Van Jaarsveld et al. (1994) lends additional support
for these findings, though not without some measure of nuance. In their first experiment
involving a lexical decision task, participants responded faster overall to novel compounds
containing nouns also present in a large number of lexicalized compounds. They did not,
however, find that reaction times were affected by a novel compound’s degree of
interpretability, which was determined beforehand by other participants who rated each novel
compound on a 7-point scale, ranging from very difficult to interpret to very easy to interpret. It
is unclear whether this method of determining interpretability might have had some effect on the
absence of interaction between set-size and interpretability, but it does suggest that analogy may
have a limited effect on how a speaker judges a compound’s interpretability. In their second
experiment, Van Jaarsveld et al. looked to compare compounds in which the shared noun was
based on their semantic relatedness to lexicalized compounds. For instance, the first word of the
novel compound coughing pause is semantically related to the first word of the lexicalized
compound breathing pause, whereas no such relation exists between the latter and the novel
compound bay pause. Such pairs were constructed for both high and low productive sets of
lexicalized compounds. They found that, in a prime-target lexical decision task, participants’
reaction times were lower for semantically related pairs, but did not differ significantly between
low and high frequency lexicalized compounds. Based on the results for both experiments, Van
Jaarsveld et al. (1994) conclude that it is unlikely that individuals make use of existing
compounds when interpreting novel ones. Despite this conclusion, they concede that their
results nevertheless suggest that lexicalized compounds are being activated at some level.
In Van Jaarsveld et al.’s (1994) investigation of analogical effects on compound processing, it
was understood that sets were constructed based on a given noun’s overall frequency of
appearance in compounds. Their second experiment, which might be viewed as most relevant to
the question of analogical compound interpretation, involved single existing compounds as
targets, selected based on whether the head was of a high frequency or low frequency in their
database. What this does not address, however, is whether these sets of existing compounds
were semantically homogeneous. In other words, while speaker reaction times may not have
132
been influenced by high or low frequency heads, it is entirely possible that the high frequency
heads belonged to a set of compounds that differed greatly in meaning. Ryder’s (1994) work on
how speakers interpret novel compounds addresses this particular point. She argues that certain
words not only participate in a large number of existing compounds, but the degree to which
these sets are semantically homogenous will influence how a speaker will interpret a novel form
involving those same nouns. This differs from Van Jaarsveld et al.’s (1994) assumption that
“initial activation of lexicalized compounds will be independent of characteristics of the whole
set” (116), which led them not to take into account the semantic homogeneity of a particular
group of lexicalized compounds.
The premise behind Ryder’s (1994) work is that speakers do in fact use their knowledge of
existing compounds to determine the meaning of novel ones, based on what she calls linguistic
templates. Ryder refers to a particular template as an analogy base. According to Ryder,
analogy bases may consist of templates based on groups of compounds that share a common
element. Thus, sea-N (e.g. sea lion, seaman, sea cow, seaweed, etc.) and N-house (e.g.
boathouse, warehouse, treehouse, firehouse, etc.) are both potential linguistic templates. The
words at the heart of such templates (i.e. sea and house respectively) are called core words and
are the key to how analogy is applied (see also Becker 1993 for a similar approach).
Ryder uses what Bates and MacWhinney (1987) call cue reliability to assess just how
semantically influential a template can be. Bates and MacWhinney define the notion as “a ratio
of cases in which a cue leads to the correct conclusion, over the number of cases in which it is
available” (164). Using this approach as her starting point, Ryder distinguishes between two
types of cue reliability (1994: 81-82):
(68) i. absolute cue reliability: “core words contribute the same meaning regardless of what
they are paired with”
ii. relative cue reliability: “while one cannot predict the meaning of the compound just
from the presence of the core word, the conjunction of the word with a certain semantic
class of other words produces highly reliable results”
One can imagine that cue reliability is more likely to be relative than absolute as few core words
(or templates) would participate in a set of compounds that all share a common semantic thread.
133
Ryder offers box as an example of a core word with a high cue reliability as most compounds
seem to involve a container-contained relationship, but it is not difficult to think of a number of
compounds that do not (e.g. boom box, music box, signal box, etc.). Moreover, a high cue
reliability is no doubt intimately linked to a core word’s semantic representation. As Ryder
(1994) says of her example, “Box has a high cue reliability because it has a highly central
schema with a very salient slot for the item to be put in the box, and naturally this is the schema
that has been used in almost all X + box compounds.” (145).
Ryder’s approach is echoed in Štekauer’s (2005) work on meaning predictability, in which
analogy is one of twelve factors that, together, influence the meaning predictability of a novel,
context-free unit. Štekauer’s experiments revealed that participants’ interpretations of novel
compounds were occasionally influenced by the existence of similar compounds, which led
them to offer meanings based on analogical templates. Štekauer also found, however, that, while
these “templates [were] insufficient to recognise the subtle shades of individual readings” (258),
existing forms could have either a “boosting” effect on meaning predictability or, when there
was no possibility to interpret the word using an existing template, a reduction of that unit’s
meaning predictability rate. In either case, the existence of similar lexical items is said to have
an influence on the interpretation of a novel one.
Similarly, Baroni et al. (2007), also looking at the potential effects of lexicalized compounds on
the processing of new ones, propose what they call Lexicalized Interpretation Schema (LIS),
which is “an abstract constructional pattern [. . .] shared by all members of the same compound
family” (273). Their approach is largely based on results from Baroni et al. (2006) in which they
“found a strong tendency for the same heads and modifiers to be repeatedly used within the
sample of compounds from all frequency ranges they analyzed” (reported in Baroni et al. 2007:
279). Where Ryder spoke of core words, Baroni et al. talk about pivots, which may either be the
head or the modifier for a given compound. What distinguishes their model from Ryder’s,
however, is that they claim that a compound’s pivot is governed by the type of compound in
which it is found. Thus, according to Baroni et al., relational compounds (e.g. sugar box) will
involve an LIS based on the head as its pivot (i.e. X box schema), whereas attributive
compounds (e.g. feather luggage) will have an LIS with the modifier as its pivot (i.e. feather X).
What their model predicts is that novel compounds for which the pivot is retained will be easier
to understand than those in which it is changed. For instance, for an attributive compound such
134
as feather luggage, we might generate either wing luggage (pivot substituted) or feather trolley
(pivot retained), the result of which is that the latter will be easier to understand than the former.
Moreover, they predicted that under certain circumstances, a semantically related lexeme might
be substituted for the pivot without negatively impacting the resulting compound’s ease of
interpretability. They tested speaker acceptability judgements for 380 analogically constructed
novel compounds and found that their predictions were largely borne out. The results are
summarized in the following table:
Relational
(e.g. city centre) Attributive
(e.g. zebra pot)
Substitution of head Only a semantically related word Any word
Substitution of modifier Only a semantically related word No substitution permitted
Broadly speaking, substituting either component of a relational compound with anything but a
semantically similar noun will result in an unacceptable combination. For attributive
compounds, however, only the head may be substituted, but the new noun need not be
semantically related to the original.
Most of the work on the role of analogy on compound interpretation has focused on entirely new
and novel constructions. The question here is whether the concept can also be applied to existing
compounds? I believe that it can. The premise that might allow for analogy to be factored into
evaluating the transparency of existing compounds is in fact quite simple. Semantic
transparency was defined in Chapter 2 as a compound’s degree of interpretability in the absence
of prior knowledge of that particular combination. In other words, semantic transparency applies
to existing compounds as if they were novel, which is to say that the speaker, being familiar
with only its constituents, must attempt to establish meaning using the same methods he or she
would use if the compound had just been coined. Knowledge of existing compounds may factor
into this process regardless of a compound’s status as either novel or established. Thus, if a
particular compound happens to share its template with a number of other compounds, the
interpretation of said compound may be influenced by how semantically uniform that template
is. Moreover, if a given template is highly homogeneous, compounds based on that template, but
that do not involve the same meaning may be harder to understand that those that do. In some
135
ways, this approach is similar to theories involving family size, which is said to influence
processing of morphologically complex words (Schreuder and Baayen 1997, Bertram et al.
2000, De Jong et al. 2002).
4.3.1 Semantic Reliability Index
In order to offer a point of comparison between similar compounds, I will call the semantic
reliability index (henceforth SRI) the measure of a template’s semantic homogeneity in relation
to a particular compound. A template is to be understood as a specific lexeme alongside a
lexical category (e.g. sea N, cf. Ryder 1994). Thus, a compound’s SRI is calculated by dividing
the number of semantically similar compounds sharing the same template by the total number of
compounds for that template:
(69) # of semantically similar compounds based on template T = Compound
# of total compounds based on template T SRI
The concept of SRI, while largely based on Ryder’s work, also bears some resemblance to
Baroni et al’s (2007) LIS71. Moreover, the calculation proposed above is also similar to
Štekauer’s (2005) calculation of a unit’s predictability rate, which is calculated by dividing the
number of participants who judged a particular meaning acceptable by the total number of
participants; this number is then multiplied by the quotient obtained from dividing the sum of all
rating points assigned by participants by the total number of points possible72. Thus, when
Štekauer tested the compound baby book, he found that 38 of his 40 participants found ‘a book
for babies’ acceptable with a points tally of 306 out of 400, resulting in a predictability rate of
0.727. Similarly, the SRI calculation proposed here produces results anywhere between 0.001
and 1.000.
71
The SRI is also similar to Gagné and Shoben’s (1997) strength ratio within their CARIN theory, as well as to the relational measures proposed in Pham and Baayen (2013). 72
These points refer to a scale used by participants to evaluate a given meaning’s acceptability (i.e. 1 is least acceptable, while 10 is most acceptable). Presumably, if no such scale were used (in a yes or no solicitation task, cf. Baroni et al. 2007), Štekauer’s predictability rate would simply involve the number of acceptable responses divided by the total number of responses.
136
But just what does “semantically similar” in the equation above mean? Given that compounds
are often characterized as a pair of nouns related to each other in some way, we may want to
judge similarity based on the relation present for a given compound. This approach is backed up
by experimental work by Gagné and Shoben (1997), as well as Gagné (2001), who found that
participants’ reaction times in lexical decision tasks were affected by the relative frequency of a
modifier’s relational history. In other words, participants found it easier to judge novel noun
pairs when the most likely relation was also the modifier’s most frequent relation for existing
compounds. For instance, compounds containing mountain as a modifier were easier to interpret
if they involved a locative relation (e.g. mountain cloud) than a less frequent association for that
particular lexeme (e.g. ABOUT: mountain magazine). Thus, the SRI proposed here is a simple
means to assign a number to a compound based on how likely its meaning is given other similar
forms in the lexicon. It must be stressed, however, that taken on its own, the SRI is most likely
not a very strong indicator of semantic transparency as it is highly dependent on the number of
compounds available for a particular template, which will no doubt vary greatly from one
construction to another. Furthermore, individual speakers will not all draw from the same set of
compounds for a given template, a fact that supports a wholly “speaker-dependent” SRI.
In order to further illustrate how the SRI might be applied to a set of compounds, all NN and N
à N combinations in my data were examined and grouped together according to how often either
of their constituent lexemes occurred. Unfortunately, the number of high frequency compound
patterns identified in the data collected from Wiktionary is quite low. This is no doubt due to the
small sample size of the items retained (729 NN and 319 N à N compounds). The following
table contains the number of patterns and tokens in which one constituent recurs at least 4 times:
Table 4.3. Number of templates and tokens found in the data.
Compound Type N1 ≥ 4 occurrences N2 ≥ 4 occurrences
N1(-)N2 32 patterns 15 patterns
729 compounds 165 items 83 items
N1 à N2 15 patterns 4 patterns
319 compounds 80 items 23 items
Despite the small number of templates identified, a few facts do surface regarding French
compounds. First, the number of templates based on N1 is at least twice that of those based on
137
N2. We may argue that this is due to French’s preference for left-headed compounds, which
shows that templates typically favour the head constituent. Second, in terms of proportion, the
difference in numbers between the two types of compounds is not actually that great: 165 items
for N1 X templates accounts for approximately 22% of NN compounds, while the 80 items
identified for N1 à X templates accounts for approximately 25% of N à N compounds. It would
seem then, that the distribution of compounds within recurring templates is similar for both
types.
Because the actual number of tokens for a given template is in fact quite low (i.e. the highest
number of compounds for either NN or N à N templates is 11), an analysis using a larger set
was conducted using traditional dictionary entries, namely LPR 2010. Two different templates
were used.
The first example template groups together a set of NN compounds formed around the noun
papier in head position (i.e. papier-N). This also happens to be the most frequent pattern in my
own data with a total of 11 compounds. Arnaud (2003), on the other hand, lists 25 such
compounds. Looking through the entry for papier in LPR2010, we find a total of 24 NN
compounds involving this particular lexical unit (though they don’t all coincide with Arnaud’s
list). These compounds are listed in Table 4.4 on the next page and are grouped together
according to their shared meaning73.
Using these compounds, the average SRI74 of the template papier-N is 0.083, which suggests
that it possesses very low semantic uniformity. The number in fact reflects the wide range of
possible meanings a papier-N compound might have. On the one hand, this number may be
used as a comparative measurement of semantic homogeneity between different templates, and
on the other, it may also serve as an anchor point for defining semantic homogeneity within the
template itself. In other words, a compound that has an above average SRI no doubt represents
73
The paraphrases retained here are based on the definitions provided by LPR2010 and are necessarily specific (e.g. essuyer, emballer, etc.) so as to attribute greater importance to the semantic homogeneity of the retained templates. Compounds are considered semantically similar if they allow for the same basic paraphrase to be used. In the next chapter, many of these compounds are subsumed under identical relations, which would then (slightly) increase the calculated SRI. 74
The average SRI of a template is the sum of each type’s SRI, divided by the number of types for that template.
138
the most likely meaning if interpretation does in fact activate other lexicalized forms. In the case
of papier-N, four meanings in particular stand out (given at the top of Table 4.4)
Table 4.4. List of papier-N compounds taken from LPR2010 under the entry for papier.
papier-N Approximate Meaning Compound’s SRI crépon, cristal, vélin, pelure Papier qui rappelle N 0.167 chine, hollande, japon Papier fabriqué à/en N (de style N) 0.125 carbone, émeri, toile Papier ayant N en tant que partie 0.125 bristol, kraft, buvard Papier de type N 0.125 main, cul Papier pour essuyer N 0.083 bible, journal Papier qui fait partie de N 0.083 filtre, monnaie Papier servant de N 0.083 toilette Papier utilisé de façon quelconque pour N 0.042 aluminium Papier composé de N 0.042 calque Papier pour produire N 0.042 cadeau Papier pour emballer N 0.042 ministre ? 0.042
Average SRI of Template 0.083
Comparing the above template to pompe à N, however, shows what a relatively homogeneous
set of compounds might look like. The LPR2010 lists 17 such constructions under the lemma
pompe, all of which are listed in the following table, and again grouped together based on their
shorthand periphrasis.
Table 4.5. List of pompe à N compounds taken from LPR2010 under the entry for pompe.
pompe à N Approximate Meaning Compound’s SRI
eau, huile, gazole, insuline, morphine, essence pompe destinée à déplacer (pomper) N 0.353
piston, bras, levier, roue, moteur pompe ayant N comme élément constitutif 0.294
injection, chaleur pompe fonctionnant à l’aide de N 0.118
vide, fric pompe utilisé pour produire N 0.118
vélo/incendie pompe utilisé de façon quelconque pour N 0.059
Average SRI of Template 0.246
139
Taken as is, compounds constructed on the pattern pompe à N may possess one of five possible
meanings, with an average SRI of 0.246, much higher than that of papier-N. In this regard, we
may state that pompe à N is more semantically homogeneous than papier-N. Moreover, both
‘pump that pumps N’ and ‘pump with N as a part’ have a higher than average SRI, suggesting
that these meanings are dominant for this particular template. Activation of lexicalized
compounds during processing might favour these particular meanings75. In the case of the most
frequent meaning, I would suggest that this is due to the lexeme pompe’s inherently salient
function, which is a result of its artefactual nature. This will be discussed in greater detail in
chapters 5 and 7.
If we return to the data collected from Wiktionary, results for this particular analysis show a
great deal of variance. Table 4.6 on the following page contains the 13 patterns in which the
left-most constituent occurs at least five times76, ordered according to their average SRI (the
bolded row indicates a right-headed template).
At first glance, NN compounds containing the same leftmost component (i.e. the semantic head)
seem to consist of a number of highly homogeneous templates. While this may be an indication
of a great deal of homogeneity across NN compounds, it is more likely a consequence of the low
number of tokens present in the data, which seems all the more plausible given the low SRIs
calculated above for papier-N and pompe à N, each of which contained a much greater number
of tokens. It is also worth noting that one of the templates in Table 4.6 is in fact right-headed:
video-N is based on the modifier as the core word, with four out of five compounds meaning ‘N
that uses video’ (e.g. vidéo-protection).
75
This statement must, however, be hedged, as other factors are no doubt involved. As discussed earlier, meaning selection for a novel compound based on analogy would still be required to pass a felicity test (i.e. it is unlikely, for instance, that a speaker unfamiliar with pompe à incendie (‘fire pump’) would interpret it as ‘pump to pump fires’ given the semantic incompatibility such an interpretation would produce). 76
Earlier, in Table 4.3, all patterns in which either constituent occurred at least 4 times were retained. In the interest of space, only those compounds in which the core word appears at least 5 times are presented above.
140
Table 4.6. NN compounds with average template SRI based on the left-most constituent.
Template X-N
# of types Average SRI
Most Frequent Meaning Example
poids 5 0.680 poids de N poids coq vidéo* 5 0.680 N qui utilise vidéo vidéo-protection singe 5 0.520 singe qui ressemble à un N singe-chouette wagon 7 0.510 wagon qui a un N en tant que partie wagon-citerne mot 5 0.440 mot qui fonctionne comme un N mot-outil carte 8 0.375 carte qui sert de N carte-cadeau voiture 7 0.347 voiture dans lequel il y a un N voiture-bar radio 7 0.306 radio qui est aussi un N radio-gramophone café 7 0.266 café qui est aussi un N café-bar
poisson 7 0.225 poisson qui ressemble à un N poisson qui a un N en tant que partie
poisson-chat poisson-épée
chou 6 0.222 chou qui resemble à un N chou-fleur bateau 8 0.219 bateau qui sert de N bateau-bus chien 5 0.200 --- --- papier 11 0.124 papier destiné à N papier toilette
As for templates constructed using the right-most constituent (i.e. the modifier), the average SRI
ranges from very high to absolute. Again, the following table contains all templates for which
the second constituent occurs at least five times (the bolded row indicates that the template is
right-headed):
Table 4.7. NN compounds with average template SRI based on the right-most constituent
Template N-X
# of types Average SRI
Most Frequent Meaning Example
alpha 6 1.000 N de type/catégorie alpha particule alpha
lumière 6 1.000 distance parcouru par la lumière en un N
année-lumière
garou 11 0.835 N qui est un garou loup-garou
mère 12 0.680 N qui sert de mère bateau-mère
gamma 5 0.680 N de type/catégorie gamma particule gamma
tampon 5 0.680 N qui sert de tampon mémoire tampon
école* 6 0.611 école où on apprend N auto-école
141
There seem to be two reasons for the high semantic homogeneity observed for modifier based
templates. First, as Baroni et al. (2007) suggest, templates may be influenced by the type of
compound favoured by certain lexemes, which is to say that attributive compounds will most
likely favour the modifier as its pivot. Most of the recurring modifiers in the table above involve
compounds in an attributive (or attributive-like) association (e.g. N-mère, N-alpha, N-tampon,
etc.). If the modifier is meant to ascribe a particular characteristic to the head noun and this
characteristic is in fact a stable component of the modifier, then compounds involving this
modifier are likely to possess the same attributive meaning (e.g. N-mère = ‘N that is like a
mother’). This differs from relational compounds in that the relation that arises may not be a
fixed feature of either constituent (see, for example, the compounds based on papier-N).
Second, many of the compounds occurring within the templates above are in fact related to each
other via analogy, suggesting that analogical word-formation need not favour the head.
Compounds such as N-lumière (discussed earlier) and N-garou are both templates for which a
source compound can be identified (année-lumière and loup-garou respectively). These
instances call back to Becker’s (1993) comments regarding replacive (i.e analogical)
compounds, in which he argues that the elements susceptible to substitution form a paradigm
(e.g. N-lumière, where N is a standard measurement of time). N-école, also mentioned earlier as
a case of analogically formed compound, differs in that it is mostly right-headed (4 of 6 tokens),
which means that the core word is in fact the head constituent. Meaning across these items of
the template is consistent: école où on apprend à faire N (‘school where one learns to do N’).
Although this might suggest that this pattern has an absolute SRI, the truth is that there is
interference from left-headed compounds that fit the same pattern and which obviously do not
share the same meaning:
(70) a. bateau-école, auto-école, moto-école, vélo-école (‘école ou on apprend à faire N1’)
b. navire-école, croiseur-école (‘N1 qui est une école’)
It is my contention that SRIs are to be calculated using all compounds that fit a given template,
regardless of headedness, so as to account for any interference that might occur at the level of
interpretation. If the speaker, when presented with a new compound, attempts to interpret it
using existing compounds, he or she must do so by evaluating all items for a given pattern. The
142
process is most likely executed using the dominant head position first, but may then require that
the other head be considered if either position is possible for a particular template (i.e. N-école).
If we consider N à N compounds, we find only seven templates based on a core word appearing
at least five times. The following table contains all such instances:
Table 4.8. N à N compounds with average template SRI based on the left-most constituent
Template X à N
# of types Average SRI
Most Frequent Meaning Example
boîte 7 0.755 boîte pour contenir N boîte à outils clé 11 0.454 clé qui a N en tant que partie clé à chaîne pierre 6 0.389 pierre destinée à N pierre à briquet pompe 6 0.389 pompe qui pompe N pompe à essence moulin 9 0.308 moulin pour moudre N moulin à poivre
pâte 6 0.278 pâte utilisée pour faire N pâte qui produit N
pâte à papier pâte à pet
tête 5 0.233 tête qui provoque N tête à claques
Although the number of recurring N à N templates is far lower than for those examined earlier,
the few that are present show that this particular type forms, overall, a more semantically
homogeneous group than their NN counterparts. This is not all that surprising given that the
presence of the preposition significantly restricts what relation might link together its nominal
constituents, thus increasing semantic homogeneity within templates (see Chapter 6, Section
6.2.2 for a discussion of N à N compounds following an analysis of their relational semantics).
Examining N à N templates based on the right-most constituent reveals only four cases where
the core word occurs at least four times in the data. These are all listed in the following table:
Table 4.9. N à N compounds with average template SRI based on the right-most constituent.
Template N à X
# of types Average SRI
Most Frequent Meaning Example
vapeur 4 1.000 N qui fonctionne à vapeur bateau à vapeur
vide 6 0.722 N qui fonctionne au vide tube à vide
main 5 0.440 N employé par la main frein à main
feu 8 0.156 N qui utilise le feu N dans lequel il y a du feu
arme à feu chambre à feu
143
Again, templates constructed using the modifier show a high degree of semantic uniformity.
Although the SRI numbers above are no doubt influenced by the small number of types
examined, some of the patterns observed are revealing. Compounds involving N à vide, for
instance, are analogical in nature, 5 of the 6 occurrences being based on tube à vide (→ diode à
vide, triode à vide, etc.) and are therefore all semantically related. In the case of compounds
constructed on the N à vapeur template, all instances denote machines powered by steam, which
is a highly productive relation for N à N compounds (see Chapter 6 for more on this). These
observations, although weakened by the limited number of templates in the data, are significant
in that they correspond to those made earlier for NN compounds sharing the same modifier
based templates.
Overall, many compounds containing the same lexeme do show some degree of semantic
homogeneity, suggesting that compound meaning is in fact constrained based on what
constituents are involved. These results, however, are of limited appeal given how few patterns
are in the data. That said, my initial examination of papier-N and pompe à N using a larger set
of compounds shows that the number of compounds for a given template is most likely much
greater than my own data suggests. Ideally, a template’s SRI would be calculated using as many
existing compounds as possible, which would produce a far more reliable set of indices.
4.3.1.1 How Does the SRI Fit in?
The purpose of calculating the SRI is in fact twofold. First, it provides an additional means with
which to compare and evaluate otherwise identical compounds. Most compounds analysed in
the data retained from Wiktionary are compositional compounds with canonically strong
semantic heads, which would, according to the features outlined in the first two sections of this
chapter, render them equally transparent. While this might in fact prove both accurate and
sufficient, determining the semantic uniformity of a set of compounds allows us to further rank
them within their feature sets. Verifying the viability of this proposal, however, requires that
compounds with differing SRIs be tested with speakers. For the time being, we may use SRI as
a scale for each terminal point on the hierarchy developed in the first two sections of this
chapter, as illustrated in the following partial figure:
144
Figure 4.5. Relationship between compound features and the semantic reliability index.
Once we’ve established how a compound is classified using head and compositionality features,
we may then verify how its meaning relates to those based on the same template. A compound
that shares its meaning with a large number of similar compounds may have a greater chance of
being interpreted correctly than one involving marginal or idiosyncratic relational information.
This may not be all that surprising or counterintuitive if we imagine that a highly recurring
meaning for a given template is most likely motivated by the presence of highly salient features
or properties of the core constituent’s semantic representation (e.g. N-box where box is meant to
contain things). This use of the SRI will be explored in greater detail in Chapter 7 (Section
7.2.2.5).
A second use for the SRI is that it also allows for the evaluation of specific relational
information for a given compound. In the examples in Table 4.5, pompe à N seems to favour
either a purposive (in this case, for pumping N) or a part-whole relation. Conversely, this
particular template does not seem to include, among others, a locative sense, which would mean
that a compound based on this template involving location might have a negative impact on its
semantic transparency. In the next chapter, I will propose a set of basic relations that may be
used for this very purpose.
Strongly Endocentric
Compositional Weakly Compositional
Partially Compositional
1.000
SRI
0.001
1.000
SRI
0.001
1.000
SRI
0.001
145
There are, however, a number of limitations to this approach. First, it is not entirely clear just
how great a role lexicalized compounds play in the interpretation of new compounds involving
the same constituents. While the research seems to suggest that that these stored compounds are
being activated (Van Jaarsveld et al. 1994), it isn’t clear that the semantic homogeneity of these
shared sets will have a positive or negative effect on a given compound’s semantic transparency.
Furthermore, these sets may be far larger than suggested here and may in fact involve related
words, such as synonyms, hypernyms and hyponyms. In other words, a compound with the
lexeme boat as its head may activate compounds involving other related words (ship, cruiser,
vessel, craft, etc.). The template used to calculate a compound’s SRI would then be something
along the lines of WATER BASED VEHICLE + N. Ryder’s (1994) research seems to suggest that
speakers do in fact generalize these patterns (e.g. ANIMAL + LOCATION). If this approach is
correct, however, it may be that greater opacity will occur for compounds whose meaning strays
from a template’s established meaning, as well as for those that otherwise seem to share the
same template, but that differ structurally, and therefore semantically (e.g. bateau-école ~
navire-école, where head position varies within the N-école template; see Table 4.7).
I would reiterate, however, that the semantic reliability index is meant to add just one more
indicator to the typology of semantic transparency, one that might add greater granularity to the
classification. Additional research might in fact show that compounds that differ only by their
SRI ratings are viewed by most speakers as equally transparent or opaque. This is a topic worthy
of future exploration.
4.4 Summary
In this chapter, I proposed three major features with which to evaluate a compound’s degree of
semantic transparency: centricity, compositionality, and semantic homogeneity. For each of
these factors, I argued that a number of key characteristics, such as head position and the
presence of tropes, should be taken into account when comparing compounds to one another.
The resulting hierarchy, while theoretical in nature, reflects a number of facts observed in the
data collected. One factor, however, that has not been discussed here and that might also be
integrated into this framework is frequency, whether it be the lexical frequency of the compound
itself or the frequency of its constituents. One interesting aspect that would merit further
exploration (and formalization) is the effect of relative frequency between the whole and its
146
parts (cf. Hay 2003 for complex words). It is possible that a compound whose lexical frequency
is greater than its constituting elements (i.e. lexicalization) might pose greater interpretational
challenges than one whose elements are more frequent. I have chosen to set aside this particular
factor in order to concentrate on purely semantic features.
At the beginning of this chapter, I cited Baroni et al.’s (2007) hypothesis regarding compound
interpretation, which states that the process involves two steps. The first step requires that the
speaker identify the head constituent. This operation was largely the focus of this chapter. The
second step requires that the speaker establish the nature of the relation that binds a compound’s
elements together. This particular characteristic of compounds has received a great deal of
attention over the last 40 years. In the following chapter, I will attempt to synthesize the
research done in this particular area and propose a set of relations that might be used to further
develop a theory of semantic transparency.
147
Chapter 5
Compound Relations
If the earlier discussion on compounding has shown anything, it’s that many, if not most
primary compounds are semantically abstruse. As I discussed in Chapter 2, compounds are often
said to defy semantic compositionality because they lack any clear indication of just how their
constituents are meant to relate to each other. Much of this ambiguity of meaning arises from an
absence of predication between a compound’s elements, a semantic gap that may be bridged by
a number of different senses, even across similar combinations. For instance, sun and burn in
sunburn are linked by an unexpressed causal relation (i.e. ‘burn caused by the sun’), while in
heartburn, the relation might instead be locative or argumental in nature (i.e. ‘burn located in
the heart’; ‘burn of the heart’). Unfortunately, this disparity of meaning is not a simple quirk of
a select few isolated constructions: Lees (1968) lists eleven compounds with dog as its head
element and every single one differs in meaning (e.g. puppy dog, watch dog, police dog, etc.);
Jackendoff (2010) illustrates the same problem using seven different compounds with cake and
nine with car. This wide range of possible meanings is due to some unexpressed association
between otherwise independent nominal constituents, a relationship that Allen (1978) calls the
Variable R. The value of this variable is governed by what she refers to as the Variable R
Condition, which constrains the range of possible values, while simultaneously blocking
unlikely or even impossible values by establishing compatibility between the semantic features
of the constituents. The fundamentals of this approach have been widely supported elsewhere in
some form or another (Cohen and Murphy 1984, Murphy 1988, Lieber 2004, Benczes 2006,
Baroni et al. 2007).
While such a proposal seems entirely reasonable, it is not, however, without its problems. Most
vexing is the fact that a number of compounds do not easily allow for the implicit meaning to
surface without some degree of manipulation or coercion. In a compound such as sunglasses, for
instance, what is it exactly about sun or glasses that allows for the Variable R Condition to
produce the correct relation (i.e. ‘glasses that protect against the sun’)? Allen admits that these
148
cases abound and that they usually involve some degree of lexicalization, but this doesn’t
explain how the relation might come to be. Moreover, what exactly prevents equally plausible
associations from being generated for a given pair of words? As Allen puts it, a watermill could
just as well be a mill where people drink water and not one that uses it to generate power (cf.
Anscombre 1990 for the French moulin à vent). Her solution involves organising a lexeme’s
semantic features into a hierarchy, which allows dominant properties to determine meaning
when combined with other lexical units. Thus, according to Allen, mill has two dominant
features: “powered by” and “produces or makes something,” which accounts for a number of
compounds headed by that lexeme (94-95):
(71) a. water/wind/hand/steam-mill “powered by” interpretation
b. steel/paper/flour/cotton-mill “production” interpretation
This approach is not only reasonable, but it is also most likely correct given some of the results
from experimental research on compound interpretation (Ryder 1994). It nevertheless gives rise
to two questions. First, if feature dominance is sufficient to prevent incorrect meaning
generation, how does it allow one to distinguish between equally valid interpretations? In other
words, given that at least two possible relations exist for compounds involving mill, why is the
“production” reading not available for windmill? As mentioned earlier, Allen and others are
aware that features must be compatible in order for a particular interpretation to be deemed
felicitous, but it isn’t always clear where such lines must be drawn (such as for windmill). A
second question, and one which will be the focus of this chapter, involves the very nature of
these “dominant features.” More precisely, are they solely internal properties of the lexemes in
question or might there be a number of recurring relations able to account for various
combinations? Consider once again the compounds in (71b) above (i.e. steelmill, paper mill,
etc.). Is “production” truly a dominant feature of either steel or mill? Perhaps, given that a mill is
typically understood as a place where things are manufactured, but what of bee in honey bee, or
house in lighthouse, both of which also involve similar implicit predicates? Could “production”
instead be some sort of recurrent relation for compounds, one of many possible fundamental
values for the so-called Variable R? This is in fact what a number of researchers believe and
have attempted to formalize using a set of basic or fundamental relational concepts that could
account for most (if not all) compounds. It should come as no surprise, however, that such an
approach is not without some controversy.
149
In its simplest form, a theory of compound relations (as they will henceforth be called) involves
two fundamental positions: either the number of relations that may emerge between a
compound’s elements is limited to just a few or these relations are, to some extent, unlimited.
Although Downing (1977) proposes her own set of relationships, she argues that no definitive
list can be compiled given that contextual circumstances will allow for a nearly infinite number
of possible interpretations, thus rendering compounding a matter of pragmatics. In her now
classic example, an apple juice seat would be difficult to interpret out of context, but if we were
to imagine a table with place settings, one of which has a glass of apple juice, it’s meaning
suddenly becomes rather trivial77. Others have also argued that no basic list could possibly
account for all compounds (Selkirk 1982, Lieber 1992, Wisniewski 1997) and those who do
promote such an approach often admit that their list of relations is not meant to be exhaustive,
but instead representative of most compounds under study (Jespersen 1956, Adams 1973,
Jackendoff 2010). Some have even gone so far as to propose a set of relations that account for
all possible nominal combinations, usually ignoring those compounds that defy compositionality
(Hatcher 1960, Warren 1978, Arnaud 2003). As we will see, while many researchers seem to be
at odds with one another when it comes to the analysis of compounds, often criticizing each
other’s works at length, there is nevertheless substantial overlap between their various
frameworks.
The focus of this chapter will therefore be on compound relations as a limited set of basic
predicates. The central motivation for this approach is that it allows for an account of
compounds like windmill and heartburn whose meanings are otherwise difficult to ascertain
using only the intrinsic semantic properties of their constituents. Although semantic properties
no doubt remain an important factor in the process of sense disambiguation, the fact that a
number of relatively basic relational concepts recur with some degree of frequency suggests that
speakers are more than likely using this information when interpreting compounds. In fact, there
is ample evidence to support that speakers make use of relational information during compound
processing (among others, Wisniewski and Love 1998, Gagné 2002, Estes and Jones 2006,
77
This observation, however, is also valid for nearly all aspects of language, as research in semantics has shown that context may allow for even simplex forms to acquire new and novel meanings through a variety of means (Nunberg 1979, 1995, Copestake and Briscoe 1995).
150
Gagné and Spalding 2009). It therefore seems justified to integrate this component into a theory
of semantic transparency: when confronted with unfamiliar compounds, the speaker may
attempt to establish its meaning by “testing” known basic relations (cf. Ryder 1994). Like most
researchers, however, I do not believe that all compounds can be accounted for using only a
closed set of relational associations. Rather, some compounds may involve highly specific and
idiosyncratic relations, while a significant number of them make use of only a few recurring
associations. This position, if true, is of consequence for the current work because it further
strengthens my earlier position regarding “irregular” compounds (see Chapter 4, Section 4.3)
and the additional costs they might incur during interpretation. Furthermore, results from
experiments conducted by Gagné and Shoben (1997) suggest that the frequency of a relation for
a particular constituent (in their case, the modifier) has some effect on the ease with which a
speaker will be able to interpret a given compound. Factoring in relational frequency data is
therefore crucial, given that not all relations are equally pertinent for compounds. For instance,
in Girju et al. (2005), Part-Whole accounts for nearly 17% of their 4,500 English compounds,
while Location only accounts for half that number. Furthermore, research on novel compound
interpretation has shown that, plausibility constraints notwithstanding, speakers tend to favour
similar and recurrent relational paraphrases in meaning composition (Downing 1977, Ryder
1994, Wisniewski 1996). Such data for existing French compounds would not only be useful for
testing frequency effects of compound interpretation, but also offer an additional metric by
which to measure semantic transparency.
This chapter consists of two major sections. First, I will discuss some of the work previously
conducted on compounds, focusing on the approaches and relations others have adopted to
provide a semantic account of NN compounds. Based on this research, I have retained the most
salient and frequently mentioned compound relations in the literature for my own research on
semantic transparency. These relations will be the focal point of the second half of this chapter.
The results of their application for both French NN and N à N compounds will be discussed in
greater detail in Chapter 6.
5.1 Studies on the Semantics of Compounds
Although there has certainly been no shortage of research done on the relational properties of
compounds, only a few authors go into great detail on just what their relations are meant to
151
represent or how they apply to compounds. In fact, it is largely the earlier work that seems most
concerned with a reliance on data to support the research. More recent studies have been far
more applied in nature and have concentrated on cognitive and computational approaches. I will
first look at some of the earlier research, before moving on to more recent proposals.
It is worth noting that the objective of the following sections is to show how the topic of
compound relations has been broached and what formalisms have emerged from this work. It is
not my goal to openly critique the approaches detailed, but to compare them so as to establish a
set of relations with which to explore the French data on hand. That said, some degree of
criticism is unavoidable as it remains important to identify the strengths and weaknesses of the
various approaches.
One further note: as most of the work done on this subject has been on English, the examination
that follows will rely heavily on examples from that language. The French data compiled from
Wiktionary will be used during the presentation of the relations retained in Section 5.2.2.
5.1.1 Early studies
Although he may not have been the first to do so, Jespersen’s (1942) early and rather cursory
work on compounds seldom goes unmentioned in the work that has followed. He identified six
types of substantive compounds, the first of which he called final determinative (i.e. right-
headed endocentric compounds). Although he states that “the number of possible logical
relations between the two elements is endless” (143), he nevertheless describes a number of
relational classes involving concepts such as Time (nightmare), Location (headache), Means
(handwriting), and Purpose (beehive). Hatcher (1960) believed Jespersen’s work to be flawed
and laden with innumerable inconsistencies and subsequently offered a harsh and biting critique
of his formalism. Where Jespersen saw idiosyncrasies and caprices of language, Hatcher saw an
opportunity for a higher level of abstraction. She reworked his classification and reduced it to
just four basic relations:
(72) a. Ⓐ = A is contained in B (e.g. seed orange) b. Ⓑ = B is contained in A (e.g. orange seed)
c. A → B = A is the source of B (e.g. cane sugar) d. A ← B = B is the source of A (e.g. sugar cane)
152
Hatcher maintains that her classification system is able to account for even the most
uncooperative of compounds, including those Jespersen claimed defied classification. She does,
however, admit that this is largely due to the very loose nature of her basic relations and that
further subdivision might prove useful. In fact, Hatcher, realizing that her four main classes
failed to capture a number of relevant distinctions between compounds, supplemented her
categories with seven semantic classes (e.g. A is an animal and B is a person: cowboy), which
resulted in a system with 49 possible combinations. Given that this additional layer of
abstraction is not relational in nature (i.e. a constituent remains an animal, no matter what B is),
it does little to distinguish between very different compounds that otherwise share the same
space in her system: for example, both wheelchair and toolbox are object-object compounds of
class Ⓐ (as in 72a), yet are sufficiently dissimilar so as to merit different treatments. Despite the
appeal of such a succinct approach, four relations seems too small a number to be truly useful in
teasing out the many nuances that compounds exhibit.
That very year, Marchand (1960)78 offered his own take on compounds in his seminal work on
English word-formation. His approach, however, was more descriptive than it was explanatory.
According to Marchand, predication for non-verbal nexus compounds (i.e. root compounds) is
necessarily restricted. He distinguishes between copula compounds, where BE is the underlying
verbal unit (e.g. girlfriend), and rectional compounds, which require a full verb for expansion
(e.g. steamboat → ‘boat that uses steam’). For copula compounds, he identified four types:
subsumptive (oak tree), attributive (girlfriend), dvandva (fighter-bomber), and adjectival
(blackbird). Of the so-called rectional compounds, Marchand distinguishes between two types:
the type steamboat and the type policeman. For each of these types, he offers a number of
possible paraphrases (e.g. ‘B consisting, made up of A’), but these are given on a case by case
basis and play little role in the classification of these compounds. What is more important, in
Marchand’s opinion, is just how the components relate to each other syntactically. He therefore
discusses subject types on the one hand (i.e. the head is in subject position; silk worm = ‘worm
produces silk’), and object types on the other (i.e. the head is in object position; steamboat =
78
This discussion is based on the second edition of Marchand’s work published in 1969.
153
‘steam powers the boat’). Although relational concepts are present in his work, they are not the
primary focus of his study.
Published the same year as Marchand’s opus on English word formation, Lees (1960) offered
his own analysis of compounding, one that was steeped in the transformational grammars of the
time. His work shares a number of features with Marchand’s in that Lees also draws parallels
between compounds and sentences. He therefore sought to show that compounds were in fact a
surface form of a “kernel” sentence whose predicate was deleted at some stage of the derivation.
The example for oil well in (73), taken from the fifth edition of Lees (1968), illustrates the
process (144):
(73) i. The well yields oil. à (GT19: relative clause)
ii. ... well which yields oil... à (T57: nominal modifier)
iii. ... well yielding oil... à (preposed modifier)
iv. ... oil-yielding well... à (ellipsis)
v. ... oil well...
The important point to retain is that compounds are purported to begin their lives as complete
sentences (i.e. The well yields oil.), and, with the help of a number of transformational rules,
lose most of their phrase structure along the way. Lees’s transformations are based on the
underlying syntactic relations held between the compound’s elements, as well as a number of
types and subtypes. Lees identified eight major classes of nominal compounds, each of which
contains multiple subclasses:
Table 5.1. Lees’s (1960) grammatical relations of nominal compounds.
Grammatical Relations Examples Subject-Predicate girlfriend, fighter plane, madman, redskin Subject-Middle Object doctor’s office, arrowhead, rattlesnake Subject-Verb talking machine, payload, population growth, etc. Subject-Object steamboat, car thief, water spot, etc. Verb-Object setscrew, pickpocket, eating apple, etc. Subject-Prepositional Object gunpowder, garden party, eggplant, etc. Verb-Prepositional Object grindstone, washing machine, boiling point, etc. Object-Prepositional Object bull ring, station wagon, wood alcohol, etc.
154
While these classes do not seem to single out any of the semantic characteristics of compounds,
Lees does offer a limited set of possible paraphrases for many of his types. For instance, the
Subject-Object class, in which the head is the subject of some unexpressed verb and the
modifier its object (e.g. car thief = a thief steals cars), includes compounds that may be
paraphrased using one of two prepositions:
(74) a. from battle fatigue, beet sugar, fingerprint
b. for candy factory, grocery store, textile mill
These prepositions, while perhaps broad in meaning, offer some glimmer of a semantic
approach to compounds. The from preposition might be said to mean Source or Cause, while for
is most likely purposive in nature. Other prepositions used by Lees are of, by, with, and like.
Also included in some of his subtypes are basic verbs such as have and be.
As exhaustive and meticulous as Lees was in his work, his approach was subject to a great deal
of criticism. The main point of contention stemmed from the ad-hoc nature of his kernel
sentences. As Bauer (1978) points out, “a compound appears to be a surface neutralization of a
number of different logical/semantic/underlying representations” (81). Compounds are thus
inherently ambiguous, which means there cannot be just one kernel sentence for a given
compound. Lees was of course aware of this, stating that “most compounds can be derived, each
one, in a number of different ways, and thus each may have many different ways of being
understood” (1968: 122). This is in fact a widely held opinion regarding compounds, but it does
little to resolve the difficulties inherent to his transformational approach, which may be summed
up with the following question: if the predicate of a compound is deleted during a
transformation, where does this deleted information come from in the first place? If the speaker
only has access to the surface form, yet must derive it from some underlying sentence, how does
he or she choose “powered by” for windmill, but “produces” for paper mill? Similar criticism
was raised elsewhere (among others, Marchand 1965; Scalise 1984). Most of the work on
compounds that followed distanced itself from a purely syntactic account of compounding and
instead sought to introduce a more robust semantic component into their approaches.
Although Adams (1973) retains many of the syntactic relations found in Lees (1960), so as to
account for verbal nexus compounds, she classifies root compounds using a small number of
highly recurrent semantic associations. Adams (1973) introduced 11 such groups, each with its
155
own set of sub-types. Her classification system ultimately contains over 70 distinct compound
types. The following table retains only those types that are paraphrased using unexpressed
content and thus omits five classes (Subject-Verb, Verb-Object, Adjective-Noun, Names, and
Other). Nor do they exhaust all of the possibilities set forth by Adams:
Table 5.2. Adams’s (1973) compound classes.
Major Classes Number of Sub-Classes
Possible Paraphrases
Appositional 4 functions as, is an instance of, is more specific than
Associative 7 is part of, belongs to, is produced from
Instrumental 13 prevents, preserves, causes, obtained through
Locative 8 place where, place to or from, time when
Resemblance 8 is in the form of, has features of, reminds one of
Composition/Form/ Contents 6 consists of, made from, in the form of, contains
Even with this reduced set of retained paraphrases, one will notice that meaning can differ
significantly within a given class. Appositional, for instance, includes compounds like fuel oil
(i.e. oil that serves as fuel), as well as compounds like codfish (i.e. fish of which cod is a
particular instance). Because the Appositional class is meant to capture copulative compounds,
the inclusion of forms such as fuel oil is not unsound (i.e. fuel oil is oil that is fuel), but one
might argue that this particular grouping is perhaps too broad. After all, fuel oil is not fuel like a
codfish is a cod. In other cases, however, the decision to include wildly different relations seems
entirely justified. For instance, compounds that make use of an instrumental relation could in
fact refer to a number of different activities: we use things to build, remove, clean, fasten, etc.
By the same token, however, it seems strange, to then distinguish between the contents and the
locative classes, given that when one says that “X contains Y,” one is actually saying “X is
located in Y.” Adams does in fact provide an explanation for this particular distinction, stating
that the Composition/Form/Contents class is meant to represent “compounds in which one
element specifies the other in terms of some concrete feature” (81), while the locative class
deals strictly with nouns denoting a time or place. But other unanswered questions do arise from
her classification system. For example, is grouping together Composition, Form, and Contents
indeed warranted? What distinguishes a bell jar (RESEMBLANCE) from a bow tie
156
(COMPOSITION/FORM/CONTENTS), for instance? More importantly, why is “B in the form of A”
included in two different classes? Unfortunately, Adams does not provide an explanation for
these conflicting analyses. Nevertheless, her research remains among some of the most
comprehensive work done on compounds and provides a number of possible relational concepts
that might prove fundamental.
Like Jespersen (1942), Downing (1977) believes that the number of relations is not finite: “The
existence of numerous novel compounds [. . .] guarantees the futility of any attempt to
enumerate an absolute and finite class of compounding relationships” (828). She nevertheless
concedes that the majority of the novel compounds included in her study seem to involve a
limited set of basic semantic categories, which, combined with the fact that most previous
studies tended to invoke similar relations, prompted her to offer what she says should be the
minimum number of relationships needed to account for most compounds (828):
Table 5.3. Downing’s (1977) minimal compound relationships.
Relationship Example
Whole-Part duck foot
Half-Half giraffe-cow
Part-Whole pendulum clock
Composition stone furniture
Comparison pumpkin bus
Time summer dust
Place Eastern Oregon meal
Source vulture shit
Product honey glands
User flea wheelbarrow
Purpose hedge hatchet
Occupation coffee man
Downing’s list offers a good look at what the reduction of nominal compounds to a set of
primary relations would look like. One will no doubt notice some similarities with Adams’s
(1973) own system, though not quite with the same level of granularity. Downing does not,
157
however, elaborate on just how these relations might apply beyond her own work on novel
compounds.
Warren’s (1978) classification of English compounds bears a striking resemblance to Adams’s
work, but with a far more elaborate structure. Although her approach consists of only five major
classes (i.e. Constitute and Resemblance, Belonging to, Location, Purpose and Activity-Actor,
and Proper-Name Combinations), each of these classes contains a number of sub-classes, which
are in turn composed of their own set of sub-classes. To further complicate matters, some of
these sub-classes are subject to an additional level of sub-division, which results in a final count
of 60 different compound types. Needless to say, her system is deeply hierarchic and proves to
be just as, if not more fine-grained than Adams’s. That said, Warren does offer a summary of
her relational classes79, which closely resembles the one proposed by Downing (1977):
Table 5.4. Warren’s (1978) semantic classes.
Semantic Classes Example
Source-Result student group
Copula girl friend
Resemblance clubfoot
Whole-Part spoon handle
Part-Whole armchair
Size-Whole 3-day affair
Goal-OBJ moon rocket
Place-OBJ sea port
Time-OBJ Sunday paper
Origin-OBJ hay fever
Purpose ball bat
Activity-Actor cowboy
Warren’s work is based on a corpus of just over 4,500 root Noun-Noun compounds (no verbal
nexus constructions). According to her analysis, nearly 4,000 compounds are accounted for
79
While Warren’s relations are primarily semantic in nature, some of them nevertheless possess syntactic features (i.e. N2’s status as object).
158
using the classes in Table 5.4, with a further 519 falling under the proper names category. All in
all, Warren claims that only 33 compounds do not fit under her classification scheme, either
because they possess idiosyncratic meanings (e.g. stage coach = ‘coach that goes in stages’) or
because they defy analysis (e.g. bobby pin). At first glance, that such a low number of “misfits”
(as Warren puts it) should appear in her data seems odd, considering how many such
compounds have been said to exist (Jespersen 1942, Marchand 1960). It may very well be that
Warren was simply very liberal in her interpretation of certain compounds. If we look at those
listed under Adams’s (1973) OTHER class, we discover that Warren was indeed able to classify
some of them: cradle song, for instance, can be found under Place-OBJ. In fact, a closer look at
her data reveals that Warren classified a number of compounds usually treated as unanalyzable
elsewhere, such as honeymoon (OBJ-Time) and butterfly (Activity-Actor). She does, however,
usually explain why certain obscure or non-compositional compounds are included under a
particular class, which may or may not satisfy the reader. If Warren’s classification scheme is
indeed representative of most of the compounds included in her data, then what we have is
quantitative confirmation of what Downing (1977) had suggested were essential relations at play
in compounding. One may wonder, however, whether Warren’s abridged set of classes does in
fact suffice or if the numerous layers of sub-types are key to her—or any, for that matter—
treatment of compounds.
Levi (1974) had already sought to reduce the complexities proffered by some of these
approaches and argued that much of the granularity observed in previous works was in fact
unnecessary. In her (1978) work on complex nominals, she sought to reintroduce the
transformational approach advocated for by Lees (1960) and hoped to address some of the
criticism that had been leveled at his work, namely that there was little way to know what
predicates had been omitted during the derivation. Her solution was to introduce what she called
Recoverably Deletable Predicates (RDP), a small set of basic relations that, because there are so
few of them, are recoverable by the speaker at the surface level. She proposes 9 such RDPs
(three of which are reversible, allowing for a total of 12 types) and argues that they are largely
159
sufficient to account for the majority of what she calls non-predicating compounds80. Her RDPs
consist of both basic verbs and prepositions:
Table 5.5. Levi’s (1978) Recoverably Deletable Predicates.
RDP Examples
Cause1 tear gas, disease germ, mortal blow
Cause2 drug deaths, birth pains, viral infection
Have1 picture book, apple cake, gunboat
Have2 government land, lemon peel, student power
Make1 honeybee, silkworm, musical clock
Make2 daisy chains, snowball, consonantal patterns
Use voice vote, steam iron, manual labor
Be soldier ant, target structure, professorial friends
In field mouse, morning prayers, marine life
For horse doctor, arms budget, avian sanctuary
From olive oil, test-tube baby, apple seed
About tax law, price war, abortion vote
Levi’s relations are as close to primitives as such an approach may allow. Her highly reductive
approach has the advantage of being sufficiently underspecified so as to capture a great many
types of compounds. Thus, Have may be used for constructions involving either possessive or
partitive relations (similar to Warren 1978). Of course, this also means that some compounds
that differ greatly in meaning are grouped together under the same predicate81. For instance,
Make2, under certain circumstances, conflates production (e.g. beeswax = made by) and
composition (e.g. snowball = made of). Perhaps more problematic is just how general her Be
80
Levi (1978) distinguishes between compounds such as atom bomb and atomic bomb via copular periphrasis (bomb that is atomic ~ *bomb that is atom). 81
Levi is of course aware of this fact and therefore discusses at length the matter of overlapping RDPs. The RDPs Cause and Make are especially difficult to differentiate. The reader is encouraged to consult Chapter 4 of Levi (1978) for additional insight into her classification.
160
RDP is: not only does it group together compounds based on genus-species, coordination, and
resemblance, it also overlaps with her Make2 RDP (i.e snowball = ‘a ball that is snow’ or ‘a ball
made of snow’). These may or may not be real problems, depending on one’s opinion on the
matter, but they are nevertheless easily addressed with the addition of perhaps a few more
dividing lines. It should come as no surprise, then, that Levi’s work has been at the heart of a
great deal of research on the semantics of compounding and has since served as the basis for a
number of similar formalisms.
5.1.2 Recent Developments in Compound Relations
In recent years, largely due to renewed interest in compounds in both natural language
processing (NLP) and psycholinguistics, the notion of basic relations has resurfaced and has
been the focus of a great deal of research. Because recent work on the topic is predominantly
based on earlier research on compound relations, many authors simply list the relational
concepts without going into much detail regarding their characteristics or the data they are
meant to represent. Readers are instead invited to consult previous formalisms. Thus, while this
section offers an overview of these more recent approaches, it occasionally does so with only a
minimum of clarification as it is not always possible to elaborate on some of the finer points of
their models.
Leonard’s (1984) early work on automatic compound interpretation includes eight major types
of compounds, many of which are further subdivided according to the more restricted meaning
of certain combinations. Although not directly related to Warren’s (1978) classification system,
Leonard’s typology is in fact quite similar to it: her classes include, among others, Locative,
Annex, Equative, and Material. According to Leonard, her software, with the help of a robust
dictionary application, is able to generate the correct interpretation for her data (consisting of
roughly 2,000 compounds from works of fiction between 1719 and 1968) roughly 76% of time.
What is perhaps most interesting about Leonard’s work is that her approach allowed for her
relational associations to be expanded upon. In other words, once a compound’s semantic type
has been identified (i.e. sponge-bag: Locative), the program is able to transform it into a natural
sentence (i.e. “A bag for or containing a sponge or sponges”). This is, in effect, the opposite of
what Lees (1960) was arguing for.
161
In contrast, Lauer’s (1995) own work on compound interpretation was directly related to
Warren’s (1978) classification system, but he relied on an understated aspect of her work.
Warren had in fact offered a number of prepositions as possible labels for her classes, which
Lauer used as the basis for his approach:
Table 5.6. Lauer’s (1995) preposition based treatment of compounds.
Preposition Example
Of state laws means laws of the state
For a baby chair means a chair for babies
In morning prayers means prayers in the morning
At airport food means food at the airport
On Sunday television means television on Sunday
From reactor waste means waste from a reactor
With gun men means men with guns
About war story means story about war
These connectors are supplemented with a few additional relations such as BE-copula, as well
as various labels for verbal-nexus compounds. Like Leonard, the purpose of Lauer’s work is to
provide a system for the automatic interpretation of compounds. His system automatically
determines meaning based on a probabilistic model that takes into account the affinity of
conceptual groupings (i.e. how likely two words are to be paired together), which is highly
facilitated by the limited number of possible outcomes in his system (i.e. eight basic
prepositions).
One will have no doubt considered that prepositions may not be the best candidates given their
highly polysemous (or ambiguous) nature. This is evidenced by Lauer’s use of the preposition of
with compounds that differ significantly from one another:
(75) a. jute products = ‘products made of jute’
b. health problems = ‘problems related to one’s health’
c. family business = ‘business run/owned by a family’
d. cupboard doors = ‘door that is part of a cupboard’
162
Using of to paraphrase such compounds (e.g. products of jute) is of course semantically
acceptable. Just like many PP adjuncts involving prepositions, however, this approach hardly
offers the most accurate means of distinguishing between groups of nominal pairs—after all,
jute is not to product as health is to problem. Nor does this method truly eliminate compound
ambiguity as the preposition is unable to prevent other, equally plausible meanings from
emerging at the surface level (cf. Levi 1978).
Vanderwende (1994), also working in NLP, approached the problem from a different
perspective and reformulated the more conventional relations used elsewhere as wh-questions,
which she argues allows for greater ease in judging noun sequence classification. Her thirteen
classes are reproduced in the following table:
Table 5.7. Vanderwende’s (1994) classification schema of noun sequences.
Relation Conventional name Example
Who/what? Subject press report
Whom/what? Object accident report
Where? Locative field mouse
When? Time night attack
Whose? Possessive family estate
What is it a part of? Whole-Part duck foot
What are its parts? Part-Whole daisy chain
What kind of? Equative flounder fish
How? Instrument paraffin cooker
What for? Purpose bird sanctuary
Made of what? Material alligator shoe
What does it cause? Causes disease germ
What causes it? Caused-by drug death
While the results of her tests are promising (her algorithm has an accuracy rate of roughly 78%),
she admits that other categories, such as topic (i.e. What about?), might be necessary given
some of the more general interpretations she encountered (e.g. history conference =
Whom/what?). Nothing prohibits expansion of this particular approach either: one can rather
easily add to the basic list of wh-questions (e.g. Who/what uses it? Made from what? etc.). The
163
fact that each wh-question can be associated to a conventional label, however, shows that the
fundamentals of her approach are very similar to those of previous models. The interrogatives
she proposes are in a fact a method for determining whether compounds are classified
correctly—they may therefore also prove useful when assigning relations to existing
compounds.
More recent work on compounds in NLP share much in common with this earlier work, though
the number of relations does differ greatly. Rosario and Hearst (2001) propose a total of 38
relations, 18 of which are considered dominant because of their frequency in the compounds
contained in their corpus. While the relations they use are loosely based on Warren’s (1978)
work, they are applied to compounds found in the highly specialized language of medical texts,
which they claim requires an expanded number of labels. Some examples of Rosario and
Hearst’s relations are Activity/Physical process (bile delivery, virus reproduction), Cause (1-2)
(AIDS death, automobile accident), Measure of (relief rate, asthma mortality), and Purpose
(headache drugs, HIV medication). Similarly, Girju et al. 2005, following their earlier work in
NLP involving compounds (Girju et al. 2003, Moldovan et al. 2004) propose 35 relations, a
number they argue is both necessary to account for most combinations and sufficiently limited
so as remain manageable. Most of these classes are also found in Adams (1973) and Warren
(1978). Girju et al. (2007) later reduced this number of relations to just seven general
associations. Even more recently, Séaghdha (2008) returned to the pared down approach
adopted by Lauer (1995), but instead based his compound relations on those proposed by Levi
(1978). Five of these relations were taken as is or amalgamated into new, more general labels:
Be, Have, In, Actor, Instrument, and About. Séaghdha’s algorithm, through a variety of means,
was able to correctly assign meaning for his dataset of 1,400 compounds approximately 70% of
time.
From a more cognitive perspective, Shoben’s (1991) discussion of conceptual combinations has
influenced a great deal of subsequent research on compound interpretation (Gagné and Shoben
1997, Gagné 2001, Spalding and Gagné 2007). Shoben’s 14 relations are, once again, very
similar to those used elsewhere: they include such primitive relations as Cause, Has, Make,
Uses, Located, etc. Although Shoben uses these relations for combinations that involve what he
calls non-predicating adjectives, it is clear from the examples he provides that they are in fact
what have been traditionally treated as NN compounds (e.g. tax law, oil money, finger toy, etc.).
164
The research that stemmed from Shoben’s discussion of relational concepts has shown that
speakers are in fact keenly aware of the types of associations at play in compounding. For
instance, Gagné and Shoben (2002) found that speakers understood compounds more easily
when they were preceded by pairs that involved similar relations. Such research is arguably
reliant on the understanding that compound’s do in fact involve particular relational concepts
and that they can be reduced to some fundamental set of recurring associations.
Jackendoff’s (2009) work on compounds, expanded and further refined in Jackendoff (2010),
stands as the most recent attempt at reducing compound semantics to a set of fundamental
relations. His approach involves 14 basic functions, most of which can be found elsewhere in
the literature:
Table 5.8. Jackendoff’s (2010) 14 Basic Functions.
Basic Function N2 = Subject N1 = Subject
Classify beta cell ---
Argument wardrobe color chewing gum
Be boy king ---
Same/Similar piggy bank ---
Kind bear cub puppy dog
Be-Loc sunspot water bed
Comp(osition) rubber band sheet metal
Made apple juice sugar beet
Part backbone wheelchair
Cause sunburn ---
Make moonbeam honeybee
Be-Function handlebar ---
Have career girl gangster money
Protect lifeboat mothball
Jackendoff does introduce a few inovations, however. First, he formally acknowledges what
others have usually only implied regarding the possible bidirectionality of compound relations.
In other words, instead of specifying both a Part-Whole and a Whole-Part relation in order to
account for compounds in which “N1 is part of N2” and “N2 is a part of N1” respectively (cf.
165
Downing 1977 and Warren 1978), Jackendoff instead introduces what he calls Reversibility.
Thus, as shown in Table 5.8, a number of his functions may be inverted, allowing for a more
complete account of the data without resorting to an increased number of relations. Second,
Jackendoff formalizes the application of his functions using his Lexical Conceptual Structures
(LCS: Jackendoff 1992), which, on the one hand, further reduces ambiguity by explicitely
stating how argument slots are filled and, on the other, allows him to combine relations in cases
where a compound might require a more complex representation. Finally, Jackendoff also
includes a component he calls proper function, which he uses to account for those compounds
whose meaning would otherwise fall outside his set of relations. Proper Function, as it is
presented, is conceptually similar to Pustejovsky’s (1995) telic quale and is loosely related to
what other authors have often called Purpose. Both Reversitiblity and Proper Function will be
discussed in greater detail in Sections 5.2 and 5.2.2.14 respectively.
Evidently, the major focus on compound relations has largely been on English. Arnaud (2003)
is, to my knowledge, one of the only authors to have applied a similar approach to French
compounds82. His work on French NN compounds led him to identify 54 “low level” relations
based on an inventory of 810 compounds, only 96 of which are also included in my data83.
These relations are said to exhibit a low level of abstraction and are thus meant to be as granular
as possible, similarly to Adams (1973) or Warren (1978). In fact, 19 of his relations have fewer
than 5 compounds each (two relations only have 1 compound each). Of course, this is not to say
that these relations are without merit, but they unfortunately offer little in the way of
generalisations regarding the semantics of French compounds. Recognizing this, Arnaud
grouped together his low-level relations into eight higher order ones that offered a much greater
degree of abstraction, which he dubbed “high-level relations:”
82
Barbaud (1971), working on French compounds, offers 4 “fundamental” relations (i.e. Attribute, Metaphorical, Complementarity, Coordination), but his work remains largely syntactic in nature. 83
Arnaud’s data was constituted using a variety of sources, including dictionaries such as Le Petit Robert and Le Larousse, as well as compounds encountered haphazardly or through Google searches (see Arnaud 2003: 95). He was therefore far more selective in his manner of gathering data than I was—for instance, he completely excludes coordinated (or dvandva) compounds from his corpus.
166
Table 5.9. Arnaud’s (2003) high-level relations for French NN compounds.
Relation Gloss ((N1) N2) N1 is included in N2 (N1 (N2)) N2 is included in N1 N1 → N2 N1 toward N2 N1 ← N2 N2 toward N1 ÊTRE Predication of a quality (metaphorical) ANALOG Resemblance N1 SYMB N2 N1 symbolizes N2 N2 SYMB N1 N2 symbolizes N1
Arnaud’s approach is partly based on Hatcher’s (1960) highly abstract treatment of compounds.
To her four basic relations, Arnaud adds four more meant to account for compounds that exhibit
what he considers to be less literal associations (e.g. franc-or, kirch fantaisie, style nouille), but
which only make up a little more than 10% of his data. The majority of his compounds are in
fact covered by the four primary relations and encompass most of the basic ones mentioned
elsewhere. For instance, the inclusion relations (i.e. ((N1) N2) and (N1 (N2))) consist of
compounds based on Part-Whole, Location, and Composition associations, while the directional
relations (i.e. N1 → N2 and N1 ← N2) involve those based on Destination, Source, Purpose,
and Production.
While Arnaud’s work remains in-line with previous work on compounds, his set of relations are
arguably too extreme in their approaches. On the one hand, his low-level relations are far too
granular to be useful at a global or universal level. On the other hand, his high-level relations,
although they offer interesting generalizations regarding the interaction between the elements of
compounds, remain insufficiently explicit to provide any sort of method for disambiguation. In
other words, many compounds with different underlying relationships are grouped together in
less than meaningful ways; thus, both reliure cuir and poisson-scie are compounds for which
“N2 is included in N1,” but this says little about what this inclusion entails. Nevertheless,
Arnaud’s analysis reveals many useful facts about French NN compounds, one of which
pertains to the prominence of certain relations based on his data. Although the number of low-
level relations he uses is quite large, thus making it difficult to determine to what degree some
of them overlap, a few associations stand out: Location, Part-Whole, and Purpose are by and
large the most prominent relations he identified, together accounting for at least 35% of his
167
data84. It is clear, then, that a large number of compounds involve a small number of associative
concepts.
5.1.3 Summary
In the previous sections, I have sought to show two things. First, that the work on compound
relations is well-established and offers a rich and diverse breadth of research with which to
work, and second, that while there is much disagreement between works, there is also
considerable agreement. In fact, from a purely qualitative perspective, there is perhaps more
agreement that there is disagreement. The following table summarizes the number of relations
found in each of the works cited above:
Table 5.10. Summary of the number of semantic relations present in the literature.
Author(s) Number of Major Semantic Relations85
Number of All Relations Including Sub-Types
Jespersen (1942) 11 - Hatcher (1960) 4 4986 Adams (1973) 11 70 Levi (1978) 9 22 Downing (1977) 12 - Warren (1978) 14 59 Shoben (1991) 14 - Vanderwende (1994) 13 - Lauer (1995) 11 - Rosario et Hearst (2001) 38 - Arnaud (2003) 8 54 Moldovan et al. (2004) 35 - Girju et al. (2005) 23 - Girju et al. (2007) 7 - Séaghdha (2008) 6 - Jackendoff (2010) 14 22
As Table 5.10 shows, the number of basic relations ranges anywhere from 4 to 38, with an
average of 14. When all sub-types are included, the number of distinct compound types balloons
84
This number is highly conservative, however, as Arnaud often labels compounds using multiple relations, arguing that they are in fact multi-faceted. He identifies these compounds as “complex” and says that they owe much of their complexity to the fact that some compounds can be interpreted in a number of different, yet equally valid ways. I will explore this concept in greater detail in Section 5.2.1. 85
Because Lees’s (1960) analysis of compounds is primarily syntactic, his work has been omitted from this table. 86
See Section 5.1 for a summary of Hatcher’s expanded classification system.
168
to as many as 70. Despite the considerable range of relations proposed, there is nevertheless
significant overlap between them. In order to determine which relations would be retained for
the present work, every set of relations was compared against each other and parallels were
drawn wherever possible. For instance, whereas Downing (1977) would use Product for
compounds such as honey bee, Levi (1978) would use Make: I therefore consider these two
labels as both referring to the same relational concept (i.e. production). Appendix B contains a
table showing just what this comparison looks like. In the following section, I will discuss the
results of this research.
5.2 Retained Semantic Relations
Given the emphasis on the semantics of compounds, the relational concepts retained are largely
characterized by their semantic content and not their syntactic function. No matter how much
care or effort is put into establishing a set of basic relations, it is unlikely that it will suit
everyone’s needs or that it will apply to every type of compound examined. Considering the
sheer number of different approaches adopted over the years, Bauer’s (1983) words regarding
compound classification remain true today: “[a]ny method of subclassification is bound to be
controversial, and none can hope to win unqualified support” (202). One must therefore focus
on remaining as coherent as possible with not only one’s choice of relations, but also their
application to the data.
One of the reasons systems such as those proposed by Adams (1973), Warren (1978), or Arnaud
(2003) could proliferate to as many as 70 distinct types is that they were meant to account for
every compound included in their respective corpora. Of those researchers who offered far
fewer relations, one of two methodologies was adopted: either they applied their relations in the
most general way possible or they allowed for some compounds to resist their analysis. It is this
latter approach that will be adopted here. To put it plainly, certain compounds simply cannot be
reduced to a basic relation, while still remaining faithful to their meaning. I will call these cases
idiosyncratic compounds (e.g. année-lumière; cf. Warren 1978). These compounds differ from
those that might, for historical reasons, possess no discernible relation (e.g. compère-loriot) and
which may instead be called lexicalized compounds. This method is, I believe, very much in line
with the principal goal of this work, which aims to determine to what degree a given compound
can be considered transparent, given both its meaning and its form. The semantic relations
169
retained here are thus not meant to categorize all compounds in my data. In fact, considering
that my data were constituted using entries in Wiktionary without prejudice, so to speak, it is
inevitable that a number of the compounds retained would defy classification. This approach
thus tacitly acknowledges that, from a semantic perspective, some compounds involve more
complex relational concepts than others.
This decision speaks to the necessity of establishing—and adhering to—a set of principles that
might guide the selection of the fundamental logico-semantic relations at play within
compounds. I therefore offer the following principles with regards to said relations87:
1) They should be meaningful. Relations should be sufficiently meaningful so as to
disambiguate the compounds to which they are applied.
2) They should be limited. Relations should not be based on ad-hoc associations or
proliferate in a manner that might render them redundant.
3) They should be representative. The relations retained should account for as many
compounds as possible without violating (2).
4) They should be distinct. The differences between relations should be sufficiently clear so
as to allow for their coherent application across the data.
Principle (1) rules out the use of prepositions as labels for relations as they are not able to
sufficiently disambiguate compounds (i.e. at might represent both time and location). It also
means that whatever labels are used, they should easily allow for expansion via some sort of
paraphrase. Principles (2) and (3), taken together, urges us to work toward a number of relations
that is both sufficient and necessary. Principle (4) emphasizes the need for a coherent and
cohesive formalism. As we will see, however, the fourth principle is also the most difficult to
respect. Two relations will often overlap in ways that make their application difficult, thus
requiring that tests or clear descriptions be offered in order to further distinguish between them.
87
These principles are loosely based on Séaghdha’s (2008) set of 5 criteria for establishing a classification scheme, which are Coverage, Coherence, Generalisation, Annotation Guidelines, and Utility.
170
Given these basic principles, along with the careful comparison of the works discussed in the
previous sections, 15 relations were retained based on both their recurrence and salience in the
literature. By salience, I mean that although a given relation may not be listed across most
works, it may offer a sufficiently significant distinction so as to warrant inclusion. For instance,
“function as” is only truly discussed in Adams (1973) and Jackendoff (2010), but it offers a
useful layer of granularity to the list. This particular relation is often grouped with copulative
compounds (i.e. A is a B), but since hypernymic and coordinating relations often receive,
despite their own copulative status, distinct treatments, it seems reasonable to view function as
its own category. This allows us to distinguish, for example, between compounds such as buffer
state (i.e. ‘state that functions as a buffer’) and girlfriend (i.e. ‘friend that is a girl’).
The relations retained for this project are listed in Table 5.11 and will be explained in greater
detail in the following sections. The labels reflect many of those found in the literature, but the
use of substantive forms is not rooted in any particular approach; nothing hinges on this
particular choice of nomenclature.
Table 5.11. Logico-semantic relations retained in this work.
COORDINATION COMPOSITION TIME
HYPERNYMY SOURCE TOPIC
SIMILARITY PART FUNCTION
PRODUCTION LOCATION PURPOSE
CAUSE POSSESSION USE
Many of the relations listed above are in fact reversible, a term borrowed from Jackendoff
(2010), but also present in Warren (1978) and referred to as “direction” in Séaghdha (2008). In
essence, a relation is reversible if either the head (as in 76a) or non-head element (as in 76b)
may function as its subject:
(76) a. tear gas ‘gas that causes tears’ gas CAUSE tears
b. motion sickness ‘sickness that motion causes’ motion CAUSE sickness
In Levi (1978), only the Make and Cause RDPs are reversible, but Warren (1978) shows that all
of her classes, save Purpose and Copula, are reversible. Likewise, of the fourteen basic functions
in Jackendoff (2010), nine are said to be reversible. Interestingly, Jackendoff does not outright
171
state that Cause is reversible, although this seems likely given Levi’s (1978) analysis of this
RDP (cf. the compounds in 76). Reversibility is in fact implied in a number of other works,
usually by stipulating the same relation twice. Downing (1977), for instance, differentiates
between Whole-Part and Part-Whole, a distinction that clearly relates to the order of the
constituents. Similar treatments can also be found elsewhere (Adams 1973, Shoben 1991,
Vanderwende 1994, Arnaud 2003, etc.).
Reversibility is relevant for French as well (cf. Arnaud 2003), as evidenced by the following
compounds:
(77) a. piétin-échaudage = ‘piétin qui cause l’échaudage’ [N1 causes N2]
b. arrêt-maladie = ‘arrêt causé par une maladie’ [N2 causes N1]
c. marche-palier = ‘marche qui fait partie d’un palier’ [N1 is part of N2]
d. stylo-bille = ‘stylo dont une bille fait partie’ [N2 is part of N1]
When applicable, reversibility, along with the appropriate paraphrases, will be included in the
presentation of the relations.
How might the relations in Table 5.11 above apply to French compounds? To answer this
question, they were used to label 729 NN, as well as a smaller set of 319 N à N compounds, all
taken from Wiktionary (see Chapter 3 for details on the data). Before going into greater detail
regarding the meaning of each of the above relations, however, a few words must be said
regarding some of the challenges related to the selection of a particular relation for a given
compound.
5.2.1 Interpreting Compounds
In an attempt to make labeling compounds as straightforward as possible, explanations are
provided for many of the choices made. Unsurprisingly, however, the task of selecting a
particular relation over another for a given compound poses a number of challenges. Chief
among them is the unavoidable semantic overlap present for many of the retained relations.
Many authors have discussed this problem at length, but seldom can a definitive solution be
offered. For instance, Jackendoff (2010) says the following of two of his basic functions: “It is
sometimes hard to distinguish MAKE from CAUSE. Perhaps MAKE (X,Y) decomposes as CAUSE
172
(X, (COME INTO EXISTENCE (Y)).” The blurred line between these compounds perhaps explains
why he lists knife wound under both of these functions, but the matter is not addressed any
further. As for MADE FROM and COMPOSITION, Jackendoff insists that they differ in that for
composition “the object or substance is no longer in evidence,” but that “[t]he distinction is
however slippery” (440). These comments echo those of many of the authors discussed in the
previous sections.
Another, perhaps more vexing issue, however, stems from the fact that many compounds allow
for a number of different interpretations. Take, for instance the following set of English
compounds:
(78) a. dog house ‘house in which there is a dog’ (LOCATION)
‘house for a dog’ (PURPOSE)
b. peanut butter ‘butter made from peanuts’ (SOURCE)
‘butter which consists of peanuts’ (COMPOSITION)
c. bear country ‘country that has bears’ (POSSESSION)
‘country in which bears are located’ (LOCATION)
The multiplicity of meaning illustrated in these examples is distinct from the one discussed at
the beginning of this chapter, where a compound was said to be ambiguous because it could
potentially have different meanings, but that only one was actually acceptable or attested. In the
examples above, however, either interpretation is in fact correct. If one feels that one paraphrase
is clearly better than the other, it should be noted that all of the interpretations in (78) can be
found in the literature on compounds.
This issue is most likely a cross-linguistic one. A quick look at a few French compounds reveals
that they too are susceptible to multiple analyses, a fact that Arnaud (2003) had already
acknowledged and for whom the solution was to simply assign multiple relations to a single
compound:
(79) a. carte-index ‘carte servant d’index’ (FUNCTION)
‘carte sur lequel il y a un index’ (LOCATION)
173
b. passage-piétons ‘passage destiné aux piétons’ (PURPOSE)
‘passage utilisé par les piétons’ (USE)
c. lit-cage ‘lit qui ressemble à une cage’ (SIMILARITY)
‘lit ayant une cage en tant que partie’ (PART)
Again, arguments exist in support of any of the above interpretations. Levi (1978) refers to this
particular issue as an “indeterminacy of analysis,” stating that “there seems to be no principled
reason to prefer one [RDP] over the other” (263). She ultimately dismisses the problem, arguing
that the existence of “double analyses” is simply a consequence of a language system that shows
a great deal of inter-dependency. To say, for instance, that a man has a beard entails that there is
a beard located on his face or that a beard is part of him. Jackendoff (2010) largely agrees with
Levi and argues that this multiplicity of meanings should not, under normal circumstances, be
viewed as a case of ambiguity, but that it should instead be understood as characteristic of
certain compounds, which he labels as promiscuous. According to Jackendoff, any one of the
possible interpretations is equally valid. Marchand (1960) was of the same opinion, claiming
that
“[w]hether a night shirt is a ‘shirt for the night’ or a ‘shirt worn at night’ is quite unimportant. In forming [compounds] we are not guided by logic but by associations. We see or want to establish a connection between two ideas, choosing the shortest possible way. What the relation exactly is, very often appears from the context only.” (Marchand 1960: 22)
Without context, then, any analysis may be correct so long as it remains faithful to what we
know about the meaning of a particular compound.
This “indeterminacy of analysis” is in fact so pervasive that it is not uncommon for compounds
to be treated differently from one work to another. So as to quantify just how disparate some
analyses can be, I gathered as many of the example compounds as I could from five different
works on relations: 648 from Adams (1973), 485 from Warren 1978, 387 from Levi (1978), 378
from Lauer (1995), and 389 from Jackendoff (2010). In total, nearly 2,300 English compounds
were collected. Of these compounds, only 114 were present in more than one work. It is
surprising to see so little overlap in the compounds examined (given that the object of study in
174
all works was NN compounds), but it’s unfortunately unclear why this is the case88. What is
clear, however, is just how different the treatment of identical compounds is from one work to
another: of the 115 duplicate entries, just under half (56) were treated similarly by two or more
authors. The following table shows the 59 compounds that were interpreted differently in each
work89:
Table 5.12. A comparison of compound interpretations across five different works.
Compound Adams 1973
Levi 1978
Warren 1978
Jackendoff 2010
Lauer 1995
anthill Part-whole Make1 apple core Have2 Part1 bear country Locative Have1 Loc2 bird sanctuary For Protect2 bloodstain Instrumental Comp1 bookmark Locative Argument booster shot Appositional (is a) Serve bull ring Locative For career girl Part-whole Have1 crystal structure Whole-part Argument doghouse For Loc2 (PF) eyeball Associative (part) Source-Result farm boy From Loc1(Char) fighter plane Appositional (is a) Serve fingerprint Instrumental Make1 fireplace Composition/Form/
Contents Purpose fisherman Copula Be food surplus Source-Result Argument football Instrumental Source-Result/Purpose frogman Resemblance Be gangster money Origin-object Have2 garlic bread Part-whole Loc2 garter snake Resemblance (char) Be Similar gas mask Instrumental Protect2 grain alcohol Instrumental From Made1 guidebook Instrumental Serve handlebar Appositional
(function) Copula Serve
88
A number of explanations are in fact available. One possibility is that the source data are intimately related to the period during which they were collected: nearly forty years separate Adams’s work from Jackendoff’s, for example. Another possibility, however, is that the methods used in the selection of compounds in each study (i.e. source, criteria, classification, etc.) were sufficiently dissimilar to result in vastly different data sets. 89
Some of the relations have been truncated to their most recognizable designations. When a relation includes a number (e.g. Make1, Have2), it indicates whether the head is the object or subject of the relation.
175
Compound Adams 1973
Levi 1978
Warren 1978
Jackendoff 2010
Lauer 1995
handlebar moustache Resemblance Be
headache Subject-Verb Loc1 headache pill For Protect hearing aid Instrumental Argument hermit crab Resemblance Be honeybee Subject-Verb Make1 Make2 houseboat Appositional
(function) Copula immigrant minority Make2 Source-Result lifeboat Instrumental Protect1 lightning rod For Protect2
loaf sugar Composition/Form/Contents Comp2
mothball Instrumental For Protect2 mouse trap Instrumental Argument musk deer Make2 With particle shape Whole-part Argument peanut butter Composition/Form/
Contents From
picture book Composition/Form/Contents Have1
prose poem Source-Result Be puppet government Resemblance Be (Copula)
sandpaper Composition/Form/Contents Loc2
sandstone Part-whole Similar silkworm Subject-Verb Make1 Make2 sunburn Subject-Verb Cause tablecloth Purpose Loc1(Pf) tea room Locative For tear gas Instrumental Cause1 textbook Be Part-Whole toothache Subject-Verb Loc1 union member Whole-part Argument wall board Purpose Comp2 wardrobe color Whole-part Argument windshield Instrumental Protect2
In some cases, the differences in treatment are simply due to gaps (or to highly generalized
relations) in a particular formalism. For instance, frogman is based on the resemblance relation
under Adams’s analysis, while it is grouped under Be in Levi’s work, the reason being that Levi
has no resemblance predicate—all such compounds are treated as cases of metaphorical
copulatives. In many cases, however, the differences are due purely to the author’s personal
interpretation: bear country is locative for both Adams and Jackendoff, but purposive for Levi;
176
anthill is based on a part-whole relationship for Warren, but on a production one for Jackendoff.
Nor is it unheard of for an author to include the same compound under two different relations
within their own work. Jackendoff (2010), for instance, lists ferry boat under both the Kind and
Serve functions (‘a boat of kind ferry’ ~ ‘a boat that serves as a ferry’); stew beef is first said to
be based on the Made function (‘beef from which stew is made’), but is later listed under Part
(‘beef that is part of a stew’)90. Similarly, Warren (1978) is unsure whether a spacecraft
expresses a locative or purposive relation and so lists it under both classes. These dual category
compounds show just how difficult it can be to determine what basic relation is at play in certain
constructions. In some instances, nearly identical compounds are given different treatments. It is
unclear, for example, why Jackendoff views sunburn as a case of Cause (‘burn caused by the
sun’), but suntan as an occurrence of Make (‘tan made by the sun’). These types of analyses,
while perhaps infrequent, do occur.
It is clear then, that the only solution is to strive for consistency above all else. Unfortunately,
this proves surprisingly difficult. For most of the works cited above, many of the compounds
under investigation are not in fact institutionalized, which means that resolving any issues
related to interpretation is not simply a matter of consulting reference works. Given that this
project relies on entries from Wiktionary, one should, in theory, be able to base all
interpretations on the definitions it provides, but this method doesn’t always provide clear and
consistent results:
(80) a. boîte à outils “Coffret où ranger les outils [...]”
b. boîte à camembert “Boîte en bois léger ou en carton pour le camembert.”
Should boîte à outils be treated as locative and boîte à camembert as purposive or should they
be treated identically? This issue is only further exacerbated by the numerous condensed or
truncated definitions that fail to provide sufficient information to determine how a compound’s
elements might be related, often requiring that additional research be conducted. This is a
90
Such heterogeneous treatments for a single compound may be due to a Type vs. Token reading on the author’s part. Therefore, treating stew beef as an instance of the Made relation might be due to a type (or intensional) interpretation, while Part would be its token (or extensional) interpretation. It must be noted, however, that all authors seem to be interested solely in compound types and not tokens, which is also the perspective adopted here.
177
shortcoming not only of Wiktionary, but also of more established dictionaries. Take, for
instance, the following two definitions for chêne kermès:
(81) a. Wiktionary: “Arbuste méditerranéen piquant de la famille des Fagacées.”
b. LPR2010: “Espèces méditerranéennes à feuilles persistantes.”
Neither of the definitions in (81) mentions kermès, let alone how it relates to chêne. The only
solution is to adopt a methodology that might allow one to minimize, though probably not
completely eliminate some of the problems posed by these so-called promiscuous compounds. It
is therefore with this issue in mind that I attempt to offer as many distinctive features as possible
for the relations used in this work. This is an important component to the analysis considering
that some relations might be said to share a number of characteristics (Location and Part; Cause
and Production; etc.). Whenever possible, tests and explanations are provided that might help to
further ensure that the relations are applied consistently, or at the very least, to provide
justification for some of the decisions I have made.
5.2.2 Presentation Format
Traditionally, the elements of a compound are referred to as N1, N2, N3...Ni (or X, Y, W, Z), but
this doesn’t allow for much flexibility when stipulating how a relation applies to a compound. In
other words, by stating that a relation such as causation is applied as “N2 causes N1” (as, for
instance, Jackendoff 2010 does), one fails to account for headedness, or rather, one assumes that
the head is always the rightmost constituent. This is relatively trivial for English compounds as
they are nearly always right-headed, but not all languages are so rigid when it comes to the
position of the head. Although French is mostly left-headed (requiring that the above causal
relation be instead stated as “N1 causes N2”), right-headed compounds in French are not
unheard of. For this reason, I will use the labels H(ead) and M(odifier) when stating relational
associations and their application91. Of course, this means that position must then be stated
independently (i.e. H=N1, M=N2), but this should arguably have little effect on the presentation
91
Similar to Pham and Baayen (2013), the abbreviations used are as follows: H = Head constituent (fr. T = Tête); M = Modifier constituent; C = Compound. Although the term modifier is usually reserved for subordinative compounds (Scalise and Bisetto 2009), it will also be used for the non-head element of compounds that would otherwise fall under the coordinate type (e.g. boy-king).
178
of the relations. This nomenclature also has the added advantage of being language independent,
provided that a head element can be identified.
It must also be noted that many of the authors whose work has inspired the following relations
focused on compounds denoting concrete objects and not abstract concepts. The relations, as
they are put to use here, have a relatively broad application, that is to say that a relation such as
PART does not necessarily have to involve physical objects. This particular facet of my approach
will be further elaborated upon as each relation is discussed in greater detail.
The retained relations are presented using the following table:
RELATION
Relation Type Structure Template Examples Linking Material
Basic H REL M H REL M NN
verbs, prepositions, etc. Reversed
H that M REL H que M REL
NN
For each relation, a basic structure template (or paraphrase) is provided for both English and
French compounds. Examples are also given for both languages, along with a list of possible
linking material that can be used to paraphrase a given compound syntactically (e.g. toolbox =
box for tools). This linking material is meant to draw parallels between the retained relation and
those proposed elsewhere in the literature and may include such items as verbs (e.g. have, cause,
make, etc.), prepositions (e.g. for, from, of, etc.), and even nouns (e.g. kind, type). Whenever
appropriate, the reversed form of the relation is also included.
The following sections will largely focus on NN compounds as they offer few, if any, methods
of disambiguation and are thus most likely to make use of all the retained relational concepts.
Furthermore, the research upon which the relations are based has focused almost exclusively on
NN compounds. A discussion of N à N compounds, however, will be provided whenever it is
deemed pertinent (i.e. when a particular relation is only marginally applicable for NN
compounds) and will instead be a major component of the following chapter, which will seek to
determine to what degree these relations coincide with what has already been said regarding the
preposition à.
179
5.2.2.1 Hypernymy
HYPERNYMY
Relation Type Structure Template Examples Linking Material
Basic an H of kind M un T de sorte M
oak tree banane plantain kind of, type of
sorte de, type de Reversed an H that M is a kind of ?un T dont M est une sorte
bear cub ?argent métal
Some compounds consist of two semantically related words, one of which is in fact a more
specific term for the other. A number of authors have identified these types of compounds and
have classified them in a variety of ways. Adams (1973) lists compounds such as football game,
repair job, and teaching profession under her appositional class and paraphrases them as “B of
which A is a particular instance” (69). In Warren’s framework, these types fall under her
Copula: Subsumptive class (cf. Marchand 1960). Vanderwende, whose approach is based on
interrogative structures, categorizes hypernymic compounds using “What kind of?” Others,
however, simply group them together with other copula (i.e. A is a B) compounds (Levi 1978,
Séaghdha 2008). A test for hypernymy, taken from Marchand (1960), is based on bi-
directionality: for oak tree, for example, both an oak is a tree and the tree is an oak are true
(42). In French, most cases seem to involve either animals (82a), plants (82b), or substances and
minerals (82c):
(82) a. chouette-effraie, larve échinocoque, chat serval
b. banane plantain, houx frelon, menthe pouliot
c. quartz morion, salicaire pourpier, zéolithe cyanite
Based on the data examined, the HYPERNYMY relation does not seem to be reversible in French,
although, according to Jackendoff (2010), it is in English (e.g. puppy dog ~ bear cub).
Unfortunately, he only provides two such examples (seal pup and bear cub), both of which are
animal-young combinations, stating that they may also be open to other analyses (438). Warren
(1978), on the other hand, states that the species-genus (i.e. subsumptive) relation is not
reversible (105). This assertion is seemingly supported by the French data as all such
compounds have the head element as hypernym and the modifier as hyponym. There is,
however, at least one case that could be treated as a reversed instance of hypernymy, that is to
180
say one in which the head constituent plays the part of hyponym: argent métal is defined as
‘argent qui est un métal’ so as to distinguish argent meaning ‘metal’ from its homonym
meaning ‘money’. If this is in fact a case of reversed hypernymic compound (and not simple
coordination), then there may be other cases involving homonymic or highly polysemous
constituents. I will, however, leave open the question of reversibility for this particular
compound type, which seems prudent given the divided consensus on its applicability.
5.2.2.2 Coordination
COORDINATION
Relation Type Structure Template Examples Linking Material
Basic a C is an H and an M un C est un T et un M
boy king auteur-compositeur is also, is both / and
est aussi, est à la fois / et Reversed --- ---
Coordination is typically used for compounds whose elements may be coordinated using and.
More specifically, coordinating compounds refer to combinations that, from a semantic
perspective, seem to involve both elements equally. Jespersen (1956) identified two such types.
The first, which he called copulative, is defined as “AB means A plus B” (e.g. Alsace-Lorraine).
This type, according to Bauer (1978), is the traditionally accepted meaning of dvandva
compounds. In recent years, however, this Sanskrit term has had a far less restrictive usage (see
Scalise and Bisetto 2009) and often includes the second type identified by Jespersen, which he
called appositional and which refers to pairings that mean “at the same time A and B, the two
combined in one individual (e.g. maid-servant)” (144). The fact is that the terminology
employed for many of these types of compounds is far from conventionalized92. Coordination,
as it is used here, groups together a number of different, yet related types, all of which may
traditionally fall under one of three frequently used labels: copulative, appositional, and
coordinate.
92
For instance, Marchand (1960) used the term copulative for compounds such as fighter-bomber, while Jespersen (1956) used the term appositional; for Scalise and Bisetto (2009), however, these types are coordinate compounds.
181
These types of compounds were examined in Chapter 4 while discussing headedness, where it
was said that they may, under certain conditions, be considered bi-centric as both constituents
can function as hypernyms of the compound. From a purely semantic perspective, however,
coordinated compounds seem to involve several types of coordination. According to Wälchli
(2005), co-compounds may be grouped together based on the type of “natural coordination”
involved, which he describes as a “coordination of items which are expected to co-occur, which
are closely related in meaning, and which form conceptual units” (5). Many languages, for
instance, possess compounds consisting of the words for mother and father and which mean
‘parents’. Family ties may thus be said to involve natural coordination. Wälchli’s corpus
primarily consists of Eurasian languages, but he also makes extensive use of East and South
East Asian languages, which are known to involve a great deal of coordination in compounding
(Arcodia et al. 2010). Based on his research, Wälchli identifies ten semantic classes of co-
compounds. Unfortunately, few of these classes are applicable to the compounds typically
labeled as coordinated for germanic and romance languages. This is not necessarily surprising as
few of Walchli’s co-compounds denote just one entity, which is in fact related to the idea of
natural coordination. Thus, both Mordvin t’et’a.t-ava.t ‘father.pl-mother.pl’ = ‘parents’ and
Georgian mšvild-isari ‘bow-arrow’ = ‘bow and arrows’ are said to be additive co-compounds
because they “denote pairs, each consisting of the parts A and B” (2005: 137-138).
Bauer’s (2008) classification of coordinated compounds, on the other hand, contains five major
types, one of which is the classic dvandva type, which he further subdivides into five sub-types.
Bauer reserves the dvandva label for those compounds that adhere to the Sanskrit description
(cf. Burrow 1955) and thus reclassifies a number of compounds that had, over the years, been
labeled as dvandvas by virtue of the coordination of their elements, but which Bauer argues is
only partly related to what the Sanskrit grammarians understood to be dvandvas. True dvandva
compounds, according to Bauer, require that both constituents be of equal status. His five major
types of coordinated compounds are summarized in the following table:
182
Table 5.13. Bauer’s five main types of coordinated compounds.
Type of coordination English Example French Example
Translative London-Edingburgh (Express) (vol) Paris-Londres
Co-Participant Mother-Child (Relationship) (relation) médecin-patient
Dvandva Austro-Hungary (among others) Alsace-Lorraine
Appositional singer-songwriter auteur-compositeur
Hyponym-Superordinate oak-tree banane-plantain
For the purposes of this work, the coordination relation is used to refer to any compound in
which its constituents may be coordinated using, at a minimum, the conjunction and. For a more
exhaustive look at how this relation may be instantiated, including within exocentrics, the reader
is encouraged to consult Wälchli’s (2005) inventory of ten semantic classes of co-compounds or
Bauer’s own extended typology (2008).
Based on the examination of the French data, the first type of coordination involves compounds
whose elements denote two aspects or features of equal status. They usually denote a person
(83a), an artefact (83b), and occasionally an establishment (83c):
(83) a. analyste-programmeur, auteur-compositeur, cardinal-diacre
b. canapé-lit, chargeuse-pelleteuse, moissonneuse-batteuse
c. bistro-brasserie, café-bar, restaurant-bistro
These particular compounds are most plainly paraphrased as “an H that is also an M” (e.g. un
analyste qui est aussi un programmeur), or by the more equative paraphrase: “a C that is both an
H and a M” (e.g. un analyste programmeur est à la fois un analyste et un programmeur). In this
regard, coordinating compounds are in fact particular types of copula constructions as they
allow for an IS A based paraphrase, but remain distinct from those listed under HYPERNYMY
because neither element is subsumed under the other (i.e. *un programmeur est un analyste). It
should also be noted that the use of a copulative verb for these compounds also distinguishes
them from those categorized under the SIMILARITY and FUNCTION relations. A number of other
compounds involving other noun types exist and are relatively frequent (e.g. hotellerie-
restauration, aller-retour, roulage-décollage, quinte-flush). Because these types of compounds
arguably possess two heads, reversibility is largely a matter of interpretation (see Chapter 4,
Section 4.1 for a discussion of centricity). Although reversibility is technically possible, it is of
183
little consequence: to say that a singer-songwriter is a singer who is also a songwriter, or vice-
versa, does not give rise to a significant shift in meaning93.
In contrast with the constructions described above, a number of coordinating compounds instead
denote hybrid entities. They do not typically permit the first type of paraphrase mentioned
earlier (“an H that is also an M”):
(84) point-virgule, âne-zèbre, femme-renarde, jupe-culotte, punk-rock, roman-feuilleton
One reason for this limited periphrasis is that the designatum of each of the compounds in (84)
is in fact neither entity, but instead a mixture of them. In other words, a point-virgule is neither a
period nor a comma, but is instead a form of punctuation with the features of both. If one tackles
this from the perspective of function, the distinction is much clearer: a point-virgule does not
accomplish or serve the same purpose as either a period or a comma does. Compare this with
those compounds in (83) above, which do in fact function as either/or (i.e. an analyste-
programmeur does what an analyst does, as well as what a programmer does; a canapé-lit is
used as either a sofa or a bed; etc.).
An interesting set also treated as cases of coordination and related to the hybrid compounds in
(84) are based on an N-garou pattern:
(85) animal-garou, loup-garou, ours-garou, chien-garou, etc.
Very little can be said about these cases as they are all patterned on loup-garou, garou alone
having originally meant part-man, part-wolf, but which, according to the TLFi, later underwent
expansion to explicitly include the word for wolf. They are included under coordination based
on the assumption that garou now simply means ‘human-like monster’, which is then
coordinated with the lexeme denoting an animal.
93
One might argue that reversing the order of such compounds would result in an adjustment of prominence, that is to say a singer-songwriter is a singer first and songwriter second, but this is a different issue related to the actual order of the elements and not the interpretative one (i.e. a songwriter that is a singer ~ a singer that is a songwriter).
184
The copula paraphrases mentioned earlier are not definitive criteria for coordinating compounds,
however, as some compounds seem to focus on some middle space between the designatum of
its elements. These are related, to some degree, to the hybrid compounds in (84) above:
(86) nord-ouest, baryton-basse
These particular types are not in fact very frequent and mostly consist of cardinal points.
Although north-west might be said to indicate a point that is both north and west, it is in fact
more accurate to say that it is in fact referring to a point somewhere in between the two. The
same can be said for baryton-basse. They nevertheless involve coordination (i.e. north and west)
and so are treated as such. An additional set of compounds that might also be said to indicate
some sense of “in between” is related to designations of rank or titles:
(87) lieutenant-colonel, lieutenant général, sergent major
Again, one might argue that these are cases of “both A and B,” but the truth is that they are
neither and that the whole refers to someone situated between the two ranks denoted by the
elements.
It is also worth noting that exocentric compounds may also rely on the COORDINATION relation,
as shown in the following examples:
(88) a. jambon-beurre ‘sandwich composé de jambon et de beurre’
b. huppe-col ‘oiseau ayant à la fois une huppe et un col’
c. épinard-fraise ‘plante ayant des feuilles ressemblant aux épinards et des baies
ressemblant aux fraises’
The compound in (88c) differs from the others in that the coordination of the elements relies on
an additional relational factor, that is to say one involving resemblance. Another similar instance
of this type might be fibre-cellule, which according to a number of sources is neither a fibre, nor
a cell94, although there may be reason to treat it as “fibre that is similar to a cell.”
94
“L’usage a fait adopter ce substantif, introduit par les anatomistes allemands, malgré l’opposition qui existe entre la valeur des mots fibre et cellule ; mais les éléments anatomiques qu’il sert à désigner ont à la fois la forme généralement étroite, allongée, aplatie, de beaucoup de fibres, et quelque chose de la structure des cellules, en ce
185
Some might argue that the compounds discussed above would benefit from being treated as sub-
types of coordination (cf. Adams 1973). Although such an approach does have its advantages
(mainly that it further disambiguates between these particular types), it also needlessly
introduces relations to account for compounds that only differ slightly in meaning. They are all,
fundamentally, cases of coordination, which is evidenced by the possibility of paraphrasing
them in more or less the same manner. In fact, the differences of meaning observed (i.e. hybrid
~ intersection) often stem from extralinguistic constraints related to how two elements may be
coordinated. For instance, while nord-ouest may very well be paraphrased as ‘nord et ouest,’ it
is impossible for something to occupy these two points at the same time. In this case,
COORDINATION will thus favour an intersective reading.
5.2.2.3 Similarity
SIMILARITY
Relation Type Structure Template Examples Linking Material
Basic an H that is similar to M un T qui est semblable à M
ant lion fourmi-lion similar to, like
semblable à, comme Reversed --- ---
The SIMILARITY relation is based on a general degree or aspect of “likeness,” although just what
exactly this “likeness” might involve is not always entirely clear. Such a criterion is, of course,
highly indeterminate and thus allows for a wide range of possible interpretations. This is why
I’m calling SIMILARITY the loosest possible compound relation. As Warren (1978) says, two
objects might be similar in any number of ways. In fact, based on her data, a more granular
approach to similarity would need to account for at least 18 different types. The French data,
although not quite as multi-faceted as Warren’s, seem to support this finding as well:
qu’elles renferment un noyau central ou quelquefois deux, avec ou sans granulations moléculaires autour de lui” (Nysten et al. 1858. Dictionnaire de médecine, de chirurgie, de pharmacie, des sciencies accessoires et de l'art vétérinaire).
186
(89) a. oiseau-mouche H looks like M
b. pomme cannelle H smells like M
c. oiseau-cloche H sounds like M
d. fermeture éclair H is fast like M
e. taupe grillon H behaves like M
f. mot obus H functions like M
g. mot-valise H is formed like M
h. roman fleuve H flows like M
The different kinds of similarities are numerous enough—and possibly idiosyncratic enough—to
consider grouping them together under a single relation. Of course, such a relationship requires
that the speaker attempt to determine just what the similarity is based on, which is perhaps
further evidence that a compound’s meaning is established by evaluating the compatibility
between the items’ features using some sort of schema or slot-filling mechanism (cf.
Wisniewski 1998, Baroni et al. 2007, Lieber 2009). Although the adoption of SIMILARITY may
seem to violate the principle requiring that all relations be meaningful, the alternative—which is
to say, further distinguishing between various types of similarities—would violate both the
principles of limitedness and representativeness.
Note that the SIMILARITY relation is not reversible, as doing so produces an implausible
equivalence:
(90) catfish ‘a fish that looks like a cat’ ‘*a fish that a cat looks like’
It is also worth noting that there have been efforts to distinguish between a broad similarity
relation and one involving shared physical attributes. Under more granular approaches, physical
similarity corresponds to “B which is in the form of, has the physical features of, A” (VIIA1,
Adams 1973) or “N2 designates via analogy a perceivable characteristic of N195” (Arnaud
2003) and might be represented by a number of different expressions such as “B is
like/resembles A” (Lees 1968, Warren 1978) or even “N2 is similar to N1” where the similarity
is understood to mean physical resemblance (Séaghdha 2008, Jackendoff 2010).
95
“N2 désigne par analogie une caractéristique perceptive de N1” (Arnaud 2003: 75).
187
Despite the fact that some authors have chosen to treat physical resemblance as distinct from
other types of similarities, many have in fact grouped together these particular relational
concepts under a single heading. For Warren (1978), resemblance entails that “B is similar to A
in some respect or respects” (108) (see also, Downing 1977, Levi 1978, Séaghdha 2008). This
broader interpretation of likeness is entirely justified given that physical resemblance is simply a
narrower instance of similarity: if something looks like something else, than it is also similar to
that thing. SIMILARITY, as it is used here, therefore covers all manner of shared features,
regardless of their nature.
The SIMILARITY relation, however, does possess a number of application issues. First, when
referring to physical similarities, it occasionally involves a meronymic association where it is in
fact the individual parts and not the whole that is targeted by the relation:
(91) a. clé crocodile, raie léopard, poisson-chat
b. requin marteau, plante crayon, tortue-boîte
In (91a), parts of both elements are involved in the similarity (i.e. clé crocodile = the teeth of the
wrench look like the teeth of a crocodile), while in (91b), only part of the head element is
targeted (i.e. requin marteau = the head of the shark looks like a hammer). Because the
relationship between the elements in the compounds in (91a) cannot be said to be based on a
part-whole relation (e.g. *clé qui fait partie d’un crocodile / *clé dont un crocodile fait partie),
they are treated as cases of SIMILARITY; conversely, because the elements in (91b) involve a
part-whole relation (e.g. requin qui a un marteau en tant que partie), albeit one that relies on a
metaphor, they are included under the PART relation (see Section 5.2.2.6). This distinction is
similar to the one made by Arnaud (2003), though his corpus actually contains few examples of
compounds like those in (91).
Another related type of SIMILARITY involves colour and is present in two different, yet related
kinds of compounds:
(92) a. bleu ciel, jaune paille, rouge sang
b. beurre noisette, liane-corail, pierre miel
188
The compounds in (92a) can be paraphrased as “H that is the colour of M,” while those in (92b)
are best paraphrased as “the colour of H is the colour of M.” Both types are treated as cases of
SIMILARITY as they may both be paraphrased using this relation (e.g. bleu semblable au ciel;
beurre semblable aux noisettes).
5.2.2.4 Function
FUNCTION
Relation Type Structure Template Examples Linking Material
Basic an H that serves as M un T qui sert de M
buffer state papier filtre functions/serves as
sert de / fonctionne en tant que Reversed --- ---
This particular relation is not in fact frequent in the literature: it is most prominent in Adams
(1973) and more recently in Jackendoff (2010). It is retained here because it highlights key
differences between compounds that might otherwise be treated as either cases of
COORDINATION or SIMILARITY. This relation groups together compounds in which one element is
fulfilling the function of the other. For instance, although many have treated compounds such as
houseboat as a simple copula construction (i.e. a boat that is a house, cf. Warren 1978), this
approach in fact glosses over a critical aspect of this compound, namely that it is not, strictly
speaking, a house, but rather a boat used as one. A few examples from French are as follows:
(93) cellule assistante96, circuit tampon, gaz propulseur, logiciel antivirus, papier filtre
Distinguishing between coordinating compounds and those involving functionality is not always
simple. The chief distinction between them is that the former will allow either of its elements to
function as the head (as in 94a), while the latter typically produces marginally acceptable
paraphrases (as in 94b):
96
This construction may also be treated as an NA compound, where assistante is understood as an adjective. It is treated here as a noun because LPR2010 contains no adjectival entry for this word (see Chapter 3, Section 3.4 for information on the methodological choices made when identifying parts of speech of compounds).
189
(94) a. un auteur-compositeur est un auteur / un auteur-compositeur est un compositeur
b. une cellule assistante est une cellule / ?une cellule assistante est une assistante
Borrowing from Jackendoff (2010), we can further identify FUNCTION based compounds by
using the paraphrases “the function of H is as an M” or “the function of H is to do what M does”
(442). Paraphrasing coordinate compounds in this manner produces odd results, while doing the
same with those in (93) typically yields acceptable sentences:
(95) a. ?la fonction/le rôle de cet auteur est de servir de compositeur
b. la fonction/le rôle de cette cellule est de servir d’assistante
Neither test provides absolute evidence for classification one way or the other, but they do show
that these particular cases are not exactly alike. To treat them in the same manner fails to
emphasize that a number of compounds are coined by identifying the function of a particular
entity and applying it to another that wouldn’t typically fill this role. Moreover, another key
distinction is that the elements of a coordinated compound are typically of the same conceptual
class (i.e. person-person, artefact-artefact, place-place), while compounds involving function
have no such requirement. The examples given in (93) show just how different the elements can
be when one functions as the other.
It should be noted that this relation is not reversible, most likely because doing so would
introduce an illogical relationship between the elements that would make it difficult to identify
the head:
(96) a. houseboat ‘a boat that functions as a house’ = it’s a boat
b. houseboat ‘a boat that a house functions as’ = is it a house or a boat?
We may also want to differentiate FUNCTION from SIMILARITY, given that the latter also seems to
include compounds that involve functionality to some degree (e.g. mot-obus or ville-dortoir).
The distinction here lies with how they are paraphrased:
(97) a. mot-obus ‘mot qui fonctionne comme obus’
b. cellule assistante ‘cellule qui fonctionne en tant qu’assistante’
190
While it is true that the compound in (97a) involves the modifier’s function, it does so in a
metaphorical way. In short, FUNCTION is for compounds in which one element functions AS the
other, while similarity may include those compounds in which one element functions LIKE the
other.
5.2.2.5 Possession
POSSESSION
Relation Type Structure Template Examples Linking Material
Basic an H that possesses M un T qui possède M
career girl punk à chien possess (have / of)
possède (a / de) Reversed an H that M possesses un T que M possède
family estate droit d’auteur
The POSSESSION relation, often paraphrased using the verb have, is related to a number of other
associations. For instance, it is subsumed under the Part-Whole class in Warren’s (1978)
framework and under OF in Lauer (1995). Similarly, Levi includes both partitive and possessive
meanings in her Have predicate. This approach is largely based on the fact that Have is a highly
polysemous verb and that possession involves a number of different relations: Baron and
Herslund (2001), for instance, argue that Have expresses what is fundamentally a locative
relation that may also include possessive and partitive associations. Fonagy (1975) identified ten
types of possessive relations, many of which are further subdivided into more specific types,
which include, among others, ownership, kinship, part-whole, and group membership. It is
therefore not surprising that by relying heavily on Have as a basic relation, one inevitably
groups together vastly different compounds:
(98) a. doorknob, fingertip, shoelace
b. student power, family car, gangster money
While both sets of compounds can indeed be paraphrased using the predicate Have, only the
compounds in (98a) allow for the explicit use of PART in its paraphrase (e.g. a knob that is part
of a door ~ *power that is part of a student). Distinguishing between these types of compounds
seems therefore warranted and many have done just that (Adams 1973, Vanderwende 1994,
Moldovan et al. 2004, Jackendoff 2010), usually defining possession as a case of ownership,
regardless of tangibility (cf. the examples in (98b above). For French, Arnaud (2003) seems to
191
also include a possessive relation in his inventory of NN compounds, which he paraphrases
simply as “N2 has N1.” He records 15 such cases (with an additional 9 in complex formations),
but consulting his list raises a number of questions.
Table 5.14. Compounds listed as dd or “N2 has N1” in Arnaud (2003).
année lumière incident calénaire situation météo
bateau pirate PC course taille mémoire
cas régime radio pirate temps machine
cas sujet régime moteur vaisseau pirate
émetteur pirate relais traction vérité terrain
Arnaud’s description of the “N2 has N1” class doesn’t make the matter any clearer. His
argument in favour of using this particular class for the compounds cas sujet and cas régime, for
instance, is largely unconvincing (my translation):
“The subject case and the objective case are not cases ‘intended’ for the subject or the objects, nor do they ‘contain’ these functions. Can we say then that the grammatical function is at the source of the case? This is a bit more acceptable, but it nevertheless seems more accurate to say that the case ‘corresponds’ to, ‘is that of’ the function. In more abstract terms, the function ‘has’ its case.” (71)97
It’s not clear just how Arnaud arrives at “the function ‘has’ its case” from his observation that
“the case ‘corresponds’ to [. . .] the function.” In fact, his statement regarding correspondence
might be better served by the use of other relations, such as his “N2 is what N1 is about” class
(e.g bilan matières, plan produit, réflexe achat).
As for the other compounds in Table 5.14, some might be more sensibly represented by
instrumental or “use” type relations (e.g. bateau pirate), while others seem to support a
production interpretation (e.g. régime moteur). Others still defy any straightforward analysis.
For instance, the exocentric compound année lumière (light year) refers to the distance traveled
97
“Le cas sujet et le cas régime ne sont pas des cas ‘destiné’ au sujet et aux régimes, ils ne ‘contiennent’ pas non plus ces fonctions. Peut-on dire pour autant que la fonction grammaticale est à la source du cas ? C’est un peu plus acceptable, mais il semble quand même plus juste de dire que le cas ‘correspond’ à, ‘est celui de’ la fonction. de façon plus abstraite, la fonction ‘a’ son cas” (Arnaud 2003: 71).
192
by light in a year, a meaning that is not easily paraphrased using possession, even in its most
abstract form. It is therefore surprising to see it included in Arnaud’s dd class.
Although the “possession as ownership” relation seems of limited use for NN French
compounds, other types do clearly rely on this particular association, namely N de N
constructions in which the preposition assigns the genitive case (cf. Bartning 2001, Knittel
2009):
(99) a. bien de famille, droit d’auteur, mémoire d’éléphant
The examples above might be sufficient evidence for the retention of POSSESSION as a
compound relation, but these types are not under investigation here. Unfortunately, my data
contain few, if any cases of NN and N à N compounds that would unequivocally involve
“ownership.” The only possible candidates are as follows, with those in (100a) being the least
likely:
(100) a. poids coq/mouche/paille/plume
b. bourse-à-berger/pasteur
c. fils à papa, punk à chien
Unfortunately, it’s not immediately clear that any of the compounds in (100) are in fact truly
cases of a possessive relation. According to Riegel (2001), only “ownership/belonging”
possessive constructions allow for a paraphrase using the verb posséder. Compare the following
sentences from Riegel (2001:189):
(101) a. Jean possède trois voitures.
b. *Jean possède deux frères.
c. *Une voiture, ça possède/a quatre roues.
d. *Jean possède un nez bulbeux.
e. *Une équipe de football possède onze joueurs.
This test seems to confirm that the N de N compounds listed in (99) are indeed cases of
ownership or belonging (e.g. il possède des biens/des droits/une mémoire), but shows that the
set of NN compounds in (100a) are most likely not (e.g. *ce coq possède un poids). We might
argue, then, that they are in fact cases of heads profiling for their internal argument (i.e. the
193
weight of X). As for the compounds bourse-à-berger and bourse-à-pasteur in (100b), they are
exocentric, but nevertheless seem to involve possession: they refer to plants whose flowers look
like the purse of a monk or a shepherd. They can thus be paraphrased using the verb posséder
(i.e. bourse que possède un berger). The compound fils à papa in (100c), because it involves
kinship, does not typically allow for such a paraphrase. Should it be treated as a case of
POSSESSION, then? If we choose to include kinship under possession (of which it is arguably a
type), then the answer is obviously yes. But how should it be paraphrased? Is it ‘fils qui a un
papa’ or ‘fils qu’un papa a’? If we base our interpretation strictly on the meaning of the
compound, then the first paraphrase seems most appropriate (it is, after all, a boy who has an
influential father); yet, the preposition à, when used possessively, functions like de, in that it is
the head that is possessed and the complement that is the possessor (cf. bourse-à-
berger/pasteur). When we compare these compounds to punk à chien, however, the matter is
perhaps further clarified as the head cannot be understood as the possessed in this case (*punk
qu’un chien possède). I will therefore state that fils à papa (provided that kinship be accepted as
possessive in nature) and punk à chien are endocentric POSSESSION compounds (H that has M),
while bourse-à-berger and bourse-à-pasteur are exocentric REVERSED POSSESSION compounds
(H that M has). These are, it would seem, the only such cases present in the collected data.
Given that the possessive relation seems to be of limited relevance for NN French compounds,
one might wonder just how prevalent it is elsewhere, despite its frequent inclusion in the
literature. Adams (1973), for instance, only lists four examples and Jackendoff (2010) only lists
six. In Warren (1978), although possession accounts for a non-negligible 15% of her BELONGING
TO class, it only accounts for approximately 4% of all her compounds. Warren defines three
types of Possessor-Belonging relations (102a-c) under her Whole-Part semantic class and one
under Part-Whole (102d):
(102) a. Possessor-Legal Belonging: family estate, agency car, hospital bus, clubhouse
b. Possessor-Habitat: police station, foxhole, courthouse
c. Authority-Subordinate Entity: county school, state hospital, police laboratory
d. Belonging-Possessor: gunman, boatmen, horsemen
In most cases, alternative analyses are available, a fact that is further underlined by their
treatment elsewhere. The compounds in (102b), for instance, are routinely analyzed as locative
194
in nature (cf. Adams 1973), while many of those in (102d) might instead be treated as
instrumental. The nature of the compounds in (102c) is harder to establish, which may or may
not support a possessive analysis, but here too, there is the potential for alternative analyses (e.g.
a police laboratory is a laboratory used by the police); the compounds in (102a) can be said to
represent the more “ownership” type combinations and are in line with such a treatment
elsewhere (cf. Vanderwende 1994, Moldovan et al. 2004, Jackendoff 2010). When it is
understood strictly under the angle of ownership or belonging—as is the case in (102a and
102d)—possessions remains marginal and accounts for a mere 2% of the compounds examined
by Warren.
Despite its limited scope in French for both NN and N à N compounds, the POSSESSION relation
has been retained for this work, mainly because, on the one hand, it is a component of nearly
every other formalism I have examined, and on the other, it will no doubt be necessary for any
work that might look at N de N compounds in the future.
5.2.2.6 Part
PART
Relation Type Structure Template Examples Linking Material
Basic an H that is part of M un T qui fait partie de M
table leg tiroir-caisse part of (have / of)
faire partie de (a / de) Reversed an H that M is part of un T dont M fait partie
wheelchair stylo-bille
The PART relation is one of the most commonly identified compound relations: of the sixteen
works listed earlier in Table 5.10, twelve include some manner of partitive association between
elements. Despite the similarities shared by PART and POSSESSION, namely in the use of either
HAVE or OF as a paraphrastic predicate, the former distinguishes itself from the latter in that it is
included in the whole. PART, as it is used here, is meant to identify those compounds for which
one of its constituents denotes a constitutive element of the whole object or concept denoted by
the other constituent. It is best paraphrased as ‘H that is a part of M’ and reflects what many
have labeled as a Whole-Part relation (Downing 1977, Warren 1978, Levi 1978, Shoben 1991,
Vanderwende 1994, etc.):
195
(103) a. tiroir-caisse ‘tiroir qui fait partie d’une caisse’
b. grille écran ‘grille qui fait partie d’un écran’
c. moteur fusée ‘moteur qui fait partie d’une fusée’
This relation is reversible, which results in the non-head as the component element; it is often
labeled as a Part-Whole relation elsewhere (id.):
(104) a. roman-photo ‘roman dont des photos font partie’
b. stylo-bille ‘stylo dont une bille fait partie’
c. montre-bracelet98 ‘montre dont un bracelet fait partie’
With regards to NN French compounds, the PART relation is far more frequent in my data in its
reversed form (38 compounds) than in its basic form (7 compounds). While these findings run
counter to those in Warren (1978) for English, they seem to reflect those found by Arnaud
(2003). Although it is difficult to say exactly how many such compounds he identifies given the
number of low-level classes used to encode his data, one class in particular does stand out as an
exact match to the one described above (af: “N2, concret-discret, est une des parties de N1”).
This category of compounds includes 47 simple cases, including all of those in (104) above.
Arnaud does not, however, seem to include the alternate formulation in which N1 is part of N2.
Of the 7 compounds identified in my data as “H is a part of M”, only one is also included in
Arnaud’s corpus, namely balai brosse, which he treats alongside those in (104) and thus
interprets it as ‘balai dont fait partie une brosse’. He no doubt views this particular compound
as left-headed, which, intuitively seems correct given that this is French’s preferred position for
the head. I, however, chose to base my treatment of this compound on the definitions provided
not only by Wiktionary99, but also LPR2010100, which both seem to treat this compound as
right-headed, the result of which is an interpretation along the lines of ‘brosse qui fait partie
98
The compound montre-bracelet might also be treated as a case of COORDINATION, an interpretation supported by the existence of the inverted synonym bracelet-montre. COORDINATION, however, typically allows for either component to fill the role of head, a fact that doesn’t apply to either one of these compounds: *une montre-bracelet / *un bracelet-montre est un bracelet. 99
Balai-brosse: Brosse très dure fixée sur un manche à balai (fr.wiktionary.org/wiki/balai-brosse). 100
Balai-brosse: Brosse de chiendent montée sur un manche à balai, pour frotter le sol (LPR 2010).
196
d’un balai’ (cf. bracelet-montre, which is also right-headed)101. In this matter, then, our
contrasting analyses are the result of a different identification of the head constituent.
Fundamentally, however, this particular compound involves the PART relation either way.
It should also be noted that PART, as it is used here, also covers those compounds that fall into
the “group membership” category, such as orchestre musette or émission-débat, though these
types are not frequent in my data.
Another issue to consider is that some compounds might be analysed as either PART or
LOCATION. This dual analysis is related to the fact that LOCATION may subsume PART: if
something is a part of something else, then it is located at/on/in that thing (cf. Baron and
Herslund 2001). One possible solution is to reserve location for only those compounds that
actually involve a locative noun, as does Adams (1973). The problem, of course, is that one
must treat combinations such as toolbox or treehouse using some other relation, as they do not,
in the strictest sense, involve places. The key distinction that will be used here is one that views
the PART relation as a reference to an integral component of the whole, without which it would
either be incomplete, defective, or non-functional. Thus, a negation test may be used to
determine whether the modifier denotes an essential part of the compound. The formulation in
(105) below shows how such a test might apply to compounds in which the head denotes the
whole (cf. 104 above):
(105) a. a C without an M is still a C
b. un C sans M est toujours un C
A positive response to the above sentence would indicate that the modifying noun is not an
essential component of the object denoted by the compound, but instead a distinguishing
feature. Thus, a toolbox without tools is still a toolbox, which indicates that tools is connected to
box via some other relationship (i.e. container-contained). This result is the same for the French
boîte à outils (i.e. une boîte à outils sans outils est toujours une boîte à outils). When applied to
compounds that denote a part-whole association, the test produces defective or incomplete
101
See Section 4.1.2 for a discussion of the issues related to head position in French.
197
readings. Given the following compounds, the test highlights the differences between the
compounds in (106a-b) and those in (106c-d):
(106) a. ?un stylo-bille sans bille est toujours un stylo-bille
b. ?une auto-mitrailleuse sans mitrailleuse est toujours une auto-mitrailleuse
c. une poche-revolver sans revolver est toujours une poche revolver
d. une info-bulle sans information est toujours une info-bulle
The test thus provides a method of treating compounds that might otherwise prove difficult to
categorize, such as those in (107) below:
(107) a. café-crème ?un café-crème sans crème est toujours un café-crème
b. bloc-cylindres ?un bloc-cylindres sans cylindres est toujours un bloc-cylindres
Because a café-crème can be paraphrased as “un café dans lequel il y a de la crème”, it might
allow for a locative treatment (cf. Arnaud 2003), but given the oddness of the test sentence in
(107), I prefer to treat it as an instance of the PART relation. Some compounds that seem to
involve a partitive relationship, however, still fail the test because the component is not in fact
integral or can be typically removed, such as those in (108):
(108) a. laurier-cerise un laurier-cerise sans cerise est toujours un laurier-cerise
b. chêne-gomme ?un chêne-gomme sans gomme est toujours un chêne-gomme
Compounds similar to the one in (108a) are thus understood as instances of PRODUCTION as the
non-head is in fact something the whole produces and is not necessarily always present (i.e. not
in bloom). The compound in (108b) is slightly different, as evidenced by the inconclusive result
from the test. A chêne-gomme (gum oak) is a type of oak tree which contains and exudes large
quantities of sap. Although production might also be appropriate for this compound, the fact that
it would be difficult, if not impossible to remove the sap entirely suggests that it should be
treated as partitive: chêne-gomme is therefore labeled as PART REVERSED.
Unfortunately, the test proposed in (105) is incompatible with compounds in which the non-
head element denotes the whole (cf. 109). In these cases, simply removing the part element from
the head is sufficient to produce a defective reading:
198
(110) a. ?cet écran sans grille fonctionne parfaitement
b. ?cette caisse sans tiroir fonctionne parfaitement
c. ?cette fusée sans moteur fonctionne parfaitement
There is also a small set of compounds included under PART—and which were briefly discussed
in the SIMILARITY section—that permit an alternative analysis. They mostly involve animals:
(111) a. écrevisse signal, noctuelle gamma, oiseau-lyre, poisson-épée
These compounds are all based on some physical property that might be said to look like some
other object. Thus, a poisson-épée is a fish with a nose that looks like a sword, an oiseau-lyre is
a bird with a tail that looks like a lyre, etc. Including them under the PART relation necessarily
involves invoking a metaphor alongside it (i.e. an H that has a part that looks like M). Including
these particular compounds under SIMILARITY does not provide a simpler solution as PART
would still need to be invoked (i.e. an oiseau-lyre is not a bird that looks like a lyre). The use of
complex relations is in fact a component of Jackendoff’s (2010) formalism, but it remains
largely absent from most other works. I have chosen to use simple relations and rely on the
weakly compositional label discussed in Chapter 4 to take into account the presence of the
metaphor. This is in fact similar to the approach Arnaud (2003) adopts for these particular cases:
he includes in his list of low-level relations one that is paraphrased as “N2, concrete-discrete, is
one of the parts of N1 (meronymic-analogical relation)102.”
5.2.2.7 Location
LOCATION
Relation Type Structure Template Examples Linking Material
Basic an H located at/near/in M un T situé à/près de/dans M
windows seat centre-ville at, near, in, etc.
à, près de, dans, etc. Reversed H that M is located at/near/in un T auquel M est situé
bedroom café concert
102
“N2, concret-discret, est une des parties de N1 (relation méronymique-analogique) – poisson-scie” (Arnaud 2003: 73)
199
The locative relation is another frequently cited relation for compounds: it can be found in
nearly all of the works mentioned in Section 5.1. Many of the compounds involving this relation
contain a constituent denoting a place or container (bedroom, sandbag, groundwater), but this
need not be the case (earwax, sunspot, leg cramp). The suggested paraphrases for these types of
compounds are numerous, but are all typically locative in nature and usually make use of one of
several prepositions: at, in, on, under, above, near, etc. Examples of basic French NN
compounds that seem to involve a locative function are in (a) and reversed in (b):
(112) a. bout dehors, colis-route, côté jardin, page web, station-aval, village-rue
b. bloc-eau, café concert, chêne kermès, point presse, prés-bois
A number of compounds involve a “container-contained” relationship, which consequently
supports a locative treatment. These types do, however, raise a number of questions regarding
how they should be categorized. It can be argued that because a container is meant to contain
something, compounds involving this relationship are fundamentally purposive in nature. A
number of NN compounds display this double reading:
(113) bloc-note, info-bulle, livret-police, malle-poste, poche-revolver
In addition to a locative paraphrase, any of the compounds in (113) can also be paraphrased as
“an H for M” (i.e. un bloc pour notes). The problem isn’t immediately apparent if we only
consider NN compounds; rather, it becomes much more obvious when we compare them to
analogous sets of N à N and N de N compounds. As Bassac and Bouillon (2013) point out, there
are clear differences between the compounds in (114a) and the constructions in (114b):
(114) a. boîte à outils, verre à vin, chambre à air
b. boîte d’outils, verre de vin, chambre d’air
If we treat both sets of constructions above as locative in nature, we ignore the fact that those in
(114a) express both purpose and location (i.e. N pour N), while those in (114b) only convey
200
location103. How should we treat the compounds in (113) and (114a) then? Any solution to this
problem should be able to distinguish between these types and the ones in (114b), while also
retaining what sets them apart from other purposive compounds that don’t involve location.
These particular cases will be explored in greater detail in Section 5.2.2.14, but I will state here
that whenever purpose is involved in the compound’s meaning, the item will be labeled as such,
along with whatever additional information it may express using what Jackendoff (2010) calls
proper function.
5.2.2.8 Composition
LOCATION
Relation Type Structure Template Examples Linking Material
Basic an H made of M un T composé de M
sugar cube disque vinyle composed/made of
composé/fait de Reversed an H that M is made of ?un T dont M est composé
sheet metal ?
The COMPOSITION relation is used for compounds in which one constituent is the material or
substance that composes the other constituent. COMPOSITION thus differs from part in that it
entails, on the one hand, irretrievability, which is to say that the composing substance cannot
(simply) be removed from the whole, and on the other, that the substance be its sole component.
It is a frequently cited relation in the literature and is usually stated as “composition” (Adams
1973, Downing 1977, Jackendoff 2010) or “made of” (Levi 1978, Shoben 1991, Vanderwende
1994). There are only a few such cases in my data; they are supplemented below with examples
from Arnaud (2003):
(115) a. disque vinyle, gaz hydrogène, pal-fer, terre diatomée
b. bac acier, bas nylon, papier aluminium (Arnaud 2003)
Although COMPOSITION primarily refers to the sole substance that makes up the item, this is in
fact only partially true. It may be more accurate to state that this substance is the
103
The difference no doubt stems from the prepositions involved. According to Cadiot (1997), these analogous constructions lend support to his argument that à is intensional in nature, while de is extensional. We may therefore draw parallels between this distinction and the Type ~ Token one mentioned in footnote 90.
201
overwhelmingly predominant material. Parallels may be drawn with freely constructed nominal
phrases containing a pre-adjectival modifier such as wooden table: although it most likely also
contains nails, screws, and brackets made of other materials, it is largely understood as ‘a table
made of wood’. The line may not always be easily drawn between part and substance, but the
“H composed/made of M” paraphrase is, in most cases, sufficient to disambiguate borderline
cases. For instance, sabre laser (light saber) is listed under PART as it cannot be paraphrased as
‘?sabre composé d’un laser,’ which confirms that the non-head is only a major defining
characteristic of the object in question, but that it nevertheless contains parts that also play a
crucial role in its composition (i.e. a handle with internal components, buttons, etc.). Moreover,
for substance or material composition, French will typically allow compounds to be
reformulated using the preposition en (e.g. disque en vinyle), which is analogous to the English
NPs mentioned above (i.e. wooden table).
Composition is also used in a more abstract manner if, again, one of the constituents refers to
the sole (or predominant) element of the whole. Although these compounds can be paraphrased
using “composé de,” they do not typically permit the use of the preposition en or the verbal
predicate fait de:
(116) a. code-barres ‘code composé de barres/*en barres/*fait de barres’
b. plan séquence ‘plan composé de séquence/?en séquence/*fait de séquence’
c. fan-club ‘club composé de fans de qqch/*en fans/*fait de fans’
Some might prefer to treat fan-club as a part related compound, but it is in fact a club consisting
entirely of fans (it is the sole constituting element of the whole; cf. spectacle solo also listed
under composition).
Based on the compounds in my data, the composition relation may not be reversible in French,
which is consistent with Arnaud’s (2003) findings. It is not clear, however, why French might
not permit “H that M is made of” compounds; in contrast, reversibility of the composition
relation seems relatively productive in English (see Jackendoff 2010).
202
5.2.2.9 Source
SOURCE
Relation Type Structure Template Examples Linking Material
Basic an H (made) from M un T (fait) à partir de M
cane sugar sauce soja (made) from
(fait) à partir de Reversed an H that M is (made) from un T (à partir duquel) M est fait
sugar cane chêne-liège
The SOURCE relation is related, in some ways, to both the COMPOSITION and PRODUCTION
relations. In fact, identifying compounds that make use of SOURCE poses some challenges. The
principle use of this relation is with compounds in which one element is the object or substance
from which the other is derived. It differs from COMPOSITION in that the source object is no
longer an identifiable component of the whole. Parallels may be drawn between the NN
compounds in (117a) and the N de N ones in (117b):
(117) a. baume copalme, carton pâte, papier maïs, sauce soja, sauce tomate
b. jus d’orange, sirop d’érable, huile d’olive
The most basic paraphrase for this particular type of compound is “H from M,” (cf. Levi 1978,
Lauer 1995, Moldovan et al. 2004), but it may also be paraphrased using “derived from” (cf.
Adams 1973, Shoben 1991) or “made from” (cf. Jackendoff 2010).
This relation is reversible, although there are few such NN cases in my data, chêne-liège being
the clearest instance of a reversed source (i.e. chêne à partir duquel le liège est obtenu). Arnaud
lists a few other such compounds such as laurier-cerise and pin pignon. These examples in fact
show how PRODUCTION, CAUSE, and SOURCE might overlap if the latter is understood in a much
broader sense:
203
Table 5.15. The SOURCE, PRODUCTION and CAUSE relations compared.
Relation Basic Reversed
laurier-cerise
SOURCE *laurier provenant des cerises laurier d’où proviennent les cerises
PRODUCTION laurier qui produit des cerises *laurier que les cerises produisent
arrêt-maladie
SOURCE arrêt provenant d’une maladie *arrêt d’où provient une maladie
CAUSE *arrêt qui cause une maladie arrêt causé par une maladie
The above table only represents a sampling of the potential overlap, as these relations may
conflict in a number of different ways. Put simply, the basic form of SOURCE (H from Y) may
produce a paraphrase that is functionally equivalent to the reversed form of either PRODUCTION
or CAUSE, while its reversed form (H that M is from) may conflict with the basic forms of these
two relations. This is, unfortunately, largely unavoidable as something that produces or causes
something else will inevitably serve as the source of the resulting item (i.e. le moulin produit du
papier; le papier provient du moulin). I would argue, however, that not all compounds are
susceptible to this particular parallelism. For example, chêne-liège mentioned earlier is not an
oak that produces cork, but is in fact an oak whose bark is used in the production of cork (cf.
those in 117). This not only explains why this compound is viewed as an instance of SOURCE,
but also why I treat laurier-cerise as a case of PRODUCTION as the head of this compound does in
fact produce the object denoted by the non-head element.
For these reasons, the SOURCE relation is used solely for compounds that involve the “material
or substance origin” of an object, thus allowing both PRODUCTION and CAUSE to fill any other
type of “origin” gap. Although this distinction may not always provide the most intuitive
analysis, it seems to largely suffice as few compounds that do not meet the above criterion can
be paraphrased using these alternative relations. As we will see later, other pairs of relations also
engender similar problems, namely USE and PURPOSE, but the solution they require is not so
sharply delimited. Whenever possible, relations are applied in a manner that is both consistent
and coherent, so as to minimize analyses that might otherwise introduce contradictory or
paradoxical annotations. I believe that the criterion provided above for SOURCE achieves this
goal.
204
5.2.2.10 Cause and Production
CAUSE
Relation Type Structure Template Examples Linking Material
Basic an H that causes M un T qui cause M
sunburn piétin échaudage causes
cause Reversed an H that M causes un T que M cause
motion sickness arrêt-maladie
PRODUCTION
Relation Type Structure Template Examples Linking Material
Basic an H that makes M un T qui fait M
honey bee appareil photo makes, produces
fait, produit Reversed an H that M makes un T que M fait
beeswax jazz manouche
The CAUSE and PRODUCTION relations share a number of similarities and have in fact been
reduced to a more general primitive in a number of works (among others, Downing 1977,
Warren 1978, Vanderwende 1994). This reduction usually results in compounds such as honey
bee and tear gas being grouped together (see, for instance, Warren 1978: 188-189). Most
authors, however, have argued in favour of distinguishing between these two compounds based
on precisely these types. For Levi (1978), MAKE (her corresponding production RDP) refers to
associations based on “physically producing, causing to come into existence,” (90), whereas
cause has no such physical requirement. Honey bee is therefore a bee that makes honey, while
tear gas is gas that causes tears. Furthermore, Levi suggests that CAUSE involves both direct and
indirect causation. While she does not stipulate this criterion for MAKE, it seems plausible that
production also assumes either type of involvement, as evidenced by her inclusion of sap tree
under this RDP.
This approach to production thus differs slightly from the one argued for by Moldovan et al.
(2004). They define their CAUSE and MAKE relations as follows (62):
(118) a. CAUSE: “an event/state makes another event/state to [sic] take place”
b. MAKE/PRODUCE: “an animated entity creates or manufactures another entity”
205
According to these definitions, the difference between these relations is twofold. First,
production requires that the agent be animated and second, that the result be something other
than an event or state. The first criterion would therefore exclude sap tree from the
MAKE/PRODUCE class; the second would do the same for music box. Yet, neither of these
compounds seem causal in nature (i.e. *tree that causes sap; * box that causes music).
The fact is that it can be difficult to set apart CAUSE from PRODUCTION. As I mentioned earlier,
Jackendoff (2010) says that they are “closely related function[s]” and that “[i]t is sometimes
hard to distinguish make from cause” (441). This may explain why he includes sunburn under
CAUSE, but suntan under MAKE, even though there is little reason to treat them differently. Also
worth noting is that he, perhaps mistakenly, lists knife wound under both functions, which shows
just how tenuous the distinction between them is.
Perhaps the easiest way to differentiate between these to relations is simply to use paraphrases
that include the verbs cause or make. In fact, in most cases, these paraphrases are an effective
way of ruling out one or the other relation (cf. sap tree and music box mentioned above). Using
this test as the basis for identifying these particular relations, it would seem that French has only
a few NN compounds that rely on either relation. Examples of CAUSE are given in (119a), while
examples of PRODUCTION in (119b):
(119) a. piétin-échaudage, piétin-verse
b. appareil photo, bombe aérosol
Both relations are, however, reversible (CAUSE in 120a; PRODUCTION in 120b), although again,
their numbers are limited:
(120) a. arrêt maladie, effet papillon, erreur système, photolyse éclair
b. café-filtre, drainage taupe, image-gradient, jazz manouche, portrait-robot
As discussed in Section 5.2.2.9, PRODUCTION and CAUSE may show some degree of overlap with
SOURCE, an unfortunately unavoidable consequence of their shared semantic space. Measures to
avoid this particular conflict were introduced in that section.
206
5.2.2.11 Topic
TOPIC
Relation Type Structure Template Examples Linking Material
Basic an H about M un T à propos de M
history conference réunion bilan about
à propos de Reversed --- ---
TOPIC is present in a number of works, but seldom accounts for a significant number of
compounds and usually involves very specific types of nouns. For instance, of the 32
compounds that Levi (1978) lists under her ABOUT RDP, most involve “a rather restricted set of
either abstract nouns or activity nominalizations” in head position (103). These nouns are
usually “about” something (e.g. vote, policy, law, conference, novel, speech, etc.). Similarly,
Lauer only lists 22 such compounds in his appendix and they too have head nouns like those
mentioned by Levi.
Put simply, TOPIC is understood as “H is the subject matter of M.” This relation is not well
represented in my data. Only four compounds seem to unequivocally involve this particular
relation:
(121) ciné-club, réunion bilan, science fiction, secret défense
The limited nature of TOPIC for my data contrasts somewhat with its use in Arnaud (2003),
where he identifies 10 simple cases of “N2 is what N1 is about104.” Some examples of these
compounds are bilan matières, plan calcul and réflexe achat. When listed alongside other
relations, however, the number of instances involving topic increases to 46 and includes many
compounds that I would have included under this category had they been present in my data
(e.g. bulletin météo, catalogue auteurs, film catastrophe, etc.). These numbers lend further
support to TOPIC as a fundamental relation.
104
This low-level relation is labeled as bx and is defined as “N2 est ce à propos de quoi est N1” (Arnaud 2003:74).
207
Some compounds have been included under this relation that might also call for alternative
treatments:
(122) cas régime/sujet, participe passé/présent
The first pair in (122) has already been discussed in Section 5.2.2.5 as they are treated by
Arnaud as “N2 has N1.” The fact is that both sets of compounds in (122) seem to involve
correspondence or representation (i.e. cas qui représente/correspond au régime). They may also
be paraphrased, however, using “à propos de,” which is why they have been identified as
instances of TOPIC.
The TOPIC relation does not seem to be reversible. It is difficult to imagine a context where a
compound might be coined where the head is the thing that the modifier is about (i.e. an H that
M is about), as the result would be somewhat circular:
(123) a. history book ‘a book about history’
a. history book ‘a book that history is about’ (= a book about itself)
5.2.2.12 Time
TIME
Relation Type Structure Template Examples Linking Material
Basic an H that occurs at/during M un T qui a lieu pendant M
summer job pause-carrière
during, at, in, before, etc.
pendant, à, en, avant, etc. Reversed an H at/during which M occurs
?un T pendant lequel M a lieu golf season
(journée-débat)
The temporal relation TIME is not very frequent in my data. In fact, only three compounds
clearly rely on some temporal property, none of which are reversed, and they can all be traced
back to the meaning of one of their constituents:
(124) a. épreuve-minute ‘épreuve produit très rapidement / en une minute’
b. pause-carrière ‘pause prise pendant la carrière’
c. réveil-matin ‘réveil pendant le matin’
208
Arnaud only lists one explicitly temporal compound (match retard) and includes pause-carrière
in his locative class (i.e. ‘pause dans une carrière’), which is not only a plausible analysis, but
one that seems widely held: a number of researchers treat locative and temporal relations as
highly related functions and group them together, usually with TIME subsumed under LOCATION
(among others, Adams 1973, Levi 1978, Jackendoff 2010). Arnaud’s analysis of épreuve-minute
involves duration, which remains a temporally related property and which also includes
compounds such as fermeture éclair and pulsar milliseconde. Given the limited number of
compounds involving TIME as a relational component, one might wonder whether it merits its
own category. In fact, considering that the temporal information otherwise required to
disambiguate most, if not all of these compounds is usually available in one of its elements (i.e.
minute, pause, matin), it might seem unnecessary to stipulate this particular relation for
compounds. The decision to retain this relation, however, is based primarily on its frequency in
the literature: no fewer than ten authors have included it in some form or other in their
formalisms. Although it remains of limited use for both French NN and N à N compounds (two
occurrences for the latter), it may very well prove necessary for other compound types.
It should also be noted that the data does not contain any cases of NN compounds involving a
reversed application of TIME, but this is no doubt simply due to the limited size of the dataset, as
such compounds do seem to exist (e.g. journée-débat = journée pendant laquelle a lieu des
débats). There are also a number of similar N de N cases (e.g. jour de paye, heure de pointe,
saison de drainage, etc.), which lend further support to a reversible treatment of the TIME
relation.
5.2.2.13 Use
USE
Relation Type Structure Template Examples Linking Material
Basic an H that uses M un T qui emploie M
steamboat bouton pression use / with, by
emploie / avec, par Reversed an H that M uses un T que M emploie
hand brake langage machine
The USE (or instrumental) relation is meant to group together compounds in which one element
is used by the other. Consequently, this relation represents those compounds in which one
209
constituent is said to be “powered by” the other (e.g. windmill, steamboat, gas stove, etc.). It is
found in the literature as either instrument/al (Adams 1973, Warren 1978, Moldovan et al. 2004,
Séaghdha 2008) or use/r (Downing 1977, Levi 1978, Shoben 1991). Vanderwende (1994)
frames this particular relation using How?, while Lauer (1995) makes use of the traditional
instrumental preposition with, although other prepositions may also be used (e.g. by). French has
a number of compounds that involve USE, as in (125a), some of which are reversed, as in (125b):
(125) a. bouton pression, café-comptoir, danse-poteau, pneu contact, vidéo-surveillance
b. alphabet hindi, attaché-case, croiseur-école, code machine, culotte garçonne
The examples in (125) show that the thing being used need not be physical either. This is
particularly important for N à N compounds, where the modifier is in fact a process, state, or
event that is at the core of the compound’s functionality:
(126) amortisseur à fluide, arme à implosion, bombe à fission, instrument à vent, etc.
One of the challenges posed by USE is that, like PURPOSE covered in the next section, it is in fact
an indeterminate relation, as it does not specify how a particular thing is used. The only evident
aspect of this particular relation is that it refers to a necessary feature of the compound’s
designatum, without which it will cease to function or even be. In this manner, USE also shares
some similarities with PART, so much so that in some instances, PART could very well be a
plausible analysis for some of the compounds listed above (e.g. amortisseur à fluide). The
distinction, of course, lies with USE’s narrower scope: whereas amortisseur à fluide may easily
be paraphrased as ‘amortisseur qui emploie un/du fluide,’ a compound such as navire-citerne—
treated as an instance of PART here—produces a less than ideal paraphrase: ‘?navire qui emploie
une citerne.’
Although some compounds show support for the reversibility of USE, its implementation gives
rise to a number of problems, namely that it may, under certain circumstances, overlap with
PURPOSE. In other words, some compounds permit two related, yet distinct paraphrases: “an H
for M” or “an H that M uses.” This issue will be addressed in the following section.
210
5.2.2.14 Purpose and Proper Function
PURPOSE
Relation Type Structure Template Examples Linking Material
Basic an H intended for M un T destiné à M
animal doctor passage piétons for
pour Reversed --- ---
A number of authors have suggested that one of the relations required to describe a subset of
compounds is purposive in nature, that is to say, one for which an element is intended or
designed to fulfill some purpose related to the other. The French compounds that best illustrate
this relationship are N à V, in which V is infinitival and represents the act that N is intended to
perform. Although these types are not under investigation here, they are nevertheless presented
below as evidence that the PURPOSE relation is in fact necessary:
(127) a. fer à repasser, pâte à modeler, machine à écrire, poêle à frire
The periphrasis for these particular cases is “an H whose purpose is to M” (i.e. un fer à repasser
est un fer qui a pour fonction de repasser). The PURPOSE relation is listed in a number of
different inventories (among others, Warren 1978, Levi 1978, Vanderwende 1994, Rosario and
Hearst 2001) and is often associated with the preposition for. This relation is missing from a
number of works, however, most notably from Adams (1973) and Jackendoff (2010). Its
occasional absence likely stems from its wide scope of application, as PURPOSE can be said to
apply without specifying its exact nature (similar to USE above). As Levi (1978) points out, both
headache pill and fertility pill are purposive combinations (i.e. both allow for expansion via for),
but the nature of this purpose differs greatly between them (suppress and enhance respectively).
She argues that this is partly due to the fact that pill itself serves an unspecified purpose, which
only surfaces in context. Similarly, related compounds involving purpose may show different
degrees of explicitness. For instance, pompe à eau and pompe à vélo both seem to rely on a
purposive relation (i.e. they can both be paraphrased using the preposition pour), but the
function of pump in pompe à vélo is far more removed from its modifier than it is in pompe à
eau. I will return to both of these examples in a moment.
211
Based on the initial description provided above, a number of NN compounds can be said to rely
on the PURPOSE relation:
(128) abri-vent, appui-tête, arrêt-buffet, chèque-repas, chou-vache, clé lavabo, passage piétons
Arnaud (2003) lists 160 such compounds and an additional 87 in complex configurations, which
lends further support to the retention of this particular relation for NN compounds.
PURPOSE does not seem to be reversible, however. In other words, there are no compounds that
are best paraphrased as “an H that an M is for.” PURPOSE seems to always originate from the
head. The only way for purpose to emerge from the modifier is if it explicitly communicates its
function in connection with the head (as in 129c), which would not only be redundant, but also
better paraphrased using the USE relation (like the more general form in 129b):
(129) a. lamp oil oil for a lamp
b. oil lamp ?lamp that oil is for → lamp that uses oil
c. [[lamp oil] lamp] lamp that lamp oil is for → lamp that uses lamp oil
These particular configurations also bring us to discuss another issue, mainly how PURPOSE and
USE may, under certain conditions, overlap. The following table illustrates the issue.
Table 5.16. A comparison of PURPOSE and USE.
Relation lamp oil oil lamp PURPOSE oil for a lamp *lamp for oil PURPOSE Reversed *oil that a lamp is for ?lamp that oil is for USE *oil that uses a lamp lamp that uses oil USE Reversed oil that a lamp uses *lamp that oil uses
For a compound such as lamp oil, PURPOSE and reversed USE both produce acceptable
paraphrases with no clear indication which of the two is preferable. There are two reasons for
this overlap. First, USE (along with PURPOSE) is an underspecified predicate. Its usage is virtually
unrestricted. One can use a bicycle, a mug, a table, yet none in the same manner. In fact, the
object of the verb need not even be an artefact (i.e. John used the rock to drive in the pegs.). The
result is that when one says that “X is for Y,” it may very well be that what one is really saying
is that “X is meant to be used by Y.” Second, because PURPOSE allows for the modifier to be
212
either the subject or object of the unspecified predicate, it will necessarily run parallel to the
reversed form of USE when it involves the modifier as its subject:
(130) a. lamp oil “the lamp uses oil” = oil for a lamp / oil that a lamp uses
b. bread knife “X uses the knife (to cut) bread” = knife for bread / *knife that bread uses
Although USE only applies to the compound in (130a), PURPOSE is well-founded in both cases.
How do we then address the overlap between PURPOSE and USE? There are a number of options
available. First, we might simply treat one of the relations as redundant and exclude it from the
analysis, thereby allowing the remaining relation to replace the other via its reversed form.
Because PURPOSE can only account for one particular set of compounds, while USE can account
for two, it is clear that the most likely candidate for exclusion is PURPOSE. The problem with this
approach, however, is that a number of compounds can only be accounted for using PURPOSE
(e.g. pause-café cannot be paraphrased as either “pause qui emploie un café” or “pause qu’un
café emploie”). By the same token, the data also lend support for the retention of USE as a
compound relation (see examples in Section 5.2.2.13). A second option is to artificially reject
the offending configuration—in this case the reversed form of USE—but this too is unlikely to
work. One need only look at some of the examples listed earlier in the section on USE to see that
PURPOSE cannot always fill the gap: for instance, croiseur-école is not in fact a cruiser whose
purpose is to be used by a school, but simply a cruiser used by a school for training purposes.
One might of course argue that such a distinction is irrelevant, but the fact remains that these
treatments do produce different interpretations. A third option might be to always default to one
particular relation when both are applicable. The question, then, is which analysis should take
precedence over the other? We might choose PURPOSE as it doesn’t involve a reversed
construction, but this will likely result in a number of borderline cases being treated under the
purposive heading (cf. croiseur-école). The fact is that there is no simple and foolproof method
with which to select PURPOSE over USE or vice-versa in cases where both seem to apply. The
only solution is to choose the relation that is most representative of its meaning. While this
approach may produce different results for different people, it is nevertheless the only one that
does not rely on artificial means to assign these particular relations.
213
Returning to the examples pompe à vélo and pompe à eau mentioned at the beginning of this
section, it was said that both compounds seem to involve the PURPOSE relation, but that the
former is far less clear about how the elements relate to each other:
(131) a. pompe à eau pompe pour eau ~ ‘pompe qui pompe de l’eau’
b. pompe à vélo pompe pour vélo ~ ‘*pompe qui pompe un vélo’
The reason these cases are relevant is that for all other relations (save USE), there is no such
ambiguity. For instance, PRODUCTION is paraphrased as “H makes/produces M” or vice-versa; if
the paraphrase is infelicitous, then the relation cannot apply. PURPOSE, however, has no such
restriction. Because no verbal predication is stipulated, the fullest paraphrase possible is “H
whose purpose is to X to M,” where X can be any number of verbs (cf. pump for the examples
in 131). Thus, although the compounds in (131) look alike and both involve PURPOSE, they
differ in a significant way. How do we account for this difference then?
One way is to allow for predication to emerge through one of the compound’s elements. In this
regard, I will use what Jackendoff (2010) calls Proper Function (henceforth PF). This function is
very similar to Pustejovsky’s (1995) Telic Role, which has since been used successfully to
explain how certain compounds may be interpretable out of context (Johnston and Busa 1996,
Bassac 2006, Bassac and Bouillon 2013). Put simply, some lexemes denote things that are
designed to perform a particular function. For instance, a knife’s purpose is to cut something,
clothing is to be worn, food is to be eaten, etc. When combined with another lexeme in a
compound, the proper function may be profiled, yielding additional material with which to
connect its elements. The following are a few examples of such compounds:
(132) a. auto-école = ‘école où l’on enseigne/apprend à conduire une auto’
b. appui-tête = ‘appui qui supporte la tête’
c. chou-vache = ‘chou que les vaches consomment’
In the first case, one will notice that two PFs are listed (cf. store = buy/sell). Jackendoff (2010)
dismisses this as a real problem, arguing that in either case, the same event is at play. I agree:
one’s interpretation will skew according to whichever perspective one chooses to adopt, which
will have little effect on whether the compound is paraphrased correctly.
214
Technically, introducing a PF mechanism only partially solves the problem I identified earlier
for the two pompe à N compounds. Although we can now explain how their constituents are
connected (via pompe’s PF), we must now state that in the case of pompe à eau, the modifier
fills the internal argument of the PF (i.e. X pumps Y), while for pompe à vélo, the modifier does
not. The result is that both compounds are identified as instances of PURPOSE, but only pompe à
eau meets all of the requirements of its PF. A corollary to this observation is that pompe à eau
may in fact be more transparent than pompe à vélo. This will be touched on again in Chapter 7
when I discuss ordering relations according to transparency effects.
Unfortunately, there is often some degree of overlap between a lexeme’s PF and a few of the
basic relations listed previously, namely LOCATION and PRODUCTION. For instance, a noun
whose designatum is a container of some sort will often target this function within the
compound (e.g. boîte à N). Other such examples are as follows:
(133) a. poche-revolver PF of poche = contain (LOCATION)
b. pâte à papier PF of pâte = used to make (PRODUCTION)
The question is whether we should then select the basic relation or the PF for these particular
compounds? The solution I have chosen to adopt is to identify purposive compounds as such,
while also providing information regarding a PF that may overlap with other basic relations. The
reasoning behind this decision is based on the research mentioned at the beginning of this
chapter, which supports an approach to compound interpretation that involves establishing
compatible features or properties between a compound’s elements. It can be argued that proper
function is a core feature of certain lexemes, especially those denoting artefacts, and that
speakers are no doubt aware of and able to utilize this information when establishing meaning
for a given combination. Thus, when the meaning of a compound is based on a relational
concept such as PRODUCTION, those that involve a lexeme with this relation as its PF should be
far easier to interpret (and thus more transparent) than those that do not. Although this may
seem to challenge the premise that the retained relations are fundamental associations for
compounding, I believe that this is, at best, a minor conflict, as nothing fundamentally prohibits
someone from treating such compounds as they would other cases involving basic relations. The
advantage, however, of including toolbox under PURPOSE with its PF as “contain” (as opposed to
215
just under LOCATION) is that it recognizes that it arguably has more in common with steak knife
(another purposive compound), than it does tree house (a locative compound).
5.3 Summary
Based on a detailed survey of the literature on compound relations, I have retained 15 basic
relations and have discussed their use with some of the NN French compounds found in my
data. The following table summarizes the results of this research:
Table 5.17. Summary of the relations retained in the present work.
Relation Basic (H REL M) Reversed (H that M REL)
HYPERNYMY chouette-effraie ?argent métal
COORDINATION auteur-compositeur ---
SIMILARITY fourmi-lion ---
FUNCTION papier filtre ---
POSSESSION* punk à chien droit d’auteur
PART tiroir-caisse stylo-bille
LOCATION centre-ville café concert
COMPOSITION disque-vinyle ?
SOURCE sauce soja chêne-liège
CAUSE piétin-échaudage arrêt-maladie
PRODUCTION appareil-photo jazz manouche
TOPIC réunion bilan ---
TIME afternoon (journée-débat)
USE bouton pression langage machine
PURPOSE passage piétons ---
Of the relations retained, eight are clearly reversible in French. HYPERNYMY may be reversible,
but the sole example present in the data does not offer incontrovertible evidence that this
particular relation may be reversed. Moreover, the French compounds examined do not seem to
support a reversed application of COMPOSITION, although there is no clear indication why this
should be the case (this relation seems to be reversible in English, for instance; cf. Jackendoff
2010). Others, such as SIMILARITY, TOPIC, and PURPOSE, defy reversibility as the result of such a
transformation would be strange or illogical.
216
Furthermore, some of these relations have shown only limited use for the compounds examined,
namely POSSESSION, TOPIC, and TIME. In fact, no NN cases of POSSESSION were found in the
data, although there are a few examples of N à N and N de N that involve this particular
relation.
Perhaps unsurprisingly, a number of compounds eluded the approach described above. In the
next chapter, I will not only examine the distribution of these relations in greater detail, but also
look at some of the more problematic compounds in my data. In addition, I will explore how
these relations apply to binary constructions that involve more lexical material, which is to say
N à N compounds.
217
Chapter 6
Compound Relations: Application Results
In the previous chapter, I conducted a detailed examination of several works on compounding
that had integrated, to varying degrees, a semantic component into their frameworks. All of
these works proposed various sets of basic relations meant to account for the unexpressed
relational associations that link together the elements of compounds. Based on this research, I
identified 15 highly recurring relations in the literature, which I then used to analyze more than
700 NN and 300 N à N French compounds, all of which were collected online from Wiktionary.
The goal of this chapter is to offer a closer look at the results of this analysis. The following
sections will focus on how the relations are distributed across the data, as well as on some of the
compounds that could not be accounted for using these relations. The first section will address
NN compounds, while the subsequent sections will concentrate on N à N compounds, a
category of French compounds that seems to involve far fewer relations, which may be
understood as a consequence of the additional material (i.e. à) present within the compound’s
structure.
6.1 NN Compounds
Given the absence of predication between elements, NN combinations may represent the most
semantically underspecified type of compound. The relations presented in the previous chapter
should be regarded as the formalization of this missing relational information. This section will
focus on their application to French NN compounds.
6.1.1 Relations
A total of 729 French NN compounds were retained from Wiktionary and each one of these was
evaluated according to the 15 relations discussed in the previous chapter. The following table
summarizes the results of this work.
218
Table 6.1. Results of relations analysis for NN compounds.
Relation Basic Reversed Total
COORDINATION 113 --- 113
SIMILARITY 112 --- 112
FUNCTION 80 --- 80
LOCATION 32 24 56
PART 8 39 47
PURPOSE 44 --- 44
HYPERNYMY 40 1 41
USE 22 10 32
COMPOSITION 20 0 20
SOURCE 7 5 12
PRODUCTION 5 8 13
TOPIC 8 --- 8
CAUSE 2 5 7
TIME 3 0 3
POSSESSION 0 0 0
Other --- --- 141
Total 496 92 729
The table reveals a number of facts about French NN compounds. First, these particular
combinations are largely dominated by relations that express an equative relationship, which is
to say COORDINATION, FUNCTION, SIMILARITY and, to a lesser extent, HYPERNYMY. These four
relations alone account for nearly 50% of all the NN compounds in my data. It therefore seems
justified to state that, in the absence of additional elements (i.e. prepositions), appositional
nouns tend to favour some sort of coordinative or copulative association. Although these types
of compounds were not under investigation in Arnaud (2003), their prevalence in French was in
fact noted. Arnaud distinguished between these types of compounds and a category he called
composé timbre poste (CTP). The distinction was based on the results of a series of judgement
tests in which Arnaud asked ten participants to indicate whether a particular sentence—
constructed using material from various compounds—was acceptable (+), unacceptable (-), or
marginal (?). The results for three compounds, all of which are included in my data and
correspond to one of three “equative” relations above, are given in the following table:
219
Table 6.2. Results of judgement tests in Arnaud (2003).
Test auteur compositeur
(COORDINATION) scie égoïne
(HYPERNYMY) oiseau mouche (SIMILARITY)
Un C est un N1 ? + +
Un C est un N2 + ? −
C’est quoi comme N1 ? ? + +
C’est quoi comme N2 ? ? − −
Arnaud’s CTP category of compounds are largely identified by those combinations for which,
under normal circumstances, most speakers would produce a “ + − + −” result pattern for his
four tests. In other words, he is strictly interested in clear cases of endocentric left-headed
compounds. According to Arnaud, this criterion automatically excludes both hypernymic and
coordinative compounds for a number of reasons. On the one hand, although hypernymic
compounds, which he calls composés génériques-spécifiques, seem to favour a left-headed
reading for most speakers, they do not typically disallow a right-headed interpretation. Thus,
both scie égoïne and chêne rouvre produce marginally acceptable results for Arnaud’s second
test (e.g. ?une scie égoïne est une égoïne). As for coordinated compounds, such as bain-douche,
café-restaurant, chasseur bombardier, and point virgule, results varied not only across speakers,
but also across items. Arnaud labels these particular types as “composés combinatoires” and
argues that they are either “theoretically” exocentric (e.g. auteur-compositeur) or weakly left-
headed (e.g. bateau lavoir) (Arnaud 2003: 10).
According to the results of Arnaud’s tests, SIMILARITY based compounds should be treated as
CTPs as they usually produce a “ + − + −” pattern for most speakers (see oiseau-mouche in
Table 6.2), but Arnaud dismisses them on the basis that such compounds rely on an analogical
or equative relation that is too variable to analyze accurately. I agree with this assessment: as I
showed in Chapter 5, SIMILARITY is the loosest of the retained relations as it may involve any
number of different features in order to establish meaning (e.g. looks like, smells like, tastes
like, etc.). I would argue, however, that this particular type of association is sufficiently
important (and frequent) to warrant formalization, even if this formalization is only partial. It is
entirely possible that speakers, aware that SIMILARITY is a core relation linking elements to one
another, are able to establish the exact nature of the compound by using other means (i.e.
extralinguistic knowledge). Fundamentally, SIMILARITY may in fact be a strong indicator of
220
compound meaning despite its variability, which is partially supported by its frequency in the
data (i.e. approximately 15% of the NN compounds examined rely on some sort of analogical
association). Moreover, it is a highly frequent relation in the literature, which suggests that most
researchers had to take into account its prevalence in their own data. The same might be said of
FUNCTION (e.g. circuit tampon), although here the analogy is much clearer: one thing does what
the other does—it serves the same purpose. While it’s true that Arnaud doesn’t specifically
discuss these types, there is little reason to believe that he would view them outside his
composés équatifs/analogiques (which include those involving similarity), which explains why
we find no such relation in his list of classes105.
A second point to notice regarding the results in Table 6.1 above is that reversibility is in fact a
limited operation. Not only are equative relations impossible to reverse, some reversed relations
are not present in the data despite the fact that there is little a priori reason for there not to be
(i.e. COMPOSITION and TIME). Furthermore, only PART shows a strong preference for its reversed
configuration (i.e. H that M is part of). French compounds involving a partitive106 relation
therefore follow the whole-part template, which corresponds to Warren’s (1978) findings for
English (see Table 5.4). This fact seems counter-intuitive, however, given that these two
languages are known to involve different head positions. Compare the following sets of
compounds:
(134) a. Part-Whole: eng. wheelchair fr. moteur-fusée
b. Whole-Part: eng. spoon handle fr. stylo-bille
This unexpected correspondence may be due to the different sizes of data examined (Warren’s
data contains 4,500 compounds, whereas mine contains just over 700), but it certainly raises a
number of questions regarding the head’s role in partitive compounds. It seems that for such
compounds in English, the head is more likely to denote the part component, while in French
compounds, the head will denote the whole component. One possible explanation is that French
105
Arnaud does include an être class in his typology, but it always involves some additional information, such as style of, status of, state of, etc. and never similarity or function. 106
It should be noted here that the term partitive is used to refer generally to part-whole associations and not to the traditional French grammar usage involving the preposition de.
221
may rely on other constructions to express part-whole entities, namely N de N sequences (e.g.
bras de levier, tête d’épingle, corne de chevreuil, etc.). If these particular combinations do in
fact outnumber part-whole NN compounds in French, then we might begin to explain the
observations mentioned above. It certainly remains something worth further exploration.
Nevertheless, these results for French NN compounds reflect those from Arnaud (2003), whose
data contained only whole-part combinations. Most of the part-whole compounds identified in
my data allow for other analyses (e.g. moteur-fusée = PURPOSE: moteur pour une fusée), which
would, again, support an analysis of partitive French compounds in which whole-part is
expressed via NN constructions, while part-whole is expressed via N de N constructions.
The other two relations that might also favour a reversed template in French, although perhaps
only slightly, are PRODUCTION and CAUSE. These two relations seem to lean toward a
construction that puts the result in head position, thus producing the patterns result-causer and
product-producer. Interestingly, Arnaud’s (2003) analysis of compounds involving production
seems to indicate the opposite pattern:
(135) a. N1 that produces N2107 = 22 tokens
b. N1 that N2 produces108 = 13 tokens
An examination of his data, however, reveals that his interpretation of production is far broader
than mine, which might explain why our results don’t seem to coincide. For instance, Arnaud
treats compounds such as route collision and marteau reflex as “N2 produced by N1,” whereas
they would have been treated as LOCATION and CAUSE respectively in my framework. Thus, the
number of compounds in (135a) above is much higher in Arnaud’s data than it would have been
in mine. Short of analyzing all of Arnaud’s data according to my approach, there is no way to
know whether the patterns I identified for both PRODUCTION and CAUSE match his. Regardless,
there are so few compounds involving these relations in my data (13 and 7 respectively) that
stating unequivocally that a certain directional pattern is favoured over the other would be ill-
considered.
107
Included here are compounds paraphrased as “N2 is produced by N1 (such production is not its function)” and “N2 is produced by N1 (such production is its function).” 108
Included here are compounds paraphrased as “N2 is an artefact used to produce N1.”
222
Looking at the other relations present in the data, we find that the highly related LOCATION and
PART are quite frequent. As was discussed in the previous chapter, it can sometimes be difficult
to distinguish between these two relations given that when something is a part of something
else, it is in fact located in it (e.g. bloc-cylindres). Nevertheless, these two relations are well
represented in the data, which largely coincides with other works. Not only are these relations
frequently discussed, but they are also quantitatively important (i.e. Warren 1978). It should be
noted that LOCATION shows only a slight preference for the basic template (i.e. H is located
on/at/in M) over its reversed form (i.e. H on/at/in which M is located). We might therefore say
that the LOCATION relation is fully reversible.
Another set of related relations that appear frequently in the data are PURPOSE and USE.
Although some purposive compounds such as pause-café and clé lavabo are relatively
uncontroversial, others might not be treated in the same manner due to the availability of a
reversed USE. For example, the compounds passage piétons and timbre-poste, analyzed as cases
of PURPOSE here, also allow for an alternative treatment (i.g. passage que les piétons utilisent).
The same can be said for some of those included under USE and which could be treated
differently (e.g. langage machine). The result is that the numbers given for PURPOSE and USE
might be different under a modified analysis. That said, there is little doubt that these relations
are present in the data in some form or another and that they should be factored into a semantic
theory of compounding.
The remaining relations are of less significance. Despite their low numbers, relations such as
COMPOSITION, PRODUCTION and CAUSE are unlikely to be discarded given their near universal
status in the literature on compounding. POSSESSION, absent for NN compounds, was retained
solely on its prevalence in the literature and its potential use for other constructions (i.e. N de
N). Relations such as TOPIC and TIME, however, might not fare so well. These are in fact so
marginal that one might wonder whether they are indeed worthy of retention. My earlier
discussion of these two relations mentioned that other researchers observed constraints on their
application, which is to say that they typically only appear in compounds where one element
shares the same semantic space as the relation. Therefore, compounds involving TOPIC tend to
contain nouns such as book or meeting, while temporal compounds contain nouns such as day or
season. The few compounds identified here for these particular relations seem to support this
assertion. Given the limited size of my dataset, however, it seems wise to refrain from making
223
sweeping claims regarding these relations based on numbers only. While it is true that one
should not expect an even distribution of relations across all compounds, it may still seem
surprising to see some of these basic relations come up so seldom in the data. One possible
reason for the disparity is that the dataset itself is either too small or biased to be representative
of French NN compounds as a whole. It is in fact difficult to say whether the French Wiktionary
is skewed in favour of certain combination types, which would undoubtedly influence what
types of relations are present. Compiling and analyzing similar compound data from other
sources might provide greater insight into the distribution of the retained relations. That said,
Arnaud’s 810 compounds were taken from a number of different sources using no particular
collection methodology and his distribution of relations is just as asymmetrical as mine. The
most frequent high-level relation in his data, N1 → N2, is used with 269 compounds, while N2
SYMB N1 only with 3. Simply put, some relations are used more frequently than others within
compounds.
A second explanation for some of the more uncommon relations has to do with overlap. As I
discussed in the previous chapter, a few compounds share the same semantic space and could,
under certain circumstances, be reducible to a single relation. For instance, some have viewed
PRODUCTION and CAUSE as sufficiently related to combine them into a single relation (Downing
1977). Doing so would produce a relational class containing 18 distinct compounds, a perhaps
more robust number. Moreover, the approach adopted in this work, which is to say the
integration of reversibility into the formalism, also leads to the dilution of some relations. As
was discussed previously, a few of the retained relations overlap in their reversed forms. Thus
SOURCE (X made from Y) overlaps with the reversed forms of both PRODUCTION (X that Y
produces) and CAUSE (X that Y causes). Similarly, the line between PURPOSE and USE is often
blurred because of the reversibility of the latter (X is for Y ~ X that Y uses). Eliminating
reversibility and combining some relations would help to reduce the number of low frequency
relations, at the cost of losing some information. We may find, however, that with a
significantly larger corpus, some of the more marginal relations might have a greater overall
presence. This may also be true if we were to extend the analysis to other structures.
It is also worth noting that of the 729 NN compounds examined here, 63 were identified as
exocentric. Many, but not all of these compounds are lexicalized and therefore do not involve
224
any discernible relation. There are, however, a few exocentric compounds that make use of
some of the relations retained, as in the following examples:
(136) a. ballon-panier LOCATION ‘sport’
b. jambon-beurre COORDINATION ‘sandwich’
c. chèvre-pied PART ‘mythical creature’
These instances of exocentric compounds involving a particular relation are not frequent (18
items) and mostly make use of coordinative and locative associations.
All in all, approximately 78% of the 729 NN compounds retained from Wiktionary could be
accounted for using the 15 relations discussed in the previous sections. If we ignore all
lexicalized compounds, which is to say those for which there exists no discernible relation
between their constituents, then that number rises to 92%. It would seem that the relations
retained are largely able to represent the majority of semantically motivated NN compounds in
French.
Of the compounds that could not be accounted for using the relations detailed in Chapter 5,
many did pattern together according to a variety of features or factors. Although some of these
compounds are so lexicalized that their meanings are not related in any way to those of their
constituents, others rely on other, less general relations to join their elements. The following
section will discuss in greater detail some of the compounds that eluded my analysis.
6.1.2 Residual NN Compounds
A total of 141 NN compounds do not seem to involve any of the 15 retained relations, 26 of
which are exocentric containing semantically unrelated elements and thus impossible to treat
synchronically (e.g. cap-mouton, chef-mets, coq-souris, etc.). These are disregarded for the
moment, but it may already be said that these compounds are most likely opaque to most
speakers, placing them alongside simplex lexemes. The remaining unanalyzable compounds fall
under a number of different categories, some of which are related to the methodological
decisions outlined in Chapter 3. Most unanalyzed compounds, however, are simply too
idiosyncratic to be said to involve general or basic relations.
225
6.1.2.1 Idiosyncratic and Partially Unrelated Compounds
Of the unanalyzed endocentric compounds, 24 contain a modifier that has no bearing on the
meaning of the whole:
(137) a. aube-vigne from the latin albus, meaning white109
b. chou-croûte from the german kraut, meaning herb110
c. laurier-tin deformation of thymum or tinus111
These compounds are similar to exocentric compounds whose modifier may retain its meaning
(e.g. chat-château), but they differ in that it is still possible to determine what the compound
denotes (i.e. un laurier-tin est un laurier ~ *un chat-chateau est un chat/est un château).
In some cases, a compound will preserve the meaning of its constituents, but the association
between them remains sufficiently idiosyncratic that any attempt to label them using the
retained relations would require that they be manipulated significantly. Forty-two such
compounds were identified in the data:
(138) a. dent oeillère “Chacune des canines de la mâchoire supérieure ainsi appelées
en raison des douleurs vers l'oeil qu'elles peuvent provoquer112.”
b. laine renaissance “Laine qui provient des déchets résultant du détramage des vieux
draps et des chiffons113.”
109
“Étym. Aube, de albus, blanc, et vigne” (Littré 1873, Vol. 1). 110
“Étym. Allem. Sauerkraut, de sauer, aigre, sur (voy. sur, adj.), et, Kraut, herbe, l'assimilation avec chou ayant altéré sauer” (Littré 1873, Vol. 1). 111
“Cf. laure. Ac. 1694 et 1718 : -thin; 1740-1798 : -thym [. . .] Var. qui s'expliquent par confusion entre thym < thymum et tin < tinus (?)” (laurier-tin, TLFi). 112
TLFi. 113
Lacroix, Eugène. 1884. Arts et métiers des manufactures, des mines, de l’agriculture, etc. Tome III. Librairie scientifique, industrielle, et agricole: Paris. 24.
226
c. retraite-chapeau “Retraite complémentaire d’un montant élevé allouée à certains
dirigeants d’entreprise ou à leurs salariés en récompense de leurs
services éminents114.”
A few of these cases might allow for the application of a particular relation, but such an analysis
is often strained at best:
(139) a. cocotte minute ‘cocotte qui fait cuire très rapidement’ (TIME)
b. eaux vannes ‘eaux souillées’ (SOURCE/LOCATION)
c. médicament conseil ‘médicament acheté sur le conseil (SOURCE)
du pharmacien115’
Of course, nothing truly prohibits one from choosing a particular relation for the compounds in
(139), but the approach adopted here makes treating them in this manner problematic. One will
recall that the relations retained are all meant to apply using a simple paraphrase, such as X
causes Y, X is similar to Y, etc. while also being sufficiently informative to disambiguate the
compound. Although the relations may not capture every subtlety or nuance of a particular
compound, they should communicate most of the semantic material held between its
constituents. For this reason, the compounds in (139) above were all treated as idiosyncratic. For
instance, although the compound cocotte minute involves a temporal sense (albeit
metaphorically), the TIME relation alone fails to account for the ‘cooking’ function derived from
the head constituent via its proper function (i.e. ‘pot whose proper function is to cook quickly’).
Nor does SOURCE fully account for the predication between the elements in either eaux vannes
or médicament conseil. The point here is that not all compounds that preserve the meaning of
their constituents can easily be assigned a relation. While opinions on a given relation’s
appropriateness may vary from person to person, the compounds in (139) do not make use of the
retained relations according to the parameters established in Chapter 5.
114
Wiktionary: <http://fr.wiktionary.org/wiki/retraite-chapeau>. This compound is a relatively recent coinage: a Google NGram query suggests that it was introduced in the early 1980s. This contrasts with a compound such as laine renaissance, which, according to the TLFi, is attested in the 19th century. It would seem, then, that idiosyncratic relations are not necessarily the result of semantic drift. 115
Wiktionary: <http://fr.wiktionary.org/wiki/médicament_conseil>
227
Finally, the data examined also contain three compounds involving reduplication. They are
listed in (140) below.
(140) ami-ami, noeud-noeud, train-train
While these particular constructions could have been discarded prior to the analysis, they were
nevertheless retained because they met the criteria outlined in Chapter 3, namely that they are
pairs of appositional French nouns. Unfortunately, very little can be said about these particular
cases, as they typically involve magnification or intensification and not a relational association.
6.1.2.2 Nouns and Adjectives
This brings us to discuss other cases that met the criteria for inclusion in the dataset, but which
nevertheless introduced an additional layer of relational functions beyond the scope of the
retained relations and which therefore fall within the 141 cases deemed unanalyzable.
Not only are many lexemes polysemous, but they may also belong to different lexical
categories. The French lexicon is no exception—it is filled with words that straddle the
boundaries between nouns and adjectives and verbs. In some instances, these multi-faceted
lexemes escape the analysis put forth in Chapter 5. A total of 23 NN compounds examined
involve nominal modifiers that function more like adjectives than they do nouns, instances that
Noailly (1990) refers to as “attributive nouns” (substantif épithète). Some examples are as
follows, where the head is underlined:
(141) a. boeuf mode, dose limite, écart type, grandeur nature
b. maître cylindre, chef lieu
The inclusion of these types in my dataset is largely a consequence of the method I used to
identify the lexical categories of each of the individual lexemes collected. Because I relied
solely on the headings used in LPR 2010, I was forced to label the above modifiers as nouns and
not as adjectives (I will refer the reader back to Chapter 3, Section 3.4.2 for a detailed
explanation of the methodology adopted). The compounds in (141) are in fact probably best
228
viewed as cases of N-A and A-N compounds. Mathieu-Colas (1996) labels these types of
combinations as N / A=n and A=n / N respectively116.
The relation that might be said to hold between the constituents of the compounds above is
attributive in nature: the non-head qualifies the head in the same manner that an adjective might.
In fact, if we consult lexicographic works, some of these nouns are often described in a sub-
entry as having an adjectival function. For instance, LPR 2010 provides an entry for nature as
“adjectif invariable” even though its lemma is labeled as a noun. Right-headed compounds such
as those in (141b) further highlight the possibility for preposed nouns to shift to a more
adjectival function as this is French’s preferred position for adjectives. Boeuf mode, while
similar to the other compounds in (141a), is somewhat unique as it is most likely a reduction of
the expression boeuf à la mode.
It certainly would have been possible to introduce a relation such as QUALIFY or ATTRIBUTE to
account for these types of compounds, but this approach would have only weakened my
analysis: all of the relations introduced earlier make use of basic predicates to link a
compound’s constituents, but no such approach is possible with adjective-like nouns. We may
instead consider them cases of shifted A-N or N-A compounds where the modifier noun is
involved in a copula-attribute relationship (i.e. dose qui est limite, écart qui est type, grandeur
qui est nature, etc.). In instances where such an intepretation is either difficult or impossible
(e.g. boeuf mode), the speaker may be required to establish meaning via other means (i.e.
analogy: boeuf mode → boeuf préparé à la mode). The difference here may very well depend on
whether a particular noun is known to function as an adjective.
It is unfortunately unclear how these types are to be viewed in terms of semantic transparency.
They are typically compositional in the same way as other relational compounds are, but
headedness may never pose a problem to the speaker given that French adjectives are either
116
Although this is partly conjecture, as most of the compounds in (141) are not found in his article detailing his nomenclature, the labels reflect the output of his methodological approach. Of the compounds formed on maître, he says: “Les composés en maître(-)N prêtent à ambiguïté pour le premier élément ; nous proposons de les regrouper dans une sous-classe des NN (maître cuisinier, maître imprimeur, maître-assistant, maître chanteur, etc.), sauf pour les non-humains, rattachés aux AN (maître-autel, maître-cylindre, maître couple, etc.)” (Mathieu-Colas 1996: 43). I have treated all such cases as attributive nouns (cf. Noailly 1990).
229
preposed or postposed. This issue will be revisited in the next chapter when the relations are
incorporated into the feature set proposed in Chapter 4.
6.1.2.3 Classificatory Relation
Closely related to the adjectival association discussed above is a relation that serves what
Jackendoff (2010) calls a “classificatory” function, which is to say that the modifier is meant to
set the head apart without actually providing additional meaning. His examples are beta cell, X-
ray, Leyden Jar, etc. These particular cases differ from those involving nouns functioning as
adjectives because the former contain lexemes that do not actually possess semantic content.
Similar compounds are found in my data:
(142) facteur sigma, mâle alpha, particule bêta, rayon gamma
These types of combinations could also be said to involve attributive nouns, but unlike the
examples in Section 6.1.2.2, the modifiers in (142) above contribute little to no meaning to the
compound. In other words, beta in beta cell is strictly used to distinguish it from other types of
cells and is not otherwise meaningful. An appropriate paraphrase might be “H of type M” (e.g.
cell of type beta). These particular instances of compounds are not likely to be easily understood
by the layperson and no doubt require specific knowledge of the fields in which they are
typically employed. There are 14 such cases in my dataset, all of which involve a Greek letter
used in a highly technical context.
6.1.2.4 NN Compounds Involving Nominalizations
As has already been discussed, a number of works on compounds have relied on syntactic
relations to account for combinations that involve nominalizations, such as chewing gum or
dishwasher. Because Lee’s (1960) work was transformational in nature, these verbal-nexus (or
synthetic) compounds were easily accounted for using a syntactic approach (i.e. someone chews
gum → chewing gum). Similarly, Adams (1973) includes two major syntactic classes of her
own: Subject-Verb (e.g. filing clerk → the clerk files X) and Verb-Object (e.g. drinking water
→ X drinks water). Even in less syntactic oriented frameworks, there is usually some way to
account for compounds in which the elements are in a head-complement relation (e.g. V-Obj in
Lauer 1995; ARGUMENT in Jackendoff 2010). These relations are used not only to represent
230
synthetic compounds of the type N V-er (e.g. truck driver, dishwasher, windbreaker, etc.
Roeper and Siegel 1978, Lieber 1980, 1992, Selkirk 1982, Botha 1984), but also to account for a
number of other constructions containing deverbal heads, such as habit-forming, hair-raising,
and time-consuming (Adams 1973). Synthetic compounds in French are typically of the V-N
type (e.g. ouvre-bouteille, lave-vaisselle, essuie-glace) and are thus equally open to more
syntactically motivated relations, although these constructions pose their own set of challenges
given the inflected nature of the head (Rosenberg 2008, Villoing 2009). Considering that
nominalization is a productive morphological mechanism (Chomsky 1970), it is therefore likely
that French NN compounds containing a nominalization could involve the base’s internal
arguments. A few such cases are in fact present in my data, all of which were identified using an
ARGUMENT label:
(143) a. administrateur réseau, groupement phosphate, photo-interprétation
When the non-head element is the direct argument of the head, the compound may be
paraphrased by of (fr. de): administrateur de réseau, groupement de phosphate, interprétation
de photo. This particular fact is also paralleled by a number of established N de N constructions:
augmentation de salaire, gestion de risques, modulation de fréquence, etc. As for those cases
where the modifier is in fact the external argument, they can usually be accounted for using an
appropriate reversed relation (e.g. H that M causes/makes/uses/etc.).
A number of authors have, however, expanded the scope of their argument relations to include
other types of compounds, such as those containing zero-affix derivations (e.g. V-N compounds
in French), but this approach is hindered by a number of issues, none more troublesome than the
possibility of incongruous application: should hair brush, for instance, be interpreted as ‘X
brushes hair’ (cf. Adams 1973, Jackendoff 2010) or should it simply be treated as other non-
predicating compounds (i.e. “a brush for hair”, cf. most authors)? Adams does in fact allow for
this duality of classification, but more often than not, opts for the syntactic treatment, an
approach that sometimes leads to questionable choices: semantically, is there really a difference
between cold cure, sorted under her Verb-Object class (i.e. ‘cures cold’: 67), and cough mixture,
which she includes under the instrumental class (i.e. ‘mixture which cures a cough’ 73)? In
essence, it seems strange to treat compounds such as hair brush and steak knife differently (as
Jackendoff does, for instance) simply because the latter happens to involve a morphologically
231
unrelated predicate (i.e. cut). By the same token, should pompe à eau be treated as a verbal
nexus compound because the head happens to look like its corresponding verb?
Jackendoff (2010) in fact grants his ARGUMENT function a great deal of power. This particular
function not only groups together the aforementioned deverbal headed constructions, but also a
number of compounds containing simplex nouns he argues possess semantic arguments. These
compounds permit an “of” (144a) or “by” (144b) periphrasis (N2 of/by N1):
(144) a. wardrobe color, food surplus, sea level, speed limit, union member
b. helicopter attack
According to Jackendoff, we are dealing with a non-head as argument if it saturates the
corresponding slot within the head’s argument structure (as in 144a) : *wardrobe color of her
clothes, *food surplus of potatoes (436)117. Although this test is sound (it clearly works for true
synthetic compounds), it can produce odd results for compounds that may also be said to
possess an internal argument, but which Jackendoff lists elsewhere under other functions:
(145) a. PART fingertip *fingertip of the thumb
c. LOC sandbox *sandbox of stones
One might also ask why Jackendoff only lists one compound with an internal argument (i.e
helicopter attack) and why it is not simply subsumed under his CAUSE or MAKE functions. In
fact, many of the compounds listed in those groups are nearly indistinguishable from helicopter
attack, such as sunburn under CAUSE or knife wound under MAKE. Of course, there may be a
genuine necessity to include syntactic based functions to account for NN compounds, as
evidenced by the examples mentioned earlier in (143): not only are these compounds difficult to
analyze using only basic functions, but doing so would arguably mask how their elements are
truly related. Unsurprisingly, Arnaud (2003) also includes the following three relation
actancielle low-level relations in his classification, which seem to target exactly these types of
compounds in French:
117
In cases where the non-head noun itself can take an argument, Jackendoff argues that expansion remains possible through argument inheritance (e.g. the wavelength of the light = the length of [a wave of the light], 436-437).
232
(146) a. N2 est le patient du procès représenté par N1 (saisie contrefaçon)
b. N2 est l’objet de l’activité de N1 (ingénieur système)
c. N2 est l’agent/source du procès représenté par N1 (opération commando)
Of these three types, (146a-b) are compounds with modifiers that are internal arguments of their
respective heads (the example in 146a being a much clearer instance of a nominalization than
the one in 146b). The relations used in this work cannot account for these particular compounds.
Opération commando in (146c), however, may be accounted for using a basic relation (e.g.
PRODUCTION) as the modifier is instead an external argument (cf. helicopter attack). Based on
the difficulties discussed above—mainly related to zero-affix nominalizations in head
position—I have chosen to label as ARGUMENT only those compounds that involve heads
possessing an argument slot directly filled by the modifier. Compounds such as pompe à eau do
not meet this requirement because the modifier is actually filling an argument slot of an
unspecified predicate (i.e. pomper). Compare this to compounds such as mise à niveau or
passage à vide and this distinction becomes clearer. The French compounds mentioned in
(143a) and repeated below in (147a) are therefore grouped under this non-conceptual relation
(ARGUMENT)118. Although most instances seem to involve an overt affix, this is not always the
case, as shown by the compounds in (147b):
(147) a. administrateur réseau, groupement phosphate, photo-interprétation
b. retour chariot, mort-chien, auto-stop
Although not under investigation here, V-N compounds would also be labeled using the
ARGUMENT relation as they too involve a non-head constituent that directly fills one of the
head’s internal argument slots.
6.1.3 NN Compounds: Conclusion
Of the 15 relations retained in the previous chapter, 14 were present in the dataset examined.
Only POSSESSION did not seem to apply to NN compounds in French. One possible explanation
118
All compounds that could not be analyzed using one of the 15 retained relations are included under the category “Other” in Table 6.1, which includes compounds labeled as ARGUMENT as this is not a denotational relation.
233
for this particular relation’s absence is that French genitive is marked nearly exclusively by the
preposition DE (and, to a lesser extent, à; this will be discussed in the following section).
Although English also uses a preposition to indicate possession (i.e. OF), it also makes use of
the ’s morpheme, which may be dropped at some stage of the compounding operation (e.g.
butcher’s knife → butcher knife).
The evaluation of the relations finds that French NN compounds favour equative relations, such
as COORDINATION, FUNCTION, SIMILARITY and HYPERNYMY. Other highly frequent relations (i.e.
with more than 40 items in the data) are PURPOSE, PART and LOCATION. Although the other
relations discussed earlier were also present in the data, they were not nearly as well-
represented. In some cases, there were so few instances of a given relation that other analyses
might instead be preferred (e.g. TIME and TOPIC). Overall, however, the set of relations discussed
in Chapter 5 remained highly relevant for NN compounds, which lends support to the notion
that compounding relies on basic, recurrent relations to bind constituents. It remains to be shown
whether the same can be said for other types of constructions, such as N à N. These particular
combinations will be the focus of the next section.
6.2 N à N Compounds
As was mentioned in Chapter 3, a small number of N à N compounds were collected from
Wiktionary with the intent of expanding upon the discussion of semantic transparency as it
might apply to more semantically restrictive constructions. The relations retained in the previous
chapter were therefore applied to 319 N à N compounds. Before exploring in further detail the
results of this analysis, I will first briefly examine some of the previous work on the preposition
à in an attempt to determine how its semantic role has traditionally been viewed. This will be
followed by a look at the N à N compounds retained here from the perspective of the relations
used in the present work.
6.2.1 The Preposition À
French compounding may involve a number of different prepositions as evidenced by Mathieu-
Colas’s (1996) inventory of French nominal compounds, which includes not only the major
linking units à, de, and en, but also other, less frequent prepositions as well, such as avec, entre,
234
par, pour, sans119, etc. It is de, however, that boasts the most prominent place in his inventory,
which not only coincides with my data from Wiktionary (2234 occurrences of N de N,
compared to 319 occurrences of N à N), but also with textual frequency numbers cited
elsewhere (see Saint-Dizier 2006).
Despite these quantitative differences, à and de (and to a lesser degree, en) have both long been
viewed as semantically empty or underspecified prepositions (cf. prépositions incolores, Spang-
Hanssen 1963). Their usage can be highly variable and their meanings are often so numerous
that they very nearly cease to be meaningful. That said, most researchers agree that de is far
broader in meaning (and thus emptier) than à (Melis 2003). Much of the research on
prepositions has in fact focused on distinguishing between these two particular compounds as
they are sometimes in complementary distribution. According to Cadiot (1997), one major
difference between them is largely notional in nature: the use of à is intentional, while de is
extensional120. Thus, the distinction speakers make between verre à vin and verre de vin is
based on sense and denotation (or type and token). Verre à vin is used to refer to an entire class
of objects, while verre de vin can only be understood as denoting an instance of that particular
class. It has also been argued that à and de, like a number of other prepositions (sur/sous,
dans/hors, etc.), oppose each other in a locative sense (Cervoni 1991). This certainly seems true
of à and de from a directional perspective: à usually stands in for destination (i.e. je vais à X)
and de for origin (i.e. je viens de X).
Semantically, it may be difficult to determine with exact certainty just what meanings are
expressed by à. Dictionaries offer a number of different headings, some of which are quite clear,
while others are less so. Le Petit Robert (2010), for instance, lists the following four major
senses (for comparison, the TLFi lists seven):
119
The status of constructions involving these prepositions as compounds is in fact debated (Fradin 2009), but this is of little importance to the present work. 120
Cadiot defines the constrasting pair intentional/extensional using several criteria, but generally, intention is understood as the content of a word or expression, whereas extension designates the class of objects that the word or expression may potentially refer to (Cadiot 1997: 51-52).
235
(148) I. Introduisant un objet direct (1)
II. Marquant des rapports de direction (6)
III. Marquant des rapport de position (4)
IV. Marquant la manière d’être ou d’agir (5)
These particular senses do not offer much in the way of specifics and so the editors have
attempted to further nuance the various meanings this preposition might possess by further
subdividing them into more precise usages. The numbers in parentheses indicate how many
entries each of these senses is said to have, which means that, according to LPR2010, à
possesses at a minimum 16 different meanings, many of which involve verbal, adjectival and
nominal complements (see Spang-Hanssen 1963 for an overview of what these usages entail).
Similarly, according to the 15th edition of Le bon usage (2011), à, when introducing a nominal
complement, is mostly used to express one of three broad relations:
(149) a. Possession/Belonging (Sections 1048 and 352)
b. Distribution (Section 1048) e.g. kilomètre à l’heure
c. Location (Section 1049)
These relations are perhaps all that remain of à’s once rich and diverse usage. According to Le
bon usage: “The scope of the preposition à was once far more extensive than it is today. It could
be used in a number of sentences where we now use avec, dans, de, en par, pour, selon, sur,
etc121” (2011: 1396). The multiple ways in which à is used are on full display in the numerous
comparisons this resource offers in its section on this particular preposition.
More descriptive than it is explanatory, Le bon usage does occasionally offer some insight into
the sort of restrictions that govern the usage of this preposition. Regarding the distinction
between the use of à and de in pre-nominal position, they observe that “[n]ominal complements
121
“Le domaine de la préposition à était autrefois beaucoup plus étendu qu’il n’est aujourd’hui. Elle pouvait s’employer dans bien des phrases où nous mettons avec, dans, de en, par, pour, selon, sur, etc.” (Le bon usage, 15th Edition, 2011: 1396).
236
denoting containers are headed by à if it refers to a destination [pot à eau] and de when it refers
to its contents [pot d’eau]122” (2011: 463).
At its core, research on the preposition à is relatively convergent. Most works have looked to
isolate and describe the various usages of this preposition—and therefore its meaning—
according to the contexts in which it can appear. Oftentimes, these studies seek to reduce the
preposition’s usage to broad and highly general classes. Spang-Hanssen (1963), for instance,
splits à’s pre-nominal role into two groups. The first is unrestricted and may involve one of
three functions or relations: to indicate belonging (e.g. le costume à mon ami), to introduce an
abstract noun following il y a (e.g. il y a une raison à ce crime), or to mark, in a broad sense,
direction and destination (e.g. conduire à l’hôtel). Spang-Hanssen also discusses what he calls
conditioned usages of à, which are used to mark either a characteristic (e.g. armoire à glace) or
a purpose (e.g. canne à pêche)123. Bosredon and Tamba (1991), on the other hand, argue for a
far narrower approach to à and state that its usage falls within two general semantic paradigms.
The first involves combinations in which the second element is “a part, a property, a distinctive
definitional feature of the referent124” (51). They label this class using the preposition avec as it
can be substituted for à in most cases. The second paradigm, for which they mention the
paraphrases “that serves to, that is destined for, that functions using,” (51) is represented using
the preposition pour, again because it may be substituted for à. Examples for each class are as
follows:
(150) a. avec: casquette à carreaux, sac à rabat, chaussures à talons, papier à fleurs
b. pour: ver à soie, moule à gaufres, sac à dos, brosse à dents, homme à femmes
Although this two-way distinction serves Bosredon and Tamba’s purposes, one will no doubt
notice just how different some of the compounds within the same class are. This is no doubt due
122
“Les compléments des noms désignant des récipients sont introduits par à s’il s’agit de la destination et par de quand on envisage le contenu” (Le bon usage, 15th Edition, 2011: 463). 123
Spang-Hanssen also discusses a third conditioned usage of à, which involves predicative expressions such as c’est folie à vous de le croire and c’est aimable à vous d’être venu. (1963: 125). 124
“Le premier [. . .] concerne des [noms composés motivés] [. . .] dont le F2 est présenté comme une partie, une propriété, un trait définitoire distinctif du référent” (Bosredon et Tamba 1991: 50-51).
237
to their reliance on other multipurpose prepositions as classifiers. While it is arguably true that
both pour and avec are far more meaningful than à, they unfortunately remain highly
polysemous and are not always sufficiently meaningful to further disambiguate a particular
combination (see discussion in Section 5.2 for an argument against the use of prepositions to
classify compounds). This is certainly true for many of the compounds in the POUR class,
which includes such widely different compounds as ver à soie and sac à dos. Despite this
shortcoming, these prepositions come up often as possible alternations that serve to distinguish
between N à N constructions.
Anscombre (1999), following his earlier work on compounds (1990), offers three types of
predicative relations for N à N compounds. The first, which he refers to as locative, is
paraphrased as “in some N1, there is an N2” (62). These compounds typically involve a
container/contained relationship (cf. comments in Le bon usage) and usually alternate with N de
N constructions:
(151) a. pot à fleurs dans le pot, il y a des fleurs (pot de fleurs)
b. réservoir à essence dans le réservoir, il y a de l’essence (réservoir d’essence)
c. jouet à piles dans le jouet, il y a des piles (*jouet de piles)
Although all three compounds can be paraphrased using a locative sentence, only those in
(151a-b) allow for conversion to an N de N construction. According to Anscombre (1992: 162),
jouet à piles in (151c) instead belongs to his second type of N à N compound, which he labels as
a statif (or suite non actancielle). This relation is characterized by a number of different stative
verbs, such as avoir, être avec, posséder, etc. and is paraphrased as “Some N1 V N2.” Other
examples are stylo à bille, meuble à tiroir, and verre à pied. Given the description Anscombre
provides for this particular class, one could argue that it attributes to à either a possessive or
partitive relational sense.
The third type of predicative meaning identified by Anscombre, and which he calls processif (or
suite actancielle), is also paraphrased as “Some N1 V N2,” but here the verb may denote any
number of dynamic actions. Compounds that represent this type are type à histoires, moulin à
café, and homme à femmes. In essence, this type corresponds to Allen’s (1978) articulation of
the Variable R, which is to say that à is a placeholder of sorts for an undefined relational value.
238
Anscombre offers three basic tests to determine which type a given N à N construction might
belong to, but in some cases these tests are either inconclusive or rely on additional
argumentation in order to explain counterintuitive results. For instance, according to his tests,
homme à femmes is locative in nature, but he admits that this doesn’t make any sense and thus
argues that it is in fact an active sequence using other lines of reasoning (1999: 65). Of course, it
isn’t necessarily the effectiveness of his tests that is most important here, but rather his attempt
at circumscribing the possible meanings of à in an N à N sequence. Although his first two types
might be interpreted as locative and possessive/partitive respectively, his third type is far more
ambiguous and may involve a number of different meanings, which could potentially be filled
by some of the relations I discussed in Chapter 5.
Cadiot (1997), in his comprehensive work on French prepositions, identifies four types of
relations expressed by à in N à (DET) N constructions, each of which has a number of sub-
types. His four main types, which are summarized below, include the two types identified by
Bosredon and Tamba (1991), but are supplemented with two additional classes.
(152) i. à/pour bac à sable, cuiller à café, chair à saucisse, fer à cheval,
brosse à dents, canon à neige, aide à la traduction
ii. à/avec casquette à carreaux, chaussure à talon, steak au poivre,
armoire à glace, brioche aux noix, char à bancs, canot à moteur,
auteur à succès
iii. à/CIRCONS blessure au bras, réponse à chaud, lutte à mort
iv. à/META mort aux rats, chair à canon, pot aux roses
The first type (pour) includes several related and unrelated sub-types. Perhaps most noteworthy
is that, although this class is meant to group together N à N compounds that may allow for the
substitution of pour for à, Cadiot is only slightly concerned with the expression of purpose.
While he does refer to this relation as destinative, his analysis is far more locative in nature.
Thus, he not only talks about container/content combinations (bac à sable), but also about
bearer/borne (fer à cheval), degrees or types of physical contact (brosse à dent ~ colle à bois),
etc. This class also includes compounds paraphrased as “N1 produces N2” (canon à neige).
239
As for the second type, it mostly consists of part/whole constructions, which Cadiot further
distinguishes based on just how integral a part it is (casquette à carreaux ~ char à bancs). This
class also includes combinations that express an instrumental relationship between elements
(e.g. canot à moteur), as well as a harder to assess “attributive” association (e.g. auteur à
succès).
Cadiot’s CIRCONS consists of two major sub-types consisting mainly of N à DET N
constructions. The first is again locative, but unlike the pour class, the locative constructions
allow for the prepending of “il y a” where N2 is the location of N1 (i.e. blessure au bras = il y a
une blessure au bras ~ bac à sable ≠ *il y a un bac à sable). The second sub-type describes the
manner (e.g. réponse à chaud) or result (e.g. lutte à mort) of N1 and, according to Cadiot, are
similar to constructions containing deverbal heads (e.g. mise à niveau).
META, the final type in Cadiot’s classification of the preposition à includes three subtypes, all
of which he deems semantically non-compositional. The first, which he calls délocutif125,
includes combinations such as mort aux rats and pied à terre. The second involves a metonymic
shift, such as in chair à canon and tête à claque. Finally, the third type of N à (DET) N
construction relies on a metaphoric sense, as in pot aux roses and sac à vin. The majority of
these constructions, if understood as compounds, are exocentric in nature.
Using Cadiot’s (1997) work as the basis for her own examination of à, Knittel (2010) also
argues that this preposition, when joining two nouns, will result in a construction belonging to
one of four classes, but her typology differs slightly from Cadiot’s. She summarizes her
observations in a table (partially reproduced on the following page), which shows how the
preposition’s meanings are, to some extent, a function of the conceptual classes of the
compound’s constituents (19):
125
Unfortunately, Cadiot is not clear regarding the use of this term for these types of compounds, but his glosses seem to indicate that they involve instances with an external subject (i.e. exocentric : “x donne la mort aux rats”, “x prend pied à terre”, etc., Cadiot 1997: 136).
240
Table 6.3. Summary of Knittel (2010) relations for the preposition à.
Class Lexico-pragmatic Relation
N1 – N2 Examples N1 N2
1 -instrument destination sac à dos, verre à vin
2 whole part stylo à bille, bateau à voile
3 +instrument product/energy machine à pain, fer à vapeur
4 preparation ingredient tarte aux pommes, salade au thon
Although Knittel states that Classes 1, 3, and 4 are all, to some degree, locative in nature, only
class 4 is said to be entirely locative. She bases this distinction on the fact that compounds under
classes 1 and 3 involve potential locations (i.e. a wineglass without wine is still a wineglass,
whereas an apple pie without apples is no longer an apple pie). Otherwise, her four classes are
all distinct. Class 1 privileges combinations that are purposive in nature (although, like Cadiot,
Knittel emphasizes location more so than purpose), while class 3 involves compounds where N2
is either the product of N1 or the energy that allows N1 to function. Class 2 compounds,
although described as possessive, represents combinations in which N2 is part of N1. Only Class
4 sets itself apart from previous research as it strictly involves “dish-ingredient” combinations
and typically includes a determiner. Nevertheless, her first three types align closely with much
of what has already been said regarding the role of à.
In summary, research shows that although the preposition à, when introducing a nominal
complement, may express a number of different relations, its function generally falls into a
somewhat narrow band of possible roles, namely to express location and destination, part/whole
relationships, possession, purpose, and use (e.g. energy). Because there is such significant
overlap between previous research on à and the relations retained here, it is likely that the N à N
compounds collected will make extensive use of these associations. As we will see in the next
section, a small subset of the retained relations are indeed sufficient to account for most of the
compounds under investigation. A question one might ask is whether any additional relations
surface from a more in-depth analysis of the data. This will be the focus of the remainder of this
chapter.
241
6.2.2 Results for N à N Compounds
Given that the preposition à is not in fact devoid of semantic content, it should come as no
surprise that compounds containing this preposition show a far more restricted use of the
relations discussed in the previous chapter. In fact, two particular facts arise when we examine
these types of constructions. First, N à N compounds involve fewer relations than NN
compounds and second, the preposition’s inherent directionality restricts the reversibility of
these relations significantly. As we will see, these facts are largely related to one another.
The following table contains the raw data resulting from the analysis of 319 N à N compounds
extracted from Wiktionary:
Table 6.4. Results of Compound Relations for N à N compounds.
Relation Basic Reversed Total PURPOSE 90 --- 90 PART 0 69 69 USE 39 6 45 LOCATION 13 14 27 PRODUCTION 18 0 18 SIMILARITY 8 --- 8 SOURCE 0 6 6 POSSESSION 5 1 6 CAUSE 5 0 5 COMPOSITION 5 0 5 TIME 2 0 2 COORDINATION 0 0 0 FUNCTION 0 0 0 HYPERNYMY 0 0 0 TOPIC 0 0 0 Other/Unanalyzable --- --- 38 Total 185 96 319
The first thing that we notice is that relations establishing some sort of parallel or equative
function between a compound’s elements are virtually non-existent for N à N constructions.
242
Thus, COORDINATION, HYPERNYMY, and FUNCTION are not present in the data. In other words, à
makes it impossible for the following meanings to occur:
(153) a. H à M, where H is also an M / where C is both an H and an M
b. H à M, where M is a kind of H
c. H à M, where H functions as an M
Moreover, for similar reasons, the SIMILARITY relation is also significantly underrepresented:
(154) a. clé à béquille, clé à pipe, escalier à vis, ouvrages à cornes
b. châssis/fenêtre à tabatière, fenêtre à guillotine
The examples in (154a) involve physical similarities (i.e. H looks like M) and could in fact be
treated as particular cases of PART, though this analysis would require the use of analogy, as
they cannot be interpreted literally. The same can be said for the compounds in (154b), which
instead rely on a different type of similarity (i.e. H functions like M). What is important to note
about the above observations is that whereas NN compounds seem to rely heavily on equative
relations, N à N compounds do not. The preposition seems to block such relations—we might
also expect similar results with other constructions involving prepositions (e.g. N de N).
The strong directionality of the preposition also affects the centricity of N à N compounds.
Unlike NN compounds, which may be right-headed, the N à N constructions present in the data
are all left-headed.
N à N compounds seem to favour the PURPOSE, PART, USE, and LOCATION associations. These
four relations largely correspond to the major paradigms identified for à in the research
discussed in the previous section (Cadiot 1997, Anscombre 1999, Knittel 2010). Examples of
these compounds are as follows:
(155) a. PURPOSE bac à sable, boîte à lettres, brosse à dents, clé à bougies
b. PART armoire à glace, baignoire à porte, clé à molette
c. USE arme à feu, bombe à hydrogène, machine à sous
d. LOCATION banque à domicile, passage à niveau, sac à dos
243
These four relations alone account for just over 70% of all the N à N compounds in my data,
which, again, reflects previous research on these types of constructions. It should be noted that
although PART follows both the basic and reversed patterns for NN compounds, it is only
reversed for N à N constructions: all occurrences are of the type “N1 of which N2 is a part.”
This particular template is also favoured for NN compounds, which further emphasizes that
partitive French compounds are strongly conditioned to have their head constituents denote the
whole element126 (cf. whole-part in Knittel 2010).
USE and LOCATION for N à N compounds, on the other hand, are both reversible (like their NN
counterparts), but not without some issues:
(156) a. USE reversed chardon à foulon, hache/frein/sac à main; chaise à
porteurs, abreuvoir à mouche
b. LOCATION reversed arbre à grives, boule à neige, chambre/tube à
air/gaz/vide; poire à poudre, moulin à prière
The reversed form of USE (H that C uses) could, in most cases, be treated as purposive. In fact,
only chardon à foulon (= ‘chardon que le foulon utilise’) in (156a) seems like a clear case of
USE, as it is difficult to claim that a thistle’s purpose is to be used by a fuller. Compounds such
as N à main, where N is an instrument of sorts, are analyzed as instances of USE, but one might
argue that N1 is “destined” for N2 and therefore an example of PURPOSE. This analysis has not
been retained here because PURPOSE is typically reserved for compounds in which the modifier
is the object of the underlying relation. This means that a purposive reading of hache à main
would result in an axe whose purpose is to cut (off) hands. The USE relation emphasizes that it is
instead an axe that one uses with one’s hand (as opposed to larger axes requiring the use of both
hands and one’s upper body) and thus falls under the “powered by” scope of this relation. The
same can be said for chaise à porteurs, which means ‘chair carried by bearers’ and not one that
carries them, which is why it is subsumed under USE and not PURPOSE. There may be sufficient
influence from the complement’s deverbal status to block the incorrect interpretation, but this
126
As was mentioned earlier in Section 6.1.1, this particular constraint is reversed for compounds involving de, a fact that is most likely related to free syntactic partitive constructions (e.g. morceau de sucre, pointe de tarte, litre de lait, etc.).
244
remains to be seen. As for abreuvoir à mouche, its exocentric status favours USE over other
relations as it is not in fact a drinking trough for bugs, but is instead understood, metaphorically,
as one used by bugs (its meaning being ‘wound’).
Similar issues exist with N à N compounds and the reversed form of LOCATION (H in/on/near
which M is located), which is to say, that PURPOSE could also be invoked for some of the
compounds listed. Of the compounds analyzed as instances of a reversed LOCATION, arbre à
grives and boule à neige (snowglobe) are the most representative of this type. A number of
compounds built on chambre/tube/diode à N might also benefit from a purposive reading, but
such an analysis fails to recognize that these items serve a purpose other than containing
something, as evidenced by the inappropriateness of paraphrasing them using for (e.g. *tube
pour [le] vide, ?chambre pour le gaz). As for poire à poudre and moulin à prière, these are
either weakly endocentric or exocentric compounds that, despite the figurative meaning of the
head, seem to target a locative relation as either can be paraphrased as “H that contains M.”
Finally, LOCATION remains one of the better examples of a fully reversible relation as both the
basic and reversed patterns are as evenly distributed for N à N compounds as they are for NN
compounds.
Beyond the four main relations identified above, PRODUCTION is also well represented (18
items), even though its counterpart CAUSE is far less so (5 items):
(157) a. PRODUCTION abeille à miel, cabane à sucre, machine à café, vache à lait,
ver à soie
b. CAUSE armes à enquerre, tête à claques/gifles, charbon à tumeurs,
pierre à feu
Neither of these relations are reversible for N à N compounds. This fact contrasts with the
results obtained for NN compounds where these relations were present for both templates.
Again, this is most likely due to the preposition’s strong directionality. Knittel (2010) includes
PRODUCTION under class 3, which also accounts for combinations in which N2 is the source of
energy of N1 (which corresponds to USE in my typology). Also included under PRODUCTION
here are compounds denoting plants, such as acajou à pommes, arbre à cornichons and arbre à
pain, for which it can be understood that “N1 produces N2” (see Section 5.2.2.10 of Chapter 5
for a discussion of this analysis).
245
As for CAUSE, only charbon à tumeurs and pierre à feu are relatively uncontroversial instances
of this relation, although the latter might instead be included under PRODUCTION. Armes à
enquerre refers to a coat of arms that possesses unconventional features. The word enquerre,
here, is slightly problematic as a number of similar, yet different analyses are available.
According to LPR 2010, this lexeme is part of an adjectival locution headed by a preposition
(i.e. à enquerre). Citing armes à enquerre as its example, LPR defines this entry as follows:
“Qui présentent une singularité, une irrégularité à éclaircir (en parlant d'armes).” If this
particular word only exists within the context of this construction, then it may be excluded from
the dataset according to the criteria set forth in Chapter 3. Yet, the word itself remains a noun
elsewhere. The TLFi, for instance, while it also discusses the word in the context of an
adjectival locution, provides the following definition based on the verbal form enquerir:
(158) enquerre emploi subst. masc. “Recherche de la signification, vérification”
Using this particular sense, it seems justified to treat the compound as meaning “coat of arms
that prompts inquiry,” which places it squarely within the causal relation, albeit with some
degree of coercion. The synonymous pair tête à claques and tête à gifles are also difficult to
treat, but are labeled as instances of CAUSE as they may be paraphrased accordingly (i.e. tête qui
cause des gifles). The TLFi defines these constructions as follows:
(159) tête à gifles: “Visage déplaisant et exaspérant de bêtise, de fatuité, à tel point
qu'on voudrait le gifler”
These compounds are often used to denote a person, which technically renders them exocentric.
What is interesting is that the result (i.e. gifle/claque) is never actualized. It is the desire to carry
out these acts that is provoked. Despite this particular characteristic, there nevertheless remains
a sense of cause between the compounds’ elements, which is why they have been labeled using
this relation.
Similar in structure to the above relations is SOURCE. Because of à’s strong directionality, this
relation runs counter to PRODUCTION and CAUSE for N à N compounds and instead patterns with
PART as a “reversed-only” relation. SOURCE related compounds can only mean “H that M is
from,” which is to say that the head element must denote the origin, while the modifier must
denote the resulting product. That said, this mirrored template is perhaps a further argument for
246
folding SOURCE into either PRODUCTION and CAUSE, thus allowing these to be reversed. Of the 6
compounds labeled as SOURCE, most could easily be treated as reversed instances of
PRODUCTION:
(160) SOURCE: betterave/canne à sucre, mûrier à papier, palmier à huile, pierre à
chaux/plâtre
Under the current analysis, a compound such as sugar cane (canne à sucre) is paraphrased as
“cane from which sugar is made” (canne à partir duquel le sucre est fait). The alternative
treatment involving PRODUCTION (e.g. cane that produces sugar; canne qui produit du sucre),
while technically sound, is somewhat infelicitous as the origin element does not actually
“produce” the element denoted by the modifier. Treating the compounds in (160) as instances of
SOURCE has the benefit of not only reflecting the earlier observation that N à N compounds are
mostly unidirectional, but also remaining consistent with the distinction made earlier between
PRODUCTION and SOURCE for NN compounds (e.g. chêne-liège; see Section 5.2.2.10 for a
discussion). Moreover, viewing the compounds in (160) as instances of SOURCE and not
PRODUCTION better relates them to their N de N analogues, a few of which are given below:
(161) a. sucre de canne (SOURCE) ~ canne à sucre (SOURCE REVERSED)
b. huile de palmier (SOURCE) ~ palmier à huile (SOURCE REVERSED)
Source is the natural analysis for the constructions in (161) involving de, which lends support to
the treatment adopted here for their N à N analogues.
As for the remaining relations identified, most seem to be of marginal relevance for N à N
compounds. The relative frequency of COMPOSITION is similar to that of SOURCE, though the
occurrences identified are unlikely to raise any questions.
(162) a. COMPOSITION code à octets, (co)polymère à blocs, étoile à neutrons, puce
à gènes
As was mentioned in Chapter 5, COMPOSITION differs from PART in that one constituent is the
sole component of the whole denoted by the other constituent. The compounds in (162) above
are all plausible candidates for this particular relation as they may all be paraphrased without
difficulty as “H composed of M.”
247
Very little can be said of the TIME relation. Only two tokens were noted in the dataset, both of
which involve a modifier possessing temporal features:
(163) TIME échange/marché à terme
The two cases of TIME, much like the ones identified for NN combinations, involve lexemes that
belong to temporal semantic classes. They are likely interpretable without a priori knowledge of
any compound relations. These constructions, along with those identified as temporal for NN
compounds, further suggests that TIME may not be a fundamental relation and instead simply
one that arises from the meaning of a compound’s constituents. As for TOPIC, no occurrences
were found for N à N compounds. Given how few cases were identified for NN compounds, it’s
not surprising that none were found in the N à N data. It is possible that a larger corpus would
result in some instances of TOPIC, but it may very well be that this particular relation is entirely
subsumed by other constructions such as N de N (e.g. livre d’histoire ~ *livre à histoire ;
chanson d’amour ~ *chanson à amour).
Finally, it was said in the previous chapter that the POSSESSION relation was inapplicable to NN
compounds and of only limited applicability to N à N compounds. The following compounds
are the only ones in my N à N data that accept a possessive reading, all of which are reversed (H
that M possesses):
(164) POSSESSION barbe à papa, bonnet à prêtre, bourse à berger/pasteur, fils à papa
There is, however, one basic case of possession, punk à chien (‘punk qui possède un chien’),
which, while clearly based on that relation, runs counter to what is expected in this type of
construction (see Chapter 5, Section 5.2.2.5 for further discussion of these cases).
Despite POSSESSION’s seemingly limited applicability, it was nevertheless retained because it
would no doubt be a necessary relation to account for a number of N de N constructions (e.g.
droit d’auteur). What is worth noting is that à is often said to express possession or belonging
alongside de, often interchangeably. Le bon usage (2011), for instance, lists this as one of à’s
roles (1396). In the research on à discussed in Section 6.2.1 above, possession comes up
frequently, albeit often in the part-whole sense, but there are also instances involving belonging
(e.g. le livre à Pierre). It might thus seem strange that this relation is so uncommon in the data.
248
One possible explanation is that this particular usage of à belongs to oral speech. Le bon usage,
for example, notes that the possessive sense expressed in phrases such as “la fille unique à M. le
maire” or “le manteau à M. Bernard”, belongs to a colloquial register:
“The expressions mentioned above come from either tradition or popular speech, where à remains, nearly everywhere, in use to indicate belonging. (The statement from the Ac. 2001, art. à, IV, 1, “This expression is no longer in use,” is inadequate.) But seldom does it appear in written texts, outside of instances where authors wish to imitate former usage [. . .] or to reproduce local or popular expressions127.” (Le Bon Usage: 455)
Furthermore, they add that the nominal complement in possessive constructions involving à is
always a person or animal and never a thing128. The examples in (164) above seem to support
this statement.
This distinction is echoed in Spang-Hanssen’s (1963) work on prepositions, in which he remarks
that there is a striking contrast between the indication of possession in written and oral speech.
He observes that “only de is judged correct in these type of sentences, but in everyday speech à
is used freely, which should not be considered crude, but simply colloquial129” (33). These
characteristics of à’s usage might explain why N à N compounds in French seldom involve
possession, that the coining of compounds involving possession is influenced by a strong
prescriptivism in favour of N de N constructions. As Le bon usage remarks, only two N à N
fixed constructions seem to retain a possessive meaning, both of which are present in
Wiktionary and labeled as possession here: barbe à papa and fils à papa. The other cases listed
in (164) are exocentric compounds denoting plants, which entered the French language long ago
(e.g. bourse à pasteur, circa 1600 according to TLFi) at a time when à was perhaps still the
preferred marker of the genitive case.
127
“Les expressions signalées plus haut viennent, soit de la tradition, soit du parler populaire, où à reste, à peu près partout, très vivant pour marquer l’appartenance. (La formule de l’Ac. 2001, art. à, IV, 1, « Cette expression n’est plus en usage », est inadéquate.) Mais ceci apparaît rarement dans la langue écrite, en dehors des cas où les auteurs veulent imiter l’usage ancient [. . .] ou reproduire les expressions populaires ou locales.” (Le bon usage 2011: 455). 128
“Le complément concerne des personnes, parfois des animaux, jamais de choses” (Le bon usage 2011: 455). 129
“De seul est estimé correct dans ces sortes de phrases, mais le langage courant se sert plus volontiers de à qui n’est nullement vulgaire, simplement familier” (Spang-Hanssen 1963: 33).
249
In summary, 281 (approximately 85%) of the 319 N à N compounds collected from Wiktionary
were accounted for using the 15 relations retained in the previous chapter. The most prominent
of these relations correlated strongly with previous research on à. It was also found that
reversibility is significantly impacted by the strong directionality of the preposition, which also
influences which relations are in fact applicable to N à N constructions. Thus, unlike for NN
compounds, equative relations are not permitted for N à N constructions. Despite the high
degree of relevance exhibited by the retained relations, 15% of the compounds examined could
not be accounted for. These residual cases will be the focus of the next section.
6.2.3 N à N: Residual Data
Like NN compounds, a number of N à N compounds could not be analyzed using the relations
from Chapter 5. Of the 319 N à N compounds examined, 38 defied analysis. Although a few of
these bear some resemblance to the unanalyzable NN compounds discussed earlier, there are a
number of particularities to N à N compounds that placed them outside the scope of my
analysis. I will look at each of these in turn.
6.2.3.1 Idiosyncratic and Semantically Unrelated N à N
Similar to NN counterparts, a number of N à N compounds involve lexemes that do not
contribute meaning to the whole. A total of 17 exocentric N à N compounds were identified, all
of which fall into one of three groups:
(165) a. N1 metonymic/metaphoric bouche à feu, pelle à cul
b. N1-N2 unrelated (lexicalized) manche à balle130, pot à tabac131
c. Idiosyncratic relation face à main, tête à queue
The first type of exocentric compound includes those in which the leftmost constituent, the
head, only retains its meaning from a metaphoric or metonymic perspective. Thus, bouche à feu,
130
manche à balle: “région. (arg. étudiants de Belgique). Étudiant qui se signale par son zèle à travailler” (TLFi). 131
pot à tabac: “personne petite et grosse” (LPR 2010). There is also a literal acceptation for this combination, for which the relation would be PURPOSE.
250
referring to a canon, is based on a metonymic relation132 (‘mouth of the canon’), while in pelle à
cul (lawn chair), the meaning of the head is metaphorical. Because N à N compounds are always
left-headed, there are no exocentric compounds in which only N2 is unrelated. Much like for
NN compounds, however, it is possible for neither constituent to contribute meaning to the
whole, as in the examples in (165b) above. Most of these compounds can be said to be
lexicalized. As for the compounds listed in (165c), these are exocentric N à N constructions
which retain their meaning, but that rely on relations not included in my list: face à main refers
to a small pair of binoculars held up to ones face using a handle, while tête à queue refers to a
movement in which the head and tail swap positions. These compounds are not unusual,
however, as they seem to rely on functions long since attributed to à, namely destination or
direction. The closest relation available here is LOCATION, but it is not able to capture the
meaning of these compounds adequately, most likely because of their exocentric nature.
Like NN compounds, a number of N à N compounds also involve some form of reduplication,
though in this instance they are mostly exocentric:
(166) goutte-à-goutte, main à main, mot à mot, porte-à-porte, terre-à-terre
Unlike their NN counterparts, N à N compounds involving identical lexemes typically express a
sequence of acts or events, but with a sense of repetition, which is to say one after the other (e.g.
goutte-à-goutte = ‘goutte après goutte’, porte-à-porte = ‘aller d’une porte à l’autre’). In most
cases, these constructions are adverbial in nature and have been nominalized133.
There were only 17 endocentric compounds that could not be analyzed using the relations
retained. Some of these include cases for which the modifier does not contribute meaning (as in
167a), while others simply involve a head and modifier connected in some idiosyncratic fashion
(as in 167b):
132
No doubt mouth here is also, to some degree, based on a metaphoric interpretation, but the use of this particular lexeme to refer to an opening is well-established (mouth of a cave, mouth of a bottle, mouth of a river). 133
For instance, the TLFi says of goutte-à-goutte: “Substantivation de la loc. goutte-à-goutte attestée dès 1170.”
251
(167) a. manche à balai, valet à patin
b. logiciel à contribution, pelle à balai, acquit-à-caution
Perhaps more problematic are a series of compounds which clearly preserve the meaning of
their individual components, but which involve already established phrases headed by à:
(168) boule à zéro, compte à rebours, fabrication à façon, oeuf à cheval, steak à cheval, tueurs
à gages
These compounds are actually best treated as instances of an N with a postposed PP, as the latter
are constructions in their own right (e.g. à rebours, à façon, à cheval, etc.). Of the tests
discussed in Chapter 3 for distinguishing between syntactic constructions and compounds, the
separability criterion shows that most of the cases in (168) are not sufficiently atomic (e.g.
fabrication industrielle à façon, steak saignant à cheval). Even if we treated these constructions
as compounds, assigning them a relation is difficult, as doing so usually requires that the
preposition be replaced by the targeted relation. Because these PPs have specific meanings that
differ greatly from the meaning of their internal nouns, the preposition cannot be removed from
the construction. For instance, à cheval means ‘straddle’, while the simple lexeme cheval
means, among other things, ‘horse’. Even if a relation were acceptable for a compound such as
oeuf à cheval, paraphrasing it as oeuf RELATION cheval could not possibly work. These instances
are therefore beyond the scope of the analysis proposed here.
6.2.3.2 N à N Compounds Involving Nominalizations
Also present in the dataset of N à N compounds are a number of nominalizations with internal
arguments filled by the modifier, though their numbers are limited. These were mentioned
briefly in Section 6.1.2.4 above, where they were labeled using the ARGUMENT relation. They
are repeated here for convenience:
(169) mise à disposition/jour/niveau/pied, passage à vide, maintien à poste; condamné à mort
Such constructions are perhaps sufficiently transparent so as to avoid, under most
circumstances, lexicalization, which would explain why so few cases are present in the
Wiktionary. Another explanation might be that there aren’t actually that many resultative
nominalizations of verbs with indirect objects that allow for constructions similar to those in
252
(169). One possible constraint is that the proposition must be locative. This would explain why
the constructions in (169a) below are acceptable, while those in (169b) are not without the
inclusion of a determiner.
(170) a. affichage à écran, arrivée à destination
b. *aboutissement à résultat, *invitation à mariage, *contribution à projet
Further examination of these types of constructions, while warranted, goes well beyond the
scope of this work. Suffice it to say that N à N construction, like NN compounds, do share some
similarities with other verbal-nexus compounds when the head constituent is a nominalization
and the modifier fills the former’s argument slot. Because the elements that make up these
particular types of constructions are connected in a relatively obvious way, it is likely that these
compounds are easier to understand than those requiring that some unexpressed relationship be
established by the speaker.
6.3 Summary
In this chapter, I examined how the 15 relations proposed earlier might apply to French
compounds by analyzing 729 NN and 319 N à N compounds. Factoring out compounds whose
elements did not contribute meaning to the whole, the retained relations were able to account for
92% of all NN compounds and 94% of all N à N compounds. The results of this analysis also
show that these two types of constructions favour different relations. Table 6.5 on the following
page contains the relative frequency134 of each relation only for compounds for which a given
relation was assigned.
Given what is understood about the preposition à, it is not surprising to see that fewer relations
were present for compounds involving this preposition than those that do not. NN compounds
seem to make use of all retained relations save POSSESSION.
134
The relative frequency is calculated by dividing the number of compounds of a given relation by the number of compounds that were labelled using a relation. Relations with a relative frequency greater than 0.05 (i.e. 5%) are in bold.
253
Table 6.5. Relative frequency of relations across compound types.
Relation NN N à N COORDINATION 0.194 0.000 SIMILARITY 0.191 0.029 FUNCTION 0.136 0.000 LOCATION 0.098 0.098 PART 0.077 0.250 PURPOSE 0.074 0.312 HYPERNYMY 0.070 0.000 USE 0.052 0.163 COMPOSITION 0.033 0.018 SOURCE 0.024 0.022 PRODUCTION 0.019 0.062 TOPIC 0.014 0.000 CAUSE 0.012 0.018 TIME 0.005 0.007 POSSESSION 0.000 0.022
In the following chapter, I will attempt to incorporate some of the results and observations made
here into a typology of semantic transparency based on the features discussed earlier in Chapter
4. It should be noted, however, that the overall relative frequency of relations will not factor
heavily into the proposed typology. On the one hand, the core dataset is most likely too small to
truly reveal unambiguous distributional information about the retained relations, and on the
other, relational constraints imposed by a compound’s constituents are probably better
predictors of what associations may emerge at the level of meaning construal. The incorporation
of the relations will instead be based on the complexity of their semantics, along with SRI
values in instances where compounds share otherwise identical transparency profiles. These
aspects will be explored in greater detail in the first half of Chapter 7.
254
Chapter 7
Putting It All Together
In chapters 4 and 5, I examined several different morphological and semantic properties of
compounds that I argue play a role in their degree of semantic transparency. Some of these
properties are variations on features discussed elsewhere in the literature on transparency, while
others, such as a compound’s semantic relation, have received less attention within this context.
Although most of these features were addressed individually, they all show a great deal of
interdependency, both in terms of the constraints they introduce and the ways in which they
combine to form a compound’s meaning. Consequently, if traditional models of semantic
transparency are to be expanded upon, these features must be assessed holistically. In this
chapter, I explore how these properties relate to each other and conclude with a re-examination
of the data collected in light of the proposed typology of semantic transparency.
7.1 Semantic Transparency: A Definition Revisited
At the conclusion of Chapter 2, I offered a working definition of semantic transparency,
repeated here for convenience:
(171) For a lexical unit C, semantic transparency refers to the degree of semantic
interpretability of C
Little has in fact changed since this definition was first proposed. The focus has instead been on
establishing a set of features that might allow for a better evaluation of the perceived challenges
involved in determining the meaning of a given compound. Consequently, several factors were
explored, all of which were assumed to play some part in transparency as defined above. Before
going into greater detail on how these features might relate to each other, I would like to briefly
emphasize a few key points about the definition in (171).
255
First, semantic transparency is to be understood as a function of the relationship between form
and meaning to the extent that this relationship may be inexistent or imperfect. In other words,
although the meaning of a compound is usually related to the meaning of its parts in non-trivial
ways, this relationship seldom succeeds in explicitely communicating all aspects of a
compound’s meaning. For instance, although the meaning of doghouse may be formalized as
‘house’ ⊕ PURPOSE ⊕ ‘dog’, this simple concatenation fails to highlight that a doghouse has a
certain shape and size, is usually located in a yard, doesn’t typically have a door, etc. This
additional information is not necessarily out of the speaker’s reach, however, given that
combining concepts also involves establishing compatibility between them. In doghouse, the
modifier imposes certain constraints on the head, which allows for greater denotational
specificity. Nevertheless, there exists a discrepancy between form and meaning, and it is
precisely this discrepancy that makes compounds good case studies for semantic transparency.
Conversely, this deviation is also why compounds pose significant challenges to any model of
the phenomenon. Moreover, when we consider that not all of a compound’s constituents may
factor into its meaning, the relationship becomes even more tenuous. This is not to say,
however, that a compound may not be transparent, but that transparency is a highly relative
concept.
Second, it bears repeating that any claim regarding transparency or opacity must largely ignore
the fact that it may vary from one speaker to the next. Semantic transparency is therefore a
speaker dependent concept. Unfortunately, there is no way to account for this fact in a
generalized theory of transparency. For instance, if someone is unfamiliar with the words dog
and house, he or she is unlikely to understand the compound dog house, which would render this
compound opaque to him or her; universally, however, this proposition is untenable. We
therefore cannot discuss semantic transparency with individual speakers in mind, but must
instead attempt to evaluate the concept with a sort of idiolectic agnosticism. In other words, a
theory of semantic transparency should be formulated according to an “ideal speaker-listener”
paradigm (Chomsky 1965). The only assumption we may make is that the speaker or listener is
familiar with the construction’s constituents.
Third, semantic transparency should be understood as a characteristic that applies to both
existing and novel compounds, though perhaps not in exactly the same way. In the case of novel
compounds, one might prefer Štekauer’s (2005) term “meaning predictability” as it emphasizes
256
that meaning for newly coined words should ideally be predictable. This is not to say that
predictability doesn’t apply to existing compounds, but the focus should rather be on how
“accessible” meaning is given that transparency is only relevant if the speaker doesn’t already
know the meaning of the compound. After all, a novel compound doesn’t have an established
meaning: the speaker coins a new compound with a particular designatum in mind, which is
typically related to the semantic representations of its constituents; meaning predictability is
therefore the likelihood that a novel combination AB will mean M given the meaning of A and
B. Existing compounds, on the other hand, already have an established meaning, which may or
may not be predictable. The question here is therefore “To what degree is the meaning of C
derivable from that of A and B?” Although the distinction may be subtle, it nevertheless remains
crucial.
Finally, the approach to transparency adopted in this work is primarily focused on the
interpretive process. This is not to say that transparency has no effect on the creation of novel
compounds or the use of existing ones, but rather that a construction’s degree of transparency is
largely a matter of the listener’s ability to establish meaning based on the information available,
which we may assume consists of two parts: the construction itself and the context within which
it was used. Crucially, the most transparent compound is one that does not require any context to
be understood. When a compound is first coined, it is ideally created so as to achieve maximal
transparency (cf. Grice’s maxims of quantity and manner, Grice 1975), but once the item is
established, the factors or conditions that originally motivated its creation may no longer be
obvious to the listener. Any processing costs associated with transparency are therefore incurred
at the level of interpretation.
If the premises above are in fact well-founded, then we may expand upon the definition
provided earlier by, on the one hand, elaborating on what transparency entails, and on the other,
incorporating the features explored in previous chapters. Semantic transparency may therefore
be defined as follows:
257
(172) For a lexical unit C composed of units A and B, for which the meaning(s) of A and B are
known, semantic transparency refers to the degree of semantic interpretability of C,
given
i. the headedness of C
ii. the compositionality of AB
iii. the nature of the relation held between A and B
iv. the semantic homogeneity of C-like constructions
The first three properties listed in (172) represent the minimum factors required to adequately
formalize semantic transparency for compounds. The homogeneity property in (172iv) is meant
to augment the relational features in (172iii).
The aim of any formal model is to remain as parsimonious as possible while providing the same
descriptive and explanatory value that another, more complex system might offer. In this regard,
the features and factors retained in this work represent an attempt at extending existing models
of semantic transparency without introducing a large number of distinct elements. While it is
entirely possible that future work on transparency may lend support for the incorporation of
additional factors, the definition provided above remains sufficiently expanded for the purposes
of developing a typology involving compounds.
7.2 Semantic Transparency: A First Pass
Chapter 3 looked at a pair of related features that have often been cited as crucial components to
a compound’s semantics, namely headedness and compositionality. Several refinements were
proposed, however, to better account for some of the variation observed in the data.
Consequently, additional factors such as head position and sense extension were incorporated
into previous models of semantic transparency. Together, these features allow for compounds to
be classified in a manner that reflects the challenges they pose at the level of interpretation.
7.2.1 Primary Factors
It was argued that the most crucial factor in a compound’s semantic transparency is its
centricity: an endocentric compound is more transparent than an exocentric one. This is largely
258
due to the fact that endocentric compounds offer a starting point for processing multi-word
lexemes by providing an answer to the question “what is it?” An endocentric compound
typically supplies this information by way of its semantic head, which is a hypernym of the
compound. Exocentrics do not usually provide this information and in instances where they do,
it is usually by means of a metonymy that does little to help the hearer in determining the nature
of the designatum (e.g. razorback = wild hog). Favouring endocentric compounds in a typology
of transparency is also, to some extent, supported by the data: the majority of the compounds
examined are in fact endocentric, which is to be expected if compounds are, at the most basic
level, meant to communicate meaning in sometimes contextually impoverished conditions.
In the case of headed compounds, it was also argued that the position of its head is a major
factor of transparency, where we may distinguish between canonical and non-canonical
positions. Again, the distinction rests on the assumption that non-canonical heads will pose
greater challenges at the level of interpretation. Moreover, the head may be subject to sense
extension, such as metaphor or metonymy, which will have an effect on how easily the
compound’s designatum may be identified. The result is a compound that is either strongly or
weakly endocentric.
Compositionality is determined according to individual components’ meaning in relation to that
of the whole. In other words, a compound is fully compositional if both constituents retain their
meaning. For semantically headed compounds, this feature is based on the meaning contribution
of the non-head element, which may undergo sense extension, thus reducing its degree of
compositionality. Although endocentric compounds cannot be non-compositional because the
head contributes meaning to the whole, exocentric compounds may go from fully compositional
to non-compositional, depending on the retention of meaning by their constituents.
These features may be combined into a hierarchic tree in which each terminal node represents a
particular combination of properties. The results of this distribution is presented in its entirety in
Figure 7.1 on the following page.
259
Figu
re 7
.1. A
typo
logy
of s
eman
tic tr
ansp
aren
cy o
f com
poun
ds.
Com
poun
d
Endo
cent
ric
Can
onic
al
Hea
d
Stro
ngly
En
doce
ntric
Fully
C
ompo
sitio
nal
Wea
kly
Com
posi
tiona
l Pa
rtial
ly
Com
posi
tiona
l
Wea
kly
Endo
cent
ric
Fully
C
ompo
sitio
nal
Wea
kly
Com
posi
tiona
l Pa
rtial
ly
Com
posi
tiona
l
Non
-C
anon
ical
H
ead
Stro
ngly
En
doce
ntric
Fully
C
ompo
sitio
nal
Wea
kly
Com
posi
tiona
l Pa
rtial
ly
Com
posi
tiona
l
Wea
kly
Endo
cent
ric
Fully
C
ompo
sitio
nal
Wea
kly
Com
posi
tiona
l Pa
rtial
ly
Com
posi
tiona
l
Exoc
entri
c
Fully
C
ompo
sitio
nal
Wea
kly
Com
posi
tiona
l Pa
rtial
ly
Com
posi
tiona
l N
on-
Com
posi
tiona
l
Sem
antic
Rel
iabi
lity
Inde
x (S
RI)
Se
man
tic R
elat
ions
260
The proposed hierarchy contains 16 possible permutations, but as was discussed in Chapter 4,
not all of these permutations are necessarily possible. No French compounds, for instance, were
found that involved an established trope on both the head and the modifier135, thus potentially
pointing to limitations imposed on weakly endocentric compounds. This is also true for non-
canonically headed compounds. It is unclear if this is a language dependent constraint or if other
factors related to French are responsible for these restrictions. Every other combination,
however, is attested, albeit with varying degrees of frequency. The following table shows the
distribution of compounds for each of the attested permutations in Figure 7.1.
Table 7.1. Distribution of features for the French compounds collected from Wiktionary.
Endo. Canonical Head
Strong Centricity Compositionality136
# of Items
NN N à N
+ + + Full 491 240
+ + + Weak 79 25
+ + + Partial 17 0
+ + − Full 8 16
+ − + Full 61 ---
+ − + Weak 2 ---
+ − + Partial 9 ---
+ − − Full 1 ---
− NA NA Full 31 20
− NA NA Weak 7 13
− NA NA Partial 8 1
− NA NA Non 17 4
One will notice that compounds generally favour combinations that provide as much semantic
content as possible. The vast majority of items in the data are not only endocentric, but also
canonically headed with modifiers that don’t involve tropes. Based on these observations, we
135
It should be noted that tropes may apply to both constituents for exocentric compounds, similar to Benzces’s (2006) findings for English. 136
While the other features are all binary, compositionality may have one of four values: fully compositional, weakly compositional, partially compositional, and non-compositional.
261
may state that compounds favour transparency over opacity. If this is the case, how can we
further distinguish between what amounts to the majority of compounds? The lower section of
Figure 7.1 offers insight into additional means of classification, namely by using semantic
relations and the semantic reliability index to differentiate between compounds that might
otherwise possess the same degree of transparency.
7.2.2 Semantic Relations
While the meaning of a compound’s constituents is paramount to determining just how
transparent it is, the nature of the unexpressed semantic relation that connects its elements is
arguably just as important. In Chapter 5, I proposed a set of 15 basic relations based on a survey
of previous research on the subject. In Chapter 6, following a close examination of the
Wiktionary data, I discussed three additional associations (i.e. argument, adjective, and
classification). The focus was on determining how frequent some of these relations were, as well
as whether they might feasibly be applied across a large set of disparate compounds. The results
of the analysis showed that the majority of compounds do indeed make use of a relatively
restricted number of semantic associations, which suggests that speakers may be sensitive to this
information during processing. Psycholinguistic research lends support to this position (see
Chapter 5 for relevant discussion).
It should be noted at the outset that the semantic relations that hold between a compound’s
elements are inversely related to the degree of compositionality: the less compositional a
compound is, the less likely it is to involve a relation. In fact, this correlation is quite probably
discrete. In other words, only fully compositional and weakly compositional compounds may
make use of semantic relations to link their elements, which means that semantic relations may
only factor into a compound’s semantic transparency if it is compositional. Thus, the discussion
that follows is only relevant for a subset of the items in the typology, though this subset does
account for approximately 90% of the data collected.
7.2.2.1 Relation Types
The relations I proposed largely consist of frequent and recurring predicates used to join a
compound’s constituents (i.e. X causes Y, X is part of Y, X is a type of Y, etc.). These relations,
however, are not all functionally identical: they differ according to the nature of the relationship
262
held between elements. Wisniewski (1996) distinguishes between three types of relations in his
approach to compound processing, which is to say that speakers link concepts via either
relational predication, property mapping, or hybrid combination. Costello and Keane (2000)
argue that there are in fact five such “interpretation types”: relational, property, hybrid,
conjunctive, and known-concept. A more widely used typology of compound types, however,
comes from Scalise and Bisetto (2009)—first proposed in Bisetto and Scalise (2005)—who
group together compounds according to three basic associations: a constituent may either
depend on the other (subordinate) or it may qualify it (attributive), or both constituents may
share equal status within the compound (coordinate)137. If we consider this particular typology,
we may distribute the 15 relations I proposed in Chapter 5 (along with the non-conceptual
relations discussed in Chapter 6) as follows:
(173) a. Subordinate: PRODUCTION, CAUSE, PART, COMPOSITION, SOURCE, LOCATION,
POSSESSION, TIME, TOPIC, PURPOSE, USE (also argument)
b. Attributive: SIMILARITY (also adjectival, classificatory)
c. Coordinate: COORDINATION; HYPERNYMY, FUNCTION
As we can see, most of the relations identified are subordinate in nature as they involve
constituents in a complement relation (see Scalise and Bisetto 2009 for a more detailed look at
their typology). Only SIMILARITY is attributive as it describes constituents that relate to each
other on the basis of property mapping (i.e. X is similar to Y based on property W). Also
attributive in nature are NN compounds in which one element functions as an adjective (e.g.
maître-cylindre, chef-lieu, expert-comptable). We may add that classificatory compounds, such
as particule bêta and mâle alpha, also involve an attributive association, but these types are not
as crucial given that they are not compositional (see Chapter 6, Section 6.1.2.3 for a brief
discussion of these cases). As for coordinate compounds, three relations retained in this work
correspond to this particular class, most prominently the COORDINATION relation. We may say
that this relation represents the most prototypical coordinative relation. HYPERNYMY and
137
It is worth noting that these three categories correspond exactly to those found by Wisniewski (1996) during his experiments: subordinate = relational, attributive = property mapping, coordinate = hybridization.
263
FUNCTION, on the other hand, might be considered sub-types of coordinate compounds as they
allow for copulative expansion (this will be discussed in a moment).
No matter the number of relations identified, one must ask whether the nature of the relationship
between constituents has an effect on a compound’s degree of transparency. According to Bell
and Schäfer (2013), a compound whose designatum is an intersection of its constituents
represents the most fundamental case and might consequently represent the most transparent
instance of compounding:
“The most basic configuration possible would be one where A and B retain their original meaning, and the relationship is set to identity. That is, the property expressed by A and by B hold of the very same entity, and the semantics is thus intersective. These combinations might be regarded as the most transparent AB combinations. Classic examples result from the combination of Kamp’s (1975) predicative adjectives with a nominal head, e.g. fourlegged animal.” (Bell and Schäfer 2013: 3).
Although the type of relationship described by Bell and Schäfer coincides primarily with the
adjectival relation observed for AN and NA constructions, certain attributive NN compounds
would also be subsumed under this class (e.g. grandeur nature, maître-cylindre). These types
might therefore be considered instances of highly transparent associations. While compounds
involving a classificatory relation are also largely attributive in nature, their intersective
meaning representation is eclipsed by the fact that they make use of nouns that have no
designatum on their own (e.g. particule bêta, where bêta does not signify anything). This means
that, even if they are considered highly transparent, they fall under a less transparent category in
the typology (i.e. partially compositional). SIMILARITY, on the other hand, is not a likely
candidate for high transparency, this despite its inclusion under the attributive class in Bisetto
and Scalise’s (2005) typology. The fact is that, while SIMILARITY does differ from subordinate
relations, it is only partially intersective. A tigershark, for instance, is a tiger with some of a
shark’s features: only a subset of a shark’s semantic representation is included in the meaning of
the compound (i.e. a tigershark is not the set of sharks that are tigers). This contrasts with other
attributive compounds where the non-head constituent is typically included in its entirety in the
meaning of the whole (cf. maître-cylindre). Moreover, SIMILARITY is a highly malleable and
multi-faceted relation. In some instances, a compound involving this relation may target
physical similarities (e.g. chou-fleur), functional similarities (e.g. magasin phare), behavioural
similarities (e.g. fourmi-lion), etc. This wide range of possible associations is also further
264
complicated when we take into account the fact that these compounds often rely on metaphor to
establish the relation between their constituents. A compound based on the SIMILARITY relation
is thus inherently more complex than other attributive types.
Also analogous to Bell and Schäfer’s concept of intersective semantics are the COORDINATION
and HYPERNYMY relations. Compounds involving these relations make complete use of the
semantic representation of their elements and are thus, as I’ve said previously, equative. While
COORDINATION is intersective, HYPERNYMY is inclusive, as shown in the following diagrams:
Figure 7.2. Difference between coordinated and hypernymic compounds.
A case can be made to consider these types as instances of Bell and Schäfer’s most transparent
class as, according to their approach, the relation is set to identity (i.e. BE). This may be shown
using the following copula constructions:
(174) a. COORDINATION a singer that is a songwriter / a songwriter that is a singer
b. HYPERNYMY an oak is a tree / the tree is an oak
We may also add compounds involving the FUNCTION relation to the list of intersective
combinations as they usually invoke an identity interpretation as well: compounds such as
mémoire tampon (buffer memory) or bateau lavoir (wash-shed), although not strictly speaking
coordinated compounds, may nevertheless be paraphrased using a copula (i.e. memory that is a
buffer) because a functionality reading can, under most circumstances, reclassify an object (i.e.
a rock used as a paperweight is a paperweight).
Are compounds based on the semantic intersection of their constituents really the most
transparent? Although there undoubtedly exist arguments in favour of a number of different
analyses, I propose that both the ARGUMENT and PURPOSE relations be ranked higher than those
discussed above with regards to their global transparency effects.
tree oak
singer songwriter
265
The chief argument in favour of this position is that compounds for which constituents are in a
head-argument relationship require far less guesswork in how they should be interpreted: in
standard synthetic compounds, the modifier typically fills the first internal argument slot of the
head. Consequently, compounds such as groupement phosphate (phosphate grouping) and
administrateur réseau (network administrator), both instances of the ARGUMENT relation, are
semantically complete because relational information is present in the head constituent. In other
words, the predicate linking the elements is explicit. This is in stark contrast with most other
types of compounds where determining what relation might hold between their elements is not
immediately apparent. Only prototypical attributive compounds (i.e. those involving adjectives
or nouns with adjectival functionality) might allow for similar arguments to be made, although
the locus of the relation typically lies with the non-head element. Even coordinate compounds,
which are clearly intersective, still require that a semantic relation be established, namely one of
identity. This relation does not emerge from either constituent, but instead surfaces once they
are combined. In other words, nothing in the meaning representations of either singer or
songwriter indicates that a coordination of elements will emerge when they are combined; this
information is instead gleaned from the combination itself (i.e. they are co-hyponyms, they
share semantic features, etc.).
An interesting corollary to treating argument based compounds as the most transparent type is
that traditional French synthetic compounds, such as ouvre-bouteille and lave-vaisselle, are also
granted a great deal of transparency. This consequence seems entirely tenable given the fact that
this type of compound is both productive and frequent138, and poses few comprehension
challenges despite the fact that they possess zero-affix heads (‘a V-N is an artefact that does V
to N’). If one adheres to a strict semantic definition of centricity, these compounds typically fail
the IS-A test, which might lead to an exocentric treatment139, but the nature of the relation
would still ensure that they are granted the highest degree of semantic transparency within their
class. Exactly how they would be featured in the typology proposed here, however, is a matter
of future research.
138
In the original data retained from Wiktionary, there are 885 V-N compounds. 139
This is not to say that V-N compounds should in fact be treated as exocentric, as the phonologically unrealized head is functionally similar to the –er affix in English synthetics (Lieber 1992).
266
Similarly, some purposive compounds are also semantically complete, although to a lesser
degree than typical synthetic compounds. When compounds involving the PURPOSE relation
target the head constituent’s proper function (PF), the exact nature of the relation is available via
the head. The difference between these types of compounds and synthetic compounds is that in
the latter the non-head element fills an argument slot of the head, while in the former, the non-
head is an argument of an unexpressed verbal predicate. Although synthetic compounds nearly
always allow for a paraphrase using of/de (as in 175a; cf. Jackendoff 2010), purposive
constructions typically use for/pour (as in 175b). Another distinction is that that the verbal
predicate may be morphologically distinct from the head for purposive compounds—this is not
generally the case for true synthetics. The following examples from English illustrate these
differences:
(175) a. deverbal head: truck driver driver of truck / person who drives a truck
snow removal removal of snow / act of removing snow
b. purpose (PF): fish bowl bowl ?of/for fish (bowl that holds fish)
bread knife knife *of/for bread (knife that cuts bread)
Despite the additional step required to establish how purpose links together the constituents, we
may argue that compounds involving the head’s proper function might prove easier to interpret
by speakers than other types. Compounds such as abri-vent, passage piétons and timbre-poste
are therefore closer to synthetic compounds in terms of transparency than compounds involving
other predicating relations.
In summary, the relations discussed above can be ordered as follows, from most transparent to
least transparent:
(176) Relational Transparency Hierarchy:
argument > purpose (PF) > adjectival > intersective > similarity
7.2.2.2 Ordering Subordinate Relations
What of the remaining subordinate relations? How might they be ordered? At the heart of this
question is how these relations might be said to emerge in the first place. Some authors argue
that compound relations are entirely a matter of pragmatics, which certainly explains the
267
presence of less frequent or even idiosyncratic relations (Downing 1977, Bell and Schäfer
2013). While it is true that some novel compounds are indeed coined relative to the context of
their utterance (see Downing’s 1977 bike-girl example), many come into being simply because
the thing they are meant to denote is best described using a particular pair (or more) of words.
As Bolinger (1975) says (cited in Downing 1977):
“Words are not coined in order to extract the meanings of their elements and compile a new meaning from them. The new meaning is there FIRST, and the coiner is looking for the best way to express it without going to too much trouble.” (Bolinger 1975: 109)
Thus, the relation that binds a pair of words, while not entirely pre-determined, may be
surprisingly predictable. Wisniewski’s (1996, 1997) tests show that when two lexemes are
combined, the relation that is most likely to arise depends on how the hearer perceives their
relatedness. In other words, highly similar items will favour a property or hybrid reading (i.e.
zebra fish = animal-animal, resemblance), while different items will favour a relational reading
(i.e. bee sting = animal-act, cause). Although this approach allows for predictions to be made
regarding the emergence of top-level relations (i.e. subordinate, attributive, or coordinate), it
does not tell us exactly what sub-type might be actualized.
Another problem is that conceptual relatedness may not in fact be a very good predictor of
compound type. Experiments by Estes and Glucksberg (2000), for instance, show that otherwise
incompatible lexemes may result in a property interpretation if the modifier possesses a highly
salient feature capable of filling a relevant dimension in the head constituent. Their tests showed
that speakers favoured a property relation for feather luggage (i.e. ‘luggage that weighs very
little’), because “light” is a salient feature of feather and a luggage’s weight is considered highly
relevant. These results contrast with a combination such as feather storage, which they found
was far less likely to elicit a property reading from speakers because weight is not a relevant
feature of storage. According to Estes and Glucksberg, relational information depends on
features of both the head and the modifier, with one being incorporated into the other’s frame.
This model, often called a schema or slot-filling theory, is widely adopted and is at the heart of
several theories of conceptual combinations (among others, Wisniewski 1996, Estes and
Glucksberg 2000, Baroni et al. 2007). We may also draw parallels between this approach and
other models in combinatorial semantics (cf. Pustejovsky 1995).
268
One question that arises is whether the locus of relational information is in the head or the
modifier. The standard approach is for the modifier to fill a relevant slot in the head constituent
(cf. feather luggage discussed above). Even in instances where the association between elements
is subordinative, if the relation may be derived from one of the elements, it most likely
originates from the head, similar to what was discussed earlier regarding proper functions (e.g.
fish bowl; a bowl is meant to hold something → a bowl that holds fish). On the other hand, some
have argued that relational information resides instead in the modifier. Gagné (2001), for
instance, argues that “relations are associated with the modifier’s representation, rather than
existing as independent structures’’ (247). This approach is based on research by Gagné and
Shoben (1997) that showed that the interpretation of a compound is facilitated by how
frequently a particular relation is used with the modifier. Speakers found compounds involving
the lexeme mountain far easier to interpret when they made use of a relation frequently used
with that lexeme (LOCATION: mountain cloud) than when they involved an infrequent relation
(TOPIC: mountain magazine). They call the set of a modifier’s relations its relational
distribution.
The examples provided by Gagné (2001), however, only serve to underline how the source of
relational information is perhaps not so static. In mountain cloud, location may indeed stem
from the modifier, but in mountain magazine, topic most likely originates from the head.
Another aspect that challenges Gagné’s approach is that it requires that many lexemes possess
an almost infinite number of relations. Estes and Jones (2006), who argue that “relations
constitute representational structures in and of themselves” (90), offer the following examples to
illustrate just how various the relations stemming from the concept bear in modifier position
would need to be:
(177) bear paw (part/whole), bear scare (causal), bear season (temporal), bear toy
(possessive), bear tracks (from), bear cave (habitat), bear cub (subtype), bear family
(of), bear story (about), bear playground (for)
Although Estes and Jones do not believe that relations are part of individual lexemes’ semantic
or conceptual representation, their examples lend support to an approach that favours the head
over the modifier as the locus of the instantiated relations. This is not to say, however, that the
modifier plays no role in the validation of such relations, as certain lexemes in modifier position
269
will influence the type of relation available. Materials or substances, for instance, will favour a
composition reading when in modifier position (e.g. copper tube, rubber duck, paper bag, etc.).
Unfortunately, even this remains a tendency and not an immutable fact as compounds with
identical material or substance-like modifiers may differ in their use of relations (e.g. milk
chocolate: PART, milk glass: PURPOSE).
The following sections examine ways in which we might be able to rank subordinate relations
according to their degree of semantic transparency.
7.2.2.3 Source of the Relation
Given the previous discussion on the source of semantic relations within compounds, we might
wish to draw a hard line between modifier and head as factors in transparency. According to
Baroni et al. (2007), the head of a compound functions as the base of meaning composition if
the compound is relational, whereas the modifier fulfils this function for attributive compounds.
Intuitively, adjectival compounds do seem to depend on the modifier as the source of the
association—such is the nature of determination. The SIMILARITY relation also seems to function
in this manner, as a salient property of the modifier is applied to the head (e.g. a zebra fish is a
fish with stripes, stripes being a zebra’s most salient feature). In this way, the relation may be
said to originate in the modifier. As for subordinate relations, which are relational in Baroni et
al.’s framework, we might ask if the head is indeed a reliable predictor of compound type.
Unfortunately, there is no shortage of compounds that challenge this approach to the source of a
compound’s relation. Consider, for instance, the following examples:
(178) a. stylo-bille ‘Stylo dont la plume est remplacée par une fine bille de métal’
Relation: PART Source: bille? (modifier)
b. chêne kermès ‘chêne qui abrite les kermès’
Relation: LOCATION Source: chêne? (head)
c. bouton-pression ‘bouton engagé à l’aide de pression’
Relation: USE Source: ?
270
In (178a), the PART relation may originate from the modifier, bille, but only if we understand it
as a component object. Although this is not entirely implausible, it does require that the meaning
representation of bille be either extended or underspecified. Similarly, in (178b), we may trace
the LOCATION relation to the head constituent, chêne, but again, the lexeme’s semantic
representation does not explicitly condition this association. This is further evidenced by the fact
that the three other instances of plant-animal compounds in the data do not involve a locative
relation (e.g. chou-vache, fougère aigle, menthe-coq). If we turn to bouton-pression in (178c),
we find a case where neither constituent may be said to bring about the USE relation. Such
compounds, while perhaps not representative, are certainly not exceptional: there are many
instances where the relation, when seen from the perspective of the whole, is entirely
reasonable, but when viewed from the point of view of the constituents, is not apparent without
significant coercion (e.g. code-barres, COMPOSITION; danse-poteau, USE; train-fantôme140,
LOCATION, etc.). Such cases suggest that relations may occasionally originate from the
combination itself and not from any individual constituent.
That said, many compounds do in fact make use of a relation associated with one of its
constituents. The compounds below in (179) may be contrasted with those in (178a-b) above:
(179) a. montre-bracelet PART source: bracelet (modifier)
b. sauce tomate SOURCE source: sauce (head)
In both cases, the relation in question seems to originate from one of the constituents. What we
notice, however, is that the direction of a relation’s application is largely a function of its source.
In other words, reversibility, as it was defined in Chapter 6, is tied to either the head or the
modifier’s role in producing the association. For instance, for locative compounds, the pattern
“X in/on/at Y” is generated when the non-head functions as the location, while the pattern “X
that Y is in/on/at” is the result of a locative head:
(180) a. manchot antipode ‘manchot situé aux antipodes’ source: modifier
b. poche-revolver ‘poche dans laquelle est situé un revolver’ source: head
140
“Train installé dans les parcs d’attractions [. . .] dans lequel les visiteurs viennent se faire peur avec [. . .] des apparitions lugubres.” (<http://fr.wiktionary.org/wiki/train_fant%C3%B4me>)
271
If the origin of the relation cannot always be predicted based on the semantics of either the
modifier or the head, might we instead look to reversibility as a possible means to further
differentiate between subordinate relations?
7.2.2.4 Reversibility
During the presentation of the semantic relations retained for this work, I discussed the
possibility that some relations be applied according to either a basic template (H REL M) or a
reversed template (H that M REL). It is therefore worth exploring if splitting subordinate
relations according to reversibility might allow for their classification in terms of transparency
effects. In other words, is the active use of a predicate more transparent than a passive one (or
vice versa)? Compare, for instance, the following pairs of compounds:
(181) a. piétin-échaudage = H CAUSES M arrêt maladie = H CAUSED BY M
b. appareil-photo = H PRODUCES M café-filtre = H PRODUCED BY M
c. bouton-pression = H USES M code machine = H USED BY M
While it may seem appealing to argue that the compounds in the first column are more
transparent than those in the second column, by virtue of their thematic relations, there is in fact
little evidence to support this position. In fact, we may argue that this approach is difficult to
adopt for a number of reasons.
First, although in some instances, a compound’s relation might be ambiguous with regard to its
direction of application (e.g. it is feasible that arrêt maladie might be understood as ‘arrêt qui
cause une maladie’), most cases are largely unambiguous, regardless of their application.
Second, there is no evidence to suggest that a reversed template, while perhaps incurring
additional processing costs, necessarily reduces a compound’s interpretability from the
perspective of the speaker. In other words, is beeswax (‘wax made by bees’) more opaque than
sweat gland (‘gland that makes sweat’) simply because its head is the theme and not the agent?
This question brings us to a third point, which is that some relations clearly favour an indirect
application. PART for instance, is predominantly applied as H that M is a part of—it is therefore
unlikely that this template is in fact the most opaque of the two available for this relation.
Furthermore, some relations don’t seem to favour either template: LOCATION, for example, is
nearly evenly split between its basic template and its reversed form. Given these facts, the most
272
prudent argument to make regarding reversibility is that it may affect transparency in a relative
manner, according to the relation involved. In other words, the reversed PART template for NN
and N à N compounds is more transparent than its basic form, but that no such distinction may
be made for the LOCATION relation. While this is useful for comparing compounds that make use
of the same semantic relation, it does little, however, to evaluate compounds across different
relational types.
7.2.2.5 Frequency
The relative frequency of certain relations might also be considered a likely indicator of
transparency: compounds that make use of highly frequent relations could be more transparent
than those involving low frequency relations. The distribution of relations across the data was
provided in Chapter 6. The results for unranked subordinate relations are repeated below:
Table 7.2. Number of compounds in the data for each subordinate relation.
Relation NN N à N LOCATION 56 27 PART 47 69 USE 32 45 SOURCE 12 6 PRODUCTION 13 18 TOPIC 8 0 CAUSE 7 5 TIME 3 2 POSSESSION 0 6
Although this approach is appealing for a number of reasons, namely the ease with which it can
be incorporated into a transparency model, it unfortunately fails to take into account that not all
noun pairs will allow for any and all of these relations. While the head may govern the
emergence of a given relation, the modifier must fulfill the requirements of the relation as it
pertains to the head. Thus, some compounds are unlikely to make use of highly frequent
relations if they are incompatible with the semantic features of their constituents. It is unlikely,
for instance, that a compound headed by an animate being would involve the topic relation.
Conversely, compounds headed by items with informational content are likely to involve this
relation (i.e. history book, news magazine, horror story, etc.).
273
This is in fact where the SRI discussed in Chapter 4 proves most useful. By calculating the
distribution of relations for various templates, we may then compare relations and compounds
based on their relative pertinence for a given compound. In other words, compounds involving
two different relations may be equally transparent if these relations are equally pertinent for
their respective heads. Conversely, compounds that make use of the same relation may possess
different degrees of transparency if that relation is more frequent for one than the other. To
illustrate this point, the following table contains 3 different compounds that make use of the
PART relation, each of which differs according to their calculated SRI141:
What the table shows is that a relation may carry more weight for certain compounds than for
others. Although PART is a frequent relation overall, its relative importance varies according to
the head noun. We may hypothesize that compounds that make use of PART alongside the noun
voiture may be more transparent that those involving the noun bloc. This is especially true in
cases where a template clearly favours a particular relation even if that relation is not frequent
across all compound types142. For instance, compounds involving sauce as their head noun (e.g.
sauce tomate, sauce soja, sauce arachide, etc.) make extensive use of the SOURCE relations,
which is evidenced by its high SRI (0.500 over 14 compounds). These numbers contrast with
SOURCE’s overall low frequency (14 occurrences across 729 NN compounds, or approximately
2%). The presence of the SOURCE relation is no doubt governed, to some degree, by the nature
of the head (i.e. a sauce is made from something) and the ability of the modifier to fill its
corresponding source/composition slot. It would therefore be unwise to state that a compound
141
The SRI is calculated by dividing the number of compounds for a given template (i.e. template N-X) that involve the same semantic relation by the total number of compounds for that template. In the table above, the SRI was calculated using compounds listed under the head lexeme’s entry in LPR2010, in Arnaud (2003), as well as in the data originally collected from Wiktionary. The retained templates were selected based on the total number of items from all three sources. 142
Given the low number of templates identified in the Wiktionary data, it must be noted that some compounds may not allow for this sort of analysis. It is presumed that a large dataset is required for the SRI to be truly effective.
Compound Relation # of types of RELATION
# of types of N-X SRI
bateau-pompe PART REVERSED
H that has M as a part
3 13 0.231 bloc-ressort 2 11 0.182 voiture-radio 5 13 0.385
274
such as sauce tomate has a low semantic transparency rating by virtue of the SOURCE relation’s
low frequency across other compound types. Subsequently, the evaluation of relations according
to their frequency for a given template allows for semantic affinity to be judged without
establishing lexical and schematic representations for all nouns. In other words, if CAUSE
frequently occurs with template N-X, then N and X are clearly compatible in a way that allows
for that relation to emerge. This is not to say that assessing compatibility should be ignored, but
simply that identifying relations for a large set of compounds would arguably reveal similar
information regarding their shared features. This compatibility might be characterized in a
number of ways, such as in the conceptual classes of the compound’s elements (e.g. artefact,
animal, food, etc. cf. Maguire et al. 2010) or the feature sets of their semantic representations
(cf. qualia in Pustejovsky 1995, semantic body in Lieber 2004). Thus, a compound such as
pumpkin-squash might involve a coordinative relation because this is the most likely association
for PLANT-PLANT combinations. Although the matter clearly warrants further research, we may
state here that the degree of semantic transparency observed for subordinate compounds can be
evaluated based on the frequency of the relation within the set of compounds involving similar
lexemes. This evaluation may be achieved by using the SRI equation described in Chapter 4.
7.2.3 Summary
Based on the observations made above, relational information for compounds may be ranked
according to the following hierarchy:
(182) a. ARGUMENT
b. PURPOSE (proper function)
c. adjectival relation
c. COORDINATION, HYPERNYM, FUNCTION (as well as classificatory143 relations)
d. Subordinate Relations ordered according to the compound’s SRI
e. SIMILARITY
f. Idiosyncratic Relations
143
As has already been mentioned, the classificatory relation, as observed in the data, does not seem to involve highly transparent compounds. Although this relation shares much in common with other attributive relations, it falls under the partially compositional node of the hierarchy and therefore does not compete with other relations in terms of transparency. It is included here for the sake of completeness.
275
One will have noticed that the SIMILARITY relation, which was said to be attributive in nature, is
ranked lower than subordinate relations. The reason for this is that, as was discussed earlier,
SIMILARITY is an underspecified relation: any number of attributes or properties may be at play
in these compounds, which arguably reduces their degree of interpretability. Given the large
number of possible realizations for this relation (e.g. shaped like, the color of, smells like, tastes
like, etc.), I assume that it is in fact less transparent than other basic predicative relations (e.g.
cause, produce, part of, etc.).
It should also be noted that at the lowest point on this list are idiosyncratic relations. These are
associations that do not correspond to any of the relations proposed here. Distinguishing
between a set of basic relations for compounds, which is to say highly recurring and elementary
relations, and those that only occur in a select few compounds allows us to hypothesize that the
latter are less transparent than others because they require that meaning be established using
knowledge not typically observed for other compound types. Therefore, compounds such as
avantage choc, capital-risque and laine renaissance, although strongly endocentric and fully
compositional, may present challenges not necessarily present in compounds that involve
recurring relations if these are taken to be a fundamental component of their word formation.
To conclude, compounds may be ordered according to their degree of semantic transparency,
which is determined based on several factors. Consequently, the proposed hierarchy takes into
account the factors discussed in Chapter 4, namely centricity and compositionality. Moreover,
compounds may, to some degree, be classified using the semantic relations that join their
elements. Some of these relations are more basic than others, according to either the nature of
the relationship (i.e. intersective) or the completeness of their semantics (i.e. argumental).
Subordinate relations, which are defined as relations in which one constituent is dependent of
the other, may be further classified, but only when taking into account the semantic
homogeneity of their templates.
The following section re-examines the compounds extracted from Wiktionary in light of the
typology proposed here.
276
7.3 The Semantic Transparency of French Compounds
This section focuses on examining attested French compounds according to their place within
the typology outlined in Section 7.2. The emphasis is on offering examples for each of the
transparency profiles discussed above, while expanding on the role of semantic relations at
individual levels of the typology.
7.3.1 Canonical Endocentric Compounds
The vast majority of compounds present in the data are left-headed, endocentric compounds. Of
the 666 NN endocentric compounds identified, 595 (or 81%) have their head in canonical
position144. In the case of N à N endocentrics, all cases are left-headed and are therefore
canonical according to the definition provided in Chapter 4. We may, however, further
distinguish this subset of compounds according to other features, such as compositionality and
relationship type.
7.3.1.1 Strongly Endocentric, Fully Compositional: passage piétons and
boîte à outils
Most of the endocentric compounds in the data are both strongly endocentric, which is to say
that the head may be understood literally, and fully compositional. By fully compositional, I
mean that the non-head element contributes meaning to the whole in a literal sense. According
to the features explored in this work, this particular type represents both the most semantically
transparent and the most frequent instance of compounds. The following table shows how these
features compare across all the compounds retained:
NN N à N
# of compounds 729 319
# of +canon, +strong, +compositional 491 240
% of +canon, +strong, +compositional 67.3% 75.2%
144
Coordinated compounds are considered canonically headed and are therefore included in this count.
277
Within this class of compound, we may distinguish between the following types, ordered for
transparency according to the hierarchy of semantic relations discussed in Section 7.2.2:
Table 7.3. +canon, +strong, fully compositional compounds ordered according to relations.
NN N à N Relation
groupement phosphate mise à niveau ARGUMENT
passage piétons boîte à outils PURPOSE (PF)
grandeur nature --- adjectival
auteur-compositeur banane plantain circuit tampon
--- --- ---
COORDINATION HYPERNYMY FUNCTION
various subordinate relations
chou-fleur escalier à vis SIMILARITY
médicament conseil logiciel à contribution idiosyncratic
Table 7.3 shows how strongly endocentric, fully compositional compounds might be ranked in
terms of their degree of semantic transparency when taking into account their relational
associations. Intersective compounds not involving adjectival nouns are grouped together—no
attempt is made here to rank these relations. Further research into the matter might provide
additional insight on how these relations might be contrasted in terms of the challenges they
pose at the level of meaning construal. Subordinate compounds are not listed in the table above
and will be discussed in a moment. Compounds based on the SIMILARITY relation are ranked
lower according to the arguments presented in Section 7.2.2.2.
At the very bottom of the list are compounds that make use of idiosyncratic relations, which is
to say relations that cannot be said to be basic or recurrent. Médicament conseil serves as a
rather suitable example of this type of compound: its constituents are connected by the semantic
template H purchased based on M (of W). In the case of logiciel à contribution, the relation is
best described as financially supported by.
Compounds involving subordinate relations are unranked, but may be compared according to
their localized frequency (i.e. within compound templates). As was discussed in Section 7.2.2.5,
278
when used in conjunction with the retained semantic relations, the SRI can provide an indication
of how frequent a particular relation is for a given compound template. The table below offers
SRI data for NN compounds involved in high frequency templates with the head element
serving as its base. These compounds were selected according to the number types for each
template, so that SRI calculations were based on a similar number of compounds (which range
from 13 to 17 per template)145.
Table 7.4. SRI values for compounds within recurring N-X templates.
Compound Relation SRI of C Average SRI of X-N
sauce tomate SOURCE 0.500 0.327
effet revenu CAUSE (rev) 0.438 0.352
voiture radio PART (rev) 0.385 0.290
voiture salon LOCATION (rev) 0.308 0.290
bateau pompe PART (rev) 0.231 0.195
carte adresse LOCATION (rev) 0.176 0.135
bateau pirate USE (rev) 0.077 0.195
carte senior USE (rev) 0.059 0.135
What the table above shows is that it is not immediately clear whether SRI values are in fact a
useful metric for comparing degrees of transparency. Intuitively, none of the compounds in
Table 7.4 is particularly difficult to understand, nor are they wildly different in their use of
relations. It therefore seems highly doubtful that these numbers should serve, under most
circumstances, as indicators of transparency. That said, in extreme cases such as sauce-N, the
SRI does highlight instances where a particular relation is either highly dominant or marginal,
which can be used to assess transparency. Moreover, when used in conjunction with a
template’s average SRI146, a compound’s SRI value will indicate how it compares to the
template’s set homogeneity. Most of the compounds in Table 7.4 have higher than average SRI
values, which suggests that their relational meaning is dominant across other similar
145
Once again, because the number of recurring templates in the data taken from Wiktionary is quite low, the reported SRI here makes use of other sources, namely Arnaud (2003) and LPR2010. See Appendix A for a list of the compounds used. 146
A template’s SRI is the average of all SRI values for that template’s compounds.
279
compounds. Only two of the compounds listed here, carte senior and bateau pirate, have below
average SRI values, both of which seem to run counter to these arguments given how intuitively
easy they are to understand.
Setting aside for a moment the fact that these compounds involve the low frequency relation
USE, carte senior’s low SRI may in fact be explained by carte’s polysemous nature: within the
17 carte-N compounds examined, three different heads were identified:
(183) a. petit carton rectangulaire... ex. carte-adresse, carte-senior
b. représentation à échelle réduite... ex. carte météo, carte radar
c. circuit imprimé ex. carte-tuner, carte mère
If the SRI were only calculated using synonymous head nouns, carte-senior would possess a
higher SRI value, which might better correspond with one’s intuitions regarding this compound.
This approach, however, fails to take into account the actual task of disambiguating the head
noun within the compound. While the method adopted here is meant to account for “noise” in
the shared space of a compound’s template, only rigorous testing of these types with speakers is
likely to reveal which of the two approaches is a better indicator of transparency.
As for bateau-pirate, it is clear that USE is not a dominant relation for this template, despite the
head noun’s artefactual nature. According to the data, bateau-N compounds favour FUNCTION
(e.g. bateau-lavoir) and PART (e.g. bateau-pompe) relations, which results in a low SRI for all
other types. Does bateau-pirate possess a lower degree of transparency then? Perhaps not, given
that the constituents share the same semantic space (i.e. pirates are chiefly understood in terms
of ships and open waters). This is in fact where the SRI most obviously fails, as it does not
explicitly account for the relatedness of constituents. In this regard, the concept of template
homogeneity would need to be augmented with additional factors in order to more accurately
model the probability that a particular relation should emerge for a pair of lexical items. One
such method might involve the incorporation of individual constitutents’ conceptual classes, an
approach that Ryder (1994) had in fact explored.
Despite these shortcomings, a comparison of semantic relations across similar compound types
may nevertheless reveal subtle differences within individual configurations or profiles. Whether
280
these differences ultimately affect a compound’s degree of transparency remains, for the time
being, unclear.
7.3.1.2 Strongly Endocentric, Weakly Compositional: mot-clé and piano
à queue
Weakly compositional compounds are those for which the non-head contributes meaning via a
trope such as metaphor or metonymy. A total of 75 NN and 25 N à N compounds were found to
possess these features, examples of which are as follows:
(184) a. NN cocotte-minute, crème fleurette, laine renaissance, mot-clé, plan cul
b. N à N arbre à cornichons, boîte à pet, nom à penture, piano à queue,
serpent à lunettes
These compounds may involve any of the non-equative relations, but require that meaning be
determined using a non-established trope:
(185) a. laine renaissance → laine SOURCE [metaphor renaissance]
b. arbre à cornichons → arbre PRODUCTION [metaphor cornichons]
Because of this additional layer of meaning, these types are taken to possess a greater degree of
opacity than the fully compositional compounds discussed above.
7.3.1.3 Strongly Endocentric, Partially Compositional: bateau-mouche
An endocentric compound is partially compositional if its modifier does not contribute meaning
to the whole. It is considered partially compositional because the head retains its meaning. In
most cases, there is no basic relation between the compound’s constituents (as in 186a). The
exception to this case are so-called classificatory compounds (as in 186b), which may be
paraphrased as “H of type M”. In total, 17 NN endocentric compounds exhibiting partial
compositionality were found in the data.
(186) a. bateau-mouche, belote contrée, conducteur fantôme, laurier-tin
b. mâle alpha, particule bêta, rayon gamma
281
No such cases were identified for N à N compounds: if a modifier cannot be understood
literally, it can typically be motivated via sense extension (see Section 7.3.1.2). It’s unclear why
this is the case, but the most obvious explanation is that the preposition imposes a
meaningfulness constraint on the element it introduces, thereby limiting any semantic shift
toward opacity that might occur.
Also treated as partially compositional are compounds involving common nouns that originated
as proper nouns. The modifier is therefore purely referential and does not possess a sense.
Unlike other partially compositional compounds, these cases may involve idiosyncratic relations
issuing from the proper noun’s relationship to the head:
(187) a. rose noisette rose DISCOVERED BY Noisette147
b. valet-à-patin valet INVENTED BY Patin148
In all cases, however, the chief features of the compound are available via the head element, but
with no immediate means to further restrict the set of its designatum. In other words, the listener
may correctly conclude that rose noisette (or bateau-mouche, etc.) is a type of rose, but will be
unable to determine how it may be distinguished from other types because of the modifier’s
referential or inexistent semantic content.
7.3.1.4 Weakly Endocentric: valse-hésitation and poire à poudre
Weakly endocentric compounds are compounds for which the head constituent retains meaning
via a well established trope that typically does not violate the IS-A condition149. As was
discussed in Chapter 4, weak centricity seems to constrain the modifier, in that the latter must
147
“[L]a Rose Noisette, variété à laquelle on a donné ce nom en l’honneur de M. Noisette, qui, le premier, a fait connaître cette charmante fleur” (Jardin de France, Vol. 37, 1846: 319). 148
“On lui a donné le nom de valet, parce qu’il sert de lui même comme serviteur ; de à Patin, du nom de celui à qui on en attribue l’invention” (Dictionaire des sciences médicales, 1821, Volume 56, p. 493). Despite this dictionary’s claim regarding the meaning of valet, which would suggest that it be treated as weakly endocentric, I am treating it as an ordinary endocentric as valet may now mean ‘tool’ (see entry B in TLFi or entry III in LPR2010). This acceptation therefore corresponds to the meaning of valet-à-patin, albeit with a highly ambiguous referent (i.e. the compound does not allow one to determine just what kind of tool it refers to). 149
The test often requires that it be weakened : C is like an H (e.g. une valse-hésitation est comme une valse).
282
contribute meaning to the whole literally. This observation contrasts with Benzces’s (2006)
findings for English, where tropes may operate simultaneously on both constituents, although
her research focused on compounds traditionally viewed as exocentric.
The following NN and N à N compounds can be considered representative of this type:
(188) a. valse-hésitation ‘suite de décisions, d'actes contradictoires’
b. poire à poudre ‘petite gourde de cuir bouilli [. . .] dans laquelle on mettait de la
poudre de chasse’
Although there are too few such cases to allow for definitive conclusions in terms of the
relations involved, the different types observed suggest that most relations are likely available to
tropic compounds (COMPOSITION and LOCATION in 188a-b respectively). Other examples are as
follows:
(189) a. pomme-cajou, site internet, page web
b. cheval à bascule, échelle à poissons, piège à cons, tête à claques
Although technically endocentric, the compounds identified here no doubt pose a greater
challenge at the level of interpretation than literal endocentrics, but they most likely remain
easier to understand than true exocentrics.
It bears repeating that not all instances of sense extension are equivalent at the cognitive level.
Some metaphors are no doubt easier to decipher than others, and metonymy might be less
challenging than most metaphors. As I stated in Chapter 4, however, these attributes are set
aside here in favour of a narrow typology, where flagging instances of tropes is taken as a
sufficient indicator of reduced transparency. That said, nothing about the approach adopted here
prevents future models from further discriminating between compounds whenever sense
extension is present.
7.3.2 Non-Canonical Endocentric Compounds
Only NN compounds may involve non-canonical heads, which is to say compounds for which
the head element occupies the least frequent position for a given compound type. As I showed in
Chapter 4, French nominal compounds are typically left-headed, but may occasionally be right-
283
headed. Under the typology proposed here, non-canonical heads are said to reduce semantic
transparency and are considered a major axis of the hierarchy. The data reveal that, like
canonically headed compounds, right-headed compounds may include several configurations
based on centricity and compositionality.
7.3.2.1 Strongly Endocentric, Fully Compositional: bracelet-montre
Most right-headed compounds are strongly endocentric and fully compositional, which is
unsurprising given that canonically headed compounds favour this configuration as well. Of the
71 right-headed NN compounds present in the data, 62 (or 86%) fall under this category.
Interestingly, most of the relations from Chapter 5 are accounted for, suggesting that head
position does not impose any restrictions on what kind of associations are available to
compounds with non-canonical heads. The following table contains examples for each of the
relations present in the data:
Table 7.5. Strongly endocentric, fully compositional NN compounds, ordered by relation.
NN Relation
photo-interprétation ARGUMENT
auto-école PURPOSE (PF)
maître-cylindre adjectival
lieutenant-général colloid-calcite test match
COORDINATION HYPERNYMY FUNCTION
bracelet-montre fan-club panier-repas art-thérapie ciné-club agora-phobie
PART COMPOSITION LOCATION USE TOPIC CAUSE
taupe-grillon SIMILARITY
clin-foc idiosyncratic
Although subordinate relations such as PRODUCTION and TIME were not present in the data, it is
likely that a larger dataset would contain compounds that make use of these associations. It
should also be noted that the subordinate relations in the table above are unranked as few of
284
these compounds are involved in templates frequent enough to calculate an SRI. Nevertheless,
the listed compounds represent the most transparent of non-canonically headed NN compounds,
with a possible differentiation based on relation type.
7.3.2.2 Strongly Endocentric, Weakly Compositional: reine-marguerite
Very few of the right-headed compounds in the data involve a trope on the modifier, but some
cases were nevertheless identified. The compound reine-marguerite is one such case, where
reine is related to the flower’s extreme beauty150. Another instance is mule-jenny, borrowed
from English and for which the modifier mule is taken to mean ‘hybrid151.’ In either case, this
type of compound will most likely prove more difficult to fully understand than those listed in
section 7.3.2.1 given that the modifier’s semantic contribution is weakened by the presence of a
metaphor.
7.3.2.3 Strongly Endocentric, Partially Compositional: aube-vigne
Partially compositional right-headed compounds contain modifiers that, for one of several
reasons, do not contribute meaning to the whole. In the case of aube-vigne, for instance, its
etymology points to a deformation of its Latin origins (see Note 109109 in Chapter 6 for details
on this compound). Only 8 such cases were present in the data. As the following examples
show, some of these types of compounds are in fact English loanwords that have been adapted
to French:
(190) boule-dogue, lime uranite, quartier-maître
This particular type of compound can therefore be explained using etymological facts: they are
right-headed because they are originally from a language with right-headed compounds (i.e.
Latin and English) and their modifiers do not contribute meaning because they are deformations
150
“[. . .] ils convinrent de lui donner le nom reine-marguerite, en considération de sa beauté et de sa ressemblance avec nos marguerites” (Tessier et al. 1787: 710) 151
“Empr. à l’angl. mule-jenny (1792 ds NED) comp. de jenny* et de mule, issu du fr. mule*, employé au sens de « hybride » pour désigner une machine combinant des systèmes empruntés à deux types de machines différents” (TLFi).
285
or calques of foreign words (e.g. eng. bull → boule, eng. quarter → quartier). These compounds
are presumed to possess a greater degree of opacity than those discussed above given that the
only component that contributes meaning is a non-canonical head.
7.3.2.4 Weakly Endocentric: vidéo-lynchage
Right-headed compounds that might be considered weakly endocentric, yet fully compositional,
are not numerous. In fact, only vidéo-lynchage seems to fit this particular set of criteria, which is
defined as follows:
(191) Activité consistant à filmer une personne à son insu au moyen d'un téléphone cellulaire
pendant qu'on la bat pour ensuite diffuser par Internet la vidéo ainsi obtenue152.
We may hypothesize that compounds with non-canonical heads are not likely to involve tropes
on the modifying element as they are already difficult to process. In the case of vidéo-lynchage,
the preposed modifier is not quite as troublesome because of its prefix-like functionality, which
allows for the correct morphological parsing to occur. One must nevertheless make sense of the
metaphoric head, a task that will influence how the compound is interpreted if no contextual
support is provided.
7.3.3 Exocentric Compounds
In terms of semantic transparency, exocentric compounds may be opposed to endocentric
compounds by an inadequate denotation of the head: while an endocentric compound denotes a
hyponym of its the head, no such relation may be established for exocentrics. Based on this
distinction, we may state that exocentric compounds are inherently more opaque constructions
than their endocentric counterparts. This does not mean, however, that all exocentric compounds
should be viewed as equally opaque. Compositionality, for instance, remains a factor for many
of these types of compounds, which also means that semantic relations may be present in
exocentric compounds. In the following sections, we look at each of these cases in turn.
152
<http://fr.wiktionary.org/wiki/vidéo-lynchage>
286
7.3.3.1 Fully Compositional Exocentric: ballon-panier and pied à boule
Even if a compound is exocentric, its constituents may still contribute meaning to that of the
whole. In this sense, they are fully compositional. Although several of these compounds involve
idiosyncratic relations that are not easily paraphrased using basic predicates, a number of them
do make use of some of the relations discussed previously. Table 7.6 contains examples of the
relations observed for exocentric NN compounds:
Table 7.6. Exocentric NN compounds involving basic semantic relations.
Compound Hypernym Relation
mort-chien plant ARGUMENT
jambon-beurre sandwich COORDINATION
poisson-évêque mythical creature SIMILARITY
ballon-panier game LOCATION/PURPOSE
chèvre-pied mythical creature PART
chiffre-taxe stamp/ticket TOPIC
lac-laque residue COMPOSITION
The compounds listed above represent examples of the typology’s most semantically transparent
of exocentrics: although they may not possess semantic heads, their constituents all provide
meaning in non-trivial ways. It should be noted, however, that not all relations retained were
observed for exocentric compounds. Given the small number of basic relations identified for this
class (i.e. 19/37 for NN, 12/20 for N à N153), I will refrain from making any claims with regards
to either possible or impossible associations. A larger dataset might offer more conclusive
evidence with regards to semantic relations within exocentric compounds, but we may
nevertheless presume that exocentrics do not block the use of basic relations.
A few exocentric and compositional compounds make use of idiosyncratic relations to connect
their constituents. The compound année-lumière might be used to represent not only this class
of compound, but also a specific subtype involving various units of measurement:
153
The numbers reported here are only for exocentric compounds for which both constituents retain their literal meaning.
287
(192) électron-volt, kilogramme-force, kilomètre-heure, tonne-mètre
Although one might be tempted to state that the relation for such compounds is one used to
express a rate (i.e. kilomètre par heure), this approach is not always correct (e.g. *électron par
volt). Other exocentric compounds that remain fully compositional and make use of non
standard relations, while not numerous, are present in the data. Examples are as follows:
(193) a. cheval-vapeur ‘unité de mesure de la force d’une machine à vapeur
selon la force excercée par un cheval154’
b. face à main ‘binocle à manche que l’on tient à la main155’
Although exocentric N à N compounds may also be fully compositional and make use of
various basic relations, these relations are not as well represented. This may be partly related to
the fact that, in general, N à N compounds involve fewer basic relations than their NN
counterparts. Only ARGUMENT and LOCATION are present for compositional exocentric N à N
compounds:
Table 7.7. Exocentric N à N compounds involving basic semantic relations.
Compound Hypernym Relation
mise à pied termination (employment) ARGUMENT
pied à boule warning (bowling) LOCATION
Because of the low number of tokens for either type of compound, relations are not ranked
based on SRI values. Although they are treated together here, this is not to say that the SRI
cannot be used to differentiate between exocentric compounds. As was already discussed in
Chapter 4, compounds patterned on the template N-lumière, where N is a period of time, are all
analogically based on année-lumière and therefore share the same, albeit idiosyncratic, relation.
154
This is an approximate paraphrase that does not take into account how this unit is actually calculated : “La force d’un cheval vapeur équivaut à 75kil. élevés à la hauteur d’un mètre par seconde, mais la force réelle d’un cheval vivant ne représente pas plus de 50 kil. élevés à la même hauteur pendant le même espace de temps” (Dictionnaire du commerce et des marchandises, tome 2, 1839, p. 1401). 155
LPR2010.
288
It is likely that a speaker familiar with the base of the analogy will find any of the other forms
easy to interpret, which might therefore be predicted by its higher than average SRI156.
7.3.3.2 Weakly Compositional: radio-trottoir and cage à écureuil
An exocentric compound may also be weakly compositional if one or more of its constituents is
subject to sense extension. These compounds may manifest themselves in several ways,
depending on which element undergoes a tropic shift:
(194) a. trope on X, Y is literal bec-figue / moulin à parole
b. X is literal, trope on Y ---
c. trope on X, trope on Y radio-trottoir / cage à écureuil
One should recall that compounds that involve sense extension on the head element are only
considered (weakly) endocentric if the trope is established (i.e. listed as such in a lexicographic
work). Otherwise, the compound is understood as exocentric. Interestingly, no compounds were
observed in the data for which only the right-most element is subject to a trope. This may be
related to observations made in Chapter 4 regarding endocentrics and figurative non-heads, but
it is unclear how this could be formalized to also account for exocentrics. Again, a larger dataset
might provide additional information with which to offer a hypothesis regarding the relationship
between centricity and sense extension on the modifier. We may state, however, that the
compounds in (194a) (and 194b, as the case may be) are more transparent than those in (194c)
as the former only involve a single trope.
One should note that combinations in (194c) are instances for which metaphor (or metonymy)
applies to the individual elements and not the whole. In radio-trottoir (‘réseau de
communication personnel’), for instance, the metaphors involved target the individual
components: radio for communication, and trottoir for the means of transmission. The same
may be said of cage à écureuil (‘construction pour enfant’), where cage stands for the structure
based on physical resemblance and écureuil for children based on their actions and behaviour
156
Only six N-lumière constructions are present in the data, all of which mean ‘distance parcouru par la lumière en un(e) N’. Arnaud (2003) contains one additional compound sharing this pattern (prise lumière), but which differs in meaning, suggesting a near absolute SRI value for this template.
289
within said structure. This contrasts with instances where the metaphor applies to the whole
compound, which is to say combinations that contain individual lexemes understood literally,
but that together consist of a metaphor. The following examples represent such cases:
(195) pot à tabac (‘personne grosse et courte’), barbe à papa (‘confiserie’),
nid à rats (‘logement obscur et malpropre’)
Pot à tabac, for instance, in its non-literal use, refers to a short and portly individual, an allusion
to the general shape of the object that the compound denotes literally (‘pot destiné à contenir du
tabac’). The individual constituents, however, are not the source of meaning. These cases may
all be paraphrased as “an N that is like a C.” Although we might wish to treat these as non-
compositional compounds, which would highlight the fact the individual elements do not
contribute directly to the meaning to the whole, such an approach would fail to recognize that
these compounds are, in Svensson’s (2008) terms, “motivatable.” The meaning of the whole is
in fact related to its parts, albeit in a roundabout manner as the compound must be understood
figuratively. They are therefore treated here as weakly compositional based on the fact that they
can be motivated.
NN compounds that are exocentric on the one hand and rely on tropes on the other are in fact
uncommon in the data (7 items). Conversely, N à N compounds with these features, while not
very frequent, are much easier to pick out (13 items):
(196) abreuvoir à mouche (‘plaie’), moulin à paroles (‘personne bavarde’),
chair à canon (‘militaire en première ligne’), bouche à feu (‘canon’)
The compounds described above most likely possess a similar transparency profile to the
compositional exocentrics from the previous section. Like those, weakly compositional
exocentrics are motivated constructions. The chief difference, however, is that these cases make
use of sense extension in their semantic representations, which adds an additional layer of
complexity to the relationship between form and meaning. They are therefore considered
slightly less transparent than literal exocentrics, but more transparent than the two types that
follow.
290
7.3.3.3 Partially Compositional: chat-château and soda à pâte
Exocentric compounds are partially compositional if only one of their constituting elements
retains meaning. Although the most likely candidate for meaning retention is the right-most
constituent (as in 197a), the head may occasionally contribute to the meaning of the whole
(197b):
(197) a. or-sol, trou-madame Y unrelated
b. chat-château, bourg-épine X unrelated
Partially compositional compounds such as these do not typically involve any relation between
their elements. This is entirely expected as there is no way to bind a semantically irrelevant
constituent to a semantically relevant one.
Only one such case is found for N à N compounds, soda à pate, which is a calque of the English
baking soda. While this compound is in fact endocentric in English, the head soda should be
translated in French as soude. The borrowed term is therefore technically meaningless in
French, which might explain why the Office québécois de la langue française argues against its
usage157.
7.3.3.4 Non-Compositional: cap-mouton and sagne à tamis
Non-compositional exocentric compounds represent the most opaque type of compound. These
are typically instances of lexicalized combinations that can no longer be motivated on
synchronic grounds. Most of these compounds entered the lexicon long ago—any relationship
between the constituents and the meaning of the whole is no longer apparent without specific
knowledge of their etymological or historical origins. Examples are as follows:
(198) a. cap-mouton, chef-mois, compère-loriot, coq-souris, mont-joie
b. fauteuil à voile, manche-à-balle, sagne à tamis, ventre à choux
157
“Le terme soda à pâte, calque de l'anglais baking soda, est un emprunt entrant inutilement en concurrence avec les termes français existants” (OQLF, <http://gdt.oqlf.gouv.qc.ca/ficheOqlf.aspx?Id_Fiche=8871488>).
291
It is highly unlikely that a speaker, knowing only the meaning of the constituting elements,
would be able to interpret these compounds even if contextual information were provided. From
a purely semantic point of view, these compounds have more in common with simplex words
than they do with other compounds as their constituents provide no means with which to
accurately establish meaning for the whole. The relationship between form and meaning in these
instances is therefore non-existent.
7.4 Summary
In this chapter, I sought to synthesize the compound properties discussed in previous chapters,
focussing on how these features might combine into a more granular typology of semantic
transparency. The examination of the data from the perspective of this typology revealed a
diverse, if not disproportional set of compounds. Table 7.8 on the following page summarizes
the typology discussed in this chapter and contains example compounds for each combination of
factors. The contents of the table are ordered in descending order of transparency, which is to
say that the compounds at the top are more transparent than those at the bottom.
Given the limited number of compound templates present in the data, further subdivision based
on semantic relations have been omitted from the table. The reader is encouraged to consult
Section 7.3.1.1 for a discussion on how these relations apply to the largest class in the typology.
Based on the features and properties discussed, French compounds are accounted for in at least
twelve of the sixteen semantic transparency profiles proposed in Section 7.2. While not every
configuration proved relevant for the data examined, one should be careful not to take these
observations as undeniable proof that the twelve attested profiles represent the limits of
transparency classification. A study of additional French compounds, compound types (e.g. AN,
NA, N de N, etc.) or of compounds in other languages may very well provide evidence in
support of unattested configurations.
292
Table 7.8. Summary of compounds and their features, ordered by transparency.
Endo. Canonical Head
Strong Centricity Compositionality Compounds
+ + + Full passage piétons boîte à outils
+ + + Weak mot-clé piano à queue
+ + + Partial bateau-mouche ---
+ + − Full valse-hésitation poire à poudre
+ − + Full bracelet-montre ---
+ − + Weak reine-marguerite ---
+ − + Partial aube-vigne ---
+ − − Full vidéo-lynchage ---
− NA NA Full ballon-panier pied à boule
− NA NA Weak radio-trottoir cage à écureuil
− NA NA Partial chat-château soda à pâte
− NA NA Non cap-mouton sagne à tamis
It should also be noted that caution must be taken when comparing transparency across
compound types. In other words, there is no definite indication that we may objectively evaluate
the transparency of NN compounds alongside N à N compounds. While both types were
assessed and classified according to the typology, it is possible that one type shows a greater
overall degree of transparency than the other. We might for instance hypothesize that, all things
being equal, N à N compounds are more transparent than other types based on the fact that the
preposition was shown to be meaningful, which restricts both the possible relational associations
that may emerge and head position configurations. Even this line of reasoning, however, would
need to factor in distributional differences for relational categories (i.e. intersective relations are
not applicable for N à N compounds, but highly relevant for NN compounds), which makes
cross-type comparisons all the more difficult.
293
In summary, the proposed typology offers a richer and more granular approach to the concept of
semantic transparency. The features discussed, when combined, arguably better reflect the
numerous characteristics that determine a compound’s semantic representation from the
perspective of interpretation.
294
Chapter 8
Conclusion
The explicit aim of this thesis was to re-examine the concept of semantic transparency from the
perspective of compounds, the purpose of which was to propose a typology that offered both a
comprehensive and granular approach to the concept. To this end, a data set of French nominal
compounds was analyzed and four particular semantic features were examined, namely
headedness, compositionality, semantic homogeneity, and unexpressed relations.
8.1 Contributions of the Thesis
Semantic transparency, both as a theoretical concept and as a lexical property, has traditionally
been understood narrowly as a direct function of the relationship between a complex unit’s
meaning and its constituents. In other words, transparency is usually equated with
compositionality, the result of which is that a compound is considered transparent if it is
compositional and opaque if it is not. My review of the literature on the topic in Chapter 2
showed that, although this description is in fact prevalent among researchers, there have been
efforts to move away from this narrow view of the concept toward a more multi-faceted
approach. In the course of this discussion, I argued that transparency should be understood
holistically as a property involving several interrelated features (one of which is
compositionality) and that any attempts to formalize the concept should seek to incorporate
these features into its model. It is my contention that such an approach offers not only a richer
conceptual space within which to discuss the semantics of compounds, but also a more effective
means to distinguish between them from the perspective of meaning construal.
The typology of semantic transparency proposed in Chapter 7 is the result of the analysis of over
1,000 attested French compounds (729 NN and 319 N à N) collected from Wiktionary. The
decision to focus on these types of constructions was in part based on the fact that French
compounds have received limited attention with regards to transparency. Moreover, positional
295
constraints on the head constituent are typically looser in French than they are in English, not to
mention that French binomial constructions may make use of linking units (i.e. N à N). It should
be noted, however, that none of the features examined are in fact language dependent, but
simply that French compounding offers a greater diversity of material with which to discuss
transparency.
Overall, my typology consists of two major parameters, each of which is actualized along a
number of other properties. The first is headedness (or centricity), which is used to distinguish
between endocentric and exocentric compounds. This distinction is deemed crucial to the
evaluation of a compound’s semantic transparency given that the head is the principal classifier
for the item: an endocentric compound provides speakers with the key features necessary to
establish its broad denotation. Furthermore, head position is also considered significant
whenever variation may occur. In French, for instance, NN compounds are typically left headed,
but they may nevertheless be right-headed under certain conditions (e.g. auto-école, taupe-
grillon). This fact is accounted for by incorporating the notion of canonical head into the
typology, which assumes that non-standard heads pose a greater challenge to speakers
attempting to establish meaning. This feature is parameterized according to language and
compound type (e.g. canonical head position for English NN compounds is to the right, while in
French it is to the left; nominal AN compounds in both of these languages are typically right-
headed). Finally, compounds for which the head undergoes an established sense extension are
considered weakly endocentric (e.g. metaphor in valse-hésitation) and are therefore less
transparent than literal (or strong) endocentrics, but more transparent than exocentrics.
The second major component of the typology is compositionality, which is determined by the
semantic contribution of a compound’s constituents. Endocentric compounds are at a minimum
partially compositional as the head necessarily contributes meaning to the whole; exocentric
compounds, on the other hand, may be completely non-compositional. Furthermore, just as
tropes may affect centricity, so too may they affect compositionality: when the non-head
constituent contributes meaning via an established trope, it is considered weakly compositional
(e.g. metaphor in mot-clé). Both headedness and compositionality were the focus of Chapter 4.
To my knowledge, the approach adopted in this work for both of these features offers new
insight into compound transparency. By proposing weak variants for endocentricity and
296
compositionality, we are able to highlight instances of compounds that can be motivated on non-
literal grounds without treating them as opaque.
Also explored in Chapter 4 is a factor I called semantic homogeneity, which is the measure of
how semantically related similar compounds are. Compounds, like other multi-lexeme
constructions, may be reduced to patterns based on a single shared constituent (e.g. papier-N or
pompe à N, where the common element is the head, and N-lumière or N à vapeur, where the
shared constituent is instead the modifier). By considering all compounds that fit a particular
template, we may determine how semantically homogeneous they are by dividing the number of
compounds that share the same relational association by the total number of compounds for that
template. The result of this equation was labeled the semantic reliability index (or SRI).
Compounds with a high SRI value indicate that they pattern semantically with other similar
types, which might, on the one hand, indicate a strong compatibility between their constituents,
and on the other, have a facilitatory effect on its interpretation if the speaker relies on analogy
during processing. It was argued, however, that this approach requires a large number of types
for the SRI to be truly meaningful and that it is likely to be most useful when trying to evaluate
transparency for compounds that otherwise share the same transparency features.
The fourth major component of this work involves the unexpressed relations observed between
a compound’s constituents. A survey of the French compounds retained revealed that of the 15
most prominent relations cited in the literature, a mere 10 sufficed to account for more than 75%
of the NN French constructions examined. For N à N compounds, the number of relevant
relations was even lower as the preposition was shown to impose significant constraints on what
type of association may be instantiated between constituents: 5 of the basic relations retained
account for 78% of all N à N compounds examined.
The typology proposed in the previous chapter incorporates all of these features into a hierarchy
consisting of 16 possible configurations based on headedness and compositionality. Of these
possible configurations, only 12 were found to be relevant in French. Compounds were
subsequently ordered according to these features, the results of which are reproduced in the
following table:
297
Table 8.1. Transparency configurations, from most to least transparent.
Endo. Canonical Head
Strong Centricity Compositionality Compounds
+ + + Full passage piétons boîte à outils
+ + + Weak mot-clé piano à queue
+ + + Partial bateau-mouche ---
+ + − Full valse-hésitation poire à poudre
+ − + Full bracelet-montre ---
+ − + Weak reine-marguerite ---
+ − + Partial aube-vigne ---
+ − − Full vidéo-lynchage ---
− NA NA Full ballon-panier pied à boule
− NA NA Weak radio-trottoir cage à écureuil
− NA NA Partial chat-château soda à pâte
− NA NA Non cap-mouton sagne à tamis
Chapter 7 also describes how semantic relations may be ranked according to transparency,
where synthetic compounds, along with those involving a purposive relation based on a
constituent’s proper function, are considered the most transparent because they are relationally
“complete”. The order of all relations, grouped according to their shared properties, is given as
follows:
(199) ARGUMENT > PURPOSE (PF) > adjectival > intersective > subordinate > SIMILARITY > idiosyncratic
It was suggested that subordinate compounds with identical transparency profiles, which
account for the largest class of relational compound, could be differentiated using the SRI
described in Chapter 4.
The work presented in this thesis is based on the premise that a compound’s meaning—and thus
its semantic transparency—is not a simple function of its parts. Rather, a compound’s meaning
298
construal relies on a number of factors, all of which may influence how easily it may be
understood. This work, while theoretical, represents a first step toward a formalization of
transparency that takes into account the multi-faceted nature of compounds. That said, the role
of compositionality in a compound’s overall transparency cannot be ignored: according to the
analysis conducted in Chapter 7, approximately 80% of NN and 81% of N à N compounds are
fully compositional in the sense adopted in this work (i.e. both A and B contribute meaning to
the whole literally). These findings reveal two crucial points about the semantics of compounds.
First, compounding is evidently a productive process, in the qualitative sense of the term (what
Bauer 2001 calls availability158): their formation is governed by traditional generative
operations that produce semantically compositional constructions. In this regard, these findings
might be said to lend support to a treatment of compounding within syntax, where fully
compositional compounds are generated according to the same rules that produce syntactic
phrases (similar to Lees 1960 or Levi 1978; see also Fabb 1998, Di Sciullo & Williams 1987).
After all, unlike morphological derivation, compounding is not a closed system: any two words
may feasibly be combined to form a compound, an analogous principle to phrase generation. At
the level of processing, non-compositional compounds might therefore be stored as single units
and accessed as such, whereas compositional compounds might undergo decomposition as any
other phrase might. This approach, however, only paints part of the picture as it must still
contend with the fact that relational information is in most instances implicit and that many
compounds, despite their compositionality, do not always provide a means for these relations to
emerge (i.e. they are not fully predictable given the nature of the components, e.g. chêne
kermès). This is all the more pertinent given that the majority of compounds examined can be
accounted for using a relatively small set of basic relations, which suggests that there are in fact
constraints on compound formation that are not easily accounted for using compositionality
alone. This is in part why I argued in Chapter 2 that compositionality does not strictly imply
transparency, but that transparency most certainly entails compositionality (see Section 2.4.1 for
a discussion of this relationship).
158
“The availability of a morphological process is its potential for repetitive rule-governed morphological coining, either in a general or in a particular well-defined environment or domain” (Bauer: 2011). Also, see Schröder (2011) for a recent discussion of morphological productivity.
299
Second, the prominence of compositionality within the data also shows why expanding
transparency to include other factors is a worthwhile endeavour. If transparency is simply
another word for compositionality, then 80% of compounds are transparent, yet a close
examination of the data reveals that many of these fully compositional compounds are
semantically distinct on other grounds: compounds such as café-filtre and jambon-beurre differ
in terms of centricity, while head position distinguishes auto-mitrailleuse from auto-école.
There is thus sufficient evidence to support an expanded view of transparency, one that takes
into account compositionality while also making use of other features to further distinguish
between otherwise similar compounds. The features explored in this thesis allow for this more
granular model of the concept.
Even if the proposed typology were found to be incorrect, it is unlikely that it would result in a
significant reduction of features. As I argued in Chapter 7, the semantic transparency of
compounds involves, at a minimum, headedness, compositionality, and unexpressed relations;
semantic homogeneity measures are meant to further augment these three basic factors. Any
issues with the typology are most likely to be related to how these features are weighted, which
could be adjusted following testing with speakers (this will be touched upon briefly in Section
8.4.5).
Apart from the proposed typology, the work conducted here also provides a great deal of
practical information with which to pursue future research on French compounds. For instance,
the data collected on headedness and other properties may be used to further explore, among
other things, the effects of mismatched morphological features (i.e. gender percolation).
Furthermore, the work on semantic relations offers empirical support to the notion that
compounds typically make use of a small set of recurring associations. Additionally, it was
shown that not only does the preposition à greatly restrict what relations may emerge for a given
pair of nouns, it also affects the directionality of said relations: while many NN compounds
allow for relations to be reversed, N à N constructions do not possess the same degree of
flexibility. Subsequent work on N de N constructions could offer additional insight on the
relationship between prepositions and compound relations. These findings could ultimately
contribute to future morphological and semantic research on compounding, as well as inform
other fields such as psycholinguistics and computational linguistics, where the focus is on
300
gaining a better understanding of meaning composition by looking more closely at how
compounds are both stored and processed.
8.2 Remarks on the Wiktionary Data
In chapter 3, I discussed some of the issues related to the use of Wiktionary as the source of
compounds for this work, namely that its openness and its lack of lexicographic rigour might
introduce questionable items into the data. While it is true that some infrequent or region
specific compounds were present in the final dataset (e.g. manche à balle, fauteuil à voile,
radio-trottoir), the vast majority of the entries retained are also accounted for in traditional
dictionaries, either LPR2010 or TLFi. Although I did not cross-reference every single entry
because many of the compounds under investigation were known to me as a native speaker (e.g.
moulin à vent, barbe à papa, café-crème), those that required a closer examination were either
listed entries in those dictionaries, or could be found listed within the entries of the head word
(e.g. papier-bible, for instance, is not listed separately in LPR2010, but it can be found within
the entry for the lemma papier). In instances where no reference could be found for a particular
compound, a search in specialized dictionaries or older texts usually provided sufficient
information regarding both its usage and its origins (see Chapter 6 and 7 for examples of such
cases).
The question, of course, is whether some of the conclusions advanced in this work are in fact
based on a representative sample of French compounds. To answer this question would arguably
require that another dataset be compiled and that the same work be conducted with those
compounds. The methodological choice made at the beginning of this work, which is to say to
rely solely on Wiktionary as the source of data, no doubt introduced uncommon constructions
into the study, but as long as these compounds could be said to exist (i.e. attested elsewhere),
then they are legitimate items for the investigation of transparency as they must still be
interpreted by speakers.
8.3 Polylexical.com
As was mentioned in Chapters 1 and 3, the nominal compounds extracted from Wiktionary have
been tagged and made available to other researchers at www.polylexical.com through a database
301
and search interface I created over the course of this project. Although all 10,000 nominal
constructions are included in the database, only NN and N à N compounds are fully labeled. All
other constructions (AN, NA, VN, N de N, etc.) are nevertheless labeled with their lexical
categories, gender, and number. The following figure shows the search interface available to
users.
Figure 8.1. Search interface for Polylexical.com.
Users may search for any string using the input in section (1). The basic search function will
return all matches, regardless of position. For instance, a search for the string “table” will return
table d’hôte and sel de table, as well as partial matches like expert-comptable and étable à
pourceaux. For more precise results, users may use the advanced search function in (2), which
allows for specific positions to be targeted, including exact string matches (i.e. so that a search
for “table” does not return expert-comptable). The advanced search function also makes use of a
number of parameters to further restrict results. These parameters, shown in (3), allow users to
conduct searches according to constituents’ part of speech and gender, as well as the
compound’s linking unit (e.g. preposition, hyphen, determiner, etc.), head, gender, number, and
302
semantic relation. The very last parameters are based on the semantic reliability index discussed
in Chapter 4. Users may search for recurring templates for either the leftward or rightward
constituent, while also performing queries based on a compound’s SRI value. The following
screenshot shows the results of a search for N à N compounds involving the USE relation and
having no fewer than four occurrences of N1.
Figure 8.2. Sample of search results from Polylexical.com.
The entire dataset can also be downloaded as a comma separated value (CSV) file, which will
allow other researchers to further manipulate and label the compounds according to the needs of
their own projects.
8.4 Future Perspectives
Although the features and factors discussed in this work were meant to extend previous models
of compound transparency without introducing a large number of variables, there are
nevertheless other aspects of compounding that merit exploration. Some of these additions
involve refining existing semantic properties, namely the treatment of tropes, while others
consist of incorporating new features into the typology, such as the conceptual classes of a
303
compound’s constituents. Not all additions, however, are semantic in nature. There has also
been a great deal of research on more quantitative factors in compound processing, such as in
overall and relative frequency, as well as in morphological family sizes. What follows are brief
descriptions of how future work on transparency could benefit from these areas of research.
8.4.1 Sense Extension
Although sense extension was incorporated into the typology proposed in this work, its
implementation is somewhat rudimentary. To wit, when a trope is present in either constituent,
the compound is classified as weak regardless of the trope’s type (metaphor or metonymy) or its
relative complexity. As was briefly touched upon in Chapter 4, however, tropes may operate
together at several different levels. According to Benczes’s (2006, 2010) study of exocentric
compounds in English, metonymy and metaphor may be present individually or together and
may combine into several different configurations. Thus, the modifier may involve a metonymy,
while the head makes use of a metaphor (e.g. firedog ‘iron support for burning logs in a
fireplace’), or both constituents might involve a metaphor (e.g. flame sandwich ‘note consisting
of a negative comment between two positive ones’). Benczes also shows that these tropes may
simultaneously apply to the relation held between constituents and the constituents themselves.
For instance, the compound bell-bottoms (‘trousers that are very wide at the bottom of the leg’)
involves a relational metaphor involving shape (i.e. bottoms shaped like a bell), and a
metonymy on the head constituent (i.e. bottoms for pants → PART FOR WHOLE). The fact that
several tropic configurations are possible suggests that compounds involving metonymy and
metaphor may present greater variation in their transparency, which could then be reflected at
several different levels of the typology.
Furthermore, sense extension itself may emerge in a variety of ways. In fact, the terms metaphor
and metonymy cover a wide range of cognitive associations, many of which present different
types of processing challenges. Radden and Kövecses (1999), in their study of metonymy, show
that the concept may be actualized in several different ways, including (but not limited to)
instances of MANNER FOR ACTION, CAUSE FOR EFFECT, and CONTAINER FOR CONTENTS. By
combining cognitive and communicative principles, Radden and Kövecses offer a hierarchy of
metonymic vehicles, which they state as follows: form > form-concept > concept > reality.
According to their framework, forms are prefered over concepts because human experience
304
better relates to concrete objects than it does to abstract concepts; concepts, on the other hand,
rank higher than reality because a person’s experience is necessarily subjective, which may not
always coincide with reality. Form-concepts, which are understood as signs (i.e. a word used to
refer to a real thing or event) occupy an intermediate space between these two realms. What
Radden and Kövecses show is that metonymy is a multi-layered phenomenon and that its
presence in language is a complex matter with clear consequences on how speakers both use and
process language. The same may be said of metaphor (Kövecses 2002).
The manner in which tropes were incorporated into the typology of semantic transparency
proposed in this work, while clearly a reasonable first step, undoubtedly represents only a partial
formalization of their role in the interpretation of compounds.
8.4.2 Conceptual Classes
When speakers are presented with a pair of nouns, they must establish how these items relate to
each other. Psycholinguistic research on concept combination is well-established and has shown
that speakers make use of the classes of the consituents involved (animal, plant, artefact, etc.;
Wisniewski and Love 1998), as well as the frequency of a particular relation for a given lexical
item (Gagné and Shoben 1997). Downing (1977) had already observed that the relation held
between a compound’s constituents was related to the semantic classes of its constituents. Thus,
compounds involving animals favour relations targetting appearance and habitat, while those
containing natural objects favour composition, origin, and location.
Recent work by Maguire et al. (2010) on a large corpus of English compounds used statistical
data to assess both semantic similarity of combined items and the frequency with which certain
concepts co-occur. In their first analysis, they found a strong correlation between the semantic
content of the consituents and their use in combinations. More specifically, the more
semantically similar two nouns were, the more likely they were to combine. This is explained
according to slot-filling theories, where the modifier fills a slot in the head; conceptually similar
nouns are more likely to allow for this operation to occur. Maguire et al. also conducted a
second test that looked at the conceptual classes of the constituents. They found that some
combinations were highly recurrent: the most frequent compounds are plant-plant combinations
(54%), followed by substance-substance (24%) and location-group (13%). More importantly,
305
they found a strong correlation between pairs of categories and the type of relation they
instantiated. For instance, a random sample of 100 substance-artefact combinations showed that
they involved a composition relation in 68% of cases; the same sampling for area-animal
combinations revealed an almost exclusive use of a locative relation (91% of cases).
When discussing the semantic reliability index in Chapter 4, I mentioned that relying on
templates based on a particular lexeme (i.e. papier-N, pompe à N, etc.) may only paint part of
the picture, that templates might benefit from using semantic classes or categories instead of
specific lexemes. Ryder’s (1994) own work used both methods and showed that classes, when
sufficiently constrained, correlated with a speaker’s interpretation of novel compounds (i.e.
container + N). In Chapter 7, I suggested that one of the benefits of the SRI is that it implicitly
reveals the compatibility between not only individual constituents, but also their classes: the
more frequently a particular template makes use of a relation, the more likely it is that
constituents will share features or be compatible via other means. Given these observations,
along with the findings discussed above, it seems entirely justified to incorporate conceptual
classes into a model of semantic transparency. At present, we may assume that compounds
involving items of either similar or frequent conceptual classes possess a greater degree of
semantic transparency than those involving otherwise incompatible constituents. Future work
would require that these compatibilities be quantified and that any language specific properties
be established.
8.4.3 Frequency
Little was said here about lexical frequency as a factor of transparency. The principal reason for
this omission is that frequency is not a semantic property, nor is it a property unique to
compounds (see Ford et al. 2010 or Hay 2003 for frequency effects in derivational morphology).
That said, substantial research has been conducted on the effect of item frequency on compound
processing, a factor some might wish to see included in a formal model of transparency.
Experimental data tends to show that there exists an inversely proportional relationship between
frequency of a lexical item and the time it takes speakers to process it, though results vary with
regards to constituent and whole item frequency effects. Early research by Van Jaarsveld and
Rattink (1988) on processing novel and lexicalized Dutch compounds showed that lexical
306
decision times were generally only affected by the frequency of the compound and not by that of
its constituents, although some of their tests revealed that the frequency of the second noun (i.e.
the head), under certain conditions, could also affect response times. Just over twenty years
later, Baayen et al. (2010) found similar results for lexical decision tasks involving English
compounds: compound frequency is by far the strongest predictor of response time latencies,
with constituent frequencies seemingly having little effect. Interestingly, however, Baayen et al.,
arguing that traditional linear models of measurement are insufficient, re-examined the data
using a non-linear model, which showed that the effect of a compound’s frequency on reaction
times is in some part modulated by the modifier’s frequency when the compound itself is of low
frequency. These findings, along with other observations based on experiments in naming tasks
and eye-tracking records, led Baayen et al. to hypothesize that frequency effects in compound
processing is a complex system of interactions involving several factors.
A recent study based on user ratings, however, reveals that constituent frequency may in fact
play a significant role in how speakers interpret compounds. In Bell and Schäfer (2013),
participants were tasked with rating the perceived literality (i.e. compositionality) of a series of
compounds on a scale of 0 to 5. Bell and Schäfer then analyzed these ratings using frequency
data for their constituents and found that the literality judgements of the raters strongly
correlated with the frequency of either constituent. In other words, the more frequent a particular
constituent was, the higher the literality rating, and vice-versa. The results of their analysis
suggest that speakers view compounds as more literal if their constituents are highly familiar.
While compound and constituent frequency is no doubt relevant to models of lexical processing
(i.e. decomposition vs. full-access theories), I think that, conceptually, it should only be viewed
as a secondary component of semantic transparency given how the concept was defined in
Chapter 7159. The reasoning behind this position is that, while the more frequent a compound’s
constituents are, the easier it may be to recognize as a possible word (cf. lexical decision tasks),
frequency itself says nothing about the meaning of the item or how it relates to the meaning of
the whole construction. Of course, the only way to confirm this hypothesis is to conduct
159
Crucially, the semantic transparency of a compound is based solely on its constituents (i.e. the speaker is unfamiliar with the compound itself).
307
experiments in which token frequency is accounted for within the proposed typology.
Regardless of the position held on the matter, were the hierarchy advanced here taken as a
framework with which to evaluate the likelihood that certain compounds would be stored and
accessed as single units, then frequency data would no doubt prove to be a necessary component
of such a model.
8.4.4 Family Size
Also related to some degree to frequency effects in lexical processing is morphological family
size, which is loosely defined as the number of complex words in which a particular item may
participate. Research has successfully shown that the family size of a constituent or affix has an
effect on how speakers process complex words. Schreuder and Baayen (1997), for instance,
reported that participants responded much faster in lexical decision tasks for simplex words with
large family sizes. They followed up this work in Bertram et al. (2000) and found that the family
size of affixes in Dutch affected participants’ reaction times, although significant semantic
effects were also observed (i.e. family size effects were only observed when semantically
unrelated family members were removed from the analysis). Subsequent work on compounds
has shown similar effects, but with nevertheless interesting differences. De Jong et al. (2002),
for instance, found a positional family frequency effect for both English and Dutch compounds,
which is to say that the frequency of a constituent in a particular position was a better predictor
of reaction times than the constituent’s family size. More recently, Juhasz and Berkowitz (2011)
again found evidence that family size has an effect on English compound recognition. In
particular, they found that participants responded faster to compounds when the first constituent
possessed a large family size. Furthermore, in a sentence reading experiment in which gaze
duration was measured, Juhasz and Berkowitz found that participants spent less time on
compounds containing constituents with large morphological families.
One will have no doubt noticed parallels between family size and the semantic homogeneity
feature discussed in Chapter 4. Although the templates discussed in that chapter were strictly
based on a single lexical item, the approach could easily be expanded to include family size.
What the findings in both Bertram et al. (2000) and De Jong et al. (2002) show, however, is that
the incorporation of family size into the typology should take into account constituent position,
308
as well as the semantic homogeneity of the family itself (i.e. the presence and number of
homonyms within the family).
8.4.5 Testing the Typology and Closing Remarks
Finally, the typology proposed here remains entirely theoretical until it is tested with speakers.
Although the features and factors discussed are all based on previous research on compounding,
including experimental work done in psycholinguistics, the resulting classification of
compounds cannot be said to be conclusive until it is confirmed with speakers. Chapter 2 looked
at some of the ways that a compound’s semantic transparency has been quantitatively assessed,
most of which involved lexical decision tasks measuring response times with or without priming
(Sandra 1990, Jarema et al. 1999, Libben et al. 2003). The assumption is that lower response
times indicate reduced processing costs, which may be interpreted as a higher degree of
semantic transparency. In this regard, such experiments could be used to test the validity of the
typology proposed here: if overall reaction times for compounds correspond to the different
levels of the typology, then we may assume that the order and weight of the retained features is
consistent with the compounds’ degree of transparency. Conversely, if no such correspondence
is found, then the typology would need to be modified according to the results. Caution,
however, should be taken regarding this approach, as lexical decision tasks are also sensitive to
factors largely unrelated to semantic transparency (e.g. word length).
Another issue that arises from this approach, however, is that the typology was created using
mostly established compounds, which makes it difficult to avoid introducing outside effects on
processing such as frequency or even word length (cf. Hudson and Bergman 1985). In other
words, many of the compounds that were analyzed will be familiar to speakers, which will result
in faster response times under most conditions. Moreover, as I had briefly touched upon earlier
when discussing frequency effects, it is not entirely clear if response times are in fact a good
measure of transparency, at least not as it is defined here. A transparent compound is understood
here as a compound for which meaning is easily determined. It is a qualitative property. In this
regard, I believe that the best way to evaluate the correctness of the typology is to make use of
questionnaires in which speakers are asked a) if they are familiar with the compound, b) if they
are familiar with the constituents, c) if they already know the meaning of the compound, d) if
they know the meaning of the constituents, and e) what the meaning of the compound is. This is
309
similar to the work conducted by, among others, Gleitman and Gleitman (1971) and Ryder
(1994). We might also wish to offer speakers the opportunity to rate each compound’s degree of
“transparency” themselves, similar to Sandra (1994) and Zwitserlood (1994). In this way, we are
able to gather information regarding the speaker’s knowledge of the compounds, as well as the
meaning they attribute it. Such an approach would therefore serve to either validate or invalidate
the typology proposed and ideally offer insight on how it could be improved upon in the future.
After all, while semantic transparency is arguably a fundamental property of compounds, it is
ultimately dependent on the speaker and his or her ability to make sense of the sometimes highly
ambiguous pairings that permeate the language.
310
References
Ackerman, Farrell and Philip LeSourd. 1997. Toward a lexical representation of phrasal predicates. In Complex Predicates, ed. Alex Alsina, Joan Bresnan, and Peter Sells, 67–106. Stanford University: Center for the Study of Language and Information.
Adams, Valerie. 1973. An introduction to modern English word-formation. London: Longman.
Adams, Valerie. 2001. Complex words in English. Harlow, England: Pearson Longman.
Allen, Margaret. 1978. Morphological investigations. Doctoral dissertation, University of Connecticut.
Amiot, Dany. 2005. Between compounding and derivation: Elements of word-formation corresponding to prepositions. In Morphology and its demarcations: selected papers fron the 11th Morphology Meeting, ed. Wolfgang U. Dressler, Dieter Kastovsky, Oskar E. Pfeiffer, and Franz Rainer, 183-196. Amsterdam: John Benjamins Publishing Company.
Anderson, Stephen R. 1982. Where’s Morphology? Linguistic Inquiry 13(4): 571–612.
Andreevskaia, Alina and Sabine Bergler. 2006. Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In Proceedings of the 11th Conference of the European Association for Computational Linguistics, 209–216. Trento, Italy.
Anscombre, Jean-Claude. 1990. Pourquoi un moulin à vent n’est pas un ventilateur. Langue française 86: 103–125.
Anscombre, Jean-Claude. 1999. Le jeu de la prédication dans certains composés nominaux. Langue française 122: 52–69.
Apothéloz, Denis. 2002. La construction du lexique français: principes de morphologie dérivationnelle. Paris: Ophrys.
D’Arcais, Giovanni B. Flores. 1993. The Comprehension and Semantic Interpretation of Idioms. In Idioms: Processing, Structure, and Interpretation, ed. Cristina Cacciari and Patrizia Tabossi, 79–98. Hillsdale, NJ: Lawrence Erlbaum Associates.
Arcodia, Giorgio F., Nicola Grandi, and Bernhard Wälchli. 2010. Coordination in Compounding. In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 177–198. Amsterdam: John Benjamins Publishing Company.
Arnaud, Pierre J. L. 2003. Les composés Timbre-poste. Lyon: Presses Universitaires de Lyon.
Arnaud, Pierre J. L. 2004. Problématique du nom composé. In Le nom composé: données sur seize langues, ed. Pierre J. L. Arnaud, 329-353. Lyon: Presses universitaires de Lyon.
Arnaud, Pierre J. L. 2008. Semantic Complexity in English [NN]N Compounds. Anglophonia 24: 7–21.
311
Arnaud, Pierre J. L., Emmanuel Ferragne, Diana M. Lewis, and François Maniez. 2008. Adjective + Noun sequences in attributive or NP-final positions: Observations on lexicalization. In Phraseology: an interdisciplinary perspective, ed. Sylviane Granger and Fanny Meunier, 111–125. Amsterdam: John Benjamins Publishing Company.
Aronoff, Mark. 1976. Word Formation in Generative Grammar. Cambridge, MA: MIT Press.
Aronoff, Mark. 2007. In The Beginning was the Word. Language 83(4): 803–830.
Aronoff, Mark and Kirsten Fudeman. 2005. What is Morphology? Malden, MA: Blackwell Publishing.
Baayen, R. Harald, Victor Kuperman, and Raymond Bertram. 2010. Frequency effects in compound processing. In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 257–270. Amsterdam: John Benjamins Publishing Company.
Baayen, R. Harald and Rochelle Lieber. 1996. Word frequency distributions and lexical semantics. Computers and the Humanities 30(4): 281–291.
Baker, Mark. 1985. The mirror principle and morphosyntactic explanation. Linguistic inquiry 16(3): 373–415.
Banerjee, Satanjeev and Ted Pedersen. 2002. An adapted Lesk algorithm for word sense disambiguation using WordNet. In Computational Linguistics and Intelligent Text Processing, ed. Alexander Gelbukh, 136–145. Berlin: Springer.
Barbaud, Philippe. 1971. L’ambiguïté structurale du composé binominal. Cahier de linguistique 1: 71–116.
Baron, Irène and Michael Herslund. 2001. Semantics of the verb HAVE. Typological Studies in Language 47: 85–98.
Baroni, Marco, Emiliano Guevara, and Vito Pirrelli. 2006. Sulla tipologia dei composti N+N in italiano: principi categoriali ed evidenza distribuzionale a confronto. In Atti del 40esimo Congresso della SLI, 21–23. Roma: Bulzoni. Cited in Baroni et al. 2007.
Baroni, Marco, Emiliano Guevara, and Vito Pirrelli. 2007. NN Compounds in Italian: Modelling Category Induction and Analogical Extension. Lingue e linguaggio 6(2): 263–290.
Bartning, Inge. 2001. Towards a typology of French NP de NP structures or how much possession is there in complex noun phrases with “de” in French. In Dimensions of possession, ed. Irène Baron, Michael Herslund, and Finn Sørensen, 147–167. Amsterdam: John Benjamins Publishing Company.
Bassac, Christian. 2006. A compositional treatment for English compounds. Research in Language 4: 133–153.
312
Bassac, Christian and Pierrette Bouillon. 2013. The telic relationship in compounds. In Advances in Generative Lexicon Theory, ed. James Pustejovsky, 109–126. Dordrecht: Springer.
Bates, Elizabeth and Brian MacWhinney. 1987. Competition, Variation, and Language Learning. In Mechanisms of Language Acquisition, ed. Brian MacWhinney, 157–194. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Bauer, Laurie. 1978. The Grammar of Nominal Compounding. Odense: Odense University Press.
Bauer, Laurie. 1983. English Word-Formation. Cambridge: Press Syndicate of the University of Cambridge.
Bauer, Laurie. 1998. When Is a Sequence of Two Nouns a Compound in English? English Language and Linguistics 2(1): 65–86.
Bauer, Laurie. 2001a. Compounding. In Language Typology and Language universals, Vol. 1, ed. Martin Haspelmath, König Ekkehard, Wulf Österreicher, and Wolfgang Raible, 695–707. Berlin: Mouton de Gruyter.
Bauer, Laurie. 2001b. Morphological Productivity. Cambridge: Press Syndicate of the University of Cambridge.
Bauer, Laurie. 2003. Introducing Linguistic Morphology. 2nd ed. Washington, DC: Georgetown University Press.
Bauer, Laurie. 2004. A Glossary of Morphology. Edinburgh: Edinburgh University Press.
Bauer, Laurie. 2008a. Dvandva. Word Structure 1(1). 1–20.
Bauer, Laurie. 2008b. Les composés exocentriques de lʼanglais. In La composition dans une perspective typologique, ed. Dany Amiot, 35–47. Arras: Artois presses université.
Bauer, Laurie. 2010. The typology of exocentric compounding. In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 167–175. Amesterdam: John Benjamins Publishing Company.
Bavoux, Claudine. 2008. Le français des dictionnaires: l’autre versant de la lexicographie française. Bruxelles: De Boeck.
Becker, Thomas. 1993. Back-formation, cross-formation, and “bracketing paradoxes” in paradigmatic morphology. Yearbook of morphology 6: 1–25.
Bell, Melanie J. and Martin Schäfer. 2013. Semantic transparency: challenges for distributional semantics. In Proceedings of the IWCS 2013 workshop: Towards a formal distributional semantics, ed. Aurelie Herbelot, Roberto Zamparelli, and Gemma Boleda, 1-10. Potsdam: Association for Computational Linguistics.
313
Benczes, Réka. 2005. Metaphor- and metonymy-based compounds in English: a cognitive linguistic approach. Acta Linguistica Hungarica 52(2): 173–198.
Benczes, Réka. 2006. Creative compounding in English: the semantics of metaphorical and metonymical noun-noun combinations. Amsterdam: John Benjamins Publishing Company.
Benczes, Réka. 2010. Setting limits on creativity in the production and use of metaphorical and metonymical compounds. In Cognitive Perspectives on Word Formation, ed. Alexander Onysko and Sascha Michel, 219–242. New York: Mouton de Gruyter.
Bertram, Raymond, R. Harald Baayen, and Robert Schreuder. 2000. Effects of family size for complex words. Journal of Memory and Language 42(3): 390–405.
Bisetto, Antonietta and Sergio Scalise. 2005. The classification of compounds. Lingue e linguaggio, 4(2): 319–332.
Bloomfield, Leonard. 1933. Language. Chicago: The University of Chicago Press.
Bolinger, Dwight. 1975. Aspects of Language. 2nd ed. New York: Harcourt Brace Jovanovich.
Booij, Geert. 2007. The grammar of words: An introduction to linguistic morphology. 2nd ed. Oxford: Oxford University Press.
Booij, Geert. 2010. Compound construction: Schemas or analogy? A construction morphology perspective. In Cross-disciplinary issues in compounding, ed. Sergio Scalise and Irene Vogel, 93–108. Amesterdam: John Benjamins Publishing Company.
Borillo, Andrée. 1996. La relation partie-tout et la structure [NI à N2] en français. Faits de langues 4(7): 111–120.
Bosredon, Bernard and Irène Tamba. 1991. Verre à pied, moule à gaufres: préposition et noms composés de sous-classe. Langue française 91(1): 40–55.
Botha, Rudolf P. 1984. Morphological Mechanisms: Lexicalist Analyses of Synthetic Compounding. 1st ed. New York: Pergamon Press.
Bresnan, Joan and Sam A. Mchombo. 1995. The lexical integrity principle: Evidence from Bantu. Natural Language & Linguistic Theory 13(2): 181–254.
Brousseau, Anne-Marie and Emmanuel Nikiema. 2001. Phonologie et morphologie du français. Saint-Laurent, Québec: Fides.
Burrow, Thomas. 1955. The Sanskrit language. London: Faber and Faber.
Butterworth, B. 1983. Lexical representation. Language production 2: 257–294.
Cadiot, Pierre. 1997. Les prépositions abstraites en français. Paris: Armand Colin.
314
Cajolet-Laganière, Hélène, Pierre Martel, and Chantal-Édith Masson. 2010. Le dictionnaire général du français de l’équipe FRANQUS: quelques aspects originaux de la description lexicographique. In XXVe CILPR Congrès International de Linguistique et de Philologie Romanes Innsbruck, ed. Maria Iliescu, Heidi Siller-Runggaldier and Paul Danler, 241–249. Berlin: Walter De Gruyter.
Cañas, Alberto, J., Alejandro Valerio, Juan Lalinde-Pulido, Marco Carvalho, and Marco Arguedas. 2003. Using WordNet for word sense disambiguation to support concept map construction. In String Processing and Information Retrieval, ed. Mario A. Nascimento, Edleno S. de Moura, and Arlindo L. Oliveira, 350–359. Berlin: Springer-Verlag.
Caramazza, Alfonso, Alessandro Laudanna, and Cristina Romani. 1988. Lexical access and inflectional morphology. Cognition 28(3): 297–332.
Carstairs-McCarthy, Andrew. 2002. An introduction to English morphology: words and their structure. Edinburgh: Edinburgh University Press.
Cervoni, Jean. 1991. La préposition: étude sémantique et pragmatique. Paris: Duculot.
Chialant, Doriana and Alfonso Caramazza. 1995. Where is morphology and how is it processed? The case of written word recognition. In Morphological aspects of language processing, ed. Laurie Beth Feldman, 55–76. Hillsdale, NJ: L. Erlbaum Associates.
Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1970. Remarks on Nominalization. In Readings in English Transformational Grammar, ed. Roderick A. Jacobs and Peter S. Rosenbaum, 184–221. Washington, D.C.: Georgetown University Press.
Chomsky, Noam. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, Noam and Morris Halle. 1968. The Sound pattern of English. New York: Harper & Row.
Clark, Eve V. and Ruth A. Berman. 1987. Types of linguistic knowledge: Interpreting and producing compound nouns. Journal of Child Language 14(3): 547–567.
Cohen, Benjamin and Gregory L. Murphy. 1984. Models of concepts. Cognitive Science 8(1): 27–58.
Copestake, Ann and Ted Briscoe. 1995. Semi-productive Polysemy and Sense Extension. Journal of Semantics 12: 15-67.
Corbin, Danielle 1992. Hypothèses sur les frontières de la composition nominale. Cahiers de grammaire 17: 26–55.
Corbin, Danielle. 1997. Locutions, composés, unités polylexématiques : lexicalisation et mode de construction. La locution : entre langue et usages, 53–101. Fontenay: ENS Éditions.
315
Costello, Fintan J., Tony Veale, and Simon Dunne. 2006. Using WordNet to automatically deduce relations between words in noun-noun compounds. In Proceedings of the COLING/ACL on Main conference poster sessions, 160–167. Morristown, NJ: Association for Computational Linguistics.
Costello, Fintan J. and Mark T. Keane. 2000. Efficient creativity: Constraint-guided conceptual combination. Cognitive Science 24(2): 299–349.
Cruse, D.A. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Dal, Georgette and Dany Amiot. 2008. La composition néoclassique en français et ordre des constituants. In La composition dans une perspective typologique, ed. Dany Amiot, 89–113. Arras: Artois Presses Université.
Darmesteter, Arsène. 1874. Traité de la formation des mots composés dans la langue française comparée aux autres langues romanes et au latin. Paris: A. Franck.
Derwing, Bruce L. and Royal Skousen. 1989. Morphology in the mental lexicon: A new look at analogy. In Yearbook of morphology 2, ed. Geert Booij and Jaap van Marle, 55–71. Dordrecht: Foris.
Dirven, René and Marjolyn Verspoor. 2004. Cognitive exploration of language and linguistics. 2nd rev. ed. Amsterdam: John Benjamins Publishing Company.
Dohmes, Petra, Pienie Zwitserlood, and Jens Bölte. 2004. The impact of semantic transparency of morphologically complex words on picture naming. Brain and language 90: 203–212.
Downing, Pamela. 1977. On the creation and use of English compound nouns. Language 53(4): 810–842.
Dressler, Wolfgang U. 1985. On the predictiveness of natural morphology. Journal of Linguistics 21(2): 321–339.
Estes, Zachary and Sam Glucksberg. 2000. Interactive property attribution in concept combination. Memory & Cognition 28(1): 28–34.
Estes, Zachary and Lara L. Jones. 2006. Priming via relational similarity: A copper horse is faster when seen through a glass eye. Journal of Memory and Language 55(1): 89–101.
Fabb, Nigel. 1998. Compounding. In The handbook of morphology, ed. Andrew Spencer and Arnold M. Zwicky, 66–83. Oxford: Blackwell Publishers.
Feldman, Laurie Beth and Matthew John Pastizzo. 2003. Morphological facilitation: The role of semantic transparency and family size. In Morphological structure in language processing, ed. R. Harald Baayen and Robert Schreuder, 233–258. Berlin: Mouton de Gruyter.
316
Feldman, Laurie B., Emily G. Soltano, Matthew J. Pastizzo, and Sarah E. Francis. 2004. What do graded effects of semantic transparency reveal about morphological processing? Brain and Language 90: 17–30.
Fellbaum, Christiane, ed. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Fonagy, Ivan. 1975. La structure sémantique des constructions possessives. In Langue, discours, société: pour Emile Benveniste, ed. Julia Kristeva, Jean Claude Milner and Nicolas Ruwet, 44–84. Paris: Éditions du Seuil.
Ford, M.A., M.H. Davis and W.D. Marslen-Wilson. 2010. Derivational morphology and base morpheme frequency. Journal of Memory and Language 63(1): 117–130.
Fradin, Bernard. 2003. Nouvelles approches en morphologie. Paris: Presses universitaires de France.
Fradin, Bernard. 2009. IE, Romance: French. In Oxford Handbook on Compounding, ed. Rochelle Lieber and Pavol Štekauer, 417–435. Oxford: Oxford University Press.
Frauenfelder, Uli H. and Robert Schreuder. 1992. Constraining psycholinguistic models of morphological processing and representation: The role of productivity. In Yearbook of morphology 1991, ed. Geert Booij and Jaap van Marle, 165–183. Dordrecht: Springer Netherlands.
Frisson, Steven, Elizabeth Niswander-Klement, and Alexander Pollatsek. 2008. The role of semantic transparency in the processing of English compound words. British Journal of Psychology 99(1): 87–107.
Gaeta, Livio and Davide Ricca. 2009. Composita solvantur: Compounds as lexical units or morphological objects. Italian Journal of Linguistics 21(1): 35–70.
Gagné, Christina L. 2001. Relation and lexical priming during the interpretation of noun-noun combinations. Learning, Memory 27(1): 236–254.
Gagné, Christina L. 2002. Lexical and Relational Influences on the Processing of Novel Compounds. Brain and Language 81: 723–735.
Gagné, Christina L. and Edward J. Shoben. 1997. Influence of Thematic Relations on the Comprehension of Modifier-Noun Combinations. Journal of Experimental Psychology: Learning, Memory, and Cognition 23(1): 71–87.
Gagné, Christina L. and Edward J. Shoben. 2002. Priming relations in ambiguous noun-noun combinations. Memory & Cognition 30(4): 637–646.
Gagné, Christina L. and Thomas L. Spalding. 2007. Conceptual Combination: Implications for the mental lexicon. In The representation and processing of compound words, Vol. 1, ed. Gary Libben and Gonia Jarema, 145–169. Oxford: Oxford University Press.
317
Gagné, Christina L. and Thomas L. Spalding. 2009. Constituent integration during the processing of compound words: Does it involve the use of relational structures? Journal of Memory and Language 60: 20–35.
Gagné, Christina L., Thomas L. Spalding, and Melissa C. Gorrie. 2005. Sentential context and the interpretation of familiar open-compounds and novel modifier-noun phrases. Language and Speech 48(2): 203–219.
Geeraerts, Dirk. 2002. The interaction of metaphor and metonymy in composite expressions. Metaphor and metonymy in comparison and contrast, ed. René Dirven and Ralf Pörings, 435–465. Berlin: Mouton de Gruyter.
Gibbs, Raymond, Nandini Nayak, and Cooper Cutting. 1989. How to kick the bucket and not decompose: Analyzability and idiom processing. Journal of memory and language 28(5): 576–593.
Girju, Roxana, Adriana Badulescu, and Dan Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, 1–8. Association for Computational Linguistics.
Girju, Roxana, Dan Moldovan, Marta Tatu, and Daniel Antohe. 2005. On the semantics of noun compounds. Computer Speech & Language 19(4): 479–496.
Girju, Roxana, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. 2007. SemEval-2007 Task 04: Classification of semantic relations between nominals. In Proceedings of the 4th International Workshop on Semantic Evaluations, 13–18. Association for Computational Linguistics.
Giurescu, Anca. 1975. Les mots composés dans les langues romanes. The Hague: Mouton.
Gleitman, Lila R. and Henry Gleitman. 1971. Phrase and Paraphrase: Some Innovative Uses of Language. New York: Norton.
Glucksberg, Sam. 1993. Idiom meanings and allusional content. In Idioms: Processing, structure, and interpretation, ed. Cristina Cacciari and Patrizia Tabossi, 3–26. Hillsdale, NJ: L. Erlbaum Associates.
Goethem, Kristel Van. 2009. Choosing between A+N compounds and lexicalized A+N phrases: The position of French in comparison to Germanic languages. Word Structure 2(2): 241–253.
Goossens, Louis. 1995. Metaphtonymy: The interaction of metaphor and metonymy in figurative expressions for linguistic action. Pragmatics & beyond. New series 33: 159–174.
Grandy, Richard E. 1990. Understanding and the Principle of Compositionality. Philosophical Perspectives 4: 557–572.
318
Granger, Sylviane and Magali Paquot. 2008. Disentangling the Phraseological Web. In Phraseology. An Interdisciplinary Perspective, ed. Sylviane Granger and Fanny Meunier, 27–49. Amsterdam: John Benjamins Publishing Company.
Grevisse, Maurice and André Goosse. 2011. Le bon usage: grammaire francaise. 15th ed. Bruxelles: De Boeck-Duculot.
Grice, Paul. 1975. Logic and conversation. Syntax and semantics 3: 41–58.
Gries, Stefan Th. 2008. Phraseology and Linguistic Theory: A Brief Survey. In Phraseology. An Interdisciplinary Perspective, ed. Sylviane Granger and Fanny Meunier, 3–25. Amsterdam: John Benjamins Publishing Company.
Gross, Gaston. 1988. Degré de figement des noms composés. Langages 90: 57–72.
Gross, Gaston. 1996. Les expressions figées en français. Paris: Ophrys.
Guevara, Emiliano, Sergio Scalise, Antonietta Bisetto, and Chiara Melloni. 2006. Morbo/Comp: a multilingual database of compound words. Paper presented at the LREC 2006, 5th Conference on Language Resources and Evaluation, Genoa, Italy.
Ten Hacken, Pius. 1994. Defining morphology. Zurich: Georg Olms Verlag.
Ten Hacken, Pius. 1999. Motivated tests for compounding. Acta linguistica hafniensia 31(1): 27–58.
Hale, Kenneth L. and Samuel Jay Keyser. 2002. Prolegomenon to a theory of argument structure. Cambridge, MA: MIT Press.
Halle, Morris. 1973. Prolegomena to a theory of word formation. Linguistic inquiry 4(1): 3–16.
Halle, Morris and Alec Marantz. 1994. Distributed morphology and the pieces of inflection. The view from building 20. 111–176.
Hamel, Marie-Josée. 2010. Prototype d’un dictionnaire électronique de reformulation pour apprenants avancés de français langue seconde. Cahier de l’APLIUT 29(1): 73-82.
Hatcher, Anna Granville. 1960. An Introduction to the Analysis of English Noun Compounds. Word 16: 356–373.
Hay, Jennifer. 2003. Causes and consequences of word structure. New York: Routledge.
Herbst, Thomas. 1996. What are collocations: Sandy beaches or false teeth? English Studies 77(4): 379–393.
Hudson, Patrick T. W and Marijke W. Bergman. 1985. Lexical knowledge in word recognition: Word length and word frequency in naming and lexical decision tasks. Journal of Memory and Language 24(1): 46–58.
319
Van Jaarsveld, Henk J. and Gilbert E. Rattink. 1988. Frequency effects in the processing of lexicalized and novel nominal compounds. Journal of Psycholinguistic Research 17(6): 447–473.
Van Jaarsveld, Henk J., Riet Coolen, and Robert Schreuder. 1994. The role of analogy in the interpretation of novel compounds. Journal of Psycholinguistic Research 23(2): 111–137.
Jackendoff, Ray. 1974. Morphological and semantic regularities in the lexicon. Bloomington, IN: Indiana University Linguistics Club.
Jackendoff, Ray. 1992. Semantic Structures. Cambridge, MA: MIT Press.
Jackendoff, Ray. 2009. Compounding in the parallel architecture and conceptual semantics. In Oxford Handbook of Compounding, ed. Rochelle Lieber and Pavol Štekauer, 105–128. Oxford: Oxford University Press.
Jackendoff, Ray. 2010. Meaning and the lexicon: the parallel architecture, 1975-2010. New York: Oxford University Press.
Jarema, Gonia, Céline Busson, Rossitza Nikolova, Kyrana Tsapkini, and Gary Libben. 1999. Processing Compounds: A Cross-Linguistic Study. Brain and Language 68(2): 362–369.
Jespersen, Otto. 1961. A modern English grammar on historical principles. Vol. VI. London: George Allen & Unwin. [1942].
Johnston, Michael and Frederica Busa. 1996. Qualia structure and the compositional interpretation of compounds. In Proceedings of the ACL SIGLEX workshop on breadth and depth of semantic lexicons, 77–88. Santa Cruz, California: Association for Computational Linguistics.
Jong, Nivja H. De, Laurie B. Feldman, Robert Schreuder, Matthew Pastizzo, and R. Harald Baayen. 2002. The processing and representation of Dutch and English compounds: Peripheral morphological and central orthographic effects. Brain and Language 81(1): 555–567.
Juhasz, Barbara J. and Rachel N. Berkowitz. 2011. Effects of morphological families on English compound word recognition: A multitask investigation. Language and Cognitive Processes 26(4-6). 653–682.
Kamp, Hans. 1975. Two theories about adjectives. In Formal semantics of natural language, ed. E. L. Keenan, 123–155. Cambridge, UK: Cambridge University Press.
Katz, Jerrold J. 1973. Compositionality, idiomaticity, and lexical substitution. In A festschrift for Morris Halle, ed. Morris Halle, Stephen R. Anderson, and Paul Kiparsky, 357–376. New York: Holt, Rinehart, and Winston.
320
Kehayia, Eva, Gonia Jarema, Kyrana Tsapkini, Danuta Perlak, Angela Ralli, and Danuta Kadzielawa. 1999. The Role of Morphological Structure in the Processing of Compounds: The Interface between Linguistics and Psycholinguistics. Brain and Language 68(1-2): 370–377.
Kim, Su Nam and Timothy Baldwin. 2005. Automatic interpretation of noun compounds using WordNet similarity. In Natural Language Processing–IJCNLP 2005, ed. Robert Dale, 945–956. Berlin: Springer.
Knittel, Marie-Laurence. 2009. Le statut des compléments du nom en [de NP]. The Canadian Journal of Linguistics 54(2): 255–290.
Knittel, Marie-Laurence. 2010. Modification et détermination dans les expressions N à N en français. Inria, ATILF. http://hal.archives-ouvertes.fr/docs/00/53/25/80/PDF/Knittel_N_A_N_.pdf
Kopecka, Anetta. 2006. The semantic structure of motion verbs in French. In Space in languages: Linguistic systems and cognitive categories, ed. Maya Hickmann and Stéphane Robert, 83–101. Amesterdam: John Benjamins Publishing Company.
Kövecses, Zoltan. 2002. Metaphor: A Practical Introduction. New York: Oxford University Press.
Lakoff, George and Mark Johnson. 1980. Metaphors we live by. Chicago: University of Chicago Press.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar: Theoretical prerequisites. California: Stanford University Press.
Langacker, Ronald W. 2009. Metonymic grammar. In Metonymy and metaphor in grammar, vol. 25, ed. Klaus-Uwe Panther, Linda L Thornburg, and Antonio Barcelona, 45–71. Amesterdam: John Benjamins Publishing Company.
Lapointe, Steven. 1980. Lexical Analysis of the English Auxiliary Verb System. In Lexical Grammar, ed. Teun Hoekstra, Harry van der Hulst, and Michael Moortgat, 215-254. Dordrecht: Foris Publications.
Lauer, Mark. 1995. Designing statistical language learners: Experiments on noun compounds. Doctoral dissertation, Macquarie University.
Lees, Robert B. 1960. The grammar of English nominalizations. 5th ed. 1968. Bloomington, IN: Mouton.
Leonard, Rosemary. 1984. The interpretation of English noun sequences on the computer. Amsterdam: Elsevier Science Publications Company.
321
Lesselingue, Chrystèle. 2003. Les noms composés [NN] N “holonymiques”: illustration de la spécificité sémantique des unités construites morphologiquement. In Silexicales 3. Les unités morphologiques, vol. 3. Silexicales, ed. Bernard Fradin, Georgette Dal, Nabil Hathout, Françoise Kerleroux, Marc Plénat, and Michel Roché, 100–107. Villeneuve-d’Ascq: U.M.R. SILEX.
Levi, Judith N. 1974. On the alleged idiosyncracy of non-predicate NP’s. In Papers from the Tenth Regional Meeting, Chicago Linguistic Society, ed. Michael W. La Galy, Robert A. Fox, and Anthony Bruck, 402–415. Chicago, IL: Chicago Linguistic Society.
Levi, Judith N. 1978. The Syntax and Semantics of Complex Nominals. New York: Academic Press.
Li, Xiaobin, Stan Szpakowicz, and Stan Matwin. 1995. A WordNet-based algorithm for word sense disambiguation. In Proceedings of the Fourteenth International Joint Conference On Artificial Intelligence, Vol. 2, ed. C. S. Mellish, 1368–1374. San Meteo, CA: Morgan Kaufmann Publishers.
Libben, Gary. 1998. Semantic Transparency in the Processing of Compounds: Consequences for Representation, Processing, and Impairment. Brain and Language 61(1): 30–44.
Libben, Gary. 2006. Why Study Compound Processing? An Overview of the Issues. In The Representation and Processing of Compound Words, ed. Gary Libben and Gonia Jarema, 1–22. Oxford: Oxford University Press.
Libben, Gary, Martha Gibson, Yeo Bom Yoon, and Dominiek Sandra. 2003. Compound Fracture: The Role of Semantic Transparency and Morphological Headedness. Brain and Language 84(1): 50–64.
Libben, Maya R. and Debra A. Titone. 2008. The multidetermined nature of idiom processing. Memory & cognition 36(6): 1103–1121.
Lieber, Rochelle. 1980. On the organization of the lexicon. Bloomington: Indiana University Linguistics Club.
Lieber, Rochelle. 1989. On percolation. In Yearbook of Morphology 2, ed. Geert Booij and Jaap van Marle, 95–138. Dordrecht: Foris.
Lieber, Rochelle. 1992. Deconstructing Morphology. Chicago: University of Chicago Press.
Lieber, Rochelle. 2004. Morphology and Lexical Semantics. Cambridge: Cambridge University Press.
Lieber, Rochelle. 2009. A Lexical Semantic Approach to Compounding. In The Oxford Handbook of Compounding, ed. Rochelle Lieber and Pavol Štekauer, 78–104. Oxford: Oxford University Press.
Lieber, Rochelle. 2010. Introducing morphology. Cambridge, UK: Cambridge University Press.
322
Lieber, Rochelle and Sergio Scalise. 2007. The Lexical Integrity Hypothesis in a new theoretical universe. Lingue e linguaggio 1: 7–32.
Lieber, Rochelle and Pavol Štekauer, eds. 2009. The Oxford Handbook of Compounding. Oxford: Oxford University Press.
Maguire, Phil, Edward J. Wisniewski, and Gert Storms. 2010. A corpus study of semantic patterns in compounding. Corpus Linguistics and Linguistic Theory 6(1): 49–73.
Manning, Christopher D. and Hinrich Schütze. 1999. Foundations of statistical natural language processing. Cambridge, MA: MIT press.
Marchand, Hans. 1965. The analysis of verbal nexus substantives. Indogermanische Forschungen 70(2): 117–145.
Marchand, Hans. 1960. The categories and types of present-day English word-formation. 2nd ed. 1969. München: Beck.
Marslen-Wilson, William, Lorraine. K Tyler, Rachelle Waksler, and Lianne Older. 1994. Morphology and meaning in the English mental lexicon. Psychological Review 101(1): 3–33.
Martin, Robert. 1997. Sur les facteurs du figement lexical. In La locution : entre langue et usages, ed. Michel Martins-Baltar and Blanche-Noëlle Grunig, 291–305. Fontenay-aux-Roses: ENS Éditions.
Mathieu-Colas, Michel. 1994. Les mots à trait d’union : problèmes de lexicographie informatique. Paris: Didier Erudition.
Mathieu-Colas, Michel. 1995. Un Dictionnaire électronique des mots à trait d’union. Langue Francaise 108: 76–85.
Mathieu-Colas, Michel. 1996. Essai de typologie des noms composés français. Cahiers de Lexicologie 69: 71–125.
Mel’čuk, Igor, André Clas, and Alain Polguère. 1995. Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve: Duculot.
Melis, Ludo. 2003. La préposition en français. Paris: Ophrys.
Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju. 2004. Models for the semantic classification of noun phrases. In CLS ’04 Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics, 60–67. Stroudsburg, PA: Association for Computational Linguistics.
Monsell, Stephen. 1985. Repetition and the lexicon. Progress in the psychology of language 2: 147–195.
323
Müller, Christof and Iryna Gurevych. 2009. Using wikipedia and wiktionary in domain-specific information retrieval. In Evaluating Systems for Multilingual and Multimodal Information Access: Ninth Workshop of the Cross-Language Evaluation Forum, ed. Carol Peters, 219–226. Berlin: Springer.
Murphy, Gregory L. 1988. Comprehending complex concepts. Cognitive Science 12(4): 529–562.
Navarro, Emmanuel, Franck Sajous, Bruno Gaume, Laurent Prévot, Hsieh ShuKai, Kuo Tzu-Yi, Pierre Magistry, and Huang Chu-Ren. 2009. Wiktionary and NLP: Improving synonymy networks. In Proceedings of the 2009 Workshop on The People’s Web Meets NLP, ed. Iryna Gurevych and Torsten Zesch, 19–27. Morristown, NJ: Association for Computational Linguistics.
Noailly, Michèle. 1990. Le substantif épithète. Paris: Presses universitaires de France.
Nunberg, Geoffrey. 1979. The non-uniqueness of semantic solutions: Polysemy. Linguistics and Philosophy 2(3): 145–184.
Nunberg, Geoffrey. 1995. Transfers of meaning. Journal of semantics 12(2): 109–132.
Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language 70(3): 491–538.
Olsen, Susan. 2001. Copulative compounds: a closer look at the interface between syntax and morphology. In Yearbook of morphology 2000, ed. Geert Booij and Jaap Marle, 279–320. Dordrecht: Springer Netherlands.
Partee, Barbara Hall, Alice G. B. ter Meulen, and Robert Eugene Wall. 1990. Mathematical methods in linguistics. Dordrecht: Kluwer Academic.
Pham, Hien and R. Harald Baayen. 2013. Semantic relations and compound transparency: A regression study in CARIN theory. Psihologija 46(4): 455-478.
Polguère, Alain. 2003. Lexicologie et sémantique lexicale. Montréal: Les Presses de l’Université de Montréal.
Pollatsek, Alexander and Jukka Hyönä. 2005. The role of semantic transparency in the processing of Finnish compound words. Language and Cognitive Processes 20(1): 261–290.
Pustejovsky, James. 1995. The Generative Lexicon. Cambridge, MA: MIT Press.
Radden, Günter and Zoltán Kövecses. 1999. Towards a theory of metonymy. In Metonymy in language and thought, ed. Klaus-Uwe Panther and Günter Radden, 17–60. Amsterdam: John Benjamins Publishing Company.
Rainer, Franz and Soledad Varela. 1992. Compounding in Spanish. Rivista di linguistica 4(1): 117–142.
324
Rastle, Kathleen and Marjolein Merkx. 2011. Semantic constraints on morphological processing. In Lexical Representation: A Multidisciplinary Approach, ed. M. Gareth Gaskell and Pienie Zwitserlood, 13–31. Berlin: De Gruyter Mouton.
Riegel, Martin. 1988. Les séquences composées N1-N2: une catégorie floue. Studia romanica posnaniensia 13: 129–138.
Riegel, Martin. 1991. Ces noms dits “composés” : arguments et critères. Studia Romanica Posnaniensia 16: 149–161.
Riegel, Martin. 2001. The grammatical category “Possession” and the part-whole relation in French. In Dimensions of Possession, ed. Irène Baron, Michael Herslund, and Finn Sorensen, 187–200. Amesterdam: John Benjamins Publishing Company.
Roelofs, Ardi and Harald Baayen. 2002. Morphology by itself in planning the production of spoken words. Psychonomic Bulletin & Review 9(1): 132–138.
Roeper, Thomas and Muffy E.A Siegel. 1978. A lexical transformation for verbal compounds. Linguistic Inquiry 9(2): 199–260.
Rosario, Barbara and Marti Hearst. 2001. Classifying the semantic relations in noun compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, ed. Lillian Lee, 82–90. Association for Computational Linguistics.
Rosenberg, Maria. 2007. Classification, Headedness and Pluralization: Corpus Evidence from French Compounds. Acta Linguistica Hungarica 54(3): 341–360.
Rosenberg, Maria. 2008. La formation agentive en français. Les composés [VN/A/Adv/P]N/A et les dérivés V-ant, V-eur et V-oir(e). Doctoral dissertation, Stockholm University.
Rosenberg, Maria. 2011. Les composés francais VN – aspects sémantiques. Revue Romane 46(1): 69–88.
Roussarie, Laurent and Florence Villoing. 2003. Some semantic investigations on the French VN construction. In Proceedings of the Second International Workshop on Generative Approaches to the Lexicon, 1–8. Geneva.
Rubin, Gary S., Curtis A. Becker, and Roger H. Freeman. 1979. Morphological structure and its effect on visual word recognition. Journal of Verbal Learning and Verbal Behavior 18(6): 757–767.
Ryder, Mary Ellen. 1994. Ordered chaos. Berkeley: University of California Press.
Saint-Dizier, Patrick. 2006. Introduction to the Syntax and Semantics of Prepositions. In Syntax and Semantics of Prepositions, vol. 29, ed. Patrick Saint-Dizier, 1–25. Dordrecht: Springer.
325
Sandra, Dominiek. 1990. On the representation and processing of compound words: Automatic access to constituent morphemes does not occur. The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology 42(3): 529–567.
Scalise, Sergio. 1984. Generative morphology. Dordrecht: Foris Publications.
Scalise, Sergio. 1992. Compounding in Italian. Rivista di linguistica 4(1): 175–200.
Scalise, Sergio and Antonietta Bisetto. 2009. The Classification of Compounds. In The Oxford Handbook of Compounding, ed. Rochelle Lieber and Pavol Štekauer, 34–53. Oxford: Oxford University Press.
Scalise, Sergio and Antonio Fábregas. 2010. The head in compounding. In Cross-disciplinary issues in compounding, ed. Sergio Scalise and Irene Vogel, 109–126. Amsterdam: John Benjamins Publishing Company.
Scalise, Sergio and Emiliano Guevara. 2006. Exocentric compounding in a typological framework. Lingue e linguaggio 2: 185–206.
Scalise, Sergio and Irene Vogel. 2010. Why Compounding? In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 1–20. Amsterdam: John Benjamins Publishing Company.
Schreuder, Robert and R. Harald Baayen. 1995. Modeling Morphological Processing. In Morphological aspects of language processing, ed. Laurie Beth Feldman, 131–156. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schreuder, Robert and R.Harald Baayen. 1997. How Complex Simplex Words Can Be. Journal of Memory and Language 37(1): 118–139.
Schröder, Anne. 2011. On the productivity of verbal prefixation in English: synchronic and diachronic perspectives. Tübingen: Narr.
Di Sciullo, Anna-Maria and Edwin Williams. 1987. On the Definition of Word. Cambridge, MA: MIT Press.
Séaghdha, Diarmuid. 2008. Learning compound noun semantics. Doctoral dissertation, University of Cambridge.
Selkirk, Elisabeth. 1982. The Syntax of Words. Cambridge, MA: MIT Press.
Shoben, Edward J. 1991. Predicating and nonpredicating combinations. In Psychology of Word Meanings, ed. Paula J. Schwanenflugal, 117–135. Hillsdale, NJ: Erlbaum.
Skousen, Royal. 1989. Analogical modeling of language. Dordrecht: Kluwer Academic Publishers.
Spalding, Thomas. L and Christina. L Gagné. 2007. Semantic property activation during the interpretation of combined concepts. The Mental Lexicon 2(1): 25–47.
326
Spang-Hanssen, Ebbe. 1963. Les prépositions incolores du français moderne. Copenhague: G.E.C. Gad.
Štekauer, Pavol. 1998. An Onomasiological Theory of English Word-formation. Amsterdam: John Benjamins Publishing.
Štekauer, Pavol. 2005. Meaning Predictability in Word Formation: Novel, context-free naming units. Amsterdam: John Benjamins Publishing Company.
Storms, Gert and Edward J. Wisniewski. 2005. Does the order of head noun and modifier explain response times in conceptual combination? Memory & Cognition 33(5): 852–861.
Svensson, Maria Helena. 2004. Critères de figement: l’identification des expressions figées en français contemporain. Doctoral dissertation, Umea University.
Svensson, Maria Helena. 2008. A Very Complex Criterion of Fixedness: Non-Compositionality. In Phraseology. An Interdisciplinary Perspective, ed. Sylviane Granger and Fanny Meunier, 81–93. Amsterdam: John Benjamins Publishing Company.
Tabossi, Patrizia, Rachele Fanari, and Kinou Wolf. 2008. Processing Idiomatic Expressions: Effects of Semantic Compositionality. Journal of Experimental Psychology: Learning, Memory, and Cognition 34(2): 313–327.
Taft, Marcus & Kenneth I. Forster. 1975. Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior 14(6): 638–647.
Taft, Marcus and Kenneth. I Forster. 1976. Lexical storage and retrieval of polymorphemic and polysyllabic words. Journal of Verbal Learning and Verbal Behavior 15(6): 607–620.
Taft, Marcus. 1981. Prefix stripping revisited. Journal of Verbal Learning and Verbal Behavior 20(3): 289–297.
Takada, Hareo. 2008. Le mot composé: étude contrastive de certains types de mots composés : japonais et français. Niigata: Niigata University, Graduate School of Modern Society and Culture.
Tan, Keng-Woei, Hyoil Han, and Ramez Elmasri. 2000. Web data cleansing and preparation for ontology extraction using WordNet. In Proceedings of the First International Conference on Information Systems Engineering, 11-18. Los Alamitos, CA: IEEE Computer Society.
Thompson, Sandra A. 1975. On the issue of productivity in the lexicon. Kritikon Litterarum 4: 332–349. Cited in Bauer 1983.
Tulloch, Sara. 1991. The Oxford Dictionary of New Words. Oxford and New York: Oxford University Press. Cited in Bauer 2001.
327
Tutin, Agnès and Francis Grossmann. 2001. Collocations régulières et irrégulières : esquisse de typologie du phénomène collocatif. Revue française de linguistique appliquée VII(1): 1–16.
Vanderwende, Lucy. 1994. Algorithm for automatic interpretation of noun sequences. In Proceedings of the 15th conference on Computational linguistics, vol. 2, 782–788.
Villoing, Florence. 2002. Les mots composés [VN]N/A du français: réflexions épistémologiques et propositions d’analyse. Doctoral dissertation, Université Paris X.
Villoing, Florence. 2003. Les mots composés VN du français: arguments en faveur d’une construction morphologique. Cahiers de grammaire 28: 183–196.
Villoing, Florence. 2009. Les mots composés VN. Aperçus de morphologie du français. 175–197.
Wälchli, Bernhard. 2005. Co-compounds and natural coordination. New York: Oxford University Press.
Warren, Beatrice. 1978. Semantic patterns of noun-noun compounds. Göteborg: Acta Universitatis Göthoburgensis.
Weiskopf, Daniel A. 2007. Compound Nominals, Context, And Compositionality. Synthese 156(1): 161–204.
Williams, Edwin. 1981. On the Notions “Lexically Related” and “Head of a Word.” Linguistic Inquiry 12(2): 245–274.
Wisniewski, Edward J. 1996. Construal and similarity in conceptual combination. Journal of Memory and Language 35: 434–453.
Wisniewski, Edward J. 1997. When concepts combine. Psychonomic Bulletin & Review 4(2): 167–183.
Wisniewski, Edward J. 1998. Property instantiation in conceptual combination. Memory & Cognition 26(6): 1330–1347.
Wisniewski, Edward. J. and Emily J. Clancy. 2004. You don’t need a weatherman to know which way the wind blows: The role of discourse context in conceptual combination. Unpublished manuscript. Cited in Storms and Wisniewski 2005.
Wisniewski, Edward J. and Bradley C. Love. 1998. Relations versus Properties in Conceptual Combination. Journal of Memory and Language 38(2): 177–202.
Zesch, Torsten, Christof Müller, and Iryna Gurevych. 2008. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC ‘08), 1646–1652. European Language Resources Association.
Zwanenburg, Wiecher. 1992. Compounding in French. Rivista di linguistica 4(1): 221–240.
328
Zwitserlood, Pienie. 1994. The role of semantic transparency in the processing and representation of Dutch compounds. Language and Cognitive Processes 9(3): 341–368.
Lexicographic Resources and Other References
Guillaumin, M., ed. 1839. Dictionnaire du commerce et des marchandises. Vol. 2. Paris: Guillaumin et Compagnie.
Jardins de France. 1846. Annales de la Société royale d’horticulture de Paris, et Journal spécial de l’état et des progrès du jardinage. Vol. 37. Paris: Société d’horticulture.
Lacroix, Eugène, ed. 1884. Études sur l’Exposition de 1878, annales et archives de l’industrie au XIXe siècle. Vol. 2. Paris: Librairie scientifique, industrielle et agricole.
Littré, Emile, ed. 1873. Dictionnaire de la langue française contenant: la nomenclature, la grammaire, la signification des mots, la partie historique, l’étymologie. 4 vols. Paris: Hachette.
Martin, Shannon E. and David A. Copeland. 2003. The Function of Newspapers in Society: A Global Perspective. Westport, CT: Praeger.
Nysten, Pierre Hubert, Émile Littré, Charles Robin. 1858. Dictionnaire de médecine, de chirurgie, de pharmacie, des sciencies accessoires et de l’art vétérinaire. Paris: J.B. Baillière et Fils.
Panckoucke, Charles-Louis-Fleury, ed. 1821. Dictionaire des sciences médicales. Vol. 56. Paris: C. L. F. Panckoucke.
Rey-Debove, Josette, and Alain Rey, eds. 2010. Le nouveau Petit Robert 2010: Dictionnaire alphabétique et analogique de la langue française. Digital Version. Bureau van Dijk.
De Roujoux, M. 1839. Histoire des Rois et des Ducs de Bretagne. Vol. 3. Paris: Duféy.
Rozier, François. 1793. Cours complet d’agriculture théorique, practique, économique et de médecine rurale et vétérinaire: suivi d’une méthode pour étudier l’Agriculture par principes ou Dictionnaire universel d’agriculture. Paris: Librairie d’Éducation et des Sciences et Arts.
Tessier, Alexandre Henri, Auguste Denis Fougeroux de Bondaroy, André Thouin, Louis-Augustin-Guillaume Bosc, and Jacques Joseph Baudrillart. 1787. Encyclopédie méthodique: Agriculture. Vol. 1. Paris: Panckoucke.
Le Trésor de la Langue Française informatisé. ATILF. http://atilf.atilf.fr/
Wiktionary. The Wikimedia Foundation. http://www.wiktionary.org
329
Appendices
Appendix A - Sample SRI calculations for various N-X templates
See Section 7.3.1.1 for a discussion on how the following data was used.
N1 N2 Relation Paraphrase # of Types SRI
effet
retard, revenu, trame, chaîne, papillon, placébo, revenu
CAUSE-REV H caused by M 7 0.438
boomerang, balançoire, domino, dynamo, pyjama, rebond
SIMILARITY H similar to M 6 0.375
bœuf, monstre Adjectival H ADJ M 2 0.125
tunnel (metaph) PRODUCTION H produces M 1 0.063 Total # of Types/Average SRI 16 0.352
carte1 soleil, vue, adresse LOCATION-REV H has M on/in it 3 0.176 carte1 cadeau, index, lettre FUNCTION H functions as M 3 0.176 carte1 mer, réponse PURPOSE H for (*) M 2 0.118 carte2 météo, radar TOPIC H about M 2 0.118 carte1 senior USE-REV H that M uses 1 0.059 carte3 fille, mère SIMILARITY H similar to M 2 0.118 carte3 tuner, mémoire FUNCTION H functions as M 2 0.118 carte3 son, vidéo PRODUCTION H produces M 2 0.118
Total # of Types/Average SRI 17 0.135
sauce
soja, tomate, arachide, câpres, feuilles, graine, gombo
SOURCE H made from M 7 0.500
madère, moutarde PART-REV H that M is a part of 2 0.143
poivrade, carbonara, poulette HYPERNYM H that M is a type of 3 0.214
mousseline SIMILARITY H is similar to M 1 0.071
barbecue PURPOSE H for M 1 0.071 Total # of Types/Average SRI 14 0.327
330
N1 N2 Relation Paraphrase # of Types SRI bateau citerne, feu, pompe PART-REV H that M is a part of 3 0.231
pirate USE-REV H that M uses 1 0.077
lavoir, pousseur, bus, phare FUNCTION H that functions as M 4 0.308
mouche --- --- 1 0.077
pilote PURPOSE H for M 1 0.077
dragon, mère SIMILARITY H that is similar to M 2 0.154
école (N1) PURPOSE H for M 1 0.077 Total # of Types/Average SRI 13 0.195
voiture guérite, radio,
couchettes, lits, poubelles
PART-REV H of which M is a part 5 0.385
balai, pilote FUNCTION H that functions as M 2 0.154 bar, restaurant,
salon, école LOCATION-REV H in which M is located 4 0.308
sport, poste PURPOSE H for M 2 0.154 Total # of Types/Average SRI 13 0.290
331
Appendix B – Comparison of compound relations in the literature
Relations entirely enclosed within parentheses indicate that it is implied by the author. Not included in this table are: Jespersen (1942) and Hatcher (1960).
Relation Adams (1973) Downing (1977) Levi (1978) Agentive (S-V) Subject-Verb (6 sub-classes) X ACT-Subj
Complement (V-O) Verb-Object (7 sub-classes) X ACT-Obj; Agent
Goal X X
Attribute X X X
Classification Names X
Coordination X Half-Half (BE)
Identity Appositional (C) X BE
Hypernymy Appositional (B1, B2) X BE
Resemblance Resemblance (8 sub-classes); Form (B) Comparison (BE)
Similarity Resemblance (B, D1) Comparison (BE)
Part-Whole Associative (A1, C1) Whole-Part; Part-Whole HAVE
Composition Compostion (A1, A2, A3, C1) Composition MAKE2
Possession Associative (A2, A3, C2) X HAVE
Cause Instrumental (B3, D2) X CAUSE
Make/Produce X Product MAKE1
Result X X X
Source Associative (C3); Instrumental (D3) Source FROM
Instrumental Instrumental (13 sub-classes) User USE
Location Locative (8 sub-classes) Place IN
Contents Contents (C2) (Place) IN
Temporal Locative (8 sub-classes) Time IN
Purpose X Purpose FOR
Function Appositional (A) X X
Manner X X X
Means Instrumental (D1) X X
Topic X X ABOUT
Occupation X Occupation X
Prevent/Protect Instrumental (B1, B2) X (FOR)
Other
332
Relation Warren (1978) Shoben (1991) Vanderwende (1994) Agentive (S-V) X X Who/What?
Complement (V-O) X X Whom/What?
Goal Goal-Object X X
Attribute X X X
Classification Proper Names X X
Coordination (Dvandva) (is) X
Identity Copula: Attributive is (What kind of?)
Hypernymy Copula: Subsumptive is What kind of?
Resemblance Resemblance X X
Similarity Resemblance; Copula: Like X X
Part-Whole Whole-Part; Part-Whole has
What is it a part of?/What are its parts?
Composition Source-Result made of Made of what?
Possession Possessor-Belonging has Whose?
Cause Causer-Result causes What does it cause?/What causes it?
Make/Produce (Causer-Result) makes X
Result X X X
Source Origin-Object; Source-Result derived from X
Instrumental
uses/used by How?
Location Location-Object located Where?
Contents Place-Object; (Purpose) located (Where?)
Temporal Time-Object X When?
Purpose Purpose for What for?
Function X X X
Manner X X X
Means Motive Power-Result X X
Topic X about X
Occupation X X X
Prevent/Protect X X X
Other
333
Relation Lauer 1995 Rosario and Hearst 2001 Arnaud 2003160 Agentive (S-V) V-Subj Subject bn, if
Complement (V-O) V-Obj Object bb
Goal X X X
Attribute X Characteristic; Attribute; Property ap
Classification X X ar, ce, ay
Coordination BE X X Identity BE X X
Hypernymy BE X X Resemblance X X ja
Similarity X X X
Part-Whole OF X af, bv, bh, ka Composition OF Material aw, ca, ia, ib, ax
Possession OF X dd Cause (WITH) Cause (ad)
Make/Produce (FOR) Produce de, ba, an Result X X X
Source FROM X ad, ie Instrumental WITH Instrument (aq)
Location AT Location aa, da, bw, id, dc, bf, bz Contents (IN; AT) X ab, au
Temporal IN; ON Time; Frequency cj, bj, by Purpose FOR Purpose al, cb
Function X X X
Manner X X X Means X (Instrument) aq
Topic ABOUT Topic bx Occupation X X X
Prevent/Protect X X ak, aj
Other
Medecine: Procedure, Defect, Measure of, Inhibitor, etc. Support: bi, db, ic
160
Arnaud assigns each semantic relation an arbitrary code consisting of pairs of letters. They are retained here in the interest of space, but may easily be cross-referenced using Table 22 (65-67) in Arnaud (2003).
334
Relation Moldovan et al. 2004
/Girju et al. 2005 Girju et al. 2007 Seaghdha 2008
Jackendoff 2010
Agentive (S-V) Agent X (ACTOR)/(INSTR) X
Complement (V-O) X X (ACTOR)/(INSTR) Argument
Goal Recipient X X X
Attribute Attribute-Holder X HAVE - Property X
Classification X X X CLASSIFY
Coordination X X X BE
Identity X X BE - Identity X
Hypernymy Is-A (Hypernymy) X (BE - Identity) KIND-OF
Resemblance X X (BE - Similarity) SAME/SIMILAR
Similarity X X BE - Similarity X
Part-Whole Part-Whole Part-Whole HAVE - Part-Whole PART-OF
Composition X BE - Substance/Form COMPOSED-OF
Possession Possession X HAVE - Possession HAVE
Cause Cause Cause-Effect (ACTOR)/(INSTR) CAUSE
Make/Produce Make/Produce Product-Producer (ACTOR)/(INSTR) MAKE
Result Result X X X
Source Source Origin-Entity X MADE-OF
Instrumental Instrument Instrument-Agency INSTRUMENT X
Location Location/Space X IN - Spatially located BE-LOC
Contents (Location/Space) Content-Container IN - Spatially located (BE-LOC)
Temporal Temporal X IN - Temporal BE-LOC-TEMP
Purpose Purpose X X (Proper Function)
Function X X X SERVES-AS
Manner Manner X X X
Means Means X X X
Topic Topic Theme-Tool ABOUT X
Occupation X X X X
Prevent/Protect X X X PROTECT-FROM
Other Kinship HAVE - Group member
App
endi
x C
– P
artia
l sch
ema
of W
iktio
nary
’s d
atab
ase
stru
ctur
e
Gre
y ou
tline
s ind
icat
e ta
bles
use
d to
con
stitu
te d
atas
et o
f Fre
nch
com
poun
ds a
s des
crib
ed in
Cha
pter
3.
335
336
Appendix D - NN French compounds retained from Wiktionary
All 729 NN French Compounds retained from Wiktionary. Compound Head Relation Tropes N1 SRI Val N2 SRI Val abri-vent N1 purpose
abricot-pêche N1 resemblance abricotier-pays N1 location accordéon musette N1 similarity
0.250 action éclair N1 similarity Metaph N2
action reflet N1 similarity Metaph N2 adjudant-chef N1 adjective
1.000
adresse email N1 purpose agora-phobie N2 cause-rev aller-retour N1/N2 coordination alphabet hindi N1 use-rev aluminium-épidote N2 part-rev âme sœur N1 similarity Metaph N2
ami-ami Exo --- amiante-ciment N2 part-rev ampli-syntoniseur N1/N2 coordination anacardier cajou N1 production analyste-programmeur N1/N2 coordination âne-zèbre N1/N2 coordination animal-garou N1/N2 coordination
0.990 année-homme Exo ---
année-lumière Exo ---
1.000 appareil photo N1 production
appui-main N1 purpose
1.000 appui-pied N1 purpose
1.000 0.250
appui-pot N1 purpose
1.000 appui-tête N1 purpose
1.000
arc-doubleau N1 hypernymy argent-métal N1 hypernymy-rev arginine-vasopressine N2 part-rev arrêt maladie N1 cause-rev arrêt-buffet N1 purpose art-thérapie N2 use article zéro N1 adjective
0.250 artiste plasticien N1/N2 coordination
assurance-chômage N1 purpose
0.333 assurance-emploi N1 purpose
0.667
assurance-vie N1 purpose
0.667 attaché-case N2 use-rev
aube-vigne N2 --- auteur-compositeur N1/N2 coordination auto-car N2 ---
0.250 auto-école N2 purpose Meton N1 0.250 0.833
auto-mitrailleuse N1 part-rev
0.250 auto-stop N2 argument
0.250
avantage choc N1 --- Metaph N2 avion-cargo N1 function
bâbord amure N1 location-rev
337
Compound Head Relation Tropes N1 SRI Val N2 SRI Val bain-douche N1/N2 coordination
bal musette N1 part-rev
0.500 baladeur radio N1/N2 coordination
0.250
balai-brosse N2 part baleine tueuse N1 similarity ballon-panier Exo location ballon-sonde N1 function banane plantain N1 hypernymy bande mère N1 similarity Metaph N2
0.583
bande-annonce N1 function bar-tabac N1 purpose baryton-basse N1/N2 coordination baryum-orthose N2 part-rev bateau pilote N1 purpose
0.125 0.250 bateau pousseur N1 function
0.375
bateau-bus N1 function
0.375 bateau-dragon N1 resemblance
0.125
bateau-école N2 purpose Meton N1 0.125 0.833 bateau-lavoir N1 function
0.375
bateau-mère N1 similarity Metaph N2 0.125 0.583 bateau-mouche N1 ---
0.125 0.750
baume copalme N1 source bébé éprouvette N1 location bec-figue Exo --- Meton N1
belote contrée N1 --- benne-kangourou N1 --- beurre noisette N1 resemblance Meton N1/N2
biens-fonds N1/N2 coordination bistro-brasserie N1/N2 coordination blanc-seing Exo location-rev Meton N1
bleu charrette N1 location
0.250 bleu charron N1 use-rev
0.250
bleu ciel N1 resemblance Meton N2 0.500 bleu horizon N1 resemblance Meton N2 0.500 bloc-cylindres N1 part-rev
0.250
bloc-eau N1 location-rev Meton N2 0.500 bloc-moteur N1 function
0.250
bloc-note N1 location-rev Meton N2 0.500 bœuf carotte N1 part-rev
0.333
bœuf mode N1 adjective
0.333 bœuf-carotte Exo ---
0.333
bois-chandelle N1 source-rev
1.000 boîtier adaptateur N1 function
bombe aérosol N1 production bonus malus N1/N2 coordination borne-fontaine N2 resemblance boule-dogue N2 --- bourg-épine Exo --- bourgeois-bohème N1/N2 coordination bout-dehors N1 location bouton-pression N1 use bracelet-montre N2 part-rev brin sens N1 --- broue-pub N2 purpose
338
Compound Head Relation Tropes N1 SRI Val N2 SRI Val bureau-chef N1 similarity
1.000
buse bondrée N1 hypernymy café crème N1 part-rev
0.143 café-bar N1/N2 coordination
0.429 0.500
café-bistro N1/N2 coordination
0.429 café-comptoir N1 use
0.143
café-concert N1 location-rev
0.143 café-filtre N1 production-rev
0.143
café-théâtre N1/N2 coordination
0.429 caisse-palette N1/N2 coordination
calcium-autunite N2 part-rev calcium-pyromorphite N2 part-rev camion-citerne N1 part-rev camping-car N2 purpose canapé-lit N1/N2 coordination canard colvert N1 hypernymy canne flèche N1 source-rev cap-mouton Exo --- capital-risque N1 --- capitan-pacha N2 --- capsule-congé N1 location-rev carbonate-apatite N2 composition cardinal diacre N1/N2 coordination cargo-dortoir N1 function carte fille N1 similarity Metaph N2 0.250
carte mère N1 similarity Metaph N2 0.250 0.583 carte soleil N1 location-rev Meton N2 0.250
carte tuner N1 function
0.500 carte-cadeau N1 function
0.500
carte-index N1 function
0.500 carte-lettre N1 function
0.500
carte-vue N1 location-rev Meton N2 0.250 carton-index N1 function
0.333
carton-pâte N1 source
0.667 carton-pierre N1 resemblance
0.667
cas régime N1 topic cas sujet N1 topic case départ N1 location céleri-rave N1/N2 coordination cellule assistante N1 function
1.000 cellule hôte N1 function
1.000
cellule souche N1 function
1.000 centimètre cube N1 adjective
centre-ville N1 location cercle unité N1 --- césium-analcime N2 composition cession-bail N1 purpose chapeau melon N1 resemblance Meton N2
chargeuse-pelleteuse N1/N2 coordination châssis-support N1 function chat serval N1 hypernymy
0.250 chat-château Exo ---
0.250
chat-garou N1/N2 coordination
0.250 0.990 chat-tigre N1 resemblance Meton N1/N2 0.250
339
Compound Head Relation Tropes N1 SRI Val N2 SRI Val chaussure bateau N1 purpose Meton N2
chef magistrat N2 coordination
0.667 chef-lieu N2 similarity
0.667
chef-mets Exo ---
0.333 chef-mois Exo ---
0.333
chêne kermès N1 location-rev
0.250 chêne-gomme N1 part-rev
0.250
chêne-liège N1 source-rev
0.250 chêne-pommier N1 resemblance
0.250
chèque-repas N1 purpose chèque-vacances N1 purpose cheval-vapeur Exo --- chevêche brame N1 --- chèvre-pied Exo part-rev
0.250 chien-cerf N1 ---
0.200
chien-garou N1/N2 coordination
0.200 0.990 chien-loup N1 resemblance
0.200
chien-nid Exo --- Metaph N2 0.200 chien-rat Exo coordination
0.200
chiffre-taxe Exo topic chou-croûte N1 ---
0.167 chou-fleur N1 resemblance
0.333
chou-navet N1 resemblance
0.333 chou-palmiste Exo part Metaph N1 0.167 chou-rave N1/N2 coordination
0.167
chou-vache N1 purpose
0.167 chouette chevêche N1 hypernymy
1.000
chouette effraie N1 hypernymy
1.000 chouette harfang N1 hypernymy
1.000
ciné-club N2 topic ciné-parc N2 location-rev circuit tampon N1 function
0.800 clé crocodile N1 resemblance Meton N1/N2
clé lavabo N1 purpose client-cible N1 function Metaph N2
1.000
clin-foc N2 --- cobalt-mica N1 resemblance cocotte-minute N1 --- Metaph N2
code machine N1 use-rev Meton N2 0.333 code source N1 location Metaph N2 0.333 code-barres N1 composition
0.333
colin-tampon Exo ---
0.200 colis-route N1 location
colloid-calcite N2 hypernymy comédie-ballet N1/N2 coordination commissaire-priseur N1/N2 coordination compère-loriot Exo --- compte utilisateur N1 use-rev conducteur fantôme N1 --- consommateur cible N1 function Metaph N2
1.000
contrôle-commande N1/N2 coordination coq faisan N1 hypernymy
0.667 coq-héron N1 hypernymy
0.667
coq-souris Exo ---
0.333
340
Compound Head Relation Tropes N1 SRI Val N2 SRI Val côté cour N1 location
côté jardin N1 location coton-poudre N1 function coton-tige N1 location couche-culotte N1/N2 coordination coupe colonel Exo --- Meton N1
courtier négociant N1/N2 coordination coussin péteur N1 function coût cible N1 function Metaph N2
1.000
couteau éplucheur N1 function crédit-bail N1 function crème fleurette N1 --- Metaph N2
croiseur-école N1 use-rev
0.167 cueilleuse-égreneuse N1/N2 coordination
cul-rousselet Exo --- culotte garçonne N1 use-rev danse-poteau N1 use daphné garou N1/N2 hypernymy
0.910 date butoir N1 function Metaph N2
date limite N1 adjective daurade coryphène N1 hypernymy débat-spectacle N1/N2 coordination dent œillère N1 --- Meton N2
député-maire N1/N2 coordination désintégration alpha N1 classify
1.000 disque vinyle N1 composition
distance-temps N1 --- dose limite N1 adjective drainage-taupe N1 production-rev drap-housse N1/N2 coordination duché-pairie N1/N2 coordination eau mère N1 --- Metaph N2
0.830
eaux-vannes N1 location Meton N2 écart type N1 adjective
échange cambiste N1 production-rev écho fantôme N1 similarity Metaph N2
écho mirage N1 similarity Metaph N2 écrevisse signal N1 part-rev Metaph N2 effet papillon N1 cause-rev Metaph C électron-volt Exo ---
élément formant N1 function emballage-bulle N1 composition emballage-coque N1 hypernymy émission-débat N1 part-rev épinard-fraise Exo coordination épreuve minute N1 time équivalent lait N1 composition erreur système N1 cause-rev espace-boutique N1 location espace-temps N1/N2 coordination espèce cible N1 function Metaph N2
1.000
étage vernier N1 function étalon-or N1 use expert-comptable N2 adjective
341
Compound Head Relation Tropes N1 SRI Val N2 SRI Val facteur sigma N1 classify
fan-club N2 composition femme-renarde N1/N2 coordination fermeture éclair N1 similarity Metaph N2
fibre-cellule Exo coordination fiducie-sûreté N1 use filet-poubelle N1 function fille-mère N1/N2 coordination
0.167 film-fleuve N1 similarity Metaph N2
fleur-feuille N1 resemblance fluor calcium N1 composition focalisation zéro N1 adjective
0.500 format coquille N1 location-rev Meton N2
fougère aigle N1 part-rev Metaph N2 fourmi-lion N1 similarity Metaph N2 franc métro N1 --- Meton N2 fréquence radio N1 use-rev
0.250
fric-frac Exo --- fusée-sonde N1 function gaine-culotte N1/N2 coordination gaz hydrogène N1 composition
0.750 gaz oxygène N1 composition
0.750
gaz propulseur N1 function
0.250 gaz sarin N1 composition
0.750
gène chimère N1 similarity gène suppresseur N1 function gentilhomme verrier N1/N2 coordination gin rami N2 --- gomme-résine N1/N2 coordination gorfou macaroni N1 resemblance Metaph N2
gorfou sauteur N1 similarity gorge-fouille Exo --- gouet serpentaire N1 hypernymy grandeur nature N1 adjective grave-ciment N1/N2 coordination grenouille-taureau N1 similarity Meton N1/N2
grille écran N1 part groupement phosphate N1 argument guerre proxy N1 use guillemet-apostrophe N1/N2 coordination halte-garderie N2 --- heure-lumière Exo ---
1.000 hippocampe feuille N1 resemblance
homme-fourmi N1 similarity Metaph N2 1.000 homme-grenouille N1 resemblance
1.000
homme-sandwich N1 similarity Metaph N2 1.000 horloge pointeuse N1 function
hôtel-dieu Exo ---
Metaph N1 Meton N2
hôtellerie-restauration N1/N2 coordination houx-frelon N1 hypernymy huppe-col Exo coordination Meton N1
image-gradient N1 production-rev info-ballon N2 location-rev
342
Compound Head Relation Tropes N1 SRI Val N2 SRI Val info-bulle N2 location-rev
jambon beurre Exo coordination jardin verger N1 function jaune paille N1 resemblance Meton N2
jazz manouche N1 production-rev jour-homme Exo --- jour-lumière Exo ---
1.000 jupe-culotte N1/N2 coordination
kilogramme-force Exo --- kilomètre-heure Exo --- lac-laque Exo composition Metaph N1
laine mère N1 source Metaph N2 0.333 0.830 laine pelade N1 --- Metaph N2 0.333
laine renaissance N1 --- Metaph N2 0.333 laitue asperge N1 resemblance Meton N1
lancer arbalète N1 similarity langage machine N1 use-rev Meton N2
lapin chasseur N1 --- Meton N1 lapin-garou N1/N2 coordination
0.990
larve échinocoque N1 hypernymy laurier sauce N1 source-rev Meton N1 0.500
laurier-cerise N1 production
0.250 laurier-tarte N1 source-rev Meton N1 0.500 laurier-tin N1 ---
0.250
lecteur-graveur N1/N2 coordination léopard-garou N1/N2 coordination
0.990 lettre patente N1 function
liane-corail N1 resemblance lieutenant-colonel N2 coordination lieutenant-général N2 coordination lime-uranite N2 --- linon-batiste N1 hypernymy lit mezzanine N1 resemblance lit-cage N1 resemblance livret-police N1 location-rev Meton N2
location-financement N1/N2 coordination locution-phrase N1/N2 coordination logiciel antivirus N1 function
1.000 logiciel espion N1 function
1.000
logiciel médiateur N1 function
1.000 logiciel-socle N1 function
1.000
lord-lieutenant N1/N2 coordination
1.000 lord-maire N1/N2 coordination
1.000
loup-garou N1/N2 coordination
0.990 lucilie bouchère N1 similarity Metaph N2
macareux moine N1 resemblance machin-chose N1/N2 coordination machine-outil N1 use magasin phare N1 similarity Metaph N2
magasin-pilote N1 function
0.250 mail-coach N2 purpose
maison-mère N1 similarity Metaph N2
0.583 maître-autel N2 adjective
0.200
maître-chanteur N2 adjective
0.200
343
Compound Head Relation Tropes N1 SRI Val N2 SRI Val maître-cylindre N2 adjective
0.200
maître-nageur N2 adjective
0.200 maîtresse femme N1/N2 coordination
mâle alpha N1 classify
1.000 malle-poste N1 purpose
manche pagode N1 resemblance manchot antipode N1 location
0.333 manchot empereur N1 similarity Metaph N2 0.667 manchot pygmée N1 resemblance
0.667
mandat-poste N1 use marche-palier N1 part margis-chef N1 adjective
1.000 marteau-pilon N1 part-rev
martre-zibeline N1 hypernymy marxiste-léniniste N1/N2 coordination médecin légiste N1/N2 coordination médicament conseil N1 --- mémoire cache N1 function
0.667 mémoire flash N1 adjective Metaph N2 0.333 mémoire tampon N1 function
0.667 0.800
menthe pouliot N1 hypernymy menthe-coq N1 similarity Metaph N2
menuisier-moulurier N1/N2 coordination mère maquerelle N1/N2 coordination Metaph N1
merisier-pays N1 location mètre cube N1 adjective minute-lumière Exo ---
1.000 mode paysage N1 resemblance
mode portrait N1 resemblance mois-homme Exo --- mois-lumière Exo ---
1.000 moissonneuse-batteuse N1/N2 coordination
molécule hôte N1 function monsieur-dame Exo coordination mont-joie Exo --- montre-bracelet N1 part-rev mort-chien Exo argument mot vedette N1 location
0.200 mot-clé N1 similarity Metaph N2 0.600 mot-obus N1 similarity Metaph N2 0.600 mot-outil N1 similarity Metaph N2 0.600 mot-valise N1 similarity Metaph N2 0.200 moteur vernier N1 function
moteur-fusée N1 part moto-école N2 purpose Meton N1
0.833
mouche araignée N1 resemblance Meton N1/N2 mouche-scorpion N1 resemblance Meton N1/N2 moucheron piqueur N1 function
mouette pygmée N1 resemblance moustique tigre N1 resemblance Meton N1/N2
mule-jenny N2 similarity Metaph N1 mur-rideau N1 similarity Metaph N2 navire-citerne N1 part-rev
0.333
navire-école N1 purpose
0.333 0.167
344
Compound Head Relation Tropes N1 SRI Val N2 SRI Val navire-mère N1 similarity Metaph N2 0.333 0.583 newton-mètre N1 ---
noctuelle gamma N1 part-rev Metaph N2
0.200 nœud papillon N1 resemblance
nœud-nœud N1 --- noix-chandelle N1 function nord-est N1/N2 coordination nord-ouest N1/N2 coordination œuf mimosa N1 resemblance Meton N1/N2
œuf-coque N1 location oiseau-chameau N1 resemblance
0.500 oiseau-cloche N1 similarity
0.250
oiseau-lyre N1 part-rev Metaph N2 0.250 oiseau-mouche N1 resemblance
0.500 0.750
ombre-chevalier N1 --- onde radio N1 use-rev
0.250 or-sol Exo ---
orchestre musette N1 part-rev
0.500 orienteur-marqueur N1/N2 coordination
ours-garou N1/N2 coordination
0.990 page web N1 location Metaph N1
pal-fer N1 composition palmier-dattier N1 hypernymy palpe-mâchoire N1 location panier-repas N2 location panthère-garou N1/N2 coordination
0.990 papa-gâteau N1 ---
papier bible N1 purpose
0.910 papier brouillard N1 resemblance
0.910
papier calque N1 purpose
0.910 papier carbone N1 part-rev
0.910
papier japon N1 production-rev Meton N2 0.910 papier kraft N1 hypernymy
0.910
papier maïs N1 source
0.910 papier toilette N1 purpose
0.182
papier-cul N1 purpose
0.182 papier-filtre N1 function
0.182
papier-monnaie N1 function
0.182 papillon dauphin Exo coordination Meton N1/N2
papy-boom N2 argument paquet-cadeau N1/N2 coordination parc relais N1 function parking-relais N1 function participation-pari N1 --- participe présent N1 topic particule alpha N1 classify
1.000 1.000 particule bêta N1 classify
1.000
particule gamma N1 classify
1.000 0.800 passage piétons N1 purpose
pause-café N1 purpose pause-carrière N1 time peptide signal N1 function persan dari N1 hypernymy pétrolier-minéralier N1/N2 coordination
345
Compound Head Relation Tropes N1 SRI Val N2 SRI Val phage transducteur N1 function
phosphate-allophane N2 composition photo-identification N2 use
0.250 photo-interprétation N2 argument
0.750
photo-interprète N2 argument
0.750 photo-montage N2 argument
0.750
photolyse éclair N1 cause-rev pie-mère Exo ---
0.830 pied cube N1 adjective
pieds-paquets Exo location pierre miel N1 resemblance Meton N1/N2
pierre ponce N1 hypernymy piétin-échaudage N1 cause piétin-verse N1 cause pigeon voyageur N1 similarity pince crocodile N1 resemblance Meton N2
piqueur-suceur N1/N2 coordination pitaine clés N1 production plan cul N1 adjective Metaph N2 0.333
plan médias N1 use
0.333 plan-séquence N1 composition
0.333
plante-crayon N1 resemblance Meton N1 plante-éponge N1 hypernymy
plateforme bus N1 --- pneu contact N1 use poche-cuiller Exo --- poche-revolver N1 location-rev poids coq N1 argument Metaph C 0.800
poids mouche N1 argument Metaph C 0.800 0.250 poids paille N1 argument Metaph C 0.800
poids plume N1 argument Metaph C 0.800 poids welter Exo hypernymy
0.200
point presse N1 location-rev
0.333 point zéro N1 location
0.333 0.250
point-virgule N1/N2 coordination
0.333 poisson fourrage N1 function
0.143
poisson soleil N1 ---
0.143 poisson-chat N1 resemblance Meton N1/N2 0.286 poisson-épée N1 part-rev Metaph N2 0.286 poisson-évêque Exo resemblance
0.286
poisson-pilote N1 similarity Metaph N2 0.143 0.500 poisson-sabre N1 part-rev Metaph N2 0.286
pomme cajou N1 production-rev Metaph N1 pomme cannelle N1 similarity Metaph N1 pont-bascule N1 part-rev
pont-canal N1 location pop-punk N1/N2 coordination porte papillon N1 resemblance Meton N2
porte-fenêtre N1/N2 coordination portrait-robot N1 production-rev Metaph N2
potentiel hydrogène N1 --- pouce-pied Exo ---
0.250 poule faisane N1 hypernymy
poursuite-bâillon N1 function Metaph N2
346
Compound Head Relation Tropes N1 SRI Val N2 SRI Val prés-bois N1 location-rev
prince-président N1/N2 coordination programme-cadre N1 function promotion canapé N1 use Meton N2
pulvérisateur-mélangeur N1/N2 coordination punaise-mouche N1 resemblance
1.000 0.750 punk rock N1/N2 coordination
quark beauté N1 --- quark charme N1 --- quarte-fagot Exo --- quartier-maître N2 --- quartz morion N1 hypernymy quartz prase N1 hypernymy question piège N1 function Metaph N2
quinte flush N1/N2 coordination raccourci clavier N1 location
radio trottoir Exo --- Metaph N1 Meton N2 0.143
radio-amateur N2 use
0.286 radio-gramophone N1/N2 coordination
0.429
radio-phonographe N1/N2 coordination
0.429 radio-réveil N1/N2 coordination
0.429
radio-taxi N2 use
0.286 radio-télévision N2 use
0.143
radioactivité alpha N1 classify
1.000 1.000 radioactivité bêta N1 classify
1.000
radioactivité gamma N1 classify
1.000 0.800 raie léopard N1 resemblance Meton N1/N2
ramasseuse-presse N1/N2 coordination rat-garou N1/N2 coordination
0.990 raton laveur N1 similarity
rayon alpha N1 classify
1.000 rayon gamma N1 classify
0.800
rayonnement alpha N1 classify
1.000 réception-cadeaux N1 location-rev
reine mère N1/N2 coordination
0.167 reine-marguerite N2 similarity Metaph N1
renouée-bambou N1 resemblance Meton N1 répondeur-enregistreur N1/N2 coordination
réponse type N1 adjective requin-baleine N1 resemblance requin-marteau N1 part-rev Meton N1/N2
restaurant-bar N1/N2 coordination
1.000 0.500 restaurant-bistro N1/N2 coordination
1.000
restaurant-brasserie N1/N2 coordination
1.000 restaurant-pub N1/N2 coordination
1.000
retour chariot N1 argument retraite-chapeau N1 --- réunion-bilan N1 topic réveil-matin N1 time robe-housse N1 similarity robot mixeur N1 function roche-mère N1 similarity Metaph N2
0.583
roman-feuilleton N1/N2 coordination
0.333
347
Compound Head Relation Tropes N1 SRI Val N2 SRI Val roman-fleuve N1 similarity Metaph N2 0.333
roman-photo N1 part-rev
0.333 rose noisette N1 ---
rouge sang N1 resemblance Meton N2 roulage-décollage N1/N2 coordination
rouleau compresseur N1 function sabre laser N1 part-rev sac poubelle N1 purpose saisie-arrêt N1/N2 coordination
0.750 saisie-attribution N1/N2 coordination
0.750
saisie-brandon N1 use
0.250 saisie-exécution N1/N2 coordination
0.750
salaire-coût N1/N2 coordination salicaire pourpier N1 hypernymy sapeur-pompier N2 --- sauce soja N1 source sauce tomate N1 source saule amandier N1 resemblance Meton N1/N2 0.500
saule daphné N1 hypernymy
0.500 saule marsault N1 hypernymy
0.500
saule pleureur N1 similarity Metaph N2 0.500 scie égoïne N1 hypernymy
science-fiction N2 topic séchoir-atomiseur N1/N2 coordination secret défense N1 topic semaine-lumière Exo ---
1.000 sénateur-maire N1/N2 coordination
sergent-chef N1 adjective
1.000 sergent-major N2 coordination
serpent roi N1 similarity Metaph N2 serveur mandataire N1 function
service support N1 function serviette-éponge N1/N2 coordination signe moins N1 function signe plus N1 function silence radio N1 location
0.250 silure glane N1 hypernymy
singe hurleur N1 similarity
0.400 singe-araignée N1 resemblance Meton N1/N2 0.600 singe-chouette N1 similarity
0.400
singe-écureuil N1 resemblance Meton N1/N2 0.600 singe-lion N1 resemblance Meton N1/N2 0.600 site internet N1 location Metaph N1 0.750 site intranet N1 location Metaph N1 0.750 site miroir N1 function Metaph N2 0.250 site web N1 location Metaph N1 0.750 société écran N1 function
soie tussah N1 hypernymy sonde lambda N1 classify sorbier alisier N1 hypernymy soutien-gorge N1 purpose souveraineté-association N1/N2 coordination spath fluor N1 composition spectacle solo N1 composition
348
Compound Head Relation Tropes N1 SRI Val N2 SRI Val station aval N1 location
0.250
station essence N1 purpose
0.500 station pivot N1 function
0.250
station-service N1 purpose
0.500 stock-tampon N1 function
0.800
structure limite N1 adjective stylo gomme N1 function stylo-bille N1 part-rev sud-est N1/N2 coordination sud-ouest N1/N2 coordination support-chaussettes N1 purpose sursaut gamma N1 classify
0.800 table-bureau N1 function
talon aiguille N1 resemblance tamarin lion N1 resemblance Meton N1/N2
tamarin sauteur N1 similarity taupe-grillon N2 similarity taux plafond N1 function Metaph N2
taux plancher N1 function Metaph N2 teinture-mère N1 similarity Metaph N2
0.583 ténia échinocoque N1 hypernymy
tente-abri N1 function terme source N1 --- terre diatomée N1 composition terre-noix Exo --- test-match N2 function test-objet N2 function tête-bêche Exo --- Meton N1
tic-tac Exo --- ticket-restaurant N1 purpose tigre-garou N1/N2 coordination
0.990 timbre-amende N1 purpose
0.333
timbre-poste N1 purpose
0.333 timbre-taxe N1 purpose
0.333
tiret cadratin N1 function tiroir-caisse N1 part tissu-éponge N1/N2 coordination titan-cotte Exo --- tolérance zéro N1 adjective
0.500 tonne-mètre Exo ---
tortue-alligator N1 resemblance Meton N1/N2 tortue-boîte N1 resemblance Meton N1 trachée-artère N1 similarity
train fantôme N1 location Metaph N2 train-train N1 ---
trépan-benne N1 part-rev trique-madame Exo --- trou-madame Exo --- unité monomère N1 composition vache-biche Exo coordination valeur vedette N1 similarity Metaph N2
valse musette N1 use
0.250 valse-hésitation N1 composition Metaph N1
variole cameline N1 ---
349
Compound Head Relation Tropes N1 SRI Val N2 SRI Val vecteur navette N1 similarity Metaph N2
vélo-école N2 purpose Meton N1
0.833 vendeur négociateur N1/N2 coordination
ventre-madame Exo part Meton N1 vert méthyle N1 source
verveine citronnelle N1 similarity vidéo-achat N2 use
0.800 vidéo-clip N2 part
0.200
vidéo-lynchage N2 use Metaph N2 0.800 vidéo-protection N2 use
0.800
vidéo-surveillance N2 use
0.800 village-rue N1 location
ville-dortoir N1 similarity Metaph N2 violon alto N1 hypernymy
viorne tin N1 --- virus assistant N1 function voiture balai N1 function Metaph N2 0.286
voiture-bar N1 location-rev
0.429 0.500 voiture-couchettes N1 part-rev
0.286
voiture-lits N1 part-rev
0.286 voiture-pilote N1 function
0.286 0.500
voiture-restaurant N1 location-rev
0.429 voiture-salon N1 location-rev
0.429
vote sanction N1 purpose wagon-bar N1 location-rev
0.429 0.500 wagon-citerne N1 part-rev
0.571
wagon-foudre N1 part-rev
0.571 wagon-lit N1 part-rev
0.571
wagon-poche N1 part-rev
0.571 wagon-restaurant N1 location-rev
0.429
wagon-salon N1 location-rev
0.429 yacht-club N2 purpose Meton N1
zéolithe cyanite N1 hypernymy zinc-blende N2 hypernymy zone tampon N1 function
0.800
350
Appendix E - N à N French Compounds retained from Wiktionary
All 319 N à N French Compounds retained from Wiktionary. Compound Head Relation Tropes N1 SRI Val N2 SRI Val abeille à miel N1 production
abreuvoir à mouches Exo use-rev Metaph N1
0.250 acajou à pommes N1 production Metaph N2
acquit-à-caution N1 --- adents à crémaillère N1 part-rev amortisseur à fluide N1 use ampoule à incandescence N1 use arbalète à jalet N1 use arbre à cames N1 part-rev
0.167 arbre à cornichons N1 production Metaph N2 0.500 arbre à grives N1 location-rev
0.333
arbre à pain N1 production Metaph N2 0.500 arc à poulies N1 part-rev
arme à feu N1 use
0.125 arme à implosion N1 use
armes à enquerre N1 cause armoire à glace N1 part-rev arquebuse à croc N1 part-rev Metaph N2
arquebuse à rouet N1 part-rev autocaravane à cellule N1 part-rev bac à sable N1 purpose baignoire à porte N1 part-rev bail à cheptel N1 purpose bail à complant N1 purpose baleine à bosse N1 part-rev balle à queue N1 part-rev banque à domicile N1 location Meton N1
banque à pitons N1 use Metaph N1 bar à champagne N1 purpose
bar à putes N1 location-rev barbe à papa Exo possession-rev Metaph C
barre à mine N1 purpose bateau à vapeur N1 use Meton N2
1.000
batte à beurre N1 purpose
0.250 batte à feu N1 purpose
0.125
bec à fente N1 part-rev bêtes à cornes N1 part-rev betterave à sucre N1 source-rev bibitte à patates N1 location billet à ordre N1 purpose boîte à camembert N1 purpose
0.857 boîte à cigare N1 purpose
0.857
boîte à gants N1 purpose
0.857 boîte à lettres N1 purpose
0.857
boîte à outils N1 purpose
0.857 boîte à pet N1 purpose Meton N2 0.857 boîte-à-musique N1 production
0.143
bombe à fission N1 use
1.000 bombe à fusion N1 use
1.000
351
Compound Head Relation Tropes N1 SRI Val N2 SRI Val bombe à hydrogène N1 use
1.000
bombe à neutrons N1 use
1.000 bonnet à prêtre Exo possession-rev Metaph C
bouche à feu Exo production Meton N1/N2
0.125 bouche à oreille Exo --- Meton N1/N2
bouilloire à thé N1 purpose boule à neige N1 location-rev boule à zéro N1 --- Metaph N1
bourse-à-berger Exo possession-rev Metaph C bourse-à-pasteur Exo possession-rev Metaph C broche à glace N1 purpose
brosse à cheveux N1 purpose brosse à dents N1 purpose cabane à sucre N1 production cage à écureuil Exo purpose Metaph N1/N2
canne à mouche N1 purpose Meton N2
0.250 canne à pêche N1 purpose
canne à sucre N1 source-rev canon à électrons N1 purpose canon à neige N1 purpose carte à puce N1 part-rev cassican à collier N1 part-rev Metaph N2
cave à liqueurs N1 purpose chair à canon Exo --- Meton N1
chaise à bascule N1 part-rev chaise à porteurs N1 use-rev chambre à air N1 location-rev chambre à feu N1 location-rev Meton N2
0.250
chambre à gaz N1 location-rev char-à-bancs N1 part-rev charbon à tumeurs N1 cause chardon à foulon N1 use-rev châssis à tabatière N1 similarity chaussette à clous Exo part-rev Metaph N1
cheval à bascule N1 part-rev Metaph N1 cigare à moustache Exo --- Metaph N1/N2 clé à béquille N1 similarity
0.182
clé à bougies N1 purpose
0.910 clé à chaîne N1 part-rev
0.636
clé à cliquet N1 part-rev
0.636 clé à crémaillère N1 part-rev
0.636
clé à douille N1 part-rev
0.636 clé à ergot N1 part-rev
0.636
clé à fourches N1 part-rev
0.636 clé à molette N1 part-rev
0.636
clé à pipe N1 similarity
0.182 clé à pompe N1 use Meton N2 0.910 code à octets N1 composition
compensateur à ressort N1 part-rev compte à rebours N1 --- compte à vue N1 --- condamné à mort N1 argument conduite à risque N1 part-rev conteneur à verre N1 purpose
352
Compound Head Relation Tropes N1 SRI Val N2 SRI Val copolymère à blocs N1 composition
cornet à bouquin N1 part-rev corps à corps Exo location coton à fromage N1 purpose couleuvre à collier N1 part-rev Metaph N2
course à pied N1 use Meton N2 couteau à beurre N1 purpose
0.250
couteau à poisson N1 purpose couteau à viande N1 purpose croix à degrés N1 location Meton N2
cuiller à dessert N1 purpose
0.500 cuiller à œuf N1 purpose
0.500
cuiller à pot N1 purpose
0.167 cuillère à café N1 purpose
0.333
cuillère à moka N1 purpose
0.333 cuillère à soupe N1 purpose
0.500
diode à vide N1 location-rev
0.833 échange à terme N1 time
échelle à poissons N1 purpose Metaph N1 écrou à créneau N1 part-rev
escalier à vis N1 similarity étable à pourceaux N1 purpose étoile à neutrons N1 composition fabrication à façon N1 --- face-à-face Exo location face-à-main Exo --- Metaph N1
0.200
fauteuil à bascule N1 part-rev fauteuil à voile Exo --- femme à barbe N1 part-rev fenêtre à guillotine N1 similarity fenêtre à tabatière N1 similarity fer à cheval N1 purpose
0.250 fermeture à glissière N1 part-rev
fil à plomb N1 part-rev filet à cheveux N1 purpose filet à provisions N1 purpose fils à papa N1 possession-rev four à chaux N1 production frein à main N1 use-rev
0.600 frein à pédale N1 use
fruit à pain N1 similarity goutte-à-goutte Exo --- groseille à maquereau N1 purpose hache à main N1 use-rev
0.600 herbe à chat N1 purpose
huile à broche N1 purpose if à baies N1 production instrument à cordes N1 part-rev instrument à vent N1 use
0.750 lampe à décharge N1 use
1.000
lampe à huile N1 use
1.000 lampe à incandescence N1 use
1.000
lampe à sodium N1 use
1.000 langage à objets N1 part-rev
353
Compound Head Relation Tropes N1 SRI Val N2 SRI Val lémurien à fourche N1 part-rev Metaph N2
ligne à intervalles N1 location lit à baldaquin N1 part-rev lit à courtines N1 part-rev locomotive à vapeur N1 use Meton N2
1.000
logiciel à contribution N1 --- machine à café N1 production machine à sous N1 use machine à vapeur N1 use
1.000 main à main Exo ---
0.200
maintien à poste N1 argument manche à air N1 location Metaph N1
manche à balai N1 purpose Metaph C manche-à-balle Exo ---
manchot à jugulaire N1 part-rev Metaph N2 manomètre à écrasement N1 use
marché à terme N1 time marche à vue N1 use marsouin à lunettes N1 part-rev Metaph N2
mise à disposition N1 argument
0.833 mise à jour N1 ---
0.833
mise à niveau N1 argument
0.833 mise à pied Exo argument
0.167
mongol à batteries Exo use Metaph N1 mot à mot Exo ---
moteur à explosion N1 use mouche à miel N1 production moule à manqué N1 purpose moulin à café N1 purpose
0.444 moulin à eau N1 use
0.222
moulin à légumes N1 purpose
0.444 moulin à papier N1 production
0.222
moulin à paroles Exo production Metaph N1 0.222 moulin à poivre N1 purpose
0.444
moulin à prières N1 location-rev
0.111 moulin à sel N1 purpose
0.444
moulin à vent N1 use
0.222 0.750 munition à fragmentation N1 use
mûrier à papier N1 source-rev muscari à grappe N1 part-rev muscari à toupet N1 part-rev Metaph N2
nid à rats Exo purpose Metaph C niveau à bulle N1 use
nom à penture N1 part-rev Metaph N2 œuf à cheval N1 ---
0.250
orange à nombril N1 part-rev Metaph N2 os à moelle N1 part-rev
otarie à crinière N1 part-rev ouvrage à cornes N1 similarity palmier à huile N1 source-rev panier à salade N1 purpose papier à musique N1 purpose Meton N2
parenté à plaisanterie Exo --- passage à faune N1 purpose
0.250
354
Compound Head Relation Tropes N1 SRI Val N2 SRI Val passage à niveau N1 location
0.250
passage à tabac Exo ---
0.250 passage à vide N1 argument Metaph N2 0.250 0.167
passe à poissons N1 purpose pâte à choux N1 purpose
0.333 pâte à dents N1 purpose
0.167
pâte à papier N1 purpose
0.333 pâte à pet N1 production Meton N2 0.333 pâte à prouts N1 production
0.333
pâte à sel N1 part-rev
0.167 patente à gosse N1 ---
patin à glace N1 purpose patin à roulettes N1 part-rev pêche à feu N1 use Metaph N2 0.750 0.125
pelle à balai N1 --- Metaph N1 pelle à neige N1 purpose
pelle-à-cul Exo purpose Metaph N1 pentode à vide N1 location-rev
0.833
piano à bretelles N1 part-rev Metaph N1 piano à queue N1 part-rev Metaph N2 pièce à conviction N1 purpose
pied à boule Exo location pied à coulisse Exo part-rev Metaph N1
pied-à-terre Exo location piège à cons N1 purpose Metaph N1
pierre à briquet N1 purpose
0.500 pierre à chaux N1 source-rev
0.333
pierre à évier N1 purpose
0.500 pierre à feu N1 cause
0.167 0.125
pierre à fusil N1 purpose
0.500 pierre à plâtre N1 source-rev
0.333
pince à escargots N1 purpose pince à linge N1 purpose planche à dessin N1 purpose
0.500 planche à pain N1 purpose
0.500
planche à roulettes N1 part-rev
0.500 planche à voile N1 part-rev
0.500
planchette à pince N1 part-rev poêle à marrons N1 purpose poire à poudre N1 location-rev Metaph N1
polymère à blocs N1 composition pompe à bicyclette N1 purpose
0.333 pompe à carburant N1 purpose
0.500
pompe à eau N1 purpose
0.500 pompe à huile N1 purpose
0.500
pompe à sodium N1 purpose
0.167 pompe à vélo N1 purpose
0.333
porcelaine à feu N1 ---
0.125 porte-à-porte Exo ---
pot à feu N1 location-rev Meton N2 0.250 0.250 pot à tabac Exo purpose Metaph C 0.250
poudre à canon N1 purpose propulseur à liquide N1 use puce à gènes N1 composition
355
Compound Head Relation Tropes N1 SRI Val N2 SRI Val punk à chien N1 possession
râteau à feuilles N1 purpose roue à aubes N1 part-rev rouge à lèvres N1 purpose Meton N1
rouleau à pâtisserie N1 purpose roulement à billes N1 use sac à dos N1 location sac à main N1 use-rev
0.600 sagne à tamis Exo ---
sanglier à verrues N1 part-rev saut à ski N1 use savonnette à vilain Exo purpose Metaph N1
scannage à domicile N1 location scie à métaux N1 purpose scie à ruban N1 part-rev séparateur à œuf N1 purpose serpent à lunettes N1 part-rev Metaph N2
serpent à plumes N1 part-rev Metaph N1 serpent à sonnette N1 part-rev Metaph N2 serrure à bosse N1 part-rev
soda à pâte Exo --- soffite à caissons N1 part-rev station à essence N1 purpose steak à cheval N1 ---
0.250 stylo à bille N1 part-rev
tapette à mouche N1 purpose
0.250 tapisserie à personnages N1 part-rev
tasse à vin N1 purpose télévision à péage N1 use Meton N1
terre-à-terre Exo --- tête à claques N1 cause Meton N1 0.400
tête à gifle N1 cause Meton N1 0.400 tête à perruque N1 purpose
0.200
tête-à-queue Exo ---
0.200 tête-à-tête Exo location
0.200
tétrode à vide N1 location-rev
0.833 train à vapeur N1 use Meton N2
1.000
triode à vide N1 location-rev
0.833 trombone à coulisse N1 part-rev
trombone à pistons N1 part-rev tube à décharge N1 use
0.250 tube à essai N1 purpose
0.250
tube à vide N1 location-rev
0.500 0.833 tueur à gages N1 ---
tuteur à roulettes N1 part-rev Metaph N1 usine à gaz N1 production
vache à lait N1 production
0.200 valet-à-patin N1 ---
ventre à choux Exo --- Meton N1 ver à soie N1 production
vielle à roue N1 part-rev voiture à bras N1 use vol à voile N1 use
0.143 voyage à forfait N1 ---