Toward a Typology of Semantic Transparency: The Case of French Compounds

by

Yves Stephen Bourque

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of French University of Toronto

© Copyright by Yves Bourque 2014

Toward a Typology of Semantic Transparency:

The Case of French Compounds

Yves Bourque

Doctor of Philosophy

Department of French Studies University of Toronto

2014

Abstract

This thesis proposes an extension to existing models of semantic transparency in compounding by incorporating previously unexplored features into the concept. Although the term semantic transparency is widely used in research on multi-word lexemes, the concept is often viewed as simply a matter of semantic compositionality, which is to say that transparency is determined solely by the meaning of individual constituents. While this approach offers a number of advantages, it is nevertheless insufficient on two counts: one, it groups together a number of compositional compounds that are otherwise semantically distinct (e.g. ice breaker ~ ice cube ~ ice age), and two, it offers no clear means to order or rank partially compositional compounds (e.g. firefly ~ butterfly ~ barfly). This thesis therefore argues for a more holistic approach to semantic transparency, one that views the concept as both scalar and multi-faceted, the result of which is a more granular model capable of further distinguishing between several different compound types.

The typology of semantic transparency of compounds proposed in this work consists of four basic factors supported by a dataset consisting of more than 1,000 French NN and N à N compounds extracted from Wiktionary. The first factor, headedness, touches on features pertaining to a compound’s semantic head, for which concepts such as canonical position and strong/weak centricity are advanced. Second, compositionality is formalized according to a four-way configuration based on how individual constituents contribute meaning to the whole. The third factor, semantic homogeneity, relates to the degree of shared meaning between analogically similar compounds. Finally, the implicit semantic relations within compounds are explored. Consequently, fifteen basic associations are proposed and evaluated using the compound data collected from Wiktionary. Together, these four factors yield a typology involving sixteen possible transparency profiles, each of which is ordered according to the relative weight of its semantic features. It is believed that this typology offers a far richer conceptual space within which to further the discussion of semantic transparency as it pertains to complex constructions such as compounds.

Acknowledgments

Although a thesis is often solitary work, it is by no means an independent one. I therefore wish to thank the numerous people who have contributed to this work along the way.

I must first express my deepest gratitude to my advisor, Anne-Marie Brousseau, whose mentorship has provided me with the guidance I needed to complete this project. Her willingness to share her expertise and knowledge has contributed tremendously to my work and I will forever be indebted to her for her unwavering support and direction over the past several years.

I would also like to extend my thanks to my committee members, Yannick Portebois and Yves Roberge. They have never shied from raising difficult questions or offering their pointed criticism. Their remarks have led to many productive and fruitful discussions regarding the direction of my work and I am grateful for their valuable input and insight. My sincerest thanks as well to my external examiner, Jean-Pierre Koenig, for his detailed and constructive feedback.

This thesis would also not have been possible without the generous support of the French Department, as well as the Ontario Graduate Scholarships I was fortunate to receive.

It is important to know that pursuing graduate studies means that you get to meet a lot of like-minded people along the way. I am grateful to have met (in alphabetical order) Alisha, Maud, Ritu, Ruth-Ellen, Simona, and Vince, with whom I have commiserated, deliberated, and celebrated. I would also like to acknowledge Muriel and Erich for their companionship and encouragement over these many years.

To my families, my heartfelt thanks for their steadfast support and encouragement.

Finally, Mary Jane, to whom I extend my eternal gratitude for having stuck with me all these years. It was a long and difficult road and I couldn’t have traveled it without you.

Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices

Chapter 1 Introduction
1.1 Framework
1.2 Object of Study
1.3 Data: Wiktionary
1.4 Database: Polylexical.com
1.5 Organization

Chapter 2 On Semantic Transparency
2.1 Semantic Transparency: Preliminaries
2.1.1 A Brief Word on Transparency vs. Opacity
2.1.2 A Paucity of Description
2.2 Semantic Transparency: Experimental Studies
2.2.1 Studies of Semantic Transparency and Compounding
2.3 Semantic Transparency: Definitions and Models
2.3.1 Some Definitions
2.3.2 Transparency as a Continuum
2.3.3 Explicit Semantic Transparency Clines
2.4 Semantic Transparency: A Working Definition
2.4.1 Transparency vs. Compositionality
2.4.2 Semantic Transparency Defined
2.5 Summary

Chapter 3 French Nominal Compounds and Data Collection
3.1 Compounding
3.1.1 Defining the Compound
3.1.1.1 Compounding Criteria
3.1.1.2 Compounding in French
3.1.1.3 Which Compounds Should We Investigate?
3.2 Data: French Nominal Compounds
3.2.1 The Wiktionary Database
3.3 Selecting Compounds and Cleaning up the Data
3.3.1 Which Compounds to Include?
3.3.2 Reducing the initial dataset
3.4 Labeling the Entries
3.4.1 Automatically Assigning Lexical Categories
3.4.2 Which Lexical Category?
3.4.3 Cleaning Up the Remaining Data
3.4.4 Gender and Number
3.5 Summary

Chapter 4 Compound Meaning: Features and Factors
4.1 Centricity
4.1.1 Endocentric Compounds
4.1.2 Head position
4.1.3 Coordinated Compounds
4.1.4 Exocentric Compounds
4.1.4.1 Exocentric by Trope
4.1.5 Summary
4.2 Semantic Compositionality
4.2.1 Definition and Approach
4.2.2 Metaphor and Metonymy
4.2.2.1 Combining Compositionality and Centricity
4.2.3 Summary
4.3 Semantic Homogeneity
4.3.1 Semantic Reliability Index
4.3.1.1 How Does the SRI Fit in?
4.4 Summary

Chapter 5 Compound Relations
5.1 Studies on the Semantics of Compounds
5.1.1 Early studies
5.1.2 Recent Developments in Compound Relations
5.1.3 Summary
5.2 Retained Semantic Relations
5.2.1 Interpreting Compounds
5.2.2 Presentation Format
5.2.2.1 Hypernymy
5.2.2.2 Coordination
5.2.2.3 Similarity
5.2.2.4 Function
5.2.2.5 Possession
5.2.2.6 Part
5.2.2.7 Location
5.2.2.8 Composition
5.2.2.9 Source
5.2.2.10 Cause and Production
5.2.2.11 Topic
5.2.2.12 Time
5.2.2.13 Use
5.2.2.14 Purpose and Proper Function
5.3 Summary

Chapter 6 Compound Relations: Application Results
6.1 NN Compounds
6.1.1 Relations
6.1.2 Residual NN Compounds
6.1.2.1 Idiosyncratic and Partially Unrelated Compounds
6.1.2.2 Nouns and Adjectives
6.1.2.3 Classificatory Relation
6.1.2.4 NN Compounds Involving Nominalizations
6.1.3 NN Compounds: Conclusion
6.2 N à N Compounds
6.2.1 The Preposition À
6.2.2 Results for N à N Compounds
6.2.3 N à N: Residual Data
6.2.3.1 Idiosyncratic and Semantically Unrelated N à N
6.2.3.2 N à N Compounds Involving Nominalizations
6.3 Summary

Chapter 7 Putting It All Together
7.1 Semantic Transparency: A Definition Revisited
7.2 Semantic Transparency: A First Pass
7.2.1 Primary Factors
7.2.2 Semantic Relations
7.2.2.1 Relation Types
7.2.2.2 Ordering Subordinate Relations
7.2.2.3 Source of the Relation
7.2.2.4 Reversibility
7.2.2.5 Frequency
7.2.3 Summary
7.3 The Semantic Transparency of French Compounds
7.3.1 Canonical Endocentric Compounds
7.3.1.1 Strongly Endocentric, Fully Compositional: passage piétons and boîte à outils
7.3.1.2 Strongly Endocentric, Weakly Compositional: mot-clé and piano à queue
7.3.1.3 Strongly Endocentric, Partially Compositional: bateau-mouche
7.3.1.4 Weakly Endocentric: valse-hésitation and poire à poudre
7.3.2 Non-Canonical Endocentric Compounds
7.3.2.1 Strongly Endocentric, Fully Compositional: bracelet-montre
7.3.2.2 Strongly Endocentric, Weakly Compositional: reine-marguerite
7.3.2.3 Strongly Endocentric, Partially Compositional: aube-vigne
7.3.2.4 Weakly Endocentric: vidéo-lynchage
7.3.3 Exocentric Compounds
7.3.3.1 Fully Compositional Exocentric: ballon-panier and pied à boule
7.3.3.2 Weakly Compositional: radio-trottoir and cage à écureuil
7.3.3.3 Partially Compositional: chat-château and soda à pâte
7.3.3.4 Non-Compositional: cap-mouton and sagne à tamis
7.4 Summary

Chapter 8 Conclusion
8.1 Contributions of the Thesis
8.2 Remarks on the Wiktionary Data
8.3 Polylexical.com
8.4 Future Perspectives
8.4.1 Sense Extension
8.4.2 Conceptual Classes
8.4.3 Frequency
8.4.4 Family Size
8.4.5 Testing the Typology and Closing Remarks

References

Appendices

List of Tables

Table 2.1. Dressler’s hierarchy of morphotactic transparency (Dressler 1985: 330-331).
Table 3.1. Gross’s (1988) typology of French nominal compounds.
Table 3.2. Major classes from Mathieu-Colas’s typology retained for the present study.
Table 3.3. Fradin’s (2009: 420) categorial distribution for French compounds.
Table 4.1. Types of exocentric compounds in Bauer (2010).
Table 4.2. Possible combinations of compound features.
Table 4.3. Number of templates and tokens found in the data.
Table 4.4. List of papier-N compounds taken from LPR2010 under the entry for papier.
Table 4.5. List of pompe à N compounds taken from LPR2010 under the entry for pompe.
Table 4.6. NN compounds with average template SRI based on the left-most constituent.
Table 4.7. NN compounds with average template SRI based on the right-most constituent.
Table 4.8. N à N compounds with average template SRI based on the left-most constituent.
Table 4.9. N à N compounds with average template SRI based on the right-most constituent.
Table 5.1. Lees’s (1960) grammatical relations of nominal compounds.
Table 5.2. Adams’s (1973) compound classes.
Table 5.3. Downing’s (1977) minimal compound relationships.
Table 5.4. Warren’s (1978) semantic classes.
Table 5.5. Levi’s (1978) Recoverably Deletable Predicates.
Table 5.6. Lauer’s (1995) preposition based treatment of compounds.
Table 5.7. Vanderwende’s (1994) classification schema of noun sequences.
Table 5.8. Jackendoff’s (2010) 14 Basic Functions.
Table 5.9. Arnaud’s (2003) high-level relations for French NN compounds.
Table 5.10. Summary of the number of semantic relations present in the literature.
Table 5.11. Logico-semantic relations retained in this work.
Table 5.12. A comparison of compound interpretations across five different works.
Table 5.13. Bauer’s five main types of coordinated compounds.
Table 5.14. Compounds listed as dd or “N2 has N1” in Arnaud (2003).
Table 5.15. The SOURCE, PRODUCTION and CAUSE relations compared.
Table 5.16. A comparison of PURPOSE and USE.
Table 5.17. Summary of the relations retained in the present work.
Table 6.1. Results of relations analysis for NN compounds.
Table 6.2. Results of judgement tests in Arnaud (2003).
Table 6.3. Summary of Knittel (2010) relations for the preposition à.
Table 6.4. Results of Compound Relations for N à N compounds.
Table 6.5. Relative frequency of relations across compound types.
Table 7.1. Distribution of features for the French compounds collected from Wiktionary.
Table 7.2. Number of compounds in the data for each subordinate relation.
Table 7.3. +canon, +strong, fully compositional compounds ordered according to relations.
Table 7.4. SRI values for compounds within recurring N-X templates.

Table 7.5. Strongly endocentric, fully compositional NN compounds, ordered by relation. ..... 283  

Table 7.6. Exocentric NN compounds involving basic semantic relations. ............................... 286  

Table 7.7. Exocentric N à N compounds involving basic semantic relations. ........................... 287  

Table 7.8. Summary of compounds and their features, ordered by transparency. ...................... 292  

Table 8.1. Transparency configurations, from most to least transparent. ................................... 297  


List of Figures

Figure 2.1. The relationship between compositionality and transparency. ................................... 45  

Figure 2.2. Representation of Cruse’s (1986) continuum of semantic transparency. ................... 32  

Figure 2.3. Levi’s (1978) continuum of derivational transparency as applied to compounds. ..... 36  

Figure 2.4. Ambiguous pairs in Libben’s (1998) typology of semantic transparency ................. 39  

Figure 2.5. A continuum of semantic transparency. ..................................................................... 47  

Figure 3.1. The results screen for the compound café-filtre in Mathieu-Colas’s database. .......... 89  

Figure 4.1. Distribution of compounds according to features related to the head. ..................... 112  

Figure 4.2. The relationship between compositionality and transparency. ................................. 114  

Figure 4.3. Possible configurations for semantic compositionality. ........................................... 116  

Figure 4.4. Distribution of features for endocentric compounds. ............................................... 124  

Figure 4.5. Relationship between compound features and the semantic reliability index. ......... 144  

Figure 7.1. A typology of semantic transparency of compounds. .............................................. 259  

Figure 7.2. Difference between coordinated and hypernymic compounds. ................................ 264  

Figure 8.1. Search interface for Polylexical.com. ....................................................................... 301  


List of Appendices

Appendix A. Sample SRI calculations for various N-X templates............................................ 329

Appendix B. Comparison of compound relations in the literature............................................ 331

Appendix C. Partial schema of Wiktionary’s database structure............................................... 335

Appendix D. NN French compounds retained from Wiktionary............................................... 336

Appendix E. N à N French compounds retained from Wiktionary........................................... 350


Chapter 1

Introduction

Compounds are, in many ways, liminal objects located somewhere between the lexicon and

syntax: they are functionally similar to words, yet they consist of words themselves. In this

regard, compounds pose several descriptive challenges, many of which involve issues touching

on a number of different domains. One such issue has to do with meaning construal. Because

compounds, like phrases, typically consist of units that, in isolation, are both meaningful and

denotational, one might expect their sense to be determined according to basic principles of

compositionality, which is to say that the meaning of the whole is a function of its parts. At first

glance, this assumption seems well-founded: speakers often have little trouble understanding

unfamiliar compounds, even when they are provided with limited contextual support.

Wisniewski and Clancy (2004), for instance, conducted a survey of more than 700 novel

combinations in several magazine and newspaper articles and found that fewer than 15% of these

items were preceded elsewhere in the text by both the modifier and the head (cited in Storms

and Wisniewski 2005). Furthermore, the analysis of a random sample of these compounds (296)

revealed that only 11% of them were accompanied by some sort of definition (also cited in

Storms and Wisniewski 2005). These findings suggest that authors are not typically concerned

that the use of compounds will baffle their readers, presumably because these constructions are

easy to understand. Of course, this is not to say that such novel compounds are completely

independent of context (they are, after all, used within a larger body of content), but rather that

they are naturally occurring items that typically pose few processing challenges to speakers.

If we turn our attention, however, to established compounds, we notice that many of them are

not so easily understood. Let us consider, for instance, the following compounds:

(1) a. honeybee ‘a bee that makes honey’

b. honeydew ‘a sweet, sticky substance exuded by plants and insects’

c. honeymoon ‘a romantic vacation following a wedding’


Assuming that a speaker is unfamiliar with all three constructions, we might say that, intuitively,

the compound in (1a) is easier to understand than those in (1b) and (1c). This observation has

led some researchers to talk about the semantic transparency of compounds, a property of multi-

word lexemes related to how clearly their meaning emerges from their constituents (Sandra

1990, Zwitserlood 1994, Libben 1998). Consequently, honeybee in (1a) is viewed as

semantically transparent, while honeydew and honeymoon in (1b) and (1c) respectively, are

considered semantically opaque. This distinction is in fact what underlies a great many works on

the semantics of compounds and largely serves as an effective means of establishing which of

these constructions are harder to understand than others: all things being equal, a semantically

transparent compound is easier to interpret than a semantically opaque one.

While the matter may seem rather unambiguous, a close examination of different, yet related

compounds raises a number of questions. First, is a binary opposition sufficient? In other words,

are compounds either transparent or opaque, or might we benefit from a more granular approach

to the concept? This question stems from the fact that many compounds involving the same

lexemes show considerable differences at the level of meaning construal. The following

compounds, all of which contain the word fly, are good examples of this variation:

(2) a. housefly ‘a fly typically found in houses’

b. firefly ‘a nocturnal beetle that emits light’

c. butterfly ‘an insect with large, colourful wings’

d. barfly ‘a person who spends much time in a bar’

e. gadfly ‘an annoying person’

Without going into too much detail, we may state that the compounds in (2) above differ in

significant ways according to how the meanings of their constituents contribute to the meaning of

the whole. If we consider housefly transparent and gadfly opaque, what should we say about

firefly, butterfly, and barfly? Are they to be treated as either of these types or are they perhaps

located somewhere in between?

A second question that we may want to ask is whether transparency is strictly an intuitive concept or

whether it can be formalized. In other words, can we apply the labels transparent and opaque

consistently across compounds using a strict set of criteria? If such a formalization is possible,

what factors and features should such a model include? Revisiting the compounds in (2) above,


we notice that in some cases meaning involves a metonymy (e.g. firefly) and in other cases a

metaphor (e.g. barfly and gadfly). We may also notice that some compounds contain an element

with no bearing on the meaning of the whole (e.g. butter in butterfly). Can these observations be

incorporated into a model of semantic transparency and can such a model be used not only to

distinguish between compounds, but also to rank them?

Third, compounds differ from phrases and many other multi-word constructions in that many of

them are semantically incomplete, so to speak. This is especially true of binomial constructions,

where constituents are connected together by some unexpressed predicate. In housefly in (2a)

and barfly in (2d), the relation might be said to be locative, while in firefly in (2b), the relation is

one of production (albeit metaphorically). To my knowledge, although these implicit relations

have been widely researched (among others, Lees 1968, Adams 1973, Warren 1978, Levi 1978),

very little has been said regarding their role in a compound’s degree of semantic transparency. If

a formal model of transparency were indeed possible, how should semantic relations factor into

it?

Given these questions and premises, the aim of this work is to explore the dual concept of

semantic transparency and opacity as it pertains to compounds. It is the contention of this thesis

that transparency is a scalar phenomenon that may be formalized according to several different

factors. The result of this formalization is a typology capable of classifying compounds

according to their degree of semantic transparency. Such a typology could potentially inform a

number of different fields, including psycholinguistics and computational linguistics, where

issues in lexical storage and processing, as well as automatic interpretation and translation, are

closely tied to the perceived semantic transparency of complex units such as compounds.

It is worth noting that the model developed in this work builds on a rich and varied body of

research on compounding, semantic transparency, and compositionality, as well as theories in

lexical processing. Although a number of formal models of transparency have been proposed

over the years, this thesis aims to show that further developments are in fact possible and that by

extending these models, we may introduce a far richer conceptual space within which to discuss

the semantics of compounds.


1.1 Framework

This thesis represents work in lexical semantics: the focus throughout this work is first and

foremost on the meaning of compounds from the perspective of traditional approaches in lexical

semantics (Cruse 1986). Because compounds are in fact complex units that share a number of

features with other types of lexical units, they must also be examined within a morphological

framework. Consequently, the theoretical framework adopted in this work is largely founded on

a lexicalist approach to morphology, which is to say that word-formation is a component of the

morphological module and is governed by rules or principles not shared with those of syntax

(Chomsky 1970). Compounds are viewed here as morphological units and are treated according

to existing principles in lexicalist morphology (Allen 1978). Moreover, this framework has also

served as the foundation for a great deal of research on compounding (among others, Adams

1973, Bauer 1978, Fabb 1998, Scalise and Bisetto 2009). The choice of this particular

morphological framework, however, is not necessarily crucial to the stated goals of this project.

Given that the emphasis is primarily on the semantic features of compounds, other frameworks,

such as distributed morphology (Halle and Marantz 1994) or the minimalist program (Chomsky

1995, Hale and Keyser 2002), are also viable options for this study.

Because semantic transparency also involves a number of cognitive processes, a good deal of

the background research used to support the typology of transparency proposed in this thesis

comes from work in both cognitive linguistics and psycholinguistics. On the one hand, the

processing of compounds is intimately related to how lexical items are stored and retrieved

(Frauenfelder and Schreuder 1992, Marslen-Wilson et al. 1994, Chialant and Caramazza 1995),

and on the other, how different concepts are combined and reconciled (Murphy 1988,

Wisniewski 1997, Gagné and Spalding 2007). Moreover, compounds often involve tropes, such

as metaphor and metonymy, which make use of several well-known cognitive processes (Lakoff

and Johnson 1980). Finally, experimental studies in psycholinguistics have sought to quantify

how speakers process compounds, often based on a number of different factors (Sandra 1990,

Zwitserlood 1994, Libben et al. 2003, Frisson et al. 2008). In the course of developing a

typology of semantic transparency, the work carried out in the following chapters will touch on

these and other research endeavours involving compounds.


1.2 Object of Study

Although much of the work on semantic transparency conducted in this thesis is based on

research on English compounding, the typology proposed in Chapter 7 is largely founded on a

study of French nominal compounds possessing the structure NN and N à N. That said, none of

the features or factors explored are specific to French compounding, which should allow for a

typology that remains applicable to compounds in other languages.

All matters pertaining to compounding are explored in greater detail in Chapter 3, but we may

say here that the decision to focus on French compounds is based on the following facts. First,

although well-represented in the literature, French compounds have not received a great deal of

attention from the perspective of semantic transparency; while research on compound

transparency is by no means exclusive to English, the majority of the work on this topic has

focused on constructions in that language. Second, compounding in French displays behaviour

not observed in English, namely in its variable head position (cf. Scalise and Fábregas 2010).

Third, French, like other Romance languages, makes extensive use of prepositions

in otherwise binomial constructions. These prepositional linking units, typically absent from

English compounds, provide additional material with which to discuss transparency effects.

It is important to note that the study of semantic transparency offered in this work is based on

the premise that the concept should be examined from the perspective of the listener, which is to

say as a factor in interpretation and not in coining or formation. This position also assumes that

transparency is first and foremost a synchronic matter. Although the compounds under

investigation here have all entered the French lexicon at different stages of the language,

transparency is taken to be relevant only at the moment of interpretation. In other words, while it

may very well be true that a particular compound coined in the 18th century is less transparent

than another coined in the latter half of the 20th century—and that it may have once been more

transparent than it is now—this fact is not what may affect its interpretation. Rather, it is the

semantic drift that occurs, regardless of its origins, that poses a problem to the listener. Speakers

are often unaware of a compound’s etymology and even when they are, it is unclear that their

interpretation of it is based on their knowledge of its origins. This is not to say that one should

ignore etymology or the original motivation for the construction, but simply that for the


purposes of formalizing semantic transparency synchronically, all compounds are to be treated

equally, irrespective of when they were first introduced into the language.

Another point that needs to be made is that the compounds retained for this work come from a

wide range of French varieties. This is in part related to the source used to collect the data

(described in the following section), but also to the difficulty of ascertaining just what

constitutes an entry used by one group, but not another. The focus here is simply on compounds

that clearly belong to French, which is to say that they are composed of French constituents and

that they are themselves French lexemes. It is clear, however, that one group of speakers may

prefer the use of a specific type of compound over another, which might introduce interference

at the level of interpretation if a particular compound involves uncommon features (e.g.

marginal relation, atypical preposition, etc.). We may state now, however, that this thesis, with

its focus on the interpretative aspect of compounds, explores the concept of transparency within

an “ideal speaker-listener” paradigm (Chomsky 1965). In other words, semantic transparency is

formalized with a speaker’s competence in mind, which assumes that he or she is familiar with a

compound’s constituents on the one hand and able to make use of his or her grammar to

establish meaning on the other. This approach is especially crucial when discussing

compositional compounds as these items, while potentially stored as single units, nevertheless

allow for decomposition to occur. Thus, features that focus on speaker-dependent factors, such

as variational differences between items or personal idiolect, are not taken into account as the

key criterion is that individual constituents belong to the “ideal” speaker’s lexicon.

1.3 Data: Wiktionary

The source of the data used in this thesis is Wiktionary, an online, user-generated dictionary

developed by the Wikimedia Foundation. All French compounds retained in the present work,

as well as those included in the corresponding online database (see Section 1.4), were extracted

from a February 2011 snapshot of Wiktionary’s French repository. Chapter 3 provides

additional information regarding the methods employed to collect the data used in this project.

Because Wiktionary’s entries and content are entirely managed by its users, it generally lacks the

methodological rigour found in traditional lexicographic works. Some may therefore object to

the use of Wiktionary as a source of data. The decision to use this resource is in part based on its


openness and freely accessible database. Chapter 3 goes into more detail regarding some of the

reasons that motivated this particular choice. It is important, however, to keep the following

point in mind: the compounds under investigation remain compounds regardless of the source.

In other words, whether the compounds chou-fleur or oiseau-mouche are taken from Wiktionary

or from traditional dictionaries such as Larousse or Le Petit Robert, they remain compounds in

French and are thus perfectly valid items with which to conduct an analysis of transparency.

Moreover, because Wiktionary does not adhere to traditional lexicographic methods, it may also

contain entries that have yet to be included in other works. It therefore offers the opportunity to

include in the analysis new or uncommon compounds. Any issues that may arise because of this

particular choice of resource are in fact largely mitigated by the fact that the focus here is on

transparency from the point of view of interpretation: even if some of the compounds retained

are infrequent or marginal, they remain valid if they possess a sense, which may then be

evaluated according to the same features used for established constructions.

1.4 Database: Polylexical.com

Another, secondary objective of this thesis is to provide other researchers with the tagged and

labeled data used for this study so that it may support future work on French compounding. All

compounds under investigation are therefore labeled with several features, such as lexical

categories and headedness, and can be found in the appendices (Appendix D and Appendix E).

Due to the limited nature of this method of presentation, however, I have also made available a

more complete version of the annotated data online at www.polylexical.com. The database

hosted on this site is searchable according to most of the parameters and features described in

this work, although only the NN and N à N compounds are fully annotated. All other types (e.g.

AN, NA, VN, N Prep N, etc.) may be queried according to their parts of speech, number, and

gender. Chapter 3 goes into greater detail regarding the methods used to label the contents of the

database, while Chapter 8 offers a brief look at the search interface I created to query the data.
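
The kind of feature-based lookup described above can be pictured with a minimal sketch. It is not the actual schema or search code behind Polylexical.com: the record type CompoundEntry, its field names, and the query helper are all hypothetical, and pompe à eau is included only as an illustrative N à N item rather than as a confirmed entry in the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundEntry:
    """Hypothetical record for one annotated compound (not the real schema)."""
    form: str     # surface form of the compound
    pattern: str  # part-of-speech pattern, e.g. "NN" or "N à N"
    gender: str   # "m" or "f"
    number: str   # "sg" or "pl"


def query(entries, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [
        entry for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


# Illustrative items only; the real database holds far more entries and features.
SAMPLE = [
    CompoundEntry("chou-fleur", "NN", "m", "sg"),
    CompoundEntry("oiseau-mouche", "NN", "m", "sg"),
    CompoundEntry("pompe à eau", "N à N", "f", "sg"),
]

nn_compounds = query(SAMPLE, pattern="NN")
feminine_n_a_n = query(SAMPLE, pattern="N à N", gender="f")
```

Under these assumptions, each additional keyword passed to query narrows the result set, much as combining search parameters would in an online interface.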


1.5 Organization

This thesis is organized in the following manner.

Chapter 2 consists of a review of the literature on semantic transparency and focuses on the role

this concept has played in both experimental and theoretical research. By looking at previous

work on transparency, we find that, although many definitions show considerable overlap, there

is no universally accepted description of the concept. Moreover, the term is often used with little

explanation or description, which leads to several questions regarding the types of constructions

under investigation or the exact nature of the concept’s application. The chapter closes with a

working definition of semantic transparency and offers a brief look at the fundamental aspects

behind the typology proposed in the closing chapters of this work.

Chapter 3 presents the methodological underpinnings of this research, beginning with a brief

overview of compounding and its role in word formation. Although the focus here is largely on

NN French compounds, other combinations are also discussed, including those consisting of

adjectives, verbs, and prepositions. The second half of the chapter offers a thorough explanation

of the methods used to collect the data that serves as the basis of this work. More than 10,000

items were extracted from Wiktionary, all of which were tagged for part of speech, number, and

gender using software developed specifically for this project. In the end, only NN and N à N

compounds were retained for the analysis, amounting to 729 and 319 items respectively.

In Chapter 4, the focus is on three features relevant to compounds, namely headedness,

compositionality, and semantic homogeneity. First, it examines compound headedness, a

well-established and widely analyzed aspect of compounding. Based on an examination of the French

data, several observations are made regarding headedness, which are then formalized for

transparency. Second, the notion of semantic compositionality is explored, which is shown to

vary according to sense extension. Third, semantic homogeneity is discussed. It is in this section

that the semantic reliability index (SRI) is proposed, a numerical value meant to represent how

closely a given compound patterns semantically with other, similar compounds. It is argued that

the SRI may be used to further distinguish between otherwise identical compounds.

Chapters 5 and 6 focus on the unexpressed relations held between a compound’s elements.

Chapter 5 opens with an in-depth look at previous research on the topic and offers a detailed


comparison of sixteen different works on compound relations. Based on this research, fifteen

basic relations are proposed and described in detail with support from the French compounds

retained from Wiktionary. Chapter 6 examines the result of the application of these relations,

including their frequency and distribution across types, and subsequently discusses the

compounds that elude this type of analysis.

Finally, in Chapter 7, I offer a synthesis of the features retained. Semantic transparency is

formalized as the interplay between the features discussed in chapters 4 and 5, each of which

may be weighted according to its impact on a compound’s meaningfulness out of context. It is

here that a typology of semantic transparency is proposed. The remainder of the chapter re-

examines the data in light of this typology and offers a final ranking of compounds, the result of

which is a more granular approach to the concept of semantic transparency and one that better

reflects the numerous factors involved in how a compound’s meaning is established.

The thesis concludes in Chapter 8 with a look to the future and other potential avenues worth

investigating if the typology proposed in this work is to be both improved and expanded upon.

While several other features and factors are discussed, it is clear that the next step should be to

test the typology with speakers, a measure that would highlight any flaws in the proposed

model, as well as offer insight into how it could be improved.


Chapter 2

On Semantic Transparency

As was briefly mentioned in the introductory chapter, the concept of semantic transparency is

discussed or alluded to in a wide range of works in semantics, morphology, and lexicology, and

in recent years, has in fact been at the centre of a number of experimental studies in

psycholinguistics. Unlike many other linguistic concepts, however, there does not exist one

standard definition of semantic transparency. In fact, while many of the definitions formulated

over the years share some degree of similarity, often overlapping in crucial ways, it is not

uncommon to come across contradictory or confusing descriptions of the phenomenon. The goal

of this chapter is to propose a working definition of the concept that will serve to better orient

the work that will follow. To this end, this chapter will endeavour to highlight the theoretical

similarities and differences found in previous works on semantic transparency, thus serving to

build a case for my own definition of the concept.

Although a great deal of the work reported on in this chapter is from the field of

psycholinguistics, it should be noted that the discussion of this research is not meant to anchor

the present thesis in that field. Rather, it is meant to show that, on the one hand, semantic

transparency is a linguistic concept with psychological corollaries and, on the other hand, the

experimental work conducted on compound processing can serve to lend support to a theoretical

model of transparency. On occasion, it will be necessary to speculate on some of the cognitive

and psychological aspects behind the model advanced in the present work, but this is done with

the understanding that these assertions are hypothetical in nature.

In Section 2.1, I first make a few points regarding the use of the terms transparency and opacity,

as well as lay out some of the chief problems related to their usage. In Section 2.2, I discuss

some of the experimental work done on compounding in which transparency has been a key

feature. Section 2.3 focuses on the various definitions and hierarchies of semantic transparency

found in the literature. This is then followed in Section 2.4 by my own proposal of a working


definition of semantic transparency, which will serve to frame the approaches and proposals

argued for in the following chapters.

2.1 Semantic Transparency: Preliminaries

Although semantic transparency is at the heart of a considerable number of research projects,

the term is often used casually, as if it were a concept that requires no explanation. As this

section will show, however, the lack of detail that accompanies the use of the term often raises a

number of questions as to the exact nature of the phenomenon.

2.1.1 A Brief Word on Transparency vs. Opacity

The literature is occasionally split on the terms to be used: should we talk about semantic

transparency or semantic opacity? Although Cruse (1986) uses both the term semantic

transparency (or more precisely, “semantically transparent”) and semantic opacity, he clearly

favours the latter in his discussion of the concept. He reserves semantic transparency only for

those expressions that are said to be the binary complements of the semantically opaque ones.

Similarly, Gross (1996) prefers the term opacity to transparency, though he also uses the term

transparency throughout his work on fixed expressions. Again, the terms are used as

complementary antonyms. A priori, nothing seems to hinge, however, on the selection of one

term over the other: fundamentally, it is a matter of perspective.

The fact of the matter is that both terms appear frequently across a variety of works covering a

number of different languages. One must, however, choose a label for the concept, and it would

seem that transparency is the preferred designation: the literature favours transparency over

opacity, even if both terms are in fact used interchangeably. The reader is invited to consult the

bibliography at the end of the present work to confirm that this is in fact the case. Furthermore,

the choice of transparency as the umbrella term is also justifiable based on the fact that opacity

tends toward the absolute. In other words, calling something opaque leaves little room for

degree or nuance. The term opacity should be reserved only for those lexical items or phrases

that cannot be understood without a priori knowledge of their meaning. Simplex words, for

instance, are opaque: it is the essence of Saussure’s arbitrariness of the sign. Compounds,

however, may allow for meaning to be construed, which renders them transparent, albeit not all

to the same degree. It is with this in mind that I therefore favour the term semantic transparency

when discussing the concept as a whole, that is to say, when referring to the phenomenon as a

feature of complex expressions. The terms transparency and opacity will be treated as

complementary antonyms, such that if an expression is said to be transparent, it cannot be

opaque, and vice versa. This does not, however, prohibit the use of the term transparency in a

graded manner, whereby two expressions may be said to exhibit differing degrees of

transparency. The same cannot be said for opacity, as the use of the word will be reserved for

absolute cases, where meaning is not apparent given the construction’s form.

2.1.2 A Paucity of Description

In their chapter on morphology, Dirven and Verspoor (2004) offer the following assertion when

comparing compounds and syntactic groups: “On the whole, compounds are like simple words,

but in spite of their idiosyncratic meaning, the meaning of a compound is to a large extent

transparent” (57). Although the word transparent also appears earlier in the book, it is not

defined until much later and even then, the explanation is perhaps too brief to provide a full

account of the matter. According to Dirven and Verspoor, for a complex word or construction to

be transparent, its parts must be “recognizable in a larger unit” (2004: 222). While this may be a

necessary condition of transparency, it cannot under most circumstances be a sufficient one. In

the compound éléphant blanc (eng. white elephant: ‘an object or scheme with little use or value’),

for instance, the constituents are quite obviously recognizable, yet one would be hard pressed to

claim that it is also transparent. The above characterization of transparency, however, is the

closest Dirven and Verspoor get to defining the concept.

Many of the earlier references to semantic transparency in the literature are made in a similar

offhand manner, that is to say without much, if any, explanation of how the term is being used

or what it means in a given context. Often, the use of the term is such that the reader is required

to simply view it as a very basic label for something that is tacitly understood or for which

meaning has long been ascribed. While there is arguably something intuitive about the use of the

term, it remains surprising that its formalization has not received more attention. For instance,

Herbst (1996), working on collocations, states that “the transparency of a combination does

indeed depend on the meaning attributed to its constituents” (386), but nothing more is said

about how this dependency is evaluated. Similarly, Downing (1977), in her work on the

formation of novel compounds, claims that “because compounds are considerably more

transparent semantically than novel monomorphemes, compounds are ideally suited to serve as

ad-hoc names” (837). While her statement may in fact be uncontroversial, very little is said

about just what makes a compound inherently transparent.

While great strides have been made in recent years to further expand on the concept, most

evidently in research surrounding semantic transparency as a factor in language processing, it is

still surprisingly common to find the term used as a label without any clear account of its

application. Jarema et al. (1999), for instance, designate their compounds and their constituents

as either transparent or opaque, but do not explain how these labels are assigned. Nor do

Dohmes et al. (2004) offer any explanation or description of the methodology adopted for the

categorization of their stimuli, which is done using the terms semantically transparent and

semantically opaque. Kehayia et al. (1999), in their study of the processing of Greek and Polish

compounds, examine what they claim are “transparent compounds that are fully compositional

in meaning,” but provide no further commentary on how this fact was established. Of course, it

is not immediately clear that the absence of a definition is in fact a problem, given that there is

genuinely something intuitive about the labels “semantically transparent” or “semantically

opaque.” Issues may arise, however, when attempting to replicate these studies or to apply these

labels independently. How can we be sure that transparency for one author means the same

thing for another? Even when the term is defined, questions relating to its application may still

be raised. For instance, Sandra (1990), though he defines semantically transparent compounds

as those whose “meaning is related in an obvious way to their constituent meanings” (531), does

not make clear just how obvious this relationship should be. For opaque compounds, this

relationship is said to be “obscure,” but again, at what point does a constituent’s meaning go

from obvious to obscure? Sandra in fact measured semantic transparency by asking individuals

to provide paraphrases for a series of compounds and used the frequency with which their

constituents were present in the paraphrase as an indicator of transparency. He does not,

however, specify what the frequency cut-off was.
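
The replication problem raised here can be made concrete with a minimal sketch. It assumes that a constituent counts as present when it appears as a whole word in a paraphrase, and that a compound is labelled transparent only when every constituent reaches the cut-off; neither assumption is confirmed by Sandra (1990). The function names, the `cutoff` parameter, and the sample paraphrases are all hypothetical.

```python
def constituent_mention_rate(paraphrases, constituent):
    """Share of paraphrases that contain the constituent as a whole word.

    Matching is naive (lower-cased, whitespace-split tokens), so punctuation
    attached to a word would prevent a match.
    """
    if not paraphrases:
        raise ValueError("at least one paraphrase is required")
    hits = sum(constituent.lower() in p.lower().split() for p in paraphrases)
    return hits / len(paraphrases)


def label_compound(paraphrases, constituents, cutoff):
    """Label a compound 'transparent' when every constituent meets the cut-off.

    `cutoff` is a hypothetical parameter: the source reports no value for it.
    """
    rates = {c: constituent_mention_rate(paraphrases, c) for c in constituents}
    label = "transparent" if all(r >= cutoff for r in rates.values()) else "opaque"
    return label, rates


# Invented paraphrases for illustration only (not data from any study).
paraphrases = [
    "a dish for butter",
    "a dish that holds butter",
    "a small plate for serving butter",
]

# The same responses yield different labels under different cut-offs.
label_low, rates = label_compound(paraphrases, ["butter", "dish"], cutoff=0.5)
label_high, _ = label_compound(paraphrases, ["butter", "dish"], cutoff=0.75)
```

With these invented responses, dish appears in two of three paraphrases, so the compound's label depends entirely on where the cut-off is placed. This is why an unreported threshold makes the classification difficult to reproduce.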

Another case of terminological ambiguity can be found in Spalding and Gagné’s (2007) study of

novel compounds, in which they claim that their two-word expressions were all transparent

(“All target combinations were novel, transparent, modifier-noun combinations” (32)), but for

which they offer no explanation as to what this transparency entails. The absence of a definition

of the concept is significant for two reasons. First, they use novel compounds heretofore

unknown to English speakers (e.g. chocolate book, wool basket, tire tree). These combinations

therefore do not possess definitions, at least not in the traditional sense. In other words,

transparency is evaluated according to the meaning assigned by the researchers. In the case of

novel compounds, where meaning is largely determined by the coiner, it seems far more

appropriate to talk about meaning predictability rather than transparency (i.e. how likely is it

that a chocolate book means ‘a book about chocolate’ and not ‘a book that looks like chocolate’,

cf. Štekauer 2005). The second potential issue is in fact related to the first and has to do with

how Spalding and Gagné (2007) assigned meaning to their test items. In short, meaning was

established using data collected in an earlier study (Gagné et al. 2005), in which participants

were asked to choose between two pre-established definitions for each compound instead of

providing their own. Definitions were then labeled as either dominant or subdominant based on

the participants’ preferences. But just how were these two particular definitions chosen? As

Gagné et al. (2005) state: “Two definitions were constructed for each item. These definitions

represented what we thought (based on our intuition as well as input from research assistants

and students in our laboratory) would be the two most likely interpretations for the item” (208).

While their strategy may not be entirely misguided, it does have the potential to skew

transparency effects given the limited options available to participants. For instance, the two

provided definitions for paper stand were either ‘a stand for paper’ or ‘a stand made out of

paper’. It should come as no surprise that participants were strongly drawn toward the first

definition (85% chose ‘a stand for paper’ as the dominant meaning). What if instead of ‘a stand

made out of paper’, Gagné et al. had offered participants ‘a stand for selling paper’ (cf.

newspaper stand)? Would the results have been any different? Moreover, in some cases, the

dominant-subdominant distinctions were in fact borderline. For instance, when presented with

the novel compound wool basket, only 51% of participants preferred ‘a basket for wool’ over ‘a

basket made out of wool.’ Once again, because novel compounds do not possess established

designata, interpretation will often vary between speakers. As Gagné et al.’s (2005) work shows,

some combinations may not elicit any preference or may only show a clearly dominant sense

when presented alongside a far less probable one.
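
The borderline cases discussed above can be illustrated with a short sketch. It takes the two percentages quoted in the text (85% and 51%) and assumes that every participant chose one of the two definitions, so that the second share is simply the complement. The function name and the `min_gap` threshold are hypothetical and are not part of Gagné et al.'s (2005) procedure.

```python
def rank_definitions(shares, min_gap=0.10):
    """Label two competing definitions as dominant and subdominant.

    `shares` maps each definition to the proportion of participants who chose
    it. `min_gap` is a hypothetical threshold below which the preference is
    flagged as borderline.
    """
    if len(shares) != 2:
        raise ValueError("exactly two definitions are expected")
    if any(not 0.0 <= s <= 1.0 for s in shares.values()):
        raise ValueError("shares must lie between 0 and 1")
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    (dominant, top), (subdominant, low) = ranked
    gap = top - low
    return {
        "dominant": dominant,
        "subdominant": subdominant,
        "gap": gap,
        "borderline": gap < min_gap,
    }


# Shares for the first definition are quoted in the text; the complements
# assume a forced choice with no missing responses.
paper_stand = rank_definitions(
    {"a stand for paper": 0.85, "a stand made out of paper": 0.15}
)
wool_basket = rank_definitions(
    {"a basket for wool": 0.51, "a basket made out of wool": 0.49}
)
```

Under this sketch, paper stand shows a clear preference, whereas wool basket is flagged as borderline. The dominant label is assigned in both cases, which is precisely the concern: a binary label conceals how narrow the preference may be.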

To be sure, there are a number of similar studies where the terms transparency and opacity are

used alongside some form of explanation or definition (Marslen-Wilson et al. 1994, Zwitserlood

1994, Libben et al. 2003, Feldman et al. 2004). The descriptions offered by these researchers,

while seldom identical, do in fact show a great deal of common ground across the various

interpretations. My contention, however, is that semantic transparency, especially in the case of

compounding, remains an insufficiently developed concept. The next few sections will look

more closely at the manner in which many researchers have broached the topic and will serve to

emphasize some of the key elements at issue when discussing semantic transparency.

2.2 Semantic Transparency: Experimental Studies

A number of experimental studies conducted over the years have looked at how

morphologically complex words are both accessed and stored. Two early theories emerged from

this research: either complex words are stored as whole units and thus accessed once their word

boundaries have been identified (so-called full storage theories, cf. Rubin et al. 1979,

Butterworth 1983), or they are perceived as amalgamations of morphemes and treated as such,

which is to say that they are first decomposed into their smaller constituents before lexical

access is performed (so-called decomposition theories, cf. Taft and Forster 1975, Taft 1981). In

an attempt to reconcile these two extreme positions, a third, arguably more defensible proposal

has also been advanced, one that argues for a theory of dual access in which complex words

trigger full storage access and decomposition simultaneously. The mental lexicon would thus

consist of both complex words and bound morphemes, as well as a mechanism allowing for

some degree of interaction between the two. Consequently, a number of models were developed

to explain how and when these units are accessed. In the Augmented Addressed Morphology

model (AAM, Caramazza et al. 1988, Chialant and Caramazza 1995), for instance, only

unfamiliar words are guaranteed to be decomposed because whole word activation is taken to be

quicker when the word is known, whereas in the Morphological Race Model (MRM,

Frauenfelder and Schreuder 1992, Schreuder and Baayen 1995), the successful processing

method is based on a number of factors such as lexical frequency and affix productivity.

Although these particular models focus on affixal morphology, the fundamental issues related to

lexical access via either decomposition or full access remain relevant for compounding.

Another factor said to influence how morphologically complex words are processed is semantic

transparency. In their seminal article on access representation, Marslen-Wilson et al. (1994)

looked at whether semantic transparency had an effect on lexical decision tasks for derived

words. They contrasted transparent derivations (e.g. unhappy, punishment, friendly)

with opaque ones (e.g. department, casualty, release), stating that “a morphologically complex

word is semantically transparent if its meaning is synchronically compositional” (Marslen-

Wilson et al. 1994: 5). Their stimuli were also controlled for phonological alternations (e.g.

chaste / chastity). Using a series of cross-modal priming tests in which participants first heard a

prime, then saw a related probe on screen, their results led them to conclude that opaque

complex words are stored as single entries, while transparent ones are accessed via

decomposition. This conclusion is supported by significant priming effects for semantically

related pairs such as punishment (prime) / punish (target) and little effect for unrelated pairs

such as casualty (prime) / casual (target). A number of studies have since lent additional

support to a theory of lexical processing that includes semantic transparency as a factor (Roelofs

and Baayen 2002, Feldman and Pastizzo 2003, Rastle and Merkx 2011), though a number of

other factors such as family size (Schreuder and Baayen 1997, Bertram et al. 2000, Juhasz and

Berkowitz 2011), word frequency (Baayen and Lieber 1996, Hay 2003, Ford et al. 2010), and

even the type of tasks involved (Feldman et al. 2004) are also said to be at play. Researchers

have also looked at the effects of semantic transparency in idiom interpretation (Flores d’Arcais

1993, Tabossi et al. 2008, Libben and Titone 2008).

The studies conducted on derived words have all treated semantic transparency as a

phenomenon based on compositionality (the distinction between transparency and compositionality will be addressed in Section 2.4.1).

Bearing in mind that affixes generally constitute a closed class with considerable meaning

predictability at the output of word formation (Aronoff 1976), such an approach is arguably

appropriate in derivational morphology.¹ The same, however, cannot be said of compounds

given the fact that the meaning of the whole may be related to those of its components in a

number of different ways (cf. milkman ~ snowman ~ garbage man). Libben (2006) thus

suggests that compounds offer a unique means to further the discussion on computation vs.

storage in language processing because they involve a number of factors absent elsewhere.

¹ This is not to say that affixes are without their own analytical issues. A few examples of morphological and semantic issues related to affixes, as noted by Aronoff (1976), are homonymous affixes (e.g. baker vs. cooker) and stems for which it is difficult to assign meaning (e.g. -mit in permit, remit, submit, etc.).

Libben (1998) makes clear just what it is about compounds that makes them ideal candidates for

the study of lexical processing:

Compound words present a paradox for models of morphological representation and processing. On the one hand compounding is a very productive morphological process, so that in a language such as English, the probability of encountering a novel compound form (e.g., SLUSHFOAM) is very high. Because such forms are easily comprehended and because this comprehension can only be achieved through the meanings of the compound’s constituents, these forms seem to be ideal candidates for routinized morphological decomposition. On the other hand, however, compounds are perhaps the multimorphemic forms that are most sensitive to semantic drift and thus frequently show high degrees of semantic opacity. It is this opacity that would thwart a routinized morphological decomposition procedure. (Libben 1998, 34-35)

The question is therefore whether compounds such as milkman and snowman are stored as

unique entries, and thus accessed as such, or if their meaning is computed via the entries no

doubt stored for their components (/mɪlk/+/mæn/ and /snoʊ/+/mæn/ respectively). For a number

of researchers, part of the answer to this question lies in how semantically transparent or opaque

these constructions are. The following section will look at some of the studies that have focused

on semantic transparency as a key component of compound access and processing.

2.2.1 Studies of Semantic Transparency and Compounding

The interpretation of a novel compound can be said to rely extensively on a speaker’s semantic

knowledge of the compound’s constituents: in most cases, speakers seem to be aware that

compounds consist of discrete units and that meaning composition may apply. Gleitman and

Gleitman (1971) showed that when presented with somewhat novel three-word compounds,

speakers consistently exploited the meaning of their constituents in order to provide the

interviewer with a paraphrase. For instance, the median definitions provided by each of their

three groups for the compound glass-bird house were as follows: i) a house which is lived in by

glass-birds, ii) a bird-house that’s made of glass, or iii) a house made of glass for birds. While

the proposed paraphrases sometimes violate the accepted rules and principles that govern

compounds (i.e. embedded head in (ii) and (iii)²), it is clear that the participants attempted to

define novel compounds by means of semantic compositionality, often bridging gaps with extra-

linguistic knowledge even if this required that concepts be applied in unintuitive ways (e.g. a

bird made out of glass). We might therefore ask ourselves if compounds are always decomposed

(i.e. are their internal constituents always available to the speaker?) and if the meaning of their

constituents is always a factor in their interpretation. These questions are in fact related to the

concept of semantic transparency, though perhaps only indirectly, because a widely held position

seems to be that the semantic transparency of a morphologically complex word is based on

whether the meaning of its constituents is present in the meaning of the whole.

² Gleitman and Gleitman (1971) state that these incorrect paraphrases are due to errors of stress. Because English compounds are usually stressed on the left-hand constituent, participants should be able to identify the internal organisation of the complex compound. The hyphen indicates how the stimuli were presented to the participants, that is, with internal stress on GLASS. The result, in any case, is that some participants failed to identify the head of the embedded compound.

Ryder (1994) conducted similar research on novel two-word compounds. In the surveys Ryder

administered to participants, she asked them to provide paraphrases for a variety of novel NN

English compounds. These novel constructions were all based on existing patterns in the

language. Some of the novel compounds Ryder looked at were constructed using frequent

“core” words, that is to say words that figure prominently in a number of English compounds

(e.g. board, box, man, etc.); others reflected looser recurring associations (e.g. X + LOCATION, X

+ CONTAINER, ANIMAL + ANIMAL, etc.). Ryder discovered that when presented with highly

probable and frequent associations, speakers offered a highly homogeneous set of responses. For

instance, all participants paraphrased bean-garden with some variant of ‘garden containing

beans,’ while few participants offered the same type of definition for table-field or elephant jar.

The latter compounds, also cases of the X + LOCATION pattern according to Ryder, were

paraphrased based on other relationships such as similarity (e.g. ‘field as flat as a table’; ‘jar

shaped like an elephant’). Her results suggest that speakers are keenly aware of the semantic

relations that hold between a given compound’s elements and that prior knowledge is a factor in its

interpretation. Ryder also found that the homogeneity of participants’ responses

correlated with how semantically established the pattern was among attested compounds.

Thus, 90% of responses to novel ANIMAL + ANIMAL compounds followed the ‘animal X like

animal Y’ frame, which reflects nearly all attested compounds of this type (e.g. zebra fish,

catfish, bull moose, etc.). Furthermore, when participants were asked to paraphrase novel

compounds based on substituted core words, their responses were often related to existing

compounds. For example, most participants interpreted milk-woman as ‘a woman who delivers

milk,’ no doubt basing their interpretation on the established compound milkman. Not

surprisingly, when the meaning of the existing compound was less obvious, participants

provided far more heterogeneous paraphrases (e.g. needlelizard based on needlefish). Her

findings suggest that speakers, when faced with novel compounds, will in some cases engage in

metalinguistic operations and attempt to establish meaning using their knowledge of existing

compounds. There are, however, a few shortcomings to Ryder’s work, due in large part to her

rather ambiguous stance on transparency and headedness. According to Ryder, “most

established compounds are fairly semantically transparent” (1994: 146), a view that has her treat

a number of dissimilar compounds as analogous (e.g. fireant and firefly, on which she bases the

novel compound fire-spider). The result is that not all participants base their responses on the

same established compound when attempting to interpret a novel variant, but with no clear

reason as to why (i.e. is fireant more or less transparent than firefly?). Moreover, Ryder’s core

words—the recurring word for a given pattern—were not controlled for their position within the

compound. Thus, compounds such as fishpond and catfish are said to be based on the same core

word (i.e. FISH), yet its role within each respective compound differs significantly (i.e. a catfish

is a fish, but a fishpond is a pond). Nevertheless, Ryder’s work offers support for a view of

compound processing that involves both internal word recognition and prior compound

knowledge. Her work will therefore be discussed in greater detail in Chapter 4, where some of it

will be adapted to the context of the present research.

Gleitman and Gleitman’s (1971) and Ryder’s (1994) studies were both qualitative in nature:

they solicited responses from their subjects, the results of which were then examined for

patterns and error types. Most of the research on semantic transparency or compositionality,

however, has focused on lexical decision tasks that rely heavily on the reaction times of

participants. These quantitative experiments usually involve a priming element, related in some

fashion to the target word (or non-word) that participants are asked to evaluate. Results from

these experiments are unfortunately varied. While there is evidence that semantic transparency

plays some role in compound interpretation, a number of studies have produced results that

weaken many claims of transparency effects.

Taft and Forster (1976), following their earlier research on the processing of derived words

(Taft and Forster 1975), conducted a study on compounds in order to determine how speakers

process compound non-words. Using a lexical decision task, they found that complex words are

accessed primarily via their initial constituents. For instance, non-word compounds that

contained a real word in initial position (e.g. footmilge) took longer to classify as non-words

than those containing a real word in final position (e.g. trowbreak). Moreover, reaction times to

non-word compounds in which both constituents were also non-words (e.g. mowdflisk) were no

faster than ones containing a real word in final position only (e.g. trowbreak). Perhaps even more

revealing, however, is that the non-word compounds that produced the slowest reaction times

were those that consisted of two real words (e.g. dustworth, taxbrief). Participants identified

these items as non-words, but did so more slowly than for compounds containing at least one

non-word. Based on these results, Taft and Forster argued for a theory of compounding in which

decomposition is not only largely obligatory and automatic, but also governed by the lexical

status of the first constituent. They do not, however, speculate as to how this approach would

apply to non-words containing real words and for which meaning is either plausible (e.g.

chaircloth) or implausible (e.g. chairbird). Monsell (1985) obtained similar results for pseudo-

compounds, but also found that either constituent could be primed, regardless of its position in

the target, lending further support to automatic decomposition during lexical processing.

Sandra (1990), looking to establish whether all compounds are indeed automatically

decomposed during lexical processing, tested semantic priming effects for opaque (e.g.

buttercup, milky way) and transparent (e.g. butter dish, campfire) Dutch compounds, as well as

for pseudo-compounds in which one constituent looks like a lexeme but isn’t (e.g. boycott; cf.

cranberry morphs in Aronoff 1976). According to Sandra, because opaque and pseudo-

compounds have more in common with simplex words than they do with complex ones, it is likely

that they are stored and accessed as single units. Conversely, transparent compounds should be

more susceptible to facilitation effects from priming because both constituents are present in the

meaning of the whole. As was mentioned earlier, Sandra does not go into great detail about what

exactly a transparent or opaque compound entails, but it is understood that an opaque compound

is one where the constituents have no bearing on the meaning of the whole, whereas they both

do for transparent ones (see Section 2.1.2 for comments on how this was established). In

Sandra’s first experiment, compounds were preceded by either semantically related or unrelated

primes. For instance, buttercup (opaque) could be primed by bread (targeting the constituent

butter), while toadstool could be primed by table (targeting the constituent stool). The

compounds used were divided into two groups based on the targeted constituent (i.e. initial or

final). For opaque compounds, as well as pseudo-compounds, primes were only semantically

related to the target constituent in isolation. Results revealed that there was no significant

facilitation effect for either opaque or pseudo-compounds regardless of the position of the

targeted constituent or the relatedness of the prime. In other words, opaque and pseudo-

compounds showed no facilitation effects from constituent priming. Sandra therefore concluded

that these particular compounds were not subjected to decomposition during access. The results

from a similar test conducted with transparent compounds (e.g. butter dish, campfire) showed

that prime type was a significant factor in reaction times. Participants reacted faster when the

prime was semantically related to the target than when it was unrelated. In a third experiment,

only the final constituent of both transparent and opaque compounds was targeted. Sandra found

that the interaction between prime type (related-unrelated) and compound type (transparent-opaque) showed

only borderline significance. He suggests that this weak result was due to inter-item

variability, that is to say that not all transparent compounds were equally transparent. This leads

him to mention an often neglected, yet fundamental distinction:

This might be related to a difference between the notions ‘transparency’ and ‘compositionality’. Whereas the former notion refers to the relationship between compound and constituent meanings, the latter refers to the possibility of determining the whole-word meaning from the constituent meanings. (Sandra 1990: 550)

Once again, this distinction is most likely an important one and will be discussed in greater

detail in Section 2.4.1.

Like Sandra (1990), Zwitserlood (1994) looked at priming effects for Dutch

compounds using a series of lexical decision tasks. She classified her compounds as fully

transparent, partially opaque, or fully opaque. Like Sandra, Zwitserlood assessed compound

transparency using a pre-test that asked subjects to rate the semantic relatedness of a

compound’s constituents to the meaning of the whole. Subjects used a scale of 1 (= very

unrelated) to 5 (= very related) to evaluate the items. Only those constructions that received a

mean rating above four were deemed transparent. These rated compounds served as primes (e.g.

church organ) and were followed by targets identical to one of the prime’s constituents (e.g.

either church or organ). Results showed that both transparent and opaque compounds primed

their constituents, suggesting that speakers do in fact access, at some level, the components of

even the most lexicalized compounds. Zwitserlood also discovered that semantically transparent

compounds primed their second constituent more than their first, a result that contrasts with Taft

and Forster’s (1976) earlier findings for non-word compounds. She suggests that headedness

might prove to be a factor in compound interpretation, given that Dutch compounds are right-

headed. Furthermore, when Zwitserlood used semantically related targets (as Sandra 1990 did),

he found significant priming effects for both constituents of transparent and partially opaque

compounds, but none for fully opaque and pseudo-compounds. Her results support Sandra’s

(1990) findings for pseudo-compounds, but not for partially opaque ones. This is largely due to

a difference in nomenclature: what Sandra calls an opaque compound, Zwitserlood calls a

partially opaque compound (e.g. a compound such as jailbird is considered opaque in Sandra,

but partially opaque in Zwitserlood). If Zwitserlood’s results are correct, they suggest that

speakers do in fact view compounds in which one constituent retains its meaning differently

from those in which neither component contributes to the meaning of the whole. What

Zwitserlood failed to control for, however, is headedness for partially opaque compounds (i.e.

those for which the head is the meaningful constituent versus those for which only the modifier

retains its meaning). Other researchers have since looked to explore the effects of headedness on

compound processing.

Jarema et al. (1999), following Libben’s (1998) proposal of a typology of semantic transparency

(see Section 2.3.3 for a detailed discussion of his approach), tested participant reaction times for

French compounds in order to address a previously ignored factor in compound interpretation,

namely headedness. While French compounds are primarily left-headed, there are a

number of right-headed cases. This fact allows for the priming of constituents based not only on

their serial position, but also according to the position of the head. They were thus able to test

for effects of constituent transparency along a wider distribution of parameters. They tested the

following five combinations: TT, OTL, OTR, TO, OO³, where T stands for transparent, O for

3 The examples Jarema et al. (1999) give for each combination are as follows: TT (haricot vert, “green bean”), TO

(argent liquide, “cash”), OO (éléphant blanc, “white elephant”), OTL (garçon manqué “tomboy”), and OTR (grasse matinée, “sleep in”).

opaque, and the subscript letters indicate headedness (Left or Right). Their experiment

consisted of a lexical decision task where either of a compound’s constituents, along with

unrelated lexemes, served as primes. Jarema et al. found that both initial and final constituents

showed significant priming effects, regardless of their transparency rating, but that the effect

was more pronounced for the initial constituent. They argue that these results reflect the

tendency of French toward left-headed compounds. Of the four compound types tested, only OTR differed in

reaction time: whereas all other combinations saw a lower mean reaction time when the first

constituent was primed, OTR compounds were recognized faster when their final constituent

was primed. Again, Jarema et al. take this particular result as additional evidence that

morphological headedness plays a role in compound interpretation. Interestingly enough, they

did not test any right-headed TT compounds (e.g. auto-école, radio-taxi). If priming the

transparent head does in fact improve recognition reaction times, these compounds should

pattern like their OTR stimuli. Finally, Jarema et al. also found that comparing participant

reaction times based on the transparency of constituents (i.e. TT and TO versus OO and OTL for

initial constituent priming and TT and OTL versus OO and TO for final constituent priming)

revealed no significant effects. They do not offer any explanation for the absence of effect, but

they nevertheless claim that transparency is a factor in compound processing based on results

obtained for Bulgarian constructions also included in the same paper. They found that for

Bulgarian compounds, priming the second constituent of a right-headed TO compound showed

weaker priming effects than for its first constituent, which Jarema et al. took as evidence that

headedness affects semantic transparency at some level.

Libben et al. (2003), looking at English compounds, also found a similar effect in their own

series of lexical decision tests. Overall, reaction times for TT (e.g. car-wash) and OT (e.g.

strawberry) compounds patterned together, as did those for OO (e.g. hogwash) and TO (e.g.

jailbird) compounds. These results are somewhat contradicted in a study by Kehayia et al.

(1999), however, where they found that priming either constituent improved word recognition

times for both Greek and Polish compounds, but that priming the initial constituent showed a

greater overall effect. Their results also conflict with the findings in both Jarema et al. (1999)

and Libben et al. (2003) because, as in Bulgarian and English, Greek and Polish compounds are

primarily right-headed. Regardless, these cross-linguistic studies lend some support to a theory

of compounding in which the transparency of the head plays a role in how the speaker processes

compounds: reaction times during lexical decision tasks seem to pattern together based on the

transparency of the compound’s head. One key issue, however, is that headedness in both

Jarema et al. (1999) and Libben et al. (2003) is defined purely in terms of lexical category, a

consequence of claiming that the head of a given compound can in fact be opaque (i.e. car-wash

and garçon manqué are traditionally labeled as exocentric in nature⁴). Thus, in both cases,

semantic transparency is understood as a consequence of constituent meaning. This might also

explain, to some extent, the differing results obtained in Kehayia et al. (1999) as they only

looked at “transparent compounds that [were] fully compositional in meaning” (371).

Dohmes et al. (2004), instead of using lexical priming tests, used a picture naming task to

determine if constituent meaning had any morphological priming effects in German compounds.

In two related experiments, participants were asked to name pictures after having first been

presented with either related or unrelated compounds (distractors). For instance, the participant

might see either wildente ‘wild duck’ (transparent), zeitungsente ‘false report’ (opaque), or

honigwabe ‘honeycomb’ (unrelated), which was then followed by a picture of a duck (target =

de. ente). In all instances, semantically transparent and opaque compounds revealed nearly

identical facilitatory effects, that is to say, reaction times to the picture naming task were

reduced when participants were presented with compounds that contained the target morpheme,

regardless of its meaning within the compound. Dohmes et al. argue that semantic transparency

plays only a minor role in compound comprehension, but concede that this may not remain true

during actual language production. They also suggest that the absence of a transparency effect

may be due to the fact that the target constituent was always in head position, arguing that

competition at the lemma level could generate interference during the picture naming task. They

therefore conducted a third experiment with compounds that contained the target morpheme at

onset and found that although transparent compounds showed a slight improvement in reaction

times over opaque compounds, the difference remained marginal overall. Based on these results,

Dohmes et al. maintain that semantic transparency is not a factor in compound processing.

4 The compound [[garçon]N [manqué]A]N is a noun, which is why Jarema et al. (1999) treat it as left-headed.

According to traditional approaches to centricity, however, the head must be a hypernym of the compound, which is why it may also be regarded as exocentric.

Pollatsek and Hyönä (2005) employed yet another type of experiment on compound processing:

a reading task in which Finnish compounds were embedded into sentences and participants were

monitored using both eye- and head-tracking cameras. Transparent compounds were compared to

opaque compounds and were rated using a pre-test similar to those in Sandra (1990) and

Zwitserlood (1994). It is worth noting, however, that opaque compounds were either opaque at

the whole-word level (OO) or at the first constituent only (OT). Compounds were matched for

frequency, but first constituent frequency was manipulated (either low or high). Participants

were asked to read the sentences that appeared on screen, which they were then asked to

paraphrase. Pollatsek and Hyönä found that the frequency of the first constituent had a

significant effect on gaze duration: participants stared longer at the compound with a low

frequency first constituent. Gaze duration analysis also showed that transparency seemed to

have little to no effect on participants’ traversal of the sentence. Because these compounds were

embedded in very different sentence frames, they conducted a second test in which transparent

and opaque compounds, matched for frequency, were inserted into the same sentence. Although

gaze duration for transparent compounds was slightly lower than for opaque ones, the difference

was not significant. Pollatsek and Hyönä interpret these results as evidence that semantic

transparency plays little role in compound processing. A series of similar sentence reading

experiments conducted by Frisson et al. (2008) produced comparable results for English compounds.

Their stimuli involved all possible permutations for constituent transparency (i.e. TT, OT, TO,

OO) and were embedded into sentences, which participants were asked to read. Overall, the

semantic transparency of the compounds’ constituents had no significant effect on gaze

duration.

Studies that have used experimental data to evaluate whether semantic transparency is a factor

in compound processing have produced mixed results. While studies involving lexical decision

tasks show evidence that constituent transparency does affect recognition times (Taft and

Forster 1976, Sandra 1990, Zwitserlood 1994, Jarema et al. 1999, among others), others show

limited support for transparency effects (Dohmes et al. 2004, Pollatsek and Hyönä 2005, Frisson

et al. 2008). While the results of these studies are not truly homogeneous, they do suggest that

speakers are aware of a compound’s internal structure, even in cases where constituent meaning

may be obscure. Overall, however, compounds comprising opaque constituents are not

typically treated in the same fashion as those with transparent constituents. Moreover, there

seems to be some degree of interplay between transparency and headedness, though, here again,

the evidence is mixed. While Jarema et al. (1999) and Libben et al. (2003) found that a compound’s

head plays a greater role in participants’ recognition times, Kehayia et al. (1999) did not see

similar effects with Greek and Polish compounds. As for those studies in which semantic

transparency was said to be irrelevant during compound processing, two were based on reading

tests using eye-tracking devices (Pollatsek and Hyönä 2005, Frisson et al. 2008). These varied

findings may stem from differences in the experimental paradigms used or in the types of

constructions examined (i.e. existing compounds, non-words, pseudo-compounds), or even in

the languages under investigation.

Another explanation for the wide range of results obtained in the above studies comes from the

rather liberal approach to transparency adopted by a number of authors. They do not all view

semantic transparency in the same way and many do not explicitly lay out what it is that makes

a compound either transparent or opaque. In most cases, constituents are said to bear the lion’s

share of compound transparency, yet little is said about just how much meaning must be

retained for a constituent to be transparent. This is largely due to the absence of a rigorous

definition of the concept. The remainder of this chapter will attempt to address this issue by

proposing a working definition of semantic transparency, one that will allow for a more

thorough definition to be advanced.

2.3 Semantic Transparency: Definitions and Models

Despite some of the criticism offered in the previous sections regarding the rather limited

description of transparency in the literature, a number of definitions of the concept have in fact

been proposed. Some of these definitions are explicit, while others must be surmised from a

variety of peripheral indications mentioned by the authors. On occasion, an author will offer

more than just a definition of semantic transparency and will also provide the reader with a

means to interpret the varying degrees of transparency exhibited by complex constructions.

Some of these hierarchies take on the form of clines comprised of discrete points on a linear

scale, while others are presented as a continuum for which transparency is a scalar phenomenon.

The following sections will explore a few of the definitions that have been proposed in the

literature, as well as the various models advanced to classify compounds and other complex

constructions according to their degree of transparency.

2.3.1 Some Definitions

The list of definitions that follows is by no means exhaustive, but it does highlight some of the

similarities, as well as some of the differences, between them. Many of these definitions have

been applied to derived words, some to idioms, and others to compounds. I have chosen to present

them together so as to show the degree of overlap that exists between them, despite their

applications to different types of complex constructions. Although a unifying definition may in

fact be possible, I will concentrate solely on compounds when later formulating a working

definition of transparency⁵.

(3) For derived words:

a. “A morphologically complex word is semantically transparent if its meaning is

synchronically compositional” (Marslen-Wilson et al. 1994: 5).

b. “Semantically transparent words can be fully understood given the meaning of the affix

and the meaning of the base” (Baayen and Lieber 1996: 283).

c. “A morphologically complex word is semantically transparent if its meaning is

compositional” (Roelofs and Baayen 2002: 132).

d. “For both complex and compound words, those that retain the meaning of the base

morpheme are semantically transparent relative to opaque or partially transparent

relatives whose meanings tend to be more remotely related to that of the base” (Feldman

et al. 2004: 18).

(4) For idioms:

a. “In transparent idioms, [. . .] the literal meaning is available, whereas in an opaque

idiom [. . .] the literal interpretation is no longer available or has never been or is not

even possible” (Flores d’Arcais 1993: 80).

5 Some definitions are in fact formulated around the term opacity and not transparency. See Section 2.1.1 for

additional information regarding this distinction.

b. “In these [compositional and transparent] idioms, there are one-to-one semantic

relations between the idiom’s words and components of the idiom’s meaning”

(Glucksberg 1993: 17).

c. “[An idiom’s] opacity (or transparency)–the ease with which the motivation for the use

(or some plausible motivation–it needn't be etymologically correct) can be recovered”

(Nunberg et al. 1994: 498).

d. “A transparent expression is an expression for which we understand its meaning. A

difficult (if not impossible) expression to understand is opaque⁶” (Svensson 2004: 98,

my translation).

Alternatively: “If, when presented with an expression, a language user understands it

without any problems, without any other previous knowledge than understanding the

separate words that make up the expression, then it is transparent” (Svensson 2008: 84).

(5) For compounds:

a. “A given sequence is said to be opaque when the meaning of the whole cannot be

reconstructed from the meaning of its constituting elements⁷” (Gross 1996: 155, my

translation).

b. “[T]ransparent compounds, whose meaning is related in an obvious way to their

constituent meanings” (Sandra 1990: 531).

c. “The meaning of a fully transparent compound is synchronically related to the meaning

of its composite words” (Zwitserlood 1994: 344).

d. “[T]he meanings of each of the constituents are transparently represented in the

meaning of the compound as a whole” (Libben et al. 2003: 50). Alternatively,

6 “Une expression transparente est une expression dont on comprend le sens. Une expression difficile (voire impossible) à comprendre est opaque” (Svensson 2004: 98).

7 “Une séquence donnée est dite opaque quand, à partir des sens des éléments composants, on ne peut pas reconstituer le sens de l’ensemble” (Gross 1996: 155).

“semantically transparent because the meaning of the entire string can be derived from

the combination of the meanings of its constituents” (Libben et al. 2003: 51).

e. “A compound word is usually defined as transparent when the meaning of the

compound word is consistent with the meanings of the constituents (e.g., carwash). In

contrast, a compound word is defined as semantically opaque, when its meaning cannot

be constructed by directly combining the meanings of the individual constituents (e.g.,

pineapple)” (Pollatsek and Hyönä 2005: 262).

f. “[T]he meaning of both constituents is transparently related to the meaning of the

compound word as a whole” (Frisson et al. 2008: 87).

Most of the above definitions are short, succinct characterizations of transparency. They also

share a number of similarities, even across all three expression types: transparency is in nearly

all cases based on a meaning of the whole ~ meaning of the parts relationship. In fact, of the 14

definitions retained, only two seem to differ in any significant way (i.e. Flores d’Arcais in 4a

and Nunberg et al. in 4c)⁸, both of which are for idioms. Moreover, every description implies

that transparency is related in some way to comprehension. While these recurring themes are by

far the most dominant elements of the above definitions, we see a number of other

common factors at play as well, namely synchrony (3a, 4c) and compositionality (3a, 3c, 4d).

Synchrony seems like a reasonably plausible factor for transparency if we are interested in

addressing it from the perspective of comprehension. If this were not the case, we would then be

required to take into account constituent meanings that may have long since fallen out of usage,

as well as their etymology. This diachronic approach seems misguided, however, given that

these factors would depend crucially on what could conceivably be understood as historical

knowledge, possessed by only a small subset of a given linguistic community. Transparency is

therefore best interpreted as a synchronic phenomenon. As for compositionality, while it

undoubtedly plays some role in transparency, it should not be taken as the deciding factor in the

concept, as I will argue in Section 2.4.1.

8 This is not meant as an accurate reflection of the distribution of definitions as these were selected based on the

needs and scope of my stated goals and do not account for all descriptions used in the literature.

Despite the high level of uniformity present in the above descriptions, we do find a number of

peculiarities that show just how ambiguous the concept can be. In some cases, the definitions

are in fact tautologies (i.e. a complex expression is transparent when it is transparent, as in 4d

and 4e). In others, key words are used rather loosely, as in Nunberg et al. (1994) for idioms,

where motivation is said to be a factor of transparency. This motivation is best understood as the

speaker’s ability to “wholly recover the rationale for the figuration it involves” (Nunberg et al.

1994: 496) and is reminiscent of Saussure’s distinction between arbitrary sign and motivated

symbol. This is the approach also adopted by Svensson (2004) in her work on idioms and is a

criterion largely applied to the expression ex post facto, that is to say after the speaker has

learned its meaning. This factor may prove to be applicable to compounds as well. Taking the

compound oiseau-mouche (eng. hummingbird) as an example, one can see how motivation could

apply: even if the speaker were not able to find any semantically plausible relation between the

constituents, he or she might be able to do so once the meaning of the compound were revealed

to him or her. This will be touched upon again in chapters 4 and 7.

One of the aspects of semantic transparency that is seldom discussed explicitly, however, is the

inherent ambiguity present in compounds. It is in this regard that Frisson et al. (2008) offer a

more sensible and nuanced description of semantic transparency. While they agree that the

degree to which a compound is transparent is closely related to the relationship between the

meaning of its constituents and the meaning of the whole, they note that the concept remains an

inexact and ambiguous phenomenon⁹:

Indeed, compound words can differ in their type of opacity, as there exist compound words composed of two constituents for which either the first, the second, or both, constituents can be opaque. [. . .] However, we should quickly note that transparency is a relative concept, as even the meaning of transparent compounds cannot be unambiguously computed from the meanings of the constituents. (Frisson et al. 2008: 87-88)

For compounds, this ambiguity is largely due to the lack of predication between constituents. As

Pollatsek and Hyönä (2005) plainly state: “Usually, for most transparent compound words, the

9 It is worth noting that Frisson et al. (2008) adopt the typology of semantic transparency first proposed in Libben

(1998), that is to say one based on the transparency of the compound’s constituents. This will be discussed in further detail in Section 2.3.3.

meaning of the word cannot be uniquely computed from the constituents, as a carwash could be

some sort of device that washes with a car; instead the meaning is usually a highly plausible

combination of the constituent meanings” (262)¹⁰. These relations must, however, be taken into

account when discussing the transparency of compounds as they no doubt affect how these

constructions are interpreted and have in the past been an important part of research on

compounding (Lees 1968, Levi 1978, Ryder 1994, Adams 2001, Lieber 2004, Jackendoff 2009).

In fact, it is precisely these relations that have proven to be the most challenging aspect of

automatic meaning generation for compounds (Lauer 1995, Rosario and Hearst 2001, Ó Séaghdha

2008). Arnaud et al. (2008) have also argued that a loss of transparency can occur for N-N

compounds when “the relationship between the pre-modifier noun and the modified noun is no

longer explicit,” which means that “the reader/hearer must infer the relationship from the

context or [that] it must be stored in memory” (112). It is precisely the “relative” nature of

transparency that makes the use of the label in many works seem incomplete or possibly even

insufficient. The fact that the meaning of many so-called transparent compounds cannot be

unequivocally established shows that a more in-depth look at the concept is not without its

merits. In fact, based on the factors above, one could reasonably argue that there are few (if any)

fully transparent compounds and that those traditionally labeled as such are just far more

transparent than others. It is precisely this broad characterization of semantic transparency, as it

is applied to compounds, that motivates the present research.

2.3.2 Transparency as a Continuum

If we return to Cruse’s work in lexical semantics, we find that he goes to great lengths to

emphasize the importance of viewing semantic transparency (or opacity as he prefers to call it)

as a concept based on degrees, that is to say, as “a continuum of degrees of opacity” (39), with

“fully transparent” at one end and “to some degree opaque” at the other. Cruse avoids using the

terms “completely opaque” and “not completely opaque” as end-points, reasoning that it allows

for a more satisfactory grouping of elements with significant similarities. While I argued in an

earlier section that the term “completely transparent” might prove to be infelicitous, there are

10 This is not necessarily the case for synthetic or verbal-nexus compounds in which the non-head is an argument of the head (e.g. truck driver, snow removal, etc.). These types are also present in French and will be discussed in chapters 6 and 7.

undoubtedly compounds (and arguably idioms) for which the term “completely opaque” could

be used accurately, their semantic characteristics being similar to those of simplex forms (e.g.

éléphant blanc, compère-loriot). As for the region between these two poles, Cruse claims that

the continuum possesses a “somewhat indeterminate transitional zone between opacity and

transparency” (40). This nebulous area of semi-opaqueness revolves around semantic indicators,

which, according to Cruse, can be either full or partial. Full indicators are constituents that are

uniform in their meaning, both within and without the complex expression (Cruse offers black-

and -bird in blackbird as examples of full indicators). Partial indicators, on the other hand, are

constituents that are said to only retain some of their meaning within a given complex

expression (Cruse gives -house in greenhouse as an example). The degree of semantic opacity

of an expression is therefore in part derived from the number, as well as the nature, of its

constituent indicators. Cruse also claims that the discrepancy between the combined contribution of an

expression’s indicators and its global meaning factors into the degree of opacity, though he

does admit, perhaps rightly so, that such a discrepancy is difficult to measure. In order to further

illustrate his approach to semantic opacity, I offer the following figure, along with examples

from French:

Figure 2.1. Representation of Cruse’s (1986) continuum of semantic transparency.

Because Cruse says nothing about the position of a multi-word lexeme’s indicators, one must

assume that the only requirement for it to be located in the “indeterminate transitional zone”

(40) labeled as semi-opaque in the range above is that its indicators be partial or null¹¹. Even if

we were to set aside the position of the indicator as a factor, there remains the question of how

exactly one is to populate this section of the line if we distinguish between constituents whose

11 Cruse calls a constituent that does not contribute semantically to the meaning of the whole an “impure tally.” I will use “null indicator” so as to retain the term indicator for all possible values for a given constituent.

[Figure 2.1 content: a linear scale running from Fully Transparent (e.g. mot-clé) through Semi-Opaque (e.g. clé-anglaise) to To Some Degree Opaque (e.g. demi-clé).]

meaning contributes only partially to that of the whole and constituents that contribute nothing

semantically. A reasonable assumption would be to class the expressions containing partial

indicators left of centre on the scale and those containing semantically null indicators to the

right of centre. This approach, though justifiable, is not without its problems. Because

we are dealing with multi-part expressions, each of which may have a different value in terms of

its semantic contribution, the possible pairs are in fact numerous if we assume, as Cruse does,

that there are three levels of indicators: full, partial, and null (3² = 9 for 2-word lexemes).

(6) Fully transparent: [full + full]

Semi-opaque: [partial + partial]

[partial + full] OR [full + partial]

[partial + null] OR [null + partial]

[null + full] OR [full + null]

To some degree opaque: [null + null]
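
The distribution in (6) can be checked with a minimal sketch. It assumes the three indicator levels and the three-zone scale described above; the function name classify and the zone labels are illustrative choices made here, not part of Cruse's own terminology.

```python
from itertools import product

# Three indicator levels, following the reading of Cruse (1986) given above.
LEVELS = ("full", "partial", "null")


def classify(first, second):
    """Place a two-constituent pattern in one of the three zones of (6).

    Hypothetical helper: the zone labels mirror those used in (6).
    """
    if first == "full" and second == "full":
        return "fully transparent"
    if first == "null" and second == "null":
        return "to some degree opaque"
    return "semi-opaque"


# All ordered pairs of indicator levels for a 2-word lexeme.
patterns = list(product(LEVELS, repeat=2))

# Group the ordered pairs by zone.
zones = {}
for first, second in patterns:
    zones.setdefault(classify(first, second), []).append((first, second))

for zone, members in zones.items():
    print(f"{zone}: {len(members)} pattern(s)")
    for first, second in members:
        print(f"  [{first} + {second}]")
```

Under these assumptions, the enumeration yields nine ordered pairs, only two of which fall outside the semi-opaque zone, which is why the middle of the continuum is so difficult to order.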

We are thus left with seven possible patterns to be dealt with at the semi-opaque level, but no

clear manner by which to classify them. This distribution also fails to account for the second

factor mentioned by Cruse when discussing the degree of opacity, that is to say the gap between

the combined meaning of the constituents and the actual meaning of the compound. Martin

(1997) discusses this particular property of compounds, stating that their meaning, while clearly

related to their parts, can occasionally go far beyond those parts in sometimes unpredictable

ways. For instance, the meaning of life guard certainly involves the meaning of its constituents,

but nothing about these elements indicates that it also involves bodies of water. Yet it seems

counter-intuitive to treat such a compound as opaque. Cruse clearly recognizes this, but avoids

factoring for this discrepancy, as it would only add to the difficulty of establishing a multi-word

lexeme’s position within his continuum.

Like Cruse, Gross (1996) claims that opacity is a scalar phenomenon, which can be

divided into three parts: completely opaque, partially opaque, or non-opaque¹². His terminology

12 “[L]’opacité est un phénomène scalaire : elle peut être totale (la clé des champs), partielle (clé anglaise) ou inexistante (clé neuve)” (Gross 1996: 11).

differs from Cruse’s, however, and is arguably less susceptible to criticism as he avoids using

fully transparent and instead chooses to label such compounds as non-opaque. That said, we are

once again left to decipher the region occupying the continuum’s middle section, which consists

of constructions Gross views as “partially opaque.” Because Gross treats transparency as a

direct function of compositionality, we are presumably to populate this area with compounds in

which only one constituent imparts its meaning to the whole. This approach does not allow,

however, for partial semantic contributions by a compound’s constituents, as Cruse’s does. We

are then to assume that compounds such as sage-femme and mauvais oeil, in which only one

constituent contributes semantically to the whole, are indistinguishable in terms of transparency.

In both Gross’s and Cruse’s frameworks, compounds that differ from the perspective of

headedness may potentially be treated as having the same degree of transparency, a result that

seems not only counter-intuitive, but that may in fact be incorrect based on some of the

experimental work done on headedness and compounding (as discussed in Section 2.2.1).

Despite such criticism, I agree with both Cruse and Gross that semantic transparency is best

viewed as a scalar phenomenon, and that complex expressions such as compounds exhibit

sufficiently different semantic features so as to be treated along a continuum and not in terms of

either/or labels. Granted, even in works where compounds or other similar constructions are

treated in a binary fashion (Marslen-Wilson et al. 1994, Dohmes et al. 2004), it is never

explicitly claimed that no middle ground between the two poles exists. The contention here is

that between transparent and opaque lies a graded spectrum of transparency that is best

represented in terms of degrees. What thus needs to be made explicit is the manner in which

these degrees are calculated.

2.3.3 Explicit Semantic Transparency Clines

Explicit clines of semantic transparency do in fact exist and have been applied to derivational

morphology, as well as compounding. One such cline comes from Dressler’s (1985) work on

morphotactic transparency. Although this hierarchy is, strictly speaking, a representation of

phonological transparency as applied to derivational morphology, it is significant because of its

recognition that transparency is best represented as a multi-leveled hierarchy. Dressler’s

approach has much in common with strictly semantics-based treatments of transparency,

which he defines as “a biunique relationship between meaning and form” (Dressler 1985: 329).


The result of his analysis is a ranking of derivational forms based on a variety of operational

rules, the final level representing the most opaque type of morphological operation (i.e.

suppletion):

Table 2.1. Dressler’s hierarchy of morphotactic transparency (Dressler 1985: 330-331)

Level   Operation                                 Example
I       Intrinsic allophonic PRs                  excite$+ment
II      PRs interfere, e.g. resyll.               exis$t+ence
III     Neutralizing PRs, e.g. flapping           rid+er (Am.)
IV      MPRs (no fusion), velar softening         electric+ity
V       MPRs with fusion                          conclusion
VI      MRs intervene, e.g. Great Vowel Shift     decision
VII     Weak suppletion (no rules!)               childr+en
VIII    Strong suppletion                         be, am, are, is, was

Legend: MR: morphological rule; PR: phonological rule; MPR: morphophonological rule

Dressler’s multi-level evaluation of a complex word’s degree of transparency, based on specific

phonological, morphological, and morphophonological rules, translates into a far more granular

approach to the concept. Thus, forms derived at level I are more transparent than those at level

II, which are more transparent than those at level III, and so forth. This hierarchy supposedly

served as the basis for Kopecka’s (2006) own cline in her work on motion verbs in French (i.e.

prefix + root, dé-rouler). Kopecka defines semantic transparency as “the extent [to which] each

constituent part of the derived word is semantically interpretable” (2006: 94) and organizes her

cline along the following labels:

(7) i. + transparent: the relation between form and meaning is perceptible and

comprehensible

ii. ± transparent: the relation between form and meaning is not clearly perceptible,

despite the formal link between the simple form and the derived form

iii. − transparent: the relation between form and meaning is lost

Here we see an explicit hierarchy of semantic transparency similar to those implicitly suggested

by Cruse (1986) and Gross (1996). Once again, however, there is the potential for an ambiguous


middle zone where a complex construction is considered more or less transparent, with little

way to further refine the distribution at this particular level. In Kopecka’s defence, the words

she includes under the rubric ± transparent are those she views as +form/−meaning, that is, that

are analyzable as prefix + root, but for which the meaning is not predictable (e.g. ac-céder ‘get

to’). This contrasts with those verbs she considers − transparent, which are −form/−meaning

(e.g. affluer ‘flow to’). In this way, her approach differs from the traditional means of assessing

semantic transparency, which usually consists of determining whether all, some, or none of a

complex form’s constituents contribute semantically to the whole. While Kopecka’s cline could

arguably prove useful if applied to fused compounds (e.g. vinaigre, plafond), it is doubtful that

it would be effective for traditional compounds as these are, by their very definition, analyzable

into distinct parts.
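
Kopecka’s three labels can be read as a decision over two binary properties, form and meaning. The following Python sketch illustrates that reading only. The function name kopecka_label, its two boolean parameters, and the label constants are hypothetical, and the pairing of + transparent with +form/+meaning is an assumption inferred from the other two labels rather than something stated in the passage above.

```python
# Label constants (hypothetical names; ASCII "-" stands in for the minus sign).
PLUS_TRANSPARENT = "+ transparent"
MID_TRANSPARENT = "± transparent"
MINUS_TRANSPARENT = "- transparent"


def kopecka_label(form_analyzable: bool, meaning_predictable: bool) -> str:
    """Map two binary properties onto the three labels of the cline.

    form_analyzable:     the word can be analyzed as prefix + root (+form)
    meaning_predictable: the meaning follows from those parts (+meaning)
    """
    if form_analyzable and meaning_predictable:
        # Assumed pairing: +form/+meaning
        return PLUS_TRANSPARENT
    if form_analyzable and not meaning_predictable:
        # +form/-meaning, e.g. ac-céder
        return MID_TRANSPARENT
    if not form_analyzable and not meaning_predictable:
        # -form/-meaning, e.g. affluer
        return MINUS_TRANSPARENT
    # -form/+meaning is not discussed in the description of the cline.
    raise ValueError("combination not covered by the cline as described")
```

Under this reading, a traditional compound would always be passed form_analyzable=True, so the last label could never be returned for it. This is one way of seeing why the cline seems better suited to fused forms than to compounds that remain analyzable into distinct parts.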

Levi (1978), in her work on complex nominals, talks about a “continuum of derivational

transparency,” and proposes a hierarchy consisting of five levels (64):

Figure 2.2. Levi’s (1978) continuum of derivational transparency as applied to compounds.

Transparency

a. derivable by regular syntactic processes (mountain village, family reunion)

b. were once transparent, but have since become more opaque (grammar school, briefcase)

c. exocentric (birdbrain, razorback)

d. partially idiomatic (polka dot, monkey wrench)

e. wholly idiomatic (honeymoon, fiddlesticks)

Opacity

Although Levi’s continuum predates those discussed by Cruse (1986) and Gross (1996), it

distinguishes itself by recognizing headedness as a factor in a compound’s degree of

transparency. Setting aside for the moment the ambiguity of the description given in (b), what is

most striking about her cline is that exocentric compounds are considered more transparent than

the endocentric ones labeled partially idiomatic. If we rely solely on the semantic contribution

of a compound’s constituents, Levi’s approach may not only prove correct, but also consistent

with those of other researchers: the examples she provides in (c) are compounds in which both

constituents conserve their meaning to some degree, while the meaning of only one constituent (i.e. the head)

is retained in those in (d). This strategy, however, can be criticized based on the fact

that it nestles endocentric compounds between exocentric ones (i.e. compounds in (a), (b), and (d)

are endocentric, while those in (c) and (e) are exocentric). Levi presumably does this in order to

group together compounds that are, on the surface, difficult to motivate semantically prior to

actually learning the meaning of the combination (those in (d) and (e)). After all, while a

monkey wrench is in fact a wrench, what role does monkey play in its interpretation? Her wholly

idiomatic compounds are thus those traditionally viewed as opaque, but remain distinct, in terms

of their transparency, from both compositional exocentric compounds and endocentric

compounds in which only the head contributes semantically to the meaning of the whole.

But what do we then do with compounds such as jailbird or cardshark? These are cases of

exocentric compounds where only the first constituent retains its meaning within the

compound. In Levi’s continuum, jailbird would be grouped with either the exocentric (c) or the

partially idiomatic (d) compounds. If we were to introduce additional levels to her hierarchy,

however, jailbird could be inserted either between (c) and (d), or between partially idiomatic

(d) and wholly idiomatic (e) compounds. Unfortunately, it is unclear which of these two options

is correct: between (c) and (d), we are emphasizing exocentricity as a factor of transparency;

between (d) and (e), we are emphasizing the fact that only one constituent retains its meaning.

Interestingly enough, if we were to treat exocentricity as a more fundamental element of

opacity and thus rearrange the continuum accordingly, only one insertion point would be

possible for compounds such as jailbird: (a) mountain village → (b) grammar school → (d)

monkey wrench → (c) birdbrain → jailbird → (e) honeymoon. Of course, the question is

whether birdbrain is more or less transparent than jailbird. Although this modification is based

on a number of assumptions regarding headedness as a factor in transparency, it is an approach

that has in fact been suggested elsewhere, most notably in Libben (1998).

The semantic transparency cline proposed by Libben (1998) focuses primarily on a given

compound’s constituents. Rather than simply looking at whether a compound’s constituents

impart their meaning to the whole, however, Libben also takes into account whether the

semantic head is present. On the one hand, he assigns a value of opaque (O) or transparent (T)

to each constituent based on its meaning within the compound, and on the other, he groups the


resulting permutations along what he calls componentiality (more traditionally known as

centricity). The resulting combinations are as follows (adapted from Libben 1998: 38):

(8) Componential (endocentric)

a. T-T blueberry

b. O-T strawberry

c. T-O shoehorn

(9) Non-componential (exocentric)

a. T-T bighorn

b. O-T yellowbelly

c. T-O jailbird

d. O-O hogwash

The assumption behind this approach is that endocentric compounds are inherently more

transparent than exocentric ones because their heads are hypernyms of the entity targeted by the

compound (i.e. a blueberry is a type of berry). According to Libben et al. (2003), endocentric

O-T compounds (chopstick) do in fact pattern with endocentric T-T (coalmine) for participant

reaction times and are generally more easily processed than exocentric T-O compounds

(cardshark). What their study suggests is that factoring in only the semantic contribution of a

given compound’s constituents is not sufficient when attempting to determine its degree of

semantic transparency. As Libben (1998) argues, morphological headedness also plays an

important role in the processing of compounds and should thus be included in any typology of

semantic transparency.

Libben’s cline is not without its problems, however. While grouping together compounds based

on their headedness does allow for a much richer typology, there remain ambiguities within

each cluster. For instance, he does not distinguish between O-T and T-O compounds in terms of

their respective degrees of transparency. Thus, it is unclear if shoehorn (T-O) is more or less

transparent than strawberry (O-T) given that they are both said to be componential (i.e.

endocentric). In the case of shoehorn, Libben argues that while it is not technically a horn, it

can be understood as ‘a horn for a shoe,’ presumably because the lexeme horn possesses a

somewhat marginal acceptation based on shape (‘a horn spoon or scoop’ in the OED entry II-11a)13.

The same problem arises for non-componential O-T (yellowbelly) and T-O (jailbird)

compounds. These ambiguities arguably stem in part from English’s rather

rigid right-headedness for compounds. In French, however, although NN compounds are

mostly left-headed, there are a number of right-headed cases. This allows for O-T and T-O

compounds to be unequivocally endocentric (e.g. aube-vigne and bateau mouche, respectively).

The question, of course, is whether they should be treated as equally transparent. The following

figure illustrates the ambiguities related to Libben’s approach.

Figure 2.3. Ambiguous pairs in Libben’s (1998) typology of semantic transparency

One apparent way to account for these undetermined pairs (O-T and T-O) is simply to state that

they exhibit the same degree of semantic transparency. Again, this may or may not prove

correct, and will only be determined through further examination. Despite these questions,

however, Libben’s cline is most certainly a step in the right direction: it takes into account

factors other than the mere semantics of a compound’s constituents and recognizes that some of

its components may play a greater role than others in determining a compound’s degree of

transparency. I believe that this model can be further refined by integrating additional factors

into its framework.
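
The ambiguity discussed above can be made concrete with a short Python sketch. It encodes Libben’s seven categories and ranks them first by componentiality and then by the number of opaque constituents. The names CATEGORIES, rank, and undetermined_pairs are hypothetical, and the within-cluster ranking by count of opaque constituents is an assumption drawn from the order of the listing in (8) and (9), not a rule stated by Libben.

```python
from itertools import combinations

# (componential?, constituent pattern) -> example from (8) and (9)
CATEGORIES = {
    (True, "T-T"): "blueberry",
    (True, "O-T"): "strawberry",
    (True, "T-O"): "shoehorn",
    (False, "T-T"): "bighorn",
    (False, "O-T"): "yellowbelly",
    (False, "T-O"): "jailbird",
    (False, "O-O"): "hogwash",
}


def rank(category):
    """Return a sort key; a smaller key means a more transparent category."""
    componential, pattern = category
    # Assumption: componential outranks non-componential, then fewer
    # opaque (O) constituents outranks more.
    return (0 if componential else 1, pattern.count("O"))


def undetermined_pairs():
    """List pairs of examples whose categories receive the same rank."""
    return [
        (CATEGORIES[a], CATEGORIES[b])
        for a, b in combinations(CATEGORIES, 2)
        if rank(a) == rank(b)
    ]


print(undetermined_pairs())
# [('strawberry', 'shoehorn'), ('yellowbelly', 'jailbird')]
```

The two pairs left unordered are exactly the O-T and T-O pairs within each cluster. Resolving them would require an additional criterion, which the sketch deliberately does not supply.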

13

This seemingly contradictory interpretation may in fact explain why Libben chooses the componential/non-componential dichotomy instead of the traditional endocentric/exocentric one.

Transparent

ENDOCENTRIC
T-T: stylo-feutre
O-T: aube-vigne
T-O: bateau mouche

EXOCENTRIC
T-T: bec-figue
O-T: chat-château
T-O: trou-madame
O-O: compère loriot

Opaque


2.4 Semantic Transparency: A Working Definition

The formalization of semantic transparency with respect to compounds requires that a clear

description of the concept be put forward. This chapter will thus close with the proposal of a

working definition of transparency, which will then be expanded upon in Chapter 7 following a

thorough examination of the semantics of compounds. Before this definition may be formulated,

however, a word must first be said on the distinction between transparency and

compositionality, and on how the latter is understood in this thesis.

2.4.1 Transparency vs. Compositionality

A common thread throughout much of the work on semantic transparency is that the concept is

often taken to mean semantic compositionality. There are a number of cases where

compositionality, as applied to compounds, is defined similarly to semantic transparency (e.g.

“the meaning of compositional compounds can be successfully derived from the meaning of the

noun constituents” Girju et al. 2005: 488). This conflation of concepts is especially important

when discussing compounds, because some linguists believe that they are by their very nature

non-compositional constructions14. Does this then mean that transparency is merely an artefact

of compositionality?

The notion of semantic compositionality has long been applied to phrase generation in order to

account for a speaker’s ability to understand novel sentences or expressions, a concept that has

widely been called Frege’s Principle of Compositionality: “The meaning of a complex

expression is a function of the meanings of its parts and of the syntactic rules by which they are

combined” (Partee et al. 1990). Over the years, linguists and philosophers alike have often taken

issue with this principle, but accept that such a concept nevertheless does exist in some form or

another, for as Grandy (1990) sums it up, “in spite of the fact that we have no adequate

semantics for any natural language we feel that there MUST be compositional semantics for

14

In a post on his co-authored blog Language Log, Geoffrey K. Pullum criticized the Oxford Dictionaries organization for choosing squeezed middle as their Word of the Year 2011. The Oxford Dictionaries argued that the expression is a compound and thus eligible for Word of the Year status, but Pullum argued against the construction’s wordhood based on the fact that, according to him, its meaning is fully compositional (<http://languagelog.ldc.upenn.edu/nll/?p=3573>).


ALL natural languages, because otherwise people could not learn them” (557). At issue in the

present work is therefore not whether compositionality exists, but whether it differs from

transparency. My position is that we are dealing with two distinct, yet related concepts.

In its simplest form, semantic compositionality is “[understood] informally to mean that the

meaning of a complex syntactic expression is determined by its structure and the meanings of its

constituents” (Aronoff 2007: 803). Weiskopf (2007) treats compositionality as a somewhat

simpler operation, stating that “the default mode of semantic combination corresponding to

syntactic or morphological concatenation is set intersection” (162). Formally, compositionality

can also be stated as a condition, as Katz (1973) does for traditional free-form constructions:

“For every syntactically complex constituent C of [sentence] S (including S itself) whose meaning is nonidiomatic, the set of semantic representations R assigned to C is a function of the sets of semantic representations assigned to the subconstituents that make up C and their grammatical relations in the sentence S” (Katz 1973: 357).

While the concept was originally applied to phrases, it has since been applied to a number of

more restricted constructions, such as collocations, idioms, and compounds. In many cases,

compositionality is a factor in establishing whether a particular construction is a member of a

particular group (e.g. Gibbs et al. 1989 for idioms, Tutin and Grossmann 2001 for collocations,

Weiskopf 2007 for compounds). It is also regarded as a fundamental component of derivational

morphology (Jackendoff 1974, Aronoff 1976, Lieber 1992, Bauer 2001b, Lieber 2004).

Compositionality is often referenced in terms of its applicability to a given expression and is

frequently discussed in its negative form. For instance, Gibbs et al. (1989) state that an

expression is “non-compositional” when “the figurative meaning of an idiom is not a function of

the meanings of its parts” (576). By the same token, we can thus assume that an expression is

compositional when its meaning is a function of the meanings of its parts. Regardless of how

one formulates compositionality, the term is frequently mentioned in a number of works

discussing semantic transparency (see definitions in Section 2.3.1 for example). It should come

as no surprise then, that some researchers have equated the concepts, viewing them as both

mutual and inseparable features of complex constructions.

This conflation of compositionality and transparency is often a result of overlapping definitions.

Roelofs and Baayen (2002), for instance, state that a morphologically complex word is

transparent if it is compositional. We see similar approaches in Marslen-Wilson et al. (1994), Gross


(1996), and Libben et al. (2003). In the glossary to his work on French fixed expressions, Gross

defines compositionality as follows: “A given construction is said to be compositional when one

can deduce its meaning from that of its component elements linked by a specific syntactic

relation”15 (1996: 154, my translation). This definition is then followed with a note inviting the

reader to consult the entry for “Opacity,” in which Gross echoes his description of

compositionality, stating that a construction is opaque when its meaning cannot be derived from

the meaning of its parts. It is clear that in Gross’s view, non-compositionality and opacity are

not only related, but may in fact be indistinguishable concepts. There are two particular

components to Gross’s definition of compositionality: one is the semantic contribution of an

expression’s elements, the other is the syntactic relationship held between said elements. The

first is hardly controversial and is in fact the basis for many definitions of compositionality (see

the definitions mentioned earlier). The second component, however, is not so easily

characterized, especially for NN compounds. If two nouns in apposition are said to be in a

syntactic configuration, it is not usually clear what type of predication might follow from it. As

was mentioned in the previous section, many NN compounds can be interpreted by virtue of the

meaning of the individual constituents, yet the relational association between them is obscure

(e.g. wool basket = ‘basket for wool’ or ‘basket made of wool’, Spalding and Gagné 2007). This

stands in stark contrast to synthetic compounds in English (and to some extent, in French), in

which the relational component can be deduced from the verb-complement relationship still

evident in the deverbal head (e.g. a truck driver is a ‘driver of trucks’; Roeper and Siegel 1978,

Botha 1984). For NN compounds, however, the syntactic relation itself is insufficient to

establish their full meaning, which means that most primary compounds (i.e. those not based on

a deverbal head) must be treated as non-compositional constructions. While this point of view

may in fact coincide with a number of claims regarding compounds (i.e. a compound is a non-

compositional multi-word lexeme, as Langacker 2009 suggests), it fails to take into account the

fact that the semantics of many compounds remain closely linked to the meaning of their

constituents.

15

“Une construction donnée est dite compositionnelle quand on peut déduire son sens de celui de ses éléments composants reliés par une relation syntaxique spécifique” (Gross 1996: 154).


For the most part, compositionality is viewed as a binary property of complex constructions: in

the traditional Fregean view, a construction is either compositional or non-compositional. Moreover,

according to the definitions discussed above, compositionality involves not only the meaning of

a construction’s parts, but also the rules or operations by which they are combined. If we treat

compositionality and transparency as identical concepts, then we must also view transparency as

a binary property: either a construction is transparent or it is opaque. If, on the other hand, we

view the two concepts as distinct, then we begin to allow for a more complex model of

transparency.

To be sure, a number of researchers do distinguish between compositionality and

transparency. Nunberg et al. (1994), in their seminal article on idioms, offer a definition of

compositionality that is distinct from that of transparency, describing the former as “the degree

to which the phrasal meaning, once known, can be analyzed in terms of the contributions of the

idiom parts” (498). This definition, however, is remarkably similar to some of those proposed

elsewhere for transparency, again showing just how blurred the line between the two concepts

is. Svensson (2004), in a more elaborate description of compositionality, defines the concept

along four distinct dichotomies, one of which is transparency - opacity. Although she clearly

distinguishes between compositionality and transparency, she views the latter as a subset or

contributing factor to the former. This is quite different from a number of other approaches,

where compositionality is instead treated as a factor of transparency (Kehayia et al. 1999,

Pollatsek and Hyönä 2005, Tabossi et al. 2008). Svensson succinctly defines compositionality as

follows: “If all the words contribute to the meaning of the expression, we will say that it is

compositional”16 (Svensson 2004: 73). One will no doubt notice that if one were to replace the

word “compositional” with “transparent,” the description would be nearly indistinguishable

from many of those proposed by other researchers for transparency. Nevertheless, in Svensson’s

mind, the two concepts are distinct.

This approach, however, requires that we adopt a much narrower view of compositionality, one

that focuses solely on the meaning of a construction’s parts. Briefly, a compositional

16

“Si tous les mots contribuent au sens de l’expression, nous dirons qu’elle est compositionnelle” (Svensson 2004: 73).


construction is one for which all constituents contribute semantically to the meaning of the

whole (Svensson 2004, Girju et al. 2005). In this regard, compositionality may then be partial if

only one of a compound’s constituents retains its meaning within the whole. Although this particular use of

the term may stray slightly from traditional usage, it acknowledges that a compound’s

constituents may contribute meaning without necessarily rendering it transparent. This is best

illustrated using the compounds hammerhead and arrowhead: in both cases, the meanings of

their constituents contribute to the meaning of the whole (“a shark with a head shaped like a

hammer” and “the head of an arrow” respectively), but in the case of the former, crucial

information is absent (i.e. that a hammerhead is a shark). Within a narrow view of

compositionality, key properties of certain exocentric compounds (e.g. hammerhead, birdbrain,

and redcoat) are captured despite their relative opacity. Compositionality can thus be said to

“feed” into transparency.

An interesting consequence that arises from the treatment of compositionality and transparency

as two distinct, yet related attributes is that the relationship between the two can only be

bidirectional under certain conditions. If we return to Gross’s (1996) approach to the concepts,

we notice that what he is effectively stating is that a non-compositional expression is an opaque

expression. Can we therefore say that an opaque construction is non-compositional? While this

statement seems plausible in his framework, it is not in fact tenable if the two concepts are

treated independently. Intuitively, when compositionality is viewed as a factor of transparency,

we generate a series of unequal relationships that show an interesting pattern. To illustrate this

point, let us look at the possible permutations of the dichotomies, as well as the assumptions one

can make regarding the relationship held between them:

(4) a. a compositional expression can be either transparent or opaque  

b. a non-compositional expression can be opaque, but not transparent

c. a transparent expression can be compositional, but not non-compositional

d. an opaque expression can be either compositional or non-compositional

While the assertions in (4b) and (4c) are largely hypothetical in nature, they nevertheless remain

intuitively plausible. Moreover, similar points have been made elsewhere (cf. Svensson 2004).

To summarize, a non-compositional expression cannot be transparent, while a transparent one

cannot be non-compositional. This polarity reflects many approaches to compositionality and


transparency, whether they are treated as distinct concepts or not. What perhaps distinguishes

my position from that of others, however, are the types of implications that are possible given

the observations above. Most importantly, compositionality never strictly implies transparency,

nor does non-transparency strictly imply non-compositionality. Figure 2.4 on the following page

illustrates what entailments are in fact possible.

Figure 2.4. The relationship between compositionality and transparency.

As we can see, only two logical implications are present, which are indicated by solid lines (i.e.

a transparent construction is necessarily compositional and a non-compositional construction is

necessarily non-transparent). The dotted lines indicate possible relationships between concepts.

For instance, a compositional construction may be transparent, but it is not necessarily so; it

might instead be non-transparent17. One need only think of classic cases of exocentric

compounds to see how this can be true (e.g. fr. rouge-gorge; eng. redcoat). This relationship is

also reflected in ad-hoc expressions created spontaneously based on some contextually

dependent information (e.g. apple juice seat in Downing 1977). A consequence of the approach

described above is that the terms compositional and opaque (or non-transparent) are thus

ambiguous with respect to each other: stating that an expression is compositional is ambiguous

with regard to its transparency; the same is true of opaque expressions and their

compositionality.
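
The entailments described above can be checked mechanically. The Python sketch below treats both properties as booleans, which is a simplification, since the terms are meant directionally rather than absolutely. It excludes the single combination ruled out by (4b) and (4c) and then tests which implications hold over the remaining cases. The names is_possible, POSSIBLE, and entails are hypothetical.

```python
from itertools import product


def is_possible(compositional: bool, transparent: bool) -> bool:
    """Rule out the one excluded case: non-compositional yet transparent."""
    return compositional or not transparent


# All (compositional, transparent) combinations permitted by (4a)-(4d).
POSSIBLE = [
    (c, t) for c, t in product([True, False], repeat=2) if is_possible(c, t)
]


def entails(premise, conclusion) -> bool:
    """True if every permitted case satisfying premise satisfies conclusion."""
    return all(conclusion(c, t) for c, t in POSSIBLE if premise(c, t))


# The two strict implications (solid lines in the figure).
print(entails(lambda c, t: t, lambda c, t: c))          # True
print(entails(lambda c, t: not c, lambda c, t: not t))  # True

# Their converses do not hold (only possible, not necessary).
print(entails(lambda c, t: c, lambda c, t: t))          # False
print(entails(lambda c, t: not t, lambda c, t: not c))  # False
```

Three combinations survive the filter, and only the two implications running from transparent to compositional and from non-compositional to non-transparent hold across all of them.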

17

It is important to note that the terms transparent and non-transparent are not used here in an absolute sense, as was initially discussed in Section 2.1.1. More precisely, one should understand them directionally, which is to say “tends toward transparent” and “tends toward non-transparent.”

[Figure 2.4 diagram: four nodes labelled Compositional, Transparent, Non-transparent, and Non-compositional, connected by solid and dotted lines]


For our purposes here, we may state that by adopting a narrower view of compositionality that is

clearly distinct from transparency, we are in a better position to develop a richer account of how

form and meaning relate to each other. If the goal is to re-examine how compound transparency

may be formalized, this distinction offers the advantage of recognizing that some compounds

have a direct connection to the meaning of their constituents without being fully transparent.

This relationship will be revisited in Chapter 4 when I explore some of the features I believe

should be incorporated into a typology of transparency.

2.4.2 Semantic Transparency Defined

Although the majority of definitions or descriptions proposed for semantic transparency have

relied heavily on the relationship between the meaning of the whole and that of its component

elements, I will, for the time being, avoid stipulating this factor as a condition of the concept and

will instead formulate it in more general terms. It should be noted that while the following

definition is proposed with compounds in mind, it may in fact be applicable to other

morphologically complex constructions.

(10) a. For a complex lexical unit C, semantic transparency refers to the degree of semantic

interpretability of C

b. Semantic transparency is a property of C that:

i. is scalar (i.e. is not simply a +/− feature)

ii. is multi-faceted (i.e. based on a number of factors)

The definition in (10a) is based on the widely accepted view that semantic transparency is

related to comprehension of a lexical unit and emphasizes that its interpretability is a matter of

degrees. The stipulations in (10b), which are consequences of (10a), are considered defining

features of transparency. First, I maintain that semantic transparency cannot be viewed as a

binary feature of complex constructions, as has been tacitly held by some researchers. In other

words, it cannot be said that a compound is either transparent or opaque, but rather that it

exhibits some degree of transparency. Of course, any typology of semantic transparency will

almost inevitably take the form of some sort of cline, but with enough parameters, such a

typology could be sufficiently granular for the purposes of compound classification. Second,

transparency is said to be dependent on a number of different factors, such as compositionality

and headedness. This allows for factors to be weighted differently or identically, whatever the

case may be. If constituent meaning proves to be a leading factor in semantic transparency, it

will be treated as such. We may find, however, that individual constituent meaning is trumped

by an infrequent semantic relation held between them (Allen 1978) or that in some cases

exocentric compounds show a high degree of transparency.
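The scalar and multi-faceted character stipulated in (10b) can be pictured with a minimal sketch. The factor names, weights, and scores below are hypothetical placeholders chosen purely for illustration; they are not values proposed in this thesis. The sketch only shows how several differently weighted factors could yield a graded value rather than a binary label.

```python
# Hypothetical illustration of (10b): transparency as a scalar, multi-factor property.
# All factor names, weights and scores are invented for the example.

def transparency_score(factors, weights):
    """Return the weighted average (between 0 and 1) of factor scores in [0, 1]."""
    total_weight = sum(weights[name] for name in factors)
    if total_weight == 0:
        raise ValueError("at least one factor must carry a non-zero weight")
    return sum(factors[name] * weights[name] for name in factors) / total_weight


# Assumed weighting: constituent meaning counts twice as much as headedness.
weights = {"compositionality": 2.0, "headedness": 1.0}

fully_scored = {"compositionality": 1.0, "headedness": 1.0}
partly_scored = {"compositionality": 0.5, "headedness": 1.0}

print(transparency_score(fully_scored, weights))
print(transparency_score(partly_scored, weights))
```

Changing the entries in `weights` corresponds to treating one factor as more or less decisive than another, which is the flexibility the multi-faceted stipulation is meant to allow.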

It must be noted that while we may formalize semantic transparency as a linguistic concept, its

application may not always reflect the speaker’s perception of it. In other words, the perception

of transparency for a particular construction will no doubt vary from speaker to speaker. A

model of transparency might label salad dressing as transparent, but if the speaker is unfamiliar

with either of these words, he or she will most likely not perceive it as such. As Svensson (2008)

emphasizes, however, this variation does not necessarily preclude the possibility of “label[ing]

certain expressions as opaque for the majority of language users who have not learned the

meaning yet” (88). The above definition is meant to be compatible with this view.

Moreover, as was discussed in Section 2.1.1, the term opaque will be reserved for only those

expressions that are deemed truly opaque, that is, as having the same transparency status as

simplex words. In all other cases, compounds will be said to show a certain degree of

transparency, which may vary greatly from one construction to the next. The term transparent,

when used, will be understood to mean “high degree of transparency.” Figure 2.5 illustrates how

these terms relate to each other as points on a continuum. The extremities of this scale represent

absolutes to the extent that nothing greater may be located beyond them (i.e. there is no

compound more opaque than an opaque compound).

Figure 2.5. A continuum of semantic transparency (scale, from left to right: Transparent, X degree of transparency, Opaque).

Of the stipulations in (10), those in (b) are the most central to my approach. What I am

suggesting is that, on the one hand, transparency may vary from one construction to another and

is therefore best represented as the continuum above. On the other hand, this variation results

from a number of different factors and features. The latter position is already present to some

degree in Libben’s typology of transparency (Libben 1998) and is also hinted at in some of the

experimental work on the role of transparency in compound interpretation (Jarema et al. 1999

and Libben et al. 2003 for compounds, Feldman et al. 2003 for derived words). Some studies

suggest that a speaker’s ability to access compounds is affected by a number of compounded

factors, such as frequency, transparency, and productivity (Dohmes et al. 2004). My contention

here is that to effectively evaluate the semantic transparency of a complex lexical unit, these

factors must be taken into account. Based on past research on compounding, I will argue in

Chapters 4 and 5 for a typology of semantic transparency that incorporates the following

semantic features:

(11) a. The position and the nature of a compound’s head

b. The semantic contribution of a compound’s elements

c. The unexpressed semantic relation held between a compound’s constituents

d. The degree of semantic similarity between related compounds

It is my contention that the features in (11a-c) represent what a typology of compound

transparency should, at a minimum, take into account. The property listed in (11d) is meant to

augment the relational feature in (11c). By incorporating several factors into existing models of

semantic transparency, we may propose a more granular typology of the concept, one that

allows for a classification of compounds that better reflects the numerous ways in which their

meaning may be composed and established.

2.5 Summary

In this chapter, I looked at a number of issues related to the term “semantic transparency” (or

“semantic opacity”) as it has been used alongside morphologically complex words, including

compounds. I have argued that usage of the term generally lacks specificity and often differs

across works on the subject. This limited description of the concept has led to a wide range of

claims regarding not only the processing of compounds, but also their classification, as well as

their status as lexical items. Many of the studies that claim to look at semantic transparency

often do so without first explaining just how compound A is more or less transparent than

compound B. This is precisely the focus of this thesis. In this chapter, I argued that transparency

and compositionality are two distinct, yet related concepts and that a compound can be

compositional without being fully transparent. This distinction, I believe, affords a much wider

and richer view of transparency, one that includes elements that have seldom factored into

previous discussions on the concept. The working definition in Section 2.4.2 is meant to reflect

this broader approach to semantic transparency and will be revisited and improved upon as I

explore the factors mentioned at the close of this chapter.

Chapter 3

French Nominal Compounds and Data Collection

This chapter focuses on what is arguably the most fundamental component of the present

research, namely compounds. I begin by discussing compounding and how it is understood in

the context of this thesis. This discussion also touches on French compounds, highlighting how

other researchers have treated them in the past. I then narrow the scope of the study of semantic

transparency by setting limits on the types of constructions under investigation. Finally, I outline

the methods used to collect the core data that will serve as the basis for the analysis presented in

the remainder of this work.

3.1 Compounding

Despite its long and rich history, compounding remains a somewhat controversial object of

study. While most researchers agree that there is such a thing as a compound, not everyone

agrees on what exactly it is or what it should even look like. Yet, as with any other research

effort, a description of the object of study, however limited it may be, must be provided if

anything meaningful is to be said on the topic. To this end, the following sub-sections will

explore not only how compounding has been defined by other researchers in the past, but also

the criteria that have been used to distinguish between compounds and phrases. Moreover,

because this thesis focuses on semantic transparency from the perspective of French

compounds, several typologies and approaches to compounding in French are examined. It is

important to note that the purpose of this discussion is not to establish once and for all what

constitutes a compound, but rather to set clear boundaries with which to limit the scope of the

present study.

3.1.1 Defining the Compound

Compounding is largely understood as the process by which a compound is formed. A survey of

introductory texts in morphology reveals a rather homogeneous—though perhaps simplified—

approach to both the operation and its result. Let us consider the following definitions taken

from a number of recent works in general morphology:

(12) a. “words formed by combining roots” (Carstairs-McCarthy 2002: 59).

b. “What we mean by ‘compounding’ is the construction of a complex lexical unit from

at least two bound or free lexical morphemes¹⁸” (Apothéloz 2002:18, my translation).

c. “The formation of a new lexeme by adjoining two or more lexemes is called

compounding” (Bauer 2003: 40).

d. “A compound is a lexeme which contains two (or more) stems and which does not

have any derivational affix which applies to the combination of stems” (Bauer 2004: 32).

e. “A derived form resulting from the combination of two or more lexemes” (Aronoff

and Fudeman 2005).

f. “[compounding] consists of the combination of two words” (Booij 2007: 75).

g. “Compounds are words that are composed of two (or more) bases, roots, or stems”

(Lieber 2010: 43).

¹⁸ “[O]n entend par ‘composition’ la construction d’unité lexicale complexe au moyen [...] d’au moins deux morphèmes lexicaux libres ou liés” (Apothéloz 2002:18).

Two major criteria can be retained from the descriptions or definitions above. First, a compound

is typically viewed as a lexical item. A corollary of this principle is that we can expect any

multi-word construction labeled as a compound to behave similarly to traditional lexemes.

Second, a compound is the product of composition between two (or more) otherwise

independent lexemes¹⁹. This criterion allows us to distinguish rather easily between derived

words and compounds (e.g. abaissement ~ abaisse-langue). What remains to be established,

however, is which of a language’s multi-word constructions are in fact compounds. This is

especially important when discussing compounds in French, as they often bear a significant

likeness to syntactic phrases (e.g. la robe de mariée ~ la robe de la mariée). Despite these

similarities, however, compounds are mostly viewed as distinct entities given that, as Bauer

(2001) states, they “[show] some phonological and/or grammatical isolation from normal

syntactic usage” (695). The task is therefore to determine what such “isolation” entails.

More detailed and complete definitions of compounds reveal just how complex the issue

actually is. In his seminal work on compounding in Danish, English, and French, Bauer (1978)

describes the phenomenon as follows:

“[I]t can be said that a compound is a morphologically complex unit, made up of two words (lexemes) acting as a single word (lexeme). The words or (in most cases) potentially free formatives may themselves be further subdivided. The compound, it is claimed, shows a degree of phonological, morphological and semantic isolation. However, these points are better considered as tendencies than as rules, since there appear to be very few ‘rules’ in compounding that admit of no exceptions.” (54)

While more elaborate definitions of compounds, such as the one above, share a number of

similarities with the more cursory descriptions mentioned in (12), they often highlight the fact

that the concept is nuanced and not necessarily subject to easy circumscription. Bauer therefore

speaks of tendencies, which may be prudent, but this approach also has the potential to weaken

the conclusions one might draw based on a study of the phenomenon.

¹⁹ Apothéloz’s (2002) definition mentions bound morphemes (“morphèmes lexicaux liés”) as acceptable compound constituents. This allows words such as bibliophile, traditionally referred to as neo-classical compounds (Scalise and Bisetto 2009), to be treated as compounds. As stated in Section 3.2.1, these constructions will be ignored in this work for technical reasons.

It should therefore come as no surprise that many researchers do attempt to restrict what types of

constructions may be considered instances of compounds. This narrow approach to the topic is

usually adopted for a number of reasons. On the one hand, many authors wish to consign as much

as they can to syntax, thus accounting for the fact that some constructions behave internally as

syntactic units. On the other hand, researchers want their frameworks to be consistent and easily

duplicated, a goal more easily achieved with a much narrower view of compounding. Let us

take, for instance, Ten Hacken’s (1999: 41) definition of a compound, stated as follows:

(13) A compound is a structure [X Y]_Z or [Y X]_Z, such that:

• The denotation of Z is a subset of the denotation of Y;

• If S is a possible way of specifying Y, the denotation of Z is determined by a range of S’s that are compatible with the semantics of X;

• X does not have independent access to the discourse.

Ten Hacken establishes not only what structure a compound has, but also its semantics. The first

clause stipulates that a compound must have a semantic head, thereby denying compoundhood

for English constructions such as pickpocket and redcoat, which have long been treated as

compounds elsewhere (Allen 1978, Selkirk 1982, Scalise 1984, Lieber 1992). The second clause

refers to the possible relations that may hold between a compound’s constituents, the range of

which is restricted based on their semantics. The third clause of Ten Hacken’s definition

accounts for the non-head’s inability to pick out a reference without contextual support²⁰.
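The effect of the first clause can be pictured with toy sets. The denotations below are invented for illustration only and the function name is hypothetical; the sketch simply checks the subset condition and is not an implementation of Ten Hacken's full definition.

```python
# Toy illustration of the first clause in (13): the denotation of the whole (Z)
# must be a subset of the denotation of the head (Y). All sets are invented.

def has_semantic_head(denotation_z, denotation_y):
    """Return True if every member of Z's denotation is also in Y's denotation."""
    return denotation_z <= denotation_y


flies = {"housefly_1", "fruit_fly_1"}
houseflies = {"housefly_1"}          # a housefly is a kind of fly

pockets = {"pocket_1", "pocket_2"}
pickpockets = {"thief_1"}            # a pickpocket is not a kind of pocket

print(has_semantic_head(houseflies, flies))
print(has_semantic_head(pickpockets, pockets))
```

Under this condition housefly qualifies, whereas pickpocket does not, which mirrors the exclusion of exocentric constructions described above.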

Semantic criteria, however, are not limited to headedness. According to Langacker (2009: 54) a

compound is by definition non-compositional because its meaning is indeterminate, but this

principle also challenges the widely held belief that a compound’s meaning can in fact be

computed from its components (see Chapter 2 for a discussion of compositionality in

compounding). Alternatively, some definitions restrict membership based on the types of lexical

units involved. For example, Arnaud (2004) states that “a compound noun is a nominal lexical

unit that results from the union of two (or more, recursively) open class lexical units²¹” (329,

my translation). According to Arnaud’s definition, nominal constructions such as arrière-plan

and sans-abri must be rejected as compounds because they contain elements from a closed

lexical class, which would also include any construction containing a preposition as a linking

unit. Yet many of these constructions show a great deal of syntactic autonomy and are widely

considered to be compounds elsewhere (see, for instance, Amiot 2005 for a discussion of Prep-N

compounds).

²⁰ It has been observed that the non-head element of a compound does not typically have a reference: house in housefly, for instance, does not refer to a particular house. The only exceptions are proper names that are sufficiently specific so as to possess a referent independent of discourse, which explains why Chopin fan is acceptable, but *Frédéric fan is not.

²¹ “Un nom composé est une unité lexicale nominale résultant de l’assemblage de deux (ou récursivement de plusieurs) unités lexicales de classe ouverte” (Arnaud 2004: 329).

The fact is that compounds are not necessarily definable according to a fixed set of features, nor

are these features likely to coincide cross-linguistically. Although there is general agreement

that a compound consists of two otherwise independent lexemes, such a definition is most likely

too broad to prove very effective in identifying instances of compounds. In this regard,

establishing compoundhood is largely a matter of testing candidates according to various

behavioural and functional criteria. Even this approach, however, is bound to encounter

difficulties given that compounding, as a linguistic operation, may actually be connected to

several different domains. Ten Hacken (1994) succinctly states the difficulties, as well as the

resulting issues, at hand: “Compounding has borderlines with affixation, with syntax, and with

the lexicon. For each of these borderlines, there are cases where the classification is not

straightforward. A definition will have to result in a decision for these borderline cases” (23).

Thus, the next few sections will look at some of the criteria advanced to facilitate this task.

Before moving on, however, a brief word must first be said on terminology. There exist

numerous labels for the multi-word combinations that defy many of the syntactic and semantic

constraints of the language. The exploration of such constructions usually falls under the

purview of phraseology (see Gries 2008 for an overview of the field). A number of categories

have been proposed, but not everyone agrees on what these categories are or which items should

be included in them. Polguère (2003), for instance, labels fruit de mer and nid de poule as

nominal locutions, while Mathieu-Colas (1996) calls them compounds. Of course, the

terminology one adopts largely stems from one’s preferred theoretical framework (Polguère, for

example, works within the Meaning-Text Theory, Mel’čuk et al. 1995). Granger and Paquot

(2008) propose three major categories, one of which, “referential phrasemes,” comprises

seven types: lexical collocations, idioms, irreversible bi- and trinomials, similes,

compounds, grammatical collocations, and phrasal verbs. Compounds are said to “resemble

single words in that they carry meaning as a whole and are characterized by high degree of

inflexibility, viz. set order and non-interruptibility of their parts” (43). These characteristics

presumably set compounds apart from lexical collocations, where one constituent is said to

depend on the other, and from idioms which, according to Granger and Paquot, “are

characterized by their semantic non-compositionality” (43). These criteria are only partial

indicators of membership, however, as each type is in fact defined according to several different

features.

It is also useful to note that a multi-word expression does not need to be listed to be considered a

compound. Di Sciullo and Williams’s (1987) “hierarchy of listedness” places compounds

somewhere between words and phrases, stating that “many of the compounds are listed” (14).

While few researchers have in fact claimed that compounds, because they are lexemes, must be

listed, it still bears mentioning that such a criterion should only be given limited weight.

Listedness cannot be a required property of compounds because compounding is widely

considered one of the most morphologically productive processes in word formation (Downing

1977, Roeper and Siegel 1978, Libben 1998, Bauer 2001b). Thus, to claim that the result of

such a process must be listed would be to argue that compounding is in fact unproductive.

Downing (1977) emphasizes the productive nature of compounds and discusses what she calls

deictic compounds, pragmatic constructions made up on the spot and used to refer to a

temporary situation or condition. These nonce forms (which include her now famous example,

apple juice seat) will most likely never be listed anywhere, yet they are examples of

compounds in the wild, so to speak. Allen (1978) made similar remarks, distinguishing between

lexicalized and non-lexicalized compounds and suggesting that semantic non-compositionality,

among other factors, contributes to the lexicalization of a compound, which in turn

increases its chances of being listed.

All things being equal, the use of the term compound in this thesis will reflect its usage across

works anchored in lexicalist morphology. Many of the definitions offered at the beginning of

this section are good examples of such works (e.g. Bauer 2003, Aronoff and Fudeman 2005,

Lieber 2010) and while they are all cases of introductory texts on morphology, the definitions

they propose are nevertheless both sufficiently similar and explicit that they may support the

discussion to follow. In sum, compounds are understood as lexemes that are themselves

composed of two or more lexemes. I use the term lexeme so as to emphasize three points: 1) that

compounds are typically formed using free morphemes; 2) that these constituents can be either

simplex (e.g. panier à salade) or derived forms (e.g. assurance-emploi); and 3) that such

combinations function as a single unit (cf. Bauer 1978).

As for a working definition, I think it prudent to offer something as concise as possible, while

relying on tests or criteria to determine with certainty whether a particular multi-word

construction should be treated as a compound. These criteria will follow in the next section. To

conclude this section, I thus propose the following brief, albeit representative definition of

compounds:

(14) A compound is a morphologically complex unit composed of at least two lexemes that

may or may not be fused, but which functions as a single lexeme.

The above definition not only closely echoes Bauer’s (1978) own description mentioned earlier,

but also strays very little from most other definitions. What remains to be done is to establish

where one should draw the line between freely constructed syntactic units (e.g. brown table = ‘a

table that is brown’) and morphologically complex units (e.g. blueberry = ‘a berry that is blue’).

The following section discusses some of the key ways in which this distinction may be made.

3.1.1.1 Compounding Criteria

As was briefly touched upon earlier, the work presented in this thesis is largely based on a

lexicalist view of morphology, which is to say that there is such a thing as the lexicon and that

word formation falls within the domain of morphology. This approach, which originated in

Chomsky (1970), has given rise to the Lexicalist (or Lexical) Integrity Hypothesis (henceforth

LIH). Briefly, this hypothesis draws a hard line between what belongs to syntax and what

belongs to morphology and places constraints on how these modules may interact (Halle 1973,

Aronoff 1976, Allen 1978). Lapointe (1980) frames the matter succinctly: “syntactic rules are

not allowed to refer to, and hence cannot directly modify, the internal morphological structures

of words” (222). Consequently, such a framework also prohibits simultaneous operations from

taking place across modules: “No deletion or movement transformations may involve categories

of both W[ord]-structure and S[entence]-structure” (Selkirk 1982: 70). The original lexicalist

position was eventually weakened to allow for a certain degree of interaction between

morphology and syntax, mostly in order to account for inflectional morphology’s dependence

across multiple units (Anderson 1982, also see Baker 1985 for additional evidence supporting a

weakened lexicalist position). Although the LIH has undergone several refinements over the

years (see Ackerman and LeSourd 1997, Lieber and Scalise 2007), the fundamentals of the

approach have largely remained intact.

Given the premise outlined above, we may state that if compounds are in fact words, and words

are considered “syntactic atoms” (Di Sciullo and Williams 1987), then establishing what

constitutes a compound involves examining how it responds to syntax-based tests. Many such

tests have been proposed, most of which are based on criteria that reflect the behaviour and

functionality of morphological and syntactic items.

Before discussing the syntactic tests mentioned in the literature, however, it is important to note

that the identification of compounds has in fact involved criteria from several domains.

Orthographic markers such as spaces and hyphens have received some attention, but as

Mathieu-Colas (1994) shows, they exhibit too much variation to be of much use in discerning

between phrases and compounds. Several phonological criteria have also been proposed over

the years, but many of them have proven either inconsistent or incorrect. For example, the

oft-cited stress criterion for English compounds, popularized by Marchand (1960) and formalized in

Chomsky and Halle (1968) as the Compound Stress Rule, states that stress is located on the

leftmost constituent of a compound, but on the rightmost constituent of a regular noun phrase.

Stress placement, however, has been shown to vary greatly between otherwise similar constructions (for

instance, apple cake is left-stressed, while apple pie is right-stressed, but both are typically

considered compounds; see Bauer 1998). This particular criterion is also language-dependent: it

is not relevant for languages that lack lexical stress, such as French. As for morphological

criteria, Bauer (1978) discusses plurality marking as a means of identifying compounds, stating

that inflection is usually present on the head. Rosenberg (2007), however, recently showed that

while all markings are possible (head-marking, external marking, and double marking) for

French compounds, double marking is by far the most frequent inflectional operation used for

indicating number (e.g. chèques-restaurants, secteurs-clés, lourds-légers, etc.). A prevalent

semantic criterion is related to a compound’s denotation and is included in many definitions of

compounding. Because compounds, like words, are naming units, they typically refer to a single

concept. Compounds are therefore said to possess both “a stable referent [and] a unitary

meaning” (Gaeta and Ricca 2009: 39). Thus, a zebrafish does not denote a zebra on the one

hand and a fish on the other, but rather a fish with properties like those of a zebra. This

particular criterion, however, is typically only useful for constructions containing more than one

noun. Nevertheless, it is widely accepted that compounds denote a single thing or concept, as

evidenced by its frequent inclusion in definitions of compounding.

The most prominent type of test, however, involves verifying whether a particular construction

is resistant to syntactic manipulation. Firmly rooted in the lexicalist tradition, these tests are

abundant in the literature. For examples on how these tests have been used to establish

compoundhood, the reader is invited to consult, among others, Allen (1978), Di Sciullo and

Williams (1987), Ten Hacken (1994), Bresnan and Mchombo (1995), Bauer (1998), and Lieber

and Scalise (2007). For the application of these tests on French compounds, one should consult,

among others, Barbaud (1971), Gross (1988), Riegel (1988, 1991), Anscombre (1999), and

Arnaud (2003). It should be noted that in many cases, these tests target the non-head constituent

as it is usually a compound’s most syntactically isolated unit (Bauer 1998). Of the numerous

tests proposed to identify compounds, three in particular stand out, all of which rely on the same

principle, namely the syntactic atomicity of words. These tests are summarized below as criteria

and are illustrated using examples from French.

The first test is most often used for instances of compounds that might otherwise be considered

nominal phrases with an adjectival modifier, but it may also be used with constructions

containing nouns. Because compounds function as single units, it is not typically possible to

modify the individual constituents—instead the modification must apply to the entire

compound.

(15) Criterion 1: A compound’s constituents may not undergo modification

a. sage-femme *[sage [jeune femme]] → [jeune [sage femme]]

*[[très sage] femme]

b. bureau de poste *[[bureau climatisé] de poste] → [[bureau de poste] climatisé]

The second test involves coordinating the elements of a compound with those of others. While

phrases typically allow for units to be coordinated, compounds do not: coordinating their

constituents produces either an ungrammatical construction (as in 16a) or a

strange or incorrect interpretation (as in 16b).

(16) Criterion 2: A compound’s constituents may not undergo coordination

a. *son beau-frère et père → son beau-frère et (son) beau-père

b. ?des bancs de sable et de neige → des bancs de sable et des bancs de neige

While the coordination in (16a) is unacceptable, the one in (16b) gives rise to a strange,

contradictory reading, namely that the bank is made of both sand and snow. It should be noted

that the coordination test may also produce ungrammatical constructions for phrases if the

elements being coordinated are not semantically related (e.g. *an artificial heart and island).

The third test verifies whether a constituent is in fact independent by attempting to refer to it

using an anaphoric pronoun. Although most phrasal elements may be referenced in this manner,

a compound’s constituents may not function as antecedents.

(17) Criterion 3: A compound’s constituents may not serve as a reference for an anaphoric

pronoun

a. *C’était un délicieux café crème_i même s’il y en_i avait pas assez.

b. *La base de données_j est maintenant en ligne; elles_j appuieront votre recherche.

Many other tests have also been proposed under a variety of names, most of which involve

syntax-based constraints similar to those just discussed. Their purpose, however, remains the

same: to distinguish between a freely formed phrase and a compound. A few questions

regarding the use of such tests do arise, however. First, how many of these criteria should apply

before a decision may be made regarding the status of a particular construction? In other words,

if we limit ourselves to the three syntactic tests above, must a combination conform to all of

them in order for it to be considered a compound, or is one perhaps sufficient? Second, are these

tests truly conclusive? In other words, if a particular combination fails to meet a series of

stipulated criteria, does this guarantee that it isn’t actually a compound? Bauer (1998),

examining a set of seven frequently discussed tests for identifying NN compounds in English,

including those discussed above, concludes that “none of the possible criteria gives a reliable

distinction between two types of construction” (78). His study shows that some compounds fare

better with certain tests than others, but that the overall degree of correspondence may not be

substantial enough to be used to categorically deny or affirm compoundhood.
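The question of how many criteria must hold can be made concrete with a small sketch. The record type, the threshold parameter `min_passed`, and the judgments entered below are all hypothetical and serve only to show that the verdict depends on a threshold the tests themselves do not supply.

```python
from dataclasses import dataclass


@dataclass
class CriteriaJudgments:
    """Judgments for one candidate; True means it resists the manipulation."""
    resists_modification: bool   # Criterion 1 in (15)
    resists_coordination: bool   # Criterion 2 in (16)
    resists_anaphora: bool       # Criterion 3 in (17)


def passed_count(judgments):
    """Count how many of the three criteria the candidate satisfies."""
    return sum([
        judgments.resists_modification,
        judgments.resists_coordination,
        judgments.resists_anaphora,
    ])


def is_compound(judgments, min_passed=3):
    """Classify as a compound if at least `min_passed` criteria are satisfied."""
    return passed_count(judgments) >= min_passed


# Invented judgments for an unnamed candidate construction.
candidate = CriteriaJudgments(True, False, True)

print(is_compound(candidate))                 # strict reading: all three required
print(is_compound(candidate, min_passed=2))   # lenient reading: two suffice
```

The same candidate is rejected under the strict reading and accepted under the lenient one, which is one way of picturing why the tests alone cannot categorically settle compoundhood.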

Regardless of these issues, syntactic criteria have arguably served as the prevailing method for

distinguishing between compounds and phrases. Of course, one must first establish which types

of constructions warrant testing. In English, the focus has largely been on pairs of appositional

nouns or nouns with an adjectival modifier. When considering other languages, however, the

focus may be shifted to other types of multi-word constructions. As we will see in the next

section, this is certainly the case for French.

3.1.1.2 Compounding in French

Darmesteter’s (1874) treatise Traité de la formation des mots composés dans la langue française

is considered one of the earliest works to look exhaustively at French word-formation. Although

he uses the term mot composé, it should be noted that his usage covers a broad range of

operations, some of which would no longer be considered instances of compounding. The four

major classes of compounds identified by Darmesteter are as follows:

(18) a. compounds by juxtaposition: coffre-fort, blanc-bec, toujours

b. compounds with a particle: malheureux, biscuit, non-pareil

c. true compounds: chou-fleur, arrière-cour, portefeuille

d. compounds from other languages: jurisprudence, acrobate, auberge

The classification above may give the impression that Darmesteter’s study was perhaps too broad

and insufficiently granular, but this is not in fact the case. The categories above represent his

major types, each of which includes many subtypes based on a diverse set of criteria. His

treatise offers a rich description of word-formation in French, but, given its 19th-century origins,

it relies heavily on etymological evidence for its classification, often retaining as compounds

items that few speakers would ever submit to decomposition (e.g. biscuit, from bis-cuit =

‘galette cuite deux fois’).

More recent research on French compounds has instead focused on grouping together

compounds based on the lexical categories of their elements. Gross (1988), considering only

nominal compounds consisting of no more than two major lexical categories, proposes 26 types.

His typology is reproduced in the following table:

Table 3.1. Gross’s (1988) typology of French nominal compounds.

Type            Example(s)
N de N          une pomme de terre, un coup de force
NAdj            un cordon-bleu, un cercle vicieux
AdjN            un blanc-bec, un grand ensemble
NN              un café-filtre, un cheval-vapeur
Npartprés       un poisson volant, un chat-huant
N par N         la preuve par neuf
N en N          un arc-en-ciel, une entrée en fonction
N à N           une pelle à gâteau, une roue à aubes
N Prép N        de la sculpture sur bois
VN              un gratte-papier, un crève-cœur
V Prép Inf      un pince-sans-rire
V Prép N        un tire-au-flanc
Vimpér Pro      un rendez-vous
V Conj V        un va-et-vient
VAdv            un frappe-devant
à N             un à-coup, un à-côté
contre N        un contre-projet, une contre-allée
sur N           du sur-place, le sur-moi
sans N          un sans-culottes, un sans-abri
arrière N       une arrière-saison, un arrière-train
avant N         un avant-projet, une avant-scène
Prép Pro        un chez-soi
Prép Adv        un en-avant
Adv Partprés    un bien-pensant
Adj Prép N      un haut-de-forme
Numér N         un trois-pièces, un dix-tonnes

Using Gross’s work as the foundation for his own study of French compounds, Mathieu-Colas

(1996) proposes what amounts to the most exhaustive attempt at cataloguing nominal

compounds using lexical categories. His typology includes 17 major classes alongside 8

complementary classes (for compounds containing more than two lexemes), each of which

contains numerous sub-classes. The typology is said to consist of over 700 compound types.

What Mathieu-Colas’s classification thus reveals is that nearly all lexical categories may be

combined to produce a nominal compound. No doubt due to its unwieldy nature, Mathieu-

Colas’s typology has not been widely adopted, though it does remain useful when distinguishing

between certain subsets of compounds (e.g. Adjective + Nominalized Participle = un mauvais

perdant ~ Adjective + Participle = un nouveau né). A number of smaller, more restrained

typologies of French compounds have been put forward in recent years (e.g. Zwanenburg 1992,

Corbin 1997, Brousseau and Nikiema 2001, Fradin 2009), many of which vary greatly in the

types of constructions they include. Moreover, classifying compounds based on the lexical

categories of their constituents remains pertinent for a number of different languages:

MorboComp, a multilingual database of compounds, contains 110 possible combinations based

on an analysis of 23 different languages (Scalise and Vogel 2010). Given its prevalence in the

literature, an approach based on lexical categories will also serve as the basis for my

investigation of French compounds.

3.1.1.3 Which Compounds Should We Investigate?

As was discussed in Section 3.1.1.1, a discussion of compounding first requires that we

distinguish between instances of morphological units and instances of freely formed phrasal

units. This step presumes, however, that some combinations may be either. Looking at the major

combination types present in French, we find that some are more likely to be acknowledged as

instances of compounds than others.

Just as in English, NN constructions are the most widely accepted type of compound for French

and other languages (for French, see, among others, Bauer 1978, Riegel 1988, Gross 1996,

Lesselingue 2003, Arnaud 2003, Takada 2008, Fradin 2009). Much has been said regarding

these types and the tests discussed in Section 3.1.1.1 lend support to treating the vast majority of

these combinations as compounds (see previous references). A second type of combination also

widely viewed as an instance of compounding in French is the VN construction, which is

typically understood as French’s synthetic compound (see, among others, Roussarie and

Villoing 2003, Villoing 2002, 2003, 2009; Rosenberg 2008, 2011). According to Corbin (1997),

these two types of constructions are the only true instances of compounds in French, all other

constructions being lexicalized phrases. But even the most widely accepted types have been

denied compoundhood status by some: Noailly (1990) and Fradin (2003), for instance, argue for

a very restricted class of NN compounds, and Di Sciullo and Williams (1987) consider most

nominal combinations, including VN constructions, syntactic words, that is to say, phrases

inserted in N position. Again, the question driving the debate rests, in most cases, upon the

morphological and syntactic distinction discussed earlier: which combinations are governed by

morphological principles and which obey syntactic rules? Despite the range of positions held on

the matter, there is nevertheless a sufficient body of work to support treating NN and VN

constructions as morphological objects and thus compounds.

Another major class of constructions whose status as compounds raises questions is the one

involving adjectives. In French, adjectives may be preposed or postposed to the noun

they modify, a fact that manifests itself quite clearly in constructions typically viewed as

compounds, as the following examples show:

(19) a. AN rouge-gorge, belle-mère, petit fils

b. NA cordon bleu, maison blanche, accent aigu

The challenges posed by these types of construction are related to the question of where one

should draw the line between compound and freely formed phrase. After all, even if we wish to

use lexicalization as a possible criterion, we must still admit that they are first introduced as

phrasal units. Corbin (1992), who rejects any phrase-like combination, argues against AN and

NA compounds (see also Fradin 2003, Rosenberg 2007, Gaeta and Ricca 2009). The

modification test discussed in the previous section, however, works well for these examples and

is therefore the typical method employed to label them as compounds (e.g. *un [cordon [bleu

foncé]]). Alternatively, we might also wish to assess these cases according to a set of semantic

criteria, such as compositionality. This approach is attractive, given how easily AN and NA

compounds allow for both literal and non-literal readings. Presumably, freely constructed AN

and NA phrases would only permit a literal reading. If we revisit the examples in (19) above, the

differences are quite clear:

(20) a. belle(-)mère = ‘beautiful mother’ or ‘mother-in-law’

b. cordon bleu = ‘a blue tie/string’ or ‘an excellent cook’

This approach, however, does pose a problem for studies in semantic transparency. Retaining

only those AN and NA constructions that possess a non-literal meaning will result in a study of

transparency that relies heavily on the least transparent constructions. Consequently, if this

criterion is deemed sufficient, it must also be applied across all types, discarding NN

combinations such as pause-café and auteur-compositeur in the process. Such a result is highly

undesirable as these types are arguably those that will most benefit from a typology of semantic

transparency, not to mention their status as prototypical compounds in French. Moreover, NA

and AN compounds differ significantly from other compound types in that the relationship

between their elements is seldom ambiguous. In other words, combinations involving an

adjective are largely all attributive in nature. Even in the most figurative or lexicalized instances,

the attributive relation held between the modifier and the head is typically retained. In cordon

bleu, for instance, bleu still modifies cordon (i.e. cordon qui est bleu), despite the fact that the

compound refers to an individual22. This stability greatly contrasts with constructions involving

two nouns, where the relation may be realized as a predicate with several possible values (Allen

1978).

Given these observations, as well as the overall difficulties associated with establishing

compoundhood for NA and AN combinations (see Van Goethem 2009 for a recent cross-

linguistic look at the many issues involved), these types are not included in this study. While

some may object to this methodological choice on the grounds that I am ignoring a potentially

rich set of data, the scope of this project must nevertheless remain narrow if it is to successfully

investigate multiple factors in semantic transparency. Compounds involving adjectives,

although no doubt relevant to the discussion, would only add complexity to an

already encumbered investigation.

Another type of French construction, not found in English but present in most Romance

languages23, involves two nouns linked by a preposition, such as lune de miel, moulin à vent,

arc-en-ciel, etc. These types have been treated as compounds by several different researchers

over the years (Giurescu 1975, Gross 1988, Anscombre 1990 and 1999, Bosredon and Tamba

1991, Mathieu-Colas 1996), but they have also had their share of detractors (Corbin 1997,

Rosenberg 2007, Booij 2007, Fradin 2009). Those who reject these types as compounds do so

for many of the same reasons that lead to the dismissal of AN and NA combinations, namely

that they are instantiated in the syntax. As Booij (2007) says:

“The structures N à N and N de N are instantiations of the syntactic structure [N PP]NP, a noun phrase consisting of a head N followed by a PP complement, and have developed into constructional idioms. Such phrases are functionally equivalent to compounds in Germanic languages, and that is why the mistake is made to consider them compounds.” (Booij 2007: 83)

22 The etymology of the construction confirms the attributive relation in which bleu modifies cordon: “Se dit figurément et par plaisanterie d'une cuisinière très-habile (Ac. 1835-1932); plaisanterie qui porte sur l'éminence du grade de cordon bleu et sur l'ancien tablier bleu des servantes” (TLFi).

23 For Spanish, see Rainer and Varela (1992); for Italian, see Scalise (1992).

According to Booij, this erroneous account of N Prep N constructions as compounds is largely a

matter of a misguided cross-equivalency, which is to say that because toolbox is a compound in

English, its French equivalent boîte à outils is also a compound, regardless of its structure.

Furthermore, both English and French are head-initial languages, a fact that secures toolbox’s

status as a compound given its right-headedness, but which calls into question boîte à outils’s

own status considering its distinctly phrase-like structure. While Booij’s original argument is no

doubt well-founded, it would also mean the rejection of AN and NA constructions as they are

also noun phrases. Yet he considers some AN English constructions compounds and discusses

the modification test as proof of this classification (e.g. [dark [blackboard]] and not *[[dark

black] board]). If the modification test does in fact indicate compoundhood, then many N de N

and N à N constructions should be treated as compounds: compare, for instance, *[[boîte vide]

de conserve] and [[boîte de conserve] vide]. Moreover, as was shown in Section 3.1.1.1, these

types of constructions also typically fail both the coordination and the anaphora tests.

It is also worth noting that the N Prep N constructions usually taken to be instances of

compounds lack determination in the PP’s NP complement, a fact that is not typical of French

NPs24. Thus, there is a clear difference between otherwise identical pairs of nouns connected by

a preposition. The following examples from Cadiot (1997: 104) illustrate this point:

(21) a. bac (à + ?au) sable fin / bac (*à + au) sable mouillé

b. sac à dos / sac au dos

In (21a), Cadiot argues that the first construction instantiates a type, while the second a token. In

(21b), he discusses referentiality: dos is only referential in the definite construction. The contrast

between these constructions shows that the absence of a determiner has a clear semantic effect

on the whole. Whether this is truly evidence of compoundhood is unclear, but it arguably lends

support to the notion that N Prep N constructions without a determiner possess behavioural

qualities that set them apart from regular N PP phrases. Moreover, as many authors have already

shown, many of the atomicity criteria advanced to determine compoundhood apply to these

constructions, which only further confuses the matter (e.g. *un sac à dos et à main).

24 A highly frequent and productive instance of bare nouns with PPs in French involves the partitive construction (e.g. un morceau de gâteau, un litre de lait, une pointe de pizza, etc.). Although the tests discussed earlier might suggest that in some cases they are compounds (e.g. un ?[morceau de [bon gâteau]] ~ un [bon [morceau de gâteau]]), they are not typically viewed as such.

As was stated at the outset, the purpose here is not to establish once and for all what constitutes

a compound, but rather to shed some light on the object of study itself so as to limit the scope of

the analysis that will follow. Unsurprisingly, exploring how semantic transparency relates to

compounds requires that we first define a compound. The discussion has shown, however, that

such a definition is not without its challenges. Numerous compound types have been proposed

for French, but not everyone agrees on which of these constructions should be included in this

class.

As a means to skirt controversy, this work will primarily focus on attested NN compounds.

These constructions are not only the least likely to be confused with syntactic phrases, but also

the most widely studied type of compound in several different languages, including French.

Moreover, as previous research has shown, NN compounds do not overtly communicate the

relational association held between their constituents, a fact that distinguishes them from other

types, including N Prep N constructions where the linking unit is said to provide some relational

information (Cadiot 1997).

That said, in order to provide additional insight into the arguments and hypotheses presented in

the following chapters, I will also look at instances of N à N constructions involving bare nouns.

These will be treated here as compounds, despite the fact that this is not a widely held position.

The choice of N à N compounds over other N Prep N constructions is mainly due to two facts.

First, they have received a great deal of attention over the years and their treatment as

compounds is perhaps less controversial than it is for other N Prep N constructions (see, among

others, Anscombre 1990, 1999, Bosredon and Tamba 1991, Borillo 1996). Second, the

preposition à is widely considered more semantically restrictive than de, yet less so than other

prepositions. The analysis of N à N constructions will thus allow for a broader investigation of

semantic transparency while still retaining a narrow focus, especially as it relates to factors such

as headedness and semantic relations. Whether they are considered compounds may not in fact

have much of an impact on the observations and conclusions I make, as semantic transparency is

understood to apply to both syntactic and morphological objects.

The main criterion used in the selection of the compounds at the heart of this study is therefore

structural: pairs of appositional nouns with or without the preposition à between them.

Additional restrictions are introduced in the following sections, but these are primarily used to

ensure that the items under investigation are in fact composed of nouns. The data I will use to

support the hypotheses and theories explored in the following chapters come from Wiktionary

(this will be discussed in greater detail in Section 3.2.1). By relying on a single lexicographic

source, along with a purely structural criterion, I hope to avoid introducing unwanted bias into

the data. In other words, no personal judgments are made with regard to the types of

constructions examined, the result of which should be a data set containing compounds that span

the entire spectrum of semantic transparency.

3.2 Data: French Nominal Compounds

Online dictionaries are plentiful, but not all are readily accessible for research purposes. The

English language famously has WordNet (Fellbaum 1998), a lexical database that many would

consider the gold standard for electronic lexicographic research. WordNet has served as the

basis for many projects, primarily because of its rich ontology organized around groupings

called synsets. It has been used in works on word sense disambiguation (Li et al. 1995, Banerjee

and Pedersen 2002, Canas et al. 2003), data mining and text extraction (Tan et al. 2000,

Andreevskaia and Bergler 2006), as well as in research more closely related to the present work,

such as in the semantic analysis and automatic processing of compounds (Kim and Baldwin

2005, Costello et al. 2006). One of the key factors in the WordNet project’s appeal in secondary

research is its Application Programming Interface (API), a software layer that allows anyone

to retrieve information from WordNet’s database from their own system. This API has, over the

years, been ported into a variety of programming languages (e.g. Java, PHP, Perl, etc.) and it is

precisely this degree of openness that has played a pivotal role in the development of a number

of third-party tools25, making WordNet the de facto resource for lexicographic work in English.

25 See related projects at <http://wordnet.princeton.edu/wordnet/related-projects>.

French lexicography, despite its rich history (see Bavoux 2008 for a broad overview), has not

produced as successful a tool as WordNet for the French language. While there are many French

language dictionaries with a strong online presence, namely le Trésor de la langue française

informatisé (TLFi) and Le Petit Robert (Rey-Debove and Rey 2010), few offer a degree of

access similar to that of WordNet. This trend could be changing now that many French language

dictionaries are being developed as Web-only resources, such as Usito, first developed at the

University of Sherbrooke under the name Franqus (Cajolet-Laganière et al. 2010) and the Dire

Autrement project at the University of Ottawa (Hamel 2010). What most, if not all of these

online reference tools still lack, however, is free and open access to their lexical databases. This

effectively makes projects that target a specific subset of words or expressions, such as

compounds, difficult. In some cases, specialised online dictionaries have been developed to

address this need, such as with Mathieu-Colas’s (1995) collection of over 12,000 French

compounds. Access to the database, however, is limited to queries via the dictionary’s Web

interface. Even more problematic, the pre-defined means of consulting the data is severely limited:

one can only view a maximum of 10 records per search and only a small amount of information

is provided for each entry (i.e. mainly plural and gender for a given compound). More recently,

the MorboComp group at the University of Bologna has been working on a more expansive repository

of compounds via their CompoNet project, a multilingual database of compounds from over 20

different languages (Guevara et al. 2006). This resource is said to contain a great deal of

information on a variety of compounds, including labels for headedness, classification type, and

lexical categories, but at the time of writing, the project remains closed and its database

inaccessible to the general public.

There exists, however, an alternative online resource appropriate for lexicography-based

research. Wiktionary is one of the Wikimedia Foundation’s projects, a sister site to the well-known

Wikipedia project. It functions in the same way as Wikipedia: it is an information

storehouse managed by the online community. Anyone can add a word to Wiktionary’s

database, modify an entry, improve a definition, reclassify a word, or even provide an

etymology. Of course, just like Wikipedia, Wiktionary is subject to some degree of vandalism

and inaccurate information, but this hasn’t stopped investigators from using the dictionary in a

variety of research projects (Zesch et al. 2008, Müller and Gurevych 2009, Navarro et al. 2009).

Wiktionary’s appeal is largely due to the impressive number of languages represented: the site

contains entries for 158 different languages, 22 of which contain over 100,000 articles26.

Perhaps more importantly, however, Wiktionary’s text is available under a Creative Commons

Attribution/Share-Alike Licence, which, among other things, means that the information it

contains can be used freely as long as attribution is stated. This openness is compounded by the

fact that all Wikimedia projects offer free access to an API with which to connect to its sites, as

well as frequent dumps of its databases so as to conduct local research using copies of the

information available online.

3.2.1 The Wiktionary Database

I downloaded an XML dump of the entire French language version of Wiktionary [version

20110204, Feb. 2nd 2011], which included nearly 2 million lexicographic entries. This XML file

weighed in at approximately 1.5 GB of data and needed to be converted into a much more

manageable format in order to be parsed. The file was therefore converted into a MySQL

database using the MWDumper Java application provided for exactly these purposes by the

Wikimedia Foundation. Wiktionary’s UTF-8 encoding was preserved so as to ensure that all

French accents and special characters would not be lost during the conversion.
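The first step of such a pipeline, reading page titles and text out of the XML dump, can also be sketched without a database. The example below is only an illustration and not the procedure used here (which relied on MWDumper and MySQL): it streams a MediaWiki export with Python's standard library so that a file of this size never has to be loaded at once. The function names and the two-page sample are assumptions made for the sake of the example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export.

    `source` is a filename or a binary file-like object. Pages are handled
    one at a time, so the whole dump is never held in memory.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the subtree of the page just processed


# Hand-written two-page sample standing in for the real dump (hypothetical).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page><title>moulin à vent</title>
    <revision><text>wikitext A</text></revision></page>
  <page><title>pause-café</title>
    <revision><text>wikitext B</text></revision></page>
</mediawiki>""".encode("utf-8")

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

Because the sample is encoded as UTF-8 and parsed as bytes, accented characters in titles such as moulin à vent and pause-café survive the round trip, which is the same concern raised above for the MySQL conversion.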

Unfortunately, compounds are not identified as such in Wiktionary. The closest category to

compounds is “Locutions nominales en français,” but this class of items contains a wide range

of constructions from acronyms such as ADN and MMORPG to very long fixed expressions

such as la goutte d’eau qui fait déborder le vase and temps que les moins de 20 ans ne peuvent

pas connaître. The only real criterion for inclusion of a multiword lexeme seems to be its status

as a noun or that it be headed by one. There is also some degree of inconsistency with the use of

this particular label such that clé à molette is listed under “Locutions nominales en français” and

clé à chaîne under “Nom commun en français.” Nevertheless, this category proved to be a first step

in identifying constructions that might be included in a database of French compounds.

26 See <http://en.Wiktionary.org/wiki/Wiktionary:Statistics> for additional statistics.

Additional information needed to be added to the data dump in order to extract the words

labeled as “Locutions nominales en français,” mainly an SQL dump of the Categorylinks table.

This table allowed me to cross-reference individual pages with the Locutions category, which

then allowed for a cross-referencing of the extracted words with the page revisions and any text

(definition, examples, etymology, etc.) associated with a particular word (see Appendix C for a

truncated version of Wiktionary’s database schema).
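The cross-referencing just described amounts to a join between the page table and the category links. The following is a minimal sketch under simplifying assumptions: it uses an in-memory SQLite database rather than MySQL, keeps only two columns per table (the column names follow MediaWiki's page and categorylinks tables, but the real schema is larger), and is populated with three illustrative rows. The category label “Noms_communs_en_français” and the helper name are assumptions; the further join to revisions and text is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
""")

# Illustrative rows only; titles store spaces as underscores.
conn.executemany("INSERT INTO page VALUES (?, ?)", [
    (1, "clé_à_molette"),
    (2, "clé_à_chaîne"),
    (3, "ADN"),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "Locutions_nominales_en_français"),
    (2, "Noms_communs_en_français"),
    (3, "Locutions_nominales_en_français"),
])


def titles_in_category(connection, category):
    """Return the titles of all pages linked to `category`, in page order."""
    rows = connection.execute(
        """
        SELECT p.page_title
        FROM page AS p
        JOIN categorylinks AS cl ON cl.cl_from = p.page_id
        WHERE cl.cl_to = ?
        ORDER BY p.page_id
        """,
        (category,),
    )
    return [title for (title,) in rows]


locutions = titles_in_category(conn, "Locutions_nominales_en_français")
```

With these rows, the query returns clé_à_molette and ADN but not clé_à_chaîne, which mirrors the labelling inconsistency noted above.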

All words labeled as “Locutions nominales en français” were thus extracted from the

Wiktionary database, which produced 10,269 entries. This first group, however, only accounts

for part of the data. As I mentioned earlier, the dictionary presents a number of inconsistencies

(i.e. clé à chaîne is listed as a common noun, the same status assigned to simplex lexemes). A

quick search through the extracted locutions revealed that many common compounds were

missing from the data (e.g. moulin à vent, grand-père). Is there, then, an accurate way to

identify compounds in the Wiktionary database? Unfortunately, the answer is no. There are,

however, a few workarounds that allowed for the extraction of additional compounds: knowing

that Wiktionary encodes spaces in words as underscores and that hyphens are retained, I was

able to cross-reference these entries with the “Noms communs” category. This netted me an

additional 1,120 compounds separated by at least one space and 6,836 compounds containing a

hyphen. Unfortunately, there is no effective way to identify fused compounds (e.g. monsieur,

malpropre). This isn’t necessarily a problem as many of these fused compounds are sufficiently

lexicalized so as to not be decomposed by speakers. Of course, this is not true of all such

compounds, but given that Wiktionary’s typological schema does not allow for fused

compounds to be automatically identified, the decision was made to forego incorporating this

type into the present study. Regardless, the initial dataset thus contained 18,224 potential

compounds, a number that needed to be greatly reduced if any headway was to be made with the

corpus.
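The title-based workaround can be illustrated with a few lines of code. This is a sketch of the heuristic only, not the queries actually run: the function name is hypothetical, and counting a title that contains both an underscore and a hyphen as “spaced” is an arbitrary choice made for the example.

```python
def classify_title(title):
    """Return 'spaced', 'hyphenated', or None for a Wiktionary page title.

    Spaces are stored as underscores in page titles and hyphens are kept.
    A title containing both is counted as 'spaced' (an arbitrary choice).
    """
    if "_" in title:
        return "spaced"
    if "-" in title:
        return "hyphenated"
    return None  # simplex or fused form: not detectable from the title


titles = ["moulin_à_vent", "grand-père", "monsieur", "arc-en-ciel", "malpropre"]
candidates = {t: classify_title(t) for t in titles if classify_title(t)}
```

Fused forms such as monsieur and malpropre fall through the filter, which is exactly the limitation described above.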

It is important to note that, despite its advantages, the decision to use Wiktionary to assemble

the compounds for this work still presents several issues that will no doubt raise concerns

regarding the validity of the data examined. As was mentioned in the previous section,

Wiktionary is a lexicographic resource driven by the public, which means that the information

included in its database may not always be based on the same methodological rigour adopted by

traditional dictionaries. The result is not only noise in the source data (e.g. mislabeled items,

incorrect information, etc.), but also the possibility of questionable entries making their way into

the final dataset. Such entries may take the form of novel words that have yet to gain widespread

usage or constructions that belong to only a small subset of French language speakers. While on

its face, such criticism is entirely warranted, it must be stressed that the purpose of the data is to

lend support to a study of semantic transparency that focuses on the interpretation of complex

units. In this regard, the data extracted from Wiktionary must simply meet two particular

criteria: one, that they may be said to belong to the French lexicon (i.e. consist of French words)

and two, that they possess meaning. Even in instances where an entry is deemed marginal or too

recent to be included in traditional lexicographic works, its semantic transparency may

nevertheless be assessed if it is said to possess a sense, regardless of its status within a broader

linguistic context.

Moreover, the French data extracted from Wiktionary cannot reliably be sorted according to

language variety, register, or style. In other words, there is no way to distinguish, for instance,

between compounds used in varieties spoken in Europe and those in Canada. While this

distinction is not taken into account in this work, it is understood, however, that the processes

behind compounding may differ between varieties of French, where one particular group may

make use of simple appositional nouns (e.g. fr. hex. arrêt-maladie), while another may prefer

variants involving a preposition (e.g. fr. can. congé de maladie). Once again, the most important

factor in the retention of entries is that the compound’s constituents (and not necessarily the

compounds themselves) be members of a standard French lexicon (i.e. be included in a standard

French lexicographical work). Consequently, the work presented in this thesis sets aside the

effects, if any, that language varieties may have on semantic transparency. Again, given that the

focus here is on the synchronic interpretation of compounds from a speaker-independent

perspective, these differences are unlikely to have a considerable effect on the findings

presented in this work.

3.3 Selecting Compounds and Cleaning up the Data

In order to reduce the scope of this project and to ensure some measure of feasibility, not all of

the 18,224 entries extracted from Wiktionary could be retained. The next step therefore consisted

of removing any undesirable candidates from this first list. This was done using Google

Refine27, version 2.5, a spreadsheet-like application that allows for quick and easy data

manipulation.

3.3.1 Which Compounds to Include?

Some decisions had to be made regarding the types of compounds that were to be included in

the final database. This project was never intended to be an exhaustive repository of compounds,

but rather a consistent compilation of a subset of multi-word lexemes to be used for the purposes

of research on the semantics of compounds. The data to be retained are therefore selected

according to the following criteria.

First, because this thesis’s object of study is the nominal compound, the principal criterion for

inclusion is a compound’s status as a noun. Second, only those compounds that are in fact

binary constructions, that is to say, compounds constructed with no more than two semantically

full constituents, are to be included in the database. This particular criterion is meant to allow

for compounds with constituents joined by a preposition (e.g. moulin à vent, haut de forme).

There are, however, a number of constructions in the initial dataset that follow a more syntactic

structure, namely as N PPs where a determiner is present (e.g. tout à la rue, base de l’économie,

voix dans le champ, etc.). Although these types are not the focus of my study of semantic

transparency, I will presume nothing regarding their status alongside compounds containing

bare nouns, that is to say, they will be treated as compounds for the purpose of populating the

database. The primary focus of this work, however, will be on compounds where the determiner

is in fact absent.

Furthermore, I chose to set aside compounds containing onomatopoeias (e.g. pan-pan),

acronyms (e.g. langage XML) or loan words (e.g. curriculum vitae). The latter type was

discarded because its constituents are not, in most cases, lexemes in French and are therefore

seldom meaning units for French speakers. The former cases will be ignored because, on the one

hand, onomatopoeias do not bear any meaning and, on the other, acronyms are in fact multi-

word lexemes themselves. The data also contained compounds constructed on single letters such

27

See: <http://code.google.com/p/google-refine/>. The project has since been renamed to Open Refine (<http://openrefine.org/>).

as h muet, v lingual and hauteur d’x. These entries were also discarded as, once again, it is

unclear that these constituents carry meaning in the same way that simple lexemes do.

Finally, compounds containing proper nouns were also rejected. One reason for their exclusion

is that proper names, unlike common nouns, have no real meaning: they can only serve as a

reference to something in the real world and in many cases rely heavily on extralinguistic

information. Most compounds containing proper nouns refer to a specific place or person (e.g.

cheval de Troie, acide de Bronsted). Not only do these compounds require that the speaker have

specific knowledge of these individuals and locations, but many are also used as labels for

particular entities. In other words, a compound such as Océan Atlantique is used specifically to

refer to an entity and cannot therefore be said to have meaning. This is in stark contrast to

compounds that contain only common nouns (e.g. beau-frère, mot de passe), which are used as

generic labels that may have both a denotation and a reference.

In summary, the set of compounds retained for the database portion of this project corresponds

roughly to classes VII through XV28 of Mathieu-Colas’s typology (1996: 72), as shown in the

following table:

Table 3.2. Major classes from Mathieu-Colas’s typology retained for the present study.

Class  Construction                       Examples
VII    Composés sur VERBES                tire-bouchon, couche-tard, porte-à-faux
VIII   Composés sur ADJECTIFS             clair-obscur, haut de forme, franc-parler
IX     Composés ADJECTIF + NOM            beau-frère, mauvais perdant, haut-parleur
X      Composés NOM + ADJECTIF            trou noir, pigeon voyageur, cerf-volant
XI     Composés NOM + NOM                 appareil-photo, maître cuisinier, sourd-muet
XII    Composés NOM + de + X              prise de sang, rond de cuir, dessous-de-plat
XIII   Composés NOM + à + X               brosse à dent, chair à canon, pompe à eau
XIV    Composés NOM + en + X              retour en arrière, arc-en-ciel, mise en garde
XV     Composés NOM + AUTRES PRÉP + X     preuve par neuf, vol sans escale, hockey sur glace

28

This distribution accounts for the majority of preserved compounds. The database, however, does contain a number of compounds that fall within the scope of other classes, such as those with numbers (e.g. un deux-roues and les dix commandements).

3.3.2 Reducing the initial dataset

The first step in reducing the number of entries was to remove the most easily identified

undesirable candidates. Because some of these entries contain similar constituents, there is some

degree of overlap. In total, 4,146 entries were discarded according to the following criteria:

i. 199 entries were not in fact compounds, but instead single lexemes that had been

mislabeled (e.g. jalon, durabilité). Some of these entries were simply single string

acronyms (e.g. BD, ARN), while a few were fused variants of hyphenated compounds

labeled as “Locutions nominales en français” in Wiktionary (e.g. basselisse,

bonnevoglie).

ii. 352 entries contained at least one acronym (e.g. ADN chimère, système DORIS) or a

single letter abbreviation (e.g. bombe H, J-pop). These entries were identified using

regular expressions29: any sequence of more than two uppercase letters, any isolated

single uppercase letter, or any entry containing a period was flagged. Unfortunately, it is

difficult, if not impossible, to identify lowercase acronyms in this way. These were removed

manually at a later stage.

iii. 82 entries contained at least one numeric character (e.g. 100 mètres, Web 2.0).

iv. 32 entries contained non-Latin characters (e.g. Arabic loanwords, Greek letters).

v. 3,481 entries containing proper nouns were removed. Though it can be argued that some

of these entries should be retained because they contained capitalized words without

being actual proper names (e.g. Casque bleu, homme d’État), many of these entries were

duplicated elsewhere in lowercase and were thus retained in their common noun variants

(i.e. casque bleu, homme d’état).
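
The pattern-based filters described in (ii) through (iv) can be approximated outside of Google Refine. The following Python sketch is an illustration only: the function names and the exact patterns are assumptions made for this example, not the expressions actually used for this project. Proper nouns, as in (v), are deliberately not handled here.

```python
import re
import unicodedata

# Assumed patterns modelled on the description in (ii).
ACRONYM_RE = re.compile(r"[A-Z]{3,}")               # more than two uppercase letters in a row
SINGLE_LETTER_RE = re.compile(r"(?<!\w)[A-Z](?!\w)")  # an isolated uppercase letter
DIGIT_RE = re.compile(r"\d")                          # criterion (iii)


def has_non_latin(entry):
    """Criterion (iv): True if any alphabetic character is not a Latin-script letter."""
    return any(
        ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
        for ch in entry
    )


def flag_entry(entry):
    """Return the set of reasons for which an entry would be flagged for removal."""
    reasons = set()
    if ACRONYM_RE.search(entry) or SINGLE_LETTER_RE.search(entry) or "." in entry:
        reasons.add("acronym")
    if DIGIT_RE.search(entry):
        reasons.add("digit")
    if has_non_latin(entry):
        reasons.add("non-latin")
    return reasons
```

Because an entry may satisfy several patterns at once (e.g. Web 2.0 contains both a period and digits), the function returns a set of reasons rather than a single label, which mirrors the overlap between criteria noted above.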

Once the entries above were discarded, the next step consisted of removing any compound

composed of more than two lexical words (following the criteria outlined in section 3.3.1). Any

29

Regular expressions are sequences of symbols that allow for the selection of character strings based on specific patterns.

entry containing 5 words or more was immediately discarded as these compounds necessarily

consist of more than two semantically full words (326 entries). I then removed any 4-word entry

that followed a similar pattern (e.g. chien-guide d’aveugle, laine lavée à chaud), but kept those

entries in which two words were linked by a preposition followed by a determiner (e.g. histoire

de l’entreprise, vente à l’événement), as already discussed in section 3.3.1 (513 entries).

All 3-word entries that did not satisfy the previously mentioned criteria were also discarded.

Thus, words such as huile à broche and agence d’architecture were retained, while entries like

intervalle éclair-son and parabole semi-cubique were discarded. Distinct prefixes, that is to say

those that have not been fused (e.g. parabole semi-cubique), were treated as lexemes at this

stage and were therefore discarded if they resulted in 3-word constructions (490 entries)30.

When there was any doubt as to whether a lexeme was in fact a prefix, the Petit Robert 2010

was consulted.
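
The word-count criteria applied in this step can be summarized as a small decision function. The Python sketch below is only an approximation of the procedure, which in practice was carried out in Google Refine with manual verification. The word lists are hypothetical and far from exhaustive, and the function names are introduced here for illustration.

```python
import re

# Hypothetical, non-exhaustive word lists used only for this illustration.
PREPOSITIONS = {"à", "de", "d", "en", "par", "sans", "sur", "pour", "dans"}
CONTRACTIONS = {"du", "des", "au", "aux"}  # preposition fused with a determiner
DETERMINERS = {"le", "la", "l", "les"}


def tokenize(entry):
    """Split on whitespace, hyphens and apostrophes (straight or curly)."""
    return [t for t in re.split(r"[\s\-'’]+", entry.lower()) if t]


def is_binary_candidate(entry):
    """True if the entry can contain at most two semantically full words."""
    tokens = tokenize(entry)
    if len(tokens) == 2:
        return True
    if len(tokens) == 3:
        # X + linking preposition + Y (e.g. huile à broche)
        return tokens[1] in PREPOSITIONS | CONTRACTIONS
    if len(tokens) == 4:
        # X + preposition + determiner + Y (e.g. histoire de l'entreprise)
        return tokens[1] in PREPOSITIONS and tokens[2] in DETERMINERS
    return False  # single words and entries of five words or more
```

Note that unfused prefixes such as semi- count as separate tokens in this sketch, so that parabole semi-cubique is rejected as a 3-word construction, in keeping with the treatment described above.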

Following these exclusions, the candidate list was thus reduced to 12,914 entries. There

remained compounds, however, that needed to be discarded because they either consisted of

foreign words or contained prepositions in positions that cast doubt on their classification as

nominal compounds. There were also at least 194 duplicate entries because of variants in

spelling (e.g. belle-mère ~ belle mère), but these redundancies were merged into a single entry

according to the form that Wiktionary treated as the base (i.e. whichever entry the others pointed

to).
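
Spelling variants of this kind can be located by reducing each entry to a normalized key. The following Python sketch shows one way of doing so; the function names are hypothetical, and the choice of which variant serves as the base form (here decided by Wiktionary's own redirections) is not modelled.

```python
import re
from collections import defaultdict


def variant_key(entry):
    """Collapse hyphens and runs of whitespace so that spelling variants share one key."""
    return re.sub(r"[\s\-]+", " ", entry.strip().lower())


def group_variants(entries):
    """Map each normalized key to its spelling variants, keeping only keys with duplicates."""
    groups = defaultdict(list)
    for entry in entries:
        groups[variant_key(entry)].append(entry)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}
```

For example, group_variants(["belle-mère", "belle mère", "trou noir"]) groups the first two entries under the key "belle mère" and leaves trou noir out of the result.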

The simplest way to identify the remaining undesirable candidates was to label each of the

remaining compounds’ constituents with the appropriate lexical categories, a process that was

largely automated, but that nevertheless required a considerable amount of manual input.

3.4 Labeling the Entries

In order to further expand on the information associated with the entries, as well as to remove

any additional undesirable candidates, each individual lexeme needed to have its lexical

30

Many prefixes are in fact fused to other lexemes, making it impossible to identify them automatically. Most of these were discarded later on a case by case basis.

category identified. This time-consuming task was facilitated by a computer-assisted

labeling system developed expressly for this project.

3.4.1 Automatically Assigning Lexical Categories

Labeling the constituents of the nearly 13,000 entries remaining in my data would prove

difficult and very time-consuming. Fortunately, Wiktionary’s API allows for the direct

extraction of information associated with each one of its dictionary entries. Unfortunately, this

API does not allow for the targeted extraction of lexical information, such as lexical category,

gender, definition, etc. I therefore wrote a series of APIs able to request and extract specific

information from within the text of a given dictionary entry. A simplified version of this parser

is available to the public on my personal website31. It supports both English and French

Wiktionary databases natively, can easily be adapted for other languages, and has been designed

to work with either the Web version of Wiktionary or a local copy of its database.

Wiktionary’s lexical categories for French lexemes are labeled within pairs of curly brackets. If

a word has multiple entries of the same lexical category, then “num” is used to distinguish

between acceptations of a word. When an entry is in fact an inflected form, “flex” is used within

the tag. Notice that the language is also identified:

(22) a. {{-nom-|fr}} = French noun

b. {{-verb-|fr}} = French verb

c. {{-adj-|fr}} = French adjective

d. {{-adv-|num=1|fr}} = First acceptation for a French adverb

e. {{-flex-adj-|fr}} = Inflected French adjective

Because Wiktionary has adopted a very “loose” standard for tagging words (presumably, to

facilitate contributions by laypersons), there is often a considerable amount of variation in the

format of these tags. This variation required that the parser be flexible in its identification of

category tags. For example, the following pairs of labels are functionally equivalent, despite

their superficial differences:

31

<http://www.igrec.ca/projects/wikparser>

(23) a. {{-nom-flex-|fr}} = {{-flex-nom-|fr}}

b. {{-adj-|fr|num=1}} = {{-adj-|num=1|fr}}

The first API function is a simple aggregation of lexical categories for a given lexeme. In other

words, a lexeme is fed to the API, which then scrapes the Wiktionary entry for that lexeme and

returns every possible lexical category associated with that particular word. Because I was not

interested in the number of acceptations for a given word, the parser fuses any redundant labels

(i.e. {{-nom-|num=1|fr}}{{-nom-|num=2|fr}} returns {{-nom-|fr}}). This function was used to

tag the first word of a compound. Because the first word was treated in a context-independent

fashion, very little could be done to improve the accuracy of the automatic labeling without

incorporating a probabilistic model (i.e. the function returns multiple lexical categories for many

words). This first pass therefore required that the results be refined manually.
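
The tolerance for tag variation illustrated in (23) and the fusing of redundant labels can be sketched as follows. This Python fragment is not the parser referred to above: the regular expression, the function names and the canonical output format are assumptions made for the purpose of illustration, and section templates that are not lexical categories are not filtered out.

```python
import re

# Matches tags such as {{-nom-|fr}}, {{-adv-|num=1|fr}} or {{-nom-flex-|fr}}.
TAG_RE = re.compile(r"\{\{-([^|{}]+)-\|([^{}]*)\}\}")


def normalize_tag(name, params):
    """Return a canonical tag, ignoring "num" and the position of "flex"."""
    parts = [p for p in name.split("-") if p]
    category = next((p for p in parts if p != "flex"), None)
    lang = next((p for p in params.split("|") if p and "=" not in p), None)
    if category is None or lang is None:
        return None  # malformed tag: skip it
    prefix = "flex-" if "flex" in parts else ""
    return "{{-" + prefix + category + "-|" + lang + "}}"


def lexical_categories(wikitext, lang="fr"):
    """List each distinct category tag for the given language, in order of appearance."""
    seen = []
    for name, params in TAG_RE.findall(wikitext):
        tag = normalize_tag(name, params)
        if tag is not None and tag.endswith("|" + lang + "}}") and tag not in seen:
            seen.append(tag)
    return seen
```

Under these assumptions, both members of each pair in (23) are reduced to the same canonical tag, and two numbered noun sections yield a single noun label.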

The second lexeme in a two-word compound, however, does benefit from some context:

because we are strictly interested in nominal compounds, only a subset of lexical categories is

likely as the second word (W2) for a given first word’s (W1) category. For instance, if W1 is an

adjective, W2 will most likely be a noun.32 Because Mathieu-Colas’s (1996) typology is far too

granular for the development of an automatic labeling script (nearly 700 distinct categories that

allow for practically every possible combination of lexical categories), I instead relied on the

much smaller compound typology proposed in Fradin (2009). This classification is reproduced

in Table 3.3 on the following page. According to Fradin’s nomenclature, PSTPT stands for past

participle, PRSTPT for present participle, and PTCP for simply participle. The bolded letters

indicate the category of the resulting compound; the grey boxes are used to highlight the

combinations that generate nominal compounds33.

32

W2 could also be either a verb (e.g. un bas-voler) or an adverb (e.g. un haute-contre), but these combinations are far less frequent than A+N nominal compounds. The function, as described, is not meant to be 100% accurate, but rather to reduce the amount of time required to manually label a constituent’s lexical category.

33

The examples in the table are those supplied by Fradin (2009). Despite the fact that his only examples of Adv N compounds are fused lexemes, these combinations remain valid for my data as I have a number of similar entries separated by a hyphen (e.g. arrière-plan, haut-parleur).

Table 3.3. Fradin’s (2009: 420) categorial distribution for French compounds.

W1    W2    Result        Examples
N     N     N             prêtre-ouvrier, poisson-chat, jupe-culotte
N     A     N             coffre-fort, guerre-froide
N     A     N < PSTPT     chassé-croisé, roulé-boulé
N     V     V             maintenir, saupoudrer
N     ADV   *
A     N     N             basse-cour, beaux-arts
A     N     N < Vinf      franc-parler, faux marcher
A     A     A             aigre-doux
A     A     A < PTCP      nouveau-né, faux-fuyant
A     V     *
A     ADV   *
V     N     N             brise-glace, tire-bouchon
V     A     N             gagne-petit, pète-sec
V     V     V             saisir-arrêter
V     ADV   N             couche-tard, passe-partout
ADV   N     N             malchance, malheur
ADV   A     A             malpropre, bienheureux
ADV   A     A < PRSTPT    moins disant, malvoyant
ADV   V     V             maltraiter, bienvenir
ADV   ADV   _

Regarding compounds joined by a linking unit (e.g. boule de neige, moulin à vent), Fradin

(2009) does not include them in his typology because “they are instantiation [sic] of the

syntactic structure [N PP]NP, a noun phrase consisting of a head followed by a PP complement”

(419). This analysis is, of course, also true of other compounds listed in the table (i.e. AN, NA,

AA < PTCP, V ADV), but he admits that setting these aside in his typology would entail a

revision of the notion of compounding, an undertaking he does not wish to tackle in his article.

As I stated earlier in this chapter, all of these constructions (including N Prep N) are treated as

compounds in the present study, following the works of, among others, Bauer (1978), Gross

(1988) and Mathieu-Colas (1996).

Taking these characteristics into account, the API function written to identify the lexical

categories of W2 in a binary compound is based on the following rules, where the label on the

left indicates the grammatical category of W1 and the list on the right the possible categories for

W2:

(24) a. N => {{-(flex)-nom-|fr}}, {{-(flex)-adj-|fr}}, {{-flex-verb-|fr}}

b. A => {{-(flex)-nom-|fr}}, {{-verb-|fr}}

c. V => {{-(flex)-nom-|fr}}, {{-(flex)-adj-|fr}}, {{-adv-|fr}}

d. Adv => {{-(flex)-nom-|fr}}

In the data extracted from Wiktionary, there are a number of compounds where W1 is in fact a

preposition (e.g. après-midi, hors-sujet). The bulk of these compounds seem to be Prep-N

combinations, but because Mathieu-Colas (1996) lists other possible categories for W2, the

following rule was added to the lexical categories function:

(25) Prép => {{-(flex)-nom-|fr}}, {{-(flex)-adj-|fr}}, {{-(flex)-verb-|fr}}

The API function simply compares a given word’s categories as listed in Wiktionary with the

possible set stated by the rule and returns the results of the intersection. For instance, the lexeme

CHIEN lists both noun and adjective as attested categories, but if W1’s category is A[djective],

the function returns only {{-nom-|fr}} as the category for CHIEN. A similar function with rules

specific to those compounds linked together by prepositions was also written and used to

automatically label compounds such as rouge à lèvres and table des matières. The process is

largely inspired by a simple probabilistic bi-gram model (Manning and Schütze 1999), although

no probabilities are in fact used. Despite the limited predictive functionality of these rules, this

approach was nevertheless able to improve considerably the accuracy of the automatic labeling,

which meant that cleaning up the results took far less time than it did for those of W1. Some

work was still needed to ensure that words had not been mislabeled, as the

intersected values often produced two or more possibilities. On average, however, this approach

yielded a third as many lexical category groups for W2 (for NN) and W3 (for N à N) as it did for

W1.
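
The intersection performed by this function can be illustrated with a short Python sketch. The rule table transcribes (24) and (25), with the simplification that inflected ("flex") variants are not distinguished from their base categories; the names W2_RULES and w2_categories are hypothetical.

```python
# Possible W2 categories given W1's category, transcribed from (24) and (25).
# Simplification: inflected ("flex") variants are not distinguished here.
W2_RULES = {
    "N": {"nom", "adj", "verb"},
    "A": {"nom", "verb"},
    "V": {"nom", "adj", "adv"},
    "Adv": {"nom"},
    "Prép": {"nom", "adj", "verb"},
}


def w2_categories(w1_category, attested):
    """Return the attested categories of W2 that the rule for W1 permits."""
    return set(attested) & W2_RULES[w1_category]
```

With CHIEN attested as both noun and adjective, w2_categories("A", {"nom", "adj"}) retains only "nom", whereas w2_categories("N", {"nom", "adj"}) retains both categories and therefore still calls for a manual decision.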

3.4.2 Which Lexical Category?

One of the major methodological quandaries associated with the identification of the lexical

categories for a given compound’s constituents has to do with the independent status of some

lexemes. For instance, while it can be said that the French word haut is primarily an adjective, it

can also be used as a noun (e.g. “Perché sur le haut d’un arbre”). These two particular

acceptations can also be observed within compounds:

(26) a. haut de forme → A de N (= ‘un chapeau qui est haut de forme’)

b. haut de chausses → N de N (= ‘le haut de la chaussette’)

While the examples above may seem relatively uncontroversial to some, there are other cases

where the assignment of a lexical category is not nearly as simple. Mathieu-Colas (1996)

attempts to reduce the potential for confusion, and perhaps even disagreement, by labeling the

compound’s constituents according to the independent lexical categories of the individual

lexemes and not according to the roles they play within the compound, but states that the latter

will nevertheless be retained in some form or another (121). For instance, haut de chausses in

(26b) is labeled A=n de N according to his methodology: the category on the left of the equal

sign indicates the lexeme’s contextually independent category, while the label on the right

specifies the lexeme’s category within the compound. In the case of adjectives, only those that

cannot easily be nominalized are labeled as such without any secondary category (e.g.

clair-obscur → AA; Mathieu-Colas 1996: 121); otherwise, they are labeled according to the process

described above. This approach, however, presumes that a lemma has a primary lexical category

and that the process of identifying this category is a straightforward one. It can be argued that

verifying the etymology of a compound’s constituents in order to confirm which lexical

category should be treated as primary is a difficult and lengthy task, one that can sometimes lead

to the assignment of labels according to the personal biases of the researcher.

Other cases, however, prove even more troublesome, such as in the case of inflected verbs that

behave as either nouns or adjectives within a given compound. In Table 3.3 presented earlier,

for instance, Fradin (2009) identifies such cases similarly to Mathieu-Colas (1996), that is to say

by labeling the lexeme with both its lexical category within the compound, as well as its

independent category:

(27) a. un nouveau né → Fradin (2009): A A < PTCP

→ Mathieu-Colas (1996): A (ou A=adv) / Pp

b. un faux-fuyant → Fradin (2009): A-A < PTCP

→ Mathieu-Colas (1996): A (ou A=adv) / Pprés

Setting aside Mathieu-Colas’s indecisiveness with regard to the lexical category of the first

constituents, the compounds in (27) clearly pose a certain number of methodological problems.

For one, the constituents’ grammatical status within each compound is not entirely clear. In

other words, are né in (27a) and fuyant in (27b) adjectives by way of participles (as they are for

Fradin), or are they simply participles (as they are for Mathieu-Colas)? If we turn to another

similar compound, premier venu, the situation becomes even thornier when we look up venu in

the digital version of Le Petit Robert 2010 (henceforth LPR2010). The dictionary assigns both

adjective and noun as possible lexical categories for venu, citing compounds such as nouveau

venu and premier venu as justification for these labels. Similarly, LPR2010 labels né as an

adjective, again citing compounds as evidence of this category. It is not at all clear, however, if

there is independent evidence for these categories outside of the compound (i.e. ?un venu, ?un

animal né est...). The fact that other dictionaries provide different lexical categories for the same

lexemes only exacerbates the situation (Antidote RX v.5, for instance, lists venu and né as past

participles only). In order to ensure that the identification of a lexeme’s lexical category is both

consistent and accurate, the methodological approach adopted must be based on a relatively

small set of rules that is easily replicated.

The method for labeling a compound’s constituents that has therefore been adopted for this

work is similar to some degree to that of both Mathieu-Colas and Fradin, but with a somewhat

more rigid set of principles. First, a compound’s individual constituents will be labeled

according to their status within the compound if such a lexical category is attested elsewhere for

that particular lexeme. By attested, I mean identified as a possible category in LPR2010. If a

particular lexical category is not possible for a lexeme, then it is labeled according to its

independent category. The compounds haut de forme and haut de chausses are therefore labeled

in my data as they are in (26).

There is, however, an exception to this practice, related in some fashion to the examples given in

(27). These are mostly cases where participles could be labeled as either a noun or an adjective

(by definition, a participle is in fact an inflected verb that functions as either one of those

categories). The reason for this exception stems from the manner in which LPR2010 identifies

the lexical categories of lexemes. In short, past participle and present participle are not listed

categories in LPR2010 (but they are in the TLFi and Antidote RX, for instance). I will therefore

label them Pprés or Ppass according to the previous guidelines, that is to say unless their role

within the compound is clearly attested elsewhere. In the case of premier venu, for instance,

although LPR2010 lists N as a possible category, it does so by virtue of its presence in

compounds such as premier/dernier venu. For this reason, these compounds will be labeled as N

Ppass constructions because N is not truly an independent category for the lexeme venu. This

approach does not seem all that controversial given that compounds such as chasse-ennui and

garde-frontière are traditionally identified as V-N compounds, even though these particular

word forms are not listed as such in LPR2010. If we return to the examples in (27), reproduced

in (28) below, the compounds are then labeled as follows:

(28) a. un nouveau né → A Ppass

b. un faux-fuyant → A Pprés

This principle also allows for Pprés or Ppass to be used in cases where the first constituent is a

noun and the second is most likely a participial adjective, such as in the following constructions:

(29) a. menu déroulant, poisson volant

b. yack grognant, navire quittant

In the case of the compounds in (29a), the right-most constituents are listed in LPR2010 as

adjectives, which results in a label of N A for those compounds. This method is also supported

by the fact that there exist similar constructions in which the constituents agree in gender (e.g.

barre déroulante, soucoupe volante), which would indicate that they are simply adjectives and

should be labeled as such. In the case of the compounds in (29b), however, the second

constituents are not listed as lexemes in LPR2010. This would therefore allow for them to be

labeled as Pprés, but this may potentially result in some inconsistencies across the data. For

instance, while LPR2010 does not list grognant as a lexeme, it seems to be a perfectly

acceptable adjective because we can easily find cases where it agrees in gender within NPs (i.e.

a Google search for “la * grognante” generates over 16,000 results, such as in la chèvre

grognante, l’horloge grognante). In order to avoid generating irregularities in the data, these

lexemes will therefore be labeled as adjectives if they are attested elsewhere, such as on Google.

In the case of navire quittant, also in (29b), a Google search reveals that quittante is nearly non-

existent (i.e. the search string “la * quittante” returns 9 results). This compound will thus be

labeled N Pprés.

Although the above guidelines are a good starting point, a few more principles need to be added

to ensure that the labeling of lexical categories remains coherent across the data. There are a

number of noun-initial two-word compounds where the second constituent could be labeled as

either a noun or an adjective (e.g. raton laveur, ver rongeur). Again, all decisions are based on

LPR2010. If a lexeme is identified as a noun in its heading, but not as an adjective, it is labeled

as a noun; if, however, LPR2010 lists both adjective and noun as possible lexical categories for

a lexeme, the adjective label is retained if it is plausible that its function within the compound is

adjectival in nature:

(30) a. raton laveur → LPR2010: laveur, euse n. → N N

b. ver rongeur → LPR2010: rongeur, euse adj. et n. → N A

c. centre automobile → LPR2010: automobile adj. et n.f. → N A

There are, however, a number of borderline cases, lexemes that are identified solely as nouns in

LPR2010, but that have adjective identified as a lexical category somewhere within the entry.

Many of these cases are said to be adjectives based solely on their usage within a compound:

(31) mouche piqueur → LPR2010: piqueur, euse n.

While such a case could be labeled according to its secondary lexical category, there are a few

reasons why I chose to label them according to my original principle, that is to say according to

the attested lexical categories identified via the entry’s heading (the compound in 31 being

treated as an instance of NN). First, as is the case in (31), there is no gender agreement between

the two lexemes (?mouche piqueuse), a morphosyntactic operation that would normally apply if

the second lexeme were in fact an adjective. This is not, however, an infallible indicator, as we

see in the case of baleine tueuse: although there is gender agreement, LPR2010 does not list

tueur as an adjective, neither in its heading, nor within its entry. In this case, tueuse is therefore

labeled as a noun, albeit as an inflected form; accordingly, the compound in (31) is also treated

as an instance of NN.

A second factor at play stems from the number of inconsistencies across entries in LPR2010,

incongruities that cast doubt on the status of the lexical categories only listed within the

dictionary article (and not in the heading). For one, there are a number of similar constructions

that LPR2010 labels simply as nouns in apposition and not fringe cases of NA compounds (as

opposed to the example in 31 for instance):

(32) a. éclair n.m.

b. détaillant, ante n.

LPR2010 is also occasionally hesitant to assign a second lexical category within the entry when

there is room for debate, often opting for more than one possibility:

(33) a. pêcheur, euse n.

b. surprise n.f.

What the above examples show is that similar nouns are sometimes labeled differently within

LPR2010: in some constructions, lexemes are said to be adjectives even if they are not labeled

as such in the article heading (as in 31), while in others, they are said to be nouns apposed to one

another (32a-b). In other cases still, nouns are said to be either adjectives or apposed nouns

(33a-b). This disparity from one lexeme to another leads me to believe that it is justified to

identify a lexeme’s category based solely on those listed in the lexeme’s heading and not based

on those found within its entry.

To conclude this section, let us summarize the set of criteria used to label each compound’s

individual constituents:

(34) Criteria used to identify the lexical categories of a compound’s constituents

For a given compound AB:

i. A and B are assigned the lexical categories that best correspond to their roles within

the compound.

ii. Said lexical categories must be independently attested for both A and B outside of

the compound.

iii. If the lexical category that best corresponds to the role of A or B within the compound

is unattested outside of the compound, then that constituent is labeled according to the most

prominent (i.e. the first) of its independently attested lexical categories as listed in

LPR2010.

iv. Attested lexical categories are only those listed in a lemma’s heading in LPR2010.

v. If A or B is a participle, it is labeled as such only under the following circumstances:

a. It is unclear whether the lexeme’s function within the compound is nominal

or adjectival or it is clearly neither; or

b. The lexical category listed in LPR2010 for the lexeme is motivated solely by

its presence within similar compounds; or

c. The lexeme is adjectival in nature and is neither listed as a lexeme in

LPR2010, nor attested elsewhere (i.e. cannot be found inflected on Google).
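The core of criteria (34i-iv) can be sketched as a small lookup routine. This is only an illustrative sketch: the function name, the category abbreviations, and the toy heading data are assumptions made for the example, not the software used for this thesis, and the participle rule in (34v) is left out because it calls for case-by-case judgment.

```python
# Hypothetical sketch of criteria (34i-iv); not the software used in the thesis.

def label_constituent(role_category, heading_categories):
    """Return the lexical category assigned to one constituent.

    role_category: the category that best fits the constituent's role
        within the compound (criterion i).
    heading_categories: the categories listed in the lemma's heading,
        in dictionary order (criterion iv); must not be empty.
    """
    if not heading_categories:
        raise ValueError("constituent has no attested category")
    # Criterion ii: the category must be attested outside the compound.
    if role_category in heading_categories:
        return role_category
    # Criterion iii: otherwise fall back on the first listed category.
    return heading_categories[0]


# Toy heading data, invented for illustration only.
headings = {
    "choc": ["N"],
    "mère": ["N", "A"],
}

# If "choc" plays an adjective-like role but only N is listed in its (toy)
# heading, the constituent is labeled N.
print(label_constituent("A", headings["choc"]))  # N
print(label_constituent("N", headings["mère"]))  # N
```

The fallback in the last line of the function is what keeps a label such as N N for a compound whose second noun behaves like a modifier.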

To further illustrate this method, the reader is asked to refer to the following

examples. Although the list is not meant to be exhaustive, it nevertheless gives a good idea of

the results obtained.

(35) a. lave-vaisselle, garde robe, porte-clé → V-N

b. clair-obscur, douce-amère, grand largue → A-A

c. franc-parler, faux-ami, beaux-arts → A-N

d. menu déroulant, fait accompli, point cardinal → N A

e. raton laveur, avantage choc, carte mère → N N

f. bon d’achat, condamné à mort, barbe à papa → N Prép N

g. galant de nuit, fort en thème, haut en couleur → A Prép N

h. bon à tirer, prêt-à-porter, prêt-à-poster → A Prép V

i. navire quittant, chat huant, lanceur partant → N Pprés

j. laissé pour compte, achevé d’imprimer → Ppass Prép N

3.4.3 Cleaning Up the Remaining Data

Labeling each compound’s constituents for their lexical categories allowed for a second round

of candidate exclusion. According to the criteria outlined in Section 3.3.1, a number of

compounds were discarded either for being loanwords in their entirety (e.g. eng. fast food, viet.

banh canh, lat. ecce homo) or for having at least one loanword as a constituent (e.g. mass-média,

a(c)qua-toffana). Compounds containing highly technical terms, and thus without any

corresponding entry in Wiktionary, were also removed (e.g. myxosomose des salmonides,

leucanie du roseau). Several compounds containing technical terms (e.g. acide cévadique)

remain in the data, however, because the constituents are listed items in Wiktionary and

therefore return a lexical category during the automatic labeling process. One such case is that of

compounds based on acide as the head noun (acide carboxylique, acide hypochlorique, acide

pneumique, etc.): of the 134 such compounds, 25 were retained. This selection was once again

done by consulting the corresponding entries in LPR2010. Any compound whose W2 adjective (e.g.

carboxylique) is not listed in the dictionary was removed from the data. This was arguably an

acceptable means of reducing the number of technical compounds: a lexeme that is unlisted in

LPR2010 is most likely not a widely used term and is therefore relatively unknown to all but

those who require the use of these highly technical terms. This was done only with the base

noun acide because it was such an egregious outlier in terms of patterning (the second most

frequently recurring noun in W1 position, coup, occurs 67 times, or less than half of the occurrences

observed for acide). Other compounds still were discarded because they contained non-words,

that is to say words that can only be found within a given compound (e.g. stil de grain, tchic et

tchac, porte cochère). Finally, any remaining constructions containing single letters, in most

cases abbreviations, were also removed (e.g. p-acétylaminophénol, n-ième).
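The last of these filters, the removal of constructions containing single letters, lends itself to a short sketch. The helper below is hypothetical and rests on one assumption that should be stated plainly: only unaccented single letters are treated as abbreviations, so that the preposition à in N à N compounds is never flagged.

```python
import re

# Hypothetical helper: flags compounds containing a constituent that is a
# single unaccented letter (e.g. the "n" of "n-ième"). The preposition "à"
# is accented and therefore does not match.
SINGLE_LETTER = re.compile(r"^[A-Za-z]$")

def has_single_letter(compound):
    # Split on hyphens and whitespace only; apostrophes are left alone so
    # that elided forms such as "d'achat" stay in one piece.
    tokens = re.split(r"[-\s]+", compound)
    return any(SINGLE_LETTER.match(tok) for tok in tokens)

candidates = ["p-acétylaminophénol", "n-ième", "moulin à vent", "bon d'achat"]
kept = [c for c in candidates if not has_single_letter(c)]
print(kept)  # ['moulin à vent', "bon d'achat"]
```

A filter of this kind only proposes candidates for exclusion; any legitimate one-letter word would still need to be checked by hand.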

Furthermore, a number of entries listed as either nominal locutions or common nouns in

Wiktionary were not in fact nominal constructions at all. This mislabeling of expressions is

clearly evident from within Wiktionary entries themselves:


(36) a.

b.

It is obvious by looking at the example sentences provided by users for these particular entries

that the constructions are mislabeled and that the expressions in (36) are phrases rather than

nouns. These candidates and any similar entries were therefore also discarded.

There were also nearly 400 duplicate entries. The presence of these redundant compounds can

be traced back to variations in spelling, mostly because of a hyphenated form (e.g. écart-type ~

écart type, châtaigne de mer ~ châtaigne-de-mer, etc.). These entries were verified individually

against the data in Wiktionary. Any compound listed as a variant was discarded, but its

orthographic form was retained as an alternative spelling for the main entry. In some cases,

two variants were given two unrelated definitions in Wiktionary (e.g. bonne grâce ~ bonne-

grâce, boeuf carotte ~ boeuf-carotte, etc.). Both forms were retained as separate entries.
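One way to surface such spelling variants is to group entries under a key that ignores hyphenation. The following is a minimal sketch under that assumption; the function names are illustrative, and the grouping only proposes candidates, since pairs such as bonne grâce ~ bonne-grâce still have to be verified individually.

```python
from collections import defaultdict

def spelling_key(form):
    # Treat hyphens as spaces and ignore case and repeated whitespace.
    return " ".join(form.replace("-", " ").lower().split())

def group_variants(forms):
    groups = defaultdict(list)
    for form in forms:
        groups[spelling_key(form)].append(form)
    # Only keys shared by more than one form are potential duplicates.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}

forms = ["écart-type", "écart type", "châtaigne de mer",
         "châtaigne-de-mer", "carte mère"]
for key, variants in group_variants(forms).items():
    print(key, "->", variants)
```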

The data also contained a number of duplicate entries based on number, that is to say, many

entries were present in both their singular and plural forms (e.g. petite annonce ~ petites

annonces). A total of 115 inflected compounds were removed because an uninflected variant

was already present. This task was done using the clustering techniques available in Google

Refine, where similar strings of text were identified and grouped together. Each case was then

examined manually and any plural form considered redundant was removed from the data.
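The general idea behind this kind of grouping can be illustrated with a deliberately crude key. The sketch below is not the clustering algorithm implemented in Google Refine; it simply assumes that stripping a final -s or -x from every word is enough to bring singular and plural candidates together for manual inspection.

```python
from collections import defaultdict

def number_key(form):
    # Crude normalisation: drop one final -s or -x from every word.
    # This over-generates (e.g. "souris" -> "souri"), which is harmless
    # here because the key is only used to propose candidate pairs.
    words = form.lower().split()
    return " ".join(w[:-1] if len(w) > 1 and w[-1] in "sx" else w
                    for w in words)

def candidate_clusters(forms):
    clusters = defaultdict(set)
    for form in forms:
        clusters[number_key(form)].add(form)
    return [sorted(c) for c in clusters.values() if len(c) > 1]

forms = ["petite annonce", "petites annonces", "chauve-souris", "carte mère"]
print(candidate_clusters(forms))  # [['petite annonce', 'petites annonces']]
```

Because the key is so permissive, each proposed cluster would still need the manual check described above.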


All in all, according to the above criteria, nearly 1,800 additional entries were set aside

following the labeling of lexical categories, which resulted in a total dataset of 10,410

compounds, all of which are stored in the database available online. There remained, however,

one final set of tags to be added to the data: gender and number. This was once again done using

software I wrote to extract the information from Wiktionary, but with an additional set of

functions capable of accessing Mathieu-Colas’s (1996) own database of compounds.

3.4.4 Gender and Number

In order to complete the basic set of information associated with each entry in the database (i.e.

all non-semantic related information), I needed to label each compound for both gender and

number. This information is included in Mathieu-Colas’s online database of compounds34.

Although the database of nominal compounds developed by Mathieu-Colas offers no way of

accessing its contents in bulk (only 10 entries at a time are available; see Figure 3.1 on the next

page for an example of the search output), the platform was built using the very common

programming language PHP. This means that posting variables to its internal script is in fact

possible, which thus allows for users to submit values to its search engine outside of the

system’s Web enabled interface. I therefore wrote a small script that takes each entry in my

dataset and feeds it to the MC search engine, which returns an HTML page. The script then parses

the page, compares the information associated with the entry, and determines whether the

given compound is masculine or feminine, as well as its number, which may have one of three

values: sing, pl, or invar.
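The first half of this procedure, submitting a value to the search script outside of its Web interface, might look like the sketch below. It is not the script written for this thesis: the form-field name is a placeholder, since the real name would have to be read from the page's HTML form, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL of the search page cited in the text. The form-field name below is a
# placeholder: the real name would have to be read from the page's HTML form.
SEARCH_URL = "http://www-ldi.univ-paris13.fr/ODNC/moc.php"
FIELD_NAME = "query"  # hypothetical

def build_search_request(compound):
    # Supplying a request body makes urllib issue a POST instead of a GET.
    body = urlencode({FIELD_NAME: compound}).encode("utf-8")
    return Request(SEARCH_URL, data=body)

req = build_search_request("café-filtre")
print(req.get_method())  # POST
print(req.data)          # b'query=caf%C3%A9-filtre'
```

Sending the request (for instance with `urllib.request.urlopen`) and parsing the returned page are omitted here, since both depend on the layout of the live site.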

34

<http://www-ldi.univ-paris13.fr/ODNC/moc.php>


Figure 3.1. The results screen for the compound café-filtre in Mathieu-Colas’s database.

The most interesting result to come out of this process, however, has to do with the very limited

overlap of entries between my dataset and that of Mathieu-Colas. Of the approximately 10,400

compounds remaining in my database, only 2,450 are also present in Mathieu-Colas’s data,

meaning the latter’s set could only be used to label a small fraction of the compounds contained

in my own (approximately 22%). This discrepancy is primarily due to the fact that Mathieu-

Colas’s database only contains hyphenated compounds. If we take this into account, the

resulting overlap is slightly improved: there are approximately 3,500 hyphenated compounds in

the data collected from Wiktionary, which results in a 70% correspondence between data sets.

Yet, the results also show that there are at least 9,000 compounds in Mathieu-Colas’s database

that are not present in my own. It is unfortunately difficult to determine what the cause of this

discrepancy is without gaining direct access to Mathieu-Colas’s original data.

Because a mere 22% of my total compounds were also found in Mathieu-Colas’s database,

I had to modify my parser to extract the gender and number for each compound from

Wiktionary. Of the remaining 9,000 compounds, a little over 4,000 contained information on the

compound’s number in Wiktionary and approximately 7,000 contained information on its

gender. This meant that quite a few of the remaining entries needed to be labeled manually for

both number and gender. In the case of number, the remaining entries were labeled using pattern

matching for regularly inflected plurals (i.e. -s and -x suffixes). As for the gender of the


unlabeled entries, knowledge of French compounds’ headedness helped tremendously. For

instance, N A and A N compounds can almost all be labeled for gender according to the nominal

constituent’s gender. As for N Prép N compounds, which accounted for more than a third of the

remaining entries, they are mostly left-headed (Bauer 1978), which meant that they could be

labeled automatically by extracting the gender of the first constituent from Wiktionary. The

remaining cases were all tagged manually.
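The headedness-based shortcut described above can be summarised in a few lines. This is a hedged sketch rather than the actual tagging routine: the pattern labels follow the text, but the function name and the small gender table (standing in for the genders extracted from Wiktionary) are assumptions.

```python
# Hypothetical sketch: the pattern labels follow the text, but the gender
# lookup table is toy data standing in for the Wiktionary extraction.
NOUN_GENDER = {"menu": "m", "barbe": "f", "arts": "m"}

def infer_gender(constituents, pattern):
    """Guess a compound's gender from its pattern, or return None."""
    cats = pattern.split()
    if cats in (["N", "A"], ["A", "N"]):
        # The single nominal constituent supplies the gender.
        noun = constituents[cats.index("N")]
    elif cats == ["N", "Prép", "N"]:
        # Mostly left-headed: use the first noun.
        noun = constituents[0]
    else:
        return None  # left for manual tagging
    return NOUN_GENDER.get(noun)

print(infer_gender(["menu", "déroulant"], "N A"))        # m
print(infer_gender(["barbe", "à", "papa"], "N Prép N"))  # f
print(infer_gender(["beaux", "arts"], "A N"))            # m
print(infer_gender(["lave", "vaisselle"], "V N"))        # None
```

Returning `None` for every other pattern mirrors the fact that the remaining cases were tagged by hand.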

3.5 Summary

After determining which types of compounds would best lend themselves to a study of semantic

transparency, I extracted more than 18,000 multi-word entries from Wiktionary’s database. The

number of entries was reduced according to several criteria. The result of this work is an online

database containing more than 10,000 nominal constructions composed of at least two lexemes

from several different lexical categories. Each entry contains the following information: lexical

category of its constituents, the gender and number of individual constituents (NN and N à N

compounds only), and the gender and number of the compound as a whole. The database is

searchable according to these features, but other parameters were added later so as to reflect the

research discussed in the following chapters. The reader is invited to visit www.polylexical.com

for a full listing of all compound types extracted. The entire dataset may also be downloaded as

a CSV file for personal use.

Of the thousands of constructions retained for the database, only a small subset are in fact

pertinent to the objectives of the present study. As stated at the close of Section 3.1.1.3, only NN

and N à N compounds are under investigation here. A query of the data retained from

Wiktionary reveals a total of 729 and 319 such types, respectively. These individual compounds

will therefore serve as the basis for the claims and hypotheses made in the remaining chapters of

this thesis, ultimately functioning as the foundation for the typology of semantic transparency

proposed at the close of this work.
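For readers working from the downloadable CSV file, a query of this kind can be approximated as follows. The column names and the sample rows are hypothetical, as the layout of the actual file is not described here; the sketch only shows the counting step.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the downloadable CSV; the column names are hypothetical.
SAMPLE = """compound,pattern
oiseau-mouche,N N
carte mère,N N
moulin à vent,N à N
ver de terre,N de N
"""

def count_patterns(csv_file, wanted=("N N", "N à N")):
    # Tally every pattern, then report only the ones under investigation.
    counts = Counter(row["pattern"] for row in csv.DictReader(csv_file))
    return {p: counts[p] for p in wanted}

print(count_patterns(io.StringIO(SAMPLE)))  # {'N N': 2, 'N à N': 1}
```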


Chapter 4

Compound Meaning: Features and Factors

Earlier in Chapter 2, I focused on how semantic transparency in compounding has traditionally

been understood and suggested that previous approaches could be improved upon. At its most

basic, semantic transparency is usually said to be a matter of compositionality: a compound is

semantically transparent if its meaning is the product of the meaning of its components. While

this position is not unsound, it fails to take into account a number of other characteristics that

have also been used elsewhere to formalize the semantics of compounding. Aware that other

factors are indeed at play, Libben (1998) expands upon traditional models by incorporating

headedness into the basic A+B = C view of compound semantics. My position on the matter is

that, although work like Libben’s represents a crucial and necessary step in the further

development of the concept, these approaches still fall short in their account of semantic

transparency. This chapter therefore seeks to complement previous models by introducing a

number of other factors that may prove useful in establishing the degree of semantic

transparency for a given compound (i.e. its degree of semantic interpretability).

This chapter is organised in the following manner. First, headedness (or centricity) is discussed

in Section 4.1 with an emphasis on how the head contributes meaning to the whole. Following

this is a brief discussion in Section 4.2 of compositionality and how the term is used in this

work. Finally, Section 4.3 explores semantic homogeneity, focusing on analogy and templates

as a possible means to further distinguish between individual compounds.

4.1 Centricity

Compounds, like other morphological and syntactic units, are typically headed constructions,

and identifying the head element is arguably the most crucial step in establishing meaning for a

given compound. According to Baroni et al.’s (2007) framework, head identification is in fact

the first step of this process:


“[I]n its simplest instantiation, interpreting a novel compound requires two interlocked processing steps:

i. identification of the head of the compound,

ii. interpretation of the contextually appropriate processing link between the head and its modifier, be it an argument relation, a property transfer or a conceptual hybridization.” (Baroni et al. 2007: 265)

If this strategy is correct—and there is little reason to believe that it isn’t—then it suggests that

the head element is also an integral factor in a compound’s semantic transparency. This

approach to compound processing is largely backed up by a number of studies that show that

speakers are keenly aware of how a compound is organized internally and that the head element

is a dominant semantic marker for both novel and existing combinations. For instance, while

studies by Gleitman and Gleitman (1971), Ryder (1994), and Štekauer (2005) show that, on

occasion, a speaker will select the wrong head constituent when asked to produce paraphrases

for a novel compound (e.g. “clothes that are worn in the water” for clothes-water; Ryder 1994:

189), the vast majority of participants correctly identify the language-appropriate head element

when providing definitions. This fact is also supported by time-sensitive tasks: Libben et al.

(2003), who measured response times for lexical decision tasks involving compounds like

bedroom and cardshark, found that the latter (which they claim are opaque) produced longer

response times than the former (which they claim are transparent). This difference may

also involve the headedness of the compound, as those producing longer response times had

heads that were unrelated to the meaning of the whole (i.e. a cardshark is not a literal shark).

Although headedness may be, in most cases, a relatively uncontroversial feature of compounds,

it is not without its quirks. The following sections will focus on how compounds may differ

from one another based on their centricity, which is to be understood as the property of a

compound to either possess a head or not.35 Following this discussion, the manner in which

centricity will factor into a typology of semantic transparency will then be made explicit.

35

As we will see in the following sections, this statement might lead one to believe that the issue is easily circumscribed. The notion of head, however, is not without its problems. This will be addressed throughout the remainder of Section 2.


4.1.1 Endocentric Compounds

The traditional view of compound centricity is that compounds either possess a head or they do

not. Following Bloomfield (1933), the former are typically called endocentric, while the latter

are called exocentric (or bahuvrihi in Sanskrit, Burrow 1955)36. I will use the term centricity to

refer to a compound’s status as either endocentric or exocentric.

In semantic terms, the head of a compound is the constituent that defines the conceptual class of

the whole compound. In other words, the head is the hypernym of the thing denoted by the

whole (Bauer 1978). Endocentric compounds have therefore been formalized by Allen (1978)

using an IS-A rule and may be tested in the following manner37:

(37) a. a table saw is a (*table / saw)

b. a credit card is a (*credit / card)

c. a truck driver is a (*truck / driver)

When applied to French compounds, this test reveals that, unlike English, the head in French

compounds is typically the left-most constituent (this will be discussed in greater detail in

Section 4.1.2):

(38) a. NN: un oiseau-mouche est (un oiseau / *une mouche)

b. N à N: un moulin à vent est (un moulin / *un vent)

c. N de N: un ver de terre est (un ver / *une terre)

In terms of formal features, the head element is also typically the constituent from which lexical

category is inherited. In the examples in (37) and (38) above, this is a trivial matter because both

constituents are nouns, but as the following compounds show, when the two constituents are of

different categories, the head does in fact determine the lexical category of the compound:

36

Although exocentricity is typically understood as “possessing a head external to the compound,” this position must be weakened slightly in order to account for cases involving sense extension. This will be explored in greater detail in Section 2.4.

37

In the literature, endocentric compounds are often described in far less rigid terms: AB is a kind of B. According to Arnaud (2008), this is a hyponymic test that sometimes produces less than desirable results (?a police car is a kind/type of car). That said, when the head prototypically denotes a highly general object, a hyponymic test can serve to better establish endocentricity (compare, for instance, lipstick is a stick ~ lipstick is a kind of stick).


(39) a. eng. [[black]A [board]N]N fr. [[coffre]N-[fort]A]N

b. eng. [[sea]N [green]A]A fr. [[vert]A [sapin]N]A

In French, as well as in many other Romance languages, gender also typically percolates from

the head to the compound, but as we will see in Section 4.1.2, this behaviour may be altered

under certain conditions. For the moment, we may state that a compound is endocentric if one of

its constituents determines its conceptual class.

It is also worth noting that speakers may occasionally identify a constituent as the compound’s

head, thus interpreting it as endocentric, even though this may not in fact be the case. As Arnaud

(2008) points out, there may be disagreement between a speaker’s extra-linguistic knowledge

and scientific nomenclature. He uses watermelon as an example, explaining that it is not actually

a melon, but rather a melon-like fruit. This is similar to how speakers might interpret peanut,

which is not in fact a nut, but a legume. While Arnaud claims his informants hesitate when

asked if a watermelon is a melon or a melon-like fruit, suggesting that linguistic intuition may

influence head identification, it remains debatable whether this matters for most speakers

unfamiliar with the scientific taxonomy behind the labels. One can compare this with the lack of

consensus among laymen regarding the classification of tomatoes as either fruit or vegetables.

While analyzing the NN compounds in my data, I came across a number of cases, usually

involving plants, that show similar issues to those identified by Arnaud regarding centricity:

(40) houx-frelon, menthe-coq, laurier-tin

In all cases, the plant was named for its appearance, but was later revealed to

belong to a different genus altogether. The following description from the fifth volume of Cours

complet d’agriculture shows how houx-frelon, for instance, received its erroneous label:

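“Ce n’est point un houx ; la couleur et les épines dont les feuilles sont armées, lui ont mal à propos fait donner cette dénomination. Tournefort le place dans la seconde section de la première classe, qui comprend les herbes à fleur en grelot, dont le pistil devient un fruit mou ; et il l’appelle Ruscus, myrti-folius, aculeatus.” (Rozier et al. 1787, 531) [‘It is not a holly at all; the colour and the spines with which its leaves are armed have wrongly earned it this name. Tournefort places it in the second section of the first class, which comprises herbs with bell-shaped flowers whose pistil becomes a soft fruit; and he calls it Ruscus, myrti-folius, aculeatus.’]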

It seems reasonable to assume, however, that these types of compounds should be treated as

endocentric, as most native speakers unfamiliar with them would no doubt understand them as

such. In other words, to their knowledge, they would have interpreted the compound correctly,

understanding it as a hyponym of the constituent they assumed was the head (the

statement “a watermelon is a melon” would produce a “true” reading). In these particular

instances, it is far more advantageous to acknowledge that such distinctions are simply due to

differences in lexicon (i.e. technical vs. general) and that to the average speaker, compounds

such as peanut or watermelon are in fact endocentric and not exocentric.

4.1.2 Head position

In binary constructions such as those under investigation here, one element serves as the head,

while the other typically plays the role of either modifier or argument (Scalise and Bisetto

2009). Although Williams (1981) originally argued that the morphological head was the

rightmost element of a complex word (the so-called Right Hand Head Rule), it is now largely

accepted that the head is language dependent. The situation is no different for compounds.

Compounds in languages such as English, Dutch, and Chinese, for instance, are mostly right-headed

(Scalise and Guevara 2006; see also Lieber and Štekauer 2009 for a typology of

compounds in various languages), while those in most Romance languages are left-headed

(Baroni et al. 2007 for Italian, Fradin 2009 for French, Rainer and Varela 1992 for Spanish)38.

In my data, French compounds clearly pattern with other Romance languages as 493 of the 564

(~ 87%) NN compounds with a clearly defined head39 are left-headed.

Although French compounds are typically left-headed, right-headed compounds are in fact

possible (see previous references for similar observations for other Romance languages). These

types are, however, exceptional and seem to be entirely consigned to the class of NN

constructions40:

38

According to a survey conducted by Bauer (2001a) for 36 different languages, there seems to be a slight overall preference for right-headed compounds. Likewise, Scalise and Fábregas’s (2010) investigation of compounds from 22 different languages shows a strong preference for right-headedness.

39

These numbers do not include compounds that incorporate a coordination of their elements. See Section 4.1.3 for a discussion of these particular types.

40

In the French compound data collected from Wiktionary, there are no cases of N à N right-headed compounds. This fact is discussed further in Chapter 6.


(41) a. une auto-école est (*une auto / une école)

b. un taupe-grillon est (*une taupe / un grillon)

c. un radio-taxi est (*une radio / un taxi)

Although it is not always clear why some compounds in French are right-headed, we may state

that in most cases, atypical centricity can be traced to one of the following sources:

(42) a. Calque from English: quartier-maître (eng. quartermaster)

b. N1 with affixal functionality: ciné-parc

c. N1 with adjectival functionality: chef-lieu

The fact that French allows for a compound’s head to be either the left or right-most constituent

may produce atypical behaviour, mostly in how gender is assigned. Typically, an endocentric

compound acquires gender from its head constituent. Although most right-headed compounds

behave as they should, which is to say that gender is determined by the right-most constituent,

some show a difference in feature percolation that suggests that gender may, in some cases,

originate from the non-head constituent41. Compare, for instance, the right-headed compounds

in (43a-b) with those in (43c-d):

(43) a. [[taupe]F-[grillon]M]M lit. ‘mole-cricket’

b. [[vélo]M-[école]F]F lit. ‘bicycle-school’

c. [[bracelet]M-[montre]F]M lit. ‘strap-watch’ (wristwatch)

d. [[bateau]M-[école]F]M lit. ‘boat-school’

The compounds in (43a-b) behave as expected: gender percolates from the head, despite its

atypical position. This is in fact how most right-headed compounds in the data behave. In (43c-

d), however, gender appears to percolate from the left-most constituent, even though the head is

on the right. For (43c), it is entirely possible that this compound is in fact left-headed, which

corresponds to Arnaud’s (2003) treatment of it, but the centricity test described earlier makes

41

See Lieber (1980) for a formal description of feature percolation.


this interpretation untenable (i.e. *un bracelet montre est un bracelet)42. This fact is also

supported by the definitions found in most lexicographic works43. It is also worth noting that

plural is marked on both constituents (i.e. bracelets-montres), suggesting that this compound

might in fact be a case of coordination, but this analysis also seems incorrect given the failed IS-

A test described above. It therefore appears to be a case where some features percolate from an

element other than the head. The same may be said of bateau-école in (43d), although this

compound stands in stark contrast with the one in (43b), with which, for all intents and purposes, it

should pattern, yet does not. The disparity between the two analogous compounds might be

explained in a number of ways. First, masculine is often considered the unmarked gender, as

evidenced by exocentric compounds that otherwise contain feminine nouns (e.g. rouge-gorge,

en-tête). Second, it is entirely possible that bateau-école (‘boat school’) is subject to influence

from the homonymous left-headed bateau-école (‘school boat’) because of its non-standard

head position. This seems all the more plausible given that there are a number of results on

Google for the string “une bateau-école,” which suggests that some speakers, aware that the

head is the right-most constituent, assign gender accordingly. That said, the difference in

occurrences is significant enough that it seems unlikely that gender should be said in this case to

percolate from the head constituent44.
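The contrast in (43) can be restated as a simple check on whether a compound's gender matches that of its head. The encoding below is a hand-built illustration using only the four examples and analyses given above; the class and function names are my own assumptions and are not part of the database.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    form: str
    genders: tuple  # genders of the (left, right) constituents
    head: int       # 0 = left-headed, 1 = right-headed
    gender: str     # gender of the compound as a whole

def percolates_from_head(c):
    # True when the compound's gender is that of its head constituent.
    return c.gender == c.genders[c.head]

# The four right-headed compounds in (43), encoded by hand.
examples = [
    Compound("taupe-grillon", ("f", "m"), 1, "m"),
    Compound("vélo-école", ("m", "f"), 1, "f"),
    Compound("bracelet-montre", ("m", "f"), 1, "m"),
    Compound("bateau-école", ("m", "f"), 1, "m"),
]
for c in examples:
    print(c.form, percolates_from_head(c))
```

The check succeeds for (43a-b) and fails for (43c-d), which is exactly the asymmetry at issue.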

Given that French favours left-headed compounds while also allowing rightward heads, but that

right-headed compounds occasionally behave in non-intuitive ways, it seems reasonable to hold

that head position is a factor in compound transparency. Following Scalise and Fábregas (2010),

who argue that head position should be understood as a tendency, and not an absolute, I propose

that endocentric compounds be further characterized according to a language’s canonical and

non-canonical head position45. It is important to add that this parameter is not necessarily fixed

42

It is worth noting that bracelet in this compound is in fact redundant as a watch is usually understood to possess a strap (compare with pocket watch).

43

“Montre montée sur un bracelet de cuir, de métal ou de matière plastique” [‘Watch mounted on a strap of leather, metal, or plastic’] (LPR2010).

44

Google search for “un bateau-école” = 423,000 results; Google search for “une bateau-école” = 977 results.

45

This statement does not preclude the fact that a language may favour neither position, but to my knowledge, no one has reported on such a fact. If such a language did exist, then head-position might be considered a neutral factor in semantic transparency.


for a given language, as specific compound types may favour a particular head position over

another. While left is the canonical head position for NN compounds in French, it is right for A-

N nominal compounds (e.g. beau-père, chauve-souris, longue-vue, etc.)46. We must therefore

also evaluate canonical head based on the dominant position for a given compound type.

The question, of course, is whether speakers are actually aware of this distinction and whether

transparency is at all affected by it. Though not very numerous, there are some studies that lend

support to head position as a factor in compound processing. Most notably, Jarema et al. (1999)

tested priming effects for a variety of compounds and found that reaction times for right-headed

French compounds differed from left-headed ones based on which constituent was primed.

Crucially, the priming effect observed for all compound types was greatest for the initial

constituent, regardless of its transparency, except for right-headed compounds, where priming

the final constituent resulted in a greater priming effect (see Section 2.2.1 for more details on

Jarema et al.’s experiment). Their results suggest that speakers are in fact sensitive to head

position during lexical decision tasks. These results, however, are mitigated by the fact that

Jarema et al.’s stimuli were composed of N-A and A-N compounds, which differ in terms of

canonical head position. A study in which the stimuli consist of constructions that typically

favour the same position for the head would perhaps produce different results, shedding

additional light on how speakers process compounds that follow atypical patterns.

Intuitively, it may nevertheless be argued that compounds with non-canonical heads are

unexpected, which may affect how speakers interpret them. In fact, a novel non-canonically

headed compound will most likely first be understood based on the element in canonical head

position and may only be reanalysed if the first interpretation is deemed impossible or unlikely.

I therefore posit that compounds with canonical heads are more transparent than those with non-

canonical heads, with the understanding that this stipulation remains hypothetical until tested.

Nevertheless, the data suggest that such a distinction should be considered as a possible factor in

a compound’s degree of transparency given that heads are subject to relatively strict positional

restrictions.

46

There are also left-headed A-N compounds, mostly involving colours (e.g. vert sapin, rouge sang). It is unclear if an exception regarding canonical head position should be made for these cases. This is certainly something worth investigating further.

4.1.3 Coordinated Compounds

Further complicating matters regarding centricity are compounds involving the coordination of

their elements. Traditionally called dvandva, these types of compounds are composed of two

lexemes that denote, to varying degrees, two things, aspects, or features of equal status, which,

under certain circumstances, might allow for both constituents to function as heads. They are

typically understood as a conjunction of elements (i.e. and), but may also involve a disjunction

(i.e. or). These particular types of compounds have received a number of different labels over

the years, most of which overlap in non-trivial ways: appositional (Jespersen 1956, Bauer

1978), coordinate or coordinative (Bisetto and Scalise 2005, Arcodia et al. 2010), co-

compounds (Wälchli 2005), copulative (Olsen 2001), additive (Marchand 1969). I will opt for

the term coordinated compound as it is the most neutral and is easily opposed to compounds that

involve an asymmetrical relationship (cf. Bisetto and Scalise 2005).

Of interest here is the fact that, although many compounds involving the coordination of their

elements are exocentric and thus possess no head (Bauer 2008a), a few might be said to possess

two heads. Take, for instance, the following examples from French:

(44) auteur-compositeur, café-bar, chargeuse-pelleteuse

The compounds in (44) are all no doubt endocentric, but the anomaly here is that both

constituents seem to produce acceptable results for the IS-A test (as in 45a), which is typically

also true of their English analogues (45b):

(45) a. un auteur-compositeur est un (auteur/compositeur)

b. a singer-songwriter is a (singer/songwriter)

Arnaud (2008) calls these cases bi-centric, an apt label that reflects the opinion of some that

these types do in fact possess two heads (Bisetto and Scalise 2005, Scalise and Guevara 2006).

Bauer (2008a), in his typology of dvandva compounds, argues that these are in fact appositional

compounds (and not true dvandvas) by the very fact that they are headed. They are thus opposed

to other cases where a coordination of some sort is involved, but which are exocentric. This

distinction, however, is not always made, the result of which is a class of compounds

characterized by a great deal of variation (Olsen 2001, Bisetto and Scalise 2005).

Other than the results of the centricity test in (45) above, there is another indication that lends

further support to a bi-centric approach to coordinated compounds, namely that, in Romance

languages, both constituents are typically pluralized (46a-c). This evidence is somewhat

weakened, however, given the fact that their Germanic counterparts, among others, are typically

only inflected on the right-most constituent (46d-f):

(46) a. fr. auteur-compositeur, auteurs-compositeurs        Zwanenburg (1992)

b. sp. poeta-pintor, poetas-pintores                        Rainer and Varela (1992)

c. it. actor-encenador, actores-encenadores                 Scalise (1992)

d. en. writer-director, writer-directors                    Olsen (2001)

e. nl. leerling-verpleegster, leerling-verpleegsters        Booij (1992)

f. de. Linguist-Psychologe, Linguist-Psychologen            Olsen (2001)

All of the compounds in (46) are similar semantically, yet inflectional marking differs between

groups. This is in part due to the fact that Germanic languages don’t typically allow for

inflection within compounds, even when the non-head is plural in isolation (e.g. scissors,

scissor-sharpener). It would seem strange, then, to claim that one set is bi-centric, while the

other is not, based solely on this particular difference. Perhaps an even stronger argument

against inflection as a property of bi-centricity is that many French NN compounds are dually

marked for plural despite the lack of coordination between their elements (e.g. chou-fleur,

choux-fleurs).

There are indications, however, that while these types of compounds are no doubt semantically

headed, they may not in fact possess two heads. Most notably, in languages with nominal

gender, coordinated compounds nearly always inherit the gender of the constituent in canonical

head position in cases where gender differs between constituents. This behaviour, illustrated in

(47) below, suggests that they might actually be regular endocentric constructions:

(47) a. [[bain]M-[douche]F]M

b. [[grave]F-[ciment]M]F

Few coordinated compounds are actually mismatched for gender: 78 of the 105 endocentric NN

compounds in my data that involve some form of coordination contain elements of the same

gender. Most of the remaining compounds behave according to typical percolation conventions.
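The percolation pattern illustrated in (47) can be sketched as a simple lookup. The following Python fragment is only an illustrative sketch and not part of the analysis itself: the names CANONICAL_HEAD and predicted_gender are hypothetical, and the mapping of compound type to head position merely restates the positions given in Section 4.1.2 (left for NN compounds, right for A-N nominal compounds).

```python
# Canonical head position by compound type, as stated in the text:
# left for French NN compounds, right for A-N nominal compounds.
# (Hypothetical names; an illustrative sketch only.)
CANONICAL_HEAD = {"NN": "left", "A-N": "right"}


def predicted_gender(constituents, compound_type):
    """Return the gender of the constituent in canonical head position.

    `constituents` is a left-to-right list of (form, gender) pairs.
    """
    position = CANONICAL_HEAD[compound_type]
    _form, gender = constituents[0] if position == "left" else constituents[-1]
    return gender


# The two gender-mismatched examples in (47), with their attested genders.
examples = [
    ([("bain", "M"), ("douche", "F")], "NN", "M"),
    ([("grave", "F"), ("ciment", "M")], "NN", "F"),
]

for constituents, compound_type, attested in examples:
    form = "-".join(f for f, _ in constituents)
    predicted = predicted_gender(constituents, compound_type)
    print(form, "predicted:", predicted, "attested:", attested)
```

For both examples in (47), the predicted gender matches the attested one, which is what one would expect if these compounds behave like regular endocentric constructions. The sketch says nothing about cases in which gender is acquired by other means.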

The few instances where coordinated compounds show a disparity in how gender percolates can

all be explained via other means. Consider the following examples:

(48) a. [[huppe]F-[col]M]M ‘un oiseau avec une huppe et un col’

b. [[radio]F-[réveil/gramophone/phonographe]M]M

The compound in (48a) is in fact exocentric and is therefore most likely acquiring gender from

the external head (i.e. oiseau, cf. rouge-gorge). In (48b), radio, because of its widespread usage

as a prefix, may not be available for feature percolation, although this is purely speculative.

Regardless of these few anomalies, gender percolation, as I showed in Section 4.1.2, is not

always an accurate indicator of head as some compounds seem to inherit the gender of their

non-head constituent when the head is in a non-canonical position. One might argue, however,

that some of these compounds are perhaps not truly coordinated and that the left-most

constituent is of far greater morphological—and perhaps semantic—importance than the other,

thus rendering them single-headed. This is precisely what Scalise and Fábregas (2010: 121)

argue for in the case of it. prete-operaio ‘priest-worker’, which, despite the apparent

coordination between its elements, is left-headed because prete has more “semantic” weight

than operaio. We may make similar claims regarding the compounds in (49) below, which bear

some resemblance to those in (47), but which show a greater degree of semantic asymmetry47:

(49) a. [[coton]M-[poudre]F]M ‘cotton-powder’ = ‘cotton that serves as gunpowder’

b. [[mémoire]F-[tampon]M]F ‘memory-buffer’ = ‘memory that serves as a buffer’

One distinguishing characteristic between these sets of compounds, according to Arnaud (2008),

is that true bi-centric compounds typically involve co-hyponyms, which is why those in (49) are

likely to possess a single head (cf. child-soldier, girlfriend, etc.). Yet it is entirely possible for a

coordinated compound to involve co-hyponyms of different genders (such as those in 47 above,

as well as baladeur-radio, bistro-brasserie, location-financement), which would therefore

have an effect on how the feature percolates. We may assume, however, that this is the result of

a very basic constraint imposed by a language such as French: a noun may only have one gender

47

These types of compounds will be taken up in Chapter 5 during the discussion of compound relations.

and it is most natural for a compound to inherit the gender of whichever constituent is in the

canonical head position whenever this feature differs between elements. Feature percolation is

therefore blocked as per Lieber’s (1980) Feature Percolation Conventions because nodes

acquire their features from the head stem in sister configurations.

Despite some of these inconsistencies, either constituent of endocentric coordinated compounds

may meet the criteria of semantic head, which, a priori, sets them apart from other compound

types (cf. Bisetto and Scalise’s 2005 typology of compounds). If identifying the head element is

a crucial step in compound interpretation, what can we say about compounds that might

technically have two heads? Unfortunately, to my knowledge, there are no studies on how

speakers interpret coordinated compounds48. It is undeniable that compounds like those in (46)

above are endocentric in nature, but it is unclear if the coordination of their elements and the

fact that their constituents are of equal status have an effect on transparency. As we will see in

Chapter 5, however, these types of compounds are in fact numerous and account for a non-

negligible part of French NN compounds (cf. Arnaud 2003).

I will posit that such compounds are largely no different than single-headed compounds in terms

of semantic transparency and that any additional processing they require involves relational

information, which is to say information based on coordination. This position does not, however,

prevent additional characteristics such as gender percolation from being incorporated into the

definition of headedness, thereby allowing for the model to account for instances where features

do not behave as expected. Moreover, coordinated compounds may also set themselves apart

by the fact that both constituents contribute to the meaning of the whole: it is

impossible for a coordinated compound to contain a semantically unrelated element49. This

aspect is pertinent to the discussion of compositionality in Section 4.2. As for the particular

relational properties exhibited by these compounds (i.e. coordination), this will be a key

48

See Wisniewski (1996) for studies where participants made use of hybridization (i.e. the amalgamation of the two constituents) to interpret compounds with conceptually similar constituents. One of the two constituents, however, was typically identified as the head.

49

In Wälchli’s (2005) semantic classification of co-compounds, the Ornamental class is said to involve compounds in which a constituent offers no semantic contribution to the whole (e.g. in Erzya Mordvin, ve’e-sado = ‘village-hundred’ = ‘village’). I agree with Bauer (2008a), however, that such compounds are most likely best treated as non-coordinated.

component of the next chapter’s discussion of the semantic tissue that ties together the elements of

compounds. In terms of headedness, such compounds will simply be treated as endocentric and

may therefore be opposed to similar cases that are clearly exocentric (cf. Bisetto and Scalise

2005, Scalise and Bisetto 2009).

4.1.4 Exocentric Compounds

As stated at the outset, not all compounds are endocentric. In the traditional view, a compound

is exocentric when it fails the IS-A test for either constituent, as in the examples below:

(50) a. eng. a redcoat is a (*red / *coat) fr. un rouge-gorge est un(e) (*rouge / *gorge)

b. eng. a birdbrain is a (*bird / *brain) fr. un cheval-vapeur est un(e) (*cheval / *vapeur)

The test reveals, quite convincingly, that not all compounds possess a head in the semantic

sense. Not only do these particular compounds fail the endocentricity test described in Section

4.1.1, they also fail the less stringent hyponymic test mentioned earlier. The following French

NN and N à N compounds show that even a weakened centricity test cannot save them from

their exocentric status:

(51) a. le ballon-panier est [une sorte de] (*ballon / *panier) = sport

b. un jambon-beurre est [une sorte de] (*jambon / *beurre) = sandwich

c. une barbe à papa est [une sorte de] (*barbe / *papa) = candy

d. un moulin à parole est [une sorte de] (*moulin / *parole) = person

One will notice, however, that these so-called headless compounds still seem to rely more

heavily on one constituent over the other. This is especially evident for N à N compounds,

where the leftmost element, despite its limited semantic contribution, seems to nevertheless

govern the whole construction. For one, there is the matter of lexical category, which, for some

exocentrics, may be determined by the lexeme in canonical head position, regardless of its

semantic contribution. For instance, a redcoat functions as a noun and not as an adjective,

despite it not being a type of coat. In the case of true bahuvrihi compounds, this fact might be

explained using the unexpressed external head as the source of lexical category (i.e. a redcoat

refers to a person, which is why it is a noun). The problem, of course, is that it is difficult to

formalize feature percolation if said features come from a constituent that isn’t part of the word-

form. Furthermore, the constituent that would otherwise be in head position is usually the locus

of morphosyntactic marking, such as inflection50 (e.g. airheads and not *airshead). Similarly, a

compound’s gender may be determined by the element in head position regardless of that of its

semantic class (e.g. [[barbe]F à [papa]M]F est une [confiserie]M). Based largely on these

observations, Scalise and Guevara (2006) argue that compounds have, on the one hand, a

semantic head, which determines a compound’s semantic features (e.g. [± animate]) and on the

other, a formal head, which determines features such as lexical category. Accordingly, they

assess centricity along the following parameters:

“An endocentric compound has at least one formal head and at least one semantic head. If a compound has only one formal head and only one semantic head, then the two must coincide. If a compound realises any of the remaining possibilities, it will be considered to be exocentric.” (Scalise and Guevara 2006: 192)

Although distinguishing between a semantic and a formal head may be justified given that many

exocentric compounds retain the formal features of one of their constituents, it is not entirely clear

if the line that Scalise and Guevara draw between endocentric and exocentric compounds is in

fact correct. The main problem is that according to their approach, an exocentric compound may

have a semantic head as long as it differs from its formal one, but this seems extremely unlikely

as it would render the test for a semantic head (i.e. IS-A) impossible. In other words, testing for

a semantic head requires that the hypernym be of the same lexical category as the compound,

otherwise the test produces infelicitous results (i.e. ?a [[X]A[Y]A]N is a [Y]A). Yet, Scalise and

Guevara’s stipulation produces sixteen51 possible configurations, four of which are exocentric

with a semantic head. While they admit that further research is required to determine which of

these permutations are possible, they do not include information regarding formal and semantic

heads in their list of exocentrics from Dutch, Chinese, and Italian, making it impossible to

ascertain if any of their data support some of the more unconventional configurations generated

by their proposal. I suspect that such cases do not in fact exist, but this remains, for the moment,

50

It should be noted that inflection is far more variable for French compounds (as well as those of other Romance languages) than it is for English, but the fact remains that it is typically the constituent in head position that is marked for number and gender.

51

The high number of possible combinations is due to the fact that they allow for a compound to have more than one formal or semantic head.

speculation. Complicating matters further, Scalise and Fábregas (2010) add to Scalise and

Guevara’s original proposal by arguing in favour of a third type of head, which they call

morphological and which is responsible for the percolation of features such as gender. Although

they do not say so, the decision to include a third type of head seems related to Scalise and

Guevara’s (2006) definition of endocentric and exocentric compounds and is most likely

influenced by the fact that some compounds appear to inherit gender from the non-head

constituent. This was discussed in Section 4.1.2; the examples are repeated here for the

convenience of the reader:

(52) a. fr. [[bateau]M-[école]F]M

b. fr. [[bracelet]M-[montre]F]M

If gender were treated as a feature of the formal head52, then the compounds in (52) would

possess distinct formal and semantic heads, which would mean that they are exocentric in

nature, but this position is difficult to maintain. By arguing that a feature such as gender

percolates from a different type of head, Scalise and Fábregas are able to preserve the centricity

position held in Scalise and Guevara (2006). While I agree with most of their arguments

regarding formal and semantic features in compounds, I would argue that their overall

stipulation is much too strong, the result of which is the postulation of a number of

fundamentally incompatible configurations.

Because the aim here is to establish parameters with which to determine how easily a compound

will be understood, I am choosing to distinguish between endocentric and exocentric

compounds based solely on the presence or absence of a semantic head. Formal features, such as

lexical category (as well as gender), while no doubt pertinent, will not factor into a compound’s

centricity53, which I will formally define as follows:

52

The literature is in fact divided on whether gender percolates in the same fashion as lexical category (see Lieber 1989).

53

Nothing precludes future integration of additional features, such as formal head and morphological head, into a typology of semantic transparency. For instance, one could imagine that a compound whose formal, semantic, and morphological features coincide would be easier to understand than one whose features are distributed among its constituents. At present, I have chosen to set aside such factors in order to concentrate on a select number of features, which I hold to be of greater significance.

(53) A compound is endocentric if it possesses a semantic head, which is to be understood as

the constituent that determines the conceptual class of the compound. All other

compounds are exocentric.

The stipulation in (53) is not without its problems, however. Chief among them is that the

identification of the semantic head is not always an unexceptional process as factors such as

semantic drift or tropes may obscure meaning, which may in turn render centricity tests

inconclusive. Despite this drawback, however, my position regarding exocentricity remains in

line with most approaches and reflects the typology proposed by Bauer (2010)54, as shown in

the table below (English examples are from Bauer, while their French analogues are from my

data):

Table 4.1. Types of exocentric compounds in Bauer (2010)

Type of Exocentric         English      French
Bahuvrihi                  red-eye      rouge-gorge
Synthetic                  pickpocket   ---
Transpositional            ---          clair-obscur
Exocentric co-compounds    blue-green   bleu-vert
Metaphorical               dust-bowl    radio trottoir

Compounds of the first type, for which Bauer retains the Sanskrit term, are also known as possessive

compounds (bahuvrihi meaning ‘having much rice’, Burrow 1955) as they typically involve a

property possessed by the designatum. While Bauer’s example (i.e. red-eye) may not be the

most prototypical example of the possessive exocentric compound, it nevertheless emphasizes

that this particular type denotes a feature of some external, unexpressed head (i.e. a flight that

causes red eyes). The French example, rouge-gorge, is perhaps a better example as it refers to a

bird with a red throat (cf. redcoat, greybeard, etc.). This class also includes NN compounds in

which the attributive property of the association may involve other information, but which still

exemplifies an external head in relation to the compound (e.g. a hammerhead is a shark with a

54

Marchand (1969) offered a similar typology of exocentrics that also includes five types, none of which possess semantic heads.

head like a hammer). It is worth noting that these compounds are often viewed as instances

involving metonymy as they can evoke a part-whole type relationship (e.g. an airhead, where

head is taken to mean the person, cf. Bauer 2008b).

The second type is a highly frequent and widely studied compound with numerous endocentric

analogues (Roeper and Siegel 1978, Lieber 1980, Botha 1984) and involves constituents in a

head-argument relationship. These typically manifest themselves as VN compounds in French55.

As for Transpositional exocentrics, these are compounds whose lexical category differs from

that of their constituents. Bauer offers an example from Khmer, khɔh trǝw, which combines two

adjectives (‘wrong’ and ‘right’ respectively) to form a noun (‘morality’). French has a few such

cases, usually also involving conversions from adjectives into nouns (cf. clair-obscur, cinq à

sept, douce-amère). Exocentric co-compounds are coordinated compounds (Bauer 2008) for

which the designatum is understood as a combination of its constituents. Thus, blue-green is

neither blue, nor green, but is instead a colour with the properties of both. Bauer’s final

exocentric class involves compounds whose constituents, while not strictly compositional, are

nevertheless motivated on metaphorical grounds. The French compound radio-trottoir, a term

used, according to Martin and Copeland (2003), in French-speaking Africa to mean word-of-mouth

communication networks, is exocentric, yet the head (radio) can be understood

metaphorically. This last type, as well as traditional bahuvrihi compounds, are worth discussing

further, as they have, in recent years, received considerable attention (Goossens 1995, Geeraerts

2002, Benczes 2005, 2006, Arnaud 2008).

4.1.4.1 Exocentric by Trope

As I briefly mentioned above, traditional bahuvrihi compounds are often called possessives

because they denote a characteristic possessed by the unexpressed head of the compound (Bauer

1978). These are usually instances involving adjectives (as in 54a-b), but may also be

appositional nouns (as in 54c):

55

It is widely accepted that French synthetic compounds consist almost entirely of VN constructions (e.g. ouvre-bouteille, lave-vaisselle, essuie-glace, etc.) and are headed by a zero affix (Lieber 1992). In this regard, they are not actually exocentric compounds.

(54) a. greenshank = ‘a bird with a green shank’

b. rouge-gorge = ‘un oiseau avec la gorge rouge’

c. hammerhead = ‘a shark with a head like a hammer’

Some authors, like Štekauer (1998), argue that these are simply cases of ellipsis in which the

actual head element has been omitted. Others, like Bauer (2008b), believe that we may simply

be dealing with cases of metonymy, where the constituent in head position is understood as a

stand-in for the actual head56. Benczes (2005, 2006), in her extensive work on compounds

involving tropes, argues that, from a cognitive linguistics approach, these headless compounds

can be explained using conceptual metaphor and metonymy, which she says renders them far

less opaque than traditionally understood (cf. Warren 1978). This assertion is in line with Lakoff

and Johnson’s (1980) seminal work on metaphor, in which they go to great lengths to show that

tropes are a core component of language use and not simply figures of speech. Benczes

therefore proposes a typology of compounds based on the many ways that metonymy and

metaphor may interact within and without a compound. Examples of exocentric compounds due

to either metaphor57 (55a-b) or metonymy (55c-d) on the head constituent are as follows (from

Benczes 2006):

(55) a. a jailbird is a *bird → A PRISONER IS A CAGED BIRD

b. a bellybutton is a *button → THE UPPER BODY IS AN UPPER GARMENT

c. a loudmouth is a *mouth → PART FOR WHOLE

d. a gaslight is a light → PRODUCT FOR PRODUCER

The first three compounds in (55) fail the IS-A test and are therefore exocentric according to

most approaches (cf. Bauer 2010). Benczes (2006) argues, however, that in all instances, the

56

Metonymy is understood here, in cognitive linguistics terms, as CONCEPTUAL DOMAIN A FOR CONCEPTUAL DOMAIN B (Kövecses 2002). The transfer is usually based on some element of congruity between domains and often involves meronymic relations, such as WHOLE THING FOR A PART OF THE THING and PART OF A THING FOR THE WHOLE THING (see Radden and Kövecses 1999 for a detailed list of common metonymical relationships).

57

Following Lakoff and Johnson (1980), metaphor is understood as CONCEPTUAL DOMAIN A IS CONCEPTUAL DOMAIN B. Thus, the conceptual metaphor ARGUMENT IS WAR allows for the use of expressions such as “to attack someone’s point” or “to demolish his or her argument” because words that apply to one concept are transposed onto another.

head58 can be understood if the metaphor or metonymy is sufficiently established or widespread.

Speakers familiar with the metonymy MOUTH FOR PERSON may therefore have little difficulty in

establishing meaning for a compound such as loudmouth. Moreover, if a particular metonymical

reading becomes sufficiently widespread, the centricity test will no longer produce negative

results: gaslight in (55d), for instance, which Benczes includes in her study59, involves the

PRODUCT FOR PRODUCER metonymy (i.e. LIGHT FOR LAMP), yet does not fail the IS-A test (i.e. a

gaslight is a light). In this case, the trope is in fact sufficiently conventionalized that it has

become an acceptation of that particular lemma60. Similarly, Arnaud (2008) argues that in bird

sanctuary, the head is in fact metaphorical, but that it does not prevent the compound from

being interpreted as endocentric (i.e. a bird sanctuary is a sanctuary). This, of course, isn’t

always the case, as bird in jailbird is not typically used to refer to a person, at least not in a

manner that might render its presence in the compound transparent. Thus, the head of a

compound may be susceptible to varying degrees of tropic extension, which may or may not

blur the line between instances of endocentricity and exocentricity. The distinction between the

two poles may simply be a question of metaphoric or metonymic entrenchment.

Unfortunately, evaluating degrees of entrenchment is no simple task. As Arnaud (2008)

suggests, one might choose to rely on lexicographic sources to see if a particular trope is viewed

as an acceptation of a particular lemma, but this may not always yield convincing results. In

fact, one may be forced to treat as endocentric compounds which would otherwise be treated as

exocentric. Loudmouth, for instance, where mouth is understood as denoting a person, involves

a sense extension listed in the OED, although not without an indication that its usage involves

some shift in meaning: “4. In extended use: a person who speaks.” While it would seem that the

trope is well-known, it nevertheless seems strange to label loudmouth as endocentric, at least not

without some additional designation.

58

Benczes’s work is conducted within the cognitive framework championed by Langacker (1987); she therefore uses the term “profile determinant” for what is typically called the head in most morphological frameworks.

59

Although Benczes (2006) doesn’t explicitly state that gaslight is exocentric, the focus of her work is on compounds traditionally understood as such. According to my definition of endocentricity, gaslight is endocentric.

60

“5. A body which emits illuminating rays” (OED). This usage has also been conventionalized in French: “I.A.3 Lumière: source de lumière” (LPR2010).

An examination of centricity across the compounds in my data reveals a number of cases that

involve either a metaphor (56a-b) or a metonymy (56c-d) on the head element, which

may or may not have an effect on how the compound is interpreted:

(56) a. pomme cajou lit. ‘apple cashew’ = ‘fruit of the cashew tree’

b. chou-palmiste lit. ‘cabbage-palm tree’ = ‘edible part (core) of palm tree’

c. blanc-seing lit. ‘blank-signature’ = ‘blank sheet of paper with a signature’

d. bec-figue lit. ‘beak-fig’ = ‘bird that eats figs’

Similar to the English compounds involving tropes mentioned earlier, the French compounds in

(56) all arguably fail the centricity test and are thus typically considered exocentric. In some

instances, however, the metaphor or metonymy may be sufficiently established so as to make

such judgements difficult: in (56a), for example, the metaphor expressed in pomme (‘apple’) is

well established and sufficiently prevalent to produce an endocentric reading (cf. pomme de pin

or fruit de mer). In contrast, it is unlikely that this is the case in (56b). Similarly, the metonymy

in (56c) is most likely not very familiar to most speakers, while the one in (56d) is perhaps more

so.

Although one might simply choose to maintain a hard line between endocentricity and

exocentricity using the IS-A test, it seems entirely justified to distinguish between

compounds in which the head retains no meaning at all (e.g. doughnut) and those that can be

motivated using operations such as metaphor and metonymy (e.g. loudmouth). I will argue,

however, that unless these tropic senses are not only widespread, but also narrow in scope, such

compounds cannot legitimately be said to be endocentric. For instance, while it may certainly be

the case that BODY PART FOR BODY is a common and widespread case of metonymy, it may not

be sufficiently narrow to allow for the accurate interpretation of the compound. Thus,

compounds like razorback, yellowtail, or greenshank, which all involve this particular

metonymic relationship, will remain, to some degree, opaque to the speaker, as he or she has no

way of knowing what type of entity each denotes (a wild pig, a fish, and a bird respectively). In

cases like airhead or redhead, the trope might indeed be more circumscribed (i.e. HEAD FOR

PERSON), thus rendering them potential candidates for endocentricity via metonymy, but it is

difficult to argue that these compounds are endocentric in the same way as, say, arrowhead or

shower head are.


I therefore propose that a compound may be either strongly or weakly endocentric, as well as

exocentric, depending on the head’s requirements in terms of sense extension. Although far

from perfect, the method used to determine the centricity is as follows:

(57) A compound C is

i. strongly endocentric if it passes the IS-A test

ii. weakly endocentric if it does not pass the IS-A test, but the head constituent involves

an established sense extension (i.e. listed in a lexicographic work)

iii. exocentric in all other cases

The compound pomme cajou, listed in (56) above, is therefore weakly endocentric. It is also

important to note that sense extensions that seemingly arise naturally or logically (i.e. that can

be motivated), but which require that meaning be established using information far beyond a

lexeme’s context-free usage will retain the traditional label of exocentric. This approach

arguably reflects the fact that such compounds do not provide crucial information or a means of

establishing the exact nature of the designatum.
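The decision procedure in (57) can be sketched as a small function. This is only an illustration: the two boolean inputs, passes_is_a (the outcome of the IS-A test) and head_has_listed_extension (whether the head's shifted sense is listed in a lexicographic work), are hypothetical names for judgements that an analyst would supply by hand, and they are not part of the typology itself.

```python
from enum import Enum


class Centricity(Enum):
    STRONGLY_ENDOCENTRIC = "strongly endocentric"
    WEAKLY_ENDOCENTRIC = "weakly endocentric"
    EXOCENTRIC = "exocentric"


def classify_centricity(passes_is_a: bool, head_has_listed_extension: bool) -> Centricity:
    """Apply the three clauses of (57) in order."""
    if passes_is_a:
        # (57i): the compound IS-A kind of its head.
        return Centricity.STRONGLY_ENDOCENTRIC
    if head_has_listed_extension:
        # (57ii): the IS-A test fails, but the head's sense extension is established.
        return Centricity.WEAKLY_ENDOCENTRIC
    # (57iii): all other cases.
    return Centricity.EXOCENTRIC


# Input judgements follow the text: pomme cajou fails the IS-A test but
# relies on an established sense extension of pomme.
print(classify_centricity(False, True).value)
```

Because the clauses are checked in order, a compound that passes the IS-A test is labelled strongly endocentric whether or not its head also has a listed extended sense.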

The choice to simplify and reduce the role of tropes in my approach is partly due to the fact that

it’s not entirely clear just how significantly they contribute to compound interpretation, and thus

compound transparency. While it is clear that both metaphor and metonymy are multi-faceted

concepts in compound meaning (cf. Benczes 2006), attempting to include the numerous degrees

of possible interactions would quickly prove unwieldy and may not ultimately provide facets of

significant relevance to the matter of semantic transparency. Nothing, however, prevents others

from expanding on this approach. One might choose, for instance, to take into account the

hierarchy of metonymic vehicles proposed by Radden and Kövecses (1999), or Benczes’s

(2006) extensive typology of metaphor and metonymy when determining just how weak or

strong a given compound’s head actually is. Moreover, it may be the case that metaphor poses a

greater challenge than metonymy, or vice-versa. For the present moment, however, I believe

that distinguishing between weakly and strongly endocentric compounds is a first, yet crucial,

step in establishing what factors are involved in the semantic transparency of these types of

constructions.


4.1.5 Summary

Based on the properties of French compounds (as well as reported facts for other languages), we

may summarize issues related to headedness according to the hierarchy below. Canonical and

non-canonical labels refer to the predominant position of the head for a given compound type

(e.g. NN), while strong and weak indicate whether a compound involves some degree of sense

extension.

Figure 4.1. Distribution of compounds according to features related to the head.

Not included in the diagram are bi-centric compounds such as lecteur-graveur and artiste-

interprète. These types are necessarily canonical (as either constituent may be interpreted as the

head) and, while nothing prohibits them from being weakly endocentric, none was actually

found in the data. The fact that they are coordinated will instead be reflected using the relational

typology developed in the next chapter.

One question that the configuration above raises is whether head position or centricity strength

is the more influential aspect of compound transparency. In other words, is a weakly endocentric

canonical compound (e.g. chou-palmiste) more or less transparent than a strongly endocentric

non-canonical compound (e.g. auto-école)? This is not an easy question to answer. While

Jarema et al. (1999) showed that French speakers processed right-headed compounds differently

than they did left-headed ones, it is unclear what sort of effect the presence of tropes has on the


interpretation process, their research having been conducted using only the parameters

Transparent/Opaque. Adding to the challenge is that not all tropes may involve the same degree

of sense extension. My stance here is that head position is in fact a highly dominant feature of

compound transparency, more so than centricity strength.

The primary reason for this particular stance is that canonical head position is a necessary factor

in both compound formation and interpretation. Because compounds are inherently ambiguous

(i.e. they lack the necessary information for complete and explicit meaning construal), speakers

must rely on fundamental—and preferably immutable—factors in order to establish meaning for

a given combination. Headedness is one such factor. It is in fact unlikely that, upon

encountering a completely novel compound, a French speaker, for instance, will first

attempt to interpret it as a right-headed construction without some indication that it should be

understood as such (e.g. presence of a neoclassical stem, cf. Dal and Amiot 2008). Semantic

transparency, being intimately tied to this process, will arguably favour systematicity above all

else. In instances where a compound type may involve either position (i.e. A-N or N-A),

establishing head position may be impossible without additional context (e.g. vert sapin as a

noun or as an adjective), thus reducing overall semantic transparency. Canonically headed

compounds that involve established tropes, on the other hand, merely require that the speaker

understand whether a metaphoric or metonymic reading is necessary for meaning composition.

It should also be noted that, despite their terminal nature on the chart above, exocentric

compounds are not all equal in terms of transparency. The focus of the next section will be on

constituent contribution, which will allow for exocentrics to be further contrasted within their

own class.

4.2 Semantic Compositionality

In the previous section, I explored the notion of headedness as it pertains to compounds and

argued that the presence or absence of a head, as well as the degree of its semantic contribution,

are crucial factors in a compound’s semantic transparency. The head constituent, however, is

only one of two components at play in a compound’s meaning: the non-head constituent must

also be incorporated into a discussion of semantic transparency. To this end, the following

sections explore compositionality as a factor of transparency.


4.2.1 Definition and Approach

Chapter 2 focused on previous approaches to semantic transparency and it was shown that the

concept was often—but not always—conflated with semantic compositionality. More

specifically, while some researchers view the two concepts as distinct, most believe that they are

simply two different labels for the same concept. In Section 2.4.1 of that chapter, I laid out my

arguments in favour of the former approach, which is to say that compositionality is distinct

from, yet related to transparency. Crucially, I argued that semantic compositionality should be

understood narrowly as referring to the meaning of a compound’s parts and their relationship to

the meaning of the whole (cf. Svensson 2004, Girju et al. 2005). In this regard, compositionality

is understood as a property of compounds that “feeds” into their semantic transparency. A

compositional compound is therefore a compound whose constituents contribute meaning to that

of the whole, regardless of its perceived transparency. The chief argument in favour of

distinguishing between the two concepts is that a compound may incorporate the meaning of its

constituents without truly being transparent. Conversely, it is unlikely that a semantically

opaque compound is compositional. This unidirectional implication was represented in a

diagram in Chapter 2 and is repeated below for the convenience of the reader. The dotted lines

indicate that the relationship between the two concepts is not one of entailment (i.e. a

compositional compound may be non-transparent, but it is not necessarily so).

Figure 4.2. The relationship between compositionality and transparency. [Diagram not reproduced; node labels: Compositional, Transparent, Non-transparent, Non-compositional.]

By distinguishing between compositionality and transparency, we are in fact able to

discriminate between a number of exocentric compounds, which would otherwise be grouped

together as one particular subset of opaque compounds. Thus, the compounds in (58a-b) below

may be differentiated from those in (58c-d) by virtue of their different degrees of

compositionality:

(58) a. année-lumière lit. ‘year-light’ ‘distance traveled by light in a year’

b. jambon-beurre lit. ‘ham-butter’ ‘sandwich containing (only) ham and butter’

c. bourg-épine lit. ‘village-thorn’ ‘shrubby plant with thorns’

d. chat-château[61] lit. ‘cat-castle’ ‘instrument used to penetrate castle defences’

Although the stipulation above would allow for an exocentric compound to possess a constituent

in head position that contributes non-head meaning alongside a modifier that contributes no

meaning to the whole, no such case was found in the data. Such a case, if it did exist, would

remain partially compositional, just as in true exocentrics in which the non-head contributes

meaning, but not the constituent in head position (e.g. chat-château).

If we return to the initial description of semantic compositionality, we will notice a few

problems. First, in a binary construction, semantic compositionality is a four-way configuration,

but if neither constituent is accorded more weight than the other, this output is reduced to only

three distinct levels. Compositionality is therefore a cline that may be ordered from most

compositional to least compositional, as shown in (59) below:

(59) The meaning of a compound XY may include the meaning of:

a. both X and Y

b. (X, but not Y) or (Y, but not X)

c. neither X nor Y

This is in fact Cruse’s (1986) approach, although he does allow for partial semantic contribution

thanks to his use of the term “semantic indicator” (see Chapter 2, Section 2.3.2 for an overview

of his approach). For the configurations above, I will use the terms compositional (59a),

partially compositional (59b), and non-compositional (59c) to refer to each of these possible

outputs. One problem with this somewhat simplistic approach, however, is that partial

compositionality refers to one of two possible configurations. This ambiguity may be eliminated

by factoring in centricity, as Libben (1998) does in his typology of semantic transparency.

Where Libben’s approach results in a seven-way configuration, I will argue in favour of a

reduced distribution that still allows for a clear hierarchy of compositionality based on

centricity. In the diagram below, possible configurations go from most (left) to least

compositional (right):

[61] “[M]achine au moyen de laquelle des ouvriers à couvert, ébranlaient les murailles et jetaient des ponts sur les fossés ou les remparts” (De Roujoux 1839: 48).

Figure 4.3. Possible configurations for semantic compositionality.

Non-compositional endocentric compounds are presumably impossible as their centricity

necessarily means that one constituent is a hyponym of the whole. This restriction does not

apply, however, to exocentric compounds as they may be composed of lexemes that contribute

no meaning to that of the whole (e.g. Eng. rugrat; Fr. compère-loriot). The following examples are

of endocentric compounds that are either compositional (60a) or partially compositional (60b):

(60) a. stylo-bille lit. ‘pen-ball’ ‘ballpoint pen’

b. bateau-mouche lit. ‘boat-fly’ ‘boat for tourists in Paris’

[Figure 4.3 node labels: Compound > Endocentric (Compositional, Partially Compositional, Non-compositional); Exocentric (Compositional, Partially Compositional, Non-compositional).]


As it stands, compositionality and centricity may combine to produce a five-way configuration,

but once other factors are taken into account, such as tropes and sense extension, the number of

possible combinations increases greatly.
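The reduction from four outcomes to the three levels in (59), and the five-way configuration that results from crossing those levels with centricity, can be made concrete with a short enumeration. The sketch below is illustrative only: the function and constant names are hypothetical, and the one excluded cell (non-compositional endocentrics) simply encodes the restriction stated above.

```python
from itertools import product

CENTRICITY = ("endocentric", "exocentric")
COMPOSITIONALITY = ("compositional", "partially compositional", "non-compositional")


def compositionality_level(x_contributes: bool, y_contributes: bool) -> str:
    """Map the four-way outcome for a compound XY onto the three levels of (59)."""
    contributing = int(x_contributes) + int(y_contributes)
    # Two contributing constituents -> index 0, one -> index 1, none -> index 2.
    return COMPOSITIONALITY[2 - contributing]


def possible_configurations() -> list[tuple[str, str]]:
    """Cross centricity with compositionality, dropping the excluded cell."""
    return [
        (centricity, level)
        for centricity, level in product(CENTRICITY, COMPOSITIONALITY)
        if not (centricity == "endocentric" and level == "non-compositional")
    ]


for centricity, level in possible_configurations():
    print(f"{centricity}: {level}")
```

Since neither constituent is weighted more heavily than the other, compositionality_level(True, False) and compositionality_level(False, True) return the same label, which is exactly the ambiguity that factoring in centricity is meant to resolve.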

4.2.2 Metaphor and Metonymy

As was discussed in Section 2, metaphor and metonymy may sometimes have an effect on a

compound’s meaning by casting doubt on its centricity status when these tropes target the head

constituent. I argued that one way of addressing the problem was to rely on lexicographical

work to establish whether a particular lexeme’s usage was in fact conventional or if the trope

was perhaps not quite sufficiently established to produce absolute endocentrics. This led me to

argue for a weakly endocentric label for those compounds that might otherwise fail the

centricity test. Unsurprisingly, non-head constituents are also prone to sense extension within

compounds (Benczes 2005, 2006, Arnaud 2008), which may result in varying degrees of

compositionality and hence transparency. In the following examples, the compound’s modifier

in (61a) contributes meaning via metonymy, while in (61b) the same operation takes place via

metaphor:

(61) a. carte soleil lit. ‘card sun’ ‘(health) card with a picture of the sun on it’

b. voiture balai lit. ‘car broom’ ‘vehicle that “sweeps” up last place runners’

The examples above are both endocentric, yet the presence of tropes alongside these

hypernymic heads may potentially reduce their interpretability. As Štekauer (2005) argues,

coining a complex word “using a non-established shifted (metaphorical) meaning [. . .] reduces

the meaning-predictability of a naming unit” (Štekauer 2005: xix). On the other hand, most of

these types of compounds are not opaque either, nor are they on the same level as compounds in

which the non-head contributes no meaning at all to that of the whole (e.g. bateau-mouche,

chou-croûte[62], laurier-tin). The question, of course, is whether they are fully or partially

compositional. I will return to this question in a moment.

[62] Diachronically, chou-croûte is not a compound, but synchronically, it has all the features of one: “Étym. Allem. Sauerkraut, de sauer, aigre, sur (voy. sur, adj.), et, Kraut, herbe, l'assimilation avec chou ayant altéré sauer” (Littré 1873, Vol. 1).


Also a factor in a compound’s compositionality is just how metaphor and metonymy may

interact with each other to produce meaning. Goossens (1995) calls “metaphtonymy” the

interplay between metaphor and metonymy, and argues that—following a study of various

British expressions—it is either integrated, which is to say that metonymy and metaphor are

combined, or cumulative, where one trope is derived from the other. According to Geeraerts

(2002), metonymy and metaphor may occur in multiword expressions in one of three ways: i)

consecutively, ii) in parallel, and iii) interchangeably. Likewise, Benczes’s (2006) framework

allows for metaphor and metonymy to emerge in a number of different configurations, which

results in a rather detailed and complex typology. Thus, according to Benczes, accounting for a

compound such as macarena page (“a webpage capitalising on a current fad, they are usually

full of fluff and have a short life expectancy”, 2006: 167) requires that a metaphorical

relationship between constituents first be established (i.e. a page that is like the macarena), and

which is then further expanded upon using metonymy on N1 (i.e. macarena for fad). An analysis

of the French compounds collected for my work reveals a number of similar examples, as

follows:

(62) a. singe-lion lit. ‘monkey-lion’ ‘lion tamarin’

b. oiseau-lyre lit. ‘bird-lyre’ ‘lyrebird’

c. effet papillon lit. ‘effect butterfly’ ‘butterfly effect’

In (62a), the relationship between the compound’s constituents involves a sub-part of each

element’s designatum: a lion tamarin is a tamarin whose mane resembles the mane of a lion.

The metonymy on the head differs from the cases discussed in Section 4.1.4.1 as it only

arises within the context of the compound (i.e. when establishing the link between constituents),

which explains why the centricity of the whole unit is not affected (i.e. un singe-lion est un

singe). Similarly, the compound in (62b) also incorporates a trope that arises only when

establishing meaning, but in this case it involves a metaphor on the non-head: the lyrebird is a

bird whose tail resembles a lyre. Thus, metonymy is invoked via the whole-part trope for the

head, which is in turn connected to the non-head via physical resemblance. Although speakers

no doubt reduce the level of complexity for compounds involving parallel tropes (i.e. a tamarin

that looks like a lion), mixed tropes are not so easily parsed (i.e. *a bird that looks like a lyre).

Both types, however, require that the speaker establish just what part of the designatum is

involved in the trope.


Furthermore, because a compound ultimately functions as a single lexical unit, its meaning may

involve a trope at a global level. In (62c), for instance, we have a metaphor that applies to the

whole compound: a butterfly effect is an effect like the effect caused by a butterfly[63]. This type

of comprehensive metaphor is in fact much more difficult to assess in terms of compositionality

than a localized metaphor and may be closer to an idiom given that its meaning can only be

understood in the non-literal sense (Gibbs et al. 1989).

The number of potential combinations of tropes in a given compound makes it extremely

difficult not only to offer an exhaustive set of features that might affect semantic transparency,

but also to determine which of these combinations has the greatest impact. In its simplest

manifestation, we have nine possible configurations given the features metaphor, metonymy, or

literal[64]. That said, some of these complexities have already been taken care of with the

strong/weak endocentric distinction suggested earlier, but as I have shown, tropes may arise

internally without directly impacting the status of the head constituent. Moreover, as Benczes

(2006) has shown, both metaphor and metonymy may also relate to each other in additional

ways, which may then be applied to the whole compound and not just its parts. Similar to my

approach to centricity, where I distinguished between strong and weak endocentric compounds

based on a head constituent involving tropes, I will also distinguish between strong and weak

compositionality in a similar manner. This decision is unfortunately only a partial solution to the

challenges discussed above and may prove insufficient were these compounds to be evaluated

by speakers. The data, however, seems to suggest that once we’ve made a tropic distinction at

the level of centricity, the meaning of the modifier is far more “forgiving” at the level of

interpretation. In other words, where metonymy and metaphor made it difficult to test

headedness using the IS-A paraphrase, tropes on modifiers do not typically block a simple

predicative paraphrase, as the following examples show:

(63) a. Metonymy on Modifier: bleu horizon/ciel, jaune paille, pierre miel, pince crocodile, porte papillon, rouge sang

H THAT RESEMBLES M ex. bleu qui ressemble à l’horizon

b. Metaphor on Modifier: client-cible, date butoir, poursuite-bâillon, site miroir, taux plafond

H THAT SERVES AS M ex. client qui sert de cible

c. Metonymy on Head/Metaphor on Modifier: oiseau-lyre, poisson-sabre/épée, noctuelle gamma, couleuvre à collier, serpent à lunette

H THAT HAS M AS A PART ex. oiseau qui a une lyre en tant que partie

[63] Compare with ‘an effect caused by something like a butterfly,’ which would be a metaphor solely on the non-head constituent. (“[T]he phenomenon whereby a very insignificant change in a complex system can significantly alter an anticipated course of events” OED.)

[64] As follows: literal-literal, literal-metaphor, metaphor-metaphor, metaphor-literal, literal-metonymy, metonymy-metonymy, metonymy-literal, metonymy-metaphor, metaphor-metonymy. Configurational complexity would also increase if we were to account for tropes applied to the whole.
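The nine configurations listed in footnote 64 are simply the pairings of three possible readings across two constituents. A minimal sketch of that enumeration follows; the names READINGS and configurations are hypothetical, and treating each constituent as an independent slot is an assumption made purely for illustration.

```python
from itertools import product

READINGS = ("literal", "metaphor", "metonymy")


def configurations(slots: int = 2) -> list[str]:
    """Return every combination of readings across the given number of slots."""
    return ["-".join(combo) for combo in product(READINGS, repeat=slots)]


pairs = configurations()
print(len(pairs))  # 9
print(pairs)
```

If a trope applied to the whole compound were modelled as a third slot (again, only an assumption about how one might represent it), configurations(3) would return 27 labels, which illustrates how quickly the configurational complexity grows.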

Such paraphrases, while not entirely accurate, are largely sufficient to understand the meaning

of the compound. This should come as no surprise, given how fundamental metaphor and

metonymy are to language and how frequently they are used—this is in fact the premise behind

Lakoff and Johnson’s (1980) work. Moreover, as Benczes (2006) argues, if metaphor and

metonymy were truly significant hurdles to compound interpretation, one would have to ask

why they are so prevalent and why speakers continue to coin new ones that often involve major

sense extensions. In all cases, then, these compounds can be viewed as compositional, albeit to a

lesser degree than more literal instances. Thus the compounds in (63) above, which I call

weakly compositional, may be opposed to the following cases, where the modifier is understood

literally[65]:

(64) a. H THAT RESEMBLES M ex. hippocampe-feuille, chou-fleur

b. H THAT SERVES AS M ex. avion-cargo, mémoire tampon

c. H THAT HAS M AS A PART ex. stylo-bille, montre-bracelet

My argument here is that compounds involving tropes are in fact compositional. It may be that

further distinctions are required to develop an even finer grained typology (i.e. which tropes are

present and whether one generates the other), but I will set aside these details so that my

typology might be as tractable as possible. The remaining step is to combine compositionality

and centricity so as to order them in terms of their effects on semantic transparency.

[65] Although it was argued that the compounds in (63) permit paraphrases using basic predication, they nevertheless involve some form of sense extension (e.g. metonymy: rouge qui ressemble à la couleur du sang). They may therefore be contrasted with the compounds in (64), which do not require such shifts (e.g. chou qui ressemble à une fleur).

4.2.2.1 Combining Compositionality and Centricity

The main issue with an approach based on different features or parameters is that it is seldom

clear how each of these properties should be weighted. At the top level, it is largely

uncontroversial to distinguish between endocentric and exocentric compounds, but once we

begin to examine how the levels within endocentrics should be organised, the picture is far less

clear. In other words, does compositionality have a greater or lesser effect on transparency than

the position and the strength of the head? Unfortunately, research in this area is too limited to

offer any clear solutions. The most obvious way to address this problem, which is to compare

compounds for each of the possible configurations and to determine which ones are more

transparent than others, is unfortunately circular in its reasoning: compound A is more

transparent than compound B, which means feature A is more transparent than feature B, which

in turn shows that compound A is more transparent than compound B. Ideally, compounds

involving various combinations of features would undergo testing with native speakers, which

would then allow us to determine which features seem to have the greatest effect on

interpretation. That said, because this work is exploratory in nature—which is to say that its goal

is to propose a typology of features that might be used for future research—comparing

compounds that fit the possible feature configurations suggested should, to some extent, still

allow for these features to be weighted. To this end, Table 4.2 below contains

compounds for each of the possible feature sets[66].

[66] A checkmark indicates the strong or positive parameter for a given feature. In the case of centricity, the distinction is between strongly and weakly endocentric compounds; for compositionality, fully compositional may be opposed to partially/weakly compositional.


Table 4.2. Possible combinations of compound features.

Centricity  Canonical  Compositional  Compound        Meaning
✓           ✓          ✓              stylo-bille     ‘stylo avec une bille’
✓           ✕          ✓              radio-taxi      ‘taxi qui utilise une radio’
✓           ✓          ✕              bateau mouche   ‘bateau pour touriste’
✓           ✕          ✕              aube-vigne      ‘vigne’
✕           ✓          ✓              pomme-cajou     ‘fruit du cajou’
✕           ✕          ✓              vidéo-lynchage  ‘enregistrement d’un acte répréhensible avec l’intention de le diffuser’
✕           ✓          ✕              ---             ---
✕           ✕          ✕              ---             ---

One thing to notice is that if a compound is weakly endocentric (i.e. the head involves a

metaphor or metonymy), its non-head element must contribute literally to the whole, as

evidenced by the absence of non-compositional combinations in the data. This may not hold for

a larger dataset, but for the moment, we may postulate that compositionality is highly dependent

on centricity.

Taking the compounds in the previous table and comparing them according to

opposing/alternate features results in pairs better suited to the evaluation of said features:

Compound 1   Compound 2     Contrasting Features
pomme-cajou  radio-taxi     Weak Centricity ~ Non-Canonical Head Position
pomme-cajou  bateau mouche  Weak Centricity ~ Partially Compositional
radio-taxi   bateau mouche  Non-Canonical Head Position ~ Partially Compositional

In the table above, the listed compounds may each be contrasted according to one negative or

weak feature. For instance, pomme-cajou is weakly endocentric, but both canonically headed

and compositional; radio-taxi, on the other hand, is non-canonically headed, but both strongly

endocentric and compositional. These two compounds, when opposed, allow us to compare the

semantic strength of the head relative to its position. If we judge radio-taxi to be less transparent

than pomme-cajou, then head position is most likely a stronger indicator of transparency than

the degree of endocentricity (in terms of the weak ~ strong distinction). The bolded compounds

in the table above are those I believe are most transparent within pairs, which suggests that the


degree of a compound’s transparency may be assessed according to the following hierarchy:

Head Position > Centricity Strength > Compositionality. This hierarchy also reflects the fact that

partial or weak compositionality cannot occur if the head is not understood literally. If we

compare compounds possessing a single positive or strong feature, two particular details arise,

as illustrated by the following table.

Compound        Canonical  Strong Endocentric  Compositional
---             +          −                   −
aube-vigne      −          +                   −
vidéo-lynchage  −          −                   +

First, the proposed hierarchy grants a left-headed, weakly endocentric, and weakly or partially

compositional compound greater transparency than it does other combinations, despite the fact

that no such compound is present in the data. This isn’t entirely problematic, given that if such a

compound existed, it would still rank lower than other canonically headed compounds.

Moreover, such a compound remains endocentric, albeit weakly, which explains how it might

be, all things being equal, more transparent than compounds like aube-vigne or vidéo-lynchage.

Second, the comparison also shows how tenuous the order is for centricity and compositionality,

as one could just as easily argue that non-compositionality has a greater effect on reducing

transparency than a figurative head constituent. As was mentioned earlier, the decision made

here is meant to illustrate how such an approach might work; further research might certainly

allow for properties to be weighted differently.
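One way to make the proposed hierarchy (Head Position > Centricity Strength > Compositionality) operational is to treat it as a strict lexicographic ordering over the three binary features of Table 4.2. The sketch below does so under that assumption. The class and function names are hypothetical, the feature values are copied from Table 4.2, and, as noted above, the relative order of centricity and compositionality is tentative, so the output should be read as an illustration of the method and not as a result.

```python
from typing import NamedTuple


class Compound(NamedTuple):
    form: str
    canonical: bool          # head in the predominant position for its type
    strong_centricity: bool  # strongly (True) vs weakly (False) endocentric
    compositional: bool      # fully (True) vs partially/weakly (False) compositional


# Feature values copied from Table 4.2.
COMPOUNDS = [
    Compound("stylo-bille", True, True, True),
    Compound("radio-taxi", False, True, True),
    Compound("bateau mouche", True, True, False),
    Compound("aube-vigne", False, True, False),
    Compound("pomme-cajou", True, False, True),
    Compound("vidéo-lynchage", False, False, True),
]


def transparency_key(c: Compound) -> tuple[bool, bool, bool]:
    """Order features as Head Position > Centricity Strength > Compositionality."""
    return (c.canonical, c.strong_centricity, c.compositional)


def rank(compounds: list[Compound]) -> list[str]:
    """Return forms from most to least transparent under the lexicographic key."""
    return [c.form for c in sorted(compounds, key=transparency_key, reverse=True)]


print(rank(COMPOUNDS))
```

Reordering the fields returned by transparency_key is all that is needed to test an alternative weighting, for instance one in which compositionality outranks centricity strength.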

Also related to this last point is how counter-intuitive it may seem to argue that compositionality

is less significant in terms of transparency effects than head position. After all, traditional

approaches to semantic transparency have often emphasized the semantic contribution of a

compound’s elements over other indicators. Yet, if we compare radio-taxi (right-headed and

compositional) and bateau mouche (left-headed and partially compositional), it is possible to

argue that canonical headedness is a stronger factor in meaning construal than compositionality

alone. This is perhaps not so surprising after all, if we understand compounds as inherently

ambiguous lexical items: most provide just enough information to understand what they might

mean, but do not convey crucial information necessary to fully understand them. A compound

with its head in canonical position allows speakers to establish just what it is the item is

referring to. Conversely, a non-canonically headed compound, when encountered for the first


time, may lead to the incorrect identification of its hypernym (e.g. radio in the case of radio-

taxi) and thus to an erroneous interpretation. This was in fact the argument offered in Section

4.1.5, where it was said that head position was a greater factor in transparency than the

distinction between strong and weak centricity.

Given these observations, we might suggest the following (partial) hierarchy of features, in

which the corresponding compounds, from left to right, go from most transparent to least

transparent.

Figure 4.4. Distribution of features for endocentric compounds.

[Figure 4.4, reconstructed as a tree; leaves are listed in the figure's original left-to-right order]

Endocentric
    Canonical
        Strongly Endocentric
            Fully Compositional: stylo-bille
            Weakly Compositional: carte soleil
            Partially Compositional: bateau mouche
        Weakly Endocentric
            Fully Compositional: pomme-cajou
            Weakly Compositional: ???
            Partially Compositional: ???

Given that no convincing examples were found for weakly endocentric compounds in which the

non-head element either involved a sense extension or contributed no meaning at all, we might

postulate that compositionality is constrained by the nature of the head itself. In other words, if

the head already involves a metaphor or metonymy, thus weakening its semantic transparency,

then the non-head constituent is likely to be understood literally. Interestingly, this constraint

may not be as strong for exocentric compounds as a few such cases were found in the data. The

compound trou-madame, for instance, which is a game in which players attempt to push small

balls into holes, is partially compositional, but only on the left-most constituent. It must be

noted, however, that such examples are not numerous, which suggests that compositionality

stemming from the non-head may depend on the semantic nature of the head. The issue is

fundamentally empirical in nature, which may be better addressed following the examination of

a much larger dataset.

4.2.3 Summary

A compound is said to be fully compositional if all constituents retain (and therefore contribute)

their individual meanings to the meaning of the whole; a compound is partially compositional if

the non-head does not contribute meaning to the whole. In cases where an established trope

targets the non-head element, we may say that the compound is weakly compositional.

Only when neither constituent retains meaning is a compound considered non-compositional.

Such a case is only possible for exocentric compounds. Although the relative importance of

each feature heretofore discussed is difficult to assess, it has been proposed that they may be

ordered in the following manner: Head Position > Centricity Strength > Compositionality. The

next section will explore how compounds with the same feature sets might be further contrasted

using similar lexicalized compounds as a point of comparison.
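The proposed ordering (Head Position > Centricity Strength > Compositionality) can be read as a lexicographic comparison. The following Python sketch is only an illustration of that reading: the names `Compound` and `transparency_key` and the numeric ranks are assumptions introduced here, and the feature values are those shown for the four examples in Figure 4.4.

```python
from dataclasses import dataclass

# Rank tables: a lower rank means "more transparent". The numbers themselves
# are arbitrary (an assumption of this sketch); only their order matters.
HEAD_POSITION = {"canonical": 0, "non-canonical": 1}
CENTRICITY = {"strong": 0, "weak": 1}
COMPOSITIONALITY = {"full": 0, "weak": 1, "partial": 2}


@dataclass(frozen=True)
class Compound:
    form: str
    head_position: str
    centricity: str
    compositionality: str


def transparency_key(c: Compound) -> tuple:
    """Lexicographic key: head position, then centricity, then compositionality."""
    return (
        HEAD_POSITION[c.head_position],
        CENTRICITY[c.centricity],
        COMPOSITIONALITY[c.compositionality],
    )


# Feature values as given for the examples in Figure 4.4.
examples = [
    Compound("pomme-cajou", "canonical", "weak", "full"),
    Compound("bateau mouche", "canonical", "strong", "partial"),
    Compound("stylo-bille", "canonical", "strong", "full"),
    Compound("carte soleil", "canonical", "strong", "weak"),
]

# Sort from most to least transparent under the proposed ordering.
ranked = [c.form for c in sorted(examples, key=transparency_key)]
print(ranked)
```

Because tuples compare element by element, a difference in centricity strength always outweighs a difference in compositionality, which is why pomme-cajou is ranked after bateau mouche in this sketch. Weighting the properties differently would amount to reordering the tuple.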

4.3 Semantic Homogeneity

Unlike other word formation processes, there are few restrictions on just what types of items

may be joined together to form a compound. Theories of word formation often invoke rules to

explain how new units are formed (cf. Aronoff 1976), most of which are highly productive

mechanisms that, based on some input, produce an output that is appropriate given the

parameters of the specified language. In other words, rules account for a language’s potential

words. While compounds are not free to involve just any item from the lexicon, the selectional

criteria that govern compounds are significantly less restrictive than they are for derivation.

At its simplest, compounding may be said to hinge on basic rewrite rules according to what

combinations of lexical categories are possible for a given language. This was Selkirk’s (1982)

approach, stating that, for instance, an adjectival compound in English consists of either a noun,

an adjective, or a preposition, followed by an adjective (i.e. A → {N, A, P} A) (16). As was

discussed in the previous chapter, similar rules may be stipulated for French (Zwanenburg 1992,

Fradin 2009). It is this particular fact that has allowed researchers to focus their efforts on

compounds involving only certain lexical categories (e.g. Noun-Noun, Verb-Noun, etc.). Once

we step away from the constraints imposed by lexical categories, however, there is little

preventing two words from being combined to form a compound. Of course, some words may

seem incompatible from a semantic perspective, but this sort of criterion often fails to truly

predict what might constitute a potential or impossible compound, especially those that are

coined spontaneously or that are context dependent (e.g. pumpkin bus, Downing 1977). One

might say that there are in fact few impossible compounds, which makes this particular type of

word-formation highly productive67. This is evident not only in the frequency of novel

compounds, but also in the variety of coined compounds used to name new products or

companies (Facebook, YouTube, SoundCloud, etc.).

Rather than talking about sub-categorization and base selection, many researchers instead

emphasize compounding as a process based on schemata. Such schemata are meant to give

compounds a fundamental frame from which one may analyze existing forms, as well as create

new ones. Examples are given below, from the least to the most specified:

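(65) a. Ten Hacken (1999): $[X\ Y]_Z$ or $[Y\ X]_Z$

b. Jackendoff (2010): $[F(\ldots, X_1, \ldots Y_2, \ldots)]$

c. Booij (2010): $[X_i\ Y_j]_{Yk} \leftrightarrow [\mathit{SEM}_j \text{ with some relation } R \text{ to } \mathit{SEM}_i]_k$

d. Bell and Schäfer (2013): $\lambda B\, \lambda A\, \lambda y\, \lambda x\, [A(x) \mathbin{\&} R(x,y) \mathbin{\&} B(y)]$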

In all instances, a compound is said to consist of two unspecified words68 and a function (as in

65b) or a relation (as in 65c-d) that links the two units together. Just what these relations or

functions may be is a discussion that will take place in the next chapter. For the moment, what

we may observe given these schemata is that compounds are complex units for which the

constituting elements are largely underspecified. For instance, while a word such as widity is

impossible because the affix “-ityN” selects a [+latinate] adjective as its base (Scalise and

Guevara 2005), no such constraint may be stipulated for compounds. If a language allows for

NN constructions, then presumably all nouns are available to the process of compound

formation.

67 Morphological productivity is a vigorously debated concept that I have chosen not to address here, but it seems justified to say that compounding is in fact a productive means of word-formation, given how many new words that enter the lexicon are compounds. Tulloch (1991), for instance, lists 1,950 new English words, 621 of which are compounds, by far the largest category of new entries (cited in Bauer 2001b).

68 It is largely understood that compounding is a recursive process that may thus consist of an unlimited number of words (Selkirk 1982, Lieber 1992). Most schemata may easily be expanded to account for this.

If compounds are unrestricted, then, what rules might govern their creation? After all, as

Štekauer (2005) puts it, “[n]ew naming units do not come into existence in a vacuum or

accidentally” (43). While speakers no doubt coin compounds by selecting lexical items relevant

(i.e. semantically related) to the things they aim to denote, is the process influenced by other

factors? In Štekauer’s (2005) onomasiological approach to word-formation, both lexical creation

and interpretation are governed by a number of different factors, including extra-linguistic

reality and speech community. We may also add the lexicon itself to these factors: speakers may

call upon their knowledge of other, similar compounds in order to produce new items in a

process called analogical word-formation (Bauer 1983). I would like to emphasize that if

analogy is at play in the creation of new forms, it is also likely involved in the interpretation of

novel combinations.

Although not strictly limited to compounding, analogical word-formation involves the creation

of new words patterned on existing words in the lexicon. According to Booij (2010), at its

extreme point, a word formed via analogy is an opaque construction: “For these [analogical]

words we can indeed point to one particular compound as the model word for the formation of

the new compound, and the meaning of this new compound is not retrievable without knowing

the (idiomatic) meaning of the model compound” (94). Such extreme cases may be based on

either already opaque compounds or ones in which one or more constituents have undergone a

significant shift in meaning post-formation. The following data from Ryder (1994) show

different examples of compounds formed via analogy, some of which might be viewed as more

reliant on the source construction than others.

(66) a. whitemail based on blackmail

b. ice legs based on sea legs

c. Iran-gate based on Watergate

In both (66a) and (66c), the source compounds are most likely synchronically opaque—it is

difficult to imagine that the hearer would understand, for instance, the meaning of whitemail if

he or she were not already familiar with blackmail. The pair of compounds in (66b) are perhaps

slightly less challenging, but again, it is entirely possible that understanding ice legs requires

that one know the meaning of sea legs.

The French data taken from Wiktionary contains a series of compounds that might also be said

to have been constructed on the basis of analogy69:

(67) a. seconde/minute/heure/jour/semaine/mois-lumière based on année-lumière

b. moto/vélo/bateau-école based on auto-école

Becker (1993), arguing in favour of analogical word-formation, states that “[a] compound like

firewoman ‘female fire fighter’ or frogwoman ‘female skin diver’ is not formed on the basis of

fire or frog and woman but on the basis of fireman and frogman. The constituents of firewoman

do not motivate the compound” (13). Becker refers to these particular cases as “replacive

compounds” and sets them apart from rule-based “derivational compounds.” The key

distinction is that replacive compounds typically involve items that share a paradigmatic

relationship. For instance, airman might be the replacive result of seaman as the non-head

constituents are both items in a paradigm (i.e. land, air, sea). By this measure, the French

compounds in (67) above are all replacive in nature and thus analogical. It is not clear, however,

if this paradigmatic criterion is in fact a steadfast requirement of analogical compounds, as the

lexemes involved in many N-gate compounds do not necessarily participate in any sort of

paradigm.

Perhaps crucially, and as Bauer (1983) underlines, word-formation based on analogy is not,

under most circumstances, as productive as rule-based formation, though this does not mean that

analogical frames are incapable of generating a large number of new forms:

“That is, following Thompson (1975: 347), a distinction is drawn between productivity and analogy. This does not preclude the possibility, of course, that an analogical formation will provide the impetus for a series of formations: this is presumably what happened in the case of formations in –scape, based on landscape, then an analogical formation seascape giving eventually a productive series including [. . .] cloudscape, skyscape and waterscape.” (Bauer 1983: 96)

69 This assertion is based on the fact that only année-lumière is listed in dictionaries like LPR2010 and TLFi, indicating that it is most likely the basis for the others mentioned. Likewise, LPR2010 only lists auto-école.

Ryder’s Iran-gate example in (66) above offers another case of analogical compound formation

that has become highly productive: N-gate is now the standard frame with which to introduce

new words denoting a scandal70. This view is somewhat tempered, however, if one understands

that this productivity is most likely the result of the lexeme gate having acquired the meaning of

‘political scandal’, thus rendering these cases as simple compounds coined according to regular

compounding rules. Despite the existence of productive templates, Bauer (1983) still hesitates

to grant analogical word-formation a very prominent place on his cline of morphological

productivity, stating that “[t]he limiting case of productivity at the lower end is presented by

analogy, where only one new form may exist” (100). This is in fact quite true. While affixes

seldom produce only one possible entry in the lexicon, an existing word might only produce one

new form via analogical means. Of the examples offered earlier, sea legs is unlikely to generate

a large number of related compounds.

Just where exactly analogy might fit into a theory of word-formation is certainly an on-going

matter of debate. Derwing and Skousen (1989), who argue in favour of analogical word-

formation from a cognitive perspective, offer ten main points upon which analogical and rule-

based theories may be opposed. They argue that while a rule-based model may allow a speaker

to store fewer lexical items, an analogy-based model results in far lower computational loads.

Bauer (2001b) also offers a good overview of the advantages and disadvantages of a theory of

word formation based on analogy. Some of the objections are that analogy fails to predict

potential words and that it is not sufficiently restrictive. While there is no doubt that these

objections are tenable in the case of word-formation involving bases and affixes, it is unclear if

they remain as strong for compounding. Bauer (2001b) also lays out a number of arguments in

support of analogy, notably how it may account for irregularities in word-formation (i.e.

multiple, different nominalisations for the same verbal base). In the case of compounding,

analogy therefore explains compounds that are otherwise entirely opaque (e.g. whitemail and

greymail). Bauer concludes that the two mechanisms no doubt co-exist, each supplementing the

other. This stance is similar to the one held by Booij (2010), for whom word formation is both a

matter of analogy and schema.

70 Wikipedia, at the time of writing, lists more than one hundred such compounds. While some of these cases may not be widely used, their number no doubt lends support to the argument that analogy may prove to be productive.

Whether the process of analogical word-formation is more effective or better captures the

mechanisms in lexical creation than rules goes beyond the scope of this work. No matter what

the theoretical implications are, however, it is undeniable that some lexemes are created based

on existing forms in the language and that this process also applies to compounds. The question

is therefore whether analogy can be executed in reverse. In other words, if a compound can be

created analogically, might it also be interpreted this way, which is to say, might a hearer rely

on his or her knowledge and understanding of similar forms to interpret new ones? Van

Jaarsveld et al. (1994) describe this hypothetical process as follows:

“To determine whether some lexicalized compound is a suitable model, its semantic representation will have to be retrieved. The relation specified between the nouns in the lexicalized compound will be applied to the novel compound (that is the core of the analogous interpretation) and the result of this process will be evaluated with respect to meaningfulness. When the outcome of this evaluation process is unsatisfactory (according to some criterion), the process will be repeated for some other lexicalized compound. When the outcome is, however, satisfactory, the nouns of the novel compound will be related in the same way as the nouns for the lexicalized compound and interpretative processing of the novel compound will stop.” (116-117)
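The procedure described in this passage is essentially a loop over candidate model compounds that stops at the first satisfactory outcome. The Python sketch below is one possible rendering under stated assumptions: the function name `interpret_by_analogy`, the relation labels, the toy candidates, and the `is_meaningful` judgement are all hypothetical stand-ins, since the quoted description leaves the evaluation criterion open.

```python
from typing import Callable, Iterable, Optional, Tuple

# A lexicalized compound is modelled here as (modifier, head, relation).
LexicalizedCompound = Tuple[str, str, str]


def interpret_by_analogy(
    novel: Tuple[str, str],
    candidates: Iterable[LexicalizedCompound],
    is_meaningful: Callable[[str, str, str], bool],
) -> Optional[str]:
    """Try candidate models in order; return the first relation judged meaningful."""
    modifier, head = novel
    for _model_modifier, _model_head, relation in candidates:
        # Apply the model's relation to the novel compound and evaluate the result.
        if is_meaningful(modifier, relation, head):
            return relation  # satisfactory outcome: interpretative processing stops
    return None  # no lexicalized compound provided a suitable model


# Toy data (hypothetical): relation labels and judgements are invented.
models = [("lunch", "pause", "FOR"), ("breathing", "pause", "IN")]


def toy_judgement(modifier: str, relation: str, head: str) -> bool:
    """Stand-in for the evaluation of meaningfulness."""
    return (modifier, relation, head) == ("coughing", "IN", "pause")


print(interpret_by_analogy(("coughing", "pause"), models, toy_judgement))
```

In this toy run the first model is rejected and the second is accepted, so the relation borrowed from the second model is returned. The order in which candidates are tried is left to the caller, as the quoted passage does not specify it.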

Similarly, Derwing and Skousen (1989), discussing Skousen’s (1989) parallel work, use the

terms supracontextual homogeneity and random selection in the context of interpretation via

analogical means, stating that “if the given context does not lead to a single, definitive solution

in the lexicon, a range of surrounding supracontexts is explored until a point of supracontextual

heterogeneity, explicitly defined, is reached; a random choice is then made from among the set

of possible analogical examples made available by the search” (64). These mechanisms describe

meaning resolution via an evaluation of similar forms present in the speaker’s lexicon.

Presumably, it is up to speakers to determine if a particular meaning is valid using other

(extralinguistic) means.

Unfortunately, experiments meant to verify to what extent analogy influences a speaker’s

processing of compounds have produced mixed results. Van Jaarsveld and Rattink (1988), for

instance, tested the effects of lexical frequency on compound processing by Dutch speakers and

concluded that the existence (and availability) of a lexicalized form could serve as the basis of

interpretation for a novel compound. Clark and Berman (1987), however, in their work on

children’s understanding and production of novel compounds in Hebrew, found no such

influence in paraphrasing tasks: they obtained similar results for tests using compounds based

on low frequency heads (i.e. heads that did not appear frequently in lexicalized compounds) and

those based on high frequency heads, concluding “that knowledge of the pertinent lexical items,

and not the constructions they appear in, is more important for compounding.” (Clark and

Berman 1987: 560). Later experiments by Van Jaarsveld et al. (1994) lend additional support

for these findings, though not without some measure of nuance. In their first experiment

involving a lexical decision task, participants responded faster overall to novel compounds

containing nouns also present in a large number of lexicalized compounds. They did not,

however, find that reaction times were affected by a novel compound’s degree of

interpretability, which was determined beforehand by other participants who rated each novel

compound on a 7-point scale, ranging from very difficult to interpret to very easy to interpret. It

is unclear whether this method of determining interpretability might have had some effect on the

absence of interaction between set-size and interpretability, but it does suggest that analogy may

have a limited effect on how a speaker judges a compound’s interpretability. In their second

experiment, Van Jaarsveld et al. compared novel compounds that share a noun with a lexicalized

compound but differ in how semantically related their other constituent is. For instance, the first word of the

novel compound coughing pause is semantically related to the first word of the lexicalized

compound breathing pause, whereas no such relation exists between the latter and the novel

compound bay pause. Such pairs were constructed for both high and low productive sets of

lexicalized compounds. They found that, in a prime-target lexical decision task, participants’

reaction times were lower for semantically related pairs, but did not differ significantly between

low and high frequency lexicalized compounds. Based on the results for both experiments, Van

Jaarsveld et al. (1994) conclude that it is unlikely that individuals make use of existing

compounds when interpreting novel ones. Despite this conclusion, they concede that their

results nevertheless suggest that lexicalized compounds are being activated at some level.

In Van Jaarsveld et al.’s (1994) investigation of analogical effects on compound processing, it

was understood that sets were constructed based on a given noun’s overall frequency of

appearance in compounds. Their second experiment, which might be viewed as most relevant to

the question of analogical compound interpretation, involved single existing compounds as

targets, selected based on whether the head was of a high frequency or low frequency in their

database. What this does not address, however, is whether these sets of existing compounds

were semantically homogeneous. In other words, while speaker reaction times may not have

been influenced by high or low frequency heads, it is entirely possible that the high frequency

heads belonged to a set of compounds that differed greatly in meaning. Ryder’s (1994) work on

how speakers interpret novel compounds addresses this particular point. She argues that certain

words not only participate in a large number of existing compounds, but the degree to which

these sets are semantically homogeneous will influence how a speaker will interpret a novel form

involving those same nouns. This differs from Van Jaarsveld et al.’s (1994) assumption that

“initial activation of lexicalized compounds will be independent of characteristics of the whole

set” (116), which led them not to take into account the semantic homogeneity of a particular

group of lexicalized compounds.

The premise behind Ryder’s (1994) work is that speakers do in fact use their knowledge of

existing compounds to determine the meaning of novel ones, based on what she calls linguistic

templates. Ryder refers to a particular template as an analogy base. According to Ryder,

analogy bases may consist of templates based on groups of compounds that share a common

element. Thus, sea-N (e.g. sea lion, seaman, sea cow, seaweed, etc.) and N-house (e.g.

boathouse, warehouse, treehouse, firehouse, etc.) are both potential linguistic templates. The

words at the heart of such templates (i.e. sea and house respectively) are called core words and

are the key to how analogy is applied (see also Becker 1993 for a similar approach).

Ryder uses what Bates and MacWhinney (1987) call cue reliability to assess just how

semantically influential a template can be. Bates and MacWhinney define the notion as “a ratio

of cases in which a cue leads to the correct conclusion, over the number of cases in which it is

available” (164). Using this approach as her starting point, Ryder distinguishes between two

types of cue reliability (1994: 81-82):

(68) i. absolute cue reliability: “core words contribute the same meaning regardless of what

they are paired with”

ii. relative cue reliability: “while one cannot predict the meaning of the compound just

from the presence of the core word, the conjunction of the word with a certain semantic

class of other words produces highly reliable results”

One can imagine that cue reliability is more likely to be relative than absolute as few core words

(or templates) would participate in a set of compounds that all share a common semantic thread.

Ryder offers box as an example of a core word with a high cue reliability as most compounds

seem to involve a container-contained relationship, but it is not difficult to think of a number of

compounds that do not (e.g. boom box, music box, signal box, etc.). Moreover, a high cue

reliability is no doubt intimately linked to a core word’s semantic representation. As Ryder

(1994) says of her example, “Box has a high cue reliability because it has a highly central

schema with a very salient slot for the item to be put in the box, and naturally this is the schema

that has been used in almost all X + box compounds.” (145).
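Bates and MacWhinney's ratio can be made concrete with a small calculation. The Python sketch below is illustrative only: the function name `cue_reliability` is introduced here, and the annotation of the N + box compounds is a toy sample (several forms are invented for the illustration and are not Ryder's data), so the resulting figure says nothing about the actual reliability of box.

```python
from typing import Sequence


def cue_reliability(outcomes: Sequence[bool]) -> float:
    """Ratio of cases where the cue led to the correct conclusion
    over the number of cases in which the cue was available."""
    if not outcomes:
        raise ValueError("the cue must be available in at least one case")
    return sum(outcomes) / len(outcomes)


# Toy annotation (hypothetical): True if the compound has the
# container-contained reading associated with the core word "box".
box_compounds = {
    "sugar box": True,
    "shoe box": True,
    "tool box": True,
    "lunch box": True,
    "pencil box": True,
    "boom box": False,
    "music box": False,
    "signal box": False,
}

reliability = cue_reliability(list(box_compounds.values()))
print(reliability)  # 0.625 for this toy sample
```

A relative cue reliability could be approximated in the same way by restricting the sample to compounds whose other constituent belongs to a given semantic class before computing the ratio.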

Ryder’s approach is echoed in Štekauer’s (2005) work on meaning predictability, in which

analogy is one of twelve factors that, together, influence the meaning predictability of a novel,

context-free unit. Štekauer’s experiments revealed that participants’ interpretations of novel

compounds were occasionally influenced by the existence of similar compounds, which led

them to offer meanings based on analogical templates. Štekauer also found, however, that, while

these “templates [were] insufficient to recognise the subtle shades of individual readings” (258),

existing forms could either have a “boosting” effect on meaning predictability or, when there

was no possibility to interpret the word using an existing template, reduce that unit’s

meaning predictability rate. In either case, the existence of similar lexical items is said to have

an influence on the interpretation of a novel one.

Similarly, Baroni et al. (2007), also looking at the potential effects of lexicalized compounds on

the processing of new ones, propose what they call Lexicalized Interpretation Schema (LIS),

which is “an abstract constructional pattern [. . .] shared by all members of the same compound

family” (273). Their approach is largely based on results from Baroni et al. (2006) in which they

“found a strong tendency for the same heads and modifiers to be repeatedly used within the

sample of compounds from all frequency ranges they analyzed” (reported in Baroni et al. 2007:

279). Where Ryder spoke of core words, Baroni et al. talk about pivots, which may either be the

head or the modifier for a given compound. What distinguishes their model from Ryder’s,

however, is that they claim that a compound’s pivot is governed by the type of compound in

which it is found. Thus, according to Baroni et al., relational compounds (e.g. sugar box) will

involve an LIS based on the head as its pivot (i.e. X box schema), whereas attributive

compounds (e.g. feather luggage) will have an LIS with the modifier as its pivot (i.e. feather X).

What their model predicts is that novel compounds for which the pivot is retained will be easier

to understand than those in which it is changed. For instance, for an attributive compound such

as feather luggage, we might generate either wing luggage (pivot substituted) or feather trolley

(pivot retained), the result of which is that the latter will be easier to understand than the former.

Moreover, they predicted that under certain circumstances, a semantically related lexeme might

be substituted for the pivot without negatively impacting the resulting compound’s ease of

interpretability. They tested speaker acceptability judgements for 380 analogically constructed

novel compounds and found that their predictions were largely borne out. The results are

summarized in the following table:

                         | Relational (e.g. city centre)    | Attributive (e.g. zebra pot)
Substitution of head     | Only a semantically related word | Any word
Substitution of modifier | Only a semantically related word | No substitution permitted

Broadly speaking, substituting either component of a relational compound with anything but a

semantically similar noun will result in an unacceptable combination. For attributive

compounds, however, only the head may be substituted, but the new noun need not be

semantically related to the original.
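The pattern summarized above can be restated as a small decision function. This Python sketch simply encodes the table as described in the text; the function name `substitution_acceptable` and its string arguments are hypothetical conveniences, not part of Baroni et al.'s model.

```python
def substitution_acceptable(compound_type: str, slot: str, semantically_related: bool) -> bool:
    """Return True if replacing `slot` is predicted to yield an acceptable compound.

    compound_type: "relational" or "attributive"
    slot: "head" or "modifier" (the constituent being replaced)
    semantically_related: whether the new word is related to the one it replaces
    """
    if slot not in ("head", "modifier"):
        raise ValueError(f"unknown slot: {slot!r}")
    if compound_type == "relational":
        # Either constituent may be replaced, but only by a related word.
        return semantically_related
    if compound_type == "attributive":
        # Only the head may be replaced; relatedness is not required.
        return slot == "head"
    raise ValueError(f"unknown compound type: {compound_type!r}")


# feather luggage -> feather trolley: head replaced in an attributive compound
print(substitution_acceptable("attributive", "head", semantically_related=False))
# feather luggage -> wing luggage: modifier (the pivot) replaced
print(substitution_acceptable("attributive", "modifier", semantically_related=True))
```

The first call prints True and the second False, matching the feather trolley and wing luggage examples discussed earlier.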

Most of the work on the role of analogy in compound interpretation has focused on entirely

novel constructions. The question here is whether the concept can also be applied to existing

compounds. I believe that it can. The premise that might allow for analogy to be factored into

evaluating the transparency of existing compounds is in fact quite simple. Semantic

transparency was defined in Chapter 2 as a compound’s degree of interpretability in the absence

of prior knowledge of that particular combination. In other words, semantic transparency applies

to existing compounds as if they were novel, which is to say that the speaker, being familiar

with only its constituents, must attempt to establish meaning using the same methods he or she

would use if the compound had just been coined. Knowledge of existing compounds may factor

into this process regardless of a compound’s status as either novel or established. Thus, if a

particular compound happens to share its template with a number of other compounds, the

interpretation of said compound may be influenced by how semantically uniform that template

is. Moreover, if a given template is highly homogeneous, compounds that are based on that template but

do not involve the same meaning may be harder to understand than those that do. In some

ways, this approach is similar to theories involving family size, which is said to influence

processing of morphologically complex words (Schreuder and Baayen 1997, Bertram et al.

2000, De Jong et al. 2002).

4.3.1 Semantic Reliability Index

In order to offer a point of comparison between similar compounds, I will call the semantic

reliability index (henceforth SRI) the measure of a template’s semantic homogeneity in relation

to a particular compound. A template is to be understood as a specific lexeme alongside a

lexical category (e.g. sea N, cf. Ryder 1994). Thus, a compound’s SRI is calculated by dividing

the number of semantically similar compounds sharing the same template by the total number of

compounds for that template:

(69) $\text{Compound SRI} = \dfrac{\text{\# of semantically similar compounds based on template } T}{\text{\# of total compounds based on template } T}$

The concept of SRI, while largely based on Ryder’s work, also bears some resemblance to

Baroni et al.’s (2007) LIS71. Moreover, the calculation proposed above is also similar to

Štekauer’s (2005) calculation of a unit’s predictability rate, which is calculated by dividing the

number of participants who judged a particular meaning acceptable by the total number of

participants; this number is then multiplied by the quotient obtained from dividing the sum of all

rating points assigned by participants by the total number of points possible72. Thus, when

Štekauer tested the compound baby book, he found that 38 of his 40 participants found ‘a book

for babies’ acceptable with a points tally of 306 out of 400, resulting in a predictability rate of

0.727. Similarly, the SRI calculation proposed here produces results anywhere between 0.001

and 1.000.
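The arithmetic behind the baby book example can be checked directly. The following Python sketch follows the description of the predictability rate given above; the function and parameter names are assumptions made for the illustration.

```python
def predictability_rate(accepting: int, participants: int, points: int, max_points: int) -> float:
    """Share of accepting participants multiplied by the share of rating points awarded."""
    if participants <= 0 or max_points <= 0:
        raise ValueError("participants and max_points must be positive")
    return (accepting / participants) * (points / max_points)


# baby book, reading 'a book for babies': 38 of 40 participants, 306 of 400 points
rate = predictability_rate(38, 40, 306, 400)
print(round(rate, 3))  # 0.727
```

Here 38/40 = 0.95 and 306/400 = 0.765, whose product is 0.72675, which rounds to the reported 0.727.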

71 The SRI is also similar to Gagné and Shoben’s (1997) strength ratio within their CARIN theory, as well as to the relational measures proposed in Pham and Baayen (2013).

72 These points refer to a scale used by participants to evaluate a given meaning’s acceptability (i.e. 1 is least acceptable, while 10 is most acceptable). Presumably, if no such scale were used (in a yes or no solicitation task, cf. Baroni et al. 2007), Štekauer’s predictability rate would simply involve the number of acceptable responses divided by the total number of responses.

But just what does “semantically similar” in the equation above mean? Given that compounds

are often characterized as a pair of nouns related to each other in some way, we may want to

judge similarity based on the relation present for a given compound. This approach is backed up

by experimental work by Gagné and Shoben (1997), as well as Gagné (2001), who found that

participants’ reaction times in lexical decision tasks were affected by the relative frequency of a

modifier’s relational history. In other words, participants found it easier to judge novel noun

pairs when the most likely relation was also the modifier’s most frequent relation for existing

compounds. For instance, compounds containing mountain as a modifier were easier to interpret

if they involved a locative relation (e.g. mountain cloud) than a less frequent association for that

particular lexeme (e.g. ABOUT: mountain magazine). Thus, the SRI proposed here is a simple

means to assign a number to a compound based on how likely its meaning is given other similar

forms in the lexicon. It must be stressed, however, that taken on its own, the SRI is most likely

not a very strong indicator of semantic transparency as it is highly dependent on the number of

compounds available for a particular template, which will no doubt vary greatly from one

construction to another. Furthermore, individual speakers will not all draw from the same set of

compounds for a given template, a fact that supports a wholly “speaker-dependent” SRI.

In order to further illustrate how the SRI might be applied to a set of compounds, all NN and N

à N combinations in my data were examined and grouped together according to how often either

of their constituent lexemes occurred. Unfortunately, the number of high frequency compound

patterns identified in the data collected from Wiktionary is quite low. This is no doubt due to the

small sample size of the items retained (729 NN and 319 N à N compounds). The following

table contains the number of patterns and tokens in which one constituent recurs at least 4 times:

Table 4.3. Number of templates and tokens found in the data.

| Compound Type | N1 ≥ 4 occurrences | N2 ≥ 4 occurrences |
|---|---|---|
| N1(-)N2 (729 compounds) | 32 patterns, 165 items | 15 patterns, 83 items |
| N1 à N2 (319 compounds) | 15 patterns, 80 items | 4 patterns, 23 items |

Despite the small number of templates identified, a few facts do surface regarding French

compounds. First, the number of templates based on N1 is at least twice that of those based on


N2. We may argue that this is due to French’s preference for left-headed compounds, which

shows that templates typically favour the head constituent. Second, in terms of proportion, the

difference in numbers between the two types of compounds is not actually that great: the 165 items

for N1 X templates account for approximately 23% of NN compounds, while the 80 items

identified for N1 à X templates account for approximately 25% of N à N compounds. It would

seem then, that the distribution of compounds within recurring templates is similar for both

types.

Because the actual number of tokens for a given template is in fact quite low (i.e. the highest

number of compounds for either NN or N à N templates is 11), a further analysis was conducted

on a larger set drawn from traditional dictionary entries, namely LPR2010. Two different templates

were used.

The first example template groups together a set of NN compounds formed around the noun

papier in head position (i.e. papier-N). This also happens to be the most frequent pattern in my

own data with a total of 11 compounds. Arnaud (2003), on the other hand, lists 25 such

compounds. Looking through the entry for papier in LPR2010, we find a total of 24 NN

compounds involving this particular lexical unit (though they don’t all coincide with Arnaud’s

list). These compounds are listed in Table 4.4 below and are grouped together

according to their shared meaning.[73]

Using these compounds, the average SRI[74] of the template papier-N is 0.083, which suggests

that it possesses very low semantic uniformity. The number in fact reflects the wide range of

possible meanings a papier-N compound might have. On the one hand, this number may be

used as a comparative measurement of semantic homogeneity between different templates, and

on the other, it may also serve as an anchor point for defining semantic homogeneity within the

template itself. In other words, a compound that has an above average SRI no doubt represents

[73] The paraphrases retained here are based on the definitions provided by LPR2010 and are necessarily specific (e.g. essuyer, emballer, etc.) so as to attribute greater importance to the semantic homogeneity of the retained templates. Compounds are considered semantically similar if they allow for the same basic paraphrase to be used. In the next chapter, many of these compounds are subsumed under identical relations, which would then (slightly) increase the calculated SRI.

[74] The average SRI of a template is the sum of each type’s SRI, divided by the number of types for that template.


the most likely meaning if interpretation does in fact activate other lexicalized forms. In the case

of papier-N, four meanings in particular stand out (given at the top of Table 4.4).

Table 4.4. List of papier-N compounds taken from LPR2010 under the entry for papier.

| papier-N | Approximate Meaning | Compound’s SRI |
|---|---|---|
| crépon, cristal, vélin, pelure | Papier qui rappelle N | 0.167 |
| chine, hollande, japon | Papier fabriqué à/en N (de style N) | 0.125 |
| carbone, émeri, toile | Papier ayant N en tant que partie | 0.125 |
| bristol, kraft, buvard | Papier de type N | 0.125 |
| main, cul | Papier pour essuyer N | 0.083 |
| bible, journal | Papier qui fait partie de N | 0.083 |
| filtre, monnaie | Papier servant de N | 0.083 |
| toilette | Papier utilisé de façon quelconque pour N | 0.042 |
| aluminium | Papier composé de N | 0.042 |
| calque | Papier pour produire N | 0.042 |
| cadeau | Papier pour emballer N | 0.042 |
| ministre | ? | 0.042 |
| Average SRI of Template | | 0.083 |

Comparing the above template to pompe à N, however, shows what a relatively homogeneous

set of compounds might look like. The LPR2010 lists 17 such constructions under the lemma

pompe, all of which are listed in the following table, and again grouped together based on their

shorthand periphrasis.

Table 4.5. List of pompe à N compounds taken from LPR2010 under the entry for pompe.

| pompe à N | Approximate Meaning | Compound’s SRI |
|---|---|---|
| eau, huile, gazole, insuline, morphine, essence | pompe destinée à déplacer (pomper) N | 0.353 |
| piston, bras, levier, roue, moteur | pompe ayant N comme élément constitutif | 0.294 |
| injection, chaleur | pompe fonctionnant à l’aide de N | 0.118 |
| vide, fric | pompe utilisée pour produire N | 0.118 |
| vélo/incendie | pompe utilisée de façon quelconque pour N | 0.059 |
| Average SRI of Template | | 0.246 |
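The arithmetic behind Tables 4.4 and 4.5 can be sketched in Python as follows. The groupings are copied from the two tables, while the function names and the dictionary layout are illustrative assumptions. One further assumption is flagged in the code: vélo and incendie are treated as two separate one-member meanings, as their SRI of 0.059 (i.e. 1/17) suggests. Averaging each compound's SRI over the 17 compounds reproduces the 0.246 given for pompe à N; the 0.083 given for papier-N matches instead the mean taken over its twelve meaning groups, so the sketch computes both quantities.

```python
from typing import Dict, List

# meaning paraphrase -> modifiers (N) attested with that meaning
Groups = Dict[str, List[str]]

POMPE_A_N: Groups = {
    "pompe destinée à déplacer (pomper) N": ["eau", "huile", "gazole", "insuline", "morphine", "essence"],
    "pompe ayant N comme élément constitutif": ["piston", "bras", "levier", "roue", "moteur"],
    "pompe fonctionnant à l'aide de N": ["injection", "chaleur"],
    "pompe utilisée pour produire N": ["vide", "fric"],
    # Assumption: the two 'used in some way for N' items count as distinct meanings.
    "pompe utilisée de façon quelconque pour N (vélo)": ["vélo"],
    "pompe utilisée de façon quelconque pour N (incendie)": ["incendie"],
}

PAPIER_N: Groups = {
    "Papier qui rappelle N": ["crépon", "cristal", "vélin", "pelure"],
    "Papier fabriqué à/en N": ["chine", "hollande", "japon"],
    "Papier ayant N en tant que partie": ["carbone", "émeri", "toile"],
    "Papier de type N": ["bristol", "kraft", "buvard"],
    "Papier pour essuyer N": ["main", "cul"],
    "Papier qui fait partie de N": ["bible", "journal"],
    "Papier servant de N": ["filtre", "monnaie"],
    "Papier utilisé de façon quelconque pour N": ["toilette"],
    "Papier composé de N": ["aluminium"],
    "Papier pour produire N": ["calque"],
    "Papier pour emballer N": ["cadeau"],
    "?": ["ministre"],
}


def compound_sris(groups: Groups) -> Dict[str, float]:
    """SRI of every compound: size of its meaning group / size of the template."""
    total = sum(len(members) for members in groups.values())
    return {n: len(members) / total
            for members in groups.values() for n in members}


def mean_over_compounds(groups: Groups) -> float:
    """Average of the individual compounds' SRIs."""
    sris = compound_sris(groups)
    return sum(sris.values()) / len(sris)


def mean_over_meanings(groups: Groups) -> float:
    """Average of the SRIs taken once per meaning group."""
    total = sum(len(members) for members in groups.values())
    return sum(len(members) / total for members in groups.values()) / len(groups)


print(round(mean_over_compounds(POMPE_A_N), 3))  # 0.246
print(round(mean_over_meanings(PAPIER_N), 3))    # 0.083
print(round(mean_over_compounds(PAPIER_N), 3))   # 0.104
```

Whichever averaging convention is adopted, it should be applied uniformly when templates are compared, since the two means diverge as group sizes become more uneven.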


Taken as is, compounds constructed on the pattern pompe à N may possess one of five possible

meanings, with an average SRI of 0.246, much higher than that of papier-N. In this regard, we

may state that pompe à N is more semantically homogeneous than papier-N. Moreover, both

‘pump that pumps N’ and ‘pump with N as a part’ have a higher than average SRI, suggesting

that these meanings are dominant for this particular template. Activation of lexicalized

compounds during processing might favour these particular meanings.[75] In the case of the most

frequent meaning, I would suggest that this is due to the lexeme pompe’s inherently salient

function, which is a result of its artefactual nature. This will be discussed in greater detail in

chapters 5 and 7.

If we return to the data collected from Wiktionary, results for this particular analysis show a

great deal of variance. Table 4.6 below contains the 14 patterns in which the

left-most constituent occurs at least five times,[76] ordered according to their average SRI (the

row marked with an asterisk indicates a right-headed template).

At first glance, NN compounds containing the same leftmost component (i.e. the semantic head)

seem to consist of a number of highly homogeneous templates. While this may be an indication

of a great deal of homogeneity across NN compounds, it is more likely a consequence of the low

number of tokens present in the data, which seems all the more plausible given the low SRIs

calculated above for papier-N and pompe à N, each of which contained a much greater number

of tokens. It is also worth noting that one of the templates in Table 4.6 is in fact right-headed:

vidéo-N is based on the modifier as the core word, with four out of five compounds meaning ‘N

that uses video’ (e.g. vidéo-protection).

[75] This statement must, however, be hedged, as other factors are no doubt involved. As discussed earlier, meaning selection for a novel compound based on analogy would still be required to pass a felicity test (i.e. it is unlikely, for instance, that a speaker unfamiliar with pompe à incendie (‘fire pump’) would interpret it as ‘pump to pump fires’ given the semantic incompatibility such an interpretation would produce).

[76] Earlier, in Table 4.3, all patterns in which either constituent occurred at least 4 times were retained. In the interest of space, only those compounds in which the core word appears at least 5 times are presented above.


Table 4.6. NN compounds with average template SRI based on the left-most constituent.

| Template X-N | # of types | Average SRI | Most Frequent Meaning | Example |
|---|---|---|---|---|
| poids | 5 | 0.680 | poids de N | poids coq |
| vidéo* | 5 | 0.680 | N qui utilise vidéo | vidéo-protection |
| singe | 5 | 0.520 | singe qui ressemble à un N | singe-chouette |
| wagon | 7 | 0.510 | wagon qui a un N en tant que partie | wagon-citerne |
| mot | 5 | 0.440 | mot qui fonctionne comme un N | mot-outil |
| carte | 8 | 0.375 | carte qui sert de N | carte-cadeau |
| voiture | 7 | 0.347 | voiture dans laquelle il y a un N | voiture-bar |
| radio | 7 | 0.306 | radio qui est aussi un N | radio-gramophone |
| café | 7 | 0.266 | café qui est aussi un N | café-bar |
| poisson | 7 | 0.225 | poisson qui ressemble à un N; poisson qui a un N en tant que partie | poisson-chat; poisson-épée |
| chou | 6 | 0.222 | chou qui ressemble à un N | chou-fleur |
| bateau | 8 | 0.219 | bateau qui sert de N | bateau-bus |
| chien | 5 | 0.200 | --- | --- |
| papier | 11 | 0.124 | papier destiné à N | papier toilette |

As for templates constructed using the right-most constituent (i.e. the modifier), the average SRI

ranges from very high to absolute. Again, the following table contains all templates for which

the second constituent occurs at least five times (the row marked with an asterisk indicates that the template is

right-headed):

Table 4.7. NN compounds with average template SRI based on the right-most constituent.

| Template N-X | # of types | Average SRI | Most Frequent Meaning | Example |
|---|---|---|---|---|
| alpha | 6 | 1.000 | N de type/catégorie alpha | particule alpha |
| lumière | 6 | 1.000 | distance parcourue par la lumière en un N | année-lumière |
| garou | 11 | 0.835 | N qui est un garou | loup-garou |
| mère | 12 | 0.680 | N qui sert de mère | bateau-mère |
| gamma | 5 | 0.680 | N de type/catégorie gamma | particule gamma |
| tampon | 5 | 0.680 | N qui sert de tampon | mémoire tampon |
| école* | 6 | 0.611 | école où on apprend N | auto-école |


There seem to be two reasons for the high semantic homogeneity observed for modifier-based

templates. First, as Baroni et al. (2007) suggest, templates may be influenced by the type of

compound favoured by certain lexemes, which is to say that attributive compounds will most

likely favour the modifier as their pivot. Most of the recurring modifiers in the table above involve

compounds in an attributive (or attributive-like) association (e.g. N-mère, N-alpha, N-tampon,

etc.). If the modifier is meant to ascribe a particular characteristic to the head noun and this

characteristic is in fact a stable component of the modifier, then compounds involving this

modifier are likely to possess the same attributive meaning (e.g. N-mère = ‘N that is like a

mother’). This differs from relational compounds in that the relation that arises may not be a

fixed feature of either constituent (see, for example, the compounds based on papier-N).

Second, many of the compounds occurring within the templates above are in fact related to each

other via analogy, suggesting that analogical word-formation need not favour the head.

Compounds such as N-lumière (discussed earlier) and N-garou are both templates for which a

source compound can be identified (année-lumière and loup-garou respectively). These

instances call back to Becker’s (1993) comments regarding replacive (i.e. analogical)

compounds, in which he argues that the elements susceptible to substitution form a paradigm

(e.g. N-lumière, where N is a standard measurement of time). N-école, also mentioned earlier as

a case of analogically formed compound, differs in that it is mostly right-headed (4 of 6 tokens),

which means that the core word is in fact the head constituent. Meaning across these right-headed

items is consistent: école où on apprend à faire N (‘school where one learns to do N’).

Although this might suggest that this pattern has an absolute SRI, the truth is that there is

interference from left-headed compounds that fit the same pattern and which obviously do not

share the same meaning:

(70) a. bateau-école, auto-école, moto-école, vélo-école (‘école où on apprend à faire N1’)

     b. navire-école, croiseur-école (‘N1 qui est une école’)

It is my contention that SRIs are to be calculated using all compounds that fit a given template,

regardless of headedness, so as to account for any interference that might occur at the level of

interpretation. If the speaker, when presented with a new compound, attempts to interpret it

using existing compounds, he or she must do so by evaluating all items for a given pattern. The


process is most likely executed using the dominant head position first, but may then require that

the other head be considered if either position is possible for a particular template (i.e. N-école).
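The effect of pooling both head positions can be illustrated with the six N-école compounds in (70). The sketch below is in Python; the helper `sri_within` and the dictionary layout are illustrative assumptions, and only the two paraphrases given in (70) are used as meaning groups.

```python
from typing import Dict, List

# meaning paraphrase -> first constituents (N1) attested with that meaning, as in (70)
N_ECOLE: Dict[str, List[str]] = {
    "école où on apprend à faire N1": ["bateau", "auto", "moto", "vélo"],  # right-headed (70a)
    "N1 qui est une école": ["navire", "croiseur"],                         # left-headed (70b)
}


def sri_within(groups: Dict[str, List[str]], meaning: str, pool: List[str]) -> float:
    """SRI of `meaning` when only the meaning groups named in `pool` are counted."""
    total = sum(len(groups[m]) for m in pool)
    return len(groups[meaning]) / total


school = "école où on apprend à faire N1"
ship = "N1 qui est une école"

# Right-headed items considered on their own: the meaning looks absolute.
print(sri_within(N_ECOLE, school, [school]))                  # 1.0
# All items fitting N-école, regardless of headedness.
print(round(sri_within(N_ECOLE, school, [school, ship]), 3))  # 0.667
print(round(sri_within(N_ECOLE, ship, [school, ship]), 3))    # 0.333
```

Counting every compound that fits the surface pattern therefore lowers the index of the dominant meaning, which is the interference described above.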

If we consider N à N compounds, we find only seven templates based on a core word appearing

at least five times. The following table contains all such instances:

Table 4.8. N à N compounds with average template SRI based on the left-most constituent.

| Template X à N | # of types | Average SRI | Most Frequent Meaning | Example |
|---|---|---|---|---|
| boîte | 7 | 0.755 | boîte pour contenir N | boîte à outils |
| clé | 11 | 0.454 | clé qui a N en tant que partie | clé à chaîne |
| pierre | 6 | 0.389 | pierre destinée à N | pierre à briquet |
| pompe | 6 | 0.389 | pompe qui pompe N | pompe à essence |
| moulin | 9 | 0.308 | moulin pour moudre N | moulin à poivre |
| pâte | 6 | 0.278 | pâte utilisée pour faire N; pâte qui produit N | pâte à papier; pâte à pet |
| tête | 5 | 0.233 | tête qui provoque N | tête à claques |

Although the number of recurring N à N templates is far lower than for those examined earlier,

the few that are present show that this particular type forms, overall, a more semantically

homogeneous group than its NN counterpart. This is not all that surprising given that the

presence of the preposition significantly restricts what relation might link together its nominal

constituents, thus increasing semantic homogeneity within templates (see Chapter 6, Section

6.2.2 for a discussion of N à N compounds following an analysis of their relational semantics).

Examining N à N templates based on the right-most constituent reveals only four cases where

the core word occurs at least four times in the data. These are all listed in the following table:

Table 4.9. N à N compounds with average template SRI based on the right-most constituent.

| Template N à X | # of types | Average SRI | Most Frequent Meaning | Example |
|---|---|---|---|---|
| vapeur | 4 | 1.000 | N qui fonctionne à vapeur | bateau à vapeur |
| vide | 6 | 0.722 | N qui fonctionne au vide | tube à vide |
| main | 5 | 0.440 | N employé par la main | frein à main |
| feu | 8 | 0.156 | N qui utilise le feu; N dans lequel il y a du feu | arme à feu; chambre à feu |


Again, templates constructed using the modifier show a high degree of semantic uniformity.

Although the SRI numbers above are no doubt influenced by the small number of types

examined, some of the patterns observed are revealing. Compounds involving N à vide, for

instance, are analogical in nature, 5 of the 6 occurrences being based on tube à vide (→ diode à

vide, triode à vide, etc.) and are therefore all semantically related. In the case of compounds

constructed on the N à vapeur template, all instances denote machines powered by steam, which

is a highly productive relation for N à N compounds (see Chapter 6 for more on this). These

observations, although weakened by the limited number of templates in the data, are significant

in that they correspond to those made earlier for NN compounds sharing the same

modifier-based templates.
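How strongly these averages depend on the number of types can be seen by recomputing them from group sizes alone. The Python sketch below assumes that a template's average SRI is the mean of its compounds' individual SRIs; the function name is illustrative. The group sizes are those stated in the text: all four N à vapeur compounds share one meaning, five of the six N à vide compounds share one, and four of the five vidéo-N compounds share one.

```python
from typing import Sequence


def average_sri(group_sizes: Sequence[int]) -> float:
    """Mean SRI over all compounds of a template.

    group_sizes: number of compounds sharing each meaning, e.g. [5, 1].
    Every compound in a group of size n has SRI n / total, so the mean is
    the sum of n * (n / total) over the groups, divided by total.
    """
    total = sum(group_sizes)
    if total == 0 or any(n <= 0 for n in group_sizes):
        raise ValueError("group sizes must be positive")
    return sum(n * (n / total) for n in group_sizes) / total


print(round(average_sri([4]), 3))     # N à vapeur: 1.0
print(round(average_sri([5, 1]), 3))  # N à vide:   0.722
print(round(average_sri([4, 1]), 3))  # vidéo-N:    0.68
```

With so few types, a single divergent compound is enough to move a template from 1.000 to 0.680, which is why these figures are best read as provisional.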

Overall, many compounds containing the same lexeme do show some degree of semantic

homogeneity, suggesting that compound meaning is in fact constrained based on what

constituents are involved. These results, however, are of limited appeal given how few patterns

are in the data. That said, my initial examination of papier-N and pompe à N using a larger set

of compounds shows that the number of compounds for a given template is most likely much

greater than my own data suggests. Ideally, a template’s SRI would be calculated using as many

existing compounds as possible, which would produce a far more reliable set of indices.

4.3.1.1 How Does the SRI Fit in?

The purpose of calculating the SRI is in fact twofold. First, it provides an additional means with

which to compare and evaluate otherwise identical compounds. Most compounds analysed in

the data retained from Wiktionary are compositional compounds with canonically strong

semantic heads, which would, according to the features outlined in the first two sections of this

chapter, render them equally transparent. While this might in fact prove both accurate and

sufficient, determining the semantic uniformity of a set of compounds allows us to further rank

them within their feature sets. Verifying the viability of this proposal, however, requires that

compounds with differing SRIs be tested with speakers. For the time being, we may use SRI as

a scale for each terminal point on the hierarchy developed in the first two sections of this

chapter, as illustrated in the following partial figure:


Figure 4.5. Relationship between compound features and the semantic reliability index.

Once we’ve established how a compound is classified using head and compositionality features,

we may then verify how its meaning relates to those based on the same template. A compound

that shares its meaning with a large number of similar compounds may have a greater chance of

being interpreted correctly than one involving marginal or idiosyncratic relational information.

This may not be all that surprising or counterintuitive if we imagine that a highly recurring

meaning for a given template is most likely motivated by the presence of highly salient features

or properties of the core constituent’s semantic representation (e.g. N-box where box is meant to

contain things). This use of the SRI will be explored in greater detail in Chapter 7 (Section

7.2.2.5).

A second use for the SRI is that it also allows for the evaluation of specific relational

information for a given compound. In the examples in Table 4.5, pompe à N seems to favour

either a purposive (in this case, for pumping N) or a part-whole relation. Conversely, this

particular template does not seem to include, among others, a locative sense, which would mean

that a compound based on this template involving location might have a negative impact on its

semantic transparency. In the next chapter, I will propose a set of basic relations that may be

used for this very purpose.

[Figure 4.5, labels recovered from the diagram: under Strongly Endocentric, the terminal points Compositional, Weakly Compositional and Partially Compositional each carry an SRI scale running from 1.000 down to 0.001.]


There are, however, a number of limitations to this approach. First, it is not entirely clear just

how great a role lexicalized compounds play in the interpretation of new compounds involving

the same constituents. While the research seems to suggest that these stored compounds are

being activated (Van Jaarsveld et al. 1994), it isn’t clear that the semantic homogeneity of these

shared sets will have a positive or negative effect on a given compound’s semantic transparency.

Furthermore, these sets may be far larger than suggested here and may in fact involve related

words, such as synonyms, hypernyms and hyponyms. In other words, a compound with the

lexeme boat as its head may activate compounds involving other related words (ship, cruiser,

vessel, craft, etc.). The template used to calculate a compound’s SRI would then be something

along the lines of WATER BASED VEHICLE + N. Ryder’s (1994) research seems to suggest that

speakers do in fact generalize these patterns (e.g. ANIMAL + LOCATION). If this approach is

correct, however, it may be that greater opacity will occur for compounds whose meaning strays

from a template’s established meaning, as well as for those that otherwise seem to share the

same template, but that differ structurally, and therefore semantically (e.g. bateau-école ~

navire-école, where head position varies within the N-école template; see Table 4.7).

I would reiterate, however, that the semantic reliability index is meant to add just one more

indicator to the typology of semantic transparency, one that might add greater granularity to the

classification. Additional research might in fact show that compounds that differ only by their

SRI ratings are viewed by most speakers as equally transparent or opaque. This is a topic worthy

of future exploration.

4.4 Summary

In this chapter, I proposed three major features with which to evaluate a compound’s degree of

semantic transparency: centricity, compositionality, and semantic homogeneity. For each of

these factors, I argued that a number of key characteristics, such as head position and the

presence of tropes, should be taken into account when comparing compounds to one another.

The resulting hierarchy, while theoretical in nature, reflects a number of facts observed in the

data collected. One factor, however, that has not been discussed here and that might also be

integrated into this framework is frequency, whether it be the lexical frequency of the compound

itself or the frequency of its constituents. One interesting aspect that would merit further

exploration (and formalization) is the effect of relative frequency between the whole and its


parts (cf. Hay 2003 for complex words). It is possible that a compound whose lexical frequency

is greater than that of its constituent elements (i.e. lexicalization) might pose greater interpretational

challenges than one whose elements are more frequent. I have chosen to set aside this particular

factor in order to concentrate on purely semantic features.

At the beginning of this chapter, I cited Baroni et al.’s (2007) hypothesis regarding compound

interpretation, which states that the process involves two steps. The first step requires that the

speaker identify the head constituent. This operation was largely the focus of this chapter. The

second step requires that the speaker establish the nature of the relation that binds a compound’s

elements together. This particular characteristic of compounds has received a great deal of

attention over the last 40 years. In the following chapter, I will attempt to synthesize the

research done in this particular area and propose a set of relations that might be used to further

develop a theory of semantic transparency.


Chapter 5

Compound Relations

If the earlier discussion on compounding has shown anything, it’s that many, if not most,

primary compounds are semantically abstruse. As I discussed in Chapter 2, compounds are often

said to defy semantic compositionality because they lack any clear indication of just how their

constituents are meant to relate to each other. Much of this ambiguity of meaning arises from an

absence of predication between a compound’s elements, a semantic gap that may be bridged by

a number of different senses, even across similar combinations. For instance, sun and burn in

sunburn are linked by an unexpressed causal relation (i.e. ‘burn caused by the sun’), while in

heartburn, the relation might instead be locative or argumental in nature (i.e. ‘burn located in

the heart’; ‘burn of the heart’). Unfortunately, this disparity of meaning is not a simple quirk of

a select few isolated constructions: Lees (1968) lists eleven compounds with dog as their head

element and every single one differs in meaning (e.g. puppy dog, watch dog, police dog, etc.);

Jackendoff (2010) illustrates the same problem using seven different compounds with cake and

nine with car. This wide range of possible meanings is due to some unexpressed association

between otherwise independent nominal constituents, a relationship that Allen (1978) calls the

Variable R. The value of this variable is governed by what she refers to as the Variable R

Condition, which constrains the range of possible values, while simultaneously blocking

unlikely or even impossible values by establishing compatibility between the semantic features

of the constituents. The fundamentals of this approach have been widely supported elsewhere in

some form or another (Cohen and Murphy 1984, Murphy 1988, Lieber 2004, Benczes 2006,

Baroni et al. 2007).

While such a proposal seems entirely reasonable, it is not, however, without its problems. Most

vexing is the fact that a number of compounds do not easily allow for the implicit meaning to

surface without some degree of manipulation or coercion. In a compound such as sunglasses, for

instance, what is it exactly about sun or glasses that allows for the Variable R Condition to

produce the correct relation (i.e. ‘glasses that protect against the sun’)? Allen admits that these


cases abound and that they usually involve some degree of lexicalization, but this doesn’t

explain how the relation might come to be. Moreover, what exactly prevents equally plausible

associations from being generated for a given pair of words? As Allen puts it, a watermill could

just as well be a mill where people drink water and not one that uses it to generate power (cf.

Anscombre 1990 for the French moulin à vent). Her solution involves organising a lexeme’s

semantic features into a hierarchy, which allows dominant properties to determine meaning

when combined with other lexical units. Thus, according to Allen, mill has two dominant

features: “powered by” and “produces or makes something,” which accounts for a number of

compounds headed by that lexeme (94-95):

(71) a. water/wind/hand/steam-mill “powered by” interpretation

b. steel/paper/flour/cotton-mill “production” interpretation

This approach is not only reasonable, but it is also most likely correct given some of the results

from experimental research on compound interpretation (Ryder 1994). It nevertheless gives rise

to two questions. First, if feature dominance is sufficient to prevent incorrect meaning

generation, how does it allow one to distinguish between equally valid interpretations? In other

words, given that at least two possible relations exist for compounds involving mill, why is the

“production” reading not available for windmill? As mentioned earlier, Allen and others are

aware that features must be compatible in order for a particular interpretation to be deemed

felicitous, but it isn’t always clear where such lines must be drawn (such as for windmill). A

second question, and one which will be the focus of this chapter, involves the very nature of

these “dominant features.” More precisely, are they solely internal properties of the lexemes in

question or might there be a number of recurring relations able to account for various

combinations? Consider once again the compounds in (71b) above (i.e. steel mill, paper mill,

etc.). Is “production” truly a dominant feature of either steel or mill? Perhaps, given that a mill is

typically understood as a place where things are manufactured, but what of bee in honey bee, or

house in lighthouse, both of which also involve similar implicit predicates? Could “production”

instead be some sort of recurrent relation for compounds, one of many possible fundamental

values for the so-called Variable R? This is in fact what a number of researchers believe and

have attempted to formalize using a set of basic or fundamental relational concepts that could

account for most (if not all) compounds. It should come as no surprise, however, that such an

approach is not without some controversy.


In its simplest form, a theory of compound relations (as they will henceforth be called) involves

two fundamental positions: either the number of relations that may emerge between a

compound’s elements is limited to just a few or these relations are, to some extent, unlimited.

Although Downing (1977) proposes her own set of relationships, she argues that no definitive

list can be compiled given that contextual circumstances will allow for a nearly infinite number

of possible interpretations, thus rendering compounding a matter of pragmatics. In her now

classic example, an apple juice seat would be difficult to interpret out of context, but if we were

to imagine a table with place settings, one of which has a glass of apple juice, its meaning

suddenly becomes rather trivial.[77] Others have also argued that no basic list could possibly

account for all compounds (Selkirk 1982, Lieber 1992, Wisniewski 1997) and those who do

promote such an approach often admit that their list of relations is not meant to be exhaustive,

but instead representative of most compounds under study (Jespersen 1956, Adams 1973,

Jackendoff 2010). Some have even gone so far as to propose a set of relations that account for

all possible nominal combinations, usually ignoring those compounds that defy compositionality

(Hatcher 1960, Warren 1978, Arnaud 2003). As we will see, while many researchers seem to be

at odds with one another when it comes to the analysis of compounds, often criticizing each

other’s works at length, there is nevertheless substantial overlap between their various

frameworks.

The focus of this chapter will therefore be on compound relations as a limited set of basic

predicates. The central motivation for this approach is that it allows for an account of

compounds like windmill and heartburn whose meanings are otherwise difficult to ascertain

using only the intrinsic semantic properties of their constituents. Although semantic properties

no doubt remain an important factor in the process of sense disambiguation, the fact that a

number of relatively basic relational concepts recur with some degree of frequency suggests that

speakers are more than likely using this information when interpreting compounds. In fact, there

is ample evidence that speakers make use of relational information during compound

processing (among others, Wisniewski and Love 1998, Gagné 2002, Estes and Jones 2006,

77 This observation, however, is also valid for nearly all aspects of language, as research in semantics has shown that context may allow for even simplex forms to acquire new and novel meanings through a variety of means (Nunberg 1979, 1995, Copestake and Briscoe 1995).

Gagné and Spalding 2009). It therefore seems justified to integrate this component into a theory

of semantic transparency: when confronted with an unfamiliar compound, the speaker may

attempt to establish its meaning by “testing” known basic relations (cf. Ryder 1994). Like most

researchers, however, I do not believe that all compounds can be accounted for using only a

closed set of relational associations. Rather, some compounds may involve highly specific and

idiosyncratic relations, while a significant number of them make use of only a few recurring

associations. This position, if true, is of consequence for the current work because it further

strengthens my earlier position regarding “irregular” compounds (see Chapter 4, Section 4.3)

and the additional costs they might incur during interpretation. Furthermore, results from

experiments conducted by Gagné and Shoben (1997) suggest that the frequency of a relation for

a particular constituent (in their case, the modifier) has some effect on the ease with which a

speaker will be able to interpret a given compound. Factoring in relational frequency data is

therefore crucial, given that not all relations are equally pertinent for compounds. For instance,

in Girju et al. (2005), Part-Whole accounts for nearly 17% of their 4,500 English compounds,

while Location only accounts for half that number. Furthermore, research on novel compound

interpretation has shown that, plausibility constraints notwithstanding, speakers tend to favour

similar and recurrent relational paraphrases in meaning composition (Downing 1977, Ryder

1994, Wisniewski 1996). Such data for existing French compounds would not only be useful for

testing frequency effects of compound interpretation, but also offer an additional metric by

which to measure semantic transparency.
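To make the notion of relational frequency concrete, the short Python sketch below tallies how often each relation occurs with a given modifier. It is only an illustration under stated assumptions: the annotated triples, the relation labels (PRODUCT, POWER-SOURCE, COMPOSITION), and the function name are hypothetical and are not drawn from any of the studies cited above.

```python
from collections import Counter, defaultdict

# Toy annotations of the form (modifier, head, relation).
# Both the compounds chosen and the relation labels are illustrative
# assumptions, not data from the studies cited in the text.
ANNOTATED = [
    ("paper", "mill", "PRODUCT"),
    ("steel", "mill", "PRODUCT"),
    ("wind", "mill", "POWER-SOURCE"),
    ("paper", "bag", "COMPOSITION"),
    ("paper", "cup", "COMPOSITION"),
    ("paper", "plate", "COMPOSITION"),
]


def relation_frequencies(annotated):
    """Return {modifier: {relation: relative frequency}}."""
    counts = defaultdict(Counter)
    for modifier, _head, relation in annotated:
        counts[modifier][relation] += 1
    return {
        modifier: {rel: n / sum(rels.values()) for rel, n in rels.items()}
        for modifier, rels in counts.items()
    }


freqs = relation_frequencies(ANNOTATED)
print(freqs["paper"])
```

In this toy sample, the modifier paper favours a composition reading over a production reading, which is the kind of asymmetry a frequency-based metric would be expected to capture.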

This chapter consists of two major sections. First, I will discuss some of the work previously

conducted on compounds, focusing on the approaches and relations others have adopted to

provide a semantic account of NN compounds. Based on this research, I have retained the most

salient and frequently mentioned compound relations in the literature for my own research on

semantic transparency. These relations will be the focal point of the second half of this chapter.

The results of their application for both French NN and N à N compounds will be discussed in

greater detail in Chapter 6.

5.1 Studies on the Semantics of Compounds

Although there has certainly been no shortage of research done on the relational properties of

compounds, only a few authors go into great detail on just what their relations are meant to

represent or how they apply to compounds. In fact, it is largely the earlier work that seems most

concerned with a reliance on data to support the research. More recent studies have been far

more applied in nature and have concentrated on cognitive and computational approaches. I will

first look at some of the earlier research, before moving on to more recent proposals.

It is worth noting that the objective of the following sections is to show how the topic of

compound relations has been broached and what formalisms have emerged from this work. It is

not my goal to openly critique the approaches detailed, but to compare them so as to establish a

set of relations with which to explore the French data on hand. That said, some degree of

criticism is unavoidable as it remains important to identify the strengths and weaknesses of the

various approaches.

One further note: as most of the work done on this subject has been on English, the examination

that follows will rely heavily on examples from that language. The French data compiled from

Wiktionary will be used during the presentation of the relations retained in Section 5.2.2.

5.1.1 Early studies

Although Jespersen (1942) may not have been the first to address the question, his early and rather cursory

work on compounds seldom goes unmentioned in the work that has followed. He identified six

types of substantive compounds, the first of which he called final determinative (i.e.

right-headed endocentric compounds). Although he states that “the number of possible logical

relations between the two elements is endless” (143), he nevertheless describes a number of

relational classes involving concepts such as Time (nightmare), Location (headache), Means

(handwriting), and Purpose (beehive). Hatcher (1960) believed Jespersen’s work to be flawed

and laden with innumerable inconsistencies and subsequently offered a harsh and biting critique

of his formalism. Where Jespersen saw idiosyncrasies and caprices of language, Hatcher saw an

opportunity for a higher level of abstraction. She reworked his classification and reduced it to

just four basic relations:

(72) a. Ⓐ = A is contained in B (e.g. seed orange)

b. Ⓑ = B is contained in A (e.g. orange seed)

c. A → B = A is the source of B (e.g. cane sugar)

d. A ← B = B is the source of A (e.g. sugar cane)

Hatcher maintains that her classification system is able to account for even the most

uncooperative of compounds, including those Jespersen claimed defied classification. She does,

however, admit that this is largely due to the very loose nature of her basic relations and that

further subdivision might prove useful. In fact, Hatcher, realizing that her four main classes

failed to capture a number of relevant distinctions between compounds, supplemented her

categories with seven semantic classes (e.g. A is an animal and B is a person: cowboy), which

resulted in a system with 49 possible combinations. Given that this additional layer of

abstraction is not relational in nature (i.e. a constituent remains an animal, no matter what B is),

it does little to distinguish between very different compounds that otherwise share the same

space in her system: for example, both wheelchair and toolbox are object-object compounds of

class Ⓐ (as in 72a), yet are sufficiently dissimilar so as to merit different treatments. Despite the

appeal of such a succinct approach, four relations seems too small a number to be truly useful in

teasing out the many nuances that compounds exhibit.

That very year, Marchand (1960)78 offered his own take on compounds in his seminal work on

English word-formation. His approach, however, was more descriptive than it was explanatory.

According to Marchand, predication for non-verbal nexus compounds (i.e. root compounds) is

necessarily restricted. He distinguishes between copula compounds, where BE is the underlying

verbal unit (e.g. girlfriend), and rectional compounds, which require a full verb for expansion

(e.g. steamboat → ‘boat that uses steam’). For copula compounds, he identified four types:

subsumptive (oak tree), attributive (girlfriend), dvandva (fighter-bomber), and adjectival

(blackbird). Of the so-called rectional compounds, Marchand distinguishes between two types:

the type steamboat and the type policeman. For each of these types, he offers a number of

possible paraphrases (e.g. ‘B consisting, made up of A’), but these are given on a case by case

basis and play little role in the classification of these compounds. What is more important, in

Marchand’s opinion, is just how the components relate to each other syntactically. He therefore

discusses subject types on the one hand (i.e. the head is in subject position; silk worm = ‘worm

produces silk’), and object types on the other (i.e. the head is in object position; steamboat =

78 This discussion is based on the second edition of Marchand’s work published in 1969.

‘steam powers the boat’). Although relational concepts are present in his work, they are not the

primary focus of his study.

Published the same year as Marchand’s opus on English word formation, Lees (1960) offered

his own analysis of compounding, one that was steeped in the transformational grammars of the

time. His work shares a number of features with Marchand’s in that Lees also draws parallels

between compounds and sentences. He therefore sought to show that compounds were in fact a

surface form of a “kernel” sentence whose predicate was deleted at some stage of the derivation.

The example for oil well in (73), taken from the fifth edition of Lees (1968), illustrates the

process (144):

(73) i. The well yields oil. → (GT19: relative clause)

ii. ... well which yields oil... → (T57: nominal modifier)

iii. ... well yielding oil... → (preposed modifier)

iv. ... oil-yielding well... → (ellipsis)

v. ... oil well...
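The derivation in (73) can be mimicked with a handful of string templates. The following Python sketch is offered only as an illustration: the Kernel class, its field names, and the derive function are hypothetical, the participle is supplied by hand rather than computed, and nothing here reproduces Lees's actual rule formalism.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """Hypothetical container for the parts of a kernel sentence."""
    head: str        # e.g. "well"
    verb: str        # third-person singular form, e.g. "yields"
    participle: str  # present participle, e.g. "yielding"
    obj: str         # e.g. "oil"


def derive(k):
    """Return the five successive forms of (73), kernel sentence first."""
    return [
        f"The {k.head} {k.verb} {k.obj}.",    # i.   kernel sentence
        f"{k.head} which {k.verb} {k.obj}",   # ii.  after GT19 (relative clause)
        f"{k.head} {k.participle} {k.obj}",   # iii. after T57 (nominal modifier)
        f"{k.obj}-{k.participle} {k.head}",   # iv.  preposed modifier
        f"{k.obj} {k.head}",                  # v.   after ellipsis
    ]


steps = derive(Kernel("well", "yields", "yielding", "oil"))
for step in steps:
    print(step)
```

The last template simply drops the participle, so no trace of the verb survives in the final string.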

The important point to retain is that compounds are purported to begin their lives as complete

sentences (i.e. The well yields oil.), and, with the help of a number of transformational rules,

lose most of their phrase structure along the way. Lees’s transformations are based on the

underlying syntactic relations held between the compound’s elements, as well as a number of

types and subtypes. Lees identified eight major classes of nominal compounds, each of which

contains multiple subclasses:

Table 5.1. Lees’s (1960) grammatical relations of nominal compounds.

Grammatical Relations Examples

Subject-Predicate girlfriend, fighter plane, madman, redskin

Subject-Middle Object doctor’s office, arrowhead, rattlesnake

Subject-Verb talking machine, payload, population growth, etc.

Subject-Object steamboat, car thief, water spot, etc.

Verb-Object setscrew, pickpocket, eating apple, etc.

Subject-Prepositional Object gunpowder, garden party, eggplant, etc.

Verb-Prepositional Object grindstone, washing machine, boiling point, etc.

Object-Prepositional Object bull ring, station wagon, wood alcohol, etc.

While these classes do not seem to single out any of the semantic characteristics of compounds,

Lees does offer a limited set of possible paraphrases for many of his types. For instance, the

Subject-Object class, in which the head is the subject of some unexpressed verb and the

modifier its object (e.g. car thief = a thief steals cars), includes compounds that may be

paraphrased using one of two prepositions:

(74) a. from battle fatigue, beet sugar, fingerprint

b. for candy factory, grocery store, textile mill

These prepositions, while perhaps broad in meaning, offer some glimmer of a semantic

approach to compounds. The from preposition might be said to mean Source or Cause, while for

is most likely purposive in nature. Other prepositions used by Lees are of, by, with, and like.

Also included in some of his subtypes are basic verbs such as have and be.

As exhaustive and meticulous as Lees was in his work, his approach was subject to a great deal

of criticism. The main point of contention stemmed from the ad-hoc nature of his kernel

sentences. As Bauer (1978) points out, “a compound appears to be a surface neutralization of a

number of different logical/semantic/underlying representations” (81). Compounds are thus

inherently ambiguous, which means there cannot be just one kernel sentence for a given

compound. Lees was of course aware of this, stating that “most compounds can be derived, each

one, in a number of different ways, and thus each may have many different ways of being

understood” (1968: 122). This is in fact a widely held opinion regarding compounds, but it does

little to resolve the difficulties inherent to his transformational approach, which may be summed

up with the following question: if the predicate of a compound is deleted during a

transformation, where does this deleted information come from in the first place? If the speaker

only has access to the surface form, yet must derive it from some underlying sentence, how does

he or she choose “powered by” for windmill, but “produces” for paper mill? Similar criticism

was raised elsewhere (among others, Marchand 1965; Scalise 1984). Most of the work on

compounds that followed distanced itself from a purely syntactic account of compounding and

instead sought to introduce a more robust semantic component into their approaches.

Although Adams (1973) retains many of the syntactic relations found in Lees (1960), so as to

account for verbal nexus compounds, she classifies root compounds using a small number of

highly recurrent semantic associations. Adams (1973) introduced 11 such groups, each with its

own set of sub-types. Her classification system ultimately contains over 70 distinct compound

types. The following table retains only those types that are paraphrased using unexpressed

content and thus omits five classes (Subject-Verb, Verb-Object, Adjective-Noun, Names, and

Other). Nor do they exhaust all of the possibilities set forth by Adams:

Table 5.2. Adams’s (1973) compound classes.

Major Classes Number of Sub-Classes Possible Paraphrases

Appositional 4 functions as, is an instance of, is more specific than

Associative 7 is part of, belongs to, is produced from

Instrumental 13 prevents, preserves, causes, obtained through

Locative 8 place where, place to or from, time when

Resemblance 8 is in the form of, has features of, reminds one of

Composition/Form/Contents 6 consists of, made from, in the form of, contains

Even with this reduced set of retained paraphrases, one will notice that meaning can differ

significantly within a given class. Appositional, for instance, includes compounds like fuel oil

(i.e. oil that serves as fuel), as well as compounds like codfish (i.e. fish of which cod is a

particular instance). Because the Appositional class is meant to capture copulative compounds,

the inclusion of forms such as fuel oil is not unsound (i.e. fuel oil is oil that is fuel), but one

might argue that this particular grouping is perhaps too broad. After all, fuel oil is not fuel like a

codfish is a cod. In other cases, however, the decision to include wildly different relations seems

entirely justified. For instance, compounds that make use of an instrumental relation could in

fact refer to a number of different activities: we use things to build, remove, clean, fasten, etc.

By the same token, however, it seems strange to then distinguish between the contents and the

locative classes, given that when one says that “X contains Y,” one is actually saying “Y is

located in X.” Adams does in fact provide an explanation for this particular distinction, stating

that the Composition/Form/Contents class is meant to represent “compounds in which one

element specifies the other in terms of some concrete feature” (81), while the locative class

deals strictly with nouns denoting a time or place. But other unanswered questions do arise from

her classification system. For example, is grouping together Composition, Form, and Contents

indeed warranted? What distinguishes a bell jar (RESEMBLANCE) from a bow tie

(COMPOSITION/FORM/CONTENTS), for instance? More importantly, why is “B in the form of A”

included in two different classes? Unfortunately, Adams does not provide an explanation for

these conflicting analyses. Nevertheless, her research remains among some of the most

comprehensive work done on compounds and provides a number of possible relational concepts

that might prove fundamental.

Like Jespersen (1942), Downing (1977) believes that the number of relations is not finite: “The

existence of numerous novel compounds [. . .] guarantees the futility of any attempt to

enumerate an absolute and finite class of compounding relationships” (828). She nevertheless

concedes that the majority of the novel compounds included in her study seem to involve a

limited set of basic semantic categories, which, combined with the fact that most previous

studies tended to invoke similar relations, prompted her to offer what she says should be the

minimum number of relationships needed to account for most compounds (828):

Table 5.3. Downing’s (1977) minimal compound relationships.

Relationship Example

Whole-Part duck foot

Half-Half giraffe-cow

Part-Whole pendulum clock

Composition stone furniture

Comparison pumpkin bus

Time summer dust

Place Eastern Oregon meal

Source vulture shit

Product honey glands

User flea wheelbarrow

Purpose hedge hatchet

Occupation coffee man

Downing’s list offers a good look at what the reduction of nominal compounds to a set of

primary relations would look like. One will no doubt notice some similarities with Adams’s

(1973) own system, though not quite with the same level of granularity. Downing does not,

however, elaborate on just how these relations might apply beyond her own work on novel

compounds.

Warren’s (1978) classification of English compounds bears a striking resemblance to Adams’s

work, but with a far more elaborate structure. Although her approach consists of only five major

classes (i.e. Constitute and Resemblance, Belonging to, Location, Purpose and Activity-Actor,

and Proper-Name Combinations), each of these classes contains a number of sub-classes, which

are in turn composed of their own set of sub-classes. To further complicate matters, some of

these sub-classes are subject to an additional level of sub-division, which results in a final count

of 60 different compound types. Needless to say, her system is deeply hierarchic and proves to

be just as fine-grained as Adams’s, if not more so. That said, Warren does offer a summary of

her relational classes79, which closely resembles the one proposed by Downing (1977):

Table 5.4. Warren’s (1978) semantic classes.

Semantic Classes Example

Source-Result student group

Copula girl friend

Resemblance clubfoot

Whole-Part spoon handle

Part-Whole armchair

Size-Whole 3-day affair

Goal-OBJ moon rocket

Place-OBJ sea port

Time-OBJ Sunday paper

Origin-OBJ hay fever

Purpose ball bat

Activity-Actor cowboy

Warren’s work is based on a corpus of just over 4,500 root Noun-Noun compounds (no verbal

nexus constructions). According to her analysis, nearly 4,000 compounds are accounted for

79 While Warren’s relations are primarily semantic in nature, some of them nevertheless possess syntactic features (i.e. N2’s status as object).

using the classes in Table 5.4, with a further 519 falling under the proper names category. All in

all, Warren claims that only 33 compounds do not fit under her classification scheme, either

because they possess idiosyncratic meanings (e.g. stage coach = ‘coach that goes in stages’) or

because they defy analysis (e.g. bobby pin). At first glance, that such a low number of “misfits”

(as Warren puts it) should appear in her data seems odd, considering how many such

compounds have been said to exist (Jespersen 1942, Marchand 1960). It may very well be that

Warren was simply very liberal in her interpretation of certain compounds. If we look at those

listed under Adams’s (1973) OTHER class, we discover that Warren was indeed able to classify

some of them: cradle song, for instance, can be found under Place-OBJ. In fact, a closer look at

her data reveals that Warren classified a number of compounds usually treated as unanalyzable

elsewhere, such as honeymoon (OBJ-Time) and butterfly (Activity-Actor). She does, however,

usually explain why certain obscure or non-compositional compounds are included under a

particular class, which may or may not satisfy the reader. If Warren’s classification scheme is

indeed representative of most of the compounds included in her data, then what we have is

quantitative confirmation of what Downing (1977) had suggested were essential relations at play

in compounding. One may wonder, however, whether Warren’s abridged set of classes does in

fact suffice or if the numerous layers of sub-types are key to her—or any, for that matter—

treatment of compounds.

Levi (1974) had already sought to reduce the complexities proffered by some of these

approaches and argued that much of the granularity observed in previous works was in fact

unnecessary. In her (1978) work on complex nominals, she sought to reintroduce the

transformational approach advocated for by Lees (1960) and hoped to address some of the

criticism that had been leveled at his work, namely that there was little way to know what

predicates had been omitted during the derivation. Her solution was to introduce what she called

Recoverably Deletable Predicates (RDP), a small set of basic relations that, because there are so

few of them, are recoverable by the speaker at the surface level. She proposes 9 such RDPs

(three of which are reversible, allowing for a total of 12 types) and argues that they are largely

sufficient to account for the majority of what she calls non-predicating compounds80. Her RDPs

consist of both basic verbs and prepositions:

Table 5.5. Levi’s (1978) Recoverably Deletable Predicates.

RDP Examples

Cause1 tear gas, disease germ, mortal blow

Cause2 drug deaths, birth pains, viral infection

Have1 picture book, apple cake, gunboat

Have2 government land, lemon peel, student power

Make1 honeybee, silkworm, musical clock

Make2 daisy chains, snowball, consonantal patterns

Use voice vote, steam iron, manual labor

Be soldier ant, target structure, professorial friends

In field mouse, morning prayers, marine life

For horse doctor, arms budget, avian sanctuary

From olive oil, test-tube baby, apple seed

About tax law, price war, abortion vote

Levi’s relations are as close to primitives as such an approach may allow. Her highly reductive

approach has the advantage of being sufficiently underspecified so as to capture a great many

types of compounds. Thus, Have may be used for constructions involving either possessive or

partitive relations (similar to Warren 1978). Of course, this also means that some compounds

that differ greatly in meaning are grouped together under the same predicate81. For instance,

Make2, under certain circumstances, conflates production (e.g. beeswax = made by) and

composition (e.g. snowball = made of). Perhaps more problematic is just how general her Be

80 Levi (1978) distinguishes between compounds such as atom bomb and atomic bomb via copular periphrasis (bomb that is atomic ~ *bomb that is atom).

81 Levi is of course aware of this fact and therefore discusses at length the matter of overlapping RDPs. The RDPs Cause and Make are especially difficult to differentiate. The reader is encouraged to consult Chapter 4 of Levi (1978) for additional insight into her classification.

RDP is: not only does it group together compounds based on genus-species, coordination, and

resemblance, it also overlaps with her Make2 RDP (i.e. snowball = ‘a ball that is snow’ or ‘a ball

made of snow’). These may or may not be real problems, depending on one’s opinion on the

matter, but they are nevertheless easily addressed with the addition of perhaps a few more

dividing lines. It should come as no surprise, then, that Levi’s work has been at the heart of a

great deal of research on the semantics of compounding and has since served as the basis for a

number of similar formalisms.
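The arithmetic behind Levi's inventory (nine predicates, three of them reversible, twelve types in all) and the overlap between Be and Make2 noted above can be laid out in a few lines of Python. This is a minimal sketch for illustration only: the dictionary layout, the function name, and the CANDIDATES mapping are assumptions of mine, not part of Levi's formalism.

```python
# The nine predicates of Table 5.5; True marks the three that are
# reversible and therefore surface as two numbered types each.
RDPS = {
    "Cause": True, "Have": True, "Make": True,
    "Use": False, "Be": False, "In": False,
    "For": False, "From": False, "About": False,
}


def rdp_types(rdps):
    """Expand each reversible predicate into its two numbered types."""
    types = []
    for name, reversible in rdps.items():
        if reversible:
            types += [f"{name}1", f"{name}2"]
        else:
            types.append(name)
    return types


TYPES = rdp_types(RDPS)

# Hypothetical record of the overlap discussed in the text: snowball
# can be read as 'a ball made of snow' (Make2) or 'a ball that is snow' (Be).
CANDIDATES = {"snowball": ["Make2", "Be"]}

print(len(RDPS), len(TYPES))
print(CANDIDATES["snowball"])
```

A compound that maps to more than one type in such a listing is precisely the kind of case where a few additional dividing lines would be needed.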

5.1.2 Recent Developments in Compound Relations

In recent years, largely due to renewed interest in compounds in both natural language

processing (NLP) and psycholinguistics, the notion of basic relations has resurfaced and has

been the focus of a great deal of research. Because recent work on the topic is predominantly

based on earlier research on compound relations, many authors simply list the relational

concepts without going into much detail regarding their characteristics or the data they are

meant to represent. Readers are instead invited to consult previous formalisms. Thus, while this

section offers an overview of these more recent approaches, it occasionally does so with only a

minimum of clarification as it is not always possible to elaborate on some of the finer points of

their models.

Leonard’s (1984) early work on automatic compound interpretation includes eight major types

of compounds, many of which are further subdivided according to the more restricted meaning

of certain combinations. Although not directly related to Warren’s (1978) classification system,

Leonard’s typology is in fact quite similar to it: her classes include, among others, Locative,

Annex, Equative, and Material. According to Leonard, her software, with the help of a robust

dictionary application, is able to generate the correct interpretation for her data (consisting of

roughly 2,000 compounds from works of fiction between 1719 and 1968) roughly 76% of the time.

What is perhaps most interesting about Leonard’s work is that her approach allowed for her

relational associations to be expanded upon. In other words, once a compound’s semantic type

has been identified (e.g. sponge-bag: Locative), the program is able to transform it into a natural

sentence (e.g. “A bag for or containing a sponge or sponges”). This is, in effect, the opposite of

what Lees (1960) was arguing for.
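The expansion step described above, from a semantic type to a paraphrase, can be approximated with a simple template lookup. The Python sketch below is purely illustrative and does not reflect Leonard's actual program: the TEMPLATES table and the paraphrase function are hypothetical, only the Locative wording comes from the example in the text, and the plural is formed naively by adding -s.

```python
# Only the Locative template is taken from the sponge-bag example in the
# text; wording for any other type would have to be assumed.
TEMPLATES = {
    "Locative": "A {head} for or containing a {mod} or {mod}s",
}


def paraphrase(compound, semantic_type):
    """Expand a hyphenated modifier-head compound into a paraphrase."""
    mod, head = compound.split("-", 1)
    if semantic_type not in TEMPLATES:
        raise KeyError(f"no template for type {semantic_type!r}")
    return TEMPLATES[semantic_type].format(head=head, mod=mod)


result = paraphrase("sponge-bag", "Locative")
print(result)
```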

In contrast, Lauer’s (1995) own work on compound interpretation was directly related to

Warren’s (1978) classification system, but he relied on an understated aspect of her work.

Warren had in fact offered a number of prepositions as possible labels for her classes, which

Lauer used as the basis for his approach:

Table 5.6. Lauer’s (1995) preposition-based treatment of compounds.

Preposition Example

Of state laws means laws of the state

For a baby chair means a chair for babies

In morning prayers means prayers in the morning

At airport food means food at the airport

On Sunday television means television on Sunday

From reactor waste means waste from a reactor

With gun men means men with guns

About war story means story about war

These connectors are supplemented with a few additional relations such as BE-copula, as well

as various labels for verbal-nexus compounds. Like Leonard’s, the purpose of Lauer’s work is to

provide a system for the automatic interpretation of compounds. His system automatically

determines meaning based on a probabilistic model that takes into account the affinity of

conceptual groupings (i.e. how likely two words are to be paired together), which is highly

facilitated by the limited number of possible outcomes in his system (i.e. eight basic

prepositions).
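The general idea of choosing among eight prepositions on the basis of word affinities can be sketched as follows. This is not Lauer's model: the scoring function is a deliberately simplified stand-in, the co-occurrence counts are invented for illustration, and the names HEAD_PREP, PREP_MOD, and best_preposition are hypothetical.

```python
# The eight prepositions of Table 5.6.
PREPOSITIONS = ["of", "for", "in", "at", "on", "from", "with", "about"]

# Hypothetical co-occurrence counts, invented for illustration only:
# how often a head is followed by a preposition, and how often a
# preposition is followed by a modifier, in some imagined corpus.
HEAD_PREP = {("story", "about"): 40, ("story", "of"): 25, ("story", "from"): 5}
PREP_MOD = {("about", "war"): 30, ("of", "war"): 20, ("in", "war"): 10}


def best_preposition(modifier, head):
    """Pick the preposition with the highest toy affinity score."""
    def score(prep):
        # Add-one smoothing keeps unseen pairs from scoring zero.
        head_side = HEAD_PREP.get((head, prep), 0) + 1
        mod_side = PREP_MOD.get((prep, modifier), 0) + 1
        return head_side * mod_side

    return max(PREPOSITIONS, key=score)


choice = best_preposition("war", "story")
print(choice)
```

Because the search space contains only eight candidates, even an exhaustive comparison of this kind remains trivial, which is the advantage of a small closed set of labels.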

One will have no doubt considered that prepositions may not be the best candidates given their

highly polysemous (or ambiguous) nature. This is evidenced by Lauer’s use of the preposition of

with compounds that differ significantly from one another:

(75) a. jute products = ‘products made of jute’

b. health problems = ‘problems related to one’s health’

c. family business = ‘business run/owned by a family’

d. cupboard doors = ‘doors that are part of a cupboard’

Using of to paraphrase such compounds (e.g. products of jute) is of course semantically

acceptable. Just like many PP adjuncts involving prepositions, however, this approach hardly

offers the most accurate means of distinguishing between groups of nominal pairs—after all,

jute is not to product as health is to problem. Nor does this method truly eliminate compound

ambiguity as the preposition is unable to prevent other, equally plausible meanings from

emerging at the surface level (cf. Levi 1978).

Vanderwende (1994), also working in NLP, approached the problem from a different

perspective and reformulated the more conventional relations used elsewhere as wh-questions,

which she argues allows for greater ease in judging noun sequence classification. Her thirteen

classes are reproduced in the following table:

Table 5.7. Vanderwende’s (1994) classification schema of noun sequences.

Relation Conventional name Example

Who/what? Subject press report

Whom/what? Object accident report

Where? Locative field mouse

When? Time night attack

Whose? Possessive family estate

What is it a part of? Whole-Part duck foot

What are its parts? Part-Whole daisy chain

What kind of? Equative flounder fish

How? Instrument paraffin cooker

What for? Purpose bird sanctuary

Made of what? Material alligator shoe

What does it cause? Causes disease germ

What causes it? Caused-by drug death

While the results of her tests are promising (her algorithm has an accuracy rate of roughly 78%),

she admits that other categories, such as topic (i.e. What about?), might be necessary given

some of the more general interpretations she encountered (e.g. history conference =

Whom/what?). Nothing prohibits expansion of this particular approach either: one can rather

easily add to the basic list of wh-questions (e.g. Who/what uses it? Made from what? etc.). The

fact that each wh-question can be associated to a conventional label, however, shows that the

fundamentals of her approach are very similar to those of previous models. The interrogatives

she proposes are in fact a method for determining whether compounds are classified

correctly—they may therefore also prove useful when assigning relations to existing

compounds.
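The correspondence between wh-questions and conventional labels can be written out as a simple lookup table. The Python sketch below transcribes the thirteen pairs from Table 5.7; the function name `conventional_label` and the fallback value "Unclassified" are hypothetical additions made for illustration.

```python
# Wh-question -> conventional relation name, transcribed from Table 5.7.
WH_TO_LABEL = {
    "Who/what?": "Subject",
    "Whom/what?": "Object",
    "Where?": "Locative",
    "When?": "Time",
    "Whose?": "Possessive",
    "What is it a part of?": "Whole-Part",
    "What are its parts?": "Part-Whole",
    "What kind of?": "Equative",
    "How?": "Instrument",
    "What for?": "Purpose",
    "Made of what?": "Material",
    "What does it cause?": "Causes",
    "What causes it?": "Caused-by",
}

def conventional_label(question, table=WH_TO_LABEL):
    """Return the conventional label for a wh-question.

    The "Unclassified" fallback is an assumption of this sketch.
    """
    return table.get(question, "Unclassified")

# Extending the inventory amounts to adding an entry, e.g. the topic question.
extended = dict(WH_TO_LABEL, **{"What about?": "Topic"})
```

The one-to-one mapping makes the point of the preceding paragraph visible: the questions are an interface to the conventional relations rather than a different set of relations.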

More recent work on compounds in NLP shares much in common with this earlier work, though

the number of relations does differ greatly. Rosario and Hearst (2001) propose a total of 38

relations, 18 of which are considered dominant because of their frequency in the compounds

contained in their corpus. While the relations they use are loosely based on Warren’s (1978)

work, they are applied to compounds found in the highly specialized language of medical texts,

which they claim requires an expanded number of labels. Some examples of Rosario and

Hearst’s relations are Activity/Physical process (bile delivery, virus reproduction), Cause (1-2)

(AIDS death, automobile accident), Measure of (relief rate, asthma mortality), and Purpose

(headache drugs, HIV medication). Similarly, Girju et al. (2005), following their earlier work in

NLP involving compounds (Girju et al. 2003, Moldovan et al. 2004), propose 35 relations, a

number they argue is both necessary to account for most combinations and sufficiently limited

so as to remain manageable. Most of these classes are also found in Adams (1973) and Warren

(1978). Girju et al. (2007) later reduced this number of relations to just seven general

associations. Even more recently, Séaghdha (2008) returned to the pared down approach

adopted by Lauer (1995), but instead based his compound relations on those proposed by Levi

(1978). Five of these relations were taken as is or amalgamated into new, more general labels:

Be, Have, In, Actor, Instrument, and About. Séaghdha’s algorithm, through a variety of means,

was able to correctly assign meaning for his dataset of 1,400 compounds approximately 70% of

the time.

From a more cognitive perspective, Shoben’s (1991) discussion of conceptual combinations has

influenced a great deal of subsequent research on compound interpretation (Gagné and Shoben

1997, Gagné 2001, Spalding and Gagné 2007). Shoben’s 14 relations are, once again, very

similar to those used elsewhere: they include such primitive relations as Cause, Has, Make,

Uses, Located, etc. Although Shoben uses these relations for combinations that involve what he

calls non-predicating adjectives, it is clear from the examples he provides that they are in fact

what have been traditionally treated as NN compounds (e.g. tax law, oil money, finger toy, etc.).

The research that stemmed from Shoben’s discussion of relational concepts has shown that

speakers are in fact keenly aware of the types of associations at play in compounding. For

instance, Gagné and Shoben (2002) found that speakers understood compounds more easily

when they were preceded by pairs that involved similar relations. Such research is arguably

reliant on the understanding that compounds do in fact involve particular relational concepts

and that they can be reduced to some fundamental set of recurring associations.

Jackendoff’s (2009) work on compounds, expanded and further refined in Jackendoff (2010),

stands as the most recent attempt at reducing compound semantics to a set of fundamental

relations. His approach involves 14 basic functions, most of which can be found elsewhere in

the literature:

Table 5.8. Jackendoff’s (2010) 14 Basic Functions.

Basic Function N2 = Subject N1 = Subject

Classify beta cell ---

Argument wardrobe color chewing gum

Be boy king ---

Same/Similar piggy bank ---

Kind bear cub puppy dog

Be-Loc sunspot water bed

Comp(osition) rubber band sheet metal

Made apple juice sugar beet

Part backbone wheelchair

Cause sunburn ---

Make moonbeam honeybee

Be-Function handlebar ---

Have career girl gangster money

Protect lifeboat mothball

Jackendoff does introduce a few innovations, however. First, he formally acknowledges what

others have usually only implied regarding the possible bidirectionality of compound relations.

In other words, instead of specifying both a Part-Whole and a Whole-Part relation in order to

account for compounds in which “N1 is part of N2” and “N2 is a part of N1” respectively (cf.

Downing 1977 and Warren 1978), Jackendoff instead introduces what he calls Reversibility.

Thus, as shown in Table 5.8, a number of his functions may be inverted, allowing for a more

complete account of the data without resorting to an increased number of relations. Second,

Jackendoff formalizes the application of his functions using his Lexical Conceptual Structures

(LCS: Jackendoff 1992), which, on the one hand, further reduces ambiguity by explicitly

stating how argument slots are filled and, on the other, allows him to combine relations in cases

where a compound might require a more complex representation. Finally, Jackendoff also

includes a component he calls proper function, which he uses to account for those compounds

whose meaning would otherwise fall outside his set of relations. Proper Function, as it is

presented, is conceptually similar to Pustejovsky’s (1995) telic quale and is loosely related to

what other authors have often called Purpose. Both Reversibility and Proper Function will be

discussed in greater detail in Sections 5.2 and 5.2.2.14 respectively.

Evidently, research on compound relations has focused largely on English. Arnaud (2003)

is, to my knowledge, one of the only authors to have applied a similar approach to French

compounds82. His work on French NN compounds led him to identify 54 “low level” relations

based on an inventory of 810 compounds, only 96 of which are also included in my data83.

These relations are said to exhibit a low level of abstraction and are thus meant to be as granular

as possible, similarly to Adams (1973) or Warren (1978). In fact, 19 of his relations have fewer

than 5 compounds each (two relations only have 1 compound each). Of course, this is not to say

that these relations are without merit, but they unfortunately offer little in the way of

generalisations regarding the semantics of French compounds. Recognizing this, Arnaud

grouped together his low-level relations into eight higher order ones that offered a much greater

degree of abstraction, which he dubbed “high-level relations”:

82 Barbaud (1971), working on French compounds, offers 4 “fundamental” relations (i.e. Attribute, Metaphorical, Complementarity, Coordination), but his work remains largely syntactic in nature.

83 Arnaud’s data was constituted using a variety of sources, including dictionaries such as Le Petit Robert and Le Larousse, as well as compounds encountered haphazardly or through Google searches (see Arnaud 2003: 95). He was therefore far more selective in his manner of gathering data than I was; for instance, he completely excludes coordinated (or dvandva) compounds from his corpus.

Table 5.9. Arnaud’s (2003) high-level relations for French NN compounds.

Relation      Gloss
((N1) N2)     N1 is included in N2
(N1 (N2))     N2 is included in N1
N1 → N2       N1 toward N2
N1 ← N2       N2 toward N1
ÊTRE          Predication of a quality (metaphorical)
ANALOG        Resemblance
N1 SYMB N2    N1 symbolizes N2
N2 SYMB N1    N2 symbolizes N1

Arnaud’s approach is partly based on Hatcher’s (1960) highly abstract treatment of compounds.

To her four basic relations, Arnaud adds four more meant to account for compounds that exhibit

what he considers to be less literal associations (e.g. franc-or, kirsch fantaisie, style nouille), but

which only make up a little more than 10% of his data. The majority of his compounds are in

fact covered by the four primary relations and encompass most of the basic ones mentioned

elsewhere. For instance, the inclusion relations (i.e. ((N1) N2) and (N1 (N2))) consist of

compounds based on Part-Whole, Location, and Composition associations, while the directional

relations (i.e. N1 → N2 and N1 ← N2) involve those based on Destination, Source, Purpose,

and Production.

While Arnaud’s work remains in line with previous work on compounds, his sets of relations are

arguably too extreme in their approaches. On the one hand, his low-level relations are far too

granular to be useful at a global or universal level. On the other hand, his high-level relations,

although they offer interesting generalizations regarding the interaction between the elements of

compounds, remain insufficiently explicit to provide any sort of method for disambiguation. In

other words, many compounds with different underlying relationships are grouped together in

less than meaningful ways; thus, both reliure cuir and poisson-scie are compounds for which

“N2 is included in N1,” but this says little about what this inclusion entails. Nevertheless,

Arnaud’s analysis reveals many useful facts about French NN compounds, one of which

pertains to the prominence of certain relations based on his data. Although the number of

low-level relations he uses is quite large, thus making it difficult to determine to what degree some

of them overlap, a few associations stand out: Location, Part-Whole, and Purpose are by and

large the most prominent relations he identified, together accounting for at least 35% of his

data84. It is clear, then, that a large number of compounds involve a small number of associative

concepts.

5.1.3 Summary

In the previous sections, I have sought to show two things. First, that the work on compound

relations is well-established and offers a rich and diverse breadth of research with which to

work, and second, that while there is much disagreement between works, there is also

considerable agreement. In fact, from a purely qualitative perspective, there is perhaps more

agreement than there is disagreement. The following table summarizes the number of relations

found in each of the works cited above:

Table 5.10. Summary of the number of semantic relations present in the literature.

Author(s)                    Number of Major Semantic Relations⁸⁵    Number of All Relations Including Sub-Types
Jespersen (1942)             11                                      -
Hatcher (1960)               4                                       49⁸⁶
Adams (1973)                 11                                      70
Levi (1978)                  9                                       22
Downing (1977)               12                                      -
Warren (1978)                14                                      59
Shoben (1991)                14                                      -
Vanderwende (1994)           13                                      -
Lauer (1995)                 11                                      -
Rosario and Hearst (2001)    38                                      -
Arnaud (2003)                8                                       54
Moldovan et al. (2004)       35                                      -
Girju et al. (2005)          23                                      -
Girju et al. (2007)          7                                       -
Séaghdha (2008)              6                                       -
Jackendoff (2010)            14                                      22

As Table 5.10 shows, the number of basic relations ranges anywhere from 4 to 38, with an

average of 14. When all sub-types are included, the number of distinct compound types balloons

84 This number is highly conservative, however, as Arnaud often labels compounds using multiple relations, arguing that they are in fact multi-faceted. He identifies these compounds as “complex” and says that they owe much of their complexity to the fact that some compounds can be interpreted in a number of different, yet equally valid ways. I will explore this concept in greater detail in Section 5.2.1.

85 Because Lees’s (1960) analysis of compounds is primarily syntactic, his work has been omitted from this table.

86 See Section 5.1 for a summary of Hatcher’s expanded classification system.

to as many as 70. Despite the considerable range of relations proposed, there is nevertheless

significant overlap between them. In order to determine which relations would be retained for

the present work, every set of relations was compared against each other and parallels were

drawn wherever possible. For instance, whereas Downing (1977) would use Product for

compounds such as honey bee, Levi (1978) would use Make: I therefore consider these two

labels as both referring to the same relational concept (i.e. production). Appendix B contains a

table showing just what this comparison looks like. In the following section, I will discuss the

results of this research.
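The range and average reported for Table 5.10 can be checked with a few lines of Python. The figures below are transcribed from the table; the variable names are hypothetical, and the rounding of the mean to the nearest integer is an assumption about how the reported average was obtained.

```python
from statistics import mean

# Number of major semantic relations per work, as listed in Table 5.10.
MAJOR_RELATIONS = {
    "Jespersen (1942)": 11, "Hatcher (1960)": 4, "Adams (1973)": 11,
    "Levi (1978)": 9, "Downing (1977)": 12, "Warren (1978)": 14,
    "Shoben (1991)": 14, "Vanderwende (1994)": 13, "Lauer (1995)": 11,
    "Rosario and Hearst (2001)": 38, "Arnaud (2003)": 8,
    "Moldovan et al. (2004)": 35, "Girju et al. (2005)": 23,
    "Girju et al. (2007)": 7, "Séaghdha (2008)": 6, "Jackendoff (2010)": 14,
}

# Totals including sub-types, for the works where the table gives one.
WITH_SUBTYPES = {
    "Hatcher (1960)": 49, "Adams (1973)": 70, "Levi (1978)": 22,
    "Warren (1978)": 59, "Arnaud (2003)": 54, "Jackendoff (2010)": 22,
}

counts = list(MAJOR_RELATIONS.values())
lowest, highest = min(counts), max(counts)
average = round(mean(counts))          # exact mean is 14.375
largest_inventory = max(WITH_SUBTYPES.values())
```

The two inventories with more than thirty major relations sit far above the others; the median of these counts is 11.5, somewhat below the mean.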

5.2 Retained Semantic Relations

Given the emphasis on the semantics of compounds, the relational concepts retained are largely

characterized by their semantic content and not their syntactic function. No matter how much

care or effort is put into establishing a set of basic relations, it is unlikely that it will suit

everyone’s needs or that it will apply to every type of compound examined. Considering the

sheer number of different approaches adopted over the years, Bauer’s (1983) words regarding

compound classification remain true today: “[a]ny method of subclassification is bound to be

controversial, and none can hope to win unqualified support” (202). One must therefore focus

on remaining as coherent as possible with not only one’s choice of relations, but also their

application to the data.

One of the reasons systems such as those proposed by Adams (1973), Warren (1978), or Arnaud

(2003) could proliferate to as many as 70 distinct types is that they were meant to account for

every compound included in their respective corpora. Of those researchers who offered far

fewer relations, one of two methodologies was adopted: either they applied their relations in the

most general way possible or they allowed for some compounds to resist their analysis. It is this

latter approach that will be adopted here. To put it plainly, certain compounds simply cannot be

reduced to a basic relation, while still remaining faithful to their meaning. I will call these cases

idiosyncratic compounds (e.g. année-lumière; cf. Warren 1978). These compounds differ from

those that might, for historical reasons, possess no discernible relation (e.g. compère-loriot) and

which may instead be called lexicalized compounds. This method is, I believe, very much in line

with the principal goal of this work, which aims to determine to what degree a given compound

can be considered transparent, given both its meaning and its form. The semantic relations

retained here are thus not meant to categorize all compounds in my data. In fact, considering

that my data were constituted using entries in Wiktionary without prejudice, so to speak, it is

inevitable that a number of the compounds retained would defy classification. This approach

thus tacitly acknowledges that, from a semantic perspective, some compounds involve more

complex relational concepts than others.

This decision speaks to the necessity of establishing—and adhering to—a set of principles that

might guide the selection of the fundamental logico-semantic relations at play within

compounds. I therefore offer the following principles with regard to said relations87:

1) They should be meaningful. Relations should be sufficiently meaningful so as to

disambiguate the compounds to which they are applied.

2) They should be limited. Relations should not be based on ad-hoc associations or

proliferate in a manner that might render them redundant.

3) They should be representative. The relations retained should account for as many

compounds as possible without violating (2).

4) They should be distinct. The differences between relations should be sufficiently clear so

as to allow for their coherent application across the data.

Principle (1) rules out the use of prepositions as labels for relations as they are not able to

sufficiently disambiguate compounds (e.g. at might represent both time and location). It also

means that whatever labels are used, they should easily allow for expansion via some sort of

paraphrase. Principles (2) and (3), taken together, urge us to work toward a number of relations

that is both sufficient and necessary. Principle (4) emphasizes the need for a coherent and

cohesive formalism. As we will see, however, the fourth principle is also the most difficult to

respect. Two relations will often overlap in ways that make their application difficult, thus

requiring that tests or clear descriptions be offered in order to further distinguish between them.

87 These principles are loosely based on Séaghdha’s (2008) set of 5 criteria for establishing a classification scheme, which are Coverage, Coherence, Generalisation, Annotation Guidelines, and Utility.

Given these basic principles, along with the careful comparison of the works discussed in the

previous sections, 15 relations were retained based on both their recurrence and salience in the

literature. By salience, I mean that although a given relation may not be listed across most

works, it may offer a sufficiently significant distinction so as to warrant inclusion. For instance,

“function as” is only truly discussed in Adams (1973) and Jackendoff (2010), but it offers a

useful layer of granularity to the list. This particular relation is often grouped with copulative

compounds (i.e. A is a B), but since hypernymic and coordinating relations often receive,

despite their own copulative status, distinct treatments, it seems reasonable to view function as

its own category. This allows us to distinguish, for example, between compounds such as buffer

state (i.e. ‘state that functions as a buffer’) and girlfriend (i.e. ‘friend that is a girl’).

The relations retained for this project are listed in Table 5.11 and will be explained in greater

detail in the following sections. The labels reflect many of those found in the literature, but the

use of substantive forms is not rooted in any particular approach; nothing hinges on this

particular choice of nomenclature.

Table 5.11. Logico-semantic relations retained in this work.

COORDINATION COMPOSITION TIME

HYPERNYMY SOURCE TOPIC

SIMILARITY PART FUNCTION

PRODUCTION LOCATION PURPOSE

CAUSE POSSESSION USE

Many of the relations listed above are in fact reversible, a term borrowed from Jackendoff

(2010), but also present in Warren (1978) and referred to as “direction” in Séaghdha (2008). In

essence, a relation is reversible if either the head (as in 76a) or non-head element (as in 76b)

may function as its subject:

(76) a. tear gas ‘gas that causes tears’ gas CAUSE tears

b. motion sickness ‘sickness that motion causes’ motion CAUSE sickness

In Levi (1978), only the Make and Cause RDPs are reversible, but Warren (1978) shows that all

of her classes, save Purpose and Copula, are reversible. Likewise, of the fourteen basic functions

in Jackendoff (2010), nine are said to be reversible. Interestingly, Jackendoff does not outright

state that Cause is reversible, although this seems likely given Levi’s (1978) analysis of this

RDP (cf. the compounds in 76). Reversibility is in fact implied in a number of other works,

usually by stipulating the same relation twice. Downing (1977), for instance, differentiates

between Whole-Part and Part-Whole, a distinction that clearly relates to the order of the

constituents. Similar treatments can also be found elsewhere (Adams 1973, Shoben 1991,

Vanderwende 1994, Arnaud 2003, etc.).

Reversibility is relevant for French as well (cf. Arnaud 2003), as evidenced by the following

compounds:

(77) a. piétin-échaudage = ‘piétin qui cause l’échaudage’ [N1 causes N2]

b. arrêt-maladie = ‘arrêt causé par une maladie’ [N2 causes N1]

c. marche-palier = ‘marche qui fait partie d’un palier’ [N1 is part of N2]

d. stylo-bille = ‘stylo dont une bille fait partie’ [N2 is part of N1]

When applicable, reversibility, along with the appropriate paraphrases, will be included in the

presentation of the relations.
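Reversibility can be represented compactly by storing a single relation label together with a flag indicating which constituent serves as its subject. The Python sketch below is one possible encoding, not a formalism proposed in the literature cited; the class name `CompoundAnalysis` and its field names are hypothetical. The examples reproduce (76a), (76b) and (77b), with the head taken to be the right-hand noun in English and the left-hand noun (N1) in the French example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompoundAnalysis:
    """A compound labelled with one relation and a direction (hypothetical encoding)."""
    head: str
    non_head: str
    relation: str            # e.g. "CAUSE", "PART"
    subject_is_head: bool    # False marks the reversed reading

    def formula(self):
        """Render the analysis as 'subject RELATION object'."""
        if self.subject_is_head:
            subject, obj = self.head, self.non_head
        else:
            subject, obj = self.non_head, self.head
        return f"{subject} {self.relation} {obj}"

tear_gas = CompoundAnalysis("gas", "tears", "CAUSE", subject_is_head=True)
motion_sickness = CompoundAnalysis("sickness", "motion", "CAUSE", subject_is_head=False)
arret_maladie = CompoundAnalysis("arrêt", "maladie", "CAUSE", subject_is_head=False)
```

Encoding direction as a flag rather than as a second relation keeps the inventory at one label per concept, which is the economy that reversibility is meant to provide.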

How might the relations in Table 5.11 above apply to French compounds? To answer this

question, they were used to label 729 NN compounds, as well as a smaller set of 319 N à N compounds, all

taken from Wiktionary (see Chapter 3 for details on the data). Before going into greater detail

regarding the meaning of each of the above relations, however, a few words must be said

regarding some of the challenges related to the selection of a particular relation for a given

compound.

5.2.1 Interpreting Compounds

In an attempt to make labeling compounds as straightforward as possible, explanations are

provided for many of the choices made. Unsurprisingly, however, the task of selecting a

particular relation over another for a given compound poses a number of challenges. Chief

among them is the unavoidable semantic overlap present for many of the retained relations.

Many authors have discussed this problem at length, but seldom can a definitive solution be

offered. For instance, Jackendoff (2010) says the following of two of his basic functions: “It is

sometimes hard to distinguish MAKE from CAUSE. Perhaps MAKE (X,Y) decomposes as CAUSE

(X, (COME INTO EXISTENCE (Y)).” The blurred line between these compounds perhaps explains

why he lists knife wound under both of these functions, but the matter is not addressed any

further. As for MADE FROM and COMPOSITION, Jackendoff insists that they differ in that for

composition “the object or substance is no longer in evidence,” but that “[t]he distinction is

however slippery” (440). These comments echo those of many of the authors discussed in the

previous sections.

Another, perhaps more vexing issue, however, stems from the fact that many compounds allow

for a number of different interpretations. Take, for instance, the following set of English

compounds:

(78) a. dog house ‘house in which there is a dog’ (LOCATION)

‘house for a dog’ (PURPOSE)

b. peanut butter ‘butter made from peanuts’ (SOURCE)

‘butter which consists of peanuts’ (COMPOSITION)

c. bear country ‘country that has bears’ (POSSESSION)

‘country in which bears are located’ (LOCATION)

The multiplicity of meaning illustrated in these examples is distinct from the one discussed at

the beginning of this chapter, where a compound was said to be ambiguous because it could

potentially have different meanings, but that only one was actually acceptable or attested. In the

examples above, however, either interpretation is in fact correct. If one feels that one paraphrase

is clearly better than the other, it should be noted that all of the interpretations in (78) can be

found in the literature on compounds.

This issue is most likely a cross-linguistic one. A quick look at a few French compounds reveals

that they too are susceptible to multiple analyses, a fact that Arnaud (2003) had already

acknowledged and for whom the solution was to simply assign multiple relations to a single

compound:

(79) a. carte-index ‘carte servant d’index’ (FUNCTION)

‘carte sur laquelle il y a un index’ (LOCATION)

b. passage-piétons ‘passage destiné aux piétons’ (PURPOSE)

‘passage utilisé par les piétons’ (USE)

c. lit-cage ‘lit qui ressemble à une cage’ (SIMILARITY)

‘lit ayant une cage en tant que partie’ (PART)

Again, arguments exist in support of any of the above interpretations. Levi (1978) refers to this

particular issue as an “indeterminacy of analysis,” stating that “there seems to be no principled

reason to prefer one [RDP] over the other” (263). She ultimately dismisses the problem, arguing

that the existence of “double analyses” is simply a consequence of a language system that shows

a great deal of inter-dependency. To say, for instance, that a man has a beard entails that there is

a beard located on his face or that a beard is part of him. Jackendoff (2010) largely agrees with

Levi and argues that this multiplicity of meanings should not, under normal circumstances, be

viewed as a case of ambiguity, but that it should instead be understood as characteristic of

certain compounds, which he labels as promiscuous. According to Jackendoff, any one of the

possible interpretations is equally valid. Marchand (1960) was of the same opinion, claiming

that

“[w]hether a night shirt is a ‘shirt for the night’ or a ‘shirt worn at night’ is quite unimportant. In forming [compounds] we are not guided by logic but by associations. We see or want to establish a connection between two ideas, choosing the shortest possible way. What the relation exactly is, very often appears from the context only.” (Marchand 1960: 22)

Without context, then, any analysis may be correct so long as it remains faithful to what we

know about the meaning of a particular compound.

This “indeterminacy of analysis” is in fact so pervasive that it is not uncommon for compounds

to be treated differently from one work to another. So as to quantify just how disparate some

analyses can be, I gathered as many of the example compounds as I could from five different

works on relations: 648 from Adams (1973), 485 from Warren (1978), 387 from Levi (1978), 378

from Lauer (1995), and 389 from Jackendoff (2010). In total, nearly 2,300 English compounds

were collected. Of these compounds, only 115 were present in more than one work. It is

surprising to see so little overlap in the compounds examined (given that the object of study in

all works was NN compounds), but it is unfortunately unclear why this is the case88. What is

clear, however, is just how different the treatment of identical compounds is from one work to

another: of the 115 duplicate entries, just under half (56) were treated similarly by two or more

authors. The following table shows the 59 compounds that were interpreted differently in each

work89:

Table 5.12. A comparison of compound interpretations across five different works.

Compound Adams 1973 Levi 1978 Warren 1978 Jackendoff 2010 Lauer 1995
anthill Part-whole Make1
apple core Have2 Part1
bear country Locative Have1 Loc2
bird sanctuary For Protect2
bloodstain Instrumental Comp1
bookmark Locative Argument
booster shot Appositional (is a) Serve
bull ring Locative For
career girl Part-whole Have1
crystal structure Whole-part Argument
doghouse For Loc2 (PF)
eyeball Associative (part) Source-Result
farm boy From Loc1 (Char)
fighter plane Appositional (is a) Serve
fingerprint Instrumental Make1
fireplace Composition/Form/Contents Purpose
fisherman Copula Be
food surplus Source-Result Argument
football Instrumental Source-Result/Purpose
frogman Resemblance Be
gangster money Origin-object Have2
garlic bread Part-whole Loc2
garter snake Resemblance (char) Be Similar
gas mask Instrumental Protect2
grain alcohol Instrumental From Made1
guidebook Instrumental Serve
handlebar Appositional (function) Copula Serve
handlebar moustache Resemblance Be
headache Subject-Verb Loc1
headache pill For Protect
hearing aid Instrumental Argument
hermit crab Resemblance Be
honeybee Subject-Verb Make1 Make2
houseboat Appositional (function) Copula
immigrant minority Make2 Source-Result
lifeboat Instrumental Protect1
lightning rod For Protect2
loaf sugar Composition/Form/Contents Comp2
mothball Instrumental For Protect2
mouse trap Instrumental Argument
musk deer Make2 With
particle shape Whole-part Argument
peanut butter Composition/Form/Contents From
picture book Composition/Form/Contents Have1
prose poem Source-Result Be
puppet government Resemblance Be (Copula)
sandpaper Composition/Form/Contents Loc2
sandstone Part-whole Similar
silkworm Subject-Verb Make1 Make2
sunburn Subject-Verb Cause
tablecloth Purpose Loc1 (PF)
tea room Locative For
tear gas Instrumental Cause1
textbook Be Part-Whole
toothache Subject-Verb Loc1
union member Whole-part Argument
wall board Purpose Comp2
wardrobe color Whole-part Argument
windshield Instrumental Protect2

88 A number of explanations are in fact available. One possibility is that the source data are intimately related to the period during which they were collected: nearly forty years separate Adams’s work from Jackendoff’s, for example. Another possibility, however, is that the methods used in the selection of compounds in each study (i.e. source, criteria, classification, etc.) were sufficiently dissimilar to result in vastly different data sets.

89 Some of the relations have been truncated to their most recognizable designations. When a relation includes a number (e.g. Make1, Have2), it indicates whether the head is the object or subject of the relation.

In some cases, the differences in treatment are simply due to gaps (or to highly generalized

relations) in a particular formalism. For instance, frogman is based on the resemblance relation

under Adams’s analysis, while it is grouped under Be in Levi’s work, the reason being that Levi

has no resemblance predicate—all such compounds are treated as cases of metaphorical

copulatives. In many cases, however, the differences are due purely to the author’s personal

interpretation: bear country is locative for both Adams and Jackendoff, but purposive for Levi;

anthill is based on a part-whole relationship for Warren, but on a production one for Jackendoff.

Nor is it unheard of for an author to include the same compound under two different relations

within their own work. Jackendoff (2010), for instance, lists ferry boat under both the Kind and

Serve functions (‘a boat of kind ferry’ ~ ‘a boat that serves as a ferry’); stew beef is first said to

be based on the Made function (‘beef from which stew is made’), but is later listed under Part

(‘beef that is part of a stew’)90. Similarly, Warren (1978) is unsure whether a spacecraft

expresses a locative or purposive relation and so lists it under both classes. These dual category

compounds show just how difficult it can be to determine what basic relation is at play in certain

constructions. In some instances, nearly identical compounds are given different treatments. It is

unclear, for example, why Jackendoff views sunburn as a case of Cause (‘burn caused by the

sun’), but suntan as an occurrence of Make (‘tan made by the sun’). These types of analyses,

while perhaps infrequent, do occur.

It is clear, then, that the only solution is to strive for consistency above all else. Unfortunately,

this proves surprisingly difficult. For most of the works cited above, many of the compounds

under investigation are not in fact institutionalized, which means that resolving any issues

related to interpretation is not simply a matter of consulting reference works. Given that this

project relies on entries from Wiktionary, one should, in theory, be able to base all

interpretations on the definitions it provides, but this method doesn’t always provide clear and

consistent results:

(80) a. boîte à outils “Coffret où ranger les outils [...]”

b. boîte à camembert “Boîte en bois léger ou en carton pour le camembert.”

Should boîte à outils be treated as locative and boîte à camembert as purposive or should they

be treated identically? This issue is only further exacerbated by the numerous condensed or

truncated definitions that fail to provide sufficient information to determine how a compound’s

elements might be related, often requiring that additional research be conducted. This is a

90

Such heterogeneous treatments for a single compound may be due to a Type vs. Token reading on the author’s part. Therefore, treating stew beef as an instance of the Made relation might be due to a type (or intensional) interpretation, while Part would be its token (or extensional) interpretation. It must be noted, however, that all authors seem to be interested solely in compound types and not tokens, which is also the perspective adopted here.

shortcoming not only of Wiktionary, but also of more established dictionaries. Take, for

instance, the following two definitions for chêne kermès:

(81) a. Wiktionary: “Arbuste méditerranéen piquant de la famille des Fagacées.”

b. LPR2010: “Espèces méditerranéennes à feuilles persistantes.”

Neither of the definitions in (81) mentions kermès, let alone how it relates to chêne. The only

solution is to adopt a methodology that might allow one to minimize, though probably not

completely eliminate, some of the problems posed by these so-called promiscuous compounds. It

is therefore with this issue in mind that I attempt to offer as many distinctive features as possible

for the relations used in this work. This is an important component of the analysis considering

that some relations might be said to share a number of characteristics (Location and Part; Cause

and Production; etc.). Whenever possible, tests and explanations are provided that might help to

further ensure that the relations are applied consistently, or at the very least, to provide

justification for some of the decisions I have made.

5.2.2 Presentation Format

Traditionally, the elements of a compound are referred to as N1, N2, N3...Ni (or X, Y, W, Z), but

this doesn’t allow for much flexibility when stipulating how a relation applies to a compound. In

other words, by stating that a relation such as causation is applied as “N2 causes N1” (as, for

instance, Jackendoff 2010 does), one fails to account for headedness, or rather, one assumes that

the head is always the rightmost constituent. This is relatively trivial for English compounds as

they are nearly always right-headed, but not all languages are so rigid when it comes to the

position of the head. Although French is mostly left-headed (requiring that the above causal

relation be instead stated as “N1 causes N2”), right-headed compounds in French are not

unheard of. For this reason, I will use the labels H(ead) and M(odifier) when stating relational

associations and their application91. Of course, this means that position must then be stated

independently (i.e. H=N1, M=N2), but this should arguably have little effect on the presentation

91

Similar to Pham and Baayen (2013), the abbreviations used are as follows: H = Head constituent (fr. T = Tête); M = Modifier constituent; C = Compound. Although the term modifier is usually reserved for subordinative compounds (Scalise and Bisetto 2009), it will also be used for the non-head element of compounds that would otherwise fall under the coordinate type (e.g. boy-king).

of the relations. This nomenclature also has the added advantage of being language independent,

provided that a head element can be identified.

It must also be noted that many of the authors whose work has inspired the following relations

focused on compounds denoting concrete objects and not abstract concepts. The relations, as

they are put to use here, have a relatively broad application, that is to say that a relation such as

PART does not necessarily have to involve physical objects. This particular facet of my approach

will be further elaborated upon as each relation is discussed in greater detail.

The retained relations are presented using the following table:

RELATION

Relation Type | Structure Template | Examples | Linking Material
Basic | H REL M | NN | verbs, prepositions, etc.
 | H REL M | |
Reversed | H that M REL | NN |
 | H que M REL | |

For each relation, a basic structure template (or paraphrase) is provided for both English and

French compounds. Examples are also given for both languages, along with a list of possible

linking material that can be used to paraphrase a given compound syntactically (e.g. toolbox =

box for tools). This linking material is meant to draw parallels between the retained relation and

those proposed elsewhere in the literature and may include such items as verbs (e.g. have, cause,

make, etc.), prepositions (e.g. for, from, of, etc.), and even nouns (e.g. kind, type). Whenever

appropriate, the reversed form of the relation is also included.
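
The presentation scheme described above can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions and is not part of the methodology: the names `Relation` and `paraphrase`, the `head_index` parameter, and the use of `{H}` and `{M}` placeholders are all hypothetical. The point it shows is that a relation stated over H and M stays the same across languages, while head position (H=N1 or H=N2) is supplied separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation stated over H(ead) and M(odifier), not over N1/N2."""
    name: str
    templates: dict  # language code -> paraphrase template using {H} and {M}


def paraphrase(relation, constituents, head_index, lang):
    """Fill a relation template for a two-constituent compound.

    constituents: (N1, N2) in surface order.
    head_index: 0 if H=N1 (typical of French), 1 if H=N2 (typical of English).
    """
    if head_index not in (0, 1):
        raise ValueError("head_index must be 0 (H=N1) or 1 (H=N2)")
    head = constituents[head_index]
    modifier = constituents[1 - head_index]
    return relation.templates[lang].format(H=head, M=modifier)


# Hypothetical encoding of one relation; articles are omitted for simplicity.
FUNCTION = Relation(
    name="FUNCTION",
    templates={"en": "{H} that serves as {M}", "fr": "{H} qui sert de {M}"},
)

# English is right-headed here (H=N2); French is left-headed (H=N1).
print(paraphrase(FUNCTION, ("buffer", "state"), head_index=1, lang="en"))
print(paraphrase(FUNCTION, ("papier", "filtre"), head_index=0, lang="fr"))
```

Because position is passed in separately, the same relation object serves both a right-headed and a left-headed compound without restating the template.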

The following sections will largely focus on NN compounds as they offer few, if any, methods

of disambiguation and are thus most likely to make use of all the retained relational concepts.

Furthermore, the research upon which the relations are based has focused almost exclusively on

NN compounds. A discussion of N à N compounds, however, will be provided whenever it is

deemed pertinent (i.e. when a particular relation is only marginally applicable for NN

compounds) and will be a major component of the following chapter, which will seek to

determine to what degree these relations coincide with what has already been said regarding the

preposition à.

5.2.2.1 Hypernymy

HYPERNYMY

Relation Type | Structure Template | Examples | Linking Material
Basic | an H of kind M | oak tree | kind of, type of
 | un T de sorte M | banane plantain | sorte de, type de
Reversed | an H that M is a kind of | bear cub |
 | ?un T dont M est une sorte | ?argent métal |

Some compounds consist of two semantically related words, one of which is in fact a more

specific term for the other. A number of authors have identified these types of compounds and

have classified them in a variety of ways. Adams (1973) lists compounds such as football game,

repair job, and teaching profession under her appositional class and paraphrases them as “B of

which A is a particular instance” (69). In Warren’s framework, these types fall under her

Copula: Subsumptive class (cf. Marchand 1960). Vanderwende, whose approach is based on

interrogative structures, categorizes hypernymic compounds using “What kind of?” Others,

however, simply group them together with other copula (i.e. A is a B) compounds (Levi 1978,

Séaghdha 2008). A test for hypernymy, taken from Marchand (1960), is based on bi-

directionality: for oak tree, for example, both an oak is a tree and the tree is an oak are true

(42). In French, most cases seem to involve either animals (82a), plants (82b), or substances and

minerals (82c):

(82) a. chouette-effraie, larve échinocoque, chat serval

b. banane plantain, houx frelon, menthe pouliot

c. quartz morion, salicaire pourpier, zéolithe cyanite

Based on the data examined, the HYPERNYMY relation does not seem to be reversible in French,

although, according to Jackendoff (2010), it is in English (e.g. puppy dog ~ bear cub).

Unfortunately, he only provides two such examples (seal pup and bear cub), both of which are

animal-young combinations, stating that they may also be open to other analyses (438). Warren

(1978), on the other hand, states that the species-genus (i.e. subsumptive) relation is not

reversible (105). This assertion is seemingly supported by the French data as all such

compounds have the head element as hypernym and the modifier as hyponym. There is,

however, at least one case that could be treated as a reversed instance of hypernymy, that is to

say one in which the head constituent plays the part of hyponym: argent métal is defined as

‘argent qui est un métal’ so as to distinguish argent meaning ‘silver’ from its homonym

meaning ‘money’. If this is in fact a case of reversed hypernymic compound (and not simple

coordination), then there may be other cases involving homonymic or highly polysemous

constituents. I will, however, leave open the question of reversibility for this particular

compound type, which seems prudent given the lack of consensus on its applicability.
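
The difference between the basic and reversed forms of the relation can be made concrete with a toy example. The Python sketch below is an illustration only: the `KIND_OF` table is hand-entered from examples mentioned in this section, the function names are hypothetical, and argent métal is treated as reversed purely for the sake of the example, since its status is left open above.

```python
# Toy "is a kind of" links, hand-entered from examples in this section.
KIND_OF = {"oak": "tree", "plantain": "banane", "argent": "métal"}


def is_kind_of(hyponym, hypernym, links=KIND_OF):
    """Return True if hyponym reaches hypernym by following kind-of links."""
    seen = set()
    current = hyponym
    while current in links and current not in seen:
        seen.add(current)  # guards against accidental cycles in the table
        current = links[current]
        if current == hypernym:
            return True
    return False


def hypernymy_direction(head, modifier, links=KIND_OF):
    """Classify a compound as basic, reversed, or not hypernymic at all."""
    if is_kind_of(modifier, head, links):
        return "basic"      # H is the hypernym, M the hyponym
    if is_kind_of(head, modifier, links):
        return "reversed"   # H is the hyponym, M the hypernym
    return "none"


print(hypernymy_direction("tree", "oak"))         # oak tree
print(hypernymy_direction("banane", "plantain"))  # banane plantain
print(hypernymy_direction("argent", "métal"))     # argent métal
```

A pair with no kind-of link in either direction is reported as "none", which is the expected result when neither element is subsumed under the other.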

5.2.2.2 Coordination

COORDINATION

Relation Type | Structure Template | Examples | Linking Material
Basic | a C is an H and an M | boy king | is also, is both / and
 | un C est un T et un M | auteur-compositeur | est aussi, est à la fois / et
Reversed | --- | --- |

Coordination is typically used for compounds whose elements may be coordinated using and.

More specifically, coordinating compounds refer to combinations that, from a semantic

perspective, seem to involve both elements equally. Jespersen (1956) identified two such types.

The first, which he called copulative, is defined as “AB means A plus B” (e.g. Alsace-Lorraine).

This type, according to Bauer (1978), is the traditionally accepted meaning of dvandva

compounds. In recent years, however, this Sanskrit term has had a far less restrictive usage (see

Scalise and Bisetto 2009) and often includes the second type identified by Jespersen, which he

called appositional and which refers to pairings that mean “at the same time A and B, the two

combined in one individual (e.g. maid-servant)” (144). The fact is that the terminology

employed for many of these types of compounds is far from conventionalized92. Coordination,

as it is used here, groups together a number of different, yet related types, all of which may

traditionally fall under one of three frequently used labels: copulative, appositional, and

coordinate.

92

For instance, Marchand (1960) used the term copulative for compounds such as fighter-bomber, while Jespersen (1956) used the term appositional; for Scalise and Bisetto (2009), however, these types are coordinate compounds.

These types of compounds were examined in Chapter 4 while discussing headedness, where it

was said that they may, under certain conditions, be considered bi-centric as both constituents

can function as hypernyms of the compound. From a purely semantic perspective, however,

coordinated compounds seem to involve several types of coordination. According to Wälchli

(2005), co-compounds may be grouped together based on the type of “natural coordination”

involved, which he describes as a “coordination of items which are expected to co-occur, which

are closely related in meaning, and which form conceptual units” (5). Many languages, for

instance, possess compounds consisting of the words for mother and father and which mean

‘parents’. Family ties may thus be said to involve natural coordination. Wälchli’s corpus

primarily consists of Eurasian languages, but he also makes extensive use of East and South

East Asian languages, which are known to involve a great deal of coordination in compounding

(Arcodia et al. 2010). Based on his research, Wälchli identifies ten semantic classes of co-

compounds. Unfortunately, few of these classes are applicable to the compounds typically

labeled as coordinated for Germanic and Romance languages. This is not necessarily surprising as

few of Wälchli’s co-compounds denote just one entity, which is in fact related to the idea of

natural coordination. Thus, both Mordvin t’et’a.t-ava.t ‘father.pl-mother.pl’ = ‘parents’ and

Georgian mšvild-isari ‘bow-arrow’ = ‘bow and arrows’ are said to be additive co-compounds

because they “denote pairs, each consisting of the parts A and B” (2005: 137-138).

Bauer’s (2008) classification of coordinated compounds, on the other hand, contains five major

types, one of which is the classic dvandva type, which he further subdivides into five sub-types.

Bauer reserves the dvandva label for those compounds that adhere to the Sanskrit description

(cf. Burrow 1955) and thus reclassifies a number of compounds that had, over the years, been

labeled as dvandvas by virtue of the coordination of their elements, but which Bauer argues are

only partly related to what the Sanskrit grammarians understood to be dvandvas. True dvandva

compounds, according to Bauer, require that both constituents be of equal status. His five major

types of coordinated compounds are summarized in the following table:

Table 5.13. Bauer’s five main types of coordinated compounds.

Type of coordination English Example French Example

Translative London-Edinburgh (Express) (vol) Paris-Londres

Co-Participant Mother-Child (Relationship) (relation) médecin-patient

Dvandva Austro-Hungary (among others) Alsace-Lorraine

Appositional singer-songwriter auteur-compositeur

Hyponym-Superordinate oak-tree banane-plantain

For the purposes of this work, the coordination relation is used to refer to any compound in

which its constituents may be coordinated using, at a minimum, the conjunction and. For a more

exhaustive look at how this relation may be instantiated, including within exocentrics, the reader

is encouraged to consult Wälchli’s (2005) inventory of ten semantic classes of co-compounds or

Bauer’s own extended typology (2008).

Based on the examination of the French data, the first type of coordination involves compounds

whose elements denote two aspects or features of equal status. They usually denote a person

(83a), an artefact (83b), and occasionally an establishment (83c):

(83) a. analyste-programmeur, auteur-compositeur, cardinal-diacre

b. canapé-lit, chargeuse-pelleteuse, moissonneuse-batteuse

c. bistro-brasserie, café-bar, restaurant-bistro

These particular compounds are most plainly paraphrased as “an H that is also an M” (e.g. un

analyste qui est aussi un programmeur), or by the more equative paraphrase: “a C that is both an

H and an M” (e.g. un analyste programmeur est à la fois un analyste et un programmeur). In this

regard, coordinating compounds are in fact particular types of copula constructions as they

allow for an IS A based paraphrase, but remain distinct from those listed under HYPERNYMY

because neither element is subsumed under the other (i.e. *un programmeur est un analyste). It

should also be noted that the use of a copulative verb for these compounds also distinguishes

them from those categorized under the SIMILARITY and FUNCTION relations. A number of other

compounds involving other noun types exist and are relatively frequent (e.g. hôtellerie-restauration,

aller-retour, roulage-décollage, quinte-flush). Because these types of compounds

arguably possess two heads, reversibility is largely a matter of interpretation (see Chapter 4,

Section 4.1 for a discussion of centricity). Although reversibility is technically possible, it is of

little consequence: to say that a singer-songwriter is a singer who is also a songwriter, or

vice versa, does not give rise to a significant shift in meaning93.

In contrast with the constructions described above, a number of coordinating compounds instead

denote hybrid entities. They do not typically permit the first type of paraphrase mentioned

earlier (“an H that is also an M”):

(84) point-virgule, âne-zèbre, femme-renarde, jupe-culotte, punk-rock, roman-feuilleton

One reason for this limited periphrasis is that the designatum of each of the compounds in (84)

is in fact neither entity, but instead a mixture of them. In other words, a point-virgule is neither a

period nor a comma, but is instead a form of punctuation with the features of both. If one tackles

this from the perspective of function, the distinction is much clearer: a point-virgule does not

accomplish or serve the same purpose as either a period or a comma does. Compare this with

those compounds in (83) above, which do in fact function as either/or (i.e. an analyste-

programmeur does what an analyst does, as well as what a programmer does; a canapé-lit is

used as either a sofa or a bed; etc.).

An interesting set also treated as cases of coordination and related to the hybrid compounds in

(84) are based on an N-garou pattern:

(85) animal-garou, loup-garou, ours-garou, chien-garou, etc.

Very little can be said about these cases as they are all patterned on loup-garou. The element garou

alone originally meant ‘part-man, part-wolf’ but, according to the TLFi, was later expanded to

explicitly include the word for wolf. They are included under coordination based

on the assumption that garou now simply means ‘human-like monster’, which is then

coordinated with the lexeme denoting an animal.

93

One might argue that reversing the order of such compounds would result in an adjustment of prominence, that is to say a singer-songwriter is a singer first and songwriter second, but this is a different issue related to the actual order of the elements and not the interpretative one (i.e. a songwriter that is a singer ~ a singer that is a songwriter).

The copula paraphrases mentioned earlier are not definitive criteria for coordinating compounds,

however, as some compounds seem to focus on some middle space between the designata of

their elements. These are related, to some degree, to the hybrid compounds in (84) above:

(86) nord-ouest, baryton-basse

These particular types are not in fact very frequent and mostly consist of cardinal points.

Although north-west might be said to indicate a point that is both north and west, it is

more accurate to say that it refers to a point somewhere in between the two. The

same can be said for baryton-basse. They nevertheless involve coordination (i.e. north and west)

and so are treated as such. An additional set of compounds that might also be said to indicate

some sense of “in between” is related to designations of rank or titles:

(87) lieutenant-colonel, lieutenant général, sergent major

Again, one might argue that these are cases of “both A and B,” but the truth is that they are

neither and that the whole refers to someone situated between the two ranks denoted by the

elements.

It is also worth noting that exocentric compounds may rely on the COORDINATION relation,

as shown in the following examples:

(88) a. jambon-beurre ‘sandwich composé de jambon et de beurre’

b. huppe-col ‘oiseau ayant à la fois une huppe et un col’

c. épinard-fraise ‘plante ayant des feuilles ressemblant aux épinards et des baies

ressemblant aux fraises’

The compound in (88c) differs from the others in that the coordination of the elements relies on

an additional relational factor, that is to say one involving resemblance. Another similar instance

of this type might be fibre-cellule, which according to a number of sources is neither a fibre, nor

a cell94, although there may be reason to treat it as “fibre that is similar to a cell.”

94

“L’usage a fait adopter ce substantif, introduit par les anatomistes allemands, malgré l’opposition qui existe entre la valeur des mots fibre et cellule ; mais les éléments anatomiques qu’il sert à désigner ont à la fois la forme généralement étroite, allongée, aplatie, de beaucoup de fibres, et quelque chose de la structure des cellules, en ce

Some might argue that the compounds discussed above would benefit from being treated as sub-

types of coordination (cf. Adams 1973). Although such an approach does have its advantages

(mainly that it further disambiguates between these particular types), it also needlessly

introduces relations to account for compounds that only differ slightly in meaning. They are all,

fundamentally, cases of coordination, which is evidenced by the possibility of paraphrasing

them in more or less the same manner. In fact, the differences of meaning observed (i.e. hybrid

~ intersection) often stem from extralinguistic constraints related to how two elements may be

coordinated. For instance, while nord-ouest may very well be paraphrased as ‘nord et ouest,’ it

is impossible for something to occupy these two points at the same time. In this case,

COORDINATION will thus favour an intersective reading.

5.2.2.3 Similarity

SIMILARITY

Relation Type | Structure Template | Examples | Linking Material
Basic | an H that is similar to M | ant lion | similar to, like
 | un T qui est semblable à M | fourmi-lion | semblable à, comme
Reversed | --- | --- |

The SIMILARITY relation is based on a general degree or aspect of “likeness,” although just what

exactly this “likeness” might involve is not always entirely clear. Such a criterion is, of course,

highly indeterminate and thus allows for a wide range of possible interpretations. This is why

I treat SIMILARITY as perhaps the loosest of the compound relations. As Warren (1978) says, two

objects might be similar in any number of ways. In fact, based on her data, a more granular

approach to similarity would need to account for at least 18 different types. The French data,

although not quite as multi-faceted as Warren’s, seem to support this finding as well:

qu’elles renferment un noyau central ou quelquefois deux, avec ou sans granulations moléculaires autour de lui” (Nysten et al. 1858. Dictionnaire de médecine, de chirurgie, de pharmacie, des sciences accessoires et de l'art vétérinaire).

(89) a. oiseau-mouche H looks like M

b. pomme cannelle H smells like M

c. oiseau-cloche H sounds like M

d. fermeture éclair H is fast like M

e. taupe grillon H behaves like M

f. mot obus H functions like M

g. mot-valise H is formed like M

h. roman fleuve H flows like M

The different kinds of similarities are numerous enough (and possibly idiosyncratic enough) to

warrant grouping them together under a single relation. Of course, such a relationship requires

that the speaker attempt to determine just what the similarity is based on, which is perhaps

further evidence that a compound’s meaning is established by evaluating the compatibility

between the items’ features using some sort of schema or slot-filling mechanism (cf.

Wisniewski 1998, Baroni et al. 2007, Lieber 2009). Although the adoption of SIMILARITY may

seem to violate the principle requiring that all relations be meaningful, the alternative—which is

to say, further distinguishing between various types of similarities—would violate both the

principles of limitedness and representativeness.

Note that the SIMILARITY relation is not reversible, as doing so produces an implausible

equivalence:

(90) catfish ‘a fish that looks like a cat’ ‘*a fish that a cat looks like’

It is also worth noting that there have been efforts to distinguish between a broad similarity

relation and one involving shared physical attributes. Under more granular approaches, physical

similarity corresponds to “B which is in the form of, has the physical features of, A” (VIIA1,

Adams 1973) or “N2 designates via analogy a perceivable characteristic of N1”95 (Arnaud

2003) and might be represented by a number of different expressions such as “B is

like/resembles A” (Lees 1968, Warren 1978) or even “N2 is similar to N1” where the similarity

is understood to mean physical resemblance (Séaghdha 2008, Jackendoff 2010).

95

“N2 désigne par analogie une caractéristique perceptive de N1” (Arnaud 2003: 75).

Despite the fact that some authors have chosen to treat physical resemblance as distinct from

other types of similarities, many have in fact grouped together these particular relational

concepts under a single heading. For Warren (1978), resemblance entails that “B is similar to A

in some respect or respects” (108) (see also, Downing 1977, Levi 1978, Séaghdha 2008). This

broader interpretation of likeness is entirely justified given that physical resemblance is simply a

narrower instance of similarity: if something looks like something else, then it is also similar to

that thing. SIMILARITY, as it is used here, therefore covers all manner of shared features,

regardless of their nature.

The SIMILARITY relation, however, does raise a number of application issues. First, when

referring to physical similarities, it occasionally involves a meronymic association where it is in

fact the individual parts and not the whole that are targeted by the relation:

(91) a. clé crocodile, raie léopard, poisson-chat

b. requin marteau, plante crayon, tortue-boîte

In (91a), parts of both elements are involved in the similarity (i.e. clé crocodile = the teeth of the

wrench look like the teeth of a crocodile), while in (91b), only part of the head element is

targeted (i.e. requin marteau = the head of the shark looks like a hammer). Because the

relationship between the elements in the compounds in (91a) cannot be said to be based on a

part-whole relation (e.g. *clé qui fait partie d’un crocodile / *clé dont un crocodile fait partie),

they are treated as cases of SIMILARITY; conversely, because the elements in (91b) involve a

part-whole relation (e.g. requin qui a un marteau en tant que partie), albeit one that relies on a

metaphor, they are included under the PART relation (see Section 5.2.2.6). This distinction is

similar to the one made by Arnaud (2003), though his corpus actually contains few examples of

compounds like those in (91).

Another related type of SIMILARITY involves colour and is present in two different, yet related

kinds of compounds:

(92) a. bleu ciel, jaune paille, rouge sang

b. beurre noisette, liane-corail, pierre miel

The compounds in (92a) can be paraphrased as “H that is the colour of M,” while those in (92b)

are best paraphrased as “the colour of H is the colour of M.” Both types are treated as cases of

SIMILARITY as they may both be paraphrased using this relation (e.g. bleu semblable au ciel;

beurre semblable aux noisettes).

5.2.2.4 Function

FUNCTION

Relation Type | Structure Template | Examples | Linking Material
Basic | an H that serves as M | buffer state | functions/serves as
 | un T qui sert de M | papier filtre | sert de / fonctionne en tant que
Reversed | --- | --- |

This particular relation is not in fact frequent in the literature: it is most prominent in Adams

(1973) and more recently in Jackendoff (2010). It is retained here because it highlights key

differences between compounds that might otherwise be treated as either cases of

COORDINATION or SIMILARITY. This relation groups together compounds in which one element is

fulfilling the function of the other. For instance, although many have treated compounds such as

houseboat as a simple copula construction (i.e. a boat that is a house, cf. Warren 1978), this

approach in fact glosses over a critical aspect of this compound, namely that it is not, strictly

speaking, a house, but rather a boat used as one. A few examples from French are as follows:

(93) cellule assistante96, circuit tampon, gaz propulseur, logiciel antivirus, papier filtre

Distinguishing between coordinating compounds and those involving functionality is not always

simple. The chief distinction between them is that the former will allow either of its elements to

function as the head (as in 94a), while the latter typically produces marginally acceptable

paraphrases (as in 94b):

96

This construction may also be treated as an NA compound, where assistante is understood as an adjective. It is treated here as a noun because LPR2010 contains no adjectival entry for this word (see Chapter 3, Section 3.4 for information on the methodological choices made when identifying parts of speech of compounds).

(94) a. un auteur-compositeur est un auteur / un auteur-compositeur est un compositeur

b. une cellule assistante est une cellule / ?une cellule assistante est une assistante

Borrowing from Jackendoff (2010), we can further identify FUNCTION-based compounds by

using the paraphrases “the function of H is as an M” or “the function of H is to do what M does”

(442). Paraphrasing coordinate compounds in this manner produces odd results, while doing the

same with those in (93) typically yields acceptable sentences:

(95) a. ?la fonction/le rôle de cet auteur est de servir de compositeur

b. la fonction/le rôle de cette cellule est de servir d’assistante

Neither test provides absolute evidence for classification one way or the other, but they do show

that these particular cases are not exactly alike. To treat them in the same manner fails to

emphasize that a number of compounds are coined by identifying the function of a particular

entity and applying it to another that wouldn’t typically fill this role. Moreover, another key

distinction is that the elements of a coordinated compound are typically of the same conceptual

class (i.e. person-person, artefact-artefact, place-place), while compounds involving function

have no such requirement. The examples given in (93) show just how different the elements can

be when one functions as the other.

It should be noted that this relation is not reversible, most likely because doing so would

introduce an illogical relationship between the elements that would make it difficult to identify

the head:

(96) a. houseboat ‘a boat that functions as a house’ = it’s a boat

b. houseboat ‘a boat that a house functions as’ = is it a house or a boat?

We may also want to differentiate FUNCTION from SIMILARITY, given that the latter also seems to

include compounds that involve functionality to some degree (e.g. mot-obus or ville-dortoir).

The distinction here lies with how they are paraphrased:

(97) a. mot-obus ‘mot qui fonctionne comme obus’

b. cellule assistante ‘cellule qui fonctionne en tant qu’assistante’

While it is true that the compound in (97a) involves the modifier’s function, it does so in a

metaphorical way. In short, FUNCTION is for compounds in which one element functions AS the

other, while SIMILARITY may include those compounds in which one element functions LIKE the

other.

5.2.2.5 Possession

POSSESSION

Relation Type | Structure Template | Examples
Basic | an H that possesses M; un T qui possède M | career girl; punk à chien
Reversed | an H that M possesses; un T que M possède | family estate; droit d’auteur
Linking Material: possess (have / of); possède (a / de)

The POSSESSION relation, often paraphrased using the verb have, is related to a number of other

associations. For instance, it is subsumed under the Part-Whole class in Warren’s (1978)

framework and under OF in Lauer (1995). Similarly, Levi includes both partitive and possessive

meanings in her Have predicate. This approach is largely based on the fact that Have is a highly

polysemous verb and that possession involves a number of different relations: Baron and

Herslund (2001), for instance, argue that Have expresses what is fundamentally a locative

relation that may also include possessive and partitive associations. Fonagy (1975) identified ten

types of possessive relations, many of which are further subdivided into more specific types,

which include, among others, ownership, kinship, part-whole, and group membership. It is

therefore not surprising that by relying heavily on Have as a basic relation, one inevitably

groups together vastly different compounds:

(98) a. doorknob, fingertip, shoelace

b. student power, family car, gangster money

While both sets of compounds can indeed be paraphrased using the predicate Have, only the

compounds in (98a) allow for the explicit use of PART in their paraphrases (e.g. a knob that is part

of a door ~ *power that is part of a student). Distinguishing between these types of compounds

seems therefore warranted and many have done just that (Adams 1973, Vanderwende 1994,

Moldovan et al. 2004, Jackendoff 2010), usually defining possession as a case of ownership,

regardless of tangibility (cf. the examples in (98b) above). For French, Arnaud (2003) seems to

also include a possessive relation in his inventory of NN compounds, which he paraphrases

simply as “N2 has N1.” He records 15 such cases (with an additional 9 in complex formations),

but consulting his list raises a number of questions.

Table 5.14. Compounds listed as dd or “N2 has N1” in Arnaud (2003).

année lumière | incident calénaire | situation météo
bateau pirate | PC course | taille mémoire
cas régime | radio pirate | temps machine
cas sujet | régime moteur | vaisseau pirate
émetteur pirate | relais traction | vérité terrain

Arnaud’s description of the “N2 has N1” class doesn’t make the matter any clearer. His

argument in favour of using this particular class for the compounds cas sujet and cas régime, for

instance, is largely unconvincing (my translation):

“The subject case and the objective case are not cases ‘intended’ for the subject or the objects, nor do they ‘contain’ these functions. Can we say then that the grammatical function is at the source of the case? This is a bit more acceptable, but it nevertheless seems more accurate to say that the case ‘corresponds’ to, ‘is that of’ the function. In more abstract terms, the function ‘has’ its case.” (71)[97]

It’s not clear just how Arnaud arrives at “the function ‘has’ its case” from his observation that

“the case ‘corresponds’ to [. . .] the function.” In fact, his statement regarding correspondence

might be better served by the use of other relations, such as his “N2 is what N1 is about” class

(e.g. bilan matières, plan produit, réflexe achat).

As for the other compounds in Table 5.14, some might be more sensibly represented by

instrumental or “use” type relations (e.g. bateau pirate), while others seem to support a

production interpretation (e.g. régime moteur). Others still defy any straightforward analysis.

For instance, the exocentric compound année lumière (light year) refers to the distance traveled

[97] “Le cas sujet et le cas régime ne sont pas des cas ‘destiné’ au sujet et aux régimes, ils ne ‘contiennent’ pas non plus ces fonctions. Peut-on dire pour autant que la fonction grammaticale est à la source du cas ? C’est un peu plus acceptable, mais il semble quand même plus juste de dire que le cas ‘correspond’ à, ‘est celui de’ la fonction. de façon plus abstraite, la fonction ‘a’ son cas” (Arnaud 2003: 71).

by light in a year, a meaning that is not easily paraphrased using possession, even in its most

abstract form. It is therefore surprising to see it included in Arnaud’s dd class.

Although the “possession as ownership” relation seems of limited use for NN French

compounds, other types do clearly rely on this particular association, namely N de N

constructions in which the preposition assigns the genitive case (cf. Bartning 2001, Knittel

2009):

(99) a. bien de famille, droit d’auteur, mémoire d’éléphant

The examples above might be sufficient evidence for the retention of POSSESSION as a

compound relation, but these types are not under investigation here. Unfortunately, my data

contain few, if any cases of NN and N à N compounds that would unequivocally involve

“ownership.” The only possible candidates are as follows, with those in (100a) being the least

likely:

(100) a. poids coq/mouche/paille/plume

b. bourse-à-berger/pasteur

c. fils à papa, punk à chien

Unfortunately, it’s not immediately clear that any of the compounds in (100) are in fact truly

cases of a possessive relation. According to Riegel (2001), only “ownership/belonging”

possessive constructions allow for a paraphrase using the verb posséder. Compare the following

sentences from Riegel (2001:189):

(101) a. Jean possède trois voitures.

b. *Jean possède deux frères.

c. *Une voiture, ça possède/a quatre roues.

d. *Jean possède un nez bulbeux.

e. *Une équipe de football possède onze joueurs.

This test seems to confirm that the N de N compounds listed in (99) are indeed cases of

ownership or belonging (e.g. il possède des biens/des droits/une mémoire), but shows that the

set of NN compounds in (100a) are most likely not (e.g. *ce coq possède un poids). We might

argue, then, that they are in fact cases of heads profiling for their internal argument (i.e. the

weight of X). As for the compounds bourse-à-berger and bourse-à-pasteur in (100b), they are

exocentric, but nevertheless seem to involve possession: they refer to plants whose flowers look

like the purse of a monk or a shepherd. They can thus be paraphrased using the verb posséder

(i.e. bourse que possède un berger). The compound fils à papa in (100c), because it involves

kinship, does not typically allow for such a paraphrase. Should it be treated as a case of

POSSESSION, then? If we choose to include kinship under possession (of which it is arguably a

type), then the answer is obviously yes. But how should it be paraphrased? Is it ‘fils qui a un

papa’ or ‘fils qu’un papa a’? If we base our interpretation strictly on the meaning of the

compound, then the first paraphrase seems most appropriate (it is, after all, a boy who has an

influential father); yet, the preposition à, when used possessively, functions like de, in that it is

the head that is possessed and the complement that is the possessor (cf. bourse-à-

berger/pasteur). When we compare these compounds to punk à chien, however, the matter is

perhaps further clarified as the head cannot be understood as the possessed in this case (*punk

qu’un chien possède). I will therefore state that fils à papa (provided that kinship be accepted as

possessive in nature) and punk à chien are endocentric POSSESSION compounds (H that has M),

while bourse-à-berger and bourse-à-pasteur are exocentric REVERSED POSSESSION compounds

(H that M has). These are, it would seem, the only such cases present in the collected data.

Given that the possessive relation seems to be of limited relevance for NN French compounds,

one might wonder just how prevalent it is elsewhere, despite its frequent inclusion in the

literature. Adams (1973), for instance, only lists four examples and Jackendoff (2010) only lists

six. In Warren (1978), although possession accounts for a non-negligible 15% of her BELONGING

TO class, it only accounts for approximately 4% of all her compounds. Warren defines three

types of Possessor-Belonging relations (102a-c) under her Whole-Part semantic class and one

under Part-Whole (102d):

(102) a. Possessor-Legal Belonging: family estate, agency car, hospital bus, clubhouse

b. Possessor-Habitat: police station, foxhole, courthouse

c. Authority-Subordinate Entity: county school, state hospital, police laboratory

d. Belonging-Possessor: gunman, boatmen, horsemen

In most cases, alternative analyses are available, a fact that is further underlined by their

treatment elsewhere. The compounds in (102b), for instance, are routinely analyzed as locative

in nature (cf. Adams 1973), while many of those in (102d) might instead be treated as

instrumental. The nature of the compounds in (102c) is harder to establish, which may or may

not support a possessive analysis, but here too, there is the potential for alternative analyses (e.g.

a police laboratory is a laboratory used by the police); the compounds in (102a) can be said to

represent the more “ownership” type combinations and are in line with such a treatment

elsewhere (cf. Vanderwende 1994, Moldovan et al. 2004, Jackendoff 2010). When it is

understood strictly under the angle of ownership or belonging, as is the case in (102a and

102d), possession remains marginal and accounts for a mere 2% of the compounds examined

by Warren.

Despite its limited scope in French for both NN and N à N compounds, the POSSESSION relation

has been retained for this work, mainly because, on the one hand, it is a component of nearly

every other formalism I have examined, and on the other, it will no doubt be necessary for any

work that might look at N de N compounds in the future.

5.2.2.6 Part

PART

Relation Type | Structure Template | Examples
Basic | an H that is part of M; un T qui fait partie de M | table leg; tiroir-caisse
Reversed | an H that M is part of; un T dont M fait partie | wheelchair; stylo-bille
Linking Material: part of (have / of); faire partie de (a / de)

The PART relation is one of the most commonly identified compound relations: of the sixteen

works listed earlier in Table 5.10, twelve include some manner of partitive association between

elements. Despite the similarities shared by PART and POSSESSION, namely in the use of either

HAVE or OF as a paraphrastic predicate, the former distinguishes itself from the latter in that it is

included in the whole. PART, as it is used here, is meant to identify those compounds for which

one of its constituents denotes a constitutive element of the whole object or concept denoted by

the other constituent. It is best paraphrased as ‘H that is a part of M’ and reflects what many

have labeled as a Whole-Part relation (Downing 1977, Warren 1978, Levi 1978, Shoben 1991,

Vanderwende 1994, etc.):

(103) a. tiroir-caisse ‘tiroir qui fait partie d’une caisse’

b. grille écran ‘grille qui fait partie d’un écran’

c. moteur fusée ‘moteur qui fait partie d’une fusée’

This relation is reversible, which results in the non-head as the component element; it is often

labeled as a Part-Whole relation elsewhere (id.):

(104) a. roman-photo ‘roman dont des photos font partie’

b. stylo-bille ‘stylo dont une bille fait partie’

c. montre-bracelet[98] ‘montre dont un bracelet fait partie’
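The basic and reversed paraphrase templates given in the relation tables can be sketched as a small lookup, purely as an illustration. The names TEMPLATES and paraphrase are hypothetical, the initial article of each template ("an H ...") is dropped so that the caller supplies its own determiner, and only the English templates for FUNCTION, POSSESSION and PART are included. This is a minimal sketch under those assumptions, not part of the classification procedure itself.

```python
# Paraphrase templates adapted from the relation tables (H = head, M = modifier).
# Assumption: the leading article of each template is omitted; callers pass
# noun phrases that already carry a determiner.
TEMPLATES = {
    ("FUNCTION", "basic"): "{H} that serves as {M}",
    ("POSSESSION", "basic"): "{H} that possesses {M}",
    ("POSSESSION", "reversed"): "{H} that {M} possesses",
    ("PART", "basic"): "{H} that is part of {M}",
    ("PART", "reversed"): "{H} that {M} is part of",
}


def paraphrase(relation: str, direction: str, head: str, modifier: str) -> str:
    """Fill the template for a relation in its basic or reversed form."""
    key = (relation, direction)
    if key not in TEMPLATES:
        # FUNCTION, for instance, has no reversed template in the table.
        raise ValueError(f"no {direction} template for {relation}")
    return TEMPLATES[key].format(H=head, M=modifier)


print(paraphrase("PART", "basic", "a leg", "a table"))
print(paraphrase("PART", "reversed", "a chair", "a wheel"))
```

Leaving FUNCTION without a reversed entry mirrors the observation that this relation is not reversible; the lookup simply refuses to build such a paraphrase.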

With regard to NN French compounds, the PART relation is far more frequent in my data in its

reversed form (38 compounds) than in its basic form (7 compounds). While these findings run

counter to those in Warren (1978) for English, they seem to reflect those found by Arnaud

(2003). Although it is difficult to say exactly how many such compounds he identifies given the

number of low-level classes used to encode his data, one class in particular does stand out as an

exact match to the one described above (af: “N2, concret-discret, est une des parties de N1”).

This category of compounds includes 47 simple cases, including all of those in (104) above.

Arnaud does not, however, seem to include the alternate formulation in which N1 is part of N2.

Of the 7 compounds identified in my data as “H is a part of M”, only one is also included in

Arnaud’s corpus, namely balai brosse, which he treats alongside those in (104) and thus

interprets as ‘balai dont fait partie une brosse’. He no doubt views this particular compound

as left-headed, which intuitively seems correct given that this is French’s preferred position for

the head. I, however, chose to base my treatment of this compound on the definitions provided

not only by Wiktionary[99], but also LPR2010[100], which both seem to treat this compound as

right-headed, the result of which is an interpretation along the lines of ‘brosse qui fait partie

[98] The compound montre-bracelet might also be treated as a case of COORDINATION, an interpretation supported by the existence of the inverted synonym bracelet-montre. COORDINATION, however, typically allows for either component to fill the role of head, a fact that doesn’t apply to either one of these compounds: *une montre-bracelet / *un bracelet-montre est un bracelet.

[99] Balai-brosse: Brosse très dure fixée sur un manche à balai (fr.wiktionary.org/wiki/balai-brosse).

[100] Balai-brosse: Brosse de chiendent montée sur un manche à balai, pour frotter le sol (LPR 2010).

d’un balai’ (cf. bracelet-montre, which is also right-headed)[101]. In this matter, then, our

contrasting analyses are the result of a different identification of the head constituent.

Fundamentally, however, this particular compound involves the PART relation either way.

It should also be noted that PART, as it is used here, also covers those compounds that fall into

the “group membership” category, such as orchestre musette or émission-débat, though these

types are not frequent in my data.

Another issue to consider is that some compounds might be analysed as either PART or

LOCATION. This dual analysis is related to the fact that LOCATION may subsume PART: if

something is a part of something else, then it is located at/on/in that thing (cf. Baron and

Herslund 2001). One possible solution is to reserve location for only those compounds that

actually involve a locative noun, as does Adams (1973). The problem, of course, is that one

must treat combinations such as toolbox or treehouse using some other relation, as they do not,

in the strictest sense, involve places. The key distinction that will be used here is one that views

the PART relation as a reference to an integral component of the whole, without which it would

either be incomplete, defective, or non-functional. Thus, a negation test may be used to

determine whether the modifier denotes an essential part of the compound. The formulation in

(105) below shows how such a test might apply to compounds in which the head denotes the

whole (cf. 104 above):

(105) a. a C without an M is still a C

b. un C sans M est toujours un C

A positive response to the above sentence would indicate that the modifying noun is not an

essential component of the object denoted by the compound, but instead a distinguishing

feature. Thus, a toolbox without tools is still a toolbox, which indicates that tools is connected to

box via some other relationship (i.e. container-contained). This result is the same for the French

boîte à outils (i.e. une boîte à outils sans outils est toujours une boîte à outils). When applied to

compounds that denote a part-whole association, the test produces defective or incomplete

[101] See Section 4.1.2 for a discussion of the issues related to head position in French.

readings. Given the following compounds, the test highlights the differences between the

compounds in (106a-b) and those in (106c-d):

(106) a. ?un stylo-bille sans bille est toujours un stylo-bille

b. ?une auto-mitrailleuse sans mitrailleuse est toujours une auto-mitrailleuse

c. une poche-revolver sans revolver est toujours une poche-revolver

d. une info-bulle sans information est toujours une info-bulle
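The negation test in (105b) amounts to filling a fixed sentence frame, which can be sketched as follows. The function names negation_test and suggested_reading are hypothetical, the article is supplied by hand because the sketch makes no attempt at French gender assignment, and the acceptability judgement itself remains a speaker's intuition passed in as a boolean. The mapping from judgement to relation is only the default reading described above; borderline cases such as chêne-gomme still call for individual assessment.

```python
def negation_test(article: str, compound: str, modifier: str) -> str:
    """Build the frame 'un C sans M est toujours un C' for a compound C."""
    return f"{article} {compound} sans {modifier} est toujours {article} {compound}"


def suggested_reading(sentence_is_acceptable: bool) -> str:
    """Map a speaker's judgement of the test sentence to a tentative analysis."""
    if sentence_is_acceptable:
        # The compound survives without M: M is a distinguishing feature only.
        return "modifier not essential: consider another relation"
    # The sentence is odd: M is an integral component of the whole.
    return "modifier essential: PART (reversed)"


# Test sentences for two of the compounds discussed in (106).
for article, compound, modifier in [
    ("un", "stylo-bille", "bille"),
    ("une", "poche-revolver", "revolver"),
]:
    print(negation_test(article, compound, modifier))
```

Keeping the judgement outside the frame-building function reflects the fact that the test only produces a sentence to be evaluated; it does not decide the classification on its own.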

The test thus provides a method of treating compounds that might otherwise prove difficult to

categorize, such as those in (107) below:

(107) a. café-crème ?un café-crème sans crème est toujours un café-crème

b. bloc-cylindres ?un bloc-cylindres sans cylindres est toujours un bloc-cylindres

Because a café-crème can be paraphrased as “un café dans lequel il y a de la crème”, it might

allow for a locative treatment (cf. Arnaud 2003), but given the oddness of the test sentence in

(107), I prefer to treat it as an instance of the PART relation. Some compounds that seem to

involve a partitive relationship, however, still fail the test because the component is not in fact

integral or can be typically removed, such as those in (108):

(108) a. laurier-cerise un laurier-cerise sans cerise est toujours un laurier-cerise

b. chêne-gomme ?un chêne-gomme sans gomme est toujours un chêne-gomme

Compounds similar to the one in (108a) are thus understood as instances of PRODUCTION as the

non-head is in fact something the whole produces and is not necessarily always present (i.e. not

in bloom). The compound in (108b) is slightly different, as evidenced by the inconclusive result

from the test. A chêne-gomme (gum oak) is a type of oak tree which contains and exudes large

quantities of sap. Although production might also be appropriate for this compound, the fact that

it would be difficult, if not impossible to remove the sap entirely suggests that it should be

treated as partitive: chêne-gomme is therefore labeled as PART REVERSED.

Unfortunately, the test proposed in (105) is incompatible with compounds in which the non-

head element denotes the whole (cf. 109). In these cases, simply removing the part element from

the head is sufficient to produce a defective reading:

(110) a. ?cet écran sans grille fonctionne parfaitement

b. ?cette caisse sans tiroir fonctionne parfaitement

c. ?cette fusée sans moteur fonctionne parfaitement

There is also a small set of compounds included under PART—and which were briefly discussed

in the SIMILARITY section—that permit an alternative analysis. They mostly involve animals:

(111) a. écrevisse signal, noctuelle gamma, oiseau-lyre, poisson-épée

These compounds are all based on some physical property that might be said to look like some

other object. Thus, a poisson-épée is a fish with a nose that looks like a sword, an oiseau-lyre is

a bird with a tail that looks like a lyre, etc. Including them under the PART relation necessarily

involves invoking a metaphor alongside it (i.e. an H that has a part that looks like M). Including

these particular compounds under SIMILARITY does not provide a simpler solution as PART

would still need to be invoked (i.e. an oiseau-lyre is not a bird that looks like a lyre). The use of

complex relations is in fact a component of Jackendoff’s (2010) formalism, but it remains

largely absent from most other works. I have chosen to use simple relations and rely on the

weakly compositional label discussed in Chapter 4 to take into account the presence of the

metaphor. This is in fact similar to the approach Arnaud (2003) adopts for these particular cases:

he includes in his list of low-level relations one that is paraphrased as “N2, concrete-discrete, is

one of the parts of N1 (meronymic-analogical relation).”[102]

5.2.2.7 Location

LOCATION

Relation Type | Structure Template | Examples
Basic | an H located at/near/in M; un T situé à/près de/dans M | window seat; centre-ville
Reversed | an H that M is located at/near/in; un T auquel M est situé | bedroom; café concert
Linking Material: at, near, in, etc.; à, près de, dans, etc.

[102] “N2, concret-discret, est une des parties de N1 (relation méronymique-analogique) – poisson-scie” (Arnaud 2003: 73)

The locative relation is another frequently cited relation for compounds: it can be found in

nearly all of the works mentioned in Section 5.1. Many of the compounds involving this relation

contain a constituent denoting a place or container (bedroom, sandbag, groundwater), but this

need not be the case (earwax, sunspot, leg cramp). The suggested paraphrases for these types of

compounds are numerous, but are all typically locative in nature and usually make use of one of

several prepositions: at, in, on, under, above, near, etc. Examples of basic French NN

compounds that seem to involve a locative function are given in (112a), with reversed cases in (112b):

(112) a. bout dehors, colis-route, côté jardin, page web, station-aval, village-rue

b. bloc-eau, café concert, chêne kermès, point presse, prés-bois

A number of compounds involve a “container-contained” relationship, which consequently

supports a locative treatment. These types do, however, raise a number of questions regarding

how they should be categorized. It can be argued that because a container is meant to contain

something, compounds involving this relationship are fundamentally purposive in nature. A

number of NN compounds display this double reading:

(113) bloc-note, info-bulle, livret-police, malle-poste, poche-revolver

In addition to a locative paraphrase, any of the compounds in (113) can also be paraphrased as

“an H for M” (i.e. un bloc pour notes). The problem isn’t immediately apparent if we only

consider NN compounds; rather, it becomes much more obvious when we compare them to

analogous sets of N à N and N de N compounds. As Bassac and Bouillon (2013) point out, there

are clear differences between the compounds in (114a) and the constructions in (114b):

(114) a. boîte à outils, verre à vin, chambre à air

b. boîte d’outils, verre de vin, chambre d’air

If we treat both sets of constructions above as locative in nature, we ignore the fact that those in

(114a) express both purpose and location (i.e. N pour N), while those in (114b) only convey

location[103]. How should we treat the compounds in (113) and (114a) then? Any solution to this

problem should be able to distinguish between these types and the ones in (114b), while also

retaining what sets them apart from other purposive compounds that don’t involve location.

These particular cases will be explored in greater detail in Section 5.2.2.14, but I will state here

that whenever purpose is involved in the compound’s meaning, the item will be labeled as such,

along with whatever additional information it may express using what Jackendoff (2010) calls

proper function.

5.2.2.8 Composition

COMPOSITION

Relation Type | Structure Template | Examples
Basic | an H made of M; un T composé de M | sugar cube; disque vinyle
Reversed | an H that M is made of; ?un T dont M est composé | sheet metal; ?
Linking Material: composed/made of; composé/fait de

The COMPOSITION relation is used for compounds in which one constituent is the material or

substance that composes the other constituent. COMPOSITION thus differs from PART in that it

entails, on the one hand, irretrievability, which is to say that the composing substance cannot

(simply) be removed from the whole, and on the other, that the substance be its sole component.

It is a frequently cited relation in the literature and is usually stated as “composition” (Adams

1973, Downing 1977, Jackendoff 2010) or “made of” (Levi 1978, Shoben 1991, Vanderwende

1994). There are only a few such cases in my data; they are supplemented below with examples

from Arnaud (2003):

(115) a. disque vinyle, gaz hydrogène, pal-fer, terre diatomée

b. bac acier, bas nylon, papier aluminium (Arnaud 2003)

Although COMPOSITION primarily refers to the sole substance that makes up the item, this is in

fact only partially true. It may be more accurate to state that this substance is the

[103] The difference no doubt stems from the prepositions involved. According to Cadiot (1997), these analogous constructions lend support to his argument that à is intensional in nature, while de is extensional. We may therefore draw parallels between this distinction and the Type ~ Token one mentioned in footnote 90.

overwhelmingly predominant material. Parallels may be drawn with freely constructed nominal

phrases containing a prenominal adjectival modifier such as wooden table: although it most likely also

contains nails, screws, and brackets made of other materials, it is largely understood as ‘a table

made of wood’. The line may not always be easily drawn between part and substance, but the

“H composed/made of M” paraphrase is, in most cases, sufficient to disambiguate borderline

cases. For instance, sabre laser (light saber) is listed under PART as it cannot be paraphrased as

‘?sabre composé d’un laser,’ which confirms that the non-head is only a major defining

characteristic of the object in question, but that it nevertheless contains parts that also play a

crucial role in its composition (i.e. a handle with internal components, buttons, etc.). Moreover,

for substance or material composition, French will typically allow compounds to be

reformulated using the preposition en (e.g. disque en vinyle), which is analogous to the English

NPs mentioned above (i.e. wooden table).

Composition is also used in a more abstract manner if, again, one of the constituents refers to

the sole (or predominant) element of the whole. Although these compounds can be paraphrased

using “composé de,” they do not typically permit the use of the preposition en or the verbal

predicate fait de:

(116) a. code-barres ‘code composé de barres/*en barres/*fait de barres’

b. plan séquence ‘plan composé de séquence/?en séquence/*fait de séquence’

c. fan-club ‘club composé de fans de qqch/*en fans/*fait de fans’

Some might prefer to treat fan-club as a part-related compound, but it is in fact a club consisting

entirely of fans (it is the sole constituting element of the whole; cf. spectacle solo also listed

under composition).

Based on the compounds in my data, the composition relation may not be reversible in French,

which is consistent with Arnaud’s (2003) findings. It is not clear, however, why French might

not permit “H that M is made of” compounds; in contrast, reversibility of the composition

relation seems relatively productive in English (see Jackendoff 2010).

5.2.2.9 Source

SOURCE

Relation Type | Structure Template | Examples
Basic | an H (made) from M; un T (fait) à partir de M | cane sugar; sauce soja
Reversed | an H that M is (made) from; un T (à partir duquel) M est fait | sugar cane; chêne-liège
Linking Material: (made) from; (fait) à partir de

The SOURCE relation is related, in some ways, to both the COMPOSITION and PRODUCTION

relations. In fact, identifying compounds that make use of SOURCE poses some challenges. The

principal use of this relation is with compounds in which one element is the object or substance

from which the other is derived. It differs from COMPOSITION in that the source object is no

longer an identifiable component of the whole. Parallels may be drawn between the NN

compounds in (117a) and the N de N ones in (117b):

(117) a. baume copalme, carton pâte, papier maïs, sauce soja, sauce tomate

b. jus d’orange, sirop d’érable, huile d’olive

The most basic paraphrase for this particular type of compound is “H from M,” (cf. Levi 1978,

Lauer 1995, Moldovan et al. 2004), but it may also be paraphrased using “derived from” (cf.

Adams 1973, Shoben 1991) or “made from” (cf. Jackendoff 2010).

This relation is reversible, although there are few such NN cases in my data, chêne-liège being

the clearest instance of a reversed source (i.e. chêne à partir duquel le liège est obtenu). Arnaud

lists a few other such compounds such as laurier-cerise and pin pignon. These examples in fact

show how PRODUCTION, CAUSE, and SOURCE might overlap if the latter is understood in a much

broader sense:

Table 5.15. The SOURCE, PRODUCTION and CAUSE relations compared.

Relation | Basic | Reversed
laurier-cerise
SOURCE | *laurier provenant des cerises | laurier d’où proviennent les cerises
PRODUCTION | laurier qui produit des cerises | *laurier que les cerises produisent
arrêt-maladie
SOURCE | arrêt provenant d’une maladie | *arrêt d’où provient une maladie
CAUSE | *arrêt qui cause une maladie | arrêt causé par une maladie

The above table only represents a sampling of the potential overlap, as these relations may

conflict in a number of different ways. Put simply, the basic form of SOURCE (H from M) may

produce a paraphrase that is functionally equivalent to the reversed form of either PRODUCTION

or CAUSE, while its reversed form (H that M is from) may conflict with the basic forms of these

two relations. This is, unfortunately, largely unavoidable as something that produces or causes

something else will inevitably serve as the source of the resulting item (i.e. le moulin produit du

papier; le papier provient du moulin). I would argue, however, that not all compounds are

susceptible to this particular parallelism. For example, chêne-liège mentioned earlier is not an

oak that produces cork, but is in fact an oak whose bark is used in the production of cork (cf.

those in 117). This not only explains why this compound is viewed as an instance of SOURCE,

but also why I treat laurier-cerise as a case of PRODUCTION as the head of this compound does in

fact produce the object denoted by the non-head element.

For these reasons, the SOURCE relation is used solely for compounds that involve the “material

or substance origin” of an object, thus allowing both PRODUCTION and CAUSE to fill any other

type of “origin” gap. Although this distinction may not always provide the most intuitive

analysis, it seems to largely suffice as few compounds that do not meet the above criterion can

be paraphrased using these alternative relations. As we will see later, other pairs of relations also

engender similar problems, namely USE and PURPOSE, but the solution they require is not so

sharply delimited. Whenever possible, relations are applied in a manner that is both consistent

and coherent, so as to minimize analyses that might otherwise introduce contradictory or

paradoxical annotations. I believe that the criterion provided above for SOURCE achieves this

goal.

5.2.2.10 Cause and Production

CAUSE

Relation Type | Structure Template | Examples | Linking Material
Basic | an H that causes M | sunburn | causes
 | un T qui cause M | piétin échaudage | cause
Reversed | an H that M causes | motion sickness |
 | un T que M cause | arrêt-maladie |

PRODUCTION

Relation Type | Structure Template | Examples | Linking Material
Basic | an H that makes M | honey bee | makes, produces
 | un T qui fait M | appareil photo | fait, produit
Reversed | an H that M makes | beeswax |
 | un T que M fait | jazz manouche |

The CAUSE and PRODUCTION relations share a number of similarities and have in fact been

reduced to a more general primitive in a number of works (among others, Downing 1977,

Warren 1978, Vanderwende 1994). This reduction usually results in compounds such as honey

bee and tear gas being grouped together (see, for instance, Warren 1978: 188-189). Most

authors, however, have argued in favour of distinguishing between these two compounds based

on precisely these types. For Levi (1978), MAKE (her corresponding production RDP) refers to

associations based on “physically producing, causing to come into existence,” (90), whereas

cause has no such physical requirement. Honey bee is therefore a bee that makes honey, while

tear gas is gas that causes tears. Furthermore, Levi suggests that CAUSE involves both direct and

indirect causation. While she does not stipulate this criterion for MAKE, it seems plausible that

production also assumes either type of involvement, as evidenced by her inclusion of sap tree

under this RDP.

This approach to production thus differs slightly from the one argued for by Moldovan et al.

(2004). They define their CAUSE and MAKE relations as follows (62):

(118) a. CAUSE: “an event/state makes another event/state to [sic] take place”

b. MAKE/PRODUCE: “an animated entity creates or manufactures another entity”

According to these definitions, the difference between these relations is twofold. First,

production requires that the agent be animated and second, that the result be something other

than an event or state. The first criterion would therefore exclude sap tree from the

MAKE/PRODUCE class; the second would do the same for music box. Yet, neither of these

compounds seems causal in nature (i.e. *tree that causes sap; *box that causes music).

The fact is that it can be difficult to set apart CAUSE from PRODUCTION. As I mentioned earlier,

Jackendoff (2010) says that they are “closely related function[s]” and that “[i]t is sometimes

hard to distinguish make from cause” (441). This may explain why he includes sunburn under

CAUSE, but suntan under MAKE, even though there is little reason to treat them differently. Also

worth noting is that he, perhaps mistakenly, lists knife wound under both functions, which shows

just how tenuous the distinction between them is.

Perhaps the easiest way to differentiate between these two relations is simply to use paraphrases

that include the verbs cause or make. In fact, in most cases, these paraphrases are an effective

way of ruling out one or the other relation (cf. sap tree and music box mentioned above). Using

this test as the basis for identifying these particular relations, it would seem that French has only

a few NN compounds that rely on either relation. Examples of CAUSE are given in (119a), and

examples of PRODUCTION in (119b):

(119) a. piétin-échaudage, piétin-verse

b. appareil photo, bombe aérosol

Both relations are, however, reversible (CAUSE in 120a; PRODUCTION in 120b), although again,

their numbers are limited:

(120) a. arrêt maladie, effet papillon, erreur système, photolyse éclair

b. café-filtre, drainage taupe, image-gradient, jazz manouche, portrait-robot

As discussed in Section 5.2.2.9, PRODUCTION and CAUSE may show some degree of overlap with

SOURCE, an unfortunately unavoidable consequence of their shared semantic space. Measures to

avoid this particular conflict were introduced in that section.
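
The paraphrase test described in this section can be sketched programmatically. The following Python fragment is only an illustration and not part of the analysis itself: the names Relation and candidate_paraphrases are hypothetical, and the templates are simplified English versions of those given in the CAUSE and PRODUCTION tables. The code merely generates the four candidate paraphrases for a compound; deciding which of them is felicitous remains a matter of speaker judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation with its basic and reversed paraphrase templates.

    {H} stands for the head and {M} for the modifier (hypothetical notation).
    """
    name: str
    basic_form: str
    reversed_form: str


# Simplified English templates, adapted from the CAUSE and PRODUCTION tables.
CAUSE = Relation("CAUSE", "{H} that causes {M}", "{H} that {M} causes")
PRODUCTION = Relation("PRODUCTION", "{H} that makes {M}", "{H} that {M} makes")


def candidate_paraphrases(head, modifier, relations=(CAUSE, PRODUCTION)):
    """Return (relation, direction, paraphrase) triples for one compound.

    The function only builds candidates; it does not judge acceptability.
    """
    candidates = []
    for rel in relations:
        candidates.append(
            (rel.name, "basic", rel.basic_form.format(H=head, M=modifier))
        )
        candidates.append(
            (rel.name, "reversed", rel.reversed_form.format(H=head, M=modifier))
        )
    return candidates


if __name__ == "__main__":
    # honey bee: head = bee, modifier = honey
    for name, direction, text in candidate_paraphrases("bee", "honey"):
        print(f"{name:<10} {direction:<8} {text}")
```

For honey bee, the candidates include "bee that makes honey" and "bee that causes honey"; only the former is felicitous, which is what leads to a PRODUCTION analysis.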

5.2.2.11 Topic

TOPIC

Relation Type | Structure Template | Examples | Linking Material
Basic | an H about M | history conference | about
 | un T à propos de M | réunion bilan | à propos de
Reversed | --- | --- |

TOPIC is present in a number of works, but seldom accounts for a significant number of

compounds and usually involves very specific types of nouns. For instance, of the 32

compounds that Levi (1978) lists under her ABOUT RDP, most involve “a rather restricted set of

either abstract nouns or activity nominalizations” in head position (103). These nouns are

usually “about” something (e.g. vote, policy, law, conference, novel, speech, etc.). Similarly,

Lauer only lists 22 such compounds in his appendix and they too have head nouns like those

mentioned by Levi.

Put simply, TOPIC is understood as “M is the subject matter of H.” This relation is not well

represented in my data. Only four compounds seem to unequivocally involve this particular

relation:

(121) ciné-club, réunion bilan, science fiction, secret défense

The limited nature of TOPIC for my data contrasts somewhat with its use in Arnaud (2003),

where he identifies 10 simple cases of “N2 is what N1 is about.”[104] Some examples of these

compounds are bilan matières, plan calcul and réflexe achat. When listed alongside other

relations, however, the number of instances involving topic increases to 46 and includes many

compounds that I would have included under this category had they been present in my data

(e.g. bulletin météo, catalogue auteurs, film catastrophe, etc.). These numbers lend further

support to TOPIC as a fundamental relation.

[104] This low-level relation is labeled as bx and is defined as “N2 est ce à propos de quoi est N1” (Arnaud 2003: 74).

Some compounds have been included under this relation that might also call for alternative

treatments:

(122) cas régime/sujet, participe passé/présent

The first pair in (122) has already been discussed in Section 5.2.2.5 as they are treated by

Arnaud as “N2 has N1.” The fact is that both sets of compounds in (122) seem to involve

correspondence or representation (i.e. cas qui représente/correspond au régime). They may also

be paraphrased, however, using “à propos de,” which is why they have been identified as

instances of TOPIC.

The TOPIC relation does not seem to be reversible. It is difficult to imagine a context where a

compound might be coined where the head is the thing that the modifier is about (i.e. an H that

M is about), as the result would be somewhat circular:

(123) a. history book ‘a book about history’

b. history book ‘a book that history is about’ (= a book about itself)

5.2.2.12 Time

TIME

Relation Type | Structure Template | Examples | Linking Material
Basic | an H that occurs at/during M | summer job | during, at, in, before, etc.
 | un T qui a lieu pendant M | pause-carrière | pendant, à, en, avant, etc.
Reversed | an H at/during which M occurs | golf season |
 | ?un T pendant lequel M a lieu | (journée-débat) |

The temporal relation TIME is not very frequent in my data. In fact, only three compounds

clearly rely on some temporal property, none of which are reversed, and they can all be traced

back to the meaning of one of their constituents:

(124) a. épreuve-minute ‘épreuve produite très rapidement / en une minute’

b. pause-carrière ‘pause prise pendant la carrière’

c. réveil-matin ‘réveil pendant le matin’

Arnaud only lists one explicitly temporal compound (match retard) and includes pause-carrière

in his locative class (i.e. ‘pause dans une carrière’), which is not only a plausible analysis, but

one that seems widely held: a number of researchers treat locative and temporal relations as

highly related functions and group them together, usually with TIME subsumed under LOCATION

(among others, Adams 1973, Levi 1978, Jackendoff 2010). Arnaud’s analysis of épreuve-minute

involves duration, which remains a temporally related property and which also includes

compounds such as fermeture éclair and pulsar milliseconde. Given the limited number of

compounds involving TIME as a relational component, one might wonder whether it merits its

own category. In fact, considering that the temporal information otherwise required to

disambiguate most, if not all, of these compounds is usually available in one of their elements (i.e.

minute, pause, matin), it might seem unnecessary to stipulate this particular relation for

compounds. The decision to retain this relation, however, is based primarily on its frequency in

the literature: no fewer than ten authors have included it in some form or other in their

formalisms. Although it remains of limited use for both French NN and N à N compounds (two

occurrences for the latter), it may very well prove necessary for other compound types.

It should also be noted that the data does not contain any cases of NN compounds involving a

reversed application of TIME, but this is no doubt simply due to the limited size of the dataset, as

such compounds do seem to exist (e.g. journée-débat = journée pendant laquelle ont lieu des

débats). There are also a number of similar N de N cases (e.g. jour de paye, heure de pointe,

saison de drainage, etc.), which lend further support to a reversible treatment of the TIME

relation.

5.2.2.13 Use

USE

Relation Type | Structure Template | Examples | Linking Material
Basic | an H that uses M | steamboat | use / with, by
 | un T qui emploie M | bouton pression | emploie / avec, par
Reversed | an H that M uses | hand brake |
 | un T que M emploie | langage machine |

The USE (or instrumental) relation is meant to group together compounds in which one element

is used by the other. Consequently, this relation represents those compounds in which one

constituent is said to be “powered by” the other (e.g. windmill, steamboat, gas stove, etc.). It is

found in the literature as either instrument/al (Adams 1973, Warren 1978, Moldovan et al. 2004,

Séaghdha 2008) or use/r (Downing 1977, Levi 1978, Shoben 1991). Vanderwende (1994)

frames this particular relation using How?, while Lauer (1995) makes use of the traditional

instrumental preposition with, although other prepositions may also be used (e.g. by). French has

a number of compounds that involve USE, as in (125a), some of which are reversed, as in (125b):

(125) a. bouton pression, café-comptoir, danse-poteau, pneu contact, vidéo-surveillance

b. alphabet hindi, attaché-case, croiseur-école, code machine, culotte garçonne

The examples in (125) show that the thing being used need not be physical either. This is

particularly important for N à N compounds, where the modifier is in fact a process, state, or

event that is at the core of the compound’s functionality:

(126) amortisseur à fluide, arme à implosion, bombe à fission, instrument à vent, etc.

One of the challenges posed by USE is that, like PURPOSE covered in the next section, it is in fact

an indeterminate relation, as it does not specify how a particular thing is used. The only evident

aspect of this particular relation is that it refers to a necessary feature of the compound’s

designatum, without which it will cease to function or even be. In this manner, USE also shares

some similarities with PART, so much so that in some instances, PART could very well be a

plausible analysis for some of the compounds listed above (e.g. amortisseur à fluide). The

distinction, of course, lies with USE’s narrower scope: whereas amortisseur à fluide may easily

be paraphrased as ‘amortisseur qui emploie un/du fluide,’ a compound such as navire-citerne—

treated as an instance of PART here—produces a less than ideal paraphrase: ‘?navire qui emploie

une citerne.’

Although some compounds show support for the reversibility of USE, its implementation gives

rise to a number of problems, namely that it may, under certain circumstances, overlap with

PURPOSE. In other words, some compounds permit two related, yet distinct paraphrases: “an H

for M” or “an H that M uses.” This issue will be addressed in the following section.

5.2.2.14 Purpose and Proper Function

PURPOSE

Relation Type | Structure Template | Examples | Linking Material
Basic | an H intended for M | animal doctor | for
 | un T destiné à M | passage piétons | pour
Reversed | --- | --- |

A number of authors have suggested that one of the relations required to describe a subset of

compounds is purposive in nature, that is to say, one for which an element is intended or

designed to fulfill some purpose related to the other. The French compounds that best illustrate

this relationship are N à V, in which V is infinitival and represents the act that N is intended to

perform. Although these types are not under investigation here, they are nevertheless presented

below as evidence that the PURPOSE relation is in fact necessary:

(127) fer à repasser, pâte à modeler, machine à écrire, poêle à frire

The periphrasis for these particular cases is “an H whose purpose is to M” (i.e. un fer à repasser

est un fer qui a pour fonction de repasser). The PURPOSE relation is listed in a number of

different inventories (among others, Warren 1978, Levi 1978, Vanderwende 1994, Rosario and

Hearst 2001) and is often associated with the preposition for. This relation is missing from a

number of works, however, most notably from Adams (1973) and Jackendoff (2010). Its

occasional absence likely stems from its wide scope of application, as PURPOSE can be said to

apply without specifying its exact nature (similar to USE above). As Levi (1978) points out, both

headache pill and fertility pill are purposive combinations (i.e. both allow for expansion via for),

but the nature of this purpose differs greatly between them (suppress and enhance respectively).

She argues that this is partly due to the fact that pill itself serves an unspecified purpose, which

only surfaces in context. Similarly, related compounds involving purpose may show different

degrees of explicitness. For instance, pompe à eau and pompe à vélo both seem to rely on a

purposive relation (i.e. they can both be paraphrased using the preposition pour), but the

function of pump in pompe à vélo is far more removed from its modifier than it is in pompe à

eau. I will return to both of these examples in a moment.

Based on the initial description provided above, a number of NN compounds can be said to rely

on the PURPOSE relation:

(128) abri-vent, appui-tête, arrêt-buffet, chèque-repas, chou-vache, clé lavabo, passage piétons

Arnaud (2003) lists 160 such compounds and an additional 87 in complex configurations, which

lends further support to the retention of this particular relation for NN compounds.

PURPOSE does not seem to be reversible, however. In other words, there are no compounds that

are best paraphrased as “an H that an M is for.” PURPOSE seems to always originate from the

head. The only way for purpose to emerge from the modifier is if it explicitly communicates its

function in connection with the head (as in 129c), which would not only be redundant, but also

better paraphrased using the USE relation (like the more general form in 129b):

(129) a. lamp oil ‘oil for a lamp’

b. oil lamp ‘?lamp that oil is for’ → ‘lamp that uses oil’

c. [[lamp oil] lamp] ‘lamp that lamp oil is for’ → ‘lamp that uses lamp oil’

These particular configurations also bring us to discuss another issue, namely how PURPOSE and

USE may, under certain conditions, overlap. The following table illustrates the issue.

Table 5.16. A comparison of PURPOSE and USE.

Relation | lamp oil | oil lamp
PURPOSE | oil for a lamp | *lamp for oil
PURPOSE Reversed | *oil that a lamp is for | ?lamp that oil is for
USE | *oil that uses a lamp | lamp that uses oil
USE Reversed | oil that a lamp uses | *lamp that oil uses

For a compound such as lamp oil, PURPOSE and reversed USE both produce acceptable

paraphrases with no clear indication which of the two is preferable. There are two reasons for

this overlap. First, USE (along with PURPOSE) is an underspecified predicate. Its usage is virtually

unrestricted. One can use a bicycle, a mug, a table, yet none in the same manner. In fact, the

object of the verb need not even be an artefact (i.e. John used the rock to drive in the pegs.). The

result is that when one says that “X is for Y,” it may very well be that what one is really saying

is that “X is meant to be used by Y.” Second, because PURPOSE allows for the modifier to be

either the subject or object of the unspecified predicate, it will necessarily run parallel to the

reversed form of USE when it involves the modifier as its subject:

(130) a. lamp oil “the lamp uses oil” = oil for a lamp / oil that a lamp uses

b. bread knife “X uses the knife (to cut) bread” = knife for bread / *knife that bread uses

Although USE only applies to the compound in (130a), PURPOSE is well-founded in both cases.
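
The overlap shown in Table 5.16 can be made explicit with a small data structure. The Python sketch below simply transcribes the judgments of that table and filters for the analyses marked as fully acceptable; the names TABLE_5_16, acceptable_analyses and is_ambiguous are hypothetical, and the encoding of judgments as "", "?" and "*" is an assumption made for illustration only.

```python
# Judgments transcribed from Table 5.16.
# "" = acceptable, "?" = marginal, "*" = rejected (encoding is an assumption).
TABLE_5_16 = {
    "lamp oil": {
        ("PURPOSE", "basic"): ("", "oil for a lamp"),
        ("PURPOSE", "reversed"): ("*", "oil that a lamp is for"),
        ("USE", "basic"): ("*", "oil that uses a lamp"),
        ("USE", "reversed"): ("", "oil that a lamp uses"),
    },
    "oil lamp": {
        ("PURPOSE", "basic"): ("*", "lamp for oil"),
        ("PURPOSE", "reversed"): ("?", "lamp that oil is for"),
        ("USE", "basic"): ("", "lamp that uses oil"),
        ("USE", "reversed"): ("*", "lamp that oil uses"),
    },
}


def acceptable_analyses(compound):
    """Return the (relation, direction) pairs judged fully acceptable."""
    return [
        analysis
        for analysis, (mark, _paraphrase) in TABLE_5_16[compound].items()
        if mark == ""
    ]


def is_ambiguous(compound):
    """A compound is ambiguous here if more than one analysis is acceptable."""
    return len(acceptable_analyses(compound)) > 1


if __name__ == "__main__":
    for compound in TABLE_5_16:
        print(compound, acceptable_analyses(compound), is_ambiguous(compound))
```

Under this encoding, lamp oil admits both PURPOSE and reversed USE, whereas oil lamp admits only basic USE, which mirrors the asymmetry discussed above.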

How do we then address the overlap between PURPOSE and USE? There are a number of options

available. First, we might simply treat one of the relations as redundant and exclude it from the

analysis, thereby allowing the remaining relation to replace the other via its reversed form.

Because PURPOSE can only account for one particular set of compounds, while USE can account

for two, it is clear that the most likely candidate for exclusion is PURPOSE. The problem with this

approach, however, is that a number of compounds can only be accounted for using PURPOSE

(e.g. pause-café cannot be paraphrased as either “pause qui emploie un café” or “pause qu’un

café emploie”). By the same token, the data also lend support for the retention of USE as a

compound relation (see examples in Section 5.2.2.13). A second option is to artificially reject

the offending configuration—in this case the reversed form of USE—but this too is unlikely to

work. One need only look at some of the examples listed earlier in the section on USE to see that

PURPOSE cannot always fill the gap: for instance, croiseur-école is not in fact a cruiser whose

purpose is to be used by a school, but simply a cruiser used by a school for training purposes.

One might of course argue that such a distinction is irrelevant, but the fact remains that these

treatments do produce different interpretations. A third option might be to always default to one

particular relation when both are applicable. The question, then, is which analysis should take

precedence over the other? We might choose PURPOSE as it doesn’t involve a reversed

construction, but this will likely result in a number of borderline cases being treated under the

purposive heading (cf. croiseur-école). The fact is that there is no simple and foolproof method

with which to select PURPOSE over USE or vice-versa in cases where both seem to apply. The

only solution is to choose the relation that is most representative of the compound’s meaning. While this

approach may produce different results for different people, it is nevertheless the only one that

does not rely on artificial means to assign these particular relations.

Returning to the examples pompe à vélo and pompe à eau mentioned at the beginning of this

section, it was said that both compounds seem to involve the PURPOSE relation, but that the

former is far less clear about how the elements relate to each other:

(131) a. pompe à eau pompe pour eau ~ ‘pompe qui pompe de l’eau’

b. pompe à vélo pompe pour vélo ~ ‘*pompe qui pompe un vélo’

The reason these cases are relevant is that for all other relations (save USE), there is no such

ambiguity. For instance, PRODUCTION is paraphrased as “H makes/produces M” or vice-versa; if

the paraphrase is infelicitous, then the relation cannot apply. PURPOSE, however, has no such

restriction. Because no verbal predication is stipulated, the fullest paraphrase possible is “H

whose purpose is to X to M,” where X can be any number of verbs (cf. pump for the examples

in 131). Thus, although the compounds in (131) look alike and both involve PURPOSE, they

differ in a significant way. How do we account for this difference then?

One way is to allow for predication to emerge through one of the compound’s elements. In this

regard, I will use what Jackendoff (2010) calls Proper Function (henceforth PF). This function is

very similar to Pustejovsky’s (1995) Telic Role, which has since been used successfully to

explain how certain compounds may be interpretable out of context (Johnston and Busa 1996,

Bassac 2006, Bassac and Bouillon 2013). Put simply, some lexemes denote things that are

designed to perform a particular function. For instance, a knife’s purpose is to cut something,

clothing is to be worn, food is to be eaten, etc. When combined with another lexeme in a

compound, the proper function may be profiled, yielding additional material with which to

connect its elements. The following are a few examples of such compounds:

(132) a. auto-école = ‘école où l’on enseigne/apprend à conduire une auto’

b. appui-tête = ‘appui qui supporte la tête’

c. chou-vache = ‘chou que les vaches consomment’

In the first case, one will notice that two PFs are listed (cf. store = buy/sell). Jackendoff (2010)

dismisses this as a real problem, arguing that in either case, the same event is at play. I agree:

one’s interpretation will skew according to whichever perspective one chooses to adopt, which

will have little effect on whether the compound is paraphrased correctly.

Technically, introducing a PF mechanism only partially solves the problem I identified earlier

for the two pompe à N compounds. Although we can now explain how their constituents are

connected (via pompe’s PF), we must now state that in the case of pompe à eau, the modifier

fills the internal argument of the PF (i.e. X pumps Y), while for pompe à vélo, the modifier does

not. The result is that both compounds are identified as instances of PURPOSE, but only pompe à

eau meets all of the requirements of its PF. A corollary to this observation is that pompe à eau

may in fact be more transparent than pompe à vélo. This will be touched on again in Chapter 7

when I discuss ordering relations according to transparency effects.
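
The contrast between pompe à eau and pompe à vélo can be illustrated with a toy lexicon in which a head noun carries its proper function. The Python sketch below is a simplification under explicit assumptions: the class ProperFunction, the dictionary LEXICON and the set of nouns allowed as internal argument of pomper are all invented for illustration and are not drawn from the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProperFunction:
    """A head noun's proper function (PF) as a verb plus the nouns that can
    fill the verb's internal argument (hypothetical, hand-listed values)."""
    verb: str
    internal_args: frozenset


# Toy lexicon; the selectional set for "pomper" is an assumption.
LEXICON = {
    "pompe": ProperFunction("pomper", frozenset({"eau", "essence"})),
}


def analyse_purpose(head, modifier):
    """Describe a purposive compound and whether M fills the PF's argument."""
    pf = LEXICON.get(head)
    fills = pf is not None and modifier in pf.internal_args
    return {
        "relation": "PURPOSE",
        "pf": pf.verb if pf is not None else None,
        "modifier_fills_internal_argument": fills,
    }


if __name__ == "__main__":
    print(analyse_purpose("pompe", "eau"))   # modifier fills the argument
    print(analyse_purpose("pompe", "vélo"))  # modifier does not
```

Both compounds receive the PURPOSE label, but only pompe à eau has a modifier that satisfies the PF's internal argument, which is the difference that the transparency ordering later relies on.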

Unfortunately, there is often some degree of overlap between a lexeme’s PF and a few of the

basic relations listed previously, namely LOCATION and PRODUCTION. For instance, a noun

whose designatum is a container of some sort will often target this function within the

compound (e.g. boîte à N). Other such examples are as follows:

(133) a. poche-revolver PF of poche = contain (LOCATION)

b. pâte à papier PF of pâte = used to make (PRODUCTION)

The question is whether we should then select the basic relation or the PF for these particular

compounds. The solution I have chosen to adopt is to identify purposive compounds as such,

while also providing information regarding a PF that may overlap with other basic relations. The

reasoning behind this decision is based on the research mentioned at the beginning of this

chapter, which supports an approach to compound interpretation that involves establishing

compatible features or properties between a compound’s elements. It can be argued that proper

function is a core feature of certain lexemes, especially those denoting artefacts, and that

speakers are no doubt aware of and able to utilize this information when establishing meaning

for a given combination. Thus, when the meaning of a compound is based on a relational

concept such as PRODUCTION, those that involve a lexeme with this relation as its PF should be

far easier to interpret (and thus more transparent) than those that do not. Although this may

seem to challenge the premise that the retained relations are fundamental associations for

compounding, I believe that this is, at best, a minor conflict, as nothing fundamentally prohibits

someone from treating such compounds as they would other cases involving basic relations. The

advantage, however, of including toolbox under PURPOSE with its PF as “contain” (as opposed to

just under LOCATION) is that it recognizes that it arguably has more in common with steak knife

(another purposive compound) than it does with tree house (a locative compound).

5.3 Summary

Based on a detailed survey of the literature on compound relations, I have retained 15 basic

relations and have discussed their use with some of the NN French compounds found in my

data. The following table summarizes the results of this research:

Table 5.17. Summary of the relations retained in the present work.

Relation | Basic (H REL M) | Reversed (H that M REL)
HYPERNYMY | chouette-effraie | ?argent métal
COORDINATION | auteur-compositeur | ---
SIMILARITY | fourmi-lion | ---
FUNCTION | papier filtre | ---
POSSESSION* | punk à chien | droit d’auteur
PART | tiroir-caisse | stylo-bille
LOCATION | centre-ville | café concert
COMPOSITION | disque-vinyle | ?
SOURCE | sauce soja | chêne-liège
CAUSE | piétin-échaudage | arrêt-maladie
PRODUCTION | appareil-photo | jazz manouche
TOPIC | réunion bilan | ---
TIME | afternoon | (journée-débat)
USE | bouton pression | langage machine
PURPOSE | passage piétons | ---

Of the relations retained, eight are clearly reversible in French. HYPERNYMY may be reversible,

but the sole example present in the data does not offer incontrovertible evidence that this

particular relation may be reversed. Moreover, the French compounds examined do not seem to

support a reversed application of COMPOSITION, although there is no clear indication why this

should be the case (this relation seems to be reversible in English, for instance; cf. Jackendoff

2010). Others, such as SIMILARITY, TOPIC, and PURPOSE, defy reversibility as the result of such a

transformation would be strange or illogical.
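
The counts given for Table 5.17 can be checked with a short sketch. The Python fragment below transcribes the table and classifies each reversed cell; the names TABLE_5_17 and reversibility are hypothetical. One assumption should be stated plainly: the parenthesized TIME example, which comes from outside the dataset, is counted as attested here, since that reading is the one that reproduces the figure of eight reversible relations.

```python
# (basic example, reversed cell) per relation, transcribed from Table 5.17.
# None stands for "---" in the table.
TABLE_5_17 = {
    "HYPERNYMY": ("chouette-effraie", "?argent métal"),
    "COORDINATION": ("auteur-compositeur", None),
    "SIMILARITY": ("fourmi-lion", None),
    "FUNCTION": ("papier filtre", None),
    "POSSESSION": ("punk à chien", "droit d’auteur"),
    "PART": ("tiroir-caisse", "stylo-bille"),
    "LOCATION": ("centre-ville", "café concert"),
    "COMPOSITION": ("disque-vinyle", "?"),
    "SOURCE": ("sauce soja", "chêne-liège"),
    "CAUSE": ("piétin-échaudage", "arrêt-maladie"),
    "PRODUCTION": ("appareil-photo", "jazz manouche"),
    "TOPIC": ("réunion bilan", None),
    "TIME": ("afternoon", "(journée-débat)"),
    "USE": ("bouton pression", "langage machine"),
    "PURPOSE": ("passage piétons", None),
}


def reversibility(reversed_cell):
    """Classify a reversed cell as 'none', 'uncertain' or 'attested'.

    Assumption: a parenthesized example (as for TIME) counts as attested.
    """
    if reversed_cell is None:
        return "none"
    if reversed_cell.startswith("?"):
        return "uncertain"
    return "attested"


clearly_reversible = [
    name
    for name, (_basic, rev) in TABLE_5_17.items()
    if reversibility(rev) == "attested"
]

if __name__ == "__main__":
    print(len(TABLE_5_17), "relations;", len(clearly_reversible), "reversible")
    print(clearly_reversible)
```

Read this way, the table yields 15 relations, of which eight have an attested reversed form, while HYPERNYMY and COMPOSITION remain uncertain.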

Furthermore, some of these relations have shown only limited use for the compounds examined,

namely POSSESSION, TOPIC, and TIME. In fact, no NN cases of POSSESSION were found in the

data, although there are a few examples of N à N and N de N that involve this particular

relation.

Perhaps unsurprisingly, a number of compounds eluded the approach described above. In the

next chapter, I will not only examine the distribution of these relations in greater detail, but also

look at some of the more problematic compounds in my data. In addition, I will explore how

these relations apply to binary constructions that involve more lexical material, which is to say

N à N compounds.

Chapter 6

Compound Relations: Application Results

In the previous chapter, I conducted a detailed examination of several works on compounding

that had integrated, to varying degrees, a semantic component into their frameworks. All of

these works proposed various sets of basic relations meant to account for the unexpressed

relational associations that link together the elements of compounds. Based on this research, I

identified 15 highly recurring relations in the literature, which I then used to analyze more than

700 NN and 300 N à N French compounds, all of which were collected online from Wiktionary.

The goal of this chapter is to offer a closer look at the results of this analysis. The following

sections will focus on how the relations are distributed across the data, as well as on some of the

compounds that could not be accounted for using these relations. The first section will address

NN compounds, while the subsequent sections will concentrate on N à N compounds, a

category of French compounds that seems to involve far fewer relations, which may be

understood as a consequence of the additional material (i.e. à) present within the compound’s

structure.

6.1 NN Compounds

Given the absence of predication between elements, NN combinations may represent the most

semantically underspecified type of compound. The relations presented in the previous chapter

should be regarded as the formalization of this missing relational information. This section will

focus on their application to French NN compounds.

6.1.1 Relations

A total of 729 French NN compounds were retained from Wiktionary and each one of these was

evaluated according to the 15 relations discussed in the previous chapter. The following table

summarizes the results of this work.

Table 6.1. Results of relations analysis for NN compounds.

Relation        Basic    Reversed    Total
COORDINATION    113      ---         113
SIMILARITY      112      ---         112
FUNCTION        80       ---         80
LOCATION        32       24          56
PART            8        39          47
PURPOSE         44       ---         44
HYPERNYMY       40       1           41
USE             22       10          32
COMPOSITION     20       0           20
SOURCE          7        5           12
PRODUCTION      5        8           13
TOPIC           8        ---         8
CAUSE           2        5           7
TIME            3        0           3
POSSESSION      0        0           0
Other           ---      ---         141
Total           496      92          729
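The column totals and proportions that follow from Table 6.1 can be checked with a short tally. The sketch below is illustrative only: the counts are transcribed from the table, while the names (COUNTS, EQUATIVE, reversed_share) are hypothetical and not part of the original analysis. It assumes that a "---" cell marks a relation with no reversed form.

```python
# Counts transcribed from Table 6.1 as (basic, reversed).
# None stands for a "---" cell (assumed here: no reversed form for that relation).
COUNTS = {
    "COORDINATION": (113, None),
    "SIMILARITY": (112, None),
    "FUNCTION": (80, None),
    "LOCATION": (32, 24),
    "PART": (8, 39),
    "PURPOSE": (44, None),
    "HYPERNYMY": (40, 1),
    "USE": (22, 10),
    "COMPOSITION": (20, 0),
    "SOURCE": (7, 5),
    "PRODUCTION": (5, 8),
    "TOPIC": (8, None),
    "CAUSE": (2, 5),
    "TIME": (3, 0),
    "POSSESSION": (0, 0),
}
OTHER = 141  # compounds not covered by any of the 15 relations

# Hypothetical grouping of the four "equative" relations named in the text.
EQUATIVE = ("COORDINATION", "FUNCTION", "SIMILARITY", "HYPERNYMY")


def total(relation):
    """Basic plus reversed count for one relation."""
    basic, rev = COUNTS[relation]
    return basic + (rev or 0)


def reversed_share(relation):
    """Proportion of a relation's compounds that use the reversed template."""
    n = total(relation)
    if n == 0:
        return 0.0  # e.g. POSSESSION, which has no NN cases
    return (COUNTS[relation][1] or 0) / n


basic_total = sum(basic for basic, _ in COUNTS.values())
reversed_total = sum(rev or 0 for _, rev in COUNTS.values())
grand_total = basic_total + reversed_total + OTHER

equative_share = sum(total(r) for r in EQUATIVE) / grand_total
similarity_share = total("SIMILARITY") / grand_total

print(basic_total, reversed_total, grand_total)
print(f"equative relations: {equative_share:.1%}")
print(f"SIMILARITY alone:   {similarity_share:.1%}")
for rel in ("PART", "LOCATION", "PRODUCTION", "CAUSE"):
    print(f"{rel}: {reversed_share(rel):.1%} reversed")
```

Computing the reversed share per relation in this way makes it easy to compare the directional preferences of the reversible relations (for example PART against LOCATION) without reading them off the raw counts.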

The table reveals a number of facts about French NN compounds. First, these particular

combinations are largely dominated by relations that express an equative relationship, which is

to say COORDINATION, FUNCTION, SIMILARITY and, to a lesser extent, HYPERNYMY. These four

relations alone account for nearly 50% of all the NN compounds in my data. It therefore seems

justified to state that, in the absence of additional elements (i.e. prepositions), appositional

nouns tend to favour some sort of coordinative or copulative association. Although Arnaud (2003)

did not investigate these types of compounds, he did note their prevalence in French.

Arnaud distinguished between these types of compounds and a category he called

composé timbre poste (CTP). The distinction was based on the results of a series of judgement

tests in which Arnaud asked ten participants to indicate whether a particular sentence—

constructed using material from various compounds—was acceptable (+), unacceptable (-), or

marginal (?). The results for three compounds, each of which is included in my data and

corresponds to one of the “equative” relations above, are given in the following table:

Table 6.2. Results of judgement tests in Arnaud (2003).

Test                      auteur compositeur    scie égoïne    oiseau mouche
                          (COORDINATION)        (HYPERNYMY)    (SIMILARITY)
Un C est un N1            ?                     +              +
Un C est un N2            +                     ?              −
C’est quoi comme N1 ?     ?                     +              +
C’est quoi comme N2 ?     ?                     −              −

Arnaud’s CTP category largely comprises those combinations for which,

under normal circumstances, most speakers would produce a “+ − + −” result pattern for his

four tests. In other words, he is strictly interested in clear cases of endocentric left-headed

compounds. According to Arnaud, this criterion automatically excludes both hypernymic and

coordinative compounds for a number of reasons. On the one hand, although hypernymic

compounds, which he calls composés génériques-spécifiques, seem to favour a left-headed

reading for most speakers, they do not typically disallow a right-headed interpretation. Thus,

both scie égoïne and chêne rouvre produce marginally acceptable results for Arnaud’s second

test (e.g. ?une scie égoïne est une égoïne). As for coordinated compounds, such as bain-douche,

café-restaurant, chasseur bombardier, and point virgule, results varied not only across speakers,

but also across items. Arnaud labels these particular types as “composés combinatoires” and

argues that they are either “theoretically” exocentric (e.g. auteur-compositeur) or weakly left-

headed (e.g. bateau lavoir) (Arnaud 2003: 10).
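The pattern criterion described above can be stated compactly. The following is a minimal sketch, assuming that the four judgements are recorded in the order of the tests in Table 6.2 and that an ASCII hyphen stands in for the minus sign. The names CTP_PATTERN, JUDGEMENTS and matches_ctp_pattern are hypothetical. The check only captures the surface pattern, not Arnaud's further grounds for excluding particular classes.

```python
# Expected result pattern for Arnaud's four tests, in table order:
# Un C est un N1 / Un C est un N2 / C'est quoi comme N1 ? / C'est quoi comme N2 ?
CTP_PATTERN = ("+", "-", "+", "-")

# Judgements transcribed from Table 6.2 ("+" acceptable, "-" unacceptable,
# "?" marginal), one tuple per compound.
JUDGEMENTS = {
    "auteur compositeur": ("?", "+", "?", "?"),  # COORDINATION
    "scie égoïne": ("+", "?", "+", "-"),         # HYPERNYMY
    "oiseau mouche": ("+", "-", "+", "-"),       # SIMILARITY
}


def matches_ctp_pattern(results):
    """True when the four judgements are exactly '+ - + -'."""
    return tuple(results) == CTP_PATTERN


def deviations(results):
    """1-based indices of the tests whose result departs from the pattern."""
    return [i + 1 for i, (got, want) in enumerate(zip(results, CTP_PATTERN))
            if got != want]


ctp_candidates = [c for c, r in JUDGEMENTS.items() if matches_ctp_pattern(r)]

for compound, results in JUDGEMENTS.items():
    print(compound, matches_ctp_pattern(results), deviations(results))
```

Listing the deviating tests, rather than returning only a yes/no answer, shows at a glance that scie égoïne departs from the pattern on the second test alone, whereas auteur compositeur departs on all four.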

According to the results of Arnaud’s tests, SIMILARITY-based compounds should be treated as

CTPs as they usually produce a “+ − + −” pattern for most speakers (see oiseau-mouche in

Table 6.2), but Arnaud dismisses them on the basis that such compounds rely on an analogical

or equative relation that is too variable to analyze accurately. I agree with this assessment: as I

showed in Chapter 5, SIMILARITY is the loosest of the retained relations as it may involve any

number of different features in order to establish meaning (e.g. looks like, smells like, tastes

like, etc.). I would argue, however, that this particular type of association is sufficiently

important (and frequent) to warrant formalization, even if this formalization is only partial. It is

entirely possible that speakers, aware that SIMILARITY is a core relation linking elements to one

another, are able to establish the exact nature of the compound by using other means (i.e.

extralinguistic knowledge). Fundamentally, SIMILARITY may in fact be a strong indicator of

compound meaning despite its variability, which is partially supported by its frequency in the

data (i.e. approximately 15% of the NN compounds examined rely on some sort of analogical

association). Moreover, it is a highly frequent relation in the literature, which suggests that most

researchers had to take into account its prevalence in their own data. The same might be said of

FUNCTION (e.g. circuit tampon), although here the analogy is much clearer: one thing does what

the other does—it serves the same purpose. While it’s true that Arnaud doesn’t specifically

discuss these types, there is little reason to believe that he would view them outside his

composés équatifs/analogiques (which include those involving similarity), which explains why

we find no such relation in his list of classes¹⁰⁵.

A second point to notice regarding the results in Table 6.1 above is that reversibility is in fact a

limited operation. Not only are equative relations impossible to reverse, but some reversed forms

are also absent from the data even though there is little a priori reason for their absence

(i.e. COMPOSITION and TIME). Furthermore, only PART shows a strong preference for its reversed

configuration (i.e. H that M is part of). French compounds involving a partitive¹⁰⁶ relation

therefore follow the whole-part template, which corresponds to Warren’s (1978) findings for

English (see Table 5.4). This fact seems counter-intuitive, however, given that these two

languages are known to involve different head positions. Compare the following sets of

compounds:

(134) a. Part-Whole: eng. wheelchair fr. moteur-fusée

b. Whole-Part: eng. spoon handle fr. stylo-bille

This unexpected correspondence may be due to the different sizes of data examined (Warren’s

data contains 4,500 compounds, whereas mine contains just over 700), but it certainly raises a

number of questions regarding the head’s role in partitive compounds. It seems that for such

compounds in English, the head is more likely to denote the part component, while in French

compounds, the head will denote the whole component. One possible explanation is that French

may rely on other constructions to express part-whole entities, namely N de N sequences (e.g.

bras de levier, tête d’épingle, corne de chevreuil, etc.). If these particular combinations do in

fact outnumber part-whole NN compounds in French, then we might begin to explain the

observations mentioned above. It certainly remains something worth further exploration.

105 Arnaud does include an être class in his typology, but it always involves some additional information, such as style of, status of, state of, etc. and never similarity or function.

106 It should be noted here that the term partitive is used to refer generally to part-whole associations and not to the traditional French grammar usage involving the preposition de.

Nevertheless, these results for French NN compounds reflect those from Arnaud (2003), whose

data contained only whole-part combinations. Most of the part-whole compounds identified in

my data allow for other analyses (e.g. moteur-fusée = PURPOSE: moteur pour une fusée), which

would, again, support an analysis of partitive French compounds in which whole-part is

expressed via NN constructions, while part-whole is expressed via N de N constructions.

The other two relations that might also favour a reversed template in French, although perhaps

only slightly, are PRODUCTION and CAUSE. These two relations seem to lean toward a

construction that puts the result in head position, thus producing the patterns result-causer and

product-producer. Interestingly, Arnaud’s (2003) analysis of compounds involving production

seems to indicate the opposite pattern:

(135) a. N1 that produces N2¹⁰⁷ = 22 tokens

b. N1 that N2 produces¹⁰⁸ = 13 tokens

An examination of his data, however, reveals that his interpretation of production is far broader

than mine, which might explain why our results don’t seem to coincide. For instance, Arnaud

treats compounds such as route collision and marteau reflex as “N2 produced by N1,” whereas

they would have been treated as LOCATION and CAUSE respectively in my framework. Thus, the

number of compounds in (135a) above is much higher in Arnaud’s data than it would have been

in mine. Short of analyzing all of Arnaud’s data according to my approach, there is no way to

know whether the patterns I identified for both PRODUCTION and CAUSE match his. Regardless,

there are so few compounds involving these relations in my data (13 and 7 respectively) that

stating unequivocally that a certain directional pattern is favoured over the other would be ill-

considered.

107 Included here are compounds paraphrased as “N2 is produced by N1 (such production is not its function)” and “N2 is produced by N1 (such production is its function).”

108 Included here are compounds paraphrased as “N2 is an artefact used to produce N1.”

Looking at the other relations present in the data, we find that the highly related LOCATION and

PART are quite frequent. As was discussed in the previous chapter, it can sometimes be difficult

to distinguish between these two relations given that when something is a part of something

else, it is in fact located in it (e.g. bloc-cylindres). Nevertheless, these two relations are well

represented in the data, which largely coincides with other works. Not only are these relations

frequently discussed, but they are also quantitatively important (e.g. Warren 1978). It should be

noted that LOCATION shows only a slight preference for the basic template (i.e. H is located

on/at/in M) over its reversed form (i.e. H on/at/in which M is located). We might therefore say

that the LOCATION relation is fully reversible.

Another set of related relations that appear frequently in the data are PURPOSE and USE.

Although some purposive compounds such as pause-café and clé lavabo are relatively

uncontroversial, others might not be treated in the same manner due to the availability of a

reversed USE. For example, the compounds passage piétons and timbre-poste, analyzed as cases

of PURPOSE here, also allow for an alternative treatment (i.e. passage que les piétons utilisent).

The same can be said for some of those included under USE and which could be treated

differently (e.g. langage machine). The result is that the numbers given for PURPOSE and USE

might be different under a modified analysis. That said, there is little doubt that these relations

are present in the data in some form or another and that they should be factored into a semantic

theory of compounding.

The remaining relations are of less significance. Despite their low numbers, relations such as

COMPOSITION, PRODUCTION and CAUSE are unlikely to be discarded given their near universal

status in the literature on compounding. POSSESSION, absent for NN compounds, was retained

solely on its prevalence in the literature and its potential use for other constructions (i.e. N de

N). Relations such as TOPIC and TIME, however, might not fare so well. These are in fact so

marginal that one might wonder whether they are indeed worthy of retention. My earlier

discussion of these two relations mentioned that other researchers observed constraints on their

application, which is to say that they typically only appear in compounds where one element

shares the same semantic space as the relation. Therefore, compounds involving TOPIC tend to

contain nouns such as book or meeting, while temporal compounds contain nouns such as day or

season. The few compounds identified here for these particular relations seem to support this

assertion. Given the limited size of my dataset, however, it seems wise to refrain from making

sweeping claims regarding these relations based on numbers only. While it is true that one

should not expect an even distribution of relations across all compounds, it may still seem

surprising to see some of these basic relations come up so seldom in the data. One possible

reason for the disparity is that the dataset itself is either too small or too biased to be representative

of French NN compounds as a whole. It is in fact difficult to say whether the French Wiktionary

is skewed in favour of certain combination types, which would undoubtedly influence what

types of relations are present. Compiling and analyzing similar compound data from other

sources might provide greater insight into the distribution of the retained relations. That said,

Arnaud’s 810 compounds were taken from a number of different sources using no particular

collection methodology and his distribution of relations is just as asymmetrical as mine. The

most frequent high-level relation in his data, N1 → N2, is used with 269 compounds, while N2

SYMB N1 only with 3. Simply put, some relations are used more frequently than others within

compounds.

A second explanation for some of the more uncommon relations has to do with overlap. As I

discussed in the previous chapter, a few relations share the same semantic space and could,

under certain circumstances, be reduced to a single relation. For instance, some have viewed

PRODUCTION and CAUSE as sufficiently related to combine them into a single relation (Downing

1977). Doing so would produce a relational class containing 18 distinct compounds, a perhaps

more robust number. Moreover, the approach adopted in this work, which is to say the

integration of reversibility into the formalism, also leads to the dilution of some relations. As

was discussed previously, a few of the retained relations overlap in their reversed forms. Thus

SOURCE (X made from Y) overlaps with the reversed forms of both PRODUCTION (X that Y

produces) and CAUSE (X that Y causes). Similarly, the line between PURPOSE and USE is often

blurred because of the reversibility of the latter (X is for Y ~ X that Y uses). Eliminating

reversibility and combining some relations would help to reduce the number of low frequency

relations, at the cost of losing some information. We may find, however, that with a

significantly larger corpus, some of the more marginal relations might have a greater overall

presence. This may also be true if we were to extend the analysis to other structures.

It is also worth noting that of the 729 NN compounds examined here, 63 were identified as

exocentric. Many, but not all, of these compounds are lexicalized and therefore do not involve

any discernible relation. There are, however, a few exocentric compounds that make use of

some of the relations retained, as in the following examples:

(136) a. ballon-panier LOCATION ‘sport’

b. jambon-beurre COORDINATION ‘sandwich’

c. chèvre-pied PART ‘mythical creature’

These instances of exocentric compounds involving a particular relation are not frequent (18

items) and mostly make use of coordinative and locative associations.

All in all, approximately 78% of the 729 NN compounds retained from Wiktionary could be

accounted for using the 15 relations discussed in the previous sections. If we ignore all

lexicalized compounds, which is to say those for which there exists no discernible relation

between their constituents, then that number rises to 92%. It would seem that the relations

retained are largely able to represent the majority of semantically motivated NN compounds in

French.

Of the compounds that could not be accounted for using the relations detailed in Chapter 5,

many did pattern together according to a variety of features or factors. Although some of these

compounds are so lexicalized that their meanings are not related in any way to those of their

constituents, others rely on other, less general relations to join their elements. The following

section will discuss in greater detail some of the compounds that eluded my analysis.

6.1.2 Residual NN Compounds

A total of 141 NN compounds do not seem to involve any of the 15 retained relations. Of these, 26

are exocentric compounds whose elements are semantically unrelated and which are thus impossible to treat

synchronically (e.g. cap-mouton, chef-mets, coq-souris, etc.). These are disregarded for the

moment, but it may already be said that these compounds are most likely opaque to most

speakers, placing them alongside simplex lexemes. The remaining unanalyzable compounds fall

under a number of different categories, some of which are related to the methodological

decisions outlined in Chapter 3. Most unanalyzed compounds, however, are simply too

idiosyncratic to be said to involve general or basic relations.

6.1.2.1 Idiosyncratic and Partially Unrelated Compounds

Of the unanalyzed endocentric compounds, 24 contain a modifier that has no bearing on the

meaning of the whole:

(137) a. aube-vigne from the Latin albus, meaning white¹⁰⁹

b. chou-croûte from the German Kraut, meaning herb¹¹⁰

c. laurier-tin deformation of thymum or tinus¹¹¹

These compounds are similar to exocentric compounds whose modifier may retain its meaning

(e.g. chat-château), but they differ in that it is still possible to determine what the compound

denotes (i.e. un laurier-tin est un laurier ~ *un chat-château est un chat/est un château).

In some cases, a compound will preserve the meaning of its constituents, but the association

between them remains sufficiently idiosyncratic that any attempt to label them using the

retained relations would require that they be manipulated significantly. Forty-two such

compounds were identified in the data:

(138) a. dent oeillère “Chacune des canines de la mâchoire supérieure ainsi appelées en raison des douleurs vers l'oeil qu'elles peuvent provoquer¹¹².”

b. laine renaissance “Laine qui provient des déchets résultant du détramage des vieux draps et des chiffons¹¹³.”

c. retraite-chapeau “Retraite complémentaire d’un montant élevé allouée à certains dirigeants d’entreprise ou à leurs salariés en récompense de leurs services éminents¹¹⁴.”

109 “Étym. Aube, de albus, blanc, et vigne” (Littré 1873, Vol. 1).

110 “Étym. Allem. Sauerkraut, de sauer, aigre, sur (voy. sur, adj.), et, Kraut, herbe, l'assimilation avec chou ayant altéré sauer” (Littré 1873, Vol. 1).

111 “Cf. laure. Ac. 1694 et 1718 : -thin; 1740-1798 : -thym [. . .] Var. qui s'expliquent par confusion entre thym < thymum et tin < tinus (?)” (laurier-tin, TLFi).

112 TLFi.

113 Lacroix, Eugène. 1884. Arts et métiers des manufactures, des mines, de l’agriculture, etc. Tome III. Librairie scientifique, industrielle, et agricole: Paris. 24.

A few of these cases might allow for the application of a particular relation, but such an analysis

is often strained at best:

(139) a. cocotte minute ‘cocotte qui fait cuire très rapidement’ (TIME)

b. eaux vannes ‘eaux souillées’ (SOURCE/LOCATION)

c. médicament conseil ‘médicament acheté sur le conseil du pharmacien¹¹⁵’ (SOURCE)

Of course, nothing truly prohibits one from choosing a particular relation for the compounds in

(139), but the approach adopted here makes treating them in this manner problematic. One will

recall that the relations retained are all meant to apply using a simple paraphrase, such as X

causes Y, X is similar to Y, etc. while also being sufficiently informative to disambiguate the

compound. Although the relations may not capture every subtlety or nuance of a particular

compound, they should communicate most of the semantic material held between its

constituents. For this reason, the compounds in (139) above were all treated as idiosyncratic. For

instance, although the compound cocotte minute involves a temporal sense (albeit

metaphorically), the TIME relation alone fails to account for the ‘cooking’ function derived from

the head constituent via its proper function (i.e. ‘pot whose proper function is to cook quickly’).

Nor does SOURCE fully account for the predication between the elements in either eaux vannes

or médicament conseil. The point here is that not all compounds that preserve the meaning of

their constituents can easily be assigned a relation. While opinions on a given relation’s

appropriateness may vary from person to person, the compounds in (139) do not make use of the

retained relations according to the parameters established in Chapter 5.

114 Wiktionary: <http://fr.wiktionary.org/wiki/retraite-chapeau>. This compound is a relatively recent coinage: a Google NGram query suggests that it was introduced in the early 1980s. This contrasts with a compound such as laine renaissance, which, according to the TLFi, is attested in the 19th century. It would seem, then, that idiosyncratic relations are not necessarily the result of semantic drift.

115 Wiktionary: <http://fr.wiktionary.org/wiki/médicament_conseil>

Finally, the data examined also contain three compounds involving reduplication. They are

listed in (140) below.

(140) ami-ami, noeud-noeud, train-train

While these particular constructions could have been discarded prior to the analysis, they were

nevertheless retained because they met the criteria outlined in Chapter 3, namely that they are

pairs of appositional French nouns. Unfortunately, very little can be said about these particular

cases, as they typically involve magnification or intensification and not a relational association.

6.1.2.2 Nouns and Adjectives

This brings us to discuss other cases that met the criteria for inclusion in the dataset, but which

nevertheless introduced an additional layer of relational functions beyond the scope of the

retained relations and which therefore fall within the 141 cases deemed unanalyzable.

Not only are many lexemes polysemous, but they may also belong to different lexical

categories. The French lexicon is no exception—it is filled with words that straddle the

boundaries between nouns, adjectives, and verbs. In some instances, these multi-faceted

lexemes escape the analysis put forth in Chapter 5. A total of 23 NN compounds examined

involve nominal modifiers that function more like adjectives than they do nouns, instances that

Noailly (1990) refers to as “attributive nouns” (substantif épithète). Some examples are as

follows, where the head is underlined:

(141) a. boeuf mode, dose limite, écart type, grandeur nature

b. maître cylindre, chef lieu

The inclusion of these types in my dataset is largely a consequence of the method I used to

identify the lexical categories of each of the individual lexemes collected. Because I relied

solely on the headings used in LPR 2010, I was forced to label the above modifiers as nouns and

not as adjectives (see Chapter 3, Section 3.4.2 for a detailed

explanation of the methodology adopted). The compounds in (141) are in fact probably best

viewed as cases of N-A and A-N compounds. Mathieu-Colas (1996) labels these types of

combinations as N / A=n and A=n / N respectively¹¹⁶.

The relation that might be said to hold between the constituents of the compounds above is

attributive in nature: the non-head qualifies the head in the same manner that an adjective might.

In fact, if we consult lexicographic works, some of these nouns are often described in a sub-

entry as having an adjectival function. For instance, LPR 2010 provides an entry for nature as

“adjectif invariable” even though its lemma is labeled as a noun. Right-headed compounds such

as those in (141b) further highlight the possibility for preposed nouns to shift to a more

adjectival function as this is French’s preferred position for adjectives. Boeuf mode, while

similar to the other compounds in (141a), is somewhat unique as it is most likely a reduction of

the expression boeuf à la mode.

It certainly would have been possible to introduce a relation such as QUALIFY or ATTRIBUTE to

account for these types of compounds, but this approach would have only weakened my

analysis: all of the relations introduced earlier make use of basic predicates to link a

compound’s constituents, but no such approach is possible with adjective-like nouns. We may

instead consider them cases of shifted A-N or N-A compounds where the modifier noun is

involved in a copula-attribute relationship (i.e. dose qui est limite, écart qui est type, grandeur

qui est nature, etc.). In instances where such an interpretation is either difficult or impossible

(e.g. boeuf mode), the speaker may be required to establish meaning via other means (i.e.

analogy: boeuf mode → boeuf préparé à la mode). The difference here may very well depend on

whether a particular noun is known to function as an adjective.

It is unfortunately unclear how these types are to be viewed in terms of semantic transparency.

They are typically compositional in the same way as other relational compounds are, but

headedness may never pose a problem to the speaker given that French adjectives are either

preposed or postposed. This issue will be revisited in the next chapter when the relations are

incorporated into the feature set proposed in Chapter 4.

116 Although this is partly conjecture, as most of the compounds in (141) are not found in his article detailing his nomenclature, the labels reflect the output of his methodological approach. Of the compounds formed on maître, he says: “Les composés en maître(-)N prêtent à ambiguïté pour le premier élément ; nous proposons de les regrouper dans une sous-classe des NN (maître cuisinier, maître imprimeur, maître-assistant, maître chanteur, etc.), sauf pour les non-humains, rattachés aux AN (maître-autel, maître-cylindre, maître couple, etc.)” (Mathieu-Colas 1996: 43). I have treated all such cases as attributive nouns (cf. Noailly 1990).

6.1.2.3 Classificatory Relation

Closely related to the adjectival association discussed above is a relation that serves what

Jackendoff (2010) calls a “classificatory” function, which is to say that the modifier is meant to

set the head apart without actually providing additional meaning. His examples are beta cell,

X-ray, Leyden jar, etc. These particular cases differ from those involving nouns functioning as

adjectives because the former contain lexemes that do not actually possess semantic content.

Similar compounds are found in my data:

(142) facteur sigma, mâle alpha, particule bêta, rayon gamma

These types of combinations could also be said to involve attributive nouns, but unlike the

examples in Section 6.1.2.2, the modifiers in (142) above contribute little to no meaning to the

compound. In other words, beta in beta cell is strictly used to distinguish it from other types of

cells and is not otherwise meaningful. An appropriate paraphrase might be “H of type M” (e.g.

cell of type beta). These particular instances of compounds are not likely to be easily understood

by the layperson and no doubt require specific knowledge of the fields in which they are

typically employed. There are 14 such cases in my dataset, all of which involve a Greek letter

used in a highly technical context.

6.1.2.4 NN Compounds Involving Nominalizations

As has already been discussed, a number of works on compounds have relied on syntactic

relations to account for combinations that involve nominalizations, such as chewing gum or

dishwasher. Because Lees’s (1960) work was transformational in nature, these verbal-nexus (or

synthetic) compounds were easily accounted for using a syntactic approach (i.e. someone chews

gum → chewing gum). Similarly, Adams (1973) includes two major syntactic classes of her

own: Subject-Verb (e.g. filing clerk → the clerk files X) and Verb-Object (e.g. drinking water

→ X drinks water). Even in less syntactically oriented frameworks, there is usually some way to

account for compounds in which the elements are in a head-complement relation (e.g. V-Obj in

Lauer 1995; ARGUMENT in Jackendoff 2010). These relations are used not only to represent


synthetic compounds of the type N V-er (e.g. truck driver, dishwasher, windbreaker, etc.;

Roeper and Siegel 1978, Lieber 1980, 1992, Selkirk 1982, Botha 1984), but also to account for a

number of other constructions containing deverbal heads, such as habit-forming, hair-raising,

and time-consuming (Adams 1973). Synthetic compounds in French are typically of the V-N

type (e.g. ouvre-bouteille, lave-vaisselle, essuie-glace) and are thus equally open to more

syntactically motivated relations, although these constructions pose their own set of challenges

given the inflected nature of the head (Rosenberg 2008, Villoing 2009). Considering that

nominalization is a productive morphological mechanism (Chomsky 1970), it is therefore likely

that French NN compounds containing a nominalization could involve the base’s internal

arguments. A few such cases are in fact present in my data, all of which were identified using an

ARGUMENT label:

(143) a. administrateur réseau, groupement phosphate, photo-interprétation

When the non-head element is the direct argument of the head, the compound may be

paraphrased by of (fr. de): administrateur de réseau, groupement de phosphate, interprétation

de photo. This particular fact is also paralleled by a number of established N de N constructions:

augmentation de salaire, gestion de risques, modulation de fréquence, etc. As for those cases

where the modifier is in fact the external argument, they can usually be accounted for using an

appropriate reversed relation (e.g. H that M causes/makes/uses/etc.).

A number of authors have, however, expanded the scope of their argument relations to include

other types of compounds, such as those containing zero-affix derivations (e.g. V-N compounds

in French), but this approach is hindered by a number of issues, none more troublesome than the

possibility of incongruous application: should hair brush, for instance, be interpreted as ‘X

brushes hair’ (cf. Adams 1973, Jackendoff 2010) or should it simply be treated as other non-

predicating compounds (i.e. “a brush for hair”, cf. most authors)? Adams does in fact allow for

this duality of classification, but more often than not, opts for the syntactic treatment, an

approach that sometimes leads to questionable choices: semantically, is there really a difference

between cold cure, sorted under her Verb-Object class (i.e. ‘cures cold’: 67), and cough mixture,

which she includes under the instrumental class (i.e. ‘mixture which cures a cough’ 73)? In

essence, it seems strange to treat compounds such as hair brush and steak knife differently (as

Jackendoff does, for instance) simply because the latter happens to involve a morphologically


unrelated predicate (i.e. cut). By the same token, should pompe à eau be treated as a verbal

nexus compound because the head happens to look like its corresponding verb?

Jackendoff (2010) in fact grants his ARGUMENT function a great deal of power. This particular

function not only groups together the aforementioned deverbal-headed constructions, but also a

number of compounds containing simplex nouns he argues possess semantic arguments. These

compounds permit an “of” (144a) or “by” (144b) periphrasis (N2 of/by N1):

(144) a. wardrobe color, food surplus, sea level, speed limit, union member

b. helicopter attack

According to Jackendoff, we are dealing with a non-head as argument if it saturates the

corresponding slot within the head’s argument structure (as in 144a): *wardrobe color of her

clothes, *food surplus of potatoes (436)117. Although this test is sound (it clearly works for true

synthetic compounds), it can produce odd results for compounds that may also be said to

possess an internal argument, but which Jackendoff lists elsewhere under other functions:

(145) a. PART fingertip *fingertip of the thumb

c. LOC sandbox *sandbox of stones

One might also ask why Jackendoff lists only one compound with an external argument (i.e.

helicopter attack) and why it is not simply subsumed under his CAUSE or MAKE functions. In

fact, many of the compounds listed in those groups are nearly indistinguishable from helicopter

attack, such as sunburn under CAUSE or knife wound under MAKE. Of course, there may be a

genuine necessity to include syntactically based functions to account for NN compounds, as

evidenced by the examples mentioned earlier in (143): not only are these compounds difficult to

analyze using only basic functions, but doing so would arguably mask how their elements are

truly related. Unsurprisingly, Arnaud (2003) also includes the following three low-level

relations of the relation actancielle type in his classification, which seem to target exactly these types of

compounds in French:

117

In cases where the non-head noun itself can take an argument, Jackendoff argues that expansion remains possible through argument inheritance (e.g. the wavelength of the light = the length of [a wave of the light], 436-437).


(146) a. N2 est le patient du procès représenté par N1 (saisie contrefaçon)

b. N2 est l’objet de l’activité de N1 (ingénieur système)

c. N2 est l’agent/source du procès représenté par N1 (opération commando)

Of these three types, (146a-b) are compounds with modifiers that are internal arguments of their

respective heads (the example in 146a being a much clearer instance of a nominalization than

the one in 146b). The relations used in this work cannot account for these particular compounds.

Opération commando in (146c), however, may be accounted for using a basic relation (e.g.

PRODUCTION) as the modifier is instead an external argument (cf. helicopter attack). Based on

the difficulties discussed above—mainly related to zero-affix nominalizations in head

position—I have chosen to label as ARGUMENT only those compounds that involve heads

possessing an argument slot directly filled by the modifier. Compounds such as pompe à eau do

not meet this requirement because the modifier is actually filling an argument slot of an

unspecified predicate (i.e. pomper). Compare this to compounds such as mise à niveau or

passage à vide and this distinction becomes clearer. The French compounds mentioned in

(143a) and repeated below in (147a) are therefore grouped under this non-conceptual relation

(ARGUMENT)118. Although most instances seem to involve an overt affix, this is not always the

case, as shown by the compounds in (147b):

(147) a. administrateur réseau, groupement phosphate, photo-interprétation

b. retour chariot, mort-chien, auto-stop

Although not under investigation here, V-N compounds would also be labeled using the

ARGUMENT relation as they too involve a non-head constituent that directly fills one of the

head’s internal argument slots.

6.1.3 NN Compounds: Conclusion

Of the 15 relations retained in the previous chapter, 14 were present in the dataset examined.

Only POSSESSION did not seem to apply to NN compounds in French. One possible explanation

118

All compounds that could not be analyzed using one of the 15 retained relations are included under the category “Other” in Table 6.1, which includes compounds labeled as ARGUMENT as this is not a denotational relation.


for this particular relation’s absence is that the French genitive is marked nearly exclusively by the

preposition DE (and, to a lesser extent, à; this will be discussed in the following section).

Although English likewise uses a preposition to indicate possession (i.e. OF), it also makes use of

the ’s morpheme, which may be dropped at some stage of the compounding operation (e.g.

butcher’s knife → butcher knife).

The evaluation of the relations finds that French NN compounds favour equative relations, such

as COORDINATION, FUNCTION, SIMILARITY and HYPERNYMY. Other highly frequent relations (i.e.

with more than 40 items in the data) are PURPOSE, PART and LOCATION. Although the other

relations discussed earlier were also present in the data, they were not nearly as well-

represented. In some cases, there were so few instances of a given relation that other analyses

might instead be preferred (e.g. TIME and TOPIC). Overall, however, the set of relations discussed

in Chapter 5 remained highly relevant for NN compounds, which lends support to the notion

that compounding relies on basic, recurrent relations to bind constituents. It remains to be shown

whether the same can be said for other types of constructions, such as N à N. These particular

combinations will be the focus of the next section.

6.2 N à N Compounds

As was mentioned in Chapter 3, a small number of N à N compounds were collected from

Wiktionary with the intent of expanding upon the discussion of semantic transparency as it

might apply to more semantically restrictive constructions. The relations retained in the previous

chapter were therefore applied to 319 N à N compounds. Before exploring in further detail the

results of this analysis, I will first briefly examine some of the previous work on the preposition

à in an attempt to determine how its semantic role has traditionally been viewed. This will be

followed by a look at the N à N compounds retained here from the perspective of the relations

used in the present work.

6.2.1 The Preposition À

French compounding may involve a number of different prepositions as evidenced by Mathieu-

Colas’s (1996) inventory of French nominal compounds, which includes not only the major

linking units à, de, and en, but also other, less frequent prepositions as well, such as avec, entre,


par, pour, sans119, etc. It is de, however, that boasts the most prominent place in his inventory,

which not only coincides with my data from Wiktionary (2234 occurrences of N de N,

compared to 319 occurrences of N à N), but also with textual frequency numbers cited

elsewhere (see Saint-Dizier 2006).

Despite these quantitative differences, à and de (and to a lesser degree, en) have both long been

viewed as semantically empty or underspecified prepositions (cf. prépositions incolores, Spang-

Hanssen 1963). Their usage can be highly variable and their meanings are often so numerous

that they very nearly cease to be meaningful. That said, most researchers agree that de is far

broader in meaning (and thus emptier) than à (Melis 2003). Much of the research on

prepositions has in fact focused on distinguishing between these two particular prepositions as

they are sometimes in complementary distribution. According to Cadiot (1997), one major

difference between them is largely notional in nature: the use of à is intensional, while de is

extensional120. Thus, the distinction speakers make between verre à vin and verre de vin is

based on sense and denotation (or type and token). Verre à vin is used to refer to an entire class

of objects, while verre de vin can only be understood as denoting an instance of that particular

class. It has also been argued that à and de, like a number of other prepositions (sur/sous,

dans/hors, etc.), oppose each other in a locative sense (Cervoni 1991). This certainly seems true

of à and de from a directional perspective: à usually stands in for destination (i.e. je vais à X)

and de for origin (i.e. je viens de X).

Semantically, it may be difficult to determine with exact certainty just what meanings are

expressed by à. Dictionaries offer a number of different headings, some of which are quite clear,

while others are less so. Le Petit Robert (2010), for instance, lists the following four major

senses (for comparison, the TLFi lists seven):

119

The status of constructions involving these prepositions as compounds is in fact debated (Fradin 2009), but this is of little importance to the present work.

120

Cadiot defines the contrasting pair intensional/extensional using several criteria, but generally, intension is understood as the content of a word or expression, whereas extension designates the class of objects that the word or expression may potentially refer to (Cadiot 1997: 51-52).


(148) I. Introduisant un objet direct (1)

II. Marquant des rapports de direction (6)

III. Marquant des rapports de position (4)

IV. Marquant la manière d’être ou d’agir (5)

These particular senses do not offer much in the way of specifics and so the editors have

attempted to nuance the various meanings this preposition might possess by

subdividing them into more precise usages. The numbers in parentheses indicate how many

entries each of these senses is said to have, which means that, according to LPR2010, à

possesses at a minimum 16 different meanings, many of which involve verbal, adjectival and

nominal complements (see Spang-Hanssen 1963 for an overview of what these usages entail).

Similarly, according to the 15th edition of Le bon usage (2011), à, when introducing a nominal

complement, is mostly used to express one of three broad relations:

(149) a. Possession/Belonging (Sections 1048 and 352)

b. Distribution (Section 1048) e.g. kilomètre à l’heure

c. Location (Section 1049)

These relations are perhaps all that remain of à’s once rich and diverse usage. According to Le

bon usage: “The scope of the preposition à was once far more extensive than it is today. It could

be used in a number of sentences where we now use avec, dans, de, en, par, pour, selon, sur,

etc.121” (2011: 1396). The multiple ways in which à is used are on full display in the numerous

comparisons this resource offers in its section on this particular preposition.

More descriptive than it is explanatory, Le bon usage does occasionally offer some insight into

the sort of restrictions that govern the usage of this preposition. Regarding the distinction

between the use of à and de in pre-nominal position, it observes that “[n]ominal complements

121

“Le domaine de la préposition à était autrefois beaucoup plus étendu qu’il n’est aujourd’hui. Elle pouvait s’employer dans bien des phrases où nous mettons avec, dans, de, en, par, pour, selon, sur, etc.” (Le bon usage, 15th Edition, 2011: 1396).


denoting containers are headed by à if it refers to a destination [pot à eau] and de when it refers

to its contents [pot d’eau]122” (2011: 463).

At its core, research on the preposition à is relatively convergent. Most works have looked to

isolate and describe the various usages of this preposition—and therefore its meaning—

according to the contexts in which it can appear. Oftentimes, these studies seek to reduce the

preposition’s usage to broad and highly general classes. Spang-Hanssen (1963), for instance,

splits à’s pre-nominal role into two groups. The first is unrestricted and may involve one of

three functions or relations: to indicate belonging (e.g. le costume à mon ami), to introduce an

abstract noun following il y a (e.g. il y a une raison à ce crime), or to mark, in a broad sense,

direction and destination (e.g. conduire à l’hôtel). Spang-Hanssen also discusses what he calls

conditioned usages of à, which are used to mark either a characteristic (e.g. armoire à glace) or

a purpose (e.g. canne à pêche)123. Bosredon and Tamba (1991), on the other hand, argue for a

far narrower approach to à and state that its usage falls within two general semantic paradigms.

The first involves combinations in which the second element is “a part, a property, a distinctive

definitional feature of the referent124” (51). They label this class using the preposition avec as it

can be substituted for à in most cases. The second paradigm, for which they mention the

paraphrases “that serves to, that is destined for, that functions using,” (51) is represented using

the preposition pour, again because it may be substituted for à. Examples for each class are as

follows:

(150) a. avec: casquette à carreaux, sac à rabat, chaussures à talons, papier à fleurs

b. pour: ver à soie, moule à gaufres, sac à dos, brosse à dents, homme à femmes

Although this two-way distinction serves Bosredon and Tamba’s purposes, the compounds grouped

within a single class can differ markedly from one another. This is no doubt due

122

“Les compléments des noms désignant des récipients sont introduits par à s’il s’agit de la destination et par de quand on envisage le contenu” (Le bon usage, 15th Edition, 2011: 463).

123

Spang-Hanssen also discusses a third conditioned usage of à, which involves predicative expressions such as c’est folie à vous de le croire and c’est aimable à vous d’être venu (1963: 125).

124

“Le premier [. . .] concerne des [noms composés motivés] [. . .] dont le F2 est présenté comme une partie, une propriété, un trait définitoire distinctif du référent” (Bosredon et Tamba 1991: 50-51).


to their reliance on other multipurpose prepositions as classifiers. While it is arguably true that

both pour and avec are far more meaningful than à, they unfortunately remain highly

polysemous and are not always sufficiently meaningful to further disambiguate a particular

combination (see discussion in Section 5.2 for an argument against the use of prepositions to

classify compounds). This is certainly true for many of the compounds in the POUR class,

which includes such widely different compounds as ver à soie and sac à dos. Despite this

shortcoming, these prepositions come up often as possible alternations that serve to distinguish

between N à N constructions.

Anscombre (1999), following his earlier work on compounds (1990), offers three types of

predicative relations for N à N compounds. The first, which he refers to as locative, is

paraphrased as “in some N1, there is an N2” (62). These compounds typically involve a

container/contained relationship (cf. comments in Le bon usage) and usually alternate with N de

N constructions:

(151) a. pot à fleurs dans le pot, il y a des fleurs (pot de fleurs)

b. réservoir à essence dans le réservoir, il y a de l’essence (réservoir d’essence)

c. jouet à piles dans le jouet, il y a des piles (*jouet de piles)

Although all three compounds can be paraphrased using a locative sentence, only those in

(151a-b) allow for conversion to an N de N construction. According to Anscombre (1992: 162),

jouet à piles in (151c) instead belongs to his second type of N à N compound, which he labels as

a statif (or suite non actancielle). This relation is characterized by a number of different stative

verbs, such as avoir, être avec, posséder, etc. and is paraphrased as “Some N1 V N2.” Other

examples are stylo à bille, meuble à tiroir, and verre à pied. Given the description Anscombre

provides for this particular class, one could argue that it attributes to à either a possessive or

partitive relational sense.

The third type of predicative meaning identified by Anscombre, and which he calls processif (or

suite actancielle), is also paraphrased as “Some N1 V N2,” but here the verb may denote any

number of dynamic actions. Compounds that represent this type are type à histoires, moulin à

café, and homme à femmes. In essence, this type corresponds to Allen’s (1978) articulation of

the Variable R, which is to say that à is a placeholder of sorts for an undefined relational value.


Anscombre offers three basic tests to determine which type a given N à N construction might

belong to, but in some cases these tests are either inconclusive or rely on additional

argumentation in order to explain counterintuitive results. For instance, according to his tests,

homme à femmes is locative in nature, but he admits that this doesn’t make any sense and thus

argues that it is in fact an active sequence using other lines of reasoning (1999: 65). Of course, it

isn’t necessarily the effectiveness of his tests that is most important here, but rather his attempt

at circumscribing the possible meanings of à in an N à N sequence. Although his first two types

might be interpreted as locative and possessive/partitive respectively, his third type is far more

ambiguous and may involve a number of different meanings, which could potentially be filled

by some of the relations I discussed in Chapter 5.

Cadiot (1997), in his comprehensive work on French prepositions, identifies four types of

relations expressed by à in N à (DET) N constructions, each of which has a number of sub-

types. His four main types, which are summarized below, include the two types identified by

Bosredon and Tamba (1991), but are supplemented with two additional classes.

(152) i. à/pour bac à sable, cuiller à café, chair à saucisse, fer à cheval,

brosse à dents, canon à neige, aide à la traduction

ii. à/avec casquette à carreaux, chaussure à talon, steak au poivre,

armoire à glace, brioche aux noix, char à bancs, canot à moteur,

auteur à succès

iii. à/CIRCONS blessure au bras, réponse à chaud, lutte à mort

iv. à/META mort aux rats, chair à canon, pot aux roses

The first type (pour) includes several related and unrelated sub-types. Perhaps most noteworthy

is that, although this class is meant to group together N à N compounds that may allow for the

substitution of pour for à, Cadiot is only slightly concerned with the expression of purpose.

While he does refer to this relation as destinative, his analysis is far more locative in nature.

Thus, he not only talks about container/content combinations (bac à sable), but also about

bearer/borne (fer à cheval), degrees or types of physical contact (brosse à dents ~ colle à bois),

etc. This class also includes compounds paraphrased as “N1 produces N2” (canon à neige).


As for the second type, it mostly consists of part/whole constructions, which Cadiot further

distinguishes based on just how integral a part it is (casquette à carreaux ~ char à bancs). This

class also includes combinations that express an instrumental relationship between elements

(e.g. canot à moteur), as well as a harder to assess “attributive” association (e.g. auteur à

succès).

Cadiot’s CIRCONS type comprises two major sub-types consisting mainly of N à DET N

constructions. The first is again locative, but unlike the pour class, the locative constructions

allow for the prepending of “il y a” where N2 is the location of N1 (i.e. blessure au bras = il y a

une blessure au bras ~ bac à sable ≠ *il y a un bac à sable). The second sub-type describes the

manner (e.g. réponse à chaud) or result (e.g. lutte à mort) of N1 and, according to Cadiot, is

similar to constructions containing deverbal heads (e.g. mise à niveau).

META, the final type in Cadiot’s classification of the preposition à, includes three subtypes, all

of which he deems semantically non-compositional. The first, which he calls délocutif125,

includes combinations such as mort aux rats and pied à terre. The second involves a metonymic

shift, such as in chair à canon and tête à claque. Finally, the third type of N à (DET) N

construction relies on a metaphoric sense, as in pot aux roses and sac à vin. The majority of

these constructions, if understood as compounds, are exocentric in nature.

Using Cadiot’s (1997) work as the basis for her own examination of à, Knittel (2010) also

argues that this preposition, when joining two nouns, will result in a construction belonging to

one of four classes, but her typology differs slightly from Cadiot’s. She summarizes her

observations in a table (partially reproduced on the following page), which shows how the

preposition’s meanings are, to some extent, a function of the conceptual classes of the

compound’s constituents (19):

125

Unfortunately, Cadiot is not clear regarding the use of this term for these types of compounds, but his glosses seem to indicate that they involve instances with an external subject (i.e. exocentric: “x donne la mort aux rats”, “x prend pied à terre”, etc., Cadiot 1997: 136).


Table 6.3. Summary of Knittel (2010) relations for the preposition à.

Class   Lexico-pragmatic Relation (N1 – N2)   Examples
        N1             N2
1       -instrument    destination            sac à dos, verre à vin
2       whole          part                   stylo à bille, bateau à voile
3       +instrument    product/energy         machine à pain, fer à vapeur
4       preparation    ingredient             tarte aux pommes, salade au thon

Although Knittel states that Classes 1, 3, and 4 are all, to some degree, locative in nature, only

Class 4 is said to be entirely locative. She bases this distinction on the fact that compounds under

Classes 1 and 3 involve potential locations (i.e. a wineglass without wine is still a wineglass,

whereas an apple pie without apples is no longer an apple pie). Otherwise, her four classes are

all distinct. Class 1 privileges combinations that are purposive in nature (although, like Cadiot,

Knittel emphasizes location more so than purpose), while Class 3 involves compounds where N2

is either the product of N1 or the energy that allows N1 to function. Class 2 compounds,

although described as possessive, represent combinations in which N2 is part of N1. Only Class

4 sets itself apart from previous research as it strictly involves “dish-ingredient” combinations

and typically includes a determiner. Nevertheless, her first three types align closely with much

of what has already been said regarding the role of à.
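
The way in which Knittel’s classes depend on the conceptual roles of the two constituents can be illustrated with a small lookup. The following Python sketch merely restates Table 6.3 and the accompanying remarks on locativity; it is not part of Knittel’s (2010) proposal. The names KNITTEL_CLASSES, LOCATIVE_STATUS and knittel_class are hypothetical, and the status given for Class 2 is an assumption based on its absence from her list of locative classes.

```python
from typing import Optional

# (N1 role, N2 role) -> class number, transcribed from Table 6.3.
KNITTEL_CLASSES = {
    ("-instrument", "destination"): 1,
    ("whole", "part"): 2,
    ("+instrument", "product/energy"): 3,
    ("preparation", "ingredient"): 4,
}

# Locative status as described in the text: Classes 1 and 3 involve
# potential locations, Class 4 is entirely locative.
# ASSUMPTION: Class 2 is marked "none" only because it is not listed
# among the locative classes.
LOCATIVE_STATUS = {1: "potential", 2: "none", 3: "potential", 4: "full"}


def knittel_class(n1_role: str, n2_role: str) -> Optional[int]:
    """Return the class number for a pair of roles, or None if the
    pair does not appear in Table 6.3."""
    return KNITTEL_CLASSES.get((n1_role, n2_role))


# verre à vin: N1 is a non-instrument, N2 its destination.
print(knittel_class("-instrument", "destination"))
# tarte aux pommes: a preparation and its ingredient.
print(LOCATIVE_STATUS[knittel_class("preparation", "ingredient")])
```

A pair of roles that is absent from the table simply returns None, which mirrors the fact that the typology is not meant to cover every N à N sequence.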

In summary, research shows that although the preposition à, when introducing a nominal

complement, may express a number of different relations, its function generally falls into a

somewhat narrow band of possible roles, namely to express location and destination, part/whole

relationships, possession, purpose, and use (e.g. energy). Because there is such significant

overlap between previous research on à and the relations retained here, it is likely that the N à N

compounds collected will make extensive use of these associations. As we will see in the next

section, a small subset of the retained relations is indeed sufficient to account for most of the

compounds under investigation. A question one might ask is whether any additional relations

surface from a more in-depth analysis of the data. This will be the focus of the remainder of this

chapter.


6.2.2 Results for N à N Compounds

Given that the preposition à is not in fact devoid of semantic content, it should come as no

surprise that compounds containing this preposition show a far more restricted use of the

relations discussed in the previous chapter. In fact, two particular facts arise when we examine

these types of constructions. First, N à N compounds involve fewer relations than NN

compounds and second, the preposition’s inherent directionality restricts the reversibility of

these relations significantly. As we will see, these facts are largely related to one another.

The following table contains the raw data resulting from the analysis of 319 N à N compounds

extracted from Wiktionary:

Table 6.4. Results of Compound Relations for N à N compounds.

Relation             Basic   Reversed   Total
PURPOSE                 90        ---      90
PART                     0         69      69
USE                     39          6      45
LOCATION                13         14      27
PRODUCTION              18          0      18
SIMILARITY               8        ---       8
SOURCE                   0          6       6
POSSESSION               5          1       6
CAUSE                    5          0       5
COMPOSITION              5          0       5
TIME                     2          0       2
COORDINATION             0          0       0
FUNCTION                 0          0       0
HYPERNYMY                0          0       0
TOPIC                    0          0       0
Other/Unanalyzable     ---        ---      38
Total                  185         96     319

The first thing that we notice is that relations establishing some sort of parallel or equative

function between a compound’s elements are virtually non-existent for N à N constructions.


Thus, COORDINATION, HYPERNYMY, and FUNCTION are not present in the data. In other words, à

makes it impossible for the following meanings to occur:

(153) a. H à M, where H is also an M / where C is both an H and an M

b. H à M, where M is a kind of H

c. H à M, where H functions as an M

Moreover, for similar reasons, the SIMILARITY relation is also significantly underrepresented:

(154) a. clé à béquille, clé à pipe, escalier à vis, ouvrages à cornes

b. châssis/fenêtre à tabatière, fenêtre à guillotine

The examples in (154a) involve physical similarities (i.e. H looks like M) and could in fact be

treated as particular cases of PART, though this analysis would require the use of analogy, as

they cannot be interpreted literally. The same can be said for the compounds in (154b), which

instead rely on a different type of similarity (i.e. H functions like M). What is important to note

about the above observations is that whereas NN compounds seem to rely heavily on equative

relations, N à N compounds do not. The preposition seems to block such relations—we might

also expect similar results with other constructions involving prepositions (e.g. N de N).

The strong directionality of the preposition also affects the centricity of N à N compounds.

Unlike NN compounds, which may be right-headed, the N à N constructions present in the data

are all left-headed.

N à N compounds seem to favour the PURPOSE, PART, USE, and LOCATION associations. These

four relations largely correspond to the major paradigms identified for à in the research

discussed in the previous section (Cadiot 1997, Anscombre 1999, Knittel 2010). Examples of

these compounds are as follows:

(155) a. PURPOSE bac à sable, boîte à lettres, brosse à dents, clé à bougies

b. PART armoire à glace, baignoire à porte, clé à molette

c. USE arme à feu, bombe à hydrogène, machine à sous

d. LOCATION banque à domicile, passage à niveau, sac à dos

These four relations alone account for just over 70% of all the N à N compounds in my data,

which, again, reflects previous research on these types of constructions. It should be noted that

although PART follows both the basic and reversed patterns for NN compounds, it is only

reversed for N à N constructions: all occurrences are of the type “N1 of which N2 is a part.”

This particular template is also favoured for NN compounds, which further emphasizes that

partitive French compounds are strongly conditioned to have their head constituents denote the

whole element126 (cf. whole-part in Knittel 2010).

USE and LOCATION for N à N compounds, on the other hand, are both reversible (like their NN

counterparts), but not without some issues:

(156) a. USE reversed chardon à foulon, hache/frein/sac à main; chaise à

porteurs, abreuvoir à mouche

b. LOCATION reversed arbre à grives, boule à neige, chambre/tube à

air/gaz/vide; poire à poudre, moulin à prière

The reversed form of USE (H that M uses) could, in most cases, be treated as purposive. In fact,

only chardon à foulon (= ‘chardon que le foulon utilise’) in (156a) seems like a clear case of

USE, as it is difficult to claim that a thistle’s purpose is to be used by a fuller. Compounds such

as N à main, where N is an instrument of sorts, are analyzed as instances of USE, but one might

argue that N1 is “destined” for N2 and therefore an example of PURPOSE. This analysis has not

been retained here because PURPOSE is typically reserved for compounds in which the modifier

is the object of the underlying relation. This means that a purposive reading of hache à main

would result in an axe whose purpose is to cut (off) hands. The USE relation emphasizes that it is

instead an axe that one uses with one’s hand (as opposed to larger axes requiring the use of both

hands and one’s upper body) and thus falls under the “powered by” scope of this relation. The

same can be said for chaise à porteurs, which means ‘chair carried by bearers’ and not one that

carries them, which is why it is subsumed under USE and not PURPOSE. There may be sufficient

influence from the complement’s deverbal status to block the incorrect interpretation, but this

126 As was mentioned earlier in Section 6.1.1, this particular constraint is reversed for compounds involving de, a fact that is most likely related to free syntactic partitive constructions (e.g. morceau de sucre, pointe de tarte, litre de lait, etc.).

remains to be seen. As for abreuvoir à mouche, its exocentric status favours USE over other

relations as it is not in fact a drinking trough for bugs, but is instead understood, metaphorically,

as one used by bugs (its meaning being ‘wound’).

Similar issues exist with N à N compounds and the reversed form of LOCATION (H in/on/near

which M is located), which is to say, that PURPOSE could also be invoked for some of the

compounds listed. Of the compounds analyzed as instances of a reversed LOCATION, arbre à

grives and boule à neige (snowglobe) are the most representative of this type. A number of

compounds built on chambre/tube/diode à N might also benefit from a purposive reading, but

such an analysis fails to recognize that these items serve a purpose other than containing

something, as evidenced by the inappropriateness of paraphrasing them using for (e.g. *tube

pour [le] vide, ?chambre pour le gaz). As for poire à poudre and moulin à prière, these are

either weakly endocentric or exocentric compounds that, despite the figurative meaning of the

head, seem to target a locative relation as either can be paraphrased as “H that contains M.”

Finally, LOCATION remains one of the better examples of a fully reversible relation as both the

basic and reversed patterns are as evenly distributed for N à N compounds as they are for NN

compounds.

Beyond the four main relations identified above, PRODUCTION is also well represented (18

items), even though its counterpart CAUSE is far less so (5 items):

(157) a. PRODUCTION abeille à miel, cabane à sucre, machine à café, vache à lait,

ver à soie

b. CAUSE armes à enquerre, tête à claques/gifles, charbon à tumeurs,

pierre à feu

Neither of these relations is reversible for N à N compounds. This fact contrasts with the

results obtained for NN compounds where these relations were present for both templates.

Again, this is most likely due to the preposition’s strong directionality. Knittel (2010) includes

PRODUCTION under class 3, which also accounts for combinations in which N2 is the source of

energy of N1 (which corresponds to USE in my typology). Also included under PRODUCTION

here are compounds denoting plants, such as acajou à pommes, arbre à cornichons and arbre à

pain, for which it can be understood that “N1 produces N2” (see Section 5.2.2.10 of Chapter 5

for a discussion of this analysis).

As for CAUSE, only charbon à tumeurs and pierre à feu are relatively uncontroversial instances

of this relation, although the latter might instead be included under PRODUCTION. Armes à

enquerre refers to a coat of arms that possesses unconventional features. The word enquerre,

here, is slightly problematic as a number of similar, yet different analyses are available.

According to LPR 2010, this lexeme is part of an adjectival locution headed by a preposition

(i.e. à enquerre). Citing armes à enquerre as its example, LPR defines this entry as follows:

“Qui présentent une singularité, une irrégularité à éclaircir (en parlant d'armes).” If this

particular word only exists within the context of this construction, then it may be excluded from

the dataset according to the criteria set forth in Chapter 3. Yet, the word itself remains a noun

elsewhere. The TLFi, for instance, while it also discusses the word in the context of an

adjectival locution, provides the following definition based on the verbal form enquérir:

(158) enquerre emploi subst. masc. “Recherche de la signification, vérification”

Using this particular sense, it seems justified to treat the compound as meaning “coat of arms

that prompts inquiry,” which places it squarely within the causal relation, albeit with some

degree of coercion. The synonymous pair tête à claques and tête à gifles are also difficult to

treat, but are labeled as instances of CAUSE as they may be paraphrased accordingly (i.e. tête qui

cause des gifles). The TLFi defines these constructions as follows:

(159) tête à gifles: “Visage déplaisant et exaspérant de bêtise, de fatuité, à tel point

qu'on voudrait le gifler”

These compounds are often used to denote a person, which technically renders them exocentric.

What is interesting is that the result (i.e. gifle/claque) is never actualized. It is the desire to carry

out these acts that is provoked. Despite this particular characteristic, there nevertheless remains

a sense of cause between the compounds’ elements, which is why they have been labeled using

this relation.

Similar in structure to the above relations is SOURCE. Because of à’s strong directionality, this

relation runs counter to PRODUCTION and CAUSE for N à N compounds and instead patterns with

PART as a “reversed-only” relation. SOURCE related compounds can only mean “H that M is

from,” which is to say that the head element must denote the origin, while the modifier must

denote the resulting product. That said, this mirrored template is perhaps a further argument for

folding SOURCE into either PRODUCTION or CAUSE, thus allowing these to be reversed. Of the 6

compounds labeled as SOURCE, most could easily be treated as reversed instances of

PRODUCTION:

(160) SOURCE: betterave/canne à sucre, mûrier à papier, palmier à huile, pierre à

chaux/plâtre

Under the current analysis, a compound such as sugar cane (canne à sucre) is paraphrased as

“cane from which sugar is made” (canne à partir de laquelle le sucre est fait). The alternative

treatment involving PRODUCTION (e.g. cane that produces sugar; canne qui produit du sucre),

while technically sound, is somewhat infelicitous as the origin element does not actually

“produce” the element denoted by the modifier. Treating the compounds in (160) as instances of

SOURCE has the benefit of not only reflecting the earlier observation that N à N compounds are

mostly unidirectional, but also remaining consistent with the distinction made earlier between

PRODUCTION and SOURCE for NN compounds (e.g. chêne-liège; see Section 5.2.2.10 for a

discussion). Moreover, viewing the compounds in (160) as instances of SOURCE and not

PRODUCTION better relates them to their N de N analogues, a few of which are given below:

(161) a. sucre de canne (SOURCE) ~ canne à sucre (SOURCE REVERSED)

b. huile de palmier (SOURCE) ~ palmier à huile (SOURCE REVERSED)

Source is the natural analysis for the constructions in (161) involving de, which lends support to

the treatment adopted here for their N à N analogues.

As for the remaining relations identified, most seem to be of marginal relevance for N à N

compounds. The relative frequency of COMPOSITION is similar to that of SOURCE, though the

occurrences identified are unlikely to raise any questions.

(162) COMPOSITION code à octets, (co)polymère à blocs, étoile à neutrons, puce

à gènes

As was mentioned in Chapter 5, COMPOSITION differs from PART in that one constituent is the

sole component of the whole denoted by the other constituent. The compounds in (162) above

are all plausible candidates for this particular relation as they may all be paraphrased without

difficulty as “H composed of M.”

Very little can be said of the TIME relation. Only two tokens were noted in the dataset, both of

which involve a modifier possessing temporal features:

(163) TIME échange/marché à terme

The two cases of TIME, much like the ones identified for NN combinations, involve lexemes that

belong to temporal semantic classes. They are likely interpretable without a priori knowledge of

any compound relations. These constructions, along with those identified as temporal for NN

compounds, further suggest that TIME may not be a fundamental relation and instead simply

one that arises from the meaning of a compound’s constituents. As for TOPIC, no occurrences

were found for N à N compounds. Given how few cases were identified for NN compounds, it’s

not surprising that none were found in the N à N data. It is possible that a larger corpus would

result in some instances of TOPIC, but it may very well be that this particular relation is entirely

subsumed by other constructions such as N de N (e.g. livre d’histoire ~ *livre à histoire;

chanson d’amour ~ *chanson à amour).

Finally, it was said in the previous chapter that the POSSESSION relation was inapplicable to NN

compounds and of only limited applicability to N à N compounds. The following compounds

are the only ones in my N à N data that accept a possessive reading, all of which are reversed (H

that M possesses):

(164) POSSESSION barbe à papa, bonnet à prêtre, bourse à berger/pasteur, fils à papa

There is, however, one basic case of possession, punk à chien (‘punk qui possède un chien’),

which, while clearly based on that relation, runs counter to what is expected in this type of

construction (see Chapter 5, Section 5.2.2.5 for further discussion of these cases).

Despite POSSESSION’s seemingly limited applicability, it was nevertheless retained because it

would no doubt be a necessary relation to account for a number of N de N constructions (e.g.

droit d’auteur). What is worth noting is that à is often said to express possession or belonging

alongside de, often interchangeably. Le bon usage (2011), for instance, lists this as one of à’s

roles (1396). In the research on à discussed in Section 6.2.1 above, possession comes up

frequently, albeit often in the part-whole sense, but there are also instances involving belonging

(e.g. le livre à Pierre). It might thus seem strange that this relation is so uncommon in the data.

One possible explanation is that this particular usage of à belongs to oral speech. Le bon usage,

for example, notes that the possessive sense expressed in phrases such as “la fille unique à M. le

maire” or “le manteau à M. Bernard”, belongs to a colloquial register:

“The expressions mentioned above come from either tradition or popular speech, where à remains, nearly everywhere, in use to indicate belonging. (The statement from the Ac. 2001, art. à, IV, 1, “This expression is no longer in use,” is inadequate.) But seldom does it appear in written texts, outside of instances where authors wish to imitate former usage [. . .] or to reproduce local or popular expressions127.” (Le bon usage 2011: 455)

Furthermore, they add that the nominal complement in possessive constructions involving à is

always a person or animal and never a thing128. The examples in (164) above seem to support

this statement.

This distinction is echoed in Spang-Hanssen’s (1963) work on prepositions, in which he remarks

that there is a striking contrast between the indication of possession in written and oral speech.

He observes that “only de is judged correct in these types of sentences, but in everyday speech à

is used freely, which should not be considered crude, but simply colloquial129” (33). These

characteristics of à’s usage might explain why N à N compounds in French seldom involve

possession: the coining of compounds involving possession may be influenced by a strong

prescriptivism in favour of N de N constructions. As Le bon usage remarks, only two N à N

fixed constructions seem to retain a possessive meaning, both of which are present in

Wiktionary and labeled as possession here: barbe à papa and fils à papa. The other cases listed

in (164) are exocentric compounds denoting plants, which entered the French language long ago

(e.g. bourse à pasteur, circa 1600 according to TLFi) at a time when à was perhaps still the

preferred marker of the genitive case.

127 “Les expressions signalées plus haut viennent, soit de la tradition, soit du parler populaire, où à reste, à peu près partout, très vivant pour marquer l’appartenance. (La formule de l’Ac. 2001, art. à, IV, 1, « Cette expression n’est plus en usage », est inadéquate.) Mais ceci apparaît rarement dans la langue écrite, en dehors des cas où les auteurs veulent imiter l’usage ancien [. . .] ou reproduire les expressions populaires ou locales.” (Le bon usage 2011: 455).

128 “Le complément concerne des personnes, parfois des animaux, jamais de choses” (Le bon usage 2011: 455).

129 “De seul est estimé correct dans ces sortes de phrases, mais le langage courant se sert plus volontiers de à qui n’est nullement vulgaire, simplement familier” (Spang-Hanssen 1963: 33).

In summary, 281 (approximately 85%) of the 319 N à N compounds collected from Wiktionary

were accounted for using the 15 relations retained in the previous chapter. The most prominent

of these relations correlated strongly with previous research on à. It was also found that

reversibility is significantly impacted by the strong directionality of the preposition, which also

influences which relations are in fact applicable to N à N constructions. Thus, unlike for NN

compounds, equative relations are not permitted for N à N constructions. Despite the high

degree of relevance exhibited by the retained relations, 15% of the compounds examined could

not be accounted for. These residual cases will be the focus of the next section.
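
The paraphrase templates cited in this section can be gathered into a small lookup table. The Python sketch below is illustrative only: the names `TEMPLATES`, `split_n_a_n`, and `paraphrase` are hypothetical, the table lists only templates quoted or glossed in the text above (it is not the full set of 15 relations), and splitting on the first " à " assumes a simple left-headed N à N string.

```python
# Paraphrase templates for N à N compounds, keyed by (relation, direction).
# H = head (N1, the leftmost noun), M = modifier (N2).
# Assumption: only templates quoted or glossed in the surrounding text are listed.
TEMPLATES = {
    ("PART", "reversed"): "{H} of which {M} is a part",
    ("USE", "reversed"): "{H} that {M} uses",
    ("LOCATION", "reversed"): "{H} in/on/near which {M} is located",
    ("SOURCE", "reversed"): "{H} that {M} is from",
    ("POSSESSION", "reversed"): "{H} that {M} possesses",
    ("PRODUCTION", "basic"): "{H} that produces {M}",
    ("CAUSE", "basic"): "{H} that causes {M}",
}


def split_n_a_n(compound):
    """Split 'N1 à N2' into (head, modifier), treating N1 as the head."""
    head, sep, modifier = compound.partition(" à ")
    if not sep or not head or not modifier:
        raise ValueError(f"not an N à N string: {compound!r}")
    return head, modifier


def paraphrase(compound, relation, direction):
    """Fill the template for (relation, direction) with the compound's nouns."""
    try:
        template = TEMPLATES[(relation, direction)]
    except KeyError:
        raise ValueError(f"no template for {relation} ({direction})") from None
    head, modifier = split_n_a_n(compound)
    return template.format(H=head, M=modifier)


print(paraphrase("canne à sucre", "SOURCE", "reversed"))
print(paraphrase("vache à lait", "PRODUCTION", "basic"))
```

Requesting a pairing that is absent from the table, such as a reversed PRODUCTION, raises an error, which mirrors the observation that PRODUCTION and CAUSE are not reversible for N à N compounds.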

6.2.3 N à N: Residual Data

Like NN compounds, a number of N à N compounds could not be analyzed using the relations

from Chapter 5. Of the 319 N à N compounds examined, 38 defied analysis. Although a few of

these bear some resemblance to the unanalyzable NN compounds discussed earlier, there are a

number of particularities to N à N compounds that placed them outside the scope of my

analysis. I will look at each of these in turn.

6.2.3.1 Idiosyncratic and Semantically Unrelated N à N

Similar to their NN counterparts, a number of N à N compounds involve lexemes that do not

contribute meaning to the whole. A total of 17 exocentric N à N compounds were identified, all

of which fall into one of three groups:

(165) a. N1 metonymic/metaphoric bouche à feu, pelle à cul

b. N1-N2 unrelated (lexicalized) manche à balle130, pot à tabac131

c. Idiosyncratic relation face à main, tête à queue

The first type of exocentric compound includes those in which the leftmost constituent, the

head, only retains its meaning from a metaphoric or metonymic perspective. Thus, bouche à feu,

130 manche à balle: “région. (arg. étudiants de Belgique). Étudiant qui se signale par son zèle à travailler” (TLFi).

131 pot à tabac: “personne petite et grosse” (LPR 2010). There is also a literal acceptation for this combination, for which the relation would be PURPOSE.

referring to a cannon, is based on a metonymic relation132 (‘mouth of the cannon’), while in pelle à

cul (lawn chair), the meaning of the head is metaphorical. Because N à N compounds are always

left-headed, there are no exocentric compounds in which only N2 is unrelated. Much like for

NN compounds, however, it is possible for neither constituent to contribute meaning to the

whole, as in the examples in (165b) above. Most of these compounds can be said to be

lexicalized. As for the compounds listed in (165c), these are exocentric N à N constructions

which retain their meaning, but that rely on relations not included in my list: face à main refers

to a small pair of binoculars held up to one’s face using a handle, while tête à queue refers to a

movement in which the head and tail swap positions. These compounds are not unusual,

however, as they seem to rely on functions long since attributed to à, namely destination or

direction. The closest relation available here is LOCATION, but it is not able to capture the

meaning of these compounds adequately, most likely because of their exocentric nature.

Like NN compounds, a number of N à N compounds also involve some form of reduplication,

though in this instance they are mostly exocentric:

(166) goutte-à-goutte, main à main, mot à mot, porte-à-porte, terre-à-terre

Unlike their NN counterparts, N à N compounds involving identical lexemes typically express a

sequence of acts or events, but with a sense of repetition, which is to say one after the other (e.g.

goutte-à-goutte = ‘goutte après goutte’, porte-à-porte = ‘aller d’une porte à l’autre’). In most

cases, these constructions are adverbial in nature and have been nominalized133.

There were only 17 endocentric compounds that could not be analyzed using the relations

retained. Some of these include cases for which the modifier does not contribute meaning (as in

167a), while others simply involve a head and modifier connected in some idiosyncratic fashion

(as in 167b):

132 No doubt mouth here is also, to some degree, based on a metaphoric interpretation, but the use of this particular lexeme to refer to an opening is well-established (mouth of a cave, mouth of a bottle, mouth of a river).

133 For instance, the TLFi says of goutte-à-goutte: “Substantivation de la loc. goutte-à-goutte attestée dès 1170.”

(167) a. manche à balai, valet à patin

b. logiciel à contribution, pelle à balai, acquit-à-caution

Perhaps more problematic are a series of compounds which clearly preserve the meaning of

their individual components, but which involve already established phrases headed by à:

(168) boule à zéro, compte à rebours, fabrication à façon, oeuf à cheval, steak à cheval, tueurs

à gages

These compounds are actually best treated as instances of an N with a postposed PP, as the latter

are constructions in their own right (e.g. à rebours, à façon, à cheval, etc.). Of the tests

discussed in Chapter 3 for distinguishing between syntactic constructions and compounds, the

separability criterion shows that most of the cases in (168) are not sufficiently atomic (e.g.

fabrication industrielle à façon, steak saignant à cheval). Even if we treated these constructions

as compounds, assigning them a relation is difficult, as doing so usually requires that the

preposition be replaced by the targeted relation. Because these PPs have specific meanings that

differ greatly from the meaning of their internal nouns, the preposition cannot be removed from

the construction. For instance, à cheval means ‘straddle’, while the simple lexeme cheval

means, among other things, ‘horse’. Even if a relation were acceptable for a compound such as

oeuf à cheval, paraphrasing it as oeuf RELATION cheval could not possibly work. These instances

are therefore beyond the scope of the analysis proposed here.

6.2.3.2 N à N Compounds Involving Nominalizations

Also present in the dataset of N à N compounds are a number of nominalizations with internal

arguments filled by the modifier, though their numbers are limited. These were mentioned

briefly in Section 6.1.2.4 above, where they were labeled using the ARGUMENT relation. They

are repeated here for convenience:

(169) mise à disposition/jour/niveau/pied, passage à vide, maintien à poste; condamné à mort

Such constructions are perhaps sufficiently transparent so as to avoid, under most

circumstances, lexicalization, which would explain why so few cases are present in

Wiktionary. Another explanation might be that there aren’t actually that many resultative

nominalizations of verbs with indirect objects that allow for constructions similar to those in

(169). One possible constraint is that the preposition must be locative. This would explain why

the constructions in (170a) below are acceptable, while those in (170b) are not without the

inclusion of a determiner.

(170) a. affichage à écran, arrivée à destination

b. *aboutissement à résultat, *invitation à mariage, *contribution à projet

Further examination of these types of constructions, while warranted, goes well beyond the

scope of this work. Suffice it to say that N à N constructions, like NN compounds, do share some

similarities with other verbal-nexus compounds when the head constituent is a nominalization

and the modifier fills the former’s argument slot. Because the elements that make up these

particular types of constructions are connected in a relatively obvious way, it is likely that these

compounds are easier to understand than those requiring that some unexpressed relationship be

established by the speaker.

6.3 Summary

In this chapter, I examined how the 15 relations proposed earlier might apply to French

compounds by analyzing 729 NN and 319 N à N compounds. Factoring out compounds whose

elements did not contribute meaning to the whole, the retained relations were able to account for

92% of all NN compounds and 94% of all N à N compounds. The results of this analysis also

show that these two types of constructions favour different relations. Table 6.5 on the following

page contains the relative frequency134 of each relation only for compounds for which a given

relation was assigned.

Given what is understood about the preposition à, it is not surprising to see that fewer relations

were present for compounds involving this preposition than for those that do not. NN compounds

seem to make use of all retained relations save POSSESSION.

134 The relative frequency is calculated by dividing the number of compounds of a given relation by the number of compounds that were labelled using a relation. Relations with a relative frequency greater than 0.05 (i.e. 5%) are in bold.

Table 6.5. Relative frequency of relations across compound types.

Relation        NN       N à N
COORDINATION    0.194    0.000
SIMILARITY      0.191    0.029
FUNCTION        0.136    0.000
LOCATION        0.098    0.098
PART            0.077    0.250
PURPOSE         0.074    0.312
HYPERNYMY       0.070    0.000
USE             0.052    0.163
COMPOSITION     0.033    0.018
SOURCE          0.024    0.022
PRODUCTION      0.019    0.062
TOPIC           0.014    0.000
CAUSE           0.012    0.018
TIME            0.005    0.007
POSSESSION      0.000    0.022
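
The relative frequency reported in Table 6.5 (the number of compounds assigned a given relation divided by the number of compounds assigned any relation) can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used for the thesis: the function names and the toy label list are hypothetical, and the `TABLE_6_5` dictionary simply transcribes the published values so that the 0.05 threshold can be applied to them.

```python
from collections import Counter

# Values transcribed from Table 6.5: relation -> (NN, N à N).
TABLE_6_5 = {
    "COORDINATION": (0.194, 0.000),
    "SIMILARITY": (0.191, 0.029),
    "FUNCTION": (0.136, 0.000),
    "LOCATION": (0.098, 0.098),
    "PART": (0.077, 0.250),
    "PURPOSE": (0.074, 0.312),
    "HYPERNYMY": (0.070, 0.000),
    "USE": (0.052, 0.163),
    "COMPOSITION": (0.033, 0.018),
    "SOURCE": (0.024, 0.022),
    "PRODUCTION": (0.019, 0.062),
    "TOPIC": (0.014, 0.000),
    "CAUSE": (0.012, 0.018),
    "TIME": (0.005, 0.007),
    "POSSESSION": (0.000, 0.022),
}


def relative_frequencies(labels):
    """Count per relation divided by the number of labelled compounds."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {relation: n / total for relation, n in counts.items()}


def prominent(column, threshold=0.05):
    """Relations in a Table 6.5 column whose frequency exceeds the threshold."""
    index = {"NN": 0, "N à N": 1}[column]
    return [rel for rel, freqs in TABLE_6_5.items() if freqs[index] > threshold]


# Hypothetical toy labels, NOT the thesis dataset.
toy_labels = ["PURPOSE", "PURPOSE", "PART", "USE"]
print(relative_frequencies(toy_labels))
print(prominent("N à N"))
```

Applied to the N à N column, `prominent` returns LOCATION, PART, PURPOSE, USE, and PRODUCTION. Note that these frequencies are relative to labelled compounds only, not to the full set of compounds examined.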

In the following chapter, I will attempt to incorporate some of the results and observations made

here into a typology of semantic transparency based on the features discussed earlier in Chapter

4. It should be noted, however, that the overall relative frequency of relations will not factor

heavily into the proposed typology. On the one hand, the core dataset is most likely too small to

truly reveal unambiguous distributional information about the retained relations, and on the

other, relational constraints imposed by a compound’s constituents are probably better

predictors of what associations may emerge at the level of meaning construal. The incorporation

of the relations will instead be based on the complexity of their semantics, along with SRI

values in instances where compounds share otherwise identical transparency profiles. These

aspects will be explored in greater detail in the first half of Chapter 7.

Chapter 7

Putting It All Together

In Chapters 4 and 5, I examined several different morphological and semantic properties of

compounds that I argue play a role in their degree of semantic transparency. Some of these

properties are variations on features discussed elsewhere in the literature on transparency, while

others, such as a compound’s semantic relation, have received less attention within this context.

Although most of these features were addressed individually, they all show a great deal of

interdependency, both in terms of the constraints they introduce and the ways in which they

combine to form a compound’s meaning. Consequently, if traditional models of semantic

transparency are to be expanded upon, these features must be assessed holistically. In this

chapter, I explore how these properties relate to each other and conclude with a re-examination

of the data collected in light of the proposed typology of semantic transparency.

7.1 Semantic Transparency: A Definition Revisited

At the conclusion of Chapter 2, I offered a working definition of semantic transparency,

repeated here for convenience:

(171) For a lexical unit C, semantic transparency refers to the degree of semantic

interpretability of C

Little has in fact changed since this definition was first proposed. The focus has instead been on

establishing a set of features that might allow for a better evaluation of the perceived challenges

involved in determining the meaning of a given compound. Consequently, several factors were

explored, all of which were assumed to play some part in transparency as defined above. Before

going into greater detail on how these features might relate to each other, I would like to briefly

emphasize a few key points about the definition in (171).

First, semantic transparency is to be understood as a function of the relationship between form

and meaning to the extent that this relationship may be inexistent or imperfect. In other words,

although the meaning of a compound is usually related to the meaning of its parts in non-trivial

ways, this relationship seldom succeeds in explicitly communicating all aspects of a

compound’s meaning. For instance, although the meaning of doghouse may be formalized as

‘house’ ⊕ PURPOSE ⊕ ‘dog’, this simple concatenation fails to highlight that a doghouse has a

certain shape and size, is usually located in a yard, doesn’t typically have a door, etc. This

additional information is not necessarily out of the speaker’s reach, however, given that

combining concepts also involves establishing compatibility between them. In doghouse, the

modifier imposes certain constraints on the head, which allows for greater denotational

specificity. Nevertheless, there exists a discrepancy between form and meaning, and it is

precisely this discrepancy that makes compounds good case studies for semantic transparency.

Conversely, this deviation is also why compounds pose significant challenges to any model of

the phenomenon. Moreover, when we consider that not all of a compound’s constituents may

factor into its meaning, the relationship becomes even more tenuous. This is not to say,

however, that a compound may not be transparent, but that transparency is a highly relative

concept.

Second, it bears repeating that any claim regarding transparency or opacity must largely ignore

the fact that it may vary from one speaker to the next. Semantic transparency is therefore a

speaker-dependent concept. Unfortunately, there is no way to account for this fact in a

generalized theory of transparency. For instance, if someone is unfamiliar with the words dog

and house, he or she is unlikely to understand the compound doghouse, which would render this

compound opaque to him or her; universally, however, this proposition is untenable. We

therefore cannot discuss semantic transparency with individual speakers in mind, but must

instead attempt to evaluate the concept with a sort of idiolectic agnosticism. In other words, a

theory of semantic transparency should be formulated according to an “ideal speaker-listener”

paradigm (Chomsky 1965). The only assumption we may make is that the speaker or listener is

familiar with the construction’s constituents.

Third, semantic transparency should be understood as a characteristic that applies to both

existing and novel compounds, though perhaps not in exactly the same way. In the case of novel

compounds, one might prefer Štekauer’s (2005) term “meaning predictability” as it emphasizes

that meaning for newly coined words should ideally be predictable. This is not to say that

predictability doesn’t apply to existing compounds, but the focus should rather be on how

“accessible” meaning is given that transparency is only relevant if the speaker doesn’t already

know the meaning of the compound. After all, a novel compound doesn’t have an established

meaning: the speaker coins a new compound with a particular designatum in mind, which is

typically related to the semantic representations of its constituents; meaning predictability is

therefore the likelihood that a novel combination AB will mean M given the meaning of A and

B. Existing compounds, on the other hand, already have an established meaning, which may or

may not be predictable. The question here is therefore “To what degree is the meaning of C

derivable from that of A and B?” Although the distinction may be subtle, it nevertheless remains

crucial.

Finally, the approach to transparency adopted in this work is primarily focused on the

interpretive process. This is not to say that transparency has no effect on the creation of novel

compounds or the use of existing ones, but rather that a construction’s degree of transparency is

largely a matter of the listener’s ability to establish meaning based on the information available,

which we may assume consists of two parts: the construction itself and the context within which

it was used. Crucially, the most transparent compound is one that does not require any context to

be understood. When a compound is first coined, it is ideally created so as to achieve maximal

transparency (cf. Grice’s maxims of quantity and manner, Grice 1975), but once the item is

established, the factors or conditions that originally motivated its creation may no longer be

obvious to the listener. Any processing costs associated with transparency are therefore incurred

at the level of interpretation.

If the premises above are in fact well-founded, then we may expand upon the definition

provided earlier by, on the one hand, elaborating on what transparency entails, and on the other,

incorporating the features explored in previous chapters. Semantic transparency may therefore

be defined as follows:

(172) For a lexical unit C composed of units A and B, for which the meaning(s) of A and B are

known, semantic transparency refers to the degree of semantic interpretability of C,

given

i. the headedness of C

ii. the compositionality of AB

iii. the nature of the relation held between A and B

iv. the semantic homogeneity of C-like constructions

The first three properties listed in (172) represent the minimum factors required to adequately

formalize semantic transparency for compounds. The homogeneity property in (172iv) is meant

to augment the relational features in (172iii).

The aim of any formal model is to remain as parsimonious as possible while providing the same

descriptive and explanatory value that another, more complex system might offer. In this regard,

the features and factors retained in this work represent an attempt at extending existing models

of semantic transparency without introducing a large number of distinct elements. While it is

entirely possible that future work on transparency may lend support for the incorporation of

additional factors, the definition provided above remains sufficiently expanded for the purposes

of developing a typology involving compounds.

7.2 Semantic Transparency: A First Pass

Chapter 3 looked at a pair of related features that have often been cited as crucial components to

a compound’s semantics, namely headedness and compositionality. Several refinements were

proposed, however, to better account for some of the variation observed in the data.

Consequently, additional factors such as head position and sense extension were incorporated

into previous models of semantic transparency. Together, these features allow for compounds to

be classified in a manner that reflects the challenges they pose at the level of interpretation.

7.2.1 Primary Factors

It was argued that the most crucial factor in a compound’s semantic transparency is its

centricity: an endocentric compound is more transparent than an exocentric one. This is largely

due to the fact that endocentric compounds offer a starting point for processing multi-word

lexemes by providing an answer to the question “what is it?” An endocentric compound

typically supplies this information by way of its semantic head, which is a hypernym of the

compound. Exocentrics do not usually provide this information and in instances where they do,

it is usually by means of a metonymy that does little to help the hearer in determining the nature

of the designatum (e.g. razorback = wild hog). Favouring endocentric compounds in a typology

of transparency is also, to some extent, supported by the data: the majority of the compounds

examined are in fact endocentric, which is to be expected if compounds are, at the most basic

level, meant to communicate meaning in sometimes contextually impoverished conditions.

In the case of headed compounds, it was also argued that the position of the head is a major

factor of transparency, where we may distinguish between canonical and non-canonical

positions. Again, the distinction rests on the assumption that non-canonical heads will pose

greater challenges at the level of interpretation. Moreover, the head may be subject to sense

extension, such as metaphor or metonymy, which will have an effect on how easily the

compound’s designatum may be identified. The result is a compound that is either strongly or

weakly endocentric.

Compositionality is determined according to individual components’ meaning in relation to that

of the whole. In other words, a compound is fully compositional if both constituents retain their

meaning. For semantically headed compounds, this feature is based on the meaning contribution

of the non-head element, which may undergo sense extension, thus reducing its degree of

compositionality. Although endocentric compounds cannot be non-compositional because the

head contributes meaning to the whole, exocentric compounds may go from fully compositional

to non-compositional, depending on the retention of meaning by their constituents.

These features may be combined into a hierarchic tree in which each terminal node represents a

particular combination of properties. The resulting distribution is presented in its entirety in

Figure 7.1.

Figure 7.1. A typology of semantic transparency of compounds.

Compound
    Endocentric
        Canonical Head
            Strongly Endocentric: Fully Compositional, Weakly Compositional, Partially Compositional
            Weakly Endocentric: Fully Compositional, Weakly Compositional, Partially Compositional
        Non-Canonical Head
            Strongly Endocentric: Fully Compositional, Weakly Compositional, Partially Compositional
            Weakly Endocentric: Fully Compositional, Weakly Compositional, Partially Compositional
    Exocentric: Fully Compositional, Weakly Compositional, Partially Compositional, Non-Compositional

Lower section: Semantic Relations; Semantic Reliability Index (SRI)
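The count of terminal nodes in the tree can be checked mechanically. The following is a minimal sketch in Python, not part of the original analysis: the function name and the tuple layout are illustrative assumptions, and the only facts encoded are the branches of Figure 7.1 and the observation that endocentric compounds cannot be non-compositional.

```python
from itertools import product

# The four compositionality values named in the typology.
COMPOSITIONALITY = ("full", "weak", "partial", "non")


def terminal_nodes():
    """Enumerate the terminal nodes of the tree in Figure 7.1 (hypothetical helper)."""
    nodes = []
    # Endocentric branch: head position x strength of centricity x compositionality.
    # Endocentric compounds cannot be non-compositional, so "non" is left out.
    for head, strength, comp in product(
        ("canonical", "non-canonical"),
        ("strong", "weak"),
        COMPOSITIONALITY[:3],
    ):
        nodes.append(("endocentric", head, strength, comp))
    # Exocentric branch: there is no semantic head, so only compositionality varies.
    for comp in COMPOSITIONALITY:
        nodes.append(("exocentric", None, None, comp))
    return nodes


nodes = terminal_nodes()
print(len(nodes))
```

The enumeration yields twelve endocentric and four exocentric combinations. It says nothing about which combinations are actually attested in the data.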

The proposed hierarchy contains 16 possible permutations, but as was discussed in Chapter 4,

not all of these permutations are necessarily possible. No French compounds, for instance, were

found that involved an established trope on both the head and the modifier,¹³⁵ thus potentially

pointing to limitations imposed on weakly endocentric compounds. This is also true for

non-canonically headed compounds. It is unclear if this is a language-dependent constraint or if other

factors related to French are responsible for these restrictions. Every other combination,

however, is attested, albeit with varying degrees of frequency. The following table shows the

distribution of compounds for each of the attested permutations in Figure 7.1.

Table 7.1. Distribution of features for the French compounds collected from Wiktionary.

Endo.   Canonical Head   Strong Centricity   Compositionality¹³⁶   # of Items (NN)   # of Items (N à N)
+       +                +                   Full                  491               240
+       +                +                   Weak                  79                25
+       +                +                   Partial               17                0
+       +                −                   Full                  8                 16
+       −                +                   Full                  61                ---
+       −                +                   Weak                  2                 ---
+       −                +                   Partial               9                 ---
+       −                −                   Full                  1                 ---
−       NA               NA                  Full                  31                20
−       NA               NA                  Weak                  7                 13
−       NA               NA                  Partial               8                 1
−       NA               NA                  Non                   17                4
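The proportions implied by Table 7.1 can be tallied directly. The sketch below is illustrative only: the row encoding and the helper name are assumptions, and the "---" cells are treated as contributing zero items, which is an interpretive choice rather than something stated in the table.

```python
# Rows of Table 7.1: (endocentric, canonical head, strong centricity,
# compositionality, NN count, N à N count).
# None stands for "NA" in the feature columns and for "---" in the N à N column.
TABLE_7_1 = [
    (True, True, True, "full", 491, 240),
    (True, True, True, "weak", 79, 25),
    (True, True, True, "partial", 17, 0),
    (True, True, False, "full", 8, 16),
    (True, False, True, "full", 61, None),
    (True, False, True, "weak", 2, None),
    (True, False, True, "partial", 9, None),
    (True, False, False, "full", 1, None),
    (False, None, None, "full", 31, 20),
    (False, None, None, "weak", 7, 13),
    (False, None, None, "partial", 8, 1),
    (False, None, None, "non", 17, 4),
]


def items(row):
    """Total items for a row; a '---' cell (None) is assumed to count as zero."""
    return row[4] + (row[5] or 0)


total = sum(items(r) for r in TABLE_7_1)
endocentric = sum(items(r) for r in TABLE_7_1 if r[0])
# Canonically headed, strongly endocentric, fully compositional compounds.
most_transparent = sum(
    items(r) for r in TABLE_7_1 if r[:4] == (True, True, True, "full")
)

print(total, endocentric, most_transparent)
print(f"{endocentric / total:.1%}", f"{most_transparent / total:.1%}")
```

Under these assumptions, endocentric items make up roughly nine tenths of the tabulated compounds, and the canonically headed, strongly endocentric, fully compositional type alone accounts for more than two thirds.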

One will notice that compounds generally favour combinations that provide as much semantic

content as possible. The vast majority of items in the data are not only endocentric, but also

canonically headed with modifiers that don’t involve tropes. Based on these observations, we

may state that compounds favour transparency over opacity. If this is the case, how can we

further distinguish between what amounts to the majority of compounds? The lower section of

Figure 7.1 offers insight into additional means of classification, namely by using semantic

relations and the semantic reliability index to differentiate between compounds that might

otherwise possess the same degree of transparency.

¹³⁵ It should be noted that tropes may apply to both constituents for exocentric compounds, similar to Benczes’s (2006) findings for English.

¹³⁶ While the other features are all binary, compositionality may have one of four values: fully compositional, weakly compositional, partially compositional, and non-compositional.

7.2.2 Semantic Relations

While the meaning of a compound’s constituents is paramount to determining just how

transparent it is, the nature of the unexpressed semantic relation that connects its elements is

arguably just as important. In Chapter 5, I proposed a set of 15 basic relations based on a survey

of previous research on the subject. In Chapter 6, following a close examination of the

Wiktionary data, I discussed three additional associations (i.e. argument, adjective, and

classification). The focus was on determining how frequent some of these relations were, as well

as whether they might feasibly be applied across a large set of disparate compounds. The results

of the analysis showed that the majority of compounds do indeed make use of a relatively

restricted number of semantic associations, which suggests that speakers may be sensitive to this

information during processing. Psycholinguistic research lends support to this position (see

Chapter 5 for relevant discussion).

It should be noted at the outset that the semantic relations that hold between a compound’s

elements are directly tied to the degree of compositionality: the less compositional a

compound is, the less likely it is to involve a relation. In fact, this correlation is quite probably

discrete. In other words, only fully compositional and weakly compositional compounds may

make use of semantic relations to link their elements, which means that semantic relations may

only factor into a compound’s semantic transparency if it is compositional. Thus, the discussion

that follows is only relevant for a subset of the items in the typology, though this subset does

account for approximately 90% of the data collected.

7.2.2.1 Relation Types

The relations I proposed largely consist of frequent and recurring predicates used to join a

compound’s constituents (i.e. X causes Y, X is part of Y, X is a type of Y, etc.). These relations,

however, are not all functionally identical: they differ according to the nature of the relationship

held between elements. Wisniewski (1996) distinguishes between three types of relations in his

approach to compound processing, which is to say that speakers link concepts via either

relational predication, property mapping, or hybrid combination. Costello and Keane (2000)

argue that there are in fact five such “interpretation types”: relational, property, hybrid,

conjunctive, and known-concept. A more widely used typology of compound types, however,

comes from Scalise and Bisetto (2009)—first proposed in Bisetto and Scalise (2005)—who

group together compounds according to three basic associations: a constituent may either

depend on the other (subordinate) or it may qualify it (attributive), or both constituents may

share equal status within the compound (coordinate).¹³⁷ If we consider this particular typology,

we may distribute the 15 relations I proposed in Chapter 5 (along with the non-conceptual

relations discussed in Chapter 6) as follows:

(173) a. Subordinate: PRODUCTION, CAUSE, PART, COMPOSITION, SOURCE, LOCATION,

POSSESSION, TIME, TOPIC, PURPOSE, USE (also argument)

b. Attributive: SIMILARITY (also adjectival, classificatory)

c. Coordinate: COORDINATION; HYPERNYMY, FUNCTION

As we can see, most of the relations identified are subordinate in nature as they involve

constituents in a complement relation (see Scalise and Bisetto 2009 for a more detailed look at

their typology). Only SIMILARITY is attributive as it describes constituents that relate to each

other on the basis of property mapping (i.e. X is similar to Y based on property W). Also

attributive in nature are NN compounds in which one element functions as an adjective (e.g.

maître-cylindre, chef-lieu, expert-comptable). We may add that classificatory compounds, such

as particule bêta and mâle alpha, also involve an attributive association, but these types are not

as crucial given that they are not compositional (see Chapter 6, Section 6.1.2.3 for a brief

discussion of these cases). As for coordinate compounds, three relations retained in this work

correspond to this particular class, most prominently the COORDINATION relation. We may say

that this relation represents the most prototypical coordinative relation. HYPERNYMY and

FUNCTION, on the other hand, might be considered sub-types of coordinate compounds as they

allow for copulative expansion (this will be discussed in a moment).

¹³⁷ It is worth noting that these three categories correspond exactly to those found by Wisniewski (1996) during his experiments: subordinate = relational, attributive = property mapping, coordinate = hybridization.
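The distribution in (173) amounts to a simple lookup from relation to top-level class. The following Python sketch restates it for illustration; the dictionary and function names are assumptions, and the non-conceptual associations (argument, adjectival, classificatory) are deliberately left out so that only the 15 relations from Chapter 5 are counted.

```python
# Top-level classes from (173), restricted to the 15 relations of Chapter 5.
RELATION_CLASSES = {
    "subordinate": [
        "PRODUCTION", "CAUSE", "PART", "COMPOSITION", "SOURCE", "LOCATION",
        "POSSESSION", "TIME", "TOPIC", "PURPOSE", "USE",
    ],
    "attributive": ["SIMILARITY"],
    "coordinate": ["COORDINATION", "HYPERNYMY", "FUNCTION"],
}

# Inverted index: relation name -> class name.
CLASS_OF = {
    relation: cls
    for cls, relations in RELATION_CLASSES.items()
    for relation in relations
}


def class_of(relation):
    """Return the top-level class of a relation (case-insensitive lookup)."""
    return CLASS_OF[relation.upper()]


print(len(CLASS_OF), class_of("similarity"), class_of("function"))
```

The lookup makes the imbalance visible at a glance: eleven of the fifteen relations fall under the subordinate class.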

No matter the number of relations identified, one must ask whether the nature of the relationship

between constituents has an effect on a compound’s degree of transparency. According to Bell

and Schäfer (2013), a compound whose designatum is an intersection of its constituents

represents the most fundamental case and might consequently represent the most transparent

instance of compounding:

“The most basic configuration possible would be one where A and B retain their original meaning, and the relationship is set to identity. That is, the property expressed by A and by B hold of the very same entity, and the semantics is thus intersective. These combinations might be regarded as the most transparent AB combinations. Classic examples result from the combination of Kamp’s (1975) predicative adjectives with a nominal head, e.g. fourlegged animal.” (Bell and Schäfer 2013: 3).

Although the type of relationship described by Bell and Schäfer coincides primarily with the

adjectival relation observed for AN and NA constructions, certain attributive NN compounds

would also be subsumed under this class (e.g. grandeur nature, maître-cylindre). These types

might therefore be considered instances of highly transparent associations. While compounds

involving a classificatory relation are also largely attributive in nature, their intersective

meaning representation is eclipsed by the fact that they make use of nouns that have no

designatum on their own (e.g. particule bêta, where bêta does not signify anything). This means

that, even if they are considered highly transparent, they fall under a less transparent category in

the typology (i.e. partially compositional). SIMILARITY, on the other hand, is not a likely

candidate for high transparency, this despite its inclusion under the attributive class in Bisetto

and Scalise’s (2005) typology. The fact is that, while SIMILARITY does differ from subordinate

relations, it is only partially intersective. A tigershark, for instance, is a shark with some of a

tiger’s features: only a subset of a tiger’s semantic representation is included in the meaning of

the compound (i.e. a tigershark is not the set of sharks that are tigers). This contrasts with other

attributive compounds where the non-head constituent is typically included in its entirety in the

meaning of the whole (cf. maître-cylindre). Moreover, SIMILARITY is a highly malleable and

multi-faceted relation. In some instances, a compound involving this relation may target

physical similarities (e.g. chou-fleur), functional similarities (e.g. magasin phare), behavioural

similarities (e.g. fourmi-lion), etc. This wide range of possible associations is also further

complicated when we take into account the fact that these compounds often rely on metaphor to

establish the relation between their constituents. A compound based on the SIMILARITY relation

is thus inherently more complex than other attributive types.

Also analogous to Bell and Schäfer’s concept of intersective semantics are the COORDINATION

and HYPERNYMY relations. Compounds involving these relations make complete use of the

semantic representation of their elements and are thus, as I’ve said previously, equative. While

COORDINATION is intersective, HYPERNYMY is inclusive, as shown in the following diagrams:

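Figure 7.2. Difference between coordinated and hypernymic compounds.

[Diagrams: COORDINATION as two overlapping sets, singer and songwriter; HYPERNYMY as the set oak contained within the set tree.]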

A case can be made to consider these types as instances of Bell and Schäfer’s most transparent

class as, according to their approach, the relation is set to identity (i.e. BE). This may be shown

using the following copula constructions:

(174) a. COORDINATION a singer that is a songwriter / a songwriter that is a singer

b. HYPERNYMY an oak is a tree / the tree is an oak

We may also add compounds involving the FUNCTION relation to the list of intersective

combinations as they usually invoke an identity interpretation as well: compounds such as

mémoire tampon (buffer memory) or bateau lavoir (wash-shed), although not strictly speaking

coordinated compounds, may nevertheless be paraphrased using a copula (i.e. memory that is a

buffer) because a functionality reading can, under most circumstances, reclassify an object (i.e.

a rock used as a paperweight is a paperweight).

Are compounds based on the semantic intersection of their constituents really the most

transparent? Although there undoubtedly exist arguments in favour of a number of different

analyses, I propose that both the ARGUMENT and PURPOSE relations be ranked higher than those

discussed above with regard to their global transparency effects.

The chief argument in favour of this position is that compounds for which constituents are in a

head-argument relationship require far less guesswork in how they should be interpreted: in

standard synthetic compounds, the modifier typically fills the first internal argument slot of the

head. Consequently, compounds such as groupement phosphate (phosphate grouping) and

administrateur réseau (network administrator), both instances of the ARGUMENT relation, are

semantically complete because relational information is present in the head constituent. In other

words, the predicate linking the elements is explicit. This is in stark contrast with most other

types of compounds where determining what relation might hold between their elements is not

immediately apparent. Only prototypical attributive compounds (i.e. those involving adjectives

or nouns with adjectival functionality) might allow for similar arguments to be made, although

the locus of the relation typically lies with the non-head element. Even coordinate compounds,

which are clearly intersective, still require that a semantic relation be established, namely one of

identity. This relation does not emerge from either constituent, but instead surfaces once they

are combined. In other words, nothing in the meaning representations of either singer or

songwriter indicates that a coordination of elements will emerge when they are combined; this

information is instead gleaned from the combination itself (i.e. they are co-hyponyms, they

share semantic features, etc.).

An interesting corollary to treating argument based compounds as the most transparent type is

that traditional French synthetic compounds, such as ouvre-bouteille and lave-vaisselle, are also

granted a great deal of transparency. This consequence seems entirely tenable given the fact that

this type of compound is both productive and frequent,¹³⁸ and poses few comprehension

challenges despite the fact that it possesses a zero-affix head (‘a V-N is an artefact that does V

to N’). If one adheres to a strict semantic definition of centricity, these compounds typically fail

the IS-A test, which might lead to an exocentric treatment,¹³⁹ but the nature of the relation

would still ensure that they are granted the highest degree of semantic transparency within their

class. Exactly how they would be featured in the typology proposed here, however, is a matter

of future research.

¹³⁸ In the original data retained from Wiktionary, there are 885 V-N compounds.

¹³⁹ This is not to say that V-N compounds should in fact be treated as exocentric, as the phonologically unrealized head is functionally similar to the -er affix in English synthetics (Lieber 1992).

Similarly, some purposive compounds are also semantically complete, although to a lesser

degree than typical synthetic compounds. When compounds involving the PURPOSE relation

target the head constituent’s proper function (PF), the exact nature of the relation is available via

the head. The difference between these types of compounds and synthetic compounds is that in

the latter the non-head element fills an argument slot of the head, while in the former, the

non-head is an argument of an unexpressed verbal predicate. Although synthetic compounds nearly

always allow for a paraphrase using of/de (as in 175a; cf. Jackendoff 2010), purposive

constructions typically use for/pour (as in 175b). Another distinction is that the verbal

predicate may be morphologically distinct from the head for purposive compounds—this is not

generally the case for true synthetics. The following examples from English illustrate these

differences:

(175) a. deverbal head:   truck driver    driver of truck / person who drives a truck

                          snow removal    removal of snow / act of removing snow

      b. purpose (PF):    fish bowl       bowl ?of/for fish (bowl that holds fish)

                          bread knife     knife *of/for bread (knife that cuts bread)

Despite the additional step required to establish how purpose links together the constituents, we

may argue that compounds involving the head’s proper function might prove easier to interpret

by speakers than other types. Compounds such as abri-vent, passage piétons and timbre-poste

are therefore closer to synthetic compounds in terms of transparency than compounds involving

other predicating relations.

In summary, the relations discussed above can be ordered as follows, from most transparent to

least transparent:

(176) Relational Transparency Hierarchy:

argument > purpose (PF) > adjectival > intersective > similarity
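The hierarchy in (176) is an ordinal ranking, so it can be applied to any list of compounds annotated with one of its five relation types. The Python sketch below is illustrative only: the function names are assumptions, and the annotations attached to the French examples follow the classifications given in the discussion above (mémoire tampon is a FUNCTION compound, which is grouped here with the intersective types).

```python
# Relational Transparency Hierarchy from (176), most transparent first.
HIERARCHY = ("argument", "purpose (PF)", "adjectival", "intersective", "similarity")
RANK = {relation: position for position, relation in enumerate(HIERARCHY)}


def more_transparent(a, b):
    """True if relation type a outranks relation type b in (176)."""
    return RANK[a] < RANK[b]


def sort_by_transparency(compounds):
    """Order (compound, relation type) pairs from most to least transparent."""
    return sorted(compounds, key=lambda item: RANK[item[1]])


EXAMPLES = [
    ("chou-fleur", "similarity"),
    ("mémoire tampon", "intersective"),
    ("administrateur réseau", "argument"),
    ("maître-cylindre", "adjectival"),
    ("timbre-poste", "purpose (PF)"),
]

ranked = sort_by_transparency(EXAMPLES)
print([name for name, _ in ranked])
```

The ranking is purely ordinal: it orders relation types relative to one another but assigns no numerical degree of transparency to any of them.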

7.2.2.2 Ordering Subordinate Relations

What of the remaining subordinate relations? How might they be ordered? At the heart of this

question is how these relations might be said to emerge in the first place. Some authors argue

that compound relations are entirely a matter of pragmatics, which certainly explains the

presence of less frequent or even idiosyncratic relations (Downing 1977, Bell and Schäfer

2013). While it is true that some novel compounds are indeed coined relative to the context of

their utterance (see Downing’s 1977 bike-girl example), many come into being simply because

the thing they are meant to denote is best described using a particular pair (or more) of words.

As Bolinger (1975) says (cited in Downing 1977):

“Words are not coined in order to extract the meanings of their elements and compile a new meaning from them. The new meaning is there FIRST, and the coiner is looking for the best way to express it without going to too much trouble.” (Bolinger 1975: 109)

Thus, the relation that binds a pair of words, while not entirely pre-determined, may be

surprisingly predictable. Wisniewski’s (1996, 1997) tests show that when two lexemes are

combined, the relation that is most likely to arise depends on how the hearer perceives their

relatedness. In other words, highly similar items will favour a property or hybrid reading (e.g.

zebra fish = animal-animal, resemblance), while different items will favour a relational reading

(e.g. bee sting = animal-act, cause). Although this approach allows for predictions to be made

regarding the emergence of top-level relations (i.e. subordinate, attributive, or coordinate), it

does not tell us exactly what sub-type might be actualized.

Another problem is that conceptual relatedness may not in fact be a very good predictor of

compound type. Experiments by Estes and Glucksberg (2000), for instance, show that otherwise

incompatible lexemes may result in a property interpretation if the modifier possesses a highly

salient feature capable of filling a relevant dimension in the head constituent. Their tests showed

that speakers favoured a property relation for feather luggage (i.e. ‘luggage that weighs very

little’), because “light” is a salient feature of feather and the weight of luggage is considered highly

relevant. These results contrast with a combination such as feather storage, which they found

was far less likely to elicit a property reading from speakers because weight is not a relevant

feature of storage. According to Estes and Glucksberg, relational information depends on

features of both the head and the modifier, with one being incorporated into the other’s frame.

This model, often called a schema or slot-filling theory, is widely adopted and is at the heart of

several theories of conceptual combinations (among others, Wisniewski 1996, Estes and

Glucksberg 2000, Baroni et al. 2007). We may also draw parallels between this approach and

other models in combinatorial semantics (cf. Pustejovsky 1995).

One question that arises is whether the locus of relational information is in the head or the

modifier. The standard approach is for the modifier to fill a relevant slot in the head constituent

(cf. feather luggage discussed above). Even in instances where the association between elements

is subordinative, if the relation may be derived from one of the elements, it most likely

originates from the head, similar to what was discussed earlier regarding proper functions (e.g.

fish bowl; a bowl is meant to hold something → a bowl that holds fish). On the other hand, some

have argued that relational information resides instead in the modifier. Gagné (2001), for

instance, argues that “relations are associated with the modifier’s representation, rather than

existing as independent structures” (247). This approach is based on research by Gagné and

Shoben (1997) that showed that the interpretation of a compound is facilitated by how

frequently a particular relation is used with the modifier. Speakers found compounds involving

the lexeme mountain far easier to interpret when they made use of a relation frequently used

with that lexeme (LOCATION: mountain cloud) than when they involved an infrequent relation

(TOPIC: mountain magazine). They call the set of a modifier’s relations its relational

distribution.

The examples provided by Gagné (2001), however, only serve to underline how the source of

relational information is perhaps not so static. In mountain cloud, location may indeed stem

from the modifier, but in mountain magazine, topic most likely originates from the head.

Another aspect that challenges Gagné’s approach is that it requires that many lexemes possess

an almost infinite number of relations. Estes and Jones (2006), who argue that “relations

constitute representational structures in and of themselves” (90), offer the following examples to

illustrate just how various the relations stemming from the concept bear in modifier position

would need to be:

(177) bear paw (part/whole), bear scare (causal), bear season (temporal), bear toy

(possessive), bear tracks (from), bear cave (habitat), bear cub (subtype), bear family

(of), bear story (about), bear playground (for)
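The point made by the examples in (177) can be restated by grouping compounds by their modifier and collecting the relations each modifier would have to store. The Python sketch below is a hypothetical illustration: the function name is an assumption, the relation labels are those given in (177) and in the mountain examples cited earlier, and it records only which relations are attested, not how often, since no frequencies are cited here.

```python
from collections import defaultdict

# (compound, relation label) pairs taken from (177) and the mountain examples.
EXAMPLES = [
    ("bear paw", "part/whole"),
    ("bear scare", "causal"),
    ("bear season", "temporal"),
    ("bear toy", "possessive"),
    ("bear tracks", "from"),
    ("bear cave", "habitat"),
    ("bear cub", "subtype"),
    ("bear family", "of"),
    ("bear story", "about"),
    ("bear playground", "for"),
    ("mountain cloud", "LOCATION"),
    ("mountain magazine", "TOPIC"),
]


def relations_by_modifier(examples):
    """Collect the set of relations attested with each modifier (the first word)."""
    attested = defaultdict(set)
    for compound, relation in examples:
        modifier = compound.split()[0]
        attested[modifier].add(relation)
    return dict(attested)


attested = relations_by_modifier(EXAMPLES)
print({modifier: len(relations) for modifier, relations in attested.items()})
```

Even this small sample requires ten distinct relations for bear alone, which is the difficulty that a modifier-based account would have to address.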

Although Estes and Jones do not believe that relations are part of individual lexemes’ semantic

or conceptual representation, their examples lend support to an approach that favours the head

over the modifier as the locus of the instantiated relations. This is not to say, however, that the

modifier plays no role in the validation of such relations, as certain lexemes in modifier position

will influence the type of relation available. Materials or substances, for instance, will favour a

composition reading when in modifier position (e.g. copper tube, rubber duck, paper bag, etc.).

Unfortunately, even this remains a tendency and not an immutable fact as compounds with

identical material or substance-like modifiers may differ in their use of relations (e.g. milk

chocolate: PART, milk glass: PURPOSE).

The following sections examine ways in which we might be able to rank subordinate relations

according to their degree of semantic transparency.

7.2.2.3 Source of the Relation

Given the previous discussion on the source of semantic relations within compounds, we might

wish to draw a hard line between modifier and head as factors in transparency. According to

Baroni et al. (2007), the head of a compound functions as the base of meaning composition if

the compound is relational, whereas the modifier fulfils this function for attributive compounds.

Intuitively, adjectival compounds do seem to depend on the modifier as the source of the

association—such is the nature of determination. The SIMILARITY relation also seems to function

in this manner, as a salient property of the modifier is applied to the head (e.g. a zebra fish is a

fish with stripes, stripes being a zebra’s most salient feature). In this way, the relation may be

said to originate in the modifier. As for subordinate relations, which are relational in Baroni et

al.’s framework, we might ask if the head is indeed a reliable predictor of compound type.

Unfortunately, there is no shortage of compounds that challenge this approach to the source of a

compound’s relation. Consider, for instance, the following examples:

(178) a. stylo-bille ‘Stylo dont la plume est remplacée par une fine bille de métal’

Relation: PART Source: bille? (modifier)

b. chêne kermès ‘chêne qui abrite les kermès’

Relation: LOCATION Source: chêne? (head)

c. bouton-pression ‘bouton engagé à l’aide de pression’

Relation: USE Source: ?

In (178a), the PART relation may originate from the modifier, bille, but only if we understand it

as a component object. Although this is not entirely implausible, it does require that the meaning

representation of bille be either extended or underspecified. Similarly, in (178b), we may trace

the LOCATION relation to the head constituent, chêne, but again, the lexeme’s semantic

representation does not explicitly condition this association. This is further evidenced by the fact

that the three other instances of plant-animal compounds in the data do not involve a locative

relation (e.g. chou-vache, fougère aigle, menthe-coq). If we turn to bouton-pression in (178c),

we find a case where neither constituent may be said to bring about the USE relation. Such

compounds, while perhaps not representative, are certainly not exceptional: there are many

instances where the relation, when seen from the perspective of the whole, is entirely

reasonable, but when viewed from the point of view of the constituents, is not apparent without

significant coercion (e.g. code-barres, COMPOSITION; danse-poteau, USE; train-fantôme140,

LOCATION, etc.). Such cases suggest that relations may occasionally originate from the

combination itself and not from any individual constituent.

That said, many compounds do in fact make use of a relation associated with one of their

constituents. The compounds below in (179) may be contrasted with those in (178a-b) above:

(179) a. montre-bracelet PART source: bracelet (modifier)

b. sauce tomate SOURCE source: sauce (head)

In both cases, the relation in question seems to originate from one of the constituents. What we

notice, however, is that the direction of a relation’s application is largely a function of its source.

In other words, reversibility, as it was defined in Chapter 6, is tied to either the head or the

modifier’s role in producing the association. For instance, for locative compounds, the pattern

“X in/on/at Y” is generated when the non-head functions as the location, while the pattern “X

that Y is in/on/at” is the result of a locative head:

(180) a. manchot antipode ‘manchot situé aux antipodes’ source: modifier

b. poche-revolver ‘poche dans laquelle est situé un revolver’ source: head

140 “Train installé dans les parcs d’attractions [. . .] dans lequel les visiteurs viennent se faire peur avec [. . .] des apparitions lugubres.” (<http://fr.wiktionary.org/wiki/train_fant%C3%B4me>)

If the origin of the relation cannot always be predicted based on the semantics of either the

modifier or the head, might we instead look to reversibility as a possible means to further

differentiate between subordinate relations?

7.2.2.4 Reversibility

During the presentation of the semantic relations retained for this work, I discussed the

possibility that some relations be applied according to either a basic template (H REL M) or a

reversed template (H that M REL). It is therefore worth exploring if splitting subordinate

relations according to reversibility might allow for their classification in terms of transparency

effects. In other words, is the active use of a predicate more transparent than a passive one (or

vice versa)? Compare, for instance, the following pairs of compounds:

(181) a. piétin-échaudage = H CAUSES M       arrêt maladie = H CAUSED BY M

      b. appareil-photo = H PRODUCES M       café-filtre = H PRODUCED BY M

      c. bouton-pression = H USES M          code machine = H USED BY M

While it may seem appealing to argue that the compounds in the first column are more

transparent than those in the second column, by virtue of their thematic relations, there is in fact

little evidence to support this position. Indeed, this approach is difficult to

adopt for a number of reasons.

First, although in some instances, a compound’s relation might be ambiguous with regard to its

direction of application (e.g. it is feasible that arrêt maladie might be understood as ‘arrêt qui

cause une maladie’), most cases are largely unambiguous, regardless of their application.

Second, there is no evidence to suggest that a reversed template, while perhaps incurring

additional processing costs, necessarily reduces a compound’s interpretability from the

perspective of the speaker. In other words, is beeswax (‘wax made by bees’) more opaque than

sweat gland (‘gland that makes sweat’) simply because its head is the theme and not the agent?

This question brings us to a third point, which is that some relations clearly favour an indirect

application. PART, for instance, is predominantly applied as H that M is a part of; it is therefore

unlikely that this template is in fact the more opaque of the two available for this relation.

Furthermore, some relations don’t seem to favour either template: LOCATION, for example, is

nearly evenly split between its basic template and its reversed form. Given these facts, the most

prudent argument to make regarding reversibility is that it may affect transparency in a relative

manner, according to the relation involved. In other words, the reversed PART template for NN

and N à N compounds is more transparent than its basic form, but no such distinction may

be made for the LOCATION relation. While this is useful for comparing compounds that make use

of the same semantic relation, it does little, however, to evaluate compounds across different

relational types.

7.2.2.5 Frequency

The relative frequency of certain relations might also be considered a likely indicator of

transparency: compounds that make use of highly frequent relations could be more transparent

than those involving low frequency relations. The distribution of relations across the data was

provided in Chapter 6. The results for unranked subordinate relations are repeated below:

Table 7.2. Number of compounds in the data for each subordinate relation.

Relation      NN    N à N
LOCATION      56    27
PART          47    69
USE           32    45
SOURCE        12     6
PRODUCTION    13    18
TOPIC          8     0
CAUSE          7     5
TIME           3     2
POSSESSION     0     6
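To make the frequency-based approach concrete, the following Python sketch converts the counts in Table 7.2 into proportions and ranks the relations for each compound type. It is an illustration only and not part of the original analysis: the names TABLE_7_2, relation_shares and ranked are hypothetical, and the proportions are relative to the subordinate relations listed in the table, not to the full dataset.

```python
# Counts copied from Table 7.2: relation -> (NN, N à N).
TABLE_7_2 = {
    "LOCATION": (56, 27),
    "PART": (47, 69),
    "USE": (32, 45),
    "SOURCE": (12, 6),
    "PRODUCTION": (13, 18),
    "TOPIC": (8, 0),
    "CAUSE": (7, 5),
    "TIME": (3, 2),
    "POSSESSION": (0, 6),
}


def relation_shares(column):
    """Return each relation's share of the counts in one column of Table 7.2.

    column: 0 for NN compounds, 1 for N à N compounds.
    """
    total = sum(counts[column] for counts in TABLE_7_2.values())
    return {rel: counts[column] / total for rel, counts in TABLE_7_2.items()}


def ranked(column):
    """Relations sorted from most to least frequent for the given column."""
    shares = relation_shares(column)
    return sorted(shares, key=shares.get, reverse=True)


if __name__ == "__main__":
    for label, column in (("NN", 0), ("N à N", 1)):
        shares = relation_shares(column)
        for rel in ranked(column):
            print(f"{label:6} {rel:11} {shares[rel]:.3f}")
```

On these counts, LOCATION ranks first for NN compounds and PART ranks first for N à N compounds.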

Although this approach is appealing for a number of reasons, namely the ease with which it can

be incorporated into a transparency model, it unfortunately fails to take into account that not all

noun pairs will allow for any and all of these relations. While the head may govern the

emergence of a given relation, the modifier must fulfill the requirements of the relation as it

pertains to the head. Thus, some compounds are unlikely to make use of highly frequent

relations if they are incompatible with the semantic features of their constituents. It is unlikely,

for instance, that a compound headed by an animate being would involve the topic relation.

Conversely, compounds headed by items with informational content are likely to involve this

relation (e.g. history book, news magazine, horror story, etc.).

This is in fact where the SRI discussed in Chapter 4 proves most useful. By calculating the

distribution of relations for various templates, we may then compare relations and compounds

based on their relative pertinence for a given compound. In other words, compounds involving

two different relations may be equally transparent if these relations are equally pertinent for

their respective heads. Conversely, compounds that make use of the same relation may possess

different degrees of transparency if that relation is more frequent for one than the other. To

illustrate this point, the following table contains 3 different compounds that make use of the

PART relation, each of which differs according to their calculated SRI141:

What the table shows is that a relation may carry more weight for certain compounds than for

others. Although PART is a frequent relation overall, its relative importance varies according to

the head noun. We may hypothesize that compounds that make use of PART alongside the noun

voiture may be more transparent than those involving the noun bloc. This is especially true in

cases where a template clearly favours a particular relation even if that relation is not frequent

across all compound types142. For instance, compounds involving sauce as their head noun (e.g.

sauce tomate, sauce soja, sauce arachide, etc.) make extensive use of the SOURCE relation,

which is evidenced by its high SRI (0.500 over 14 compounds). These numbers contrast with

SOURCE’s overall low frequency (14 occurrences across 729 NN compounds, or approximately

2%). The presence of the SOURCE relation is no doubt governed, to some degree, by the nature

of the head (i.e. a sauce is made from something) and the ability of the modifier to fill its

corresponding source/composition slot. It would therefore be unwise to state that a compound

141 The SRI is calculated by dividing the number of compounds for a given template (i.e. template N-X) that involve the same semantic relation by the total number of compounds for that template. In the table above, the SRI was calculated using compounds listed under the head lexeme’s entry in LPR2010, in Arnaud (2003), as well as in the data originally collected from Wiktionary. The retained templates were selected based on the total number of items from all three sources.

142 Given the low number of templates identified in the Wiktionary data, it must be noted that some compounds may not allow for this sort of analysis. It is presumed that a large dataset is required for the SRI to be truly effective.

Compound        Relation                                  # of types of RELATION   # of types of N-X   SRI
bateau-pompe    PART REVERSED (H that has M as a part)    3                        13                  0.231
bloc-ressort    PART REVERSED (H that has M as a part)    2                        11                  0.182
voiture-radio   PART REVERSED (H that has M as a part)    5                        13                  0.385

such as sauce tomate has a low semantic transparency rating by virtue of the SOURCE relation’s

low frequency across other compound types. Consequently, the evaluation of relations according

to their frequency for a given template allows for semantic affinity to be judged without

establishing lexical and schematic representations for all nouns. In other words, if CAUSE

frequently occurs with template N-X, then N and X are clearly compatible in a way that allows

for that relation to emerge. This is not to say that assessing compatibility should be ignored, but

simply that identifying relations for a large set of compounds would arguably reveal similar

information regarding their shared features. This compatibility might be characterized in a

number of ways, such as in the conceptual classes of the compound’s elements (e.g. artefact,

animal, food, etc. cf. Maguire et al. 2010) or the feature sets of their semantic representations

(cf. qualia in Pustejovsky 1995, semantic body in Lieber 2004). Thus, a compound such as

pumpkin-squash might involve a coordinative relation because this is the most likely association

for PLANT-PLANT combinations. Although the matter clearly warrants further research, we may

state here that the degree of semantic transparency observed for subordinate compounds can be

evaluated based on the frequency of the relation within the set of compounds involving similar

lexemes. This evaluation may be achieved by using the SRI equation described in Chapter 4.
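The calculation described in footnote 141 can be sketched in Python as follows. This is a minimal illustration that assumes the two counts are already known for a template; the function name sri and the dictionary PART_REVERSED are hypothetical, and the counts are those reported above for the three compounds involving the reversed PART relation.

```python
def sri(relation_count, template_count):
    """SRI as described in the text: the number of compounds of a template
    (N-X) that involve a given relation, divided by the total number of
    compounds for that template."""
    if template_count <= 0:
        raise ValueError("a template needs at least one compound")
    if not 0 <= relation_count <= template_count:
        raise ValueError("relation_count must lie between 0 and template_count")
    return relation_count / template_count


# compound -> (compounds with reversed PART, compounds in the template),
# as reported in the text.
PART_REVERSED = {
    "bateau-pompe": (3, 13),
    "bloc-ressort": (2, 11),
    "voiture-radio": (5, 13),
}

if __name__ == "__main__":
    for compound, (n_relation, n_template) in PART_REVERSED.items():
        print(f"{compound:14} {sri(n_relation, n_template):.3f}")
```

Rounded to three decimals, the sketch reproduces the reported values (0.231, 0.182 and 0.385). As noted above, the figure is only as informative as the number of compounds available for each template.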

7.2.3 Summary

Based on the observations made above, relational information for compounds may be ranked

according to the following hierarchy:

(182) a. ARGUMENT

b. PURPOSE (proper function)

c. adjectival relation

d. COORDINATION, HYPERNYM, FUNCTION (as well as classificatory143 relations)

e. Subordinate Relations ordered according to the compound’s SRI

f. SIMILARITY

g. Idiosyncratic Relations

143 As has already been mentioned, the classificatory relation, as observed in the data, does not seem to involve highly transparent compounds. Although this relation shares much in common with other attributive relations, it falls under the partially compositional node of the hierarchy and therefore does not compete with other relations in terms of transparency. It is included here for the sake of completeness.

One will have noticed that the SIMILARITY relation, which was said to be attributive in nature, is

ranked lower than subordinate relations. The reason for this is that, as was discussed earlier,

SIMILARITY is an underspecified relation: any number of attributes or properties may be at play

in these compounds, which arguably reduces their degree of interpretability. Given the large

number of possible realizations for this relation (e.g. shaped like, the color of, smells like, tastes

like, etc.), I assume that it is in fact less transparent than other basic predicative relations (e.g.

cause, produce, part of, etc.).

It should also be noted that at the lowest point on this list are idiosyncratic relations. These are

associations that do not correspond to any of the relations proposed here. Distinguishing

between a set of basic relations for compounds, which is to say highly recurring and elementary

relations, and those that only occur in a select few compounds allows us to hypothesize that the

latter are less transparent than others because they require that meaning be established using

knowledge not typically observed for other compound types. Therefore, compounds such as

avantage choc, capital-risque and laine renaissance, although strongly endocentric and fully

compositional, may present challenges not necessarily present in compounds that involve

recurring relations if these are taken to be a fundamental component of their word formation.

To conclude, compounds may be ordered according to their degree of semantic transparency,

which is determined based on several factors. Consequently, the proposed hierarchy takes into

account the factors discussed in Chapter 4, namely centricity and compositionality. Moreover,

compounds may, to some degree, be classified using the semantic relations that join their

elements. Some of these relations are more basic than others, according to either the nature of

the relationship (i.e. intersective) or the completeness of their semantics (i.e. argumental).

Subordinate relations, which are defined as relations in which one constituent is dependent on

the other, may be further classified, but only when taking into account the semantic

homogeneity of their templates.
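As an illustration of how the hierarchy in (182) might be put to use, the following Python sketch sorts a handful of compounds by relational tier and, for subordinate relations only, by SRI. The numeric tiers, the names TIER, transparency_key and EXAMPLES, and the choice to treat a higher SRI as more transparent are assumptions made for the example. The sketch covers only the relational level of the typology and leaves centricity and compositionality aside. The compounds and SRI values are taken from examples given elsewhere in this chapter.

```python
# Tiers follow the ordering in (182); a lower number is assumed to mean
# a more transparent relation. Relations listed together share a tier.
TIER = {
    "ARGUMENT": 0,
    "PURPOSE": 1,
    "ADJECTIVAL": 2,
    "COORDINATION": 3,
    "HYPERNYM": 3,
    "FUNCTION": 3,
    "SUBORDINATE": 4,
    "SIMILARITY": 5,
    "IDIOSYNCRATIC": 6,
}


def transparency_key(entry):
    """Sort key for (compound, relation class, SRI or None) triples."""
    _compound, relation_class, sri = entry
    tier = TIER[relation_class]
    # Only subordinate relations are ordered by SRI (assumption: higher first).
    if relation_class == "SUBORDINATE" and sri is not None:
        return (tier, -sri)
    return (tier, 0.0)


EXAMPLES = [
    ("chou-fleur", "SIMILARITY", None),
    ("bateau-pompe", "SUBORDINATE", 0.231),
    ("médicament conseil", "IDIOSYNCRATIC", None),
    ("passage piétons", "PURPOSE", None),
    ("sauce tomate", "SUBORDINATE", 0.500),
    ("groupement phosphate", "ARGUMENT", None),
    ("auteur-compositeur", "COORDINATION", None),
]

if __name__ == "__main__":
    for compound, relation_class, _sri in sorted(EXAMPLES, key=transparency_key):
        print(f"{relation_class:14} {compound}")
```

Relations that share a tier (COORDINATION, HYPERNYM, FUNCTION) are deliberately left unordered with respect to one another, in keeping with the hierarchy as stated.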

The following section re-examines the compounds extracted from Wiktionary in light of the

typology proposed here.

7.3 The Semantic Transparency of French Compounds

This section focuses on examining attested French compounds according to their place within

the typology outlined in Section 7.2. The emphasis is on offering examples for each of the

transparency profiles discussed above, while expanding on the role of semantic relations at

individual levels of the typology.

7.3.1 Canonical Endocentric Compounds

The vast majority of compounds present in the data are left-headed, endocentric compounds. Of

the 666 NN endocentric compounds identified, 595 (or 81%) have their head in canonical

position144. In the case of N à N endocentrics, all cases are left-headed and are therefore

canonical according to the definition provided in Chapter 4. We may, however, further

distinguish this subset of compounds according to other features, such as compositionality and

relationship type.

7.3.1.1 Strongly Endocentric, Fully Compositional: passage piétons and

boîte à outils

Most of the endocentric compounds in the data are both strongly endocentric, which is to say

that the head may be understood literally, and fully compositional. By fully compositional, I

mean that the non-head element contributes meaning to the whole in a literal sense. According

to the features explored in this work, this particular type represents both the most semantically

transparent and the most frequent instance of compounds. The following table shows how these

features compare across all the compounds retained:

                                        NN       N à N
# of compounds                          729      319
# of +canon, +strong, +compositional    491      240
% of +canon, +strong, +compositional    67.3%    75.2%

144 Coordinated compounds are considered canonically headed and are therefore included in this count.

Within this class of compound, we may distinguish between the following types, ordered for

transparency according to the hierarchy of semantic relations discussed in Section 7.2.2:

Table 7.3. +canon, +strong, fully compositional compounds ordered according to relations.

NN                      N à N                      Relation
groupement phosphate    mise à niveau              ARGUMENT
passage piétons         boîte à outils             PURPOSE (PF)
grandeur nature         ---                        adjectival
auteur-compositeur      ---                        COORDINATION
banane plantain         ---                        HYPERNYMY
circuit tampon          ---                        FUNCTION
                                                   various subordinate relations
chou-fleur              escalier à vis             SIMILARITY
médicament conseil      logiciel à contribution    idiosyncratic

Table 7.3 shows how strongly endocentric, fully compositional compounds might be ranked in

terms of their degree of semantic transparency when taking into account their relational

associations. Intersective compounds not involving adjectival nouns are grouped together—no

attempt is made here to rank these relations. Further research into the matter might provide

additional insight on how these relations might be contrasted in terms of the challenges they

pose at the level of meaning construal. Subordinate compounds are not listed in the table above

and will be discussed in a moment. Compounds based on the SIMILARITY relation are ranked

lower according to the arguments presented in Section 7.2.2.2.

At the very bottom of the list are compounds that make use of idiosyncratic relations, which is

to say relations that cannot be said to be basic or recurrent. Médicament conseil serves as a

rather suitable example of this type of compound: its constituents are connected by the semantic

template H purchased based on M (of W). In the case of logiciel à contribution, the relation is

best described as financially supported by.

Compounds involving subordinate relations are unranked, but may be compared according to

their localized frequency (i.e. within compound templates). As was discussed in Section 7.2.2.5,

when used in conjunction with the retained semantic relations, the SRI can provide an indication

of how frequent a particular relation is for a given compound template. The table below offers

SRI data for NN compounds involved in high frequency templates with the head element

serving as its base. These compounds were selected according to the number of types for each

template, so that SRI calculations were based on a similar number of compounds (which range

from 13 to 17 per template)145.

Table 7.4. SRI values for compounds within recurring N-X templates.

Compound         Relation          SRI of C    Average SRI of N-X
sauce tomate     SOURCE            0.500       0.327
effet revenu     CAUSE (rev)       0.438       0.352
voiture radio    PART (rev)        0.385       0.290
voiture salon    LOCATION (rev)    0.308       0.290
bateau pompe     PART (rev)        0.231       0.195
carte adresse    LOCATION (rev)    0.176       0.135
bateau pirate    USE (rev)         0.077       0.195
carte senior     USE (rev)         0.059       0.135

What the table above shows is that it is not immediately clear whether SRI values are in fact a

useful metric for comparing degrees of transparency. Intuitively, none of the compounds in

Table 7.4 is particularly difficult to understand, nor are they wildly different in their use of

relations. It therefore seems highly doubtful that these numbers should serve, under most

circumstances, as indicators of transparency. That said, in extreme cases such as sauce-N, the

SRI does highlight instances where a particular relation is either highly dominant or marginal,

which can be used to assess transparency. Moreover, when used in conjunction with a

template’s average SRI146, a compound’s SRI value will indicate how it compares to the

template’s set homogeneity. Most of the compounds in Table 7.4 have higher than average SRI

values, which suggests that their relational meaning is dominant across other similar

145 Once again, because the number of recurring templates in the data taken from Wiktionary is quite low, the reported SRI here makes use of other sources, namely Arnaud (2003) and LPR2010. See Appendix A for a list of the compounds used.

146 A template’s SRI is the average of all SRI values for that template’s compounds.

compounds. Only two of the compounds listed here, carte senior and bateau pirate, have below

average SRI values, both of which seem to run counter to these arguments given how intuitively

easy they are to understand.
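The comparison between a compound’s SRI and its template’s average can be made explicit with a short Python sketch. The values are copied from Table 7.4; the names TABLE_7_4, below_template_average and margin are hypothetical, and the sketch only restates the reported figures rather than recomputing them from the underlying compound lists.

```python
# (compound, relation, SRI of compound, average SRI of its template),
# copied from Table 7.4.
TABLE_7_4 = [
    ("sauce tomate", "SOURCE", 0.500, 0.327),
    ("effet revenu", "CAUSE (rev)", 0.438, 0.352),
    ("voiture radio", "PART (rev)", 0.385, 0.290),
    ("voiture salon", "LOCATION (rev)", 0.308, 0.290),
    ("bateau pompe", "PART (rev)", 0.231, 0.195),
    ("carte adresse", "LOCATION (rev)", 0.176, 0.135),
    ("bateau pirate", "USE (rev)", 0.077, 0.195),
    ("carte senior", "USE (rev)", 0.059, 0.135),
]


def below_template_average(rows):
    """Compounds whose SRI is lower than their template's average SRI."""
    return [compound for compound, _rel, sri, average in rows if sri < average]


def margin(rows):
    """Difference between each compound's SRI and its template average."""
    return {
        compound: round(sri - average, 3)
        for compound, _rel, sri, average in rows
    }


if __name__ == "__main__":
    print(below_template_average(TABLE_7_4))
    for compound, diff in margin(TABLE_7_4).items():
        print(f"{compound:14} {diff:+.3f}")
```

The margin makes it easier to see that sauce tomate stands furthest above its template average, while bateau pirate falls furthest below.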

Setting aside for a moment the fact that these compounds involve the low frequency relation

USE, carte senior’s low SRI may in fact be explained by carte’s polysemous nature: within the

17 carte-N compounds examined, three different heads were identified:

(183) a. petit carton rectangulaire... ex. carte-adresse, carte-senior

b. représentation à échelle réduite... ex. carte météo, carte radar

c. circuit imprimé ex. carte-tuner, carte mère

If the SRI were only calculated using synonymous head nouns, carte-senior would possess a

higher SRI value, which might better correspond with one’s intuitions regarding this compound.

This approach, however, fails to take into account the actual task of disambiguating the head

noun within the compound. While the method adopted here is meant to account for “noise” in

the shared space of a compound’s template, only rigorous testing of these types with speakers is

likely to reveal which of the two approaches is a better indicator of transparency.

As for bateau-pirate, it is clear that USE is not a dominant relation for this template, despite the

head noun’s artefactual nature. According to the data, bateau-N compounds favour FUNCTION

(e.g. bateau-lavoir) and PART (e.g. bateau-pompe) relations, which results in a low SRI for all

other types. Does bateau-pirate possess a lower degree of transparency then? Perhaps not, given

that the constituents share the same semantic space (i.e. pirates are chiefly understood in terms

of ships and open waters). This is in fact where the SRI most obviously fails, as it does not

explicitly account for the relatedness of constituents. In this regard, the concept of template

homogeneity would need to be augmented with additional factors in order to more accurately

model the probability that a particular relation should emerge for a pair of lexical items. One

such method might involve the incorporation of individual constituents’ conceptual classes, an

approach that Ryder (1994) had in fact explored.

Despite these shortcomings, a comparison of semantic relations across similar compound types

may nevertheless reveal subtle differences within individual configurations or profiles. Whether

these differences ultimately affect a compound’s degree of transparency remains, for the time

being, unclear.

7.3.1.2 Strongly Endocentric, Weakly Compositional: mot-clé and piano

à queue

Weakly compositional compounds are those for which the non-head contributes meaning via a

trope such as metaphor or metonymy. A total of 75 NN and 25 N à N compounds were found to

possess these features, examples of which are as follows:

(184) a. NN cocotte-minute, crème fleurette, laine renaissance, mot-clé, plan cul

b. N à N arbre à cornichons, boîte à pet, nom à penture, piano à queue,

serpent à lunettes

These compounds may involve any of the non-equative relations, but require that meaning be

determined using a non-established trope:

(185) a. laine renaissance → laine SOURCE [metaphor renaissance]

b. arbre à cornichons → arbre PRODUCTION [metaphor cornichons]

Because of this additional layer of meaning, these types are taken to possess a greater degree of

opacity than the fully compositional compounds discussed above.

7.3.1.3 Strongly Endocentric, Partially Compositional: bateau-mouche

An endocentric compound is partially compositional if its modifier does not contribute meaning

to the whole. It is considered partially compositional because the head retains its meaning. In

most cases, there is no basic relation between the compound’s constituents (as in 186a). The

exceptions to this are so-called classificatory compounds (as in 186b), which may be

paraphrased as “H of type M”. In total, 17 NN endocentric compounds exhibiting partial

compositionality were found in the data.

(186) a. bateau-mouche, belote contrée, conducteur fantôme, laurier-tin

b. mâle alpha, particule bêta, rayon gamma

No such cases were identified for N à N compounds: if a modifier cannot be understood

literally, it can typically be motivated via sense extension (see Section 7.3.1.2). It’s unclear why

this is the case, but the most obvious explanation is that the preposition imposes a

meaningfulness constraint on the element it introduces, thereby limiting any semantic shift

toward opacity that might occur.

Also treated as partially compositional are compounds involving common nouns that originated

as proper nouns. The modifier is therefore purely referential and does not possess a sense.

Unlike other partially compositional compounds, these cases may involve idiosyncratic relations

issuing from the proper noun’s relationship to the head:

(187) a. rose noisette rose DISCOVERED BY Noisette147

b. valet-à-patin valet INVENTED BY Patin148

In all cases, however, the chief features of the compound are available via the head element, but

with no immediate means to further restrict the set of its designatum. In other words, the listener

may correctly conclude that rose noisette (or bateau-mouche, etc.) is a type of rose, but will be

unable to determine how it may be distinguished from other types because of the modifier’s

referential or inexistent semantic content.

7.3.1.4 Weakly Endocentric: valse-hésitation and poire à poudre

Weakly endocentric compounds are compounds for which the head constituent retains meaning

via a well-established trope that typically does not violate the IS-A condition149. As was

discussed in Chapter 4, weak centricity seems to constrain the modifier, in that the latter must

147 “[L]a Rose Noisette, variété à laquelle on a donné ce nom en l’honneur de M. Noisette, qui, le premier, a fait connaître cette charmante fleur” (Jardin de France, Vol. 37, 1846: 319).

148 “On lui a donné le nom de valet, parce qu’il sert de lui même comme serviteur ; de à Patin, du nom de celui à qui on en attribue l’invention” (Dictionaire des sciences médicales, 1821, Volume 56, p. 493). Despite this dictionary’s claim regarding the meaning of valet, which would suggest that it be treated as weakly endocentric, I am treating it as an ordinary endocentric as valet may now mean ‘tool’ (see entry B in TLFi or entry III in LPR2010). This acceptation therefore corresponds to the meaning of valet-à-patin, albeit with a highly ambiguous referent (i.e. the compound does not allow one to determine just what kind of tool it refers to).

149 The test often requires that it be weakened: C is like an H (e.g. une valse-hésitation est comme une valse).

contribute meaning to the whole literally. This observation contrasts with Benczes’s (2006)

findings for English, where tropes may operate simultaneously on both constituents, although

her research focused on compounds traditionally viewed as exocentric.

The following NN and N à N compounds can be considered representative of this type:

(188) a. valse-hésitation ‘suite de décisions, d'actes contradictoires’

b. poire à poudre ‘petite gourde de cuir bouilli [. . .] dans laquelle on mettait de la

poudre de chasse’

Although there are too few such cases to allow for definitive conclusions in terms of the

relations involved, the different types observed suggest that most relations are likely available to

tropic compounds (COMPOSITION and LOCATION in 188a-b respectively). Other examples are as

follows:

(189) a. pomme-cajou, site internet, page web

b. cheval à bascule, échelle à poissons, piège à cons, tête à claques

Although technically endocentric, the compounds identified here no doubt pose a greater

challenge at the level of interpretation than literal endocentrics, but they most likely remain

easier to understand than true exocentrics.

It bears repeating that not all instances of sense extension are equivalent at the cognitive level.

Some metaphors are no doubt easier to decipher than others, and metonymy might be less

challenging than most metaphors. As I stated in Chapter 4, however, these attributes are set

aside here in favour of a narrow typology, where flagging instances of tropes is taken as a

sufficient indicator of reduced transparency. That said, nothing about the approach adopted here

prevents future models from further discriminating between compounds whenever sense

extension is present.

7.3.2 Non-Canonical Endocentric Compounds

Only NN compounds may involve non-canonical heads, that is, heads that occupy the least

frequent position for a given compound type. As I showed in

Chapter 4, French nominal compounds are typically left-headed, but may occasionally be

right-headed. Under the typology proposed here, non-canonical heads are said to reduce semantic

transparency and are considered a major axis of the hierarchy. The data reveal that, like

canonically headed compounds, right-headed compounds may include several configurations

based on centricity and compositionality.

7.3.2.1 Strongly Endocentric, Fully Compositional: bracelet-montre

Most right-headed compounds are strongly endocentric and fully compositional, which is

unsurprising given that canonically headed compounds favour this configuration as well. Of the

71 right-headed NN compounds present in the data, 62 (or 86%) fall under this category.

Interestingly, most of the relations from Chapter 5 are accounted for, suggesting that head

position does not impose any restrictions on what kind of associations are available to

compounds with non-canonical heads. The following table contains examples for each of the

relations present in the data:

Table 7.5. Strongly endocentric, fully compositional NN compounds, ordered by relation.

NN                      Relation
photo-interprétation    ARGUMENT
auto-école              PURPOSE (PF)
maître-cylindre         adjectival
lieutenant-général      COORDINATION
colloid-calcite         HYPERNYMY
test match              FUNCTION
bracelet-montre         PART
fan-club                COMPOSITION
panier-repas            LOCATION
art-thérapie            USE
ciné-club               TOPIC
agora-phobie            CAUSE
taupe-grillon           SIMILARITY
clin-foc                idiosyncratic

Although subordinate relations such as PRODUCTION and TIME were not present in the data, it is

likely that a larger dataset would contain compounds that make use of these associations. It

should also be noted that the subordinate relations in the table above are unranked as few of


these compounds are involved in templates frequent enough to calculate an SRI. Nevertheless,

the listed compounds represent the most transparent of non-canonically headed NN compounds,

with a possible differentiation based on relation type.

7.3.2.2 Strongly Endocentric, Weakly Compositional: reine-marguerite

Very few of the right-headed compounds in the data involve a trope on the modifier, but some

cases were nevertheless identified. The compound reine-marguerite is one such case, where

reine is related to the flower’s extreme beauty150. Another instance is mule-jenny, borrowed

from English and for which the modifier mule is taken to mean ‘hybrid151.’ In either case, this

type of compound will most likely prove more difficult to fully understand than those listed in

section 7.3.2.1 given that the modifier’s semantic contribution is weakened by the presence of a

metaphor.

7.3.2.3 Strongly Endocentric, Partially Compositional: aube-vigne

Partially compositional right-headed compounds contain modifiers that, for one of several

reasons, do not contribute meaning to the whole. In the case of aube-vigne, for instance, its

etymology points to a deformation of its Latin origins (see Note 109 in Chapter 6 for details

on this compound). Only 8 such cases were present in the data. As the following examples

show, some of these types of compounds are in fact English loanwords that have been adapted

to French:

(190) boule-dogue, lime uranite, quartier-maître

Compounds of this type can therefore be explained using etymological facts: they are

right-headed because they are originally from a language with right-headed compounds (i.e.

Latin and English) and their modifiers do not contribute meaning because they are deformations

150 “[. . .] ils convinrent de lui donner le nom reine-marguerite, en considération de sa beauté et de sa ressemblance avec nos marguerites” (Tessier et al. 1787: 710).

151 “Empr. à l’angl. mule-jenny (1792 ds NED) comp. de jenny* et de mule, issu du fr. mule*, employé au sens de « hybride » pour désigner une machine combinant des systèmes empruntés à deux types de machines différents” (TLFi).


or calques of foreign words (e.g. Eng. bull → boule, Eng. quarter → quartier). These compounds

are presumed to possess a greater degree of opacity than those discussed above given that the

only component that contributes meaning is a non-canonical head.

7.3.2.4 Weakly Endocentric: vidéo-lynchage

Right-headed compounds that might be considered weakly endocentric, yet fully compositional,

are not numerous. In fact, only vidéo-lynchage seems to fit this particular set of criteria. The

compound is defined as follows:

(191) Activité consistant à filmer une personne à son insu au moyen d'un téléphone cellulaire

pendant qu'on la bat pour ensuite diffuser par Internet la vidéo ainsi obtenue152.

We may hypothesize that compounds with non-canonical heads are not likely to involve tropes

on the modifying element as they are already difficult to process. In the case of vidéo-lynchage,

the preposed modifier is not quite as troublesome because of its prefix-like functionality, which

allows for the correct morphological parsing to occur. One must nevertheless make sense of the

metaphoric head, a task that will influence how the compound is interpreted if no contextual

support is provided.

7.3.3 Exocentric Compounds

In terms of semantic transparency, exocentric compounds may be opposed to endocentric

compounds by an inadequate denotation of the head: while an endocentric compound denotes a

hyponym of its head, no such relation may be established for exocentrics. Based on this

distinction, we may state that exocentric compounds are inherently more opaque constructions

than their endocentric counterparts. This does not mean, however, that all exocentric compounds

should be viewed as equally opaque. Compositionality, for instance, remains a factor for many

of these types of compounds, which also means that semantic relations may be present in

exocentric compounds. In the following sections, we look at each of these cases in turn.

152 <http://fr.wiktionary.org/wiki/vidéo-lynchage>


7.3.3.1 Fully Compositional Exocentric: ballon-panier and pied à boule

Even if a compound is exocentric, its constituents may still contribute meaning to that of the

whole. In this sense, they are fully compositional. Although several of these compounds involve

idiosyncratic relations that are not easily paraphrased using basic predicates, a number of them

do make use of some of the relations discussed previously. Table 7.6 contains examples of the

relations observed for exocentric NN compounds:

Table 7.6. Exocentric NN compounds involving basic semantic relations.

Compound          Hypernym            Relation
mort-chien        plant               ARGUMENT
jambon-beurre     sandwich            COORDINATION
poisson-évêque    mythical creature   SIMILARITY
ballon-panier     game                LOCATION/PURPOSE
chèvre-pied       mythical creature   PART
chiffre-taxe      stamp/ticket        TOPIC
lac-laque         residue             COMPOSITION

The compounds listed above are examples of the typology’s most semantically transparent

exocentrics: although they may not possess semantic heads, their constituents all provide

meaning in non-trivial ways. It should be noted, however, that not all relations retained were

observed for exocentric compounds. Given the small number of basic relations identified for this

class (i.e. 19/37 for NN, 12/20 for N à N153), I will refrain from making any claims with regard

to either possible or impossible associations. A larger dataset might offer more conclusive

evidence with regard to semantic relations within exocentric compounds, but we may

nevertheless presume that exocentrics do not block the use of basic relations.

A few exocentric and compositional compounds make use of idiosyncratic relations to connect

their constituents. The compound année-lumière might be used to represent not only this class

of compound, but also a specific subtype involving various units of measurement:

153 The numbers reported here are only for exocentric compounds for which both constituents retain their literal meaning.


(192) électron-volt, kilogramme-force, kilomètre-heure, tonne-mètre

Although one might be tempted to state that the relation for such compounds is one used to

express a rate (i.e. kilomètre par heure), this approach is not always correct (e.g. *électron par

volt). Other exocentric compounds that remain fully compositional and make use of

non-standard relations, while not numerous, are present in the data. Examples are as follows:

(193) a. cheval-vapeur ‘unité de mesure de la force d’une machine à vapeur

selon la force exercée par un cheval154’

b. face à main ‘binocle à manche que l’on tient à la main155’

Although exocentric N à N compounds may also be fully compositional and make use of

various basic relations, these relations are not as well represented. This may be partly related to

the fact that, in general, N à N compounds involve fewer basic relations than their NN

counterparts. Only ARGUMENT and LOCATION are present for compositional exocentric N à N

compounds:

Table 7.7. Exocentric N à N compounds involving basic semantic relations.

Compound       Hypernym                   Relation
mise à pied    termination (employment)   ARGUMENT
pied à boule   warning (bowling)          LOCATION

Because of the low number of tokens for either type of compound, relations are not ranked

based on SRI values. Although they are treated together here, this is not to say that the SRI

cannot be used to differentiate between exocentric compounds. As was already discussed in

Chapter 4, compounds patterned on the template N-lumière, where N is a period of time, are all

analogically based on année-lumière and therefore share the same, albeit idiosyncratic, relation.

154 This is an approximate paraphrase that does not take into account how this unit is actually calculated: “La force d’un cheval vapeur équivaut à 75kil. élevés à la hauteur d’un mètre par seconde, mais la force réelle d’un cheval vivant ne représente pas plus de 50 kil. élevés à la même hauteur pendant le même espace de temps” (Dictionnaire du commerce et des marchandises, tome 2, 1839, p. 1401).

155 LPR2010.


It is likely that a speaker familiar with the base of the analogy will find any of the other forms

easy to interpret, as the template’s higher-than-average SRI156 would predict.

7.3.3.2 Weakly Compositional: radio-trottoir and cage à écureuil

An exocentric compound may also be weakly compositional if one or more of its constituents is

subject to sense extension. These compounds may manifest themselves in several ways,

depending on which element undergoes a tropic shift:

(194) a. trope on X, Y is literal     bec-figue / moulin à paroles

      b. X is literal, trope on Y     ---

      c. trope on X, trope on Y       radio-trottoir / cage à écureuil

One should recall that compounds that involve sense extension on the head element are only

considered (weakly) endocentric if the trope is established (i.e. listed as such in a lexicographic

work). Otherwise, the compound is understood as exocentric. Interestingly, no compounds were

observed in the data for which only the right-most element is subject to a trope. This may be

related to observations made in Chapter 4 regarding endocentrics and figurative non-heads, but

it is unclear how this could be formalized to also account for exocentrics. Again, a larger dataset

might provide additional information with which to offer a hypothesis regarding the relationship

between centricity and sense extension on the modifier. We may state, however, that the

compounds in (194a) (and 194b, as the case may be) are more transparent than those in (194c)

as the former only involve a single trope.

One should note that combinations in (194c) are instances for which metaphor (or metonymy)

applies to the individual elements and not the whole. In radio-trottoir (‘réseau de

communication personnel’), for instance, the metaphors involved target the individual

components: radio for communication, and trottoir for the means of transmission. The same

may be said of cage à écureuil (‘construction pour enfant’), where cage stands for the structure

based on physical resemblance and écureuil for children based on their actions and behaviour

156 Only six N-lumière constructions are present in the data, all of which mean ‘distance parcourue par la lumière en un(e) N’. Arnaud (2003) contains one additional compound sharing this pattern (prise lumière), but which differs in meaning, suggesting a near absolute SRI value for this template.


within said structure. This contrasts with instances where the metaphor applies to the whole

compound, which is to say combinations that contain individual lexemes understood literally,

but that together constitute a metaphor. The following examples represent such cases:

(195) pot à tabac (‘personne grosse et courte’), barbe à papa (‘confiserie’),

nid à rats (‘logement obscur et malpropre’)

Pot à tabac, for instance, in its non-literal use, refers to a short and portly individual, an allusion

to the general shape of the object that the compound denotes literally (‘pot destiné à contenir du

tabac’). The individual constituents, however, are not the source of meaning. These cases may

all be paraphrased as “an N that is like a C.” Although we might wish to treat these as

non-compositional compounds, which would highlight the fact that the individual elements do not

contribute directly to the meaning of the whole, such an approach would fail to recognize that

these compounds are, in Svensson’s (2008) terms, “motivatable.” The meaning of the whole is

in fact related to its parts, albeit in a roundabout manner as the compound must be understood

figuratively. They are therefore treated here as weakly compositional based on the fact that they

can be motivated.

NN compounds that are exocentric on the one hand and rely on tropes on the other are in fact

uncommon in the data (7 items). Conversely, N à N compounds with these features, while not

very frequent, are much easier to pick out (13 items):

(196) abreuvoir à mouche (‘plaie’), moulin à paroles (‘personne bavarde’),

chair à canon (‘militaire en première ligne’), bouche à feu (‘canon’)

The compounds described above most likely possess a transparency profile similar to that of the

compositional exocentrics from the previous section. Like those, weakly compositional

exocentrics are motivated constructions. The chief difference, however, is that these cases make

use of sense extension in their semantic representations, which adds a layer of

complexity to the relationship between form and meaning. They are therefore considered

slightly less transparent than literal exocentrics, but more transparent than the two types that

follow.


7.3.3.3 Partially Compositional: chat-château and soda à pâte

Exocentric compounds are partially compositional if only one of their constituting elements

retains meaning. Although the most likely candidate for meaning retention is the right-most

constituent (as in 197a), the head may occasionally contribute to the meaning of the whole

(197b):

(197) a. or-sol, trou-madame Y unrelated

b. chat-château, bourg-épine X unrelated

Partially compositional compounds such as these do not typically involve any relation between

their elements. This is entirely expected as there is no way to bind a semantically irrelevant

constituent to a semantically relevant one.

Only one such case is found for N à N compounds, soda à pâte, which is a calque of the English

baking soda. While this compound is in fact endocentric in English, the head soda should be

translated in French as soude. The borrowed term is therefore technically meaningless in

French, which might explain why the Office québécois de la langue française argues against its

usage157.

7.3.3.4 Non-Compositional: cap-mouton and sagne à tamis

Non-compositional exocentric compounds represent the most opaque type of compound. These

are typically instances of lexicalized combinations that can no longer be motivated on

synchronic grounds. Most of these compounds entered the lexicon long ago—any relationship

between the constituents and the meaning of the whole is no longer apparent without specific

knowledge of their etymological or historical origins. Examples are as follows:

(198) a. cap-mouton, chef-mois, compère-loriot, coq-souris, mont-joie

b. fauteuil à voile, manche-à-balle, sagne à tamis, ventre à choux

157 “Le terme soda à pâte, calque de l'anglais baking soda, est un emprunt entrant inutilement en concurrence avec les termes français existants” (OQLF, <http://gdt.oqlf.gouv.qc.ca/ficheOqlf.aspx?Id_Fiche=8871488>).


It is highly unlikely that a speaker, knowing only the meaning of the constituting elements,

would be able to interpret these compounds even if contextual information were provided. From

a purely semantic point of view, these compounds have more in common with simplex words

than they do with other compounds as their constituents provide no means with which to

accurately establish meaning for the whole. The relationship between form and meaning in these

instances is therefore non-existent.

7.4 Summary

In this chapter, I sought to synthesize the compound properties discussed in previous chapters,

focussing on how these features might combine into a more granular typology of semantic

transparency. The examination of the data from the perspective of this typology revealed a

diverse, if unevenly distributed, set of compounds. Table 7.8 below summarizes

the typology discussed in this chapter and contains example compounds for each combination of

factors. The contents of the table are ordered in descending order of transparency, which is to

say that the compounds at the top are more transparent than those at the bottom.

Given the limited number of compound templates present in the data, further subdivisions based

on semantic relations have been omitted from the table. The reader is encouraged to consult

Section 7.3.1.1 for a discussion of how these relations apply to the largest class in the typology.

Based on the features and properties discussed, French compounds are accounted for in at least

twelve of the sixteen semantic transparency profiles proposed in Section 7.2. While not every

configuration proved relevant for the data examined, one should be careful not to take these

observations as undeniable proof that the twelve attested profiles represent the limits of

transparency classification. A study of additional French compounds, compound types (e.g. AN,

NA, N de N, etc.) or of compounds in other languages may very well provide evidence in

support of unattested configurations.
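
The count of sixteen profiles, twelve of them attested, can be checked with a short enumeration. The sketch below is only illustrative: it assumes that the sixteen profiles arise from twelve endocentric combinations (canonical or non-canonical head, strong or weak centricity, and full, weak or partial compositionality) plus four exocentric levels of compositionality. That decomposition is inferred from the rows of Table 7.8 and is not quoted from Section 7.2. The tuple encoding and the names `all_profiles` and `ATTESTED` are hypothetical.

```python
from itertools import product

# Compositionality levels, from most to least transparent.
COMPOSITIONALITY = ["full", "weak", "partial", "non"]


def all_profiles():
    """List candidate profiles as tuples:
    (endocentric, canonical_head, strong_centricity, compositionality).

    Assumption: an endocentric head always contributes meaning, so
    endocentrics cannot be non-compositional; head position and centricity
    strength do not apply to exocentrics (encoded as None).
    """
    profiles = []
    for canonical, strong, comp in product(
        [True, False], [True, False], COMPOSITIONALITY[:3]
    ):
        profiles.append((True, canonical, strong, comp))
    for comp in COMPOSITIONALITY:
        profiles.append((False, None, None, comp))
    return profiles


# Profiles with at least one example compound in Table 7.8.
ATTESTED = {
    (True, True, True, "full"),      # passage piétons, boîte à outils
    (True, True, True, "weak"),      # mot-clé, piano à queue
    (True, True, True, "partial"),   # bateau-mouche
    (True, True, False, "full"),     # valse-hésitation, poire à poudre
    (True, False, True, "full"),     # bracelet-montre
    (True, False, True, "weak"),     # reine-marguerite
    (True, False, True, "partial"),  # aube-vigne
    (True, False, False, "full"),    # vidéo-lynchage
    (False, None, None, "full"),     # ballon-panier, pied à boule
    (False, None, None, "weak"),     # radio-trottoir, cage à écureuil
    (False, None, None, "partial"),  # chat-château, soda à pâte
    (False, None, None, "non"),      # cap-mouton, sagne à tamis
}

profiles = all_profiles()
unattested = [p for p in profiles if p not in ATTESTED]
print(len(profiles), len(ATTESTED), len(unattested))
```

Under these assumptions, the four unattested profiles are the weakly endocentric compounds (canonical or non-canonical head) that are also weakly or partially compositional.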


Table 7.8. Summary of compounds and their features, ordered by transparency.

Endo.   Canonical Head   Strong Centricity   Compositionality   Compounds (NN)      Compounds (N à N)
+       +                +                   Full               passage piétons     boîte à outils
+       +                +                   Weak               mot-clé             piano à queue
+       +                +                   Partial            bateau-mouche       ---
+       +                −                   Full               valse-hésitation    poire à poudre
+       −                +                   Full               bracelet-montre     ---
+       −                +                   Weak               reine-marguerite    ---
+       −                +                   Partial            aube-vigne          ---
+       −                −                   Full               vidéo-lynchage      ---
−       NA               NA                  Full               ballon-panier       pied à boule
−       NA               NA                  Weak               radio-trottoir      cage à écureuil
−       NA               NA                  Partial            chat-château        soda à pâte
−       NA               NA                  Non                cap-mouton          sagne à tamis

It should also be noted that caution must be taken when comparing transparency across

compound types. In other words, there is no definite indication that we may objectively evaluate

the transparency of NN compounds alongside N à N compounds. While both types were

assessed and classified according to the typology, it is possible that one type shows a greater

overall degree of transparency than the other. We might for instance hypothesize that, all things

being equal, N à N compounds are more transparent than other types based on the fact that the

preposition was shown to be meaningful, which restricts both the possible relational associations

that may emerge and head position configurations. Even this line of reasoning, however, would

need to factor in distributional differences for relational categories (i.e. intersective relations are

not applicable for N à N compounds, but highly relevant for NN compounds), which makes

cross-type comparisons all the more difficult.


In summary, the proposed typology offers a richer and more granular approach to the concept of

semantic transparency. The features discussed, when combined, arguably better reflect the

numerous characteristics that determine a compound’s semantic representation from the

perspective of interpretation.


Chapter 8

Conclusion

The explicit aim of this thesis was to re-examine the concept of semantic transparency from the

perspective of compounds, the purpose of which was to propose a typology that offered both a

comprehensive and granular approach to the concept. To this end, a dataset of French nominal

compounds was analyzed and four particular semantic features were examined, namely

headedness, compositionality, semantic homogeneity, and unexpressed relations.

8.1 Contributions of the Thesis

Semantic transparency, both as a theoretical concept and as a lexical property, has traditionally

been understood narrowly as a direct function of the relationship between a complex unit’s

meaning and its constituents. In other words, transparency is usually equated with

compositionality, the result of which is that a compound is considered transparent if it is

compositional and opaque if it is not. My review of the literature on the topic in Chapter 2

showed that, although this description is in fact prevalent among researchers, there have been

efforts to move away from this narrow view of the concept toward a more multi-faceted

approach. In the course of this discussion, I argued that transparency should be understood

holistically as a property involving several interrelated features (one of which is

compositionality) and that any attempts to formalize the concept should seek to incorporate

these features into its model. It is my contention that such an approach offers not only a richer

conceptual space within which to discuss the semantics of compounds, but also a more effective

means to distinguish between them from the perspective of meaning construal.

The typology of semantic transparency proposed in Chapter 7 is the result of the analysis of over

1,000 attested French compounds (729 NN and 319 N à N) collected from Wiktionary. The

decision to focus on these types of constructions was in part based on the fact that French

compounds have received limited attention with regard to transparency. Moreover, positional


constraints on the head constituent are typically looser in French than they are in English, not to

mention that French binomial constructions may make use of linking units (i.e. N à N). It should

be noted, however, that none of the features examined is in fact language-dependent; French

compounding simply offers a greater diversity of material with which to discuss

transparency.

Overall, my typology consists of two major parameters, each of which is actualized along a

number of other properties. The first is headedness (or centricity), which is used to distinguish

between endocentric and exocentric compounds. This distinction is deemed crucial to the

evaluation of a compound’s semantic transparency given that the head is the principal classifier

for the item: an endocentric compound provides speakers with the key features necessary to

establish its broad denotation. Furthermore, head position is also considered significant

whenever variation may occur. In French, for instance, NN compounds are typically left-headed,

but they may nevertheless be right-headed under certain conditions (e.g. auto-école,

taupe-grillon). This fact is accounted for by incorporating the notion of canonical head into the

typology, which assumes that non-standard heads pose a greater challenge to speakers

attempting to establish meaning. This feature is parameterized according to language and

compound type (e.g. canonical head position for English NN compounds is to the right, while in

French it is to the left; nominal AN compounds in both of these languages are typically right-

headed). Finally, compounds for which the head undergoes an established sense extension are

considered weakly endocentric (e.g. metaphor in valse-hésitation) and are therefore less

transparent than literal (or strong) endocentrics, but more transparent than exocentrics.

The second major component of the typology is compositionality, which is determined by the

semantic contribution of a compound’s constituents. Endocentric compounds are at a minimum

partially compositional as the head necessarily contributes meaning to the whole; exocentric

compounds, on the other hand, may be completely non-compositional. Furthermore, just as

tropes may affect centricity, so too may they affect compositionality: when the non-head

constituent contributes meaning via an established trope, it is considered weakly compositional

(e.g. metaphor in mot-clé). Both headedness and compositionality were the focus of Chapter 4.

To my knowledge, the approach adopted in this work for both of these features offers new

insight into compound transparency. By proposing weak variants for endocentricity and


compositionality, we are able to highlight instances of compounds that can be motivated on non-

literal grounds without treating them as opaque.
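
The interaction between tropes, centricity and compositionality described above can be made concrete with a small decision procedure. The following is a minimal sketch, assuming that each constituent's contribution can be labelled 'literal', 'trope' or 'none'; these labels and the function name `classify_compositionality` are hypothetical and do not appear in the analysis itself. The sketch does not cover whole-compound metaphors such as pot à tabac, which are treated as weakly compositional on separate grounds.

```python
VALID = {"literal", "trope", "none"}


def classify_compositionality(x, y, head=None):
    """Return 'full', 'weak', 'partial' or 'non' for a two-constituent compound.

    x, y : contribution of the left and right constituents
           ('literal', 'trope' or 'none'); hypothetical labels.
    head : 'x' or 'y' for endocentric compounds, None for exocentrics.
    """
    contributions = {"x": x, "y": y}
    if not set(contributions.values()) <= VALID:
        raise ValueError("contributions must be 'literal', 'trope' or 'none'")
    if head is not None:
        if head not in contributions:
            raise ValueError("head must be 'x', 'y' or None")
        if contributions[head] == "none":
            raise ValueError("an endocentric head always contributes meaning")
        # A trope on the head is recorded as weak centricity,
        # so it does not lower compositionality here.
        contributions[head] = "literal"
    values = list(contributions.values())
    missing = values.count("none")
    if missing == 2:
        return "non"
    if missing == 1:
        return "partial"
    return "weak" if "trope" in values else "full"


# Examples following the classifications reported in Chapter 7.
print(classify_compositionality("trope", "literal", head="x"))  # valse-hésitation: full
print(classify_compositionality("literal", "trope", head="x"))  # mot-clé: weak
print(classify_compositionality("none", "literal", head="y"))   # aube-vigne: partial
print(classify_compositionality("trope", "trope"))              # radio-trottoir: weak
print(classify_compositionality("none", "none"))                # cap-mouton: non
```

Keeping the head's trope out of the compositionality value mirrors the decision to record it under centricity instead, which is why valse-hésitation comes out as fully compositional.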

Also explored in Chapter 4 is a factor I called semantic homogeneity, which is the measure of

how semantically related similar compounds are. Compounds, like other multi-lexeme

constructions, may be reduced to patterns based on a single shared constituent (e.g. papier-N or

pompe à N, where the common element is the head, and N-lumière or N à vapeur, where the

shared constituent is instead the modifier). By considering all compounds that fit a particular

template, we may determine how semantically homogeneous they are by dividing the number of

compounds that share the same relational association by the total number of compounds for that

template. The result of this equation was labeled the semantic reliability index (or SRI).

A high SRI value indicates that a compound patterns semantically with other similar

types, which might, on the one hand, indicate a strong compatibility between its constituents,

and on the other, have a facilitatory effect on its interpretation if the speaker relies on analogy

during processing. It was argued, however, that this approach requires a large number of types

for the SRI to be truly meaningful and that it is likely to be most useful when trying to evaluate

transparency for compounds that otherwise share the same transparency features.
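
The SRI calculation described above amounts to a simple proportion. The sketch below is a hypothetical illustration: the function name and the relation labels are placeholders, and the example pools the six N-lumière compounds in the data with the one divergent form reported in Arnaud (2003) purely to show why the value for that template would be close to 1.

```python
from collections import Counter


def semantic_reliability_index(relations, target_relation):
    """Share of compounds in a template that instantiate target_relation.

    relations       : one relation label per compound in the template
    target_relation : the relation of the compound being evaluated
    """
    if not relations:
        raise ValueError("the template contains no compounds")
    counts = Counter(relations)
    return counts[target_relation] / len(relations)


# Hypothetical labels for the N-lumière template: six compounds share the
# same idiosyncratic relation and one compound (prise lumière) does not.
n_lumiere = ["DISTANCE-TRAVELLED-BY-LIGHT"] * 6 + ["OTHER"]
sri = semantic_reliability_index(n_lumiere, "DISTANCE-TRAVELLED-BY-LIGHT")
print(round(sri, 3))
```

With only seven compounds the resulting value is fragile, which is consistent with the caveat that the SRI becomes truly meaningful only for templates with a large number of types.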

The fourth major component of this work involves the unexpressed relations observed between

a compound’s constituents. A survey of the French compounds retained revealed that of the 15

most prominent relations cited in the literature, a mere 10 sufficed to account for more than 75%

of the NN French constructions examined. For N à N compounds, the number of relevant

relations was even lower as the preposition was shown to impose significant constraints on what

type of association may be instantiated between constituents: 5 of the basic relations retained

account for 78% of all N à N compounds examined.

The typology proposed in the previous chapter incorporates all of these features into a hierarchy

consisting of 16 possible configurations based on headedness and compositionality. Of these

possible configurations, only 12 were found to be relevant in French. Compounds were

subsequently ordered according to these features, the results of which are reproduced in the

following table:


Table 8.1. Transparency configurations, from most to least transparent.

Endo.  Canonical Head  Strong Centricity  Compositionality  Compounds (NN)     Compounds (N à N)
+      +               +                  Full              passage piétons    boîte à outils
+      +               +                  Weak              mot-clé            piano à queue
+      +               +                  Partial           bateau-mouche      ---
+      +               −                  Full              valse-hésitation   poire à poudre
+      −               +                  Full              bracelet-montre    ---
+      −               +                  Weak              reine-marguerite   ---
+      −               +                  Partial           aube-vigne         ---
+      −               −                  Full              vidéo-lynchage     ---
−      NA              NA                 Full              ballon-panier      pied à boule
−      NA              NA                 Weak              radio-trottoir     cage à écureuil
−      NA              NA                 Partial           chat-château       soda à pâte
−      NA              NA                 Non               cap-mouton         sagne à tamis

Chapter 7 also describes how semantic relations may be ranked according to transparency,

where synthetic compounds, along with those involving a purposive relation based on a

constituent’s proper function, are considered the most transparent because they are relationally

“complete”. The order of all relations, grouped according to their shared properties, is given as

follows:

(199) ARGUMENT > PURPOSE (PF) > adjectival > intersective > subordinate > SIMILARITY > idiosyncratic

It was suggested that subordinate compounds with identical transparency profiles, which

account for the largest class of relational compounds, could be differentiated using the SRI

described in Chapter 4.
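
The ordering in (199), combined with the SRI as a tie-breaker for subordinate compounds, can be sketched as a simple sort. This is an illustrative sketch under stated assumptions: the tuple layout, the function name, and the toy entries are hypothetical, and the assumption that a higher SRI ranks first among subordinate compounds is an interpretation of the discussion above rather than a procedure defined in this work.

```python
# Relation groups from (199), most transparent first.
RELATION_ORDER = [
    "ARGUMENT", "PURPOSE (PF)", "adjectival", "intersective",
    "subordinate", "SIMILARITY", "idiosyncratic",
]
RANK = {label: i for i, label in enumerate(RELATION_ORDER)}


def order_by_relation(entries):
    """Sort (name, relation_group, sri) tuples from most to least transparent.

    The SRI is consulted only to separate subordinate compounds.
    """
    def key(entry):
        _, group, sri = entry
        tie_break = -sri if group == "subordinate" else 0.0
        return (RANK[group], tie_break)

    return sorted(entries, key=key)


# Placeholder entries; names and SRI values are invented for illustration.
toy_entries = [
    ("c1", "subordinate", 0.40),
    ("c2", "ARGUMENT", 0.10),
    ("c3", "subordinate", 0.75),
    ("c4", "SIMILARITY", 0.90),
]
ranked_names = [name for name, _, _ in order_by_relation(toy_entries)]
print(ranked_names)
```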

The work presented in this thesis is based on the premise that a compound’s meaning—and thus

its semantic transparency—is not a simple function of its parts. Rather, a compound’s meaning

construal relies on a number of factors, all of which may influence how easily it may be

understood. This work, while theoretical, represents a first step toward a formalization of

transparency that takes into account the multi-faceted nature of compounds. That said, the role

of compositionality in a compound’s overall transparency cannot be ignored: according to the

analysis conducted in Chapter 7, approximately 80% of NN and 81% of N à N compounds are

fully compositional in the sense adopted in this work (i.e. both A and B contribute meaning to

the whole literally). These findings reveal two crucial points about the semantics of compounds.

First, compounding is evidently a productive process, in the qualitative sense of the term (what

Bauer 2001 calls availability¹⁵⁸): the formation of compounds is governed by traditional generative

operations that produce semantically compositional constructions. In this regard, these findings

might be said to lend support to a treatment of compounding within syntax, where fully

compositional compounds are generated according to the same rules that produce syntactic

phrases (similar to Lees 1960 or Levi 1978; see also Fabb 1998, Di Sciullo & Williams 1987).

After all, unlike morphological derivation, compounding is not a closed system: any two words

may feasibly be combined to form a compound, a principle analogous to phrase generation. At

the level of processing, non-compositional compounds might therefore be stored as single units

and accessed as such, whereas compositional compounds might undergo decomposition as any

other phrase might. This approach, however, only paints part of the picture as it must still

contend with the fact that relational information is in most instances implicit and that many

compounds, despite their compositionality, do not always provide a means for these relations to

emerge (i.e. they are not fully predictable given the nature of the components, e.g. chêne

kermès). This is all the more pertinent given that the majority of compounds examined can be

accounted for using a relatively small set of basic relations, which suggests that there are in fact

constraints on compound formation that are not easily accounted for using compositionality

alone. This is in part why I argued in Chapter 2 that compositionality does not strictly imply

transparency, but that transparency most certainly entails compositionality (see Section 2.4.1 for

a discussion of this relationship).

¹⁵⁸ “The availability of a morphological process is its potential for repetitive rule-governed morphological coining, either in a general or in a particular well-defined environment or domain” (Bauer 2001). Also, see Schröder (2011) for a recent discussion of morphological productivity.

Second, the prominence of compositionality within the data also shows why expanding

transparency to include other factors is a worthwhile endeavour. If transparency is simply

another word for compositionality, then 80% of compounds are transparent, yet a close

examination of the data reveals that many of these fully compositional compounds are

semantically distinct on other grounds: compounds such as café-filtre and jambon-beurre differ

in terms of centricity, while head position distinguishes auto-mitrailleuse from auto-école.

There is thus sufficient evidence to support an expanded view of transparency, one that takes

into account compositionality while also making use of other features to further distinguish

between otherwise similar compounds. The features explored in this thesis allow for this more

granular model of the concept.

Even if the proposed typology were found to be incorrect, it is unlikely that it would result in a

significant reduction of features. As I argued in Chapter 7, the semantic transparency of

compounds involves, at a minimum, headedness, compositionality, and unexpressed relations;

semantic homogeneity measures are meant to further augment these three basic factors. Any

issues with the typology are most likely to be related to how these features are weighted, which

could be adjusted following testing with speakers (this will be touched upon briefly in Section

8.4.5).

Apart from the proposed typology, the work conducted here also provides a great deal of

practical information with which to pursue future research on French compounds. For instance,

the data collected on headedness and other properties may be used to further explore, among

other things, the effects of mismatched morphological features (i.e. gender percolation).

Furthermore, the work on semantic relations offers empirical support to the notion that

compounds typically make use of a small set of recurring associations. Additionally, it was

shown that not only does the preposition à greatly restrict what relations may emerge for a given

pair of nouns, it also affects the directionality of said relations: while many NN compounds

allow for relations to be reversed, N à N constructions do not possess the same degree of

flexibility. Subsequent work on N de N constructions could offer additional insight on the

relationship between prepositions and compound relations. These findings could ultimately

contribute to future morphological and semantic research on compounding, as well as inform

other fields such as psycholinguistics and computational linguistics, where the focus is on

gaining a better understanding of meaning composition by looking more closely at how

compounds are both stored and processed.

8.2 Remarks on the Wiktionary Data

In Chapter 3, I discussed some of the issues related to the use of Wiktionary as the source of

compounds for this work, namely that its openness and its lack of lexicographic rigour might

introduce questionable items into the data. While it is true that some infrequent or

region-specific compounds were present in the final dataset (e.g. manche à balle, fauteuil à voile,

radio-trottoir), the vast majority of the entries retained are also accounted for in traditional

dictionaries, either LPR2010 or TLFi. Although I did not cross-reference every single entry

because many of the compounds under investigation were known to me as a native speaker (e.g.

moulin à vent, barbe à papa, café-crème), those that required a closer examination were either

listed entries in those dictionaries, or could be found listed within the entries of the head word

(e.g. papier-bible is not listed separately in LPR2010, but it can be found within

the entry for the lemma papier). In instances where no reference could be found for a particular

compound, a search in specialized dictionaries or older texts usually provided sufficient

information regarding both its usage and its origins (see Chapters 6 and 7 for examples of such

cases).

The question, of course, is whether some of the conclusions advanced in this work are in fact

based on a representative sample of French compounds. To answer this question would arguably

require that another dataset be compiled and that the same work be conducted with those

compounds. The methodological choice made at the beginning of this work, namely to

rely solely on Wiktionary as the source of data, no doubt introduced uncommon constructions

into the study, but as long as these compounds can be said to exist (i.e. they are attested elsewhere),

then they are legitimate items for the investigation of transparency as they must still be

interpreted by speakers.

8.3 Polylexical.com

As was mentioned in Chapters 1 and 3, the nominal compounds extracted from Wiktionary have

been tagged and made available to other researchers at www.polylexical.com through a database

and search interface I created over the course of this project. Although all 10,000 nominal

constructions are included in the database, only NN and N à N compounds are fully labeled. All

other constructions (AN, NA, VN, N de N, etc.) are nevertheless labeled with their lexical

categories, gender, and number. The following figure shows the search interface available to

users.

Figure 8.1. Search interface for Polylexical.com.

Users may search for any string using the input in section (1). The basic search function will

return all matches, regardless of position. For instance, a search for the string “table” will return

table d’hôte and sel de table, as well as partial matches like expert-comptable and étable à

pourceaux. For more precise results, users may use the advanced search function in (2), which

allows for specific positions to be targeted, including exact string matches (i.e. so that a search

for “table” does not return expert-comptable). The advanced search function also makes use of a

number of parameters to further restrict results. These parameters, shown in (3), allow users to

conduct searches according to constituents’ part of speech and gender, as well as the

compound’s linking unit (e.g. preposition, hyphen, determiner, etc.), head, gender, number, and

semantic relation. The very last parameters are based on the semantic reliability index discussed

in Chapter 4. Users may search for recurring templates for either the leftward or rightward

constituent, while also performing queries based on a compound’s SRI value. The following

screenshot shows the results of a search for N à N compounds involving the USE relation and

having no fewer than four occurrences of N1.
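
The difference between the basic and the exact-match searches described above can be illustrated with the four compounds cited. The sketch below is not the code behind Polylexical.com; the function names and the rule for splitting constituents (on spaces and hyphens) are assumptions made for the sake of the example.

```python
import re

COMPOUNDS = [
    "table d’hôte",
    "sel de table",
    "expert-comptable",
    "étable à pourceaux",
]


def basic_search(query, compounds):
    """Return every compound containing `query` anywhere (substring match)."""
    return [c for c in compounds if query in c]


def exact_search(query, compounds):
    """Return compounds in which `query` matches a whole constituent."""
    results = []
    for compound in compounds:
        # Assumed segmentation: constituents are separated by spaces or hyphens.
        constituents = re.split(r"[\s\-]+", compound)
        if query in constituents:
            results.append(compound)
    return results


basic_hits = basic_search("table", COMPOUNDS)
exact_hits = exact_search("table", COMPOUNDS)
print(basic_hits)
print(exact_hits)
```

Under these assumptions, the basic search keeps partial matches such as expert-comptable and étable à pourceaux, while the exact search retains only the compounds in which table is a full constituent.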

Figure 8.2. Sample of search results from Polylexical.com.

The entire dataset can also be downloaded as a comma-separated values (CSV) file, which will

allow other researchers to further manipulate and label the compounds according to the needs of

their own projects.
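
As an indication of how the downloaded file might be manipulated, the following sketch reproduces the kind of query described above (N à N compounds with a given relation whose first noun recurs a minimum number of times). The column names (compound, pattern, n1, relation) and the rows are placeholders invented for this example; the actual layout of the CSV file should be checked before adapting it.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the downloaded file; columns and rows are hypothetical.
TOY_CSV = """compound,pattern,n1,relation
alpha à a,N à N,alpha,USE
alpha à b,N à N,alpha,USE
alpha à c,N à N,alpha,USE
alpha à d,N à N,alpha,LOCATION
beta à a,N à N,beta,USE
alpha-e,NN,alpha,USE
"""


def filter_compounds(rows, pattern, relation, min_n1):
    """Keep compounds of `pattern` with `relation` whose N1 recurs >= min_n1 times.

    N1 occurrences are counted within the chosen pattern only (an assumption).
    """
    in_pattern = [r for r in rows if r["pattern"] == pattern]
    n1_counts = Counter(r["n1"] for r in in_pattern)
    return [
        r["compound"]
        for r in in_pattern
        if r["relation"] == relation and n1_counts[r["n1"]] >= min_n1
    ]


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
selected = filter_compounds(rows, "N à N", "USE", 4)
print(selected)
```

To run the same query on the real file, the in-memory string would be replaced by an opened file handle, with the column names adjusted accordingly.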

8.4 Future Perspectives

Although the features and factors discussed in this work were meant to extend previous models

of compound transparency without introducing a large number of variables, there are

nevertheless other aspects of compounding that merit exploration. Some of these additions

involve refining existing semantic properties, namely the treatment of tropes, while others

consist of incorporating new features into the typology, such as the conceptual classes of a

compound’s constituents. Not all additions, however, are semantic in nature. There has also

been a great deal of research on more quantitative factors in compound processing, such as

overall and relative frequency, as well as morphological family size. What follows are brief

descriptions of how future work on transparency could benefit from these areas of research.

8.4.1 Sense Extension

Although sense extension was incorporated into the typology proposed in this work, its

implementation is somewhat rudimentary. To wit, when a trope is present in either constituent,

the compound is classified as weak regardless of the trope’s type (metaphor or metonymy) or its

relative complexity. As was briefly touched upon in Chapter 4, however, tropes may operate

together at several different levels. According to Benczes’s (2006, 2010) study of exocentric

compounds in English, metonymy and metaphor may be present individually or together and

may combine into several different configurations. Thus, the modifier may involve a metonymy,

while the head makes use of a metaphor (e.g. firedog ‘iron support for burning logs in a

fireplace’), or both constituents might involve a metaphor (e.g. flame sandwich ‘note consisting

of a negative comment between two positive ones’). Benczes also shows that these tropes may

simultaneously apply to the relation held between constituents and the constituents themselves.

For instance, the compound bell-bottoms (‘trousers that are very wide at the bottom of the leg’)

rests on a relational metaphor of shape (i.e. bottoms shaped like a bell) and a

metonymy on the head constituent (i.e. bottoms for pants → PART FOR WHOLE). The fact that

several tropic configurations are possible suggests that compounds involving metonymy and

metaphor may present greater variation in their transparency, which could then be reflected at

several different levels of the typology.

Furthermore, sense extension itself may emerge in a variety of ways. In fact, the terms metaphor

and metonymy cover a wide range of cognitive associations, many of which present different

types of processing challenges. Radden and Kövecses (1999), in their study of metonymy, show

that the concept may be actualized in several different ways, including (but not limited to)

instances of MANNER FOR ACTION, CAUSE FOR EFFECT, and CONTAINER FOR CONTENTS. By

combining cognitive and communicative principles, Radden and Kövecses offer a hierarchy of

metonymic vehicles, which they state as follows: form > form-concept > concept > reality.

According to their framework, forms are preferred over concepts because human experience

better relates to concrete objects than it does to abstract concepts; concepts, on the other hand,

rank higher than reality because a person’s experience is necessarily subjective, which may not

always coincide with reality. Form-concepts, which are understood as signs (i.e. a word used to

refer to a real thing or event), occupy an intermediate space between these two realms. What

Radden and Kövecses show is that metonymy is a multi-layered phenomenon and that its

presence in language is a complex matter with clear consequences on how speakers both use and

process language. The same may be said of metaphor (Kövecses 2002).

The manner in which tropes were incorporated into the typology of semantic transparency

proposed in this work, while clearly a reasonable first step, undoubtedly represents only a partial

formalization of their role in the interpretation of compounds.

8.4.2 Conceptual Classes

When speakers are presented with a pair of nouns, they must establish how these items relate to

each other. Psycholinguistic research on concept combination is well-established and has shown

that speakers make use of the classes of the constituents involved (animal, plant, artefact, etc.;

Wisniewski and Love 1998), as well as the frequency of a particular relation for a given lexical

item (Gagné and Shoben 1997). Downing (1977) had already observed that the relation held

between a compound’s constituents was related to the semantic classes of its constituents. Thus,

compounds involving animals favour relations targeting appearance and habitat, while those

containing natural objects favour composition, origin, and location.

Recent work by Maguire et al. (2010) on a large corpus of English compounds used statistical

data to assess both semantic similarity of combined items and the frequency with which certain

concepts co-occur. In their first analysis, they found a strong correlation between the semantic

content of the constituents and their use in combinations. More specifically, the more

semantically similar two nouns were, the more likely they were to combine. This is explained

according to slot-filling theories, where the modifier fills a slot in the head; conceptually similar

nouns are more likely to allow for this operation to occur. Maguire et al. also conducted a

second test that looked at the conceptual classes of the constituents. They found that some

combinations were highly recurrent: the most frequent compounds are plant-plant combinations

(54%), followed by substance-substance (24%) and location-group (13%). More importantly,

they found a strong correlation between pairs of categories and the type of relation they

instantiated. For instance, a random sample of 100 substance-artefact combinations showed that

they involved a composition relation in 68% of cases; the same sampling for area-animal

combinations revealed an almost exclusive use of a locative relation (91% of cases).

When discussing the semantic reliability index in Chapter 4, I mentioned that relying on

templates based on a particular lexeme (e.g. papier-N, pompe à N, etc.) may only paint part of

the picture, and that templates might benefit from using semantic classes or categories instead of

specific lexemes. Ryder’s (1994) own work used both methods and showed that classes, when

sufficiently constrained, correlated with a speaker’s interpretation of novel compounds (e.g.

container + N). In Chapter 7, I suggested that one of the benefits of the SRI is that it implicitly

reveals the compatibility between not only individual constituents, but also their classes: the

more frequently a particular template makes use of a relation, the more likely it is that

constituents will share features or be compatible via other means. Given these observations,

along with the findings discussed above, it seems entirely justified to incorporate conceptual

classes into a model of semantic transparency. At present, we may assume that compounds

involving items of either similar or frequent conceptual classes possess a greater degree of

semantic transparency than those involving otherwise incompatible constituents. Future work

would require that these compatibilities be quantified and that any language-specific properties

be established.

8.4.3 Frequency

Little was said here about lexical frequency as a factor of transparency. The principal reason for

this omission is that frequency is not a semantic property, nor is it a property unique to

compounds (see Ford et al. 2010 or Hay 2003 for frequency effects in derivational morphology).

That said, substantial research has been conducted on the effect of item frequency on compound

processing, a factor some might wish to see included in a formal model of transparency.

Experimental data tend to show that there exists an inverse relationship between

the frequency of a lexical item and the time it takes speakers to process it, though results vary with

regard to constituent and whole-item frequency effects. Early research by Van Jaarsveld and

Rattink (1988) on processing novel and lexicalized Dutch compounds showed that lexical

decision times were generally only affected by the frequency of the compound and not by that of

its constituents, although some of their tests revealed that the frequency of the second noun (i.e.

the head), under certain conditions, could also affect response times. Just over twenty years

later, Baayen et al. (2010) found similar results for lexical decision tasks involving English

compounds: compound frequency is by far the strongest predictor of response time latencies,

with constituent frequencies seemingly having little effect. Interestingly, however, Baayen et al.,

arguing that traditional linear models of measurement are insufficient, re-examined the data

using a non-linear model, which showed that the effect of a compound’s frequency on reaction

times is in some part modulated by the modifier’s frequency when the compound itself is of low

frequency. These findings, along with other observations based on experiments in naming tasks

and eye-tracking records, led Baayen et al. to hypothesize that frequency effects in compound

processing form a complex system of interactions involving several factors.

A recent study based on user ratings, however, reveals that constituent frequency may in fact

play a significant role in how speakers interpret compounds. In Bell and Schäfer (2013),

participants were tasked with rating the perceived literality (i.e. compositionality) of a series of

compounds on a scale of 0 to 5. Bell and Schäfer then analyzed these ratings using frequency

data for their constituents and found that the literality judgements of the raters strongly

correlated with the frequency of either constituent. In other words, the more frequent a particular

constituent was, the higher the literality rating, and vice-versa. The results of their analysis

suggest that speakers view compounds as more literal if their constituents are highly familiar.

While compound and constituent frequency are no doubt relevant to models of lexical processing

(i.e. decomposition vs. full-access theories), I think that, conceptually, it should only be viewed

as a secondary component of semantic transparency given how the concept was defined in

Chapter 7.¹⁵⁹ The reasoning behind this position is that, while the more frequent a compound’s

constituents are, the easier it may be to recognize as a possible word (cf. lexical decision tasks),

frequency itself says nothing about the meaning of the item or how it relates to the meaning of

the whole construction. Of course, the only way to confirm this hypothesis is to conduct

¹⁵⁹ Crucially, the semantic transparency of a compound is based solely on its constituents (i.e. the speaker is unfamiliar with the compound itself).

experiments in which token frequency is accounted for within the proposed typology.

Regardless of the position held on the matter, were the hierarchy advanced here taken as a

framework with which to evaluate the likelihood that certain compounds would be stored and

accessed as single units, then frequency data would no doubt prove to be a necessary component

of such a model.

8.4.4 Family Size

Also related to some degree to frequency effects in lexical processing is morphological family

size, which is loosely defined as the number of complex words in which a particular item may

participate. Research has successfully shown that the family size of a constituent or affix has an

effect on how speakers process complex words. Schreuder and Baayen (1997), for instance,

reported that participants responded much faster in lexical decision tasks for simplex words with

large family sizes. They followed up this work in Bertram et al. (2000) and found that the family

size of affixes in Dutch affected participants’ reaction times, although significant semantic

effects were also observed (i.e. family size effects were only observed when semantically

unrelated family members were removed from the analysis). Subsequent work on compounds

has shown similar effects, but with nevertheless interesting differences. De Jong et al. (2002),

for instance, found a positional family frequency effect for both English and Dutch compounds,

which is to say that the frequency of a constituent in a particular position was a better predictor

of reaction times than the constituent’s family size. More recently, Juhasz and Berkowitz (2011)

again found evidence that family size has an effect on English compound recognition. In

particular, they found that participants responded faster to compounds when the first constituent

possessed a large family size. Furthermore, in a sentence reading experiment in which gaze

duration was measured, Juhasz and Berkowitz found that participants spent less time on

compounds containing constituents with large morphological families.

One will have no doubt noticed parallels between family size and the semantic homogeneity

feature discussed in Chapter 4. Although the templates discussed in that chapter were strictly

based on a single lexical item, the approach could easily be expanded to include family size.

What the findings in both Bertram et al. (2000) and De Jong et al. (2002) show, however, is that

the incorporation of family size into the typology should take into account constituent position,

as well as the semantic homogeneity of the family itself (i.e. the presence and number of

homonyms within the family).
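
A position-sensitive count of family size could be obtained along the following lines. This is a minimal sketch, assuming hyphenated NN compounds only; the function name is hypothetical, and the six compounds, all cited in this chapter, merely serve as a toy list, so the resulting counts say nothing about actual family sizes in French. Semantic homogeneity (e.g. homonymy within a family) is not modeled here.

```python
from collections import Counter

COMPOUNDS = [
    "café-filtre", "café-crème", "auto-mitrailleuse",
    "auto-école", "papier-bible", "radio-trottoir",
]


def positional_family_sizes(compounds):
    """Count, per constituent, the compounds it opens (left) or closes (right)."""
    left, right = Counter(), Counter()
    for compound in compounds:
        # Split on the first hyphen only, so that each compound yields two parts.
        first, second = compound.split("-", 1)
        left[first] += 1
        right[second] += 1
    return left, right


left_sizes, right_sizes = positional_family_sizes(COMPOUNDS)
print(left_sizes["café"], left_sizes["auto"], right_sizes["bible"])
```

Keeping the two counters separate reflects the observation that a constituent's position, and not only the overall size of its family, may matter for processing.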

8.4.5 Testing the Typology and Closing Remarks

Finally, the typology proposed here remains entirely theoretical until it is tested with speakers.

Although the features and factors discussed are all based on previous research on compounding,

including experimental work done in psycholinguistics, the resulting classification of

compounds cannot be said to be conclusive until it is confirmed with speakers. Chapter 2 looked

at some of the ways that a compound’s semantic transparency has been quantitatively assessed,

most of which involved lexical decision tasks measuring response times with or without priming

(Sandra 1990, Jarema et al. 1999, Libben et al. 2003). The assumption is that lower response

times indicate reduced processing costs, which may be interpreted as a higher degree of

semantic transparency. In this regard, such experiments could be used to test the validity of the

typology proposed here: if overall reaction times for compounds correspond to the different

levels of the typology, then we may assume that the order and weight of the retained features are

consistent with the compounds’ degree of transparency. Conversely, if no such correspondence

is found, then the typology would need to be modified according to the results. Caution,

however, should be taken regarding this approach, as lexical decision tasks are also sensitive to

factors largely unrelated to semantic transparency (e.g. word length).

Another issue that arises from this approach, however, is that the typology was created using

mostly established compounds, which makes it difficult to avoid introducing outside effects on

processing such as frequency or even word length (cf. Hudson and Bergman 1985). In other

words, many of the compounds that were analyzed will be familiar to speakers, which will result

in faster response times under most conditions. Moreover, as I briefly touched upon earlier

when discussing frequency effects, it is not entirely clear if response times are in fact a good

measure of transparency, at least not as it is defined here. A transparent compound is understood

here as a compound for which meaning is easily determined. It is a qualitative property. In this

regard, I believe that the best way to evaluate the correctness of the typology is to make use of

questionnaires in which speakers are asked a) if they are familiar with the compound, b) if they

are familiar with the constituents, c) if they already know the meaning of the compound, d) if

they know the meaning of the constituents, and e) what the meaning of the compound is. This is

similar to the work conducted by, among others, Gleitman and Gleitman (1971) and Ryder

(1994). We might also wish to offer speakers the opportunity to rate each compound’s degree of

“transparency” themselves, similar to Sandra (1994) and Zwitserlood (1994). In this way, we are

able to gather information regarding the speaker’s knowledge of the compounds, as well as the

meaning they attribute to them. Such an approach would therefore serve to either validate or invalidate

the typology proposed and ideally offer insight on how it could be improved upon in the future.

After all, while semantic transparency is arguably a fundamental property of compounds, it is

ultimately dependent on the speaker and his or her ability to make sense of the sometimes highly

ambiguous pairings that permeate the language.

References

Ackerman, Farrell and Philip LeSourd. 1997. Toward a lexical representation of phrasal predicates. In Complex Predicates, ed. Alex Alsina, Joan Bresnan, and Peter Sells, 67–106. Stanford University: Center for the Study of Language and Information.

Adams, Valerie. 1973. An introduction to modern English word-formation. London: Longman.

Adams, Valerie. 2001. Complex words in English. Harlow, England: Pearson Longman.

Allen, Margaret. 1978. Morphological investigations. Doctoral dissertation, University of Connecticut.

Amiot, Dany. 2005. Between compounding and derivation: Elements of word-formation corresponding to prepositions. In Morphology and its demarcations: selected papers from the 11th Morphology Meeting, ed. Wolfgang U. Dressler, Dieter Kastovsky, Oskar E. Pfeiffer, and Franz Rainer, 183–196. Amsterdam: John Benjamins Publishing Company.

Anderson, Stephen R. 1982. Where’s Morphology? Linguistic Inquiry 13(4): 571–612.

Andreevskaia, Alina and Sabine Bergler. 2006. Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In Proceedings of the 11th Conference of the European Association for Computational Linguistics, 209–216. Trento, Italy.

Anscombre, Jean-Claude. 1990. Pourquoi un moulin à vent n’est pas un ventilateur. Langue française 86: 103–125.

Anscombre, Jean-Claude. 1999. Le jeu de la prédication dans certains composés nominaux. Langue française 122: 52–69.

Apothéloz, Denis. 2002. La construction du lexique français: principes de morphologie dérivationnelle. Paris: Ophrys.

D’Arcais, Giovanni B. Flores. 1993. The Comprehension and Semantic Interpretation of Idioms. In Idioms: Processing, Structure, and Interpretation, ed. Cristina Cacciari and Patrizia Tabossi, 79–98. Hillsdale, NJ: Lawrence Erlbaum Associates.

Arcodia, Giorgio F., Nicola Grandi, and Bernhard Wälchli. 2010. Coordination in Compounding. In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 177–198. Amsterdam: John Benjamins Publishing Company.

Arnaud, Pierre J. L. 2003. Les composés Timbre-poste. Lyon: Presses Universitaires de Lyon.

Arnaud, Pierre J. L. 2004. Problématique du nom composé. In Le nom composé: données sur seize langues, ed. Pierre J. L. Arnaud, 329–353. Lyon: Presses Universitaires de Lyon.

Arnaud, Pierre J. L. 2008. Semantic Complexity in English [NN]N Compounds. Anglophonia 24: 7–21.

Arnaud, Pierre J. L., Emmanuel Ferragne, Diana M. Lewis, and François Maniez. 2008. Adjective + Noun sequences in attributive or NP-final positions: Observations on lexicalization. In Phraseology: an interdisciplinary perspective, ed. Sylviane Granger and Fanny Meunier, 111–125. Amsterdam: John Benjamins Publishing Company.

Aronoff, Mark. 1976. Word Formation in Generative Grammar. Cambridge, MA: MIT Press.

Aronoff, Mark. 2007. In The Beginning was the Word. Language 83(4): 803–830.

Aronoff, Mark and Kirsten Fudeman. 2005. What is Morphology? Malden, MA: Blackwell Publishing.

Baayen, R. Harald, Victor Kuperman, and Raymond Bertram. 2010. Frequency effects in compound processing. In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 257–270. Amsterdam: John Benjamins Publishing Company.

Baayen, R. Harald and Rochelle Lieber. 1996. Word frequency distributions and lexical semantics. Computers and the Humanities 30(4): 281–291.

Baker, Mark. 1985. The mirror principle and morphosyntactic explanation. Linguistic inquiry 16(3): 373–415.

Banerjee, Satanjeev and Ted Pedersen. 2002. An adapted Lesk algorithm for word sense disambiguation using WordNet. In Computational Linguistics and Intelligent Text Processing, ed. Alexander Gelbukh, 136–145. Berlin: Springer.

Barbaud, Philippe. 1971. L’ambiguïté structurale du composé binominal. Cahier de linguistique 1: 71–116.

Baron, Irène and Michael Herslund. 2001. Semantics of the verb HAVE. Typological Studies in Language 47: 85–98.

Baroni, Marco, Emiliano Guevara, and Vito Pirrelli. 2006. Sulla tipologia dei composti N+N in italiano: principi categoriali ed evidenza distribuzionale a confronto. In Atti del 40esimo Congresso della SLI, 21–23. Roma: Bulzoni. Cited in Baroni et al. 2007.

Baroni, Marco, Emiliano Guevara, and Vito Pirrelli. 2007. NN Compounds in Italian: Modelling Category Induction and Analogical Extension. Lingue e linguaggio 6(2): 263–290.

Bartning, Inge. 2001. Towards a typology of French NP de NP structures or how much possession is there in complex noun phrases with “de” in French. In Dimensions of possession, ed. Irène Baron, Michael Herslund, and Finn Sørensen, 147–167. Amsterdam: John Benjamins Publishing Company.

Bassac, Christian. 2006. A compositional treatment for English compounds. Research in Language 4: 133–153.

Bassac, Christian and Pierrette Bouillon. 2013. The telic relationship in compounds. In Advances in Generative Lexicon Theory, ed. James Pustejovsky, 109–126. Dordrecht: Springer.

Bates, Elizabeth and Brian MacWhinney. 1987. Competition, Variation, and Language Learning. In Mechanisms of Language Acquisition, ed. Brian MacWhinney, 157–194. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Bauer, Laurie. 1978. The Grammar of Nominal Compounding. Odense: Odense University Press.

Bauer, Laurie. 1983. English Word-Formation. Cambridge: Press Syndicate of the University of Cambridge.

Bauer, Laurie. 1998. When Is a Sequence of Two Nouns a Compound in English? English Language and Linguistics 2(1): 65–86.

Bauer, Laurie. 2001a. Compounding. In Language Typology and Language Universals, Vol. 1, ed. Martin Haspelmath, Ekkehard König, Wulf Oesterreicher, and Wolfgang Raible, 695–707. Berlin: Mouton de Gruyter.

Bauer, Laurie. 2001b. Morphological Productivity. Cambridge: Press Syndicate of the University of Cambridge.

Bauer, Laurie. 2003. Introducing Linguistic Morphology. 2nd ed. Washington, DC: Georgetown University Press.

Bauer, Laurie. 2004. A Glossary of Morphology. Edinburgh: Edinburgh University Press.

Bauer, Laurie. 2008a. Dvandva. Word Structure 1(1): 1–20.

Bauer, Laurie. 2008b. Les composés exocentriques de l’anglais. In La composition dans une perspective typologique, ed. Dany Amiot, 35–47. Arras: Artois Presses Université.

Bauer, Laurie. 2010. The typology of exocentric compounding. In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 167–175. Amsterdam: John Benjamins Publishing Company.

Bavoux, Claudine. 2008. Le français des dictionnaires: l’autre versant de la lexicographie française. Bruxelles: De Boeck.

Becker, Thomas. 1993. Back-formation, cross-formation, and “bracketing paradoxes” in paradigmatic morphology. Yearbook of morphology 6: 1–25.

Bell, Melanie J. and Martin Schäfer. 2013. Semantic transparency: challenges for distributional semantics. In Proceedings of the IWCS 2013 workshop: Towards a formal distributional semantics, ed. Aurelie Herbelot, Roberto Zamparelli, and Gemma Boleda, 1–10. Potsdam: Association for Computational Linguistics.

Benczes, Réka. 2005. Metaphor- and metonymy-based compounds in English: a cognitive linguistic approach. Acta Linguistica Hungarica 52(2): 173–198.

Benczes, Réka. 2006. Creative compounding in English: the semantics of metaphorical and metonymical noun-noun combinations. Amsterdam: John Benjamins Publishing Company.

Benczes, Réka. 2010. Setting limits on creativity in the production and use of metaphorical and metonymical compounds. In Cognitive Perspectives on Word Formation, ed. Alexander Onysko and Sascha Michel, 219–242. New York: Mouton de Gruyter.

Bertram, Raymond, R. Harald Baayen, and Robert Schreuder. 2000. Effects of family size for complex words. Journal of Memory and Language 42(3): 390–405.

Bisetto, Antonietta and Sergio Scalise. 2005. The classification of compounds. Lingue e linguaggio 4(2): 319–332.

Bloomfield, Leonard. 1933. Language. Chicago: The University of Chicago Press.

Bolinger, Dwight. 1975. Aspects of Language. 2nd ed. New York: Harcourt Brace Jovanovich.

Booij, Geert. 2007. The grammar of words: An introduction to linguistic morphology. 2nd ed. Oxford: Oxford University Press.

Booij, Geert. 2010. Compound construction: Schemas or analogy? A construction morphology perspective. In Cross-disciplinary issues in compounding, ed. Sergio Scalise and Irene Vogel, 93–108. Amsterdam: John Benjamins Publishing Company.

Borillo, Andrée. 1996. La relation partie-tout et la structure [N1 à N2] en français. Faits de langues 4(7): 111–120.

Bosredon, Bernard and Irène Tamba. 1991. Verre à pied, moule à gaufres: préposition et noms composés de sous-classe. Langue française 91(1): 40–55.

Botha, Rudolf P. 1984. Morphological Mechanisms: Lexicalist Analyses of Synthetic Compounding. 1st ed. New York: Pergamon Press.

Bresnan, Joan and Sam A. Mchombo. 1995. The lexical integrity principle: Evidence from Bantu. Natural Language & Linguistic Theory 13(2): 181–254.

Brousseau, Anne-Marie and Emmanuel Nikiema. 2001. Phonologie et morphologie du français. Saint-Laurent, Québec: Fides.

Burrow, Thomas. 1955. The Sanskrit language. London: Faber and Faber.

Butterworth, B. 1983. Lexical representation. Language production 2: 257–294.

Cadiot, Pierre. 1997. Les prépositions abstraites en français. Paris: Armand Colin.

Cajolet-Laganière, Hélène, Pierre Martel, and Chantal-Édith Masson. 2010. Le dictionnaire général du français de l’équipe FRANQUS: quelques aspects originaux de la description lexicographique. In XXVe CILPR Congrès International de Linguistique et de Philologie Romanes Innsbruck, ed. Maria Iliescu, Heidi Siller-Runggaldier and Paul Danler, 241–249. Berlin: Walter De Gruyter.

Cañas, Alberto J., Alejandro Valerio, Juan Lalinde-Pulido, Marco Carvalho, and Marco Arguedas. 2003. Using WordNet for word sense disambiguation to support concept map construction. In String Processing and Information Retrieval, ed. Mario A. Nascimento, Edleno S. de Moura, and Arlindo L. Oliveira, 350–359. Berlin: Springer-Verlag.

Caramazza, Alfonso, Alessandro Laudanna, and Cristina Romani. 1988. Lexical access and inflectional morphology. Cognition 28(3): 297–332.

Carstairs-McCarthy, Andrew. 2002. An introduction to English morphology: words and their structure. Edinburgh: Edinburgh University Press.

Cervoni, Jean. 1991. La préposition: étude sémantique et pragmatique. Paris: Duculot.

Chialant, Doriana and Alfonso Caramazza. 1995. Where is morphology and how is it processed? The case of written word recognition. In Morphological aspects of language processing, ed. Laurie Beth Feldman, 55–76. Hillsdale, NJ: L. Erlbaum Associates.

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, Noam. 1970. Remarks on Nominalization. In Readings in English Transformational Grammar, ed. Roderick A. Jacobs and Peter S. Rosenbaum, 184–221. Washington, D.C.: Georgetown University Press.

Chomsky, Noam. 1995. The Minimalist Program. Cambridge, MA: MIT Press.

Chomsky, Noam and Morris Halle. 1968. The Sound pattern of English. New York: Harper & Row.

Clark, Eve V. and Ruth A. Berman. 1987. Types of linguistic knowledge: Interpreting and producing compound nouns. Journal of Child Language 14(3): 547–567.

Cohen, Benjamin and Gregory L. Murphy. 1984. Models of concepts. Cognitive Science 8(1): 27–58.

Copestake, Ann and Ted Briscoe. 1995. Semi-productive Polysemy and Sense Extension. Journal of Semantics 12: 15–67.

Corbin, Danielle. 1992. Hypothèses sur les frontières de la composition nominale. Cahiers de grammaire 17: 26–55.

Corbin, Danielle. 1997. Locutions, composés, unités polylexématiques: lexicalisation et mode de construction. In La locution: entre langue et usages, ed. Michel Martins-Baltar and Blanche-Noëlle Grunig, 53–101. Fontenay-aux-Roses: ENS Éditions.

Costello, Fintan J., Tony Veale, and Simon Dunne. 2006. Using WordNet to automatically deduce relations between words in noun-noun compounds. In Proceedings of the COLING/ACL on Main conference poster sessions, 160–167. Morristown, NJ: Association for Computational Linguistics.

Costello, Fintan J. and Mark T. Keane. 2000. Efficient creativity: Constraint-guided conceptual combination. Cognitive Science 24(2): 299–349.

Cruse, D.A. 1986. Lexical Semantics. Cambridge: Cambridge University Press.

Dal, Georgette and Dany Amiot. 2008. La composition néoclassique en français et ordre des constituants. In La composition dans une perspective typologique, ed. Dany Amiot, 89–113. Arras: Artois Presses Université.

Darmesteter, Arsène. 1874. Traité de la formation des mots composés dans la langue française comparée aux autres langues romanes et au latin. Paris: A. Franck.

Derwing, Bruce L. and Royal Skousen. 1989. Morphology in the mental lexicon: A new look at analogy. In Yearbook of morphology 2, ed. Geert Booij and Jaap van Marle, 55–71. Dordrecht: Foris.

Dirven, René and Marjolyn Verspoor. 2004. Cognitive exploration of language and linguistics. 2nd rev. ed. Amsterdam: John Benjamins Publishing Company.

Dohmes, Petra, Pienie Zwitserlood, and Jens Bölte. 2004. The impact of semantic transparency of morphologically complex words on picture naming. Brain and language 90: 203–212.

Downing, Pamela. 1977. On the creation and use of English compound nouns. Language 53(4): 810–842.

Dressler, Wolfgang U. 1985. On the predictiveness of natural morphology. Journal of Linguistics 21(2): 321–339.

Estes, Zachary and Sam Glucksberg. 2000. Interactive property attribution in concept combination. Memory & Cognition 28(1): 28–34.

Estes, Zachary and Lara L. Jones. 2006. Priming via relational similarity: A copper horse is faster when seen through a glass eye. Journal of Memory and Language 55(1): 89–101.

Fabb, Nigel. 1998. Compounding. In The handbook of morphology, ed. Andrew Spencer and Arnold M. Zwicky, 66–83. Oxford: Blackwell Publishers.

Feldman, Laurie Beth and Matthew John Pastizzo. 2003. Morphological facilitation: The role of semantic transparency and family size. In Morphological structure in language processing, ed. R. Harald Baayen and Robert Schreuder, 233–258. Berlin: Mouton de Gruyter.

Feldman, Laurie B., Emily G. Soltano, Matthew J. Pastizzo, and Sarah E. Francis. 2004. What do graded effects of semantic transparency reveal about morphological processing? Brain and Language 90: 17–30.

Fellbaum, Christiane, ed. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Fonagy, Ivan. 1975. La structure sémantique des constructions possessives. In Langue, discours, société: pour Emile Benveniste, ed. Julia Kristeva, Jean-Claude Milner, and Nicolas Ruwet, 44–84. Paris: Éditions du Seuil.

Ford, M.A., M.H. Davis and W.D. Marslen-Wilson. 2010. Derivational morphology and base morpheme frequency. Journal of Memory and Language 63(1): 117–130.

Fradin, Bernard. 2003. Nouvelles approches en morphologie. Paris: Presses universitaires de France.

Fradin, Bernard. 2009. IE, Romance: French. In Oxford Handbook on Compounding, ed. Rochelle Lieber and Pavol Štekauer, 417–435. Oxford: Oxford University Press.

Frauenfelder, Uli H. and Robert Schreuder. 1992. Constraining psycholinguistic models of morphological processing and representation: The role of productivity. In Yearbook of morphology 1991, ed. Geert Booij and Jaap van Marle, 165–183. Dordrecht: Springer Netherlands.

Frisson, Steven, Elizabeth Niswander-Klement, and Alexander Pollatsek. 2008. The role of semantic transparency in the processing of English compound words. British Journal of Psychology 99(1): 87–107.

Gaeta, Livio and Davide Ricca. 2009. Composita solvantur: Compounds as lexical units or morphological objects. Italian Journal of Linguistics 21(1): 35–70.

Gagné, Christina L. 2001. Relation and lexical priming during the interpretation of noun-noun combinations. Journal of Experimental Psychology: Learning, Memory, and Cognition 27(1): 236–254.

Gagné, Christina L. 2002. Lexical and Relational Influences on the Processing of Novel Compounds. Brain and Language 81: 723–735.

Gagné, Christina L. and Edward J. Shoben. 1997. Influence of Thematic Relations on the Comprehension of Modifier-Noun Combinations. Journal of Experimental Psychology: Learning, Memory, and Cognition 23(1): 71–87.

Gagné, Christina L. and Edward J. Shoben. 2002. Priming relations in ambiguous noun-noun combinations. Memory & Cognition 30(4): 637–646.

Gagné, Christina L. and Thomas L. Spalding. 2007. Conceptual Combination: Implications for the mental lexicon. In The representation and processing of compound words, Vol. 1, ed. Gary Libben and Gonia Jarema, 145–169. Oxford: Oxford University Press.

Gagné, Christina L. and Thomas L. Spalding. 2009. Constituent integration during the processing of compound words: Does it involve the use of relational structures? Journal of Memory and Language 60: 20–35.

Gagné, Christina L., Thomas L. Spalding, and Melissa C. Gorrie. 2005. Sentential context and the interpretation of familiar open-compounds and novel modifier-noun phrases. Language and Speech 48(2): 203–219.

Geeraerts, Dirk. 2002. The interaction of metaphor and metonymy in composite expressions. In Metaphor and metonymy in comparison and contrast, ed. René Dirven and Ralf Pörings, 435–465. Berlin: Mouton de Gruyter.

Gibbs, Raymond, Nandini Nayak, and Cooper Cutting. 1989. How to kick the bucket and not decompose: Analyzability and idiom processing. Journal of memory and language 28(5): 576–593.

Girju, Roxana, Adriana Badulescu, and Dan Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, 1–8. Association for Computational Linguistics.

Girju, Roxana, Dan Moldovan, Marta Tatu, and Daniel Antohe. 2005. On the semantics of noun compounds. Computer Speech & Language 19(4): 479–496.

Girju, Roxana, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. 2007. SemEval-2007 Task 04: Classification of semantic relations between nominals. In Proceedings of the 4th International Workshop on Semantic Evaluations, 13–18. Association for Computational Linguistics.

Giurescu, Anca. 1975. Les mots composés dans les langues romanes. The Hague: Mouton.

Gleitman, Lila R. and Henry Gleitman. 1971. Phrase and Paraphrase: Some Innovative Uses of Language. New York: Norton.

Glucksberg, Sam. 1993. Idiom meanings and allusional content. In Idioms: Processing, structure, and interpretation, ed. Cristina Cacciari and Patrizia Tabossi, 3–26. Hillsdale, NJ: L. Erlbaum Associates.

Goethem, Kristel Van. 2009. Choosing between A+N compounds and lexicalized A+N phrases: The position of French in comparison to Germanic languages. Word Structure 2(2): 241–253.

Goossens, Louis. 1995. Metaphtonymy: The interaction of metaphor and metonymy in figurative expressions for linguistic action. Pragmatics & beyond. New series 33: 159–174.

Grandy, Richard E. 1990. Understanding and the Principle of Compositionality. Philosophical Perspectives 4: 557–572.

Granger, Sylviane and Magali Paquot. 2008. Disentangling the Phraseological Web. In Phraseology. An Interdisciplinary Perspective, ed. Sylviane Granger and Fanny Meunier, 27–49. Amsterdam: John Benjamins Publishing Company.

Grevisse, Maurice and André Goosse. 2011. Le bon usage: grammaire française. 15th ed. Bruxelles: De Boeck-Duculot.

Grice, Paul. 1975. Logic and conversation. Syntax and semantics 3: 41–58.

Gries, Stefan Th. 2008. Phraseology and Linguistic Theory: A Brief Survey. In Phraseology. An Interdisciplinary Perspective, ed. Sylviane Granger and Fanny Meunier, 3–25. Amsterdam: John Benjamins Publishing Company.

Gross, Gaston. 1988. Degré de figement des noms composés. Langages 90: 57–72.

Gross, Gaston. 1996. Les expressions figées en français. Paris: Ophrys.

Guevara, Emiliano, Sergio Scalise, Antonietta Bisetto, and Chiara Melloni. 2006. Morbo/Comp: a multilingual database of compound words. Paper presented at the LREC 2006, 5th Conference on Language Resources and Evaluation, Genoa, Italy.

Ten Hacken, Pius. 1994. Defining morphology. Zurich: Georg Olms Verlag.

Ten Hacken, Pius. 1999. Motivated tests for compounding. Acta linguistica hafniensia 31(1): 27–58.

Hale, Kenneth L. and Samuel Jay Keyser. 2002. Prolegomenon to a theory of argument structure. Cambridge, MA: MIT Press.

Halle, Morris. 1973. Prolegomena to a theory of word formation. Linguistic inquiry 4(1): 3–16.

Halle, Morris and Alec Marantz. 1994. Distributed morphology and the pieces of inflection. In The view from building 20, ed. Kenneth Hale and Samuel Jay Keyser, 111–176. Cambridge, MA: MIT Press.

Hamel, Marie-Josée. 2010. Prototype d’un dictionnaire électronique de reformulation pour apprenants avancés de français langue seconde. Cahier de l’APLIUT 29(1): 73–82.

Hatcher, Anna Granville. 1960. An Introduction to the Analysis of English Noun Compounds. Word 16: 356–373.

Hay, Jennifer. 2003. Causes and consequences of word structure. New York: Routledge.

Herbst, Thomas. 1996. What are collocations: Sandy beaches or false teeth? English Studies 77(4): 379–393.

Hudson, Patrick T. W. and Marijke W. Bergman. 1985. Lexical knowledge in word recognition: Word length and word frequency in naming and lexical decision tasks. Journal of Memory and Language 24(1): 46–58.

Van Jaarsveld, Henk J. and Gilbert E. Rattink. 1988. Frequency effects in the processing of lexicalized and novel nominal compounds. Journal of Psycholinguistic Research 17(6): 447–473.

Van Jaarsveld, Henk J., Riet Coolen, and Robert Schreuder. 1994. The role of analogy in the interpretation of novel compounds. Journal of Psycholinguistic Research 23(2): 111–137.

Jackendoff, Ray. 1974. Morphological and semantic regularities in the lexicon. Bloomington, IN: Indiana University Linguistics Club.

Jackendoff, Ray. 1992. Semantic Structures. Cambridge, MA: MIT Press.

Jackendoff, Ray. 2009. Compounding in the parallel architecture and conceptual semantics. In Oxford Handbook of Compounding, ed. Rochelle Lieber and Pavol Štekauer, 105–128. Oxford: Oxford University Press.

Jackendoff, Ray. 2010. Meaning and the lexicon: the parallel architecture, 1975–2010. New York: Oxford University Press.

Jarema, Gonia, Céline Busson, Rossitza Nikolova, Kyrana Tsapkini, and Gary Libben. 1999. Processing Compounds: A Cross-Linguistic Study. Brain and Language 68(2): 362–369.

Jespersen, Otto. 1961. A modern English grammar on historical principles. Vol. VI. London: George Allen & Unwin. [1942].

Johnston, Michael and Frederica Busa. 1996. Qualia structure and the compositional interpretation of compounds. In Proceedings of the ACL SIGLEX workshop on breadth and depth of semantic lexicons, 77–88. Santa Cruz, California: Association for Computational Linguistics.

Jong, Nivja H. De, Laurie B. Feldman, Robert Schreuder, Matthew Pastizzo, and R. Harald Baayen. 2002. The processing and representation of Dutch and English compounds: Peripheral morphological and central orthographic effects. Brain and Language 81(1): 555–567.

Juhasz, Barbara J. and Rachel N. Berkowitz. 2011. Effects of morphological families on English compound word recognition: A multitask investigation. Language and Cognitive Processes 26(4–6): 653–682.

Kamp, Hans. 1975. Two theories about adjectives. In Formal semantics of natural language, ed. E. L. Keenan, 123–155. Cambridge, UK: Cambridge University Press.

Katz, Jerrold J. 1973. Compositionality, idiomaticity, and lexical substitution. In A festschrift for Morris Halle, ed. Morris Halle, Stephen R. Anderson, and Paul Kiparsky, 357–376. New York: Holt, Rinehart, and Winston.

Kehayia, Eva, Gonia Jarema, Kyrana Tsapkini, Danuta Perlak, Angela Ralli, and Danuta Kadzielawa. 1999. The Role of Morphological Structure in the Processing of Compounds: The Interface between Linguistics and Psycholinguistics. Brain and Language 68(1–2): 370–377.

Kim, Su Nam and Timothy Baldwin. 2005. Automatic interpretation of noun compounds using WordNet similarity. In Natural Language Processing–IJCNLP 2005, ed. Robert Dale, 945–956. Berlin: Springer.

Knittel, Marie-Laurence. 2009. Le statut des compléments du nom en [de NP]. The Canadian Journal of Linguistics 54(2): 255–290.

Knittel, Marie-Laurence. 2010. Modification et détermination dans les expressions N à N en français. Inria, ATILF. http://hal.archives-ouvertes.fr/docs/00/53/25/80/PDF/Knittel_N_A_N_.pdf

Kopecka, Anetta. 2006. The semantic structure of motion verbs in French. In Space in languages: Linguistic systems and cognitive categories, ed. Maya Hickmann and Stéphane Robert, 83–101. Amsterdam: John Benjamins Publishing Company.

Kövecses, Zoltan. 2002. Metaphor: A Practical Introduction. New York: Oxford University Press.

Lakoff, George and Mark Johnson. 1980. Metaphors we live by. Chicago: University of Chicago Press.

Langacker, Ronald W. 1987. Foundations of Cognitive Grammar: Theoretical prerequisites. California: Stanford University Press.

Langacker, Ronald W. 2009. Metonymic grammar. In Metonymy and metaphor in grammar, vol. 25, ed. Klaus-Uwe Panther, Linda L. Thornburg, and Antonio Barcelona, 45–71. Amsterdam: John Benjamins Publishing Company.

Lapointe, Steven. 1980. Lexical Analysis of the English Auxiliary Verb System. In Lexical Grammar, ed. Teun Hoekstra, Harry van der Hulst, and Michael Moortgat, 215–254. Dordrecht: Foris Publications.

Lauer, Mark. 1995. Designing statistical language learners: Experiments on noun compounds. Doctoral dissertation, Macquarie University.

Lees, Robert B. 1960. The grammar of English nominalizations. 5th ed. 1968. Bloomington, IN: Mouton.

Leonard, Rosemary. 1984. The interpretation of English noun sequences on the computer. Amsterdam: Elsevier Science Publications Company.

Lesselingue, Chrystèle. 2003. Les noms composés [NN] N “holonymiques”: illustration de la spécificité sémantique des unités construites morphologiquement. In Silexicales 3. Les unités morphologiques, vol. 3. Silexicales, ed. Bernard Fradin, Georgette Dal, Nabil Hathout, Françoise Kerleroux, Marc Plénat, and Michel Roché, 100–107. Villeneuve-d’Ascq: U.M.R. SILEX.

Levi, Judith N. 1974. On the alleged idiosyncrasy of non-predicate NP’s. In Papers from the Tenth Regional Meeting, Chicago Linguistic Society, ed. Michael W. La Galy, Robert A. Fox, and Anthony Bruck, 402–415. Chicago, IL: Chicago Linguistic Society.

Levi, Judith N. 1978. The Syntax and Semantics of Complex Nominals. New York: Academic Press.

Li, Xiaobin, Stan Szpakowicz, and Stan Matwin. 1995. A WordNet-based algorithm for word sense disambiguation. In Proceedings of the Fourteenth International Joint Conference On Artificial Intelligence, Vol. 2, ed. C. S. Mellish, 1368–1374. San Mateo, CA: Morgan Kaufmann Publishers.

Libben, Gary. 1998. Semantic Transparency in the Processing of Compounds: Consequences for Representation, Processing, and Impairment. Brain and Language 61(1): 30–44.

Libben, Gary. 2006. Why Study Compound Processing? An Overview of the Issues. In The Representation and Processing of Compound Words, ed. Gary Libben and Gonia Jarema, 1–22. Oxford: Oxford University Press.

Libben, Gary, Martha Gibson, Yeo Bom Yoon, and Dominiek Sandra. 2003. Compound Fracture: The Role of Semantic Transparency and Morphological Headedness. Brain and Language 84(1): 50–64.

Libben, Maya R. and Debra A. Titone. 2008. The multidetermined nature of idiom processing. Memory & cognition 36(6): 1103–1121.

Lieber, Rochelle. 1980. On the organization of the lexicon. Bloomington: Indiana University Linguistics Club.

Lieber, Rochelle. 1989. On percolation. In Yearbook of Morphology 2, ed. Geert Booij and Jaap van Marle, 95–138. Dordrecht: Foris.

Lieber, Rochelle. 1992. Deconstructing Morphology. Chicago: University of Chicago Press.

Lieber, Rochelle. 2004. Morphology and Lexical Semantics. Cambridge: Cambridge University Press.

Lieber, Rochelle. 2009. A Lexical Semantic Approach to Compounding. In The Oxford Handbook of Compounding, ed. Rochelle Lieber and Pavol Štekauer, 78–104. Oxford: Oxford University Press.

Lieber, Rochelle. 2010. Introducing morphology. Cambridge, UK: Cambridge University Press.

Lieber, Rochelle and Sergio Scalise. 2007. The Lexical Integrity Hypothesis in a new theoretical universe. Lingue e linguaggio 1: 7–32.

Lieber, Rochelle and Pavol Štekauer, eds. 2009. The Oxford Handbook of Compounding. Oxford: Oxford University Press.

Maguire, Phil, Edward J. Wisniewski, and Gert Storms. 2010. A corpus study of semantic patterns in compounding. Corpus Linguistics and Linguistic Theory 6(1): 49–73.

Manning, Christopher D. and Hinrich Schütze. 1999. Foundations of statistical natural language processing. Cambridge, MA: MIT press.

Marchand, Hans. 1960. The categories and types of present-day English word-formation. 2nd ed. 1969. München: Beck.

Marchand, Hans. 1965. The analysis of verbal nexus substantives. Indogermanische Forschungen 70(2): 117–145.

Marslen-Wilson, William, Lorraine K. Tyler, Rachelle Waksler, and Lianne Older. 1994. Morphology and meaning in the English mental lexicon. Psychological Review 101(1): 3–33.

Martin, Robert. 1997. Sur les facteurs du figement lexical. In La locution: entre langue et usages, ed. Michel Martins-Baltar and Blanche-Noëlle Grunig, 291–305. Fontenay-aux-Roses: ENS Éditions.

Mathieu-Colas, Michel. 1994. Les mots à trait d’union: problèmes de lexicographie informatique. Paris: Didier Erudition.

Mathieu-Colas, Michel. 1995. Un Dictionnaire électronique des mots à trait d’union. Langue française 108: 76–85.

Mathieu-Colas, Michel. 1996. Essai de typologie des noms composés français. Cahiers de Lexicologie 69: 71–125.

Mel’čuk, Igor, André Clas, and Alain Polguère. 1995. Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve: Duculot.

Melis, Ludo. 2003. La préposition en français. Paris: Ophrys.

Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju. 2004. Models for the semantic classification of noun phrases. In CLS ’04 Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics, 60–67. Stroudsburg, PA: Association for Computational Linguistics.

Monsell, Stephen. 1985. Repetition and the lexicon. Progress in the psychology of language 2: 147–195.

Müller, Christof and Iryna Gurevych. 2009. Using wikipedia and wiktionary in domain-specific information retrieval. In Evaluating Systems for Multilingual and Multimodal Information Access: Ninth Workshop of the Cross-Language Evaluation Forum, ed. Carol Peters, 219–226. Berlin: Springer.

Murphy, Gregory L. 1988. Comprehending complex concepts. Cognitive Science 12(4): 529–562.

Navarro, Emmanuel, Franck Sajous, Bruno Gaume, Laurent Prévot, Hsieh ShuKai, Kuo Tzu-Yi, Pierre Magistry, and Huang Chu-Ren. 2009. Wiktionary and NLP: Improving synonymy networks. In Proceedings of the 2009 Workshop on The People’s Web Meets NLP, ed. Iryna Gurevych and Torsten Zesch, 19–27. Morristown, NJ: Association for Computational Linguistics.

Noailly, Michèle. 1990. Le substantif épithète. Paris: Presses universitaires de France.

Nunberg, Geoffrey. 1979. The non-uniqueness of semantic solutions: Polysemy. Linguistics and Philosophy 2(3): 145–184.

Nunberg, Geoffrey. 1995. Transfers of meaning. Journal of semantics 12(2): 109–132.

Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language 70(3): 491–538.

Olsen, Susan. 2001. Copulative compounds: a closer look at the interface between syntax and morphology. In Yearbook of morphology 2000, ed. Geert Booij and Jaap van Marle, 279–320. Dordrecht: Springer Netherlands.

Partee, Barbara Hall, Alice G. B. ter Meulen, and Robert Eugene Wall. 1990. Mathematical methods in linguistics. Dordrecht: Kluwer Academic.

Pham, Hien and R. Harald Baayen. 2013. Semantic relations and compound transparency: A regression study in CARIN theory. Psihologija 46(4): 455–478.

Polguère, Alain. 2003. Lexicologie et sémantique lexicale. Montréal: Les Presses de l’Université de Montréal.

Pollatsek, Alexander and Jukka Hyönä. 2005. The role of semantic transparency in the processing of Finnish compound words. Language and Cognitive Processes 20(1): 261–290.

Pustejovsky, James. 1995. The Generative Lexicon. Cambridge, MA: MIT Press.

Radden, Günter and Zoltán Kövecses. 1999. Towards a theory of metonymy. In Metonymy in language and thought, ed. Klaus-Uwe Panther and Günter Radden, 17–60. Amsterdam: John Benjamins Publishing Company.

Rainer, Franz and Soledad Varela. 1992. Compounding in Spanish. Rivista di linguistica 4(1): 117–142.

Rastle, Kathleen and Marjolein Merkx. 2011. Semantic constraints on morphological processing. In Lexical Representation: A Multidisciplinary Approach, ed. M. Gareth Gaskell and Pienie Zwitserlood, 13–31. Berlin: De Gruyter Mouton.

Riegel, Martin. 1988. Les séquences composées N1-N2: une catégorie floue. Studia romanica posnaniensia 13: 129–138.

Riegel, Martin. 1991. Ces noms dits “composés” : arguments et critères. Studia Romanica Posnaniensia 16: 149–161.

Riegel, Martin. 2001. The grammatical category “Possession” and the part-whole relation in French. In Dimensions of Possession, ed. Irène Baron, Michael Herslund, and Finn Sørensen, 187–200. Amsterdam: John Benjamins Publishing Company.

Roelofs, Ardi and Harald Baayen. 2002. Morphology by itself in planning the production of spoken words. Psychonomic Bulletin & Review 9(1): 132–138.

Roeper, Thomas and Muffy E. A. Siegel. 1978. A lexical transformation for verbal compounds. Linguistic Inquiry 9(2): 199–260.

Rosario, Barbara and Marti Hearst. 2001. Classifying the semantic relations in noun compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, ed. Lillian Lee, 82–90. Association for Computational Linguistics.

Rosenberg, Maria. 2007. Classification, Headedness and Pluralization: Corpus Evidence from French Compounds. Acta Linguistica Hungarica 54(3): 341–360.

Rosenberg, Maria. 2008. La formation agentive en français. Les composés [VN/A/Adv/P]N/A et les dérivés V-ant, V-eur et V-oir(e). Doctoral dissertation, Stockholm University.

Rosenberg, Maria. 2011. Les composés français VN – aspects sémantiques. Revue Romane 46(1): 69–88.

Roussarie, Laurent and Florence Villoing. 2003. Some semantic investigations on the French VN construction. In Proceedings of the Second International Workshop on Generative Approaches to the Lexicon, 1–8. Geneva.

Rubin, Gary S., Curtis A. Becker, and Roger H. Freeman. 1979. Morphological structure and its effect on visual word recognition. Journal of Verbal Learning and Verbal Behavior 18(6): 757–767.

Ryder, Mary Ellen. 1994. Ordered chaos. Berkeley: University of California Press.

Saint-Dizier, Patrick. 2006. Introduction to the Syntax and Semantics of Prepositions. In Syntax and Semantics of Prepositions, vol. 29, ed. Patrick Saint-Dizier, 1–25. Dordrecht: Springer.

Sandra, Dominiek. 1990. On the representation and processing of compound words: Automatic access to constituent morphemes does not occur. The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology 42(3): 529–567.

Scalise, Sergio. 1984. Generative morphology. Dordrecht: Foris Publications.

Scalise, Sergio. 1992. Compounding in Italian. Rivista di linguistica 4(1): 175–200.

Scalise, Sergio and Antonietta Bisetto. 2009. The Classification of Compounds. In The Oxford Handbook of Compounding, ed. Rochelle Lieber and Pavol Štekauer, 34–53. Oxford: Oxford University Press.

Scalise, Sergio and Antonio Fábregas. 2010. The head in compounding. In Cross-disciplinary issues in compounding, ed. Sergio Scalise and Irene Vogel, 109–126. Amsterdam: John Benjamins Publishing Company.

Scalise, Sergio and Emiliano Guevara. 2006. Exocentric compounding in a typological framework. Lingue e linguaggio 2: 185–206.

Scalise, Sergio and Irene Vogel. 2010. Why Compounding? In Cross-Disciplinary Issues in Compounding, ed. Sergio Scalise and Irene Vogel, 1–20. Amsterdam: John Benjamins Publishing Company.

Schreuder, Robert and R. Harald Baayen. 1995. Modeling Morphological Processing. In Morphological aspects of language processing, ed. Laurie Beth Feldman, 131–156. Hillsdale, NJ: Lawrence Erlbaum Associates.

Schreuder, Robert and R. Harald Baayen. 1997. How Complex Simplex Words Can Be. Journal of Memory and Language 37(1): 118–139.

Schröder, Anne. 2011. On the productivity of verbal prefixation in English: synchronic and diachronic perspectives. Tübingen: Narr.

Di Sciullo, Anna-Maria and Edwin Williams. 1987. On the Definition of Word. Cambridge, MA: MIT Press.

Séaghdha, Diarmuid. 2008. Learning compound noun semantics. Doctoral dissertation, University of Cambridge.

Selkirk, Elisabeth. 1982. The Syntax of Words. Cambridge, MA: MIT Press.

Shoben, Edward J. 1991. Predicating and nonpredicating combinations. In Psychology of Word Meanings, ed. Paula J. Schwanenflugel, 117–135. Hillsdale, NJ: Erlbaum.

Skousen, Royal. 1989. Analogical modeling of language. Dordrecht: Kluwer Academic Publishers.

Spalding, Thomas L. and Christina L. Gagné. 2007. Semantic property activation during the interpretation of combined concepts. The Mental Lexicon 2(1): 25–47.

Spang-Hanssen, Ebbe. 1963. Les prépositions incolores du français moderne. Copenhague: G.E.C. Gad.

Štekauer, Pavol. 1998. An Onomasiological Theory of English Word-formation. Amsterdam: John Benjamins Publishing.

Štekauer, Pavol. 2005. Meaning Predictability in Word Formation: Novel, context-free naming units. Amsterdam: John Benjamins Publishing Company.

Storms, Gert and Edward J. Wisniewski. 2005. Does the order of head noun and modifier explain response times in conceptual combination? Memory & Cognition 33(5): 852–861.

Svensson, Maria Helena. 2004. Critères de figement: l’identification des expressions figées en français contemporain. Doctoral dissertation, Umeå University.

Svensson, Maria Helena. 2008. A Very Complex Criterion of Fixedness: Non-Compositionality. In Phraseology. An Interdisciplinary Perspective, ed. Sylviane Granger and Fanny Meunier, 81–93. Amsterdam: John Benjamins Publishing Company.

Tabossi, Patrizia, Rachele Fanari, and Kinou Wolf. 2008. Processing Idiomatic Expressions: Effects of Semantic Compositionality. Journal of Experimental Psychology: Learning, Memory, and Cognition 34(2): 313–327.

Taft, Marcus and Kenneth I. Forster. 1975. Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior 14(6): 638–647.

Taft, Marcus and Kenneth I. Forster. 1976. Lexical storage and retrieval of polymorphemic and polysyllabic words. Journal of Verbal Learning and Verbal Behavior 15(6): 607–620.

Taft, Marcus. 1981. Prefix stripping revisited. Journal of Verbal Learning and Verbal Behavior 20(3): 289–297.

Takada, Hareo. 2008. Le mot composé: étude contrastive de certains types de mots composés : japonais et français. Niigata: Niigata University, Graduate School of Modern Society and Culture.

Tan, Keng-Woei, Hyoil Han, and Ramez Elmasri. 2000. Web data cleansing and preparation for ontology extraction using WordNet. In Proceedings of the First International Conference on Information Systems Engineering, 11–18. Los Alamitos, CA: IEEE Computer Society.

Thompson, Sandra A. 1975. On the issue of productivity in the lexicon. Kritikon Litterarum 4: 332–349. Cited in Bauer 1983.

Tulloch, Sara. 1991. The Oxford Dictionary of New Words. Oxford and New York: Oxford University Press. Cited in Bauer 2001.

Tutin, Agnès and Francis Grossmann. 2001. Collocations régulières et irrégulières : esquisse de typologie du phénomène collocatif. Revue française de linguistique appliquée VII(1): 1–16.

Vanderwende, Lucy. 1994. Algorithm for automatic interpretation of noun sequences. In Proceedings of the 15th conference on Computational linguistics, vol. 2, 782–788.

Villoing, Florence. 2002. Les mots composés [VN]N/A du français: réflexions épistémologiques et propositions d’analyse. Doctoral dissertation, Université Paris X.

Villoing, Florence. 2003. Les mots composés VN du français: arguments en faveur d’une construction morphologique. Cahiers de grammaire 28: 183–196.

Villoing, Florence. 2009. Les mots composés VN. Aperçus de morphologie du français. 175–197.

Wälchli, Bernhard. 2005. Co-compounds and natural coordination. New York: Oxford University Press.

Warren, Beatrice. 1978. Semantic patterns of noun-noun compounds. Göteborg: Acta Universitatis Göthoburgensis.

Weiskopf, Daniel A. 2007. Compound Nominals, Context, And Compositionality. Synthese 156(1): 161–204.

Williams, Edwin. 1981. On the Notions “Lexically Related” and “Head of a Word.” Linguistic Inquiry 12(2): 245–274.

Wisniewski, Edward J. 1996. Construal and similarity in conceptual combination. Journal of Memory and Language 35: 434–453.

Wisniewski, Edward J. 1997. When concepts combine. Psychonomic Bulletin & Review 4(2): 167–183.

Wisniewski, Edward J. 1998. Property instantiation in conceptual combination. Memory & Cognition 26(6): 1330–1347.

Wisniewski, Edward J. and Emily J. Clancy. 2004. You don’t need a weatherman to know which way the wind blows: The role of discourse context in conceptual combination. Unpublished manuscript. Cited in Storms and Wisniewski 2005.

Wisniewski, Edward J. and Bradley C. Love. 1998. Relations versus Properties in Conceptual Combination. Journal of Memory and Language 38(2): 177–202.

Zesch, Torsten, Christof Müller, and Iryna Gurevych. 2008. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC ’08), 1646–1652. European Language Resources Association.

Zwanenburg, Wiecher. 1992. Compounding in French. Rivista di linguistica 4(1): 221–240.

Zwitserlood, Pienie. 1994. The role of semantic transparency in the processing and representation of Dutch compounds. Language and Cognitive Processes 9(3): 341–368.

Lexicographic Resources and Other References

Guillaumin, M., ed. 1839. Dictionnaire du commerce et des marchandises. Vol. 2. Paris: Guillaumin et Compagnie.

Jardins de France. 1846. Annales de la Société royale d’horticulture de Paris, et Journal spécial de l’état et des progrès du jardinage. Vol. 37. Paris: Société d’horticulture.

Lacroix, Eugène, ed. 1884. Études sur l’Exposition de 1878, annales et archives de l’industrie au XIXe siècle. Vol. 2. Paris: Librairie scientifique, industrielle et agricole.

Littré, Émile, ed. 1873. Dictionnaire de la langue française contenant: la nomenclature, la grammaire, la signification des mots, la partie historique, l’étymologie. 4 vols. Paris: Hachette.

Martin, Shannon E. and David A. Copeland. 2003. The Function of Newspapers in Society: A Global Perspective. Westport, CT: Praeger.

Nysten, Pierre Hubert, Émile Littré, and Charles Robin. 1858. Dictionnaire de médecine, de chirurgie, de pharmacie, des sciences accessoires et de l’art vétérinaire. Paris: J.B. Baillière et Fils.

Panckoucke, Charles-Louis-Fleury, ed. 1821. Dictionaire des sciences médicales. Vol. 56. Paris: C. L. F. Panckoucke.

Rey-Debove, Josette, and Alain Rey, eds. 2010. Le nouveau Petit Robert 2010: Dictionnaire alphabétique et analogique de la langue française. Digital Version. Bureau van Dijk.

De Roujoux, M. 1839. Histoire des Rois et des Ducs de Bretagne. Vol. 3. Paris: Duféy.

Rozier, François. 1793. Cours complet d’agriculture théorique, pratique, économique et de médecine rurale et vétérinaire: suivi d’une méthode pour étudier l’Agriculture par principes ou Dictionnaire universel d’agriculture. Paris: Librairie d’Éducation et des Sciences et Arts.

Tessier, Alexandre Henri, Auguste Denis Fougeroux de Bondaroy, André Thouin, Louis-Augustin-Guillaume Bosc, and Jacques Joseph Baudrillart. 1787. Encyclopédie méthodique: Agriculture. Vol. 1. Paris: Panckoucke.

Le Trésor de la Langue Française informatisé. ATILF. http://atilf.atilf.fr/

Wiktionary. The Wikimedia Foundation. http://www.wiktionary.org

Appendices

Appendix A – Sample SRI calculations for various N-X templates

See Section 7.3.1.1 for a discussion of how the following data were used.

| N1 | N2 | Relation | Paraphrase | # of Types | SRI |
|---|---|---|---|---|---|
| effet | retard, revenu, trame, chaîne, papillon, placébo, revenu | CAUSE-REV | H caused by M | 7 | 0.438 |
| effet | boomerang, balançoire, domino, dynamo, pyjama, rebond | SIMILARITY | H similar to M | 6 | 0.375 |
| effet | bœuf, monstre | Adjectival | H ADJ M | 2 | 0.125 |
| effet | tunnel (metaph) | PRODUCTION | H produces M | 1 | 0.063 |
| Total # of Types/Average SRI | | | | 16 | 0.352 |
| carte1 | soleil, vue, adresse | LOCATION-REV | H has M on/in it | 3 | 0.176 |
| carte1 | cadeau, index, lettre | FUNCTION | H functions as M | 3 | 0.176 |
| carte1 | mer, réponse | PURPOSE | H for (*) M | 2 | 0.118 |
| carte2 | météo, radar | TOPIC | H about M | 2 | 0.118 |
| carte1 | senior | USE-REV | H that M uses | 1 | 0.059 |
| carte3 | fille, mère | SIMILARITY | H similar to M | 2 | 0.118 |
| carte3 | tuner, mémoire | FUNCTION | H functions as M | 2 | 0.118 |
| carte3 | son, vidéo | PRODUCTION | H produces M | 2 | 0.118 |
| Total # of Types/Average SRI | | | | 17 | 0.135 |
| sauce | soja, tomate, arachide, câpres, feuilles, graine, gombo | SOURCE | H made from M | 7 | 0.500 |
| sauce | madère, moutarde | PART-REV | H that M is a part of | 2 | 0.143 |
| sauce | poivrade, carbonara, poulette | HYPERNYM | H that M is a type of | 3 | 0.214 |
| sauce | mousseline | SIMILARITY | H is similar to M | 1 | 0.071 |
| sauce | barbecue | PURPOSE | H for M | 1 | 0.071 |
| Total # of Types/Average SRI | | | | 14 | 0.327 |

| N1 | N2 | Relation | Paraphrase | # of Types | SRI |
|---|---|---|---|---|---|
| bateau | citerne, feu, pompe | PART-REV | H that M is a part of | 3 | 0.231 |
| bateau | pirate | USE-REV | H that M uses | 1 | 0.077 |
| bateau | lavoir, pousseur, bus, phare | FUNCTION | H that functions as M | 4 | 0.308 |
| bateau | mouche | --- | --- | 1 | 0.077 |
| bateau | pilote | PURPOSE | H for M | 1 | 0.077 |
| bateau | dragon, mère | SIMILARITY | H that is similar to M | 2 | 0.154 |
| bateau | école (N1) | PURPOSE | H for M | 1 | 0.077 |
| Total # of Types/Average SRI | | | | 13 | 0.195 |
| voiture | guérite, radio, couchettes, lits, poubelles | PART-REV | H of which M is a part | 5 | 0.385 |
| voiture | balai, pilote | FUNCTION | H that functions as M | 2 | 0.154 |
| voiture | bar, restaurant, salon, école | LOCATION-REV | H in which M is located | 4 | 0.308 |
| voiture | sport, poste | PURPOSE | H for M | 2 | 0.154 |
| Total # of Types/Average SRI | | | | 13 | 0.290 |
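The figures in Appendix A are consistent with a simple reading: the SRI of a relation is its share of the types attested for a given N1, and the "Average SRI" is the mean of those shares weighted by type count. The Python sketch below assumes that reading; the formal definition is the one given in Section 7.3.1.1 and is not restated here. The function name `sri_table` is hypothetical and serves only for illustration.

```python
from typing import Dict, Tuple


def sri_table(type_counts: Dict[str, int]) -> Tuple[int, Dict[str, float], float]:
    """Return (total types, per-relation SRI, type-weighted average SRI).

    Assumption (inferred from the Appendix A figures, not quoted from the
    thesis): SRI of a relation = its type count / total type count, and the
    average weights each relation's SRI by its own type count.
    """
    total = sum(type_counts.values())
    if total == 0:
        raise ValueError("at least one compound type is required")
    per_relation = {rel: n / total for rel, n in type_counts.items()}
    average = sum(n * per_relation[rel] for rel, n in type_counts.items()) / total
    return total, per_relation, average


# Type counts for N1 = effet, copied from the first table in Appendix A.
effet = {"CAUSE-REV": 7, "SIMILARITY": 6, "Adjectival": 2, "PRODUCTION": 1}
total, per_relation, average = sri_table(effet)
for relation, value in per_relation.items():
    print(f"{relation:<12} {value:.4f}")
print(f"total types = {total}, average SRI = {average:.4f}")
```

Under the same assumption, the type counts listed for carte, sauce, bateau and voiture give averages that round to the totals shown in the tables (0.135, 0.327, 0.195 and 0.290). Where a relation label occurs on more than one row, as PURPOSE does for bateau, the rows need distinct dictionary keys so that they are counted separately.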

Appendix B – Comparison of compound relations in the literature

Relations entirely enclosed within parentheses are implied by the author. Jespersen (1942) and Hatcher (1960) are not included in this table.

| Relation | Adams (1973) | Downing (1977) | Levi (1978) |
|---|---|---|---|
| Agentive (S-V) | Subject-Verb (6 sub-classes) | X | ACT-Subj |
| Complement (V-O) | Verb-Object (7 sub-classes) | X | ACT-Obj; Agent |
| Goal | X | X | |
| Attribute | X | X | X |
| Classification | Names | X | |
| Coordination | X | Half-Half | (BE) |
| Identity | Appositional (C) | X | BE |
| Hypernymy | Appositional (B1, B2) | X | BE |
| Resemblance | Resemblance (8 sub-classes); Form (B) | Comparison | (BE) |
| Similarity | Resemblance (B, D1) | Comparison | (BE) |
| Part-Whole | Associative (A1, C1) | Whole-Part; Part-Whole | HAVE |
| Composition | Composition (A1, A2, A3, C1) | Composition | MAKE2 |
| Possession | Associative (A2, A3, C2) | X | HAVE |
| Cause | Instrumental (B3, D2) | X | CAUSE |
| Make/Produce | X | Product | MAKE1 |
| Result | X | X | X |
| Source | Associative (C3); Instrumental (D3) | Source | FROM |
| Instrumental | Instrumental (13 sub-classes) | User | USE |
| Location | Locative (8 sub-classes) | Place | IN |
| Contents | Contents (C2) | (Place) | IN |
| Temporal | Locative (8 sub-classes) | Time | IN |
| Purpose | X | Purpose | FOR |
| Function | Appositional (A) | X | X |
| Manner | X | X | X |
| Means | Instrumental (D1) | X | X |
| Topic | X | X | ABOUT |
| Occupation | X | Occupation | X |
| Prevent/Protect | Instrumental (B1, B2) | X | (FOR) |
| Other | | | |

| Relation | Warren (1978) | Shoben (1991) | Vanderwende (1994) |
|---|---|---|---|
| Agentive (S-V) | X | X | Who/What? |
| Complement (V-O) | X | X | Whom/What? |
| Goal | Goal-Object | X | X |
| Attribute | X | X | X |
| Classification | Proper Names | X | X |
| Coordination | (Dvandva) | (is) | X |
| Identity | Copula: Attributive | is | (What kind of?) |
| Hypernymy | Copula: Subsumptive | is | What kind of? |
| Resemblance | Resemblance | X | X |
| Similarity | Resemblance; Copula: Like | X | X |
| Part-Whole | Whole-Part; Part-Whole | has | What is it a part of?/What are its parts? |
| Composition | Source-Result | made of | Made of what? |
| Possession | Possessor-Belonging | has | Whose? |
| Cause | Causer-Result | causes | What does it cause?/What causes it? |
| Make/Produce | (Causer-Result) | makes | X |
| Result | X | X | X |
| Source | Origin-Object; Source-Result | derived from | X |
| Instrumental | | uses/used by | How? |
| Location | Location-Object | located | Where? |
| Contents | Place-Object; (Purpose) | located | (Where?) |
| Temporal | Time-Object | X | When? |
| Purpose | Purpose | for | What for? |
| Function | X | X | X |
| Manner | X | X | X |
| Means | Motive Power-Result | X | X |
| Topic | X | about | X |
| Occupation | X | X | X |
| Prevent/Protect | X | X | X |
| Other | | | |

| Relation | Lauer 1995 | Rosario and Hearst 2001 | Arnaud 2003¹⁶⁰ |
|---|---|---|---|
| Agentive (S-V) | V-Subj | Subject | bn, if |
| Complement (V-O) | V-Obj | Object | bb |
| Goal | X | X | X |
| Attribute | X | Characteristic; Attribute; Property | ap |
| Classification | X | X | ar, ce, ay |
| Coordination | BE | X | X |
| Identity | BE | X | X |
| Hypernymy | BE | X | X |
| Resemblance | X | X | ja |
| Similarity | X | X | X |
| Part-Whole | OF | X | af, bv, bh, ka |
| Composition | OF | Material | aw, ca, ia, ib, ax |
| Possession | OF | X | dd |
| Cause | (WITH) | Cause | (ad) |
| Make/Produce | (FOR) | Produce | de, ba, an |
| Result | X | X | X |
| Source | FROM | X | ad, ie |
| Instrumental | WITH | Instrument | (aq) |
| Location | AT | Location | aa, da, bw, id, dc, bf, bz |
| Contents | (IN; AT) | X | ab, au |
| Temporal | IN; ON | Time; Frequency | cj, bj, by |
| Purpose | FOR | Purpose | al, cb |
| Function | X | X | X |
| Manner | X | X | X |
| Means | X | (Instrument) | aq |
| Topic | ABOUT | Topic | bx |
| Occupation | X | X | X |
| Prevent/Protect | X | X | ak, aj |
| Other | | Medicine: Procedure, Defect, Measure of, Inhibitor, etc. | Support: bi, db, ic |

¹⁶⁰ Arnaud assigns each semantic relation an arbitrary code consisting of a pair of letters. These codes are retained here in the interest of space, but may easily be cross-referenced using Table 22 (65–67) in Arnaud (2003).

| Relation | Moldovan et al. 2004/Girju et al. 2005 | Girju et al. 2007 | Séaghdha 2008 | Jackendoff 2010 |
|---|---|---|---|---|
| Agentive (S-V) | Agent | X | (ACTOR)/(INSTR) | X |
| Complement (V-O) | X | X | (ACTOR)/(INSTR) | Argument |
| Goal | Recipient | X | X | X |
| Attribute | Attribute-Holder | X | HAVE - Property | X |
| Classification | X | X | X | CLASSIFY |
| Coordination | X | X | X | BE |
| Identity | X | X | BE - Identity | X |
| Hypernymy | Is-A (Hypernymy) | X | (BE - Identity) | KIND-OF |
| Resemblance | X | X | (BE - Similarity) | SAME/SIMILAR |
| Similarity | X | X | BE - Similarity | X |
| Part-Whole | Part-Whole | Part-Whole | HAVE - Part-Whole | PART-OF |
| Composition | X | | BE - Substance/Form | COMPOSED-OF |
| Possession | Possession | X | HAVE - Possession | HAVE |
| Cause | Cause | Cause-Effect | (ACTOR)/(INSTR) | CAUSE |
| Make/Produce | Make/Produce | Product-Producer | (ACTOR)/(INSTR) | MAKE |
| Result | Result | X | X | X |
| Source | Source | Origin-Entity | X | MADE-OF |
| Instrumental | Instrument | Instrument-Agency | INSTRUMENT | X |
| Location | Location/Space | X | IN - Spatially located | BE-LOC |
| Contents | (Location/Space) | Content-Container | IN - Spatially located | (BE-LOC) |
| Temporal | Temporal | X | IN - Temporal | BE-LOC-TEMP |
| Purpose | Purpose | X | X | (Proper Function) |
| Function | X | X | X | SERVES-AS |
| Manner | Manner | X | X | X |
| Means | Means | X | X | X |
| Topic | Topic | Theme-Tool | ABOUT | X |
| Occupation | X | X | X | X |
| Prevent/Protect | X | X | X | PROTECT-FROM |
| Other | Kinship | | HAVE - Group member | |

Appendix C – Partial schema of Wiktionary’s database structure

Grey outlines indicate tables used to constitute dataset of French compounds as described in Chapter 3.

Appendix D – NN French compounds retained from Wiktionary

All 729 NN French Compounds retained from Wiktionary.

Compound Head Relation Tropes N1 SRI Val N2 SRI Val
abri-vent N1 purpose
abricot-pêche N1 resemblance
abricotier-pays N1 location
accordéon musette N1 similarity 0.250
action éclair N1 similarity Metaph N2
action reflet N1 similarity Metaph N2
adjudant-chef N1 adjective 1.000
adresse email N1 purpose
agora-phobie N2 cause-rev
aller-retour N1/N2 coordination
alphabet hindi N1 use-rev
aluminium-épidote N2 part-rev
âme sœur N1 similarity Metaph N2
ami-ami Exo ---
amiante-ciment N2 part-rev
ampli-syntoniseur N1/N2 coordination
anacardier cajou N1 production
analyste-programmeur N1/N2 coordination
âne-zèbre N1/N2 coordination
animal-garou N1/N2 coordination 0.990
année-homme Exo ---
année-lumière Exo --- 1.000
appareil photo N1 production
appui-main N1 purpose 1.000
appui-pied N1 purpose 1.000 0.250
appui-pot N1 purpose 1.000
appui-tête N1 purpose 1.000
arc-doubleau N1 hypernymy
argent-métal N1 hypernymy-rev
arginine-vasopressine N2 part-rev
arrêt maladie N1 cause-rev
arrêt-buffet N1 purpose
art-thérapie N2 use
article zéro N1 adjective 0.250
artiste plasticien N1/N2 coordination
assurance-chômage N1 purpose 0.333
assurance-emploi N1 purpose 0.667
assurance-vie N1 purpose 0.667
attaché-case N2 use-rev
aube-vigne N2 ---
auteur-compositeur N1/N2 coordination
auto-car N2 --- 0.250
auto-école N2 purpose Meton N1 0.250 0.833
auto-mitrailleuse N1 part-rev 0.250
auto-stop N2 argument 0.250
avantage choc N1 --- Metaph N2
avion-cargo N1 function
bâbord amure N1 location-rev

bain-douche N1/N2 coordination
bal musette N1 part-rev 0.500
baladeur radio N1/N2 coordination 0.250
balai-brosse N2 part
baleine tueuse N1 similarity
ballon-panier Exo location
ballon-sonde N1 function
banane plantain N1 hypernymy
bande mère N1 similarity Metaph N2 0.583
bande-annonce N1 function
bar-tabac N1 purpose
baryton-basse N1/N2 coordination
baryum-orthose N2 part-rev
bateau pilote N1 purpose 0.125 0.250
bateau pousseur N1 function 0.375
bateau-bus N1 function 0.375
bateau-dragon N1 resemblance 0.125
bateau-école N2 purpose Meton N1 0.125 0.833
bateau-lavoir N1 function 0.375
bateau-mère N1 similarity Metaph N2 0.125 0.583
bateau-mouche N1 --- 0.125 0.750
baume copalme N1 source
bébé éprouvette N1 location
bec-figue Exo --- Meton N1
belote contrée N1 ---
benne-kangourou N1 ---
beurre noisette N1 resemblance Meton N1/N2
biens-fonds N1/N2 coordination
bistro-brasserie N1/N2 coordination
blanc-seing Exo location-rev Meton N1
bleu charrette N1 location 0.250
bleu charron N1 use-rev 0.250
bleu ciel N1 resemblance Meton N2 0.500
bleu horizon N1 resemblance Meton N2 0.500
bloc-cylindres N1 part-rev 0.250
bloc-eau N1 location-rev Meton N2 0.500
bloc-moteur N1 function 0.250
bloc-note N1 location-rev Meton N2 0.500
bœuf carotte N1 part-rev 0.333
bœuf mode N1 adjective 0.333
bœuf-carotte Exo --- 0.333
bois-chandelle N1 source-rev 1.000
boîtier adaptateur N1 function
bombe aérosol N1 production
bonus malus N1/N2 coordination
borne-fontaine N2 resemblance
boule-dogue N2 ---
bourg-épine Exo ---
bourgeois-bohème N1/N2 coordination
bout-dehors N1 location
bouton-pression N1 use
bracelet-montre N2 part-rev
brin sens N1 ---
broue-pub N2 purpose

bureau-chef N1 similarity 1.000
buse bondrée N1 hypernymy
café crème N1 part-rev 0.143
café-bar N1/N2 coordination 0.429 0.500
café-bistro N1/N2 coordination 0.429
café-comptoir N1 use 0.143
café-concert N1 location-rev 0.143
café-filtre N1 production-rev 0.143
café-théâtre N1/N2 coordination 0.429
caisse-palette N1/N2 coordination
calcium-autunite N2 part-rev
calcium-pyromorphite N2 part-rev
camion-citerne N1 part-rev
camping-car N2 purpose
canapé-lit N1/N2 coordination
canard colvert N1 hypernymy
canne flèche N1 source-rev
cap-mouton Exo ---
capital-risque N1 ---
capitan-pacha N2 ---
capsule-congé N1 location-rev
carbonate-apatite N2 composition
cardinal diacre N1/N2 coordination
cargo-dortoir N1 function
carte fille N1 similarity Metaph N2 0.250
carte mère N1 similarity Metaph N2 0.250 0.583
carte soleil N1 location-rev Meton N2 0.250
carte tuner N1 function 0.500
carte-cadeau N1 function 0.500
carte-index N1 function 0.500
carte-lettre N1 function 0.500
carte-vue N1 location-rev Meton N2 0.250
carton-index N1 function 0.333
carton-pâte N1 source 0.667
carton-pierre N1 resemblance 0.667
cas régime N1 topic
cas sujet N1 topic
case départ N1 location
céleri-rave N1/N2 coordination
cellule assistante N1 function 1.000
cellule hôte N1 function 1.000
cellule souche N1 function 1.000
centimètre cube N1 adjective
centre-ville N1 location
cercle unité N1 ---
césium-analcime N2 composition
cession-bail N1 purpose
chapeau melon N1 resemblance Meton N2
chargeuse-pelleteuse N1/N2 coordination
châssis-support N1 function
chat serval N1 hypernymy 0.250
chat-château Exo --- 0.250
chat-garou N1/N2 coordination 0.250 0.990
chat-tigre N1 resemblance Meton N1/N2 0.250

chaussure bateau N1 purpose Meton N2
chef magistrat N2 coordination 0.667
chef-lieu N2 similarity 0.667
chef-mets Exo --- 0.333
chef-mois Exo --- 0.333
chêne kermès N1 location-rev 0.250
chêne-gomme N1 part-rev 0.250
chêne-liège N1 source-rev 0.250
chêne-pommier N1 resemblance 0.250
chèque-repas N1 purpose
chèque-vacances N1 purpose
cheval-vapeur Exo ---
chevêche brame N1 ---
chèvre-pied Exo part-rev 0.250
chien-cerf N1 --- 0.200
chien-garou N1/N2 coordination 0.200 0.990
chien-loup N1 resemblance 0.200
chien-nid Exo --- Metaph N2 0.200
chien-rat Exo coordination 0.200
chiffre-taxe Exo topic
chou-croûte N1 --- 0.167
chou-fleur N1 resemblance 0.333
chou-navet N1 resemblance 0.333
chou-palmiste Exo part Metaph N1 0.167
chou-rave N1/N2 coordination 0.167
chou-vache N1 purpose 0.167
chouette chevêche N1 hypernymy 1.000
chouette effraie N1 hypernymy 1.000
chouette harfang N1 hypernymy 1.000
ciné-club N2 topic
ciné-parc N2 location-rev
circuit tampon N1 function 0.800
clé crocodile N1 resemblance Meton N1/N2
clé lavabo N1 purpose
client-cible N1 function Metaph N2 1.000
clin-foc N2 ---
cobalt-mica N1 resemblance
cocotte-minute N1 --- Metaph N2
code machine N1 use-rev Meton N2 0.333
code source N1 location Metaph N2 0.333
code-barres N1 composition 0.333
colin-tampon Exo --- 0.200
colis-route N1 location
colloid-calcite N2 hypernymy
comédie-ballet N1/N2 coordination
commissaire-priseur N1/N2 coordination
compère-loriot Exo ---
compte utilisateur N1 use-rev
conducteur fantôme N1 ---
consommateur cible N1 function Metaph N2 1.000
contrôle-commande N1/N2 coordination
coq faisan N1 hypernymy 0.667
coq-héron N1 hypernymy 0.667
coq-souris Exo --- 0.333

côté cour N1 location
côté jardin N1 location
coton-poudre N1 function
coton-tige N1 location
couche-culotte N1/N2 coordination
coupe colonel Exo --- Meton N1
courtier négociant N1/N2 coordination
coussin péteur N1 function
coût cible N1 function Metaph N2 1.000
couteau éplucheur N1 function
crédit-bail N1 function
crème fleurette N1 --- Metaph N2
croiseur-école N1 use-rev 0.167
cueilleuse-égreneuse N1/N2 coordination
cul-rousselet Exo ---
culotte garçonne N1 use-rev
danse-poteau N1 use
daphné garou N1/N2 hypernymy 0.910
date butoir N1 function Metaph N2
date limite N1 adjective
daurade coryphène N1 hypernymy
débat-spectacle N1/N2 coordination
dent œillère N1 --- Meton N2
député-maire N1/N2 coordination
désintégration alpha N1 classify 1.000
disque vinyle N1 composition
distance-temps N1 ---
dose limite N1 adjective
drainage-taupe N1 production-rev
drap-housse N1/N2 coordination
duché-pairie N1/N2 coordination
eau mère N1 --- Metaph N2 0.830
eaux-vannes N1 location Meton N2
écart type N1 adjective
échange cambiste N1 production-rev
écho fantôme N1 similarity Metaph N2
écho mirage N1 similarity Metaph N2
écrevisse signal N1 part-rev Metaph N2
effet papillon N1 cause-rev Metaph C
électron-volt Exo ---
élément formant N1 function
emballage-bulle N1 composition
emballage-coque N1 hypernymy
émission-débat N1 part-rev
épinard-fraise Exo coordination
épreuve minute N1 time
équivalent lait N1 composition
erreur système N1 cause-rev
espace-boutique N1 location
espace-temps N1/N2 coordination
espèce cible N1 function Metaph N2 1.000
étage vernier N1 function
étalon-or N1 use
expert-comptable N2 adjective

facteur sigma N1 classify

fan-club N2 composition femme-renarde N1/N2 coordination fermeture éclair N1 similarity Metaph N2

fibre-cellule Exo coordination fiducie-sûreté N1 use filet-poubelle N1 function fille-mère N1/N2 coordination

0.167 film-fleuve N1 similarity Metaph N2

fleur-feuille N1 resemblance fluor calcium N1 composition focalisation zéro N1 adjective

0.500 format coquille N1 location-rev Meton N2

fougère aigle N1 part-rev Metaph N2 fourmi-lion N1 similarity Metaph N2 franc métro N1 --- Meton N2 fréquence radio N1 use-rev

0.250

fric-frac Exo --- fusée-sonde N1 function gaine-culotte N1/N2 coordination gaz hydrogène N1 composition

0.750 gaz oxygène N1 composition

0.750

gaz propulseur N1 function

0.250 gaz sarin N1 composition

0.750

gène chimère N1 similarity gène suppresseur N1 function gentilhomme verrier N1/N2 coordination gin rami N2 --- gomme-résine N1/N2 coordination gorfou macaroni N1 resemblance Metaph N2

gorfou sauteur N1 similarity gorge-fouille Exo --- gouet serpentaire N1 hypernymy grandeur nature N1 adjective grave-ciment N1/N2 coordination grenouille-taureau N1 similarity Meton N1/N2

grille écran N1 part groupement phosphate N1 argument guerre proxy N1 use guillemet-apostrophe N1/N2 coordination halte-garderie N2 --- heure-lumière Exo ---

1.000 hippocampe feuille N1 resemblance

homme-fourmi N1 similarity Metaph N2 1.000 homme-grenouille N1 resemblance

1.000

homme-sandwich N1 similarity Metaph N2 1.000 horloge pointeuse N1 function

hôtel-dieu Exo ---

Metaph N1 Meton N2

hôtellerie-restauration N1/N2 coordination houx-frelon N1 hypernymy huppe-col Exo coordination Meton N1

image-gradient N1 production-rev info-ballon N2 location-rev


info-bulle N2 location-rev

jambon beurre Exo coordination jardin verger N1 function jaune paille N1 resemblance Meton N2

jazz manouche N1 production-rev jour-homme Exo --- jour-lumière Exo ---

1.000 jupe-culotte N1/N2 coordination

kilogramme-force Exo --- kilomètre-heure Exo --- lac-laque Exo composition Metaph N1

laine mère N1 source Metaph N2 0.333 0.830 laine pelade N1 --- Metaph N2 0.333

laine renaissance N1 --- Metaph N2 0.333 laitue asperge N1 resemblance Meton N1

lancer arbalète N1 similarity langage machine N1 use-rev Meton N2

lapin chasseur N1 --- Meton N1 lapin-garou N1/N2 coordination

0.990

larve échinocoque N1 hypernymy laurier sauce N1 source-rev Meton N1 0.500

laurier-cerise N1 production

0.250 laurier-tarte N1 source-rev Meton N1 0.500 laurier-tin N1 ---

0.250

lecteur-graveur N1/N2 coordination léopard-garou N1/N2 coordination

0.990 lettre patente N1 function

liane-corail N1 resemblance lieutenant-colonel N2 coordination lieutenant-général N2 coordination lime-uranite N2 --- linon-batiste N1 hypernymy lit mezzanine N1 resemblance lit-cage N1 resemblance livret-police N1 location-rev Meton N2

location-financement N1/N2 coordination locution-phrase N1/N2 coordination logiciel antivirus N1 function

1.000 logiciel espion N1 function

1.000

logiciel médiateur N1 function

1.000 logiciel-socle N1 function

1.000

lord-lieutenant N1/N2 coordination

1.000 lord-maire N1/N2 coordination

1.000

loup-garou N1/N2 coordination

0.990 lucilie bouchère N1 similarity Metaph N2

macareux moine N1 resemblance machin-chose N1/N2 coordination machine-outil N1 use magasin phare N1 similarity Metaph N2

magasin-pilote N1 function

0.250 mail-coach N2 purpose

maison-mère N1 similarity Metaph N2

0.583 maître-autel N2 adjective

0.200

maître-chanteur N2 adjective

0.200


maître-cylindre N2 adjective

0.200

maître-nageur N2 adjective

0.200 maîtresse femme N1/N2 coordination

mâle alpha N1 classify

1.000 malle-poste N1 purpose

manche pagode N1 resemblance manchot antipode N1 location

0.333 manchot empereur N1 similarity Metaph N2 0.667 manchot pygmée N1 resemblance

0.667

mandat-poste N1 use marche-palier N1 part margis-chef N1 adjective

1.000 marteau-pilon N1 part-rev

martre-zibeline N1 hypernymy marxiste-léniniste N1/N2 coordination médecin légiste N1/N2 coordination médicament conseil N1 --- mémoire cache N1 function

0.667 mémoire flash N1 adjective Metaph N2 0.333 mémoire tampon N1 function

0.667 0.800

menthe pouliot N1 hypernymy menthe-coq N1 similarity Metaph N2

menuisier-moulurier N1/N2 coordination mère maquerelle N1/N2 coordination Metaph N1

merisier-pays N1 location mètre cube N1 adjective minute-lumière Exo ---

1.000 mode paysage N1 resemblance

mode portrait N1 resemblance mois-homme Exo --- mois-lumière Exo ---

1.000 moissonneuse-batteuse N1/N2 coordination

molécule hôte N1 function monsieur-dame Exo coordination mont-joie Exo --- montre-bracelet N1 part-rev mort-chien Exo argument mot vedette N1 location

0.200 mot-clé N1 similarity Metaph N2 0.600 mot-obus N1 similarity Metaph N2 0.600 mot-outil N1 similarity Metaph N2 0.600 mot-valise N1 similarity Metaph N2 0.200 moteur vernier N1 function

moteur-fusée N1 part moto-école N2 purpose Meton N1

0.833

mouche araignée N1 resemblance Meton N1/N2 mouche-scorpion N1 resemblance Meton N1/N2 moucheron piqueur N1 function

mouette pygmée N1 resemblance moustique tigre N1 resemblance Meton N1/N2

mule-jenny N2 similarity Metaph N1 mur-rideau N1 similarity Metaph N2 navire-citerne N1 part-rev

0.333

navire-école N1 purpose

0.333 0.167


navire-mère N1 similarity Metaph N2 0.333 0.583 newton-mètre N1 ---

noctuelle gamma N1 part-rev Metaph N2

0.200 nœud papillon N1 resemblance

nœud-nœud N1 --- noix-chandelle N1 function nord-est N1/N2 coordination nord-ouest N1/N2 coordination œuf mimosa N1 resemblance Meton N1/N2

œuf-coque N1 location oiseau-chameau N1 resemblance

0.500 oiseau-cloche N1 similarity

0.250

oiseau-lyre N1 part-rev Metaph N2 0.250 oiseau-mouche N1 resemblance

0.500 0.750

ombre-chevalier N1 --- onde radio N1 use-rev

0.250 or-sol Exo ---

orchestre musette N1 part-rev

0.500 orienteur-marqueur N1/N2 coordination

ours-garou N1/N2 coordination

0.990 page web N1 location Metaph N1

pal-fer N1 composition palmier-dattier N1 hypernymy palpe-mâchoire N1 location panier-repas N2 location panthère-garou N1/N2 coordination

0.990 papa-gâteau N1 ---

papier bible N1 purpose

0.910 papier brouillard N1 resemblance

0.910

papier calque N1 purpose

0.910 papier carbone N1 part-rev

0.910

papier japon N1 production-rev Meton N2 0.910 papier kraft N1 hypernymy

0.910

papier maïs N1 source

0.910 papier toilette N1 purpose

0.182

papier-cul N1 purpose

0.182 papier-filtre N1 function

0.182

papier-monnaie N1 function

0.182 papillon dauphin Exo coordination Meton N1/N2

papy-boom N2 argument paquet-cadeau N1/N2 coordination parc relais N1 function parking-relais N1 function participation-pari N1 --- participe présent N1 topic particule alpha N1 classify

1.000 1.000 particule bêta N1 classify

1.000

particule gamma N1 classify

1.000 0.800 passage piétons N1 purpose

pause-café N1 purpose pause-carrière N1 time peptide signal N1 function persan dari N1 hypernymy pétrolier-minéralier N1/N2 coordination


phage transducteur N1 function

phosphate-allophane N2 composition photo-identification N2 use

0.250 photo-interprétation N2 argument

0.750

photo-interprète N2 argument

0.750 photo-montage N2 argument

0.750

photolyse éclair N1 cause-rev pie-mère Exo ---

0.830 pied cube N1 adjective

pieds-paquets Exo location pierre miel N1 resemblance Meton N1/N2

pierre ponce N1 hypernymy piétin-échaudage N1 cause piétin-verse N1 cause pigeon voyageur N1 similarity pince crocodile N1 resemblance Meton N2

piqueur-suceur N1/N2 coordination pitaine clés N1 production plan cul N1 adjective Metaph N2 0.333

plan médias N1 use

0.333 plan-séquence N1 composition

0.333

plante-crayon N1 resemblance Meton N1 plante-éponge N1 hypernymy

plateforme bus N1 --- pneu contact N1 use poche-cuiller Exo --- poche-revolver N1 location-rev poids coq N1 argument Metaph C 0.800

poids mouche N1 argument Metaph C 0.800 0.250 poids paille N1 argument Metaph C 0.800

poids plume N1 argument Metaph C 0.800 poids welter Exo hypernymy

0.200

point presse N1 location-rev

0.333 point zéro N1 location

0.333 0.250

point-virgule N1/N2 coordination

0.333 poisson fourrage N1 function

0.143

poisson soleil N1 ---

0.143 poisson-chat N1 resemblance Meton N1/N2 0.286 poisson-épée N1 part-rev Metaph N2 0.286 poisson-évêque Exo resemblance

0.286

poisson-pilote N1 similarity Metaph N2 0.143 0.500 poisson-sabre N1 part-rev Metaph N2 0.286

pomme cajou N1 production-rev Metaph N1 pomme cannelle N1 similarity Metaph N1 pont-bascule N1 part-rev

pont-canal N1 location pop-punk N1/N2 coordination porte papillon N1 resemblance Meton N2

porte-fenêtre N1/N2 coordination portrait-robot N1 production-rev Metaph N2

potentiel hydrogène N1 --- pouce-pied Exo ---

0.250 poule faisane N1 hypernymy

poursuite-bâillon N1 function Metaph N2


prés-bois N1 location-rev

prince-président N1/N2 coordination programme-cadre N1 function promotion canapé N1 use Meton N2

pulvérisateur-mélangeur N1/N2 coordination punaise-mouche N1 resemblance

1.000 0.750 punk rock N1/N2 coordination

quark beauté N1 --- quark charme N1 --- quarte-fagot Exo --- quartier-maître N2 --- quartz morion N1 hypernymy quartz prase N1 hypernymy question piège N1 function Metaph N2

quinte flush N1/N2 coordination raccourci clavier N1 location

radio trottoir Exo --- Metaph N1 Meton N2 0.143

radio-amateur N2 use

0.286 radio-gramophone N1/N2 coordination

0.429

radio-phonographe N1/N2 coordination

0.429 radio-réveil N1/N2 coordination

0.429

radio-taxi N2 use

0.286 radio-télévision N2 use

0.143

radioactivité alpha N1 classify

1.000 1.000 radioactivité bêta N1 classify

1.000

radioactivité gamma N1 classify

1.000 0.800 raie léopard N1 resemblance Meton N1/N2

ramasseuse-presse N1/N2 coordination rat-garou N1/N2 coordination

0.990 raton laveur N1 similarity

rayon alpha N1 classify

1.000 rayon gamma N1 classify

0.800

rayonnement alpha N1 classify

1.000 réception-cadeaux N1 location-rev

reine mère N1/N2 coordination

0.167 reine-marguerite N2 similarity Metaph N1

renouée-bambou N1 resemblance Meton N1 répondeur-enregistreur N1/N2 coordination

réponse type N1 adjective requin-baleine N1 resemblance requin-marteau N1 part-rev Meton N1/N2

restaurant-bar N1/N2 coordination

1.000 0.500 restaurant-bistro N1/N2 coordination

1.000

restaurant-brasserie N1/N2 coordination

1.000 restaurant-pub N1/N2 coordination

1.000

retour chariot N1 argument retraite-chapeau N1 --- réunion-bilan N1 topic réveil-matin N1 time robe-housse N1 similarity robot mixeur N1 function roche-mère N1 similarity Metaph N2

0.583

roman-feuilleton N1/N2 coordination

0.333


roman-fleuve N1 similarity Metaph N2 0.333

roman-photo N1 part-rev

0.333 rose noisette N1 ---

rouge sang N1 resemblance Meton N2 roulage-décollage N1/N2 coordination

rouleau compresseur N1 function sabre laser N1 part-rev sac poubelle N1 purpose saisie-arrêt N1/N2 coordination

0.750 saisie-attribution N1/N2 coordination

0.750

saisie-brandon N1 use

0.250 saisie-exécution N1/N2 coordination

0.750

salaire-coût N1/N2 coordination salicaire pourpier N1 hypernymy sapeur-pompier N2 --- sauce soja N1 source sauce tomate N1 source saule amandier N1 resemblance Meton N1/N2 0.500

saule daphné N1 hypernymy

0.500 saule marsault N1 hypernymy

0.500

saule pleureur N1 similarity Metaph N2 0.500 scie égoïne N1 hypernymy

science-fiction N2 topic séchoir-atomiseur N1/N2 coordination secret défense N1 topic semaine-lumière Exo ---

1.000 sénateur-maire N1/N2 coordination

sergent-chef N1 adjective

1.000 sergent-major N2 coordination

serpent roi N1 similarity Metaph N2 serveur mandataire N1 function

service support N1 function serviette-éponge N1/N2 coordination signe moins N1 function signe plus N1 function silence radio N1 location

0.250 silure glane N1 hypernymy

singe hurleur N1 similarity

0.400 singe-araignée N1 resemblance Meton N1/N2 0.600 singe-chouette N1 similarity

0.400

singe-écureuil N1 resemblance Meton N1/N2 0.600 singe-lion N1 resemblance Meton N1/N2 0.600 site internet N1 location Metaph N1 0.750 site intranet N1 location Metaph N1 0.750 site miroir N1 function Metaph N2 0.250 site web N1 location Metaph N1 0.750 société écran N1 function

soie tussah N1 hypernymy sonde lambda N1 classify sorbier alisier N1 hypernymy soutien-gorge N1 purpose souveraineté-association N1/N2 coordination spath fluor N1 composition spectacle solo N1 composition


station aval N1 location

0.250

station essence N1 purpose

0.500 station pivot N1 function

0.250

station-service N1 purpose

0.500 stock-tampon N1 function

0.800

structure limite N1 adjective stylo gomme N1 function stylo-bille N1 part-rev sud-est N1/N2 coordination sud-ouest N1/N2 coordination support-chaussettes N1 purpose sursaut gamma N1 classify

0.800 table-bureau N1 function

talon aiguille N1 resemblance tamarin lion N1 resemblance Meton N1/N2

tamarin sauteur N1 similarity taupe-grillon N2 similarity taux plafond N1 function Metaph N2

taux plancher N1 function Metaph N2 teinture-mère N1 similarity Metaph N2

0.583 ténia échinocoque N1 hypernymy

tente-abri N1 function terme source N1 --- terre diatomée N1 composition terre-noix Exo --- test-match N2 function test-objet N2 function tête-bêche Exo --- Meton N1

tic-tac Exo --- ticket-restaurant N1 purpose tigre-garou N1/N2 coordination

0.990 timbre-amende N1 purpose

0.333

timbre-poste N1 purpose

0.333 timbre-taxe N1 purpose

0.333

tiret cadratin N1 function tiroir-caisse N1 part tissu-éponge N1/N2 coordination titan-cotte Exo --- tolérance zéro N1 adjective

0.500 tonne-mètre Exo ---

tortue-alligator N1 resemblance Meton N1/N2 tortue-boîte N1 resemblance Meton N1 trachée-artère N1 similarity

train fantôme N1 location Metaph N2 train-train N1 ---

trépan-benne N1 part-rev trique-madame Exo --- trou-madame Exo --- unité monomère N1 composition vache-biche Exo coordination valeur vedette N1 similarity Metaph N2

valse musette N1 use

0.250 valse-hésitation N1 composition Metaph N1

variole cameline N1 ---


vecteur navette N1 similarity Metaph N2

vélo-école N2 purpose Meton N1

0.833 vendeur négociateur N1/N2 coordination

ventre-madame Exo part Meton N1 vert méthyle N1 source

verveine citronnelle N1 similarity vidéo-achat N2 use

0.800 vidéo-clip N2 part

0.200

vidéo-lynchage N2 use Metaph N2 0.800 vidéo-protection N2 use

0.800

vidéo-surveillance N2 use

0.800 village-rue N1 location

ville-dortoir N1 similarity Metaph N2 violon alto N1 hypernymy

viorne tin N1 --- virus assistant N1 function voiture balai N1 function Metaph N2 0.286

voiture-bar N1 location-rev

0.429 0.500 voiture-couchettes N1 part-rev

0.286

voiture-lits N1 part-rev

0.286 voiture-pilote N1 function

0.286 0.500

voiture-restaurant N1 location-rev

0.429 voiture-salon N1 location-rev

0.429

vote sanction N1 purpose wagon-bar N1 location-rev

0.429 0.500 wagon-citerne N1 part-rev

0.571

wagon-foudre N1 part-rev

0.571 wagon-lit N1 part-rev

0.571

wagon-poche N1 part-rev

0.571 wagon-restaurant N1 location-rev

0.429

wagon-salon N1 location-rev

0.429 yacht-club N2 purpose Meton N1

zéolithe cyanite N1 hypernymy zinc-blende N2 hypernymy zone tampon N1 function

0.800


Appendix E - N à N French Compounds retained from Wiktionary

All 319 N à N French Compounds retained from Wiktionary.

Compound Head Relation Tropes N1 SRI Val N2 SRI Val

abeille à miel N1 production

abreuvoir à mouches Exo use-rev Metaph N1

0.250 acajou à pommes N1 production Metaph N2

acquit-à-caution N1 --- adents à crémaillère N1 part-rev amortisseur à fluide N1 use ampoule à incandescence N1 use arbalète à jalet N1 use arbre à cames N1 part-rev

0.167 arbre à cornichons N1 production Metaph N2 0.500 arbre à grives N1 location-rev

0.333

arbre à pain N1 production Metaph N2 0.500 arc à poulies N1 part-rev

arme à feu N1 use

0.125 arme à implosion N1 use

armes à enquerre N1 cause armoire à glace N1 part-rev arquebuse à croc N1 part-rev Metaph N2

arquebuse à rouet N1 part-rev autocaravane à cellule N1 part-rev bac à sable N1 purpose baignoire à porte N1 part-rev bail à cheptel N1 purpose bail à complant N1 purpose baleine à bosse N1 part-rev balle à queue N1 part-rev banque à domicile N1 location Meton N1

banque à pitons N1 use Metaph N1 bar à champagne N1 purpose

bar à putes N1 location-rev barbe à papa Exo possession-rev Metaph C

barre à mine N1 purpose bateau à vapeur N1 use Meton N2

1.000

batte à beurre N1 purpose

0.250 batte à feu N1 purpose

0.125

bec à fente N1 part-rev bêtes à cornes N1 part-rev betterave à sucre N1 source-rev bibitte à patates N1 location billet à ordre N1 purpose boîte à camembert N1 purpose

0.857 boîte à cigare N1 purpose

0.857

boîte à gants N1 purpose

0.857 boîte à lettres N1 purpose

0.857

boîte à outils N1 purpose

0.857 boîte à pet N1 purpose Meton N2 0.857 boîte-à-musique N1 production

0.143

bombe à fission N1 use

1.000 bombe à fusion N1 use

1.000


bombe à hydrogène N1 use

1.000

bombe à neutrons N1 use

1.000 bonnet à prêtre Exo possession-rev Metaph C

bouche à feu Exo production Meton N1/N2

0.125 bouche à oreille Exo --- Meton N1/N2

bouilloire à thé N1 purpose boule à neige N1 location-rev boule à zéro N1 --- Metaph N1

bourse-à-berger Exo possession-rev Metaph C bourse-à-pasteur Exo possession-rev Metaph C broche à glace N1 purpose

brosse à cheveux N1 purpose brosse à dents N1 purpose cabane à sucre N1 production cage à écureuil Exo purpose Metaph N1/N2

canne à mouche N1 purpose Meton N2

0.250 canne à pêche N1 purpose

canne à sucre N1 source-rev canon à électrons N1 purpose canon à neige N1 purpose carte à puce N1 part-rev cassican à collier N1 part-rev Metaph N2

cave à liqueurs N1 purpose chair à canon Exo --- Meton N1

chaise à bascule N1 part-rev chaise à porteurs N1 use-rev chambre à air N1 location-rev chambre à feu N1 location-rev Meton N2

0.250

chambre à gaz N1 location-rev char-à-bancs N1 part-rev charbon à tumeurs N1 cause chardon à foulon N1 use-rev châssis à tabatière N1 similarity chaussette à clous Exo part-rev Metaph N1

cheval à bascule N1 part-rev Metaph N1 cigare à moustache Exo --- Metaph N1/N2 clé à béquille N1 similarity

0.182

clé à bougies N1 purpose

0.910 clé à chaîne N1 part-rev

0.636

clé à cliquet N1 part-rev

0.636 clé à crémaillère N1 part-rev

0.636

clé à douille N1 part-rev

0.636 clé à ergot N1 part-rev

0.636

clé à fourches N1 part-rev

0.636 clé à molette N1 part-rev

0.636

clé à pipe N1 similarity

0.182 clé à pompe N1 use Meton N2 0.910 code à octets N1 composition

compensateur à ressort N1 part-rev compte à rebours N1 --- compte à vue N1 --- condamné à mort N1 argument conduite à risque N1 part-rev conteneur à verre N1 purpose


copolymère à blocs N1 composition

cornet à bouquin N1 part-rev corps à corps Exo location coton à fromage N1 purpose couleuvre à collier N1 part-rev Metaph N2

course à pied N1 use Meton N2 couteau à beurre N1 purpose

0.250

couteau à poisson N1 purpose couteau à viande N1 purpose croix à degrés N1 location Meton N2

cuiller à dessert N1 purpose

0.500 cuiller à œuf N1 purpose

0.500

cuiller à pot N1 purpose

0.167 cuillère à café N1 purpose

0.333

cuillère à moka N1 purpose

0.333 cuillère à soupe N1 purpose

0.500

diode à vide N1 location-rev

0.833 échange à terme N1 time

échelle à poissons N1 purpose Metaph N1 écrou à créneau N1 part-rev

escalier à vis N1 similarity étable à pourceaux N1 purpose étoile à neutrons N1 composition fabrication à façon N1 --- face-à-face Exo location face-à-main Exo --- Metaph N1

0.200

fauteuil à bascule N1 part-rev fauteuil à voile Exo --- femme à barbe N1 part-rev fenêtre à guillotine N1 similarity fenêtre à tabatière N1 similarity fer à cheval N1 purpose

0.250 fermeture à glissière N1 part-rev

fil à plomb N1 part-rev filet à cheveux N1 purpose filet à provisions N1 purpose fils à papa N1 possession-rev four à chaux N1 production frein à main N1 use-rev

0.600 frein à pédale N1 use

fruit à pain N1 similarity goutte-à-goutte Exo --- groseille à maquereau N1 purpose hache à main N1 use-rev

0.600 herbe à chat N1 purpose

huile à broche N1 purpose if à baies N1 production instrument à cordes N1 part-rev instrument à vent N1 use

0.750 lampe à décharge N1 use

1.000

lampe à huile N1 use

1.000 lampe à incandescence N1 use

1.000

lampe à sodium N1 use

1.000 langage à objets N1 part-rev


lémurien à fourche N1 part-rev Metaph N2

ligne à intervalles N1 location lit à baldaquin N1 part-rev lit à courtines N1 part-rev locomotive à vapeur N1 use Meton N2

1.000

logiciel à contribution N1 --- machine à café N1 production machine à sous N1 use machine à vapeur N1 use

1.000 main à main Exo ---

0.200

maintien à poste N1 argument manche à air N1 location Metaph N1

manche à balai N1 purpose Metaph C manche-à-balle Exo ---

manchot à jugulaire N1 part-rev Metaph N2 manomètre à écrasement N1 use

marché à terme N1 time marche à vue N1 use marsouin à lunettes N1 part-rev Metaph N2

mise à disposition N1 argument

0.833 mise à jour N1 ---

0.833

mise à niveau N1 argument

0.833 mise à pied Exo argument

0.167

mongol à batteries Exo use Metaph N1 mot à mot Exo ---

moteur à explosion N1 use mouche à miel N1 production moule à manqué N1 purpose moulin à café N1 purpose

0.444 moulin à eau N1 use

0.222

moulin à légumes N1 purpose

0.444 moulin à papier N1 production

0.222

moulin à paroles Exo production Metaph N1 0.222 moulin à poivre N1 purpose

0.444

moulin à prières N1 location-rev

0.111 moulin à sel N1 purpose

0.444

moulin à vent N1 use

0.222 0.750 munition à fragmentation N1 use

mûrier à papier N1 source-rev muscari à grappe N1 part-rev muscari à toupet N1 part-rev Metaph N2

nid à rats Exo purpose Metaph C niveau à bulle N1 use

nom à penture N1 part-rev Metaph N2 œuf à cheval N1 ---

0.250

orange à nombril N1 part-rev Metaph N2 os à moelle N1 part-rev

otarie à crinière N1 part-rev ouvrage à cornes N1 similarity palmier à huile N1 source-rev panier à salade N1 purpose papier à musique N1 purpose Meton N2

parenté à plaisanterie Exo --- passage à faune N1 purpose

0.250


passage à niveau N1 location

0.250

passage à tabac Exo ---

0.250 passage à vide N1 argument Metaph N2 0.250 0.167

passe à poissons N1 purpose pâte à choux N1 purpose

0.333 pâte à dents N1 purpose

0.167

pâte à papier N1 purpose

0.333 pâte à pet N1 production Meton N2 0.333 pâte à prouts N1 production

0.333

pâte à sel N1 part-rev

0.167 patente à gosse N1 ---

patin à glace N1 purpose patin à roulettes N1 part-rev pêche à feu N1 use Metaph N2 0.750 0.125

pelle à balai N1 --- Metaph N1 pelle à neige N1 purpose

pelle-à-cul Exo purpose Metaph N1 pentode à vide N1 location-rev

0.833

piano à bretelles N1 part-rev Metaph N1 piano à queue N1 part-rev Metaph N2 pièce à conviction N1 purpose

pied à boule Exo location pied à coulisse Exo part-rev Metaph N1

pied-à-terre Exo location piège à cons N1 purpose Metaph N1

pierre à briquet N1 purpose

0.500 pierre à chaux N1 source-rev

0.333

pierre à évier N1 purpose

0.500 pierre à feu N1 cause

0.167 0.125

pierre à fusil N1 purpose

0.500 pierre à plâtre N1 source-rev

0.333

pince à escargots N1 purpose pince à linge N1 purpose planche à dessin N1 purpose

0.500 planche à pain N1 purpose

0.500

planche à roulettes N1 part-rev

0.500 planche à voile N1 part-rev

0.500

planchette à pince N1 part-rev poêle à marrons N1 purpose poire à poudre N1 location-rev Metaph N1

polymère à blocs N1 composition pompe à bicyclette N1 purpose

0.333 pompe à carburant N1 purpose

0.500

pompe à eau N1 purpose

0.500 pompe à huile N1 purpose

0.500

pompe à sodium N1 purpose

0.167 pompe à vélo N1 purpose

0.333

porcelaine à feu N1 ---

0.125 porte-à-porte Exo ---

pot à feu N1 location-rev Meton N2 0.250 0.250 pot à tabac Exo purpose Metaph C 0.250

poudre à canon N1 purpose propulseur à liquide N1 use puce à gènes N1 composition


punk à chien N1 possession

râteau à feuilles N1 purpose roue à aubes N1 part-rev rouge à lèvres N1 purpose Meton N1

rouleau à pâtisserie N1 purpose roulement à billes N1 use sac à dos N1 location sac à main N1 use-rev

0.600 sagne à tamis Exo ---

sanglier à verrues N1 part-rev saut à ski N1 use savonnette à vilain Exo purpose Metaph N1

scannage à domicile N1 location scie à métaux N1 purpose scie à ruban N1 part-rev séparateur à œuf N1 purpose serpent à lunettes N1 part-rev Metaph N2

serpent à plumes N1 part-rev Metaph N1 serpent à sonnette N1 part-rev Metaph N2 serrure à bosse N1 part-rev

soda à pâte Exo --- soffite à caissons N1 part-rev station à essence N1 purpose steak à cheval N1 ---

0.250 stylo à bille N1 part-rev

tapette à mouche N1 purpose

0.250 tapisserie à personnages N1 part-rev

tasse à vin N1 purpose télévision à péage N1 use Meton N1

terre-à-terre Exo --- tête à claques N1 cause Meton N1 0.400

tête à gifle N1 cause Meton N1 0.400 tête à perruque N1 purpose

0.200

tête-à-queue Exo ---

0.200 tête-à-tête Exo location

0.200

tétrode à vide N1 location-rev

0.833 train à vapeur N1 use Meton N2

1.000

triode à vide N1 location-rev

0.833 trombone à coulisse N1 part-rev

trombone à pistons N1 part-rev tube à décharge N1 use

0.250 tube à essai N1 purpose

0.250

tube à vide N1 location-rev

0.500 0.833 tueur à gages N1 ---

tuteur à roulettes N1 part-rev Metaph N1 usine à gaz N1 production

vache à lait N1 production

0.200 valet-à-patin N1 ---

ventre à choux Exo --- Meton N1 ver à soie N1 production

vielle à roue N1 part-rev voiture à bras N1 use vol à voile N1 use

0.143 voyage à forfait N1 ---