
Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection

Vered Shwartz (1), Enrico Santus (2) and Dominik Schlechtweg (3)

(1) Bar-Ilan University, (2) Singapore University of Technology and Design, (3) University of Stuttgart

EACL 2017

2/22

Hypernymy Detection - Task Definition

- Given two terms (x, y), decide whether y is a hypernym of x
  - in some senses of x and y
  - e.g., python's hypernyms are animal and programming language
- Motivation: taxonomy creation, RTE, QA, ...
  - e.g., A boy is hitting a baseball ⇒ A child is hitting a baseball
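As an illustration of the task, the decision is a binary, asymmetric predicate over term pairs, where a term may have several senses, each with its own hypernyms. The toy sense inventory below is invented for illustration, not a real resource:

```python
# Toy sense inventory: each sense of a term lists its direct hypernyms
# (invented edges; assumes an acyclic toy taxonomy)
SENSES = {
    "python": [{"snake"}, {"programming language"}],
    "snake": [{"animal"}],
    "boy": [{"child"}],
}

def is_hypernym(y, x):
    """True if y is a (transitive) hypernym of x in SOME sense of x."""
    for direct in SENSES.get(x, []):
        frontier = set(direct)
        while frontier:
            if y in frontier:
                return True
            # climb one level: all direct hypernyms of all senses
            frontier = {h for t in frontier
                        for sense in SENSES.get(t, []) for h in sense}
    return False

assert is_hypernym("animal", "python")               # via the snake sense
assert is_hypernym("programming language", "python")
assert not is_hypernym("python", "animal")           # the relation is asymmetric
```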

3/22

Hypernymy Detection

Taxonomy of approaches:

- corpus-based
  - path-based: Hearst Patterns [Hearst, 1992], Snow [Snow et al., 2005], [Necsulescu et al., 2015], ...
  - combined path-based + distributional: HypeNET [Shwartz et al., 2016]
  - distributional
    - supervised: concat [Baroni et al., 2012], difference [Weeds et al., 2014], ASYM [Roller et al., 2014], ...
    - unsupervised
      - similarity: Cosine [Salton and McGill, 1986], Lin [Lin, 1998], APSyn [Santus et al., 2016b]
      - informativeness: SLQS [Santus et al., 2014], RCTC [Rimell, 2014]
      - inclusion: WeedsPrec [Weeds and Weir, 2003], ClarkeDE [Clarke, 2009], balAPinc [Kotlerman et al., 2010], ...

4/22

Unsupervised Hypernymy Detection

(Same taxonomy of approaches, now focusing on the unsupervised distributional branch: similarity, informativeness, and inclusion measures.)

5/22

Key Takeaways

- We performed an extensive set of experiments, with:
  - 14 unsupervised measures
  - 12 DSMs: 6 context types × 2 feature weightings
  - 4 datasets
- Our findings:
  1. There is no single measure that is always preferred
     - Different measures capture different aspects of hypernymy
     - Specific measures are preferred for distinguishing hypernymy from each other relation
  2. Unsupervised measures are not made obsolete by supervised methods
     - In real-application scenarios, unsupervised measures may be preferred

6/22

Outline

- Comparing Between Unsupervised Measures
  - Hypernym vs. All Other Relations
  - Hypernym vs. One Other Relation
- Supervised vs. Unsupervised
  - Performance
  - Robustness

7/22

Outline (Part 1: Comparing Between Unsupervised Measures)

8/22

Hypernym vs. All

- Q: Is there a "best" unsupervised measure?
- Evaluation setting: hypernymy vs. all other relations
  - Compute the measure score for each (x, y) pair
  - Evaluation metric: average precision (AP)
- A: There is no single combination of measure, context type and feature weighting that consistently performs best

  dataset        measure   context type  AP@100
  EVALution      invCL     joint         0.661
  BLESS          invCL     win5          0.54
  Lenci/Benotto  APSyn     joint         0.617
  Weeds          clarkeDE  win5d         0.911
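The ranking metric can be sketched as follows. This is a generic average-precision computation over a measure-ranked list of pairs (the slide reports AP over the top 100, AP@100); the pairs and scores below are invented toy values, not the paper's data:

```python
def average_precision(ranked_labels):
    """Average precision over binary labels ordered by descending
    measure score (1 = hypernym pair, 0 = any other relation)."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

# Toy ranking: (x, y, measure score, gold label), invented for illustration
pairs = [("cat", "animal", 0.9, 1), ("cat", "tail", 0.7, 0),
         ("dog", "animal", 0.6, 1), ("dog", "bark", 0.2, 0)]
ranked = [label for *_, label in sorted(pairs, key=lambda p: -p[2])]
ap = average_precision(ranked)  # (1/1 + 2/3) / 2 ≈ 0.83
```

A perfect measure ranks all hypernym pairs above all other pairs, giving AP = 1.0.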

9/22

Hypernym vs. One Other Relation

- Different measures capture different aspects of hypernymy
  - e.g., that x's contexts are included in y's
- Q: Is there a most suitable measure to distinguish hypernymy from each other relation?
- Consider all relations that occurred in at least two datasets
- For each such relation, for each dataset, rank measures by AP and select those with AP ≥ 0.8

10/22

Hypernym vs. Attribute

- Best measures: similarity measures + syntactic contexts

Possible explanation:

- Similarity measures quantify the ratio of shared contexts between x and y
- Attribute contexts are easy to distinguish syntactically
  - y is almost always an adjective, while x is a noun/verb
  - Fewer shared contexts than in hyponym-hypernym pairs

11/22

Hypernym vs. Meronym

- Best measures: inclusion measures + syntactic contexts

Possible explanation:

- Inclusion measures quantify the ratio of x's contexts included in y's [Weeds and Weir, 2003, Geffet and Dagan, 2005]
- Window-based contexts capture topical context
  - Topics are shared between holonym-meronym and hypernym-hyponym pairs
- Syntactic contexts capture functional context
  - Meronyms and holonyms more often have different functions than hyponyms and hypernyms
    - cat and animal both eat and are alive
    - you wag a tail but not a cat
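The inclusion idea can be sketched with two standard measures. The formulas follow the cited papers (WeedsPrec sums x's weight on shared contexts; ClarkeDE caps each shared context at min(w_x, w_y)), but the context vectors below are invented toy values:

```python
def weeds_prec(x, y):
    """WeedsPrec(x -> y): fraction of x's context weight that falls on
    contexts shared with y [Weeds and Weir, 2003]."""
    total = sum(x.values())
    return sum(w for f, w in x.items() if f in y) / total if total else 0.0

def clarke_de(x, y):
    """ClarkeDE: like WeedsPrec, but each shared context contributes
    only min(w_x, w_y), rewarding genuine inclusion [Clarke, 2009]."""
    total = sum(x.values())
    shared = sum(min(w, y[f]) for f, w in x.items() if f in y)
    return shared / total if total else 0.0

# Toy functional-context vectors (invented): cat's contexts are largely
# included in animal's, while tail's contexts are disjoint from cat's
cat = {"eat": 2.0, "alive": 1.0, "purr": 1.0}
animal = {"eat": 3.0, "alive": 2.0, "bark": 1.0, "fly": 1.0}
tail = {"wag": 3.0, "long": 1.0}

assert weeds_prec(cat, animal) > weeds_prec(tail, cat)  # hypernym > meronym
```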

12/22

Hypernym vs. Synonym (1/2)

- Best measures: SLQS / invCL

Possible explanation:

- SLQS [Santus et al., 2014]: is x more informative than y?
- In hypernymy, y is less informative than x; in synonymy it is not
  - budgie is more informative than bird
  - budgie and parakeet are at the same informativeness level
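A sketch of the SLQS intuition, following the definition in [Santus et al., 2014]: a term's informativeness is estimated as the median entropy of its top contexts, and SLQS(x, y) = 1 − E_x / E_y. The context distributions below are invented toy values with a tiny number of contexts:

```python
import math
from statistics import median

def entropy(probs):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def slqs(ex, ey):
    """SLQS(x, y) = 1 - E_x / E_y; positive values suggest y is more
    general (less informative contexts) than x [Santus et al., 2014]."""
    return 1.0 - ex / ey

# Toy context distributions (invented): a general term like bird tends to
# co-occur with broader, higher-entropy contexts than a specific term
budgie_ctx = [[0.9, 0.1], [0.8, 0.2]]       # sharp, specific contexts
bird_ctx = [[0.5, 0.5], [0.4, 0.3, 0.3]]    # flat, general contexts
e_budgie = median(entropy(c) for c in budgie_ctx)
e_bird = median(entropy(c) for c in bird_ctx)

assert slqs(e_budgie, e_bird) > 0    # hypernym-like: bird more general
assert abs(slqs(e_bird, e_bird)) < 1e-9  # synonym-like: same level -> ~0
```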

13/22

Hypernym vs. Synonym (2/2)

- invCL [Lenci and Benotto, 2012] is a strict inclusion measure, quantifying also the non-inclusion of y in x
- Inclusion measures may assign high scores to synonyms, since they share many contexts
- invCL penalizes (x, y) pairs in which x also includes many of y's contexts
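This can be sketched directly: invCL(x, y) = sqrt(CDE(x, y) · (1 − CDE(y, x))), where CDE is ClarkeDE inclusion. The toy vectors below are invented; the synonym-like pair is mutually included, so its invCL score collapses to zero:

```python
import math

def clarke_de(x, y):
    """Degree to which x's contexts are included in y's [Clarke, 2009]."""
    total = sum(x.values())
    shared = sum(min(w, y[f]) for f, w in x.items() if f in y)
    return shared / total if total else 0.0

def inv_cl(x, y):
    """invCL [Lenci and Benotto, 2012]: inclusion of x in y combined
    with NON-inclusion of y in x, so near-identical (synonym-like)
    context vectors score low."""
    return math.sqrt(clarke_de(x, y) * (1.0 - clarke_de(y, x)))

# Toy context vectors (invented)
budgie = {"fly": 2.0, "cage": 2.0}
bird = {"fly": 2.0, "cage": 2.0, "nest": 2.0, "migrate": 2.0}
parakeet = {"fly": 2.0, "cage": 2.0}  # same contexts as budgie

assert inv_cl(budgie, bird) > inv_cl(budgie, parakeet)  # hypernym > synonym
```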

14/22

Outline (Part 2: Supervised vs. Unsupervised)

15/22

Supervised Methods

- (Distributional) state of the art: supervised embedding-based methods
- (x, y) term pairs are represented as feature vectors, based on the terms' embeddings:
  - Concatenation ~x ⊕ ~y [Baroni et al., 2012]
  - Difference ~y − ~x [Weeds et al., 2014]
  - ASYM [Roller et al., 2014]
- Evaluation setting:
  - Tune hyper-parameters (methods, pre-trained embeddings)
  - Train a logistic regression classifier
  - Use the logistic score for ranking; report AP
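The two simplest feature constructions can be sketched as follows. The 3-dimensional embeddings are invented toy values (real embeddings have hundreds of dimensions); the resulting feature vectors would then be fed to a classifier such as logistic regression:

```python
def concat(x_vec, y_vec):
    """Concatenation features x ⊕ y [Baroni et al., 2012]."""
    return x_vec + y_vec

def diff(x_vec, y_vec):
    """Difference features y − x [Weeds et al., 2014]."""
    return [yi - xi for yi, xi in zip(y_vec, x_vec)]

# Toy 3-d embeddings for a term pair (invented values)
x_emb = [0.5, 0.25, 1.0]
y_emb = [1.0, 0.25, 0.5]

features = concat(x_emb, y_emb)  # 6-d vector: [0.5, 0.25, 1.0, 1.0, 0.25, 0.5]
delta = diff(x_emb, y_emb)       # 3-d vector: [0.5, 0.0, -0.5]
```

Note that neither representation depends on how x and y interact in the corpus, which is relevant to the robustness discussion below.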

16/22

Supervised vs. Unsupervised Experiment

- Concat achieves almost perfect AP scores, outperforming the unsupervised measures:

  dataset        relation   best supervised  best unsupervised
  EVALution      meronym    0.998            0.886
                 attribute  1.000            0.877
                 antonym    1.000            0.773
                 synonym    0.996            0.813
  BLESS          meronym    1.000            0.939
                 coord      1.000            0.938
                 attribute  1.000            0.938
                 event      1.000            0.847
                 random-n   0.995            0.818
                 random-j   1.000            0.917
                 random-v   1.000            0.895
  Lenci/Benotto  antonym    0.917            0.807
                 synonym    0.946            0.914
  Weeds          coord      0.873            0.824

17/22

Measuring Robustness (1/3)

- Is there any advantage to unsupervised measures?
  - apart from not requiring training data...
- Supervised methods do not learn the relation between x and y:
  - [Levy et al., 2015]: they memorize that y is a prototypical hypernym
  - [Roller and Erk, 2016]: they trace y's occurrences in Hearst patterns
  - [Shwartz et al., 2016]: they provide the "prior" of y to fit the relation
- AP scores may suggest overfitting by the supervised methods...

Page 56: Hypernyms under Siege - Vered Shwartz · Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection Vered Shwartz1, Enrico Santus2 and Dominik Schlechtweg3

17/22

Measuring Robustness (1/3)

I Is there any advantage to unsupervised measures?I apart from not requiring training data...

I Supervised methods do not learn the relation between x and y:I [Levy et al., 2015]: they memorize that y is a prototypical hypernymI [Roller and Erk, 2016]: they trace y’s occurrences in Hearst patternsI [Shwartz et al., 2016]: they provide the “prior” of y to fit the relation

I AP scores may suggest overfitting by the supervised methods...

Page 57: Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection

18/22

Measuring Robustness (2/3)

I We repeated the experiment of [Santus et al., 2016a]
I Adding switched hypernym pairs as random
I e.g. (apple, animal), (cow, fruit)

I Using BLESS (the only dataset with random pairs)

I For each hypernym pair (x1, y1):
I Sampled y2 such that (x2, y2) ∈ dataset is a hypernym pair
I such that (x1, y2) ∉ dataset
I Add (x1, y2) as random

I We added 139 such new random pairs

I We used the best supervised and unsupervised methods to re-classify the revised dataset
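The sampling procedure above can be sketched as follows (a minimal reconstruction of the slide's description; function and variable names are ours, and the exact sampling details of the experiment may differ):

```python
import random

def add_switched_pairs(hypernym_pairs, seed=0):
    """Generate 'switched' random pairs (x1, y2) from hypernym pairs.

    For each hypernym pair (x1, y1), sample a hypernym y2 taken from
    some other pair (x2, y2) in the dataset, subject to (x1, y2) not
    itself being a pair in the dataset. Sketch of the slide's procedure.
    """
    rng = random.Random(seed)
    pairs = set(hypernym_pairs)
    hypernyms = sorted({y for _, y in pairs})
    switched = []
    for x1, y1 in sorted(pairs):
        candidates = [y2 for y2 in hypernyms
                      if y2 != y1 and (x1, y2) not in pairs]
        if candidates:
            switched.append((x1, rng.choice(candidates)))
    return switched
```

On the slide's example, (apple, fruit) and (cow, animal) yield the switched random pairs (apple, animal) and (cow, fruit).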

Page 64: Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection

19/22

Measuring Robustness (3/3)

I Supervised method drops in performance, unsupervised performance is stable:

              method                   AP@100 original   AP@100 switched   ∆
supervised    concat, word2vec, L1     0.995             0.575             -0.42
unsupervised  cosWeeds, win2d, ppmi    0.818             0.882             +0.064

I 121 out of the 139 switched pairs were falsely classified as hypernyms by the supervised method!

I Unsupervised measures are robust; performance even slightly improves

I Consistent with the transfer-learning experiment of [Turney and Mohammad, 2015]
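For reference, AP@100 is average precision computed over the top-100 ranked pairs. One common formulation is sketched below (normalization conventions for AP@k vary, so treat this particular variant as an assumption rather than the paper's exact definition):

```python
def average_precision_at_k(ranked_labels, k=100):
    """Average precision over the top-k ranked items.

    ranked_labels: 0/1 relevance labels in ranked order
    (1 = true hypernym pair). This variant averages precision@i
    over the positive positions within the top k.
    """
    top = ranked_labels[:k]
    hits = 0
    precisions = []
    for i, label in enumerate(top, start=1):
        if label:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / max(len(precisions), 1)
```

Under this metric, ranking true pairs above random ones is rewarded directly, which is why the switched pairs cause such a large drop for the supervised method.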

Page 68: Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection

20/22

Recap

I We experimented with an extensive number of unsupervised measures for hypernymy detection

I There is no specific measure that is always preferred
I Different measures capture different aspects of hypernymy
I Specific measures are preferred for distinguishing hypernymy from each of the other relations

I Unsupervised measures are not made obsolete by supervised methods

I Supervised methods are favored by the existing datasets
I However, they are extremely sensitive to the training data
I Unsupervised methods are more robust

Page 76: Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection

21/22

Future Work

I A combination of unsupervised measures may perform well in distinguishing hypernymy from all other relations

I A simple classifier with unsupervised measures as features performs well

I However, it is not as robust as the unsupervised measures

I Current datasets are biased and encourage overfitting

I In the future, the development of natural datasets may better reflect the performance of methods in real-application scenarios
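The "unsupervised measures as features" idea can be sketched with a tiny stdlib-only logistic regression (our sketch: the slide does not specify the classifier or feature set, so both are illustrative assumptions):

```python
import math

def train_logreg(features, labels, lr=0.1, epochs=300):
    """Logistic regression trained by SGD over measure-score vectors.

    Each feature vector holds unsupervised measure scores for a pair
    (e.g. cosine, a Weeds-style inclusion score, an entropy-based score);
    label 1 = hypernymy, 0 = other relation. Illustrative sketch only,
    not the authors' exact classifier.
    """
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Classify a pair from its measure scores (1 = hypernymy)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Such a combiner learns which measures to weight for hypernymy, but, being supervised, it inherits the sensitivity to training data discussed above.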

Thank you!


Page 82: Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection

22/22

References

Baroni, M., Bernardi, R., Do, N.-Q., and Shan, C.-c. (2012). Entailment above the word level in distributional semantics. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 23–32. Association for Computational Linguistics.

Clarke, D. (2009). Context-theoretic semantics for natural language: an overview. In Proceedings of the Workshop on Geometrical Models of Natural Language Semantics, pages 112–119. Association for Computational Linguistics.

Geffet, M. and Dagan, I. (2005). The distributional inclusion hypotheses and lexical entailment. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 107–114. Association for Computational Linguistics.

Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In COLING 1992 Volume 2: The 15th International Conference on Computational Linguistics.

Kotlerman, L., Dagan, I., Szpektor, I., and Zhitomirsky-Geffet, M. (2010). Directional distributional similarity for lexical inference. Natural Language Engineering, 16(04):359–389.

Lenci, A. and Benotto, G. (2012). Identifying hypernyms in distributional semantic spaces. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 75–79. Association for Computational Linguistics.

Levy, O., Remus, S., Biemann, C., and Dagan, I. (2015). Do supervised distributional methods really learn lexical inference relations? In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 970–976. Association for Computational Linguistics.

Lin, D. (1998). An information-theoretic definition of similarity. ICML, 98:296–304.

Necsulescu, S., Mendes, S., Jurgens, D., Bel, N., and Navigli, R. (2015). Reading between the lines: Overcoming data sparsity for accurate classification of lexical relationships. Lexical and Computational Semantics (*SEM 2015), page 182.

Rimell, L. (2014). Distributional lexical entailment by topic coherence. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 511–519. Association for Computational Linguistics.

Roller, S. and Erk, K. (2016). Relations such as hypernymy: Identifying and exploiting Hearst patterns in distributional vectors for lexical entailment. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2163–2172. Association for Computational Linguistics.

Roller, S., Erk, K., and Boleda, G. (2014). Inclusive yet selective: Supervised distributional hypernymy detection. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1025–1036. Dublin City University and Association for Computational Linguistics.

Salton, G. and McGill, M. J. (1986). Introduction to modern information retrieval. McGraw-Hill, Inc.

Santus, E., Lenci, A., Chiu, T.-S., Lu, Q., and Huang, C.-R. (2016a). Nine features in a random forest to learn taxonomical semantic relations. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 4557–4564. European Language Resources Association (ELRA).

Santus, E., Lenci, A., Chiu, T.-S., Lu, Q., and Huang, C.-R. (2016b). Unsupervised measure of word similarity: How to outperform co-occurrence and vector cosine in VSMs. In Thirtieth AAAI Conference on Artificial Intelligence, pages 4260–4261.

Santus, E., Lenci, A., Lu, Q., and Schulte im Walde, S. (2014). Chasing hypernyms in vector spaces with entropy. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 38–42. Association for Computational Linguistics.

Shwartz, V., Goldberg, Y., and Dagan, I. (2016). Improving hypernymy detection with an integrated path-based and distributional method. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2389–2398. Association for Computational Linguistics.

Snow, R., Jurafsky, D., and Ng, A. Y. (2005). Learning syntactic patterns for automatic hypernym discovery. In Advances in Neural Information Processing Systems 17, pages 1297–1304. MIT Press.

Turney, P. D. and Mohammad, S. M. (2015). Experiments with three approaches to recognizing lexical entailment. Natural Language Engineering, 21(03):437–476.

Weeds, J., Clarke, D., Reffin, J., Weir, D., and Keller, B. (2014). Learning to distinguish hypernyms and co-hyponyms. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2249–2259. Dublin City University and Association for Computational Linguistics.

Weeds, J. and Weir, D. (2003). A general framework for distributional similarity. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 81–88.