Text Classification without Supervision: Incorporating World Knowledge and Domain Adaptation

Page 1: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling.

Text Classification without Supervision: Incorporating World Knowledge and Domain Adaptation

Yangqiu Song
Lane Department of CSEE
West Virginia University

Much of the work was done at UIUC

Page 2:

Collaborators

Dan Roth, Haixun Wang, Shusen Wang, Weizhu Chen

Page 3:

Text Categorization

• Traditional machine learning approach:

Label data

Train a classifier

Make prediction

Page 4:

Challenges

• Domain expert annotation
– Large scale problems
• Diverse domains and tasks
– Topics
– Languages
– …
• Short and noisy texts
– Tweets,
– Queries,
– …

Page 5:

Many diverse and fast changing domains: search engine, social media, …

Domain specific task: entertainment or sports?

Reduce Labeling Efforts: semi-supervised learning, transfer learning, zero-shot learning

A more general way?

Page 6:

Our Solution

• Knowledge enabled learning

– Millions of entities and concepts

– Billions of relationships

• Labels carry a lot of information!

– Traditional models treat labels as “numbers or IDs”

Page 7:

Example: Knowledge Enabled Text Classification

Dong Nguyen announced that he would be removing his hit game Flappy Bird from both the iOS and Android app stores, saying that the success of the game is something he never wanted. Some fans of the game took it personally, replying that they would either kill Nguyen or kill themselves if he followed through with his decision.

Pick a label:

Class1 or Class2? Mobile Game or Sports?

Page 8:

Dataless Text Categorization: Classification on the Fly

Documents

(Good) label names

Mobile Game or Sports?

Map labels/documents to the same space

World knowledge

Compute document and label similarities

Choose labels

M.-W. Chang, L.-A. Ratinov, D. Roth, V. Srikumar: Importance of Semantic Representation: Dataless Classification. AAAI 2008.
Y. Song, D. Roth: On dataless hierarchical text classification. AAAI 2014.

Page 9:

Challenges of Using Knowledge

Representation, Inference, Learning

Data vs. knowledge representation

Knowledge specification; disambiguation

Compare different representations

Show some interesting examples

Scalability; domain adaptation; open domain classes

Page 10:

Outline of the Talk

Dataless Text Classification: Classify Documents on the Fly

Documents

Label names

Map labels/documents to the same space

World knowledge

Compute document and label similarities

Choose labels

Page 11:

Difficulty of Text Representation

• Polysemy and synonymy

Language vs. meaning

Ambiguity: apple can mean company, fruit, or tree

Variability: cat, feline, kitty, and moggy all mean cat

Basic level concepts

Typicality scores

Rosch, E. et al. Basic objects in natural categories. Cognitive Psychology. 1976.
Rosch, E. Principles of categorization. In Rosch, E., and Lloyd, B., eds., Cognition and Categorization. 1978.

Page 12:

Typicality of Entities

bird

Page 13:

Basic Level Concepts

pet, mammal, pug, dog

What do we usually call it?

animal

Page 14:

pug, bulldog

We use the right level of concepts to describe things!

Page 15:

Probase: A Probabilistic Knowledge Base

Web Document Cleaning

Information Extraction

Semantic Cleaning

Knowledge Integration

Hearst patterns: “Animals such as dogs and cats.”

1.68 billion documents

Mutually exclusive concepts

Freebase, Wikipedia

M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. Int. Conf. on Comp. Ling. (COLING). 1992.
W. Wu, et al. Probase: A probabilistic taxonomy for text understanding. In ACM SIG on Management of Data (SIGMOD). 2012. (Data released http://probase.msra.cn)

Page 16:

Distribution of Concepts

[Figure: concept distribution; example concepts: city, country, disease, magazine, bank, …, local school, Java tool, big bank, BI product, …]

Page 17:

Typicality

[Figure: bar charts of typicality scores (axis 0 to 0.15). Animal: dog, cat, horse, bird, rabbit, deer. Dog: german shepherd, poodle, rottweiler, chihuahua, golden retriever, boxer.]

$P(\text{entity} \mid \text{concept}) = \dfrac{n(\text{entity}, \text{concept})}{n(\text{concept})}$

Page 18:

Basic Level Concepts

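[Figure: bar charts of basic level concept scores. Robin (axis 0 to 0.6): bird, species, character, songbird, common bird, small bird. Penguin (axis 0 to 0.4): animal, bird, species, flightless bird, seabird, diving bird.]

$P(\text{concept} \mid \text{entity}) = \dfrac{n(\text{entity}, \text{concept})}{n(\text{entity})}$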
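Both count-based scores, P(entity | concept) for typicality and P(concept | entity) for basic level concepts, divide a pair count by a marginal count. The following is a minimal sketch of that arithmetic; the counts are invented toy numbers (not Probase data) and the function names are hypothetical.

```python
from collections import defaultdict

# Toy (entity, concept) co-occurrence counts, invented for illustration only.
COUNTS = {
    ("robin", "bird"): 6,
    ("robin", "species"): 3,
    ("robin", "character"): 1,
    ("penguin", "bird"): 3,
    ("penguin", "animal"): 5,
    ("penguin", "species"): 2,
}


def marginals(counts):
    """Sum the pair counts into n(entity) and n(concept)."""
    n_entity, n_concept = defaultdict(int), defaultdict(int)
    for (entity, concept), n in counts.items():
        n_entity[entity] += n
        n_concept[concept] += n
    return n_entity, n_concept


def p_entity_given_concept(entity, concept, counts):
    """Typicality: n(entity, concept) / n(concept)."""
    _, n_concept = marginals(counts)
    total = n_concept.get(concept, 0)
    return counts.get((entity, concept), 0) / total if total else 0.0


def p_concept_given_entity(concept, entity, counts):
    """Basic level score: n(entity, concept) / n(entity)."""
    n_entity, _ = marginals(counts)
    total = n_entity.get(entity, 0)
    return counts.get((entity, concept), 0) / total if total else 0.0


print(p_entity_given_concept("robin", "bird", COUNTS))
print(p_concept_given_entity("bird", "robin", COUNTS))
```

Unknown entities or concepts return 0.0 here, which is a design choice of this sketch rather than something the slides specify.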

Page 19:

Concepts of Multiple Entities

Obama’s real-estate policy

Per-entity concepts: president, politician | investment, property, asset, plan

Union: president, politician, investment, property, asset, plan

[Figure: Explicit Semantic Analysis (ESA), a sum of concept vectors weighted by w1, w2, …, wn]

E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. J. of Art. Intell. Res. (JAIR). 2009.
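ESA represents a text as a weighted sum of the concept vectors of its words. A minimal sketch of that sum follows; the word-to-concept table and the weights are invented toy values, and `esa_vector` is a hypothetical helper name, not an API from the cited work.

```python
def esa_vector(word_weights, word_concepts):
    """Weighted sum of per-word concept vectors (sparse dicts)."""
    doc = {}
    for word, weight in word_weights.items():
        # Words missing from the concept table contribute nothing.
        for concept, value in word_concepts.get(word, {}).items():
            doc[concept] = doc.get(concept, 0.0) + weight * value
    return doc


# Toy concept vectors, invented for illustration only.
WORD_CONCEPTS = {
    "obama": {"president": 0.6, "politician": 0.4},
    "policy": {"plan": 0.5, "politician": 0.5},
}
WORD_WEIGHTS = {"obama": 2.0, "policy": 1.0}  # e.g. TF-IDF weights (assumed)

print(esa_vector(WORD_WEIGHTS, WORD_CONCEPTS))
```

Because the vectors are summed, the result is a union of the concepts of all words, which is the behavior the next slide contrasts with intersection.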

Page 20:

Multiple Related Entities

apple: software company, brand, fruit
adobe: brand, software company

Union: software company, brand, fruit
Intersection: software company, brand

Intersection instead of union!

Page 21:

Probabilistic Conceptualization

P(concept | related entities)

Typicality, Basic Level Concept

P(adobe | fruit) = 0

P(fruit | adobe, apple) = 0

Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2011.

$P(c_k \mid E) = \dfrac{P(E \mid c_k)\,P(c_k)}{P(E)} \propto P(c_k) \prod_{i=1}^{M} P(e_i \mid c_k)$

$P(e_i \mid c_k) = \dfrac{P(e_i, c_k)}{P(c_k)}$, where $E = \{e_i \mid i = 1, \dots, M\}$
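The conceptualization rule scores each concept by its prior times the product of the typicalities of all observed entities, so a concept that never co-occurs with one of the entities drops to zero. A minimal sketch under stated assumptions: the counts are invented toy numbers, probabilities are plain relative frequencies with no smoothing, and `conceptualize` is a hypothetical name.

```python
from collections import defaultdict

# Toy (entity, concept) counts, invented for illustration only.
COUNTS = {
    ("apple", "fruit"): 6,
    ("apple", "company"): 4,
    ("apple", "brand"): 2,
    ("adobe", "company"): 5,
    ("adobe", "brand"): 1,
    ("pear", "fruit"): 4,
}


def conceptualize(entities, counts):
    """Return P(concept | entities), proportional to P(c) * prod_i P(e_i | c)."""
    total = sum(counts.values())
    n_concept = defaultdict(int)
    for (_, concept), n in counts.items():
        n_concept[concept] += n

    scores = {}
    for concept, n_c in n_concept.items():
        score = n_c / total  # prior P(c_k)
        for entity in entities:
            score *= counts.get((entity, concept), 0) / n_c  # P(e_i | c_k)
        scores[concept] = score

    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()} if norm else scores


print(conceptualize(["apple", "adobe"], COUNTS))
```

With these toy counts, "fruit" receives probability 0 for the pair (apple, adobe) because adobe never appears under fruit, matching the intersection behavior described on the slide.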

Page 22:

Given “China, India, Russia, Brazil”

[Figure: bar chart of concept scores (axis 0 to 0.35) for emerging market, emerging economy, economy, emerging country, country, emerging power, emerging nation, bric country]

Page 23:

Given “China, India, Japan, Singapore”

[Figure: bar chart of concept scores (axis 0 to 0.5) for asian country, country, economy, asian nation, asian market, asian economy, asia pacific region, east asian country]

Page 24:

Outline of the Talk

Dataless Text Classification: Classify Documents on the Fly

Documents

Label names

Map labels/documents to the same space

World knowledge

Compute document and label similarities

Choose labels

Page 25:

Generic Short Text Conceptualization

P(concept | short text)

1. Grounding to knowledge base

2. Clustering entities

3. Inside clusters: intersection

4. Between clusters: union
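Steps 3 and 4 can be sketched as follows, assuming steps 1 and 2 have already produced clusters of grounded entities. This is only an illustration: the per-entity concept probabilities are invented, the in-cluster score is a simple product of P(concept | entity) rather than the exact model from the talk, and both function names are hypothetical.

```python
def cluster_concepts(cluster, p_concept_given_entity):
    """Inside a cluster: keep only concepts shared by every entity (intersection)."""
    if not cluster:
        return {}
    shared = set.intersection(*(set(p_concept_given_entity[e]) for e in cluster))
    scores = {}
    for concept in shared:
        score = 1.0
        for entity in cluster:
            score *= p_concept_given_entity[entity][concept]
        scores[concept] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()} if norm else {}


def text_concepts(clusters, p_concept_given_entity):
    """Between clusters: merge the per-cluster distributions (union)."""
    merged = {}
    for cluster in clusters:
        for concept, score in cluster_concepts(cluster, p_concept_given_entity).items():
            merged[concept] = merged.get(concept, 0.0) + score / len(clusters)
    return merged


# Toy P(concept | entity) values, invented for illustration only.
P_CONCEPT = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "pear": {"fruit": 1.0},
    "BBC": {"news channel": 0.7, "company": 0.3},
}

print(text_concepts([["apple", "pear"], ["BBC"]], P_CONCEPT))
```

In this toy run, "company" is dropped inside the (apple, pear) cluster because pear does not share it, but survives in the final result through the separate BBC cluster.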

Page 26:

Markov Random Field Model

Entity type: instance or attribute

Entity clique: intersection

Parameter estimation: concept distribution

Page 27:

Given “U.S.”, “Japan”, “U.K.”; “apple”, “pear”; “BBC”, “New York Times”

[Figure: bar chart of concept scores (axis 0 to 0.18) for developed country, economy, country; fruit, fruit crop, fruit juice; news channel, news publication, news website]

Page 28:

Tweet Clustering

[Figure: Clustering Normalized Mutual Information (NMI), axis 0 to 1, for Topic Model, WordNet, Freebase, WikiCategory, Wiki (ESA), and Probase, on two data sets: "companies, animals, countries" and "4 region-related countries"]

Better access to the right level of concepts

Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2011.

Page 29:

Web Search Relevance

• Evaluation data:
– 300K Web queries
– 19M query-URL pairs
• Historical data:
– 8M URLs
– 8B query-URL clicks

[Figure: NDCG@1 through NDCG@5 (axis 33 to 37), Content Ranker vs. Probase]

Song et al., Int. Conf. on Inf. and Knowl. Man. (CIKM). 2014.

Page 30:

Domain Adaptation

• World knowledge bases
– General purpose
– Information bias
• Domain dependent tasks
– E.g., classification/clustering of entertainment vs. sports
– Knowledge about science/technology is useless

Page 31:

Domain Adaptation for Corpus

Entity type: instance or attribute

Entity clique: intersection

Parameter estimation: concept distribution

Hyper-parameter estimation: domain adaptation

Complexity: 2( )O NM D

Page 32:

Domain Adaptation Results

[Figure: clustering NMI (axis 0.5 to 0.9) on Tweets and News titles, comparing Conceptualization with Domain Adaptation]

Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2015.

Page 33:

Similarity and Relatedness

• Similarity
  – a specific type of relatedness
  – synonyms, hyponyms/hypernyms, and siblings are highly similar
    • doctor vs. surgeon, bike vs. bicycle
• Relatedness
  – topically related or based on any other semantic relation
    • heart vs. surgeon, tire vs. car
• In the following, we focus on Wikipedia!
  – The methodologies apply
    • Entity relatedness
    • Domain adaptation

Page 34:

Dataless Text Classification: Classify Documents on the Fly

Documents

Label names

Map labels/documents to the same space

World knowledge

Compute document and label similarities

Choose labels

Page 35:

Classification in the Same Semantic Space

Mobile Game or Sports?

$l^* = \arg\min_{l_i} \mathrm{Dist}(\phi(x), \phi(l_i))$

[Figure: Explicit Semantic Analysis (ESA), a sum of concept vectors weighted by w1, w2, …, wn]

E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. J. of Art. Intell. Res. (JAIR). 2009.
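Classification in a shared semantic space amounts to picking the label whose representation is nearest to the document's. A minimal sketch, assuming sparse concept vectors stored as dicts and one minus cosine similarity as the distance; the vectors below are invented, and `classify` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def classify(doc_vec, label_vecs):
    """Return the label with the smallest distance (1 - cosine) to the document."""
    return min(label_vecs, key=lambda label: 1.0 - cosine(doc_vec, label_vecs[label]))


# Toy concept vectors, invented for illustration only.
LABELS = {
    "mobile game": {"video game": 1.0, "app": 1.0, "software": 0.5},
    "sports": {"team": 1.0, "athlete": 1.0, "game": 0.3},
}
DOC = {"app": 0.8, "video game": 0.6, "developer": 0.4}

print(classify(DOC, LABELS))
```

No labeled training documents are involved: only the label names and the document are mapped into the concept space.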

Page 36:

Classification of 20 Newsgroups Documents: Cosine Similarity

[Figure: classification F1, values listed in legend order: OHLDA Topics (#topic=20, #doc/topic=100) 0.60; Word2Vec (window=5, dim=500) 0.52; ESA with Wiki (#concept=500) 0.68]

• 20 newsgroups
– L1: 6 classes
– L2: 20 classes
• OHLDA
– Same hierarchy
• Word2vec
– Trained on wiki
– Skipgram

V. Ha-Thuc and J.-M. Renders. Large-scale hierarchical text classification without labelled data. In WSDM 2011.
Blei et al. Latent Dirichlet Allocation. J. of Mach. Learn. Res. (JMLR). 2003.
Mikolov et al. Efficient Estimation of Word Representations in Vector Space. NIPS. 2013.

Page 37:

Two Factors in Dataless Classification

• Length of document
• Number of labels

[Figure, left: Balanced binary classification. F1 (axis 0.3 to 0.9) vs. # words/document, with points at 13, 26, 52, 104, and 209 words and a Random Guess baseline]

[Figure, right: Multi-class classification. F1 (axis 0.3 to 0.9) vs. # labels: 2-newsgroups (2), 20-newsgroups (20), RCV1 Topics (103), Wiki Cates (611)]

Page 38:

Similarity

• Cosine

[Figure: Text 1 and Text 2 as binary word vectors over words such as Champaign, Police, Make, Arrest, Armed, Robbery, Cases, Two, Arrested, UI, Campus, …]

Page 39:

Representation Densification

[Figure: concepts of Vector x matched against concepts of Vector y, with example similarities 1.0 and 0.7]

Cosine

Average

Max matching

Hungarian matching
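When two short texts share no concepts, plain cosine is zero, so densification compares concepts across the two vectors instead. The sketch below illustrates the three alternatives in simplified form: the normalizations are assumptions rather than the exact formulas of the cited NAACL 2015 paper, the concept similarity table is invented, and the one-to-one matching is found by brute force over permutations as a stand-in for the Hungarian algorithm (same optimum, only feasible for tiny vectors).

```python
from itertools import permutations

# Toy symmetric concept-to-concept similarities, invented for illustration only.
SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "engine")): 0.7,
    frozenset(("wheel", "automobile")): 0.6,
    frozenset(("wheel", "engine")): 0.2,
}


def sim(a, b):
    """Similarity between two concepts; identical concepts score 1.0."""
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)


def average_sim(x, y):
    """Weighted average of similarities over all concept pairs."""
    den = sum(x.values()) * sum(y.values())
    num = sum(wx * wy * sim(a, b) for a, wx in x.items() for b, wy in y.items())
    return num / den if den else 0.0


def max_matching_sim(x, y):
    """Each concept in x is matched to its most similar concept in y."""
    if not x or not y:
        return 0.0
    num = sum(wx * max(sim(a, b) for b in y) for a, wx in x.items())
    return num / sum(x.values())


def one_to_one_sim(x, y):
    """Best unweighted one-to-one matching (brute force, not Hungarian)."""
    small, large = (x, y) if len(x) <= len(y) else (y, x)
    keys = list(small)
    if not keys:
        return 0.0
    best = max(
        sum(sim(a, b) for a, b in zip(keys, perm))
        for perm in permutations(large, len(keys))
    )
    return best / len(keys)


X = {"car": 1.0, "wheel": 1.0}
Y = {"automobile": 1.0, "engine": 1.0}
print(average_sim(X, Y), max_matching_sim(X, Y), one_to_one_sim(X, Y))
```

For real vectors with hundreds of concepts, a polynomial-time assignment solver would replace the permutation search.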

Page 40:

rec.autos vs. sci.electronics (1/16 document: 13 words per text)

[Figure: accuracy (axis 0.1 to 0.7) vs. # concepts in ESA (Wiki) at 50, 100, 200, 500, and 1000. Series: Concept (Cosine), Concept (Average), Concept (Hungarian), Word2vec (200)]

Song and Roth. North Amer. Chap. Assoc. Comp. Ling. (NAACL). 2015.

Page 41:

Dataless Text Classification: Classify Documents on the Fly

Documents

Label names

Map labels/documents to the same space

World knowledge

Compute document and label similarities

Choose labels

Page 42:

Classification of 20 Newsgroups Documents

[Figure, Cosine Similarity: classification F1, values listed in legend order: Topic (#topic=20, #doc/topic=100) 0.60; Word2Vec (window=5, dim=500) 0.52; ESA (#concept=500) 0.68]

[Figure, Supervised Classification: classification F1 of 0.52, 0.64, 0.77, 0.83, and 0.87 at training sizes 100, 200, 500, 1,000, and 2,000]

Blei et al. Latent Dirichlet Allocation. J. of Mach. Learn. Res. (JMLR). 2003.
Mikolov et al. Efficient Estimation of Word Representations in Vector Space. Adv. Neur. Info. Proc. Sys. (NIPS). 2013.

Page 43:

Bootstrapping with Unlabeled Data

• Initialize N documents for each label

– Pure similarity based classifications

• Train a classifier to label N more documents

– Continue to label more data until no unlabeled document exists

Application of world knowledge of label meaning

Domain adaptation

Mobile games, Sports
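The bootstrapping loop can be sketched as below. This is a minimal illustration under stated assumptions: the classifier is a nearest-centroid model chosen only to keep the example self-contained (the slides do not name the classifier), the vectors are invented, and `bootstrap` is a hypothetical name. It requires n of at least 1.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average a non-empty list of sparse vectors."""
    mean = {}
    for vec in vectors:
        for key, value in vec.items():
            mean[key] = mean.get(key, 0.0) + value / len(vectors)
    return mean


def bootstrap(docs, label_vecs, n):
    """Label every document, at most n per label per round; returns {doc_id: label}."""
    assigned = {}
    models = dict(label_vecs)  # round 0: pure similarity to the label vectors
    while len(assigned) < len(docs):
        # Predict the best label for every document that is still unlabeled.
        candidates = {label: [] for label in models}
        for doc_id, vec in docs.items():
            if doc_id not in assigned:
                scores = {label: cosine(vec, m) for label, m in models.items()}
                best = max(scores, key=scores.get)
                candidates[best].append((scores[best], doc_id))
        # Keep only the n most confident predictions per label.
        for label, scored in candidates.items():
            for _, doc_id in sorted(scored, reverse=True)[:n]:
                assigned[doc_id] = label
        # "Train": rebuild one centroid per label from the documents labeled so far.
        for label in models:
            members = [docs[d] for d, l in assigned.items() if l == label]
            if members:
                models[label] = centroid(members)
    return assigned


# Toy data, invented for illustration only.
LABELS = {
    "mobile games": {"game": 1.0, "app": 1.0, "mobile": 1.0},
    "sports": {"team": 1.0, "athlete": 1.0},
}
DOCS = {
    "d1": {"game": 1.0, "app": 1.0},
    "d2": {"app": 1.0, "android": 1.0},
    "d3": {"team": 1.0, "match": 1.0},
    "d4": {"match": 1.0, "coach": 1.0},
    "d5": {"android": 1.0, "phone": 1.0},
}

print(bootstrap(DOCS, LABELS, n=1))
```

In this toy run, d4 and d5 share no terms with either label vector, yet they are labeled correctly in later rounds because the retrained centroids have absorbed domain vocabulary ("match", "android") from earlier assignments.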

Page 44:

Classification of 20 Newsgroups Documents

[Figure: supervised classification F1 of 0.52, 0.64, 0.77, 0.83, and 0.87 at training sizes 100, 200, 500, 1,000, and 2,000, compared with dataless classification: Pure Similarity 0.68, Bootstrapped 0.84]

Song and Roth. Assoc. Adv. Artif. Intell. (AAAI). 2014

Page 45:

Hierarchical Classification: Considering Label Dependency

[Figure: label hierarchy with Root at the top, internal nodes A, B, …, N, and numbered leaves (1, 2, …, M) under each]

• Top-down classification

• Bottom-up classification (flat classification)
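The two strategies differ in where the similarity comparison happens: top-down picks the best child at each level, while flat classification compares the document with all leaves at once. A minimal sketch, assuming every node has a concept vector and cosine similarity is the score; the hierarchy and vectors are invented, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def top_down_classify(doc, tree, node_vecs, root="Root"):
    """Descend from the root, choosing the most similar child at each level."""
    node = root
    while tree.get(node):
        node = max(tree[node], key=lambda child: cosine(doc, node_vecs[child]))
    return node


def flat_classify(doc, tree, node_vecs):
    """Compare the document directly with every leaf of the hierarchy."""
    leaves = [name for name in node_vecs if name not in tree]
    return max(leaves, key=lambda leaf: cosine(doc, node_vecs[leaf]))


# Toy hierarchy and concept vectors, invented for illustration only.
TREE = {
    "Root": ["rec", "sci"],
    "rec": ["rec.autos", "rec.sport.hockey"],
    "sci": ["sci.electronics", "sci.space"],
}
NODE_VECS = {
    "rec": {"car": 1.0, "hockey": 1.0},
    "sci": {"circuit": 1.0, "space": 1.0, "engine": 0.2},
    "rec.autos": {"car": 1.0, "engine": 1.0, "auto": 1.0},
    "rec.sport.hockey": {"hockey": 1.0, "ice": 1.0},
    "sci.electronics": {"circuit": 1.0, "engine": 0.5},
    "sci.space": {"space": 1.0, "orbit": 1.0},
}
DOC = {"car": 1.0, "engine": 1.0}

print(top_down_classify(DOC, TREE, NODE_VECS), flat_classify(DOC, TREE, NODE_VECS))
```

Top-down makes fewer comparisons per document but cannot recover from a wrong choice at an upper level; flat classification avoids that error propagation at the cost of comparing against every leaf.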

Page 46:

Top-down vs. Bottom-up

[Figure: MicroF1 (axis 0.2 to 0.9) on 20-newsgroups and RCV1, comparing Top-down with Bottom-up]

RCV1
• 804,414 documents
• 82 categories in 4 levels
• 103 nodes in hierarchy
• 3.24 labels/document

Song and Roth. Assoc. Adv. Artif. Intell. (AAAI). 2014

Page 47:

Dataless Text Classification: Classify Documents on the Fly

Documents

Label names

Map labels/documents to the same space

World knowledge

Compute document and label similarities

Choose labels

Page 48:

| Setting | Labeled data in training | Unlabeled data in training | Label names in training | I.I.D. between training and testing |
| Supervised learning | Yes | No | No | Yes |
| Unsupervised learning | No | Yes | No | Yes |
| Semi-supervised learning | Yes | Yes | No | Yes |
| Transfer learning | Yes | Yes | No | No |
| Zero-shot learning | Yes | No | Yes | No |
| Dataless classification (pure similarity) | No | No | Yes | No |
| Dataless classification (bootstrapping) | No | Yes | Yes | Yes |

Page 49:

Conclusions

• Dataless classification
– Reduce labeling work for thousands of documents
• Compared semantic representation using world knowledge
– Probabilistic conceptualization (PC)
– Explicit semantic analysis (ESA)
– Word embedding (word2vec)
– Topic model (LDA)
– Combination of ESA and word2vec
• Unified PC and ESA
– Markov random field model
• Domain adaptation
– Hyper-parameter estimation
– Bootstrapping: refining the classifier

Thank You!

Advertisement: Using knowledge as structured information instead of flat features! Session 7B, DM835

Page 51:

Correlation with Human Annotation of IS-A Relationships

[Figure: Spearman’s correlation, values listed in legend order: Random Guess 0.057; SemEval'12 Best 0.233; NN-Vector 0.35; Lexical Pattern 0.422; Probase 0.619. Annotations: Gigaword corpus; The Web]

A. Zhila, W. Yih, C. Meek, G. Zweig & T. Mikolov. Combining Heterogeneous Models for Measuring Relational Similarity. In NAACL-HLT-13.