1

Co-Training for Cross-Lingual Sentiment Classification

Xiaojun Wan (萬小軍)

Associate Professor, Peking University

ACL 2009

2

Research Gap

• Opinion mining has drawn much attention recently
  – Sentiment classification (POS, NEG, NEU)
  – Subjectivity analysis (subjective, objective)

• Annotated corpora are essential for training

• However, most existing annotated corpora are in English

• Corpora for other languages, including Chinese, are rare

3

Related Work

• Pilot studies on cross-lingual subjectivity classification

• Mihalcea et al., ACL 2007
  – Use a bilingual lexicon and a manually translated parallel corpus

• Banea et al., EMNLP 2008
  – Use an English annotation tool plus MT to build a Romanian annotation tool
  – Only a small loss compared with human translation
  – Suggests that MT is a viable approach

4

Problem Definition

• Perform cross-lingual sentiment classification
  – Either positive or negative

• Source: English

• Target: Chinese

• Leverage
  – 8000 labeled English product reviews
  – 1000 unlabeled Chinese product reviews
  – Machine translation (MT)

• Derive
  – A sentiment classification tool for Chinese product reviews

5

Framework

• Training Phase

• Classification Phase

6

Training Phase (1): Machine Translation

7

Two Views

Chinese View, English View
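Machine translation gives every review two views: each labeled English review gets a Chinese translation, and each unlabeled Chinese review gets an English translation. A minimal sketch of that data structure, assuming a hypothetical `translate(text, src, tgt)` function; here it is stubbed with a toy word-by-word dictionary (`TOY_EN_ZH`, pre-segmented text), not a real MT system, and `TwoViewReview` and `build_views` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class TwoViewReview:
    en: str               # English view (original or machine-translated)
    cn: str               # Chinese view (original or machine-translated)
    label: Optional[int]  # +1 positive, -1 negative, None if unlabeled

# Hypothetical toy "MT system": word-by-word lookup on pre-segmented text.
TOY_EN_ZH = {"good": "好", "bad": "差", "phone": "手机"}
TOY_ZH_EN = {zh: en for en, zh in TOY_EN_ZH.items()}

def toy_translate(text: str, src: str, tgt: str) -> str:
    table = TOY_EN_ZH if (src, tgt) == ("en", "zh") else TOY_ZH_EN
    return " ".join(table.get(tok, tok) for tok in text.split())

def build_views(
    labeled_en: List[Tuple[str, int]],
    unlabeled_cn: List[str],
    translate: Callable[[str, str, str], str],
) -> List[TwoViewReview]:
    # Labeled English reviews: translate EN -> ZH to obtain the Chinese view.
    reviews = [TwoViewReview(en=t, cn=translate(t, "en", "zh"), label=y) for t, y in labeled_en]
    # Unlabeled Chinese reviews: translate ZH -> EN to obtain the English view.
    reviews += [TwoViewReview(en=translate(t, "zh", "en"), cn=t, label=None) for t in unlabeled_cn]
    return reviews

reviews = build_views([("good phone", 1)], ["差 手机"], toy_translate)
print(reviews[0].cn, "|", reviews[1].en)  # 好 手机 | bad phone
```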

8

Training Phase (2): The Co-Training Approach

English View

9

Label the unlabeled data (English)

• English classifier with SVM
• Label the unlabeled reviews
• E_en: the top p positive and top n negative most confident reviews
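The selection step can be sketched as follows, assuming the classifier's confidence is its signed score (for an SVM, its decision value), with a score above 0 meaning positive. `select_confident` is an illustrative name; the same function would serve the Chinese side.

```python
from typing import Dict

def select_confident(scores: Dict[int, float], p: int, n: int) -> Dict[int, int]:
    """Pick the p highest-scoring positive and the n lowest-scoring negative reviews.

    scores maps a review id to the classifier's signed score (> 0 means positive).
    Returns a mapping review id -> assigned label (+1 or -1).
    """
    positives = sorted((i for i, s in scores.items() if s > 0),
                       key=lambda i: scores[i], reverse=True)[:p]
    negatives = sorted((i for i, s in scores.items() if s < 0),
                       key=lambda i: scores[i])[:n]
    chosen = {i: 1 for i in positives}
    chosen.update({i: -1 for i in negatives})
    return chosen

scores = {0: 0.9, 1: 0.2, 2: -0.7, 3: -0.1, 4: 0.5}
print(select_confident(scores, p=2, n=1))  # {0: 1, 4: 1, 2: -1}
```

Reviews scoring exactly 0 are never selected, since they carry no preference for either class.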

10

Label the unlabeled data (Chinese)

• Chinese classifier with SVM
• Label the unlabeled reviews
• E_cn: the top p positive and top n negative most confident reviews

11

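Remove from Unlabeled Data; Finish One Iteration

• E_en: top p positive and top n negative most confident reviews (English classifier)
• E_cn: top p positive and top n negative most confident reviews (Chinese classifier)
• Remove E_en and E_cn from the unlabeled data and add them, with their predicted labels, to the labeled data
• Train both classifiers again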
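A minimal, self-contained sketch of the whole co-training loop. Assumptions, none of which come from the slides: a toy smoothed log-count-ratio scorer (`train`/`score`) stands in for the SVM classifiers; confidence is the signed score; and the Chinese classifier only picks reviews the English classifier has not already picked in the same iteration. The defaults follow the experimental setting (40 iterations, p = n = 5).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Tokens = Sequence[str]

def train(docs: List[Tokens], labels: List[int]) -> Dict[str, float]:
    """Toy SVM stand-in: weight(w) = log((#pos docs with w + 1) / (#neg docs with w + 1))."""
    pos, neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (pos if y > 0 else neg).update(set(tokens))
    return {w: math.log((pos[w] + 1) / (neg[w] + 1)) for w in set(pos) | set(neg)}

def score(model: Dict[str, float], tokens: Tokens) -> float:
    """Signed confidence squashed into [-1, 1]; unknown words contribute 0."""
    return math.tanh(sum(model.get(w, 0.0) for w in tokens))

def select_confident(scores: Dict[int, float], p: int, n: int) -> Dict[int, int]:
    """Top p positive and top n negative ids by signed score -> assigned label."""
    pos = sorted((i for i, s in scores.items() if s > 0), key=lambda i: -scores[i])[:p]
    neg = sorted((i for i, s in scores.items() if s < 0), key=lambda i: scores[i])[:n]
    return {**{i: 1 for i in pos}, **{i: -1 for i in neg}}

def co_train(labeled: List[Tuple[Tokens, Tokens, int]],
             unlabeled: List[Tuple[Tokens, Tokens]],
             iterations: int = 40, p: int = 5, n: int = 5):
    """labeled: (en_tokens, cn_tokens, label); unlabeled: (en_tokens, cn_tokens)."""
    L, U = list(labeled), list(unlabeled)
    for _ in range(iterations):
        if not U:
            break
        # Train one classifier per view on the current labeled set.
        c_en = train([x[0] for x in L], [x[2] for x in L])
        c_cn = train([x[1] for x in L], [x[2] for x in L])
        chosen: Dict[int, int] = {}
        # Each classifier labels the remaining unlabeled reviews in its own view.
        for view, model in ((0, c_en), (1, c_cn)):
            scores = {i: score(model, U[i][view]) for i in range(len(U)) if i not in chosen}
            chosen.update(select_confident(scores, p, n))
        # Move E_en and E_cn, with their predicted labels, from U to L.
        L += [(U[i][0], U[i][1], y) for i, y in chosen.items()]
        U = [u for i, u in enumerate(U) if i not in chosen]
    # Train both classifiers again on the final labeled set.
    return (train([x[0] for x in L], [x[2] for x in L]),
            train([x[1] for x in L], [x[2] for x in L]))

labeled = [(["good"], ["好"], 1), (["bad"], ["差"], -1)]
unlabeled = [(["good", "sturdy"], ["好", "结实"]), (["bad", "flimsy"], ["差", "易坏"])]
c_en, c_cn = co_train(labeled, unlabeled, iterations=2, p=1, n=1)
print(score(c_cn, ["结实"]) > 0, score(c_cn, ["易坏"]) < 0)  # True True
```

In this toy run the English classifier labels both unlabeled reviews in the first iteration, and the retrained Chinese classifier thereby learns that 结实 ("sturdy") is positive and 易坏 ("breaks easily") is negative, words absent from the original labeled data. This cross-view transfer is the point of co-training.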

12

Setting

• Number of iterations = 40

• p = n = 5
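A quick calculation shows how much of the 1000-review unlabeled pool these settings consume. It assumes each of the two classifiers adds p + n reviews per iteration and that their selections never overlap; the slides do not state this explicitly, and `reviews_added` is an illustrative name.

```python
def reviews_added(iterations: int, p: int, n: int, views: int = 2) -> int:
    """Reviews moved from the unlabeled to the labeled set, assuming no overlap between views."""
    return iterations * views * (p + n)

total = reviews_added(iterations=40, p=5, n=5)
print(total, "of 1000 unlabeled reviews")  # 800 of 1000 unlabeled reviews
```

Under these assumptions, about 200 unlabeled reviews remain after the last iteration, so the pool is never exhausted.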

13

Classification Phase

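• Chinese classifier: scores the Chinese review
• English classifier: scores its English machine translation
• Average the two scores; the result lies in [-1, 1]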
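A minimal sketch of the combination step. It assumes each classifier's output has already been scaled to [-1, 1] and that an average above 0 means positive; the slide only states the averaging, so the 0 threshold is an assumption. `combine` is an illustrative name.

```python
def combine(score_cn: float, score_en: float) -> int:
    """Average the Chinese-view and English-view scores; return +1 (positive) or -1 (negative)."""
    for s in (score_cn, score_en):
        if not -1.0 <= s <= 1.0:
            raise ValueError(f"score {s} outside [-1, 1]")
    average = (score_cn + score_en) / 2
    return 1 if average > 0 else -1

print(combine(0.4, -0.2))   # 1  (average 0.1)
print(combine(-0.6, 0.3))   # -1 (average -0.15)
```

When the two views disagree, the view with the larger-magnitude score decides the label.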

14

Experiment Setting (Training)

• 8000 labeled English Amazon product reviews
  – 4000 positive, 4000 negative
  – Books, DVDs, electronics

• 1000 unlabeled Chinese product reviews from www.it168.com
  – MP3 players, mobile phones, DC (digital cameras)

15

Experiment Setting (Testing)

• 886 Chinese product reviews from www.it168.com
  – 451 positive, 435 negative
  – Disjoint from the unlabeled training data (an outside test set)

16

Baseline

• SVM
  – Uses only labeled data

• TSVM (Transductive SVM)
  – Joachims, 1999
  – Uses both labeled and unlabeled data

17

SVM Baselines

SVM(EN), SVM(CN)

18

SVM Baselines

SVM(ENCN1)

19

SVM Baselines

SVM(ENCN2): average the outputs of SVM(EN) and SVM(CN)

20

TSVM Baselines

TSVM(EN), TSVM(CN)

21

TSVM Baselines

TSVM(ENCN1)

22

TSVM Baselines

TSVM(ENCN2): average the outputs of TSVM(EN) and TSVM(CN)

23

Result: Method Comparison (1)

24

Result: Method Comparison (2): Performance on Each Side

SVM(EN), TSVM(EN), CoTrain(EN)

25

Result: Method Comparison (3)

Method | Accuracy (EN side) | Accuracy (CN side)
SVM | 0.738 | 0.771
TSVM | 0.769 | 0.767
CoTrain | 0.790 | 0.775

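CoTrain makes better use of the unlabeled Chinese reviews than TSVM: on the Chinese side, TSVM falls slightly below the SVM baseline (0.767 vs. 0.771), while CoTrain improves on it (0.775)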
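The comparison can be checked by subtracting each side's SVM baseline from the accuracies reported on this slide. `ACCURACY` and `gain_over_svm` are illustrative names; the numbers are copied from the slide.

```python
# Accuracies as reported on the slide.
ACCURACY = {
    "EN": {"SVM": 0.738, "TSVM": 0.769, "CoTrain": 0.790},
    "CN": {"SVM": 0.771, "TSVM": 0.767, "CoTrain": 0.775},
}

def gain_over_svm(side: str, method: str) -> float:
    """Absolute accuracy change of `method` relative to the SVM baseline on one side."""
    return round(ACCURACY[side][method] - ACCURACY[side]["SVM"], 3)

for side in ("EN", "CN"):
    for method in ("TSVM", "CoTrain"):
        print(f"{method}({side}) vs SVM({side}): {gain_over_svm(side, method):+.3f}")
# TSVM(EN) vs SVM(EN): +0.031
# CoTrain(EN) vs SVM(EN): +0.052
# TSVM(CN) vs SVM(CN): -0.004
# CoTrain(CN) vs SVM(CN): +0.004
```

On the English side both semi-supervised methods help; on the Chinese side only CoTrain does.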

26

Result: Iteration Number

CoTrain outperforms TSVM(ENCN2) after 20 iterations

27

Result: Balance of (p, n)

Adding unbalanced numbers of positive and negative examples (p ≠ n) hurts performance badly

28

Conclusion & Comment

• A co-training approach for cross-lingual sentiment classification

• Future work
  – Translated and natural text have different feature distributions
  – Domain adaptation algorithms (e.g., structural correspondence learning) could link them

29

Comment

• Leverage word (or phrase) alignments in the translated text