
1

Co-Training for Cross-Lingual Sentiment Classification

Xiaojun Wan (萬小軍)

Associate Professor, Peking University

ACL 2009

2

Research Gap

• Opinion mining has drawn much attention recently
  – Sentiment classification (POS, NEG, NEU)
  – Subjectivity analysis (subjective, objective)

• Annotated corpora are the most important resource for training

• However, most existing corpora are in English

• Annotated corpora for other languages, including Chinese, are scarce

3

Related Work

• Pilot studies on cross-lingual subjectivity classification

• Mihalcea et al., ACL 2007
  – Bilingual lexicon and manually translated parallel corpus

• Banea et al., EMNLP 2008
  – English annotation tool + machine translation (MT)
  – Built a Romanian annotation tool
  – Little loss compared to human translation
  – Suggests MT is a viable approach

4

Problem Definition

• Perform cross-lingual sentiment classification
  – Each review is either positive or negative

• Source language: English

• Target language: Chinese

• Leverage
  – 8000 labeled English product reviews
  – 1000 unlabeled Chinese product reviews
  – Machine translation (MT)

• Derive
  – A sentiment classification tool for Chinese product reviews

5

Framework

• Training Phase

• Classification Phase

6

Training Phase (1): Machine Translation

7

Two Views

• Chinese View
• English View

8

Training Phase (2): The Co-Training Approach

English View

9

Label the unlabeled data (English)

• The English classifier (SVM) labels the English view of the unlabeled set (Een)

• Select the top p positive and top n negative most confident reviews

10

Label the unlabeled data (Chinese)

• The Chinese classifier (SVM) labels the Chinese view of the unlabeled set (Ecn)

• Select the top p positive and top n negative most confident reviews

11

Remove from Unlabeled Data, Finish One Iteration

• Add the newly labeled reviews (top p positive, top n negative from each view) to both training sets and remove them from the unlabeled pool

• Train both classifiers again on the enlarged training sets

12

Setting

• #Iteration = 40

• p = n = 5
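The co-training loop on slides 8–12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy bag-of-words centroid scorer stands in for the SVM classifiers, and the function names (`train`, `score`, `co_train`) are hypothetical.

```python
from collections import Counter

def train(labeled):
    # Toy stand-in for SVM training: per-class bag-of-words counts.
    pos, neg = Counter(), Counter()
    for text, label in labeled:
        (pos if label == 1 else neg).update(text.split())
    return pos, neg

def score(model, text):
    # Signed confidence: > 0 leans positive, < 0 leans negative.
    pos, neg = model
    words = text.split()
    return sum(pos[w] - neg[w] for w in words) / max(len(words), 1)

def co_train(en_labeled, cn_labeled, unlabeled, iterations=40, p=5, n=5):
    # unlabeled: list of (english_view, chinese_view) pairs of the same
    # review, the two views obtained via machine translation.
    unlabeled = list(unlabeled)
    for _ in range(iterations):
        if not unlabeled:
            break
        models = (train(en_labeled), 0), (train(cn_labeled), 1)
        for model, view in models:
            # Rank remaining unlabeled reviews by this view's confidence.
            ranked = sorted(unlabeled, key=lambda pair: score(model, pair[view]))
            picks = [(r, 1) for r in ranked[-p:]] + [(r, 0) for r in ranked[:n]]
            for pair, label in picks:
                if pair in unlabeled:
                    # Add both views to both training sets, then remove
                    # the review from the unlabeled pool.
                    en_labeled.append((pair[0], label))
                    cn_labeled.append((pair[1], label))
                    unlabeled.remove(pair)
    return train(en_labeled), train(cn_labeled)
```

With p = n, equal numbers of positive and negative examples are added each round, matching the balanced p = n = 5 setting above.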

13

Classification Phase

• The Chinese classifier and the English classifier each produce a score

• The final value is the average of the two scores, in [-1, 1]
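In the classification phase the two per-view scores are simply averaged. A minimal sketch, assuming each score lies in [-1, 1]; the function name and the sign-based decision rule are assumptions:

```python
def classify(en_score, cn_score):
    # Average the English-view and Chinese-view scores (each in [-1, 1]);
    # a non-negative average is read as positive.
    avg = (en_score + cn_score) / 2
    return ("positive" if avg >= 0 else "negative"), avg
```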

14

Experiment Setting (Training)

• Labeled English data: 8000 Amazon product reviews
  – 4000 positive, 4000 negative
  – Domains: books, DVDs, electronics

• Unlabeled Chinese data: 1000 product reviews from www.it168.com
  – Domains: mp3 players, mobile phones, digital cameras

15

Experiment Setting (Testing)

• 886 Chinese product reviews from www.it168.com
  – 451 positive, 435 negative
  – Disjoint from the unlabeled training data (held-out testing)

16

Baseline

• SVM
  – Uses only the labeled data

• TSVM (Transductive SVM)
  – Joachims, 1999
  – Uses both labeled and unlabeled data

17

SVM Baselines

SVM(EN), SVM(CN)

18

SVM Baselines

SVM(ENCN1)

19

SVM Baselines

SVM(ENCN2): average the outputs of SVM(EN) and SVM(CN)

20

TSVM Baselines

TSVM(EN), TSVM(CN)

21

TSVM Baselines

TSVM(ENCN1)

22

TSVM Baselines

TSVM(ENCN2): average the outputs of TSVM(EN) and TSVM(CN)

23

Result: Method Comparison (1)

24

Result: Method Comparison (2): Performance on Each Side

Compared on the English view: SVM(EN), TSVM(EN), CoTrain(EN)

25

Result: Method Comparison (3)

English view:

  Method        Accuracy
  SVM(EN)       0.738
  TSVM(EN)      0.769
  CoTrain(EN)   0.790

Chinese view:

  Method        Accuracy
  SVM(CN)       0.771
  TSVM(CN)      0.767
  CoTrain(CN)   0.775

CoTrain makes better use of the unlabeled Chinese reviews than TSVM

26

Result: Iteration Number

• Co-Training outperforms TSVM(ENCN2) after 20 iterations

27

Result: Balance of (p, n)

• Unbalanced selections of positive and negative examples hurt performance badly

28

Conclusion & Comment

• A co-training approach for cross-lingual sentiment classification

• Future work
  – Translated and natural text have different feature distributions
  – Apply a domain adaptation algorithm (e.g. structural correspondence learning) to bridge them

29

Comment

• Leverage word (phrase) alignment in translated text