
SELC: A Self-Supervised Model for Sentiment Classification

Likun Qiu, Weishi Zhang, Chanjian Hu, Kai Zhao

CIKM 2009

Speaker: Yu-Cheng, Hsieh


Outline

- Introduction
- Important concepts
- Classification methods
  - Lexicon-based method
  - Corpus-based method
- Two phases of the proposed model
  - Basic SELC Model
  - SELC Model
- Experiments
- Discussion and error analysis
- Conclusion and future work


Introduction

People express their feelings about a product when they review it. Assigning a positive or negative sentiment value to each product review is referred to as sentiment classification.

To make the model self-supervised, both a lexicon-based method and a corpus-based method are used in the model.


Features of the proposed model

- Domain independence.
- Exploits the complementarity between the lexicon-based and corpus-based methods to improve performance.
- No need to manually annotate training data for the corpus-based method.


Important concepts

Indirect expression of negative sentiment
- Conveying a negative feeling with positive words (negation + positive sentiment). Ex: 不"好" [bu-hao] = not 'good'

Indirect expression of negative sentiment is much more frequent than indirect expression of positive sentiment (about 6:1).

In negative documents, 63% of positive words are used to express negative sentiment.


Important concepts (Cont.)

A lexical item is taken as the processing unit. A lexical item is a sequence of Chinese characters, excluding punctuation marks and negation words.

Ex: … , XXXXX 。 …


Classification Methods - Lexicon-based Method

Steps:

1. Given a sentiment vocabulary $V$, in which each item is assigned a sentiment score, and a set of training data $R$ containing many reviews.

2. For each review $r \in R$, check whether $r$ contains items in $V$, then classify the review according to the sum of their sentiment scores.

3. Take the lexical items in $r$ as a new vocabulary set $V'$.

4. $V = V \cup V'$

5. Using the new $V$, repeat steps 2 to 5.
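Step 2 of the lexicon-based method, classifying a review by the sum of the sentiment scores of the vocabulary items it contains, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `classify_by_lexicon`, the use of plain substring counting, the `"unknown"` label for a zero sum, and the two-word toy vocabulary are all assumptions made for the example.

```python
def classify_by_lexicon(review, vocabulary):
    """Classify a review by summing the scores of vocabulary items it contains.

    `vocabulary` maps an item (a string) to its sentiment score.
    Counting items by substring match is an assumption of this sketch.
    """
    total = 0.0
    for item, score in vocabulary.items():
        total += review.count(item) * score
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "unknown"


# Hypothetical toy vocabulary: 好 ("good") scores +1, 差 ("poor") scores -1.
toy_vocabulary = {"好": 1.0, "差": -1.0}

# Two occurrences of 好 and one of 差 give a sum of +1.
print(classify_by_lexicon("屏幕好,電池差,手感好", toy_vocabulary))
```

The sketch ignores negation and item length; both are handled by the scoring formula of the Basic SELC Model in Phase 1.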


Classification Methods - Lexicon-based Method (Cont.)

Features:

- Unsupervised.

- Domain-independent.

- Uses a general sentiment item list.

- Positive-classification bias, that is, higher precision on negative reviews.


Classification Methods - Corpus-based Method

Steps:

1. Given two sets P and N, representing the positive and negative review sets respectively.

2. A general sentiment dictionary is used as the feature set.

3. Input a set R containing many new reviews.

4. Decide the class of each review in R according to the features of the reviews in P and N.


Classification Methods - Corpus-based Method (Cont.)

Features:

- Supervised learning achieves higher performance than unsupervised learning.

- Domain-dependent.

- Negative-classification bias, that is, higher precision on positive reviews.


Two phases of the proposed model - Phase 1

Phase 1: Basic SELC Model


Two phases of the proposed model - Phase 1 (Cont.)

Initiation Step

- A sentiment vocabulary V, a list of items, is initialized from a sentiment dictionary.

- Each item in the sentiment dictionary is assigned a sentiment score: positive words get +1, negative words get -1.


Two phases of the proposed model - Phase 1 (Cont.)

Step 1: Computation of Review Sentiment Score

- Each review is divided into zones by punctuation marks.

- For each zone, check whether it contains items in V. If so, those items are taken as effective items.

- Each effective item i of a zone is scored by:

$$S_i = \frac{L_d^2}{L_{phrase}} \cdot S_d \cdot N_d$$

where

$L_d$: length of the item

$S_d$: sentiment score of the effective item

$N_d$: negation check coefficient, with default value 1

* A longer effective item carries more feeling about the product.
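The item score can be illustrated with a small worked example, assuming the formula reads $S_i = (L_d^2 / L_{phrase}) \cdot S_d \cdot N_d$. The sketch below is not the authors' code. Two points are assumptions: $L_{phrase}$ is taken to be the length of the zone in characters (the slide does not define it), and the negation coefficient $N_d$ is set to -1 when the item is negated (the slide only gives its default value of 1). The function and parameter names are hypothetical.

```python
def item_score(item_length, zone_length, sentiment, negated=False):
    """Score one effective item: (L_d^2 / L_phrase) * S_d * N_d.

    item_length  -- L_d, length of the item in characters
    zone_length  -- L_phrase, assumed here to be the zone length in characters
    sentiment    -- S_d, current sentiment score of the item
    negated      -- if True, N_d is assumed to be -1; otherwise the default 1
    """
    negation_coefficient = -1.0 if negated else 1.0
    return (item_length ** 2 / zone_length) * sentiment * negation_coefficient


# Because L_d is squared, a longer item weighs more in the same zone.
print(item_score(2, 8, 1.0))                # 2-character item in an 8-character zone
print(item_score(4, 8, 1.0))                # 4-character item in the same zone
print(item_score(2, 8, 1.0, negated=True))  # the same short item, negated
```

Squaring the item length is what makes longer matches dominate: doubling the length of the item quadruples its contribution to the zone.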


Two phases of the proposed model - Phase 1 (Cont.)

Step 1: Computation of Review Sentiment Score (Cont.)

- Sum the scores of all effective items in a zone to get the ZoneScore.

- If the ZoneScore is greater than zero, the zone is classified as positive; if it is smaller than zero, the zone is classified as negative.

- Sum all ZoneScores to get the ReviewScore.

- If the ReviewScore is greater than zero, the review is classified as positive; if it is smaller than zero, the review is classified as negative.
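The zone and review scoring can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' implementation: the punctuation set, the two negation words, the rule that a negation directly preceding an item flips its sign, the use of the zone length as $L_{phrase}$, and all function names are assumptions made for the example.

```python
import re

# Assumed delimiter set (full-width and ASCII punctuation) and negation words.
PUNCTUATION = r"[,。!?;、,.!?;]"
NEGATIONS = ("不", "沒")


def zone_score(zone, vocabulary):
    """Sum the scores of all effective items found in one zone."""
    score = 0.0
    for item, sentiment in vocabulary.items():
        if not item:
            continue  # skip empty keys so the search below always advances
        start = zone.find(item)
        while start != -1:
            # Assumption: an item is negated if a negation word directly precedes it.
            negated = zone[:start].endswith(NEGATIONS)
            coefficient = -1.0 if negated else 1.0
            score += (len(item) ** 2 / len(zone)) * sentiment * coefficient
            start = zone.find(item, start + len(item))
    return score


def review_score(review, vocabulary):
    """Split a review into zones at punctuation and sum the zone scores."""
    zones = [z for z in re.split(PUNCTUATION, review) if z]
    return sum(zone_score(z, vocabulary) for z in zones)


def label(score):
    """Map a score to a class; a zero score is left undecided in this sketch."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


# Hypothetical vocabulary with two positive items: 好 ("good"), 清晰 ("clear").
toy_vocabulary = {"好": 1.0, "清晰": 1.0}
review = "電池不好,屏幕不清晰,手感好"
print(review_score(review, toy_vocabulary), label(review_score(review, toy_vocabulary)))
```

In the toy review every matched word is positive, yet two of the three zones negate it, so the review comes out negative. This is the indirect expression of negative sentiment that the negation check is meant to capture.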


Two phases of the proposed model - Phase 1 (Cont.)

Step 2: Review Sentiment Classification with Ratio Control

$C_{positive}$: number of reviews with a positive ReviewScore

$C_{negative}$: number of reviews with a negative ReviewScore


Two phases of the proposed model - Phase 1 (Cont.)

Step 3: Iterative Retraining

- Lexical items that occur at least twice in the classified reviews are taken as candidate items.

- $F_p$ and $F_n$ stand for the frequency of a candidate item in positive and negative reviews respectively.

- If a candidate item in a positive review is preceded by a negation, the $F_p$ of the candidate item is reduced by 1, and vice versa.


Two phases of the proposed model - Phase 1 (Cont.)

Step 3: Iterative Retraining (Cont.)

- The difference between the two frequencies of each candidate is measured by:

$$difference = \frac{|F_p - F_n|}{F_p + F_n}$$

- The sentiment score of each item in V is recalculated by:

$$F_p - F_n$$

- Threshold = 1.
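The retraining arithmetic for a single candidate item can be sketched as below. The sketch assumes the difference is computed as $|F_p - F_n| / (F_p + F_n)$ and the new score as $F_p - F_n$, with negated occurrences subtracted from the frequency of the class they occur in. The function name and its parameters are hypothetical, and how the threshold is applied to the difference is left to the caller because the slide does not spell it out.

```python
def retrain_item(pos_count, neg_count, negated_in_pos=0, negated_in_neg=0):
    """Return (difference, new_score) for one candidate item.

    pos_count / neg_count -- occurrences in reviews classified positive / negative
    negated_in_pos / negated_in_neg -- how many of those occurrences are
        preceded by a negation; each one reduces the matching frequency by 1
    """
    f_p = pos_count - negated_in_pos
    f_n = neg_count - negated_in_neg
    total = f_p + f_n
    if total <= 0:
        # No usable occurrences left: treat the item as carrying no evidence.
        return 0.0, 0.0
    difference = abs(f_p - f_n) / total
    return difference, float(f_p - f_n)


# An item seen 9 times in positive reviews (once negated) and twice in
# negative reviews: F_p = 8, F_n = 2.
print(retrain_item(9, 2, negated_in_pos=1))
```

An item that is spread evenly over both classes gets a difference near zero, while an item that occurs in only one class gets the maximum difference, so the measure separates sentiment-bearing items from neutral ones.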


Two phases of the proposed model - Phase 1 (Cont.)

Step 4: Iteration Control

- The iterative process completes when the classification result no longer changes between two iterations.

- Once the iterative process completes, go to the Uncertain Set Processing Step.


Two phases of the proposed model - Phase 1 (Cont.)

Uncertain Set Processing Step

$ZC_{positive}$: number of positive zones in a review

$ZC_{negative}$: number of negative zones in a review


Two phases of the proposed model - Phase 2

Phase 2: SELC Model


Two phases of the proposed model - Phase 2 (Cont.)

Corpus-based Supervised Method

- SVM is chosen as the machine-learning method.

- A general sentiment dictionary is used as the feature set.

- TF-IDF is used to compute the weights:

$$w_i = tf_i \cdot \log\frac{N}{df_i}$$
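The TF-IDF weighting can be sketched as follows, assuming the weight of feature $i$ is $w_i = tf_i \cdot \log(N / df_i)$, where $tf_i$ is the frequency of the feature in a review, $N$ the number of reviews, and $df_i$ the number of reviews containing the feature. The natural logarithm, substring counting, and the function name `tfidf_weights` are assumptions of this sketch. The presentation used WEKA for the SVM itself, which is not reproduced here.

```python
import math


def tfidf_weights(documents, features):
    """Return one {feature: weight} dict per document, w_i = tf_i * log(N / df_i)."""
    n_docs = len(documents)
    # df_i: number of documents that contain feature i at least once.
    doc_freq = {f: sum(1 for d in documents if f in d) for f in features}
    weights = []
    for doc in documents:
        row = {}
        for f in features:
            tf = doc.count(f)  # tf_i: occurrences of the feature in this document
            if doc_freq[f] == 0:
                row[f] = 0.0   # feature never seen: no weight
            else:
                row[f] = tf * math.log(n_docs / doc_freq[f])
        weights.append(row)
    return weights


# Hypothetical toy corpus; the features stand in for dictionary entries.
docs = ["好好", "好差", "差"]
for row in tfidf_weights(docs, ["好", "差"]):
    print(row)
```

Each row is the feature vector of one review; vectors of this kind, built over the sentiment dictionary, are what the SVM is trained on.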


Two phases of the proposed model - Phase 2 (Cont.)

Integration Process

- Designed to process the reviews in the Uncertain Set.

- Deals with the positive-classification bias of the lexicon-based method and the negative-classification bias of the corpus-based method.


Experiments

Data and Tools

- 7,779 product reviews written in Chinese.

- The product reviews are divided into 10 domains, indexed C1 to C10.

- Each sub-corpus has an equal number of positive and negative reviews.

- The HowNet Sentiment Dictionary is used as the sentiment dictionary (4,566 positive words and 4,370 negative words).

- WEKA 3.4.11 is used to implement the SVM.

- The results reported in "Automatic Seed Word Selection for Unsupervised Sentiment Classification of Chinese Text" are taken as the baseline.


Experiments (Cont.)

V1: Ratio control is removed.

V2: A different seed is used.

V3: 6 negations are used

(不, 不會, 沒有, 沒, 雖然, 雖, 盡管, 缺, 缺乏, 無)


Experiments (Cont.)

SELC* Model = Basic SELC Model without Uncertain Set Processing.

SVM-HowNet: evaluated with 10-fold cross-validation.


Discussion and Error Analysis

Ratio control slows the growth of positive items and thus overcomes the positive-classification bias.

A general sentiment dictionary is used in place of an automatically generated seed set.

The use of a supervised method improves the overall performance.

Most errors are caused by ambiguous sentiment.

Ex: 優點"多" vs. 缺點"多" (the same word 多, 'many', is positive in 'many advantages' but negative in 'many shortcomings')


Conclusion and Future Work

Proposes a novel approach that successfully integrates a corpus-based model with a lexicon-based model.

Presents several strategies to overcome the positive/negative classification bias through ratio control.

Many complicated constructions are involved in the indirect expression of negative sentiment.

Ex: 實現 ('to achieve'): positive word; 避免 ('to avoid'): negative word.


The End