REVIEW SELECTION BASED ON TOPIC
MODELS
Anh Duc Nguyen
A THESIS SUBMITTED TO
THE SCIENCE AND ENGINEERING FACULTY
OF QUEENSLAND UNIVERSITY OF TECHNOLOGY
IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF INFORMATION TECHNOLOGY (RESEARCH)
School of Electrical Engineering and Computer Science
Science and Engineering Faculty
Queensland University of Technology
Brisbane, Australia
2018
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the best
of my knowledge and belief, the thesis contains no material previously published or
written by another person except where due reference is made.
Signature: QUT Verified Signature
Date: February 2018
To the memorable journey
Keywords
Review Selection
Related Words
Pattern Mining
Topic Modelling
Pattern based Topic Model
WordNet
Abstract
Online reviews provide valuable sources of information about products and assist
customers in purchase decision making. However, the sheer and overwhelming
volume of reviews, together with the large variations in review quality, present an
obstacle for customers in obtaining useful content from online reviews. Clearly, it is
impossible for customers to read thousands of reviews in a short time to get a full
picture of the product. Therefore, helpful review selection has attracted many researchers in recent times. A current line of research has started to focus on the content of reviews by analysing the product features mentioned in them. However, this research stream suffers from several problems. The first and foremost
difficulty is the ambiguity of the text review content, which makes the task of
automatically analysing and understanding the review content very challenging. The
reason is that reviewers have their own styles and the freedom to write whatever they
want, without following any specific syntax, grammar or structure. As a result,
polysemy and synonymy are common issues in online reviews. Secondly, most currently proposed review selection methods are based on the assumption that all features
of a product are independent. Nevertheless, there are always relationships among
features of products. Those related features are normally used together to express
information about a specific aspect of the product. Successful discovery of those
related features can certainly improve the task of review selection. Lastly, most
research has studied review selection as product-centric but not customer-centric.
Those studies consider that all of the features are equally important to customers and
propose review selection approaches that can cover as many features as possible. As a
result, returned reviews by those methods are always the same, despite the specific
needs of customers. For a specific feature that is very important to a certain customer,
reviews covering many other features but lacking detailed information about that
feature are useless to that customer. A recent work proposed to find specialised reviews discussing a specific feature based on words similar to that feature (Long et al., 2014), but it still suffers from quite a few issues.
To tackle the problems mentioned above, our work first proposes a novel approach to extract the main features and related features of a product by applying data mining, natural language processing and probabilistic topic modelling. By combining the external knowledge ontology WordNet with modern probabilistic models (the topic model and the pattern-based topic model), a complete set of words related to the main features can be identified more accurately, thereby reducing the problem of ambiguity in online review content. Specifically, the topic model is a probabilistic model that can help to discover the related features of the main features, which is an important contribution to reducing the problem of ambiguity. Secondly, this thesis also proposes a new review selection method based on a single feature of the product. By utilising the identified related words of the target feature, together with our review selection methods, a set of helpful reviews that intensively discuss the target feature can be identified. As the features that are important to a customer are input into the review selection model by the customer, our work is therefore customer-centric.
We provide detailed experiments in this research to verify our proposed methods. The
results of our experiments in Chapter 5 confirm that our proposed approach outperforms existing works.
Table of Contents
Statement of Original Authorship
Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Acknowledgements
Chapter 1: Introduction
1.1 Overview
1.2 Research Problem and Objective
1.3 Research Significance and Contribution
1.4 Publication
1.5 Thesis Outline
Chapter 2: Literature Review
2.1 Review Selection
2.2 Topic Modelling
2.3 Summary
Chapter 3: Main Feature Selection and Related Feature Selection
3.1 Main Feature Selection
3.2 Discovery of Related Words of Main Features
3.3 Summary
Chapter 4: Review Selection for Single Feature
4.1 Overview of Helpful Review Selection
4.2 Specialised Review Selection (SRS) Method
4.3 The Proposed Review Selection Method
4.4 Summary
Chapter 5: Experiments and Evaluation
5.1 Experimental Environment
5.2 Experiment Design
5.3 Result Analysis and Evaluation
5.4 Summary
Chapter 6: Conclusions
6.1 Conclusion
6.2 Limitations
6.3 Future Work
REFERENCES
List of Figures
Figure 1. Mona Lisa Restaurant with 1022 reviews on Yelp.com
Figure 2. Canon EOS T5 with 779 reviews on Amazon.com
Figure 3. Features summary
Figure 4. Probabilistic Latent Semantic Model
Figure 5. A graphical model representation of the latent Dirichlet allocation (LDA)
Figure 6. Similar and sentiment words of feature "atmosphere"
Figure 7. Topic 5 after removing words having low weight for one restaurant dataset
Figure 8. Related words of feature f ($RWR_f$)
Figure 9. The representation of review content by main features and related words
Figure 10. Review Position Problem
List of Tables
Table 1. Main Features and Related Features
Table 2. Transactional Database
Table 3. Dataset Information for Restaurant Business
Table 4. Main Features of Six Datasets
Table 5. Helpfulness Score for Main Features of CAM1
Table 6. Average Helpful Score of Six Datasets
Table 7. Mean Significance Difference t-test
Table 8. Precision of top-10 and top-15 reviews returned by RSWR and SRS
Table 9. Recall of top-10 and top-15 reviews returned by RSWR and SRS
Table 10. F-score of top-10 and top-15 reviews returned by RSWR and SRS
Table 11. Normalized Discounted Cumulative Gain
Table 12. Helpfulness Score for Main Features of CAM1
Table 13. Average Helpful Score of Six Datasets
Table 14. Mean Significance Difference t-test
Table 15. Precision of top-10 and top-15 returned reviews
Table 16. Recall of top-10 and top-15 returned reviews
Table 17. F-score of top-10 and top-15 returned reviews
Table 18. Normalized Discounted Cumulative Gain
List of Abbreviations
LDA Latent Dirichlet Allocation
PBTM Pattern based Topic Model
PLSA Probabilistic Latent Semantic Analysis
NLP Natural Language Processing
POST Part-of-Speech Tagging
IC Information Content
LCS Lowest Common Subsumer
SRS Specialized Review Selection
RSWR Review Selection Based on Weighted Relevance
TRWS Topical Related Word Selection
PTRWS Pattern Based Topic Related Word Selection
SPMF Open-source Data Mining Library
JWNL Java WordNet Library
Acknowledgements
I would like first to express my sincere gratitude to my principal supervisor Associate
Professor Yue Xu for the continuous support of my Master research, and for her
patience, motivation, and enthusiasm. Her guidance helped me in all the stages of
research and writing of this thesis.
My sincere thanks also go to my colleague Nan Tian for his assistance and
suggestions regarding the experiment implementation. Many thanks to the
administration staff in the University for their support during this research.
I would like to thank my parents for providing me with unfailing support
and continuous encouragement throughout my years of study.
Finally, I want to thank my wife, Hana. In the past two years, you have not only been my wife, you have been my best friend. You have taken over the family responsibilities to
leave me time for my study. This accomplishment would not have been possible
without you. Thank you!
Chapter 1: Introduction
In this chapter, the background and context of this research are outlined (Section 1.1). The gaps in review selection research are highlighted and the research problem and objectives are formulated (Section 1.2). The significance and contribution of the proposed research are discussed in Section 1.3, the resulting publication is listed in Section 1.4, and the thesis outline is given in Section 1.5.
1.1 OVERVIEW
With the rapid expansion of Web 2.0 and e-commerce, in recent years consumers have
witnessed an increasing number of online shopping activities. More and more shoppers
use online platforms to browse and buy products. Furthermore, online retailing, with its interactive tools and forms of consumer feedback, has encouraged consumers to share their personal consumption experiences and express their opinions by writing reviews about the purchased product. Customers prefer to rely on such comments as
an information resource in order to get a full overview of their target product. These
online reviews provide a valuable source of information about a product and assist
customers in their final purchase decision. However, the explosive proliferation of reviews for each product on the Internet is also a headache for customers. It is common to see several hundred reviews for one popular product or service on e-commerce websites such as Yelp (http://Yelp.com) (Figure 1) and Amazon (http://amazon.com) (Figure 2). Clearly, a typical user does not have enough time and patience to sift through such an overwhelming number of reviews. Furthermore, some reviews may provide incorrect or incomplete product information, or may even be misleading, lowering the overall quality of the review pool. As a result, customers may lack knowledge about products and be unable to judge the true quality of a product prior to purchase. Consequently, the strong demand for helpful, high-quality reviews has attracted researchers' attention to the subject of online review selection.
Figure 1. Mona Lisa Restaurant with 1022 reviews on Yelp.com
Figure 2. Canon EOS T5 with 779 reviews on Amazon.com
Accordingly, to tackle the information-overload issues of online reviews, many
websites allow users to vote for the helpfulness of each review, based on their personal
experience. As a result, each review in the review collection obtains a helpfulness
score, for example, “153 out of 250 people found this review helpful”, and the reviews
can then be sorted according to this score. The helpfulness votes are generally an
effective way to filter reviews (Ghose & Ipeirotis, 2011), hence are a good indicator
for extracting useful reviews from the rest. Although this mechanism is certainly an improvement, it also suffers from significant drawbacks. For example, older reviews that have already accumulated many votes are ranked higher, thereby increasing their visibility over newly posted reviews. Since users tend to read the highly voted reviews, recently published reviews may never appear on users' radar because they have no votes or only a few.
Recently, a substantial amount of research has been conducted to identify the quality
of reviews effectively. Proposed methods have focused on automatically estimating
the quality of reviews by making use of the textual and social characteristics of
reviews, such as writing style, review length, grammar, reviewer information and
timeliness (Kim et al., 2006; Liu et al., 2007; Lu et al., 2010; Zhang & Varadarajan,
2006). One significant issue for these methods is that the content of reviews is ignored.
In fact, customers prefer to seek out, as much as possible, what a review is talking
about rather than how professionally the reviews are written. Many studies have shown that the top reviews generated by those approaches very often contain
redundant information about a particular feature but miss out other important features
of the product that are also important to users. For example, the top-10 reviews of a digital camera generated using those approaches repeatedly mention only the "image" and "battery" but say nothing about the "weight", "price", "lens", etc., failing to provide customers with an overall picture of the product. Therefore, a review
that has a professional style of writing and correct grammar but no useful content for
customers, cannot be considered as a helpful review.
This limitation has opened another line of research in the review selection field, where
the content of reviews has been taken into consideration. In online reviews, the content
of reviews can be expressed as information about features of a product buried in
reviews. A feature is primarily defined as "an attribute of the product that has been
mentioned in reviews” (Hu & Liu, 2004). For example, “display” and “battery” are
two features in the sentence “the display of this camera is blurred sometimes but the
battery is an advantage”. When reviewers write a comment about a product, they
mainly express their personal experience about features of a product. Similarly, readers
try to understand a product by seeking information about the features before making a
decision. In fact, the features of a product are the main topics of discussion in online
reviews. The importance of features in online reviews has sparked a new line of
research in review selection, based on features in the review. First of all, a number of
studies have focused on selecting a subset of reviews that can cover many features and preserve the characteristics of the original review corpus. The advantage is that users can read this sub-collection instead of wading through the thousands of reviews in the original corpus to get an overview of a product. Tsaparas et al. (2011)
focused on selecting reviews that can cover as many features as possible and those
reviews should offer different viewpoints for each feature. However, the selected
reviews selected by this method might not reflect the original opinion distribution of the review corpus and might give users an insufficient picture of a product. Take the reviews of a digital camera, for example; if 80% of the total reviews compliment the feature "price" of the camera, then the overall opinion on the price should be positive. Reviews discussing the feature "price" selected by Tsaparas's method might fail to reflect this positive overall viewpoint.
Selecting a set of reviews that can preserve the proportion of positive and negative
opinions has been proposed by Lappas et al. (2012). However, reviews covering many
features are not always preferred by readers. Customers have their own different needs
for different features of the product. While many users would like to know about all
features of a product, some people may be interested in a few features, or only the one
single feature that is necessary for them. For example, a person, named Tom, needs a
laptop for his new job. Travelling is one part of his work and he does not use his laptop
for designing or playing games. The portability of the laptop is much more important
than how efficient the laptop's graphics card is. In this case, Tom would really want to read reviews of the laptop that mainly discuss portability and would not bother much about the graphics card. This indicates the necessity of research on review selection for single features; however, few works have been undertaken for this need. Hu and Liu (2004) pioneered research on product features. Their work investigated a method to summarise the semantic
orientation of features in the review collection. Figure 3 shows the output of the feature
summary proposed by their method. According to this output, customers can gain an
overall opinion about each feature of a product without reading the whole review
collection.
Figure 3. Features summary
However, their methods only focused on sentiment analysis but not the selection task
of helpful reviews. Although customers can click on the <individual review sentences> link to view reviews having an associated positive or negative opinion, those reviews may not provide detailed information about the features of a particular product. In fact, customers still want to read original reviews
to gain a deeper understanding about the feature and obtain as much information as
possible by themselves. Long et al. (2014) proposed a method to find reviews based
on the amount of information about the feature. The amount of information can be
calculated by using a set of words that are relevant to the target feature. In that way,
reviews having a high amount of information about the feature are considered
specialised reviews because those reviews intensively discuss the feature. Both Hu and Liu's (2004) and Long et al.'s (2014) studies deal with the task of textual analysis of natural language. However, the ambiguity of natural language makes the analysis of online review content very difficult. The reason is that the written language used in online
reviews are very complex and do not always deliver an explicit meaning. In fact, these
two methods do not always provide good results for different kinds of datasets because
of the ambiguity issue.
In general, to the best of our knowledge, current research on review selection suffers from the ambiguity caused by polysemy and synonymy in online reviews. This thesis explores how to apply data mining, as well as probabilistic methods, to analyse and represent the content of online reviews more effectively. Secondly, most of the previous
studies on review selection proposed to select reviews where the whole collection of features is taken into consideration. As analysed above, review selection for a single
feature of a product is also necessary. The second part of this study proposes a new
method to select helpful reviews discussing a single feature or a small group of features
of a product.
1.2 RESEARCH PROBLEM AND OBJECTIVE
In the previous section, the research background and motivation in the online review field were introduced, current issues in online review research were briefly discussed, and the goal of this research was stated. In this section, the research problems are described in detail, and the research objectives to accomplish this study are then identified.
1.2.1 Research Problem
Two primary problems for current research into online reviews are identified as follows.
Problem 1: The ambiguity of textual content in online reviews.
Analysing the textual content of online reviews has always been a difficulty in this research area. In contrast with structured data, online reviews
are unstructured and complex. Reviewers no doubt have freedom to write whatever
they want without following grammar, syntax or vocabulary rules. In fact, reviewers
usually use local language, specific phrases, abbreviations and a figurative style to express their opinions in their comments. In addition, there are severe polysemy and
synonym issues, i.e., the same word may be used by different users to mean different
concepts, or users may use different words when referring to the same concept. As a
consequence, the task of automatically analysing text to understand the information buried in online reviews faces many challenges. Based upon this discussion, the research question addressing this problem is as follows.
How to analyse and represent content of online reviews in a semantic way?
Problem 2: Helpful review selection according to single feature.
In the past few years, existing research has focused on selecting, classifying and
summarising online reviews by using natural language processing and data mining
techniques where all of the features of the products are taken into consideration (Dave
et al. (2003), Pang et al. (2002), Pang and Lee (2008)). Those studies considered that
features of a product are equally important and attempted to select reviews having as
many features as possible. As discussed in Section 1.1, each product feature plays a
different role in consumer consideration, depending on their needs. As a consequence,
there are certain product features that are less interesting to users than others.
Therefore, new methods of selecting helpful reviews according to a single feature
should be developed. Based upon this discussion, the two questions to be addressed in this problem are as follows.
Which factors should be considered for deciding the helpfulness of reviews
according to a single feature?
How do we utilise textual information in online reviews to select helpful
reviews for a single feature?
1.2.2 Research Objectives
According to the research problem and research questions, three primary research
objectives that need to be achieved for this research are listed below.
Objective 1: To propose methods to identify the related similar words of the features based on an external knowledge base (WordNet).
One of the focuses in this study is to alleviate the problem of ambiguity in review
content. In online reviews, features and information about features are the main topics of discussion and thus represent the content of the reviews. Therefore, effectively identifying features and the related words of those features is a primary objective of this study. As WordNet is a popular external electronic lexical resource that can be considered an ontology for natural language terms, it can help to find concepts related to a target concept. Therefore, this thesis proposes to use the knowledge base of WordNet to identify the related similar words of product features.
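For illustration, the following minimal Python sketch shows how such related words could be gathered from WordNet; this is an assumption-laden example using NLTK's WordNet interface, not the thesis's actual implementation (which uses the Java WordNet Library, JWNL):

from nltk.corpus import wordnet as wn  # requires: import nltk; nltk.download('wordnet')

def wordnet_candidates(feature):
    # Collect lemma names from the feature's noun synsets and their hypernyms.
    words = set()
    for synset in wn.synsets(feature, pos=wn.NOUN):
        words.update(lemma.name() for lemma in synset.lemmas())
        for hypernym in synset.hypernyms():  # related, more general concepts
            words.update(lemma.name() for lemma in hypernym.lemmas())
    return words - {feature}

print(sorted(wordnet_candidates("atmosphere")))

In practice, the candidates returned this way still need to be filtered by context, which is where the topic models of Objective 2 come in.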
Objective 2: To propose methods to identify related words based on
probabilistic topic models.
While WordNet is an external knowledge source that can be used to find similar and synonymous words, topic modelling is a state-of-the-art statistical approach for analysing
hidden themes and discovering relationships among words inside each theme (Blei et
al., 2003a). Topic modelling is therefore expected to find related words of the target
feature. Additionally, the pattern-based topic model proposed by Gao et al. (2013) is also a promising method for finding related words in a text corpus. Topics in traditional topic modelling (LDA) have limitations in semantic representation because each topic generated by LDA only contains a list of single words. While the relatedness of the words in a topic has been confirmed, the semantic meaning of the topic is still an open research issue. As patterns in a pattern-based topic model better represent the semantic meaning of a topic, the pattern-based topic model is expected to find related words more effectively. Therefore, this thesis aims to apply modern probabilistic topic modelling, such as the LDA topic model and the pattern-based topic model, to accurately identify the related words of the features.
Objective 3: To propose an approach to select helpful reviews for a single
feature based on the related words of the feature.
Review selection for a single feature is the second goal of this study. In order to
accomplish this goal, a new method based on information distance theory is proposed
to generate a set of helpful reviews for a single feature. Related words discovered by the methods proposed in Objectives 1 and 2 are utilised to measure the amount of information in the selection approach. Details of the review selection method are introduced in Chapter 4 of this thesis.
1.3 RESEARCH SIGNIFICANCE AND CONTRIBUTION
This research makes a number of important contributions to the research area of online reviews, including semantic online review analysis and helpful review selection.
This thesis proposes a new approach to solve the ambiguity problem of textual content in online reviews.
The ambiguity problem caused by synonymy and polysemy issues in online reviews is a key obstacle to the task of automatically analysing review content. By applying pattern mining techniques, natural language processing and probabilistic models, the polysemy and synonymy problems can be effectively alleviated. In addition, the structural representation of review content by features and related words makes the content easier to understand. This representation can be useful in a wide variety of other studies of online review content.
This thesis proposes new approaches for effectively selecting helpful reviews for a single feature of a product.
The proliferation of online reviews makes the task of finding useful reviews for customers increasingly important. Most research studies select reviews where all of the features of a product are taken into consideration, on the belief that customers are interested in all features of a product. However, this is not always true, as some customers might only be interested in a single feature or a small number of features that are important to them. This thesis proposes a new approach for selecting helpful reviews for a single feature of a product, thus contributing to the review selection area.
1.4 PUBLICATION
A review selection method based on topic models has been published in the 2016
Pacific Rim Knowledge Acquisition Workshop (PKAW2016,
http://pkaw.org/pkaw2016/).
Nguyen, A. D., Tian, N., Xu, Y., & Li, Y. (2016, August). Specialized
Review Selection Using Topic Models. In Pacific Rim Knowledge
Acquisition Workshop (pp. 102-114): Springer International Publishing.
1.5 THESIS OUTLINE
The thesis is organized in 6 chapters. The overview of each chapter is outlined
as below:
Chapter 2: presents a detailed and critical literature review of existing related
research studies necessary to address the research problems defined in Section
1.2. The literature reviews cover two related major areas: Review Selection and
Topic Modelling. The research gaps and drawbacks of current review selection methods are identified and justified according to the research questions.
Chapter 3: introduces our proposed related word selection method to address
the problem of ambiguity of online review content. As review content can be
represented by features and related words, correctly identifying features and
related words is crucial for understanding the review content. In this chapter,
we first discuss feature extraction using pattern mining and then present our method of identifying related words using WordNet and topic modelling.
Chapter 4: introduces our proposed review selection methods for a single feature of a product. The current problems of review selection methods and the criteria of a helpful review for a single feature are presented. Our methods of review selection, using direct relevance and the information distance of related words, are discussed in this chapter.
Chapter 5: discusses the experiments and evaluation of the models proposed in
Chapter 3 and Chapter 4. The proposed review selection and related word
selection models are evaluated by comparing their abilities to select helpful
reviews with the baseline models.
Chapter 6: summarises the key findings, achievements and limitations. Potential future work is also pointed out for further enhancement of the proposed models.
Chapter 2: Literature Review
This work is closely related to the fields of review selection and topic modelling. This chapter presents a critical review of those areas, which is essential for addressing the research gaps mentioned in the Introduction of this thesis.
2.1 REVIEW SELECTION
Nowadays, e-commerce retail and online shopping are growing strongly, and an increasing number of consumers read product reviews before making buying decisions. Such reviews on online platforms have become an information resource that helps buyers in their purchases (Hu and Liu (2004); Hung et al. (2004);
Ye et al. (2009); Liu et al. (2008)). Meanwhile, there are thousands of reviews written
each day, for many different products, on online merchants like amazon.com, or on
user reviews and recommendation websites such as yelp.com, etc. For instance, a
simple Canon 60D digital camera body has already accumulated 975 reviews on
amazon.com. The hundreds, or even thousands, of reviews make it impossible for users
to read all of them and choose reliable reviews for collecting information. Early works
like those of (Hu & Liu, 2004), provided a method of mining and summarising
customer reviews into useful information (features and corresponding subjective
opinion). However, such works mainly focused on classifying the semantic opinion for each feature but ignored summarising whole reviews. In fact,
customers may still prefer to read the whole content of reviews to have a vivid picture
of the product in which they are interested.
In addition, many of those reviews are not always satisfactory in terms of providing
useful information. The well-known problem of online reviews is that they are varied
in quality, from very useful to useless or even spam (e.g. fake reviews) (Zhang &
Zhang, 2014). Hence, it is extremely difficult for online users to weed out the helpful
reviews worth their attention. On demand, researchers on review selection fields have
been working out effective ways to extract and recommend useful reviews to users.
This section serves as an overview of existing research on selecting reviews: review
quality assessment and review selection based on product features.
2.1.1 Review Quality Assessment
Some reviews are of better quality than others. Hence, various ideas have been proposed for sorting reviews so that higher quality reviews are always shown first, as discussed in the following sections.
2.1.1.1 Crowd vote based metric
Online merchants such as amazon.com have built human feedback tools into their websites that allow users to vote on each review as helpful or unhelpful by clicking a thumbs-up or thumbs-down icon after reading the review. The total votes a review receives are then updated in the form of "80 out of 100 people found the following review helpful" and displayed at the top of the review content as an indicator of helpfulness. In these cases, the quality of an online review is determined by a
helpfulness voting ratio. However, there are some issues arising when using this
helpfulness tool, such as that newly posted reviews will only have a few votes, or more
likely, no vote, making it very difficult to identify their helpfulness (Liu, et al., 2008),
and perhaps not reflecting the real quality of reviews.
Pang and Lee (2008) pointed out two shortcomings of the helpfulness voting tool.
Firstly, many users have not answered the helpfulness question after reading the
reviews. In addition, not all of the most helpful reviews are the best reviews. There is
a tendency that the earlier a review is posted, the more votes it will get (Zhang &
Zhang, 2014). Ghose and Ipeirotis (2011) mention another point of view, that recently
posted reviews need extra time to accumulate helpfulness votes. Therefore, the helpful
votes collected by these websites may not accurately represent true helpfulness for
those reviews posted in recent times.
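As an illustration of one standard remedy for this vote-accumulation bias (not a method proposed in the works reviewed here), reviews can be ranked by a lower confidence bound on the helpful-vote ratio instead of the raw ratio, so that reviews with few votes are ranked cautiously rather than at the extremes:

import math

def wilson_lower_bound(helpful, total, z=1.96):
    # Lower bound of the 95% confidence interval on the helpful-vote ratio.
    if total == 0:
        return 0.0
    p = helpful / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)

reviews = [("old review", 153, 250), ("new review", 3, 3), ("unvoted review", 0, 0)]
for text, helpful, total in sorted(reviews, key=lambda r: -wilson_lower_bound(r[1], r[2])):
    print(text, round(wilson_lower_bound(helpful, total), 3))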
Based upon the disadvantages of the crowd-based voting method mentioned above,
researchers have attempted to automatically classify helpful reviews as soon as they are posted. Research such as that of
(Kim, et al., 2006) investigated automatic predictions of helpfulness of reviews that
considered users’ votes as ground-truth evaluation. The authors trained an SVM
regression model to learn the helpfulness function and ranked reviews according to
their output scores. However, assessing review helpfulness based on users' votes as ground truth is also quite limited due to several voting biases.
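In the spirit of that approach, the sketch below trains an SVM regressor on shallow review features, using helpful-vote ratios as ground truth; the features and numbers are invented for illustration and are not Kim et al.'s actual feature set:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy features per review: [word count, sentence count, avg sentence length, star rating]
X_train = [[120, 8, 15.0, 4], [15, 1, 15.0, 5], [300, 20, 15.0, 2], [60, 4, 15.0, 3]]
y_train = [0.85, 0.10, 0.70, 0.40]  # helpful-vote ratios used as ground truth

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X_train, y_train)
print(model.predict([[200, 12, 16.7, 4]]))  # predicted helpfulness of a new review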
2.1.1.2 Review content and style
A number of studies have investigated textual aspects of reviews, such as review content and writing style, to assess review quality. The
content of a review is the information it provides to readers, while writing style is more
related to word choice and language, as well as number of words or sentences and the
average length of sentences in the target texts. The underlying argument is that, owing to differences in writers' knowledge and language skills, reviews differ in quality.
Previous studies such as those of O'Mahony and Smyth (2009) and Liu, et al. (2008)
have shown that the linguistic style is a very good indicator of quality of the review.
O'Mahony and Smyth (2009) investigated a number of structural features that might
affect the writing quality of a text document, such as the ratio of uppercase and
lowercase, the number of complex words, and the number of sentences, etc. They
summarised several readability aspects based upon derived structural aspects for
assisting review quality modelling. Their proposed model made use of a set of reviews that had been labelled for helpfulness to train a classifier on these text features. After that, the classifier was employed to detect reviews having good writing quality.
Hoang et al. (2008) used a supervised classification model in which output scores are
considered as the quality of a given text document. The authors first manually labelled
experimental documents into three levels of document quality, defined as good, fair
and bad. They then trained the model on these aspects. The classifier, trained on an annotated corpus, then ranked documents according to their prediction scores. Their study found that formality - the writing style of the target document - is the most effective aspect for assessing the quality of the target document.
In another attempt to find out well-written reviews in online movie reviews, Liu, et al.
(2007) used a fixed set of tags to label different Parts-of-Speech (POS) words
contained in the reviews in order to determine the writing style and length. Liu, et al.
(2008) enhanced existing work on the product review helpfulness problem with a binary classification approach. The scholars explored review aspects such as
readability, informativeness and subjectiveness. The model learns aspects on the
informativeness of reviews, such as the following:
the number of words in the review
the average length of sentences and the number of sentences in the review
the number of sentences relevant to product features
Zhang and Zhang (2014) assumed that reviews that are highly readable tend to be more
helpful. On the contrary, reviews with multiple grammatical errors and misspelled
words are less helpful to users. They used the LanguageTool Java API to implement the language and grammar check for this task. They also employed the ratio of the number of errors in the review text to the number of sentences as the value of the errors-per-sentence feature.
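The shallow textual aspects surveyed above are straightforward to compute; the sketch below is illustrative only, with the grammar-error count passed in as a plain number (in practice it would come from a checker such as LanguageTool):

import re

def textual_features(review, n_grammar_errors=0):
    # Word/sentence counts, average sentence length, and errors per sentence.
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    words = review.split()
    n_sent = max(len(sentences), 1)
    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "avg_sentence_len": len(words) / n_sent,
        "uppercase_ratio": sum(c.isupper() for c in review) / max(len(review), 1),
        "errors_per_sentence": n_grammar_errors / n_sent,
    }

print(textual_features("The display is blurred sometimes. But the battery is great!", 1))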
Mudambi and Schuff (2010) utilised a review database collected from Amazon.com to
analyse and predict review quality. In testing their hypotheses, the authors found that the logarithm of word count is positively related to review helpfulness, and that review length has an impact on helpfulness voting. Their research indicated that a review
provides helpful information to aid in the decision-making process of a consumer and
that the helpfulness of a review increases as the word count increases.
Recently, Ghose and Ipeirotis (2011) indicated that reviews with more subjective
words are recognised as more helpful through opinion mining. Li and Vitányi (2013) analysed content-based aspects that directly influence product reviews' helpfulness. It was also found that reviews that were less abstract in content
and highly comprehensible, result in higher helpfulness. Wang et al. (2013) proposed
a technique called “SumView”, a web-based review summarisation system that
automatically extracts the most representative expression of customer opinion in the
reviews on various product features. Siering and Muntermann (2013) agreed with other
authors in revealing that reviews with information related to the quality of the product receive more helpfulness votes. Korfiatis et al.
(2012) investigated the directional relationship between the qualitative characteristics
of the review text, review helpfulness, and the impact of review helpfulness on the
review score. However, they found that review length has less effect on the helpfulness
ratio than review readability.
2.1.1.3 Reviewer reputation and social context
While shallow syntactic aspects from the text of reviews are mostly useful, review length is considered only weakly correlated with review quality. Therefore, some researchers have looked at social context aspects of reviewers to assess the quality of reviews. Social context aspects are information extracted from the reviewer's social
context, for example, the number of the reviews posted by this reviewer, the average
rating for this author, etc. (Zhang & Zhang, 2014). Results by (Liu, et al., 2008) and (Chen & Tseng, 2011) indicate that reputation (in combination with other features) is a very good indicator. (Liu, 2010, 2012) proposed to consider reviewer expertise, as it
was observed that reviewers who were familiar with particular movie genres were
likely to produce good reviews for movies in the same or similar genres. Accordingly, the proposed approach measured the similarity of a given movie (which can be represented by a set of genres) to all movies that had been reviewed by the same user.
O'Mahony and Smyth (2009) designed a system to learn helpfulness of hotel reviews
based on the reputation aspect of reviewers. Specifically, the system captured three
aspects of a user's reputation, including the total number of reviews written by the user; the standard deviation of review helpfulness over all reviews written by the user; and the ratio between those reviews that have accumulated at least five opinions (Missen et al., 2009) and the total number of reviews written by the user.
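These three reputation aspects are simple to derive from a reviewer's history; a minimal sketch, with illustrative field names, might look as follows:

from statistics import pstdev

def reputation_features(history):
    # history: list of (helpfulness_ratio, n_votes) pairs for one reviewer.
    n = len(history)
    return {
        "n_reviews": n,
        "helpfulness_stdev": pstdev([h for h, _ in history]) if n else 0.0,
        "ratio_with_5_votes": sum(v >= 5 for _, v in history) / n if n else 0.0,
    }

print(reputation_features([(0.9, 12), (0.7, 3), (0.8, 30)]))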
Furthermore, according to Lu et al. (2009), the quality of a reviewer plays an important
part in deciding the quality of the review that he or she writes. Therefore, from the
quality of reviewers, researchers can estimate the quality of their reviews easily. The
authors compared the standard deviation of quality between two reviews from the same author and two reviews from two different writers: two reviews by the same writer have a much lower standard deviation score than two reviews from different writers. The authors concluded that the quality of reviews is consistent when they are written by the same writer. In addition, another
social context aspect that could help to decide the quality of a reviewer – hence quality
of review - is the quality of their peers in the social network. Although this approach
of using social context is simple and applicable, it still has some drawbacks. In particular, social context information is not always available for all reviews. For instance, when a review is written by a new user, there is no information about their history or social network, so prediction based on social context aspects is no longer applicable.
Zhang and Zhang (2014) examined reviewers’ history in order to predict the
helpfulness of reviews written by them. The fact is that people with better reputations
in the online review community tend to provide more influential discussions and their
reviews tend to be more helpful. The authors collected the information from reviewers'
profile pages, such as the reviewer's ranking, the number of reviews written by the reviewer, and the percentage of helpful votes the reviewer received on previous reviews. The results showed that the percentage of helpful votes and the quantity of previous reviews contributed to higher performance in their model.
2.1.1.4 Review meta data
In addition to author-oriented aspects, researchers have also investigated developing effective systems based on review metadata aspects to determine review helpfulness.
In (Chen & Tseng, 2011), timeliness is the extent to which the information in a
review is timely and up-to-date. Old or duplicate reviews cannot reflect the value of a
product over time; thus, the quality of information is low, hence so is the quality of the
review. The believability aspect is also explored by the authors. Believability is the extent to which the review information is credible, or regarded as true. They therefore measure the deviation of a review's product rating from the average rating to assess its
believability. In (Liu, 2010, 2012), the author noticed a relationship between review timeliness and review helpfulness. Therefore, they compared the time a review was posted with the movie release time in order to measure the impact of timeliness on review helpfulness. Timeliness was shown to be a good predictor of the helpfulness of movie reviews, with review helpfulness seen to decline for older reviews. Chen and Tseng (2011)'s work above also assessed reviewers based on
their review histories. They considered that if a review has a high evaluation in a category, then the author of this review is credible in that category.
2.1.2 Review selection based on product features
As discussed above, most existing approaches to assess review helpfulness are
automated prediction mechanisms, which typically rank reviews based on their overall
score. However, these methods of selecting reviews have drawbacks. First, these
works require more time and human resources in order to label the data and train the
classification system. Furthermore, the top ranking reviews selected may contain
information not useful for users, i.e. a selected review can have redundant information,
but may have a low coverage of product features. For instance, users may find that all
the top selected reviews only talk about the "food" feature of a restaurant, but nothing
about other features such as "atmosphere", "drink" and "service". Therefore, the top results may cover a single viewpoint only, making it difficult for users to obtain a diverse set of opinions about the product. At the same time, the ranked list of reviews may not represent the different points of view (e.g. positive, negative, and neutral) on the reviewed product.
A product has more than one feature, and some features tend to be more important than others, which may affect the consumer's purchase decision process. A product feature is defined as an attribute describing a characteristic of a product that is interesting to customers. The overall ranking of a review is an important measure; however, different product features matter to different customers based on their needs. For instance, although a digital camera may be ranked highly overall, its "battery life" feature may still concern a customer. There has been a substantial amount of research focused on maximising the helpfulness of the selected reviews in order to overcome the drawbacks of review selection methods that are based only upon derived review quality. In the following part of this section, we introduce two directions in the research on useful review selection.
2.1.2.1 Select review based on product features
An early work in the stream focusing on product features is the approach investigated by Popescu and Etzioni (2007). The scholars introduced an unsupervised system to extract features and their associated opinions, which are then ranked by their strength and used to build a model of important product features.
Zhang and Zhang (2014) proposed a feature-based, product ranking approach. They
mined data of Digital Camera and Television reviews on amazon.com to identify
features in the product category and their associated subjective and comparative sentences in reviews. The authors then built a product graph for each feature and
mined the graph to determine the relative quality of products. Long, et al. (2014) pointed out that most review selection approaches are not customer-centric. In other words, some users may be interested in only certain features, so they only look
for reviews that have intensive discussion about these features. Under these
circumstances, their works focused on extracting reviews in which a single feature is
intensively discussed. In detail, given a specific feature from the review collection,
their model extracted a set of similar words related to that feature. Then they used
Kolmogorov complexity and information distance to calculate the amount of
information from these related word sets. The most specialised review on a feature was
the one with minimal information distance. However, one significant drawback of this method is that the similar words of the core feature words are found based on Google code length (Cilibrasi & Vitanyi, 2007), which finds the words that are most likely similar, or synonymous, to the core feature words. In some circumstances, the
similar words found by this tool are not related to certain contexts that the core features
have discussed. Take the feature “star” for example, when using Google distance to
find similar words to the feature “star”, words like “genius”, “lead”, “stellar” were
returned as similar words of the feature "star". The word "star", in the context of a restaurant, indicates the ranking of the restaurant and clearly does not have a similar meaning to "genius", "lead" or "stellar". Another shortcoming of this method is that the selected specialised reviews may or may not be helpful to users. As discussed above, reviews that cover more features tend to be more helpful (Tsaparas, et al., 2011). If we only focus on finding reviews that discuss one special feature, other high-quality reviews covering more than one feature can be missed. It is also the case
that professional users tend to write down their opinion on a group of related features
of the product.
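Since Kolmogorov complexity is uncomputable, practical systems approximate information distance; a common compression-based stand-in, shown here purely as an illustration and not as Long et al.'s exact formulation, is the normalised compression distance, under which a review sharing more regularity with a feature's related-word set scores a smaller distance:

import zlib

def ncd(x, y):
    # Normalised compression distance between two byte strings.
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    return (len(zlib.compress(x + y)) - min(cx, cy)) / max(cx, cy)

feature_words = b"battery charge power life recharge drain"
reviews = [b"the battery life is great and it charges fast",
           b"the lens produces sharp images in daylight"]
for r in sorted(reviews, key=lambda r: ncd(r, feature_words)):
    print(round(ncd(r, feature_words), 3), r.decode())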
2.1.2.2 Review corpus representation
Rather than scoring each individual review and selecting the top-k best reviews,
researchers in this field tried to select a set of reviews that collectively perform well
and represent the whole corpus. Tsaparas, et al. (2011) pointed out that coverage and
diversity of viewpoints are very important to users, together with review quality. They
formulated it as a maximum coverage problem. Their work mainly focuses on how to select a small but comprehensive set of reviews that best capture many different features of a product and that discuss them from many different viewpoints. They proposed greedy algorithms to extract highly rated reviews that satisfy these requirements. The outcomes are reviews having maximum information gain in terms of feature coverage and opinion coverage, which enable users to better evaluate the product under review. While their work does diversify the selected set of reviews, it fails to accurately reflect the sentiment polarity in the review collection.
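The core of such greedy selection can be sketched as follows; the data are toy values, and the actual algorithms of Tsaparas et al. also weigh review quality. At each step the review covering the most still-uncovered (feature, opinion) pairs is picked:

def greedy_select(reviews, k):
    # reviews: list of (review_id, set of (feature, opinion) pairs it covers).
    covered, chosen = set(), []
    for _ in range(k):
        best = max(reviews, key=lambda r: len(r[1] - covered), default=None)
        if best is None or not (best[1] - covered):
            break  # nothing new can be covered
        chosen.append(best[0])
        covered |= best[1]
    return chosen

reviews = [("r1", {("food", "+"), ("service", "-")}),
           ("r2", {("food", "+")}),
           ("r3", {("atmosphere", "+"), ("price", "-"), ("service", "+")})]
print(greedy_select(reviews, 2))  # picks r3 first, then r1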
Most closely related, Lappas, et al. (2012) argued that even when the selected set includes at least one positive and one negative opinion on each feature, it still does not reflect the proportion of positive and negative opinions on a feature in the original review corpus. Hence, they presented a novel approach to select a small
set of reviews that could cover all product features, at the same time preserving the
opinion distribution of the whole corpus. The outcome set of reviews is an accurate
statistical summary of the entire review collection, with respect to preserving the
proportion of opinion expressed for different features; hence it is easy for users to have
a vivid picture of the product without reading the whole corpus. However, one drawback of their work is that review quality is not considered, since the quality of each individual review in the selected subset is ignored.
Another novel approach, from a different angle, attempts to improve the review selection task based on the identification of product features and the relationships between features. Making use of a product ontology, called a product feature taxonomy, and the hierarchical relationships between features, the work of Tian et al. (2014) proposes generating a review model in order to select reviews. Given a collection of reviews, the model estimates the quality (the diversity and comprehensiveness of a certain feature) and then ranks the reviews based on user-concerned criteria. Their experiments promise further work in review selection.
2.2 TOPIC MODELLING
The study of topic modelling arises from the need to analyse, represent and summarise the contents of large, unstructured text collections, with the aim of capturing their latent semantics. Latent semantic analysis (LSA) was an early attempt to transform the high-dimensional vector space representing text documents into a linear subspace by applying Singular Value Decomposition (SVD) (Deerwester et al., 1990). This subspace can be called a latent semantic space because it presents sophisticated features that help to capture the latent semantics of documents, such as synonymy and polysemy. However, LSA has some shortcomings because of its unsatisfactory statistical foundation and computational complexity. Hofmann (1999) overcame some deficiencies of LSA by introducing probabilistic latent semantic analysis (pLSA), a generative model. Multinomial random variables representing topics are the mixture components in the pLSA model, and each document is therefore a mixture of latent topics.
Figure 4. Probabilistic Latent Semantic Model.
The joint probability of an observed word-document pair $(d, w)$ is defined by a mixture
process:
$$P(d, w) = P(d)\,P(w \mid d), \qquad P(w \mid d) = \sum_{z \in Z} P(w \mid z)\,P(z \mid d)$$
(Hofmann, 1999)
Hofmann's work shows that pLSA outperformed LSA and was viewed as a significant
step toward probabilistic modelling of text. However, pLSA still has issues related to its
inability to provide a generative probabilistic model for the mixing proportions of these
topics.
Recently, an improved probabilistic model for analysing large electronic archives of
documents has been employed, based on “topic”, called “topic modelling”. The aim of
topic modelling is to analyse and discover the topical patterns that run through a given
corpus of documents and record their evolution over time. Blei and McAuliffe (2008)
define topic and topic modelling as follows:
“A topic is a probability distribution over terms in a vocabulary. Informally, a
topic represents an underlying semantic theme; a document consisting of a
large number of words might be concisely modelled as deriving from a smaller
number of topics. Such topic models provide useful descriptive statistics for a
collection, which facilitates tasks like browsing, searching, and assessing
document similarity.”
In essence, topic modelling analyses the words in the original documents to discover the
hidden themes running through them, as well as how those themes are related to each
other. Topic modelling has emerged as a principal unsupervised learning method, since
it relies merely on the analysis of the original texts, without requiring document labels
or human annotation.
2.2.1 Latent Dirichlet Allocation
The simplest, yet most well-known, probabilistic topic model to have emerged recently
is Latent Dirichlet Allocation (LDA), proposed by Blei et al. (2003b). The LDA
probabilistic topic model is based on the assumption that every document is generated
by a mixture of topics, and each topic is defined as a multinomial distribution over a
fixed vocabulary of terms. The outputs of the model are the assignment of words in
documents to topics (clusters) and the distribution of topics over documents (document
proportions).
Let $D = \{d_1, d_2, \ldots, d_M\}$ be a collection of $M$ documents. Given $V$ topics, the
probability of a word in a given document is defined as:
$$P(w_{d,n}) = \sum_{j=1}^{V} P(w_{d,n} \mid z_{d,n} = Z_j) \times P(z_{d,n} = Z_j)$$
where $w_{d,n}$ denotes the $n$th word in document $d$, $z_{d,n}$ denotes the topic assignment for
word $w_{d,n}$, $Z_j$ is the $j$th topic, and $z_{d,n} = Z_j$ means that the word $w_{d,n}$ is assigned to
topic $Z_j$. The topic model generated by LDA consists of topic representations at
collection level and topic distributions at document level. At collection level, let $\phi_j$ be a
multinomial distribution over words for topic $Z_j$, defined as:
$$\phi_j = (\varphi_{j,1}, \varphi_{j,2}, \ldots, \varphi_{j,n}), \qquad \sum_{k=1}^{n} \varphi_{j,k} = 1$$
where $\varphi_{j,k}$ is the probability of the $k$th word for topic $Z_j$.
At document level, let $\theta_d$ be a probability distribution over topics, defined as:
$$\theta_d = (\vartheta_{d,1}, \vartheta_{d,2}, \ldots, \vartheta_{d,V}), \qquad \sum_{j=1}^{V} \vartheta_{d,j} = 1$$
where $\vartheta_{d,j}$ is the probability of topic $j$ for document $d$.
As presented in the graphical model of LDA (Figure 5), there are three latent (hidden)
topic structures: the topics ($\phi_j$), the per-document topic distributions ($\theta_d$) and the
per-document, per-word topic assignments ($z_{d,n}$), which need to be computed from the
observed words $w_{d,n}$. In other words, the purpose is to answer the question, "what is
the latent structure that is likely to have generated the observed documents?" LDA,
therefore, can be considered as "reversing" the generative process: it tries to optimise
the posterior distribution of the latent variables given the document collection. LDA
overcomes the limitation of pLSA since the per-document topic proportion is computed
based on a latent random variable, the Dirichlet parameter, which is randomly drawn
from a Dirichlet distribution.
Figure 5. A graphical model representation of the latent Dirichlet allocation (LDA).
Nodes denote random variables; edges denote dependence between random variables.
Shaded nodes denote observed random variables; unshaded nodes denote hidden
random variables. The rectangular boxes are “plate notation”, which denote replication
(Blei, et al., 2003a)
For approximating the posterior distributions of the latent variables in LDA, several
statistical inference techniques have been developed to infer these distributions from
large text corpora, such as expectation propagation (Minka & Lafferty, 2002), mean
field variational inference (Blei, et al., 2003a), collapsed variational inference (Teh et
al., 2006), and Gibbs sampling (Steyvers & Griffiths, 2007). Among these techniques,
Gibbs sampling, based on Markov chain Monte Carlo, has become a well-known
technique for parameter estimation in LDA in recent years.
The important contribution of LDA is its ability to represent and summarise a large
text collection in shorter, meaningful forms, namely topics and document
representations. Topics are represented by a multinomial distribution over words, where
each word assigned to a topic has a different weight, indicating which words are
important to which topics. A document is represented by a probability distribution over
topics, where each topic assigned to the document has a different weight, indicating
which topics are important to the document. Moreover, LDA can be adapted and
extended as a module in more complex models for more complicated goals. Therefore,
LDA has quickly become one of the most popular probabilistic techniques for topic
modelling.
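To make the two outputs concrete, the short sketch below fits an LDA model with the
gensim library and reads off the topic-word distributions ($\phi_j$) and document-topic
distributions ($\theta_d$). This is purely an illustration under assumed toy data; the corpus,
topic number and hyperparameters here are placeholders, and gensim is our choice for
the example rather than a tool prescribed by this thesis.

```python
# Illustrative sketch only: fitting LDA with gensim and reading off
# the topic-word distributions (phi_j) and the document-topic
# distributions (theta_d). Corpus and settings are toy placeholders.
from gensim import corpora, models

docs = [["display", "screen", "resolution", "bright"],
        ["battery", "life", "charge", "battery"],
        ["display", "contrast", "screen", "battery"]]

dictionary = corpora.Dictionary(docs)            # vocabulary
bow = [dictionary.doc2bow(d) for d in docs]      # bag-of-words corpus

lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                      passes=10, random_state=1)

# Collection level: phi_j, a distribution over words for each topic Z_j.
for j in range(lda.num_topics):
    print("topic", j, lda.show_topic(j, topn=4))

# Document level: theta_d, a distribution over topics for each document.
for d, doc_bow in enumerate(bow):
    print("doc", d, lda.get_document_topics(doc_bow))
```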
2.2.2 Pattern based topic modelling
One common problem with LDA is that word-based or term-based topic
representations may not semantically represent documents well, which makes the topics
hard to understand. Gao, et al. (2013) proposed a pattern based topic model by applying
a pattern mining technique on top of traditional LDA, so that a topic can be represented
by a list of patterns instead of single words. As a pattern carries more specific meaning
than a single word, a pattern based topic can provide better semantic meaning than an
LDA topic.
Gao et al. (2013) proposed a two-stage approach that combines statistical topic
modelling and classical data mining techniques to represent the semantic content of
documents and improve the accuracy of topic modelling output on large document
collections. In the first stage, traditional LDA is applied to generate topic
representations at collection level and document level. These two representations are
used to build a word-topic assignment, which also serves as a transactional dataset for
pattern mining in the next stage. In the second stage, pattern based topics are generated
from this transactional dataset by applying a pattern mining technique.
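As a minimal sketch of the second stage, assume the word-topic assignments from the
first stage have already been grouped into per-document transactions for a single topic;
a frequent pattern miner (here mlxtend's apriori, our choice for illustration rather than
the implementation used by Gao, et al.) then yields the patterns that represent that
topic.

```python
# Illustrative sketch of the second stage of a pattern based topic model:
# mining frequent patterns from the word-topic assignments of one topic.
# The transactions are hypothetical; each holds the words that LDA
# assigned to this topic within a single document.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

topic_transactions = [
    ["display", "screen", "resolution"],
    ["display", "screen"],
    ["display", "resolution", "brightness"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(topic_transactions).transform(topic_transactions),
                      columns=te.columns_)

# Frequent patterns (with their supports) now represent the topic
# instead of single words.
patterns = apriori(onehot, min_support=0.6, use_colnames=True)
print(patterns)   # e.g. {display}, {screen}, {display, screen}, ...
```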
2.2.3 Applications of topic modelling
Topic modelling is considered a state-of-the-art technique and has been used in diverse
fields, mainly in sentiment analysis and information retrieval.
Application of Topic Modelling on Sentiment Analysis
Mei et al. (2007) proposed a probabilistic model for analysing the mixture of topics and
sentiment on Weblogs. Their model, named Topic-Sentiment Mixture (TSM), combines
a pLSA model and a sentiment model in order to capture latent topical features and
their sentiments simultaneously across different Weblog collections.
Titov and McDonald (2008) argued that standard models such as LDA are only
suitable for discovering topics associated with global properties of documents (e.g., the
brand, product type or product name) rather than rateable features. The authors then
extended both pLSA and LDA by building a Multi-grain LDA (MG-LDA) model which
includes both a global model and a local model. While the global model identifies
global terms at the document-level context, the local model discovers rateable features
by using a sliding text window over the text. Although the model is successful in
extracting rateable features, it cannot separate feature words from sentiment words. Lin
and He (2009) extended LDA by building a probabilistic modelling approach, named
the joint sentiment-topic model (JST), to classify sentiment at document level based on
topic modelling, but the distinction between feature words and sentiment terms still
cannot be achieved.
Brody and Elhadad (2010) introduced an unsupervised method using topic modelling
to extract features and then analyse feature-specific opinion words. Although the
opinion words and extracted features are separated, the sentiment words are discovered
outside of topic modelling by analysing only adjectives. To combat this shortcoming,
Zhao, et al. (2010) proposed a Maximum Entropy and LDA hybrid approach, which can
automatically separate aspect and opinion words. The method is an extension of LDA
that uses two indicator variables to distinguish between opinion words and feature
words. The MaxEnt component uses a small number of training sentences to learn from
POS tags, helping to separate opinion words from feature words. Mukherjee and Liu
(2012) proposed joint models (SAS and ME-SAS) to extract and categorise feature
terms automatically. Similar to Zhao, et al. (2010), they also used Maximum Entropy to
separate feature and sentiment words, but they additionally used seeds from users as
guidance for the inference process. Their models are thus semi-supervised.
Application of Topic Modelling on Information Retrieval
Gao, et al. (2013) proposed a pattern based topic model to represent text documents.
The idea is that patterns provide better semantic meaning than single words. Therefore,
they combined pattern mining with traditional topic modelling to represent topics by
patterns instead of single words. The results show that their proposed model generates
discriminative and semantic representations for modelling topics and documents.
Application of Topic Modelling on Review Recommendation
Another line of work using topic modelling aims at predicting the top-k best reviews
based on user-rated reviews. Krestel and Dokoohaki (2011) proposed a method to
model reviews based on LDA and generate an adequate ranking based on Kullback-
Leibler divergence. Using LDA, each review is modelled as a mixture of topics, and
those latent topics are represented as a "list of multigrams with a probability for each
multigram indicating the membership degree within the topic". After discovering the
latent topics, star ratings are combined in order to transform the topic model into a
topic ranking model. Although their study can be useful for providing personalised
review recommendations to users, it was only carried out at a small scale, so the
accuracy of the model is limited. Lakkaraju et al. (2011) proposed a
series of probabilistic models (FACTS, CFACTS and CFACTS-R) for feature-based
sentiment analysis, which also help to predict review ratings. These joint models are
based on the principle of dependencies between semantic topics and syntactic classes,
similar to the HMM-LDA model proposed in Griffiths et al. (2004). Their methods can
effectively identify latent features, sentiment topics and associated ratings.
Moghaddam and Ester (2011) extracted features and feature-based ratings by
proposing three probabilistic graphical models. The first two models, which extend
pLSA and standard LDA, generate a rated feature summary. The authors then assume
that features and ratings are interdependent and introduce an Interdependent LDA
(ILDA) model to extract product features and predict their ratings at the same time.
In summary, topic modelling is a powerful and flexible modelling tool because of its
modularity and extensibility. However, it also has some drawbacks. One issue is the
difficulty of detecting locally frequent features, because topic modelling puts high
weight on popular or common words across a large collection of documents. Topic
modelling therefore shares the same issue as previous approaches: success in finding
global features but failure in finding local features. Secondly, topic modelling needs
large-scale data and considerable parameter tuning, and is thus only suitable for
large-scale projects. Nevertheless, its advantages cannot be denied, and research on
topic modelling has kept increasing in recent times.
2.3 SUMMARY
This chapter has presented an in-depth review of a number of research works related to
this study. In the field of review selection, most current studies have been switching
from review selection based on structural textual characteristics, such as writing style,
grammar and author reputation, to review selection based on product features. Feature
extraction and the relationships among features are promising new criteria for
understanding online review content, which assists in effectively selecting helpful
reviews. As always, ambiguity is a big challenge in online reviews because of polysemy
and synonym issues in natural language. The topic model, a probabilistic approach, has
been popularly used for discovering the latent semantic themes of a document
collection. Because of its wide application in language semantic analysis, the topic
model is expected to enhance the task of automatic textual analysis of online reviews.
In the next chapter, we will discuss how to make use of topic modelling to represent
online reviews more effectively and alleviate their ambiguity problem.
Chapter 3: Main Feature Selection and
Related Feature Selection
As discussed in the Research Problem of Chapter 1, understanding and analysing
review content faces many difficulties in online review research. The first reason is that
reviews are unstructured, which makes them hard to analyse and comprehend
automatically. Secondly, polysemy and synonym issues cannot be avoided in online
reviews, since writers have the freedom to write whatever they want. As features are the
main topics of online reviews, the content of online reviews is essentially information
about the features of the product; dealing with the textual content is, in fact, dealing
with the features of the product. However, features by themselves deliver little meaning.
The meaning of each feature can only be understood through the related words around
it. For example,
“This model has an excellent display. Resolution is better than even other more
expensive models. The contrast and brightness of the screen are also great since this
makes our eyes feel comfortable for a long time looking at the display.”
Words like "excellent", "great", "resolution", "contrast", "brightness" and "screen" are
related words of the target feature "display". Those related words contribute to the
detailed discussion of the feature "display" and help to build up detailed information
about that feature of the product. Without those related words, readers cannot
understand what is being discussed about "display". Therefore, successfully extracting
those related words plays a crucial role in understanding review content. To the best of
our knowledge, no effective work has been done to find those related words.
In this chapter, we propose a novel method to identify the words related to a feature.
We first discuss methods of identifying the main features of products in Section 3.1,
and then introduce new methods to extract the words related to the main features in
Section 3.2.
3.1 MAIN FEATURE SELECTION
In this section, a pattern mining technique is used to extract the main features of a
product. Section 3.1.1 first defines the two types of features of a product or business:
main features and related features. The method of extracting the main features of a
product is then discussed in more detail in Section 3.1.2.
3.1.1 Main features and related features
In online reviews, users normally use different words to refer to the same concept or
aspect of a product. For example, "display" and "screen" are used interchangeably to
refer to the same concept. In addition, there are sub-features that are used with the
main feature to describe the concept in further detail. For example, "resolution",
"contrast" and "brightness" are normally used by reviewers to more deeply analyse the
aspect "display" of the camera product in the example above. According to Hu and Liu
(2004), features are the most frequently occurring nouns, because they are mentioned
often in online reviews. Therefore, words such as "display", "screen", "resolution",
"contrast" and "brightness" are features of the product because they occur frequently.
It is noticeable that they all belong to the same concept group, because they describe
the same aspect of the product. In this thesis, we define two kinds of features: the main
feature and the related features of the main feature.
Main feature and related features: Given a group of feature words representing one
concept/aspect of the product, the feature having the most abstract meaning and most
frequently appearing in the online reviews is considered the main feature of the group.
All the remaining feature words in the group are considered related features of the
main feature.
If features in a group are at the same level of abstraction, the feature mentioned with
higher occurrence frequency in the online reviews is chosen as the main feature. Table 1
shows an example of the main feature "display" of a camera and its related features.
The features "display" and "screen" are clearly more abstract than the other features in
the group, so they are the potential main features. However, it is hard to decide whether
"display" is more abstract than "screen" or vice versa, because they seem to be at the
same level of abstraction (level 2). In this example, we assume that the occurrence
frequency of "display" is higher than that of "screen" in the review collection, so
"display" is the main feature of the camera product. All remaining features, including
"screen", "resolution", "contrast" and "brightness", are then related features of the
main feature "display".
Table 1. Main Feature "Display" and Its Related Features

Main feature (the most abstract feature)   Level 1   Display
Synonym features                           Level 2   Display, Screen
Sub-features                               Level 3   Resolution, Contrast, Brightness
In the next section, the method of extracting the main features of a product using a
pattern mining technique is discussed.
3.1.2 Pattern mining based main feature selection
In order to extract the main features in the reviews, the method proposed by Hu and
Liu (2004) is first employed. However, this work improves Hu and Liu's method by
choosing only opinion sentences, instead of all sentences in the review dataset, to
prepare the database transactions. The extracted features are then manually analysed
and grouped, where the words in each group describe one concept or aspect of the
product. In each group, the most abstract word is chosen and considered the main
feature of the product. In general, the following steps identify the main features of a
product in the review collection.
Step 1: Online Reviews Part-of-Speech Tagging (POST)
Part-of-Speech Tagging (POST) is a technique in Natural Language Processing (NLP)
used to identify the part of speech, or word form, of each word in a text corpus, such as
noun, pronoun, verb, adjective or adverb (Manning & Schütze, 1999). Since frequent
nouns are potential features, POST can help to identify the nouns in the reviews. In
this step, the identified nouns are then processed with some other NLP techniques, such
as approximate string matching (Baeza-Yates & Navarro, 1998) and word stemming.
Approximate string matching helps to deal with word variants and misspellings; for
example, the word "view-finder" will be converted to "viewfinder", and the word "zom"
will be converted to "zoom". Meanwhile, word stemming produces the root form of a
word; for instance, "len", "lens" and "len's" are grouped into "lens". Together,
approximate string matching and word stemming ensure that all identified nouns in the
review collection can be matched with each other. This matching of words is essential
for the pattern mining in the next step.
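The sketch below illustrates Step 1 with common off-the-shelf tooling: NLTK for
tokenisation, POS tagging and stemming, and Python's difflib as a simple stand-in for
approximate string matching. The library choices and the small vocabulary are
illustrative assumptions, not part of the proposed method.

```python
# Illustrative sketch of Step 1: extract nouns via POS tagging, then
# normalise them with word stemming and approximate string matching.
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import difflib
import nltk
from nltk.stem import PorterStemmer

sentence = "The zom and the view-finder of this camera are great"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)                       # [(word, POS tag), ...]
nouns = [w.lower() for w, tag in tagged if tag.startswith("NN")]

stemmer = PorterStemmer()
nouns = [stemmer.stem(w) for w in nouns]            # e.g. "lenses" -> "lens"

# Approximate string matching against a vocabulary of seen nouns,
# to repair variants and misspellings such as "zom" -> "zoom".
vocabulary = ["zoom", "viewfinder", "lens", "display", "camera"]
normalised = [difflib.get_close_matches(w, vocabulary, n=1, cutoff=0.8) or [w]
              for w in nouns]
print([m[0] for m in normalised])
```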
Step 2: Pattern mining on online reviews
Pattern mining is popularly applied in text mining, where a review or document is
normally used as a transaction in a transactional database. In our case, features are
mostly expressed and commented on at sentence level, e.g. the feature "picture" is
mentioned in the review sentence "the picture quality of this camera is great".
Therefore, in order to extract those features, transactions at sentence level are more
suitable than transactions at review or document level. Hu and Liu (2004) proposed
preparing a transaction database in which all sentences of the review collection are
taken into consideration. However, this has some limitations. Although features are
frequently mentioned, not all sentences in a review discuss them; by observation, the
number of sentences in a review that do not comment on any product feature can even
exceed the number of sentences that do. We term those sentences with no feature
expression noisy sentences, because they cannot contribute to the feature extraction of
a product and their inclusion in the transactional database is unnecessary. Their
presence makes feature words stand out less, because it increases the frequency of
common words in the database transactions. The example below illustrates this point.
Example: “We go to the Digital-To-End store to browse all the products. The
store is located on level 4 of the block. After browsing around, we finally see
the label of the camera on the shelves. The first attraction of the display is
huge comparable to other models. I love the big screen and weight is also not
very heavy. The staff there give me some descriptions about the product and I
definitely love it. However, after using it for a while, I recognise that the
battery is not good at all. …”
If each sentence's nouns are treated as a transaction, nouns like "store", "product",
"level", "block", "label", "shelves", etc. are included in the transaction database. In
fact, these words are general or global words, which are clearly not features of the
product. Including such sentences in the transaction database makes the total number
of transactions high, which decreases the support values of the product features. As a
result, "true" features have less chance of being successfully extracted when applying
the pattern mining technique. Therefore, we believe that filtering out noisy sentences
can help to increase the performance of feature extraction.
As shown in the example above, the user uses the opinion word "huge" to describe the
feature "display" in sentence 4, the opinion words "big" and "heavy" for the feature
"weight" in sentence 5, and the opinion word "good" for the feature "battery" in the
last sentence. Such sentiment words are a signal of potential product features in
reviews, since the reviewer is complimenting or criticising features of the product. We
believe that sentences containing sentiment words are more likely to contain product
features. Therefore, we use only those sentiment sentences to prepare the transaction
database. This step not only significantly reduces the number of noisy transactions but
also reduces the size of the transaction database. Table 2 shows the preparation of a
transaction database.
Table 2. Transactional Database

Transaction ID                 Items
Transaction 1 / Sentence 1     Weight, Size, Shutter
Transaction 2 / Sentence 2     Lens, Weight
Transaction 3 / Sentence 3     Lens, Display
...                            ...
Transaction n / Sentence n     Display, Resolution, Brightness
Let $TD = \{T_1, \ldots, T_n\}$ be a transaction database of $n$ transactions generated from the
review collection. A pattern mining technique is then applied to $TD$ with a minimum
support threshold $\sigma$ to generate frequent patterns. In this thesis, only single-word
features are considered, so only size-one patterns are kept as the list of potential
features. Those potential features are then manually grouped so that all feature words
in each group represent the same aspect or concept of the product. For each group, the
main feature of the product is selected by choosing the most abstract word in the
group. Note that during the process of main feature extraction, only the grouping and
the selection of main features need some manual work; all other tasks are automatic.
Let $R = \{r_1, r_2, \ldots, r_M\}$ be a set of reviews and let the vocabulary $W = \{w_1, w_2, \ldots, w_n\}$
denote the set of words occurring in $R$; each review $r \in R$ is a set of words from $W$,
i.e., $r \subseteq W$. Let $F_R = \{f_1, f_2, \ldots, f_m\}$ denote the set of $m$ main features found in the
review collection after applying pattern mining to the review transaction database.
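Because only size-one patterns are kept, Step 2 reduces to counting supports over the
opinion-sentence transactions. A minimal sketch, with hypothetical transactions and
threshold:

```python
# Illustrative sketch of Step 2: treat each opinion sentence as a
# transaction of nouns and keep size-one frequent patterns as
# candidate main features.
from collections import Counter

transactions = [              # hypothetical opinion-sentence transactions
    {"weight", "size", "shutter"},
    {"lens", "weight"},
    {"lens", "display"},
    {"display", "resolution", "brightness"},
    {"display", "screen"},
]

min_support = 0.4             # minimum support threshold sigma
n = len(transactions)

counts = Counter(w for t in transactions for w in t)
# support(w) = fraction of transactions containing w
frequent = {w: c / n for w, c in counts.items() if c / n >= min_support}
print(frequent)               # e.g. {'display': 0.6, 'lens': 0.4, ...}
```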
3.2 DISCOVERY OF RELATED WORDS OF MAIN FEATURES
As discussed at the beginning of this chapter, main features and related words can
represent the content of online reviews. In Section 3.1, the method of selecting the
main features of the product is discussed. In this section, a new approach to identify
related words to the identified main features is proposed. Related words of a main
feature are words that are usually associated with the main feature in online reviews.
Those related words provide information about the main feature and make the main
feature more understandable. In general, two types of related words are defined that
can deliver information about the main feature to the reader: sentiment words and
related feature words. The definition and identification method for each type of related
word are discussed in Sections 3.2.1 to 3.2.4, and the final set of related words is
assembled in Section 3.2.5.
3.2.1 Sentiment words Identification
Reviewers normally use adjectives to compliment or criticise the main features of a
product. For example, in Figure 6, words such as "amazing", "cosy", "chill", "modern"
and "welcoming" are used to describe the main feature "atmosphere" and its related
features "air" and "ambience". These words contribute to the user's opinion about
"atmosphere" and should be considered relevant to that feature. Following de Marneffe
et al. (2006), these words can be called sentiment/dependent words because they have a
grammatical dependency relationship with the main feature. Sentiment words are in
fact widely utilised in the field of sentiment mining of online reviews, where they help
to determine the sentiment directions of product features (Liu, 2010; Liu, et al., 2007;
Popescu & Etzioni, 2007; Scaffidi et al., 2007). Since those sentiment words give
certain information about the main feature, they are clearly related to it. Therefore, the
first kind of related words to the main feature is defined as the related sentiment words.
Figure 6. Similar and sentiment words of Feature “atmosphere”
Because sentiment words normally convey the attitude of the reviewer towards the main
feature, they usually stand near it. Therefore, adjectives within a distance threshold $\sigma$
of the main feature are chosen as the sentiment words of that feature. For a review
collection $R$ and a main feature $f \in F_R$, let $SW_R(f)$ and $SW_{sen}(f)$ denote the set of
sentiment words for the main feature $f$ in $R$ and in an individual sentence $sen \in R$,
respectively; then $SW_R(f) = \bigcup_{sen \in R} SW_{sen}(f)$.
In order to measure the degree of relatedness of each word $w \in SW_R(f)$ to the main
feature $f$, the distances between the sentiment word $w$ and $f$ in the sentences of the
online reviews are used. Let $S_s(w, f) = \{sen_1, sen_2, \ldots, sen_n\}$ be the set of $n$ sentences
in the review collection that contain both $w$ and $f$. The distance between $w$ and $f$ in an
individual sentence $sen \in S_s(w, f)$ is calculated as the number of words between $w$ and
$f$ in $sen$, denoted $distance_{sen}(w, f)$. The distance between $w \in SW_R(f)$ and $f$ is then
measured as the average distance from $w$ to $f$ over the sentences in $S_s(w, f)$:
$$Distance_R(w, f) = \frac{\sum_{sen \in S_s(w,f)} distance_{sen}(w, f)}{|S_s(w, f)|}$$
The weight, or relatedness, of sentiment word $w$ to the feature $f$ for $R$ is calculated as
follows:
$$WeightS_R(w, f) = \frac{1}{Distance_R(w, f)} = \frac{|S_s(w, f)|}{\sum_{sen \in S_s(w,f)} distance_{sen}(w, f)} \qquad (1)$$
Let $SW_R(F)$ denote the sets of related sentiment words for the main features of the
product, then $SW_R(F) = \bigcup_{f \in F_R} SW_R(f)$. Let $WeightS_R(F)$ denote the sets of
corresponding weights of the word sets in $SW_R(F)$, then
$WeightS_R(F) = \bigcup_{f \in F_R} WeightS_R(f)$. Algorithm 1 illustrates the method of
generating $SW_R(F)$ and $WeightS_R(F)$.
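Since Algorithm 1 itself is not reproduced here, the following sketch shows one possible
reading of it: collect adjectives within $\sigma$ words of the main feature, then weight each by
the inverse of its average distance to the feature, as in Equation (1). Tokenisation is
simplified and the adjective set is assumed given (in practice it would come from POS
tagging).

```python
# Illustrative sketch of Equation (1): weight a sentiment word by the
# inverse of its average distance to the main feature over sentences.
from collections import defaultdict

def sentiment_weights(sentences, feature, adjectives, sigma=5):
    """sentences: tokenised sentences; adjectives: known adjective set
    (assumed given here; in practice obtained via POS tagging)."""
    distances = defaultdict(list)
    for sen in sentences:
        if feature not in sen:
            continue
        f_pos = sen.index(feature)
        for i, w in enumerate(sen):
            if w in adjectives and 0 < abs(i - f_pos) <= sigma:
                distances[w].append(abs(i - f_pos) - 1)  # words in between
    weights = {}
    for w, d in distances.items():
        avg = sum(d) / len(d)              # Distance_R(w, f)
        weights[w] = 1.0 / max(avg, 1e-9)  # guard: adjacent words give avg 0
    return weights

sents = [["the", "display", "is", "excellent"],
         ["a", "great", "and", "bright", "display"]]
print(sentiment_weights(sents, "display", {"excellent", "great", "bright"}))
```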
3.2.2 WordNet based method to find similar feature words
In Section 3.2.1, the sentiment words, which are the first type of related words, were
identified. In this section, the second type of related words, similar feature words, is
identified. Reviewers may use different words to refer to the main feature of a product
in their reviews. For instance, users can use words like "picture", "image", "photo",
"photograph" or "pic" to refer to the main feature "picture" of a camera, or the words
"atmosphere" and "ambiance" for a restaurant's atmosphere, as described in Figure 6.
Long, et al. (2014) proposed using Google Distance, an external resource, to find those
words. In addition to Google Distance, WordNet, developed by a group of psychologists
and linguists at Princeton University beginning in 1985 (Miller & Fellbaum, 1998), is
another popular external lexical network for finding concepts related to a target
concept. WordNet can be seen as an ontology for natural language terms that contains
around 100,000 terms organised into taxonomic hierarchies. It stores information
about words belonging to four parts of speech (nouns, verbs, adjectives and adverbs),
structured as a network of nodes (synsets, or sets of synonyms) and links (relationships
between two synsets). The basic relationship between the terms of the same synset is
synonymy. Moreover, different synsets are linked by various semantic relations such as
antonymy (opposite), hypernymy (superconcept) / hyponymy (subconcept) (also called
the Is-A hierarchy or taxonomy), and meronymy (part-of) / holonymy (has-a). In this
thesis, we use WordNet to find feature words similar to the main feature, for two
reasons. First, WordNet has been recognised for its practical value in various text
mining and natural language processing tasks (Liao et al., 2010). Secondly, in addition
to synonyms, WordNet can help to identify hyponyms, or sub-feature words, which are
sub-concepts of the target concept. As finding sub-features of main features is also a
focus of this study, WordNet is expected to identify both synonym features and
sub-features of the main feature in the review collection.
Let $WW_R(f)$ denote the set of words similar to the main feature $f$, $WW_R(f) \subseteq W$;
words in $WW_R(f)$ are the synonyms and sub-concepts of $f$ found using WordNet.
Calculating the weight of related words found by WordNet
Words in $WW_R(f)$ have different degrees of similarity, or different weights, with
respect to the main feature $f$. This study proposes using information content similarity
metrics to evaluate the similarity of each word $w$ to the main feature $f$. The similarity
between two words is related to how much information they have in common.
Information content was first proposed by Resnik (1995) to calculate the similarity
between concepts (synsets in the WordNet taxonomy) by attaching probabilities to
concepts in the WordNet hierarchy. The author first defined the probability of a
concept $c$, $P(c)$, as the probability of encountering an instance of $c$. Let
$sub\_concepts(c)$ be the set of all concept words that are sub-concepts of $c$. The
occurrence frequency $freq(c)$ of the concept $c$ is the cumulative sum of the occurrence
frequencies of the words in $sub\_concepts(c)$:
$$freq(c) = \sum_{w \in sub\_concepts(c)} count(w)$$
The probability of concept $c$ is obtained by normalising by the number of concept
(noun) occurrences observed in the corpus, $N$:
$$P(c) = \frac{freq(c)}{N}$$
Resnik (1995) then quantified the information content of a concept $c$ as the negative
log of its probability, $IC(c) = -\log P(c)$. The argument here is that the more two words
have in common, the more similar they are. The commonality of two words can be
represented by their most informative subsumer, which is the lowest common subsumer
(LCS) of the two concepts in the WordNet hierarchy. The information content of the
most informative subsumer of two concepts $c_1$ and $c_2$ is defined as their similarity
score:
$$sim_{resnik}(c_1, c_2) = -\log P(LCS(c_1, c_2))$$
where $LCS(c_1, c_2)$ is the lowest node in the hierarchy that subsumes both $c_1$ and $c_2$.
Lin (1998) improved Resnik's similarity measure with the argument that the similarity
between two words depends not just on what they have in common, but also on the
differences between them. In other words, the similarity between two concepts $c_1$ and
$c_2$ is measured by the ratio between the amount of information needed to state the
commonality of $c_1$ and $c_2$ and the information needed to fully describe $c_1$ and $c_2$. Lin
(1998) altered Resnik's measure and proposed an improved version of the formula as
below:
$$sim_{Lin}(c_1, c_2) = \frac{IC(common(c_1, c_2))}{IC(description(c_1, c_2))}$$
or
$$sim_{Lin}(c_1, c_2) = \frac{2 \log P(LCS(c_1, c_2))}{\log P(c_1) + \log P(c_2)}$$
In this study, the measure proposed by Lin (1998) is applied to calculate the similarity
score, or weight, of the words in $WW_R(f)$ with respect to the feature $f$. Let
$WeightW_R(f)$ denote the set of weights of the words in $WW_R(f)$ with respect to $f$. The
weight of a word $w \in WW_R(f)$ is calculated as:
$$WeightW_R(w, f) = \frac{2 \log P(LCS(w, f))}{\log P(w) + \log P(f)} \qquad (2)$$
Let $WW_R(F)$ denote the sets of related WordNet (similar) words for the main features
of the product, then $WW_R(F) = \bigcup_{f \in F_R} WW_R(f)$. Similarly, let $WeightW_R(F)$ denote
the sets of corresponding weights of the word sets in $WW_R(F)$, then
$WeightW_R(F) = \bigcup_{f \in F_R} WeightW_R(f)$. Algorithm 2 illustrates the method of
identifying $WW_R(F)$ and calculating $WeightW_R(F)$.
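The Lin measure of Equation (2) is available off the shelf in NLTK's WordNet
interface, which ships precomputed information content files. The sketch below uses the
Brown corpus IC file as a stand-in for corpus statistics and takes the best score over
noun sense pairs, a word-level heuristic of ours, since Equation (2) is defined on
concepts rather than words.

```python
# Illustrative sketch of Equation (2): Lin similarity over WordNet,
# using NLTK's precomputed Brown-corpus information content file.
# Requires: nltk.download("wordnet"); nltk.download("wordnet_ic")
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")

def lin_weight(feature_word, candidate_word):
    """Word-level weight: best Lin score over noun sense pairs."""
    scores = [f.lin_similarity(c, brown_ic)
              for f in wn.synsets(feature_word, pos=wn.NOUN)
              for c in wn.synsets(candidate_word, pos=wn.NOUN)]
    # WeightW_R(w, f) = 2 log P(LCS(w, f)) / (log P(w) + log P(f))
    return max(scores)

print(lin_weight("display", "screen"))

# Hyponyms of a sense supply candidate sub-feature words.
sense = wn.synsets("display", pos=wn.NOUN)[0]
print([l.name() for h in sense.hyponyms() for l in h.lemmas()])
```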
3.2.3 Topic model based method to find related features
In Section 3.2.2, feature words similar to the main feature were found by using the
external ontology resource WordNet. The task of identifying similar words using an
external ontology formed from prior knowledge, such as Google code length or
WordNet, is in fact not new in text mining (Cilibrasi & Vitanyi, 2007). Such external
ontology sources find words that are similar to the main feature words in general.
However, in some circumstances, the similar words found are not related to the
particular context (e.g., a particular product) in which the main feature is discussed.
For example, the words "celebrity", "idol", "stellar" and "genuine" are synonyms of the
feature "star" according to Google Distance. However, the feature "star" in the
restaurant domain indicates the ranking of a restaurant and clearly does not have a
similar meaning to words such as "celebrity", "idol" or "stellar". In addition, there are
always intrinsic relationships among features, and those relationships differ from
dataset to dataset. WordNet is based on an external standard knowledge ontology, so it
cannot find the intrinsic relationships among features buried in the dataset itself. Given
a feature, WordNet always returns the same set of similar words, whatever dataset is
used.
Topic modelling is considered a state-of-the-art text mining technique, which provides a
tool to discover semantic spaces in large archives of text. Topic models do not use any
external source, but only the text corpus itself (a domain-specific corpus), to describe
the collection through semantic spaces (topics) and semantic representations (the
related words in each topic) (Steyvers & Griffiths, 2006). In more detail, given a
collection of documents, topic modelling can learn and discover topics, each of which is
represented by a group of words that tend to co-occur in the documents. Therefore,
words in topics generated by a topic model have a tight relationship with each other. As
main features are the main topics of discussion in online reviews, and the related
features of a main feature also frequently co-occur with it, a topic model is expected to
discover the correct related features of the main features without using any external
language resource.
Latent Dirichlet Allocation (LDA) is currently the most popular approach to generating
topic models. Let $D = \{d_1, d_2, \ldots, d_m\}$ be a collection of $m$ documents. The topic
model generated by LDA consists of topic representations at collection level and topic
distributions at document level. At collection level, each topic $Z_i$ is represented by a
probability distribution over words, $\phi_i = (\varphi_{i,1}, \varphi_{i,2}, \ldots, \varphi_{i,n})$ with $\sum_{k=1}^{n} \varphi_{i,k} = 1$,
where $\varphi_{i,k}$ is the weight of the $k$th word. At document level, each document is
represented by a probability distribution over topics,
$\theta_{d_j} = (\vartheta_{d_j,1}, \vartheta_{d_j,2}, \ldots, \vartheta_{d_j,V})$, where $V$ is the number of topics and $\vartheta_{d_j,i}$ is the
probability of $Z_i$ for document $d_j$ (Blei, et al., 2003a).
In this thesis, a topic modelling technique, specifically LDA, is employed to find the
words or features related to the main features. More specifically, LDA is applied to the
review corpus to generate a set of topics. Let $Z = \{Z_1, Z_2, \ldots, Z_k\}$ be the list of $k$
topics generated by LDA. Each topic $Z_i \in Z$ is a collection of words, i.e.,
$Z_i = \{w_1, w_2, \ldots, w_n\}$, where $w_k$ is the $k$th word assigned to topic $Z_i$; the
corresponding probability distribution over words for $Z_i$ is
$\phi_i = (\varphi_{i,1}, \varphi_{i,2}, \ldots, \varphi_{i,n})$ with $\sum_{k=1}^{n} \varphi_{i,k} = 1$, where $\varphi_{i,k}$ is the weight indicating
the degree of importance of the word $w_k$ in $Z_i$. By filtering out the low-weighted words
in each topic based on a minimum threshold $\sigma$, we keep the top high-weighted words
to represent each topic. Let $Z_i'$ be the $i$th topic after removing the low-weighted words;
$Z_i'$ is defined as:
$$Z_i' = \{w_k \mid w_k \in Z_i,\ \varphi_{i,k} \geq \sigma\} \qquad (3)$$
Figure 7 shows an example of the chosen topical words in topic 5 of the topic model
generated from a restaurant dataset.
Figure 7. Topic 5 after removing words having low weight for one restaurant dataset
The words in $Z_i'$ are considered related to each other because they are selected to
represent the same topic. Let $Z' = \{Z_1', Z_2', \ldots, Z_k'\}$ be the list of topics after filtering
out low-weighted words. For a given main feature $f$, if $f$ is a word of $Z_i'$, then $Z_i'$ can
be considered a related topic of $f$, and the words in $Z_i'$ can be considered topical
related words of $f$. Let $RelTopics(f)$ be the list of related topics of $f$:
$RelTopics(f) = \{Z_i' \mid f \in Z_i' \text{ and } Z_i' \in Z'\}$.
Let $TW_R(f)$ denote the related topical words of feature $f$ in the review collection $R$:
$$TW_R(f) = \bigcup_{Z_i' \in RelTopics(f)} \; \bigcup_{w \in Z_i'} \{w\} \qquad (4)$$
Calculating the weight of topical related words
As discussed above, the words in a topic are related to each other and reflect an aspect
of the review collection. We propose to measure the relatedness of each topical word
$w \in TW_R(f)$ to $f$ by comparing their weights within the topics. Given two words in the
same topic, the more similar their weights in the topic, the more related the two words
are. More precisely, let $WeightT_R(w, f)$ be the corresponding weight, or relatedness, of
a word $w \in TW_R(f)$ to $f$; $WeightT_R(w, f)$ is calculated as follows:
$$WeightT_R(w, f) = \frac{\sum_{Z_i' \in RelTopics(f,w),\ \varphi_{i,w} \neq \varphi_{i,f}} |\varphi_{i,w} - \varphi_{i,f}|}{|RelTopics(f, w)|} \qquad (5)$$
where $\varphi_{i,w}$ and $\varphi_{i,f}$ are the weights of the word $w$ and the feature word $f$ in topic
$Z_i'$, and $RelTopics(f, w)$ is the collection of topics $Z_i'$ containing both $f$ and $w$:
$RelTopics(f, w) = \{Z_i' \mid w \in Z_i' \text{ and } Z_i' \in RelTopics(f)\}$.
Let $TW_R(F)$ denote the sets of related topical words for the main features of the
product, then $TW_R(F) = \bigcup_{f \in F_R} TW_R(f)$. Similarly, let $WeightT_R(F)$ denote the sets
of corresponding weights of the word sets in $TW_R(F)$, then
$WeightT_R(F) = \bigcup_{f \in F_R} WeightT_R(f)$. Algorithm 3 illustrates the method of
identifying $TW_R(F)$ and computing $WeightT_R(F)$.
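A compact sketch of Equations (3) to (5), assuming each topic is given as a
word-to-weight dictionary (as any LDA implementation can produce); the topics,
threshold and feature below are illustrative:

```python
# Illustrative sketch of Equations (3)-(5): filter topics by a weight
# threshold, collect the topical related words of a feature, and score
# each by the average weight difference of Equation (5).
topics = [  # hypothetical LDA output: word -> weight per topic
    {"display": 0.20, "screen": 0.18, "resolution": 0.10, "the": 0.01},
    {"battery": 0.25, "charge": 0.15, "display": 0.05, "life": 0.12},
]
sigma = 0.04

# Equation (3): keep only high-weighted words in each topic.
filtered = [{w: p for w, p in t.items() if p >= sigma} for t in topics]

feature = "display"
rel_topics = [t for t in filtered if feature in t]    # RelTopics(f)

# Equation (4): topical related words of the feature.
tw = {w for t in rel_topics for w in t if w != feature}

# Equation (5): average |phi_w - phi_f| over topics containing both,
# divided by the number of topics in RelTopics(f, w).
weights = {}
for w in tw:
    diffs = [abs(t[w] - t[feature]) for t in rel_topics
             if w in t and t[w] != t[feature]]
    shared = sum(1 for t in rel_topics if w in t)     # |RelTopics(f, w)|
    weights[w] = sum(diffs) / shared
print(weights)
```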
3.2.4 Pattern-enhanced Topic Model related words Identification
Although topic models can help to find related feature words for a main feature, using
topical words still has problems. One common problem with LDA is that word-based or
term-based topic representations may not semantically represent the topic, which
makes the topic hard to understand. In addition, popular or general words dominate
the top words in some topics, which makes the topics less distinctive in representing
different aspects of the whole corpus. In the field of review selection, where each
feature of a product is a main discussion topic, some words in the topic model do not
contribute effectively to representing the features of the review collection. In order to
solve this problem, we need a way to represent the topics generated from the LDA topic
model more effectively. In the pattern based topic model proposed in (Gao, et al.,
2013), each topic $Z_j$ is represented by a set of patterns instead of single words. Since
phrases or word patterns carry better semantic meaning than single words (for example,
"data mining" is easier to understand than "data" or "mining" alone), pattern based
topics are more discriminative. We are therefore inspired to apply a pattern based topic
model to enhance the task of identifying the related words of the main feature.
According to Gao, et al. (2013), each topic in a pattern based topic model is
represented by a set of patterns instead of a set of words, i.e.,
$PZ_j = \{p_{j,1}, p_{j,2}, \ldots, p_{j,l}\}$, where each pattern $p_{j,k}$ is a subset of the words in $W$,
i.e., $p_{j,k} \subseteq W$, and $l$ is the number of patterns in topic $PZ_j$. In addition, as Gao, et
al. (2013) use a pattern mining technique to generate the pattern based topic model,
each pattern in $PZ_j$ has a corresponding support value, representing the occurrence
frequency of the words in the pattern. Therefore, the words in a pattern are considered
closely related to each other. Let $SupportPZ_j$ be the collection of associated supports of
the patterns in $PZ_j$: $SupportPZ_j = \{supP_{j,1}, supP_{j,2}, \ldots, supP_{j,l}\}$.
Given a set of reviews $R = \{r_1, r_2, \ldots, r_m\}$ and vocabulary $W = \{w_1, w_2, \ldots, w_n\}$, for
each topic $Z_j$ generated by LDA a corresponding pattern based topic $PZ_j$ is generated
by applying the pattern based topic model method in (Gao, et al., 2013). Let
$RelTopics(f)$ be the list of related topics of $f$:
$RelTopics(f) = \{PZ_j \mid p_{jl} \in PZ_j \text{ and } f \in p_{jl}\}$. The related pattern based topical
words of $f$, denoted $PTW_R(f)$, are defined as follows:
$$PTW_R(f) = \bigcup_{PZ_j \in RelTopics(f)} \; \bigcup_{p_{jl} \in PZ_j,\ f, w \in p_{jl}} \{w\} \qquad (6)$$
The words in $PTW_R(f)$ are considered closely related to $f$.
Let $WeightP_R(w, f)$ denote the corresponding weight, or relatedness, of a related
pattern based topic word $w \in PTW_R(f)$; $WeightP_R(w, f)$ is calculated as follows:
$$WeightP_R(w, f) = \frac{\sum_{PZ_j \in RelTopics(f,w),\ p_{jl} \in PZ_j,\ f, w \in p_{jl}} supP_{j,l}}{|RelTopics(f, w)|} \qquad (7)$$
where $supP_{j,l}$ is the support of $p_{jl}$ in topic $PZ_j$, and $RelTopics(f, w)$ is the collection
of related pattern based topics of $f$ and $w$:
$RelTopics(f, w) = \{PZ_j \mid w, f \in PZ_j \text{ and } PZ_j \in RelTopics(f)\}$. The set of
associated weights of the words in $PTW_R(f)$ is denoted $WeightP_R(f)$:
$$WeightP_R(f) = \bigcup_{w \in PTW_R(f)} WeightP_R(w, f) \qquad (8)$$
Let $PTW_R(F)$ denote the sets of related pattern based topical words for the main
features of the product, then $PTW_R(F) = \bigcup_{f \in F_R} PTW_R(f)$. Similarly, let
$WeightP_R(F)$ denote the sets of corresponding weights of the word sets in $PTW_R(F)$,
then $WeightP_R(F) = \bigcup_{f \in F_R} WeightP_R(f)$. Algorithm 4 shows the method of
generating $PTW_R(F)$ and $WeightP_R(F)$.
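A parallel sketch for the pattern based variant, assuming each pattern based topic is
given as a list of (pattern, support) pairs; Equations (6) and (7) then reduce to set
operations and an average of supports over the topics in $RelTopics(f, w)$. The topics
below are hypothetical:

```python
# Illustrative sketch of Equations (6)-(7): related words and weights
# from hypothetical pattern based topics, given as (pattern, support)
# pairs per topic.
pattern_topics = [
    [({"display", "screen"}, 0.6), ({"display", "resolution"}, 0.4)],
    [({"battery", "charge"}, 0.7), ({"battery", "display"}, 0.3)],
]

feature = "display"
rel_topics = [t for t in pattern_topics
              if any(feature in p for p, _ in t)]   # RelTopics(f)

# Equation (6): words co-occurring with the feature inside a pattern.
ptw = {w for t in rel_topics for p, _ in t if feature in p
       for w in p if w != feature}

# Equation (7): sum of supports of patterns containing both words,
# divided by the number of topics in RelTopics(f, w).
weights = {}
for w in ptw:
    sups, n_topics = [], 0
    for t in rel_topics:
        pats = [s for p, s in t if feature in p and w in p]
        if pats:
            n_topics += 1          # this topic belongs to RelTopics(f, w)
            sups.extend(pats)
    weights[w] = sum(sups) / n_topics
print(weights)   # e.g. {'screen': 0.6, 'resolution': 0.4, 'battery': 0.3}
```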
3.2.5 Final Related words Identification
As mentioned at the beginning of this chapter, this study focuses on finding the related
words of the main features. The previous four sections discussed different proposed
methods for identifying the related words of a main feature. More specifically, in
Section 3.2.1, the set of sentiment words ($SW_R(f)$) is identified as the adjectives
standing near the main feature. Related feature words are identified using the external
ontology resource WordNet in Section 3.2.2 ($WW_R(f)$) and the topic model in Section
3.2.3 ($TW_R(f)$). In Section 3.2.4, the method of identifying related feature words is
improved by applying a pattern based topic model instead of a traditional topic model
($PTW_R(f)$). The final set of related words found by our methods can be visualised in
Figure 8. $RW_R(f)$ is defined as:
$$RW_R(f) = SW_R(f) \cup WW_R(f) \cup TW_R(f) \quad \text{(topic model used)}$$
$$RW_R(f) = SW_R(f) \cup WW_R(f) \cup PTW_R(f) \quad \text{(pattern based topic model used)}$$
Figure 8. Related words of feature $f$ ($RW_R(f)$)
Note that the words in $SW_R(f)$ are adjectives and the words in $WW_R(f)$ are nouns,
while the words in $TW_R(f)$ or $PTW_R(f)$ are both nouns and adjectives. As $SW_R(f)$,
$WW_R(f)$ and $TW_R(f)$ or $PTW_R(f)$ contain words that originate from the review
collection $R$, a number of words exist in several of these word sets at the same time.
Let $ShareS_R(f)$ denote the set of shared sentiment words (adjectives) found both by
the method in Section 3.2.1 and by the pattern based topic model:
$ShareS_R(f) = SW_R(f) \cap PTW_R(f)$. Let $ShareF_R(f)$ denote the set of shared related
feature words (nouns) found both by WordNet and by the pattern based topic model:
$ShareF_R(f) = WW_R(f) \cap PTW_R(f)$. The set of shared related words, $ShareW_R(f)$, is
the combination of the two: $ShareW_R(f) = ShareS_R(f) \cup ShareF_R(f)$. Because the
words in $ShareW_R(f)$ are found by several of the methods proposed in Sections 3.2.1
to 3.2.4, they are expected to have higher certainty of relatedness to the feature $f$ than
the other words in $RW_R(f)$. As described in the four previous sections, every related
word found has an associated weight indicating the degree of its relatedness to the
main feature, and the way the weight is calculated depends on which method identified
the word. Therefore, words in $ShareW_R(f)$ can have more than one weight value,
because they are identified by different methods. In this study, if a word exists in
$ShareW_R(f)$, the higher weight is chosen as its final weight.
Let $Weight_R(f)$ denote the set of associated weights of the words in $RW_R(f)$:
$$Weight_R(f) = \bigcup_{w \in RW_R(f)} Weight_R(w, f) \qquad (9)$$
where $Weight_R(w, f)$ is the weight, or relatedness, of the word $w$ to the feature $f$;
$Weight_R(w, f)$ is chosen depending on whether the word $w$ belongs to $ShareW_R(f)$ or
not.
Let $RW_R(F)$ denote the sets of related words for the main features of the product,
then $RW_R(F) = \bigcup_{f \in F_R} RW_R(f)$. Similarly, let $Weight_R(F)$ denote the sets of
corresponding weights of the word sets in $RW_R(F)$, then
$Weight_R(F) = \bigcup_{f \in F_R} Weight_R(f)$. Algorithm 5 shows the method of generating
$RW_R(F)$ and $Weight_R(F)$.
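The final combination step amounts to a union of the word sets in which a shared word
keeps its highest available weight. A minimal sketch, assuming each method's output is
represented as a word-to-weight dictionary (the weights here are invented for
illustration):

```python
# Illustrative sketch of Section 3.2.5: merge the related word sets,
# keeping the highest available weight for words found by more than
# one method (the ShareW_R(f) words).
def merge_related_words(*weight_dicts):
    merged = {}
    for d in weight_dicts:
        for w, weight in d.items():
            merged[w] = max(weight, merged.get(w, float("-inf")))
    return merged

sw  = {"excellent": 0.9, "bright": 0.7}                  # sentiment (3.2.1)
ww  = {"screen": 0.8, "resolution": 0.5}                 # WordNet (3.2.2)
ptw = {"screen": 0.6, "bright": 0.75, "contrast": 0.4}   # patterns (3.2.4)

rw = merge_related_words(sw, ww, ptw)
print(rw)   # "screen" keeps 0.8, "bright" keeps 0.75, etc.
```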
3.3 SUMMARY
In this chapter, a new method is proposed to analyse the content of online
consumer-generated reviews. The review content is represented by a list of main
features and related words, found by using data mining, natural language processing
and probabilistic topic models (Figure 9). While main features are the main topics of
online reviews, related words provide the information about those features; together
they make the content easily understandable. In general, there are several key points in
this chapter. First, an improved version of main feature extraction is proposed in
Section 3.1. Secondly, we propose identifying related words of the main features
through several approaches: WordNet, the topic model and the pattern based topic
model. While WordNet can find words similar to the main features, the topic model
and pattern based topic model methods can identify related words within the context of
the product (domain specific). The combination of an external ontology resource
(WordNet) and probabilistic topic models allows the identification of a complete and
accurate set of related words for the features. As topic modelling provides an
interpretable, low-dimensional representation of the review collection through a number
of topics, where the words in each topic frequently co-occur in the online reviews,
issues in review content analysis such as polysemy and synonymy can be reduced.
Figure 9. The representation of review content by main features and related words
Chapter 4: Review Selection for Single
Feature
As online commerce activities continue to grow, online reviews have become the most
important and helpful resource for consumers to obtain product information, thus
facilitating their purchase decision process. However, the abundance of review data has
become a barrier for users trying to go through the reviews and form vivid pictures of
the products in which they are interested. Recognising this information overload
problem in the context of online reviews, researchers have investigated identifying
high-quality, helpful reviews. This chapter discusses our contribution to the review
selection problem: a new method to optimise the performance of the review selection
task.
4.1 OVERVIEW OF HELPFUL REVIEW SELECTION
A prior research stream on product review helpfulness aims to classify helpful reviews
within the original review corpus. Therefore, the task of effectively differentiating
between helpful and unhelpful reviews, together with the criteria used to determine
helpfulness, is the key concern in the field of review selection. Researchers such as
Kim, et al. (2006) estimated the helpfulness of reviews based on review writing style,
review length and grammar, reviewer information and timeliness, by employing
supervised learning approaches such as classification. However, reviews written in a
professional style and with correct grammar do not always contain useful information
for customers. Recently, scholars have focused on finding helpful reviews based on the
content of reviews, specifically in terms of product features. Their primary concern is
that features such as "price", "lens" and "battery" of a digital camera are the main
points of discussion in the reviews and should be taken into consideration. This
argument is reasonable because reviewers write comments to share their experiences
with a certain product; in particular, their opinions on the product's features shape
their evaluation.
Tsaparas, et al. (2011) proposed a method to find reviews covering as many features as
possible, in which all features are considered equally important and independent.
However, from a customer's point of view, each feature plays a differently important
role in their consideration. For example, the "price" feature is generally important to
customers on a tight budget. In addition, some users might be interested in only one
feature, or a small group of features, rather than all of the product's features at the
same time. Therefore, finding reviews that discuss a given main feature deserves
attention, yet there are not many studies on this issue. Our aim is to find helpful
reviews that extensively discuss a main feature; the output reviews are helpful and
carry comprehensive information about the target feature.
Recently, Long, et al. (2014) proposed a novel method, called Specialised Review
Selection (SRS), for finding specialised reviews that extensively discuss a feature. In
detail, for a specific product feature in the review corpus, the model extracts a set of
words which are similar to the target feature. The authors then use Kolmogorov
complexity and information distance to calculate the amount of information in these
related word sets. For each feature, all reviews are then ranked according to their
information distance score: the review which most extensively discusses the feature is
the one with the minimal information distance.
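Kolmogorov complexity is uncomputable, so practical systems approximate it; a
common proxy is compressed length. The sketch below is not the SRS implementation,
only an illustration of the ranking idea under that proxy: a review ranks higher (lower
distance) when appending the feature's related words to it adds little compressed
length, i.e. the review already carries much of that information.

```python
# Illustrative sketch only (not the SRS implementation): approximate
# information distance with compressed length, a standard practical
# proxy for Kolmogorov complexity.
import zlib

def clen(text: str) -> int:
    """Compressed length in bytes, a rough complexity estimate C(text)."""
    return len(zlib.compress(text.encode("utf-8")))

def feature_distance(review: str, related_words: set) -> int:
    words = " ".join(sorted(related_words))
    # C(words | review) ~ C(review + words) - C(review): small when the
    # review already covers much of the related-word information.
    return clen(review + " " + words) - clen(review)

related = {"screen", "resolution", "contrast", "brightness"}
reviews = ["The screen resolution and contrast are superb.",
           "Shipping was fast and the box looked nice."]
# Rank ascending: the most feature-specific review first.
print(sorted(reviews, key=lambda r: feature_distance(r, related)))
```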
4.2 SPECIALISED REVIEW SELECTION (SRS) METHOD
The idea of using Kolmogorov complexity and information distance in SRS to select
reviews for a given feature is effective, because both are well-known tools for finding
objects related to a given feature based on the information contained in the objects.
However, there are two significant drawbacks to their method.
First, the success of SRS depends on the correctness of the identified similar words of
the main feature. The authors identify similar words of the main feature by using
Google code length (Cilibrasi & Vitanyi, 2007). As discussed in Chapter 3, Google code
length is an external ontology resource, which may not always return the correct
relevant words for the main feature in the domain of the target dataset.
Second, SRS selects reviews using an information distance that is based only on the
remaining related words of the target feature. According to Long, et al. (2014), given a
set of related words for a feature and a review, the remaining related words with
respect to the review are the related words that do not occur in the review. Their
formula for calculating the score of each review uses only those remaining related
words and ignores the related words that do occur in the review. Those related words,
in fact, also play an important role in determining the degree of feature relevance of
the review. We believe that they should be taken into account when calculating the
relevance score of each review. This study therefore proposes an improved method that
takes into consideration both the occurring related words and the remaining words.
This will be discussed in more detail in Section 4.3.2.
4.3 THE PROPOSED REVIEW SELECTION METHOD
4.3.1 Criteria of helpful reviews for single feature.
The criteria of a helpful review is changing with the research timeline on online
reviews as stated in Section 2.1.1 of the Literature Review. At the current time, most
of the current research studies agree that content or features of reviews are the most
important indicator of a helpful review. In general, according to a number of studies
(Lappas, et al., 2012; Liu, et al., 2007; Tsaparas, et al., 2011; Zhang & Zhang, 2014),
three criteria have been identified for a helpful review.
First criterion: the number of features in the review.
Readers read reviews in order to learn about the features of a product. Reviews that discuss many features certainly attract readers' interest, as they can generally provide information about all of the features of a product.
Second criterion: the amount of opinion about features in the review.
A helpful review is one that delivers the author's opinion about the feature to readers. People read reviews in order to decide whether to buy the product. They are thus looking for opinions about the features of a product from other users. If the review does not provide any comment about the features, it is useless to readers. In online reviews, opinions about the features of a product are normally expressed by sentiment words such as “expensive”, “cheap”, “heavy”, “light”, etc. Therefore, a high number of sentiment or opinion words associated with features in a review is a signal of a helpful review.
Third criterion: how detailed the discussion of the product's features is in the review.
The main difference between helpful and unhelpful reviews is the level of detail of the discussion about the feature in the review. For example, if a reviewer mentions the feature “display” - such as “the display is not good” - and stops there, the review is not helpful. The writer gives his idea about the feature but does not state reasons at all. A helpful review should provide evidence to support the author's opinion. In such reviews, the author will continue to discuss the target feature more deeply, for example, “The screen is too dark. The resolution of the camera is too low and the font is also small.” That extra information makes the review more comprehensive and persuasive. We therefore believe that reviews having a high level of comprehensive feature discussion and analysis are more likely to be helpful reviews. Notice that features such as “resolution” and “font” are related sub-features of the target feature “display”. A high number of those sub-features can contribute to the depth of discussion in the review.
As our review selection model focuses on a single feature, the first criterion is not applicable to our method. We therefore incorporate the two remaining criteria into our proposed model. In more detail, we propose to find helpful reviews for a single feature by using the set of words related to that feature. The selection of related words therefore should cover the two criteria above. In Chapter 3, we already proposed a method of identifying those related words, which form a complete set of sentiment words and related sub-feature words. Those related words clearly cover the second and third criteria of a helpful review. In the next section, we will describe our review selection approach for a single feature. It is noted that we use several mathematical notations in each section to explain our proposed feature relevance measure clearly.
4.3.2 Review Selection Method
For a given feature 𝑓, we want to find a set of reviews 𝑅𝑓, each of which provides information about 𝑓. In order to find 𝑅𝑓, we need to measure the information contained in a review, especially the information about the feature 𝑓. Inspired by the work of Long, et al. (2014), which measures the amount of information in a review using Kolmogorov complexity, we propose to measure the information in a review that relates to a given feature using the Kolmogorov complexity of the feature’s set of related words.
For an object 𝑤, the Kolmogorov complexity of 𝑤, denoted as 𝐾(𝑤), expresses the information contained in 𝑤. Theoretically, the Kolmogorov complexity of 𝑤 is defined as the length of the shortest effective binary description producing 𝑤 (Grünwald & Vitányi, 2003). However, 𝐾(𝑤) is not computable in general. Following the idea in Long, et al. (2009), in this thesis we use the relatedness of a word 𝑤 to a feature 𝑓 and the Shannon-Fano code to measure 𝐾(𝑤) relative to 𝑓 (Ming & Vitányi, 1997). Given a feature 𝑓, the relevance of a word 𝑤 to 𝑓 can be measured by the conditional probability 𝑃(𝑤|𝑓) = 𝑃(𝑤, 𝑓)/𝑃(𝑓), where 𝑃(𝑤, 𝑓) can be approximated by the document co-occurrence of 𝑤 and 𝑓 and 𝑃(𝑓) can be approximated by the document frequency of 𝑓, that is,

$$K(w) = -\log P(w \mid f) = -\log P(w, f) + \log P(f)$$
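To make this estimation concrete, the following Java sketch shows one possible way of computing 𝐾(𝑤) from document frequencies. It is a minimal illustration rather than the thesis implementation: the class and method names are ours, and reviews are matched by naive substring search instead of the proper tokenisation and feature extraction of Chapter 3.

import java.util.List;

/**
 * Illustrative sketch: estimating the code length K(w) of a word w
 * relative to a feature f via K(w) = log P(f) - log P(w, f), where
 * probabilities are approximated by document (co-)frequencies.
 */
public class CodeLength {

    /** Fraction of reviews that mention the feature f: approximates P(f). */
    static double pFeature(List<String> reviews, String f) {
        long df = reviews.stream().filter(r -> r.contains(f)).count();
        return (double) df / reviews.size();
    }

    /** Fraction of reviews that mention both w and f: approximates P(w, f). */
    static double pJoint(List<String> reviews, String w, String f) {
        long coDf = reviews.stream()
                           .filter(r -> r.contains(w) && r.contains(f))
                           .count();
        return (double) coDf / reviews.size();
    }

    /** K(w) relative to f; infinite if w and f never co-occur. */
    static double k(List<String> reviews, String w, String f) {
        double pwf = pJoint(reviews, w, f);
        if (pwf == 0.0) return Double.POSITIVE_INFINITY;
        return Math.log(pFeature(reviews, f)) - Math.log(pwf);
    }
}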
Let 𝑅𝑊𝑅(𝑓) and 𝑅𝑊𝑟(𝑓) denote the set of words related to 𝑓 in the review collection 𝑅 and in an individual review 𝑟, respectively; then 𝑅𝑊𝑅(𝑓) = ⋃𝑟∈𝑅 𝑅𝑊𝑟(𝑓). The following score measures the Kolmogorov complexity of a review 𝑟 in terms of feature 𝑓 by calculating the Kolmogorov complexity of the related words that appear in other reviews rather than in 𝑟:

$$SPE_{r,f} = \sum_{w \in RW_R(f) \setminus RW_r(f)} K(w) = \sum_{w \in RW_R(f) \setminus RW_r(f)} \big(\log P(f) - \log P(w, f)\big) \qquad (10)$$

The value of 𝑆𝑃𝐸𝑟,𝑓 is considered as the information distance between 𝑅𝑊𝑅(𝑓) and 𝑅𝑊𝑟(𝑓). The smaller the distance, the more related the words in 𝑟 are to 𝑓. Therefore, the reviews having the lowest 𝑆𝑃𝐸𝑟,𝑓 scores are selected as the output of our system.
The normalized value of 𝑆𝑃𝐸𝑟,𝑓 can be calculated by:

$$SPE_{r,f}^{normalized} = \frac{\sum_{w \in RW_R(f) \setminus RW_r(f)} \big(\log P(f) - \log P(w, f)\big)}{\left| RW_R(f) \setminus RW_r(f) \right|} \qquad (11)$$
One drawback of Equation (11) is that the related words of the main feature that appear in the review are themselves not taken into consideration. It is undeniable that the related words of a main feature 𝑓 occurring in the review directly contribute to the relevance of the review to the feature 𝑓. For example, for a set of related words 𝑅𝑊𝑅(𝑓) of the main feature 𝑓, if review A contains 20 related words while review B contains only 10 related words, i.e., |𝑅𝑊𝑟𝐴(𝑓)| = 20 > |𝑅𝑊𝑟𝐵(𝑓)| = 10, review A is more likely to be related to 𝑓 than review B. Although Equation (11) uses the set of remaining words, which is extracted from the set of related words, the related words themselves still carry important meaning in deciding the degree of relevance. Therefore, they should not be ignored. In this study, we take those related words into account in our method as well. To achieve this, two factors need to be included in the formula. The first factor is the number of related words in each review, since a review having a high number of related words is more likely to be related than a review having a smaller number of related words. The second factor is how important each related word in the review is to 𝑓. Most previous studies assume equal importance of the related words.
However, each related word certainly has a different importance to 𝑓. For example, related features such as “resolution” and “brightness” are more related to the feature “display” than other related words of that feature. This is especially true for comprehensive reviews, where a reviewer tries to discuss the main feature in more detail. In Chapter 3, we proposed methods to calculate the weight of related words. In addition to this weight, the co-occurrence frequency of the related words with the feature should also be taken into consideration, as the most important related words are normally the words that most often go with the feature. In general, we propose to measure the direct relevance by using the conditional probability of related words given 𝑓 and the weight of the words.
Let 𝑊𝑒𝑖𝑔ℎ𝑡𝑅(𝑓) denote the set of associated weights of the words in 𝑅𝑊𝑅(𝑓), 𝑊𝑒𝑖𝑔ℎ𝑡𝑅(𝑓) = ⋃𝑤∈𝑅𝑊𝑅(𝑓) 𝑤𝑒𝑖𝑔ℎ𝑡𝑅(𝑤, 𝑓), where 𝑤𝑒𝑖𝑔ℎ𝑡𝑅(𝑤, 𝑓) is the relatedness or weight of word 𝑤 ∈ 𝑅𝑊𝑅(𝑓) to the feature 𝑓 (note: 𝑅𝑊𝑅(𝑓) and 𝑊𝑒𝑖𝑔ℎ𝑡𝑅(𝑓) can be identified and calculated by the methods proposed in Chapter 3).
The direct relevance and the direct relevance distance of a review 𝑟 are calculated as follows.

Direct Relevance:

$$direRel_{r,f}^{normalized} = \frac{\sum_{w \in RW_r(f)} weight_R(w, f) \cdot \log P(w \mid f)}{\left| RW_r(f) \right|} \qquad (12)$$

The higher the value of 𝑑𝑖𝑟𝑒𝑅𝑒𝑙𝑟,𝑓, the more relevant the review is to the main feature 𝑓.
Direct Relevance Distance:

$$direDistRel_{r,f}^{normalized} = 1 - \frac{\sum_{w \in RW_r(f)} weight_R(w, f) \cdot \log P(w \mid f)}{\left| RW_r(f) \right|} \qquad (13)$$

The lower the value of 𝑑𝑖𝑟𝑒𝐷𝑖𝑠𝑡𝑅𝑒𝑙𝑟,𝑓, the more relevant the review is to the main feature 𝑓.
Equation (13) can then be incorporated into Equation (11) to obtain the final equation for calculating the weighted relevance score of an individual review to the main feature:

$$weighted\_rel_{r,f} = direDistRel_{r,f}^{normalized} + SPE_{r,f}^{normalized}$$

or

$$weighted\_rel_{r,f} = \left(1 - \frac{\sum_{w \in RW_r(f)} weight_R(w, f) \cdot \log P(w \mid f)}{\left| RW_r(f) \right|}\right) + \left(\frac{\sum_{w \in RW_R(f) \setminus RW_r(f)} \big(\log P(f) - \log P(w, f)\big)}{\left| RW_R(f) \setminus RW_r(f) \right|}\right) \qquad (14)$$
The lower the value of 𝑤𝑒𝑖𝑔ℎ𝑡𝑒𝑑_𝑟𝑒𝑙𝑟,𝑓, the more relevant the review is to the main feature 𝑓. Based on the weighted relevance score, a set of reviews is selected. We call this method Review Selection based on Weighted Relevance (RSWR).
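As an illustration of how Equations (11), (13) and (14) fit together, the Java sketch below computes the weighted relevance score of one review. It is a minimal sketch, assuming the probabilities P(f), P(w, f), P(w|f) and the Chapter 3 word weights have already been estimated and stored in maps; all identifiers are illustrative, not the thesis code.

import java.util.Map;
import java.util.Set;

/** Illustrative sketch of the RSWR weighted relevance score (Equation 14). */
public class WeightedRelevance {

    /** Normalized SPE of Equation (11): mean code length of the related
     *  words of f that the review r does NOT contain. Assumes pJoint has
     *  an entry for every word in rwR. */
    static double speNormalized(Set<String> rwR, Set<String> rwr,
                                double pF, Map<String, Double> pJoint) {
        double sum = 0.0;
        int missing = 0;
        for (String w : rwR) {
            if (rwr.contains(w)) continue;           // skip words present in r
            sum += Math.log(pF) - Math.log(pJoint.get(w));
            missing++;
        }
        return missing == 0 ? 0.0 : sum / missing;
    }

    /** Direct relevance distance of Equation (13), computed from the
     *  related words that DO occur in the review r. */
    static double direDistRel(Set<String> rwr,
                              Map<String, Double> weight,
                              Map<String, Double> pCond) {
        double sum = 0.0;
        for (String w : rwr) {
            sum += weight.get(w) * Math.log(pCond.get(w));  // weight_R(w,f) * log P(w|f)
        }
        return rwr.isEmpty() ? 1.0 : 1.0 - sum / rwr.size();
    }

    /** Equation (14): lower scores mean more relevant reviews. */
    static double weightedRel(Set<String> rwR, Set<String> rwr, double pF,
                              Map<String, Double> pJoint,
                              Map<String, Double> weight,
                              Map<String, Double> pCond) {
        return direDistRel(rwr, weight, pCond)
             + speNormalized(rwR, rwr, pF, pJoint);
    }
}

Ranking the review collection by this score in ascending order and keeping the top N then yields the RSWR output for the feature.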
4.4 SUMMARY
This chapter discussed the proposed method to select helpful reviews for a single feature. Firstly, an overview of current helpful review selection methods was given to show the need for review selection for a single feature. Criteria for a helpful review for a single feature were then identified based on previous studies of online review selection. The review selection method for a single feature was then proposed to cover two important criteria of a helpful review: the amount of opinion and the level of detail of the discussion about the target feature in the review. The review selection model combines the direct relevance and the information distance of related words. As the related words carry the complete information about the feature, a set of helpful reviews intensively discussing the feature can be retrieved. The experiments and evaluation in the next chapter provide evidence of the superior performance of the proposed review selection model.
Chapter 5: Experiments and Evaluation
In this chapter, we describe several experiments to evaluate our proposed methods from Chapter 3 and Chapter 4. We first evaluate the review selection method proposed in Chapter 4. We then use this review selection method to evaluate our method of related word selection proposed in Chapter 3. For both evaluations, we compare the performance of our models based on their ability to generate helpful reviews for a single feature.
5.1 EXPERIMENTAL ENVIRONMENT
For our development environment, we used a laptop with an Intel Core i7 CPU and 16 GB of RAM, running the Windows 7 operating system. All of our experiments were implemented in the Java programming language. A Graphical User Interface (GUI) was built using NetBeans IDE 8.0.2 for easy interaction between the user and the software during our experiments. A number of Java library packages were used in our software, including:
Mathematical statistics library: Apache Commons Mathematics Library release 3.5.
Natural language processing packages: json-simple-1.1, the Stanford Log-linear Part-Of-Speech Tagger and the Java WordNet Library (JWNL).
Open-source data mining library: SPMF.
5.2 EXPERIMENT DESIGN
5.2.1 Dataset constructions
The output of our model is a subset of reviews, which need to be evaluated. The
helpfulness score of the reviews has been considered as the gold standard to determine
the review helpfulness and has been using in a variety of studies. In this thesis, we
base the evaluation of our proposed model on this gold standard. The helpfulness
score is therefore one first and foremost criterion for choosing a reviews dataset for
experiment. In detail, a majority of reviews in the dataset should be voted for by users,
to indicate if they are helpful. In our experiment, we select two kinds of datasets
having helpfulness votes by customers, including a review collection from electronic
58 Chapter 5: Experiments and Evaluation
products (digital camera) and review collection from the food industry (restaurant) to
conduct our experiments.
For the first kind of dataset, we chose a digital camera review dataset collected from Amazon (http://amazon.com), as Amazon has recently become one of the most popular websites for research work on review recommendation (Ghose & Ipeirotis, 2006; Kim, et al., 2006; Liu, 2012). In addition, Amazon products, especially digital cameras, have a significant number of reviews, which is sufficient for our experiment. We crawled a collection of customer reviews of a number of digital camera products published before December 2011 on Amazon.com. The downloaded reviews were pre-processed by stripping HTML tags and removing irrelevant information. Each review has the following information.
Product rating: from 1 to 5 stars.
Author name: unique Amazon user identification.
Time: date when the review was posted.
Product name: name of the product, e.g., Canon 7D.
Review text content.
Helpfulness vote: the number of readers saying the review is helpful.
For the second kind of dataset, we use a publicly available dataset provided by the RecSys conference, which was used in a RecSys competition organised by Yelp in 2013 (https://www.kaggle.com/c/yelp-recsys-2013). The Yelp datasets include detailed data on over 10,000 businesses, 8,000 check-in sites, 40,000 users, and 200,000 reviews from the Phoenix, AZ metropolitan area. The reason for using these datasets is that they have been popularly used in research areas such as opinion mining and recommendation systems, so their reliability and feasibility have been confirmed. Each review in the Yelp datasets is a JSON record with the following structure:
{
'type': 'review',
'business_id': (encrypted business id),
'user_id': (encrypted user id),
'stars': (star rating),
'text': (review text),
'date': (date),
'votes': {'useful': (count), 'funny': (count), 'cool': (count)}
}
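As a concrete illustration, one review per line can be read with the json-simple library listed in Section 5.1. The sketch below is minimal; the input file name is illustrative, and error handling is kept to the essentials.

import java.io.BufferedReader;
import java.io.FileReader;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

/** Illustrative sketch: reading Yelp reviews (one JSON object per line). */
public class YelpReader {
    public static void main(String[] args) throws Exception {
        JSONParser parser = new JSONParser();
        try (BufferedReader in = new BufferedReader(
                new FileReader("yelp_reviews.json"))) {      // file name is illustrative
            String line;
            while ((line = in.readLine()) != null) {
                JSONObject review = (JSONObject) parser.parse(line);
                String text = (String) review.get("text");
                JSONObject votes = (JSONObject) review.get("votes");
                long useful = (Long) votes.get("useful");
                long funny  = (Long) votes.get("funny");
                long cool   = (Long) votes.get("cool");
                // Only the text and the vote counts are kept for the experiment.
                System.out.printf("useful=%d funny=%d cool=%d chars=%d%n",
                        useful, funny, cool, text.length());
            }
        }
    }
}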
In this study, we only keep the text content of each review to carry out the experiment and the review helpfulness votes to evaluate our results. Reviews having fewer than two helpfulness votes were not sufficient for the evaluation, so we removed them.
Two criteria used to choose potential datasets are as follows:
Criterion 1: the dataset has a sufficient number of reviews (at least 300 reviews) and each review has at least three votes.
Criterion 2: reviews have a sufficient number of words (the average number of words in a review should be greater than 100).
According to the criteria above, we first filtered out reviews having fewer than three helpfulness votes. We then selected datasets having sufficient reviews (more than 300 reviews) and a sufficient average number of words per review (more than 100 words). We found that even though the Yelp datasets cover over 10,000 businesses and 200,000 reviews, most of them do not satisfy our criteria for selecting datasets. For example, the dataset named “Four Peaks Brewing Co” originally has 735 reviews but ends up with only 197 reviews after filtering, so it cannot be used. The three datasets meeting the criteria and selected for our experiment were “Cibo”, “Fez” and “Pizzeria Bianco”.
In general, four datasets from the digital camera category and three datasets from the restaurant category of the Yelp data are used in our experiment, as shown in Table 3.
Table 3. Dataset Information for Digital Camera and Restaurant Businesses
Category         Dataset Name              Number of reviews   Average words per review
Digital Camera   Canon 6D (CAM1)           350                 127
                 Canon 5D (CAM2)           363                 145
                 Canon 7D (CAM3)           323                 119
                 Canon T3 (CAM4)           375                 123
Restaurant       Cibo (REST1)              421                 133
                 Fez (REST2)               375                 142
                 Pizzeria Bianco (REST3)   332                 122
5.2.2 Baseline model
Random
This method randomly selects a set of reviews and uses it for comparison. It is the most basic selection strategy, producing a set of random reviews without any bias in a short period of time. In our research, we use this method to randomly select N reviews and use them as a baseline for comparison with our method.
Maximum Coverage Greedy (MCG)
The method proposed by Tsaparas, et al. (2011) selects a set of high-quality reviews that cover many different aspects of the product or service.
Specialised Review Selection (SRS)
The approach most closely related to ours is the SRS method proposed by Long, et al. (2014). Although this method focuses on estimating feature ratings, their work does include selecting specialised reviews for a single feature. We therefore choose SRS as our baseline model.
5.2.3 Proposed Methods
The first proposed method is our review selection method from Chapter 4, named Review Selection based on Weighted Relevance (RSWR). In order to evaluate RSWR, we compare the performance of our model with the baseline SRS.
Secondly, we aim to test our methods of selecting related words proposed in Chapter 3. In general, there are three main proposed methods of selecting related words.
WordNet and Sentiment Related Words Selection (WSRWS)
This related word selection method identifies the set of related words, consisting of similar words and sentiment words, for the main features. It is discussed in Sections 3.2.1 and 3.2.2.
Topical and WordNet Related Words Selection (TRWS)
This method finds words related to the main feature by using the external ontology WordNet and a traditional Topic Model (LDA), as discussed in Sections 3.2.3 and 3.2.5.
Pattern based Topic Model and WordNet Related Word Selection (PTRWS)
This is an improved version of TRWS in which a Pattern based Topic Model is used instead of the traditional Topic Model (LDA), as discussed in Sections 3.2.4 and 3.2.5.
5.2.4 Evaluation Metrics
The performance of the proposed review selection system can be evaluated according to the method's ability to select high-quality and helpful reviews for the target feature. In order to evaluate the performance of the proposed approaches, we compare the top-N reviews selected by our proposed methods and by the baseline models. A variety of metrics, including Average Helpfulness Score, Amazon Top Ranking, Precision, Recall, F-score, and Normalized Discounted Cumulative Gain, were used as our evaluation metrics.
5.2.4.1 Helpfulness Average Score
As mentioned, our review datasets come from two sources: digital cameras from Amazon.com and restaurant businesses from Yelp.com. In this section, we discuss our method of obtaining helpfulness scores for the reviews in our datasets.
Reviews in the digital camera datasets have votes for helpfulness and unhelpfulness. We use the number of helpfulness votes and the total number of votes to determine the helpfulness score of a review:
$$Helpful(r) = \frac{Helpfulness\_votes(r)}{Helpfulness\_votes(r) + Unhelpfulness\_votes(r)}, \quad r \in R$$
For example, for a review, if 120 people say this review is helpful and 80 people say
the review is unhelpful, then the helpfulness score is 0.6 (120/200).
Similarly, the Yelp website allows users to vote on each review to indicate whether it is helpful from their perspective. Each review is associated with votes in three different categories, namely “useful”, “funny” and “cool”. We use the votes in the “useful” category to determine the helpfulness of the review. In detail, the helpfulness score of each review is calculated as the ratio of the review's “useful” vote count to its total vote count:
$$Helpful(r) = \frac{useful\_votes(r)}{useful\_votes(r) + funny\_votes(r) + cool\_votes(r)}$$
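A minimal sketch of the two helpfulness-score formulas is given below; it reproduces the worked example above, where 120 helpful votes out of 200 give a score of 0.6. Method names are ours.

/** Illustrative sketch of the two helpfulness-score formulas. */
public class Helpfulness {

    /** Amazon reviews: helpful votes over total votes, e.g. (120, 80) -> 0.6. */
    static double amazonHelpfulness(int helpfulVotes, int unhelpfulVotes) {
        int total = helpfulVotes + unhelpfulVotes;
        return total == 0 ? 0.0 : (double) helpfulVotes / total;
    }

    /** Yelp reviews: "useful" votes over all votes in the three categories. */
    static double yelpHelpfulness(int useful, int funny, int cool) {
        int total = useful + funny + cool;
        return total == 0 ? 0.0 : (double) useful / total;
    }
}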
We evaluate the performance of our approach by comparing the average helpfulness score of the top 10 and top 15 reviews generated by our proposed approach to that of the baseline. The higher the average helpfulness score, the better the performance of the approach. The result is confirmed by using a t-test and p-value to determine the statistical significance of the difference between the results.
t-test and p-value
The t-test is one of the most popular statistical tests for determining whether the mean of one population differs significantly from the mean of another. Given two sets of reviews generated by two models, the significance of the difference between the mean of the first set ($\bar{X}_1$) and the mean of the second set ($\bar{X}_2$) can be measured using a t-test. The t-value is calculated as:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $n_1$ and $n_2$ are the sizes of the two sets and $s_1$ and $s_2$ are their standard deviations, respectively. The p-value can be obtained directly from the t-value. In our experiment, we choose a significance level (alpha) of 0.05 as the significance boundary.
5.2.4.2 Amazon Top Ranking
Amazon.com is the most popular e-commerce website and has its own algorithm to rank reviews in descending order of quality. The algorithm is designed by experts and takes into account factors such as helpfulness votes and the time when the review was created. In addition, Amazon has approaches to avoid spam reviews, so the top reviews of digital cameras on Amazon.com can be considered high-quality reviews. In our experiment, we therefore also used the top-N reviews ranked by Amazon.com as ground truth for the evaluation of our proposed models. A number of traditional metrics, including Precision, Recall and F-score, were used to analyse the results of the experiment. Precision indicates the proportion of the selected reviews that are high quality, while Recall reflects the proportion of Amazon.com's top reviews that are covered by the selected reviews. The F-score is the harmonic mean of Precision and Recall.
Let $Top\_k_{amazon} = \{r_1, r_2, \ldots, r_k\}$ and $Top\_t_{model} = \{r_1, r_2, \ldots, r_t\}$, with $t < k$, be the top 𝑘 high-quality reviews returned by Amazon.com and the top 𝑡 reviews returned by the examined method, respectively. In our experiment, we choose k = 30, i.e., the top 30 reviews from Amazon.com. These top 30 reviews serve as the ground truth, and the top 10 and 15 (t = 10, 15) reviews returned by the examined model serve as the examined review sets.
Precision
Precision at t, k of a model ($P@t_k^{model}$) is defined as:

$$P@t_k^{model} = \frac{\left| Top\_k_{amazon} \cap Top\_t_{model} \right|}{t}, \quad t \in \{10, 15\}$$
Recall
Recall at t, k of a model ($R@t_k^{model}$) is defined as:

$$R@t_k^{model} = \frac{\left| Top\_k_{amazon} \cap Top\_t_{model} \right|}{k}, \quad k = 30$$
F-score
The F-measure at t, k of a model ($F@t_k^{model}$) is defined as:

$$F@t_k^{model} = \frac{2 \cdot P@t_k^{model} \cdot R@t_k^{model}}{P@t_k^{model} + R@t_k^{model}}$$
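The following sketch illustrates these three metrics over lists of review identifiers. The names are ours, and reviews are assumed to be identified by unique ids; the lists correspond to the Amazon top-k and the model's top-t reviews.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of Precision, Recall and F-score against the
 *  Amazon top-k ground truth (t = 10 or 15, k = 30 in our experiments). */
public class TopKMetrics {

    /** Size of the intersection of the two review lists. */
    static int overlap(List<String> topKAmazon, List<String> topTModel) {
        Set<String> truth = new HashSet<>(topKAmazon);
        int hits = 0;
        for (String id : topTModel) if (truth.contains(id)) hits++;
        return hits;
    }

    static double precision(List<String> amazon, List<String> model) {
        return (double) overlap(amazon, model) / model.size();   // divide by t
    }

    static double recall(List<String> amazon, List<String> model) {
        return (double) overlap(amazon, model) / amazon.size();  // divide by k
    }

    static double fScore(List<String> amazon, List<String> model) {
        double p = precision(amazon, model), r = recall(amazon, model);
        return (p + r) == 0 ? 0.0 : 2 * p * r / (p + r);
    }
}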
5.2.4.3 Normalised discounted cumulative gain
Precision and Recall can check whether the returned reviews are in $Top\_k_{amazon}$; however, they are unable to examine the positions of the reviews in the list. In fact, the position of the reviews in $Top\_t_{model}$ is also a factor in deciding the performance of the model. Consider, for example, the lists of top high-quality reviews returned in descending quality order by two models, Model A and Model B (Figure 10). Model A and Model B each return six reviews, four of which are in the Ground Truth Review Set, so the Precision of the two models is equal (4/6). Nevertheless, Review Set 1 (from Model A) ranks review 3 higher than review 4, which matches the Ground Truth Review Set, whereas Review Set 2 (from Model B) ranks them in the opposite order. Therefore, Model A should be credited with higher performance than Model B.
Figure 10. Review Position Problem
Discounted cumulative gain (DCG) is a measure of ranking quality that is very popular in Information Retrieval (Järvelin & Kekäläinen, 2002). In this thesis, DCG is therefore used to measure the gain of each review by comparing its position in $Top\_t_{model}$ with its position in $Top\_k_{amazon}$. For a review $r_i \in Top\_t_{model}$, the gain of review $r_i$ is defined as:

$$G_{r_i}@k = \begin{cases} 5 \cdot \left(1 - \dfrac{\left| Ip_{r_i,Amazon} - Ip_{r_i,model} \right|}{t}\right), & \text{if } r_i \in Top\_k_{amazon} \\[2ex] 0, & \text{if } r_i \notin Top\_k_{amazon} \end{cases}$$
where $Ip_{r_i,Amazon}$ is the rank position of review $r_i$ in the $Top\_k_{amazon}$ reviews and $Ip_{r_i,model}$ is the rank position of review $r_i$ in $Top\_t_{model}$.
According to the formula, the maximum value of $G_{r_i}@k$ is 5, obtained when the rank position of $r_i$ in $Top\_k_{amazon}$ is the same as its rank position in $Top\_t_{model}$. The value of $G_{r_i}@k$ becomes smaller as the distance between the two rank positions grows. In the worst case, where the returned review is not in $Top\_k_{amazon}$, the gain takes its minimum value (0).
The gain for $Top\_t_{model}$ is called the discounted cumulative gain (DCG) and is calculated by accumulating the gain of each review in $Top\_t_{model}$. The term “discounted” is used because the gain of a review ranked lower in $Top\_t_{model}$ is reduced by an amount logarithmically proportional to the position of the review. The discounted cumulative gain of $Top\_t_{model}$ is defined as:

$$DCG_{Top\_t_{model}} = \sum_{i=1}^{t} \frac{2^{G_{r_i}@k} - 1}{\log_2(i + 1)}$$
Finally, the discounted cumulative gain is normalised as:

$$nDCG_{Top\_t_{model}} = \frac{DCG_{Top\_t_{model}}}{IDCG_{Top\_t_{model}}}$$
where $IDCG_{Top\_t_{model}}$ is the ideal DCG (IDCG), the maximum possible DCG for all reviews in $Top\_t_{model}$. This ideal DCG is obtained when every review in $Top\_t_{model}$ has exactly the same position as in $Top\_k_{amazon}$. The normalised discounted cumulative gain is calculated for both the baseline models and our proposed model and used to compare and evaluate the performance of our proposed methods.
5.3 RESULT ANALYSIS AND EVALUATION
In this section, we analyse and evaluate the results obtained from the experiments. First of all, we evaluate our review selection method (Chapter 4) in Section 5.3.1. We then evaluate our method of finding related words of the main features in Section 5.3.2.
Both evaluations require a main feature as input to the model. We used the pattern mining method (Section 3.1) to extract the list of main features. Table 4 gives the list of the top examined features in each dataset, which are used as the input main features to the review selection model.
Table 4. Main Features of Seven Datasets
Datasets f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11
CAM1 Body Display Sensor Picture Mode Grip Shutter Zoom Battery Auto Video
CAM2 Body Sensor Menu Software Picture Battery Shutter Iso Autofocus Detail Exposure
CAM3 Viewfinder Button Zoom Aperture Speed Battery Weight Appearance Noise Light Exposure
CAM4 Viewfinder Focus Aperture Noise Light Battery Weight Auto Manual Memory
REST1 Service Cheese Atmosphere Burger Dessert Location Menu Star Staff Wine Time
REST2 Service Place Hour Drink Menu Atmosphere Cheese Price Brunch Sauce Music
REST3 Hour Wait Oven Drink Service Atmosphere Wine Pie Salad Time Onion
5.3.1 Review Selection Evaluation
In Chapter 4, we proposed our review selection method for a single feature, named Review Selection based on Weighted Relevance (RSWR). In this section, we verify the performance of RSWR by comparing its ability to select helpful reviews with that of the Specialised Review Selection method (SRS) proposed by Long, et al. (2014). In more detail, we give the same input, an examined main feature and the set of its related words, to SRS and RSWR to generate review sets. We evaluate RSWR and SRS based on the top 10 and top 15 reviews in those generated review sets, using the evaluation metrics discussed in Section 5.2.4. The examined features are listed in Table 4, while the related words of the main features are the set of similar words and sentiment words discussed in Sections 3.2.1 and 3.2.2.
5.3.1.1 Helpfulness Score
Table 5 provides the average helpfulness scores for the eleven examined single features of dataset CAM1, and Table 6 shows the final average helpfulness scores for the seven datasets. In general, the average helpfulness scores of both the top 10 and top 15 reviews generated by RSWR are always higher than those of SRS. These results demonstrate the improved performance of our model in selecting helpful reviews for single features. The reason is that our method takes into consideration both the direct relevance and the information distance (as discussed in Section 4.3.2). The direct relevance indicates the degree of relevance of the review to the feature, while the information distance indicates how far the review is from the feature. A review is more relevant to the feature if the direct relevance is high and the information distance is low. The SRS method takes into consideration only the information distance, not the direct relevance. Therefore, RSWR can better extract the helpful reviews.
Table 5. Helpfulness Score for Main Features of CAM1
Model f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 Average
Top 10 Random 0.573 0.621 0.531 0.832 0.498 0.721 0.560 0.592 0.614 0.549 0.369 0.587
MCG 0.787 0.732 0.669 0.695 0.738 0.689 0.780 0.829 0.715 0.818 0.714 0.742
SRS 0.814 0.831 0.831 0.798 0.835 0.792 0.841 0.846 0.845 0.797 0.812 0.822
RSWR 0.849 0.881 0.814 0.821 0.846 0.814 0.860 0.873 0.875 0.807 0.874 0.847
Top 15 Random 0.423 0.538 0.401 0.612 0.406 0.553 0.641 0.473 0.699 0.602 0.585 0.539
MCG 0.717 0.709 0.677 0.689 0.728 0.826 0.714 0.682 0.605 0.719 0.671 0.703
SRS 0.824 0.815 0.815 0.803 0.815 0.788 0.861 0.832 0.801 0.820 0.790 0.815
RSWR 0.860 0.830 0.832 0.820 0.814 0.794 0.871 0.822 0.816 0.821 0.814 0.827
Table 6. Average Helpful Score of seven datasets
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3
Top 10 Random 0.587 0.587 0.568 0.609 0.554 0.462 0.526
MCG 0.742 0.718 0.730 0.729 0.710 0.619 0.732
SRS 0.822 0.843 0.805 0.852 0.712 0.739 0.717
RSWR 0.847 0.865 0.861 0.896 0.734 0.773 0.773
Top 15 Random 0.587 0.543 0.555 0.576 0.502 0.590 0.527
MCG 0.742 0.702 0.683 0.782 0.751 0.715 0.729
SRS 0.822 0.844 0.792 0.857 0.757 0.748 0.747
RSWR 0.827 0.854 0.844 0.882 0.781 0.785 0.788
t-test
In order to confirm the superior performance of RSWR over SRS in selecting helpful reviews, we further use a t-test (as discussed in Section 5.2.4.1) to verify the significance of the difference in average helpfulness scores. Table 7 shows the average p-value for each dataset.
Table 7. Mean Significance Difference t-test
CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3
p-value (Top 10) 0.00563 0.0005703 0.0007311 0.000493 0.000865 0.0001242 0.0027621
p-value (Top 15) 0.0007772 0.06091 0.0005713 0.000379 0.0004801 0.000776 0.002323
The p-values in the table clearly show that most of the values obtained from the t-test are smaller than the significance level (5%), except for the top 15 reviews of dataset CAM2. Although we cannot conclude a significant difference between RSWR and SRS for dataset CAM2, the value of 0.06091 is not very far from 5%. In general, the t-test gives confidence in the improved performance of RSWR over SRS.
5.3.1.2 Amazon Top Ranking
In addition to the reviews' helpfulness scores, we use the Amazon ranking of top reviews as a second evaluation metric to compare our method to the baseline methods. Given the top-N high-ranking reviews returned by the Amazon algorithm, we would like to measure how many reviews in the top N can be selected by RSWR and by the baselines. The Precision, Recall and F-score of the returned review sets were used to determine the performance of RSWR and the baselines. Table 8, Table 9 and Table 10 show the average Precision, Recall and F-score values for the seven datasets.
Table 8. Precision of top-10 and top-15 reviews returned by RSWR and baselines
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.064 0.036 0.055 0.055 0.073 0.045 0.071 0.057
MCG 0.129 0.102 0.125 0.112 0.105 0.126 0.178 0.125
SRS 0.127 0.109 0.100 0.136 0.127 0.173 0.162 0.133
RSWR 0.164 0.155 0.200 0.127 0.164 0.100 0.174 0.155
Top 15 Random 0.036 0.073 0.048 0.067 0.073 0.055 0.049 0.057
MCG 0.192 0.121 0.149 0.179 0.138 0.178 0.121 0.154
SRS 0.200 0.188 0.145 0.170 0.127 0.200 0.213 0.178
RSWR 0.230 0.145 0.170 0.224 0.164 0.242 0.221 0.199
Table 9. Recall of top-10 and top-15 reviews returned by RSWR and baselines
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.070 0.067 0.042 0.070 0.073 0.039 0.024 0.055
MCG 0.017 0.082 0.089 0.091 0.092 0.078 0.065 0.073
SRS 0.052 0.112 0.073 0.073 0.048 0.082 0.071 0.073
RSWR 0.079 0.079 0.124 0.106 0.112 0.100 0.097 0.100
Top 15 Random 0.076 0.073 0.073 0.052 0.067 0.109 0.118 0.081
MCG 0.155 0.079 0.167 0.092 0.089 0.091 0.129 0.115
SRS 0.136 0.100 0.118 0.061 0.112 0.112 0.103 0.106
RSWR 0.212 0.200 0.200 0.188 0.170 0.158 0.147 0.182
Table 10. F-score of top-10 and top-15 reviews returned by RSWR and baselines
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.067 0.047 0.048 0.062 0.073 0.042 0.036 0.053
MCG 0.030 0.091 0.104 0.100 0.098 0.096 0.095 0.088
SRS 0.074 0.110 0.084 0.095 0.070 0.111 0.099 0.092
RSWR 0.107 0.105 0.153 0.116 0.133 0.100 0.125 0.120
Top 15 Random 0.049 0.073 0.058 0.059 0.070 0.073 0.069 0.064
MCG 0.172 0.096 0.157 0.122 0.108 0.120 0.125 0.129
SRS 0.162 0.131 0.130 0.090 0.119 0.144 0.139 0.131
RSWR 0.221 0.168 0.184 0.204 0.167 0.191 0.177 0.187
5.3.1.3 Normalized Discounted Cumulative Gain
As discussed in the Evaluation Metrics section, normalised discounted cumulative gain takes the positions of reviews into consideration, so it can further verify the performance of RSWR. Table 11 demonstrates the improved performance of RSWR over the baseline models.
Table 11. Normalized Discounted Cumulative Gain
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.019 0.004 0.029 0.014 0.028 0.013 0.011 0.017
MCG 0.011 0.175 0.324 0.036 0.121 0.104 0.115 0.127
SRS 0.167 0.165 0.239 0.089 0.219 0.214 0.134 0.175
RSWR 0.287 0.167 0.378 0.134 0.318 0.276 0.175 0.248
Top 15 Random 0.0206 0.0039 0.0221 0.0184 0.0064 0.015 0.0021 0.013
MCG 0.117 0.104 0.189 0.752 0.190 0.189 0.110 0.236
SRS 0.286 0.298 0.205 0.1 0.178 0.176 0.173 0.202
RSWR 0.295 0.269 0.288 0.167 0.298 0.204 0.281 0.257
In general, according to the results for helpfulness score and Amazon top ranking, our proposed review selection method always achieves higher results than the baseline models. This clearly demonstrates the superior performance of RSWR.
5.3.2 Related Word Selection Evaluation
In Chapter 3, we proposed new methods to identify the related words of the main feature. The correct identification of the related words is important, as they help make the features more understandable, while wrong identification of those related words can provide wrong information about the target features of a product. In this section, we evaluate our proposed related word selection methods, named WSRWS, TRWS and PTRWS. First, WSRWS, TRWS and PTRWS are used to generate different sets of related words. Those related word sets are then input to our proposed review selection method (RSWR) to generate different corresponding sets of reviews. Similar to the evaluation of our review selection method in Section 5.3.1, we evaluate WSRWS, TRWS and PTRWS based on the top 10 and top 15 reviews of those generated review sets, using the evaluation metrics Helpfulness Score, Amazon Top Ranking and Normalized Discounted Cumulative Gain.
5.3.2.1 Helpfulness Score
Table 12 provides the detailed average helpfulness scores for one dataset (CAM1), while Table 13 summarises the average helpfulness scores for the seven datasets. According to the results, TRWS and PTRWS have higher helpfulness scores than WSRWS. This confirms the usefulness of incorporating a probabilistic topic model into the task of related word identification. First of all, the Topic Model is domain specific, which helps to identify related feature words that are buried in the review collection. Those related features cannot be found by external ontologies such as WordNet and Google Distance or by other supervised methods. Secondly, the words in each topic, reflecting one aspect of the product, have tight relationships with each other. Because of these relationships, words related to the feature can be further confirmed. In addition, the combination of WordNet and the Topic Model can help find a set of shared related words. As those shared related words can be found by different methods, the degree of relatedness of the words in the shared group to the main feature is further confirmed. Updating shared related words with higher weights can increase the importance of those related words to the main feature (detailed in Section 3.2.5). As a result, our method can not only find related words but also discover the corresponding weights of those related words correctly. Among the three methods, the highest helpfulness score of PTRWS also confirms the usefulness of the Pattern based Topic Model. As patterns can represent semantic meaning better than single words, the patterns in the Pattern based Topic Model can assist in identifying related words more effectively.
Table 12. Helpfulness Score for Main Features of CAM1
Model f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 Average
Top 10 WSRWS 0.849 0.881 0.814 0.821 0.846 0.814 0.860 0.873 0.875 0.807 0.874 0.841
TRWS 0.868 0.905 0.834 0.821 0.872 0.863 0.868 0.884 0.807 0.874 0.868 0.862
PTRWS 0.878 0.939 0.877 0.845 0.886 0.879 0.879 0.897 0.812 0.871 0.878 0.883
Top 15 WSRWS 0.860 0.830 0.832 0.820 0.814 0.794 0.871 0.822 0.816 0.821 0.814 0.832
TRWS 0.871 0.860 0.862 0.824 0.834 0.873 0.876 0.845 0.821 0.844 0.871 0.857
PTRWS 0.876 0.874 0.883 0.850 0.844 0.842 0.874 0.888 0.866 0.839 0.884 0.863
Table 13. Average Helpful Score of seven Datasets
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.847 0.865 0.861 0.896 0.734 0.773 0.771 0.821
TRWS 0.860 0.863 0.883 0.916 0.769 0.801 0.792 0.841
PTRWS 0.876 0.894 0.895 0.931 0.782 0.818 0.801 0.857
Top 15 WSRWS 0.827 0.854 0.844 0.882 0.781 0.785 0.712 0.812
TRWS 0.853 0.863 0.862 0.894 0.805 0.796 0.774 0.835
PTRWS 0.865 0.882 0.871 0.921 0.811 0.823 0.810 0.855
t-test
The t-test is also used to compare the significance of the difference in the mean values of two generated sets of reviews. More specifically, we compare the significance of the difference between the review sets generated by WSRWS and TRWS, and between the review sets generated by WSRWS and PTRWS. Table 14 shows the p-values of the t-tests for the seven datasets. In general, most p-values are less than the significance level (5%), except for CAM3. Although we cannot conclude a significant difference in the average values of WSRWS and TRWS for dataset CAM3, the results for the six remaining datasets still demonstrate a significant difference in most cases. The improved performance of TRWS and PTRWS over WSRWS in terms of helpfulness score is thus evidenced.
Table 14. Mean Significance Difference t-test
Models CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3
Top 10 WSRWS vs TRWS 0.0008953 0.0019027 0.4893854 0.0092010 0.0007412 0.0003682 0.0020888
WSRWS vs PTRWS 0.0000752 0.0004558 0.0065319 0.0002363 0.0003490 0.0000731 0.003002
Top 15 WSRWS vs TRWS 0.0001960 0.0007283 0.0141589 0.0487490 0.0001530 0.0008521 0.007183
WSRWS vs PTRWS 0.0004969 0.0003715 0.0042177 0.0074329 0.000031 0.000027 0.0062827
5.3.2.2 Amazon Top Ranking
Similar to the review selection evaluation, we use the Amazon Top Ranking from Amazon.com to further verify the performance of our proposed related word selection methods. Table 15, Table 16 and Table 17 show the average Precision, Recall and F-score results for the seven datasets for the top 10 and top 15 reviews. According to the results, PTRWS outperforms the two other proposed methods, and TRWS has a higher performance than WSRWS in most cases. This again confirms the superior performance of our methods in terms of Amazon Top Ranking.
Table 15. Precision of top-10 and top-15 returned reviews
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.164 0.155 0.200 0.127 0.164 0.100 0.132 0.149
TRWS 0.282 0.255 0.245 0.264 0.218 0.227 0.219 0.244
PTRWS 0.318 0.291 0.336 0.400 0.309 0.364 0.297 0.331
Top 15 WSRWS 0.230 0.145 0.170 0.224 0.164 0.242 0.216 0.199
TRWS 0.248 0.248 0.261 0.224 0.327 0.285 0.276 0.267
PTRWS 0.382 0.339 0.345 0.352 0.285 0.352 0.314 0.338
Table 16. Recall of top-10 and top-15 returned reviews
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.079 0.079 0.124 0.106 0.112 0.100 0.124 0.103
TRWS 0.088 0.097 0.118 0.100 0.121 0.088 0.167 0.111
PTRWS 0.142 0.161 0.133 0.115 0.091 0.158 0.177 0.140
Top 15 WSRWS 0.133 0.121 0.103 0.124 0.115 0.133 0.121 0.121
TRWS 0.188 0.112 0.145 0.139 0.136 0.115 0.156 0.142
PTRWS 0.155 0.209 0.191 0.197 0.167 0.218 0.194 0.190
Table 17. F-score of top-10 and top-15 returned reviews
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.107 0.105 0.153 0.116 0.133 0.100 0.128 0.120
TRWS 0.134 0.141 0.159 0.145 0.156 0.127 0.189 0.150
PTRWS 0.196 0.207 0.191 0.179 0.141 0.220 0.222 0.194
Top 15 WSRWS 0.169 0.132 0.128 0.160 0.135 0.172 0.155 0.150
TRWS 0.214 0.154 0.186 0.172 0.192 0.164 0.199 0.183
PTRWS 0.221 0.259 0.246 0.253 0.211 0.269 0.240 0.242
5.3.2.3 Normalized Discounted Cumulative Gain
Finally, the normalised discounted cumulative gain results in Table 18 reaffirm that TRWS and PTRWS outperform WSRWS, with PTRWS having the best performance.
Table 18. Normalized Discounted Cumulative Gain
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.287 0.167 0.378 0.134 0.318 0.276 0.221 0.254
TRWS 0.295 0.179 0.372 0.169 0.341 0.293 0.227 0.268
PTRWS 0.297 0.252 0.381 0.185 0.319 0.314 0.254 0.286
Top 15 WSRWS 0.295 0.269 0.288 0.167 0.298 0.204 0.218 0.248
TRWS 0.298 0.286 0.312 0.212 0.326 0.211 0.220 0.266
PTRWS 0.384 0.373 0.367 0.283 0.372 0.237 0.256 0.325
In general, according to the results for helpfulness score, Amazon top ranking and normalised discounted cumulative gain, our proposed methods consistently achieve higher results than the baselines. This clearly demonstrates the superior performance of our proposed related word selection methods.
5.4 SUMMARY
In this chapter, we designed a number of experiments to evaluate the methods proposed in Chapter 3 and Chapter 4. The evaluation was carried out in two parts: review selection evaluation and related word selection evaluation. In the first evaluation, we compared the performance of our review selection method (RSWR) with the Specialised Review Selection method (SRS) proposed by Long, et al. (2014). The results clearly show the outstanding performance of RSWR over SRS in terms of Helpfulness Score and Amazon Top Ranking. Further evaluation with normalised discounted cumulative gain reaffirms that our review selection method is more effective in identifying helpful reviews for a single feature. In the second evaluation, we evaluated our three related word selection methods, WSRWS, TRWS and PTRWS. Similarly, we compared the performance of those methods using Helpfulness Score, Amazon Top Ranking and normalised discounted cumulative gain. The results indicate that PTRWS produces the most accurate helpful review sets, followed by TRWS. This clearly demonstrates the power of incorporating the Topic Model and the Pattern based Topic Model into the proposed method.
Chapter 6: Conclusions
In this chapter we list the achievements and limitations of this study. Potential directions for future research are also proposed.
6.1 CONCLUSION
Online reviews have become an invaluable reference source for customers in recent times. However, information overload in review content is a big issue for readers. The research area of review selection has been facing two research problems: ambiguity in review content and the need for work on review selection according to a single feature. In this thesis, we proposed methods to solve those research problems and achieved two primary research outcomes.
First of all, new methods are employed to reduce the ambiguity of review content by identifying features and the related words of those features. By using data mining techniques, natural language processing, an ontology and a probabilistic Topic Model, this work can effectively identify words related to the main features of the product. As a result, polysemy and synonym issues in review content can be significantly reduced. Our experiments in Section 5.3.2 verify the superior performance of our related word selection methods.
Secondly, we propose a new method of selecting reviews for a single feature. As customers have different backgrounds, contexts and situations, the importance of each feature to them is also different. They normally expect to know about the features that are more necessary for them than other features of the product. However, most previous research works do not focus on selecting helpful reviews that intensively discuss one single feature. In this research, we propose to apply the information distance and the direct relevance of related words in order to identify helpful reviews for a single feature. This was discussed in detail in Chapter 4. Our experiments in Section 5.3.1 verify the superior performance of our review selection method.
6.2 LIMITATIONS
There are two limitations in this study.
The related word selection methods use the Topic Model and the Pattern based Topic Model to discover words related to the target feature. A topic model requires a sufficient number of reviews in the review corpus in order to work well. Therefore, our proposed methods are not applicable to datasets having a small number of reviews or to sparse datasets.
The review selection method only focuses on selecting reviews for a single feature. However, some people may be interested in a group of features of a product, or in all of the features. Therefore, the review selection method should be improved to deal with multiple features at the same time.
6.3 FUTURE WORK
A probabilistic topic model is employed to discover words related to the main feature in this study. However, we did not analyse the detailed relationships among those related words within each topic. Understanding the relationships among related words, and the relationship of each related word to the target feature, would surely provide more insight into the task of identifying related words. In addition, according to many studies about Topic Models (Chang et al., 2009), incoherent topics do exist. Therefore, the need for topic interpretation and evaluation before use should be considered in future work.
As mentioned in the limitations section, selecting reviews for a group of features should be incorporated into our proposed review selection method. In our study, the associated related words of a single main feature can be identified as presented in Chapter 3. Therefore, the related words of multiple main features can be combined in order to obtain an overall set of related words. The multiple main features and their associated related words can then be used as the input to our review selection model in order to generate reviews discussing those multiple features. This would extend our review selection from a single feature to multiple features.
REFERENCES
AlSumait, L., Barbará, D., Gentle, J., & Domeniconi, C. (2009). Topic significance
ranking of LDA generative models. In Machine Learning and Knowledge
Discovery in Databases (pp. 67-82): Springer.
Andrzejewski, D., Zhu, X., & Craven, M. (2009). Incorporating domain knowledge
into topic modeling via Dirichlet forest priors. In Proceedings of the 26th
Annual International Conference on Machine Learning (pp. 25-32): ACM.
Bentivogli, L., Forner, P., Magnini, B., & Pianta, E. (2004). Revising the WordNet
domains hierarchy: semantics, coverage and balancing. In Proceedings of the
Workshop on Multilingual Linguistic Ressources (pp. 101-108): Association
for Computational Linguistics.
Blei, D. M., & McAuliffe, J. D. (2007). Supervised Topic Models. Advances in
Neural Information Processing Systems (NIPS).
Blei, D. M., & McAuliffe, J. D. (2008). Supervised Topic Models. Advances in
Neural Information Processing Systems (NIPS).
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003a). Latent dirichlet allocation. the
Journal of machine Learning research, 3, 993-1022.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003b). Latent Dirichlet Allocation. Journal
of Machine Learning Research 3, 993-1022.
Brody, S., & Elhadad, N. (2010). An unsupervised aspect-sentiment model for online
reviews. In Human Language Technologies: The 2010 Annual Conference of
the North American Chapter of the Association for Computational Linguistics
(pp. 804-812): Association for Computational Linguistics.
Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical
semantic relatedness. Computational Linguistics, 32(1), 13-47.
Chan, L. M. (1995). Library of Congress subject headings: principles and
application: ERIC.
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading
tea leaves: How humans interpret topic models. In Advances in neural
information processing systems (pp. 288-296).
Chen, C. C., & Tseng, Y.-D. (2011). Quality evaluation of product reviews using an
information quality framework. Decision Support Systems, 50(4), 755-768.
Cilibrasi, R. L., & Vitanyi, P. (2007). The google similarity distance. Knowledge and
Data Engineering, IEEE Transactions on, 19(3), 370-383.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R.
(1990). Indexing by Latent Semantic Analysis. Journal of the American
Society for Information Science, 41(6), 391-407.
Dellarocas, C., Zhang, X. M., & Awad, N. F. (2007). Exploring the value of online
product reviews in forecasting sales: The case of motion pictures. Journal of
Interactive marketing, 21(4), 23-45.
Fellbaum, C. (2010). WordNet. In Theory and applications of ontology: computer
applications (pp. 231-243): Springer.
Fischer, K. S. (2005). Critical views of LCSH, 1990–2001: The third bibliographic
essay. Cataloging & classification quarterly, 41(1), 63-109.
Gangemi, A., Guarino, N., & Oltramari, A. (2001). Conceptual analysis of lexical
taxonomies: The case of WordNet top-level. In Proceedings of the
international conference on Formal Ontology in Information Systems-Volume
2001 (pp. 285-296): ACM.
Gao, Y., Xu, Y., & Li, Y. (2013). Pattern based Topic Models for Information
Filtering. In IEEE 13th International Conference on Data Mining Workshops
(pp. 921-928): IEEE.
Ghose, A., & Ipeirotis, P. G. (2011). Estimating the helpfulness and economic impact
of product reviews: Mining text and reviewer characteristics. Knowledge and
Data Engineering, IEEE Transactions on, 23(10), 1498-1512.
Griffiths, T. L., Steyvers, M., Blei, D. M., & Tenenbaum, J. B. (2005). Integrating
topics and syntax. Advances in Neural Information Processing Systems, 17,
537-544.
Griffiths, T. L., Steyvers, M., Blei, D. M., & Tenenbaum, J. B. (2004). Integrating
topics and syntax. In Advances in neural information processing systems (pp.
537-544).
Gruber, T. R. (1995). Toward principles for the design of ontologies used for
knowledge sharing? International journal of human-computer studies, 43(5),
907-928.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In 22nd Annual
international conference on research and development in information
retrieval (pp. 50-57): ACM.
Hong, Y., Lu, J., Yao, J., Zhu, Q., & Zhou, G. (2012). What Reviews are
Satisfactory: Novel Features for Automatic Helpfulness Voting. In SIGIR
conference on research and development in information retrieval (pp. 495 -
504): ACM.
Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In
Proceedings of the tenth ACM SIGKDD international conference on
Knowledge discovery and data mining (pp. 168-177): ACM.
Hung, C., Wermter, S., & Smith, P. (2004). Hybrid neural document clustering using
guided self-organization and WordNet. Intelligent Systems, IEEE, 19(2), 68-
77.
Kim, S.-M., & Hovy, E. (2004). Determining the sentiment of opinions. In
Proceedings of the 20th international conference on Computational
Linguistics (pp. 1367): Association for Computational Linguistics.
Kim, S.-M., Pantel, P., Chklovsk, T., & Pennacchiotti, M. (2006). Automatically
Assessing Review Helpfulness. In Association for Computational Linguistics
(pp. 423-430).
Krestel, R., & Dokoohaki, N. (2011). Diversifying Product Review Rankings:
Getting the Full Picture. In IEEE/WIC/ACM International Conferences on
Web Intelligence and Intelligent Agent Technology (pp. 138 - 145): ACM.
Lakkaraju, H., Bhattacharyya, C., Bhattacharya, I., & Merugu, S. (2011). Exploiting
Coherence for the Simultaneous Discovery of Latent Facets and associated
Sentiments. In SDM (pp. 498-509): SIAM.
Lappas, T., Crovella, M., & Terzi, E. (2012). Selecting a characteristic set of
reviews. In 18th SIGKDD international conference on knowledge discovery
and data mining: ACM.
Lau, R. Y., Lai, C. C., Ma, J., & Li, Y. (2009). Automatic domain ontology
extraction for context-sensitive opinion mining. ICIS 2009 Proceedings, 35-
53.
Liang, H., Xu, Y., Li, Y., & Nayak, R. (2009). Tag based collaborative filtering for
recommender systems. In Rough Sets and Knowledge Technology (pp. 666-
673): Springer.
Lin, C., & He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In
Proceedings of the 18th ACM conference on Information and knowledge
management (pp. 375-384): ACM.
Liu, B. (2010). Sentiment analysis: A multi-faceted problem. IEEE Intelligent
Systems, 25(3), 76-80.
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers.
Liu, J., Yunbo, C., Chin-Yew, L., Yalou, H., & Ming, Z. (2007). Low-quality
product review detection in opinion summarisation. In Association for
Computational Linguistics (pp. 334-342).
Liu, S., Liu, F., Yu, C., & Meng, W. (2004). An effective approach to document
retrieval via utilizing WordNet and recognizing phrases. In Proceedings of
the 27th annual international ACM SIGIR conference on Research and
development in information retrieval (pp. 266-272): ACM.
Long, C., Zhang, J., Huang, M., Zhu, X., Li, M., & Ma, B. (2014). Estimating feature
ratings through an effective review selection approach. Knowledge and
Information Systems, 38(2), 419-446.
Lu, Y., Zhai, C. X., & Sundaresan, N. (2009). Rated aspect summarization of short
comments. In Proceedings of the 18th international conference on World
Wide Web (pp. 131-140): ACM.
Ma, Z., Pant, G., & Sheng, O. R. L. (2007). Interest-based personalized search. ACM
Transactions on Information Systems (TOIS), 25(1), 5.
Manna, S., & Mendis, B. S. U. (2010). Fuzzy word similarity: a semantic approach
using WordNet. In Fuzzy Systems (FUZZ), 2010 IEEE International
Conference on (pp. 1-8): IEEE.
Mei, Q., Ling, X., Wondra, M., Su, H., & Zhai, C. (2007). Topic sentiment
mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th
international conference on world wide web (pp. 171-180): ACM.
Mei, Q., Shen, X., & Zhai, C. (2007). Automatic labeling of multinomial topic
models. In Proceedings of the 13th ACM SIGKDD international conference
on Knowledge discovery and data mining (pp. 490-499): ACM.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990).
Introduction to WordNet: An on-line lexical database*. International journal
of lexicography, 3(4), 235-244.
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011).
Optimizing semantic coherence in topic models. In Proceedings of the
Conference on Empirical Methods in Natural Language Processing (pp. 262-
272): Association for Computational Linguistics.
Minka, T., & Lafferty, J. (2002). Expectation-propagation for the generative aspect
model. In Proceedings of the Eighteenth conference on Uncertainty in
artificial intelligence (pp. 352-359): Morgan Kaufmann Publishers Inc.
Misra, H., Cappé, O., & Yvon, F. (2008). Using LDA to detect semantically
incoherent documents. In Proceedings of the Twelfth Conference on
Computational Natural Language Learning (pp. 41-48): Association for
Computational Linguistics.
Missen, M. M. S., Boughanem, M., & Cabanac, G. (2009). Challenges for
Sentence Level Opinion Detection in Blogs. In (pp. 347-351).
Moghaddam, S., & Ester, M. (2011). ILDA: interdependent LDA model for learning
latent aspects and their ratings from online product reviews. In 34th
international ACM SIGIR conference on research and development in
information retrieval (pp. 665 - 674): ACM.
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful review? A study of
customer reviews on Amazon. com. MIS quarterly, 34(1), 185-200.
Mukherjee, A., & Liu, B. (2012). Aspect Extraction through Semi-Supervised
Modeling. In Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics (pp. 339-348).
Navigli, R., Velardi, P., & Gangemi, A. (2003). Ontology learning and its application
to automated terminology translation. Intelligent Systems, IEEE, 18(1), 22-31.
Newman, D., Karimi, S., & Cavedon, L. (2009). External evaluation of topic models.
In Australasian Document Computing Symposium (ADCS 2009).
Newman, D., Lau, J. H., Grieser, K., & Baldwin, T. (2010). Automatic evaluation of
topic coherence. In Human Language Technologies: The 2010 Annual
Conference of the North American Chapter of the Association for
Computational Linguistics (pp. 100-108): Association for Computational
Linguistics.
Noy, N. F. (2004). Semantic integration: a survey of ontology-based approaches.
ACM Sigmod Record, 33(4), 65-70.
O'Mahony, M. P., & Smyth, B. (2009). Learning to recommend helpful hotel
reviews. In Proceedings of the third ACM conference on Recommender
systems (pp. 305-308): ACM.
Ockerbloom, J. M. (2006). New maps of the library: Building better subject
discovery tools using Library of Congress Subject Headings.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and
trends in information retrieval, 2(1-2), 1-135.
Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet::Similarity:
measuring the relatedness of concepts. In Demonstration papers at HLT-
NAACL 2004 (pp. 38-41): Association for Computational Linguistics.
Sabou, M., Lopez, V., Motta, E., & Uren, V. (2006). Ontology selection: Ontology
evaluation on the real semantic web.
Sowa, J. F. (2001). Building, sharing, and merging ontologies. Web site:
http://www.jfsowa.com/ontology/ontoshar.htm
Steyvers, M., & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer,
D. McNamara, S. Dennis, & W. Kintsch (Eds.), Latent Semantic Analysis: A
Road to Meaning. Lawrence Erlbaum.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. Handbook of latent
semantic analysis, 427(7), 424-440.
Stojanovic, N. (2005). On the query refinement in the ontology-based searching for
information. Information Systems, 30(7), 543-563.
Tao, X., Li, Y., Lau, R. Y., & Wang, H. (2012). Unsupervised multi-label text
classification using a world knowledge ontology. In Advances in Knowledge
Discovery and Data Mining (pp. 480-492): Springer.
Teh, Y. W., Newman, D., & Welling, M. (2006). A collapsed variational Bayesian
inference algorithm for latent Dirichlet allocation. In Advances in neural
information processing systems (pp. 1353-1360).
Tian, N., Xu, Y., & Li, Y. (2014). A review selection method using product feature
taxonomy. In 15th International Conference on Web Information Systems
Engineering, WISE 2014 (pp. 408-417): Springer.
Titov, I., & McDonald, R. (2008). Modeling online reviews with multi-grain topic
models. In Proceedings of the 17th international conference on World Wide
Web (pp. 111-120): ACM.
Tsaparas, P., Ntoulas, A., & Terzi, E. (2011). Selecting a comprehensive set of
reviews. In 17th SIGKDD international conference on knowledge discovery
and data mining (pp. 168-176): ACM.
Voorhees, E. M. (1994). Query expansion using lexical-semantic relations. In
SIGIR’94 (pp. 61-69): Springer.
Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. (2009). Evaluation
methods for topic models. In Proceedings of the 26th Annual International
Conference on Machine Learning (pp. 1105-1112): ACM.
Zhang, L., & Liu, B. (2011). Identifying noun product features that imply opinions.
In Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies: short papers-
Volume 2 (pp. 575-580): Association for Computational Linguistics.
Zhang, Y., & Zhang, D. (2014). Automatically predicting the
helpfulness of online reviews. In Information Reuse and Integration (IRI),
2014 IEEE 15th International Conference on (pp. 662-668).
Zhao, W. X., Jiang, J., Yan, H., & Li, X. (2010). Jointly modeling aspects and
opinions with a MaxEnt-LDA hybrid. In Proceedings of the 2010 Conference
on Empirical Methods in Natural Language Processing (pp. 56-65):
Association for Computational Linguistics.
Baeza-Yates, R., & Navarro, G. (1998). Fast approximate string matching in a dictionary. In String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings (pp. 14-22): IEEE.
Blei, D. M., & McAuliffe, J. D. (2008). Supervised Topic Models. Advances in Neural Information Processing Systems (NIPS).
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003a). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003b). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
Brody, S., & Elhadad, N. (2010). An unsupervised aspect-sentiment model for online reviews. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 804-812): Association for Computational Linguistics.
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea
leaves: How humans interpret topic models. In Advances in neural information processing systems (pp. 288-296).
Chen, C. C., & Tseng, Y.-D. (2011). Quality evaluation of product reviews using an information quality framework. Decision Support Systems, 50(4), 755-768.
Cilibrasi, R. L., & Vitanyi, P. (2007). The Google similarity distance. Knowledge and Data Engineering, IEEE Transactions on, 19(3), 370-383.
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th international conference on World Wide Web (pp. 519-528): ACM.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990).
Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 391-407.
Gao, Y., Xu, Y., & Li, Y. (2013). Pattern-Based Topic Models for Information Filtering.
In IEEE 13th International Conference on Data Mining Workshops (pp. 921-928): IEEE.
Ghose, A., & Ipeirotis, P. G. (2006). Designing ranking systems for consumer reviews: The impact of review subjectivity on product sales and review quality. In Proceedings of the 16th annual workshop on information technology and systems (pp. 303-310).
Ghose, A., & Ipeirotis, P. G. (2011). Estimating the helpfulness and economic impact
of product reviews: Mining text and reviewer characteristics. Knowledge and Data Engineering, IEEE Transactions on, 23(10), 1498-1512.
Griffiths, T. L., Steyvers, M., Blei, D. M., & Tenenbaum, J. B. (2004). Integrating
topics and syntax. In Advances in neural information processing systems (pp. 537-544).
Grünwald, P. D., & Vitányi, P. M. (2003). Kolmogorov complexity and information
theory. With an interpretation in terms of questions and answers. Journal of Logic, Language and Information, 12(4), 497-529.
Hoang, L., Lee, J.-T., Song, Y.-I., & Rim, H.-C. (2008). A model for evaluating the
quality of user-created documents. In Asia Information Retrieval Symposium (pp. 496-501): Springer.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In 22nd Annual international conference on research and development in information retrieval (pp. 50-57): ACM.
Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings
of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177): ACM.
Hung, C., Wermter, S., & Smith, P. (2004). Hybrid neural document clustering using guided self-organization and WordNet. Intelligent Systems, IEEE, 19(2), 68-77.
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.
Kim, S.-M., Pantel, P., Chklovski, T., & Pennacchiotti, M. (2006). Automatically
Assessing Review Helpfulness. In Association for Computational Linguistics (pp. 423-430).
Korfiatis, N., GarcíA-Bariocanal, E., & Sánchez-Alonso, S. (2012). Evaluating content
quality and helpfulness of online product reviews: The interplay of review helpfulness vs. review content. Electronic Commerce Research and Applications, 11(3), 205-217.
Krestel, R., & Dokoohaki, N. (2011). Diversifying Product Review Rankings: Getting
the Full Picture. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (pp. 138 - 145): ACM.
Lakkaraju, H., Bhattacharyya, C., Bhattacharya, I., & Merugu, S. (2011). Exploiting Coherence for the Simultaneous Discovery of Latent Facets and associated Sentiments. In SDM (pp. 498-509): SIAM.
Lappas, T., Crovella, M., & Terzi, E. (2012). Selecting a characteristic set of reviews.
In 18th SIGKDD international conference on knowledge discovery and data mining: ACM.
Li, M., & Vitányi, P. (2013). An introduction to Kolmogorov complexity and its applications: Springer Science & Business Media.
Liao, J., Mendis, B., & Manna, S. (2010). Improving hierarchical document signature performance by classifier combination. Neural Information Processing. Theory and Algorithms, 695-702.
Lin, C., & He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In
Proceedings of the 18th ACM conference on Information and knowledge management (pp. 375-384): ACM.
Lin, D. (1998). An information-theoretic definition of similarity. In ICML (Vol. 98, pp. 296-304): Citeseer.
Liu, B. (2010). Sentiment analysis: A multi-faceted problem. IEEE Intelligent Systems, 25(3), 76-80.
Liu, B. (2012). Sentiment Analysis and Opinion Mining: Morgan & Claypool Publishers.
Liu, J., Yunbo, C., Chin-Yew, L., Yalou, H., & Ming, Z. (2007). Low-quality product review detection in opinion summarisation. In Association for Computational Linguistics (pp. 334-342).
Liu, Y., Huang, X., An, A., & Yu, X. (2008). Modeling and predicting the helpfulness of
online reviews. In Data mining, 2008. ICDM'08. Eighth IEEE international conference on (pp. 443-452): IEEE.
Long, C., Zhang, J., Huang, M., Zhu, X., Li, M., & Ma, B. (2014). Estimating feature
ratings through an effective review selection approach. Knowledge and Information Systems, 38(2), 419-446.
Lu, Y., Tsaparas, P., Ntoulas, A., & Polanyi, L. (2010). Exploiting social context for
review quality prediction. In Proceedings of the 19th international conference on World wide web (pp. 691-700): ACM.
Lu, Y., Zhai, C. X., & Sundaresan, N. (2009). Rated aspect summarization of short comments. In Proceedings of the 18th international conference on World Wide Web (pp. 131-140): ACM.
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing: MIT Press.
Mei, Q., Ling, X., Wondra, M., Su, H., & Zhai, C. (2007). Topic sentiment
mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th international conference on world wide web (pp. 171-180): ACM.
Miller, G., & Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.
Ming, L., & Vitányi, P. (1997). An introduction to Kolmogorov complexity and its applications: Springer Heidelberg.
Minka, T., & Lafferty, J. (2002). Expectation-propagation for the generative aspect model. In Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence (pp. 352-359): Morgan Kaufmann Publishers Inc.
Missen, M. M. S., Boughanem, M., & Cabanac, G. (2009). Challenges for Sentence Level Opinion Detection in Blogs. In (pp. 347-351).
Moghaddam, S., & Ester, M. (2011). ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews. In 34th international ACM SIGIR conference on research and development in information retrieval (pp. 665-674): ACM.
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful review? A study of customer reviews on Amazon.com. MIS Quarterly, 34(1), 185-200.
Mukherjee, A., & Bing, L. (2012). Aspect Extraction through Semi-Supervised Modeling. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (pp. 339-348).
O'Mahony, M. P., & Smyth, B. (2009). Learning to recommend helpful hotel reviews.
In Proceedings of the third ACM conference on Recommender systems (pp. 305-308): ACM.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1-2), 1-135.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86): Association for Computational Linguistics.
Popescu, A.-M., & Etzioni, O. (2007). Extracting Product Features and Opinions from Reviews. In Natural Language Processing and Text Mining (pp. 9-28). London: Springer London.
Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007.
Scaffidi, C., Bierhoff, K., Chang, E., Felker, M., Ng, H., & Jin, C. (2007). Red
Opal: product-feature scoring from reviews. In Proceedings of the 8th ACM conference on Electronic commerce (pp. 182-191): ACM.
Siering, M., & Muntermann, J. (2013). What Drives the Helpfulness of Online
Product Reviews? From Stars to Facts and Emotions. In Wirtschaftsinformatik (Vol. 7).
Steyvers, M., & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. Handbook of latent semantic analysis, 427(7), 424-440.
Teh, Y. W., Newman, D., & Welling, M. (2006). A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 1353-1360).
Tian, N., Xu, Y., & Li, Y. (2014). A review selection method using product feature
taxonomy. In 15th International Conference on Web Information Systems Engineering, WISE 2014 (pp. 408-417): Springer.
Titov, I., & McDonald, R. (2008). Modeling online reviews with multi-grain topic
models. In Proceedings of the 17th international conference on World Wide Web (pp. 111-120): ACM.
Tsaparas, P., Ntoulas, A., & Terzi, E. (2011). Selecting a comprehensive set of reviews. In 17th SIGKDD international conference on knowledge discovery and data mining (pp. 168-176): ACM.
Wang, D., Zhu, S., & Li, T. (2013). SumView: A Web-based engine for summarizing
product reviews and customer opinions. Expert Systems with Applications, 40(1), 27-33.
Ye, Q., Law, R., & Gu, B. (2009). The impact of online user reviews on hotel room sales. International Journal of Hospitality Management, 28(1), 180-182.
Zhang, Y., & Zhang, D. (2014). Automatically predicting the helpfulness of online reviews. In Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on (pp. 662-668).
Zhang, Z., & Varadarajan, B. (2006). Utility scoring of product reviews. In Proceedings of the 15th ACM international conference on Information and knowledge management (pp. 51-57): ACM.
Zhao, W. X., Jiang, J., Yan, H., & Li, X. (2010). Jointly modeling aspects and opinions
with a MaxEnt-LDA hybrid. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 56-65): Association for Computational Linguistics.