REVIEW SELECTION BASED ON TOPIC
MODELS
Anh Duc Nguyen
A THESIS SUBMITTED TO
THE SCIENCE AND ENGINEERING FACULTY
OF QUEENSLAND UNIVERSITY OF TECHNOLOGY
IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF INFORMATION TECHNOLOGY (RESEARCH)
School of Electrical Engineering and Computer Science
Science and Engineering Faculty
Queensland University of Technology
Brisbane, Australia
2018
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the best
of my knowledge and belief, the thesis contains no material previously published or
written by another person except where due reference is made.
Signature: QUT Verified Signature
Date: February 2018
To the memorable journey
Keywords
Review Selection
Related Words
Pattern Mining
Topic Modelling
Pattern based Topic Model
WordNet
Abstract
Online reviews provide valuable sources of information about products and assist
customers in purchase decision making. However, the sheer and overwhelming
volume of reviews, together with the large variations in review quality, present an
obstacle for customers in obtaining useful content from online reviews. Clearly, it is
impossible for customers to read thousands of reviews in a short time to get a full
picture of the product. Therefore, helpful review selection has attracted many researchers in recent times. A current line of research has started to focus on the content of reviews by analysing the product features mentioned in them. However, this research stream suffers from several problems. The first and foremost
difficulty is the ambiguity of the text review content, which makes the task of
automatically analysing and understanding the review content very challenging. The
reason is that reviewers have their own styles and the freedom to write whatever they
want, without following any specific syntax, grammar or structure. As a result,
polysemy and synonymy are common issues in online reviews. Secondly, most currently proposed review selection methods are based on the assumption that all features
of a product are independent. Nevertheless, there are always relationships among
features of products. Those related features are normally used together to express
information about a specific aspect of the product. Successful discovery of those
related features can certainly improve the task of review selection. Lastly, most
research has studied review selection as product-centric but not customer-centric.
Those studies consider that all of the features are equally important to customers and
propose review selection approaches that can cover as many features as possible. As a
result, returned reviews by those methods are always the same, despite the specific
needs of customers. For a specific feature that is very important to a certain customer,
reviews covering many other features but lacking detailed information about that
feature are useless to that customer. A recent work proposed to find specialised reviews discussing a specific feature based on words similar to that feature (Long et al., 2014), but it still suffers from quite a few issues.
To tackle the problems mentioned above, our work first proposes a novel approach to extract the main features and related features of a product by applying data mining, natural language processing and probabilistic topic modelling. By combining the external knowledge ontology WordNet with modern probabilistic models (the topic model and the pattern-based topic model), a complete set of words related to the main features can be identified more accurately, thereby reducing the problem of ambiguity in online review content. Specifically, the topic model is a probabilistic model that can help to discover the related features of the main features, which is an important contribution to reducing the problem of ambiguity. Secondly, this thesis also proposes a new review selection method based on a single feature of the product. By utilising the identified related words of the target feature, together with our review selection methods, a set of helpful reviews that intensively discuss the target feature can be identified. As the features that are important to a customer are input into the review selection model by the customer, our work is therefore customer-centric.
We provide detailed experiments in this research to verify our proposed methods. The
results of our experiments in Chapter 5 confirm that our proposed approach outperforms existing works.
Table of Contents
Statement of Original Authorship
Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Acknowledgements
Chapter 1: Introduction
1.1 Overview
1.2 Research Problem and Objective
1.3 Research Significance and Contribution
1.4 Publication
1.5 Thesis Outline
Chapter 2: Literature Review
2.1 Review Selection
2.2 Topic Modelling
2.3 Summary
Chapter 3: Main Feature Selection and Related Feature Selection
3.1 Main Feature Selection
3.2 Discovery of Related Words of Main Features
3.3 Summary
Chapter 4: Review Selection for Single Feature
4.1 Overview of Helpful Review Selection
4.2 Specialised Review Selection (SRS) Method
4.3 The Proposed Review Selection Method
4.4 Summary
Chapter 5: Experiments and Evaluation
5.1 Experimental Environment
5.2 Experiment Design
5.3 Result Analysis and Evaluation
5.4 Summary
Chapter 6: Conclusions
6.1 Conclusion
6.2 Limitations
6.3 Future Work
REFERENCES
List of Figures
Figure 1. Mona Lisa Restaurant with 1022 reviews on Yelp.com
Figure 2. Canon EOS T5 with 779 reviews on Amazon.com
Figure 3. Features summary
Figure 4. Probabilistic Latent Semantic Model
Figure 5. A graphical model representation of the latent Dirichlet allocation (LDA)
Figure 6. Similar and sentiment words of feature "atmosphere"
Figure 7. Topic 5 after removing words having low weight for one restaurant dataset
Figure 8. Related words of feature f ($RWR_f$)
Figure 9. The representation of review content by main features and related words
Figure 10. Review Position Problem
List of Tables
Table 1. Main Features and Related Features
Table 2. Transactional Database
Table 3. Dataset Information for Restaurant Business
Table 4. Main Features of Six Datasets
Table 5. Helpfulness Score for Main Features of CAM1
Table 6. Average Helpful Score of Six Datasets
Table 7. Mean Significance Difference t-test
Table 8. Precision of top-10 and top-15 reviews returned by RSWR and SRS
Table 9. Recall of top-10 and top-15 reviews returned by RSWR and SRS
Table 10. F-score of top-10 and top-15 reviews returned by RSWR and SRS
Table 11. Normalized Discounted Cumulative Gain
Table 12. Helpfulness Score for Main Features of CAM1
Table 13. Average Helpful Score of Six Datasets
Table 14. Mean Significance Difference t-test
Table 15. Precision of top-10 and top-15 returned reviews
Table 16. Recall of top-10 and top-15 returned reviews
Table 17. F-score of top-10 and top-15 returned reviews
Table 18. Normalized Discounted Cumulative Gain
List of Abbreviations
LDA Latent Dirichlet Allocation
PBTM Pattern based Topic Model
PLSA Probabilistic Latent Semantic Analysis
NLP Natural Language Processing
POST Part-of-Speech Tagging
IC Information Content
LCS Lowest Common Subsumer
SRS Specialized Review Selection
RSWR Review Selection Based on Weighted Relevance
TRWS Topical Related Word Selection
PTRWS Pattern Based Topic Related Word Selection
SPMF Open-source Data Mining Library
JWNL Java WordNet Library
Acknowledgements
I would like first to express my sincere gratitude to my principal supervisor Associate
Professor Yue Xu for the continuous support of my Master research, and for her
patience, motivation, and enthusiasm. Her guidance helped me in all the stages of
research and writing of this thesis.
My sincere thanks also go to my colleague Nan Tian for his assistance and
suggestions regarding the experiment implementation. Many thanks to the
administration staff in the University for their support during this research.
I would like to thank my parents for providing me with unfailing support
and continuous encouragement throughout my years of study.
Finally, I want to thank my wife, Hana. In the past two years, you have not only been my wife, you have been my best friend. You have taken over the family responsibilities to
leave me time for my study. This accomplishment would not have been possible
without you. Thank you!
Chapter 1: Introduction
In this chapter, the background and context of this research are outlined (Section 1.1). The gaps in review selection research are highlighted and the research problem and objectives are formulated (Section 1.2). The significance and contribution of the proposed research are discussed in Section 1.3, the resulting publication is listed in Section 1.4, and the thesis outline is given in Section 1.5.
1.1 OVERVIEW
With the rapid expansion of Web 2.0 and e-commerce, in recent years consumers have
witnessed an increasing number of online shopping activities. More and more shoppers
use online platforms to browse and buy products. Furthermore, online retailing, with its interactive tools and forms of consumer feedback, has encouraged consumers to share their personal consumption experiences and express their opinions by writing reviews about the purchased product. Customers prefer to rely on such comments as
an information resource in order to get a full overview of their target product. These
online reviews provide a valuable source of information about a product and assist
customers in their final purchase decision. However, the explosive proliferation of reviews for each product on the Internet is also a headache for customers. It is common to see several hundred reviews for one popular product or service on e-commerce websites such as Yelp (http://Yelp.com) (Figure 1) and Amazon (http://amazon.com) (Figure 2). Clearly, a typical user does not have enough time and patience to sift through such an overwhelming number of reviews. Furthermore, some reviews may provide incorrect or incomplete product information, or may even be misleading, lowering the overall quality of the review pool. As a result, customers may lack knowledge about products and be unable to judge the true quality of a product prior to purchase. Consequently, the strong demand for helpful, high-quality reviews has attracted researchers' attention to the subject of online review selection.
Figure 1. Mona Lisa Restaurant with 1022 reviews on Yelp.com
Figure 2. Canon EOS T5 with 779 reviews on Amazon.com
Accordingly, to tackle the information-overload issues of online reviews, many
websites allow users to vote for the helpfulness of each review, based on their personal
experience. As a result, each review in the review collection obtains a helpfulness
score, for example, “153 out of 250 people found this review helpful”, and the reviews
can then be sorted according to this score. The helpfulness votes are generally an
effective way to filter reviews (Ghose & Ipeirotis, 2011), hence are a good indicator
for extracting useful reviews from the rest. Although this mechanism is certainly an improvement, it also suffers from significant drawbacks. For example, older reviews that have already accumulated many votes are ranked higher, thereby increasing their visibility over newly posted reviews. Since users tend to read the highly voted reviews, recently published reviews may never appear on users' radar because they have no votes or only a few.
Recently, a substantial amount of research has been conducted to identify the quality
of reviews effectively. Proposed methods have focused on automatically estimating
the quality of reviews by making use of the textual and social characteristics of
reviews, such as writing style, review length, grammar, reviewer information and
timeliness (Kim et al., 2006; Liu et al., 2007; Lu et al., 2010; Zhang & Varadarajan,
2006). One significant issue for these methods is that the content of reviews is ignored.
In fact, customers prefer to seek out, as much as possible, what a review is talking
about rather than how professionally the reviews are written. Many studies have shown that the top reviews generated by those approaches very often contain
redundant information about a particular feature but miss out other important features
of the product that are also important to users. For example, the top-10 reviews of a digital camera generated using those approaches repeatedly mention only the "image" and "battery" but say nothing about the "weight", "price", "lens", etc., failing to provide customers with an overall picture of the product. Therefore, a review
that has a professional style of writing and correct grammar but no useful content for
customers, cannot be considered as a helpful review.
This limitation has opened another line of research in the review selection field, where
the content of reviews has been taken into consideration. In online reviews, the content
of reviews can be expressed as information about features of a product buried in
reviews. A feature is primarily defined as "an attribute of the product that has been
mentioned in reviews” (Hu & Liu, 2004). For example, “display” and “battery” are
two features in the sentence “the display of this camera is blurred sometimes but the
battery is an advantage”. When reviewers write a comment about a product, they
mainly express their personal experience about features of a product. Similarly, readers
try to understand a product by seeking information about the features before making a
decision. In fact, the features of a product are the main topics of discussion in online
reviews. The importance of features in online reviews has sparked a new line of
research in review selection, based on features in the review. First of all, a number of
studies have focused on selecting a subset of reviews that can cover many features and preserve the characteristics of the original review corpus. The advantage is that users can read this sub-collection instead of wading through the thousands of reviews in the original corpus to get an overview of a product. Tsaparas et al. (2011)
focused on selecting reviews that can cover as many features as possible and those
reviews should offer different viewpoints for each feature. However, the selected
reviews selected by this method might not reflect the original opinion distribution of the review corpus and might give users an insufficient picture of a product. Take the reviews of a digital camera, for example; if 80% of the total reviews compliment the feature "price" of the camera, then the overall opinion on the price should be positive. Reviews discussing the feature "price" selected by Tsaparas's method might fail to reflect this positive overall viewpoint.
Selecting a set of reviews that can preserve the proportion of positive and negative
opinions has been proposed by Lappas et al. (2012). However, reviews covering many
features are not always preferred by readers. Customers have their own different needs
for different features of the product. While many users would like to know about all
features of a product, some people may be interested in a few features, or only the one
single feature that is necessary for them. For example, a person, named Tom, needs a
laptop for his new job. Travelling is one part of his work and he does not use his laptop
for designing or playing games. The portability of the laptop is much more important
than how efficient the laptop's graphics card is. In this case, Tom would really want to read reviews of the laptop that mainly discuss portability and would not bother much about the graphics card. This indicates the necessity of research on review selection for single features; however, few works have been undertaken for this need. Hu and Liu (2004) pioneered research on product features. Their work investigated a method to summarise the semantic
orientation of features in the review collection. Figure 3 shows the output of the feature
summary proposed by their method. According to this output, customers can gain an
overall opinion about each feature of a product without reading the whole review
collection.
Figure 3. Features summary
However, their methods only focused on sentiment analysis but not the selection task
of helpful reviews. Although customers can click on the <individual review sentences> link to view reviews having an associated positive or negative opinion, those reviews may not provide detailed information about the features of a particular product. In fact, customers still want to read original reviews
to gain a deeper understanding about the feature and obtain as much information as
possible by themselves. Long et al. (2014) proposed a method to find reviews based
on the amount of information about the feature. The amount of information can be
calculated by using a set of words that are relevant to the target feature. In that way,
reviews having a high amount of information about the feature are considered
specialised reviews because those reviews intensively discuss the feature. Both Hu and Liu's (2004) and Long et al.'s (2014) studies deal with the task of textual analysis of natural language. However, the ambiguity of natural language makes the analysis of online review content very difficult. The reason is that the written language used in online
reviews are very complex and do not always deliver an explicit meaning. In fact, these
two methods do not always provide good results for different kinds of datasets because
of the ambiguity issue.
In general, to the best of our knowledge, current research on review selection suffers from the ambiguity caused by polysemy and synonymy in online reviews. This thesis explores how to apply data mining, as well as probabilistic methods, to analyse and represent the content of online reviews more effectively. Secondly, most of the previous
studies on review selection proposed to select reviews where the whole collection of features is taken into consideration. As analysed above, review selection for a single
feature of a product is also necessary. The second part of this study proposes a new
method to select helpful reviews discussing a single feature or a small group of features
of a product.
1.2 RESEARCH PROBLEM AND OBJECTIVE
In the previous section, the research background and motivation in the online review field were introduced, current issues in online review research were briefly discussed, and the goal of this research was stated. In this section, the research problems are described in detail, and the research objectives to accomplish this study are then identified.
1.2.1 Research Problem
Two primary problems for current research into online reviews are identified as follows.
Problem 1: The ambiguity of textual content in online reviews.
Analysing the textual content of online reviews has always been a difficulty in this research area. In contrast with structured data, online reviews
are unstructured and complex. Reviewers no doubt have freedom to write whatever
they want without following grammar, syntax or vocabulary rules. In fact, reviewers
usually use local language, specific phrases, abbreviations and a figurative style to express their opinions in their comments. In addition, there are severe polysemy and
synonym issues, i.e., the same word may be used by different users to mean different
concepts, or users may use different words when referring to the same concept. As a
consequence, the task of automatically analysing text to understand the information buried in online reviews faces many challenges. Based upon this discussion, the research question addressing this problem is as follows.
How to analyse and represent content of online reviews in a semantic way?
Problem 2: Helpful review selection according to single feature.
In the past few years, existing research has focused on selecting, classifying and
summarising online reviews by using natural language processing and data mining
techniques where all of the features of the products are taken into consideration (Dave
et al. (2003), Pang et al. (2002), Pang and Lee (2008)). Those studies considered that
features of a product are equally important and attempted to select reviews having as
many features as possible. As discussed in Section 1.1, each product feature plays a
different role in consumer consideration, depending on their needs. As a consequence,
there are certain product features that are less interesting to users than others.
Therefore, new methods of selecting helpful reviews according to a single feature
should be developed. Based upon this discussion, the two questions to be addressed in this problem are as follows.
Which factors should be considered for deciding the helpfulness of reviews
according to a single feature?
How do we utilise textual information in online reviews to select helpful
reviews for a single feature?
1.2.2 Research Objectives
According to the research problem and research questions, three primary research
objectives that need to be achieved for this research are listed below.
Objective 1: To propose methods to identify the related similar words of the features based on an external knowledge base (WordNet).
One of the focuses in this study is to alleviate the problem of ambiguity in review
content. In online reviews, features and information about features are the main topics of discussion and thus represent the content of the reviews. Therefore, effectively identifying features and the related words of those features is a primary objective of this study. As WordNet is a popular external electronic lexical resource that can be considered an ontology for natural language terms, it can help to find concepts related to a target concept. Therefore, this thesis proposes to use the knowledge base of WordNet to identify the related similar words of product features.
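For illustration, the following minimal Python sketch shows how such related words could be gathered from WordNet; this is an assumption-laden example using NLTK's WordNet interface, not the thesis's actual implementation (which uses the Java WordNet Library, JWNL):

from nltk.corpus import wordnet as wn  # requires: import nltk; nltk.download('wordnet')

def wordnet_candidates(feature):
    # Collect lemma names from the feature's noun synsets and their hypernyms.
    words = set()
    for synset in wn.synsets(feature, pos=wn.NOUN):
        words.update(lemma.name() for lemma in synset.lemmas())
        for hypernym in synset.hypernyms():  # related, more general concepts
            words.update(lemma.name() for lemma in hypernym.lemmas())
    return words - {feature}

print(sorted(wordnet_candidates("atmosphere")))

In practice, the candidates returned this way still need to be filtered by context, which is where the topic models of Objective 2 come in.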
Objective 2: To propose methods to identify related words based on
probabilistic topic models.
While WordNet is an external knowledge source that can be used to find similar and synonymous words, topic modelling is a state-of-the-art statistical approach for analysing
hidden themes and discovering relationships among words inside each theme (Blei et
al., 2003a). Topic modelling is therefore expected to find related words of the target
feature. Additionally, the pattern-based topic model proposed by Gao et al. (2013) is also a promising method for finding related words in a text corpus. Topics in traditional topic modelling (LDA) have limitations in semantic representation because each topic generated by LDA only contains a list of single words. While the relatedness of the words in a topic has been confirmed, the semantic meaning of the topic is still an open research issue. As patterns in a pattern-based topic model better represent the semantic meaning of a topic, the pattern-based topic model is expected to find related words more effectively. Therefore, this thesis aims to apply modern probabilistic topic modelling, such as the LDA topic model and the pattern-based topic model, to accurately identify the related words of the features.
Objective 3: To propose an approach to select helpful reviews for a single
feature based on the related words of the feature.
Review selection for a single feature is the second goal of this study. In order to
accomplish this goal, a new method based on information distance theory is proposed
to generate a set of helpful reviews for a single feature. Related words discovered by the methods proposed in Objectives 1 and 2 are utilised to measure the amount of information in the selection approach. Details of the review selection method are introduced in Chapter 4 of this thesis.
1.3 RESEARCH SIGNIFICANCE AND CONTRIBUTION
This research makes a number of important contributions to the research area of online reviews, including semantic online review analysis and helpful review selection.
This thesis proposes a new approach to solve the ambiguity problem of textual content in online reviews.
The ambiguity problem caused by synonymy and polysemy issues in online reviews is a key obstacle to the task of automatically analysing review content. By applying pattern mining techniques, natural language processing and probabilistic models, the polysemy and synonymy problems can be effectively alleviated. In addition, the structural representation of review content by features and related words makes the content easier to understand. This representation can be useful in a wide variety of other studies of online review content.
This thesis proposes new approaches for effectively selecting helpful reviews for a single feature of a product.
The proliferation of online reviews makes the task of finding useful reviews for customers increasingly important. Most research studies select reviews where all of the features of a product are taken into consideration, on the belief that customers are interested in all features of a product. However, this is not always true, as some customers might only be interested in a single feature or a small number of features that are important to them. This thesis proposes a new approach for selecting helpful reviews for a single feature of a product, thus contributing to the review selection area.
1.4 PUBLICATION
A review selection method based on topic models has been published in the 2016
Pacific Rim Knowledge Acquisition Workshop (PKAW2016,
http://pkaw.org/pkaw2016/).
Nguyen, A. D., Tian, N., Xu, Y., & Li, Y. (2016, August). Specialized
Review Selection Using Topic Models. In Pacific Rim Knowledge
Acquisition Workshop (pp. 102-114): Springer International Publishing.
1.5 THESIS OUTLINE
The thesis is organized in 6 chapters. The overview of each chapter is outlined
as below:
Chapter 2: presents a detailed and critical literature review of existing related
research studies necessary to address the research problems defined in Section
1.2. The literature reviews cover two related major areas: Review Selection and
Topic Modelling. The research gaps and drawbacks of current review selection methods are identified and justified according to the research questions.
Chapter 3: introduces our proposed related word selection method to address
the problem of ambiguity of online review content. As review content can be
represented by features and related words, correctly identifying features and
related words is crucial for understanding the review content. In this chapter,
we first discuss feature extraction using pattern mining and then present our method of identifying related words using WordNet and topic modelling.
Chapter 4: introduces our proposed review selection methods for a single feature of a product. The current problems of review selection methods and the criteria of a helpful review for a single feature are presented. Our methods of review selection, using direct relevance and the information distance of related words, are discussed in this chapter.
Chapter 5: discusses the experiments and evaluation of the models proposed in
Chapter 3 and Chapter 4. The proposed review selection and related word
selection models are evaluated by comparing their abilities to select helpful
reviews with the baseline models.
Chapter 6: summarises the key findings, achievements and limitations. Potential future work is also pointed out for further enhancement of the proposed models.
Chapter 2: Literature Review
This work is closely related to the fields of review selection and topic modelling. This chapter presents a critical review of those areas, which is essential for addressing the research gaps mentioned in the Introduction of this thesis.
2.1 REVIEW SELECTION
Nowadays, e-commerce retail and online shopping are growing strongly, and an increasing number of consumers read product reviews before making buying decisions. Such reviews on online platforms have become an information resource that helps buyers in their purchases (Hu and Liu (2004); Hung et al. (2004);
Ye et al. (2009); Liu et al. (2008)). Meanwhile, there are thousands of reviews written
each day, for many different products, on online merchants like amazon.com, or on
user reviews and recommendation websites such as yelp.com, etc. For instance, a
simple Canon 60D digital camera body has already accumulated 975 reviews on
amazon.com. The hundreds, or even thousands, of reviews make it impossible for users
to read all of them and choose reliable reviews for collecting information. Early works
like those of (Hu & Liu, 2004), provided a method of mining and summarising
customer reviews into useful information (features and corresponding subjective
opinion). However, such works mainly focused on classifying the semantic opinion for each feature but ignored summarising whole reviews. In fact,
customers may still prefer to read the whole content of reviews to have a vivid picture
of the product in which they are interested.
In addition, many of those reviews are not always satisfactory in terms of providing
useful information. The well-known problem of online reviews is that they are varied
in quality, from very useful to useless or even spam (e.g. fake reviews) (Zhang &
Zhang, 2014). Hence, it is extremely difficult for online users to weed out the helpful
reviews worth their attention. On demand, researchers on review selection fields have
been working out effective ways to extract and recommend useful reviews to users.
This section serves as an overview of existing research on selecting reviews: review
quality assessment and review selection based on product features.
2.1.1 Review Quality Assessment
Some reviews are of better quality than others. Hence, various ideas have been proposed for sorting reviews so that higher quality reviews are always shown first, as discussed in the following sections.
2.1.1.1 Crowd vote based metric
Online merchants such as amazon.com have built human feedback tools into their websites that allow users to vote on each review as helpful or unhelpful by clicking a thumbs-up or thumbs-down icon after reading the review. The total votes a review receives are then updated in the form of "80 out of 100 people found the following review helpful" and displayed at the top of the review content as an indicator of helpfulness. In these cases, the quality of an online review is determined by a
helpfulness voting ratio. However, there are some issues arising when using this
helpfulness tool, such as that newly posted reviews will only have a few votes, or more
likely, no vote, making it very difficult to identify their helpfulness (Liu, et al., 2008),
and perhaps not reflecting the real quality of reviews.
Pang and Lee (2008) pointed out two shortcomings of the helpfulness voting tool.
Firstly, many users have not answered the helpfulness question after reading the
reviews. In addition, not all of the most helpful reviews are the best reviews. There is
a tendency that the earlier a review is posted, the more votes it will get (Zhang &
Zhang, 2014). Ghose and Ipeirotis (2011) mention another point of view, that recently
posted reviews need extra time to accumulate helpfulness votes. Therefore, the helpful
votes collected by these websites may not accurately represent true helpfulness for
those reviews posted in recent times.
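As an illustration of one standard remedy for this vote-accumulation bias (not a method proposed in the works reviewed here), reviews can be ranked by a lower confidence bound on the helpful-vote ratio instead of the raw ratio, so that reviews with few votes are ranked cautiously rather than at the extremes:

import math

def wilson_lower_bound(helpful, total, z=1.96):
    # Lower bound of the 95% confidence interval on the helpful-vote ratio.
    if total == 0:
        return 0.0
    p = helpful / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)

reviews = [("old review", 153, 250), ("new review", 3, 3), ("unvoted review", 0, 0)]
for text, helpful, total in sorted(reviews, key=lambda r: -wilson_lower_bound(r[1], r[2])):
    print(text, round(wilson_lower_bound(helpful, total), 3))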
Based upon the disadvantages of the crowd-based voting method mentioned above,
researchers have attempted to automatically classify helpful reviews as soon as they are posted. Research such as that of
(Kim, et al., 2006) investigated automatic predictions of helpfulness of reviews that
considered users’ votes as ground-truth evaluation. The authors trained an SVM
regression model to learn the helpfulness function and ranked reviews according to
their output scores. However, assessing review helpfulness based on users' votes as ground truth is also quite limited due to several voting biases.
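In the spirit of that approach, the sketch below trains an SVM regressor on shallow review features, using helpful-vote ratios as ground truth; the features and numbers are invented for illustration and are not Kim et al.'s actual feature set:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy features per review: [word count, sentence count, avg sentence length, star rating]
X_train = [[120, 8, 15.0, 4], [15, 1, 15.0, 5], [300, 20, 15.0, 2], [60, 4, 15.0, 3]]
y_train = [0.85, 0.10, 0.70, 0.40]  # helpful-vote ratios used as ground truth

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X_train, y_train)
print(model.predict([[200, 12, 16.7, 4]]))  # predicted helpfulness of a new review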
2.1.1.2 Review content and style
A number of studies have investigated textual aspects of reviews, such as review content and writing style, to assess review quality. The
content of a review is the information it provides to readers, while writing style is more
related to word choice and language, as well as number of words or sentences and the
average length of sentences in the target texts. The underlying argument is that, owing to differences in writers' knowledge and language skills, reviews differ in quality.
Previous studies such as those of O'Mahony and Smyth (2009) and Liu, et al. (2008)
have shown that the linguistic style is a very good indicator of quality of the review.
O'Mahony and Smyth (2009) investigated a number of structural features that might
affect the writing quality of a text document, such as the ratio of uppercase and
lowercase, the number of complex words, and the number of sentences, etc. They
summarised several readability aspects based upon derived structural aspects for
assisting review quality modelling. Their proposed model made use of a set of reviews that had been labelled for helpfulness to train a classifier on these text features. After that, the classifier was employed to detect reviews having good writing quality.
Hoang et al. (2008) used a supervised classification model in which output scores are
considered as the quality of a given text document. The authors first manually labelled
experimental documents into three levels of document quality, defined as good, fair
and bad. They then trained the model on these aspects. The classifier, trained on an annotated corpus, then ranked documents according to their prediction scores. Their study found that formality - the writing style of the target document - is the most effective aspect for assessing the quality of the target document.
In another attempt to find out well-written reviews in online movie reviews, Liu, et al.
(2007) used a fixed set of tags to label different Parts-of-Speech (POS) words
contained in the reviews in order to determine the writing style and length. Liu, et al.
(2008) enhanced existing work on the product review helpfulness problem with a binary classification approach. The scholars explored review aspects such as
readability, informativeness and subjectiveness. The model learns aspects on the
informativeness of reviews, such as the following:
the number of words in the review
the average length of sentences and the number of sentences in the review
the number of sentences relevant to product features
Zhang and Zhang (2014) assumed that reviews that are highly readable tend to be more
helpful. On the contrary, reviews with multiple grammatical errors and misspelled
words are less helpful to users. They used the LanguageTool Java API to implement the language and grammar check for this task. They also employed the ratio of the number of errors in the review text to the number of sentences as the value of the errors-per-sentence feature.
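The shallow textual aspects surveyed above are straightforward to compute; the sketch below is illustrative only, with the grammar-error count passed in as a plain number (in practice it would come from a checker such as LanguageTool):

import re

def textual_features(review, n_grammar_errors=0):
    # Word/sentence counts, average sentence length, and errors per sentence.
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    words = review.split()
    n_sent = max(len(sentences), 1)
    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "avg_sentence_len": len(words) / n_sent,
        "uppercase_ratio": sum(c.isupper() for c in review) / max(len(review), 1),
        "errors_per_sentence": n_grammar_errors / n_sent,
    }

print(textual_features("The display is blurred sometimes. But the battery is great!", 1))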
Mudambi and Schuff (2010) utilised a review database collected from Amazon.com to
analyse and predict review quality. In testing their hypotheses, the authors found that the logarithm of word count is positively related to review helpfulness, and that review length has an impact on helpfulness voting. Their research indicated that a review
provides helpful information to aid in the decision-making process of a consumer and
that the helpfulness of a review increases as the word count increases.
Recently, Ghose and Ipeirotis (2011) indicated that reviews with more subjective
words are recognised as more helpful through opinion mining. Li and Vitányi (2013) analysed content-based aspects that directly influence product reviews' helpfulness. It was also found that reviews that were less abstract in content
and highly comprehensible, result in higher helpfulness. Wang et al. (2013) proposed
a technique called “SumView”, a web-based review summarisation system that
automatically extracts the most representative expression of customer opinion in the
reviews on various product features. Siering and Muntermann (2013) agreed with other
authors in revealing that reviews with information related to the quality of the product receive more helpfulness votes. Korfiatis et al.
(2012) investigated the directional relationship between the qualitative characteristics
of the review text, review helpfulness, and the impact of review helpfulness on the
review score. However, they found that review length has less effect on the helpfulness
ratio than review readability.
2.1.1.3 Reviewer reputation and social context
While shallow syntactic aspects from the text of reviews are mostly useful, review length is considered only weakly correlated with review quality. Therefore, some researchers have looked at social context aspects of reviewers to assess the quality of reviews. Social context aspects are information extracted from the reviewer's social
context, for example, the number of the reviews posted by this reviewer, the average
rating for this author, etc. (Zhang & Zhang, 2014). Results by (Liu, et al., 2008) and (Chen & Tseng, 2011) indicate that reputation (in combination with other features) is a very good indicator. (Liu, 2010, 2012) proposed to consider reviewer expertise, as it
was observed that reviewers who were familiar with particular movie genres were
likely to produce good reviews for movies in the same or similar genres. Accordingly, the proposed approach measured the similarity of a given movie (which can be represented by a set of genres) to all movies that had been reviewed by the same user.
O'Mahony and Smyth (2009) designed a system to learn helpfulness of hotel reviews
based on the reputation aspect of reviewers. Specifically, the system captured three
aspects of a user's reputation, including the total number of reviews written by the user; the standard deviation of review helpfulness over all reviews written by the user; and the ratio between those reviews that have accumulated at least five opinions (Missen et al., 2009) and the total number of reviews written by the user.
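These three reputation aspects are simple to derive from a reviewer's history; a minimal sketch, with illustrative field names, might look as follows:

from statistics import pstdev

def reputation_features(history):
    # history: list of (helpfulness_ratio, n_votes) pairs for one reviewer.
    n = len(history)
    return {
        "n_reviews": n,
        "helpfulness_stdev": pstdev([h for h, _ in history]) if n else 0.0,
        "ratio_with_5_votes": sum(v >= 5 for _, v in history) / n if n else 0.0,
    }

print(reputation_features([(0.9, 12), (0.7, 3), (0.8, 30)]))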
Furthermore, according to Lu et al. (2009), the quality of a reviewer plays an important
part in deciding the quality of the review that he or she writes. Therefore, from the
quality of reviewers, researchers can estimate the quality of their reviews easily. The
authors compared the standard deviation of quality between two reviews from the same author and two reviews from two different writers: two reviews by the same writer have a much lower standard deviation score than two reviews from different writers. The authors concluded that the quality of reviews is consistent when they are written by the same writer. In addition, another
social context aspect that could help to decide the quality of a reviewer – hence quality
of review - is the quality of their peers in the social network. Although this approach
of using social context is simple and applicable, it still has some drawbacks. In particular, social context information is not always available for all reviews. For instance, when a review is written by a new user, there is no information about their history or social network, so prediction based on social context aspects is no longer applicable.
Zhang and Zhang (2014) examined reviewers’ history in order to predict the
helpfulness of reviews written by them. The fact is that people with better reputations
in the online review community tend to provide more influential discussions and their
reviews tend to be more helpful. The authors collected the information from reviewers'
profile pages, such as the reviewer's ranking, the number of reviews written by the reviewer, and the percentage of helpful votes the reviewer received on previous reviews. The results showed that the percentage of helpful votes and the quantity of previous reviews contributed to higher performance in their model.
2.1.1.4 Review meta data
In addition to author-oriented aspects, researchers have also investigated developing effective systems based on review metadata aspects to determine review helpfulness.
In (Chen & Tseng, 2011), timeliness is the extent to which the information in a
review is timely and up-to-date. Old or duplicate reviews cannot reflect the value of a
product over time; thus, the quality of information is low, hence so is the quality of the
review. The believability aspect is also explored by the authors. Believability is the extent to which the review information is credible, or regarded as true. They therefore measure the deviation of a review's product rating from the average rating to assess its
believability. In (Liu, 2010, 2012), the author noticed a relationship between review timeliness and review helpfulness. Therefore, they compared the time a review was posted with the movie release time in order to measure the impact of timeliness on review helpfulness. Timeliness was shown to be a good predictor of the helpfulness of movie reviews, with review helpfulness seen to decline for older reviews. Chen and Tseng (2011)'s work above also assessed reviewers based on
their review histories. They considered that if a review has a high evaluation in a category, then the author of this review is credible in that category.
2.1.2 Review selection based on product features
As discussed above, most existing approaches to assess review helpfulness are
automated prediction mechanisms, which typically rank reviews based on their overall
score. However, these methods of selecting reviews have drawbacks. First, these
works require more time and human resources in order to label the data and train the
classification system. Furthermore, the top ranking reviews selected may contain
information not useful for users, i.e. a selected review can have redundant information,
but may have a low coverage of product features. For instance, users may find that all
the top selected reviews only talk about the "food" feature of a restaurant, but nothing
about other features such as "atmosphere", "drink" and "service". Therefore, the top results may cover a single viewpoint only, making it difficult for users to obtain a diverse set of opinions about the product. At the same time, the ranked list of reviews may not represent the different points of view (e.g. positive, negative, and neutral) on the reviewed product.
A product has more than one feature, and some features tend to be more important than others, which may affect the consumer's purchase decision process. A product feature is defined as an attribute describing a characteristic of a product that is interesting to customers. The overall ranking of a review is an important measure; however, different product features matter to different customers based on their needs. For instance, although a digital camera may be ranked highly overall, its "battery life" feature may still concern a customer. There has been a substantial amount of research focused on maximising the helpfulness of the selected reviews in order to overcome the drawbacks of review selection methods that are based only upon derived review quality. In the following part of this section, we introduce two directions in the research on useful review selection.
2.1.2.1 Select review based on product features
An early work in the stream focusing on product features is the approach investigated by Popescu and Etzioni (2007). The scholars introduced an unsupervised system to extract features and their associated opinions, which are then ranked by their strength and used to build a model of important product features.
Zhang and Zhang (2014) proposed a feature-based, product ranking approach. They
mined data of Digital Camera and Television reviews on amazon.com to identify
features in the product category and their associated subjective and comparative sentences in reviews. The authors then built a product graph for each feature and
mined the graph to determine the relative quality of products. Long, et al. (2014) pointed out that most review selection approaches are not customer-centric. In other words, some users may be interested in only certain features, so they only look
for reviews that have intensive discussion about these features. Under these
circumstances, their works focused on extracting reviews in which a single feature is
intensively discussed. In detail, given a specific feature from the review collection,
their model extracted a set of similar words related to that feature. Then they used
Kolmogorov complexity and information distance to calculate the amount of
information from these related word sets. The most specialised review on a feature was
the one with minimal information distance. However, one significant drawback of this method is that the similar words of the core feature words are found based on Google code length (Cilibrasi & Vitanyi, 2007), which finds the words that are most likely similar, or synonymous, to the core feature words. In some circumstances, the
similar words found by this tool are not related to certain contexts that the core features
have discussed. Take the feature “star” for example, when using Google distance to
find similar words to the feature “star”, words like “genius”, “lead”, “stellar” were
returned as similar words of the feature "star". The word "star", in the context of a restaurant, indicates the ranking of the restaurant and clearly does not have a similar meaning to "genius", "lead" or "stellar". Another shortcoming of this method is that the selected specialised reviews may or may not be helpful to users. As discussed above, reviews that cover more features tend to be more helpful (Tsaparas, et al., 2011). If we only focus on finding reviews that discuss one special feature, other high-quality reviews covering more than one feature can be missed. It is also the case
that professional users tend to write down their opinion on a group of related features
of the product.
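Since Kolmogorov complexity is uncomputable, practical systems approximate information distance; a common compression-based stand-in, shown here purely as an illustration and not as Long et al.'s exact formulation, is the normalised compression distance, under which a review sharing more regularity with a feature's related-word set scores a smaller distance:

import zlib

def ncd(x, y):
    # Normalised compression distance between two byte strings.
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    return (len(zlib.compress(x + y)) - min(cx, cy)) / max(cx, cy)

feature_words = b"battery charge power life recharge drain"
reviews = [b"the battery life is great and it charges fast",
           b"the lens produces sharp images in daylight"]
for r in sorted(reviews, key=lambda r: ncd(r, feature_words)):
    print(round(ncd(r, feature_words), 3), r.decode())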
2.1.2.2 Review corpus representation
Rather than scoring each individual review and selecting the top-k best reviews,
researchers in this field tried to select a set of reviews that collectively perform well
and represent the whole corpus. Tsaparas, et al. (2011) pointed out that coverage and
diversity of viewpoints are very important to users, together with review quality. They
formulated it as a maximum coverage problem. Their work mainly focuses on how to select a small but comprehensive set of reviews that best capture many different features of a product and that discuss them from many different viewpoints. They proposed greedy algorithms to extract highly rated reviews that satisfy these requirements. The outcomes are reviews having maximum information gain in terms of feature coverage and opinion coverage, which enable users to better evaluate the product under review. While their work does diversify the selected set of reviews, it fails to accurately reflect the sentiment polarity in the review collection.
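The core of such greedy selection can be sketched as follows; the data are toy values, and the actual algorithms of Tsaparas et al. also weigh review quality. At each step the review covering the most still-uncovered (feature, opinion) pairs is picked:

def greedy_select(reviews, k):
    # reviews: list of (review_id, set of (feature, opinion) pairs it covers).
    covered, chosen = set(), []
    for _ in range(k):
        best = max(reviews, key=lambda r: len(r[1] - covered), default=None)
        if best is None or not (best[1] - covered):
            break  # nothing new can be covered
        chosen.append(best[0])
        covered |= best[1]
    return chosen

reviews = [("r1", {("food", "+"), ("service", "-")}),
           ("r2", {("food", "+")}),
           ("r3", {("atmosphere", "+"), ("price", "-"), ("service", "+")})]
print(greedy_select(reviews, 2))  # picks r3 first, then r1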
Most closely related, Lappas, et al. (2012) argued that even when the selected set includes at least one positive and one negative opinion on each feature, it still does not reflect the proportion of positive and negative opinions on a feature in the original review corpus. Hence, they presented a novel approach to select a small
set of reviews that could cover all product features, at the same time preserving the
opinion distribution of the whole corpus. The outcome set of reviews is an accurate
statistical summary of the entire review collection, with respect to preserving the
proportion of opinion expressed for different features; hence it is easy for users to have
a vivid picture of the product without reading the whole corpus. However, one drawback of their work is that review quality is not considered, since the quality of each individual review in the selected subset is ignored.
Another novel approach, from a different angle, attempts to improve the review selection task based on the identification of product features and the relationships between features. Making use of a product ontology, called a product feature taxonomy, and the hierarchical relationships between features, the work of Tian et al. (2014) proposes generating a review model in order to select reviews. Given a collection of reviews, the model estimates the quality (the diversity and comprehensiveness of a certain feature) and then ranks the reviews based on user-concerned criteria. Their experiments promise further work in review selection.
2.2 TOPIC MODELLING
The study of topic modelling arises from the need to analyse, represent and summarise the contents of large, unstructured text collections, with the aim of capturing their latent semantics. Latent semantic analysis (LSA) was an early attempt to transform the high-dimensional vector space representing text documents into a linear subspace by applying Singular Value Decomposition (SVD) (Deerwester et al., 1990). This subspace can be called a latent semantic space because it presents sophisticated features that help to capture the latent semantics of documents, such as synonymy and polysemy. However, LSA has some shortcomings because of its unsatisfactory statistical foundation and computational complexity. Hofmann (1999) overcame some deficiencies of LSA by introducing probabilistic latent semantic analysis (pLSA), a generative model. Multinomial random variables representing topics are the mixture components in the pLSA model, and each document is therefore a mixture of latent topics.
Figure 4. Probabilistic Latent Semantic Model.
The joint probability of an observed word-document pair $(d, w)$ is defined by a mixture
process:
$$P(d, w) = P(d)\,P(w \mid d), \qquad P(w \mid d) = \sum_{z \in Z} P(w \mid z)\,P(z \mid d)$$
(Hofmann, 1999)
Hofmann's work shows that pLSA outperformed LSA and was viewed as a significant
step toward probabilistic modelling of text. However, pLSA still has issues related to its
inability to provide a generative probabilistic model for the mixing proportions of these
topics.
Recently, an improved probabilistic model for analysing large electronic archives of
documents has been employed, based on “topic”, called “topic modelling”. The aim of
topic modelling is to analyse and discover the topical patterns that run through a given
corpus of documents and record their evolution over time. Blei and McAuliffe (2008)
define topic and topic modelling as follows:
“A topic is a probability distribution over terms in a vocabulary. Informally, a
topic represents an underlying semantic theme; a document consisting of a
large number of words might be concisely modelled as deriving from a smaller
number of topics. Such topic models provide useful descriptive statistics for a
collection, which facilitates tasks like browsing, searching, and assessing
document similarity.”
In essence, topic modelling analyses the words in the original documents to discover the
hidden themes running through them, as well as how those themes are related to each
other. Topic modelling has emerged as a principal unsupervised learning method, since
it relies merely on the analysis of the original texts, without requiring document labels
or human annotation.
2.2.1 Latent Dirichlet Allocation
The simplest, yet most well-known, probabilistic topic model to have emerged recently
is Latent Dirichlet Allocation (LDA), proposed by Blei et al. (2003b). The LDA
probabilistic topic model is based on the assumption that every document is generated
by a mixture of topics, and each topic is defined as a multinomial distribution over a
fixed vocabulary of terms. The outputs of the model are the assignment of words in
documents to topics (clusters) and the distribution of topics over documents (document
proportions).
Let $D = \{d_1, d_2, \ldots, d_M\}$ be a collection of $M$ documents. Given $V$ topics, the
probability of a word in a given document is defined as:
$$P(w_{d,n}) = \sum_{j=1}^{V} P(w_{d,n} \mid z_{d,n} = Z_j) \times P(z_{d,n} = Z_j)$$
where $w_{d,n}$ denotes the $n$th word in document $d$, $z_{d,n}$ denotes the topic assignment for
word $w_{d,n}$, $Z_j$ is the $j$th topic, and $z_{d,n} = Z_j$ means that the word $w_{d,n}$ is assigned to
topic $Z_j$. The topic model generated by LDA consists of topic representations at
collection level and topic distributions at document level. At collection level, let $\phi_j$ be a
multinomial distribution over words for topic $Z_j$, defined as:
$$\phi_j = (\varphi_{j,1}, \varphi_{j,2}, \ldots, \varphi_{j,n}), \qquad \sum_{k=1}^{n} \varphi_{j,k} = 1$$
where $\varphi_{j,k}$ is the probability of the $k$th word for topic $Z_j$.
At document level, let $\theta_d$ be a probability distribution over topics, defined as:
$$\theta_d = (\vartheta_{d,1}, \vartheta_{d,2}, \ldots, \vartheta_{d,V}), \qquad \sum_{j=1}^{V} \vartheta_{d,j} = 1$$
where $\vartheta_{d,j}$ is the probability of topic $j$ for document $d$.
As presented in the graphical model of LDA (Figure 5), there are three latent (hidden)
topic structures: the topics ($\phi_j$), the per-document topic distributions ($\theta_d$) and the
per-document, per-word topic assignments ($z_{d,n}$), which need to be computed from the
observed words $w_{d,n}$. In other words, the purpose is to answer the question, "what is
the latent structure that is likely to have generated the observed documents?" LDA,
therefore, can be considered as "reversing" the generative process: it tries to optimise
the posterior distribution of the latent variables given the document collection. LDA
overcomes the limitation of pLSA since the per-document topic proportion is computed
based on a latent random variable, the Dirichlet parameter, which is randomly drawn
from a Dirichlet distribution.
Figure 5. A graphical model representation of the latent Dirichlet allocation (LDA).
Nodes denote random variables; edges denote dependence between random variables.
Shaded nodes denote observed random variables; unshaded nodes denote hidden
random variables. The rectangular boxes are “plate notation”, which denote replication
(Blei, et al., 2003a)
For approximating the posterior distributions of the latent variables in LDA, several
statistical inference techniques have been developed to infer these distributions from
large text corpora, such as expectation propagation (Minka & Lafferty, 2002), mean
field variational inference (Blei, et al., 2003a), collapsed variational inference (Teh et
al., 2006), and Gibbs sampling (Steyvers & Griffiths, 2007). Among these techniques,
Gibbs sampling, based on Markov chain Monte Carlo, has become a well-known
technique for parameter estimation in LDA in recent years.
The important contribution of LDA is its ability to represent and summarise a large
text collection in shorter, meaningful forms, namely topics and document
representations. Topics are represented by a multinomial distribution over words, where
each word assigned to a topic has a different weight, indicating which words are
important to which topics. A document is represented by a probability distribution over
topics, where each topic assigned to the document has a different weight, indicating
which topics are important to the document. Moreover, LDA can be adapted and
extended as a module in more complex models for more complicated goals. Therefore,
LDA has quickly become one of the most popular probabilistic techniques for topic
modelling.
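To make the two outputs concrete, the short sketch below fits an LDA model with the
gensim library and reads off the topic-word distributions ($\phi_j$) and document-topic
distributions ($\theta_d$). This is purely an illustration under assumed toy data; the corpus,
topic number and hyperparameters here are placeholders, and gensim is our choice for
the example rather than a tool prescribed by this thesis.

```python
# Illustrative sketch only: fitting LDA with gensim and reading off
# the topic-word distributions (phi_j) and the document-topic
# distributions (theta_d). Corpus and settings are toy placeholders.
from gensim import corpora, models

docs = [["display", "screen", "resolution", "bright"],
        ["battery", "life", "charge", "battery"],
        ["display", "contrast", "screen", "battery"]]

dictionary = corpora.Dictionary(docs)            # vocabulary
bow = [dictionary.doc2bow(d) for d in docs]      # bag-of-words corpus

lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                      passes=10, random_state=1)

# Collection level: phi_j, a distribution over words for each topic Z_j.
for j in range(lda.num_topics):
    print("topic", j, lda.show_topic(j, topn=4))

# Document level: theta_d, a distribution over topics for each document.
for d, doc_bow in enumerate(bow):
    print("doc", d, lda.get_document_topics(doc_bow))
```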
2.2.2 Pattern based topic modelling
One common problem with LDA is that word-based or term-based topic
representations may not semantically represent documents well, which makes the topics
hard to understand. Gao, et al. (2013) proposed a pattern based topic model by applying
a pattern mining technique on top of traditional LDA, so that a topic can be represented
by a list of patterns instead of single words. As a pattern carries more specific meaning
than a single word, a pattern based topic can provide better semantic meaning than an
LDA topic.
Gao et al. (2013) proposed a two-stage approach that combines statistical topic
modelling and classical data mining techniques to represent the semantic content of
documents and improve the accuracy of topic modelling output on large document
collections. In the first stage, traditional LDA is applied to generate topic
representations at collection level and document level. These two representations are
used to build a word-topic assignment, which also serves as a transactional dataset for
pattern mining in the next stage. In the second stage, pattern based topics are generated
from this transactional dataset by applying a pattern mining technique.
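As a minimal sketch of the second stage, assume the word-topic assignments from the
first stage have already been grouped into per-document transactions for a single topic;
a frequent pattern miner (here mlxtend's apriori, our choice for illustration rather than
the implementation used by Gao, et al.) then yields the patterns that represent that
topic.

```python
# Illustrative sketch of the second stage of a pattern based topic model:
# mining frequent patterns from the word-topic assignments of one topic.
# The transactions are hypothetical; each holds the words that LDA
# assigned to this topic within a single document.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

topic_transactions = [
    ["display", "screen", "resolution"],
    ["display", "screen"],
    ["display", "resolution", "brightness"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(topic_transactions).transform(topic_transactions),
                      columns=te.columns_)

# Frequent patterns (with their supports) now represent the topic
# instead of single words.
patterns = apriori(onehot, min_support=0.6, use_colnames=True)
print(patterns)   # e.g. {display}, {screen}, {display, screen}, ...
```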
2.2.3 Applications of topic modelling
Topic modelling is considered a state-of-the-art technique and has been used in diverse
fields, mainly in sentiment analysis and information retrieval.
Application of Topic Modelling on Sentiment Analysis
Mei et al. (2007) proposed a probabilistic model for analysing the mixture of topics and
sentiment on Weblogs. Their model, named Topic-Sentiment Mixture (TSM), combines
a pLSA model and a sentiment model in order to capture latent topical features and
their sentiments simultaneously across different Weblog collections.
Titov and McDonald (2008) argued that standard models such as LDA are only
suitable for discovering topics associated with global properties of documents (e.g., the
brand, product type or product name) rather than rateable features. The authors then
extended both pLSA and LDA by building a Multi-grain LDA (MG-LDA) model which
includes both a global model and a local model. While the global model identifies
global terms at the document-level context, the local model discovers rateable features
by using a sliding text window over the text. Although the model is successful in
extracting rateable features, it cannot separate feature words from sentiment words. Lin
and He (2009) extended LDA by building a probabilistic modelling approach, named
the joint sentiment-topic model (JST), to classify sentiment at document level based on
topic modelling, but the distinction between feature words and sentiment terms still
cannot be achieved.
Brody and Elhadad (2010) introduced an unsupervised method using topic modelling
to extract features and then analyse feature-specific opinion words. Although the
opinion words and extracted features are separated, the sentiment words are discovered
outside of topic modelling by analysing only adjectives. To combat this shortcoming,
Zhao, et al. (2010) proposed a Maximum Entropy and LDA hybrid approach, which can
automatically separate aspect and opinion words. The method is an extension of LDA
that uses two indicator variables to distinguish between opinion words and feature
words. The MaxEnt component uses a small number of training sentences to learn from
POS tags, helping to separate opinion words from feature words. Mukherjee and Liu
(2012) proposed joint models (SAS and ME-SAS) to extract and categorise feature
terms automatically. Similar to Zhao, et al. (2010), they also used Maximum Entropy to
separate feature and sentiment words, but they additionally used seeds from users as
guidance for the inference process. Their models are thus semi-supervised.
Application of Topic Modelling on Information Retrieval
Gao, et al. (2013) proposed a pattern based topic model to represent text documents.
The idea is that patterns provide better semantic meaning than single words. Therefore,
they combined pattern mining with traditional topic modelling to represent topics by
patterns instead of single words. The results show that their proposed model generates
discriminative and semantic representations for modelling topics and documents.
Application of Topic Modelling on Review Recommendation
Another line of work using topic modelling aims at predicting the top-k best reviews
based on user-rated reviews. Krestel and Dokoohaki (2011) proposed a method to
model reviews based on LDA and generate an adequate ranking based on Kullback-
Leibler divergence. Using LDA, each review is modelled as a mixture of topics, and
those latent topics are represented as a "list of multigrams with a probability for each
multigram indicating the membership degree within the topic". After discovering the
latent topics, star ratings are combined in order to transform the topic model into a
topic ranking model. Although their study can be useful for providing personalised
review recommendations to users, it was only carried out at a small scale, so the
accuracy of the model is limited. Lakkaraju et al. (2011) proposed a
series of probabilistic models (FACTS, CFACTS and CFACTS-R) for feature-based
sentiment analysis, which also help to predict review ratings. These joint models are
based on the principle of dependencies between semantic topics and syntactic classes,
similar to the HMM-LDA model proposed in Griffiths et al. (2004). Their methods can
effectively identify latent features, sentiment topics and associated ratings.
Moghaddam and Ester (2011) extracted features and feature-based ratings by
proposing three probabilistic graphical models. The first two models, which extend
pLSA and standard LDA, generate a rated feature summary. The authors then assume
that features and ratings are interdependent and introduce an Interdependent LDA
(ILDA) model to extract product features and predict their ratings at the same time.
In summary, topic modelling is a powerful and flexible modelling tool because of its
modularity and extensibility. However, it also has some drawbacks. One issue is the
difficulty of detecting locally frequent features, because topic modelling puts high
weight on popular or common words across a large collection of documents. Topic
modelling therefore shares the same issue as previous approaches: success in finding
global features but failure in finding local features. Secondly, topic modelling needs
large-scale data and considerable parameter tuning, and is thus only suitable for
large-scale projects. Nevertheless, its advantages cannot be denied, and research on
topic modelling has kept increasing in recent times.
2.3 SUMMARY
This chapter has presented an in-depth review of a number of research works related to
this study. In the field of review selection, most current studies have been switching
from review selection based on structural textual characteristics, such as writing style,
grammar and author reputation, to review selection based on product features. Feature
extraction and the relationships among features are promising new criteria for
understanding online review content, which assists in effectively selecting helpful
reviews. As always, ambiguity is a big challenge in online reviews because of polysemy
and synonym issues in natural language. The topic model, a probabilistic approach, has
been popularly used for discovering the latent semantic themes of a document
collection. Because of its wide application in language semantic analysis, the topic
model is expected to enhance the task of automatic textual analysis of online reviews.
In the next chapter, we will discuss how to make use of topic modelling to represent
online reviews more effectively and alleviate their ambiguity problem.
Chapter 3: Main Feature Selection and
Related Feature Selection
As discussed in the Research Problem of Chapter 1, understanding and analysing
review content faces many difficulties in online review research. The first reason is that
reviews are unstructured, which makes them hard to analyse and comprehend
automatically. Secondly, polysemy and synonym issues cannot be avoided in online
reviews, since writers have the freedom to write whatever they want. As features are the
main topics of online reviews, the content of online reviews is essentially information
about the features of the product; dealing with the textual content is, in fact, dealing
with the features of the product. However, features by themselves deliver little meaning.
The meaning of each feature can only be understood through the related words around
it. For example,
“This model has an excellent display. Resolution is better than even other more
expensive models. The contrast and brightness of the screen are also great since this
makes our eyes feel comfortable for a long time looking at the display.”
Words like "excellent", "great", "resolution", "contrast", "brightness" and "screen" are
related words of the target feature "display". Those related words contribute to the
detailed discussion of the feature "display" and help to build up detailed information
about that feature of the product. Without those related words, readers cannot
understand what is being discussed about "display". Therefore, successfully extracting
those related words plays a crucial role in understanding review content. To the best of
our knowledge, no effective work has been done to find those related words.
In this chapter, we propose a novel method to identify the words related to a feature.
We first discuss methods of identifying the main features of products in Section 3.1,
and then introduce new methods to extract the words related to the main features in
Section 3.2.
3.1 MAIN FEATURE SELECTION
In this section, a pattern mining technique is used to extract the main features of a
product. Section 3.1.1 first defines the two types of features of a product or business:
main features and related features. The method of extracting the main features of a
product is then discussed in more detail in Section 3.1.2.
3.1.1 Main features and related features
In online reviews, users normally use different words to refer to the same concept or
aspect of a product. For example, "display" and "screen" are used interchangeably to
refer to the same concept. In addition, there are sub-features that are used with the
main feature to describe the concept in further detail. For example, "resolution",
"contrast" and "brightness" are normally used by reviewers to more deeply analyse the
aspect "display" of the camera product in the example above. According to Hu and Liu
(2004), features are the most frequently occurring nouns, because they are mentioned
often in online reviews. Therefore, words such as "display", "screen", "resolution",
"contrast" and "brightness" are features of the product because they occur frequently.
It is noticeable that they all belong to the same concept group, because they describe
the same aspect of the product. In this thesis, we define two kinds of features: the main
feature and the related features of the main feature.
Main feature and related features: Given a group of feature words representing one
concept/aspect of the product, the feature having the most abstract meaning and most
frequently appearing in the online reviews is considered the main feature of the group.
All the remaining feature words in the group are considered related features of the
main feature.
If features in a group are at the same level of abstraction, the feature mentioned with
higher occurrence frequency in the online reviews is chosen as the main feature. Table 1
shows an example of the main feature "display" of a camera and its related features.
The features "display" and "screen" are clearly more abstract than the other features in
the group, so they are the potential main features. However, it is hard to decide whether
"display" is more abstract than "screen" or vice versa, because they seem to be at the
same level of abstraction (level 2). In this example, we assume that the occurrence
frequency of "display" is higher than that of "screen" in the review collection, so
"display" is the main feature of the camera product. All remaining features, including
"screen", "resolution", "contrast" and "brightness", are then related features of the
main feature "display".
Table 1. Main Feature "Display" and Its Related Features

Main feature (the most abstract feature)   Level 1   Display
Synonym features                           Level 2   Display, Screen
Sub-features                               Level 3   Resolution, Contrast, Brightness
In the next section, the method of extracting the main features of a product using a
pattern mining technique is discussed.
3.1.2 Pattern mining based main feature selection
In order to extract the main features in the reviews, the method proposed by Hu and
Liu (2004) is first employed. However, this work improves Hu and Liu's method by
choosing only opinion sentences, instead of all sentences in the review dataset, to
prepare the database transactions. The extracted features are then manually analysed
and grouped, where the words in each group describe one concept or aspect of the
product. In each group, the most abstract word is chosen and considered the main
feature of the product. In general, the following steps identify the main features of a
product in the review collection.
Step 1: Online Reviews Part-of-Speech Tagging (POST)
Part-of-Speech Tagging (POST) is a technique in Natural Language Processing (NLP)
used to identify the part of speech, or word form, of each word in a text corpus, such as
noun, pronoun, verb, adjective or adverb (Manning & Schütze, 1999). Since frequent
nouns are potential features, POST can help to identify the nouns in the reviews. In
this step, the identified nouns are then processed with some other NLP techniques, such
as approximate string matching (Baeza-Yates & Navarro, 1998) and word stemming.
Approximate string matching helps to deal with word variants and misspellings; for
example, the word "view-finder" will be converted to "viewfinder", and the word "zom"
will be converted to "zoom". Meanwhile, word stemming produces the root form of a
word; for instance, "len", "lens" and "len's" are grouped into "lens". Together,
approximate string matching and word stemming ensure that all identified nouns in the
review collection can be matched with each other. This matching of words is essential
for the pattern mining in the next step.
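The sketch below illustrates Step 1 with common off-the-shelf tooling: NLTK for
tokenisation, POS tagging and stemming, and Python's difflib as a simple stand-in for
approximate string matching. The library choices and the small vocabulary are
illustrative assumptions, not part of the proposed method.

```python
# Illustrative sketch of Step 1: extract nouns via POS tagging, then
# normalise them with word stemming and approximate string matching.
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import difflib
import nltk
from nltk.stem import PorterStemmer

sentence = "The zom and the view-finder of this camera are great"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)                       # [(word, POS tag), ...]
nouns = [w.lower() for w, tag in tagged if tag.startswith("NN")]

stemmer = PorterStemmer()
nouns = [stemmer.stem(w) for w in nouns]            # e.g. "lenses" -> "lens"

# Approximate string matching against a vocabulary of seen nouns,
# to repair variants and misspellings such as "zom" -> "zoom".
vocabulary = ["zoom", "viewfinder", "lens", "display", "camera"]
normalised = [difflib.get_close_matches(w, vocabulary, n=1, cutoff=0.8) or [w]
              for w in nouns]
print([m[0] for m in normalised])
```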
Step 2: Pattern mining on online reviews
Pattern mining is popularly applied in text mining, where a review or document is
normally used as a transaction in a transactional database. In our case, features are
mostly expressed and commented on at sentence level, e.g. the feature "picture" is
mentioned in the review sentence "the picture quality of this camera is great".
Therefore, in order to extract those features, transactions at sentence level are more
suitable than transactions at review or document level. Hu and Liu (2004) proposed
preparing a transaction database in which all sentences of the review collection are
taken into consideration. However, this has some limitations. Although features are
frequently mentioned, not all sentences in a review discuss them; by observation, the
number of sentences in a review that do not comment on any product feature can even
exceed the number of sentences that do. We term those sentences with no feature
expression noisy sentences, because they cannot contribute to the feature extraction of
a product and their inclusion in the transactional database is unnecessary. Their
presence makes feature words stand out less, because it increases the frequency of
common words in the database transactions. The example below illustrates this point.
Example: “We go to the Digital-To-End store to browse all the products. The
store is located on level 4 of the block. After browsing around, we finally see
the label of the camera on the shelves. The first attraction of the display is
huge comparable to other models. I love the big screen and weight is also not
very heavy. The staff there give me some descriptions about the product and I
definitely love it. However, after using it for a while, I recognise that the
battery is not good at all. …”
If each sentence's nouns are treated as a transaction, nouns like "store", "product",
"level", "block", "label", "shelves", etc. are included in the transaction database. In
fact, these words are general or global words, which are clearly not features of the
product. Including such sentences in the transaction database makes the total number
of transactions high, which decreases the support values of the product features. As a
result, "true" features have less chance of being successfully extracted when applying
the pattern mining technique. Therefore, we believe that filtering out noisy sentences
can help to increase the performance of feature extraction.
As shown in the example above, the user uses the opinion word "huge" to describe the
feature "display" in sentence 4, the opinion words "big" and "heavy" for the feature
"weight" in sentence 5, and the opinion word "good" for the feature "battery" in the
last sentence. Such sentiment words are a signal of potential product features in
reviews, since the reviewer is complimenting or criticising features of the product. We
believe that sentences containing sentiment words are more likely to contain product
features. Therefore, we use only those sentiment sentences to prepare the transaction
database. This step not only significantly reduces the number of noisy transactions but
also reduces the size of the transaction database. Table 2 shows the preparation of a
transaction database.
Table 2. Transactional Database

Transaction ID                 Items
Transaction 1 / Sentence 1     Weight, Size, Shutter
Transaction 2 / Sentence 2     Lens, Weight
Transaction 3 / Sentence 3     Lens, Display
...                            ...
Transaction n / Sentence n     Display, Resolution, Brightness
Let $TD = \{T_1, \ldots, T_n\}$ be a transaction database of $n$ transactions generated from the
review collection. A pattern mining technique is then applied to $TD$ with a minimum
support threshold $\sigma$ to generate frequent patterns. In this thesis, only single-word
features are considered, so only size-one patterns are kept as the list of potential
features. Those potential features are then manually grouped so that all feature words
in each group represent the same aspect or concept of the product. For each group, the
main feature of the product is selected by choosing the most abstract word in the
group. Note that during the process of main feature extraction, only the grouping and
the selection of main features need some manual work; all other tasks are automatic.
Let $R = \{r_1, r_2, \ldots, r_M\}$ be a set of reviews and let the vocabulary $W = \{w_1, w_2, \ldots, w_n\}$
denote the set of words occurring in $R$; each review $r \in R$ is a set of words from $W$,
i.e., $r \subseteq W$. Let $F_R = \{f_1, f_2, \ldots, f_m\}$ denote the set of $m$ main features found in the
review collection after applying pattern mining to the review transaction database.
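Because only size-one patterns are kept, Step 2 reduces to counting supports over the
opinion-sentence transactions. A minimal sketch, with hypothetical transactions and
threshold:

```python
# Illustrative sketch of Step 2: treat each opinion sentence as a
# transaction of nouns and keep size-one frequent patterns as
# candidate main features.
from collections import Counter

transactions = [              # hypothetical opinion-sentence transactions
    {"weight", "size", "shutter"},
    {"lens", "weight"},
    {"lens", "display"},
    {"display", "resolution", "brightness"},
    {"display", "screen"},
]

min_support = 0.4             # minimum support threshold sigma
n = len(transactions)

counts = Counter(w for t in transactions for w in t)
# support(w) = fraction of transactions containing w
frequent = {w: c / n for w, c in counts.items() if c / n >= min_support}
print(frequent)               # e.g. {'display': 0.6, 'lens': 0.4, ...}
```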
3.2 DISCOVERY OF RELATED WORDS OF MAIN FEATURES
As discussed at the beginning of this chapter, main features and related words can
represent the content of online reviews. In Section 3.1, the method of selecting the
main features of the product is discussed. In this section, a new approach to identify
related words to the identified main features is proposed. Related words of a main
feature are words that are usually associated with the main feature in online reviews.
Those related words provide information about the main feature and make the main
feature more understandable. In general, two types of related words are defined that
can deliver information about the main feature to the reader: sentiment words and
related feature words. The definition and identification method for each type of related
word are discussed in Sections 3.2.1 to 3.2.4, and the final set of related words is
assembled in Section 3.2.5.
3.2.1 Sentiment words Identification
Reviewers normally use adjectives to compliment or criticise the main features of a
product. For example, in Figure 6, words such as "amazing", "cosy", "chill", "modern"
and "welcoming" are used to describe the main feature "atmosphere" and its related
features "air" and "ambience". These words contribute to the user's opinion about
"atmosphere" and should be considered relevant to that feature. Following de Marneffe
et al. (2006), these words can be called sentiment/dependent words because they have a
grammatical dependency relationship with the main feature. Sentiment words are in
fact widely utilised in the field of sentiment mining of online reviews, where they help
to determine the sentiment directions of product features (Liu, 2010; Liu, et al., 2007;
Popescu & Etzioni, 2007; Scaffidi et al., 2007). Since those sentiment words give
certain information about the main feature, they are clearly related to it. Therefore, the
first kind of related words to the main feature is defined as the related sentiment words.
Figure 6. Similar and sentiment words of Feature “atmosphere”
Because sentiment words normally convey the attitude of the reviewer towards the main
feature, they usually stand near it. Therefore, adjectives within a distance threshold $\sigma$
of the main feature are chosen as the sentiment words of that feature. For a review
collection $R$ and a main feature $f \in F_R$, let $SW_R(f)$ and $SW_{sen}(f)$ denote the set of
sentiment words for the main feature $f$ in $R$ and in an individual sentence $sen \in R$,
respectively; then $SW_R(f) = \bigcup_{sen \in R} SW_{sen}(f)$.
In order to measure the degree of relatedness of each word $w \in SW_R(f)$ to the main
feature $f$, the distances between the sentiment word $w$ and $f$ in the sentences of the
online reviews are used. Let $S_s(w, f) = \{sen_1, sen_2, \ldots, sen_n\}$ be the set of $n$ sentences
in the review collection that contain both $w$ and $f$. The distance between $w$ and $f$ in an
individual sentence $sen \in S_s(w, f)$ is calculated as the number of words between $w$ and
$f$ in $sen$, denoted $distance_{sen}(w, f)$. The distance between $w \in SW_R(f)$ and $f$ is then
measured as the average distance from $w$ to $f$ over the sentences in $S_s(w, f)$:
$$Distance_R(w, f) = \frac{\sum_{sen \in S_s(w,f)} distance_{sen}(w, f)}{|S_s(w, f)|}$$
The weight, or relatedness, of sentiment word $w$ to the feature $f$ for $R$ is calculated as
follows:
$$WeightS_R(w, f) = \frac{1}{Distance_R(w, f)} = \frac{|S_s(w, f)|}{\sum_{sen \in S_s(w,f)} distance_{sen}(w, f)} \qquad (1)$$
Let $SW_R(F)$ denote the sets of related sentiment words for the main features of the
product, then $SW_R(F) = \bigcup_{f \in F_R} SW_R(f)$. Let $WeightS_R(F)$ denote the sets of
corresponding weights of the word sets in $SW_R(F)$, then
$WeightS_R(F) = \bigcup_{f \in F_R} WeightS_R(f)$. Algorithm 1 illustrates the method of
generating $SW_R(F)$ and $WeightS_R(F)$.
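Since Algorithm 1 itself is not reproduced here, the following sketch shows one possible
reading of it: collect adjectives within $\sigma$ words of the main feature, then weight each by
the inverse of its average distance to the feature, as in Equation (1). Tokenisation is
simplified and the adjective set is assumed given (in practice it would come from POS
tagging).

```python
# Illustrative sketch of Equation (1): weight a sentiment word by the
# inverse of its average distance to the main feature over sentences.
from collections import defaultdict

def sentiment_weights(sentences, feature, adjectives, sigma=5):
    """sentences: tokenised sentences; adjectives: known adjective set
    (assumed given here; in practice obtained via POS tagging)."""
    distances = defaultdict(list)
    for sen in sentences:
        if feature not in sen:
            continue
        f_pos = sen.index(feature)
        for i, w in enumerate(sen):
            if w in adjectives and 0 < abs(i - f_pos) <= sigma:
                distances[w].append(abs(i - f_pos) - 1)  # words in between
    weights = {}
    for w, d in distances.items():
        avg = sum(d) / len(d)              # Distance_R(w, f)
        weights[w] = 1.0 / max(avg, 1e-9)  # guard: adjacent words give avg 0
    return weights

sents = [["the", "display", "is", "excellent"],
         ["a", "great", "and", "bright", "display"]]
print(sentiment_weights(sents, "display", {"excellent", "great", "bright"}))
```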
3.2.2 WordNet based method to find similar feature words
In Section 3.2.1, the sentiment words, which are the first type of related words, were
identified. In this section, the second type of related words, similar feature words, is
identified. Reviewers may use different words to refer to the main feature of a product
in their reviews. For instance, users can use words like "picture", "image", "photo",
"photograph" or "pic" to refer to the main feature "picture" of a camera, or the words
"atmosphere" and "ambiance" for a restaurant's atmosphere, as described in Figure 6.
Long, et al. (2014) proposed using Google Distance, an external resource, to find those
words. In addition to Google Distance, WordNet, developed by a group of psychologists
and linguists at Princeton University beginning in 1985 (Miller & Fellbaum, 1998), is
another popular external lexical network for finding concepts related to a target
concept. WordNet can be seen as an ontology for natural language terms that contains
around 100,000 terms organised into taxonomic hierarchies. It stores information
about words belonging to four parts of speech (nouns, verbs, adjectives and adverbs),
structured as a network of nodes (synsets, or sets of synonyms) and links (relationships
between two synsets). The basic relationship between the terms of the same synset is
synonymy. Moreover, different synsets are linked by various semantic relations such as
antonymy (opposite), hypernymy (superconcept) / hyponymy (subconcept) (also called
the Is-A hierarchy or taxonomy), and meronymy (part-of) / holonymy (has-a). In this
thesis, we use WordNet to find feature words similar to the main feature, for two
reasons. First, WordNet has been recognised for its practical value in various text
mining and natural language processing tasks (Liao et al., 2010). Secondly, in addition
to synonyms, WordNet can help to identify hyponyms, or sub-feature words, which are
sub-concepts of the target concept. As finding sub-features of main features is also a
focus of this study, WordNet is expected to identify both synonym features and
sub-features of the main feature in the review collection.
Let $WW_R(f)$ denote the set of words similar to the main feature $f$, $WW_R(f) \subseteq W$;
words in $WW_R(f)$ are the synonyms and sub-concepts of $f$ found using WordNet.
Calculating the weight of related words found by WordNet
Words in $WW_R(f)$ have different degrees of similarity, or different weights, with
respect to the main feature $f$. This study proposes using information content similarity
metrics to evaluate the similarity of each word $w$ to the main feature $f$. The similarity
between two words is related to how much information they have in common.
Information content was first proposed by Resnik (1995) to calculate the similarity
between concepts (synsets in the WordNet taxonomy) by attaching probabilities to
concepts in the WordNet hierarchy. The author first defined the probability of a
concept $c$, $P(c)$, as the probability of encountering an instance of $c$. Let
$sub\_concepts(c)$ be the set of all concept words that are sub-concepts of $c$. The
occurrence frequency $freq(c)$ of the concept $c$ is the cumulative sum of the occurrence
frequencies of the words in $sub\_concepts(c)$:
$$freq(c) = \sum_{w \in sub\_concepts(c)} count(w)$$
The probability of concept $c$ is obtained by normalising by the number of concept
(noun) occurrences observed in the corpus, $N$:
$$P(c) = \frac{freq(c)}{N}$$
Resnik (1995) then quantified the information content of a concept $c$ as the negative
log of its probability, $IC(c) = -\log P(c)$. The argument here is that the more two words
have in common, the more similar they are. The commonality of two words can be
represented by their most informative subsumer, which is the lowest common subsumer
(LCS) of the two concepts in the WordNet hierarchy. The information content of the
most informative subsumer of two concepts $c_1$ and $c_2$ is defined as their similarity
score:
$$sim_{resnik}(c_1, c_2) = -\log P(LCS(c_1, c_2))$$
where $LCS(c_1, c_2)$ is the lowest node in the hierarchy that subsumes both $c_1$ and $c_2$.
Lin (1998) improved Resnik's similarity measure with the argument that the similarity
between two words depends not just on what they have in common, but also on the
differences between them. In other words, the similarity between two concepts $c_1$ and
$c_2$ is measured by the ratio between the amount of information needed to state the
commonality of $c_1$ and $c_2$ and the information needed to fully describe $c_1$ and $c_2$. Lin
(1998) altered Resnik's measure and proposed an improved version of the formula as
below:
$$sim_{Lin}(c_1, c_2) = \frac{IC(common(c_1, c_2))}{IC(description(c_1, c_2))}$$
or
$$sim_{Lin}(c_1, c_2) = \frac{2 \log P(LCS(c_1, c_2))}{\log P(c_1) + \log P(c_2)}$$
In this study, the measure proposed by Lin (1998) is applied to calculate the similarity
score, or weight, of the words in $WW_R(f)$ with respect to the feature $f$. Let
$WeightW_R(f)$ denote the set of weights of the words in $WW_R(f)$ with respect to $f$. The
weight of a word $w \in WW_R(f)$ is calculated as:
$$WeightW_R(w, f) = \frac{2 \log P(LCS(w, f))}{\log P(w) + \log P(f)} \qquad (2)$$
Let $WW_R(F)$ denote the sets of related WordNet (similar) words for the main features
of the product, then $WW_R(F) = \bigcup_{f \in F_R} WW_R(f)$. Similarly, let $WeightW_R(F)$ denote
the sets of corresponding weights of the word sets in $WW_R(F)$, then
$WeightW_R(F) = \bigcup_{f \in F_R} WeightW_R(f)$. Algorithm 2 illustrates the method of
identifying $WW_R(F)$ and calculating $WeightW_R(F)$.
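The Lin measure of Equation (2) is available off the shelf in NLTK's WordNet
interface, which ships precomputed information content files. The sketch below uses the
Brown corpus IC file as a stand-in for corpus statistics and takes the best score over
noun sense pairs, a word-level heuristic of ours, since Equation (2) is defined on
concepts rather than words.

```python
# Illustrative sketch of Equation (2): Lin similarity over WordNet,
# using NLTK's precomputed Brown-corpus information content file.
# Requires: nltk.download("wordnet"); nltk.download("wordnet_ic")
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")

def lin_weight(feature_word, candidate_word):
    """Word-level weight: best Lin score over noun sense pairs."""
    scores = [f.lin_similarity(c, brown_ic)
              for f in wn.synsets(feature_word, pos=wn.NOUN)
              for c in wn.synsets(candidate_word, pos=wn.NOUN)]
    # WeightW_R(w, f) = 2 log P(LCS(w, f)) / (log P(w) + log P(f))
    return max(scores)

print(lin_weight("display", "screen"))

# Hyponyms of a sense supply candidate sub-feature words.
sense = wn.synsets("display", pos=wn.NOUN)[0]
print([l.name() for h in sense.hyponyms() for l in h.lemmas()])
```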
3.2.3 Topic model based method to find related features
In Section 3.2.2, feature words similar to the main feature were found by using the
external ontology resource WordNet. The task of identifying similar words using an
external ontology formed from prior knowledge, such as Google code length or
WordNet, is in fact not new in text mining (Cilibrasi & Vitanyi, 2007). Such external
ontology sources find words that are similar to the main feature words in general.
However, in some circumstances, the similar words found are not related to the
particular context (e.g., a particular product) in which the main feature is discussed.
For example, the words "celebrity", "idol", "stellar" and "genuine" are synonyms of the
feature "star" according to Google Distance. However, the feature "star" in the
restaurant domain indicates the ranking of a restaurant and clearly does not have a
similar meaning to words such as "celebrity", "idol" or "stellar". In addition, there are
always intrinsic relationships among features, and those relationships differ from
dataset to dataset. WordNet is based on an external standard knowledge ontology, so it
cannot find the intrinsic relationships among features buried in the dataset itself. Given
a feature, WordNet always returns the same set of similar words, whatever dataset is
used.
Topic modelling is considered a state-of-the-art text mining technique, which provides a
tool to discover semantic spaces in large archives of text. Topic models do not use any
external source, but only the text corpus itself (a domain-specific corpus), to describe
the collection through semantic spaces (topics) and semantic representations (the
related words in each topic) (Steyvers & Griffiths, 2006). In more detail, given a
collection of documents, topic modelling can learn and discover topics, each of which is
represented by a group of words that tend to co-occur in the documents. Therefore,
words in topics generated by a topic model have a tight relationship with each other. As
main features are the main topics of discussion in online reviews, and the related
features of a main feature also frequently co-occur with it, a topic model is expected to
discover the correct related features of the main features without using any external
language resource.
Latent Dirichlet Allocation (LDA) is currently the most popular approach to generating
topic models. Let $D = \{d_1, d_2, \ldots, d_m\}$ be a collection of $m$ documents. The topic
model generated by LDA consists of topic representations at collection level and topic
distributions at document level. At collection level, each topic $Z_i$ is represented by a
probability distribution over words, $\phi_i = (\varphi_{i,1}, \varphi_{i,2}, \ldots, \varphi_{i,n})$ with $\sum_{k=1}^{n} \varphi_{i,k} = 1$,
where $\varphi_{i,k}$ is the weight of the $k$th word. At document level, each document is
represented by a probability distribution over topics,
$\theta_{d_j} = (\vartheta_{d_j,1}, \vartheta_{d_j,2}, \ldots, \vartheta_{d_j,V})$, where $V$ is the number of topics and $\vartheta_{d_j,i}$ is the
probability of $Z_i$ for document $d_j$ (Blei, et al., 2003a).
In this thesis, a topic modelling technique, specifically LDA, is employed to find the
words or features related to the main features. More specifically, LDA is applied to the
review corpus to generate a set of topics. Let $Z = \{Z_1, Z_2, \ldots, Z_k\}$ be the list of $k$
topics generated by LDA. Each topic $Z_i \in Z$ is a collection of words, i.e.,
$Z_i = \{w_1, w_2, \ldots, w_n\}$, where $w_k$ is the $k$th word assigned to topic $Z_i$; the
corresponding probability distribution over words for $Z_i$ is
$\phi_i = (\varphi_{i,1}, \varphi_{i,2}, \ldots, \varphi_{i,n})$ with $\sum_{k=1}^{n} \varphi_{i,k} = 1$, where $\varphi_{i,k}$ is the weight indicating
the degree of importance of the word $w_k$ in $Z_i$. By filtering out the low-weighted words
in each topic based on a minimum threshold $\sigma$, we keep the top high-weighted words
to represent each topic. Let $Z_i'$ be the $i$th topic after removing the low-weighted words;
$Z_i'$ is defined as:
$$Z_i' = \{w_k \mid w_k \in Z_i,\ \varphi_{i,k} \geq \sigma\} \qquad (3)$$
Figure 7 shows an example of the chosen topical words in topic 5 of the topic model
generated from a restaurant dataset.
Figure 7. Topic 5 after removing words having low weight for one restaurant dataset
The words in $Z_i'$ are considered related to each other because they are selected to
represent the same topic. Let $Z' = \{Z_1', Z_2', \ldots, Z_k'\}$ be the list of topics after filtering
out low-weighted words. For a given main feature $f$, if $f$ is a word of $Z_i'$, then $Z_i'$ can
be considered a related topic of $f$, and the words in $Z_i'$ can be considered topical
related words of $f$. Let $RelTopics(f)$ be the list of related topics of $f$:
$RelTopics(f) = \{Z_i' \mid f \in Z_i' \text{ and } Z_i' \in Z'\}$.
Let $TW_R(f)$ denote the related topical words of feature $f$ in the review collection $R$:
$$TW_R(f) = \bigcup_{Z_i' \in RelTopics(f)} \; \bigcup_{w \in Z_i'} \{w\} \qquad (4)$$
Calculating the weight of topical related words
As discussed above, the words in a topic are related to each other and reflect an aspect
of the review collection. We propose to measure the relatedness of each topical word
$w \in TW_R(f)$ to $f$ by comparing their weights within the topics. Given two words in the
same topic, the more similar their weights in the topic, the more related the two words
are. More precisely, let $WeightT_R(w, f)$ be the corresponding weight, or relatedness, of
a word $w \in TW_R(f)$ to $f$; $WeightT_R(w, f)$ is calculated as follows:
$$WeightT_R(w, f) = \frac{\sum_{Z_i' \in RelTopics(f,w),\ \varphi_{i,w} \neq \varphi_{i,f}} |\varphi_{i,w} - \varphi_{i,f}|}{|RelTopics(f, w)|} \qquad (5)$$
where $\varphi_{i,w}$ and $\varphi_{i,f}$ are the weights of the word $w$ and the feature word $f$ in topic
$Z_i'$, and $RelTopics(f, w)$ is the collection of topics $Z_i'$ containing both $f$ and $w$:
$RelTopics(f, w) = \{Z_i' \mid w \in Z_i' \text{ and } Z_i' \in RelTopics(f)\}$.
Let $TW_R(F)$ denote the sets of related topical words for the main features of the
product, then $TW_R(F) = \bigcup_{f \in F_R} TW_R(f)$. Similarly, let $WeightT_R(F)$ denote the sets
of corresponding weights of the word sets in $TW_R(F)$, then
$WeightT_R(F) = \bigcup_{f \in F_R} WeightT_R(f)$. Algorithm 3 illustrates the method of
identifying $TW_R(F)$ and computing $WeightT_R(F)$.
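A compact sketch of Equations (3) to (5), assuming each topic is given as a
word-to-weight dictionary (as any LDA implementation can produce); the topics,
threshold and feature below are illustrative:

```python
# Illustrative sketch of Equations (3)-(5): filter topics by a weight
# threshold, collect the topical related words of a feature, and score
# each by the average weight difference of Equation (5).
topics = [  # hypothetical LDA output: word -> weight per topic
    {"display": 0.20, "screen": 0.18, "resolution": 0.10, "the": 0.01},
    {"battery": 0.25, "charge": 0.15, "display": 0.05, "life": 0.12},
]
sigma = 0.04

# Equation (3): keep only high-weighted words in each topic.
filtered = [{w: p for w, p in t.items() if p >= sigma} for t in topics]

feature = "display"
rel_topics = [t for t in filtered if feature in t]    # RelTopics(f)

# Equation (4): topical related words of the feature.
tw = {w for t in rel_topics for w in t if w != feature}

# Equation (5): average |phi_w - phi_f| over topics containing both,
# divided by the number of topics in RelTopics(f, w).
weights = {}
for w in tw:
    diffs = [abs(t[w] - t[feature]) for t in rel_topics
             if w in t and t[w] != t[feature]]
    shared = sum(1 for t in rel_topics if w in t)     # |RelTopics(f, w)|
    weights[w] = sum(diffs) / shared
print(weights)
```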
3.2.4 Pattern-enhanced Topic Model related words Identification
Although topic models can help to find related feature words for a main feature, using
topical words still has problems. One common problem with LDA is that word-based or
term-based topic representations may not semantically represent the topic, which
makes the topic hard to understand. In addition, popular or general words dominate
the top words in some topics, which makes the topics less distinctive in representing
different aspects of the whole corpus. In the field of review selection, where each
feature of a product is a main discussion topic, some words in the topic model do not
contribute effectively to representing the features of the review collection. In order to
solve this problem, we need a way to represent the topics generated from the LDA topic
model more effectively. In the pattern based topic model proposed in (Gao, et al.,
2013), each topic $Z_j$ is represented by a set of patterns instead of single words. Since
phrases or word patterns carry better semantic meaning than single words (for example,
"data mining" is easier to understand than "data" or "mining" alone), pattern based
topics are more discriminative. We are therefore inspired to apply a pattern based topic
model to enhance the task of identifying the related words of the main feature.
According to Gao, et al. (2013), each topic in a pattern based topic model is
represented by a set of patterns instead of a set of words, i.e.,
$PZ_j = \{p_{j,1}, p_{j,2}, \ldots, p_{j,l}\}$, where each pattern $p_{j,k}$ is a subset of the words in $W$,
i.e., $p_{j,k} \subseteq W$, and $l$ is the number of patterns in topic $PZ_j$. In addition, as Gao, et
al. (2013) use a pattern mining technique to generate the pattern based topic model,
each pattern in $PZ_j$ has a corresponding support value, representing the occurrence
frequency of the words in the pattern. Therefore, the words in a pattern are considered
closely related to each other. Let $SupportPZ_j$ be the collection of associated supports of
the patterns in $PZ_j$: $SupportPZ_j = \{supP_{j,1}, supP_{j,2}, \ldots, supP_{j,l}\}$.
Given a set of reviews $R = \{r_1, r_2, \ldots, r_m\}$ and vocabulary $W = \{w_1, w_2, \ldots, w_n\}$, for
each topic $Z_j$ generated by LDA a corresponding pattern based topic $PZ_j$ is generated
by applying the pattern based topic model method in (Gao, et al., 2013). Let
$RelTopics(f)$ be the list of related topics of $f$:
$RelTopics(f) = \{PZ_j \mid p_{jl} \in PZ_j \text{ and } f \in p_{jl}\}$. The related pattern based topical
words of $f$, denoted $PTW_R(f)$, are defined as follows:
$$PTW_R(f) = \bigcup_{PZ_j \in RelTopics(f)} \; \bigcup_{p_{jl} \in PZ_j,\ f, w \in p_{jl}} \{w\} \qquad (6)$$
The words in $PTW_R(f)$ are considered closely related to $f$.
Let $WeightP_R(w, f)$ denote the corresponding weight, or relatedness, of a related
pattern based topic word $w \in PTW_R(f)$; $WeightP_R(w, f)$ is calculated as follows:
$$WeightP_R(w, f) = \frac{\sum_{PZ_j \in RelTopics(f,w),\ p_{jl} \in PZ_j,\ f, w \in p_{jl}} supP_{j,l}}{|RelTopics(f, w)|} \qquad (7)$$
where $supP_{j,l}$ is the support of $p_{jl}$ in topic $PZ_j$, and $RelTopics(f, w)$ is the collection
of related pattern based topics of $f$ and $w$:
$RelTopics(f, w) = \{PZ_j \mid w, f \in PZ_j \text{ and } PZ_j \in RelTopics(f)\}$. The set of
associated weights of the words in $PTW_R(f)$ is denoted $WeightP_R(f)$:
$$WeightP_R(f) = \bigcup_{w \in PTW_R(f)} WeightP_R(w, f) \qquad (8)$$
Let $PTW_R(F)$ denote the sets of related pattern based topical words for the main
features of the product, then $PTW_R(F) = \bigcup_{f \in F_R} PTW_R(f)$. Similarly, let
$WeightP_R(F)$ denote the sets of corresponding weights of the word sets in $PTW_R(F)$,
then $WeightP_R(F) = \bigcup_{f \in F_R} WeightP_R(f)$. Algorithm 4 shows the method of
generating $PTW_R(F)$ and $WeightP_R(F)$.
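A parallel sketch for the pattern based variant, assuming each pattern based topic is
given as a list of (pattern, support) pairs; Equations (6) and (7) then reduce to set
operations and an average of supports over the topics in $RelTopics(f, w)$. The topics
below are hypothetical:

```python
# Illustrative sketch of Equations (6)-(7): related words and weights
# from hypothetical pattern based topics, given as (pattern, support)
# pairs per topic.
pattern_topics = [
    [({"display", "screen"}, 0.6), ({"display", "resolution"}, 0.4)],
    [({"battery", "charge"}, 0.7), ({"battery", "display"}, 0.3)],
]

feature = "display"
rel_topics = [t for t in pattern_topics
              if any(feature in p for p, _ in t)]   # RelTopics(f)

# Equation (6): words co-occurring with the feature inside a pattern.
ptw = {w for t in rel_topics for p, _ in t if feature in p
       for w in p if w != feature}

# Equation (7): sum of supports of patterns containing both words,
# divided by the number of topics in RelTopics(f, w).
weights = {}
for w in ptw:
    sups, n_topics = [], 0
    for t in rel_topics:
        pats = [s for p, s in t if feature in p and w in p]
        if pats:
            n_topics += 1          # this topic belongs to RelTopics(f, w)
            sups.extend(pats)
    weights[w] = sum(sups) / n_topics
print(weights)   # e.g. {'screen': 0.6, 'resolution': 0.4, 'battery': 0.3}
```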
3.2.5 Final Related words Identification
As mentioned at the beginning of this chapter, this study focuses on finding the related
words of the main features. The previous four sections discussed different proposed
methods for identifying the related words of a main feature. More specifically, in
Section 3.2.1, the set of sentiment words ($SW_R(f)$) is identified as the adjectives
standing near the main feature. Related feature words are identified using the external
ontology resource WordNet in Section 3.2.2 ($WW_R(f)$) and the topic model in Section
3.2.3 ($TW_R(f)$). In Section 3.2.4, the method of identifying related feature words is
improved by applying a pattern based topic model instead of a traditional topic model
($PTW_R(f)$). The final set of related words found by our methods can be visualised in
Figure 8. $RW_R(f)$ is defined as:
$$RW_R(f) = SW_R(f) \cup WW_R(f) \cup TW_R(f) \quad \text{(topic model used)}$$
$$RW_R(f) = SW_R(f) \cup WW_R(f) \cup PTW_R(f) \quad \text{(pattern based topic model used)}$$
Figure 8. Related words of feature $f$ ($RW_R(f)$)
Note that the words in $SW_R(f)$ are adjectives and the words in $WW_R(f)$ are nouns,
while the words in $TW_R(f)$ or $PTW_R(f)$ are both nouns and adjectives. As $SW_R(f)$,
$WW_R(f)$ and $TW_R(f)$ or $PTW_R(f)$ contain words that originate from the review
collection $R$, a number of words exist in several of these word sets at the same time.
Let $ShareS_R(f)$ denote the set of shared sentiment words (adjectives) found both by
the method in Section 3.2.1 and by the pattern based topic model:
$ShareS_R(f) = SW_R(f) \cap PTW_R(f)$. Let $ShareF_R(f)$ denote the set of shared related
feature words (nouns) found both by WordNet and by the pattern based topic model:
$ShareF_R(f) = WW_R(f) \cap PTW_R(f)$. The set of shared related words, $ShareW_R(f)$, is
the combination of the two: $ShareW_R(f) = ShareS_R(f) \cup ShareF_R(f)$. Because the
words in $ShareW_R(f)$ are found by several of the methods proposed in Sections 3.2.1
to 3.2.4, they are expected to have higher certainty of relatedness to the feature $f$ than
the other words in $RW_R(f)$. As described in the four previous sections, every related
word found has an associated weight indicating the degree of its relatedness to the
main feature, and the way the weight is calculated depends on which method identified
the word. Therefore, words in $ShareW_R(f)$ can have more than one weight value,
because they are identified by different methods. In this study, if a word exists in
$ShareW_R(f)$, the higher weight is chosen as its final weight.
Let $Weight_R(f)$ denote the set of associated weights of the words in $RW_R(f)$:
$$Weight_R(f) = \bigcup_{w \in RW_R(f)} Weight_R(w, f) \qquad (9)$$
where $Weight_R(w, f)$ is the weight, or relatedness, of the word $w$ to the feature $f$;
$Weight_R(w, f)$ is chosen depending on whether the word $w$ belongs to $ShareW_R(f)$ or
not.
Let $RW_R(F)$ denote the sets of related words for the main features of the product,
then $RW_R(F) = \bigcup_{f \in F_R} RW_R(f)$. Similarly, let $Weight_R(F)$ denote the sets of
corresponding weights of the word sets in $RW_R(F)$, then
$Weight_R(F) = \bigcup_{f \in F_R} Weight_R(f)$. Algorithm 5 shows the method of generating
$RW_R(F)$ and $Weight_R(F)$.
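The final combination step amounts to a union of the word sets in which a shared word
keeps its highest available weight. A minimal sketch, assuming each method's output is
represented as a word-to-weight dictionary (the weights here are invented for
illustration):

```python
# Illustrative sketch of Section 3.2.5: merge the related word sets,
# keeping the highest available weight for words found by more than
# one method (the ShareW_R(f) words).
def merge_related_words(*weight_dicts):
    merged = {}
    for d in weight_dicts:
        for w, weight in d.items():
            merged[w] = max(weight, merged.get(w, float("-inf")))
    return merged

sw  = {"excellent": 0.9, "bright": 0.7}                  # sentiment (3.2.1)
ww  = {"screen": 0.8, "resolution": 0.5}                 # WordNet (3.2.2)
ptw = {"screen": 0.6, "bright": 0.75, "contrast": 0.4}   # patterns (3.2.4)

rw = merge_related_words(sw, ww, ptw)
print(rw)   # "screen" keeps 0.8, "bright" keeps 0.75, etc.
```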
3.3 SUMMARY
In this chapter, a new method is proposed to analyse the content of online
consumer-generated reviews. The review content is represented by a list of main
features and related words, found by using data mining, natural language processing
and probabilistic topic models (Figure 9). While main features are the main topics of
online reviews, related words provide the information about those features; together
they make the content easily understandable. In general, there are several key points in
this chapter. First, an improved version of main feature extraction is proposed in
Section 3.1. Secondly, we propose identifying related words of the main features
through several approaches: WordNet, the topic model and the pattern based topic
model. While WordNet can find words similar to the main features, the topic model
and pattern based topic model methods can identify related words within the context of
the product (domain specific). The combination of an external ontology resource
(WordNet) and probabilistic topic models allows the identification of a complete and
accurate set of related words for the features. As topic modelling provides an
interpretable, low-dimensional representation of the review collection through a number
of topics, where the words in each topic frequently co-occur in the online reviews,
issues in review content analysis such as polysemy and synonymy can be reduced.
Figure 9. The representation of review content by main features and related words
Chapter 4: Review Selection for Single
Feature
As online commerce activities continue to grow, online reviews have become the most
important and helpful resource for consumers to obtain product information, thus
facilitating their purchase decision process. However, the abundance of review data has
become a barrier for users trying to go through the reviews and form vivid pictures of
the products in which they are interested. Recognising this information overload
problem in the context of online reviews, researchers have investigated identifying
high-quality, helpful reviews. This chapter discusses our contribution to the review
selection problem: a new method to optimise the performance of the review selection
task.
4.1 OVERVIEW OF HELPFUL REVIEW SELECTION
A prior research stream on product review helpfulness aims to classify helpful reviews
within the original review corpus. Therefore, the task of effectively differentiating
between helpful and unhelpful reviews, together with the criteria used to determine
helpfulness, is the key concern in the field of review selection. Researchers such as
Kim, et al. (2006) estimated the helpfulness of reviews based on review writing style,
review length and grammar, reviewer information and timeliness, by employing
supervised learning approaches such as classification. However, reviews written in a
professional style and with correct grammar do not always contain useful information
for customers. Recently, scholars have focused on finding helpful reviews based on the
content of reviews, specifically in terms of product features. Their primary concern is
that features such as "price", "lens" and "battery" of a digital camera are the main
points of discussion in the reviews and should be taken into consideration. This
argument is reasonable because reviewers write comments to share their experiences
with a certain product; in particular, their opinions on the product's features shape
their evaluation.
Tsaparas, et al. (2011) proposed a method to find reviews covering as many features as
possible, in which all features are considered equally important and independent.
However, from a customer's point of view, each feature plays a differently important
role in their consideration. For example, the "price" feature is generally important to
customers on a tight budget. In addition, some users might be interested in only one
feature, or a small group of features, rather than all of the product's features at the
same time. Therefore, finding reviews that discuss a given main feature deserves
attention, yet there are not many studies on this issue. Our aim is to find helpful
reviews that extensively discuss a main feature; the output reviews are helpful and
carry comprehensive information about the target feature.
Recently, Long, et al. (2014) proposed a novel method, called Specialised Review
Selection (SRS), for finding specialised reviews that extensively discuss a feature. In
detail, for a specific product feature in the review corpus, the model extracts a set of
words which are similar to the target feature. The authors then use Kolmogorov
complexity and information distance to calculate the amount of information in these
related word sets. For each feature, all reviews are then ranked according to their
information distance score: the review which most extensively discusses the feature is
the one with the minimal information distance.
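Kolmogorov complexity is uncomputable, so practical systems approximate it; a
common proxy is compressed length. The sketch below is not the SRS implementation,
only an illustration of the ranking idea under that proxy: a review ranks higher (lower
distance) when appending the feature's related words to it adds little compressed
length, i.e. the review already carries much of that information.

```python
# Illustrative sketch only (not the SRS implementation): approximate
# information distance with compressed length, a standard practical
# proxy for Kolmogorov complexity.
import zlib

def clen(text: str) -> int:
    """Compressed length in bytes, a rough complexity estimate C(text)."""
    return len(zlib.compress(text.encode("utf-8")))

def feature_distance(review: str, related_words: set) -> int:
    words = " ".join(sorted(related_words))
    # C(words | review) ~ C(review + words) - C(review): small when the
    # review already covers much of the related-word information.
    return clen(review + " " + words) - clen(review)

related = {"screen", "resolution", "contrast", "brightness"}
reviews = ["The screen resolution and contrast are superb.",
           "Shipping was fast and the box looked nice."]
# Rank ascending: the most feature-specific review first.
print(sorted(reviews, key=lambda r: feature_distance(r, related)))
```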
4.2 SPECIALISED REVIEW SELECTION (SRS) METHOD
The idea of using Kolmogorov complexity and information distance in SRS to select
reviews for a given feature is effective, because both are well-known tools for finding
objects related to a given feature based on the information contained in the objects.
However, there are two significant drawbacks to their method.
First, the success of SRS depends on the correctness of the identified similar words of
the main feature. The authors identify similar words of the main feature by using
Google code length (Cilibrasi & Vitanyi, 2007). As discussed in Chapter 3, Google code
length is an external ontology resource, which may not always return the correct
relevant words for the main feature in the domain of the target dataset.
Second, SRS selects reviews using an information distance that is based only on the
remaining related words of the target feature. According to Long, et al. (2014), given a
set of related words for a feature and a review, the remaining related words with
respect to the review are the related words that do not occur in the review. Their
formula for calculating the score of each review uses only those remaining related
words and ignores the related words that do occur in the review. Those related words,
in fact, also play an important role in determining the degree of feature relevance of
the review. We believe that they should be taken into account when calculating the
relevance score of each review. This study therefore proposes an improved method that
takes into consideration both the occurring related words and the remaining words.
This will be discussed in more detail in Section 4.3.2.
4.3 THE PROPOSED REVIEW SELECTION METHOD
4.3.1 Criteria of helpful reviews for single feature.
The criteria of a helpful review is changing with the research timeline on online
reviews as stated in Section 2.1.1 of the Literature Review. At the current time, most
of the current research studies agree that content or features of reviews are the most
important indicator of a helpful review. In general, according to a number of studies
(Lappas, et al., 2012; Liu, et al., 2007; Tsaparas, et al., 2011; Zhang & Zhang, 2014),
three criteria have been identified for a helpful review.
First criterion: the number of features in the review.
Readers read reviews in order to learn about the features of a product. Reviews that discuss many features certainly attract readers' interest, as they can generally provide information about all of the features of a product.
Second criterion: the amount of opinion about features in the review.
A helpful review is one that delivers the author's opinion about the feature to readers. People read reviews in order to decide whether to buy the product. They are thus looking for opinions about the features of a product from other users. If the review does not provide any comment about the features, it is useless to readers. In online reviews, opinions about the features of a product are normally expressed by sentiment words such as “expensive”, “cheap”, “heavy”, “light”, etc. Therefore, a high number of sentiment or opinion words associated with features in a review is a signal of a helpful review.
Third criterion: how detailed the discussion of the product's features is in the review.
The main difference between helpful and unhelpful reviews is the level of detail of the discussion about the feature in the review. For example, if a reviewer mentions the feature “display” - such as “the display is not good” - and stops there, the review is not helpful. The writer gives his idea about the feature but does not state reasons at all. A helpful review should provide evidence to support the author's opinion. In such reviews, the author will continue to discuss the target feature more deeply, for example, “The screen is too dark. The resolution of the camera is too low and the font is also small.” That extra information makes the review more comprehensive and persuasive. We therefore believe that reviews having a high level of comprehensive feature discussion and analysis are more likely to be helpful reviews. Notice that features such as “resolution” and “font” are related sub-features of the target feature “display”. A high number of those sub-features can contribute to the depth of discussion in the review.
As our review selection model focuses on a single feature, the first criterion is not applicable to our method. We therefore incorporate the two remaining criteria into our proposed model. In more detail, we propose to find helpful reviews for a single feature by using the set of words related to that feature. The selection of related words therefore should cover the two criteria above. In Chapter 3, we already proposed a method of identifying those related words, which form a complete set of sentiment words and related sub-feature words. Those related words clearly cover the second and third criteria of a helpful review. In the next section, we will describe our review selection approach for a single feature. It is noted that we use several mathematical notations in each section to explain our proposed feature relevance measure clearly.
4.3.2 Review Selection Method
For a given feature 𝑓, we want to find a set of reviews 𝑅𝑓, each of which provides information about 𝑓. In order to find 𝑅𝑓, we need to measure the information contained in a review, especially the information about the feature 𝑓. Inspired by the work of Long, et al. (2014), which measures the amount of information in a review using Kolmogorov complexity, we propose to measure the information in a review that relates to a given feature using the Kolmogorov complexity of the feature’s set of related words.
For an object 𝑤, the Kolmogorov complexity of 𝑤, denoted as 𝐾(𝑤), expresses the information contained in 𝑤. Theoretically, the Kolmogorov complexity of 𝑤 is defined as the length of the shortest effective binary description producing 𝑤 (Grünwald & Vitányi, 2003). However, 𝐾(𝑤) is not computable in general. Following the idea in Long, et al. (2009), in this thesis we use the relatedness of a word 𝑤 to a feature 𝑓 and the Shannon-Fano code to measure 𝐾(𝑤) relative to 𝑓 (Ming & Vitányi, 1997). Given a feature 𝑓, the relevance of a word 𝑤 to 𝑓 can be measured by the conditional probability 𝑃(𝑤|𝑓) = 𝑃(𝑤, 𝑓)/𝑃(𝑓), where 𝑃(𝑤, 𝑓) can be approximated by the document co-occurrence of 𝑤 and 𝑓 and 𝑃(𝑓) can be approximated by the document frequency of 𝑓, that is,

$$K(w) = -\log P(w \mid f) = -\log P(w, f) + \log P(f)$$
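To make this estimation concrete, the following Java sketch shows one possible way of computing 𝐾(𝑤) from document frequencies. It is a minimal illustration rather than the thesis implementation: the class and method names are ours, and reviews are matched by naive substring search instead of the proper tokenisation and feature extraction of Chapter 3.

import java.util.List;

/**
 * Illustrative sketch: estimating the code length K(w) of a word w
 * relative to a feature f via K(w) = log P(f) - log P(w, f), where
 * probabilities are approximated by document (co-)frequencies.
 */
public class CodeLength {

    /** Fraction of reviews that mention the feature f: approximates P(f). */
    static double pFeature(List<String> reviews, String f) {
        long df = reviews.stream().filter(r -> r.contains(f)).count();
        return (double) df / reviews.size();
    }

    /** Fraction of reviews that mention both w and f: approximates P(w, f). */
    static double pJoint(List<String> reviews, String w, String f) {
        long coDf = reviews.stream()
                           .filter(r -> r.contains(w) && r.contains(f))
                           .count();
        return (double) coDf / reviews.size();
    }

    /** K(w) relative to f; infinite if w and f never co-occur. */
    static double k(List<String> reviews, String w, String f) {
        double pwf = pJoint(reviews, w, f);
        if (pwf == 0.0) return Double.POSITIVE_INFINITY;
        return Math.log(pFeature(reviews, f)) - Math.log(pwf);
    }
}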
Let 𝑅𝑊𝑅(𝑓) and 𝑅𝑊𝑟(𝑓) denote the set of words related to 𝑓 in the review collection 𝑅 and in an individual review 𝑟, respectively; then 𝑅𝑊𝑅(𝑓) = ⋃𝑟∈𝑅 𝑅𝑊𝑟(𝑓). The following score measures the Kolmogorov complexity of a review 𝑟 in terms of feature 𝑓 by calculating the Kolmogorov complexity of the related words that appear in other reviews rather than in 𝑟:

$$SPE_{r,f} = \sum_{w \in RW_R(f) \setminus RW_r(f)} K(w) = \sum_{w \in RW_R(f) \setminus RW_r(f)} \big(\log P(f) - \log P(w, f)\big) \qquad (10)$$

The value of 𝑆𝑃𝐸𝑟,𝑓 is considered as the information distance between 𝑅𝑊𝑅(𝑓) and 𝑅𝑊𝑟(𝑓). The smaller the distance, the more related the words in 𝑟 are to 𝑓. Therefore, the reviews having the lowest 𝑆𝑃𝐸𝑟,𝑓 scores are selected as the output of our system.
The normalized value of 𝑆𝑃𝐸𝑟,𝑓 can be calculated by:

$$SPE_{r,f}^{normalized} = \frac{\sum_{w \in RW_R(f) \setminus RW_r(f)} \big(\log P(f) - \log P(w, f)\big)}{\left| RW_R(f) \setminus RW_r(f) \right|} \qquad (11)$$
One drawback of Equation (11) is that the related words of the main feature that appear in the review are themselves not taken into consideration. It is undeniable that the related words of a main feature 𝑓 occurring in the review directly contribute to the relevance of the review to the feature 𝑓. For example, for a set of related words 𝑅𝑊𝑅(𝑓) of the main feature 𝑓, if review A contains 20 related words while review B contains only 10 related words, i.e., |𝑅𝑊𝑟𝐴(𝑓)| = 20 > |𝑅𝑊𝑟𝐵(𝑓)| = 10, review A is more likely to be related to 𝑓 than review B. Although Equation (11) uses the set of remaining words, which is extracted from the set of related words, the related words themselves still carry important meaning in deciding the degree of relevance. Therefore, they should not be ignored. In this study, we take those related words into account in our method as well. To achieve this, two factors need to be included in the formula. The first factor is the number of related words in each review, since a review having a high number of related words is more likely to be related than a review having a smaller number of related words. The second factor is how important each related word in the review is to 𝑓. Most previous studies assume equal importance of the related words.
However, each related word certainly has a different importance to 𝑓. For example, related features such as “resolution” and “brightness” are more related to the feature “display” than other related words of that feature. This is especially true for comprehensive reviews, where a reviewer tries to discuss the main feature in more detail. In Chapter 3, we proposed methods to calculate the weight of related words. In addition to this weight, the co-occurrence frequency of the related words with the feature should also be taken into consideration, as the most important related words are normally the words that most often go with the feature. In general, we propose to measure the direct relevance by using the conditional probability of related words given 𝑓 and the weight of the words.
Let 𝑊𝑒𝑖𝑔ℎ𝑡𝑅(𝑓) denote the set of associated weights of the words in 𝑅𝑊𝑅(𝑓), 𝑊𝑒𝑖𝑔ℎ𝑡𝑅(𝑓) = ⋃𝑤∈𝑅𝑊𝑅(𝑓) 𝑤𝑒𝑖𝑔ℎ𝑡𝑅(𝑤, 𝑓), where 𝑤𝑒𝑖𝑔ℎ𝑡𝑅(𝑤, 𝑓) is the relatedness or weight of word 𝑤 ∈ 𝑅𝑊𝑅(𝑓) to the feature 𝑓 (note: 𝑅𝑊𝑅(𝑓) and 𝑊𝑒𝑖𝑔ℎ𝑡𝑅(𝑓) can be identified and calculated by the methods proposed in Chapter 3).
The direct relevance and the direct relevance distance of a review 𝑟 are calculated as follows.

Direct Relevance:

$$direRel_{r,f}^{normalized} = \frac{\sum_{w \in RW_r(f)} weight_R(w, f) \cdot \log P(w \mid f)}{\left| RW_r(f) \right|} \qquad (12)$$

The higher the value of 𝑑𝑖𝑟𝑒𝑅𝑒𝑙𝑟,𝑓, the more relevant the review is to the main feature 𝑓.
Direct Relevance Distance:

$$direDistRel_{r,f}^{normalized} = 1 - \frac{\sum_{w \in RW_r(f)} weight_R(w, f) \cdot \log P(w \mid f)}{\left| RW_r(f) \right|} \qquad (13)$$

The lower the value of 𝑑𝑖𝑟𝑒𝐷𝑖𝑠𝑡𝑅𝑒𝑙𝑟,𝑓, the more relevant the review is to the main feature 𝑓.
Equation (13) can then be incorporated into Equation (11) to obtain the final equation for calculating the weighted relevance score of an individual review to the main feature:

$$weighted\_rel_{r,f} = direDistRel_{r,f}^{normalized} + SPE_{r,f}^{normalized}$$

or

$$weighted\_rel_{r,f} = \left(1 - \frac{\sum_{w \in RW_r(f)} weight_R(w, f) \cdot \log P(w \mid f)}{\left| RW_r(f) \right|}\right) + \left(\frac{\sum_{w \in RW_R(f) \setminus RW_r(f)} \big(\log P(f) - \log P(w, f)\big)}{\left| RW_R(f) \setminus RW_r(f) \right|}\right) \qquad (14)$$
The lower the value of 𝑤𝑒𝑖𝑔ℎ𝑡𝑒𝑑_𝑟𝑒𝑙𝑟,𝑓, the more relevant the review is to the main feature 𝑓. Based on the weighted relevance score, a set of reviews is selected. We call this method Review Selection based on Weighted Relevance (RSWR).
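As an illustration of how Equations (11), (13) and (14) fit together, the Java sketch below computes the weighted relevance score of one review. It is a minimal sketch, assuming the probabilities P(f), P(w, f), P(w|f) and the Chapter 3 word weights have already been estimated and stored in maps; all identifiers are illustrative, not the thesis code.

import java.util.Map;
import java.util.Set;

/** Illustrative sketch of the RSWR weighted relevance score (Equation 14). */
public class WeightedRelevance {

    /** Normalized SPE of Equation (11): mean code length of the related
     *  words of f that the review r does NOT contain. Assumes pJoint has
     *  an entry for every word in rwR. */
    static double speNormalized(Set<String> rwR, Set<String> rwr,
                                double pF, Map<String, Double> pJoint) {
        double sum = 0.0;
        int missing = 0;
        for (String w : rwR) {
            if (rwr.contains(w)) continue;           // skip words present in r
            sum += Math.log(pF) - Math.log(pJoint.get(w));
            missing++;
        }
        return missing == 0 ? 0.0 : sum / missing;
    }

    /** Direct relevance distance of Equation (13), computed from the
     *  related words that DO occur in the review r. */
    static double direDistRel(Set<String> rwr,
                              Map<String, Double> weight,
                              Map<String, Double> pCond) {
        double sum = 0.0;
        for (String w : rwr) {
            sum += weight.get(w) * Math.log(pCond.get(w));  // weight_R(w,f) * log P(w|f)
        }
        return rwr.isEmpty() ? 1.0 : 1.0 - sum / rwr.size();
    }

    /** Equation (14): lower scores mean more relevant reviews. */
    static double weightedRel(Set<String> rwR, Set<String> rwr, double pF,
                              Map<String, Double> pJoint,
                              Map<String, Double> weight,
                              Map<String, Double> pCond) {
        return direDistRel(rwr, weight, pCond)
             + speNormalized(rwR, rwr, pF, pJoint);
    }
}

Ranking the review collection by this score in ascending order and keeping the top N then yields the RSWR output for the feature.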
4.4 SUMMARY
This chapter discussed the proposed method to select helpful reviews for a single feature. Firstly, an overview of current helpful review selection methods was given to show the need for review selection for a single feature. Criteria for a helpful review for a single feature were then identified based on previous studies of online review selection. The review selection method for a single feature was then proposed to cover two important criteria of a helpful review: the amount of opinion and the level of detail of the discussion about the target feature in the review. The review selection model combines the direct relevance and the information distance of related words. As the related words carry the complete information about the feature, a set of helpful reviews intensively discussing the feature can be retrieved. The experiments and evaluation in the next chapter provide evidence of the superior performance of the proposed review selection model.
Chapter 5: Experiments and Evaluation
In this chapter, we describe several experiments to evaluate our proposed methods from Chapter 3 and Chapter 4. We first evaluate the review selection method proposed in Chapter 4. We then use this review selection method to evaluate our method of related word selection proposed in Chapter 3. For both evaluations, we compare the performance of our models based on their ability to generate helpful reviews for a single feature.
5.1 EXPERIMENTAL ENVIRONMENT
For our development environment, we used a laptop with an Intel Core i7 CPU and 16 GB of RAM, running the Windows 7 operating system. All of our experiments were implemented in the Java programming language. A Graphical User Interface (GUI) was built using NetBeans IDE 8.0.2 for easy interaction between the user and the software during our experiments. A number of Java library packages were used in our software, including:
Mathematical statistics library: Apache Commons Mathematics Library release 3.5.
Natural language processing packages: json-simple-1.1, the Stanford Log-linear Part-Of-Speech Tagger and the Java WordNet Library (JWNL).
Open-source data mining library: SPMF.
5.2 EXPERIMENT DESIGN
5.2.1 Dataset constructions
The output of our model is a subset of reviews, which need to be evaluated. The
helpfulness score of the reviews has been considered as the gold standard to determine
the review helpfulness and has been using in a variety of studies. In this thesis, we
base the evaluation of our proposed model on this gold standard. The helpfulness
score is therefore one first and foremost criterion for choosing a reviews dataset for
experiment. In detail, a majority of reviews in the dataset should be voted for by users,
to indicate if they are helpful. In our experiment, we select two kinds of datasets
having helpfulness votes by customers, including a review collection from electronic
58 Chapter 5: Experiments and Evaluation
products (digital camera) and review collection from the food industry (restaurant) to
conduct our experiments.
For the first kind of dataset, we chose a digital camera review dataset collected from Amazon (http://amazon.com), as Amazon has recently become one of the most popular websites for research work on review recommendation (Ghose & Ipeirotis, 2006; Kim, et al., 2006; Liu, 2012). In addition, Amazon products, especially digital cameras, have a significant number of reviews, which is sufficient for our experiment. We crawled a collection of customer reviews of a number of digital camera products published before December 2011 on Amazon.com. The downloaded reviews were pre-processed by stripping HTML tags and removing irrelevant information. Each review has the following information.
Product rating: from 1 to 5 stars.
Author name: unique Amazon user identification.
Time: date when the review was posted.
Product name: name of the product, e.g., Canon 7D.
Review text content.
Helpfulness vote: the number of readers saying the review is helpful.
For the second kind of dataset, we use a publicly available dataset provided by the RecSys conference, which was used in a RecSys competition organised by Yelp in 2013 (https://www.kaggle.com/c/yelp-recsys-2013). The Yelp datasets include detailed data on over 10,000 businesses, 8,000 check-in sites, 40,000 users, and 200,000 reviews from the Phoenix, AZ metropolitan area. The reason for using these datasets is that they have been popularly used in research areas such as opinion mining and recommendation systems, so their reliability and feasibility have been confirmed. Each review in the Yelp datasets is a JSON record with the following structure:
{
'type': 'review',
'business_id': (encrypted business id),
'user_id': (encrypted user id),
'stars': (star rating),
'text': (review text),
'date': (date),
'votes': {'useful': (count), 'funny': (count), 'cool': (count)}
}
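As a concrete illustration, one review per line can be read with the json-simple library listed in Section 5.1. The sketch below is minimal; the input file name is illustrative, and error handling is kept to the essentials.

import java.io.BufferedReader;
import java.io.FileReader;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

/** Illustrative sketch: reading Yelp reviews (one JSON object per line). */
public class YelpReader {
    public static void main(String[] args) throws Exception {
        JSONParser parser = new JSONParser();
        try (BufferedReader in = new BufferedReader(
                new FileReader("yelp_reviews.json"))) {      // file name is illustrative
            String line;
            while ((line = in.readLine()) != null) {
                JSONObject review = (JSONObject) parser.parse(line);
                String text = (String) review.get("text");
                JSONObject votes = (JSONObject) review.get("votes");
                long useful = (Long) votes.get("useful");
                long funny  = (Long) votes.get("funny");
                long cool   = (Long) votes.get("cool");
                // Only the text and the vote counts are kept for the experiment.
                System.out.printf("useful=%d funny=%d cool=%d chars=%d%n",
                        useful, funny, cool, text.length());
            }
        }
    }
}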
In this study, we only keep the text content of each review to carry out the experiment and the review helpfulness votes to evaluate our results. Reviews having fewer than two helpfulness votes were not sufficient for the evaluation, so we removed them.
Two criteria used to choose potential datasets are as follows:
Criterion 1: the dataset has a sufficient number of reviews (at least 300 reviews) and each review has at least three votes.
Criterion 2: reviews have a sufficient number of words (the average number of words in a review should be greater than 100).
According to the criteria above, we first filtered out reviews having fewer than three helpfulness votes. We then selected datasets having sufficient reviews (more than 300 reviews) and a sufficient average number of words per review (more than 100 words). We found that even though the Yelp datasets cover over 10,000 businesses and 200,000 reviews, most of them do not satisfy our criteria for selecting datasets. For example, the dataset named “Four Peaks Brewing Co” originally has 735 reviews but ends up with only 197 reviews after filtering, so it cannot be used. The three datasets meeting the criteria and selected for our experiment were “Cibo”, “Fez” and “Pizzeria Bianco”.
In general, four datasets from the digital camera category and three datasets from the restaurant category of the Yelp data are used in our experiment, as shown in Table 3.
Table 3. Dataset Information for Digital Camera and Restaurant Businesses
Category         Dataset Name              Number of reviews   Average words per review
Digital Camera   Canon 6D (CAM1)           350                 127
                 Canon 5D (CAM2)           363                 145
                 Canon 7D (CAM3)           323                 119
                 Canon T3 (CAM4)           375                 123
Restaurant       Cibo (REST1)              421                 133
                 Fez (REST2)               375                 142
                 Pizzeria Bianco (REST3)   332                 122
5.2.2 Baseline model
Random
This method randomly selects a set of reviews and uses it for comparison. It is the most basic selection strategy, producing a set of random reviews without any bias in a short period of time. In our research, we use this method to randomly select N reviews and use them as a baseline for comparison with our method.
Maximum Coverage Greedy (MCG)
The method proposed by Tsaparas, et al. (2011) selects a set of high-quality reviews that cover many different aspects of the product or service.
Specialised Review Selection (SRS)
The approach most closely related to ours is the SRS method proposed by Long, et al. (2014). Although this method focuses on estimating feature ratings, their work does include selecting specialised reviews for a single feature. We therefore choose SRS as our baseline model.
5.2.3 Proposed Methods
The first proposed method is our review selection method from Chapter 4, named Review Selection based on Weighted Relevance (RSWR). In order to evaluate RSWR, we compare the performance of our model with the baseline SRS.
Secondly, we aim to test our methods of selecting related words proposed in Chapter 3. In general, there are three main proposed methods of selecting related words.
WordNet and Sentiment Related Words Selection (WSRWS)
This related word selection method identifies the set of related words, consisting of similar words and sentiment words, for the main features. It is discussed in Sections 3.2.1 and 3.2.2.
Topical and WordNet Related Words Selection (TRWS)
This method finds words related to the main feature by using the external ontology WordNet and a traditional Topic Model (LDA), as discussed in Sections 3.2.3 and 3.2.5.
Pattern based Topic Model and WordNet Related Word Selection (PTRWS)
This is an improved version of TRWS in which a Pattern based Topic Model is used instead of the traditional Topic Model (LDA), as discussed in Sections 3.2.4 and 3.2.5.
5.2.4 Evaluation Metrics
The performance of the proposed review selection system can be evaluated according to the method's ability to select high-quality and helpful reviews for the target feature. In order to evaluate the performance of the proposed approaches, we compare the top-N reviews selected by our proposed methods and by the baseline models. A variety of metrics, including Average Helpfulness Score, Amazon Top Ranking, Precision, Recall, F-score, and Normalized Discounted Cumulative Gain, were used as our evaluation metrics.
5.2.4.1 Helpfulness Average Score
As mentioned, our review datasets come from two sources: digital cameras from Amazon.com and restaurant businesses from Yelp.com. In this section, we discuss our method of obtaining helpfulness scores for the reviews in our datasets.
Reviews in the digital camera datasets have votes for helpfulness and unhelpfulness. We use the number of helpfulness votes and the total number of votes to determine the helpfulness score of a review:
$$Helpful(r) = \frac{Helpfulness\_votes(r)}{Helpfulness\_votes(r) + Unhelpfulness\_votes(r)}, \quad r \in R$$
For example, for a review, if 120 people say this review is helpful and 80 people say
the review is unhelpful, then the helpfulness score is 0.6 (120/200).
Similarly, the Yelp website allows users to vote on each review to indicate whether it is helpful from their perspective. Each review is associated with votes in three different categories, namely “useful”, “funny” and “cool”. We use the votes in the “useful” category to determine the helpfulness of the review. In detail, the helpfulness score of each review is calculated as the ratio of the review's “useful” vote count to its total vote count:
$$Helpful(r) = \frac{useful\_votes(r)}{useful\_votes(r) + funny\_votes(r) + cool\_votes(r)}$$
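A minimal sketch of the two helpfulness-score formulas is given below; it reproduces the worked example above, where 120 helpful votes out of 200 give a score of 0.6. Method names are ours.

/** Illustrative sketch of the two helpfulness-score formulas. */
public class Helpfulness {

    /** Amazon reviews: helpful votes over total votes, e.g. (120, 80) -> 0.6. */
    static double amazonHelpfulness(int helpfulVotes, int unhelpfulVotes) {
        int total = helpfulVotes + unhelpfulVotes;
        return total == 0 ? 0.0 : (double) helpfulVotes / total;
    }

    /** Yelp reviews: "useful" votes over all votes in the three categories. */
    static double yelpHelpfulness(int useful, int funny, int cool) {
        int total = useful + funny + cool;
        return total == 0 ? 0.0 : (double) useful / total;
    }
}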
We evaluate the performance of our approach by comparing the average helpfulness score of the top 10 and top 15 reviews generated by our proposed approach to that of the baseline. The higher the average helpfulness score, the better the performance of the approach. The result is confirmed by using a t-test and p-value to determine the statistical significance of the difference between the results.
t-test and p-value
The t-test is one of the most popular statistical tests for determining whether the mean of one population differs significantly from the mean of another. Given two sets of reviews generated by two models, the significance of the difference between the mean of the first set ($\bar{X}_1$) and the mean of the second set ($\bar{X}_2$) can be measured using a t-test. The t-value is calculated as:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $n_1$ and $n_2$ are the sizes of the two sets and $s_1$ and $s_2$ are their standard deviations, respectively. The p-value can be obtained directly from the t-value. In our experiment, we choose a significance level (alpha) of 0.05 as the significance boundary.
5.2.4.2 Amazon Top Ranking
Amazon.com is the most popular e-commerce website and has its own algorithm to rank reviews in descending order of quality. The algorithm is designed by experts and takes into account factors such as helpfulness votes and the time when the review was created. In addition, Amazon has approaches to avoid spam reviews, so the top reviews of digital cameras on Amazon.com can be considered high-quality reviews. In our experiment, we therefore also used the top-N reviews ranked by Amazon.com as ground truth for the evaluation of our proposed models. A number of traditional metrics, including Precision, Recall and F-score, were used to analyse the results of the experiment. Precision indicates the proportion of the selected reviews that are high quality, while Recall reflects the proportion of Amazon.com's top reviews that are covered by the selected reviews. The F-score is the harmonic mean of Precision and Recall.
Let $Top\_k_{amazon} = \{r_1, r_2, \ldots, r_k\}$ and $Top\_t_{model} = \{r_1, r_2, \ldots, r_t\}$, with $t < k$, be the top 𝑘 high-quality reviews returned by Amazon.com and the top 𝑡 reviews returned by the examined method, respectively. In our experiment, we choose k = 30, i.e., the top 30 reviews from Amazon.com. These top 30 reviews serve as the ground truth, and the top 10 and 15 (t = 10, 15) reviews returned by the examined model serve as the examined review sets.
Precision
Precision at t, k of a model ($P@t_k^{model}$) is defined as:

$$P@t_k^{model} = \frac{\left| Top\_k_{amazon} \cap Top\_t_{model} \right|}{t}, \quad t \in \{10, 15\}$$
Recall
Recall at t, k of a model ($R@t_k^{model}$) is defined as:

$$R@t_k^{model} = \frac{\left| Top\_k_{amazon} \cap Top\_t_{model} \right|}{k}, \quad k = 30$$
F-score
The F-measure at t, k of a model ($F@t_k^{model}$) is defined as:

$$F@t_k^{model} = \frac{2 \cdot P@t_k^{model} \cdot R@t_k^{model}}{P@t_k^{model} + R@t_k^{model}}$$
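The following sketch illustrates these three metrics over lists of review identifiers. The names are ours, and reviews are assumed to be identified by unique ids; the lists correspond to the Amazon top-k and the model's top-t reviews.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of Precision, Recall and F-score against the
 *  Amazon top-k ground truth (t = 10 or 15, k = 30 in our experiments). */
public class TopKMetrics {

    /** Size of the intersection of the two review lists. */
    static int overlap(List<String> topKAmazon, List<String> topTModel) {
        Set<String> truth = new HashSet<>(topKAmazon);
        int hits = 0;
        for (String id : topTModel) if (truth.contains(id)) hits++;
        return hits;
    }

    static double precision(List<String> amazon, List<String> model) {
        return (double) overlap(amazon, model) / model.size();   // divide by t
    }

    static double recall(List<String> amazon, List<String> model) {
        return (double) overlap(amazon, model) / amazon.size();  // divide by k
    }

    static double fScore(List<String> amazon, List<String> model) {
        double p = precision(amazon, model), r = recall(amazon, model);
        return (p + r) == 0 ? 0.0 : 2 * p * r / (p + r);
    }
}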
5.2.4.3 Normalised discounted cumulative gain
Precision and Recall can check whether the returned reviews are in $Top\_k_{amazon}$; however, they are unable to examine the positions of the reviews in the list. In fact, the position of the reviews in $Top\_t_{model}$ is also a factor in deciding the performance of the model. Consider, for example, the lists of top high-quality reviews returned in descending quality order by two models, Model A and Model B (Figure 10). Model A and Model B each return six reviews, four of which are in the Ground Truth Review Set, so the Precision of the two models is equal (4/6). Nevertheless, Review Set 1 (from Model A) ranks review 3 higher than review 4, which matches the Ground Truth Review Set, whereas Review Set 2 (from Model B) ranks them in the opposite order. Therefore, Model A should be credited with higher performance than Model B.
Figure 10. Review Position Problem
Discounted cumulative gain (DCG) is a measure of ranking quality that is very popular in Information Retrieval (Järvelin & Kekäläinen, 2002). In this thesis, DCG is therefore used to measure the gain of each review by comparing its position in $Top\_t_{model}$ with its position in $Top\_k_{amazon}$. For a review $r_i \in Top\_t_{model}$, the gain of review $r_i$ is defined as:

$$G_{r_i}@k = \begin{cases} 5 \cdot \left(1 - \dfrac{\left| Ip_{r_i,Amazon} - Ip_{r_i,model} \right|}{t}\right), & \text{if } r_i \in Top\_k_{amazon} \\[2ex] 0, & \text{if } r_i \notin Top\_k_{amazon} \end{cases}$$
where $Ip_{r_i,Amazon}$ is the rank position of review $r_i$ in the $Top\_k_{amazon}$ reviews and $Ip_{r_i,model}$ is the rank position of review $r_i$ in $Top\_t_{model}$.
According to the formula, the maximum value of $G_{r_i}@k$ is 5, obtained when the rank position of $r_i$ in $Top\_k_{amazon}$ is the same as its rank position in $Top\_t_{model}$. The value of $G_{r_i}@k$ becomes smaller as the distance between the two rank positions grows. In the worst case, where the returned review is not in $Top\_k_{amazon}$, the gain takes its minimum value (0).
The gain for $Top\_t_{model}$ is called the discounted cumulative gain (DCG) and is calculated by accumulating the gain of each review in $Top\_t_{model}$. The term “discounted” is used because the gain of a review ranked lower in $Top\_t_{model}$ is reduced by an amount logarithmically proportional to the position of the review. The discounted cumulative gain of $Top\_t_{model}$ is defined as:

$$DCG_{Top\_t_{model}} = \sum_{i=1}^{t} \frac{2^{G_{r_i}@k} - 1}{\log_2(i + 1)}$$
Finally, the discounted cumulative gain is normalised as:

$$nDCG_{Top\_t_{model}} = \frac{DCG_{Top\_t_{model}}}{IDCG_{Top\_t_{model}}}$$
where $IDCG_{Top\_t_{model}}$ is the ideal DCG (IDCG), the maximum possible DCG for all reviews in $Top\_t_{model}$. This ideal DCG is obtained when every review in $Top\_t_{model}$ has exactly the same position as in $Top\_k_{amazon}$. The normalised discounted cumulative gain is calculated for both the baseline models and our proposed model and used to compare and evaluate the performance of our proposed methods.
5.3 RESULT ANALYSIS AND EVALUATION
In this section, we analyse and evaluate the results obtained from the experiments. First of all, we evaluate our review selection method (Chapter 4) in Section 5.3.1. We then evaluate our method of finding related words of the main features in Section 5.3.2.
Both evaluations require a main feature as input to the model. We used the pattern mining method (Section 3.1) to extract the list of main features. Table 4 gives the list of the top examined features in each dataset, which are used as the input main features to the review selection model.
Table 4. Main Features of Seven Datasets
Datasets f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11
CAM1 Body Display Sensor Picture Mode Grip Shutter Zoom Battery Auto Video
CAM2 Body Sensor Menu Software Picture Battery Shutter Iso Autofocus Detail Exposure
CAM3 Viewfinder Button Zoom Aperture Speed Battery Weight Appearance Noise Light Exposure
CAM4 Viewfinder Focus Aperture Noise Light Battery Weight Auto Manual Memory
REST1 Service Cheese Atmosphere Burger Dessert Location Menu Star Staff Wine Time
REST2 Service Place Hour Drink Menu Atmosphere Cheese Price Brunch Sauce Music
REST3 Hour Wait Oven Drink Service Atmosphere Wine Pie Salad Time Onion
5.3.1 Review Selection Evaluation
In Chapter 4, we proposed our review selection method for a single feature, named Review Selection based on Weighted Relevance (RSWR). In this section, we verify the performance of RSWR by comparing its ability to select helpful reviews with that of the Specialised Review Selection method (SRS) proposed by Long, et al. (2014). In more detail, we give the same input, an examined main feature and the set of its related words, to SRS and RSWR to generate review sets. We evaluate RSWR and SRS based on the top 10 and top 15 reviews in those generated review sets, using the evaluation metrics discussed in Section 5.2.4. The examined features are listed in Table 4, while the related words of the main features are the set of similar words and sentiment words discussed in Sections 3.2.1 and 3.2.2.
5.3.1.1 Helpfulness Score
Table 5 provides the average helpfulness scores for the eleven examined single features of dataset CAM1, and Table 6 shows the final average helpfulness scores for the seven datasets. In general, the average helpfulness scores of both the top 10 and top 15 reviews generated by RSWR are always higher than those of SRS. These results demonstrate the improved performance of our model in selecting helpful reviews for single features. The reason is that our method takes into consideration both the direct relevance and the information distance (as discussed in Section 4.3.2). The direct relevance indicates the degree of relevance of the review to the feature, while the information distance indicates how far the review is from the feature. A review is more relevant to the feature if the direct relevance is high and the information distance is low. The SRS method takes into consideration only the information distance, not the direct relevance. Therefore, RSWR can better extract the helpful reviews.
Table 5. Helpfulness Score for Main Features of CAM1
Model f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 Average
Top 10 Random 0.573 0.621 0.531 0.832 0.498 0.721 0.560 0.592 0.614 0.549 0.369 0.587
MCG 0.787 0.732 0.669 0.695 0.738 0.689 0.780 0.829 0.715 0.818 0.714 0.742
SRS 0.814 0.831 0.831 0.798 0.835 0.792 0.841 0.846 0.845 0.797 0.812 0.822
RSWR 0.849 0.881 0.814 0.821 0.846 0.814 0.860 0.873 0.875 0.807 0.874 0.847
Top 15 Random 0.423 0.538 0.401 0.612 0.406 0.553 0.641 0.473 0.699 0.602 0.585 0.539
MCG 0.717 0.709 0.677 0.689 0.728 0.826 0.714 0.682 0.605 0.719 0.671 0.703
SRS 0.824 0.815 0.815 0.803 0.815 0.788 0.861 0.832 0.801 0.820 0.790 0.815
RSWR 0.860 0.830 0.832 0.820 0.814 0.794 0.871 0.822 0.816 0.821 0.814 0.827
Table 6. Average Helpful Score of seven datasets
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3
Top 10 Random 0.587 0.587 0.568 0.609 0.554 0.462 0.526
MCG 0.742 0.718 0.730 0.729 0.710 0.619 0.732
SRS 0.822 0.843 0.805 0.852 0.712 0.739 0.717
RSWR 0.847 0.865 0.861 0.896 0.734 0.773 0.773
Top 15 Random 0.587 0.543 0.555 0.576 0.502 0.590 0.527
MCG 0.742 0.702 0.683 0.782 0.751 0.715 0.729
SRS 0.822 0.844 0.792 0.857 0.757 0.748 0.747
RSWR 0.827 0.854 0.844 0.882 0.781 0.785 0.788
t-test
In order to confirm the superior performance of RSWR over SRS in selecting helpful reviews, we further use a t-test (as discussed in Section 5.2.4.1) to verify the significance of the difference in average helpfulness scores. Table 7 shows the average p-value for each dataset.
Table 7. Mean Significance Difference t-test
CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3
p-value (Top 10) 0.00563 0.0005703 0.0007311 0.000493 0.000865 0.0001242 0.0027621
p-value (Top 15) 0.0007772 0.06091 0.0005713 0.000379 0.0004801 0.000776 0.002323
The p-values in the table clearly show that most of the values obtained from the t-test are smaller than the significance level (5%), except for the top 15 reviews of dataset CAM2. Although we cannot conclude a significant difference between RSWR and SRS for dataset CAM2, the value of 0.06091 is not very far from 5%. In general, the t-test gives confidence in the improved performance of RSWR over SRS.
5.3.1.2 Amazon Top Ranking
In addition to the reviews' helpfulness scores, we use the Amazon ranking of top reviews as a second evaluation metric to compare our method to the baseline methods. Given the top-N high-ranking reviews returned by the Amazon algorithm, we would like to measure how many reviews in the top N can be selected by RSWR and by the baselines. The Precision, Recall and F-score of the returned review sets were used to determine the performance of RSWR and the baselines. Table 8, Table 9 and Table 10 show the average Precision, Recall and F-score values for the seven datasets.
Table 8. Precision of top-10 and top-15 reviews returned by RSWR and baselines
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.064 0.036 0.055 0.055 0.073 0.045 0.071 0.057
MCG 0.129 0.102 0.125 0.112 0.105 0.126 0.178 0.125
SRS 0.127 0.109 0.100 0.136 0.127 0.173 0.162 0.133
RSWR 0.164 0.155 0.200 0.127 0.164 0.100 0.174 0.155
Top 15 Random 0.036 0.073 0.048 0.067 0.073 0.055 0.049 0.057
MCG 0.192 0.121 0.149 0.179 0.138 0.178 0.121 0.154
SRS 0.200 0.188 0.145 0.170 0.127 0.200 0.213 0.178
RSWR 0.230 0.145 0.170 0.224 0.164 0.242 0.221 0.199
Table 9. Recall of top-10 and top-15 reviews returned by RSWR and baselines
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.070 0.067 0.042 0.070 0.073 0.039 0.024 0.055
MCG 0.017 0.082 0.089 0.091 0.092 0.078 0.065 0.073
SRS 0.052 0.112 0.073 0.073 0.048 0.082 0.071 0.073
RSWR 0.079 0.079 0.124 0.106 0.112 0.100 0.097 0.100
Top 15 Random 0.076 0.073 0.073 0.052 0.067 0.109 0.118 0.081
MCG 0.155 0.079 0.167 0.092 0.089 0.091 0.129 0.115
SRS 0.136 0.100 0.118 0.061 0.112 0.112 0.103 0.106
RSWR 0.212 0.200 0.200 0.188 0.170 0.158 0.147 0.182
Table 10. F-score of top-10 and top-15 reviews returned by RSWR and baselines
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.067 0.047 0.048 0.062 0.073 0.042 0.036 0.053
MCG 0.030 0.091 0.104 0.100 0.098 0.096 0.095 0.088
SRS 0.074 0.110 0.084 0.095 0.070 0.111 0.099 0.092
RSWR 0.107 0.105 0.153 0.116 0.133 0.100 0.125 0.120
Top 15 Random 0.049 0.073 0.058 0.059 0.070 0.073 0.069 0.064
MCG 0.172 0.096 0.157 0.122 0.108 0.120 0.125 0.129
SRS 0.162 0.131 0.130 0.090 0.119 0.144 0.139 0.131
RSWR 0.221 0.168 0.184 0.204 0.167 0.191 0.177 0.187
5.3.1.3 Normalized Discounted Cumulative Gain
As discussed in the Evaluation Metrics section, normalised discounted cumulative gain takes the positions of reviews into consideration, so it can further verify the performance of RSWR. Table 11 demonstrates the improved performance of RSWR over the baseline models.
Table 11. Normalized Discounted Cumulative Gain
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 Random 0.019 0.004 0.029 0.014 0.028 0.013 0.011 0.017
MCG 0.011 0.175 0.324 0.036 0.121 0.104 0.115 0.127
SRS 0.167 0.165 0.239 0.089 0.219 0.214 0.134 0.175
RSWR 0.287 0.167 0.378 0.134 0.318 0.276 0.175 0.248
Top 15 Random 0.0206 0.0039 0.0221 0.0184 0.0064 0.015 0.0021 0.013
MCG 0.117 0.104 0.189 0.752 0.190 0.189 0.110 0.236
SRS 0.286 0.298 0.205 0.1 0.178 0.176 0.173 0.202
RSWR 0.295 0.269 0.288 0.167 0.298 0.204 0.281 0.257
In general, according to the results for helpfulness score and Amazon top ranking, our proposed review selection method always achieves higher results than the baseline models. This clearly demonstrates the superior performance of RSWR.
5.3.2 Related Word Selection Evaluation
In Chapter 3, we proposed new methods to identify the related words of the main feature. The correct identification of the related words is important, as they help make the features more understandable, while wrong identification of those related words can provide wrong information about the target features of a product. In this section, we evaluate our proposed related word selection methods, named WSRWS, TRWS and PTRWS. First, WSRWS, TRWS and PTRWS are used to generate different sets of related words. Those related word sets are then input to our proposed review selection method (RSWR) to generate different corresponding sets of reviews. Similar to the evaluation of our review selection method in Section 5.3.1, we evaluate WSRWS, TRWS and PTRWS based on the top 10 and top 15 reviews of those generated review sets, using the evaluation metrics Helpfulness Score, Amazon Top Ranking and Normalized Discounted Cumulative Gain.
5.3.2.1 Helpfulness Score
Table 12 provides the detailed average helpfulness scores for one dataset (CAM1), while Table 13 summarises the average helpfulness scores for the seven datasets. According to the results, TRWS and PTRWS have higher helpfulness scores than WSRWS. This confirms the usefulness of incorporating a probabilistic topic model into the task of related word identification. First of all, the Topic Model is domain specific, which helps to identify related feature words that are buried in the review collection. Those related features cannot be found by external ontologies such as WordNet and Google Distance or by other supervised methods. Secondly, the words in each topic, reflecting one aspect of the product, have tight relationships with each other. Because of these relationships, words related to the feature can be further confirmed. In addition, the combination of WordNet and the Topic Model can help find a set of shared related words. As those shared related words can be found by different methods, the degree of relatedness of the words in the shared group to the main feature is further confirmed. Updating shared related words with higher weights can increase the importance of those related words to the main feature (detailed in Section 3.2.5). As a result, our method can not only find related words but also discover the corresponding weights of those related words correctly. Among the three methods, the highest helpfulness score of PTRWS also confirms the usefulness of the Pattern based Topic Model. As patterns can represent semantic meaning better than single words, the patterns in the Pattern based Topic Model can assist in identifying related words more effectively.
Table 12. Helpfulness Score for Main Features of CAM1
Model f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 Average
Top 10 WSRWS 0.849 0.881 0.814 0.821 0.846 0.814 0.860 0.873 0.875 0.807 0.874 0.841
TRWS 0.868 0.905 0.834 0.821 0.872 0.863 0.868 0.884 0.807 0.874 0.868 0.862
PTRWS 0.878 0.939 0.877 0.845 0.886 0.879 0.879 0.897 0.812 0.871 0.878 0.883
Top 15 WSRWS 0.860 0.830 0.832 0.820 0.814 0.794 0.871 0.822 0.816 0.821 0.814 0.832
TRWS 0.871 0.860 0.862 0.824 0.834 0.873 0.876 0.845 0.821 0.844 0.871 0.857
PTRWS 0.876 0.874 0.883 0.850 0.844 0.842 0.874 0.888 0.866 0.839 0.884 0.863
Table 13. Average Helpful Score of seven Datasets
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.847 0.865 0.861 0.896 0.734 0.773 0.771 0.821
TRWS 0.860 0.863 0.883 0.916 0.769 0.801 0.792 0.841
PTRWS 0.876 0.894 0.895 0.931 0.782 0.818 0.801 0.857
Top 15 WSRWS 0.827 0.854 0.844 0.882 0.781 0.785 0.712 0.812
TRWS 0.853 0.863 0.862 0.894 0.805 0.796 0.774 0.835
PTRWS 0.865 0.882 0.871 0.921 0.811 0.823 0.810 0.855
t-test
The t-test is also used to compare the significance of the difference in the mean values of two generated sets of reviews. More specifically, we compare the significance of the difference between the review sets generated by WSRWS and TRWS, and between the review sets generated by WSRWS and PTRWS. Table 14 shows the p-values of the t-tests for the seven datasets. In general, most p-values are less than the significance level (5%), except for CAM3. Although we cannot conclude a significant difference in the average values of WSRWS and TRWS for dataset CAM3, the results for the six remaining datasets still demonstrate a significant difference in most cases. The improved performance of TRWS and PTRWS over WSRWS in terms of helpfulness score is thus evidenced.
Table 14. Mean Significance Difference t-test
Models CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3
Top 10 WSRWS vs TRWS 0.0008953 0.0019027 0.4893854 0.0092010 0.0007412 0.0003682 0.0020888
WSRWS vs PTRWS 0.0000752 0.0004558 0.0065319 0.0002363 0.0003490 0.0000731 0.003002
Top 15 WSRWS vs TRWS 0.0001960 0.0007283 0.0141589 0.0487490 0.0001530 0.0008521 0.007183
WSRWS vs PTRWS 0.0004969 0.0003715 0.0042177 0.0074329 0.000031 0.000027 0.0062827
5.3.2.2 Amazon Top Ranking
Similar to the review selection evaluation, we use the Amazon Top Ranking from Amazon.com to further verify the performance of our proposed related word selection methods. Table 15, Table 16 and Table 17 show the average Precision, Recall and F-score results for the seven datasets for the top 10 and top 15 reviews. According to the results, PTRWS outperforms the two other proposed methods, and TRWS has a higher performance than WSRWS in most cases. This again confirms the superior performance of our methods in terms of Amazon Top Ranking.
Table 15. Precision of top-10 and top-15 returned reviews
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.164 0.155 0.200 0.127 0.164 0.100 0.132 0.149
TRWS 0.282 0.255 0.245 0.264 0.218 0.227 0.219 0.244
PTRWS 0.318 0.291 0.336 0.400 0.309 0.364 0.297 0.331
Top 15 WSRWS 0.230 0.145 0.170 0.224 0.164 0.242 0.216 0.199
TRWS 0.248 0.248 0.261 0.224 0.327 0.285 0.276 0.267
PTRWS 0.382 0.339 0.345 0.352 0.285 0.352 0.314 0.338
Table 16. Recall of top-10 and top-15 returned reviews
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.079 0.079 0.124 0.106 0.112 0.100 0.124 0.103
TRWS 0.088 0.097 0.118 0.100 0.121 0.088 0.167 0.111
PTRWS 0.142 0.161 0.133 0.115 0.091 0.158 0.177 0.140
Top 15 WSRWS 0.133 0.121 0.103 0.124 0.115 0.133 0.121 0.121
TRWS 0.188 0.112 0.145 0.139 0.136 0.115 0.156 0.142
PTRWS 0.155 0.209 0.191 0.197 0.167 0.218 0.194 0.190
Table 17. F-score of top-10 and top-15 returned reviews
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.107 0.105 0.153 0.116 0.133 0.100 0.128 0.120
TRWS 0.134 0.141 0.159 0.145 0.156 0.127 0.189 0.150
PTRWS 0.196 0.207 0.191 0.179 0.141 0.220 0.222 0.194
Top 15 WSRWS 0.169 0.132 0.128 0.160 0.135 0.172 0.155 0.150
TRWS 0.214 0.154 0.186 0.172 0.192 0.164 0.199 0.183
PTRWS 0.221 0.259 0.246 0.253 0.211 0.269 0.240 0.242
5.3.2.3 Normalized Discounted Cumulative Gain
Finally, the normalised discounted cumulative gain results in Table 18 reaffirm that TRWS and PTRWS outperform WSRWS, with PTRWS having the best performance.
Table 18. Normalized Discounted Cumulative Gain
Model CAM1 CAM2 CAM3 CAM4 REST1 REST2 REST3 Average
Top 10 WSRWS 0.287 0.167 0.378 0.134 0.318 0.276 0.221 0.254
TRWS 0.295 0.179 0.372 0.169 0.341 0.293 0.227 0.268
PTRWS 0.297 0.252 0.381 0.185 0.319 0.314 0.254 0.286
Top 15 WSRWS 0.295 0.269 0.288 0.167 0.298 0.204 0.218 0.248
TRWS 0.298 0.286 0.312 0.212 0.326 0.211 0.220 0.266
PTRWS 0.384 0.373 0.367 0.283 0.372 0.237 0.256 0.325
In general, according to the results for helpfulness score, Amazon top ranking and normalised discounted cumulative gain, our proposed methods consistently achieve higher results than the baselines. This clearly demonstrates the superior performance of our proposed related word selection methods.
5.4 SUMMARY
In this chapter, we designed a number of experiments to evaluate the methods proposed in Chapter 3 and Chapter 4. The evaluation was carried out in two parts: review selection evaluation and related word selection evaluation. In the first evaluation, we compared the performance of our review selection method (RSWR) with the Specialised Review Selection method (SRS) proposed by Long, et al. (2014). The results clearly show the outstanding performance of RSWR over SRS in terms of Helpfulness Score and Amazon Top Ranking. Further evaluation with normalised discounted cumulative gain reaffirms that our review selection method is more effective in identifying helpful reviews for a single feature. In the second evaluation, we evaluated our three related word selection methods, WSRWS, TRWS and PTRWS. Similarly, we compared the performance of those methods using Helpfulness Score, Amazon Top Ranking and normalised discounted cumulative gain. The results indicate that PTRWS produces the most accurate helpful review sets, followed by TRWS. This clearly demonstrates the power of incorporating the Topic Model and the Pattern based Topic Model into the proposed method.
Chapter 6: Conclusions
In this chapter we list the achievements and limitations of this study. Potential directions for future research are also proposed.
6.1 CONCLUSION
Online reviews have become an invaluable reference source for customers in recent times. However, information overload in review content is a big issue for readers. The research area of review selection has been facing two research problems: ambiguity in review content and the need for work on review selection according to a single feature. In this thesis, we proposed methods to solve those research problems and achieved two primary research outcomes.
First of all, new methods are employed to reduce the ambiguity of review content by identifying features and the related words of those features. By using data mining techniques, natural language processing, an ontology and a probabilistic Topic Model, this work can effectively identify words related to the main features of the product. As a result, polysemy and synonym issues in review content can be significantly reduced. Our experiments in Section 5.3.2 verify the superior performance of our related word selection methods.
Secondly, we propose a new method of selecting reviews for a single feature. As customers have different backgrounds, contexts and situations, the importance of each feature to them is also different. They normally expect to know about the features that are more necessary for them than other features of the product. However, most previous research works do not focus on selecting helpful reviews that intensively discuss one single feature. In this research, we propose to apply the information distance and the direct relevance of related words in order to identify helpful reviews for a single feature. This was discussed in detail in Chapter 4. Our experiments in Section 5.3.1 verify the superior performance of our review selection method.
6.2 LIMITATIONS
There are two limitations in this study.
The related word selection methods use the Topic Model and the Pattern based Topic Model to discover words related to the target feature. A topic model requires a sufficient number of reviews in the review corpus in order to work well. Therefore, our proposed methods are not applicable to datasets having a small number of reviews or to sparse datasets.
The review selection method only focuses on selecting reviews for a single feature. However, some people may be interested in a group of features of a product, or in all of the features. Therefore, the review selection method should be improved to deal with multiple features at the same time.
6.3 FUTURE WORK
A probabilistic topic model is employed to discover words related to the main feature in this study. However, we did not analyse the detailed relationships among those related words within each topic. Understanding the relationships among related words, and the relationship of each related word to the target feature, would surely provide more insight into the task of identifying related words. In addition, according to many studies about Topic Models (Chang et al., 2009), incoherent topics do exist. Therefore, the need for topic interpretation and evaluation before use should be considered in future work.
As mentioned in the limitations section, selecting reviews for a group of features should be incorporated into our proposed review selection method. In our study, the associated related words of a single main feature can be identified as presented in Chapter 3. Therefore, the related words of multiple main features can be combined in order to obtain an overall set of related words. The multiple main features and their associated related words can then be used as the input to our review selection model in order to generate reviews discussing those multiple features. This would extend our review selection from a single feature to multiple features.
REFERENCES
AlSumait, L., Barbará, D., Gentle, J., & Domeniconi, C. (2009). Topic significance
ranking of LDA generative models. In Machine Learning and Knowledge
Discovery in Databases (pp. 67-82): Springer.
Andrzejewski, D., Zhu, X., & Craven, M. (2009). Incorporating domain knowledge
into topic modeling via Dirichlet forest priors. In Proceedings of the 26th
Annual International Conference on Machine Learning (pp. 25-32): ACM.
Bentivogli, L., Forner, P., Magnini, B., & Pianta, E. (2004). Revising the WordNet
domains hierarchy: semantics, coverage and balancing. In Proceedings of the
Workshop on Multilingual Linguistic Ressources (pp. 101-108): Association
for Computational Linguistics.
Blei, D. M., & McAuliffe, J. D. (2007). Supervised Topic Models. Advances in
Neural Information Processing Systems (NIPS).
Blei, D. M., & McAuliffe, J. D. (2008). Supervised Topic Models. Advances in
Neural Information Processing Systems (NIPS).
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003a). Latent dirichlet allocation. the
Journal of machine Learning research, 3, 993-1022.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003b). Latent Dirichlet Allocation. Journal
of Machine Learning Research 3, 993-1022.
Brody, S., & Elhadad, N. (2010). An unsupervised aspect-sentiment model for online
reviews. In Human Language Technologies: The 2010 Annual Conference of
the North American Chapter of the Association for Computational Linguistics
(pp. 804-812): Association for Computational Linguistics.
Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical
semantic relatedness. Computational Linguistics, 32(1), 13-47.
Chan, L. M. (1995). Library of Congress subject headings: principles and
application: ERIC.
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading
tea leaves: How humans interpret topic models. In Advances in neural
information processing systems (pp. 288-296).
Chen, C. C., & Tseng, Y.-D. (2011). Quality evaluation of product reviews using an
information quality framework. Decision Support Systems, 50(4), 755-768.
Cilibrasi, R. L., & Vitanyi, P. (2007). The google similarity distance. Knowledge and
Data Engineering, IEEE Transactions on, 19(3), 370-383.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R.
(1990). Indexing by Latent Semantic Analysis. Journal of the American
Society for Information Science, 41(6), 391-407.
Dellarocas, C., Zhang, X. M., & Awad, N. F. (2007). Exploring the value of online
product reviews in forecasting sales: The case of motion pictures. Journal of
Interactive marketing, 21(4), 23-45.
Fellbaum, C. (2010). WordNet. In Theory and applications of ontology: computer
applications (pp. 231-243): Springer.
Fischer, K. S. (2005). Critical views of LCSH, 1990–2001: The third bibliographic
essay. Cataloging & classification quarterly, 41(1), 63-109.
Gangemi, A., Guarino, N., & Oltramari, A. (2001). Conceptual analysis of lexical
taxonomies: The case of WordNet top-level. In Proceedings of the
international conference on Formal Ontology in Information Systems-Volume
2001 (pp. 285-296): ACM.
Gao, Y., Xu, Y., & Li, Y. (2013). Pattern based Topic Models for Information
Filtering. In IEEE 13th International Conference on Data Mining Workshops
(pp. 921-928): IEEE.
Ghose, A., & Ipeirotis, P. G. (2011). Estimating the helpfulness and economic impact
of product reviews: Mining text and reviewer characteristics. Knowledge and
Data Engineering, IEEE Transactions on, 23(10), 1498-1512.
Griffiths, T. L., Steyvers, M., Blei, D. M., & Tenenbaum, J. B. (2005). Integrating
topics and syntax. Advances in Neural Information Processing Systems, 17,
537-544.
Griffiths, T. L., Steyvers, M., Blei, D. M., & Tenenbaum, J. B. (2004). Integrating
topics and syntax. In Advances in neural information processing systems (pp.
537-544).
Gruber, T. R. (1995). Toward principles for the design of ontologies used for
knowledge sharing? International journal of human-computer studies, 43(5),
907-928.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In 22nd Annual
international conference on research and development in information
retrieval (pp. 50-57): ACM.
Hong, Y., Lu, J., Yao, J., Zhu, Q., & Zhou, G. (2012). What Reviews are
Satisfactory: Novel Features for Automatic Helpfulness Voting. In SIGIR
conference on research and development in information retrieval (pp. 495 -
504): ACM.
Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In
Proceedings of the tenth ACM SIGKDD international conference on
Knowledge discovery and data mining (pp. 168-177): ACM.
Hung, C., Wermter, S., & Smith, P. (2004). Hybrid neural document clustering using
guided self-organization and WordNet. Intelligent Systems, IEEE, 19(2), 68-
77.
Kim, S.-M., & Hovy, E. (2004). Determining the sentiment of opinions. In
Proceedings of the 20th international conference on Computational
Linguistics (pp. 1367): Association for Computational Linguistics.
Kim, S.-M., Pantel, P., Chklovsk, T., & Pennacchiotti, M. (2006). Automatically
Assessing Review Helpfulness. In Association for Computational Linguistics
(pp. 423-430).
Krestel, R., & Dokoohaki, N. (2011). Diversifying Product Review Rankings:
Getting the Full Picture. In IEEE/WIC/ACM International Conferences on
Web Intelligence and Intelligent Agent Technology (pp. 138 - 145): ACM.
Lakkaraju, H., Bhattacharyya, C., Bhattacharya, I., & Merugu, S. (2011). Exploiting
Coherence for the Simultaneous Discovery of Latent Facets and associated
Sentiments. In SDM (pp. 498-509): SIAM.
Lappas, T., Crovella, M., & Terzi, E. (2012). Selecting a characteristic set of
reviews. In 18th SIGKDD international conference on knowledge discovery
and data mining: ACM.
Lau, R. Y., Lai, C. C., Ma, J., & Li, Y. (2009). Automatic domain ontology
extraction for context-sensitive opinion mining. ICIS 2009 Proceedings, 35-
53.
Liang, H., Xu, Y., Li, Y., & Nayak, R. (2009). Tag based collaborative filtering for
recommender systems. In Rough Sets and Knowledge Technology (pp. 666-
673): Springer.
Lin, C., & He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In
Proceedings of the 18th ACM conference on Information and knowledge
management (pp. 375-384): ACM.
Liu, B. (2010). Sentiment analysis: A multi-faceted problem. IEEE Intelligent
Systems, 25(3), 76-80.
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers.
Liu, J., Yunbo, C., Chin-Yew, L., Yalou, H., & Ming, Z. (2007). Low-quality
product review detection in opinion summarisation. In Association for
Computational Linguistics (pp. 334-342).
Liu, S., Liu, F., Yu, C., & Meng, W. (2004). An effective approach to document
retrieval via utilizing WordNet and recognizing phrases. In Proceedings of
the 27th annual international ACM SIGIR conference on Research and
development in information retrieval (pp. 266-272): ACM.
Long, C., Zhang, J., Huang, M., Zhu, X., Li, M., & Ma, B. (2014). Estimating feature
ratings through an effective review selection approach. Knowledge and
Information Systems, 38(2), 419-446.
Lu, Y., Zhai, C. X., & Sundaresan, N. (2009). Rated aspect summarization of short
comments. In Proceedings of the 18th international conference on World
Wide Web (pp. 131-140): ACM.
Ma, Z., Pant, G., & Sheng, O. R. L. (2007). Interest-based personalized search. ACM
Transactions on Information Systems (TOIS), 25(1), 5.
Manna, S., & Mendis, B. S. U. (2010). Fuzzy word similarity: a semantic approach
using WordNet. In Fuzzy Systems (FUZZ), 2010 IEEE International
Conference on (pp. 1-8): IEEE.
Mei, Q., Ling, X., Wondra, M., Su, H., & Zhai, C. (2007). Topic sentiment
mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th
international conference on world wide web (pp. 171-180): ACM.
Mei, Q., Shen, X., & Zhai, C. (2007). Automatic labeling of multinomial topic
models. In Proceedings of the 13th ACM SIGKDD international conference
on Knowledge discovery and data mining (pp. 490-499): ACM.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990).
Introduction to WordNet: An on-line lexical database*. International journal
of lexicography, 3(4), 235-244.
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011).
Optimizing semantic coherence in topic models. In Proceedings of the
Conference on Empirical Methods in Natural Language Processing (pp. 262-
272): Association for Computational Linguistics.
Minka, T., & Lafferty, J. (2002). Expectation-propagation for the generative aspect
model. In Proceedings of the Eighteenth conference on Uncertainty in
artificial intelligence (pp. 352-359): Morgan Kaufmann Publishers Inc.
Misra, H., Cappé, O., & Yvon, F. (2008). Using LDA to detect semantically
incoherent documents. In Proceedings of the Twelfth Conference on
Computational Natural Language Learning (pp. 41-48): Association for
Computational Linguistics.
Missen, M. M. S., Boughanem, M., & Cabanac, G. (2009). Challenges for
Sentence Level Opinion Detection in Blogs. In (pp. 347-351).
Moghaddam, S., & Ester, M. (2011). ILDA: interdependent LDA model for learning
latent aspects and their ratings from online product reviews. In 34th
international ACM SIGIR conference on research and development in
information retrieval (pp. 665 - 674): ACM.
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful review? A study of
customer reviews on Amazon. com. MIS quarterly, 34(1), 185-200.
Mukherjee, A., & Liu, B. (2012). Aspect Extraction through Semi-Supervised
Modeling. In Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics (pp. 339-348).
Navigli, R., Velardi, P., & Gangemi, A. (2003). Ontology learning and its application
to automated terminology translation. Intelligent Systems, IEEE, 18(1), 22-31.
Newman, D., Karimi, S., & Cavedon, L. (2009). External evaluation of topic models.
In Australasian Document Computing Symposium (ADCS 2009).
Newman, D., Lau, J. H., Grieser, K., & Baldwin, T. (2010). Automatic evaluation of
topic coherence. In Human Language Technologies: The 2010 Annual
Conference of the North American Chapter of the Association for
Computational Linguistics (pp. 100-108): Association for Computational
Linguistics.
Noy, N. F. (2004). Semantic integration: a survey of ontology-based approaches.
ACM Sigmod Record, 33(4), 65-70.
O'Mahony, M. P., & Smyth, B. (2009). Learning to recommend helpful hotel
reviews. In Proceedings of the third ACM conference on Recommender
systems (pp. 305-308): ACM.
Ockerbloom, J. M. (2006). New maps of the library: Building better subject
discovery tools using Library of Congress Subject Headings.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and
trends in information retrieval, 2(1-2), 1-135.
Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet::Similarity:
measuring the relatedness of concepts. In Demonstration papers at HLT-
NAACL 2004 (pp. 38-41): Association for Computational Linguistics.
Sabou, M., Lopez, V., Motta, E., & Uren, V. (2006). Ontology selection: Ontology
evaluation on the real semantic web.
Sowa, J. F. (2001). Building, sharing, and merging ontologies. Web site:
http://www.jfsowa.com/ontology/ontoshar.htm
Steyvers, M., & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer,
D. McNamara, S. Dennis, & W. Kintsch (Eds.), Latent Semantic Analysis: A
Road to Meaning. Lawrence Erlbaum.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. Handbook of latent
semantic analysis, 427(7), 424-440.
Stojanovic, N. (2005). On the query refinement in the ontology-based searching for
information. Information Systems, 30(7), 543-563.
Tao, X., Li, Y., Lau, R. Y., & Wang, H. (2012). Unsupervised multi-label text
classification using a world knowledge ontology. In Advances in Knowledge
Discovery and Data Mining (pp. 480-492): Springer.
Teh, Y. W., Newman, D., & Welling, M. (2006). A collapsed variational Bayesian
inference algorithm for latent Dirichlet allocation. In Advances in neural
information processing systems (pp. 1353-1360).
Tian, N., Xu, Y., & Li, Y. (2014). A review selection method using product feature
taxonomy. In 15th International Conference on Web Information Systems
Engineering, WISE 2014 (pp. 408-417): Springer.
Titov, I., & McDonald, R. (2008). Modeling online reviews with multi-grain topic
models. In Proceedings of the 17th international conference on World Wide
Web (pp. 111-120): ACM.
Tsaparas, P., Ntoulas, A., & Terzi, E. (2011). Selecting a comprehensive set of
reviews. In 17th SIGKDD international conference on knowledge discovery
and data mining (pp. 168-176): ACM.
Voorhees, E. M. (1994). Query expansion using lexical-semantic relations. In
SIGIR’94 (pp. 61-69): Springer.
Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. (2009). Evaluation
methods for topic models. In Proceedings of the 26th Annual International
Conference on Machine Learning (pp. 1105-1112): ACM.
Zhang, L., & Liu, B. (2011). Identifying noun product features that imply opinions.
In Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies: short papers-
Volume 2 (pp. 575-580): Association for Computational Linguistics.
Zhang, Y., & Zhang, D. (2014). Automatically predicting the
helpfulness of online reviews. In Information Reuse and Integration (IRI),
2014 IEEE 15th International Conference on (pp. 662-668).
Zhao, W. X., Jiang, J., Yan, H., & Li, X. (2010). Jointly modeling aspects and
opinions with a MaxEnt-LDA hybrid. In Proceedings of the 2010 Conference
on Empirical Methods in Natural Language Processing (pp. 56-65):
Association for Computational Linguistics.
Baeza-Yates, R., & Navarro, G. (1998). Fast approximate string matching in a dictionary. In String Processing and Information Retrieval: A South American Symposium, 1998. Proceedings (pp. 14-22): IEEE.
Blei, D. M., & McAuliffe, J. D. (2008). Supervised Topic Models. Advances in Neural Information Processing Systems (NIPS).
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003a). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003b). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
Brody, S., & Elhadad, N. (2010). An unsupervised aspect-sentiment model for online reviews. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 804-812): Association for Computational Linguistics.
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea
leaves: How humans interpret topic models. In Advances in neural information processing systems (pp. 288-296).
Chen, C. C., & Tseng, Y.-D. (2011). Quality evaluation of product reviews using an information quality framework. Decision Support Systems, 50(4), 755-768.
Cilibrasi, R. L., & Vitanyi, P. (2007). The Google similarity distance. Knowledge and Data Engineering, IEEE Transactions on, 19(3), 370-383.
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th international conference on World Wide Web (pp. 519-528): ACM.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990).
Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 391-407.
Gao, Y., Xu, Y., & Li, Y. (2013). Pattern-Based Topic Models for Information Filtering.
In IEEE 13th International Conference on Data Mining Workshops (pp. 921-928): IEEE.
Ghose, A., & Ipeirotis, P. G. (2006). Designing ranking systems for consumer reviews: The impact of review subjectivity on product sales and review quality. In Proceedings of the 16th annual workshop on information technology and systems (pp. 303-310).
Ghose, A., & Ipeirotis, P. G. (2011). Estimating the helpfulness and economic impact
of product reviews: Mining text and reviewer characteristics. Knowledge and Data Engineering, IEEE Transactions on, 23(10), 1498-1512.
Griffiths, T. L., Steyvers, M., Blei, D. M., & Tenenbaum, J. B. (2004). Integrating
topics and syntax. In Advances in neural information processing systems (pp. 537-544).
Grünwald, P. D., & Vitányi, P. M. (2003). Kolmogorov complexity and information
theory. With an interpretation in terms of questions and answers. Journal of Logic, Language and Information, 12(4), 497-529.
Hoang, L., Lee, J.-T., Song, Y.-I., & Rim, H.-C. (2008). A model for evaluating the
quality of user-created documents. In Asia Information Retrieval Symposium (pp. 496-501): Springer.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In 22nd Annual international conference on research and development in information retrieval (pp. 50-57): ACM.
Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings
of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177): ACM.
Hung, C., Wermter, S., & Smith, P. (2004). Hybrid neural document clustering using guided self-organization and WordNet. Intelligent Systems, IEEE, 19(2), 68-77.
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.
Kim, S.-M., Pantel, P., Chklovski, T., & Pennacchiotti, M. (2006). Automatically
Assessing Review Helpfulness. In Association for Computational Linguistics (pp. 423-430).
Korfiatis, N., GarcíA-Bariocanal, E., & Sánchez-Alonso, S. (2012). Evaluating content
quality and helpfulness of online product reviews: The interplay of review helpfulness vs. review content. Electronic Commerce Research and Applications, 11(3), 205-217.
Krestel, R., & Dokoohaki, N. (2011). Diversifying Product Review Rankings: Getting
the Full Picture. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (pp. 138 - 145): ACM.
Lakkaraju, H., Bhattacharyya, C., Bhattacharya, I., & Merugu, S. (2011). Exploiting Coherence for the Simultaneous Discovery of Latent Facets and associated Sentiments. In SDM (pp. 498-509): SIAM.
Lappas, T., Crovella, M., & Terzi, E. (2012). Selecting a characteristic set of reviews.
In 18th SIGKDD international conference on knowledge discovery and data mining: ACM.
Li, M., & Vitányi, P. (2013). An introduction to Kolmogorov complexity and its applications: Springer Science & Business Media.
Liao, J., Mendis, B., & Manna, S. (2010). Improving hierarchical document signature performance by classifier combination. Neural Information Processing. Theory and Algorithms, 695-702.
Lin, C., & He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In
Proceedings of the 18th ACM conference on Information and knowledge management (pp. 375-384): ACM.
Lin, D. (1998). An information-theoretic definition of similarity. In ICML (Vol. 98, pp. 296-304): Citeseer.
Liu, B. (2010). Sentiment analysis: A multi-faceted problem. IEEE Intelligent Systems, 25(3), 76-80.
Liu, B. (2012). Sentiment Analysis and Opinion Mining: Morgan & Claypool Publishers.
Liu, J., Yunbo, C., Chin-Yew, L., Yalou, H., & Ming, Z. (2007). Low-quality product review detection in opinion summarisation. In Association for Computational Linguistics (pp. 334-342).
Liu, Y., Huang, X., An, A., & Yu, X. (2008). Modeling and predicting the helpfulness of
online reviews. In Data mining, 2008. ICDM'08. Eighth IEEE international conference on (pp. 443-452): IEEE.
Long, C., Zhang, J., Huang, M., Zhu, X., Li, M., & Ma, B. (2014). Estimating feature
ratings through an effective review selection approach. Knowledge and Information Systems, 38(2), 419-446.
Lu, Y., Tsaparas, P., Ntoulas, A., & Polanyi, L. (2010). Exploiting social context for
review quality prediction. In Proceedings of the 19th international conference on World wide web (pp. 691-700): ACM.
Lu, Y., Zhai, C. X., & Sundaresan, N. (2009). Rated aspect summarization of short comments. In Proceedings of the 18th international conference on World Wide Web (pp. 131-140): ACM.
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing: MIT Press.
Mei, Q., Ling, X., Wondra, M., Su, H., & Zhai, C. (2007). Topic sentiment
mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th international conference on world wide web (pp. 171-180): ACM.
Miller, G., & Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.
Ming, L., & Vitányi, P. (1997). An introduction to Kolmogorov complexity and its applications: Springer Heidelberg.
Minka, T., & Lafferty, J. (2002). Expectation-propagation for the generative aspect model. In Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence (pp. 352-359): Morgan Kaufmann Publishers Inc.
Missen, M. M. S., Boughanem, M., & Cabanac, G. (2009). Challenges for Sentence Level Opinion Detection in Blogs. In (pp. 347-351).
Moghaddam, S., & Ester, M. (2011). ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews. In 34th international ACM SIGIR conference on research and development in information retrieval (pp. 665-674): ACM.
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful review? A study of customer reviews on Amazon.com. MIS Quarterly, 34(1), 185-200.
Mukherjee, A., & Bing, L. (2012). Aspect Extraction through Semi-Supervised Modeling. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (pp. 339-348).
O'Mahony, M. P., & Smyth, B. (2009). Learning to recommend helpful hotel reviews.
In Proceedings of the third ACM conference on Recommender systems (pp. 305-308): ACM.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1-2), 1-135.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86): Association for Computational Linguistics.
Popescu, A.-M., & Etzioni, O. (2007). Extracting Product Features and Opinions from Reviews. In Natural Language Processing and Text Mining (pp. 9-28). London: Springer London.
Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007.
Scaffidi, C., Bierhoff, K., Chang, E., Felker, M., Ng, H., & Jin, C. (2007). Red
Opal: product-feature scoring from reviews. In Proceedings of the 8th ACM conference on Electronic commerce (pp. 182-191): ACM.
Siering, M., & Muntermann, J. (2013). What Drives the Helpfulness of Online
Product Reviews? From Stars to Facts and Emotions. In Wirtschaftsinformatik (Vol. 7).
Steyvers, M., & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. Handbook of latent semantic analysis, 427(7), 424-440.
Teh, Y. W., Newman, D., & Welling, M. (2006). A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 1353-1360).
Tian, N., Xu, Y., & Li, Y. (2014). A review selection method using product feature
taxonomy. In 15th International Conference on Web Information Systems Engineering, WISE 2014 (pp. 408-417): Springer.
Titov, I., & McDonald, R. (2008). Modeling online reviews with multi-grain topic
models. In Proceedings of the 17th international conference on World Wide Web (pp. 111-120): ACM.
Tsaparas, P., Ntoulas, A., & Terzi, E. (2011). Selecting a comprehensive set of reviews. In 17th SIGKDD international conference on knowledge discovery and data mining (pp. 168-176): ACM.
Wang, D., Zhu, S., & Li, T. (2013). SumView: A Web-based engine for summarizing
product reviews and customer opinions. Expert Systems with Applications, 40(1), 27-33.
Ye, Q., Law, R., & Gu, B. (2009). The impact of online user reviews on hotel room sales. International Journal of Hospitality Management, 28(1), 180-182.
Zhang, Y., & Zhang, D. (2014). Automatically predicting the helpfulness of online reviews. In Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on (pp. 662-668).
Zhang, Z., & Varadarajan, B. (2006). Utility scoring of product reviews. In Proceedings of the 15th ACM international conference on Information and knowledge management (pp. 51-57): ACM.
Zhao, W. X., Jiang, J., Yan, H., & Li, X. (2010). Jointly modeling aspects and opinions
with a MaxEnt-LDA hybrid. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 56-65): Association for Computational Linguistics.