Fyp ca2
![Page 1: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/1.jpg)
MINING USERS' OPINIONS ON HOTELS
![Page 2: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/2.jpg)
BRIEF RECAP ON CA1
![Page 3: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/3.jpg)
Literature Review / Background
The Web is a huge database of opinions on hotels
Commercial Possibilities / Business Intelligence
“What others think” is an important element in decision making
Opinion Mining / Sentiment Analysis
![Page 4: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/4.jpg)
Far From a Solved Problem
Impossible for a human to read every single opinion
Machines can be trained to do this
People often express more than one opinion at a time
Use of Sarcasm and Negation
The same sentiment word can change polarity across topics and domains, e.g. "big": positive when the swimming pool is big enough to swim in, negative when it describes a long queue
![Page 5: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/5.jpg)
How to train a machine to analyze sentiments
Natural Language Processing (NLP): transforms opinions into a format the machine can understand
Artificial Intelligence: the machine uses the information given by NLP, and a lot of math, to analyze sentiments
Make the machine distinguish facts from opinions the way a human does when reading
![Page 6: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/6.jpg)
Problems for the Machine
Subjectivity and Sentiment
Analyze polarity
Opinion rating
Sentiment intensity
Different domains / topic context
Facts Vs Opinion
![Page 7: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/7.jpg)
Examples of ambiguity for the machine
“The swimming pool is better than the tennis court”: comparisons are hard to classify
“This hotel is very boleh lah”: use of slang and culture-specific expressions
“This breakfast is as good as none”: negativity is not obvious to the machine
“The weather is hot”: the statement has different polarity in different contexts
![Page 8: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/8.jpg)
WHAT IS DONE IN CA1
![Page 9: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/9.jpg)
EXTRACTION – Preparing machine to analyze data
![Page 10: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/10.jpg)
Review and aspects extraction process
Extract important datasets from review websites
Word handling to refine datasets
Use part-of-speech tagging to label the text and extract aspects, which are nouns
Determine aspects / features that people are concerned about from these reviews by occurrence and context
![Page 11: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/11.jpg)
Part of Speech Tagging
Assigning a grammatical label (noun, verb, adjective and so on) to every word in the text so that the machine can process it
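The tagging and noun-extraction idea can be sketched in a few lines of Python. This is only an illustration: the `TOY_LEXICON` lookup table, the Penn Treebank-style tag names and the sample reviews are assumptions made for the example, not the tagger or data used in the project. A real system would use a trained statistical tagger.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping words to Penn Treebank-style tags.
# A real tagger is statistical; this lookup exists only for illustration.
TOY_LEXICON = {
    "the": "DT", "a": "DT",
    "pool": "NN", "room": "NN", "breakfast": "NN", "staff": "NN",
    "was": "VBD", "is": "VBZ",
    "clean": "JJ", "big": "JJ", "friendly": "JJ", "cold": "JJ",
    "and": "CC",
}

def tag(sentence):
    """Label every word with a tag; words missing from the lexicon get 'UNK'."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    return [(w, TOY_LEXICON.get(w, "UNK")) for w in words]

def extract_aspects(reviews):
    """Count how often each noun (tag starting with 'NN') occurs across reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(w for w, t in tag(review) if t.startswith("NN"))
    return counts

# Made-up sample reviews.
reviews = [
    "The pool was big and the room was clean.",
    "The breakfast was cold.",
    "The room is big, the staff is friendly.",
]
aspects = extract_aspects(reviews)
print(aspects.most_common())
```

Nouns that occur most often across reviews are the candidate aspects; adjectives such as "big" are tagged but never counted as aspects.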
![Page 12: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/12.jpg)
Word Handling
Dictionary / Spelling Correction
Slang Check
Foreign language check
Singular / Plural conversion
Duplicate check
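A minimal sketch of these word-handling steps is shown here, assuming tiny hand-written lookup tables (`SLANG`, `SPELLING`, `IRREGULAR_PLURALS`) and very rough English plural rules. All of these names and rules are hypothetical and stand in for real dictionaries. The foreign-language check is not covered.

```python
import re

# Hypothetical lookup tables; a real system would use full dictionaries.
SLANG = {"gr8": "great", "thx": "thanks", "pls": "please"}
SPELLING = {"recieve": "receive", "brekfast": "breakfast"}
IRREGULAR_PLURALS = {"children": "child", "people": "person"}

def singularize(word):
    """Very rough plural-to-singular conversion (assumed rules, English only)."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if (word.endswith("s") and len(word) > 3
            and not word.endswith(("ss", "us", "is"))):
        return word[:-1]
    return word

def normalize(review):
    """Lowercase, tokenize, then apply slang, spelling and plural handling."""
    tokens = re.findall(r"[a-z0-9']+", review.lower())
    out = []
    for tok in tokens:
        tok = SLANG.get(tok, tok)
        tok = SPELLING.get(tok, tok)
        out.append(singularize(tok))
    return out

def deduplicate(reviews):
    """Keep only the first review for each normalized token sequence."""
    seen, unique = set(), []
    for review in reviews:
        key = tuple(normalize(review))
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique

print(normalize("Gr8 brekfast, clean rooms"))
print(deduplicate(["Great rooms!", "great room", "Bad food"]))
```

Comparing normalized token sequences rather than raw strings lets the duplicate check catch reviews that differ only in case, punctuation or plural forms.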
![Page 13: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/13.jpg)
END OF CA1
![Page 14: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/14.jpg)
CA2 : Data Processing
![Page 15: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/15.jpg)
Classifying Sentiments using some existing methods
Naïve Bayes: determines the polarity of sentiments
Maximum Entropy: uses probability distributions estimated on the basis of partial knowledge
Support Vector Machine: analyzes patterns and classifies sentiments
![Page 16: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/16.jpg)
Naïve Bayes Classifier
To determine polarity of sentiments
P(X | Y) = P(X)P(Y | X) / P(Y)
Probability that a sentence's sentiment is positive or negative, given its contents
Probability of a word occurring, given a positive or negative sentiment
Assumption: words occur independently of one another (there is no link between words)
P(sentiment | sentence) = P(sentiment)P(sentence | sentiment) / P(sentence)
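The formula above can be turned into a small classifier. The sketch below is an illustration under stated assumptions, not the project's implementation: the four training sentences are made up, the `NaiveBayes` class name is hypothetical, and add-one (Laplace) smoothing is added so that unseen words do not produce a zero probability. P(sentence) is dropped because it is the same for every sentiment and does not affect which one scores highest.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Bag-of-words Naive Bayes with add-one smoothing (an assumed choice)."""

    def __init__(self):
        self.class_counts = Counter()            # sentences seen per sentiment
        self.word_counts = defaultdict(Counter)  # word frequencies per sentiment
        self.vocab = set()

    def train(self, labelled_sentences):
        for sentence, label in labelled_sentences:
            words = sentence.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def log_scores(self, sentence):
        """log P(sentiment) + sum of log P(word | sentiment) for each sentiment."""
        words = sentence.lower().split()
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)

# Made-up toy training data.
nb = NaiveBayes()
nb.train([
    ("the room was clean and big", "pos"),
    ("friendly staff and great breakfast", "pos"),
    ("the room was dirty and small", "neg"),
    ("rude staff and cold breakfast", "neg"),
])
print(nb.classify("clean room and friendly staff"))
print(nb.classify("dirty room and rude staff"))
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numeric underflow on long sentences.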
![Page 17: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/17.jpg)
Problem with Naïve Bayes
The polarity assigned to a word does not change with the domain
Words within a sentence are treated as having no relationship with each other
Words not found in the lexicon might be missed by Naïve Bayes, resulting in inaccurate polarity
No opinion rating to determine which sentiment is more strongly polar
![Page 18: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/18.jpg)
Solutions to the Problems with Naïve Bayes
Establish domain-sentiment relations
Establish domain-aspect relations
Establish aspect-sentiment relations
Estimate polarity for unseeded sentiments
Estimate strength of polarity of sentiments
![Page 19: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/19.jpg)
Establishing relations
Establish domains by categorizing the aspects found into domains such as food, location and security
Find occurrences of aspects / sentiments within sentences for a particular domain
Find the polarity of sentences, aspects and sentiments, and establish relations between them
[Diagram: relations between Domain, Aspects and Sentiments]
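One way to sketch the relation-building steps on this slide is to count co-occurrences within each sentence. The `ASPECT_DOMAIN` and `SENTIMENT_WORDS` tables and the sample sentences below are hypothetical; in the project they would come from the extraction stage. Counting co-occurrence per sentence is an assumed simplification.

```python
from collections import Counter

# Hypothetical lookup tables for illustration only.
ASPECT_DOMAIN = {
    "breakfast": "food", "buffet": "food",
    "beach": "location", "station": "location",
    "lock": "security", "guard": "security",
}
SENTIMENT_WORDS = {"great", "cold", "near", "far", "broken", "friendly"}

def build_relations(sentences):
    """Count domain-aspect, domain-sentiment and aspect-sentiment co-occurrences."""
    domain_aspect = Counter()
    domain_sentiment = Counter()
    aspect_sentiment = Counter()
    for sentence in sentences:
        words = set(sentence.lower().replace(".", "").split())
        aspects = words & set(ASPECT_DOMAIN)
        sentiments = words & SENTIMENT_WORDS
        for a in aspects:
            domain_aspect[(ASPECT_DOMAIN[a], a)] += 1
            for s in sentiments:
                aspect_sentiment[(a, s)] += 1
        # Count each domain once per sentence, even with several aspects.
        for d in {ASPECT_DOMAIN[a] for a in aspects}:
            for s in sentiments:
                domain_sentiment[(d, s)] += 1
    return domain_aspect, domain_sentiment, aspect_sentiment

# Made-up sample sentences.
da, ds, asp = build_relations([
    "The breakfast was cold.",
    "Great breakfast buffet.",
    "The beach is near.",
])
print(da)
print(ds)
print(asp)
```

The three counters correspond to the three edges of the Domain / Aspects / Sentiments triangle, and the counts can serve as edge weights in a graph.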
![Page 20: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/20.jpg)
Finding polarity for unseeded sentiments
After establishing relations, we have a graph of nodes (Sentiments / Aspects)
Some nodes have no polarity after Naïve Bayes, but the nodes connected to them might have polarity
Determine the probability that the node is positive or negative given its surrounding nodes
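A simple way to illustrate this step is a weighted vote over the neighbours that already have a polarity. The slides do not specify the estimator, so the edge-weighted share used below, the function name and the sample graph are all assumptions.

```python
def estimate_positive_probability(node, graph, polarity):
    """Share of edge weight linking `node` to positive neighbours.

    graph: {node: {neighbour: weight}}, polarity: {node: "pos" or "neg"}.
    Returns None when no neighbour has a known polarity.
    """
    pos = neg = 0.0
    for neighbour, weight in graph.get(node, {}).items():
        label = polarity.get(neighbour)
        if label == "pos":
            pos += weight
        elif label == "neg":
            neg += weight
    if pos + neg == 0:
        return None
    return pos / (pos + neg)

# Made-up graph: "cosy" is unseeded; weights are co-occurrence counts.
graph = {"cosy": {"clean": 3, "comfortable": 1, "noisy": 1}}
polarity = {"clean": "pos", "comfortable": "pos", "noisy": "neg"}
p = estimate_positive_probability("cosy", graph, polarity)
print(p)
```

Here four of the five units of edge weight around "cosy" lead to positive nodes, so the node is estimated to be positive with probability 4/5.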
![Page 21: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/21.jpg)
Estimating the strength of polarity
Determine the strength of the polarity of an unseeded node from the amount of traversal needed to reach it from surrounding nodes that have polarity
Find the shortest paths to the unseeded nodes; together these paths form a spanning tree
The cost of the shortest path determines the strength of polarity
![Page 22: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/22.jpg)
Implementation
Using Dijkstra's algorithm to find the shortest-path spanning tree
![Page 23: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/23.jpg)
Implementation
Find the cost to get from surrounding nodes to an unseeded node
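The cost computation can be sketched with Dijkstra's algorithm started from all seeded nodes at once. Several choices here are assumptions rather than the project's implementation: edge costs are taken to be non-negative numbers, seeds carry polarity +1 or -1, and strength decays as 1 / (1 + cost). With more than one seed, the `parent` links form a forest with one shortest-path tree per seed.

```python
import heapq

def dijkstra_from_seeds(graph, seeds):
    """Multi-source Dijkstra.

    graph: {node: {neighbour: edge_cost}}, seeds: {node: +1 or -1}.
    Returns (cost, parent, origin): the cheapest cost to each node from any
    seed, the tree edge used to reach it, and the seed it was reached from.
    """
    cost = {s: 0.0 for s in seeds}
    parent = {s: None for s in seeds}
    origin = {s: s for s in seeds}
    heap = [(0.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost[node]:
            continue  # stale queue entry, a cheaper path was already found
        for nxt, w in graph.get(node, {}).items():
            new_cost = c + w
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                parent[nxt] = node
                origin[nxt] = origin[node]
                heapq.heappush(heap, (new_cost, nxt))
    return cost, parent, origin

def polarity_strength(graph, seeds):
    """Signed strength: polarity of the nearest seed, decayed by path cost."""
    cost, _, origin = dijkstra_from_seeds(graph, seeds)
    return {n: seeds[origin[n]] / (1.0 + cost[n]) for n in cost}

# Made-up undirected graph; "cosy" and "quaint" are unseeded nodes.
graph = {
    "clean": {"cosy": 1.0},
    "cosy": {"clean": 1.0, "quaint": 2.0},
    "quaint": {"cosy": 2.0, "noisy": 4.0},
    "noisy": {"quaint": 4.0},
}
seeds = {"clean": +1, "noisy": -1}
cost, parent, origin = dijkstra_from_seeds(graph, seeds)
strength = polarity_strength(graph, seeds)
print(cost)
print(strength)
```

In this example "quaint" is reached more cheaply from "clean" (cost 3 via "cosy") than from "noisy" (cost 4), so it inherits a weak positive polarity.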
![Page 24: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/24.jpg)
END OF CA2
![Page 25: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/25.jpg)
What is going to happen in CA3?
![Page 26: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/26.jpg)
Prototyping
Refining parameters to come up with a prototype, mainly to solve the following problems:
Analyze polarity
Opinion rating
Sentiment intensity
Different domains / topic context
Manually analyze reviews, compare the results against the prototype's output to check its effectiveness, and seek to improve accuracy
![Page 27: Fyp ca2](https://reader033.fdocuments.net/reader033/viewer/2022061221/54be8dd64a795999188b4622/html5/thumbnails/27.jpg)
Prototype testing
Enlarging the dataset with reviews from various hotel review sites
Merging results to find correlations between sentiment expressions on different sites
Testing on different domains, such as food, to get domain-dependent results