AMANDA COHEN MOSTAFAVI
Applying Entity Discovery and Assignment to video games in
order to mine opinions
Project Purpose
• Opinions on a video game’s quality differ widely, so the general opinion is difficult to determine
• People usually look to professional video game reviews
• Review scores can be gathered, normalized, and averaged to determine a general consensus
  – Done on GameRankings.com
• However, this ignores the discussion among everyday players
  – Debate takes place most commonly on message boards
Project Purpose
Solution: Mine opinions expressed on message board posts and derive a consensus from the results
Using the algorithm for entity discovery and assignment and opinion mining defined in this paper:
Xiaowen Ding, Bing Liu, Lei Zhang. Entity Discovery and Assignment for Opinion Mining Applications. SIGKDD, 2009
Goal:
To mine opinions on selected games expressed on video game message boards, derive an average opinion and compare results to the review scores gathered by GameRankings.com
Games
• Total games examined: 10
• All released in 2007
• 5 were top-selling games of the year, according to the NPD Group (a market research group that studies the video game industry, among other things)
• 5 are among the highest-reviewed games according to GameRankings.com
• Ensures a mix of critically and commercially successful games
• Note: Duplicate games are removed
Games
• High selling
  – Halo 3 (360, Microsoft) - 4.82 million
  – Wii Play with Wii Remote (Wii, Nintendo) - 4.12 million
  – Call of Duty 4: Modern Warfare (360, Activision) - 3.04 million
  – Guitar Hero III: Legends of Rock (PS2, Activision) - 2.72 million
  – Super Mario Galaxy (Wii, Nintendo) - 2.52 million
• Highly reviewed
  – The Orange Box (PC, Xbox 360) - 96%
  – BioShock (PC, Xbox 360) - 94%
  – Elder Scrolls IV: Oblivion (PS3) - 92%
  – God of War II (PS2) - 92%
  – Team Fortress 2 (PC) - 92%
Game Issues
Alternate Names: Games are often referenced by shorthand or abbreviation
Solution: include an array of possible alternate names in defining the entity object
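A minimal sketch of what such an entity object could look like. The class name, field names, and the aliases listed are illustrative assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GameEntity:
    """A game title plus the shorthand names posters might use for it."""
    title: str
    alternate_names: List[str] = field(default_factory=list)

    def all_names(self) -> List[str]:
        # Canonical title first, then every known alias.
        return [self.title, *self.alternate_names]


# Hypothetical aliases, chosen only for illustration.
GAMES = [
    GameEntity("Call of Duty 4: Modern Warfare", ["Call of Duty 4", "COD4"]),
    GameEntity("Super Mario Galaxy", ["Mario Galaxy", "SMG"]),
]
```

Keeping the aliases on the entity itself means the matching code only needs to loop over `all_names()` for each game.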
Message Boards
Principally from video game websites, or websites with large portions devoted to video games
Looking at comments on articles about top-selling games or on reviews, to ensure that the posts are relevant to the games
Lots of comparative statements as well
Message Board Posts
• 1UP.com: 26 posts
• Gamespot.com: 14 posts
• IGN.com: 20 posts
• Total: 60 posts
Post Issues
Unusual ways of expressing opinions: message board posters may not express their opinions in the same way as someone writing a review would. For instance:
  – “Call of Duty 4 was a very good game” <- this sentence would make for very easy opinion mining
  – “COD4 IS TEH WIN, OMG!!!!111” <- more likely on a message board, and much harder to mine
Solution: The opinion mining algorithm allows for “opinion grammar”. More later…
The Process
Implements the entity discovery and assignment algorithm, with a couple of modifications:
  – Entity discovery section reduced to better fit the purposes of the project
  – Ordinarily would use pattern mining to find entities; not an issue in this case, since the set of games examined is predetermined
Data preprocessing
Each word in every post is given a part-of-speech tag
  – Designates the grammatical role of each word
  – Done with Stanford’s POS tagger, developed by the Stanford Natural Language Processing group
  – http://nlp.stanford.edu/software/tagger.shtml
A list of the entities used is created, and their alternate names are defined
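As a rough illustration of how tagged text might be consumed downstream, the sketch below splits a tagged line into (word, tag) pairs. It assumes the tagger output has already been saved as plain text in a `word_TAG` form separated by spaces; that format, the function name, and the sample line are assumptions for illustration, and the tagger itself is not invoked here.

```python
from typing import List, Tuple


def parse_tagged(line: str, sep: str = "_") -> List[Tuple[str, str]]:
    """Split a line such as 'Halo_NNP is_VBZ fun_JJ' into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        if sep not in token:
            # No tag attached: keep the word and leave the tag empty.
            pairs.append((token, ""))
            continue
        # Split on the LAST separator so words containing it stay intact.
        word, _, tag = token.rpartition(sep)
        pairs.append((word, tag))
    return pairs


tagged = parse_tagged("Halo_NNP is_VBZ better_JJR")
```

The tags in the sample line follow the Penn Treebank convention (for example `JJR` for a comparative adjective).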
Entity Discovery and Assignment
Each post is parsed to separate sentences and find each entity
If entity is found, and matches the game title, the entity is assigned to that sentence
If there is no entity, the entity of the previous sentence is assigned
  – Works on the assumption that when someone starts talking about an entity, subsequent sentences deal with the same entity without explicitly stating it
If an alternate name for the entity is found, it is replaced with the original title to reduce future processing time
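The assignment rules above can be sketched as follows. This is a simplified reading, not the project's code: the alias table, the naive sentence splitter, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical alias table: canonical title -> alternate names.
ALIASES = {
    "Call of Duty 4": ["COD4"],
    "Halo 3": ["Halo3"],
}


def split_sentences(post):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]


def normalize(sentence):
    # Replace each alternate name with its canonical title.
    for title, alternates in ALIASES.items():
        for alt in alternates:
            pattern = rf"\b{re.escape(alt)}\b"
            sentence = re.sub(pattern, lambda _m, t=title: t, sentence,
                              flags=re.IGNORECASE)
    return sentence


def assign_entities(post):
    """Return (sentence, entities) pairs, carrying entities forward."""
    assigned = []
    previous = []
    for sentence in split_sentences(post):
        sentence = normalize(sentence)
        found = [t for t in ALIASES if t.lower() in sentence.lower()]
        if not found:
            # No entity mentioned: inherit from the previous sentence.
            found = previous
        assigned.append((sentence, found))
        previous = found
    return assigned


result = assign_entities("COD4 is great. It never gets old! Halo 3 is fun too.")
```

In this example the second sentence names no game, so it inherits "Call of Duty 4" from the first.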
Opinion Grammar
The original authors advise against hard-coding every possible opinion word
Instead, they suggest using a system to define grammar that will pick out opinion words and statements
A combination of a hard-coded word list and grammar rules was used for this project
  – Hard-coded words for regular English grammar, defined rules for more unexpected words and phrases
Indicator Word Symbols
• Po: Positive
• Ne: Negative
• Neu: Neutral
• Ng: Negation
• But: But-like
Opinion Mining
Step 1: Apply indicator word symbols
Step 2: Apply phrase rules
Step 3: Search for negations, and change the opinion of the subsequent word (if it was positive, it becomes negative, and vice versa)
Step 4: Aggregate opinions
  – Search for indicators: Po = 1, Ne = -1, Neu = 0
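A toy sketch of the four steps, taking them literally. The word list and phrase rules are tiny hypothetical samples, and the negation step flips only the word immediately after the negation, so a sentence like "not very good" would need a wider window than this sketch uses.

```python
import re

# Hypothetical hard-coded indicator words (Step 1).
LEXICON = {
    "good": "Po", "great": "Po", "fun": "Po",
    "bad": "Ne", "boring": "Ne",
    "not": "Ng", "never": "Ng",
    "but": "But",
}
# Hypothetical phrase rules for message-board slang (Step 2).
PHRASE_RULES = {("teh", "win"): "Po", ("epic", "fail"): "Ne"}
SCORES = {"Po": 1, "Ne": -1, "Neu": 0}
FLIP = {"Po": "Ne", "Ne": "Po"}


def sentence_score(text):
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Step 1: tag each token with its indicator symbol (None if untagged).
    symbols = [LEXICON.get(tok) for tok in tokens]
    # Step 2: two-word phrase rules override the word-level tags.
    for i in range(len(tokens) - 1):
        rule = PHRASE_RULES.get((tokens[i], tokens[i + 1]))
        if rule:
            symbols[i], symbols[i + 1] = rule, None
    # Step 3: a negation flips the opinion of the word right after it.
    for i in range(len(symbols) - 1):
        if symbols[i] == "Ng" and symbols[i + 1] in FLIP:
            symbols[i + 1] = FLIP[symbols[i + 1]]
    # Step 4: aggregate; anything other than Po or Ne contributes 0.
    return sum(SCORES.get(sym, 0) for sym in symbols)
```

With this sample data, both “Call of Duty 4 was a very good game” and “COD4 IS TEH WIN, OMG!!!!111” come out positive, the second one only because of the phrase rule.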
Comparative Sentences
If a sentence has more than one entity, it is a comparative sentence
  – This sentence compares one entity to another, e.g. “Game-A is better than Game-B”
In order to find the superior and inferior entities, look for comparative or superlative words (according to POS tags) and whether each is a positive or negative word
  – If negative, the entity after the comparative word is superior
  – If positive, the entity before the comparative word is superior
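The rule can be sketched as below, under several assumptions: input is a list of (word, tag) pairs using Penn Treebank tags, each game title has been collapsed into a single token, and the polarity table is a hypothetical sample. The function name and return shape are illustrative only.

```python
# Penn Treebank tags for comparative and superlative adjectives/adverbs.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
# Hypothetical polarity of a few comparative words.
POLARITY = {"better": "Po", "best": "Po", "worse": "Ne", "worst": "Ne"}


def rank_entities(tagged, entities):
    """Return (superior, inferior) for a comparative sentence, or None."""
    for i, (word, tag) in enumerate(tagged):
        if tag not in COMPARATIVE_TAGS:
            continue
        polarity = POLARITY.get(word.lower())
        if polarity is None:
            continue
        before = [w for w, _ in tagged[:i] if w in entities]
        after = [w for w, _ in tagged[i + 1:] if w in entities]
        if not before or not after:
            continue
        # Use the entities nearest to the comparative word on each side.
        nearest_before, nearest_after = before[-1], after[0]
        if polarity == "Po":
            return nearest_before, nearest_after
        return nearest_after, nearest_before
    return None


games = {"Game-A", "Game-B"}
positive = [("Game-A", "NNP"), ("is", "VBZ"), ("better", "JJR"),
            ("than", "IN"), ("Game-B", "NNP")]
negative = [("Game-A", "NNP"), ("is", "VBZ"), ("worse", "JJR"),
            ("than", "IN"), ("Game-B", "NNP")]
```

Here `rank_entities(positive, games)` treats Game-A as superior, while `rank_entities(negative, games)` treats Game-B as superior.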
Up next: demo and results…