Deriving Topics and Opinions from Microblogs Feng Jiang Supervisors: Jixue Liu & Jiuyong Li.

Contents
• Background of research
• Significance of research
• Problems and challenges
• Main tasks
• Literature review
• Methodology
• Improvement and innovation
• Experiment Result

Background
• Microblogs: Twitter

Twitter allows users to post short messages (a maximum of 140 characters) called “tweets” to communicate with each other.

• Information platform

Microblogs allow people to publish, spread and share information, knowledge and personal viewpoints.

• Easy and convenient publishing

Authors can publish tweets easily from laptops and smartphones, so alongside good posts they also publish many that are of little use.

Significance
• Find useful information

Extract hot topics; extract opinions.

• Save plenty of time and effort

Readers do not have to read all the tweets to learn what they are about, and can quickly see how opinions on a hot topic are classified.

• Seek out and track important events
• Identify fashion trends
• Find popular products

Problems and challenges
• Because there are so many posts, it is very hard for individuals to find interesting and popular content manually.

• We cannot directly utilise existing web and text mining methods to extract hot topics and opinions from microblogs, because microblogs have unique characteristics.

Problems and challenges: mass data
• At the end of 2009, Twitter had 75 million account holders, of which about 20% were active. There were approximately 2.5 million Twitter posts per day.

• While the majority of posts are conversational or not very meaningful, about 3.6% of posts concern topics of mainstream news.
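To put these figures in perspective, a few lines of Python work out the counts they imply. This is only arithmetic on the numbers quoted above, with no other data assumed; it shows that even the small news-related share amounts to tens of thousands of posts a day, far more than anyone could read manually.

```python
# Figures quoted on the slide (end of 2009).
ACCOUNT_HOLDERS = 75_000_000
ACTIVE_PERCENT = 20          # about 20% of accounts were active
POSTS_PER_DAY = 2_500_000
NEWS_PER_MILLE = 36          # about 3.6% of posts, expressed per thousand

# Integer arithmetic keeps the results exact (no floating-point rounding).
active_accounts = ACCOUNT_HOLDERS * ACTIVE_PERCENT // 100
news_posts_per_day = POSTS_PER_DAY * NEWS_PER_MILLE // 1000
other_posts_per_day = POSTS_PER_DAY - news_posts_per_day

print(f"active accounts: {active_accounts:,}")      # active accounts: 15,000,000
print(f"news posts/day: {news_posts_per_day:,}")    # news posts/day: 90,000
print(f"other posts/day: {other_posts_per_day:,}")  # other posts/day: 2,410,000
```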

Problems and challenges
• Semi-structured and unstructured data

There are no restrictions or rules on the content and style of posts on microblogs.

• A great variety of topics and views

Authors may discuss popular movies in one paragraph of an article and then express their opinions on sports events in the next, which makes the topic of a single post unclear.

Main tasks
• Topic extraction

Generate a complete and meaningful sentence that summarises a popular current event (e.g. the 2012 London Olympics) from the relevant microblog posts.

Main tasks
• Sentiment analysis

Find out from the comments who supports the topic and who opposes it.

Literature review

• M. Chau, et al., "A blog mining framework," IT Professional, vol. 11, pp. 36-41, 2009.

Literature review

• M. Hutton, et al., "Summarizing microblogs automatically," presented at the Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, California, 2010.

Literature review

• B. Sharifi, et al., "Experiments in Microblog Summarization," in 2010 IEEE Second International Conference on Social Computing (SocialCom), 2010, pp. 49-56.

Methodology
• 1 Text pre-processing

Part-of-speech (POS) tagging
Feature filtering
Stop-word list: and, or, of
Word stemming: wants, wanted -> want
Synonyms and antonyms
Hypernyms and hyponyms: love -> emotion
TF-IDF: term frequency * inverse document frequency
Vector space model
Similarity analysis
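The stop-word filtering, stemming, TF-IDF weighting and similarity steps listed above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the stop-word list and the suffix-stripping `stem` function are hypothetical stand-ins (a real system would use a full stop-word list and a proper stemmer such as Porter's), POS tagging and WordNet lookups are omitted, and TF-IDF uses the common tf * log(N / df) form with cosine similarity in the vector space model.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical; real lists are much longer).
STOP_WORDS = {"and", "or", "of", "the", "a", "to", "in", "is"}


def stem(word: str) -> str:
    """Crude suffix stripper, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case, tokenise, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """One sparse vector per document; each term is weighted by tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero length)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Illustrative tweets (toy data).
tweets = [
    "Germany defeats Aussies beach volleyball pair",
    "Germany overcomes Aussies beach volleyball pair",
    "Australian team lost the water polo to Italy",
]
vecs = tfidf_vectors([preprocess(t) for t in tweets])
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two tweets share most of their stemmed terms and score well above zero, while the third shares none with the first and scores 0.0. Note that "defeat" and "overcome" still count as different terms here; merging such synonyms is what the WordNet step adds.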

Methodology

• 2 Detect topics: clustering methods

K-means clustering
SOM (self-organising map) clustering
WordNet-based clustering

• 3 Detect opinions: classification methods

Bayesian classification
SVM (support vector machine)
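For the opinion step, here is a hedged sketch of one common Bayesian classifier, multinomial Naive Bayes with add-one smoothing, labelling a token list as supporting or opposing a topic. The labels `support`/`oppose` and the training sentences are hypothetical. The slides do not specify the features, training data or exact Bayesian model, and the SVM alternative is not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.total_docs = len(docs)
        return self

    def predict(self, tokens: list[str]) -> str:
        """Return the class with the highest log-posterior score."""
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[t] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled comments, for illustration only.
train = [
    ("great win for the team", "support"),
    ("love this result brilliant", "support"),
    ("terrible decision awful refereeing", "oppose"),
    ("hate this result so unfair", "oppose"),
]
model = NaiveBayes().fit([text.split() for text, _ in train], [label for _, label in train])
print(model.predict("brilliant win".split()))  # support
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.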

Improvement and innovation

• Use WordNet to improve clustering, assign weights to words and generate the topic sentence.

• WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets.

• For example, suppose the weight of “defeat” is 5 and the weight of “overcome” is 3. Because they are in the same synset, their weights are merged, so the weight of “defeat” becomes 5 + 3 = 8.
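The synset-merging rule in this example can be sketched as follows. The synset table is a hard-coded, hypothetical stand-in for a WordNet lookup. Crediting the merged weight to the most heavily weighted member is an assumption chosen to reproduce the slide's example, where 5 + 3 = 8 goes to “defeat”.

```python
from collections import defaultdict

# Hypothetical synset table standing in for a WordNet lookup:
# words that share a synset ID are treated as synonyms.
SYNSET_OF = {
    "defeat": "SYN_DEFEAT",
    "overcome": "SYN_DEFEAT",
}


def merge_synonym_weights(weights: dict[str, int]) -> dict[str, int]:
    """Sum the weights of words in the same synset onto the heaviest member."""
    groups = defaultdict(list)
    for word in weights:
        # A word with no known synset forms a group on its own.
        groups[SYNSET_OF.get(word, word)].append(word)
    merged = {}
    for members in groups.values():
        representative = max(members, key=lambda w: weights[w])
        merged[representative] = sum(weights[w] for w in members)
    return merged


print(merge_synonym_weights({"defeat": 5, "overcome": 3, "Germany": 4}))
# {'defeat': 8, 'Germany': 4}
```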

Improvement and innovation

• Cluster the tweets before detecting hot topics and opinions

• WordNet-based clustering
• Others' work only calculates word frequency
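The idea of clustering tweets before extracting topics can be illustrated with plain K-means, one of the methods listed under Methodology. This is a minimal sketch under stated assumptions: tweets become unit-length bag-of-words vectors, and the first k tweets seed the centroids so that runs are repeatable. The WordNet-based clustering proposed here is not reproduced, and the tweets are toy data.

```python
import math
from collections import Counter


def to_unit_vectors(texts: list[str]) -> list[list[float]]:
    """Bag-of-words count vectors over a shared vocabulary, scaled to unit length."""
    token_lists = [text.lower().split() for text in texts]
    vocab = sorted({w for tokens in token_lists for w in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against empty texts
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(p: list[float], q: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Lloyd's K-means; the first k points seed the centroids."""
    centroids = [p[:] for p in points[:k]]
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# Toy tweets, for illustration only.
tweets = [
    "germany defeats aussies beach volleyball pair",
    "australia lost the water polo to italy",
    "germany overcomes aussies beach volleyball pair",
    "aussies take beach volleyball to a deciding set",
    "they lost the water polo to italy",
]
print(kmeans(to_unit_vectors(tweets), k=2))  # [0, 1, 0, 0, 1]
```

Each resulting cluster can then be treated as one candidate topic, with topic and opinion extraction run per cluster.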

Improvement and innovation

• Consider related factors

Word frequency
Post occurrence time
Author: whether the author is a celebrity or has many followers
Users’ discrete degree: how widely dispersed the users who publish or forward the posts are
Keywords: some words in Twitter posts are marked with hashtags, e.g. #Happy Sweetest Day, #beijing, #Alex Cross
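The hashtag keyword factor can be sketched as a case-insensitive hashtag count over a collection of tweets. This is an illustrative assumption, not the authors' method. The regular expression treats a hashtag as `#` followed by letters, digits or underscores, which simplifies Twitter's actual rules, and it stops at the first space, so multi-word tags are assumed to be written without spaces (e.g. #HappySweetestDay). How hashtag counts are combined with the other factors is not specified here, and that combination is not shown.

```python
import re
from collections import Counter

# Simplified hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(tweets: list[str]) -> Counter:
    """Count hashtags case-insensitively across a collection of tweets."""
    return Counter(tag.lower() for t in tweets for tag in HASHTAG.findall(t))


# Toy tweets, for illustration only.
tweets = [
    "Off to the market #beijing",
    "Cold morning in #Beijing today",
    "Buying chocolate #HappySweetestDay",
]
print(hashtag_counts(tweets).most_common(1))  # [('beijing', 2)]
```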

Improvement and innovation

• Grammar analysis
• Nouns: left unchanged.
• Verbs: word stemming.
• Adjectives and adverbs: word stemming, then analysed and processed with WordNet (synonyms and antonyms).
• For example, the hypernym chain of “love”: entity -> abstraction (abstract entity) -> attribute -> state -> feeling -> emotion -> love

• Create a subject set, a verb set and an object set to generate a simple sentence for the topic.
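A hand-written parent table is enough to sketch the hypernym chain for “love” and the generalisation love -> emotion listed under pre-processing. The table is a hypothetical excerpt standing in for a WordNet lookup. In real WordNet a word has several senses, and a sense can have more than one hypernym. `generalise` is an illustrative helper, not a method from the slides.

```python
# Hypothetical excerpt of a noun hierarchy: each word maps to its direct hypernym.
HYPERNYM = {
    "love": "emotion",
    "emotion": "feeling",
    "feeling": "state",
    "state": "attribute",
    "attribute": "abstraction",
    "abstraction": "entity",
}


def hypernym_path(word: str) -> list[str]:
    """Return the chain from the root down to `word` (root first)."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path[::-1]


def generalise(word: str, levels: int = 1) -> str:
    """Replace a word by its ancestor `levels` steps up (the root if the chain is shorter)."""
    path = hypernym_path(word)
    return path[max(0, len(path) - 1 - levels)]


print(" -> ".join(hypernym_path("love")))
# entity -> abstraction -> attribute -> state -> feeling -> emotion -> love
print(generalise("love"))  # emotion
```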

Improvement and innovation
• 3-layer tree structure
• The first layer is the subject set, the second layer is the verb set and the last layer is the object set.
• Create the subject set, verb set and object set to generate a simple sentence for the topic.
• The basic sentence unit is SUBJECT plus VERB, or SUBJECT plus VERB plus OBJECT.
• Remember that the subject names what the sentence is about, the verb tells what the subject does or is, and the object receives the action of the verb.

• Although many other structures can be added to this basic unit, the pattern of SUBJECT plus VERB (or SUBJECT plus VERB plus OBJECT) can be found in even the longest and most complicated structures.
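The subject, verb and object layers can be sketched as a nested mapping, with a simple sentence read off the heaviest branch at each layer. This is a hedged illustration. The (subject, verb, object, weight) triples are assumed to come from earlier POS tagging and weighting. The greedy choice of the heaviest branch is an assumption rather than a documented rule. The verb is left in its stemmed form, and the triples and weights below are made up.

```python
from collections import defaultdict


def build_tree(triples):
    """Three-layer tree: subject -> verb -> object -> accumulated weight."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for subject, verb, obj, weight in triples:
        tree[subject][verb][obj] += weight
    return tree


def total(node) -> float:
    """Sum of all weights below a node (a leaf weight is returned as-is)."""
    return node if isinstance(node, float) else sum(total(child) for child in node.values())


def topic_sentence(tree) -> str:
    """Greedily follow the heaviest subject, then verb, then object."""
    subject = max(tree, key=lambda s: total(tree[s]))
    verbs = tree[subject]
    verb = max(verbs, key=lambda v: total(verbs[v]))
    objects = verbs[verb]
    obj = max(objects, key=lambda o: objects[o])
    # An empty object means the sentence is just SUBJECT plus VERB.
    return " ".join(word for word in (subject, verb, obj) if word)


# Hypothetical (subject, verb, object, weight) triples; the weights are made up.
triples = [
    ("Germany", "defeat", "Aussies", 5),
    ("Germany", "defeat", "Aussies", 3),
    ("Aussies", "take", "set", 2),
    ("team", "lose", "", 4),
]
print(topic_sentence(build_tree(triples)))  # Germany defeat Aussies
```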

Experiment

• Input:

Australian Olympic shooters have had a tough morning . They lost - Dina Aspandiyarova finished 14th and Lalita Yauhleuskaya was 40th

Germany defeats Aussies beach volleyball pair Bec Palmer and Louise Bawden in three sets

Germany overcomes Aussies beach volleyball pair Bec Palmer and Louise Bawden in August.

Aussies Palmer and Bawden take it to a deciding set in the beach volleyball against Germany

Australian team lost the men's water polo to Italy 8-5 . The Sharks play Kazakhstan next on Tuesday.

They lost the men's water polo to Italy. They came back last night.

Experiment Result

Questions