EMOTIVE: Data Collection

EMOTIVE: Data Collection, 19th July, Loughborough University, UK. Dr. Tom Jackson, Dr. Ann O’Brien and Dr. Martin Sykora

Page 1: EMOTIVE: Data Collection

EMOTIVE: Data Collection

19th July, Loughborough University, UK
Dr. Tom Jackson, Dr. Ann O’Brien and Dr. Martin Sykora

Page 2: EMOTIVE: Data Collection

EMOTIVE Work Packages and Milestones

Page 3: EMOTIVE: Data Collection

Contents

Previous Work

Technical Considerations

Retrieval of Datasets

Exploring Taxonomy / Ontology

Page 4: EMOTIVE: Data Collection

- Guardian (with academic collaboration) work on London 2011 Riots
- Crisees Demonstrator
- Terrorism Informatics: Cheong and Lee (2011)

Cheong, M. and Lee, V. C. S., 2011. A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter. Information Systems Frontiers (Springer), 13 (1), pp. 45-59

Page 5: EMOTIVE: Data Collection

- Crisees Demonstrator; Kingston Bridge (Glasgow) – Scottish storms

Maxwell, D., Raue, S., Azzopardi, L., Johnson, C. and Oates, S., 2012. Crisees: Real-time monitoring of social media streams to support crisis management. Advances in Information Retrieval (Springer), pp. 573-575

Page 6: EMOTIVE: Data Collection

- Terrorism Informatics: Cheong and Lee (2011)

Cheong, M. and Lee, V. C. S., 2011. A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter. Information Systems Frontiers (Springer), 13 (1), pp. 45-59

Page 7: EMOTIVE: Data Collection

Brief overview of technical challenges:

Collecting and processing data in real-time

Filtering (spam detection / influential Tweets / influential Twitter accounts) and aggregating the datasets

Assessing the integrity of the datasets

Potential need to store massive datasets (MongoDB)
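
The filtering and aggregation steps listed above could be prototyped roughly as follows. This is only a minimal sketch, not the project's pipeline: the tweet fields (`user`, `text`, `followers`), the spam heuristic (too many hashtags or links) and the follower threshold for "influential" accounts are all assumptions made for illustration.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")


def looks_like_spam(text, max_hashtags=4, max_urls=2):
    """Crude heuristic (an assumption): many hashtags or links suggest spam."""
    return (len(HASHTAG.findall(text)) > max_hashtags
            or len(URL.findall(text)) > max_urls)


def filter_and_aggregate(tweets, influential_min_followers=10_000):
    """Drop spam-like and duplicate tweets, then count hashtags and
    collect accounts above the (assumed) follower threshold."""
    seen = set()
    kept = []
    hashtag_counts = Counter()
    influential = set()
    for tweet in tweets:
        text = tweet["text"]
        key = text.strip().lower()  # case-insensitive duplicate check
        if looks_like_spam(text) or key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
        hashtag_counts.update(tag.lower() for tag in HASHTAG.findall(text))
        if tweet.get("followers", 0) >= influential_min_followers:
            influential.add(tweet["user"])
    return kept, hashtag_counts, influential
```

In a real-time setting the same function could be applied to small batches as they arrive, with the counters merged across batches; integrity checks and storage are left out of this sketch.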

Page 8: EMOTIVE: Data Collection

DATASETS based on UK REGIONS

Retrieve geo-tagged Tweets only (a small minority), or use other ways of restricting retrieval to UK-based Tweets:

Use look-up lists of entities / places / cities / main roads / etc. in the UK (tricky; e.g. King’s Cross station in London vs. Kings Cross in Sydney)

Complementary approach: differences in dialect, e.g. Nottingham / Nottinghamshire, Newcastle, Manchester, London, … (identify heuristics and automated classification)

Location inferred from the event
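
As an illustration of the first two options above, the sketch below checks coordinates against a rough UK bounding box and falls back to a place look-up list with simple counter-evidence words. The box values are approximate, and the `coordinates` field, the `UK_PLACES` list and its counter-evidence words are hypothetical; dialect-based classification is not covered.

```python
import re

# Approximate bounding box around the UK (assumed values, illustration only;
# a plain box also covers parts of Ireland and northern France).
UK_BOX = {"min_lat": 49.8, "max_lat": 60.9, "min_lon": -8.7, "max_lon": 1.8}

# Hypothetical look-up list: place name -> words that count against a UK reading.
UK_PLACES = {
    "m6 toll": set(),
    "kings cross": {"sydney", "nsw", "australia"},
    "newcastle": {"nsw", "australia"},
}


def normalise(text):
    """Lower-case and drop apostrophes so "King's Cross" matches "kings cross"."""
    return re.sub(r"[’']", "", text.lower())


def in_uk_box(lat, lon):
    """True if the point lies inside the approximate UK bounding box."""
    return (UK_BOX["min_lat"] <= lat <= UK_BOX["max_lat"]
            and UK_BOX["min_lon"] <= lon <= UK_BOX["max_lon"])


def likely_uk(tweet):
    """Use coordinates when present, otherwise fall back to the place list."""
    coords = tweet.get("coordinates")  # (lat, lon) tuple or None
    if coords is not None:
        return in_uk_box(*coords)
    text = normalise(tweet["text"])
    words = set(re.findall(r"\w+", text))
    for place, counter_evidence in UK_PLACES.items():
        if place in text and not (words & counter_evidence):
            return True
    return False
```

A word list like this only catches ambiguity that is spelled out in the Tweet itself; a Tweet that says just "Kings Cross" from Sydney would still be misclassified, which is why the complementary approaches matter.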

Page 9: EMOTIVE: Data Collection

DATASETS based on NATIONAL SECURITY RISKS

Hashtags (#) concerning national security as a starting point; threats list compiled by Cheong and Lee (2011)

Tweet collection operating at all times, so that Tweets are collected before events happen; preselect known events such as the G20 or G8

Also collect datasets relating to emergencies, e.g. floods in the UK

Understanding the “lifecycle” of such an event on Twitter

Sentiment of terse natural language…
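
One simple way to look at the "lifecycle" of an event is to count Tweets that carry a tracked hashtag per hour. The sketch below assumes Tweets arrive as `(created_at, text)` pairs; the `TRACKED_HASHTAGS` set is hypothetical and would in practice be seeded from a threats list and preselected events.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical tracking list, for illustration only.
TRACKED_HASHTAGS = {"#m6toll", "#megabus", "#g20", "#g8", "#ukfloods"}


def matches_tracked(text, tracked=TRACKED_HASHTAGS):
    """True if the text contains at least one tracked hashtag."""
    return any(tag in tracked for tag in re.findall(r"#\w+", text.lower()))


def hourly_lifecycle(tweets, tracked=TRACKED_HASHTAGS):
    """Count matching tweets per hour.

    `tweets` is an iterable of (created_at: datetime, text: str) pairs.
    Returns a list of (hour, count) tuples sorted by time.
    """
    buckets = Counter()
    for created_at, text in tweets:
        if matches_tracked(text, tracked):
            hour = created_at.replace(minute=0, second=0, microsecond=0)
            buckets[hour] += 1
    return sorted(buckets.items())
```

Plotting the resulting counts over time gives a first view of how quickly an event rises and fades; sentiment analysis of the matched Tweets would be a separate step.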

Page 10: EMOTIVE: Data Collection

Exploratory Data Collection: M6

Chris Halpin @KatsJonouchi: This M6 toll thing... If you're going to try and blow up a coach, you wouldn't choose a Megabus, surely...

Armani Music @ArmaniBmusic: Someone was seen pouring a liquid in something that made fumes on a Mega bus on the M6 Toll. I will full on wee myself if it's a pot noodle.

Johnny O'Grady @Johnnnnyyyyyyyy: Watching the news in Tescos, bit confused about what's happened on the M6 toll road

Ellie Henderson @EllieH_86: Distracted at work watching the M6 Toll footage on the news. #M6Toll

Ian Watson @IanWatoop: M6 Toll. "I picked the wrong day to give up smoking!". #Megabus

Splashy♥ ❤ @_Splashy_: 2001, terrorists hijack planes. 2012, terrorism alert on a Megabus on the M6 toll. This recession has hit hard.

Zora Suleman @ZoraSuleman: Police confirm .. the closure of the M6 toll road was caused by an Electric cigarette

Page 11: EMOTIVE: Data Collection

Exploratory Data Collection: Kings Cross Station

@Pearcesport: 17 days before Olympics, and Kings Cross station, one of main gateways to Olympic Park, has been closed this morning due to overcrowding

@thefashionturd: Got evacuated from Kings cross st pancras! not half as scary as Boris' Olympic messages that are being played into tube stations though...

@corrina_kylie: Kings Cross is filled with scary characters, I'll say that much.

@subhajitb: Some pretty angry people outside Kings Cross tube station #fasterhigherstronger http://twitter.com/subhajitb/status/222599112976105472/photo/1

@MoniqueBabyx: RIP to the 18 year old boy who died in kings cross. so sad, so young, so early. prayers go to the innocent family.

Page 12: EMOTIVE: Data Collection

Taxonomy / Ontology 1

Make a taxonomy first
This establishes correct relationships
Then create the ontology (OWL)

Scope
1. Civic unrest? UK?
2. Emotions

Range of language for Ontology 1
- ‘normal’ English
- Urban slang
- Street talk
- Text language

Page 13: EMOTIVE: Data Collection

Vocabulary sources

Thorne, T. Dictionary of contemporary slang

www.urbandictionary.com
www.odps.org
www.chavscum.co.uk
etc.

and from Twitter feeds and archives

Page 14: EMOTIVE: Data Collection

Taxonomy / Ontology 2

A number of ‘lists’ of emotions exist:
- Project MUSE
- Art and Architecture Thesaurus
- Tate Gallery
- etc.

Page 15: EMOTIVE: Data Collection

Tate galleries – emotion

Page 16: EMOTIVE: Data Collection

Linguistic and semantic issues

At the extraction stage: single terms and noun phrases (‘police car’, etc.)

Barriers to understanding: sarcasm and metaphor

Types of relationships between terms:
- Synonyms
- Broad to narrow (classes and subclasses)
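
Extraction of single terms and noun phrases, with synonyms mapped to one preferred term, might be sketched as a greedy longest-match look-up. The `VOCABULARY` entries below are hypothetical examples rather than project data, and the approach does nothing about sarcasm or metaphor.

```python
import re

# Hypothetical vocabulary: surface form -> preferred term (synonyms share one).
VOCABULARY = {
    "police car": "police car",
    "cop car": "police car",
    "police": "police",
    "feds": "police",
}


def extract_terms(text, vocabulary=VOCABULARY, max_words=3):
    """Greedy longest-match look-up of single terms and multi-word phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in vocabulary:
                found.append(vocabulary[phrase])
                i += n
                break
        else:
            i += 1  # no vocabulary entry starts at this word
    return found
```

Matching the longest phrase first keeps ‘police car’ from being split into ‘police’ plus an unrelated word.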

Page 17: EMOTIVE: Data Collection

For example…

Guns
- glock
- piece
- pistol
- etc.

Page 18: EMOTIVE: Data Collection

‘Guns’ is one subclass of ‘Weapons’

Weapons
- guns
  - Glock
  - piece
  - pistol
- knives
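
The broad-to-narrow relationships in this example can be held in a simple child-to-parent mapping before any OWL modelling is attempted. The sketch below is a toy structure mirroring the slide, not the project's taxonomy; the function names are illustrative.

```python
# Toy taxonomy mirroring the slide: narrower term -> broader class.
PARENT = {
    "guns": "weapons",
    "knives": "weapons",
    "glock": "guns",
    "piece": "guns",
    "pistol": "guns",
}


def ancestors(term, parent=PARENT):
    """Return the chain of broader classes for a term, narrowest first."""
    chain = []
    current = term.lower()
    while current in parent:
        current = parent[current]
        chain.append(current)
    return chain


def terms_under(cls, parent=PARENT):
    """Return every term that falls under a class, directly or indirectly."""
    return sorted(t for t in parent if cls in ancestors(t, parent))
```

For instance, `ancestors("Glock")` walks up through ‘guns’ to ‘weapons’. Note that a slang term such as ‘piece’ has other everyday meanings, so a look-up like this would still need context to decide which sense is intended.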

Page 19: EMOTIVE: Data Collection

The issues and approaches discussed in the context of Twitter also apply to public Facebook messages.

Other considerations:
- Age and gender detection from terse natural language
- Scope of communication (social network analysis)
- Followers of certain bands / parties (possibly extreme: neo-Nazi, leftist, etc.)

Final Remarks…

Page 20: EMOTIVE: Data Collection

Thanks