Integrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous Sources – LUPAS 2010
Integrating and Interpreting Social Data from Heterogeneous Sources
Matthew Rowe
Organisations, Information and Knowledge Group
University of Sheffield
Suvodeep Mazumdar
Department of Information Studies
University of Sheffield
Outline
• Information overload
  – Increase in social data publication
• Interlinking social data
  – Metadata Generation
  – Integrating Social Data
• Application: Interpreting Social Data
  – Cumbrian Floods Use Case
  – Interacting with Social Data
• Conclusions
Information Overload
• Masses of social data are published every day
  – E.g. 50 million tweets (600 per second)
    • http://blog.twitter.com
  – 22 million Facebook users in the UK
    • http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/
• Too much information to deal with!
• Social data is multi-faceted:
  – Provenance
  – Topic
  – Geo
• Trend services (e.g. trendistic, blogpulse):
  – Focus on majority consensus
  – Need to listen in to a specific topic
  – Concentrate on a single source/platform
  – Do not consider geo facet
Interlinking Social Data
• Consider multi-faceted nature of social data:
  – Allows fine-grained analysis
  – Show geo-localised social data
  – Relevant past social data
• Solution: Interlink social data from heterogeneous sources
  – Use semantics!
  – Consistent data interpretation
Metadata Generation
• Web 2.0 platforms return data using:
  – Proprietary formats
  – Heterogeneous data schemas
• Need to link data together from disparate sources
• A social data fragment = a single piece of social data
  – E.g. a tweet, an image, a video
• Lift each social data fragment to RDF:
  1. Create an instance of sioc:Post and itr:LocalizedResource
     • Assign it a URI
  2. Assign the content to the instance (topic)
     • Use hashtags of the microblog
  3. Create an instance of gml:Geometry (geo)
     • Capture geo facet
  4. Assign timestamp of fragment creation (provenance)
     • Using dcterms:created
  5. Assign the fragment to its owner (provenance)
     • Create foaf:Person instance
<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <truncated>false</truncated>
  <in_reply_to_status_id></in_reply_to_status_id>
  <in_reply_to_user_id></in_reply_to_user_id>
  <favorited>false</favorited>
  <in_reply_to_screen_name></in_reply_to_screen_name>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>

<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <description></description>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
    <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
    <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
    <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
  </location>
</photo>
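The two responses above carry the same facets (topic, geo, provenance) under different schemas. A minimal Python sketch of how they might be normalised into one common record before lifting to RDF is given here. The function names, the record keys, the choice of the "taken" date for photos, and the trimmed XML strings are all assumptions made for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

GEORSS = "{http://www.georss.org/georss}"

# Trimmed copies of the two responses shown in the slides.
TWEET_XML = """<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>"""

PHOTO_XML = """<photo id="949406913" media="photo">
  <title>DSC00171.JPG</title>
  <dates posted="1205398307" taken="2009-01-09 09:16:31"/>
  <tags>
    <tag raw="arctic">arctic</tag>
    <tag raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392"/>
</photo>"""


def parse_tweet(xml_text):
    """Map a Twitter <status> element onto the common record (hypothetical keys)."""
    root = ET.fromstring(xml_text)
    text = root.findtext("text")
    lat, lon = (float(v) for v in root.findtext(f"geo/{GEORSS}point").split(","))
    created = datetime.strptime(root.findtext("created_at"), "%a %b %d %H:%M:%S %z %Y")
    # Hashtags provide the topic facet; trailing punctuation is stripped.
    tags = [w.strip("#.,!?").lower() for w in text.split() if w.startswith("#")]
    return {"id": root.findtext("id"), "content": text, "tags": tags,
            "created": created, "lat": lat, "lon": lon}


def parse_photo(xml_text):
    """Map a Flickr <photo> element onto the same record."""
    root = ET.fromstring(xml_text)
    location = root.find("location")
    # Assumption: the "taken" attribute is used as the creation time (naive datetime).
    taken = datetime.strptime(root.find("dates").get("taken"), "%Y-%m-%d %H:%M:%S")
    return {"id": root.get("id"), "content": root.findtext("title"),
            "tags": [t.text for t in root.iter("tag")], "created": taken,
            "lat": float(location.get("latitude")),
            "lon": float(location.get("longitude"))}


if __name__ == "__main__":
    for record in (parse_tweet(TWEET_XML), parse_photo(PHOTO_XML)):
        print(record["id"], record["tags"], record["lat"], record["lon"])
```

Once both platforms produce the same record shape, the five lifting steps only need to be written once.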
<http://twitter.com/mattroweshow> rdf:type foaf:Person ;
    rdf:type itr:LocalizedResource ;
    foaf:name "Matthew Rowe" ;
    foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .

<http://twitter.com/mattroweshow/9774519667> rdf:type sioc:Post ;
    rdf:type itr:LocalizedResource ;
    sioc:content "Writing up our Geovation work for #lupas2010." ;
    dcterms:subject "lupas2010" ;
    dcterms:created "2010-2-28 12:22:47.0" ;
    sioc:hasCreator <http://twitter.com/mattroweshow> ;
    itr:has_Localization _:a2 .

_:a2 rdf:type gml:Geometry ;
    gml:pos "53.3833,-1.4722" .
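The five lifting steps can be sketched as a small serialiser that turns one fragment record into Turtle text like the example above. This is an illustration only: the record keys, the helper names `literal`, `block` and `to_turtle`, and the blank-node label parameter are assumptions, the property names are copied from the slides, and prefix declarations are left out.

```python
def literal(value):
    """Quote a Python string as a Turtle string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def block(subject, pairs):
    """Render one subject with its predicate/object pairs, one pair per line."""
    body = " ;\n    ".join(f"{predicate} {obj}" for predicate, obj in pairs)
    return f"{subject} {body} ."


def to_turtle(fragment, bnode="_:a2"):
    """Serialise a fragment record (hypothetical keys) following the five steps."""
    subject = f"<{fragment['uri']}>"
    # Step 1: type the fragment; its URI is the subject.
    pairs = [("rdf:type", "sioc:Post"), ("rdf:type", "itr:LocalizedResource")]
    # Step 2: content and topic (one dcterms:subject per hashtag).
    pairs.append(("sioc:content", literal(fragment["content"])))
    pairs += [("dcterms:subject", literal(tag)) for tag in fragment["tags"]]
    # Step 4: creation timestamp.
    pairs.append(("dcterms:created", literal(fragment["created"])))
    # Step 5: link the fragment to its owner.
    pairs.append(("sioc:hasCreator", f"<{fragment['creator']}>"))
    # Step 3: geo facet, held in a separate gml:Geometry node.
    pairs.append(("itr:has_Localization", bnode))
    geometry = [("rdf:type", "gml:Geometry"), ("gml:pos", literal(fragment["pos"]))]
    return block(subject, pairs) + "\n" + block(bnode, geometry)


FRAGMENT = {
    "uri": "http://twitter.com/mattroweshow/9774519667",
    "content": "Writing up our Geovation work for #lupas2010.",
    "tags": ["lupas2010"],
    "created": "2010-2-28 12:22:47.0",
    "creator": "http://twitter.com/mattroweshow",
    "pos": "53.3833,-1.4722",
}

if __name__ == "__main__":
    print(to_turtle(FRAGMENT))
```

When several fragments are serialised into one file, each call would need its own blank-node label so that the geometry nodes do not collide.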
Integrated Social Data
• Triplify social data from multiple platforms
  – Flickr XML response -> RDF
  – Picasa XML response -> RDF
• Use common semantics
  – Can perform SPARQL queries
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item
WHERE {
  ?item dcterms:subject "iranelections" .
  ?item dcterms:created ?date
}
ORDER BY DESC(?date)

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
PREFIX gml: <http://www.opengis.net/gml/>
SELECT DISTINCT ?post ?tag
WHERE {
  ?post dcterms:subject ?tag .
  ?post itr:has_Localization ?geo .
  ?geo gml:pos "53.4813,-2.2392"
}
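To make the meaning of the two queries concrete, the sketch below mimics them in plain Python over a small in-memory list of triples. It is not a SPARQL engine, and the `ex:` identifiers, dates and helper names are invented for illustration; only the predicates and the position literal follow the slides.

```python
# Hypothetical triples; prefixed names are kept as plain strings.
TRIPLES = [
    ("ex:tweet1", "dcterms:subject", "iranelections"),
    ("ex:tweet1", "dcterms:created", "2009-06-14T10:00:00"),
    ("ex:tweet2", "dcterms:subject", "iranelections"),
    ("ex:tweet2", "dcterms:created", "2009-06-20T08:30:00"),
    ("ex:photo1", "dcterms:subject", "arctic"),
    ("ex:photo1", "dcterms:subject", "monkeys"),
    ("ex:photo1", "itr:has_Localization", "_:g1"),
    ("_:g1", "gml:pos", "53.4813,-2.2392"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the data."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def items_about(tag):
    """First query: items with the given subject, newest first."""
    dated = [(date, item)
             for item in subjects("dcterms:subject", tag)
             for date in objects(item, "dcterms:created")]
    # ISO-style timestamps sort correctly as strings.
    return [item for date, item in sorted(dated, reverse=True)]


def tags_at(pos):
    """Second query: distinct (post, tag) pairs for posts located at pos."""
    found = set()
    for geo in subjects("gml:pos", pos):
        for post in subjects("itr:has_Localization", geo):
            for tag in objects(post, "dcterms:subject"):
                found.add((post, tag))
    return sorted(found)


if __name__ == "__main__":
    print(items_about("iranelections"))
    print(tags_at("53.4813,-2.2392"))
```

Because every platform's data is lifted to the same vocabulary, neither function needs to know whether a fragment came from Twitter or Flickr.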
Interpreting Social Data
• Cumbrian Use Case
  – UK region suffered worst floods in centuries
  – Observe the effects in social data
    • Rise in publication
    • Fine-grained geocoded social data
• Dataset:
  – Microblogs from 200 Cumbrian Twitter users
    • Published during 2009
    • 3513 microblogs
    • Produced 475,043 triples
  – Images from Flickr taken in Cumbria
    • 6663 images
    • Produced 182,304 triples
Interacting with Social Data
• Built a visualisation application to analyse social data fragments
  – http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial
• Filter by date
  – Lower slider
• Fine-grained focus
  – Zoom in
• Tag cloud
  – Shows fragment topics
  – Window controls tag cloud topics
• Markers contain number of fragments
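The three interactions listed above (date filter, markers with fragment counts, tag cloud for the visible window) can be thought of as simple aggregations over the fragment records. The following Python sketch shows one way they might be computed; the sample fragments, the function names and the grid-rounding used to group markers are assumptions for illustration and say nothing about how the actual application is built.

```python
from collections import Counter
from datetime import date

# Hypothetical fragments with the three facets used by the interface.
FRAGMENTS = [
    {"created": date(2009, 11, 19), "lat": 54.6631, "lon": -3.3662,
     "tags": ["floods", "cockermouth"]},
    {"created": date(2009, 11, 20), "lat": 54.66, "lon": -3.36,
     "tags": ["floods"]},
    {"created": date(2009, 11, 21), "lat": 54.6013, "lon": -3.1347,
     "tags": ["keswick", "floods"]},
    {"created": date(2009, 6, 1), "lat": 54.6013, "lon": -3.1347,
     "tags": ["walking"]},
]


def in_window(fragments, start, end):
    """Date slider: keep fragments created between start and end inclusive."""
    return [f for f in fragments if start <= f["created"] <= end]


def marker_counts(fragments, precision=1):
    """Markers: count fragments per grid cell (coordinates rounded to precision)."""
    return Counter((round(f["lat"], precision), round(f["lon"], precision))
                   for f in fragments)


def tag_cloud(fragments):
    """Tag cloud: tag frequencies for the fragments currently in view."""
    return Counter(tag for f in fragments for tag in f["tags"])


if __name__ == "__main__":
    visible = in_window(FRAGMENTS, date(2009, 11, 1), date(2009, 11, 30))
    print(marker_counts(visible))
    print(tag_cloud(visible).most_common(3))
```

Zooming in would correspond to raising `precision` and restricting the fragments to the visible map bounds before recounting.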
Conclusions
• Consistent interpretation of social data
  – Across heterogeneous sources
• Application
  – Allows analyses of social data
    • To fine-grained detail
  – Utilises multiple facets of social data
  – Requires metadata
    • Issue of scalability
• Future Work
  – Adapting to real time data acquisition
    • Focussing on South Yorkshire region at present
    • Assess scalability issue
Questions?
Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: [email protected]