Taming Social Media with MongoDB
humangeo-group
2
Overview
• Introduction
• Social Media Challenges
• MongoDB Setup
• Collecting Tweets
• Querying Tweets
• Accessing the Data
• Finding Most Active Tweeter
• Lessons Learned
• Building an Interface
• Demo
3
Introduction
• Built a tool to collect tweets over Australia and interact with them on a map
• Working at HumanGeo
  – Building tools and services for geospatial analysis of Big Data
  – Using MongoDB for horizontally scalable storage and geospatial analysis
4
Social Media Challenges
• No control over data
  – “Consumers of Tweets should tolerate the addition of new fields and variance in ordering of fields with ease.” - Twitter
• High Volume
  – ~17k tweets in a day or 6.2M per year with exact coordinates in Australia
  – Record high of >25k tweets per second or >788B per year around the world - Twitter
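One way to follow Twitter's advice is to read only the keys a tool needs and to default everything else. The sketch below assumes the usual tweet JSON layout (`id_str`, `text`, `user.screen_name`, and a GeoJSON-style `coordinates` object). The helper name `extract_fields` is hypothetical.

```python
import json


def extract_fields(raw_json):
    """Pull out only the fields we rely on; ignore everything else."""
    tweet = json.loads(raw_json)
    user = tweet.get("user") or {}
    # "coordinates" is null when the tweet has no exact location.
    point = tweet.get("coordinates") or {}
    return {
        "id": tweet.get("id_str"),
        "text": tweet.get("text", ""),
        "screen_name": user.get("screen_name"),
        "lon_lat": point.get("coordinates"),  # [longitude, latitude] or None
    }
```

Because nothing depends on key order or on the absence of unknown keys, new fields added by Twitter pass through without breaking the parser.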
5
MongoDB Setup
• Create database
• Create capped collections
• Create indexes
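The three setup steps might look like the sketch below. It assumes `db` behaves like a pymongo `Database` (for example `pymongo.MongoClient()["twitter"]`). The collection name, the size, and the indexed fields are illustrative assumptions, not the values used in the talk.

```python
def setup_tweet_store(db, name="tweets", size_bytes=512 * 1024 * 1024):
    """Create a capped collection and the indexes used by later queries.

    MongoDB creates the database itself lazily on first write, so there
    is no separate "create database" call.
    """
    # Capped collections have a fixed size and discard the oldest documents first.
    if name not in db.list_collection_names():
        db.create_collection(name, capped=True, size=size_bytes)
    tweets = db[name]
    # "2d" is the value of pymongo.GEO2D; tweets store [longitude, latitude].
    tweets.create_index([("coordinates.coordinates", "2d")])
    # Ascending index to speed up per-user lookups.
    tweets.create_index([("user.screen_name", 1)])
    return tweets
```

Calling the function a second time skips the collection creation, and re-requesting an index that already exists with the same definition has no effect in MongoDB.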
6
Collecting Tweets
• Using tweetstream to collect tweets over Australia from the statuses/filter endpoint
• Insert results into collections
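A rough sketch of the collection loop follows. It does not call tweetstream directly: `stream` stands for any iterable of decoded tweet dicts, and `collection` for an object that behaves like a pymongo `Collection`. The bounding box values are an approximation chosen for illustration, and the function names are hypothetical.

```python
# Approximate bounding box for Australia (an assumption, not from the talk),
# in the order statuses/filter expects:
# SW longitude, SW latitude, NE longitude, NE latitude.
AUSTRALIA_BBOX = [112.9, -43.8, 153.7, -10.6]


def locations_param(bbox):
    """Format a bounding box as the comma-separated `locations` value."""
    return ",".join(str(v) for v in bbox)


def store_tweets(stream, collection):
    """Insert each geotagged tweet from `stream`; return how many were stored."""
    stored = 0
    for tweet in stream:
        # Keep only tweets with an exact point. Delete notices and
        # place-only tweets have no usable "coordinates" value.
        if tweet.get("coordinates"):
            collection.insert_one(tweet)
            stored += 1
    return stored
```

The filter on `coordinates` matters because a `locations` filter also matches tweets whose tagged place merely overlaps the box.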
7
Collecting Tweets (cont)
• Augment results for better queries
  – Twitter provides date strings like "Wed Jun 13 23:17:58 +0000 2012"
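Storing a real date alongside the string lets MongoDB sort and range-query by time. A minimal sketch, assuming the parsed value is saved under a new field named `created_date` (a hypothetical name) and that day and month names are parsed under an English locale:

```python
from datetime import datetime, timezone

# Matches strings such as "Wed Jun 13 23:17:58 +0000 2012".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def add_created_date(tweet):
    """Add a datetime copy of created_at next to the original string."""
    parsed = datetime.strptime(tweet["created_at"], TWITTER_DATE_FORMAT)
    # Normalize to UTC so every stored date is directly comparable.
    tweet["created_date"] = parsed.astimezone(timezone.utc)
    return tweet
```

Running each tweet through `add_created_date` before inserting it keeps the original string intact while adding a value that pymongo stores as a BSON date.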
8
Querying Tweets
• Get all of the latest tweets
• Get all the tweets from a user
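These two queries could be expressed as keyword arguments for pymongo's `find()`. The sketch assumes tweets are stored unmodified, so the author's name is at `user.screen_name`. The function names and the default limit are hypothetical.

```python
def latest_tweets_query(limit=100):
    """find() arguments for the newest tweets first."""
    # In a capped collection, natural order is insertion order, so reverse
    # natural order returns the most recently collected tweets first.
    return {"filter": {}, "sort": [("$natural", -1)], "limit": limit}


def tweets_by_user_query(screen_name):
    """find() arguments for all tweets from one user."""
    return {"filter": {"user.screen_name": screen_name}}
```

Usage would be along the lines of `collection.find(**latest_tweets_query(20))` or `collection.find(**tweets_by_user_query("some_user"))`.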
9
Querying Tweets (cont)
• Get tweets near a point
• Get tweets within a bounding box
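Both geospatial queries can be written as filter documents against a `2d` index. The sketch assumes the index is on `coordinates.coordinates`, the `[longitude, latitude]` array inside each tweet; the helper names are hypothetical.

```python
GEO_FIELD = "coordinates.coordinates"  # [longitude, latitude] inside each tweet


def near_point_filter(lon, lat):
    """Filter returning tweets ordered from nearest to farthest from a point."""
    return {GEO_FIELD: {"$near": [lon, lat]}}


def within_box_filter(sw_lon, sw_lat, ne_lon, ne_lat):
    """Filter returning tweets inside a bounding box."""
    # $box takes the bottom-left corner first, then the upper-right corner.
    box = [[sw_lon, sw_lat], [ne_lon, ne_lat]]
    # $geoWithin replaced the older $within operator in MongoDB 2.4.
    return {GEO_FIELD: {"$geoWithin": {"$box": box}}}
```

For example, `collection.find(near_point_filter(151.21, -33.87)).limit(10)` would return the ten tweets closest to central Sydney.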
11
Finding Most Active Tweeter
• Calculate tweet count for each user and return tweets for that user
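One possible way to compute the count is the aggregation framework, which is an assumption here; the talk may have used a different mechanism such as map-reduce. The sketch pairs the pipeline with a pure-Python equivalent that shows the same logic on an in-memory list. All names are hypothetical.

```python
from collections import Counter

# Aggregation pipeline: count tweets per user and keep the largest group.
MOST_ACTIVE_PIPELINE = [
    {"$group": {"_id": "$user.screen_name", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 1},
]


def most_active_tweeter(tweets):
    """Pure-Python equivalent of the pipeline for an iterable of tweet dicts."""
    counts = Counter(t["user"]["screen_name"] for t in tweets)
    if not counts:
        return None
    name, count = counts.most_common(1)[0]
    return {"_id": name, "count": count}
```

With pymongo, `top = next(collection.aggregate(MOST_ACTIVE_PIPELINE))` followed by `collection.find({"user.screen_name": top["_id"]})` would return that user's tweets.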
12
Lessons Learned
• Use Longitude, Latitude ordering for coordinates
• Default 2d index value range is [-180, 180), exclusive of the upper bound
• Twitter has bugs too
• Making your own maps isn’t hard (it can take some time)
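The first two lessons can be handled in one small helper. The sketch assumes coordinates arrive as a (latitude, longitude) pair and that the index uses MongoDB's default `2d` bounds of -180 inclusive to 180 exclusive. The function name is hypothetical.

```python
def to_lon_lat(lat, lon):
    """Convert (latitude, longitude) to the [longitude, latitude] order MongoDB indexes."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180 <= lon <= 180:
        raise ValueError("longitude out of range: %r" % lon)
    # A default 2d index covers [-180, 180), so longitude 180 would be
    # rejected; -180 is the same meridian and falls inside the range.
    if lon == 180:
        lon = -180
    return [lon, lat]
```

An alternative is to widen the index bounds when creating it, but normalizing the value keeps the default index usable.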
13
Building an Interface
• Dust JavaScript templating library
• Leaflet JavaScript interactive map library
• jQuery JavaScript library
• TileStream map tile server