Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight...
Transcript of Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight...
Spatial Big Data
Shashi Shekhar McKnight Distinguished University Professor
Department of Computer Science and Engineering, University of Minnesota
www.cs.umn.edu/~shekhar
AAG-NIH Symp. on Enabling a National Geospatial Cyberinfrastructure for Health Research (July 2012)
More details in S. Shekhar et al., Spatial Big Data Challenges Intersecting Mobility and Cloud Computing, ACM SIGMOD Workshop on Data Engineering for Wireless and Mobile Access, 2012.
Research Theme 1: Spatial Databases
only in old plan
Only in new plan
In both plans
Evacutation Route Planning
Parallelize
Range Queries
Storing graphs in disk blocks Shortest Paths
Theme 2 : Spatial Data Mining
Nest locations Distance to open water
Vegetation durability Water depth
Location prediction: nesting sites Spatial outliers: sensor (#9) on I-35
Co-location Patterns Tele connections
4
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• Conclusions
Big Data
Mining and analyzing these big new data sets can open
the door to a new wave of innovation, accelerating
productivity and economic growth. Some economists,
academics and business executives see an opportunity to
move beyond the payoff of the first stage of the Internet,
which combined computing and low-cost
communications to automate all kinds of commercial
transactions.
Estimated Value >Usd 1 Trillion per year by 2020
Location-based service: usd 600 B
Health Informatics: usd 300 B
Manufacturing:
…
6
Spatial Big Data Definitions
• Spatial datasets exceeding capacity of current computing systems • To manage, process, or analyze the data with reasonable effort
• Due to Volume, Velocity, Variety, …
• SBD Components • Data-intensive Computing: Cloud Computing
• Middleware, e.g., Map-Reduce, Pregel, Big-Table, …
• Big-Data analytics, e.g., data mining, machine learning, computational statistics, …
• Big Data science and societal applications
• Ex. Social media datasets, e.g., Google Flu Trend
• Which patterns may be detected in these datasets?
• Flu outbreaks ?
7
Traditional Spatial Data
Spatial attribute:
Neighborhood and extent
Geo-Reference: longitude, latitude, elevation
Spatial data genre
Raster: geo-images e.g., Google Earth
Vector: point, line, polygons
Graph, e.g., roadmap: node, edge, path Raster Data for UMN Campus
Courtesy: UMN
Vector Data for UMN Campus
Courtesy: MapQuest
Graph Data for UMN Campus
Courtesy: Bing
8
Raster SBD
LiDAR & Urban Terrain
Change Detection Feature Extraction
Average Monthly Temperature
(Courtsey: Prof. V. Kumar)
Data Sets >> Google Earth
Geo-videos from UAVs, security cameras
Satellite Imagery (periodic scan), LiDAR, …
Geo-sensor networks
Climate simulation, EPA Air Quality
Example use cases
Patterns of Life
Change detection, Feature extraction, Urban terrain
9
Weekday GPS track for 3 months
Patterns of life
Activity Space: Usual places and visits
Rare places, Rare visits
Work
Home Club
Farm
Morning 7am – 12am
Afternoon 12noon – 5pm
Evening 5pm – 12pm
Midnight 12midnight –
7pm
Total
Home 10 2 15 29 54
Work 19 20 10 1 50
Club 4 5 4 15
Farm 1 1
Total 30 30 30 30 120
Use Case: Patterns of Life, e.g., activity space
10
Vector SBD from Geo-Social Media
Vector data sub-genre
Point: location of a tweet, Ushahidi report, checkin,
…
Line-strings, Polygons: roads in openStreetMap
Use cases: Persistent Surveillance
Outbreaks of disease, Disaster, Unrest, Crime, …
Hot-spots, emerging hot-spots
Spatial Correlations: co-location, teleconnection
11
Persistent Surveillance at American Red Cross
• Even before cable news outlets began reporting the tornadoes that ripped through Texas on
Tuesday, a map of the state began blinking red on a screen in the Red Cross' new social
media monitoring center, alerting weather watchers that something was happening in the
hard-hit area. (AP, April 16th, 2012)
12
Graphs SBDs: Temporally Detailed
Spatial Graphs, e.g., Roadmaps, Electric grid, Supply Chains, …
Temporally detailed roadmaps [Navteq]
Use cases: Accessibility by time of week, Best start time, Best route at
different start-times
13
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• Conclusions
Big Data and Science
Nature, 7209(4), September 4, 2008
"Above all, data on today's scale require scientific and
computational intelligence. Google may now have its critics, but no
one can deny its impact, which ultimately stems from the cleverness
of its informatics. The future of science depends in part on such
cleverness again being applied to data for their own sake,
complementing scientific hypotheses as a basis for exploring today's
information cornucopia."
Science in the Petabyte Era – • Increasing Volume
• Heightened Complexity
• Demands for Interoperability
Preparing Science for Big-Data
Nature, 7209(4), September 4, 2008
Big Data Translates into Big Opportunities...
and Big Responsibilities
Sudden influxes of data have transformed researchers' understanding of
nature before — even back in the days when 'computer' was still a job
description.
Unfortunately, the institutions and culture of science remain rooted
in that pre-electronic era. Taking full advantage of electronic data
will require a great deal of additional infrastructure, both technical
and cultural
Models in Science
Science: understand natural world
Subjective Objective, (transparent, reproducible)
Methods: Forward models, Backward models
Engineering: Solve problems optimizing cost, efficiency, etc.
Models Manual (Paper,
Pencil, Slide-rules,
log-tables, …)
Assisted by computers (HPCC, cyber-
infrastructure, data-intensive, big-
data)
Forward Differential
Equations (D.E.),
Algebraic equations,
…
Computational Simulations using D.E.s,
Agent-based models, etc.
Backward Parametric models,
e.g. Regression,
Correlations,
sampling,
Experiment design,
Hypothesis testing,
…
Bayesian: resampling, local regression,
MCMC, kernel density estimation,
neural networks, generalized additive
models, …
Frequentist: frequent patterns, Model
ensembles, hypothesis generation, …
Exploratory Data Analysis: data
visualization, visual analytics,
geographic information science, spatial
data mining, …
17
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• SBD Infrastructure
• Conclusions
Pre-Electronic Era Models: Example 1
1854 Cholera in London
Broad St. water pump except a brewery
Recent Decades
Proximity vs. Accessibility
From Hotspots To Mean Streets
19
• Complication Dimensions
• Spatial Networks • Time
• Challenges: Trade-off b/w • Semantic richness and
• Scalable algorithms
KMR Routes (10) – thick lines, Crimestat K-Means (10) – ellipses,
Roads – gray lines, Burglaries - points
Innovative Technique: K Main Routes (KMR)
Summarizes Urban Activities
Pre-Electronic Models: Example 2
Location Prediction
Models to predict location, time, path, …
Nest sites, minerals, earthquakes, tornadoes, …
Pre-electronic models, e.g. Regression
Assumed i.i.d
To simplify parameter estimation
Least squares – easy to hand-compute
Alternatives
Spatial Autoregression,
Geographic Weighted (Local) Regression
Parameter estimation is compute-intensive!
Next Non-i.i.d errors: Distance based
Spatio-temporal vector fields (e.g. flows, motion)
εxβWyy ρ
SSEnn
L 2
)ln(
2
)2ln(ln)ln(
2WI
River
Station
Example 3: Global vs. Local Regression
Example: Lilac Phenology data
Yearly date of first leaf and first bloom
1126 locations in US & Canada
―Global‖ regression model shows a mystery
Postive Slope => blooms delayed in recent years!
Spatial decomposition solves the mystery
East of Mississippi, West of Mississippi
Each half has Negative Slope => blooms earlier in recent years!
However slopes are different across east & west
More reports in west in recent years
23
Outline
• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• Conclusions
24
Spatial Big Data (SBD) Summary SBD are becoming available
Geo-social Media, Geo-Sensor Networks, Geo-Simulations, VGI, …
Big Opportunities
Data: Quicker detection of disease outbreaks, e.g., Google Flu Trends
Multi-decade large-area studies, e.g., Gulf Study, Exposomics, …
Intervention:
How can geo-social network induce desired behavior?
Health effects of friends, e.g., smoking, drinking, exercise, nutrition, optimism, …
Large scale Collaboration on Complex Questions
Studies with thousands of doctors and hundred million humans
... and Big Responsibilities
Institutions and culture of science remain rooted in that pre-electronic era.
Ex. Hotspots to Mean Streets
Big data exceeding capacity of traditional systems
25
25
CCC Workshop: Spatial Computing Visioning (9/10-11/2012) http://cra.org/ccc/spatial_computing.php