Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight...

25
Spatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University of Minnesota www.cs.umn.edu/~shekhar AAG-NIH Symp. on Enabling a National Geospatial Cyberinfrastructure for Health Research (July 2012) More details in S. Shekhar et al., Spatial Big Data Challenges Intersecting Mobility and Cloud Computing, ACM SIGMOD Workshop on Data Engineering for Wireless and Mobile Access, 2012.

Transcript of Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight...

Page 1: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Spatial Big Data

Shashi Shekhar McKnight Distinguished University Professor

Department of Computer Science and Engineering, University of Minnesota

www.cs.umn.edu/~shekhar

AAG-NIH Symp. on Enabling a National Geospatial Cyberinfrastructure for Health Research (July 2012)

More details in S. Shekhar et al., Spatial Big Data Challenges Intersecting Mobility and Cloud Computing, ACM SIGMOD Workshop on Data Engineering for Wireless and Mobile Access, 2012.

Page 2: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Research Theme 1: Spatial Databases

only in old plan

Only in new plan

In both plans

Evacutation Route Planning

Parallelize

Range Queries

Storing graphs in disk blocks Shortest Paths

Page 3: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Theme 2 : Spatial Data Mining

Nest locations Distance to open water

Vegetation durability Water depth

Location prediction: nesting sites Spatial outliers: sensor (#9) on I-35

Co-location Patterns Tele connections

Page 4: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

4

Outline

• Motivation

• What is Spatial Big Data (SBD)?

• SBD and Science

• SBD Analytics

• Conclusions

Page 5: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Big Data

Mining and analyzing these big new data sets can open

the door to a new wave of innovation, accelerating

productivity and economic growth. Some economists,

academics and business executives see an opportunity to

move beyond the payoff of the first stage of the Internet,

which combined computing and low-cost

communications to automate all kinds of commercial

transactions.

Estimated Value >Usd 1 Trillion per year by 2020

Location-based service: usd 600 B

Health Informatics: usd 300 B

Manufacturing:

Page 6: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

6

Spatial Big Data Definitions

• Spatial datasets exceeding capacity of current computing systems • To manage, process, or analyze the data with reasonable effort

• Due to Volume, Velocity, Variety, …

• SBD Components • Data-intensive Computing: Cloud Computing

• Middleware, e.g., Map-Reduce, Pregel, Big-Table, …

• Big-Data analytics, e.g., data mining, machine learning, computational statistics, …

• Big Data science and societal applications

• Ex. Social media datasets, e.g., Google Flu Trend

• Which patterns may be detected in these datasets?

• Flu outbreaks ?

Page 7: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

7

Traditional Spatial Data

Spatial attribute:

Neighborhood and extent

Geo-Reference: longitude, latitude, elevation

Spatial data genre

Raster: geo-images e.g., Google Earth

Vector: point, line, polygons

Graph, e.g., roadmap: node, edge, path Raster Data for UMN Campus

Courtesy: UMN

Vector Data for UMN Campus

Courtesy: MapQuest

Graph Data for UMN Campus

Courtesy: Bing

Page 8: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

8

Raster SBD

LiDAR & Urban Terrain

Change Detection Feature Extraction

Average Monthly Temperature

(Courtsey: Prof. V. Kumar)

Data Sets >> Google Earth

Geo-videos from UAVs, security cameras

Satellite Imagery (periodic scan), LiDAR, …

Geo-sensor networks

Climate simulation, EPA Air Quality

Example use cases

Patterns of Life

Change detection, Feature extraction, Urban terrain

Page 9: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

9

Weekday GPS track for 3 months

Patterns of life

Activity Space: Usual places and visits

Rare places, Rare visits

Work

Home Club

Farm

Morning 7am – 12am

Afternoon 12noon – 5pm

Evening 5pm – 12pm

Midnight 12midnight –

7pm

Total

Home 10 2 15 29 54

Work 19 20 10 1 50

Club 4 5 4 15

Farm 1 1

Total 30 30 30 30 120

Use Case: Patterns of Life, e.g., activity space

Page 10: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

10

Vector SBD from Geo-Social Media

Vector data sub-genre

Point: location of a tweet, Ushahidi report, checkin,

Line-strings, Polygons: roads in openStreetMap

Use cases: Persistent Surveillance

Outbreaks of disease, Disaster, Unrest, Crime, …

Hot-spots, emerging hot-spots

Spatial Correlations: co-location, teleconnection

Page 11: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

11

Persistent Surveillance at American Red Cross

• Even before cable news outlets began reporting the tornadoes that ripped through Texas on

Tuesday, a map of the state began blinking red on a screen in the Red Cross' new social

media monitoring center, alerting weather watchers that something was happening in the

hard-hit area. (AP, April 16th, 2012)

Page 12: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

12

Graphs SBDs: Temporally Detailed

Spatial Graphs, e.g., Roadmaps, Electric grid, Supply Chains, …

Temporally detailed roadmaps [Navteq]

Use cases: Accessibility by time of week, Best start time, Best route at

different start-times

Page 13: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

13

Outline

• Motivation

• What is Spatial Big Data (SBD)?

• SBD and Science

• SBD Analytics

• Conclusions

Page 14: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Big Data and Science

Nature, 7209(4), September 4, 2008

"Above all, data on today's scale require scientific and

computational intelligence. Google may now have its critics, but no

one can deny its impact, which ultimately stems from the cleverness

of its informatics. The future of science depends in part on such

cleverness again being applied to data for their own sake,

complementing scientific hypotheses as a basis for exploring today's

information cornucopia."

Science in the Petabyte Era – • Increasing Volume

• Heightened Complexity

• Demands for Interoperability

Page 15: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Preparing Science for Big-Data

Nature, 7209(4), September 4, 2008

Big Data Translates into Big Opportunities...

and Big Responsibilities

Sudden influxes of data have transformed researchers' understanding of

nature before — even back in the days when 'computer' was still a job

description.

Unfortunately, the institutions and culture of science remain rooted

in that pre-electronic era. Taking full advantage of electronic data

will require a great deal of additional infrastructure, both technical

and cultural

Page 16: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Models in Science

Science: understand natural world

Subjective Objective, (transparent, reproducible)

Methods: Forward models, Backward models

Engineering: Solve problems optimizing cost, efficiency, etc.

Models Manual (Paper,

Pencil, Slide-rules,

log-tables, …)

Assisted by computers (HPCC, cyber-

infrastructure, data-intensive, big-

data)

Forward Differential

Equations (D.E.),

Algebraic equations,

Computational Simulations using D.E.s,

Agent-based models, etc.

Backward Parametric models,

e.g. Regression,

Correlations,

sampling,

Experiment design,

Hypothesis testing,

Bayesian: resampling, local regression,

MCMC, kernel density estimation,

neural networks, generalized additive

models, …

Frequentist: frequent patterns, Model

ensembles, hypothesis generation, …

Exploratory Data Analysis: data

visualization, visual analytics,

geographic information science, spatial

data mining, …

Page 17: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

17

Outline

• Motivation

• What is Spatial Big Data (SBD)?

• SBD and Science

• SBD Analytics

• SBD Infrastructure

• Conclusions

Page 18: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Pre-Electronic Era Models: Example 1

1854 Cholera in London

Broad St. water pump except a brewery

Recent Decades

Proximity vs. Accessibility

Page 19: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

From Hotspots To Mean Streets

19

• Complication Dimensions

• Spatial Networks • Time

• Challenges: Trade-off b/w • Semantic richness and

• Scalable algorithms

Page 20: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

KMR Routes (10) – thick lines, Crimestat K-Means (10) – ellipses,

Roads – gray lines, Burglaries - points

Innovative Technique: K Main Routes (KMR)

Summarizes Urban Activities

Page 21: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

Pre-Electronic Models: Example 2

Location Prediction

Models to predict location, time, path, …

Nest sites, minerals, earthquakes, tornadoes, …

Pre-electronic models, e.g. Regression

Assumed i.i.d

To simplify parameter estimation

Least squares – easy to hand-compute

Alternatives

Spatial Autoregression,

Geographic Weighted (Local) Regression

Parameter estimation is compute-intensive!

Next Non-i.i.d errors: Distance based

Spatio-temporal vector fields (e.g. flows, motion)

εxβWyy ρ

SSEnn

L 2

)ln(

2

)2ln(ln)ln(

2WI

Page 22: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

River

Station

Example 3: Global vs. Local Regression

Example: Lilac Phenology data

Yearly date of first leaf and first bloom

1126 locations in US & Canada

―Global‖ regression model shows a mystery

Postive Slope => blooms delayed in recent years!

Spatial decomposition solves the mystery

East of Mississippi, West of Mississippi

Each half has Negative Slope => blooms earlier in recent years!

However slopes are different across east & west

More reports in west in recent years

Page 23: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

23

Outline

• Motivation

• What is Spatial Big Data (SBD)?

• SBD and Science

• SBD Analytics

• Conclusions

Page 24: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

24

Spatial Big Data (SBD) Summary SBD are becoming available

Geo-social Media, Geo-Sensor Networks, Geo-Simulations, VGI, …

Big Opportunities

Data: Quicker detection of disease outbreaks, e.g., Google Flu Trends

Multi-decade large-area studies, e.g., Gulf Study, Exposomics, …

Intervention:

How can geo-social network induce desired behavior?

Health effects of friends, e.g., smoking, drinking, exercise, nutrition, optimism, …

Large scale Collaboration on Complex Questions

Studies with thousands of doctors and hundred million humans

... and Big Responsibilities

Institutions and culture of science remain rooted in that pre-electronic era.

Ex. Hotspots to Mean Streets

Big data exceeding capacity of traditional systems

Page 25: Spatial Big Data - Association of American · PDF fileSpatial Big Data Shashi Shekhar McKnight Distinguished University Professor Department of Computer Science and Engineering, University

25

25

CCC Workshop: Spatial Computing Visioning (9/10-11/2012) http://cra.org/ccc/spatial_computing.php