Identifying Patterns In Spatial Data

IDENTIFYING PATTERNS IN SPATIAL DATA Xun Zhou University of Iowa September 5, 2014


Page 1: Identifying Patterns In Spatial Data

IDENTIFYING PATTERNS IN SPATIAL DATA
Xun Zhou
University of Iowa

September 5, 2014

Page 2: Identifying Patterns In Spatial Data

OUTLINE
• Introduction

• Spatial Data and Models

• Statistical models

• Spatial Pattern Families

• Computational Challenges

Page 3: Identifying Patterns In Spatial Data

WHAT IS SPATIAL DATA MINING (SDM)
• Identifying interesting, non-trivial, and useful patterns from large spatial datasets

• “Spatial” is general – includes spatio-temporal

• Examples of spatial/spatio-temporal datasets:
  • GPS traces
  • Facebook / Twitter check-ins
  • Climate observations (e.g., rainfall, temperature)
  • Remotely sensed images (e.g., NASA products)
  • Crime reports
  • Disease maps and records
  • Traffic statistics and road networks
  • Sales/market price data, supply maps

Page 4: Identifying Patterns In Spatial Data

WHY IS SDM IMPORTANT
• Location/time information brings rich context
• Supports decision making
• Helps understand natural phenomena
• Improves the quality of knowledge

• London Cholera 1854 – John Snow

• Modern examples
  • Predict land cover type with limited samples
  • Which animals often live in the same area?
  • Detect outbreaks of diseases/crimes
  • Find anomalous climate events

Picture Courtesy: Prof. Shashi Shekhar @ UMN

Page 5: Identifying Patterns In Spatial Data

WHAT IS “SPECIAL” ABOUT “SPATIAL”

Aspect | Traditional Data Mining | Spatial Data Mining
Data types | Age, salary, text… | (in addition) Location, shape, time…
Relationships | Arithmetic, ordering, subset… | Topological, directional, metric…
Statistical models | Data is i.i.d. | Data is auto-correlated & heterogeneous
Output pattern | Diaper + beer = frequent set | Diaper + beer only frequent in blue-collar neighborhoods
Computation | … | …

Picture Source: [1]

Page 6: Identifying Patterns In Spatial Data

SPATIAL DATA MINING COMPONENTS
• Input Data

• Statistical Foundations

• Output patterns

• Computational Process

Page 7: Identifying Patterns In Spatial Data

OUTLINE
• Introduction

• Spatial Data and Models

• Statistical models

• Spatial Pattern Families

• Computational Challenges

Page 8: Identifying Patterns In Spatial Data

SPATIAL DATA TYPES
• Two data representation models

Aspect | Vector Data (Object Model) | Raster Data (Field Model)
Data representation | Geometric objects | Continuous field with attribute functions
Examples | Disease reports (points); GPS traces (lines/curves); counties, states (polygons) | Satellite images; temperature map of the U.S.; vegetation cover in Africa

Picture source: [2]

Page 9: Identifying Patterns In Spatial Data

SPATIAL RELATIONSHIPS AND OPERATIONS
• Between spatial objects:
  • Set-oriented: Union, Intersection, Membership…
  • Topological: Meet, within, overlap, connected…
  • Directional: North, East, left, above, below…
  • Metric: Distance, area, perimeter
• Spatial field operations
  • Local: individual location (elevation > 1000 ft.)
  • Focal: a small neighborhood (slope, gradient)
  • Zonal: part of a region (mountain peak)
  • Global: among all the locations (Mount Everest)
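The four kinds of field operations can be illustrated on a tiny raster. The sketch below is only an illustration: the elevation grid, the zone map, and all function names are hypothetical, and the focal operation uses a 3x3 mean as a simple stand-in for slope or gradient.

```python
# Hypothetical 4x4 elevation raster (feet) and a zone map with two regions.
elevation = [
    [800, 900, 1100, 1200],
    [850, 950, 1300, 1500],
    [700, 800, 1000, 1400],
    [600, 700, 900, 1000],
]
zones = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]


def local_threshold(grid, threshold):
    """Local: each output cell depends only on the same input cell."""
    return [[value > threshold for value in row] for row in grid]


def focal_mean(grid):
    """Focal: each output cell is the mean of its 3x3 window (clipped at edges)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def zonal_max(grid, zone_map):
    """Zonal: one value per zone (here, the highest cell in each zone)."""
    result = {}
    for grid_row, zone_row in zip(grid, zone_map):
        for value, zone in zip(grid_row, zone_row):
            result[zone] = max(result.get(zone, value), value)
    return result


def global_max(grid):
    """Global: one value computed from all locations."""
    return max(max(row) for row in grid)


high_cells = local_threshold(elevation, 1000)
smoothed = focal_mean(elevation)
peaks_by_zone = zonal_max(elevation, zones)
highest = global_max(elevation)
```

Each function reads a progressively larger part of the raster: one cell, a window, a zone, and finally the whole grid.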

Page 10: Identifying Patterns In Spatial Data

OUTLINE
• Introduction

• Spatial Data and Models

• Statistical models

• Spatial Pattern Families

• Computational Challenges

Page 11: Identifying Patterns In Spatial Data

TWO KEY FEATURES
• Spatial Autocorrelation
  • The first law of geography[*]: “Everything is related to everything else, but near things are more related than distant things”.
  • Spatial features are usually auto-correlated or clustered rather than randomly distributed
• Spatial heterogeneity
  • Spatial patterns are not uniform globally – they vary from place to place.

[*] Tobler W., (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.
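One common way to put a number on spatial autocorrelation is Moran's I, which is positive when neighboring values are similar and negative when they alternate. The slide does not name this statistic, so treat the sketch below as an illustration only: the six-location chain, the binary neighbor weights, and the function names are all assumptions.

```python
def morans_i(values, weights):
    """Moran's I for values at n locations, given an n x n neighbor weight matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    spread = sum(d * d for d in deviations)
    return (n / total_weight) * cross / spread


def chain_weights(n):
    """Hypothetical neighborhood: locations on a line, neighbors are adjacent positions."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]     # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]   # every neighbor differs

i_clustered = morans_i(clustered, w)      # positive autocorrelation
i_alternating = morans_i(alternating, w)  # negative autocorrelation
```

Both lists contain the same values, so a non-spatial summary cannot tell them apart; only the arrangement in space changes the statistic.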

Page 12: Identifying Patterns In Spatial Data

STATISTICAL FOUNDATIONS
• Spatial statistics – a branch of statistics

Models[4] | Geostatistical | Lattice (Areal) | Point Process
Scenarios | Continuous space | Disjoint and complete partitions of the space (e.g., grids, areas) | Distribution of points
Examples | Temperature in the US | Population of counties | Locations of birds
Major techniques | Kriging (spatial interpolation) | Spatial Autoregressive model (SAR); Markov Random Field (MRF) | Ripley’s K-function; cross K-function; Complete Spatial Randomness (CSR)

* These are statistical models (like the normal distribution) and may not line up with data representation models.

Page 13: Identifying Patterns In Spatial Data

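SPATIAL NEIGHBORHOOD
• A collection of nearby locations/spatial objects
  • Adjacent/connected objects/locations
  • Within a certain distance
• The W-matrix, for four cells arranged in a 2×2 grid (rows and columns ordered A, B, C, D):

A B
C D

Binary adjacency:

\[
\begin{bmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}
\]

Row-normalized:

\[
\begin{bmatrix}
0 & 0.5 & 0.5 & 0 \\
0.5 & 0 & 0 & 0.5 \\
0.5 & 0 & 0 & 0.5 \\
0 & 0.5 & 0.5 & 0
\end{bmatrix}
\]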
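The W-matrix for the A, B, C, D grid on this slide can be generated programmatically. The sketch below assumes edge-sharing ("rook") adjacency and row-major cell order; the function names are hypothetical.

```python
def rook_adjacency(rows, cols):
    """Binary W-matrix for a rows x cols grid; cells are neighbors if they share an edge."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c  # row-major index: A=0, B=1, C=2, D=3 for a 2x2 grid
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1
    return w


def row_normalize(w):
    """Scale each row to sum to 1 (rows with no neighbors stay all zero)."""
    normalized = []
    for row in w:
        total = sum(row)
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized


W = rook_adjacency(2, 2)
W_norm = row_normalize(W)
```

Row normalization turns each row into weights for averaging over a cell's neighbors, which is the form used by neighbor-average computations.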

Page 14: Identifying Patterns In Spatial Data

OUTLINE
• Introduction

• Spatial Data and Models

• Statistical models

• Spatial Pattern Families

• Computational Challenges

Page 15: Identifying Patterns In Spatial Data

SPATIAL PATTERN FAMILIES
• A comparison with traditional DM tasks

Traditional Data Mining Pattern Families | Spatial Data Mining Pattern Families
Prediction/Classification | Spatial Prediction/Geographic Classification
Clustering | Spatial Clustering/Hotspot Detection
Anomaly Detection | Spatial Anomaly/Outlier Detection
Association Rule Mining | Spatial Co-location Patterns

Page 16: Identifying Patterns In Spatial Data

SPATIAL PREDICTION
• Traditional classifiers based on i.i.d. and global model
  • Linear regression, Decision Tree, SVM, CART, etc.
  • Spatial auto-correlation and variation are not modeled
• Predicting land cover types, location-based recommendation
• Regression
  • Traditional: Linear regression
  • Spatial: SAR, GWR (Geographically Weighted Regression)
• Spatial Decision Tree[5]
  • Information gain function: add spatial autocorrelation measure
  • Decision rules:
    • Traditional: f(x) > 1? Left : Right
    • Spatial: Flip if neighbors classified differently

[Figure: Illustration of focal-test-based spatial decision tree[5]]
[Figure: C4.5 results on land cover data [5]]
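For reference, the standard textbook forms of the three regression models named on this slide are sketched below. They are not recovered from the slide itself. The symbols y, X, β, ε, ρ and the coordinates (u_i, v_i) are assumptions, and W is taken to be a neighborhood matrix like the one on the SPATIAL NEIGHBORHOOD slide.

```latex
\begin{align}
% Linear regression: one global coefficient vector, independent errors
y &= X\beta + \varepsilon \\
% SAR: the response also depends on neighboring responses through W
y &= \rho W y + X\beta + \varepsilon \\
% GWR: coefficients vary with the location (u_i, v_i) of observation i
y_i &= \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
\end{align}
```

Setting ρ = 0 in the SAR form recovers linear regression, which is one way to see that SAR addresses autocorrelation while GWR addresses heterogeneity.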

Page 17: Identifying Patterns In Spatial Data

SPATIAL OUTLIER DETECTION
• Traditional Anomaly Detection
  • Data is anomalous w.r.t. global data distribution
• Spatial outlier[6]
  • Data is anomalous w.r.t. its neighbors (discontinuity)
  • Finding suspicious buildings, broken sensors, or other points of interest…
  • Methods:
    • Variogram clouds
    • Moran scatterplot
    • Spatial Statistic (S)

[Figure: 1-D spatial data and distribution [1]]
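A minimal sketch of the spatial statistic idea, assuming the common formulation in which S(x) is the difference between a location's value and the average of its neighbors, and a location is flagged when the standardized S exceeds a threshold. The 1-D data, the adjacent-position neighborhood, the threshold of 2, and the function names are all illustrative assumptions.

```python
from statistics import mean, pstdev


def spatial_statistic_outliers(values, neighbors, theta=2.0):
    """Return indices whose difference from their neighbor average is extreme.

    values:    attribute value at each location
    neighbors: neighbors[i] is the list of indices adjacent to location i
    theta:     threshold on the standardized difference (assumed value)
    """
    # S(x) = f(x) - average of f over the neighbors of x
    s = [values[i] - mean(values[j] for j in neighbors[i]) for i in range(len(values))]
    mu, sigma = mean(s), pstdev(s)
    if sigma == 0:
        return []
    return [i for i, s_i in enumerate(s) if abs((s_i - mu) / sigma) > theta]


def line_neighbors(n):
    """Hypothetical 1-D neighborhood: each location's neighbors are the adjacent positions."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


# A smooth ramp from 1 to 10 with one broken reading (3) at index 7.
readings = [1, 2, 3, 4, 5, 6, 7, 3, 9, 10]
outliers = spatial_statistic_outliers(readings, line_neighbors(len(readings)))
```

The value 3 is unremarkable globally, since it lies well inside the range 1 to 10, but it breaks sharply from its neighbors 7 and 9, which is what makes it a spatial outlier.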

Page 18: Identifying Patterns In Spatial Data

SPATIAL ASSOCIATION
• Spatial Co-location pattern[7]
  • Given a number of spatial object types and instances
  • Find sets of types that are frequently located in proximity
  • Example: {Fox, Rabbits}, {Nile Crocodiles, Egyptian Plover}

Frequent item set | Co-location | Comment
Transactions | Neighbor set | Space is continuous, no transactions
Support, Confidence | Participation index | PI = min(AB/A, AB/B)

[Figure: example co-location patterns {‘+’, ‘x’} and {‘o’, ‘*’}]

Pictures source: [1]
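The participation index for a pair of types can be sketched as follows. This reads the slide's shorthand AB/A as the fraction of A instances that have at least one B instance within the neighbor distance, and likewise for B. The coordinates, the radius, and the function name are hypothetical.

```python
from math import dist


def participation_index(points_a, points_b, radius):
    """PI of the pattern {A, B}: the smaller of the two participation ratios."""
    if not points_a or not points_b:
        return 0.0
    a_participating = set()
    b_participating = set()
    for i, pa in enumerate(points_a):
        for j, pb in enumerate(points_b):
            if dist(pa, pb) <= radius:  # the two instances are neighbors
                a_participating.add(i)
                b_participating.add(j)
    ratio_a = len(a_participating) / len(points_a)
    ratio_b = len(b_participating) / len(points_b)
    return min(ratio_a, ratio_b)


# Hypothetical instances of two types.
type_a = [(0, 0), (1, 0), (5, 5), (9, 9)]
type_b = [(0, 0.5), (5, 5.5)]
pi_ab = participation_index(type_a, type_b, radius=1.0)
```

Here every B instance has an A neighbor, but only two of the four A instances have a B neighbor, so the minimum of the two ratios is what limits the index.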

Page 19: Identifying Patterns In Spatial Data

SPATIAL CLUSTERING
• Grouping spatial objects into clusters such that
  • Intra-cluster similarity is maximized
  • Inter-cluster similarity is minimized
• Detecting communities, crowds, building blocks, etc.
• Is there a clustering tendency of data in space (point data)?

[Figure panels: Complete Spatial Randomness (CSR), Clustered, De-clustered]

• Clustering methods:
  1. Hierarchical
  2. Partitioning: k-means
  3. Density-based: DBSCAN

Picture Courtesy: Prof. Shashi Shekhar @ UMN
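The clustering-tendency question can be probed with Ripley's K-function, listed earlier among the point process techniques. The sketch below uses a naive estimator with no edge correction, which is a simplification. Under CSR, K(r) is approximately πr²; larger values suggest clustering and smaller values suggest a de-clustered (regular) pattern. The point sets and function name are hypothetical.

```python
from math import dist, pi


def ripley_k(points, r, area):
    """Naive Ripley's K estimate: area * (ordered pairs within r) / n^2, no edge correction."""
    n = len(points)
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and dist(points[i], points[j]) <= r
    )
    return area * close_pairs / (n * n)


# Two hypothetical patterns of 100 points in the unit square.
regular = [(0.05 + 0.1 * i, 0.05 + 0.1 * j) for i in range(10) for j in range(10)]
clustered = [(0.5 + 0.005 * i, 0.5 + 0.005 * j) for i in range(10) for j in range(10)]

r = 0.08
csr_reference = pi * r * r               # expected K(r) under CSR
k_regular = ripley_k(regular, r, 1.0)    # no pairs within r: below the CSR reference
k_clustered = ripley_k(clustered, r, 1.0)  # every pair within r: far above it
```

In practice the comparison against CSR is made over a range of r values, usually with simulated envelopes rather than a single reference number.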

Page 20: Identifying Patterns In Spatial Data

SPATIAL HOTSPOT DETECTION
• Special case of clustering
  • Identify regions with high density - not a complete partitioning of data
  • Ignore noise or sparse clusters
  • Crime/disease outbreaks, traffic jam, water pollution…
  • Statistical significance – avoid random clusters
• Density-based approaches: DBSCAN[8]
• Statistical tests – spatial scan statistics[9] (public health)

[Figure panels: DBSCAN, Spatial Scan Statistics]
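A rough sketch of the scoring step of a spatial scan statistic, assuming a Poisson model and circular windows centered on the data cells. It finds only the window with the highest log likelihood ratio; the full method also assesses significance by Monte Carlo simulation, which is omitted here. The grid of cells, the radii, and the function names are hypothetical.

```python
from math import dist, log


def poisson_llr(cases, expected, total_cases):
    """Log likelihood ratio of a window with more cases than expected (0 otherwise)."""
    if cases <= expected:
        return 0.0
    inside = cases * log(cases / expected)
    remaining = total_cases - cases
    outside = remaining * log(remaining / (total_cases - expected)) if remaining else 0.0
    return inside + outside


def best_window(cells, radii):
    """cells: list of (x, y, cases, population). Returns (llr, center, radius)."""
    total_cases = sum(c[2] for c in cells)
    total_pop = sum(c[3] for c in cells)
    best = (0.0, None, None)
    for cx, cy, _, _ in cells:
        for radius in radii:
            members = [c for c in cells if dist((cx, cy), (c[0], c[1])) <= radius]
            pop = sum(m[3] for m in members)
            if pop == total_pop:
                continue  # a window covering everything is not a hotspot
            cases = sum(m[2] for m in members)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best[0]:
                best = (llr, (cx, cy), radius)
    return best


# Hypothetical 3x3 grid: equal populations, 2 cases per cell, 12 cases in one corner.
cells = [(x, y, 12 if (x, y) == (2, 2) else 2, 100) for x in range(3) for y in range(3)]
llr, center, radius = best_window(cells, radii=[0.5, 1.0])
```

Unlike a density threshold, the likelihood ratio compares each window with what its population would predict, which is what lets the method separate true hotspots from chance concentrations.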

Page 21: Identifying Patterns In Spatial Data

NEW DIMENSIONS OF SPATIAL PATTERNS
• Patterns on Spatial Networks
  • Hotspots (Dangerous routes with high risk of accidents)[10]
  • Clusters (Crimes along the streets, bus/bike route planning)
  • Predictions
• Irregular/complex-shaped Spatial Patterns
  • Complex-shaped clusters (terrain constraints)
  • Irregular Hotspots (gerrymander …)

Results on pedestrian fatality data from Orlando, FL.[10]

Page 22: Identifying Patterns In Spatial Data

ADDING TIME
• Input data
  • Spatial data → Spatio-temporal data
  • Time series
  • Vector: point sequences, polygon series…
  • Raster: image sequences, spatial time series (a time series at each grid cell)
• Relationship: before, after, during, simultaneous, …
• Statistical Foundations
  • Markov Chain, Hidden Markov Model…
  • Spatiotemporal Statistics

Page 23: Identifying Patterns In Spatial Data

ADDING TIME - PATTERNS

Spatial Data Mining Pattern Families | Spatiotemporal Patterns
Spatial Prediction/Geographic Classification | ST prediction (trajectory prediction, climate projection, market prediction…)
Spatial Anomaly/Outlier Detection | ST Anomaly (abnormal climate events, traffic sensors…)
Spatial Co-location Patterns | Co-occurrence[11], Cascading pattern[12] (crime associations, potential social connections)
Spatial Clustering/Hotspot Detection | Space-time clusters[13] (disease monitoring); moving clusters (flocks, fleets, etc.); emerging hotspots (new market…); spreading hotspots (strikes, Arab Spring…)

Page 24: Identifying Patterns In Spatial Data

ADDING TIME – NEW PATTERNS
• New Dimensions of Temporal Information
  • Change
  • Repeating/periodicity

Temporal dimensions | Spatiotemporal Patterns
Change | Change Footprint Pattern Discovery[2]: where and when changes occur (climate change, business growth, urban sprawl, etc.); Change Prediction: where and when change will occur
Repeating/periodic | Finding periodic travel patterns, schedules, habits

[Figure: Vegetation increase in Saudi Arabia due to irrigation [14]; panels for 2001, 2006, and 2012; an annual increase of 11.5%, 2001-2012]

Page 25: Identifying Patterns In Spatial Data

CHANGE FOOTPRINT PATTERNS

[Figure: taxonomy of change footprint patterns. Labels: Local, Focal, Zonal; Static, Between snapshots, Point in time series, Interval in time series; axis label: Time]

Page 26: Identifying Patterns In Spatial Data

OUTLINE
• Introduction

• Spatial Data and Models

• Statistical models

• Spatial Pattern Families

• Computational Challenges

Page 27: Identifying Patterns In Spatial Data

COMPUTATIONAL CHALLENGES
• Neighborhood graph generation
• Parameter Estimation
• Better Interpretability
  • Complex-shaped patterns
  • Filter-n-refine approach
• Pattern Completeness
  • High combinatorics of patterns
  • Enumeration and pruning strategies
• Interest measure property
  • Dynamic programming (DP) or greedy algorithms may not be applicable
• HPC with Spatial Data Mining
  • Parallel/Cloud Computing
  • GIS on Hadoop (ESRI)

[Figure: balance among Conceptual Modeling, Algorithm Design, Interest measure, Computational Scalability, and Pattern Interpretability]

Page 28: Identifying Patterns In Spatial Data

SUMMARY
• What is SDM and why it’s important

• What’s special about spatial

• Pattern families, potential directions and applications

• Computational Challenges

Page 29: Identifying Patterns In Spatial Data

ACKNOWLEDGEMENT
• This presentation is prepared based on materials from Prof. Shashi Shekhar and the Spatial Database and Spatial Data Mining Group at the University of Minnesota (http://www.spatial.cs.umn.edu/).

Page 30: Identifying Patterns In Spatial Data

REFERENCES AND READINGS
[1] Shekhar, Shashi, et al. "Identifying patterns in spatial information: A survey of methods." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1.3 (2011): 193-214.
[2] Xun Zhou, Shashi Shekhar, and Reem Y. Ali. "Spatiotemporal change footprint pattern discovery: an inter-disciplinary survey." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4.1 (2014): 1-23.
[3] Shashi Shekhar and Sanjay Chawla. Spatial Databases: A Tour. Prentice Hall, 2003.
[4] Banerjee, Sudipto, Alan E. Gelfand, and Bradley P. Carlin. Hierarchical modeling and analysis for spatial data. CRC Press, 2004.
[5] Jiang, Z., Shekhar, S., Zhou, X., Knight, J., & Corcoran, J. (2013, December). Focal-test-based spatial decision tree learning: A summary of results. In Data Mining (ICDM), 2013 IEEE 13th International Conference on (pp. 320-329). IEEE.
[6] Shekhar, Shashi, Chang-Tien Lu, and Pusheng Zhang. "A unified approach to detecting spatial outliers." GeoInformatica 7, no. 2 (2003): 139-166.
[7] Y Huang, S Shekhar, H Xiong. "Discovering colocation patterns from spatial data sets: a general approach." Knowledge and Data Engineering, IEEE Transactions on 16 (12), 1472-1485.
[8] Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise". In Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96).
[9] Kulldorff, Martin. "A spatial scan statistic." Communications in Statistics-Theory and Methods 26.6 (1997): 1481-1496.
[10] Dev Oliver, Shashi Shekhar, Xun Zhou, Emre Eftelioglu, Michael Evans, Qiaodi Zhuang, James Kang, Renee Laubscher and Christopher Farah. Significant Route Discovery: A Summary of Results. In GIScience 2014 (to appear).
[11] Celik, Mete, et al. "Mixed-drove spatiotemporal co-occurrence pattern mining." Knowledge and Data Engineering, IEEE Transactions on 20.10 (2008): 1322-1335.
[12] Mohan, Pradeep, Shashi Shekhar, James A. Shine, and James P. Rogers. "Cascading spatio-temporal pattern discovery." Knowledge and Data Engineering, IEEE Transactions on 24, no. 11 (2012): 1977-1992.
[13] Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny Daniel. Detection of emerging space-time clusters. Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 218-227, 2005.
[14] Xun Zhou, Shashi Shekhar, Dev Oliver. "Discovering Persistent Change Windows in Spatiotemporal Datasets: A Summary of Results". In 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (BigSpatial-2013), Nov 5, 2013, Orlando, Florida, USA.