shan's spm ppt
Transcript of shan's spm ppt
-
7/29/2019 shan's spm ppt
1/27
SPATIAL DATA MINING
Name: Shan Parvej Gayen
Roll No: 069123
Department of M.Tech CSE, Heritage Institute of Technology
5 April 2012
-
LEARNING OBJECTIVES
Understand the concept of Spatial Data Mining (SDM)
Describe the concepts of patterns and SDM
Describe the motivation for SDM
Learn about patterns explored by SDM
Learn techniques for finding spatial patterns
-
EXAMPLES OF SPATIAL PATTERNS
Historic Examples (Section 7.1.5, p. 186)
1855 Asiatic cholera in London: a water pump identified as the source
Fluoride and healthy gums near the Colorado River
Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle
Modern Examples
Cancer clusters to investigate environmental health hazards
Crime hotspots for planning police patrol routes
Bald eagles nest on tall trees near open water
West Nile virus spreading from the northeastern USA to the south and west
Unusual warming of the Pacific Ocean (El Niño) affects weather in the USA
-
WHAT IS A SPATIAL PATTERN?
What is not a pattern?
Random, haphazard, chance, stray, accidental, unexpected.
Without definite direction, trend, rule, method, design, aim, purpose.
What is a pattern?
A frequent arrangement, configuration, composition, regularity.
A rule, law, method, design, description.
A major direction, trend, prediction.
-
DEFINING SPATIAL DATA MINING
Search for spatial patterns.
Non-trivial search, as automated as possible.
Large search space of plausible hypotheses
Ex. Asiatic cholera: candidate causes include water, food, air, insects.
Interesting, useful, and unexpected spatial patterns.
Useful in certain application domains
Ex. Shutting off the identified water pump => saved human lives.
May provide a new understanding of the world
Ex. The water pump and cholera connection led to the germ theory.
-
WHAT IS NOT SPATIAL DATA MINING
Simple querying of spatial data
Finding neighbors of Canada given names and boundaries of all countries (search space not large)
Uninteresting or obvious patterns
Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart).
Common knowledge: nearby places have similar rainfall
Mining of non-spatial data
Diaper sales and beer sales are correlated in
evenings
-
FAMILIES OF SPATIAL DATA MINING PATTERNS
Location Prediction:
Where will a phenomenon occur?
Spatial Interactions
Which subset of spatial phenomena interact?
Hot spot
Which locations are unusual or share commonalities?
Note:
Other families of spatial patterns may be defined
SDM is a growing field, which should accommodate
new pattern families
-
LOCATION PREDICTION
Where will a phenomenon occur?
Which spatial events are predictable?
How can a spatial event be predicted from other spatial events?
Examples
Where will an endangered bird nest?
Which areas are prone to fire given maps of vegetation and drought?
What should be recommended to a traveler in a given location?
-
SPATIAL INTERACTIONS
Which spatial events are related to each other?
Which spatial phenomena depend on other phenomena?
Examples
Earth science: climate and disturbance => {wild fires, hot, dry, lightning}
Epidemiology: disease type and environmental events => {West Nile disease, stagnant water source, dead birds, mosquitoes}
-
HOTSPOTS
Is a phenomenon spatially clustered?
Which spatial entities are unusual or share common characteristics?
Examples
Crime hot spots to plan police patrols
Cancer clusters to launch investigations
-
SPATIAL QUERIES
Spatial Range Queries
Find all cities within 50 miles of Paris
Query has an associated region (location, boundary)
Answer includes overlapping or contained data regions
Nearest-Neighbor Queries
Find the 10 cities nearest to Paris
Results must be ordered by proximity
Spatial Join Queries
Find all cities near a lake
Join condition involves regions and proximity.
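The three query types above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the city and lake coordinates are made up, distances are Euclidean on a flat plane measured in miles, and the helper names (`range_query`, `nearest_neighbors`, `spatial_join`) are hypothetical. A real spatial database would use a spatial index rather than scanning every point.

```python
import math

# Hypothetical coordinates on a flat plane, in miles (not real geography).
CITIES = {
    "Paris": (0.0, 0.0),
    "Versailles": (-10.0, -5.0),
    "Orleans": (-20.0, -65.0),
    "Reims": (80.0, 40.0),
}
LAKES = {"Lake A": (-12.0, -8.0), "Lake B": (200.0, 200.0)}


def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(points, center, radius):
    """Names of all points lying within `radius` of `center`."""
    return sorted(n for n, p in points.items() if dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """The k names closest to `center`, ordered by proximity."""
    return sorted(points, key=lambda n: dist(points[n], center))[:k]


def spatial_join(left, right, max_dist):
    """All (left name, right name) pairs within `max_dist` of each other."""
    return sorted(
        (a, b)
        for a, pa in left.items()
        for b, pb in right.items()
        if dist(pa, pb) <= max_dist
    )


others = {n: p for n, p in CITIES.items() if n != "Paris"}
within_50 = range_query(others, CITIES["Paris"], 50.0)
two_nearest = nearest_neighbors(others, CITIES["Paris"], 2)
near_lake = spatial_join(CITIES, LAKES, 15.0)
```

Note that only the nearest-neighbor result depends on ordering; the range and join results are sets of matches and are sorted here just for readable output.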
-
UNIQUE PROPERTIES OF SPATIAL PATTERNS
Items in traditional data are assumed to be independent of each other, whereas properties of locations on a map are often autocorrelated (patterns exist)
Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex
Items in traditional data describe discrete objects, whereas spatial data is continuous
-
ASSOCIATION RULES
For a rule X => Y:
Support = the number of times the rule shows up in a database, often stated as a fraction of all records
Confidence = conditional probability of Y given X
Example
(Bedrock type = limestone), (soil depth < 50 ft) => (sink hole risk = high)
Support = 20%, confidence = 0.8
Interpretation: Locations with limestone bedrock and low soil depth have a high risk of sink hole formation.
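To make the two measures concrete, here is a small sketch that reproduces the figures in the example (support 20%, confidence 0.8). The 20 records are invented for illustration, support is computed as a fraction of all records, and the item labels (`limestone`, `shallow_soil`, `high_risk`) are hypothetical stand-ins for the attribute conditions on the slide.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= r for r in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the records."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


# Hypothetical dataset: 20 locations, each described by a set of properties.
records = (
    [{"limestone", "shallow_soil", "high_risk"}] * 4
    + [{"limestone", "shallow_soil"}] * 1
    + [{"granite", "deep_soil"}] * 15
)

X = {"limestone", "shallow_soil"}   # antecedent
Y = {"high_risk"}                   # consequent

rule_support = support(records, X | Y)        # 4 of 20 records
rule_confidence = confidence(records, X, Y)   # 4 of the 5 records matching X
```

Four of the twenty locations satisfy the whole rule, and four of the five locations that match the antecedent also carry high risk, which is where 20% and 0.8 come from.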
-
APRIORI ALGORITHM TO MINE ASSOCIATION RULES
Key challenge
Very large search space
Key assumption
Few associations have support above a given threshold
Associations with low support are not interesting
Key insight
If an association item set has high support, then so do all its subsets
Conversely, an item set with a low-support subset can be pruned without counting it
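The key insight translates into a level-wise search: count single items first, then build larger candidates only from item sets that are already frequent. The following is a minimal sketch of the frequent-item-set phase under stated assumptions; the transactions, the item labels, and the 0.3 threshold are invented, and rule generation from the frequent sets is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        """Count each candidate and keep those meeting the threshold."""
        kept = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                kept[c] = s
        return kept

    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical transactions (one per location).
transactions = [
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil"},
    {"limestone", "sinkhole"},
    {"granite", "deep_soil"},
]
freq = apriori(transactions, min_support=0.3)
```

Because `granite` and `deep_soil` each appear in only one of five transactions, they are dropped at the first level and no larger item set containing them is ever counted.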
-
ASSOCIATION RULES EXAMPLE
-
TECHNIQUES FOR ASSOCIATION MINING
Classical method
Association rules given item types and transactions
Assumes spatial data can be decomposed into
transactions
Such decomposition may alter spatial patterns
New spatial methods
Spatial association rules
Spatial co-location
-
ASSOCIATIONS, SPATIAL ASSOCIATIONS, CO-LOCATION
-
ASSOCIATIONS, SPATIAL ASSOCIATIONS, CO-LOCATION
-
CO-LOCATION RULES
For point data in space
Do not need transactions; work directly with continuous space
Use a neighborhood definition and spatial joins
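As a rough illustration of working directly in continuous space, the sketch below defines a neighborhood as "within a fixed radius" and joins two feature types on it. The measure computed (the fraction of one feature type that has the other nearby) is a simplification chosen for this example, not necessarily the measure the slides had in mind; the point data, the radius, and the function name `colocation_fraction` are all hypothetical.

```python
import math


def colocation_fraction(points, type_a, type_b, radius):
    """Fraction of `type_a` points with at least one `type_b` point within `radius`."""
    a_pts = [(x, y) for t, x, y in points if t == type_a]
    b_pts = [(x, y) for t, x, y in points if t == type_b]
    if not a_pts:
        return 0.0
    hits = sum(
        any(math.hypot(ax - bx, ay - by) <= radius for bx, by in b_pts)
        for ax, ay in a_pts
    )
    return hits / len(a_pts)


# Hypothetical point events: (feature type, x, y).
points = [
    ("nest", 1.0, 1.0), ("nest", 5.0, 5.0), ("nest", 9.0, 1.0), ("nest", 20.0, 20.0),
    ("water", 1.5, 1.2), ("water", 5.5, 4.5), ("water", 9.0, 2.0),
]

nests_near_water = colocation_fraction(points, "nest", "water", 1.5)
water_near_nests = colocation_fraction(points, "water", "nest", 1.5)
```

No transactions are formed at any point: the only inputs are the point locations and the neighborhood radius, and the measure is not symmetric between the two feature types.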
-
CO-LOCATION RULES
Co-location
-
CLUSTERING
Process of discovering groups in large databases
Spatial view: rows in a database = points in a multi-dimensional space.
Visualization may reveal interesting groups
-
CLUSTERING
Hierarchical
Start with all points in one cluster
Split and merge until a stopping criterion is reached
Partitional
Start with random central points
Assign points to the nearest central point
Update the central points
An approach with statistical rigor
Density
Find clusters based on density of regions
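The partitional steps listed above (random central points, assign, update) correspond to the familiar k-means loop. Here is a minimal sketch, assuming 2-D points, squared Euclidean distance, and a stop rule of "assignments no longer change"; the sample points, the seed, and the function name `kmeans` are illustrative choices rather than anything specified in the slides.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=100, seed=0):
    """Return (centers, labels) from a basic k-means loop."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start with random central points
    labels = [None] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest central point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break                            # assignments are stable: stop
        labels = new_labels
        # Update each central point to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep the old center if a cluster empties
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centers, labels


# Hypothetical data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = kmeans(points, k=2)
```

On data this well separated the result does not depend on the random start, but in general k-means can settle on different clusterings for different initial central points.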
-
OUTLIERS
Observations inconsistent with rest of the dataset
Observations inconsistent with their neighborhoods
A local instability or discontinuity
-
VARIOGRAM CLOUD
Create a variogram cloud by plotting (attribute difference, distance) for each pair of points
Select points common to many outlying pairs
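A small sketch of this procedure follows, without the plotting step. It assumes the attribute difference is taken as an absolute difference and that an "outlying pair" is one that is close in space but far apart in value; the thresholds `max_dist` and `min_diff`, the sample sites, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def variogram_cloud(sites):
    """One (distance, |attribute difference|, i, j) tuple per pair of sites."""
    return [
        (math.hypot(x1 - x2, y1 - y2), abs(v1 - v2), i, j)
        for (i, (x1, y1, v1)), (j, (x2, y2, v2)) in combinations(enumerate(sites), 2)
    ]


def outlying_pair_counts(sites, max_dist, min_diff):
    """Count, per site, the outlying pairs (near in space, far in value) it belongs to."""
    counts = Counter()
    for d, diff, i, j in variogram_cloud(sites):
        if d <= max_dist and diff >= min_diff:
            counts[i] += 1
            counts[j] += 1
    return counts


# Hypothetical (x, y, attribute value) samples; site 2 has an unusual value.
sites = [(0, 0, 10.0), (1, 0, 11.0), (1, 1, 30.0), (0, 1, 10.5), (2, 1, 11.5)]
counts = outlying_pair_counts(sites, max_dist=1.5, min_diff=10.0)
top_site, top_count = counts.most_common(1)[0]
```

Every outlying pair in this data involves site 2, so it is the point "common to many outlying pairs"; each of its neighbors appears in only one such pair.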
-
MORAN SCATTER PLOT
Plot (normalized attribute value, weighted average in the neighborhood) for each location
Select points in the upper left and lower right quadrants
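The quadrant test can be sketched without drawing the plot: a point lies in the upper left or lower right quadrant exactly when its normalized value and its neighborhood average have opposite signs. The sketch below assumes z-score normalization and equal weights for all neighbors; the values, the two neighborhoods, and the function names are invented for illustration.

```python
import statistics


def moran_points(values, neighbors):
    """Pair each location's z-score with the mean z-score of its neighbors."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    lag = [statistics.fmean([z[j] for j in neighbors[i]]) for i in range(len(values))]
    return list(zip(z, lag))


def moran_outliers(values, neighbors):
    """Indices in the upper-left or lower-right quadrant (opposite signs)."""
    return [i for i, (zi, li) in enumerate(moran_points(values, neighbors)) if zi * li < 0]


# Hypothetical data: a low-valued neighborhood (sites 0-3) and a
# high-valued neighborhood (sites 4-8) in which site 6 is unusually low.
values = [1, 2, 1, 2, 9, 8, 1, 9, 8]
groups = [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
neighbors = {i: [j for j in g if j != i] for g in groups for i in g}

flagged = moran_outliers(values, neighbors)
```

Site 6 is flagged because its own value is below the overall mean while its neighbors average well above it, which places it in the upper left quadrant.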
-
SCATTERPLOT
Plot (normalized attribute value, weighted average in the neighborhood) for each location
Fit a linear regression line
Select points which are unusually far from the regression line.
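One way to read "unusually far" is a residual larger than some multiple of the residual standard deviation. The sketch below takes that reading as an assumption, with a default multiple of 2; the function name `regression_outliers` and the sample data are hypothetical. In the spatial setting, `xs` would hold the normalized attribute values and `ys` the neighborhood averages.

```python
import statistics


def regression_outliers(xs, ys, threshold=2.0):
    """Indices whose residual from the least-squares line exceeds threshold * residual std dev."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    # Vertical distance of each point from the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sd]


# Hypothetical data: points on the line y = 2x + 1, with one point pushed off it.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] += 10

flagged = regression_outliers(xs, ys)
```

Unlike the quadrant rule of the Moran scatter plot, this test measures distance from the fitted trend, so it can flag a point in any quadrant.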
-
CONCLUSION
Patterns are the opposite of random
Common spatial patterns:
Location prediction
Feature interaction
Hot spot
Spatial patterns may be discovered using:
Techniques like associations, clustering and outlier
detection