shan's spm ppt


  • 7/29/2019 shan's spm ppt

    1/27

    SPATIAL DATA MINING

    Name: Shan Parvej Gayen

    Roll No: 069123

    Department of M.Tech CSE, Heritage Institute of Technology

    5 April 2012


    LEARNING OBJECTIVES

    Understand the concept of Spatial Data Mining

    Describe the concepts of patterns and SDM

    Describe the motivation for SDM

    Learn about patterns explored by SDM

    Learn techniques on how to find spatial patterns


    EXAMPLES OF SPATIAL PATTERNS

    Historic Examples (section 7.1.5, p. 186)

    1855 Asiatic Cholera in London: a water pump identified as the source

    Fluoride and healthy gums near Colorado River

    Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle

    Modern Examples

    Cancer clusters to investigate environmental health hazards

    Crime hotspots for planning police patrol routes

    Bald eagles nest on tall trees near open water

    West Nile virus spreading from the northeastern USA to the south and west

    Unusual warming of the Pacific Ocean (El Nino) affects weather in the USA


    WHAT IS A SPATIAL PATTERN?

    What is not a pattern?

    Random, haphazard, chance, stray, accidental, unexpected.

    Without definite direction, trend, rule, method, design, aim, purpose.

    What is a pattern?

    A frequent arrangement, configuration, composition, regularity.

    A rule, law, method, design, description.

    A major direction, trend, prediction.


    DEFINING SPATIAL DATA MINING

    Search for spatial patterns.

    Non-trivial search, as automated as possible.

    Large search space of plausible hypotheses

    Ex. Asiatic cholera: possible causes include water, food, air, insects.

    Interesting, useful, and unexpected spatial patterns.

    Useful in certain application domains

    Ex. Shutting off the identified water pump => saved human lives.

    May provide a new understanding of the world

    Ex. The water pump and cholera connection led to the germ theory.


    WHAT IS NOT SPATIAL DATA MINING

    Simple querying of spatial data

    Finding neighbors of Canada given names and boundaries of all countries (search space not large)

    Uninteresting or obvious patterns

    Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart).

    Common knowledge: nearby places have similar rainfall

    Mining of non-spatial data

    Diaper sales and beer sales are correlated in evenings


    FAMILIES OF SPATIAL DATA MINING PATTERNS

    Location prediction

    Where will a phenomenon occur?

    Spatial interactions

    Which subsets of spatial phenomena interact?

    Hot spots

    Which locations are unusual or share commonalities?

    Note:

    Other families of spatial patterns may be defined

    SDM is a growing field, which should accommodate new pattern families


    LOCATION PREDICTION

    Where will a phenomenon occur?

    Which spatial events are predictable?

    How can a spatial event be predicted from other spatial events?

    Examples

    Where will an endangered bird nest?

    Which areas are prone to fire given maps of vegetation and drought?

    What should be recommended to a traveler in a given location?


    SPATIAL INTERACTIONS

    Which spatial events are related to each other?

    Which spatial phenomena depend on other phenomena?

    Examples

    Earth science: climate and disturbance => {wild fires, hot, dry, lightning}

    Epidemiology: disease type and environmental events => {West Nile disease, stagnant water source, dead birds, mosquitoes}


    HOTSPOTS

    Is a phenomenon spatially clustered?

    Which spatial entities are unusual or share common characteristics?

    Examples

    Crime hot spots to plan police patrols

    Cancer clusters to launch investigations


    SPATIAL QUERIES

    Spatial Range Queries

    Find all cities within 50 miles of Paris

    Query has associated region (location, boundary)

    Answer includes overlapping or contained data regions

    Nearest-Neighbor Queries

    Find the 10 cities nearest to Paris

    Results must be ordered by proximity

    Spatial Join Queries

    Find all cities near a lake

    Join condition involves regions and proximity.
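
    The three query types can be illustrated with a minimal brute-force sketch. The coordinates below are hypothetical planar positions in miles, not real geography, and the function names are made up for illustration. A real spatial database would use a spatial index rather than scanning every point.

```python
import math

# Hypothetical planar coordinates in miles; not real geography.
cities = {"Paris": (0.0, 0.0), "Versailles": (-10.0, -5.0),
          "Orleans": (-20.0, -65.0), "Reims": (80.0, 45.0)}
lakes = {"Lake A": (-12.0, -8.0), "Lake B": (200.0, 200.0)}


def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(points, center, radius):
    """Names of all points within `radius` of `center`."""
    return sorted(n for n, p in points.items() if dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """The k names closest to `center`, ordered by proximity."""
    return sorted(points, key=lambda n: dist(points[n], center))[:k]


def spatial_join(left, right, max_dist):
    """All (left, right) name pairs whose points lie within `max_dist`."""
    return sorted((a, b) for a, pa in left.items()
                  for b, pb in right.items() if dist(pa, pb) <= max_dist)


within_50 = range_query(cities, cities["Paris"], 50)
nearest_3 = nearest_neighbors(cities, cities["Paris"], 3)
near_lake = spatial_join(cities, lakes, 10)
```

    Note that the range query and nearest-neighbor query both include Paris itself here, since it is one of the stored points.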


    UNIQUE PROPERTIES OF SPATIAL PATTERNS

    Items in traditional data are independent of each other, whereas properties of locations in a map are often auto-correlated (patterns exist)

    Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex

    Items in traditional data describe discrete objects, whereas spatial data is continuous


    ASSOCIATION RULES

    Support = the number of times a rule shows up in a database (often reported as a fraction of all records)

    Confidence = conditional probability of Y given X, for a rule X => Y

    Example

    (bedrock type = limestone), (soil depth < 50 ft) => (sink hole risk = high)

    Support = 20%, confidence = 0.8

    Interpretation: locations with limestone bedrock and low soil depth have a high risk of sink hole formation.
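
    As a small worked sketch of these two measures, the records below are hypothetical locations (they do not reproduce the 20% and 0.8 figures above), each described by the set of predicates true at that location. Support is computed here as a fraction of records, which is an assumption about how the count is normalized.

```python
# Hypothetical location records; each set lists the predicates true there.
locations = [
    {"limestone", "shallow_soil", "sinkhole_high"},
    {"limestone", "shallow_soil", "sinkhole_high"},
    {"limestone", "shallow_soil"},
    {"limestone"},
    {"granite", "shallow_soil"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)


def confidence(x, y, records):
    """Estimated P(Y | X): support(X and Y) / support(X)."""
    return support(x | y, records) / support(x, records)


x = {"limestone", "shallow_soil"}
y = {"sinkhole_high"}
rule_support = support(x | y, locations)        # 2 of 5 records
rule_confidence = confidence(x, y, locations)   # 2 of the 3 records matching X
```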


    APRIORI ALGORITHM TO MINE ASSOCIATION RULES

    Key challenge

    Very large search space

    Key assumption

    Few associations have support above a given threshold

    Associations with low support are not interesting

    Key insight

    If an association item set has high support, then so do all its subsets
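
    The key insight translates into a level-wise search: a candidate item set of size k is only counted if all of its subsets of size k-1 were already found frequent. The following is a minimal sketch of that idea for frequent item sets only (rule generation is omitted). The function name, the toy transactions and the threshold are assumptions for illustration, and support is a raw count here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent item sets."""
    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-item sets into k-item sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates).items()
                    if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical transactions over three item types.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"},
                {"B", "C"}, {"A", "B", "C"}]
frequent_itemsets = apriori(transactions, min_support=3)
```

    With this toy data every single item and every pair reaches the threshold, while {A, B, C} appears only twice and is rejected after counting.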


    ASSOCIATION RULES EXAMPLE

    TECHNIQUES FOR ASSOCIATION MINING

    Classical method

    Association rules given item types and transactions

    Assumes spatial data can be decomposed into transactions

    Such decomposition may alter spatial patterns

    New spatial method

    Spatial association rule

    Spatial co-location


    ASSOCIATIONS, SPATIAL ASSOCIATIONS, CO-LOCATION


    CO-LOCATION RULES

    For point data in space

    Do not need transactions; work directly with continuous space

    Use neighborhood definition and spatial joins
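
    A minimal sketch of this idea for a pair of feature types follows. The neighborhood is assumed to be "within a fixed radius", the spatial join is a brute-force scan, and the strength of the co-location is scored with a participation index (the smaller of the two fractions of instances that have a neighbor of the other type). That measure is not named on the slide and is used here as an assumption. The point data is hypothetical.

```python
import math

# Hypothetical point instances: (feature type, x, y).
points = [("nest", 0, 0), ("nest", 5, 5), ("nest", 20, 20),
          ("water", 1, 0), ("water", 5, 6), ("water", 50, 50)]


def neighbor_pairs(points, feat_a, feat_b, radius):
    """Spatial join: index pairs (i, j) of feat_a/feat_b within `radius`."""
    return [(i, j)
            for i, (fa, xa, ya) in enumerate(points) if fa == feat_a
            for j, (fb, xb, yb) in enumerate(points)
            if fb == feat_b and math.hypot(xa - xb, ya - yb) <= radius]


def participation_index(points, feat_a, feat_b, radius):
    """Smaller of the two fractions of instances that take part in a pair."""
    pairs = neighbor_pairs(points, feat_a, feat_b, radius)
    n_a = sum(f == feat_a for f, _, _ in points)
    n_b = sum(f == feat_b for f, _, _ in points)
    ratio_a = len({i for i, _ in pairs}) / n_a
    ratio_b = len({j for _, j in pairs}) / n_b
    return min(ratio_a, ratio_b)


pairs = neighbor_pairs(points, "nest", "water", 2.0)
index = participation_index(points, "nest", "water", 2.0)
```

    No transactions are built at any point: the pairs come directly from distances in continuous space.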


    CO-LOCATION RULES

    Co-location


    CLUSTERING

    Process of discovering groups in large databases

    Spatial view: rows in a database = points in a multi-dimensional space.

    Visualization may reveal interesting groups


    CLUSTERING

    Hierarchical

    All points in one cluster

    Split and merge till a stop criterion is reached

    Partitional

    Start with random central points

    Assign points to nearest central point

    Update the central points

    Approach with statistical rigor

    Density

    Find clusters based on density of regions
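
    The partitional steps above match the familiar k-means procedure. The following is a minimal sketch under that assumption; the function name, the fixed iteration count, the seed and the sample points are illustrative choices, not part of the slides.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Partitional clustering sketch: returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start with random central points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest central point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update each central point to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members)
                                   for v in zip(*members))
    return centers, labels


# Hypothetical 2-D points forming two well-separated groups.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(sample_points, k=2)
```

    A fixed number of iterations keeps the sketch short; a fuller version would stop once the assignments no longer change.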


    OUTLIERS

    Observations inconsistent with rest of the dataset

    Observations inconsistent with their neighborhoods

    A local instability or discontinuity


    VARIOGRAM CLOUD

    Create a variogram by plotting (attribute difference, distance) for each pair of points

    Select points common to many outlying pairs
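
    A minimal sketch of this procedure, without the plotting step: compute the cloud as one (distance, attribute difference) entry per pair, treat nearby pairs with a large difference as outlying, and count how often each point appears in such pairs. The plain absolute difference, the two thresholds and the sample data are assumptions for illustration; other transforms of the difference are also in common use.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical samples: (x, y, attribute value); index 2 is a planted anomaly.
samples = [(0, 0, 10.0), (1, 0, 11.0), (0, 1, 30.0),
           (1, 1, 10.5), (2, 1, 11.5)]


def variogram_cloud(samples):
    """One (i, j, distance, absolute attribute difference) tuple per pair."""
    return [(i, j, math.hypot(a[0] - b[0], a[1] - b[1]), abs(a[2] - b[2]))
            for (i, a), (j, b) in combinations(enumerate(samples), 2)]


def suspects(samples, max_dist, min_diff):
    """Count how often each sample occurs in nearby pairs that differ a lot."""
    hits = Counter()
    for i, j, d, diff in variogram_cloud(samples):
        if d <= max_dist and diff >= min_diff:
            hits.update((i, j))
    return hits.most_common()


cloud = variogram_cloud(samples)
ranking = suspects(samples, max_dist=1.5, min_diff=15.0)
```

    The point at index 2 appears in three outlying pairs, while each of its partners appears in only one, which is what singles it out.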


    MORAN SCATTER PLOT

    Plot (normalized attribute value, weighted average in the neighborhood) for each location

    Select points in the upper left and lower right quadrants
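
    A minimal sketch of the quadrant test, without the plotting step. It assumes z-score normalization, equal weights for all neighbors, and a hypothetical line of locations where each location's neighbors are the adjacent ones. A point falls in the upper left or lower right quadrant exactly when its normalized value and its neighborhood average have opposite signs.

```python
import statistics

# Hypothetical attribute values along a line; index 2 is a planted anomaly.
values = [0, 1, 6, 3, 4, 5, 6, 7, 8, 9, 10]
# Assumed neighborhood: the adjacent locations on either side.
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))]


def moran_points(values, neighbors):
    """(z-score, mean z-score of the neighborhood) for each location."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    z = [(v - mean) / std for v in values]
    return [(z[i], statistics.fmean([z[j] for j in neighbors[i]]))
            for i in range(len(values))]


def moran_outliers(values, neighbors):
    """Indices in the upper left or lower right quadrant of the plot."""
    return [i for i, (zi, lag) in enumerate(moran_points(values, neighbors))
            if zi * lag < 0]


outliers = moran_outliers(values, neighbors)
```

    Here only index 2 is selected: its value is above the overall mean while both of its neighbors are below it.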


    SCATTERPLOT

    Plot (normalized attribute value, weighted average in the neighborhood) for each location

    Fit a linear regression line

    Select points which are unusually far from the regression line.
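
    A minimal sketch of this test, without the plotting step. It fits an ordinary least-squares line to (attribute value, neighborhood average) and selects points whose residual exceeds a multiple of the residual standard deviation. The threshold of two standard deviations, the equal neighbor weights and the sample data are assumptions. Raw values are used instead of normalized ones, which does not change which points are selected because the fit adapts to the scale.

```python
import statistics

# Hypothetical attribute values along a line; index 2 is a planted anomaly.
values = [0, 1, 6, 3, 4, 5, 6, 7, 8, 9, 10]
# Assumed neighborhood: the adjacent locations on either side.
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))]


def regression_outliers(values, neighbors, threshold=2.0):
    """Indices whose residual from the fitted line is unusually large."""
    x = list(values)
    y = [statistics.fmean([values[j] for j in neighbors[i]])
         for i in range(len(x))]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    # Ordinary least-squares slope and intercept.
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sd]


far_points = regression_outliers(values, neighbors)
```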


    CONCLUSION

    Patterns are the opposite of random

    Common spatial patterns:

    Location prediction

    Feature interaction

    Hot spot

    Spatial patterns may be discovered using:

    Techniques like associations, clustering and outlier detection
