4.2 spatial data mining


Transcript of 4.2 spatial data mining

Page 1: 4.2 spatial data mining


Spatial Data Mining

Page 2: 4.2 spatial data mining

Spatial Database


• Stores a large amount of space-related data
  - Maps
  - Remote sensing
  - Medical imaging
  - VLSI chip layout
• Has topological and distance information
• Requires spatial indexing, data access, reasoning, geometric computation, and knowledge representation techniques

Page 3: 4.2 spatial data mining

Spatial Data Mining


• Extraction of knowledge and spatial relationships from spatial databases
• Can be used for understanding spatial data and spatial relationships
• Applications: GIS, geomarketing, remote sensing, image database exploration, medical imaging, navigation
• Challenges
  - Complexity of spatial data types and access methods
  - Large amounts of data

Page 4: 4.2 spatial data mining

Cont.


• Non-spatial information
  - Same as data in traditional data mining
  - Numerical, categorical, ordinal, Boolean, etc.
  - e.g., city name, city population
• Spatial information
  - Spatial attribute: geographically referenced
    - Neighborhood and extent
    - Location, e.g., longitude, latitude, elevation
  - Spatial data representations
    - Raster: gridded space
    - Vector: point, line, polygon
    - Graph: node, edge, path
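The three representations can be sketched with plain Python structures. This is only an illustration: the grid values, coordinates, and names (`raster`, `polygon`, `road_graph`, `polygon_area`, `find_path`) are made up for the example and do not come from the slides.

```python
# Raster: space is divided into a regular grid; each cell holds a value
# (here, hypothetical elevation readings in metres).
raster = [
    [120, 125, 130],
    [118, 122, 128],
    [115, 119, 124],
]

# Vector: geometry is stored as coordinates.
point = (-122.33, 47.61)
line = [(-122.33, 47.61), (-122.30, 47.65)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

# Graph: nodes joined by edges, stored as an adjacency list.
road_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def find_path(graph, start, goal, path=None):
    """Depth-first search for one path from start to goal, or None."""
    path = (path or []) + [start]
    if start == goal:
        return path
    for nxt in graph[start]:
        if nxt not in path:
            found = find_path(graph, nxt, goal, path)
            if found:
                return found
    return None
```

Each structure supports a different kind of query: cell lookup for the raster, geometric computation for the vector polygon, and path search for the graph.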

Page 5: 4.2 spatial data mining

Spatial Data


Page 6: 4.2 spatial data mining

Cont.


Page 7: 4.2 spatial data mining

Statistical techniques


• Popular approach to analyzing spatial data
• Often assume statistical independence among spatial data, although nearby spatial objects tend to be interrelated
• Can be performed only by experts
• Do not work well with symbolic values

Page 8: 4.2 spatial data mining

Spatial Data Warehousing


• Spatial data warehouse: an integrated, subject-oriented, time-variant, and nonvolatile spatial data repository
  - Consists of both spatial and non-spatial data in support of spatial data mining and spatial-data-related decision-making processes
• Spatial data cube: a multidimensional spatial database
  - Both dimensions and measures may contain spatial components
• Challenging issues
  - Spatial data integration: a big issue
    - Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
    - Vendor-specific formats (ESRI, MapInfo, Intergraph, IDRISI, etc.)
  - Realization of fast and flexible OLAP in spatial data warehouses

Page 9: 4.2 spatial data mining

Dimensions and Measures in Spatial Data Warehouse


• Dimensions
  - Non-spatial: e.g., “25-30 degrees” generalizes to “hot” (both are strings)
  - Spatial-to-non-spatial: e.g., Seattle generalizes to the description “Pacific Northwest” (as a string)
  - Spatial-to-spatial: e.g., Seattle generalizes to Pacific Northwest (as a spatial region)
• Measures
  - Numerical (e.g., monthly revenue of a region)
    - Distributive (e.g., count, sum)
    - Algebraic (e.g., average)
    - Holistic (e.g., median, rank)
  - Spatial: a collection of spatial pointers (e.g., pointers to all regions with a temperature of 25-30 degrees in July)
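The difference between distributive, algebraic, and holistic numerical measures shows up when partial results are combined during a roll-up. The sketch below uses made-up revenue figures and hypothetical partition names (`part_a`, `part_b`); it is an illustration, not part of the original slides.

```python
from statistics import median

# Hypothetical monthly revenue readings, split into two partitions
# (e.g., two sub-regions that are rolled up into one region).
part_a = [10, 20, 30]
part_b = [40, 100]

# Distributive: the aggregate of the whole can be computed from the
# aggregates of the parts.
total_sum = sum(part_a) + sum(part_b)
total_count = len(part_a) + len(part_b)

# Algebraic: computed from a fixed number of distributive aggregates.
average = total_sum / total_count

# Holistic: no constant-size summary of the parts suffices in general;
# the median of the partition medians is not the median of the whole.
median_of_medians = median([median(part_a), median(part_b)])
true_median = median(part_a + part_b)
```

Here the sum, count, and average of the whole are recovered from per-partition summaries, while the median must be recomputed from the underlying values.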

Page 10: 4.2 spatial data mining

Example: British Columbia Weather Pattern Analysis


• Input
  - A map with about 3,000 weather probes scattered in B.C.
  - Each probe records daily data for temperature, precipitation, wind velocity, etc. for a designated small area and transmits the signal to a provincial weather station
  - Data warehouse using a star schema
• Output
  - A map that reveals patterns: merged (similar) regions
• Goals
  - Interactive analysis (drill-down, slice, dice, pivot, roll-up)
  - Fast response time
  - Minimizing storage space used
• Challenge
  - A merged region may contain hundreds of “primitive” regions (polygons)

Page 11: 4.2 spatial data mining

Star Schema of the BC Weather Warehouse

• Spatial data warehouse
  - Dimensions
    - region_name
    - time
    - temperature
    - precipitation
  - Measurements
    - region_map
    - area
    - count

[Figure: star schema diagram showing the fact table and the dimension tables]
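A roll-up over such a star schema can be sketched in a few lines of Python. The dimension and measure names follow the slide; the rows, the `region_dim` lookup table, and the `roll_up` helper are hypothetical, and `region_map` is stored here as a list of region identifiers (pointers) rather than as real geometry.

```python
from collections import defaultdict

# Hypothetical dimension table: maps each region_name to a coarser zone.
region_dim = {"R1": "North", "R2": "North", "R3": "South"}

# Hypothetical fact table. Each row holds dimension values plus the three
# measures; region_map is stored as a list of pointers (region ids).
facts = [
    {"region_name": "R1", "time": "July", "temperature": "hot",
     "precipitation": "dry", "region_map": ["R1"], "area": 12.0, "count": 30},
    {"region_name": "R2", "time": "July", "temperature": "hot",
     "precipitation": "wet", "region_map": ["R2"], "area": 8.0, "count": 31},
    {"region_name": "R3", "time": "July", "temperature": "mild",
     "precipitation": "wet", "region_map": ["R3"], "area": 20.0, "count": 29},
]


def roll_up(rows, key):
    """Group fact rows by key(row) and aggregate the three measures."""
    cells = defaultdict(lambda: {"region_map": [], "area": 0.0, "count": 0})
    for row in rows:
        cell = cells[key(row)]
        cell["region_map"].extend(row["region_map"])  # spatial measure
        cell["area"] += row["area"]                    # distributive
        cell["count"] += row["count"]                  # distributive
    return dict(cells)


# Roll up along the temperature dimension, then along the region hierarchy.
by_temperature = roll_up(facts, lambda r: r["temperature"])
by_zone = roll_up(facts, lambda r: region_dim[r["region_name"]])
```

Collecting pointers keeps the roll-up cheap; actually merging the referenced polygons into one region map is the expensive part.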

Page 12: 4.2 spatial data mining


Can we precompute all of the possible spatial merges and store them in the corresponding cuboid cells of a spatial data cube?

• Probably not: it requires multi-megabytes of storage
• However, on-line computation is slow and expensive

Page 13: 4.2 spatial data mining

Dynamic Merging of Spatial Objects


Page 14: 4.2 spatial data mining

Methods for Computing Spatial Data Cubes


• On-line aggregation: collect and store pointers to spatial objects in a spatial data cube
  - Expensive and slow; needs efficient aggregation techniques
• Precompute and store all the possible combinations
  - Huge space overhead
• Precompute and store rough approximations in a spatial data cube
  - Accuracy trade-off; e.g., minimum bounding rectangle (MBR)
• Selective computation: materialize only those that will be accessed frequently
  - A reasonable choice
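The rough-approximation option can be illustrated with minimum bounding rectangles. In this sketch, which uses made-up regions and hypothetical helper names (`mbr`, `merge_mbrs`, `mbr_area`), a merged region is stored as a single rectangle instead of its exact geometry.

```python
def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def merge_mbrs(a, b):
    """Smallest rectangle that covers both rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def mbr_area(r):
    """Area of a rectangle given as (min_x, min_y, max_x, max_y)."""
    return (r[2] - r[0]) * (r[3] - r[1])


# Two hypothetical primitive regions given by their vertices.
region_1 = [(0, 0), (2, 0), (2, 2), (0, 2)]  # square, exact area 4
region_2 = [(2, 0), (5, 0), (2, 3)]          # right triangle, exact area 4.5

merged = merge_mbrs(mbr(region_1), mbr(region_2))
```

The merged rectangle covers 15 square units although the two regions together cover only 8.5, which is the accuracy trade-off: four numbers are stored per cell, at the cost of overestimating the region.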

Page 15: 4.2 spatial data mining

Mining Spatial Association and Co-location Patterns


• Spatial association rule: A ⇒ B [s%, c%]
  - A and B are sets of spatial or non-spatial predicates
    - Topological relations: intersects, overlaps, disjoint, etc.
    - Spatial orientations: left_of, west_of, under, etc.
    - Distance information: close_to, within_distance, etc.
  - s% is the support and c% is the confidence of the rule
• Example
  - is_a(x, “School”) ∧ close_to(x, “Sports_Center”) ⇒ close_to(x, “Park”) [7%, 85%]
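Support and confidence of such a rule can be computed by counting the objects that satisfy the predicates. The sketch below assumes the spatial predicates have already been evaluated for each object; the five objects and the `support_and_confidence` helper are hypothetical, so the resulting numbers differ from the 7% and 85% in the slide.

```python
# Hypothetical objects, each described by the set of predicates that hold.
objects = [
    {"is_a_school", "close_to_sports_center", "close_to_park"},
    {"is_a_school", "close_to_sports_center", "close_to_park"},
    {"is_a_school", "close_to_sports_center"},
    {"is_a_school", "close_to_park"},
    {"close_to_park"},
]


def support_and_confidence(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    both = sum(1 for r in rows if antecedent <= r and consequent <= r)
    ante = sum(1 for r in rows if antecedent <= r)
    support = both / len(rows)                  # fraction satisfying A and B
    confidence = both / ante if ante else 0.0   # fraction of A that also has B
    return support, confidence


s, c = support_and_confidence(
    objects,
    antecedent={"is_a_school", "close_to_sports_center"},
    consequent={"close_to_park"},
)
```

With this toy data, two of the five objects satisfy both sides of the rule and three satisfy the antecedent, so the support is 2/5 and the confidence is 2/3.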

Page 16: 4.2 spatial data mining

Progressive Refinement


• Progressive refinement
  - Spatial association mining needs to evaluate multiple spatial relationships among a large number of spatial objects, which is expensive
• Hierarchy of spatial relationships
  - First search for a rough relationship and then refine it
  - Superset coverage property: all the potential answers should be preserved (i.e., the rough test may let through false positives but must not drop true answers)
• Two-step mining of spatial associations
  - Step 1: rough spatial computation (as a filter), using MBRs for rough estimation
  - Step 2: detailed spatial algorithm (as refinement), applied only to those objects that have passed the rough spatial association test (no less than min_support)
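The two-step idea can be sketched as a filter-and-refine test for a close_to predicate. Assumptions in this example: each object is a set of points, the exact distance is the smallest point-to-point distance, and the object names, coordinates, and helper functions are made up. Because every point lies inside its object's MBR, the MBR distance never exceeds the exact distance, so the filter cannot discard a true answer.

```python
from math import hypot


def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_distance(a, b):
    """Smallest distance between two rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def exact_distance(p, q):
    """Smallest point-to-point distance between two point sets."""
    return min(hypot(x1 - x2, y1 - y2) for x1, y1 in p for x2, y2 in q)


def close_pairs(objects, d):
    """Return (candidates, refined) pairs for the predicate close_to within d."""
    names = sorted(objects)
    boxes = {n: mbr(objects[n]) for n in names}
    # Step 1: rough filter on MBRs (cheap, may keep false positives).
    candidates = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
                  if mbr_distance(boxes[a], boxes[b]) <= d]
    # Step 2: exact test, applied only to the surviving candidates.
    refined = [(a, b) for a, b in candidates
               if exact_distance(objects[a], objects[b]) <= d]
    return candidates, refined


# Hypothetical objects; "mall" is spread diagonally, so its MBR is large.
objects = {
    "school": [(0, 0), (1, 1)],
    "park": [(2, 1), (3, 2)],
    "mall": [(1.5, 5), (5, 1.5)],
}
candidates, refined = close_pairs(objects, d=1.5)
```

In this toy data all three pairs pass the MBR filter, but only the park and the school are actually within the distance threshold, so the exact test removes two false positives.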

Page 17: 4.2 spatial data mining

Spatial co-locations


• Co-location patterns (features that frequently appear close to each other) are often just what one really wants to explore
• Based on the property of spatial autocorrelation, interesting features likely coexist in closely located regions
• Efficient methods: Apriori, progressive refinement, etc.
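A very simple way to check whether two features tend to co-locate is to count how many instances of one feature have an instance of the other nearby. The sketch below is only that neighborhood count, not the Apriori-based mining mentioned above; the coordinates and the `colocation_ratio` helper are hypothetical.

```python
from math import hypot


def colocation_ratio(a_points, b_points, d):
    """Fraction of A instances with at least one B instance within distance d."""
    near = sum(
        1 for ax, ay in a_points
        if any(hypot(ax - bx, ay - by) <= d for bx, by in b_points)
    )
    return near / len(a_points)


# Hypothetical feature instances (x, y).
schools = [(0, 0), (10, 10), (20, 0)]
parks = [(1, 0), (10, 11), (50, 50)]

schools_near_parks = colocation_ratio(schools, parks, d=2)
parks_near_schools = colocation_ratio(parks, schools, d=2)
```

A high ratio in both directions suggests that the two features are candidates for a co-location pattern at that distance threshold.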

Page 18: 4.2 spatial data mining


Page 19: 4.2 spatial data mining

Spatial Cluster Analysis


• Mining clusters—k-means, k-medoids, hierarchical, density-based, etc.

• Analysis of distinct features of the clusters
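As an illustration of the first step, here is a minimal k-means sketch on 2-D point locations. It assumes Python 3.8+ for `math.dist`, takes the initial centroids as an argument so the run is deterministic, and uses made-up coordinates; it is not a tuned implementation.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then update."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical point locations forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Once the clusters are found, the second step compares their non-spatial attributes to see what distinguishes one cluster from another.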

Page 20: 4.2 spatial data mining

Spatial Classification


• Analyze spatial objects to derive classification schemes, such as decision trees, in relevance to certain spatial properties (district, highway, river, etc.)
  - Classifying medium-size families according to income, region, and infant mortality rates
  - Mining for volcanoes on Venus
• Employ methods such as:
  - Decision-tree classification, naïve Bayesian classifier + boosting, neural networks, genetic programming, etc.

Page 21: 4.2 spatial data mining

Spatial Trend Analysis


• Function
  - Detect changes and trends along a spatial dimension
  - Study the trend of non-spatial or spatial data changing with space
• Application examples
  - Observe the trend of changes of the climate or vegetation with increasing distance from an ocean
  - Crime rate or unemployment rate change with regard to city geo-distribution
  - Traffic flows in highways and in cities
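One simple way to detect a trend along a spatial dimension is to fit a straight line to an attribute as a function of distance. The sketch below uses ordinary least squares on made-up rainfall readings; the data and the `linear_trend` helper are hypothetical, and real trend analysis may need more than a linear fit.

```python
def linear_trend(xs, ys):
    """Least-squares slope and intercept of ys as a function of xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings: rainfall at increasing distance from the ocean.
distance_km = [0, 10, 20, 30, 40]
rainfall_mm = [100, 90, 80, 70, 60]

slope, intercept = linear_trend(distance_km, rainfall_mm)
```

A negative slope indicates that the attribute decreases with increasing distance; in this toy data rainfall drops by 1 mm per kilometre from the coast.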

Page 22: 4.2 spatial data mining

Mining Raster Databases


• Vector data mining
  - Maps
  - Graphs
  - Molecular chains
• Raster data mining
  - Satellite images

Page 23: 4.2 spatial data mining


Page 24: 4.2 spatial data mining

Other Applications


Spatial data mining is used in:
• NASA Earth Observing System (EOS): Earth science data
• National Inst. of Justice: crime mapping
• Census Bureau, Dept. of Commerce: census data
• Dept. of Transportation (DOT): traffic data
• National Inst. of Health (NIH): cancer clusters
• Commerce, e.g., retail analysis