Transcript of sdm_han.ppt
April 10, 2023 Spatial Data Mining 1
Data Mining in Spatial Databases: A Multi-Disciplinary Promise
Jiawei Han
Database Systems Research Lab.
Department of Computing Science
University of Illinois at Urbana-Champaign
http://www.cs.uiuc.edu/~hanj
Outline
Why geo-spatial data mining?
Spatial data mining: major progress
Spatial OLAP
Spatial association
Spatial classification
Spatial clustering and outlier analysis
Research challenges in spatial data mining
Why Geo-Spatial Data Mining?
Spatial data mining: mining interesting knowledge/patterns from huge amounts of spatial data
Necessity is the mother of invention
Data explosion problem: data is overwhelming and everywhere (automated data collection, satellite images, remote sensing, GPS, mobile computing and network technology, WWW, etc.)
Putting data to use: data mining may lead to important discoveries
Spatial Data Mining vs. Traditional Spatial Data Analysis
Scalability and performance: handle gigabytes of data, interactive exploration, multi-dimensional drilling/rolling, visualization, ...
Tight integration of database systems and GIS systems
Most spatial/aspatial data are stored in relational database systems (e.g., Oracle, MS/SQLServer, DB2, Informix), GIS (e.g., ArcInfo, MapInfo), or data warehouses
Tight coupling and seamless integration: data cleaning, data integration, and data consolidation
New methods and functionalities: association, sequential patterns, classification methods, ...
Spatial Data Mining: Confluence of Multiple Disciplines
[Figure: Spatial Data Mining at the center, surrounded by Spatial DB Systems, Statistics, Mobile Computing, Geography, Machine Learning (AI), Visualization, and Remote Sensing]
Spatial Data Mining—Major Progress
Geo-spatial data warehouse and spatial OLAP
Spatial data classification/predictive modeling
Spatial clustering/segmentation
Spatial association and correlation analysis
Spatial regression analysis
Spatio-temporal pattern analysis
Many more to be explored
Spatial Data Warehousing
Spatial data warehouse: integrated, subject-oriented, time-variant, and nonvolatile spatial data repository for data analysis
Spatial data integration: a big issue
Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
Vendor-specific formats (ESRI, MapInfo, Intergraph, etc.)
Spatial data cube: multidimensional spatial database
Both dimensions and measures may contain spatial components
Star Schema of the BC Weather Warehouse
Spatial data warehouse
Dimensions: region_name, time, temperature, precipitation
Measurements: region_map, area, count
[Figure: star schema with a central fact table linked to the dimension tables]
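The roll-up that such a schema supports can be sketched as follows. This is only a minimal illustration: the region names, the region-to-district hierarchy, and all values are hypothetical, and region_map is modeled as a set of map-cell ids so that the spatial measure can be merged by set union while area and count are summed.

```python
from collections import defaultdict

# Hypothetical dimension hierarchy: region_name -> coarser district.
REGION_TO_DISTRICT = {
    "Burnaby": "Lower Mainland",
    "Surrey": "Lower Mainland",
    "Kelowna": "Okanagan",
}

# Hypothetical fact rows: (region_name, time, temperature, precipitation,
#                          region_map as a set of cell ids, area, count).
FACTS = [
    ("Burnaby", "2000-01", "cold", "wet", {1, 2}, 90.0, 3),
    ("Surrey", "2000-01", "cold", "wet", {3, 4, 5}, 316.0, 5),
    ("Kelowna", "2000-01", "cold", "dry", {9}, 211.0, 2),
]


def roll_up_to_district(facts):
    """Roll region_name up to district, merging spatial and numeric measures."""
    cube = defaultdict(lambda: {"region_map": set(), "area": 0.0, "count": 0})
    for region, time, temp, precip, region_map, area, count in facts:
        key = (REGION_TO_DISTRICT[region], time, temp, precip)
        cell = cube[key]
        cell["region_map"] |= region_map  # spatial measure: merge regions
        cell["area"] += area              # numeric measure: sum
        cell["count"] += count            # numeric measure: sum
    return dict(cube)


result = roll_up_to_district(FACTS)
for key, measures in sorted(result.items()):
    print(key, measures)
```

The two Lower Mainland rows collapse into one cube cell whose region_map is the union of their cells, which is the kind of spatial merge that makes spatial OLAP more expensive than ordinary OLAP.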
Spatial OLAP—OLAP on Map Data
Dynamic Merging of Spatial Objects?
Materializing (precomputing) all? Too much storage space
On-line merge? Slow, expensive!
A better way: object-based, selective (partial) materialization
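One way to picture selective materialization is sketched below. The slide does not say how objects are selected, so the access-frequency threshold used here is an assumption, as are the region names and counts; regions are again modeled as sets of map-cell ids.

```python
# Hypothetical regions: each is a set of map-cell ids; merging is set union.
REGIONS = {"r1": {1, 2}, "r2": {2, 3}, "r3": {7}, "r4": {8, 9}}

# Hypothetical counts of how often queries ask to merge each group.
ACCESS_FREQ = {("r1", "r2"): 40, ("r3", "r4"): 2}


def merge(group):
    """On-line merge: union the cells of every region in the group."""
    merged = set()
    for name in group:
        merged |= REGIONS[name]
    return merged


def materialize(min_freq):
    """Precompute merged objects only for frequently requested groups."""
    return {g: merge(g) for g, f in ACCESS_FREQ.items() if f >= min_freq}


def lookup(group, store):
    """Return (merged object, True if served from the precomputed store)."""
    if group in store:
        return store[group], True
    return merge(group), False


store = materialize(min_freq=10)
print(lookup(("r1", "r2"), store))  # served from the store
print(lookup(("r3", "r4"), store))  # merged on-line
```

Only the frequently requested group is stored, so storage stays small while the common queries avoid the on-line merge.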
Spatial Association and Correlation Mining
FIND SPATIAL ASSOCIATION RULE DESCRIBING "Golf Course"
FROM Washington_Golf_courses, Washington
WHERE CLOSE_TO(Washington_Golf_courses.Obj, Washington.Obj, "3 km") AND Washington.CFCC <> "D81"
IN RELEVANCE TO Washington_Golf_courses.Obj, Washington.Obj, CFCC
SET SUPPORT THRESHOLD 0.5
What kinds of objects are usually located close to golf courses?
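A rough sketch of what such a query computes, under simplifying assumptions: objects are points with coordinates in km, the feature codes other than "D81" are made up, and support is taken to be the fraction of golf courses that have at least one object of a given code within 3 km. The function and variable names are hypothetical.

```python
from collections import Counter
from math import hypot

# Hypothetical data: coordinates in km, made-up feature codes.
GOLF_COURSES = {
    "g1": (0.0, 0.0), "g2": (10.0, 0.0), "g3": (20.0, 5.0), "g4": (30.0, 0.0),
}
OBJECTS = [
    ("A41", (1.0, 1.0)), ("A41", (11.0, 1.0)), ("A41", (21.0, 4.0)),
    ("H11", (0.5, 2.0)), ("H11", (29.0, 1.0)),
    ("D81", (10.5, 0.5)),   # excluded by CFCC <> "D81"
    ("E20", (50.0, 50.0)),  # close to no golf course
]


def close_to(p, q, limit_km=3.0):
    return hypot(p[0] - q[0], p[1] - q[1]) <= limit_km


def frequent_neighbors(min_support=0.5):
    """Support of 'a golf course is close to some object with code c'."""
    hits = Counter()
    for gpos in GOLF_COURSES.values():
        codes = {c for c, pos in OBJECTS if c != "D81" and close_to(gpos, pos)}
        hits.update(codes)  # each code counts at most once per golf course
    n = len(GOLF_COURSES)
    return {c: k / n for c, k in hits.items() if k / n >= min_support}


rules = frequent_neighbors()
print(rules)
```

With this toy data, codes A41 and H11 reach the 0.5 threshold, while E20 and the excluded D81 do not appear.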
Efficient Mining of Spatial Associations
Progressive refinement
Hierarchy of spatial relationships: g_close_to: near_by, touch, intersect, contain, etc.
First search for rough relationships and then refine them
Rough spatial computation (as a filter): using MBR or R-tree for rough estimation
Detailed spatial algorithm (as refinement): apply only to those objects which have passed the rough spatial association test (no less than min_support)
Micro-clustering and join indexing methods
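The filter-then-refine idea can be sketched as follows. This covers only the geometric part (the min_support test is left out), objects are simplified to sets of points, and all names and coordinates are hypothetical. Because the distance between two MBRs never exceeds the true distance between the objects, the filter cannot discard a pair that is really close.

```python
from math import hypot


def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def mbr_distance(a, b):
    """Smallest distance between two rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def exact_distance(p, q):
    """Detailed computation: smallest pairwise distance between point sets."""
    return min(hypot(x1 - x2, y1 - y2) for x1, y1 in p for x2, y2 in q)


def close_pairs(targets, others, limit):
    """Filter with MBRs, then refine only the surviving candidates."""
    candidates = [
        (t, o) for t in targets for o in others
        if mbr_distance(mbr(targets[t]), mbr(others[o])) <= limit
    ]
    refined = [
        (t, o) for t, o in candidates
        if exact_distance(targets[t], others[o]) <= limit
    ]
    return candidates, refined


# Hypothetical objects.
targets = {"golf": [(0, 0), (2, 0), (2, 2), (0, 2)]}
others = {
    "road": [(3, 1), (3, 5)],
    "trail": [(5, 0), (2, 5)],   # MBR overlaps, but the points are far away
    "mall": [(10, 10), (12, 12)],
}
candidates, refined = close_pairs(targets, others, limit=1.5)
print(candidates)
print(refined)
```

The mall is rejected by the cheap MBR test alone, the trail passes the filter but fails the detailed test, and only the road survives both steps.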
Spatial Classification and Model Construction
Generalization- or clustering-based induction
Interactive classification
Can Typical Classification Methods Be Applied to Spatial Classification?
Decision-tree classification: entropy-based information-gain vs. Gini-index vs. MDL
Tree pruning methods: boosting/bagging
Naïve-Bayesian classifier + boosting
Bayesian belief networks
Neural network
Genetic programming
Nearest neighbor and case-based reasoning
Support vector machine method
Association-based multi-dimensional classification
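To make the first item concrete, here is a minimal sketch comparing entropy-based information gain with Gini-index reduction on a split over spatial predicates. The houses and the attribute names close_to_lake and near_highway are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, attr, impurity):
    """Impurity reduction from splitting rows on a categorical attribute."""
    labels = [r["class"] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r["class"])
    weighted = sum(len(g) / len(rows) * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


# Hypothetical houses described by two spatial predicates.
rows = [
    {"close_to_lake": True, "near_highway": False, "class": "H"},
    {"close_to_lake": True, "near_highway": True, "class": "H"},
    {"close_to_lake": False, "near_highway": True, "class": "L"},
    {"close_to_lake": False, "near_highway": False, "class": "L"},
]

for attr in ("close_to_lake", "near_highway"):
    print(attr, split_gain(rows, attr, entropy), split_gain(rows, attr, gini))
```

Both measures rank close_to_lake above near_highway here; on real data they can disagree, which is why the choice of measure is listed as an open question.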
What Kind of Houses Are Highly Valued? Associative Classification
[Figure: map of houses labeled H or L, grouped into micro-clusters C01 to C10, with a highway and a lake]
Grouping and Associating Spatial Features for Classification
House_ID | MCluster_ID | Spatial Features | Yrs | Sqr_ft | Class
H01 | C05 | close_to(como lake), next_to(Futureshop), ... | 16 | 2300 | H
H03 | C08 | close_to(Lougheed_Hwy), next_to(Austin_elmntary), ... | 32 | 2500 | L
H45 | C09 | next_to(QueenEliz_park), next_to(Cambie_road), ... | 20 | 3100 | H
H82 | C05 | close_to(como lake), next_to(Futureshop), ... | 18 | 3400 | H
... | ... | ... | ... | ... | ...
H1857 | c18 | inside(east_Vancouver), close_to(Fraser_st), close_to(sky_train_station) | 41 | 2100 | L
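A minimal sketch of how class rules could be read off such a table, using only the five rows that are spelled out above (elided features and rows are left out). The single-feature rule form and the support and confidence thresholds are assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the table above; only the listed features are used.
HOUSES = [
    ("H01", "C05", {"close_to(como lake)", "next_to(Futureshop)"}, "H"),
    ("H03", "C08", {"close_to(Lougheed_Hwy)", "next_to(Austin_elmntary)"}, "L"),
    ("H45", "C09", {"next_to(QueenEliz_park)", "next_to(Cambie_road)"}, "H"),
    ("H82", "C05", {"close_to(como lake)", "next_to(Futureshop)"}, "H"),
    ("H1857", "c18", {"inside(east_Vancouver)", "close_to(Fraser_st)",
                      "close_to(sky_train_station)"}, "L"),
]


def class_rules(rows, min_support=0.3, min_confidence=0.9):
    """Single-feature rules 'feature -> class' with (support, confidence)."""
    feature_count = defaultdict(int)
    pair_count = defaultdict(int)
    for _, _, features, label in rows:
        for f in features:
            feature_count[f] += 1
            pair_count[(f, label)] += 1
    rules = {}
    for (f, label), k in pair_count.items():
        support, confidence = k / len(rows), k / feature_count[f]
        if support >= min_support and confidence >= min_confidence:
            rules[(f, label)] = (support, confidence)
    return rules


rules = class_rules(HOUSES)
for rule, stats in sorted(rules.items()):
    print(rule, stats)
```

On these five rows the only rules that pass are the two features shared by H01 and H82, both pointing to class H.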
Spatial Classification: Typical Examples
Mining volcanoes on Venus
Training set provided by experts
Model constructed can be used for prediction
Finding stars in galaxies (JPL’96)
QuakeFinder: find earthquakes related to spatial info
Spatial Trend Analysis
Function
Detect changes and trends along a spatial dimension
Study the trend of non-spatial or spatial data changing with space
Application examples
Observe the trend of changes of the climate or vegetation with increasing distance from an ocean
Crime rate or unemployment rate change with regard to city geo-distribution
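One simple way to quantify such a trend is a least-squares line of the attribute against distance. The slide does not name a method, so the regression here is an assumption, and the observations (a vegetation index at increasing distance from the ocean) are made up.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observations: distance from the ocean (km) and a
# vegetation index measured at each site.
distance_km = [0, 10, 20, 30, 40]
vegetation = [0.80, 0.70, 0.60, 0.50, 0.40]

slope, intercept = linear_trend(distance_km, vegetation)
print(f"change per km: {slope:.4f}, value at the coast: {intercept:.2f}")
```

A negative slope says the index falls as one moves inland; the sign and size of the slope are the trend being reported.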
Spatial Cluster Analysis
Mining clusters—k-means, k-medoids, hierarchical, density-based, etc.
Analysis of distinct features of the clusters
Density-Based Cluster analysis: OPTICS & Its Applications
Clustering and Distribution Density Functions: Density Attractor
Center-Defined and Arbitrary Shaped
STING: A Statistical Information Grid Approach
Wang, Yang and Muntz (VLDB’97)
Each cell stores statistical distribution of measure at low level
Multi-level resolution
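The multi-level idea can be sketched as a grid whose coarser cells are built from the statistics of their children rather than from the raw points. This is a simplified illustration and not the published algorithm: the choice of statistics (count, sum, min, max), the 2 x 2 grouping, and the data are assumptions.

```python
from collections import defaultdict


def _empty():
    return {"n": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}


def leaf_cells(points, cell_size):
    """Bottom level: per-cell count, sum, min, and max of the measure."""
    cells = defaultdict(_empty)
    for x, y, value in points:
        c = cells[(int(x // cell_size), int(y // cell_size))]
        c["n"] += 1
        c["sum"] += value
        c["min"] = min(c["min"], value)
        c["max"] = max(c["max"], value)
    return dict(cells)


def parent_level(cells):
    """Next coarser level: each parent summarizes a 2 x 2 block of children."""
    parents = defaultdict(_empty)
    for (i, j), c in cells.items():
        p = parents[(i // 2, j // 2)]
        p["n"] += c["n"]
        p["sum"] += c["sum"]
        p["min"] = min(p["min"], c["min"])
        p["max"] = max(p["max"], c["max"])
    return dict(parents)


def mean(cell):
    return cell["sum"] / cell["n"]


# Hypothetical points: (x, y, measure).
points = [(0.5, 0.5, 10.0), (1.5, 0.5, 20.0), (1.2, 1.7, 30.0), (3.5, 3.5, 40.0)]
level0 = leaf_cells(points, cell_size=1.0)
level1 = parent_level(level0)
print(level1[(0, 0)], mean(level1[(0, 0)]))
```

A query can be answered at the coarse level first and descend only into cells whose statistics look relevant, which is what makes the grid cheap to search.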
WaveCluster
G. Sheikholeslami, et al. (1998)
Multiple wavelet transformation-based cluster analysis
Constraints-Based Clustering
Constraints on individual objects: simple selection of such objects before clustering
Clustering parameters as constraints: k-means, density-based (radius, min # of points)
Constraints imposed by physical obstacles: clustering with obstructed distance
Constraints specified on clusters using SQL aggregates
Sum of the profits in each cluster > $1 million
Average sales in each cluster > $20 million
Min # of golden customers (in each cluster) > 1000
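The aggregate constraints can be pictured as a check applied to each candidate cluster. The sketch below only tests a given cluster; it does not show how a clustering algorithm would use the check. Field names, thresholds, and customer records are hypothetical and scaled down from the figures on the slide.

```python
def satisfies(cluster, min_total_profit, min_avg_sales, min_golden):
    """Check SUM, AVG, and COUNT style constraints on one cluster."""
    total_profit = sum(c["profit"] for c in cluster)
    avg_sales = sum(c["sales"] for c in cluster) / len(cluster)
    golden = sum(1 for c in cluster if c["golden"])
    return (total_profit > min_total_profit
            and avg_sales > min_avg_sales
            and golden > min_golden)


# Hypothetical customers in two candidate clusters.
cluster_a = [
    {"profit": 600, "sales": 30, "golden": True},
    {"profit": 500, "sales": 20, "golden": True},
]
cluster_b = [
    {"profit": 100, "sales": 40, "golden": False},
    {"profit": 200, "sales": 50, "golden": True},
]

for name, cluster in (("a", cluster_a), ("b", cluster_b)):
    print(name, satisfies(cluster, min_total_profit=1000,
                          min_avg_sales=20, min_golden=1))
```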
Constraint-Based Clustering: Planning ATM Locations
[Figure, first panel: spatial data with obstacles (mountain, river, bridge)]
[Figure, second panel: clusters C1 to C4 obtained by clustering without taking obstacles into consideration]
Clustering with Spatial Obstacles
Taking obstacles into account
Not taking obstacles into account
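The difference between the two pictures comes from the distance function. Below is a minimal sketch of an obstructed distance under strong simplifying assumptions: a single river modeled as one straight segment with one bridge point on it. The function names and coordinates are hypothetical, and a general method would need to handle arbitrary obstacles.

```python
from math import hypot


def dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])


def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def crosses(p, q, a, b):
    """True if segment p-q properly crosses segment a-b."""
    return (orient(p, q, a) * orient(p, q, b) < 0
            and orient(a, b, p) * orient(a, b, q) < 0)


def obstructed_distance(p, q, river, bridge):
    """Straight-line distance, or the detour via the bridge if blocked."""
    if crosses(p, q, *river):
        return dist(p, bridge) + dist(bridge, q)
    return dist(p, q)


# Hypothetical layout: a river along x = 5 with a bridge at (5, 0).
river = ((5.0, -10.0), (5.0, 10.0))
bridge = (5.0, 0.0)

same_side = obstructed_distance((4.0, 8.0), (1.0, 4.0), river, bridge)
across = obstructed_distance((4.0, 8.0), (6.0, 8.0), river, bridge)
print(same_side, across)
```

The two points facing each other across the river are only 2 units apart in a straight line but far apart once the detour is counted, so a clustering method using this distance would not place them in the same cluster.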
Towards Spatial Data Mining System: An Architecture
Graphic User Interface
Geo-OLAP Analyzer, Geo-Associator, Geo-Classifier, Geo-Clustor, Geo-Predictor, Future Modules
Spatial Database and Warehouse Server
Spatial DB, Non-Spatial DB, meta data: hierarchy
Research Challenges in Spatial Data Mining
Mining temporal spatial data
Mining spatial-related stream data
Spatial data mining applications (land use, bio-medical)
Conclusions
Spatial data mining vs. traditional spatial analysis: scalability, architecture, functions, methods
Good progress has been made on spatial data mining: OLAP, association, clustering, classification, outlier analysis, etc.
Still lots to be done! Young and promising direction
Joint efforts (from multiple disciplines) lead to joyous promises!
http://www.cs.uiuc.edu/~hanj
Thank you!!!