Operation Point Cluster - Blue Raster Esri Developer Summit 2013 Presentation
Brendan Collins
“The function of the brain and nervous system is to protect us from being overwhelmed and confused by this mass of largely useless and irrelevant knowledge, by shutting out most of what we should otherwise perceive or remember at any moment, and leaving only that very small and special selection which is likely to be practically useful.”
-Aldous Huxley
103,000 Public Schools (No Clustering)
103,000 Public Schools (Count)
103,000 Public Schools (Mean Student Teacher Ratio)
Operation Point Cluster
• Review general clustering algorithms
• Suggest strategies & implementations for clustering for web applications
– Server-side (C#)
– Offline w/ArcGIS (Python)
– Offline w/3rd Party (Python)
Data Classification (One Dimensional Clustering)
• Equal-interval
– Clusters span the same range (max minus min)
• Quantile
– Clusters have the same count
• Natural Breaks (Jenks)
– Clusters have minimum deviation from their own class mean
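A minimal sketch of the first two schemes, assuming the data is a plain Python list of numbers. The function names are illustrative only and do not come from the talk or from any library. Jenks natural breaks is left out because it needs an optimization step rather than a closed-form rule.

```python
from bisect import bisect_left

def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]

def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    # ceil(i * n / k) - 1 is the index of the last value in class i
    return [ordered[(i * n + k - 1) // k - 1] for i in range(1, k + 1)]

def class_counts(values, breaks):
    """Count how many values fall at or below each upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        # Clamp in case floating-point rounding leaves the max above the last break
        counts[min(bisect_left(breaks, v), len(breaks) - 1)] += 1
    return counts

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
ei = equal_interval_breaks(values, 2)
qt = quantile_breaks(values, 2)
print(ei, class_counts(values, ei))
print(qt, class_counts(values, qt))
```

With the single outlier in this sample, equal-interval puts nine values in one class and one in the other, while quantile splits them five and five.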
KMeans (Centroid-based)
1. Choose random starting centroids
2. Assign each point to its nearest centroid
3. Replace each centroid with the mean of its group
4. Repeat steps 2 & 3 until convergence
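The four steps above can be sketched in plain Python as follows. This is only an illustration under simple assumptions: 2D points as tuples, squared Euclidean distance, and a fixed random seed so the run is repeatable. The `kmeans` name and its parameters are hypothetical, not from the talk.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2D points into k groups; returns (centroids, groups)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 1: random starting centroids
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for px, py in points:                      # step 2: assign to nearest centroid
            nearest = min(
                range(k),
                key=lambda c: (px - centroids[c][0]) ** 2 + (py - centroids[c][1]) ** 2,
            )
            groups[nearest].append((px, py))
        new_centroids = [                          # step 3: centroid becomes group mean
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centroids[i]                 # keep an empty group's centroid
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:             # step 4: stop when nothing moves
            break
        centroids = new_centroids
    return centroids, groups

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, groups = kmeans(points, k=2)
print(sorted(centroids))
```

For real workloads a library implementation such as the one in Scikit-Learn is likely a better choice; the loop here is only meant to make the steps concrete.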
Grid Clustering (Grid-based)
1. Overlay a mesh sized appropriately for the zoom level
2. Compare point coordinates to the mesh to create clusters
• Very common on client-side
• Can lead to undesired “Grid” effect
• Somewhat non-deterministic
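A minimal sketch of the two grid steps, assuming points arrive as (x, y, value) tuples and that `cell_size` has already been chosen for the current zoom level. The function name, the tuple layout, and the sample ratios are assumptions made for illustration. Each cluster carries a mean of the attribute as well as a count, in the spirit of the student-teacher ratio map shown earlier.

```python
import math
from collections import defaultdict

def grid_cluster(points, cell_size):
    """Group (x, y, value) points by the mesh cell they fall in."""
    cells = defaultdict(list)
    for x, y, value in points:
        # The integer cell index acts as the cluster key
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y, value))

    clusters = {}
    for key, members in cells.items():
        n = len(members)
        clusters[key] = {
            "count": n,
            "x": sum(m[0] for m in members) / n,           # mean position of members
            "y": sum(m[1] for m in members) / n,
            "mean_value": sum(m[2] for m in members) / n,  # e.g. mean ratio
        }
    return clusters

schools = [(0.25, 0.25, 14.0), (0.75, 0.75, 18.0), (1.5, 0.5, 20.0), (-0.5, 0.5, 12.0)]
clusters = grid_cluster(schools, cell_size=1.0)
for cell, summary in sorted(clusters.items()):
    print(cell, summary)
```

Placing each cluster at the mean position of its members, rather than at the cell center, is one way to soften the grid effect, though points near a cell edge are still split across neighbouring clusters.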
QuadTree (Distance-based)
http://en.wikipedia.org/wiki/QUADTREE
1. Input minimum cluster tolerance
2. Recursively insert points into existing tree
   1. Where distance < tolerance, number of points++
   2. Where distance > tolerance, insert to child node
• Easy to implement
• Can lead to “Grid” effect
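One way to read the insertion rule above is as a point quadtree in which every node is a cluster anchored at the first point that reached it. The sketch below follows that reading. It is an assumption about the approach, not the code from the talk; the class and function names are hypothetical, and it walks the tree with a loop instead of recursion.

```python
import math

class ClusterNode:
    """A cluster anchored at (x, y) with up to four child quadrants."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.count = 1
        self.children = [None, None, None, None]

def insert(root, x, y, tolerance):
    node = root
    while True:
        # Within tolerance of this cluster's anchor: absorb the point
        if math.hypot(x - node.x, y - node.y) <= tolerance:
            node.count += 1
            return
        # Otherwise descend into the quadrant relative to the anchor
        quadrant = (1 if x >= node.x else 0) + (2 if y >= node.y else 0)
        if node.children[quadrant] is None:
            node.children[quadrant] = ClusterNode(x, y)
            return
        node = node.children[quadrant]

def build_clusters(points, tolerance):
    """Return a list of (x, y, count) clusters for 2D points."""
    if not points:
        return []
    root = ClusterNode(*points[0])
    for x, y in points[1:]:
        insert(root, x, y, tolerance)
    found, stack = [], [root]
    while stack:                      # walk the tree and collect every node
        node = stack.pop()
        found.append((node.x, node.y, node.count))
        stack.extend(c for c in node.children if c is not None)
    return found

pts = [(0.0, 0.0), (0.5, 0.5), (10.0, 10.0), (10.2, 10.1), (-5.0, 3.0)]
result = sorted(build_clusters(pts, tolerance=1.0))
print(result)
```

Because a point only meets the anchors along its own path down the tree, the result depends on insertion order, and two nearby points can land in different clusters when an anchor boundary runs between them.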
http://en.wikipedia.org/wiki/DBSCAN
DBSCAN (Density-based)
1. Takes search radius and minimum number of points for cluster
2. Visit each point and count number of points in search radius
• Clusters can be any shape
• Search radius determined by zoom level
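The two inputs and the visiting step can be sketched in plain Python as below. This is a simplified, brute-force illustration with a neighbour search that scans every point, assuming 2D tuples and assuming that the minimum count includes the point itself. The names `dbscan`, `eps`, and `min_pts` are chosen here for illustration.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nearby = neighbors(i)
        if len(nearby) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nearby if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby_j = neighbors(j)
            if len(nearby_j) >= min_pts:
                queue.extend(nearby_j)   # j is also core: keep expanding
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
print(labels)
```

In a web map, `eps` would be derived from the zoom level, for example as a fixed number of screen pixels converted to map units. A spatial index would replace the brute-force neighbour scan for larger datasets.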
Strategies & Implementations for Web Apps (Server Object Extension vs. Pre-Crunched)
Where should clustering occur?
Client-side
• Small number of points ( < 10,000 )
• No additional server load
• Widely available within client APIs
• Limited by client-side languages
Server-side
• Medium number of points ( < 1M )
• Many language/library options
• Robust querying
• Very maintainable / extensible
Offline
• Large number of points ( > 1M )
• Many language/library options
• Limited querying
• Output is a normal Feature Class
Clustering Server Object Extension (C#/QuadTree)
1. Extends MapServer
2. Wraps map query based on extent
3. Returns clustered results
4. Stateless
5. Problems
   1. Re-calculates tree on each request
   2. Client-side wrappers
   3. Lost out-of-box ArcGIS Server functions
Clustering with Arcpy (distance-based / offline)
1. Divide data into logical chunks (where clause)
2. Integrate using tolerance
3. Collect Events
4. Spatial Join to add descriptive statistics
5. Append all results
Clustering w/Python
• Numpy/Scipy
– De facto standard
• Scikit-Learn
– Python machine learning library
• PyTables
– HDF5, akin to NetCDF, but with support for hierarchical tables and very scalable
– http://bcdcspatial.blogspot.com/2013/02/converting-arcgis-feature-class-to.html
Scikit-Learn
Scikit-Learn… btw it’s awesome - http://scikit-learn.org/stable/
Bleeding Edge Python
• PyPy, Cython, Anaconda, NumbaPro, Pandas
• Python is now a first-class citizen on the GPU!
In Summary:
• Clustering is not Panning
• Think outside Count
• Clustering is not only for spatial data
Thank You!
Follow us on Twitter: @blueraster, @brendancol
Visit us at: blueraster.com/blog, bcdcspatial.blogspot.com