Developing a Tutorial for Grouping Analysis in ArcGIS


This presentation describes tools and possible workflows using the Grouping Analysis tool in ArcGIS. The tutorial developed from this material highlights practical usage of Grouping Analysis with additional tools to solve real-world problems in two scenarios and is suitable for ArcGIS users at any level of experience. The tutorial was produced as a Major Research Project in GIS for Business at the Centre of Geographic Sciences, sponsored by Esri.

Transcript of Developing a Tutorial for Grouping Analysis in ArcGIS

  • Developing a Tutorial for Grouping Analysis in ArcGIS. Daniel Pierre, May 29, 2014.
  • Presentation Outline: 1. Introduction; 2. Data; 3. Grouping Analysis Workflows; 4. Tutorial Exercises; 5. Conclusions: Recommendations.
  • Introduction: Project Sponsor & Supervisors. Lauren Rosenshein Bennett, MS, Geoprocessing Product Engineer, Esri; Dr. Konrad Dramowicz, Faculty, Centre of Geographic Sciences; Dr. Ela Dramowicz, Faculty, Centre of Geographic Sciences.
  • Introduction: Project Overview. Experimental testing of the tool with multiple datasets; incorporation of Grouping Analysis with other tools; review of technical literature on clustering algorithms; review of existing tutorials.
  • Introduction: Grouping Analysis Tool. Introduced at ArcGIS 10.1; available with Basic, Standard and Advanced license levels; found in the Spatial Statistics toolbox, within the Mapping Clusters toolset; script tool.
  • Introduction: Grouping Analysis Tool. "...Performs a classification procedure that tries to find natural clusters in your data." (Esri). An aid for data comprehension; feature similarity is based on attributes specified as analysis fields and, optionally, spatial constraints; given a number of groups, features within each output group are as similar as possible while the groups themselves are as different as possible.
  • Introduction: Grouping Analysis Tool. Two algorithm types: cluster analysis (traditional K-means) and regionalization (spatial K-means); thirteen parameters (six required); grouping results are contingent on the number of groups, the analysis fields, and the type of spatial constraint.
  • Data: Data Sources. Features: Esri, City of Vancouver. Multivariate data: World Bank, BBC, Weatherbase, Statistics Canada.
  • Data: Data Preparation. Data Enrichment (ArcGIS Online); HTML table import; spreadsheet reformatting; table joins; feature class edits.
  • Data: Tutorial Datasets. Selection criteria: two scales of analysis; illustration of various spatial constraint effects on results; sufficient number of features; visible spatial patterns in results.
  • Grouping Analysis Workflows. General steps: exploratory data analysis; preprocessing; determining appropriate Grouping Analysis settings; postprocessing, interpretation and evaluation of results.
  • Exploratory Data Analysis. 1. Distribution of variable values: thematic mapping, spatial autocorrelation. 2. Spatial relationships among features: contiguity of features and number of neighbours, spatial autocorrelation.
  • Exploratory Data Analysis: Thematic Mapping. Explore distribution of dataset variables; choropleth maps and graduated symbol maps; identify the set of variables to be used for Grouping Analysis.
  • Exploratory Data Analysis: Spatial Relationships. Analyze contiguity relationships among features (Polygon Neighbors tool); determine relative connectivity of features by counting the number of neighbours (Frequency tool).
  • Exploratory Data Analysis: Alternative Approach. Analyze contiguity and/or proximity relationships among features using GeoDa; create spatial weights; display a histogram of feature connectivity according to the defined spatial relationships; the histogram is linked to the map and attribute table.
  • Exploratory Data Analysis: Spatial Autocorrelation. Considers attribute values and location of features simultaneously; the Moran's I statistic determines whether the spatial pattern of values is dispersed, random or clustered; significance of the pattern is evaluated with the corresponding z-score; one variable at a time.
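
To make the statistic on this slide concrete, here is a minimal sketch of global Moran's I in plain Python. It is not the ArcGIS Spatial Autocorrelation tool: the function name `morans_i` and the dense neighbour matrix `weights` are assumptions chosen for illustration, and the z-score used to judge significance is deliberately left out.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable.

    values  : list of attribute values, one per feature
    weights : square list of lists; weights[i][j] > 0 when features i and j
              are neighbours (hypothetical input format for this sketch)
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    # S0 is the sum of all spatial weights.
    s0 = sum(sum(row) for row in weights)
    # Cross-products of deviations for every neighbouring pair.
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    if s0 == 0 or denominator == 0:
        raise ValueError("need at least one neighbour pair and non-constant values")

    return (n / s0) * (numerator / denominator)
```

Values above the expected value of -1/(n - 1) suggest clustering and values below it suggest dispersion; whether the pattern is significant still depends on the z-score that the slide mentions.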
  • Preprocessing. Use hot spots to limit the study area for Grouping Analysis: calculate incremental spatial autocorrelation; identify the distance band of most intense clustering; create a hot spot map; select features from the original dataset based on the location of hot spots.
  • Grouping Analysis Settings: Key Considerations. 1. How many groups should be created? 2. Which analysis fields should be used? 3. Is a spatial constraint necessary? If so, which type is appropriate?
  • Grouping Analysis Settings: Number of Groups. Default number is 2; Sturges' rule: C = 1 + 3.3 log10(n), where C is the number of groups and n is the number of features; evaluate the optimal number of groups (up to a maximum of 15).
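
As a worked illustration of Sturges' rule from this slide, the sketch below computes a starting value for the number of groups. The function name `sturges_group_count` is hypothetical, and rounding the result and clamping it between the default of 2 and the maximum of 15 are assumptions of this sketch rather than part of the rule or of the tool.

```python
import math


def sturges_group_count(n_features, minimum=2, maximum=15):
    """Suggest a starting number of groups using C = 1 + 3.3 * log10(n)."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    raw = 1 + 3.3 * math.log10(n_features)
    # Assumption: round to a whole number and keep it within the 2 to 15
    # range mentioned on the slide.
    return max(minimum, min(maximum, round(raw)))
```

For 100 features the rule gives 1 + 3.3 × 2 = 7.6, which this sketch rounds to 8. The result is only a starting point; the slide still recommends evaluating the optimal number of groups.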
  • Grouping Analysis Settings: Two vs. Three Groups.
  • Grouping Analysis Settings: Analysis Fields. Generally driven by the research purpose and objectives of grouping; guide selection of analysis fields with exploratory data analysis findings; spatial variables may be used as indirect spatial constraints; assess the effectiveness of fields to distinguish features with the output report.
  • Grouping Analysis Settings. Temperature: Spatial Variable.
  • Grouping Analysis Settings: Spatial Constraints. The choice of a spatial constraint or no spatial constraint determines which algorithm is used for grouping. No spatial constraint: traditional K-means (data space only). Any spatial constraint: Spatial Kluster Analysis by Tree Edge Removal (SKATER) method (spatial K-means).
  • Grouping Analysis Settings: No Spatial Constraint vs. Spatial Constraint.
  • Grouping Analysis Settings: Spatial Constraint Types. Contiguity: edges only (rook type) or edges and corners (queen type). Delaunay triangulation: contiguity of representations of features as Voronoi polygons. Proximity: K nearest neighbours. Spatial weights.
  • Grouping Analysis Settings: Iterative Process for Optimizing Grouping Analysis. Evaluate the optimal number of groups; guide selection of analysis fields with calculated R² values; visually assess results of the specified spatial constraint.
  • Interpretation & Evaluation: Interpretation of Results. Spatial distribution of groups (map); global statistics (output report); group and variable statistics (output report); group profiles.
  • Interpretation & Evaluation: Group Profiles. Compare group means with each other and with the global range.
  • Interpretation & Evaluation: Group Profiles (2). Compare group means and ranges for each variable.
  • Interpretation & Evaluation: Group Profiles (3). Consider the global mean, median and range for each variable.
  • Interpretation & Evaluation: Evaluation of Results (Spatial Autocorrelation). Global Moran's I statistic; determine the spatial pattern of group membership (dispersed, random or clustered); measure spatial compactness of group membership; clustered groups generally desired.
  • Interpretation & Evaluation: Evaluation of Results (Cluster Size Ratio). Ratio of the smallest to the largest group; indicator of balance in group membership; a balanced number of group members is generally desired for comparison of statistics; Frequency tool.
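
The cluster size ratio can be reproduced outside ArcGIS once the group assignments are exported. A minimal sketch, assuming the group ID of each feature is available as a plain list (the function name `cluster_size_ratio` is hypothetical):

```python
from collections import Counter


def cluster_size_ratio(group_ids):
    """Return (size of smallest group) / (size of largest group)."""
    counts = Counter(group_ids)
    if not counts:
        raise ValueError("group_ids must not be empty")
    sizes = counts.values()
    return min(sizes) / max(sizes)
```

A ratio close to 1 indicates balanced group membership, while a ratio close to 0 indicates that at least one group is much smaller than the largest one.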
  • Interpretation & Evaluation: Evaluation of Results (Silhouette). Goodness measure that combines the concepts of cohesion and separation; adapted from cluster analysis to consider attribute data and location; the silhouette coefficient is calculated for every feature and the average is taken for the entire dataset.
  • Interpretation & Evaluation: Silhouette Coefficient. (B - A) / max(A, B), where A is the distance between a feature and its group center and B is the distance between the feature and its neighbouring group center.
  • Interpretation & Evaluation: Silhouette Coefficient Values. Range between -1 (poor) and 1 (excellent); < 0.2 indicates poor clustering; > 0.5 indicates good partition of the data.
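
The center-based silhouette defined on the previous two slides can be sketched as follows. This is an illustration rather than the tutorial's implementation: the function name `silhouette_by_centers` is hypothetical, each feature is assumed to be a tuple of already standardized values (attributes, optionally with coordinates appended), and the "neighbouring group center" is interpreted here as the nearest center of any other group.

```python
import math


def silhouette_by_centers(vectors, group_ids):
    """Return (mean coefficient, per-feature coefficients) using (B - A) / max(A, B)."""
    # Collect the members of each group.
    members = {}
    for vec, gid in zip(vectors, group_ids):
        members.setdefault(gid, []).append(vec)
    if len(members) < 2:
        raise ValueError("at least two groups are required")

    # Group center = mean of each dimension over the group's members.
    centers = {
        gid: [sum(col) / len(vecs) for col in zip(*vecs)]
        for gid, vecs in members.items()
    }

    scores = []
    for vec, gid in zip(vectors, group_ids):
        a = math.dist(vec, centers[gid])  # distance to own group center
        # Assumption: neighbouring group = nearest center of another group.
        b = min(math.dist(vec, c) for other, c in centers.items() if other != gid)
        largest = max(a, b)
        scores.append((b - a) / largest if largest > 0 else 0.0)

    return sum(scores) / len(scores), scores
```

With two compact, well separated groups the mean coefficient approaches 1, and when the group labels cut across the natural clusters it drops toward 0, which matches the thresholds given on the slide.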
  • Tutorial Exercises: Grouping Analysis Tutorial. Six exercises; two scenarios (3 exercises for each); suitable for users at all levels of experience; exercises take the user through the steps of preprocessing, group creation, interpretation and evaluation of results outlined here.