A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration
Jinwook Seo and Ben Shneiderman
Human-Computer Interaction Lab. & Department of Computer Science
University of Maryland, College Park
Hierarchical Clustering Explorer (HCE)
“HCE enabled us to find important clusters that we don’t know about yet.”
Goal: Find Interesting Features in Multidimensional Data
• Finding correlations, clusters, outliers, gaps, … is difficult in multidimensional data
– Cognitive difficulties in >3D
• Therefore utilize low-dimensional projections
– Perceptual efficiency in 1D and 2D
– Use Rank-by-Feature Framework to guide discovery
Do you see anything interesting?
Do you see any interesting feature?
[Figure: scatter plot; x-axis: Ionization Energy]
Correlation… What else?
[Figure: scatter plot; x-axis: Ionization Energy]
Outliers
[Figure: scatter plot; x-axis: Ionization Energy; labeled points: He, Rn]
Demonstration
• Breakfast Cereals
– 77 cereals
– 8 dimensions (or variables): sugar, potassium, fiber, protein, etc.
• US counties census data
– 3138 counties
– 14 dimensions: population density, poverty level, unemployment, etc.
Low-dimensional Projections
• Techniques
– General: a combination of variables for an axis (e.g., X1+2X2 vs. -2X1+X2)
– Axis-parallel: a variable for an axis (e.g., X1 vs. X3)
• Number of projections
• Interface for Exploration
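To make the "number of projections" point concrete, the following is a minimal sketch that enumerates the axis-parallel projections of a dataset. The function name and the placeholder variable names X1..X8 are assumptions for illustration; the count of 8 variables is taken from the cereal demonstration slide.

```python
from itertools import combinations
from math import comb


def axis_parallel_projections(variables):
    """Return all 1D and 2D axis-parallel projections of the given variables.

    A 1D projection is a single variable (shown as a histogram);
    a 2D projection is an unordered pair of variables (shown as a scatterplot).
    """
    one_d = list(variables)
    two_d = list(combinations(variables, 2))
    return one_d, two_d


# Hypothetical variable names for a dataset with 8 dimensions.
variables = [f"X{i}" for i in range(1, 9)]
one_d, two_d = axis_parallel_projections(variables)

# n variables give n 1D projections and n*(n-1)/2 2D projections.
print(len(one_d), len(two_d))
```

The number of 2D projections grows quadratically with the number of dimensions, which is why an overview and a ranking of the projections become useful.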
Exploration by Projections
• XGobi, GGobi – Scatterplot Browsing
www.ggobi.org
www.research.att.com/areas/stat/xgobi/
Exploration by Projections
• Spotfire DecisionSite – Scatterplots
www.spotfire.com
Exploration by Projections
• XGobi, GGobi – Grand Tour
Exploration by Projections
• XmdvTool – Scatterplot Matrix
Worcester Polytechnic Institute
Dimension selection tool
Corrgram by Michael Friendly
Square Matrix Display in GeoVISTA Studio by Alan M. MacEachren
Exploration by Projections
• Spotfire DecisionSite – View Tip orders scatterplots
Design Considerations
• Hard to interpret arbitrary linear projections
– Axis-parallel projections
• Interestingness depends on applications
– Incorporate users’ interest
• Overview of all possible projections
• Rapid change of axis
Demonstration
• Breakfast Cereals
– 77 cereals
– 11 dimensions (or variables): sugar, potassium, fiber, protein, etc.
• US counties census data
– 3138 counties
– 14 dimensions: population density, poverty level, unemployment, etc.
Rank-by-Feature Framework: 1D
Ranking Criterion
Rank-by-Feature Prism
Score List
Manual Projection Browser
Rank-by-Feature Framework: 2D
Ranking Criterion
Rank-by-Feature Prism
Score List
Manual Projection Browser
Ranking Criterion: Pearson correlation (0.996, 0.31, 0.01, -0.69)
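As a rough illustration of ranking 2D projections by Pearson correlation, the sketch below scores every pair of columns and sorts the pairs by score. It is not HCE's implementation; the function names, the column names a, b, c, and the toy values are all made up for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant column has no defined correlation; score it 0
    return cov / (dev_x * dev_y)


def rank_pairs_by_correlation(columns):
    """Score every 2D axis-parallel projection and sort from highest to lowest."""
    scored = [
        ((a, b), pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data with hypothetical column names (not from the cereal or census data).
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
}
ranked = rank_pairs_by_correlation(columns)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Sorting by the signed score puts strong positive correlations first and strong negative ones last, the same ordering as the example scores on the slide.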
Ranking Criterion: Uniformity (entropy) (6.7, 6.1, 4.5, 1.5)
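One way to read the uniformity criterion is as the Shannon entropy of the point counts in a grid laid over the scatterplot. The sketch below follows that reading. The equal-width binning, the bins parameter, and the function name are assumptions, not details taken from the slides.

```python
from math import log2


def grid_entropy(xs, ys, bins=4):
    """Entropy (in bits) of a scatterplot's point counts over a bins x bins grid.

    Higher entropy means the points are spread more uniformly over the grid.
    """

    def bin_index(value, low, high):
        if high == low:
            return 0  # all values identical: everything falls in one bin
        # Scale to [0, bins) and clamp the maximum value into the last bin.
        return min(int((value - low) / (high - low) * bins), bins - 1)

    x_low, x_high = min(xs), max(xs)
    y_low, y_high = min(ys), max(ys)

    counts = {}
    for x, y in zip(xs, ys):
        cell = (bin_index(x, x_low, x_high), bin_index(y, y_low, y_high))
        counts[cell] = counts.get(cell, 0) + 1

    total = len(xs)
    # Adding 0.0 turns a possible -0.0 into 0.0 for single-cell plots.
    return -sum((c / total) * log2(c / total) for c in counts.values()) + 0.0


# Four points, one in each cell of a 2 x 2 grid: maximally uniform.
uniform_score = grid_entropy([0, 0, 1, 1], [0, 1, 0, 1], bins=2)
# Three points stacked in one cell and one point in another: less uniform.
clustered_score = grid_entropy([0, 0, 0, 1], [0, 0, 0, 1], bins=2)
print(uniform_score, clustered_score)
```

With this definition the maximum score is log2(bins * bins), so scores are only comparable between projections that use the same grid size.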
A Ranking Example
3138 U.S. counties with 17 attributes
Ongoing and Future Work
• Identify & implement more ranking criteria
– Gaps, outliers, etc.
• Ranking based on users’ selection of items
– Separability of the selected items
– Ranking by using only the selected items
• Scalability Issue
– How to handle a large number of dimensions
– Grouping by clustering dimensions
– Filtering uninteresting entries in the prism
More about HCE
• In collaboration with and sponsored by Eric Hoffman, Children’s National Medical Center
• Freely downloadable at www.cs.umd.edu/hcil/hce
• Version 3.0 beta, May 2004
• About 2,000 downloads since April 2002
• Licensing to ViaLactia Biosciences (NZ) Ltd.
More Applications?
• Try HCE and the Rank-by-Feature Framework with your problems and data
• Join the case studies on the use of HCE and the Rank-by-Feature Framework
• Suggestions and comments are welcome
Thank you!