A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Page 1: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Jinwook Seo and Ben Shneiderman
Human-Computer Interaction Lab. & Department of Computer Science
University of Maryland, College Park

Page 2: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Hierarchical Clustering Explorer (HCE)

Page 3: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Hierarchical Clustering Explorer (HCE)

“HCE enabled us to find important clusters that we don’t know about yet.”

Page 4: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Goal: Find Interesting Features in Multidimensional Data

• Finding correlations, clusters, outliers, gaps, … is difficult in multidimensional data
  – Cognitive difficulties in >3D
• Therefore utilize low-dimensional projections
  – Perceptual efficiency in 1D and 2D
  – Use Rank-by-Feature Framework to guide discovery

Page 5: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Do you see anything interesting?

Page 6: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Do you see any interesting feature?

[Figure: scatter plot; one axis is labeled Ionization Energy]

Page 7: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Correlation… What else?

[Figure: scatter plot; one axis is labeled Ionization Energy]

Page 8: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Outliers

[Figure: scatter plot; one axis is labeled Ionization Energy; the points He and Rn are labeled]

Page 9: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Demonstration

• Breakfast Cereals
  – 77 cereals
  – 8 dimensions (or variables): sugar, potassium, fiber, protein, etc.
• US counties census data
  – 3138 counties
  – 14 dimensions: population density, poverty level, unemployment, etc.

Page 10: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Low-dimensional Projections

• Techniques
  – General: combination of variables for an axis (e.g., X1+2X2 vs. -2X1+X2)
  – Axis parallel: a variable for an axis (e.g., X1 vs. X3)
• Number of projections
• Interface for Exploration
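The two projection techniques, and the number of axis-parallel projections a dataset offers, can be sketched in a few lines of Python. This is only an illustration, not code from HCE: the function names and the sample point are hypothetical, and the weights simply mirror the axis labels on the slide (X1+2X2, -2X1+X2, X1, X3).

```python
from itertools import combinations


def count_axis_parallel(n_dims):
    """Return (number of 1D projections, number of 2D projections)."""
    return n_dims, n_dims * (n_dims - 1) // 2


def axis_parallel_pairs(names):
    """All unordered pairs of variables, i.e. one scatterplot per pair."""
    return list(combinations(names, 2))


def project(point, x_weights, y_weights):
    """General linear projection: each axis is a weighted sum of variables."""
    x = sum(w * v for w, v in zip(x_weights, point))
    y = sum(w * v for w, v in zip(y_weights, point))
    return x, y


# Hypothetical item with three variables X1, X2, X3.
point = (1.0, 2.0, 3.0)

# General projection: axes X1+2X2 and -2X1+X2.
general = project(point, (1, 2, 0), (-2, 1, 0))

# Axis-parallel projection: axes X1 and X3 (one variable per axis).
axis_parallel = project(point, (1, 0, 0), (0, 0, 1))

# A dataset with 14 dimensions has 14 histograms and 91 scatterplots.
n_1d, n_2d = count_axis_parallel(14)
```

An axis-parallel projection is the special case in which each weight vector selects exactly one variable, which is why the set of such projections is finite and can be enumerated and ranked.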

Page 11: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Exploration by Projections

• XGobi, GGobi – Scatterplot Browsing

www.ggobi.org
www.research.att.com/areas/stat/xgobi/

Page 12: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Exploration by Projections

• Spotfire DecisionSite – Scatterplots

www.spotfire.com

Page 13: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Exploration by Projections

• XGobi, GGobi – Grand Tour

Page 14: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Exploration by Projections

• XmdvTool – Scatterplot Matrix

Worcester Polytechnic Institute

Page 15: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Dimension selection tool

Corrgram by Michael Friendly

Square Matrix Display in GeoVISTA studio by Alan M. MacEachren

Page 16: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Exploration by Projections

• Spotfire DecisionSite
  – View Tip orders scatterplots

Page 17: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Design Considerations

• Hard to interpret arbitrary linear projections
  – Axis-parallel projections
• Interestingness depends on applications
  – Incorporate users’ interest
• Overview of all possible projections
• Rapid change of axis

Page 18: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Demonstration

• Breakfast Cereals
  – 77 cereals
  – 11 dimensions (or variables): sugar, potassium, fiber, protein, etc.
• US counties census data
  – 3138 counties
  – 14 dimensions: population density, poverty level, unemployment, etc.

Page 19: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Rank-by-Feature Framework: 1D

• Ranking Criterion
• Rank-by-Feature Prism
• Score List
• Manual Projection Browser
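The idea behind the 1D score list (pick a ranking criterion, score every dimension, sort) might be sketched as follows. This is a hedged illustration rather than HCE's implementation: the deck names uniformity (entropy) as a ranking criterion, but the equal-width binning, the bin count of 8, the base-2 logarithm, and the toy columns below are all assumptions.

```python
import math


def histogram_entropy(values, bins=8):
    """Entropy (in bits) of an equal-width histogram; higher is more uniform."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # every value falls into the same bin
    counts = [0] * bins
    for v in values:
        # Map the value to a bin index; the maximum goes into the last bin.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def rank_1d(columns, criterion):
    """Score every dimension and return (name, score) pairs, best first."""
    scores = [(name, criterion(vals)) for name, vals in columns.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one evenly spread column, one heavily skewed column.
columns = {
    "even": [0, 1, 2, 3, 4, 5, 6, 7],
    "skewed": [0, 0, 0, 0, 0, 0, 0, 7],
}

score_list = rank_1d(columns, histogram_entropy)
for name, score in score_list:
    print(f"{name}: {score:.3f}")
```

Passing the criterion in as a function keeps the ranking step independent of any particular feature, so another 1D criterion can be swapped in without touching `rank_1d`.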

Page 20: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Rank-by-Feature Framework: 2D

• Ranking Criterion
• Rank-by-Feature Prism
• Score List
• Manual Projection Browser

Page 21: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

A Ranking Example

3138 U.S. counties with 17 attributes

• Ranking Criterion: Pearson correlation (0.996, 0.31, 0.01, -0.69)
• Ranking Criterion: Uniformity (entropy) (6.7, 6.1, 4.5, 1.5)
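The two criteria in this example could be computed for every pair of dimensions roughly as below. This is a sketch under stated assumptions, not the HCE code: the Pearson coefficient is the standard formula, but the grid size for the 2D entropy, the base-2 logarithm, and the toy columns are assumptions, so the output is not expected to reproduce the scores shown on the slide. Pairs are sorted by raw score, highest first, which matches the descending order of the slide's values.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def grid_entropy(xs, ys, bins=4):
    """Entropy (bits) of point counts over a bins x bins grid of cells."""
    def cell(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    counts = {}
    for x, y in zip(xs, ys):
        key = (cell(x, x_lo, x_hi), cell(y, y_lo, y_hi))
        counts[key] = counts.get(key, 0) + 1
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def rank_2d(columns, criterion):
    """Score every pair of dimensions and return ((a, b), score), best first."""
    scores = [((a, b), criterion(columns[a], columns[b]))
              for a, b in combinations(columns, 2)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy columns standing in for dataset attributes.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}

by_correlation = rank_2d(columns, pearson)
by_uniformity = rank_2d(columns, grid_entropy)
```

Each ranked list corresponds to one score list; choosing a different criterion reorders the same set of scatterplots.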

Page 22: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Ongoing and Future Work

• Identify & implement more ranking criteria
  – Gaps, outliers, etc.
• Ranking based on users’ selection of items
  – Separability of the selected items
  – Ranking by using only the selected items
• Scalability Issue
  – How to handle a large number of dimensions
  – Grouping by clustering dimensions
  – Filtering uninteresting entries in the prism
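As one possible reading of the "gaps, outliers" item, two additional 1D criteria might look like the sketch below. The slides do not define these criteria, so both definitions are assumptions: the gap score is the largest distance between neighboring sorted values relative to the range, and the outlier score counts values outside the 1.5 × IQR fences (Tukey's rule). The sample values are hypothetical.

```python
import statistics


def biggest_gap(values):
    """Largest gap between consecutive sorted values, as a fraction of the range."""
    s = sorted(values)
    span = s[-1] - s[0]
    if span == 0:
        return 0.0
    return max(b - a for a, b in zip(s, s[1:])) / span


def outlier_count(values, k=1.5):
    """Number of values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


# Hypothetical column with one extreme value.
values = [1, 2, 3, 4, 5, 6, 7, 100]

gap_score = biggest_gap(values)
n_outliers = outlier_count(values)
```

Either function takes a single column and returns a number, so it could serve as the criterion in a 1D ranking step.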

Page 23: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

More about HCE

• In collaboration with and sponsored by Eric Hoffman: Children’s National Medical Center
• Freely downloadable at www.cs.umd.edu/hcil/hce
• Version 3.0 beta, May 2004
• About 2,000 downloads since April 2002
• Licensing to ViaLactia Biosciences (NZ) Ltd.

Page 24: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

More Applications?

• Try HCE and the Rank-by-Feature Framework with your problems and data

• Join the case studies on the use of HCE and the Rank-by-Feature Framework

• Welcome suggestions and comments

Page 25: A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Thank you!