Interactive Visual Exploration of High Dimensional ...


Page 1: Interactive Visual Exploration of High Dimensional ...


Interactive Visual Exploration of High Dimensional Datasets

Jing Yang

Spring 2010

Challenges of High Dimensional Datasets

High dimensional datasets are common: digital libraries, bioinformatics, simulations, process monitoring, and surveys

Examples: an image analysis dataset has tens to hundreds of dimensions; a newswire dataset has hundreds to thousands of dimensions

Challenges of visualizing high dimensional datasets: clutter on the screen; difficult user navigation in the data space


Example

OHSUMED dataset: 215 dimensions

Scatterplot matrix: 215 × 215 = 46,225 plots; parallel coordinates: 215 axes

Visual Hierarchical Dimension Reduction (VHDR)

J. Yang, M.O. Ward, E.A. Rundensteiner and S. Huang


VisSym’03


Motivation - Dimension Reduction

Idea: project a high-dimensional dataset to a lower-dimensional subspace, then visualize the data items in that lower-dimensional subspace

Existing approaches: Principal Component Analysis, Multidimensional Scaling, Kohonen’s Self-Organizing Map

Problems: information loss; no intuitive meaning of the generated dimensions; little user interaction allowed
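The sketch below illustrates PCA, the first existing approach listed above. It is a minimal NumPy example that assumes the data is an items × dimensions array, and the function name `pca_project` is hypothetical. It also shows why the generated dimensions have no intuitive meaning: each output column is a weighted mix of every input dimension.

```python
import numpy as np

def pca_project(data: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of `data` (items x dimensions) onto the top-k principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Each new coordinate is a weighted sum of *all* original dimensions.
    return centered @ vt[:k].T

rng = np.random.default_rng(0)
# 200 items in 10 dimensions that mostly vary along 2 hidden directions.
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
data = hidden @ mixing + 0.01 * rng.normal(size=(200, 10))

projected = pca_project(data, 2)
# Fraction of the total variance that survives the projection.
retained = projected.var(axis=0).sum() / data.var(axis=0).sum()
```

`retained` is the share of variance kept. Whatever is not kept is the information loss mentioned above.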

Inspiration

Hierarchical parallel coordinates: data item hierarchy

Key Ideas of VHDR

Use a dimension hierarchy to convey dimension relationships

Allow users to learn the dimension hierarchy

Allow users to select dimensions or dimension clusters to form subspaces of interest

Dimension Hierarchy

Similar dimensions form clusters, and clusters are grouped into larger clusters

Figure: a dimension hierarchy of a 5-D dataset


VHDR Framework

Step 1: build the dimension hierarchy

Step 2: navigate and manipulate the dimension hierarchy

Step 3: interactively select clusters from the dimension hierarchy to form lower-dimensional subspaces

Overview

Build Dimension Hierarchy

Automatic dimension clustering: cluster dimensions according to the dissimilarities among them, where dissimilarity is a measure of how different two dimensions are from each other

Manual hierarchy modification

Discussion: How should the dissimilarity between two dimensions be calculated?

Ref: Ankerst, M., Berchtold, S., and Keim, D. A. Similarity clustering of dimensions for an enhanced visualization of multidimensional data. InfoVis’98
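The slides leave the dissimilarity measure open. The sketch below makes one common choice, 1 − |Pearson r|, so that strongly correlated or anti-correlated dimensions count as similar. It then builds the hierarchy with average-linkage agglomerative clustering. Both choices are assumptions for illustration, not necessarily what VHDR uses, and all names are hypothetical.

```python
import math
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equally long value lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a constant dimension is treated as uncorrelated
    return cov / (sa * sb)

def dissimilarity(a, b):
    # Assumed measure: 1 - |r|, so strongly (anti-)correlated dimensions are similar.
    return 1.0 - abs(pearson(a, b))

def build_hierarchy(columns):
    """Average-linkage agglomerative clustering of dimensions.

    `columns` maps a dimension name to its list of values. Returns a nested
    tuple tree whose leaves are dimension names."""
    names = list(columns)
    d = {(p, q): dissimilarity(columns[p], columns[q]) for p in names for q in names}
    clusters = [(name, [name]) for name in names]  # (subtree, member names)

    def linkage(c1, c2):
        return mean(d[(p, q)] for p in c1[1] for q in c2[1])

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # perfectly correlated with a
    "c": [1, 0, 1, 0, 1],    # uncorrelated with a and b
}
tree = build_hierarchy(columns)   # ("c", ("a", "b"))
```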

Navigate and Manipulate Dimension Hierarchy

InterRing: a radial, space-filling hierarchy navigation tool [yang:2002]

Supported operations: modification; selection; radius distortion; circular distortion; rolling up/drilling down; rotation; zooming/panning

Construct Lower-Dimensional Subspaces


Strategy 1: construct a subspace with closely related dimensions

Construct Lower-Dimensional Subspaces

Strategy 2: construct a subspace that covers most of the variance of the dataset

Dimension Cluster Representation

Representative Dimension - a dimension that represents a cluster of dimensions

Approaches to assigning or generating a representative dimension:

1. Select a dimension from the cluster
2. Average all dimensions in the cluster
3. Use principal component analysis to generate a weighted sum of the dimensions within a cluster
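The three approaches above can be sketched with NumPy as follows. The sketch assumes the member dimensions of a cluster are already normalized to a comparable scale, since averaging or weighting unnormalized dimensions would let one dimension dominate. The function name and the `method` strings are hypothetical.

```python
import numpy as np

def representative(cluster: np.ndarray, method: str, pick: int = 0) -> np.ndarray:
    """One column that stands in for a dimension cluster.

    `cluster` is an items x member-dimensions array, assumed already normalized."""
    if method == "select":        # approach 1: use one member dimension as-is
        return cluster[:, pick]
    if method == "average":       # approach 2: mean of all member dimensions
        return cluster.mean(axis=1)
    if method == "pca":           # approach 3: weights from the first principal component
        centered = cluster - cluster.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        weights = vt[0]
        if weights.sum() < 0:     # a principal axis has an arbitrary sign; fix it
            weights = -weights
        return cluster @ weights  # weighted sum of the member dimensions
    raise ValueError(f"unknown method: {method}")

cluster = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9]])
avg = representative(cluster, "average")   # [1.5, 3.05, 4.45]
```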

Examples

Figure panels: representative dimension from the "average" approach; representative dimension from the "select" approach

Dissimilarity Representation

Goal: visualize the dissimilarity of the dimensions within a dimension cluster

Approaches: axis width; three axes; diagonal plots; outer and inner sticks; mean-band

Example: three dimension clusters (dimensions) selected from the Census-Income dataset

Dissimilarity Representation: Axis Width Method

Dissimilarity Representation: Three Axis Method

Dissimilarity Representation: Diagonal Plots Method

Figure panels: no dissimilarity representation; dissimilarities represented in diagonal plots

Generality

VHDR is a general framework that can be extended to multiple display techniques

We have applied VHDR to: parallel coordinates, star glyphs, scatterplot matrices, and dimensional stacking, as well as their hierarchical counterparts (hierarchical parallel coordinates, hierarchical star glyphs, hierarchical scatterplot matrices, and hierarchical dimensional stacking)

Other Clustering Approach

Visualization of Large-Scale Customer Satisfaction Surveys Using a Parallel Coordinate Tree, D. Brodbeck et al., InfoVis 2003

Interactive Hierarchical Dimension Ordering, Spacing and Filtering for Exploration of High Dimensional Datasets

Jing Yang, Wei Peng, Matthew O. Ward and Elke A. Rundensteiner

InfoVis’03

Motivation

A large number of dimensions needs to be managed: ordering, spacing, filtering, etc.

Overview

General: includes dimension ordering, dimension spacing, and dimension filtering

Interactive: allows user interactions throughout the whole process

Hierarchical: groups dimensions into a hierarchy and builds most algorithms and user interactions upon this hierarchy to increase scalability

Dimension Ordering (1)

Figure: random order

Dimension Ordering (2)

Figure: ordered by similarity

Dimension Ordering (3)

Order dimensions for different purposes:

Similarity-oriented ordering: put similar dimensions close to each other

Importance-oriented ordering: map more important dimensions to more significant positions or attributes. The order of importance can be decided by Principal Component Analysis (PCA)

Dimension Ordering (4)

Challenges for ordering high dimensional datasets: similarity-oriented ordering is NP-complete, and it is hard to decide the order of importance of a large number of dimensions using PCA

Our solution: reduce the complexity of the ordering problem using the dimension hierarchy. Order the members of each dimension cluster; the overall order of the dimensions is then decided by a depth-first traversal of the dimension hierarchy
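The hierarchical ordering idea can be sketched as follows, under stated assumptions. Inside each cluster, the children are chained greedily, with each next child being the one nearest to the last one placed. Reading the leaves depth-first then gives the global order. The greedy chain, the average-pair cluster distance, and all names are illustrative choices, not the paper's exact algorithm. The point is that each ordering step only handles the few children of one cluster, never all dimensions at once.

```python
def order_dimensions(tree, dissim):
    """Order dimensions by a depth-first traversal of a dimension hierarchy.

    `tree` is a leaf (dimension name) or a tuple/list of subtrees;
    `dissim(a, b)` returns the dissimilarity between two dimensions."""
    if isinstance(tree, str):
        return [tree]
    # Order inside every child cluster first (recursively).
    children = [order_dimensions(child, dissim) for child in tree]

    def gap(x, y):
        # Assumed cluster-to-cluster distance: average dissimilarity over member pairs.
        return sum(dissim(a, b) for a in x for b in y) / (len(x) * len(y))

    # Greedy nearest-neighbour chain over this cluster's children only.
    ordered = [children.pop(0)]
    while children:
        nxt = min(children, key=lambda c: gap(ordered[-1], c))
        children.remove(nxt)
        ordered.append(nxt)
    # Concatenating the ordered children yields the depth-first order.
    return [name for child in ordered for name in child]

pos = {"a": 0, "b": 1, "c": 10, "d": 12, "e": 11}   # toy 1-D "locations"
order = order_dimensions((("a", "b"), ("c", "d", "e")), lambda p, q: abs(pos[p] - pos[q]))
# ["a", "b", "c", "e", "d"]
```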

Hierarchical Ordering Illustration

Dimension Ordering (6)

Figure panels: random order; similarity-oriented order

Dimension Spacing (1)

Idea of dimension spacing: convey dimension relationship information by varying the spacing between adjacent axes
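Dimension spacing can be sketched for parallel coordinates by making the gap between two adjacent axes proportional to their dissimilarity. The linear mapping, the fixed total width, and the function name are assumptions for illustration.

```python
def axis_positions(order, dissim, width=1000.0):
    """x-coordinates of parallel-coordinate axes, with the gap between adjacent
    axes proportional to their dissimilarity (an assumed linear mapping)."""
    gaps = [dissim(a, b) for a, b in zip(order, order[1:])]
    total = sum(gaps)
    if total == 0:
        # All adjacent pairs identical (or a single axis): fall back to even spacing.
        gaps, total = [1.0] * len(gaps), float(len(gaps) or 1)
    xs = [0.0]
    for g in gaps:
        xs.append(xs[-1] + width * g / total)
    return xs

table = {("a", "b"): 1.0, ("b", "c"): 3.0}
xs = axis_positions(["a", "b", "c"], lambda p, q: table[(p, q)])   # [0.0, 250.0, 1000.0]
```

In practice a small minimum gap would probably be added so that nearly identical dimensions do not draw on top of each other.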

Dimension Spacing (2)


Dimensions spaced according to similarity: similar dimensions are close to each other

Dimension Spacing Distortion (1)


Dimension Spacing Distortion (2)

Figure panels: before distortion; after distortion

Dimension Filtering (1)

Idea of dimension filtering: similar dimensions can be omitted; unimportant dimensions can be omitted

Dimension Filtering (2)

Figure panels: unfiltered; filtered

Conclusion

The proposed approach:

Improves the manageability of dimensions in high dimensional datasets and reduces the complexity of the ordering, spacing, and filtering tasks

Allows flexible user interactions for dimension ordering, spacing, and filtering with dimension hierarchies

Value and Relation (VaR) Display

Jing Yang, Anilkumar Patro, Shiping Huang, Nishant Mehta, Matthew O. Ward and Elke A. Rundensteiner

InfoVis’04

Motivation

Challenges: Can high dimensional datasets be visualized without dimension reduction, to avoid information loss? Can dimension relationships be visualized in the same display as data values?

Challenge - Visualization without Dimension Reduction

Visualizing the SkyServer dataset (361 dimensions) with existing techniques: parallel coordinates need 361 axes; a scatterplot matrix needs 361 × 361 = 130,321 scatterplots; pixel-oriented techniques without overlaps need, for 50,000 data items, 50,000 × 361 = 18,050,000 pixels (about 23 times the number of pixels on a 1024 × 768 screen)

Hint: use pixel-oriented techniques and allow overlaps
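The arithmetic above can be reproduced with a small helper. It assumes one pixel per data value and a 1024 × 768 screen, as on the slide. The helper name and dictionary keys are hypothetical.

```python
def display_cost(n_dims, n_items, screen=(1024, 768)):
    """Size of a full, non-overlapping display of n_items x n_dims values."""
    pixels = n_items * n_dims          # one pixel per data value
    return {
        "axes": n_dims,                                # parallel coordinates
        "scatterplots": n_dims * n_dims,               # scatterplot matrix
        "pixels": pixels,                              # pixel-oriented display
        "screens": pixels / (screen[0] * screen[1]),   # number of full screens
    }

sky = display_cost(361, 50_000)
print(sky["scatterplots"], sky["pixels"], round(sky["screens"], 1))   # 130321 18050000 23.0
```

The same helper gives the 215 × 215 = 46,225 plots quoted earlier for the OHSUMED dataset.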

Challenge - Dimension Relationship Visualization

Sorting dimensions in a 1D or 2D grid [Ankerst 98]: not effective beyond hundreds of dimensions

Spacing between dimensions [Yang 2003]: only relationships of adjacent dimensions are revealed

Figure: pixel-oriented display with 50 dimensions sorted in a 2D grid [Ankerst 98]

Challenge - Dimension Relationship Visualization (cont.)

Recall data item relationship visualization: MDS in SPIRE Galaxies [Wise:95], which maps data items to a 2D display

Hint: use MDS to lay out dimensions

Our Proposal: Value and Relation (VaR) Display

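Figure: a pixel-oriented glyph is generated for each dimension (d1 to d4); the dimension dissimilarity matrix below is fed to Multi-Dimensional Scaling, which places the glyphs in a 2D layout

Dissimilarity matrix:

      d1    d2    d3    d4
d1    0     0.1   0.2   0.7
d2    0.1   0     0.3   0.6
d3    0.2   0.3   0     0.7
d4    0.7   0.6   0.7   0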
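The MDS step in the figure can be illustrated with classical (Torgerson) MDS applied to the 4 × 4 matrix above. This is a minimal NumPy sketch. The slides do not say which MDS variant VaR uses, so the choice of classical MDS is an assumption. This matrix is not exactly Euclidean, so negative eigenvalues are clipped to zero.

```python
import numpy as np

def classical_mds(dissim: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n objects in k dimensions."""
    n = dissim.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    b = -0.5 * j @ (dissim ** 2) @ j               # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(b)                 # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]               # keep the k largest
    scale = np.sqrt(np.clip(vals[idx], 0, None))   # clip negatives (non-Euclidean input)
    return vecs[:, idx] * scale

D = np.array([
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.0, 0.3, 0.6],
    [0.2, 0.3, 0.0, 0.7],
    [0.7, 0.6, 0.7, 0.0],
])
layout = classical_mds(D)   # one (x, y) glyph position per dimension d1..d4
```

d1, d2 and d3 land close together, while d4 is placed far from them. A display would then scale these coordinates to screen space before drawing the glyphs.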


Value and Relation Display

Features: explicitly conveys data values without dimension reduction; explicitly conveys dimension relationships; provides a rich set of interaction tools

Figure: SkyServer dataset, 361 dimensions, 50,000 data items

Overlap Detection and Reduction

Tools: extent scaling, dynamic masking, zooming and panning, showing names, layer reordering, manual relocation, automatic shifting

Figure: SkyServer dataset

Distortion

Goal: focus within context

Features: enlarges clicked glyphs; keeps the size of other glyphs

Figure: SkyServer dataset

Data Item Reordering

Pixel-oriented techniques: data item ordering is critical

Figure panels (VaR display): initial display; manual reordering. Census-Income-Part dataset: 42 dimensions, 20,000 data items

Comparing

Goal: Compare base dimension with all others

Feature: coloring by the value difference of the dimensions being compared

AAUP Dataset: 14 dimensions, 1,131 data items
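One way to implement "coloring by value difference" is to map each data item's difference (compared dimension − base dimension) onto a diverging color scale. The blue/white/red scheme, the normalization by the largest absolute difference, and the function name are all assumptions; the slide does not specify the color map.

```python
def difference_colors(base, other, max_abs=None):
    """Map per-item value differences (other - base) to RGB tuples in [0, 1].

    Assumed diverging scheme: blue when the compared dimension is lower than
    the base dimension, white when equal, red when higher."""
    diffs = [o - b for b, o in zip(base, other)]
    scale = max_abs or max((abs(d) for d in diffs), default=0.0) or 1.0
    colors = []
    for d in diffs:
        t = max(-1.0, min(1.0, d / scale))          # normalized difference in [-1, 1]
        if t >= 0:
            colors.append((1.0, 1.0 - t, 1.0 - t))  # white -> red
        else:
            colors.append((1.0 + t, 1.0 + t, 1.0))  # white -> blue
    return colors

colors = difference_colors([0.5, 0.5, 0.5], [1.0, 0.5, 0.0])
# [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 1.0)]
```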

Selection

Goal: select dimensions for further interaction or visualization

Selection tools in the VaR display: manual selection (flexibility); automatic selection (efficiency)

Figure panels: select related dimensions; select unrelated dimensions

Automatic Selection for Unrelated Dimensions

Input: a base dimension and a “related” threshold

Output: dimensions covering the major data variance

Algorithm: iteratively select unrelated dimensions and filter out related dimensions

Related work: maximum subspace [MacEachren:03]

Figure: SkyServer dataset
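The algorithm above can be sketched as a greedy loop. It starts from the base dimension. At each step it filters out every remaining dimension whose relationship with the last selected one reaches the threshold, then selects a new dimension from what is left. The relationship measure is passed in (for example |correlation|). The rule for choosing the next dimension (least related to everything selected so far) and all names are assumptions.

```python
def select_unrelated(dimensions, base, threshold, related):
    """Greedily pick mutually unrelated dimensions, starting from `base`.

    related(a, b) returns a relationship strength such as |correlation|;
    pairs at or above `threshold` count as related."""
    selected = [base]
    remaining = [d for d in dimensions if d != base]
    while remaining:
        # Filter out everything related to the dimension just selected.
        remaining = [d for d in remaining if related(selected[-1], d) < threshold]
        if remaining:
            # Assumed rule: next pick the dimension least related to all selected ones.
            nxt = min(remaining, key=lambda d: max(related(s, d) for s in selected))
            selected.append(nxt)
            remaining.remove(nxt)
    return selected

strength = {
    frozenset(pair): value
    for pair, value in [(("a", "b"), 0.9), (("a", "c"), 0.1), (("a", "d"), 0.2),
                        (("b", "c"), 0.2), (("b", "d"), 0.3), (("c", "d"), 0.85)]
}
picked = select_unrelated("abcd", "a", 0.5, lambda p, q: strength[frozenset((p, q))])
# ["a", "c"]: b is related to a, and d is related to c
```

Each filtering pass is made against the newest selection, so every dimension left in `remaining` stays below the threshold for all dimensions selected so far.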

Scale to Large Datasets

Store glyphs as texture objects: extent scaling and relocating only resize or relocate the texture objects, while reordering and recoloring regenerate them

Use random sampling: users interactively set a threshold, and random sampling is triggered automatically

Figure panels (Out5D dataset): without sampling (16K data items); with sampling (5K data items)
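The threshold-triggered sampling can be sketched as follows. The user-set threshold doubles as the sample size, and a uniform sample is drawn only when the dataset exceeds it. Using the threshold as the sample size and the function name are assumptions.

```python
import random

def items_to_draw(items, threshold, seed=None):
    """Draw everything when the dataset is small; otherwise switch automatically
    to a uniform random sample of `threshold` items."""
    items = list(items)
    if len(items) <= threshold:
        return items
    return random.Random(seed).sample(items, threshold)

shown = items_to_draw(range(16_000), 5_000, seed=42)   # 5,000 of 16,000 items
```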


Discussion

Is the pixel-oriented technique the only choice for generating dimension glyphs? Histograms, scatterplots, …

Is 2D MDS the only approach to lay out dimensions? 3D MDS, SOM, treemaps, animation, …

Is correlation the most informative relationship among dimensions? Multivariate relationships

Value and Relation Display: Interactive Visual Exploration of Large Datasets with Hundreds of Dimensions.

J. Yang, D. Hubball, M. Ward, E. Rundensteiner and W. Ribarsky

IEEE Transactions on Visualization and Computer Graphics 13(3)

XRay Dimension Glyphs

Each glyph is a scatterplot:

X: a base dimension that is the same for all glyphs

Y: the dimension the glyph represents

Density-based display: bright means sparse, dark means dense; unoccupied areas are semi-transparent
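The density encoding of an XRay glyph can be sketched by binning (base, represented) value pairs into a small grid and mapping counts to brightness. Empty cells stay transparent, sparse cells are bright, and dense cells are dark. The sketch assumes values normalized to [0, 1], and the linear brightness mapping (down to 0.2 for the densest cell) and grid size are illustrative.

```python
def xray_glyph(base, values, bins=4):
    """Density grid for one XRay glyph: X = base dimension, Y = represented dimension.

    Inputs are assumed normalized to [0, 1]. Returns a bins x bins grid (row = Y
    bin, column = X bin) of brightness values: None for empty cells (drawn
    semi-transparent), close to 1.0 for sparse cells, down to 0.2 for the densest."""
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(base, values):
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        counts[row][col] += 1
    peak = max(max(r) for r in counts)
    return [[None if c == 0 else 1.0 - 0.8 * c / peak for c in r] for r in counts]

grid = xray_glyph([0.1, 0.1, 0.1, 0.9], [0.1, 0.1, 0.1, 0.9], bins=2)
# grid[0][0] is dark (3 items), grid[1][1] is brighter (1 item), off-diagonal cells are None
```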

XRay Dimension Glyphs

Figure: a real dataset with 89 dimensions and 10,417 data items in Pixel VaR and XRay VaR

Jigsaw Map Dimension Layout

Dimension hierarchy; using the H-curve to create a Jigsaw map

M. Wattenberg. A note on space-filling visualizations and space-filling curves. InfoVis 2005, pages 181–186

Jigsaw Map Dimension Layout


A real dataset with 838 dimensions and 11,413 data items in Pixel-Jigsaw VaR and XRay-Jigsaw VaR


Rainfall Dimension Layout

Metaphor: rain

Center bottom: the focus dimension D

Speed of a dimension: related to its correlation with D

Time: user controllable
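A rainfall layout can be sketched as glyphs falling toward D at the bottom, with each glyph's speed tied to its correlation with D. The slide only says speed is "related to" the correlation. The linear rule below (speed proportional to |r|, so strongly correlated dimensions reach D first) and the function name are assumptions, and `t` stands for the user-controlled time.

```python
def rain_heights(correlations, t, speed=1.0, top=1.0):
    """Heights of falling dimension glyphs at time t (0 = the focus dimension D).

    correlations maps a dimension name to its correlation with D. Assumed rule:
    a glyph falls at speed * |r|, so strongly correlated dimensions reach D first."""
    return {dim: max(0.0, top - speed * abs(r) * t) for dim, r in correlations.items()}

h1 = rain_heights({"a": 0.9, "b": -0.5, "c": 0.1}, t=1.0)
# a (|r| = 0.9) is lowest, then b, then c
```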

Rainfall Dimension Layout


Data Item Selection and Masking

Visual query style data item selection; data item based masking

Figure panels: (a) no mask; (b) opaque mask; (c) semi-transparent mask

Labeling


(a) All labels are shown

(b) Labels of selected dimensions are shown

(c) Angled labels in Jigsaw map layout


Possible Applications of VaR Display

Interactively exploring high dimensional data: revealing data item relationships; revealing dimension relationships

Guiding automatic data analysis: assessing results; manually tuning parameters

Human-driven dimension reduction: constructing subspaces using selection tools; visualizing subspaces in VaR or other displays

Semantic Image Browser: Bridging Information Visualization with Automated Intelligent Image Analysis

Jing Yang1, Jianping Fan1, Daniel Hubball1, Yuli Gao1, Hangzai Luo1, William Ribarsky1, and Matthew Ward2

1 University of North Carolina at Charlotte; 2 Worcester Polytechnic Institute

Acknowledgements: This work is supported by NVAC

Motivation

Interactive image exploration. Applications: personal image management, satellite image analysis, ...

Background: automated semantic image analysis; the gap between semantic image analysis and image exploration

Goals: facilitate image exploration using analysis results; evaluate, monitor, and improve analysis processes

Semantic Image Browser Overview

Annotation engine: automated semantic image classification process

Multiple coordinated views: image view (MDS, Rainfall, Sequential); content view (VaR)

Interactions: search by sample image; search by semantic content; interactive annotation examination and modification; zooming, panning, distortion

Annotation Engine

Content-Based Image Annotation [fan:2004]

Levels: low-level visual features, semantic contents, semantic concepts

Semantic contents form a high dimensional dataset: data items are images, dimensions are contents, and values are 1 (the image contains the content) or 0 (otherwise)
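The binary image × content dataset described above can be built from per-image label sets. The sketch also computes the share of images containing each content (the quantity users estimate in Task 3 of the user study) and a content-to-content dissimilarity that could feed an MDS layout of content blocks. The Jaccard measure and all names are assumptions; the slides do not say which dissimilarity the browser uses.

```python
def content_stats(images):
    """images: dict image id -> set of content labels it contains.

    Returns the sorted content list, the binary image x content matrix, the share
    of images containing each content, and an (assumed) Jaccard dissimilarity."""
    contents = sorted(set().union(*images.values()))
    matrix = {img: [1 if c in labels else 0 for c in contents] for img, labels in images.items()}
    share = {c: sum(c in labels for labels in images.values()) / len(images) for c in contents}

    def jaccard(c1, c2):
        with1 = {i for i, labels in images.items() if c1 in labels}
        with2 = {i for i, labels in images.items() if c2 in labels}
        union = with1 | with2
        return 1.0 - len(with1 & with2) / len(union) if union else 0.0

    return contents, matrix, share, jaccard

images = {
    "img1": {"sky", "sea"},
    "img2": {"sky"},
    "img3": {"flower"},
    "img4": {"sky", "sea"},
}
contents, matrix, share, jaccard = content_stats(images)
# contents == ["flower", "sea", "sky"]; matrix["img1"] == [0, 1, 1]; share["sky"] == 0.75
```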

Image View – MDS layout

Figure: Corel collection (1,100 images, 20 contents)

Navigation Tools


Image View – Rainfall Layout


Content View

VaR display [yang:2004]: content blocks

Pixel-oriented techniques [Keim 94]

Color assignment: unselected images are red (1) or gray (0); selected images are blue (1) or light gray (0)

MDS layout of content blocks

Interactions

Figure: Corel image collection (1,100 images, 20 contents)

Search by Sample Image


Search by Semantic Content


Annotation Evaluation and Modification

Case 1: Redflower; Case 2: Sailcloth

User Study

Subjects: 10 UNCC students

Dataset: Corel dataset (20 contents, 1,100 images)

Systems compared: no annotation (unsorted thumbnails in Explorer); semantic contents (semantic image browser); semantic concepts (thumbnails sorted by concepts in Explorer)

Tasks:
Task 1: find three given images
Task 2: find images with certain contents
Task 3: estimate the percentage of images containing certain contents

User Study Results

Task 1: find three given images. Result: the sorted Explorer was better than the semantic browser, and the semantic browser was better than the unsorted Explorer. Major reason: annotations at the semantic concept level were more “error tolerant”

Task 2: find images with certain contents. The result was similar to Task 1

Task 3: estimate the percentage of images containing certain contents. Result: the semantic browser was faster and more accurate than both the sorted and the unsorted Explorer

Post-experiment questionnaire (1 to 10 scale): the semantic browser was preferred, and the semantic browser was useful

Multivariate Visual Explanation for High Dimensional Datasets

S. Barlowe, T. Zhang, Y. Liu, J. Yang and D. Jacobs

VAST 2008

Worldview Gap: the gap between what is being shown and what actually needs to be shown to draw a straightforward representational conclusion for decision making

- Amar and Stasko, InfoVis 2004 best paper

Filling the worldview gap: our approach embeds automatic analysis into information visualization

Multivariate Visual Explanation

With Scott Barlowe, Tianyi Zhang, Yujie Liu, and Donald Jacobs

Motivation

Understanding multivariate relationships is critical in a vast number of applications. Example: economic forecasting depends on unemployment, interest rates, consumer confidence, and inflation

Existing multivariate visualization techniques are quite limited in supporting multivariate analysis

What is the relationship?

$y_0 = x_0 x_1 + x_2$

Figure panels: scatterplot matrix; parallel coordinates

Worldview Gap

Worldview Gap: the gap between what is being shown and what actually needs to be shown to draw a straightforward representational conclusion for decision making

- Amar and Stasko, InfoVis 2004 best paper

Multivariate Visual Explanation

Goals: multivariate relationship understanding; dimension reduction; model construction

Approach: integrate partial derivative calculation into multivariate visualization, through partial derivative calculation and inspection, and step-by-step visual exploration with interactive model construction and dimension reduction

Partial Derivative

Derivative: a measure of how a function changes when the values of its inputs change

Example: the derivative of a car's position with respect to time, at a given instant, is its instantaneous speed

Partial derivative of a function of several variables: the derivative with respect to one of those variables, with the others held constant
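Holding the other variables constant can be made concrete with a central finite difference, $\partial y/\partial x_i \approx \frac{y(\mathbf{x} + h\mathbf{e}_i) - y(\mathbf{x} - h\mathbf{e}_i)}{2h}$, applied to the slides' example $y_0 = x_0 x_1 + x_2$. This sketch assumes the function can be evaluated directly. With sampled data, as in the actual tool, derivatives have to be estimated from neighboring data items instead. The helper name is hypothetical.

```python
def partial_derivative(f, point, i, h=1e-5):
    """Central-difference estimate of df/dx_i at `point`, other variables held fixed."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)

def y0(x):
    return x[0] * x[1] + x[2]   # y0 = x0 x1 + x2

grad = [partial_derivative(y0, [2.0, 3.0, 5.0], i) for i in range(3)]
# approximately [3.0, 2.0, 1.0]: dy0/dx0 = x1, dy0/dx1 = x0, dy0/dx2 = 1
```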


Partial Derivative Inspection

Partial derivative calculation introduces errors

Partial Derivative Inspection

Visually present errors to users

Error inspection of a segmented dataset:

$$
y = \begin{cases} 8x_0 + x_1 & \text{if } x_0 \ge 0.6 \text{ and } x_1 \le 0.3 \\ x_0 - 7x_1 & \text{otherwise} \end{cases}
$$
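The segmented dataset above also shows how such errors arise. The sketch assumes $\partial y/\partial x_0$ is estimated by a central difference whose step (h = 0.05) is comparable to the spacing between data items; the step size and function names are illustrative. Away from the boundary $x_0 = 0.6$ the estimate matches the true slope of its segment (8 or 1). At the boundary the two samples fall in different segments, and the estimate matches neither.

```python
def segmented(x0, x1):
    # Segmented dataset from the slide.
    return 8 * x0 + x1 if (x0 >= 0.6 and x1 <= 0.3) else x0 - 7 * x1

def d_dx0(x0, x1, h=0.05):
    # Central difference with a step comparable to the sample spacing (assumed h).
    return (segmented(x0 + h, x1) - segmented(x0 - h, x1)) / (2 * h)

inside_first = d_dx0(0.8, 0.2)    # close to 8: both samples in the first segment
inside_second = d_dx0(0.3, 0.2)   # close to 1: both samples in the second segment
at_boundary = d_dx0(0.6, 0.2)     # far from both: the step straddles x0 = 0.6
```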


Visual Exploration of Partial Derivatives

Show all partial derivatives together with the original dimensions? Scalability challenge: for a 4-D dataset with dependent variable $y$ and independent variables $x_0$, $x_1$, and $x_2$:

1st order derivatives: $\partial y/\partial x_0$, $\partial y/\partial x_1$, $\partial y/\partial x_2$

2nd order derivatives: $\partial^2 y/\partial x_0^2$, $\partial^2 y/\partial x_0 \partial x_1$, $\partial^2 y/\partial x_0 \partial x_2$, $\partial^2 y/\partial x_1^2$, $\partial^2 y/\partial x_1 \partial x_2$, $\partial^2 y/\partial x_2^2$

The screen will be cluttered!
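The clutter grows quickly with the number of independent variables. If mixed partials are assumed symmetric (a smooth $y$), the distinct $k$-th order partial derivatives of $n$ variables correspond to multisets of size $k$, giving $\binom{n+k-1}{k}$ of them. The helper names below are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def derivative_terms(variables, order):
    """Distinct partial derivatives of a given order, as tuples of variables.
    Mixed partials are assumed symmetric, so (x0, x1) and (x1, x0) count once."""
    return list(combinations_with_replacement(variables, order))

def count_terms(n_vars, order):
    # Multisets of size `order` drawn from n_vars variables: C(n + k - 1, k).
    return comb(n_vars + order - 1, order)

xs = ["x0", "x1", "x2"]
first = derivative_terms(xs, 1)    # 3 terms
second = derivative_terms(xs, 2)   # 6 terms, in the same order as listed above
print(count_terms(10, 1) + count_terms(10, 2))   # 65 extra dimensions for 10 variables
```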

Visual Exploration of Partial Derivatives

Examine all types of relationships from one display? Users would be overwhelmed!


Step By Step Visual Exploration

Different types of correlations are examined in different steps; correlations that are easier to detect are examined first; variables with detected relationships are excluded from further analysis

Step1: 1st Order Partial Derivative Histograms

Display: the histograms of the 1st order partial derivatives

Information to be detected: significant independent variables; ignorable independent variables; independent variables that impact the dependent variable linearly; whether an independent variable impacts the dependent variable positively or negatively
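The readings a user takes from these histograms can be approximated numerically for $y_0 = x_0 x_1 + x_2$. A histogram piled up at 0 suggests an ignorable variable. A single narrow spike suggests a linear effect. All-positive or all-negative values indicate the direction of the impact. The sampling scheme, the tolerance, and these summary heuristics are assumptions standing in for visual inspection.

```python
import random
from statistics import mean, pstdev

def y0(x0, x1, x2):
    return x0 * x1 + x2

def first_order_samples(n=1000, h=1e-4, seed=0):
    """Central-difference partial derivatives of y0 at n random points in [0, 1]^3."""
    rng = random.Random(seed)
    samples = {"x0": [], "x1": [], "x2": []}
    for _ in range(n):
        x0, x1, x2 = rng.random(), rng.random(), rng.random()
        samples["x0"].append((y0(x0 + h, x1, x2) - y0(x0 - h, x1, x2)) / (2 * h))
        samples["x1"].append((y0(x0, x1 + h, x2) - y0(x0, x1 - h, x2)) / (2 * h))
        samples["x2"].append((y0(x0, x1, x2 + h) - y0(x0, x1, x2 - h)) / (2 * h))
    return samples

def describe(values, tol=1e-6):
    """Summaries a user would read off a derivative histogram (assumed heuristics)."""
    m, s = mean(values), pstdev(values)
    return {
        "ignorable": abs(m) < tol and s < tol,   # everything piled up at 0
        "linear": s < tol,                       # a single narrow spike
        "sign": "+" if min(values) > 0 else "-" if max(values) < 0 else "mixed",
    }

summary = {v: describe(vals) for v, vals in first_order_samples().items()}
# x2: a spike at 1 (linear, positive); x0 and x1: spread out (their effect is not constant)
```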


Step 2: Scatterplots of 1st Order Partial Derivatives vs. Original Dimensions

Information to be detected: are independent variables entangled?

Figure: dataset $y_0 = x_0 x_1 + x_2$, 1,000 data items
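For $y_0 = x_0 x_1 + x_2$, a scatterplot of $\partial y_0/\partial x_0$ against $x_1$ forms a straight line, because the effect of $x_0$ depends on $x_1$ (the two are entangled). Against $x_2$ it shows no pattern. The sketch below uses a Pearson correlation as a numeric stand-in for reading those scatterplots, on 1,000 random data items. That substitution and the names are assumptions.

```python
import math
import random

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

rng = random.Random(1)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
h = 1e-4
# Central-difference estimate of dy0/dx0 at each data item (y0 = x0 x1 + x2).
dy_dx0 = [((x0 + h) * x1 + x2 - ((x0 - h) * x1 + x2)) / (2 * h) for x0, x1, x2 in points]

r_x1 = pearson(dy_dx0, [p[1] for p in points])   # about 1: dy0/dx0 depends on x1
r_x2 = pearson(dy_dx0, [p[2] for p in points])   # near 0: no dependence on x2
```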


Coordinated Visual Exploration


Model Construction
