Interactive Visual Exploration of High Dimensional Datasets
Jing Yang
Spring 2010
Challenges of High Dimensional Datasets
High dimensional datasets are common: digital libraries, bioinformatics, simulations, process monitoring, and surveys
Example:
  image analysis dataset: tens to hundreds of dimensions
  newswire dataset: hundreds to thousands of dimensions
Challenges of visualizing high dimensional datasets:
  Clutter on the screen
  Difficult user navigation in the data space
Example
OHSUMED dataset: 215 dimensions
Scatterplot matrix: 215*215 = 46,225 plots
Parallel coordinates: 215 axes
Visual Hierarchical Dimension Reduction (VHDR)
J. Yang, M.O. Ward, E.A. Rundensteiner and S. Huang
VisSym’03
Motivation - Dimension Reduction
Idea:
  Project a high-dimensional dataset to a lower-dimensional subspace
  Visualize data items in the lower-dimensional subspace
Existing Approaches:
  Principal Component Analysis
  Multidimensional Scaling
  Kohonen’s Self Organizing Map
Problems:
  Information loss
  No intuitive meaning of generated dimensions
  Little user interaction allowed.
Inspiration
Hierarchical parallel coordinates: data item hierarchy
Key Ideas of VHDR
Use dimension hierarchy to convey dimension relationships
Allow users to learn the dimension hierarchy
Allow users to select dimensions or dimension clusters to form subspaces of interest
Dimension Hierarchy
Similar dimensions form clusters, and clusters are grouped into larger clusters
A dimension hierarchy of a 5-d dataset
VHDR Framework
Step 1: build dimension hierarchy
Step 2: navigate and manipulate dimension hierarchy
Step 3: interactively select clusters from dimension hierarchy to form lower-dimensional subspaces
Overview
Build Dimension Hierarchy
Automatic dimension clustering
  Cluster dimensions according to dissimilarities* among them
  *Dissimilarity - measure of how dimensions are dissimilar to each other
Manual hierarchy modification
Discussion:
  How to calculate dissimilarity between two dimensions?
Ref: Ankerst, M., Berchtold, S., and Keim, D. A. Similarity clustering of dimensions for an enhanced visualization of multidimensional data. InfoVis’98
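One possible answer to the discussion question, as a minimal sketch: treat two dimensions as similar when they are strongly correlated, and merge the closest clusters bottom-up. The measure (1 - |Pearson correlation|), the average linkage, the toy data, and the names `dissimilarity` and `build_hierarchy` are assumptions for illustration; they are not necessarily what VHDR or Ankerst et al. use.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant dimension is treated as uncorrelated
    return cov / (norm_x * norm_y)


def dissimilarity(x, y):
    """Assumed measure: 0 for perfectly correlated dimensions, 1 for uncorrelated."""
    return 1.0 - abs(pearson(x, y))


def build_hierarchy(columns):
    """Agglomerative clustering of dimensions with average linkage.

    columns maps a dimension name to its values. A leaf is a name and an
    internal node is a (left, right) tuple.
    """
    # Each entry pairs a subtree with the names of the leaves it contains.
    nodes = [(name, [name]) for name in columns]
    while len(nodes) > 1:
        def linkage(pair):
            leaves_a, leaves_b = nodes[pair[0]][1], nodes[pair[1]][1]
            total = sum(dissimilarity(columns[a], columns[b])
                        for a in leaves_a for b in leaves_b)
            return total / (len(leaves_a) * len(leaves_b))

        i, j = min(combinations(range(len(nodes)), 2), key=linkage)
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


# Hypothetical 4-d dataset: a and b move together, as do c and d.
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 1, 4, 2, 3],
    "d": [5.2, 1, 4, 2.1, 3],
}
hierarchy = build_hierarchy(data)
```

With this toy data the two correlated pairs end up as sibling clusters under the root.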
Navigate and Manipulate Dimension Hierarchy
InterRing - Radial space-filling hierarchy navigation tool [yang:2002]
  Modification
  Selection
  Radius distortion
  Circular distortion
  Rolling up/Drilling down
  Rotation
  Zooming/Panning
Construct Lower-Dimensional Subspaces
Strategy 1: construct a subspace with closely related dimensions
Strategy 2: construct a subspace that covers major variance of the dataset
Dimension Cluster Representation
Representative Dimension - a dimension that represents a cluster of dimensions
Approaches to assigning or generating a representative dimension:
1. Select a dimension from the cluster
2. Average all dimensions in the cluster
3. Use principal component analysis to generate a weighted sum of dimensions within a cluster
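The first two approaches can be sketched in a few lines. The slide does not say which dimension to select, so the rule below (pick the member closest to the cluster average) is an assumption, as are the function names and the sample values. The PCA approach is not shown.

```python
def average_representative(cluster):
    """Approach 2: per-item mean over all dimensions in the cluster."""
    return [sum(values) / len(values) for values in zip(*cluster)]


def select_representative(cluster):
    """Approach 1, with an assumed rule: index of the member closest to the average."""
    average = average_representative(cluster)

    def squared_distance(index):
        return sum((v - a) ** 2 for v, a in zip(cluster[index], average))

    return min(range(len(cluster)), key=squared_distance)


# Three dimensions (rows) over three data items, values assumed normalized to [0, 1].
cluster = [
    [0.0, 0.5, 1.0],
    [0.2, 0.5, 0.8],
    [0.4, 0.5, 0.6],
]
average = average_representative(cluster)
chosen = select_representative(cluster)
```

Here the middle dimension coincides with the average, so it is the one selected.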
Examples
Approach - average representative dimension
Approach - select representative dimension
Dissimilarity Representation
Goal: visualize dissimilarity of dimensions in a dimension cluster.
Approaches:
  Axis Width
  Three Axes
  Diagonal Plots
  Outer and Inner Sticks
  Mean-Band
Example: select 3 dimension clusters (dimensions) in Census-Income dataset
Dissimilarity Representation: Axis Width Method
Dissimilarity Representation: Three Axis Method
Dissimilarity Representation: Diagonal Plots Method
No dissimilarities representation
Dissimilarities represented in diagonal plots
Generality
VHDR is a general framework that can be extended to multiple display techniques
We have applied VHDR to:
  Parallel Coordinates
  Star Glyphs
  Scatterplot Matrices
  Dimensional Stacking
  Hierarchical Parallel Coordinates
  Hierarchical Star Glyphs
  Hierarchical Scatterplot Matrices
  Hierarchical Dimensional Stacking
Other Clustering Approach
Visualization of Large-Scale Customer Satisfaction Surveys Using a Parallel Coordinate Tree, D. Brodbeck et al., InfoVis 2003
Interactive Hierarchical Dimension Ordering, Spacing and Filtering For Exploration of High Dimensional Datasets
Jing Yang, Wei Peng, Matthew O. Ward and Elke A. Rundensteiner
InfoVis’03
Motivation
Large number of dimensions need to be managed
Ordering, spacing, filtering etc.
Overview
General: includes dimension ordering, dimension spacing and dimension filtering
Interactive: allows user interactions throughout the whole process
Hierarchical: groups dimensions into a hierarchy and builds most algorithms and user interactions upon this hierarchy to increase scalability
Dimension Ordering (1)
Random order
Dimension Ordering (2)
Ordered by similarity
Dimension Ordering (3)
Order dimensions for different purposes:
  Similarity-oriented ordering: put similar dimensions close to each other
  Importance-oriented ordering: map more important dimensions to more significant positions or attributes. The order of importance can be decided by Principal Component Analysis (PCA)
Dimension Ordering (4)
Challenges for ordering high dimensional datasets:
  Similarity-oriented ordering is NP-Complete
  It is hard to decide the order of the importance of a large number of dimensions using PCA
Our solution: reduce the complexity of the ordering problem using the dimension hierarchy
  Order each dimension cluster
  The order of the dimensions is decided in a depth-first traversal of the dimension hierarchy
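A minimal sketch of the hierarchical solution: order the children of each cluster locally, then read the dimensions off in a depth-first traversal. The greedy nearest-neighbour rule inside `order_cluster`, the nested-tuple representation of the hierarchy, and the dissimilarity values are assumptions for illustration; the slides do not specify how each cluster is ordered.

```python
def leaves(node):
    """All dimension names under a node (a name or a tuple of child nodes)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def node_dissimilarity(a, b, pairwise):
    """Average pairwise dissimilarity between the leaves of two nodes."""
    pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
    return sum(pairwise[frozenset(p)] for p in pairs) / len(pairs)


def order_cluster(children, pairwise):
    """Greedy heuristic: repeatedly append the child most similar to the last one."""
    remaining = list(children)
    ordered = [remaining.pop(0)]
    while remaining:
        nearest = min(remaining,
                      key=lambda c: node_dissimilarity(ordered[-1], c, pairwise))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


def hierarchical_order(node, pairwise):
    """Order each cluster locally, then collect leaves depth-first."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in order_cluster(node, pairwise):
        result.extend(hierarchical_order(child, pairwise))
    return result


# Hypothetical hierarchy and dissimilarities for four dimensions.
hierarchy = ("a", ("c", "d"), "b")
pairwise = {
    frozenset(("a", "b")): 0.1, frozenset(("a", "c")): 0.8,
    frozenset(("a", "d")): 0.9, frozenset(("b", "c")): 0.7,
    frozenset(("b", "d")): 0.6, frozenset(("c", "d")): 0.2,
}
order = hierarchical_order(hierarchy, pairwise)
```

Each call only orders the handful of children in one cluster, which is where the reduction in complexity comes from.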
Hierarchical Ordering
Illustration
Dimension Ordering (6)
Random order Similarity-oriented order
Dimension Spacing (1)
Idea of dimension spacing:
  Convey dimension relationship information by varying the spacing between adjacent axes
Dimension Spacing (2)
Dimensions spaced according to similarity: similar dimensions are close to each other
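As a rough sketch of similarity-based spacing, the gap between two adjacent axes can be made proportional to their dissimilarity. The minimum gap, the display width, and the name `axis_positions` are assumptions, not details from the talk.

```python
def axis_positions(order, pairwise, width=1000.0, min_gap=10.0):
    """x position of each axis; gaps grow with the dissimilarity of neighbours."""
    gaps = [pairwise[frozenset((a, b))] for a, b in zip(order, order[1:])]
    if not gaps:
        return [0.0] * len(order)
    free = width - min_gap * len(gaps)  # space left after the minimum gaps
    total = sum(gaps)
    positions = [0.0]
    for gap in gaps:
        share = free * gap / total if total > 0 else free / len(gaps)
        positions.append(positions[-1] + min_gap + share)
    return positions


# Hypothetical dissimilarities: a and b are similar, b and c less so.
pairwise = {frozenset(("a", "b")): 0.1, frozenset(("b", "c")): 0.3}
positions = axis_positions(["a", "b", "c"], pairwise, width=100.0)
```

In this example the axes land at roughly 0, 30 and 100, so the similar pair sits closer together.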
Dimension Spacing Distortion (1)
Dimension Spacing Distortion (2)
Before After
Dimension Filtering (1)
Idea of dimension filtering:
  Similar dimensions can be omitted;
  Unimportant dimensions can be omitted.
Dimension Filtering (2)
Unfiltered Filtered
Conclusion
The proposed approach
  Improves the manageability of dimensions in high dimensional data sets and reduces the complexity of the ordering, spacing and filtering tasks;
  Allows flexible user interactions for dimension ordering, spacing and filtering with dimension hierarchies.
Value and Relation (VaR) Display
Jing Yang, Anilkumar Patro, Shiping Huang, Nishant Mehta, Matthew O. Ward and Elke A. Rundensteiner
InfoVis’04
Motivation
Challenges:
  Can high dimensional datasets be visualized without dimension reduction to avoid information loss?
  Can dimension relationships be visualized in the same display as data values?
Challenge - Visualization without Dimension Reduction
Visualize SkyServer dataset (361 dimensions) using existing techniques:
  Parallel Coordinates: 361 axes
  Scatterplot Matrix: 130,321 scatterplots
  Pixel-Oriented techniques without overlaps: 50,000 data items: 18,050,000 pixels (23 times the number of pixels in a 1024*768 screen)
Hint: Use Pixel-Oriented techniques and allow overlaps
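The figures on this slide follow directly from the dataset size. A quick check, assuming one pixel per data value for the pixel-oriented display:

```python
dimensions = 361
data_items = 50_000

axes = dimensions                    # one axis per dimension in parallel coordinates
scatterplots = dimensions ** 2       # every pair of dimensions, diagonal included
pixels = dimensions * data_items     # one pixel per value, no overlaps
screen = 1024 * 768
ratio = pixels / screen              # how many screens the pixels would fill
```

The ratio comes out just under 23, matching the "23 times" on the slide.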
Challenge - Dimension Relationship Visualization
Sorting dimensions in a 1D or 2D grid [Ankerst 98]
  Not effective beyond hundreds of dimensions
Spacing between dimensions [Yang 2003]
  Only relationships of adjacent dimensions are revealed
Pixel-Oriented: Sort 50 dimensions in a 2D grid [Ankerst 98]
Challenge - Dimension Relationship Visualization (cont.)
Recall data item relationship visualization:
  MDS: SPIRE Galaxies [Wise:95]
SPIRE Galaxies: Map data items to a 2D display using MDS [Wise: 95]
Hint: Use MDS to lay out dimensions
Our Proposal: Value and Relation (VaR) Display
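Pixel-Oriented glyph: one glyph per dimension (d1, d2, d3, d4)
Pairwise matrix between dimensions:
      d1   d2   d3   d4
  d1  0    0.1  0.2  0.7
  d2  0.1  0    0.3  0.6
  d3  0.2  0.3  0    0.7
  d4  0.7  0.6  0.7  0
Multi-Dimensional Scaling: positions the glyphs d1 to d4 in the display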
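To make the layout step concrete, here is a minimal MDS sketch that places the four glyphs in 2D by plain gradient descent on the stress. Reading the 4x4 matrix as pairwise distances is an assumption, and so are the step size, iteration count and random start; the talk does not say which MDS variant the VaR display uses.

```python
import math
import random

# Example 4x4 matrix from the slide, read here as pairwise distances (assumption).
D = [
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.0, 0.3, 0.6],
    [0.2, 0.3, 0.0, 0.7],
    [0.7, 0.6, 0.7, 0.0],
]


def stress(points, d):
    """Sum of squared differences between layout distances and target distances."""
    n = len(points)
    return sum((math.dist(points[i], points[j]) - d[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def mds_layout(d, start, iterations=500, step=0.05):
    """Move every glyph down the stress gradient for a fixed number of rounds."""
    points = [list(p) for p in start]
    n = len(points)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            grad_x = grad_y = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                scale = 2.0 * (dist - d[i][j]) / dist
                grad_x += scale * dx
                grad_y += scale * dy
            updated.append([points[i][0] - step * grad_x,
                            points[i][1] - step * grad_y])
        points = updated
    return points


rng = random.Random(0)
start = [[rng.random(), rng.random()] for _ in D]
layout = mds_layout(D, start)
```

Comparing `stress(start, D)` with `stress(layout, D)` shows how much closer the final glyph distances are to the matrix than the random start.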
Value and Relation Display
Features:
  Explicitly conveys data values without dimension reduction
  Explicitly conveys dimension relationships
  Provides a rich set of interaction tools
SkyServer dataset: 361 dimensions, 50,000 data items
Overlap Detection and Reduction
Extent Scaling
Dynamic Masking
Zooming and Panning
Showing Names
Layer Reordering
Manual Relocation
Automatic Shifting
SkyServer Dataset
Distortion
Goal: Focus-within-context
Features:
  Enlarges clicked glyphs
  Keeps size of other glyphs
SkyServer Dataset
Data Item Reordering
Pixel-oriented techniques:
  Data item ordering is critical
VaR display:
  Initial display
  Manual reordering
Census-Income-Part Dataset: 42 dimensions, 20,000 data items
Comparing
Goal: Compare base dimension with all others
Feature: Coloring by value difference of dimensions being compared
AAUP Dataset: 14 dimensions, 1,131 data items
Selection
Goal: Select dimensions for further interaction or visualization
Selection tools in VaR display:
  Manual selection - flexibility
  Automatic selection - efficiency
    Select related dimensions
    Select unrelated dimensions
Automatic Selection for Unrelated Dimensions
Input:
  A base dimension
  “Related” threshold
Output: Dimensions covering major data variance
Algorithm: Iteratively select unrelated dimensions and filter related dimensions
Related work: Maximum subspace [MacEachren:03]
SkyServer Dataset
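The iterative algorithm can be sketched as follows. The example reuses the 4x4 matrix from the VaR proposal slide, read as dissimilarities (an assumption). Treating a dimension as "related" when its dissimilarity to a selected dimension is below the threshold, and picking the next candidate in list order, are also assumptions.

```python
def select_unrelated(base, names, dissim, threshold):
    """Select the base, drop everything related to it, then repeat on what is left."""
    selected = []
    remaining = [base] + [n for n in names if n != base]
    while remaining:
        current = remaining.pop(0)
        selected.append(current)
        # Filter out dimensions that are "related" to the one just selected.
        remaining = [n for n in remaining if dissim[current][n] >= threshold]
    return selected


names = ["d1", "d2", "d3", "d4"]
matrix = [
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.0, 0.3, 0.6],
    [0.2, 0.3, 0.0, 0.7],
    [0.7, 0.6, 0.7, 0.0],
]
dissim = {a: dict(zip(names, row)) for a, row in zip(names, matrix)}

loose = select_unrelated("d1", names, dissim, threshold=0.25)
strict = select_unrelated("d1", names, dissim, threshold=0.15)
```

A larger threshold filters more dimensions as related, so `loose` keeps fewer dimensions than `strict`.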
Scale to Large Datasets
Store glyphs as texture objects
  Extent scaling and relocating: resize, relocate texture objects ☺
  Reordering and recoloring: regenerate texture objects
Use random sampling
  Users interactively set threshold
  Random sampling is triggered automatically
Out5D Dataset
  Without sampling (16K data items)
  With sampling (5K data items)
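A minimal sketch of threshold-triggered sampling, assuming the threshold is a maximum number of data items to draw. The function name and the uniform sampling without replacement are illustrative choices.

```python
import random


def items_to_render(items, threshold, seed=None):
    """Return all items when under the threshold, otherwise a random sample."""
    if len(items) <= threshold:
        return list(items)
    return random.Random(seed).sample(list(items), threshold)


full = list(range(16_000))                       # stand-in for 16K data items
shown = items_to_render(full, threshold=5_000, seed=1)
```

Because the check happens inside the function, callers never have to decide whether to sample.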
Discussion
Is pixel-oriented technique the only choice for generating dimension glyphs?
  Histogram, Scatterplot, …
Is 2D MDS the only approach to layout dimensions?
  3D MDS, SOM, Treemap, Animation…
Is correlation the most informative relationship among dimensions?
  Multivariate relationships
Value and Relation Display: Interactive Visual Exploration of Large Datasets with Hundreds of Dimensions.
J. Yang, D. Hubball, M. Ward, E. Rundensteiner and W. Ribarsky
IEEE Transactions on Visualization and Computer Graphics 13(3)
XRay Dimension Glyphs
Each glyph: a scatterplot
  X: a base dimension that is the same for all glyphs
  Y: the dimension it represents
Density based display
  Bright: sparse
  Dark: dense
  Unoccupied area: semi-transparent
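A density-based glyph of this kind can be sketched as a small 2D histogram. The grid size, the linear brightness mapping, and the use of `None` to mark unoccupied (semi-transparent) cells are assumptions; values are assumed normalized to [0, 1].

```python
def xray_glyph(base, dim, bins=4):
    """Bin (base, dim) pairs into a grid and map counts to brightness.

    Returns rows of cells: None for unoccupied cells, otherwise a brightness
    in [0, 1) where the densest cell is darkest (0.0).
    """
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(base, dim):
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        counts[row][col] += 1
    peak = max(max(row) for row in counts)
    return [[None if c == 0 else 1.0 - c / peak for c in row] for row in counts]


# Hypothetical data: three items near the origin, one near the opposite corner.
base = [0.1, 0.1, 0.1, 0.9]
dim = [0.1, 0.1, 0.1, 0.9]
glyph = xray_glyph(base, dim, bins=2)
```

The crowded cell comes out dark, the single-item cell brighter, and the empty cells are left unoccupied.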
XRay Dimension Glyphs
A real dataset with 89 dimensions and 10,417 data items in Pixel and XRay VaRs.
Jigsaw Map Dimension Layout
Dimension hierarchy
Using H-Curve to create a Jigsaw Map
M. Wattenberg. A note on space-filling visualizations and space-filling curves. InfoVis 2005, pages 181–186
Jigsaw Map Dimension Layout
A real dataset with 838 dimensions and 11,413 data items in Pixel-Jigsaw VaR and XRay-Jigsaw VaR
Rainfall Dimension Layout
Metaphor: Rain
Center Bottom: focus dimension D
Speed of a dimension: related to its correlation to D
Time: user controllable
Rainfall Dimension Layout
Data Item Selection and Masking
Visual query style data item selection
Data item based masking
(a) No mask (b) Opaque mask (c) Semi-transparent mask
Labeling
(a) All labels are shown
(b) Labels of selected dimensions are shown
(c) Angled labels in Jigsaw map layout
Possible Applications of VaR Display
Interactively exploring high dimensional data
  Revealing data item relationships
  Revealing dimension relationships
Guiding automatic data analysis
  Assessing results
  Manually tuning parameters
Human-driven dimension reduction
  Constructing subspaces using selection tools
  Visualizing subspaces in VaR or other displays
Semantic Image Browser: Bridging Information Visualization with Automated Intelligent Image Analysis
Jing Yang1, Jianping Fan1, Daniel Hubball1, Yuli Gao1, Hangzai Luo1, William Ribarsky1, and Matthew Ward2
1 University of North Carolina at Charlotte
2 Worcester Polytechnic Institute
Acknowledgements: This work is supported by NVAC
Motivation
Interactive image exploration:
  Applications: personal image management, satellite image analysis, ...
Background:
  Automated semantic image analysis
  Gap between semantic image analysis and image exploration
Goals:
  Facilitate image exploration using analysis results
  Evaluate, monitor and improve analysis processes
Semantic Image Browser Overview
Annotation engine
  Automated semantic image classification process
Multiple coordinated views
  Image view – MDS, Rainfall, Sequential
  Content view – VaR
Interactions
  Search by sample image
  Search by semantic content
  Interactive annotation examination and modification
  Zooming, panning, distortion
Annotation Engine
Content-Based Image Annotation [fan:2004]
  Low level visual features
  Semantic contents
  Semantic concepts
Semantic contents: high dimensional dataset
  data items: images
  dimensions: contents
  values: 1 (image contains the content) or 0 (otherwise)
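The binary dataset described above can be sketched directly. The image names, the content vocabulary, and the helper names are hypothetical; the two queries mirror searching by semantic content and estimating the share of images that contain a content.

```python
def content_matrix(annotations, contents):
    """One row per image, one 0/1 column per content."""
    return {image: [1 if c in found else 0 for c in contents]
            for image, found in annotations.items()}


def search_by_content(matrix, contents, wanted):
    """Images whose row has a 1 for every wanted content."""
    columns = [contents.index(c) for c in wanted]
    return [image for image, row in matrix.items()
            if all(row[i] == 1 for i in columns)]


def content_percentage(matrix, contents, content):
    """Percentage of images that contain the given content."""
    column = contents.index(content)
    return 100.0 * sum(row[column] for row in matrix.values()) / len(matrix)


# Hypothetical annotation results for four images.
contents = ["redflower", "sailcloth", "sky"]
annotations = {
    "img1": {"redflower"},
    "img2": {"sailcloth", "sky"},
    "img3": {"sky"},
    "img4": set(),
}
matrix = content_matrix(annotations, contents)
with_sky = search_by_content(matrix, contents, ["sky"])
sky_share = content_percentage(matrix, contents, "sky")
```

Each column of this matrix is one dimension, which is what lets the content view reuse the VaR display.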
Image View – MDS layout
Corel collection (1100 images, 20 contents)
Navigation Tools
Image View – Rainfall Layout
Content View
VaR display [yang:2004]
Content blocks
  Pixel-oriented techniques [Keim 94]
  Color assignment
    Unselected images: Red - 1, Gray - 0
    Selected images: Blue - 1, Light gray - 0
MDS layout of content blocks
Interactions
Corel image collection (1100 images, 20 contents)
Search by Sample Image
Search by Semantic Content
Annotation Evaluation and Modification
Case 1: Redflower
Case 2: Sailcloth
User Study
Subjects
  10 UNCC students
Dataset: Corel dataset (20 contents, 1100 images)
Systems compared
  No annotation: Unsorted Thumbnails in Explorer
  Semantic contents: Semantic image browser
  Semantic concepts: Thumbnails sorted by concepts in Explorer
Tasks
  Task1: Find three given images
  Task2: Find images with certain contents
  Task3: Estimate percentage of images containing certain contents
User Study Results
Task1: Find three given images
  Result:
    Sorted Explorer was better than Semantic browser
    Semantic browser was better than Unsorted Explorer
  Major reason: Annotations in the semantic concept level were more “error tolerant”
Task2: Find images with certain contents
  Result was similar to task1
Task3: Estimate percentage of images containing certain contents
  Result: Semantic browser was faster and more accurate than sorted Explorer and unsorted Explorer
Post experiment questionnaire (1 to 10 scale)
  Semantic browser was preferred
  Semantic browser was useful
Multivariate Visual Explanation for High Dimensional Datasets
S. Barlowe, T. Zhang, Y. Liu, J. Yang and D. Jacobs
VAST 2008
Worldview Gap
Worldview Gap - gap between what is being shown and what actually needs to be shown to draw a straightforward representational conclusion for decision making
- Amar and Stasko, InfoVis 2004 best paper
Filling Worldview Gap:
  Our approach - Embedding automatic analysis into information visualization
Multivariate Visual Explanation
With Scott Barlowe, Tianyi Zhang, Yujie Liu, and Donald Jacobs
Motivation
Understanding multivariate relationships is critical in a vast number of applications
Example:
  Economic forecasting <- unemployment, interest rates, consumer confidence, and inflation
Existing multivariate visualization techniques are quite limited in helping multivariate analysis
What is the relationship?
y0 = x0x1 + x2
Scatterplot Matrix
Parallel Coordinates
Worldview Gap
Worldview Gap - gap between what is being shown and what actually needs to be shown to draw a straightforward representational conclusion for decision making
- Amar and Stasko, InfoVis 2004 best paper
Multivariate Visual Explanation
Goals:
  Multivariate relationship understanding
  Dimension Reduction
  Model Construction
Approach: Integrate partial derivative calculation into multivariate visualization
  Partial derivative calculation and inspection
  Step by step visual exploration with interactive model construction and dimension reduction
Partial Derivative
Derivative: measurement of how a function changes when values of its inputs change
  Example: derivative at a point in time of the position of a car: instantaneous speed
Partial derivative of a function of several variables: derivative with respect to one of those variables with the others held constant
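A small numerical illustration of "the others held constant", using the example relationship y = x0*x1 + x2 from earlier in the talk. This sketch differentiates a known function with a central difference; how the authors estimate derivatives from sampled data is not stated in the slides, so the method, the step `h` and the evaluation point are assumptions.

```python
def partial_derivative(f, point, index, h=1e-5):
    """Central-difference estimate of df/dx_index at point, other inputs fixed."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return (f(up) - f(down)) / (2 * h)


def y(x):
    """Example relationship from the talk: y = x0 * x1 + x2."""
    return x[0] * x[1] + x[2]


point = [2.0, 3.0, 5.0]
gradient = [partial_derivative(y, point, i) for i in range(3)]
```

Analytically the partial derivatives are x1, x0 and 1, so at this point the estimates should be close to 3, 2 and 1.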
Partial Derivative Inspection
Partial derivative calculation introduces errors
Partial Derivative Inspection
Visually present errors to users
Error inspection of a segmented dataset:
  y = 8x0 + x1 if x0 ≥ 0.6 and x1 ≤ 0.3
  y = x0 − 7x1 otherwise
Visual Exploration of Partial Derivatives
Show all partial derivatives together with the original dimensions?
Scalability Challenge: 4-d dataset with dependent variable y and independent variables x0, x1, and x2
  1st order derivatives: ∂y/∂x0, ∂y/∂x1, ∂y/∂x2
  2nd order derivatives: ∂²y/∂x0∂x0, ∂²y/∂x0∂x1, ∂²y/∂x0∂x2, ∂²y/∂x1∂x1, ∂²y/∂x1∂x2, ∂²y/∂x2∂x2
Screen will be cluttered!
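The clutter is easy to quantify: n independent variables give n first-order derivatives and n(n+1)/2 distinct second-order derivatives. A short enumeration for the 4-d example follows; the helper name is illustrative.

```python
from itertools import combinations_with_replacement


def derivative_dimensions(variables):
    """Enumerate 1st-order and distinct 2nd-order partial derivatives of y."""
    first = [(v,) for v in variables]
    second = list(combinations_with_replacement(variables, 2))
    return first, second


variables = ["x0", "x1", "x2"]
first, second = derivative_dimensions(variables)
# Original dimensions (y plus the inputs) and all derivative dimensions together.
total_dimensions = 1 + len(variables) + len(first) + len(second)
```

For three independent variables the display would already need 13 dimensions instead of 4.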
Visual Exploration of Partial Derivatives
Examine all types of relationships from one display? Users would be overwhelmed!
Step By Step Visual Exploration
Different types of correlations are examined in different steps
Correlations easier to detect will be examined first
Variables with detected relationships will be excluded from further analysis
Step1: 1st Order Partial Derivative Histograms
Display: the histograms of 1st order partial derivatives
Information to be detected:
  Significant independent variables
  Ignorable independent variables
  Independent variables that linearly impact the dependent variable
  Independent variables: positive or negative impact?
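What a user reads off each histogram can be approximated in code. The rules below are one hedged interpretation: a derivative that is always near zero marks an ignorable variable, a near-constant nonzero derivative marks a linear impact, and the sign of the values gives the direction. The tolerance and the labels are assumptions, not the authors' criteria.

```python
def describe_variable(derivatives, tolerance=1e-3):
    """Summarize the spread of one variable's 1st-order partial derivatives."""
    low, high = min(derivatives), max(derivatives)
    if max(abs(low), abs(high)) <= tolerance:
        return "ignorable"
    shape = "linear" if (high - low) <= tolerance else "nonlinear"
    if low >= 0:
        sign = "positive"
    elif high <= 0:
        sign = "negative"
    else:
        sign = "mixed"
    return shape + ", " + sign


# Hypothetical derivative samples for y = x0*x1 + x2 plus an unused input x3.
dy_dx0 = [0.1, 0.5, 0.9]   # equals x1 at each data item, so it varies
dy_dx2 = [1.0, 1.0, 1.0]   # constant, so x2 acts linearly
dy_dx3 = [0.0, 0.0, 0.0]   # x3 does not affect y
summary = {
    "x0": describe_variable(dy_dx0),
    "x2": describe_variable(dy_dx2),
    "x3": describe_variable(dy_dx3),
}
```

In histogram terms: a spike at zero is ignorable, a spike elsewhere is linear, and a spread-out histogram signals a relationship that needs the later steps.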
Step2: 1st Order Partial Derivatives vs. Original Dimensions Scatterplots
Information: Entangled?
Dataset: y0 = x0x1 + x2, 1000 data items
Coordinated Visual Exploration
Model Construction