Daniel Keim
University of Konstanz
Visual Data Mining: Problems and Applications
Summer School of the Graduiertenkolleg Exploration and Visualization of Large Data Sets
St. Vigil, 16. September 2004
[Figure: overview of research areas, topics, and application areas]
Research areas: Databases; Visual Data Exploration; Data Mining
Topics and systems: Multimedia Sim. Search; High-dim. Indexing; Interactive Queries; NN-Queries; Circle Seg.; X-Tree; Tree Stripping; Cost Models; Denclue; OptiGrid; HD-Eye; InfoFusion; Rec. Pattern; Pixel Bar Charts; VisualPoints; VisDB; CartoDraw; GRADI; Image SimSearch; 2D Partial SimSearch; Geometric 3D SimSearch; VisOpt; Clustering NN-Search in High-dim. DBs; Visual Interfaces
Application areas: Car Industry; Molecular Biology; Medicine; Image; 3D-Objects; CAD; Telcom; Marketing; Monitoring; Cartography; Fraud Detection; Finance
Daniel A. Keim: Visual Data Mining3
Overview
1. Introduction
2. Visualization of Bibliographic Data
   1. InfoVis Contest
   2. DBLP (Trier Bibliographic Database)
3. Pixel Bar Charts
   1. Problem
   2. Solution
   3. Application
4. Other Problems and Applications
   1. SOMs of 3D Feature Vectors
   2. Route Visualization
   3. E-mail / SPAM Visualization
5. Conclusions
InfoVis Contest
Coauthors of G. Robertson
Coauthors of B. Shneiderman
Coauthors of D. Keim
DBLP – Trier Bibliographic Database
All Profs
[Figure labels: Keim, Saupe, Reiterer, Berthold, Scholl, Waldvogel, Deussen, Kuhlen, Leue, Brandes]
Prof. U. Brandes
Prof. M. Berthold
Prof. R. Kuhlen
Prof. O. Deussen
Prof. D. Keim
Prof. S. Leue
Prof. H. Reiterer
Prof. D. Saupe
Prof. M. Scholl
Prof. M. Waldvogel
All Profs
[Figure labels: Keim, Saupe, Reiterer, Berthold, Scholl, Waldvogel, Deussen, Kuhlen, Leue, Brandes]
Overview
1. Introduction
2. Visualization of Bibliographic Data
   1. InfoVis Contest
   2. DBLP (Trier)
3. Pixel Bar Charts
   1. Problem
   2. Solution
   3. Application
4. Other Problems and Applications
   1. SOMs of 3D Feature Vectors
   2. Route Visualization
   3. E-mail / SPAM Visualization
5. Conclusions
From Bar Charts to Pixel Bar Charts
[Figure: Equal-Width Pixel Bar Chart]
From Bar Charts to Pixel Bar Charts
[Figure: Equal-Height Pixel Bar Chart; x-axis: Product Type]
From Bar Charts to Pixel Bar Charts
Multi Pixel Bar Charts
[Figure: three pixel bar charts with x-axis Product Type and y-axis $ Amount, colored by quantity, number of visits, and dollar amount; color scale from low to high]
Definition of Pixel Bar Charts
A pixel bar chart is defined by a five-tuple
<Dx, Dy, Ox, Oy, C>
where Dx, Dy, Ox, Oy, C ∈ {A1, …, Ak} and
• Dx / Dy are the dividing attributes on the x-/y-axes
• Ox / Oy are the ordering attributes on the x-/y-axes
• C defines the coloring attribute(s)
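As an illustration of the five-tuple, a minimal Python sketch (not taken from the slides) might hold the specification in a small data class and use the dividing attributes to split records into bars. The class name, the field names, `split_into_bars`, and the sample records are all hypothetical; `None` stands in for an attribute that is not used.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PixelBarChartSpec:
    """Hypothetical container for the five-tuple <Dx, Dy, Ox, Oy, C>."""
    dx: Optional[str]  # dividing attribute on the x-axis (None = unused)
    dy: Optional[str]  # dividing attribute on the y-axis (None = unused)
    ox: Optional[str]  # ordering attribute on the x-axis
    oy: Optional[str]  # ordering attribute on the y-axis
    c: str             # coloring attribute


def split_into_bars(records, spec):
    """Group records by the values of the dividing attributes Dx and Dy."""
    bars = defaultdict(list)
    for record in records:
        key = (
            record[spec.dx] if spec.dx is not None else None,
            record[spec.dy] if spec.dy is not None else None,
        )
        bars[key].append(record)
    return dict(bars)


# Made-up sample data, for illustration only.
records = [
    {"product_type": 1, "dollar_amount": 10.0, "region": 3},
    {"product_type": 2, "dollar_amount": 25.0, "region": 1},
    {"product_type": 1, "dollar_amount": 40.0, "region": 2},
]
spec = PixelBarChartSpec(dx="product_type", dy=None,
                         ox="dollar_amount", oy="region", c="dollar_amount")
bars = split_into_bars(records, spec)
```

Each key of `bars` identifies one bar; the ordering and coloring attributes only come into play when the pixels inside a bar are positioned.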
Definition of Pixel Bar Charts
Basic Idea of Pixel Bar Charts:
Dividing Attribute on x-Axis (e.g. Product Type): Dx
Definition of Pixel Bar Charts
Basic Idea of Pixel Bar Charts:
Dividing Attribute on y-Axis (e.g. Region): Dy
[Figure labels: Dx, Dy]
Definition of Pixel Bar Charts
Basic Idea of Pixel Bar Charts:
Ordering Attributes on x- and y-Axes: Ox, Oy
[Figure labels: Ox, Oy, C]
Definition of Pixel Bar Charts
Basic Idea of Multi Pixel Bar Charts:
[Figure: Multi-Pixel Bar Charts, four charts with coloring attributes C1, C2, C3, C4]
Definition of Pixel Bar Charts
Formalization of the Pixel Positioning Problem:
1. Dense Display Constraint
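$$\forall i = 1..w,\ \forall j = 1..\lfloor p/w \rfloor:\ \exists\, d_k \text{ with } f(d_k) = (i, j)$$
(equal-width Pixel Bar Chart)
$$\forall i = 1..\lfloor p/h \rfloor,\ \forall j = 1..h:\ \exists\, d_k \text{ with } f(d_k) = (i, j)$$
(equal-height Pixel Bar Chart)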
Definition of Pixel Bar Charts
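2. No-Overlap Constraint
$$\forall d_i, d_j \in DB:\ i \neq j \Rightarrow f(d_i) \neq f(d_j)$$
3. Locality Constraint
$$\sum_{x=1}^{w} \sum_{y=1}^{h-1} sim\big(f^{-1}(x,y),\, f^{-1}(x,y+1)\big) + \sum_{x=1}^{w-1} \sum_{y=1}^{h} sim\big(f^{-1}(x,y),\, f^{-1}(x+1,y)\big) \rightarrow \min$$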
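The first three constraints can be made concrete with a small check, sketched here under assumptions that the slides do not spell out: `f` is a dictionary from data item to a 1-based pixel position `(x, y)`, `p` is the number of items in one bar, `w` and `h` are the bar width and height, and `sim` is a caller-supplied function that is treated as a dissimilarity because the slide minimizes it. All function names are hypothetical.

```python
def no_overlap(f):
    """No-overlap constraint: distinct items occupy distinct pixels."""
    positions = list(f.values())
    return len(set(positions)) == len(positions)


def dense_equal_width(f, w):
    """Dense display constraint for an equal-width bar:
    every pixel (i, j) with i in 1..w and j in 1..floor(p / w) is used."""
    p = len(f)
    occupied = set(f.values())
    return all((i, j) in occupied
               for i in range(1, w + 1)
               for j in range(1, p // w + 1))


def locality_cost(f, w, h, sim):
    """Value of the locality objective for a completely filled w x h bar."""
    inv = {pos: item for item, pos in f.items()}  # plays the role of f^-1
    vertical = sum(sim(inv[(x, y)], inv[(x, y + 1)])
                   for x in range(1, w + 1) for y in range(1, h))
    horizontal = sum(sim(inv[(x, y)], inv[(x + 1, y)])
                     for x in range(1, w) for y in range(1, h + 1))
    return vertical + horizontal


# Made-up example: four items with one numeric value each, on a 2 x 2 bar.
values = {"a": 1, "b": 2, "c": 3, "d": 5}
f = {"a": (1, 1), "b": (2, 1), "c": (1, 2), "d": (2, 2)}
cost = locality_cost(f, 2, 2, lambda p, q: abs(values[p] - values[q]))
```

In this example the vertical neighbor pairs contribute 2 + 3 and the horizontal pairs contribute 1 + 2, so `cost` is 8.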
Definition of Pixel Bar Charts
4. Ordering Constraint
$$\forall i, j \in 1..n:\ a_i^1 > a_j^1 \Rightarrow f(d_i).x > f(d_j).x$$
$$\forall i, j \in 1..n:\ a_i^2 > a_j^2 \Rightarrow f(d_i).y > f(d_j).y$$
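[Formula only partly recoverable from the slide: a minimization objective (→ min) made up of two double sums over the pixel positions, $\sum_{x=1}^{w}\sum_{y=1}^{h-1}$ and $\sum_{x=1}^{w-1}\sum_{y=1}^{h}$, over differences of the ordering-attribute values $a^1$ and $a^2$ between vertically adjacent pixels $f^{-1}(x,y)$, $f^{-1}(x,y+1)$ and horizontally adjacent pixels $f^{-1}(x,y)$, $f^{-1}(x+1,y)$.]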
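The two implications of the ordering constraint can be tested directly on a given placement. The sketch below is only an illustration: `a1` and `a2` are assumed to be lists holding the two ordering-attribute values per item, `f` a list of 1-based `(x, y)` positions in the same item order, and `ordering_violations` is a hypothetical helper name.

```python
from itertools import permutations


def ordering_violations(a1, a2, f):
    """Count ordered pairs (i, j) that break one of the two implications:
    a1[i] > a1[j] should imply x_i > x_j, and
    a2[i] > a2[j] should imply y_i > y_j."""
    violations = 0
    for i, j in permutations(range(len(f)), 2):
        if a1[i] > a1[j] and not f[i][0] > f[j][0]:
            violations += 1
        if a2[i] > a2[j] and not f[i][1] > f[j][1]:
            violations += 1
    return violations


# Made-up example on a 2 x 2 bar.
a1 = [1, 2, 1, 2]
a2 = [1, 1, 2, 2]
good = [(1, 1), (2, 1), (1, 2), (2, 2)]     # x follows a1, y follows a2
swapped = [(2, 1), (1, 1), (1, 2), (2, 2)]  # first two items exchanged
```

Because the implications use strict inequalities, a bar with more items than columns generally cannot satisfy them exactly, so a count like this is mainly useful for comparing placements.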
Pixel Placement Algorithm
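1. Step:
Determine the $\left[\frac{1}{w}, \ldots, \frac{w-1}{w}\right]$-quantiles of Ox
and the $\left[\frac{1}{h}, \ldots, \frac{h-1}{h}\right]$-quantiles of Oy.
[Figure: partitions X1 … Xw along the x-axis (width w) and Y1 … Yh along the y-axis (height h)]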
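One simple way to obtain such partitions is to sort the items by the ordering attribute and cut the sorted list into equally sized runs. The sketch below uses this rank-based approximation of the quantiles, which is an assumption rather than the method stated on the slide; `quantile_partitions` is a hypothetical name.

```python
def quantile_partitions(items, key, k):
    """Split items into k partitions of (nearly) equal size, ordered by key.

    Partition i holds the items whose rank lies between the i/k and the
    (i + 1)/k quantile of the key values (rank-based approximation).
    """
    ordered = sorted(items, key=key)
    n = len(ordered)
    return [ordered[(i * n) // k:((i + 1) * n) // k] for i in range(k)]


# Made-up example: ten values of an ordering attribute, cut into w = 3 parts.
xs = [7, 3, 9, 0, 4, 8, 1, 6, 2, 5]
x_parts = quantile_partitions(xs, key=lambda v: v, k=3)
```

Applied once with the x-axis ordering attribute and `k = w`, and once with the y-axis ordering attribute and `k = h`, this yields candidates for X1 … Xw and Y1 … Yh.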
Pixel Placement Algorithm
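2. Step: Position bottom left corner pixel
$$f^{-1}(1,1) = \left\{ d_s \,\middle|\, \min_{d_s \in X_1} \{ d_s.a^2 \} \right\} \quad \text{or} \quad f^{-1}(1,1) = \left\{ d_s \,\middle|\, \min_{d_s \in Y_1} \{ d_s.a^1 \} \right\}$$
Position left and bottom pixels
$$f^{-1}(i,1) = \left\{ d_s \,\middle|\, \min_{d_s \in X_i} \{ d_s.a^2 \} \right\} \quad \forall i = 1..w$$
$$f^{-1}(1,j) = \left\{ d_s \,\middle|\, \min_{d_s \in Y_j} \{ d_s.a^1 \} \right\} \quad \forall j = 1..h$$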
Pixel Placement Algorithm
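3. Step: Position remaining pixels
$$f^{-1}(i,j) = \left\{ d_s \,\middle|\, \min_{d_s \in X_i \cap Y_j} \{ sum(d_s.a^1, d_s.a^2) \} \right\} \quad \text{if } X_i \cap Y_j \neq \emptyset$$
If $X_i \cap Y_j = \emptyset$, consider
$$d_s \in (X_i \cup X_{i+1}) \cap Y_j$$
$$d_s \in (X_i \cup X_{i+1}) \cap (Y_j \cup Y_{j+1})$$
. . .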
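The placement rule for the remaining pixels can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm as implemented by the authors: it applies the rule of step 3 to every pixel (including the border pixels that step 2 treats separately), uses rank-based partitions for X_i and Y_j, assumes exactly w·h items, keeps widening the candidate region in the alternating pattern the slide only begins to list, and falls back to any unplaced item when the widened regions are empty. The name `place_pixels` and the sample data are hypothetical.

```python
def place_pixels(items, w, h):
    """items: list of (a1, a2) pairs; returns {(i, j): item index}, 1-based."""
    n = len(items)
    if n != w * h:
        raise ValueError("this sketch assumes exactly w * h items")

    # Rank-based partition index of each item: col ~ X_i, row ~ Y_j (0-based).
    by_a1 = sorted(range(n), key=lambda s: items[s][0])
    by_a2 = sorted(range(n), key=lambda s: items[s][1])
    col = {s: rank * w // n for rank, s in enumerate(by_a1)}
    row = {s: rank * h // n for rank, s in enumerate(by_a2)}

    def cost(s):
        # sum(d_s.a1, d_s.a2); the index breaks ties deterministically
        return (items[s][0] + items[s][1], s)

    unplaced = set(range(n))
    grid = {}
    for j in range(h):
        for i in range(w):
            chosen = None
            for grow in range(max(w, h)):
                # Widen first in x, then in y: X_i ∩ Y_j,
                # (X_i ∪ X_i+1) ∩ Y_j, (X_i ∪ X_i+1) ∩ (Y_j ∪ Y_j+1), ...
                for dx, dy in ((grow, max(grow - 1, 0)), (grow, grow)):
                    candidates = [s for s in unplaced
                                  if i <= col[s] <= i + dx
                                  and j <= row[s] <= j + dy]
                    if candidates:
                        chosen = min(candidates, key=cost)
                        break
                if chosen is not None:
                    break
            if chosen is None:  # fallback, not part of the slide
                chosen = min(unplaced, key=cost)
            unplaced.remove(chosen)
            grid[(i + 1, j + 1)] = chosen
    return grid


# Made-up example: four items on a 2 x 2 bar.
grid = place_pixels([(1, 1), (2, 2), (3, 3), (4, 4)], w=2, h=2)
```

With perfectly correlated attributes, as in this example, most cells X_i ∩ Y_j are empty, so the widening and the fallback do most of the work; less correlated data fills the cells more evenly.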
Application
Multi-Pixel Bar Chart for Mining 106,199 Customer Buying Transactions
(Dx = product type, Dy = ⊥, Ox = dollar amount, Oy = region, C)
[Figure: four pixel bar charts with x-axis Product Type, with C = Dollar amount, C = No. of Visits, C = Quantity, and C = Region; color scale from low to high, plus a legend numbered 1 to 10]
Application
Multi-Pixel Bar Chart for Mining 405,000 Sales Transaction Records
(Dx = product type, Dy = ⊥, Ox = no. of visits, Oy = dollar amount, C)
[Figure: three pixel bar charts with x-axis Product Type, with C = Dollar Amount, C = No. of Visits, and C = Quantity; color scale from low to high. Annotations: customer A $345,000; customer A 25 visits; customer A 500 items]
Application
Pixel Bar Charts for Mining over 150,000 E-Customer Purchasing Activities (in 1999)
(Dx = month, Dy = ⊥, Ox = no. of visits, Oy = purchase amount, C)
[Figure: four pixel bar charts over months 1 to 12, with C = Month, C = Purchase Amount, C = # of Visits, and C = Quantity; color scale from low to high. Annotations: customer A in September; customer A purchase $1,500,000; customer A visits 125 times; customer A purchase 200 items; high quantity; high visits; top dollar amount customers; highest number of customers]
Application
Pixel Bar Charts for Mining 106,199 IT Resource Center Customer Activities
IT Resource Center – Search http://itrc.hp.com
(Dx = Search Criteria, Dy = ⊥, Ox = No. of Keywords, Oy = Search Type, C)
[Figure: three pixel bar charts with x-axis Search Criteria (1 to 4) and y-axis Type, with C = Search Criteria, C = Search Type, and C = No. of Keywords; color scale from low to high. Annotations: fix/solve problem; patch search; search criteria = boolean; type = product search]
Application
Parallel Coordinate Visualization of 150,000 E-Customer Purchasing Activities
[Figure: two parallel coordinate plots with the axes Month, Purchase Amount, Quantity, # of Orders, Region, State Code; one with focus on one Month (June), one with focus on one Purchase Amount (Medium Price)]
Overview
1. Introduction
2. Visualization of Bibliographic Data
   1. InfoVis Contest
   2. DBLP (Trier)
3. Pixel Bar Charts
   1. Problem
   2. Solution
   3. Application
4. Other Problems and Applications
   1. SOMs of 3D Feature Vectors
   2. Route Visualization
   3. E-mail / SPAM Visualization
5. Conclusions
SOMs of 3D Feature Vectors
3D-Retrieval on CAD Database
Upper Rows: Combination of 4 Feature Vectors
Lower Rows: Best Single FV
3D-Retrieval on Internet Database
• Upper Row: Depth Buffer
• Middle Row: Silhouette
• Lower Row: Linear Combination of Depth Buffer and Silhouette
3D-Retrieval on Internet Database
Clustering of CAD Database
Clustering of Internet Database
Route Visualization
E-mail / SPAM Visualization
E-Mail Visualization
Powerwall
The End
Department of Computer and Information Science (Fachbereich Informatik und Informationswissenschaften)
Databases & Visualization: Prof. Dr. D. A. Keim
Databases & Information Systems: Prof. Dr. M. Scholl
Computer Graphics & Media Informatics: Prof. Dr. O. Deussen
Information Science: Prof. Dr. R. Kuhlen
Algorithms & Data Structures: Prof. Dr. U. Brandes
Multimedia Signal Processing: Prof. Dr. D. Saupe
Human-Computer Interaction: Prof. Dr. H. Reiterer
Bioinformatics & Information Mining: Prof. Dr. M. R. Berthold
Software Engineering: Prof. Dr. S. Leue
Computer Networks: Prof. Dr. M. Waldvogel