Daniel Keim

University of Konstanz

Visual Data Mining: Problems and Applications

Summer School of the Graduiertenkolleg Exploration and Visualization of Large Data Sets

St. Vigil, 16 September 2004

[Figure: overview of the research areas Databases, Visual Data Exploration, and Data Mining. Topics: Multimedia Similarity Search, High-dim. Indexing, Interactive Queries, NN-Queries, Clustering, NN-Search in High-dim. DBs, Visual Interfaces. Techniques and systems: Circle Segments, X-Tree, Tree Striping, Cost Models, Denclue, OptiGrid, HD-Eye, InfoFusion, Recursive Pattern, Pixel Bar Charts, VisualPoints, VisDB, CartoDraw, GRADI, Image SimSearch, 2D Partial SimSearch, Geometric 3D SimSearch, VisOpt. Application areas: Car Industry, Molecular Biology, Medicine, Image, 3D-Objects, CAD, Telecom, Marketing, Monitoring, Cartography, Fraud Detection, Finance.]

Daniel A. Keim: Visual Data Mining

Overview

1. Introduction

2. Visualization of Bibliographic Data
   1. InfoVis Contest
   2. DBLP (Trier Bibliographic Database)

3. Pixel Bar Charts
   1. Problem
   2. Solution
   3. Application

4. Other Problems and Applications
   1. SOMs of 3D Feature Vectors
   2. Route Visualization
   3. E-mail / SPAM Visualization

5. Conclusions

InfoVis Contest

Coauthors of G. Robertson

Coauthors of B. Shneiderman

Coauthors of D. Keim

DBLP – Trier Bibliographic Database

All Profs

[Figure labels: Keim, Saupe, Reiterer, Berthold, Scholl, Waldvogel, Deussen, Kuhlen, Leue, Brandes]

Prof. U. Brandes

Prof. M. Berthold

Prof. R. Kuhlen

Prof. O. Deussen

Prof. D. Keim

Prof. S. Leue

Prof. H. Reiterer

Prof. D. Saupe

Prof. M. Scholl

Prof. M. Waldvogel

All Profs

[Figure labels: Keim, Saupe, Reiterer, Berthold, Scholl, Waldvogel, Deussen, Kuhlen, Leue, Brandes]

Overview

1. Introduction

2. Visualization of Bibliographic Data
   1. InfoVis Contest
   2. DBLP (Trier)

3. Pixel Bar Charts
   1. Problem
   2. Solution
   3. Application

4. Other Problems and Applications
   1. SOMs of 3D Feature Vectors
   2. Route Visualization
   3. E-mail / SPAM Visualization

5. Conclusions

From Bar Charts to Pixel Bar Charts

[Figure: Equal-Width Pixel Bar Chart]

[Figure: Equal-Height Pixel Bar Chart; axis label: Product Type]

Multi Pixel Bar Charts

[Figure: pixel bar charts over Product Type (1 2 3 4 5 6 7 10 11 12) with y-axis $ Amount; panels with Color = quantity, Color = number of visits, and Color = dollar amount; color scale from low to high]

Definition of Pixel Bar Charts

A pixel bar chart is defined by a five-tuple

<Dx, Dy, Ox, Oy, C>

where Dx, Dy, Ox, Oy, C ∈ {A1, …, Ak} and

• Dx / Dy are the dividing attributes on the x-/y-axes

• Ox / Oy are the ordering attributes on the x-/y-axes

• C defines the coloring attribute(s)
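
As a minimal sketch, the five-tuple can be written down as a small record type. The class name `PixelBarChartSpec`, the field names, the sample attribute set, and the use of `None` for an unused attribute (written ⊥ on the application slides) are assumptions made for illustration; they are not part of the definition itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PixelBarChartSpec:
    """Hypothetical record for the five-tuple <Dx, Dy, Ox, Oy, C>."""
    dx: Optional[str]       # dividing attribute on the x-axis
    dy: Optional[str]       # dividing attribute on the y-axis (None = unused)
    ox: Optional[str]       # ordering attribute on the x-axis
    oy: Optional[str]       # ordering attribute on the y-axis
    c: Tuple[str, ...]      # coloring attribute(s)

    def validate(self, attributes):
        """Return True if every attribute used is one of A1..Ak."""
        used = [a for a in (self.dx, self.dy, self.ox, self.oy) if a is not None]
        used.extend(self.c)
        return all(a in attributes for a in used)


# Sample attribute set A1..Ak (illustrative names only).
attributes = {"product type", "region", "dollar amount", "no. of visits", "quantity"}

# One bar per product type, ordered by dollar amount (x) and region (y),
# with one chart per coloring attribute.
spec = PixelBarChartSpec(
    dx="product type",
    dy=None,
    ox="dollar amount",
    oy="region",
    c=("dollar amount", "no. of visits", "quantity", "region"),
)
```

Keeping C as a tuple lets the same record describe a single pixel bar chart (one coloring attribute) and a multi-pixel bar chart (several).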

Basic Idea of Pixel Bar Charts:

• Dividing attribute on the x-axis (e.g. Product Type): Dx
• Dividing attribute on the y-axis (e.g. Region): Dy
• Ordering attributes on the x- and y-axes: Ox, Oy
• Coloring attribute: C

Basic Idea of Multi-Pixel Bar Charts:

[Figure: multi-pixel bar charts with one chart per coloring attribute C1, C2, C3, C4]

Formalization of the Pixel Positioning Problem:

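1. Dense Display Constraint

$$\forall i = 1..w,\ \forall j = 1..\lfloor |p| / w \rfloor:\ \exists\, d \text{ with } f(d) = (i, j)$$

(equal-width Pixel Bar Chart)

$$\forall i = 1..\lfloor |p| / h \rfloor,\ \forall j = 1..h:\ \exists\, d \text{ with } f(d) = (i, j)$$

(equal-height Pixel Bar Chart)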
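
The dense display constraint can be checked mechanically for a given placement. The sketch below assumes that one bar holds `p` data items and that the placement `f` is stored as a dict from item id to a 1-based pixel position `(x, y)`; the function names and the sample data are hypothetical.

```python
def is_dense_equal_width(f, w, p):
    """True if every pixel (i, j), i = 1..w, j = 1..floor(p / w), holds an item."""
    used = set(f.values())
    return all((i, j) in used
               for i in range(1, w + 1)
               for j in range(1, p // w + 1))


def is_dense_equal_height(f, h, p):
    """Same check for an equal-height bar: i = 1..floor(p / h), j = 1..h."""
    used = set(f.values())
    return all((i, j) in used
               for i in range(1, p // h + 1)
               for j in range(1, h + 1))


# Five items in a bar of width 2: two full rows plus one pixel in a third row.
f = {"a": (1, 1), "b": (2, 1), "c": (1, 2), "d": (2, 2), "e": (1, 3)}

# The same five items with a hole at (1, 2), which violates the constraint.
f_gap = {"a": (1, 1), "b": (2, 1), "c": (1, 3), "d": (2, 3), "e": (1, 4)}
```

Only the completely filled rows (or columns) are checked, which is why the floor appears in the bounds: the last, partially filled row is allowed to have gaps.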

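2. No-Overlap Constraint

$$\forall d_i, d_j \in DB:\ i \neq j \Rightarrow f(d_i) \neq f(d_j)$$

3. Locality Constraint

$$\sum_{x=1}^{w-1} \sum_{y=1}^{h} sim\big(f^{-1}(x, y),\ f^{-1}(x+1, y)\big) + \sum_{x=1}^{w} \sum_{y=1}^{h-1} sim\big(f^{-1}(x, y),\ f^{-1}(x, y+1)\big) \rightarrow \min$$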
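
A short sketch of the no-overlap check and the locality sum, under stated assumptions: the placement `f` is a dict from item id to a 1-based pixel position `(x, y)`, and `sim` is a user-supplied function that returns small values for similar items (the sum is minimized, so it behaves like a dissimilarity). The function names and the sample data are hypothetical.

```python
def has_no_overlap(f):
    """True if no two distinct items are mapped to the same pixel."""
    positions = list(f.values())
    return len(positions) == len(set(positions))


def locality_cost(f, w, h, sim):
    """Sum of sim over all horizontally and vertically adjacent pixel pairs."""
    inv = {pos: item for item, pos in f.items()}  # f^-1, assumes no overlap
    cost = 0.0
    for x in range(1, w + 1):
        for y in range(1, h + 1):
            here = inv.get((x, y))
            right = inv.get((x + 1, y)) if x < w else None
            above = inv.get((x, y + 1)) if y < h else None
            if here is not None and right is not None:
                cost += sim(here, right)
            if here is not None and above is not None:
                cost += sim(here, above)
    return cost


# Four items with one numeric value each, placed on a 2 x 2 grid.
values = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
f = {"a": (1, 1), "b": (2, 1), "c": (1, 2), "d": (2, 2)}


def sim(s, t):
    """Illustrative dissimilarity: absolute difference of the item values."""
    return abs(values[s] - values[t])
```

For this placement the four neighbor pairs contribute 1 + 2 + 2 + 1, so `locality_cost(f, 2, 2, sim)` evaluates to 6.0.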

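4. Ordering Constraint

$$\forall i, j \in 1..n:\ d_i.a^1 > d_j.a^1 \Rightarrow f(d_i).x > f(d_j).x$$

$$\forall i, j \in 1..n:\ d_i.a^2 > d_j.a^2 \Rightarrow f(d_i).y > f(d_j).y$$

$$\sum_{x=1}^{w-1} \sum_{y=1}^{h} \Big( \big(f^{-1}(x, y).a^1 - f^{-1}(x+1, y).a^1\big) + \big(f^{-1}(x, y).a^2 - f^{-1}(x+1, y).a^2\big) \Big)^2 + \sum_{x=1}^{w} \sum_{y=1}^{h-1} \Big( \big(f^{-1}(x, y).a^1 - f^{-1}(x, y+1).a^1\big) + \big(f^{-1}(x, y).a^2 - f^{-1}(x, y+1).a^2\big) \Big)^2 \rightarrow \min$$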
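
The ordering constraint can be sketched in both forms: the strict pairwise condition and a relaxed cost over neighboring pixels. The relaxed form below reads the slide's sum as squared attribute differences between adjacent pixels, which is an interpretation of the slide rather than a confirmed formula. Items are assumed to be a dict from id to `(a1, a2)`, the placement `f` a dict from id to a 1-based `(x, y)`, and the grid is assumed to be completely filled; all names are hypothetical.

```python
def satisfies_ordering(items, f):
    """Strict form: larger a1 implies larger x, larger a2 implies larger y."""
    for di, (a1_i, a2_i) in items.items():
        for dj, (a1_j, a2_j) in items.items():
            if a1_i > a1_j and not f[di][0] > f[dj][0]:
                return False
            if a2_i > a2_j and not f[di][1] > f[dj][1]:
                return False
    return True


def ordering_cost(items, f, w, h):
    """Relaxed form (assumed reading): squared differences between neighbors."""
    inv = {pos: item for item, pos in f.items()}  # f^-1 on a full w x h grid

    def diff(p, q):
        a1_p, a2_p = items[inv[p]]
        a1_q, a2_q = items[inv[q]]
        return ((a1_p - a1_q) + (a2_p - a2_q)) ** 2

    cost = 0.0
    for x in range(1, w + 1):
        for y in range(1, h + 1):
            if x < w:
                cost += diff((x, y), (x + 1, y))
            if y < h:
                cost += diff((x, y), (x, y + 1))
    return cost


# Four items (a1, a2) on a 2 x 2 grid; a1 grows with x and a2 grows with y.
items = {"a": (1, 1), "b": (2, 1), "c": (1, 2), "d": (2, 2)}
f = {"a": (1, 1), "b": (2, 1), "c": (1, 2), "d": (2, 2)}

# Swapping the two bottom pixels breaks the strict ordering on the x-axis.
f_swapped = dict(f, a=(2, 1), b=(1, 1))
```

The strict form can only hold when items in the same column share their a1 value and items in the same row share their a2 value, which is rare for real data; this is one reason to relax it into a cost that is minimized.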

Pixel Placement Algorithm

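1. Step:

Determine the $\left[\frac{1}{w}, \ldots, \frac{w-1}{w}\right]$-quantiles of Ox and the $\left[\frac{1}{h}, \ldots, \frac{h-1}{h}\right]$-quantiles of Oy.

[Figure: bar of width w and height h, split into the x-partitions X1 . . . Xw and the y-partitions Y1 . . . Yh]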
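
Step 1 amounts to splitting the items into equally sized partitions along each ordering attribute. The sketch below assumes that ties are broken by sort order and that partition sizes may differ by one when the item count is not divisible; the function name and the sample data are hypothetical.

```python
def quantile_partitions(items, key, k):
    """Split items into k partitions of (nearly) equal size, ordered by key.

    Partition t (0-based) holds the items between the t/k and (t+1)/k quantiles.
    """
    ordered = sorted(items, key=key)
    n = len(ordered)
    return [ordered[t * n // k:(t + 1) * n // k] for t in range(k)]


# Hypothetical data: (id, a1, a2), with a1 ordering the x-axis and a2 the y-axis.
data = [("d1", 5, 30), ("d2", 1, 10), ("d3", 4, 20),
        ("d4", 2, 40), ("d5", 3, 60), ("d6", 6, 50)]

X = quantile_partitions(data, key=lambda d: d[1], k=3)  # X1..X3 for w = 3
Y = quantile_partitions(data, key=lambda d: d[2], k=2)  # Y1..Y2 for h = 2
```

Each item then belongs to exactly one x-partition and one y-partition, and the intersections of the two partitionings are the candidate sets for the individual pixels.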

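2. Step: Position bottom left corner pixel

$$f^{-1}(1, 1) = \Big\{ d_s \,\Big|\, \min_{d_s \in X_1} \{ d_s.a^2 \} \Big\} \quad \text{or} \quad f^{-1}(1, 1) = \Big\{ d_s \,\Big|\, \min_{d_s \in Y_1} \{ d_s.a^1 \} \Big\}$$

Position left and bottom pixels

$$f^{-1}(i, 1) = \Big\{ d_s \,\Big|\, \min_{d_s \in X_i} \{ d_s.a^2 \} \Big\} \quad \forall i = 1..w$$

$$f^{-1}(1, j) = \Big\{ d_s \,\Big|\, \min_{d_s \in Y_j} \{ d_s.a^1 \} \Big\} \quad \forall j = 1..h$$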

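3. Step: Position remaining pixels

$$f^{-1}(i, j) = \Big\{ d_s \,\Big|\, \min_{d_s \in X_i \cap Y_j} \{ sum(d_s.a^1, d_s.a^2) \} \Big\} \quad \text{if } X_i \cap Y_j \neq \emptyset$$

If $X_i \cap Y_j = \emptyset$, consider

$$d_s \in (X_i \cup X_{i+1}) \cap Y_j$$

$$d_s \in (X_i \cup X_{i+1}) \cap (Y_j \cup Y_{j+1})$$

. . .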
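
The three steps can be put together in a simplified greedy sketch for a single bar. Several details are assumptions that the slides leave open: pixels are filled row by row, the corner pixel uses the first of the two options (smallest a2 in X1), already placed items are skipped, ties are broken by item id, and when the widened candidate sets stay empty any unplaced item is used. The function name `place_pixels` and the sample data are hypothetical.

```python
def place_pixels(items, w, h):
    """Greedy placement sketch. items: dict id -> (a1, a2); returns (x, y) -> id."""
    n = len(items)
    by_a1 = sorted(items, key=lambda d: (items[d][0], d))
    by_a2 = sorted(items, key=lambda d: (items[d][1], d))
    # Step 1: quantile partitions X1..Xw (by a1) and Y1..Yh (by a2).
    X = [set(by_a1[t * n // w:(t + 1) * n // w]) for t in range(w)]
    Y = [set(by_a2[t * n // h:(t + 1) * n // h]) for t in range(h)]
    free, grid = set(items), {}
    for j in range(h):
        for i in range(w):
            if not free:
                return grid
            if j == 0:      # Step 2, bottom row: smallest a2 within X_i
                dx, dy, score = 1, h, lambda d: items[d][1]
            elif i == 0:    # Step 2, left column: smallest a1 within Y_j
                dx, dy, score = w, 1, lambda d: items[d][0]
            else:           # Step 3, interior: smallest a1 + a2 in X_i and Y_j
                dx, dy, score = 1, 1, lambda d: sum(items[d])
            while True:
                xs = set().union(*X[i:i + dx])
                ys = set().union(*Y[j:j + dy])
                cand = xs & ys & free
                if cand or (i + dx >= w and j + dy >= h):
                    break
                # Widen the search: add X_{i+1} first, then Y_{j+1}, and so on.
                if dx <= dy and i + dx < w:
                    dx += 1
                elif j + dy < h:
                    dy += 1
                else:
                    dx += 1
            if not cand:
                cand = free  # assumption: fall back to any unplaced item
            best = min(cand, key=lambda d: (score(d), d))
            free.discard(best)
            grid[(i + 1, j + 1)] = best
    return grid


# Six hypothetical items (a1, a2) placed into a bar of width 3 and height 2.
items = {"d1": (5, 30), "d2": (1, 10), "d3": (4, 20),
         "d4": (2, 40), "d5": (3, 60), "d6": (6, 50)}
grid = place_pixels(items, 3, 2)
```

The result is a heuristic placement: it respects the no-overlap constraint by construction and fills the bar densely, while the ordering and locality constraints are only approximated.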

Application

Multi-Pixel Bar Chart for Mining 106,199 Customer Buying Transactions
(Dx = product type, Dy = ⊥, Ox = dollar amount, Oy = region, C)

[Figure: four pixel bar charts over Product Type, with C = Dollar Amount, C = No. of Visits, C = Quantity, and C = Region; color scale from low to high; a legend numbered 1 to 10]

Multi-Pixel Bar Chart for Mining 405,000 Sales Transaction Records
(Dx = product type, Dy = ⊥, Ox = no. of visits, Oy = dollar amount, C)

[Figure: three pixel bar charts over Product Type, with C = Dollar Amount, C = No. of Visits, and C = Quantity; color scale from low to high. Annotations for customer A: $345,000, 25 visits, 500 items]

Pixel Bar Charts for Mining over 150,000 E-Customer Purchasing Activities (in 1999)
(Dx = month, Dy = ⊥, Ox = no. of visits, Oy = purchase amount, C)

[Figure: four pixel bar charts over months 1 to 12, with C = Month, C = Purchase Amount, C = # of Visits, and C = Quantity; color scale from low to high. Annotations: customer A in September (purchase of $1,500,000; 125 visits; purchase of 200 items); high quantity; high visits; top dollar amount customers; highest number of customers]

Pixel Bar Charts for Mining 106,199 IT Resource Center Customer Activities
IT Resource Center – Search http://itrc.hp.com
(Dx = Search Criteria, Dy = ⊥, Ox = No. of Keywords, Oy = Search Type, C)

[Figure: three pixel bar charts over Search Criteria (1 to 4) with y-axis Type, with C = Search Criteria, C = Search Type, and C = No. of Keywords; color scale from low to high. Annotations: fix/solve problem; patch search; search criteria = boolean; type = product search]

Parallel Coordinate Visualization of 150,000 E-Customer Purchasing Activities

[Figure: two parallel coordinate plots, each with the axes Month, Purchase Amount, Quantity, # of Orders, Region, State Code; one focuses on one Month (June), the other on one Purchase Amount (Medium Price)]

Overview

1. Introduction

2. Visualization of Bibliographic Data
   1. InfoVis Contest
   2. DBLP (Trier)

3. Pixel Bar Charts
   1. Problem
   2. Solution
   3. Application

4. Other Problems and Applications
   1. SOMs of 3D Feature Vectors
   2. Route Visualization
   3. E-mail / SPAM Visualization

5. Conclusions

SOMs of 3D Feature Vectors

3D-Retrieval on CAD Database

• Upper Rows: Combination of 4 Feature Vectors
• Lower Rows: Best Single FV

3D-Retrieval on Internet Database

• Upper Row: Depth Buffer
• Middle Row: Silhouette
• Lower Row: Linear Combination of Depth Buffer and Silhouette

Clustering of CAD Database

Clustering of Internet Database

Route Visualization

E-mail / SPAM Visualization

E-Mail Visualization

Powerwall

The End

[Figure: organization chart of the Department of Computer and Information Science (Fachbereich Informatik und Informationswissenschaften)]

Research groups: Databases & Information Systems; Algorithms & Data Structures; Information Science; Multimedia Signal Processing; Human-Computer Interaction; Databases & Visualization; Computer Graphics & Media Informatics; Bioinformatics & Information Mining; Software Engineering; Computer Networks

Professors: Prof. Dr. D. A. Keim; Prof. Dr. M. Scholl; Prof. Dr. O. Deussen; Prof. Dr. R. Kuhlen; Prof. Dr. U. Brandes; Prof. Dr. D. Saupe; Prof. Dr. H. Reiterer; Prof. Dr. M. R. Berthold; Prof. Dr. S. Leue; Prof. Dr. M. Waldvogel