Page 1: Data Mining on ICDM Submission Data

ICDM 2004 Business Meeting, 11/4/2004

Shusaku Tsumoto

Ning Zhong and Xindong Wu

Page 2: Data Mining on ICDM Submission Data

38 countries, 445 submissions
– Regular Papers: 39 (9%)
– Short Papers: 66 (14.8%)

High Acceptance Ratio (Regular)
– Germany: 4/15 (26.7%)
– Finland: 2/9 (22.2%)
– USA: 20/109 (18.3%)

Page 3: Country

Country     Regular  Short  Total  Ratio
USA              20     28    109  44.0%
China             3      4     55  12.7%
UK                1      6     39  17.9%
Japan             0      5     28  17.9%
Canada            3      3     25  24.0%
Taiwan            0      1     18   5.6%
Australia         2      1     17  17.6%
Germany           4      5     15  60.0%
France            0      2     14  14.3%
India             1      0     14   7.1%
Singapore         0      3     12  25.0%
Brazil            0      1     12   8.3%
Italy             2      1     10  30.0%
Finland           2      1      9  33.3%
Spain             0      1      7  14.3%
Hong Kong         1      1      6  33.3%
Top 15           39     63    390  26.2%
Total            39     66    445  23.8%
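The Ratio column appears to be (Regular + Short) / Total, while the "High Acceptance Ratio (Regular)" figures use Regular / Total. A minimal Python sketch of both calculations follows, using a few rows copied from the table; the function names are illustrative and not from the slides.

```python
# Counts copied from the country table: (regular, short, total submissions).
COUNTS = {
    "USA": (20, 28, 109),
    "Germany": (4, 5, 15),
    "Finland": (2, 1, 9),
    "Italy": (2, 1, 10),
}


def acceptance_ratio(regular, short, total):
    """Percentage of submissions accepted as regular or short papers."""
    return 100.0 * (regular + short) / total


def regular_ratio(regular, total):
    """Percentage of submissions accepted as regular papers only."""
    return 100.0 * regular / total


if __name__ == "__main__":
    for country, (regular, short, total) in COUNTS.items():
        overall = acceptance_ratio(regular, short, total)
        regular_only = regular_ratio(regular, total)
        print(f"{country:8s} overall {overall:5.1f}%  regular only {regular_only:5.1f}%")
```

Rounded to one decimal place, these rows reproduce the tabulated ratios (for example 60.0% overall and 26.7% regular-only for Germany).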

Page 4: Data Mining on ICDM Submission Data

Top 5 Areas of Submissions:
– Data mining applications
– Data mining and machine learning algorithms and methods
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data pre-processing, data reduction, feature selection and feature transformation
– Soft computing and uncertainty management for data mining

High Acceptance Ratio Areas (Regular+Short)
– Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
– Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
– Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)

Page 5: Topics

Topic                                                                                   Regular  Short  Total  Ratio
Data mining applications                                                                      4     10     84  16.7%
Data mining and machine learning algorithms and methods                                       9     20     81  35.8%
Mining text and semi-structured data, and mining temporal, spatial and multimedia data        3      8     44  25.0%
Data pre-processing, data reduction, feature selection and feature transformation             7      7     35  40.0%
Soft computing and uncertainty management for data mining                                     0      3     34   8.8%
Foundations of data mining                                                                    2      1     26  11.5%
Mining data streams                                                                           3      4     25  28.0%
Human-machine interaction and visual data mining                                              0      1     16   6.3%
Security, privacy and social impact of data mining                                            2      1     15  20.0%
Data and knowledge representation for data mining                                             1      1     12  16.7%
Pattern recognition and trend analysis                                                        0      1     11   9.1%
Complexity, efficiency, and scalability issues in data mining                                 2      2     11  36.4%
Quality assessment and interestingness metrics of data mining results                         2      3     10  50.0%
Statistics and probability in large-scale data mining                                         1      0      9  11.1%
Integration of data warehousing, OLAP and data mining                                         0      1      9  11.1%
Collaborative filtering/personalization                                                       0      2      7  28.6%
Post-processing of data mining results                                                        1      1      7  28.6%
Others                                                                                        2      0      6  33.3%
High performance and parallel/distributed data mining                                         1      0      2  50.0%
Query languages and user interfaces for mining                                                0      0      1   0.0%
Total                                                                                        39     66    445  23.8%

Page 6: Correspondence Analysis (Country vs Final Decision)

[Figure: correspondence analysis plot of countries against final decisions (Reject, Regular, Short). Labeled countries: Slovenia, Japan, Hong Kong, USA, Germany, Italy, India, Finland, UK, France, Canada, Australia. Annotations: r1=0.378, r2=0.177.]

Page 7: Correspondence Analysis (Topics vs Final Decision)

[Figure: correspondence analysis plot of topics against final decisions (Reject, Short, Regular). Labeled topics: Statistics and probability; Security, privacy; Applications; Post-processing; Preprocessing, Feature Selection; High-performance; Quality-assessment; Collaborative Filtering; Soft-computing; DM Methods. Annotations: r1=0.280, r2=0.184.]

Page 8: Correspondence Analysis

Country vs Final Decision
– Regular: Germany, USA
– Short: ?
– Reject: Most of the countries are located near this region.

Topics vs Final Decision
– Regular: Quality Assessment, Preprocessing/Feature Selection
– Short: DM/ML Methods, Collaborative Filtering
– Reject: DM Applications
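For readers unfamiliar with the technique, correspondence analysis can be sketched as a singular value decomposition of the standardized residuals of a contingency table. The Python sketch below builds a country by decision table from a few rows of the country table (Reject is taken as Total minus Regular minus Short) and computes the singular values and principal coordinates. It is an illustration only: the slides do not show how the analysis was configured, the reading of r1 and r2 as the singular values of the first two axes is an assumption, and a five-row subset will not reproduce the reported values.

```python
import numpy as np

# (regular, short, total) for a few rows of the country table.
ROWS = {
    "USA": (20, 28, 109),
    "China": (3, 4, 55),
    "UK": (1, 6, 39),
    "Canada": (3, 3, 25),
    "Germany": (4, 5, 15),
}


def contingency(rows):
    """Build a countries x (Reject, Regular, Short) count matrix."""
    return np.array([[t - r - s, r, s] for r, s, t in rows.values()], dtype=float)


def correspondence_analysis(counts):
    """Return singular values, row coordinates and column coordinates."""
    P = counts / counts.sum()            # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals from the independence model r * c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: scale singular vectors by singular values and masses.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return sv, row_coords, col_coords


if __name__ == "__main__":
    counts = contingency(ROWS)
    sv, row_coords, col_coords = correspondence_analysis(counts)
    print("singular values:", np.round(sv, 3))
    for name, xy in zip(ROWS, row_coords[:, :2]):
        print(f"{name:8s} {xy[0]: .3f} {xy[1]: .3f}")
```

Rows and columns that land close together in the first two coordinates are associated more often than independence would predict, which is how the plots place countries and topics near Regular, Short or Reject.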

Page 9: Rule Mining on ICDM Submission Data

Datasets
– Sample Size: 445
– Attributes: 5
  • Paper No.: ordered by submission date
  • # of Authors
  • # of Characters in Title
  • Country
  • Category
– Analyzed by Clementine 7.1 (and SPSS 12.0J)

Page 10: Rule Mining (C5.0) on ICDM Submission Data

C5.0
– [Topic=Mining semi-structured data,…] & [129 < Paper No. <= 369] => Reject (Confidence 0.87, Support 10)
– [Country=USA] & [Topic=Mining semi-structured data,…] & [Paper No. > 369] & [# of Authors <= 3] => Accept (Confidence 0.667, Support 3)
– [Topic=Preprocessing/Feature Selection] & [# of Authors > 4] => Accept (Confidence 1.0, Support 3)
– Topic, Paper No., # of Authors: Important Features

Page 11: Rule Mining (GRI) on ICDM Submission Data

Generalized Rule Induction
– [# of Authors < 2] & [Paper No. < 120.5] => Rejected (Confidence 96.0%, Support 24)
– [# of Chars in Title < 27] & [Paper No. > 212] => Accepted (Confidence 100%, Support 5)

Paper No., # of Chars in Title, # of Authors: Important Features
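To make the Confidence and Support figures concrete, here is a small Python sketch that evaluates one rule of this form against a list of records. It assumes support is the number of records matching the rule's conditions and confidence is the fraction of those records with the predicted decision; the exact definitions used by Clementine may differ. The `Submission` class, the helper names and the toy records are hypothetical, not the real submission data.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    paper_no: int
    n_authors: int
    title_chars: int
    accepted: bool


def evaluate_rule(records, antecedent, predicts_accept):
    """Return (support, confidence) of 'antecedent => decision' on records."""
    matched = [rec for rec in records if antecedent(rec)]
    support = len(matched)
    if support == 0:
        return 0, None  # confidence is undefined when nothing matches
    hits = sum(1 for rec in matched if rec.accepted == predicts_accept)
    return support, hits / support


def single_author_early(rec):
    """Conditions of the first GRI rule: one author and an early paper number."""
    return rec.n_authors < 2 and rec.paper_no < 120.5


if __name__ == "__main__":
    # Hypothetical toy data: 24 early single-author papers, one of them accepted,
    # plus one later multi-author paper that the rule does not cover.
    records = [Submission(i, 1, 40, accepted=(i == 7)) for i in range(1, 25)]
    records.append(Submission(200, 3, 25, accepted=True))
    support, confidence = evaluate_rule(records, single_author_early, predicts_accept=False)
    print(f"support={support}, confidence={confidence:.3f}")
```

With these made-up records the rule covers 24 papers, 23 of them rejected, so the confidence is 23/24, or about 96%.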

Page 12: Multidimensional Scaling (2004)

[Figure: two-dimensional multidimensional scaling plot of the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No. and Country.]

Page 13: Summary (2004) of Mining on ICDM Submission Data

Do not submit a paper too fast!
– Reflection is needed not only on the contents, but also on the titles.

Mining Text/Web/Semi-structured Data is very popular.

The number of application papers is growing now (but many were rejected).

Strong Topics
– Preprocessing/Feature Selection
– Postprocessing
– Security and Privacy

Several topics are emerging in ICDM 2004:
– Mining Data Streams
– Collaborative Filtering
– Quality Assessment

Page 14: Comparison between 02-04: Review Scores (Box-plot)

[Figure: box plots of review score (axis 0.00 to 5.00) by year (2002, 2003, 2004); point labels 1,169 and 1,176.]

Page 15: Comparison between 02-04: Countries

2002                          2003                          2004
Country     Acceptance Ratio  Country     Acceptance Ratio  Country     Acceptance Ratio
Hong Kong   64.7%             Israel      55.0%             Germany     60.0%
USA         47.9%             Hong Kong   50.0%             USA         44.0%
Canada      45.5%             Japan       37.0%             Finland     33.0%
Finland     33.3%             USA         33.0%             Hong Kong   33.0%
France      33.3%             Germany     32.0%             Italy       30.0%

Page 16: Comparison between 02 and 04: Topics

Top 5 in 2002                     Acceptance Ratio
Graph Mining                      75.0%
Temporal Data                     52.6%
Theory                            42.9%
Text Mining                       42.1%
Rule                              41.7%

Top 5 in 2003                     Acceptance Ratio
Process-centric DM                80.0%
Security, privacy                 57.0%
Statistics and Probability        47.0%
Visual Data Mining                38.0%
Post-processing                   41.7%

Top 5 in 2004                     Acceptance Ratio
Quality Assessment                50.0%
Preprocessing, Feature Selection  40.0%
Complexity/Scalability            36.4%
DM and ML Methods                 35.8%
Collaborative Filtering           28.6%
Post-processing                   28.6%

Page 17: Multidimensional Scaling (2003 and 2004)

[Figure: two multidimensional scaling plots side by side, one for 2003 and one for 2004, each showing the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No. and Country.]

The topological structure with respect to similarities seems not to have changed between 2003 and 2004.
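One way to check a claim like this numerically is to embed each year's dissimilarity matrix with MDS and compare the pairwise distances within the two embeddings, since those distances do not depend on rotation or reflection of the plot. The sketch below uses classical (Torgerson) MDS in NumPy. The slides do not say which MDS variant or dissimilarity measure was used, so both are assumptions, and the point coordinates in the usage example are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, k=2):
    """Embed points in k dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    top = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(top)


def structure_gap(dist_a, dist_b, k=2):
    """Largest difference between pairwise distances of two k-D embeddings."""
    emb_a = classical_mds(dist_a, k)
    emb_b = classical_mds(dist_b, k)
    return np.abs(pairwise_distances(emb_a) - pairwise_distances(emb_b)).max()


if __name__ == "__main__":
    # Hypothetical 2-D configuration standing in for one year's attributes.
    year_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]])
    theta = np.pi / 6
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    year_b = year_a @ rotation.T                 # same structure, rotated plot
    gap = structure_gap(pairwise_distances(year_a), pairwise_distances(year_b))
    print(f"structure gap: {gap:.2e}")
```

A gap near zero means the two configurations differ only by rotation, reflection or translation; a large gap would indicate that the relative positions of the attributes changed.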

Page 18: Data Mining on ICDM Submission Data

Acknowledgements
– Many thanks to
  • PC chairs, Vice Chairs and PC members
  • All the authors
  • All the contributors to ICDM2004
– See you again in ICDM2005!

Page 19: Multidimensional Scaling (2004)

[Figure: two-dimensional multidimensional scaling plot of the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No. and Country.]

Page 20: Multidimensional Scaling (2003)

[Figure: two-dimensional multidimensional scaling plot of the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No. and Country.]