Intelligent Database Systems Lab

Presenter: WU, MIN-CONG

Authors: Yongzheng Zhang, Rajyashree Mukherjee, Benny Soetarman

2012, ACM

Concept Extraction for Online Shopping

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• To provide a more streamlined user experience in shopping-related research, it is critical for e-commerce sites to accurately identify what a Web page is about.

Objectives

• We investigate two concept extraction methods, ACE and KEA, in the online shopping context. We discuss how to upgrade ACE into ICE through major improvements.

Methodology - ACE

ACE

ICE

KEA

Term frequency

ACE components:

• Tokenization
• Concept Miner
• Concept Derivation
• TF Scorer
• HTML Scorer
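
The slides name the ACE components but do not say how any of them computes its result. As a minimal sketch, assuming that candidates are contiguous word n-grams and that the HTML Scorer weights a phrase by the tag it appears in, tokenization and the two scorers might look like this. The tag weights, function names, and sample page are illustrative assumptions, not values from the paper.

```python
import re
from collections import Counter

# Hypothetical tag weights: the slides name an "HTML Scorer" but give no values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, max_len=3):
    """Yield every contiguous phrase of 1..max_len tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def score_candidates(fields, max_len=3):
    """fields maps a tag name to its text; returns phrase -> (tf, html_score)."""
    tf = Counter()
    html_score = Counter()
    for tag, text in fields.items():
        for phrase in ngrams(tokenize(text), max_len):
            tf[phrase] += 1                                # TF scorer: raw count
            html_score[phrase] += TAG_WEIGHTS.get(tag, 1.0)  # HTML scorer: tag weight
    return {p: (tf[p], html_score[p]) for p in tf}

# Made-up page, already split into tag -> text for brevity.
page = {
    "title": "Cell Phone Deals",
    "h1": "Cell phone",
    "body": "Buy a cell phone today",
}
scores = score_candidates(page)
```

Here "cell phone" occurs once in each field, so it collects a count of 3 and the sum of all three tag weights, while "deals" is counted only in the title.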

Methodology - ICE

<a href="http://buy.ebay.com/cell-phone">cell phone</a> <a href="http://www.ebay.com/">Home</a>

cell phone home

cell phone

home
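
One plausible reading of this example is that stripping the tags first merges the two anchor texts into "cell phone home", which lets a spurious phrase such as "phone home" appear, whereas treating each tag as a boundary keeps "cell phone" and "home" apart. The sketch below illustrates that reading with Python's standard `html.parser` module; the class and function names are hypothetical and the slide does not describe the actual implementation.

```python
from html.parser import HTMLParser

class SegmentCollector(HTMLParser):
    """Collect each run of text between tags as its own segment."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.segments.append(text.lower())

def text_segments(html):
    parser = SegmentCollector()
    parser.feed(html)
    parser.close()
    return parser.segments

def bigrams(text):
    """Return all two-word phrases in a piece of text."""
    words = text.split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

snippet = ('<a href="http://buy.ebay.com/cell-phone">cell phone</a> '
           '<a href="http://www.ebay.com/">Home</a>')

segments = text_segments(snippet)   # text split at tag boundaries
flattened = " ".join(segments)      # text with tags simply stripped

across_tags = bigrams(flattened)    # includes a phrase spanning both links
within_tags = [b for s in segments for b in bigrams(s)]
```

Comparing `across_tags` with `within_tags` shows the difference: only the flattened text produces a phrase that crosses from one link into the next.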

Baseball

Professional Baseball

Baseball Players

Professional Baseball Players
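
The four phrases above are exactly the contiguous sub-phrases of "Professional Baseball Players" that contain the word "Baseball". The sketch below enumerates them that way; whether ICE derives its candidates like this is an assumption, and the function name is hypothetical.

```python
def phrases_containing(phrase, term):
    """Return every contiguous sub-phrase of `phrase` that includes `term`."""
    words = phrase.split()
    found = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            window = words[start:end]
            if term in window:
                found.append(" ".join(window))
    return found

variants = phrases_containing("Professional Baseball Players", "Baseball")
```

For a three-word phrase there are six contiguous sub-phrases in total; filtering on the shared term leaves the four shown on the slide.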

Baseball, Professional Baseball, Professional Baseball Players

Baseball Players

Concept Miner scorers:

• HTML Scorer (ACE)
• TF Scorer (ACE)
• Emphasis scorer

overlapping title

Emphasis scorer

Porter stemming

Baseball, baseballs, Baseballs → baseball
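
The effect shown here is that surface variants collapse onto one key, so their counts can be pooled. The sketch below uses a deliberately simplified stand-in (lowercasing plus removal of a trailing plural "s"), not the Porter algorithm itself, whose rule set is much larger and whose stems need not be dictionary words. The function names are hypothetical.

```python
from collections import defaultdict

def normalize(word):
    """Simplified stand-in for a stemmer: lowercase and drop a plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w

def group_variants(words):
    """Group surface forms under their normalized key."""
    groups = defaultdict(list)
    for word in words:
        groups[normalize(word)].append(word)
    return dict(groups)

groups = group_variants(["Baseball", "baseballs", "Baseballs"])
```

All three variants from the slide end up under the single key "baseball".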

Methodology - KEA

• Training data: human-authored keyphrases
• Features: TF-IDF, first appearance
• Classifier: Naïve Bayes model
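
The two features named above can be sketched as follows. This assumes the usual KEA-style definitions: TF-IDF as the phrase's relative frequency in the document times the negative log of its document frequency, and first appearance as the fraction of the document that precedes the phrase. The sample document and corpus counts are made up, and the Naïve Bayes training step is omitted.

```python
import math

def tfidf(phrase_count, doc_length, docs_with_phrase, num_docs):
    """TF x IDF as (count / doc length) * -log2(doc frequency / corpus size)."""
    tf = phrase_count / doc_length
    idf = -math.log2(docs_with_phrase / num_docs)
    return tf * idf

def first_occurrence(tokens, phrase_tokens):
    """Fraction of the document that precedes the phrase's first appearance."""
    n = len(phrase_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase_tokens:
            return i / len(tokens)
    return None  # phrase does not occur in the document

# Made-up document and corpus statistics, for illustration only.
doc = "buy a cell phone today and get a free cell phone case".split()
phrase = ["cell", "phone"]

count = sum(doc[i:i + 2] == phrase for i in range(len(doc) - 1))
features = (
    tfidf(count, len(doc), docs_with_phrase=5, num_docs=50),
    first_occurrence(doc, phrase),
)
```

A classifier trained on human-authored keyphrases would then take such feature pairs as input; a phrase that is frequent in the page, rare in the corpus, and appears early tends to look more like a keyphrase.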

Experiment

Evaluation framework

Document: 100 shopping-related Web pages
Topic: Dell, HP, and Canon

ACE vs. ICE: T, B, λ

ICE vs. KEA

ACE vs. ICE[B,T]

ICE vs. KEA

KEA: 50 Web pages for training

        Precision   Recall   F1-measure
ICE     0.7383      0.8300   0.7815
KEA     0.6583      0.7600   0.7056
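
The F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick check, the sketch below recomputes it from the rounded precision and recall in the table. The results should land close to the reported F1 values, though small gaps are possible because the inputs are rounded and the slides do not say whether the figures were averaged per page.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as given in the table above.
reported = {
    "ICE": (0.7383, 0.8300, 0.7815),
    "KEA": (0.6583, 0.7600, 0.7056),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}
gaps = {name: abs(recomputed[name] - f1) for name, (_, _, f1) in reported.items()}
```

Inspecting `gaps` shows how far each recomputed value sits from the reported one.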

Conclusions

• The experimental results demonstrate that ICE significantly outperforms KEA in concept extraction for online shopping (F1-measure 0.7815 vs. 0.7056).

Comments

• Advantages
– ICE is an unsupervised method that does not need human-authored keyphrases.

• Applications
– Online shopping, concept extraction, automatic keyphrase extraction.
