Min Li; Jun Zhao; Tinglei Huang Intelligent Computing and Integrated Systems (ICISS), 2010...

Research and Design of the Crawler System in a Vertical Search Engine
Min Li; Jun Zhao; Tinglei Huang
Intelligent Computing and Integrated Systems (ICISS), 2010 International Conference
Publication Year: 2010, Page(s): 790 - 792
Speaker: Chang, Kun-Hsiang

description

The crawler system in a vertical search engine should first format a representative sample web page so that it conforms to W3C standards. This allows the processed page to be parsed by the visual XPath generator, so that the desired XPath value can be found.

Transcript of Min Li; Jun Zhao; Tinglei Huang Intelligent Computing and Integrated Systems (ICISS), 2010...

Page 1:

Research and Design of the Crawler System in a Vertical Search Engine

Min Li; Jun Zhao; Tinglei Huang

Intelligent Computing and Integrated Systems (ICISS), 2010 International Conference
Publication Year: 2010, Page(s): 790 - 792

Speaker: Chang, Kun-Hsiang

Page 2:

Outline

Abstract

MODEL AND FRAMEWORK DESIGN OF THE CRAWLER SYSTEM
◦ Workflow diagram of a vertical search engine
◦ Main business logic in the crawler system
◦ Main design patterns in the crawler system
◦ Projects and their dependency diagram of the crawler system

Page 3:

Abstract

The crawler system in a vertical search engine should first format a representative sample web page so that it conforms to W3C standards. This allows the processed page to be parsed by the visual XPath generator, so that the desired XPath value can be found.
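The formatting step described in the abstract can be illustrated with a minimal sketch. It is not the formatter used in the paper: it assumes Python's standard `html.parser` as a stand-in, and the class name `TreeNormalizer`, the synthetic `document` root, and the sample markup are all hypothetical. It only closes unclosed elements and handles void tags; a production crawler would likely rely on a dedicated HTML tidying tool.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TreeNormalizer(HTMLParser):
    """Builds a well-formed element tree from loosely written HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # synthetic root (assumption)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        attributes = {name: (value or "") for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything still open inside it. Stray end tags are ignored.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                return

    def handle_data(self, data):
        current = self.stack[-1]
        children = list(current)
        if children:
            # Text after a child element belongs to that child's tail.
            children[-1].tail = (children[-1].tail or "") + data
        else:
            current.text = (current.text or "") + data


def normalize(html):
    """Return the root of a well-formed tree for the given HTML text."""
    parser = TreeNormalizer()
    parser.feed(html)
    parser.close()
    return parser.root


# Hypothetical sample page: unquoted attributes, a void tag, an unclosed <span>.
SAMPLE = "<div class=item><span class=title>Crawler design</span><img src=a.png><span class=price>42</div>"

root = normalize(SAMPLE)
# Once the tree is well formed, an XPath expression can locate the value.
print(root.find(".//span[@class='price']").text)
```

After normalization the tree can be serialized with `ET.tostring(root)` and reparsed as XML, which is the property an XPath-based tool needs.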

Page 4:

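Vertical search engine: a specialized search engine aimed at one particular industry. It is a subdivision and extension of the general search engine: it consolidates one category of specialized information in the web page repository, extracts the required data field by field in a targeted way, processes it, and returns it to the user in some form.

XPath: XML Path Language, the path language for XML. http://studiesweb.wikidot.com/xml:xpath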
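As a rough illustration of targeted, field-by-field extraction with XPath, the sketch below maps each output field to one XPath expression and evaluates it against a well-formed page. The page fragment, the field names, and the `extract_fields` helper are hypothetical; the code uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed page fragment (illustrative data only).
PAGE = """
<html><body>
  <div id="product">
    <h1>Example item</h1>
    <span class="price">19.90</span>
    <a class="vendor" href="/vendors/7">Example vendor</a>
  </div>
</body></html>
"""

# Hypothetical extraction rules: one XPath expression per output field.
FIELD_RULES = {
    "name": ".//div[@id='product']/h1",
    "price": ".//span[@class='price']",
    "vendor": ".//a[@class='vendor']",
}


def extract_fields(page_xml, rules):
    """Evaluate each rule against the page and return a field -> text record."""
    root = ET.fromstring(page_xml.strip())
    record = {}
    for field_name, xpath in rules.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        # A missing node or empty text is recorded as None.
        record[field_name] = node.text.strip() if has_text else None
    return record


record = extract_fields(PAGE, FIELD_RULES)
print(record)
```

Keeping the rules in a plain mapping means a new site can be supported by supplying new XPath expressions rather than new code.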

Page 5:

Workflow diagram of a vertical search engine

Page 6:

Main business logic in the crawler system

The task configuration parameters are divided into 4 parts:
◦ task basic attributes
◦ path configuration
◦ retrieving rules
◦ schedule configuration
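The four-part task configuration could be modeled along the following lines. This is only a sketch: the slide names the four parts but not their contents, so every class name, field name, and default value below is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BasicAttributes:
    """Task basic attributes (fields are hypothetical)."""
    name: str
    description: str = ""


@dataclass
class PathConfiguration:
    """Path configuration: where the crawl starts and how deep it goes."""
    start_urls: List[str]
    max_depth: int = 1


@dataclass
class ScheduleConfiguration:
    """Schedule configuration: how often the task runs."""
    interval_minutes: int = 60
    enabled: bool = True


@dataclass
class CrawlTask:
    """One crawl task combining the four parts named on the slide."""
    basic: BasicAttributes
    paths: PathConfiguration
    # Retrieving rules, assumed here to map a field name to an XPath expression.
    retrieving_rules: Dict[str, str] = field(default_factory=dict)
    schedule: ScheduleConfiguration = field(default_factory=ScheduleConfiguration)

    def validate(self):
        """Return a list of human-readable problems; empty means usable."""
        problems = []
        if not self.basic.name:
            problems.append("task name is empty")
        if not self.paths.start_urls:
            problems.append("no start URL configured")
        if self.paths.max_depth < 0:
            problems.append("max_depth must not be negative")
        if not self.retrieving_rules:
            problems.append("no retrieving rules configured")
        if self.schedule.interval_minutes <= 0:
            problems.append("schedule interval must be positive")
        return problems


task = CrawlTask(
    basic=BasicAttributes(name="example-task"),
    paths=PathConfiguration(start_urls=["http://example.com/list"]),
    retrieving_rules={"title": ".//h1"},
)
print(task.validate())
```

Splitting the configuration into separate parts lets each one be edited and validated independently before a task is scheduled.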

Page 7:

Main design patterns in the crawler system

Page 8:

Projects and their dependency diagram of the crawler system

Page 9:

Thank you for listening…

Speaker: Chang, Kun-Hsiang