Min Li; Jun Zhao; Tinglei Huang Intelligent Computing and Integrated Systems (ICISS), 2010...
1
Research and Design of the Crawler System in a Vertical Search Engine
Min Li; Jun Zhao; Tinglei Huang
Intelligent Computing and Integrated Systems (ICISS), 2010 International Conference
Publication Year: 2010, Page(s): 790 - 792
Speaker : Chang, Kun-Hsiang
2
Outline
Abstract
Model and framework design of the crawler system
◦ Workflow diagram of a vertical search engine
◦ Main business logic in the crawler system
◦ Main design patterns in the crawler system
◦ Projects and their dependency diagram of the crawler system
3
Abstract
The crawler system in a vertical search engine should first format a representative sample web page so that the page conforms to W3C standards. The processed page can then be parsed by the visual XPath generator, which is used to find the desired XPath value.
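To illustrate why the page has to be well formed before an XPath can be applied, here is a minimal sketch using only Python's standard library. The sample pages, the `extract` and `is_well_formed` helpers, and the XPath expression are all hypothetical and are not taken from the paper. The formatting step itself (turning real-world HTML into standards-conforming markup) is not shown; the sketch only contrasts a page that has already been cleaned up with one that has not.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, assumed to be already formatted into well-formed XHTML.
WELL_FORMED = """<html><body>
  <div class="item"><span class="title">Crawler design</span></div>
  <div class="item"><span class="title">XPath basics</span></div>
</body></html>"""

# Similar markup before cleanup: unquoted attribute value and an unclosed <br>.
MALFORMED = "<html><body><div class=item>Crawler design<br></div></body></html>"


def is_well_formed(page):
    """Return True if the page parses as XML, False otherwise."""
    try:
        ET.fromstring(page)
        return True
    except ET.ParseError:
        return False


def extract(page, xpath):
    """Evaluate an XPath (ElementTree's supported subset) and return the matched text."""
    root = ET.fromstring(page)
    return [element.text for element in root.findall(xpath)]


# XPath that a visual generator might produce for the "title" field (assumed).
TITLE_XPATH = ".//div[@class='item']/span[@class='title']"

titles = extract(WELL_FORMED, TITLE_XPATH)
print(titles)
print(is_well_formed(MALFORMED))
```

The malformed page is rejected by the parser, so no XPath can be evaluated against it until it has been formatted.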
4
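Vertical search engine
A vertical search engine is a specialized search engine for a particular industry. It is a subdivision and extension of the general search engine: it consolidates one specific category of information in the web page repository, extracts the required data field by field, processes it, and returns it to the user in some form.
XPath (XML Path Language): the path language for XML. http://studiesweb.wikidot.com/xml:xpath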
5
Workflow diagram of a vertical search engine
6
Main business logic in the crawler system
The task configuration parameters are divided into 4 parts:
◦ task basic attributes
◦ path configuration
◦ retrieving rules
◦ schedule configuration
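The four-part task configuration could be modelled roughly as follows. This is only a sketch: the four groups come from the slide, but every field name inside them (`name`, `start_url`, `output_dir`, `field_xpaths`, `interval_minutes`) and the `validate` checks are assumptions made for illustration, not the paper's actual parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BasicAttributes:
    name: str       # assumed field
    start_url: str  # assumed field


@dataclass
class PathConfig:
    output_dir: str  # assumed field


@dataclass
class RetrievingRules:
    # Assumed shape: maps a field name to the XPath used to extract it.
    field_xpaths: Dict[str, str] = field(default_factory=dict)


@dataclass
class ScheduleConfig:
    interval_minutes: int = 60  # assumed field


@dataclass
class CrawlTask:
    """One crawl task, grouped into the four parts named on the slide."""
    basic: BasicAttributes
    paths: PathConfig
    rules: RetrievingRules
    schedule: ScheduleConfig

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the task looks usable."""
        problems = []
        if not self.basic.start_url.startswith(("http://", "https://")):
            problems.append("start_url must be an http(s) URL")
        if not self.rules.field_xpaths:
            problems.append("at least one retrieving rule is required")
        if self.schedule.interval_minutes <= 0:
            problems.append("interval_minutes must be positive")
        return problems


task = CrawlTask(
    basic=BasicAttributes(name="books", start_url="http://example.com/list"),
    paths=PathConfig(output_dir="out/books"),
    rules=RetrievingRules(field_xpaths={"title": ".//span[@class='title']"}),
    schedule=ScheduleConfig(interval_minutes=30),
)
print(task.validate())
```

Keeping the four parts as separate types means each can be edited or validated on its own, which matches the slide's division of the parameters.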
7
Main design patterns in the crawler system
8
Projects and their dependency diagram of the crawler system
9
Thank you for listening…
Speaker : Chang, Kun-Hsiang