DCA Mini Project Presentation


Page 1: DCA Mini Project Presentation

8/14/2019 DCA Mini Project Presentation

http://slidepdf.com/reader/full/dca-mini-project-presentation 1/11

A Review on

RESULT SET CATEGORIZATION

Categorizing the Document Search Result Set

By Y. Baktavatchalam (08MW03)

PSG College of Technology


Page 3: DCA Mini Project Presentation


INTRODUCTION

 – Text categorization is the classification of an information resource by its topic (politics, sport, etc.), selected from a predetermined set.

 – Here we search for a given keyword set in a given set of websites and categorize those websites. Given a keyword set, the system determines the documents that are most relevant to it and also retrieves the category to which they belong for that keyword set.

 – Here we search all kinds of textual

PSG College Of  Distributed Component Lab

Page 4: DCA Mini Project Presentation


INTRODUCTION…

 – Each key is associated with a threshold value for ranking the result set. Each category is associated with a corresponding key set and the weights of those keys. Ranking is done by the weights of the document's keys and the occurrence counts of those keys. The categories and their related categories are maintained separately to refine the result set.

 – We perform two independent operations. First, we generate the categories and their related categories. Second, we give keywords to the search engine to search for documents and their corresponding categories.
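The ranking rule described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slides do not give the exact formula, so the score is assumed to be the sum of weight × occurrence count, and the threshold is assumed to be the minimum occurrence count a key needs before it contributes. The names `score_document` and `rank_documents` are hypothetical.

```python
import re
from collections import Counter


def score_document(text, key_weights, thresholds=None):
    """Return an assumed relevance score: sum of weight * occurrence count."""
    thresholds = thresholds or {}
    counts = Counter(re.findall(r"\w+", text.lower()))
    score = 0
    for key, weight in key_weights.items():
        occurrences = counts[key.lower()]
        # Assumption: a key contributes only when it occurs at least
        # `threshold` times (default 1); the slides do not define this.
        if occurrences >= thresholds.get(key, 1):
            score += weight * occurrences
    return score


def rank_documents(documents, key_weights, thresholds=None):
    """Return (name, score) pairs with a non-zero score, best first."""
    scored = [
        (name, score_document(text, key_weights, thresholds))
        for name, text in documents.items()
    ]
    ranked = [(name, score) for name, score in scored if score > 0]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    docs = {
        "a.txt": "cricket match, cricket score",
        "b.txt": "election vote",
        "c.txt": "cricket election",
    }
    print(rank_documents(docs, {"cricket": 80, "score": 40}))
```

Sorting by this score, rather than by filename, date, or size, is what lets the most relevant documents appear first in the result set.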

Page 5: DCA Mini Project Presentation


EXISTING SYSTEM DRAWBACKS

 – The result set is not sorted by relevance; instead it is sorted by filename, date, size, and so on. The user therefore does not get accurate results, and every query searches through the entire given file set, so the response time is very high.

 – Categorized results are not available, so the user cannot tell which category a file belongs to when many files have the same name. The user also cannot tell which files are related to one another.

Page 6: DCA Mini Project Presentation


DESIGN

[Design diagram. Components: Document Finder, Categorizer. Labels: Key Set; Websites (Documents & Pages); Search Keyword; Documents; (Key Set, Category); Categories & Related Categories; Documents + Categories.]

Page 7: DCA Mini Project Presentation


IMPLEMENTATION

• Server Module
  This module contains the following sub-modules: Load Details, Categorizing, and Searching.

• Load Details
  In this module we load categories and their related categories, documents and their categories, and categories and their keys with weights. Each weight is a value from 0 to 100.

• Categorizing
  In this module we categorize the given document using the key set parsed from that document and the corresponding weights relevant to the available categories.

• Searching
  In this module we search for documents and their categories using the given key set.
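A minimal sketch of the Load Details and Categorizing steps, assuming the loaded details are held in plain dictionaries and that a document goes to the category whose key weights sum highest over the document's keys. The slides do not state the exact rule, and the sample categories, keys, and weights (`CATEGORY_KEYS`, `RELATED_CATEGORIES`) and the function name `categorize` are hypothetical.

```python
# Hypothetical loaded details: category -> {key: weight in 0..100}.
CATEGORY_KEYS = {
    "sport": {"cricket": 90, "match": 60, "score": 40},
    "politics": {"election": 90, "vote": 70, "score": 10},
}

# Hypothetical related-category table, kept separately as on the slides.
RELATED_CATEGORIES = {
    "sport": ["games"],
    "politics": ["economy"],
}


def categorize(document_keys, category_keys):
    """Return (best_category_or_None, per-category scores).

    Assumption: a category's score is the sum of the weights of the
    document's keys that appear in that category's key set.
    """
    scores = {
        category: sum(weights.get(key, 0) for key in document_keys)
        for category, weights in category_keys.items()
    }
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    # A document matching no key of any category stays uncategorized.
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    best, scores = categorize({"cricket", "score", "vote"}, CATEGORY_KEYS)
    print(best, scores, RELATED_CATEGORIES.get(best, []))
```

Keeping the related categories in their own table means the search step can widen or narrow a result set without touching the key weights.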

Page 8: DCA Mini Project Presentation


IMPLEMENTATION

• Parser Module
  This module contains the following sub-modules: Load Module and URL Content Grabber Module.

• Load Module
  In this module we load keywords from the server and then retrieve a URL to begin searching.

• URL Content Grabber Module
  Whenever a URL arrives from the server, the parser connects to that URL, retrieves its contents to begin searching, and then collects the key sets from that site.
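The URL Content Grabber step might look roughly like the sketch below, which uses only the Python standard library. It is an assumption about how such a parser could work, not the project's actual code: the names `count_keys` and `grab_and_count` are hypothetical, and the page is assumed to be HTML whose visible text is what gets searched.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_keys(html, keys):
    """Return {key: occurrence count} for the visible text of `html`."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks).lower()
    words = Counter(re.findall(r"\w+", text))
    return {key: words[key.lower()] for key in keys}


def grab_and_count(url, keys, timeout=10):
    """Connect to `url`, retrieve its contents, and count the keys in it."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return count_keys(html, keys)


if __name__ == "__main__":
    page = "<html><body><h1>Cricket score</h1><p>The cricket match.</p></body></html>"
    print(count_keys(page, ["cricket", "vote", "match"]))
```

Separating `count_keys` from the network call keeps the counting logic usable on content that has already been retrieved.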


Page 9: DCA Mini Project Presentation


EVALUATION RESULTS

• Parameters
 – Input Keys: 25
 – Input Files: 15 (average size 15 KB)

• Existing System
 – Time: 5 secs
 – Accuracy: 89% (no categorization)

• Our System
 – Time: 5 secs
 – Accuracy: 92% (with category listing)

Page 10: DCA Mini Project Presentation


CONCLUSION

Thus the user can search for a set of keywords across a list of websites and view the count of each keyword for a particular website. This kind of search is useful for crawling websites from the perspective of specific content.

Page 11: DCA Mini Project Presentation


THANK YOU

