SEARCH ENGINE By Ms. Preeti Patel Lecturer School of Library and Information Science DAVV, Indore E...


SEARCH ENGINE

By
Ms. Preeti Patel
Lecturer
School of Library and Information Science
DAVV, Indore
E mail: [email protected]

Search Engine

Introduction
Components
Types
Functions
Subject directories vs. search engine

Introduction: Search engine

Search engines came into existence in 1994. According to the Yahoo search engine directory (2003), there are over 448 major search engines.

A SE is a searchable database of Internet files collected by a computer program (called a wanderer, crawler, robot, worm or spider).

An index is created from the collected files, e.g. title, full text, size, URL, etc. There are no selection criteria for the collection of files. A SE allows the user to enter keywords and retrieves Web documents from its database that match the keywords entered by the searcher.

The SE doesn’t wait for someone to submit information about a site. It sends a spider (also called a crawler or web crawler) to visit publicly accessible websites, following all the links it comes across and collecting data for the search engine's indexes.

A spider discovers new sites and updates information from sites previously visited. A spider can also be used to check links within websites.
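The link-following behaviour described above can be sketched roughly as follows. This is a minimal illustration, not how any particular search engine works: the `TOY_WEB` dictionary, its example URLs and the `crawl` function are hypothetical, and a real spider would fetch pages over the network rather than read them from memory.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": ["http://missing.example/"],
}

def crawl(seed, web):
    """Follow links breadth-first; return (pages visited in order, broken links)."""
    visited, broken = [], []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in web:
            # The link points at a page that does not exist: a broken link.
            broken.append(url)
            continue
        visited.append(url)
        for link in web[url]:
            if link not in seen:      # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited, broken

pages, dead_links = crawl("http://a.example/", TOY_WEB)
```

Starting from one seed page, the sketch discovers every page reachable through links and also reports the link that leads nowhere, which corresponds to the link-checking use mentioned above.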

Components of SE

A SE might well be called a search engine service or a search service. The components of a SE are the following:

Spider: A program that traverses the Web from link to link, identifying and reading pages.

Index: A Web database containing a copy of each web page gathered by the spider.

SE Mechanism: Software that enables users to query the index and that usually returns results in relevancy-ranked order.
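As a rough sketch of how the index and the search mechanism fit together, the example below builds a word-to-URL index from a few pages and looks up one keyword. The `PAGES` data and the `build_index` and `lookup` functions are hypothetical, and mapping words to URLs is one common simplification rather than the full page copy described above.

```python
import re
from collections import defaultdict

# Hypothetical pages a spider might have gathered (URL -> page text).
PAGES = {
    "http://birds.example/": "Bird migration routes and bird habitats",
    "http://travel.example/": "Seasonal migration of travellers",
    "http://cats.example/": "Cats and their habitats",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def lookup(index, word):
    """Return the sorted URLs for a single keyword (empty list if unseen)."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_index(PAGES)
migration_hits = lookup(INDEX, "Migration")
```

Because the index is built once and only consulted at query time, a lookup does not need to reread any page.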

Types of SE

A SE downloads all the information that a page contains and then examines that information to index key words and phrases that can be used to categorize the sites. SEs can be categorized into three types on the basis of the indexing techniques they employ:

Active SE: It collects all information by itself. It uses a program called a ‘spider’ or ‘Web robot’ to index and categorize web pages as well as websites. The spiders travel around the WWW in search of new sites and add entries to the SE's catalogue.

Passive search engines or subject directories:

SEs of this type are possibly more accurately referred to as directories. A directory doesn’t seek out information by itself; it relies on WWW users to submit details of their favorite sites in order to build up a database. For example, the Yahoo directory has 14 main subject categories, each category has many subcategories, the subcategories have their own subcategories, and so on almost ad infinitum.

Due to the size of the web and its constant transformation, keeping up with important sites in all subject areas is humanly impossible.

Meta search engine: The increasing number of search engines has led to the creation of ‘meta’ search tools. A meta search engine does not catalogue any web page by itself. It simultaneously searches multiple search engines. When a query is put to this type of search engine, it forwards that query to other search engines.

Types of meta search engine

There are two types of meta search engine:

1. One type provides a separate list of results from each engine that was searched. With this type of meta SE, one can retrieve comprehensive, and sometimes overwhelming, results.

2. The other type is more common and returns a single list of results, often with the duplicate hits removed. This type of meta SE always brings the results back to its own site for viewing.

Examples: Metacrawler (www.metacrawler.com), SurfWax (www.surwax.com), Zapmeta (www.zapmeta.com)
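The two kinds of meta search engine described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `engine_a` and `engine_b` are hypothetical stand-ins that return fixed lists, whereas a real meta search engine would forward the query to remote services.

```python
# Hypothetical stand-ins for remote search engines: each takes a query and
# returns an ordered list of result URLs. The fixed lists are illustrative only.
def engine_a(query):
    return ["http://one.example/", "http://two.example/"]

def engine_b(query):
    return ["http://two.example/", "http://three.example/"]

ENGINES = {"EngineA": engine_a, "EngineB": engine_b}

def separate_lists(query, engines):
    """First type: one result list per engine searched."""
    return {name: search(query) for name, search in engines.items()}

def merged_list(query, engines):
    """Second type: a single list with duplicate hits removed."""
    merged, seen = [], set()
    for search in engines.values():
        for url in search(query):
            if url not in seen:       # keep only the first occurrence of a hit
                seen.add(url)
                merged.append(url)
    return merged

per_engine = separate_lists("bird migration", ENGINES)
single = merged_list("bird migration", ENGINES)
```

The merged list keeps each hit once, in the order it was first returned, which is only one of several reasonable ways to combine the lists.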

According to scope, SEs can be divided into the following categories.

General search engine: It covers a range of services and facilities and facilitates Boolean search. Examples: Google, AltaVista, etc.

Regional search engine: It refers to a country-specific search engine for locating varied resources region-wise. Examples: Euro Ferret (Europe), Excite UK, etc.

Subject specific search engine:

It does not attempt to index the entire web. It focuses on searching for websites or pages within a defined subject area, geographical area or type of resource, because a subject specific search engine aims for depth of coverage within its subject.

Examples:

1. www.123india.com: Regional
2. www.in.altavista.com: Regional
3. www.nauri.com: Employment
4. www.zipcode.com: Weather
5. www.khoj.com: India specific

Features of SE

When using a Web search engine by entering more than one word, the space between the words has a logical meaning that directly affects the results of the search. This is known as default syntax. Example: in AltaVista, Infoseek and Excite, a search for the words ‘bird migration’ means that the searcher will get back documents that contain either the word ‘bird’ or the word ‘migration’, or both.

The space between the words defaults to the Boolean OR. This is probably not what the searcher wants: the searcher most likely wants only documents that contain both the words ‘bird’ and ‘migration’.
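The difference between a default of OR and a default of AND can be shown with a short sketch. The `INDEX` contents and the `search` function are hypothetical and do not reproduce the syntax of any named engine.

```python
# Hypothetical index: word -> set of document names containing that word.
INDEX = {
    "bird": {"doc1", "doc2"},
    "migration": {"doc2", "doc3"},
}

def search(query, index, default_operator="OR"):
    """Treat the spaces in the query as the given Boolean operator."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    if not postings:
        return set()
    if default_operator == "OR":
        return set().union(*postings)        # documents with either word
    if default_operator == "AND":
        return set.intersection(*postings)   # documents with every word
    raise ValueError("default_operator must be 'OR' or 'AND'")

either = search("bird migration", INDEX)         # space read as OR
both = search("bird migration", INDEX, "AND")    # space read as AND
```

With OR the query matches all three documents, while with AND it matches only the one document that contains both words.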

SEs return results in schematic order. Most SEs use various criteria to construct a term relevancy rating for each hit and present the search results in this order.

Criteria can include: search terms in the title, URL, first heading or HTML META tag; the number of times the search terms appear in the document; search terms appearing early in the document; search terms appearing close together; etc.
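A toy relevancy rating built from three of the criteria above (term in the title, number of occurrences, term appearing early) might look like the sketch below. The weights, the ten-word cut-off for "early" and the `DOCS` data are arbitrary assumptions, and term proximity is left out for brevity.

```python
import re

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevancy(query, title, body):
    """Score one document; the weights are arbitrary, illustrative choices."""
    terms = set(tokenize(query))
    title_words = set(tokenize(title))
    body_words = tokenize(body)
    score = 0.0
    for term in terms:
        if term in title_words:
            score += 3.0                   # search term appears in the title
        score += body_words.count(term)    # number of times the term appears
        if term in body_words[:10]:
            score += 1.0                   # term appears early in the document
    return score

# Hypothetical documents: name -> (title, body text).
DOCS = {
    "doc1": ("Bird migration", "Bird migration follows seasonal routes."),
    "doc2": ("Garden notes", "A bird visited the garden today."),
}
ranked = sorted(DOCS, key=lambda d: relevancy("bird migration", *DOCS[d]), reverse=True)
```

Sorting by the score in descending order gives the relevancy-ranked presentation of hits.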

SE technology is still developing. A current development in SE technology is the organization of search results by concept, site, domain popularity and linking rather than by relevancy.

The following services are provided by SEs:

Direct Hit ranks according to the sites other searchers have chosen from their results to similar queries.

Google ranks by the number of links from pages ranked high by the service.

Inference Find ranks by concept and top-level domain.

MetaFind sorts results by keywords, alphabetically or by domain.

SEs do not index all the documents available on the web. For example, most SEs cannot index files on password-protected sites, behind firewalls, or on sites configured by the host server to be left alone. Other web pages may not be picked up if no other pages link to them.

SEs rarely contain the most recent documents posted to the Internet; do not look for yesterday's news on a search engine.

The contents of databases will generally not show up in search engine results. A growing amount of valuable information on the Web is generated from databases.

Some SEs allow users to view a display of the retrieved Web sites/Web pages clustered under different topics related to the search terms.
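One common way a host server asks spiders to leave part of a site alone is a robots.txt file. The source does not name this mechanism, so treat the sketch below as an assumption about how the exclusion mentioned above is usually expressed. It uses Python's standard `urllib.robotparser` module; the rules, the site URLs and the `ExampleSpider` name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# Parse the rules from memory instead of fetching them over the network.
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a spider with this (hypothetical) name may fetch each URL.
allowed = parser.can_fetch("ExampleSpider", "http://site.example/index.html")
blocked = parser.can_fetch("ExampleSpider", "http://site.example/private/report.html")
```

A well-behaved spider would skip the second URL, so pages under that path would never reach the index.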

FUNCTIONS OF SE

They search the Internet by using specialized software, called a crawler or robot; these software agents find web pages by following hyperlinks.

These software agents send the cached version of web pages to the repository of the search engine, and the SE keeps an index of the words they find and where (URL) they find them.

They allow users to look for words or combinations of words found in that index.

Diagrammatic representation of Search Engine

[Diagram labels: Crawlers; Different Websites (four boxes); Switch; Indexing Software in search engine; Database of search engine; Search; User Interface]

Subject Directories vs. Search Engine

A subject directory is a service that offers a collection of links to Internet resources submitted by site creators or evaluators and organized into subject categories.