Working of search engine
Uploaded by nikhil-deswal
![Page 1: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/1.jpg)
Working Of “Search Engine”
Nikhil D-1
14BTCSERS033
Maths Assignment
![Page 2: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/2.jpg)
What is a Search Engine?
“A web search engine is a software system that is designed to search for information on the World Wide Web.”
![Page 3: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/3.jpg)
Purpose of Search Engines
Helping people find what they’re looking for:
• Starts with an “information need”
• Converts it to a query
• Gets results
![Page 4: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/4.jpg)
Types of Search Engines
• Search by keywords (e.g. AltaVista, Google)
• Search by categories (e.g. Yahoo)
![Page 5: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/5.jpg)
The Parts of a Search Engine
Spider (or “crawler”)
Index
Search software (an algorithm)
![Page 6: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/6.jpg)
The “spider” or “crawler”
The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled". This is also known as “harvesting”. The spider returns to the site on a regular basis, such as every month or two, to look for changes.
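The visit, read, and follow-links loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SITE` is a hypothetical in-memory site standing in for real HTTP fetches, and the breadth-first visiting order is one reasonable choice, not something the slides specify.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site (path -> HTML). A real spider would download these pages.
SITE = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/blog/post-1">Post</a> <a href="/about">About</a>',
    "/blog/post-1": '<a href="/blog">Back</a>',
    "/orphan": "No page links here, so the spider never finds it.",
}


def crawl(site, start):
    """Visit pages breadth-first from `start`, following links within the site."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        path = queue.popleft()
        html = site.get(path)
        if html is None:
            continue  # broken link: there is nothing to read
        visited.append(path)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


visited = crawl(SITE, "/")
print(visited)
```

Note that `/orphan` is never visited: a spider can only reach pages that some already-visited page links to.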
![Page 7: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/7.jpg)
The Indexer
Everything the spider finds goes into the second part of a search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, this book is updated with the new information.
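A minimal sketch of such an index is shown below. The class name `Index`, the `tokenize` helper, and the word-to-pages lookup table (an inverted index) are assumptions added for illustration; the slides only say that the index keeps a copy of each page and is updated when a page changes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Index:
    """Stores a copy of each page plus a word -> pages lookup table."""

    def __init__(self):
        self.pages = {}                    # url -> stored copy of the page text
        self.postings = defaultdict(set)   # word -> urls of pages containing it

    def add(self, url, text):
        """Add a page, or replace the stored copy if the page has changed."""
        if url in self.pages:
            self._remove(url)
        self.pages[url] = text
        for word in set(tokenize(text)):
            self.postings[word].add(url)

    def _remove(self, url):
        """Forget the old copy of a page and its entries in the lookup table."""
        old_text = self.pages.pop(url)
        for word in set(tokenize(old_text)):
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]


index = Index()
index.add("/a", "Search engines crawl the web")
index.add("/b", "The web is large")
index.add("/a", "Search engines rank pages")  # the spider found a changed page
print(sorted(index.postings["web"]))
```

After the update, the word "web" points only to `/b`, because the new copy of `/a` no longer contains it.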
![Page 8: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/8.jpg)
Search engine software
The search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and ranks them in order of what it believes is most relevant.
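The match-then-rank step can be illustrated with a small sketch. The scoring rule used here (counting how often the query words occur on a page) is a deliberately simple stand-in chosen for this example, not the ranking any real engine uses, and `PAGES` is hypothetical data.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical indexed pages (url -> text).
PAGES = {
    "/spiders": "A spider crawls the web. The spider follows links across the web.",
    "/index": "The index stores a copy of every web page.",
    "/cooking": "Recipes for pasta and soup.",
}


def search(pages, query):
    """Return (url, score) pairs for pages containing every query word, best first."""
    terms = tokenize(query)
    results = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        if terms and all(counts[t] > 0 for t in terms):
            # Assumed score: total number of occurrences of the query words.
            score = sum(counts[t] for t in terms)
            results.append((url, score))
    # Highest score first; ties are broken alphabetically by url.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


print(search(PAGES, "web"))
```

Pages that lack any query word are dropped entirely, and the remaining pages are ordered by score.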
![Page 9: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/9.jpg)
TF-IDF Ranking Algorithm
Term frequency–inverse document frequency (tf–idf) is a numerical statistic intended to reflect how important a word is to a document in a collection. Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance to a user query.
The symbols used in the weighting are:
• wij = weight of term Tj in document Di
• tfij = frequency of term Tj in document Di
• N = number of documents in the collection
• n = number of documents in which term Tj occurs at least once
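The formula itself did not survive in the transcript. With the symbols defined above, the common textbook form is wij = tfij × log(N / n), and the sketch below assumes that form with a natural logarithm. The helper names (`tokenize`, `tf_idf_weights`, `rank`) and the sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_weights(docs):
    """Return weights[i][term] = tf_ij * log(N / n) for every document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    N = len(docs)
    n = Counter()  # term -> number of documents containing it at least once
    for c in counts:
        n.update(c.keys())
    return [{t: tf * math.log(N / n[t]) for t, tf in c.items()} for c in counts]


def rank(docs, query):
    """Order document indices by the summed weight of the query terms."""
    weights = tf_idf_weights(docs)
    terms = tokenize(query)
    scores = [sum(w.get(t, 0.0) for t in terms) for w in weights]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


# Hypothetical three-document collection.
DOCS = [
    "the spider crawls the web",
    "the index stores the web pages",
    "the ranking software sorts the results",
]

weights = tf_idf_weights(DOCS)
print(weights[0])
print(rank(DOCS, "spider web"))
```

A word such as "the" appears in every document, so log(N / n) = log(1) = 0 and it contributes nothing to the ranking, while a rare word such as "spider" receives the largest weight.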
![Page 10: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/10.jpg)
PageRank
• The equation: PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))
• Used by WebQuery and Google
• Google simulates users using the search engine to rank documents.
• Google uses a citation graph (518 million links)
• Google computes PageRank for 26 million pages in a few hours.
![Page 11: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/11.jpg)
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
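The PageRank equation PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn)) can be evaluated by repeated substitution. In it, t1 … tn are the pages that link to A, C(t) is the number of links going out of page t, and d is a damping factor between 0 and 1. The sketch below is a minimal illustration: the value d = 0.85 is a commonly quoted choice assumed here, and the four-page link graph and the fixed iteration count are hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to A."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pr = {page: 1.0 for page in pages}  # assumed starting value for every page
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page t passes on its rank split across its C(t) out-links.
            incoming = sum(
                pr[t] / len(links[t])
                for t in pages
                if page in links.get(t, ())
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C comes out on top because three pages link to it, and page A ranks second even though it has only one inbound link, because that link comes from the highly ranked C. Page D, which nothing links to, stays at the minimum value 1 - d.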
![Page 12: Working of search engine](https://reader036.fdocuments.net/reader036/viewer/2022081605/586fa54e1a28abcc238b7e4f/html5/thumbnails/12.jpg)
The End
Thank you for listening patiently.