Searching The Web Search Engines are computer programs (variously called robots, crawlers, spiders,...

  • date post

    21-Dec-2015

Searching The Web

• Search engines rely on computer programs (variously called robots, crawlers, spiders, or worms) that automatically visit Web sites. Starting with the home page, these programs follow all the internal links in the site, visit every Web page at the site, read every word on every page, and create an index of these words.
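The crawl-and-index process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the site is a hypothetical in-memory dictionary (`SITE`) rather than real Web pages, and `crawl` and `build_index` are illustrative names, not any search engine's actual code. The crawler starts at the home page, follows internal links breadth-first, and records which pages contain each word.

```python
import re
from collections import deque

# Hypothetical in-memory site: URL -> (page text, internal links on that page).
# A real crawler would download each page over HTTP and extract its <a href> links.
SITE = {
    "/": ("Welcome to the university home page", ["/about", "/news"]),
    "/about": ("About the university and its history", ["/", "/contact"]),
    "/news": ("Latest news from the campus", ["/about"]),
    "/contact": ("Contact the university", []),
}


def crawl(site, home="/"):
    """Breadth-first crawl from the home page; returns URLs in visit order."""
    visited, queue, order = {home}, deque([home]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        _, links = site[url]
        for link in links:
            # Follow only internal links, and visit each page once.
            if link in site and link not in visited:
                visited.add(link)
                queue.append(link)
    return order


def build_index(site, urls):
    """Map every lower-cased word to the set of pages that contain it."""
    index = {}
    for url in urls:
        text, _ = site[url]
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index


pages = crawl(SITE)
index = build_index(SITE, pages)
print(pages)                        # ['/', '/about', '/news', '/contact']
print(sorted(index["university"]))  # ['/', '/about', '/contact']
```

Breadth-first order is one reasonable choice here; a real crawler would also respect site rules and revisit pages over time.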

Running Effective Searches

• Browsing and Searching are not the same.

• When you browse, you navigate from one Web page to another by following links.

• When you search, you enter keywords in a search engine to display a list of pages that match the keywords.

Word Index versus Subject Directory

• A Word Index database contains billions of pages and, from each page, hundreds or even thousands of words, because a Word Index contains every main word from every page it finds at a Website (small, common words such as in, at, and on are not indexed). The Google database contains every main word for 228,000 pages at the Başkent University websites.

• A Subject Directory is extremely small: it contains only basic subject headings for a few main pages at each Website. For example, the Google Subject Directory database contains only the subject category and page titles of about 54 pages at the Başkent University websites.
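To make the size difference concrete, here is a small sketch assuming a hypothetical two-page site (`PAGES`) and an illustrative stop-word list (`STOP_WORDS`); real engines choose their own stop words. The word index stores every main word of every page, while the subject directory stores only a category and a title for a main page.

```python
import re

# Illustrative stop-word list (an assumption); each engine uses its own.
STOP_WORDS = {"in", "at", "on", "the", "a", "an", "of", "and", "to"}

# Hypothetical pages from one Website.
PAGES = {
    "/": "Welcome to the home page of the university",
    "/library": "The library is open on weekdays and at weekends",
}


def word_index(pages):
    """Word index: every main (non-stop) word of every page -> pages containing it."""
    index = {}
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(url)
    return index


# Subject directory: only a subject category and page title for a few main pages.
SUBJECT_DIRECTORY = {
    "Education > Universities": [("University Home Page", "/")],
}

idx = word_index(PAGES)
print(len(idx), "indexed words")  # 9 indexed words
print(sum(len(v) for v in SUBJECT_DIRECTORY.values()), "directory entry")  # 1 directory entry
```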

Searching for Data

Use a search engine to find data by typing in a word or phrase. The word or phrase is called a keyword and represents the topic you are searching for.
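A keyword search can be thought of as a lookup in the engine's word index. The sketch below assumes a hypothetical index (`INDEX`) and a hypothetical `search` function that treats a multi-word query as requiring every keyword; actual engines interpret queries in their own ways.

```python
# Hypothetical word index (word -> pages), as a search engine might store it.
INDEX = {
    "ataturk": {"/history", "/museum"},
    "education": {"/history", "/faculty"},
    "museum": {"/museum"},
}


def search(keywords, index):
    """Return the pages that contain every keyword (a simple AND query)."""
    words = keywords.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


hits = search("Ataturk education", INDEX)
print(len(hits), sorted(hits))  # 1 ['/history']
```

The number of matching pages corresponds to the "hits" a results page reports.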

• Keyword

• Search Expression

• Query

• Results Page

• Hits

• Sponsored Links

Ranking

• The position of a Web page on the results page is called the site’s ranking.

– The order of the ranking varies according to which search engine is used.

– Search engines only examine their own databases.
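As an illustration of why rankings differ, the sketch below orders pages by one deliberately simple signal: how often the query words occur on each page. The scoring rule, the `rank` function, and the sample `PAGES` are assumptions for illustration only; real search engines combine many signals, and each engine weights them differently.

```python
import re
from collections import Counter

# Hypothetical pages in one engine's database.
PAGES = {
    "/a": "search engines rank pages; search results vary",
    "/b": "a page about web search",
    "/c": "library opening hours",
}


def rank(query, pages):
    """Order matching pages by how often the query words occur (highest first)."""
    terms = query.lower().split()
    scores = {}
    for url, text in pages.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[url] = score
    # Sort by score descending, then by URL for a stable tie-break.
    return sorted(scores, key=lambda u: (-scores[u], u))


print(rank("search", PAGES))  # ['/a', '/b']
```

An engine with a different database or a different scoring rule would produce a different order for the same query.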

Search Engines Differ

Because they:

– use different Web robots (spiders) to collect information

– choose different Web pages to index

– interpret search expressions differently

– store a different amount of text from a Web page in the database

Word Limiters

• The minus ( - ) sign before a word means that the word must not appear on any page in the results.

• To make sure a word is found in the results, put a plus ( + ) sign before it. (Google stopped supporting the plus operator in 2011; there, putting the word in quotes has a similar effect.)

• Phrase matching (" "): putting quotes around a set of words finds only results that contain those words in that exact sequence.
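The three word limiters can be modeled with a small query parser. This is a sketch under assumptions: `parse`, `contains`, and `matches` are hypothetical names, every unsigned word is treated as required (so "+" changes nothing here), and matching is done on whole, lower-cased words. It is not how Google or any other engine actually parses queries.

```python
import re

# One query token: an optional + or - sign, then a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def parse(query):
    """Split a query into (required, excluded) lists of words and phrases."""
    required, excluded = [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        # '+' and bare words are both treated as required in this sketch.
        (excluded if sign == "-" else required).append(term)
    return required, excluded


def contains(text_words, term):
    """True if term (a word or phrase) occurs as a contiguous run of words."""
    t = words(term)
    n = len(t)
    return n > 0 and any(text_words[i:i + n] == t for i in range(len(text_words) - n + 1))


def matches(text, query):
    """True if all required terms appear in text and no excluded term does."""
    required, excluded = parse(query)
    tw = words(text)
    return all(contains(tw, r) for r in required) and not any(contains(tw, e) for e in excluded)


page = "Ataturk founded the university in Ankara"
print(matches(page, 'ataturk -izmir'))            # True
print(matches(page, '"founded the university"'))  # True
print(matches(page, '"university founded"'))      # False
print(matches(page, 'ataturk -ankara'))           # False
```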

Document Section Limiters

• intitle: Finds pages that contain one specified word in the page title, which appears in the title bar of the browser.

• allintitle: Finds pages containing several words in the title. For example, allintitle: ataturk education requires both words to be in the page title.

• inurl: Finds pages with one specific word in the URL.

• allinurl: If you start a query with allinurl:, Google will restrict the results to those pages with all of the query words in the URL. (google-search)

• allintext: Searches only the text in the BODY of the Web page for the words.

• filetype: Finds only a specified file type, such as MS-Word (.doc) or MS-Excel (.xls).
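These limiters can be pictured as filters on separate parts of a page. The sketch assumes a hypothetical `Page` record with `title`, `url`, and `body` fields and a hypothetical `apply_limiter` function that handles one limiter per query; real engines support more combinations and may interpret these operators differently.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    body: str


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def apply_limiter(page, query):
    """Check one page against a query such as 'allintitle: ataturk education'."""
    op, _, rest = query.partition(":")
    op, terms = op.strip().lower(), rest.lower().split()
    if not terms:
        return False
    if op == "filetype":
        return page.url.lower().endswith("." + terms[0])
    fields = {"intitle": page.title, "allintitle": page.title,
              "inurl": page.url, "allinurl": page.url, "allintext": page.body}
    if op not in fields:
        raise ValueError(f"unsupported limiter: {op}")
    field = words(fields[op])
    if op in ("intitle", "inurl"):
        # Only the first word is restricted; other words may appear anywhere on the page.
        anywhere = words(page.title) | words(page.url) | words(page.body)
        return terms[0] in field and all(t in anywhere for t in terms[1:])
    # allintitle:, allinurl:, allintext: restrict every query word to the field.
    return all(t in field for t in terms)


p1 = Page("Ataturk and Education", "www.example.com/history/ataturk.html", "Reforms in education")
p2 = Page("Campus News", "www.example.com/news/report.doc", "Ataturk education day")
print(apply_limiter(p1, "allintitle: ataturk education"))  # True
print(apply_limiter(p2, "allintitle: ataturk education"))  # False
print(apply_limiter(p2, "filetype:doc"))                   # True
```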

Web Directory

Search engines index words in Web pages and then add them to their databases by employing automated programs, such as Web robots.

Real people develop Web directories and decide which Web sites should be added to the directory.

The content in Yahoo’s Web Site Directory is organized by topic

Drill down through directory levels to find Web sites

Some Web directories also include search engine features
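A Web directory can be pictured as a tree of topics. The sketch below assumes a hypothetical, hand-built `DIRECTORY`: the hypothetical `drill_down` function follows category names level by level, as a user clicking through the directory would, and `search_directory` shows the kind of search feature some directories add. The URLs are placeholders.

```python
# Hypothetical, human-edited topic tree: a category maps either to
# subcategories (a dict) or to a list of (site title, URL) entries.
DIRECTORY = {
    "Education": {
        "Universities": {
            "Turkey": [("Başkent University", "www.example.edu")],
        },
        "Libraries": [("Example Library", "www.example.org")],
    },
    "Recreation": {"Travel": [("Example Travel Guide", "www.example.com")]},
}


def drill_down(directory, path):
    """Follow a list of category names from the top level down."""
    node = directory
    for name in path:
        node = node[name]
    return node


def search_directory(directory, keyword, trail=()):
    """Directory search feature: find listed sites whose title contains keyword."""
    found = []
    if isinstance(directory, list):
        for title, url in directory:
            if keyword.lower() in title.lower():
                found.append((" > ".join(trail), title, url))
        return found
    for name, child in directory.items():
        found += search_directory(child, keyword, trail + (name,))
    return found


print(drill_down(DIRECTORY, ["Education", "Universities", "Turkey"]))
print(search_directory(DIRECTORY, "library"))
```

Only the sites that editors placed in the tree can be found, which is the key difference from a word index built by robots.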

Natural Language Searches

A conceptual query is one where the search engine returns only Web pages that are relevant to the topic, even if the words don’t precisely match your keywords.
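One simple way to approximate a conceptual query is to expand each keyword with related words before matching. This is a swapped-in illustration, not the method of any particular engine: the thesaurus (`RELATED`), the pages, and the `concept_search` function are hypothetical, and a page matches if it contains any expanded term.

```python
import re

# Hypothetical thesaurus mapping a word to related concept words.
RELATED = {
    "car": {"automobile", "vehicle"},
    "doctor": {"physician"},
}

# Hypothetical pages.
PAGES = {
    "/1": "Buying a used automobile",
    "/2": "Car insurance explained",
    "/3": "Finding a good physician",
}


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def concept_search(query, pages):
    """Expand each query word with related words, then match pages containing any of them."""
    terms = set()
    for w in words(query):
        terms |= {w} | RELATED.get(w, set())
    return sorted(url for url, text in pages.items() if words(text) & terms)


print(concept_search("car", PAGES))     # ['/1', '/2']
print(concept_search("doctor", PAGES))  # ['/3']
```

A plain keyword search for "car" would find only `/2`; the expansion also finds `/1`, even though that page never uses the word "car".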

Concept-based Search Engines

www.excite.com

www.askjeeves.com

Both can also be queried in natural language.

Metasearch Engines

• Metasearch engines will query several engines simultaneously

– the search will pull results from several search engines

– www.infospace.com

– www.mamma.com
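A metasearch engine can be sketched as sending one query to several engines at once and merging their ranked lists. Here `engine_a` and `engine_b` are hypothetical stubs standing in for real engines, and the merge rule (take the next result from each engine in turn, dropping duplicates) is just one simple choice among many.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest


# Hypothetical stand-ins for real engines; each returns a ranked list of URLs.
def engine_a(query):
    return ["site1.example", "site2.example", "site3.example"]


def engine_b(query):
    return ["site2.example", "site4.example"]


ENGINES = [engine_a, engine_b]


def metasearch(query, engines=ENGINES):
    """Query every engine concurrently, then interleave the ranked lists without duplicates."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps the results in the same order as the engines list.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


print(metasearch("web search"))
# ['site1.example', 'site2.example', 'site4.example', 'site3.example']
```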

Other Electronic Research Resources

• The Web is not the only electronic source of information.

• Among other sources is the Başkent library website, which gives students access to hundreds of other quality databases. These databases are not found by search services such as Google or Yahoo because they are available to registered subscribers only. Başkent pays a fee for these services and then offers them at no cost to our students.