Finding It on the Web

Page 1: Finding It on the Web

Finding It on the Web

Search Engines, Tags, Digg

Page 2: Finding It on the Web

Winter 2008 Learning in Retirement - The Evolution of the Web 2

Introduction

• Every web resource has a name - a unique address, or URL (Uniform Resource Locator), of the form:
  • www.webmama.com or
  • This course

• Underlying the URL is an IP Address, a unique 32-bit number written in dotted decimal form as four numbers from 0 to 255 (up to 12 digits in all), e.g., 134.117.254.227

• Binary 10000110 01110101 11111110 11100011
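
The step from dotted decimal to binary can be checked with a short Python sketch. The function name is illustrative only; the idea is simply that each of the four decimal numbers becomes one group of eight binary digits.

```python
def dotted_to_binary(address: str) -> str:
    """Convert a dotted decimal IPv4 address to four 8-bit binary groups."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    # format(o, "08b") writes the number in binary, padded to 8 digits.
    return " ".join(format(o, "08b") for o in octets)


print(dotted_to_binary("134.117.254.227"))
```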

Page 3: Finding It on the Web

Finding the IP Address

• Finding the IP Address that corresponds to the host name in a URL is the job of an Internet service called the Domain Name System, or DNS.

• That’s all well and good, but what if you don’t know the URL of the site where the information you want is stored?
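
For readers who want to try a DNS lookup themselves, here is a minimal Python sketch using the standard library. The helper names host_of and lookup are made up for this example; socket.gethostbyname hands the question to the computer's own resolver, so it needs a network connection.

```python
import socket
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host name part of a URL, e.g. 'www.webmama.com'."""
    # urlsplit only recognizes the host when a scheme such as http:// is present.
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).hostname or ""


def lookup(url: str) -> str:
    """Ask the system's DNS resolver for an IPv4 address of the URL's host."""
    return socket.gethostbyname(host_of(url))  # needs a network connection
```

Calling lookup("www.webmama.com") returns whatever address the resolver reports at that moment, so no result is shown here.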

Page 4: Finding It on the Web

Search Engine

• That is where Search Engines come into play.

• They help you find a web site, or an item within a web site.

• Some popular search engines are: Google, Yahoo! Search, Ask.com, Live Search, Technorati, Alexa Internet.

Page 5: Finding It on the Web

Types of Search Engines

General - Open Source - Metasearch - Regional - People - Email-based - Visual - Job - Forum - Blog - News - Multimedia - Code - BitTorrent (P2P file transfer) - Accountancy - Medical - Property - Legal - Business - Comparison Shopping - Geographic - Social - Search engines for kids - Desktop search engines - Answer-based - Google-based - Yahoo!-based - Ask.com-based

Page 6: Finding It on the Web

General Search Engines

• Alexa Internet
• Ask.com (formerly Ask Jeeves)
• Exalead
• Gigablast
• Google
• Live Search (formerly MSN Search)
• MozDex
• WiseNut
• Yahoo! Search

Page 7: Finding It on the Web

Market Share – Search Engines

Page 8: Finding It on the Web

Google

• Google's mission is to organize the world's information and make it universally accessible and useful.

• As a first step to fulfilling that mission, Google's founders Larry Page and Sergey Brin developed a new approach to online search that took root in a Stanford University dorm room and quickly spread to information seekers around the globe.

• Google is now widely recognized as the world's largest search engine -- an easy-to-use free service that usually returns relevant results in a fraction of a second.

Page 9: Finding It on the Web

Google Query Handling

1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book - it tells which pages contain the words that match the query.

2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.

3. The search results are returned to the user in a fraction of a second.

http://www.google.ca/intl/en/corporate/tech.html
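
The three steps on this slide can be imitated on a very small scale in Python. This is only a toy sketch, not Google's software: the sample documents, the function names and the snippet width are all invented for the example.

```python
import re
from collections import defaultdict

# Toy "doc server": document id -> stored text (made-up sample documents).
DOCS = {
    1: "Carleton University is located in Ottawa.",
    2: "Search engines index the web so pages can be found.",
    3: "Ottawa is home to more than one university.",
}


def words(text: str) -> list[str]:
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Toy "index server": word -> ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in words(text):
            index[word].add(doc_id)
    return index


def snippet(text: str, word: str, width: int = 20) -> str:
    """Cut a short piece of the document around the first match."""
    pos = text.lower().find(word)
    start = max(0, pos - width)
    return text[start:pos + len(word) + width].strip()


def search(query: str, index, docs) -> list[tuple[int, str]]:
    terms = words(query)
    if not terms:
        return []
    # Step 1: the index says which documents contain every query word.
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    # Step 2: the stored documents are fetched and a snippet is generated.
    # Step 3: the results go back to the user.
    return [(doc_id, snippet(docs[doc_id], terms[0])) for doc_id in sorted(matches)]


INDEX = build_index(DOCS)
print(search("ottawa university", INDEX, DOCS))
```

Like the index at the back of a book, the index is built once in advance, so answering a query only means looking up a few words rather than reading every document.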

Page 10: Finding It on the Web

PageRank Technology

• PageRank reflects Google's view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that Google believes are important pages receive a higher PageRank and are more likely to appear at the top of the search results.

• A link from one page to another counts as a vote for the linked page. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value.

• Important pages receive a higher PageRank and appear at the top of the search results.

• Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.
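
The idea that a vote from an important page counts for more can be sketched with the simplified, textbook form of PageRank. This is an illustration under stated assumptions, not Google's production system: the damping value of 0.85, the number of rounds and the tiny made-up web of seven pages are all choices made for this example.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             rounds: int = 50) -> dict[str, float]:
    """Simplified PageRank: every page listed in `links` shares its rank
    among the pages it links to, and the process is repeated."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Made-up web: A, B and D all link to C, so C is "important".
# E gets its only link from C; G gets its only link from the unknown page F.
LINKS = {
    "A": ["C"], "B": ["C"], "D": ["C"],
    "C": ["E"], "E": [],
    "F": ["G"], "G": [],
}
ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

E and G each receive exactly one link, yet E ends up ranked well above G because its single vote comes from the important page C.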

Page 11: Finding It on the Web

Hypertext-Matching Analysis

• Google's search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), Google's technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word.

• Google also analyzes the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.
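
One way to picture "factoring in the location of each word" is to let a match count for more when it appears in a prominent part of the page. The Python sketch below is purely illustrative: the three page parts and their weights are hypothetical and are not Google's actual factors.

```python
# Hypothetical weights: a match in a title counts more than one in body text.
WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def score(page: dict[str, str], query: str) -> float:
    """Add up the query-word matches, weighted by where they appear."""
    terms = query.lower().split()
    total = 0.0
    for part, text in page.items():
        page_words = text.lower().split()
        for term in terms:
            total += WEIGHTS.get(part, 1.0) * page_words.count(term)
    return total


page_a = {"title": "Ottawa museums", "body": "Opening hours and tickets"}
page_b = {"title": "City guide", "body": "Ottawa has many museums"}
print(score(page_a, "ottawa museums"), score(page_b, "ottawa museums"))
```

Both pages contain both query words, but page_a scores higher because the words appear in its title.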

Page 12: Finding It on the Web

Google’s Computing Power

• Google searches are conducted by custom-built software on hundreds of thousands of custom-built PCs housed in huge “computer farms” scattered across the world.

• “The largest computer system in the world”

• “Working together, these customized computers rapidly carry out searches by breaking the queries down into tiny parts. These parts are processed simultaneously by comparing them to copies of the Internet that have been indexed and organized in advance.”

Quotes from: The Google Story, David A. Vise, PAN Books, 2006.
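
The quoted description of many computers working on one search at the same time can be sketched in Python. This is one plausible reading, assumed for the example rather than taken from the book: the index is split into pieces ("shards"), each piece is searched simultaneously, and the partial answers are merged. The shard contents and function names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up index shards: each one covers a different slice of the documents.
SHARDS = [
    {"ottawa": {1, 3}, "university": {1}},
    {"ottawa": {12}, "university": {12, 15}},
    {"university": {27}},
]


def search_shard(shard: dict[str, set[int]], terms: list[str]) -> set[int]:
    """Return the documents in one shard that contain every query term."""
    return set.intersection(*(shard.get(t, set()) for t in terms))


def parallel_search(query: str) -> list[int]:
    terms = query.lower().split()
    if not terms:
        return []
    # One worker per shard, so all shards are searched at the same time.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partial = pool.map(lambda shard: search_shard(shard, terms), SHARDS)
        # Merge the partial answers from all shards into one result list.
        return sorted(set().union(*partial))


print(parallel_search("ottawa university"))
```

Here threads on one computer stand in for the separate machines; the point is only that no single worker has to look through the whole index.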

Page 13: Finding It on the Web

Ten things Google has found to be true

1. Focus on the user and all else will follow.

From its inception, Google has focused on providing the best user experience possible. Google has steadfastly refused to make any change that does not offer a benefit to the users who come to the site:

• The interface is clear and simple.
• Pages load instantly.
• Placement in search results is never sold to anyone.
• Advertising on the site must offer relevant content and not be a distraction.

Page 14: Finding It on the Web

2. It's best to do one thing really, really well.

3. Fast is better than slow.

4. Democracy on the web works.

5. You don't need to be at your desk to need an answer.

6. You can make money without doing evil. Google’s corporate motto is: “Don’t Be Evil”

Page 15: Finding It on the Web

7. There's always more information out there.

8. The need for information crosses all borders.

9. You can be serious without a suit.

10. Great just isn't good enough.

Page 16: Finding It on the Web

Comparison: “Carleton University”

• Google: 1,820,000 results (0.22 seconds)

• Yahoo!: 5,380,000 results (about 0.24 seconds)

• Live Search: 4,680,000 results

• Ask.com: 792,200 results

Page 17: Finding It on the Web

Carleton Plug Google Executive Management Group

• Dr. Eric Schmidt, Chairman of the Board and Chief Executive Officer
• Larry Page, Co-Founder & President, Products
• Sergey Brin, Co-Founder & President, Technology
• Shona Brown, Senior Vice President, Business Operations
• W. M. Coughran, Jr., Senior Vice President, Engineering
• David C. Drummond, Senior Vice President, Corporate Development and Chief Legal Officer
• Alan Eustace, Senior Vice President, Engineering & Research
• Urs Hölzle, Senior Vice President, Operations & Google Fellow
• Jeff Huber, Senior Vice President, Engineering
• Omid Kordestani, Senior Vice President, Global Sales & Business Development
• George Reyes, Senior Vice President & Chief Financial Officer
• Jonathan Rosenberg, Senior Vice President, Product Management
• Laszlo Bock, Vice President, People Operations
• Elliot Schrage, Vice President, Global Communications & Public Affairs

Page 18: Finding It on the Web

Shona Brown

• Carleton B. Eng. ’87, Computer Systems Engineering

• Rhodes Scholar: MA Oxford Economics and Philosophy

• Ph. D. Stanford in Industrial Engineering and Engineering Management.

• Published her PhD thesis “Competing on the Edge” in co-authorship with her supervisor – became a highly regarded best seller on the business booklists

• Joined McKinsey and Company – a global management consulting firm. Became a Principal.

• Hired by Google in 2003 as Vice President of Business Operations – to guide Google’s growth after the IPO. Now Senior Vice President of Business Operations, responsible for Human Resources, Business Operations, and executive troubleshooting.

• She is, essentially, the Chief Operating Officer of Google

Page 19: Finding It on the Web

Social Bookmarking

• Social bookmarking is a way for Internet users to store, organize, share and search bookmarks of web pages.

• In a social bookmarking system, users save links to web pages that they want to remember or share.

• These bookmarks are usually public, but depending on the service's features, may be saved privately, shared only with specific people or groups, shared only inside certain networks, or another combination of public and private.

• People who are allowed to see the bookmarks can usually view them chronologically, by category or tags, via a search engine, or even randomly.
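
The sharing rules described on this slide can be modelled in a few lines of Python. This is a sketch only: the Bookmark fields, the user names and the example.org links are invented, and real services store far more detail.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Bookmark:
    url: str
    owner: str
    saved_on: date
    tags: set = field(default_factory=set)
    # None = public; otherwise the set of other users allowed to see it
    # (an empty set makes the bookmark private to its owner).
    shared_with: Optional[set] = None


def visible_to(bookmarks, viewer):
    """Bookmarks the viewer may see, oldest first (chronological view)."""
    allowed = [
        b for b in bookmarks
        if b.shared_with is None or viewer == b.owner or viewer in b.shared_with
    ]
    return sorted(allowed, key=lambda b: b.saved_on)


BOOKMARKS = [
    Bookmark("http://example.org/recipes", "ann", date(2008, 1, 20), {"cooking"}),
    Bookmark("http://example.org/taxes", "ann", date(2008, 1, 5), {"money"},
             shared_with=set()),
    Bookmark("http://example.org/trip", "bob", date(2008, 1, 12), {"travel"},
             shared_with={"ann"}),
]

for viewer in ("ann", "bob", "carl"):
    print(viewer, [b.url for b in visible_to(BOOKMARKS, viewer)])
```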

Page 20: Finding It on the Web

Tags

• A tag is a (relevant) keyword or term associated with or assigned to a piece of information (e.g. a picture, a geographic map, a blog entry, or video clip), thus
  • describing the item and
  • enabling keyword-based classification and search of information.
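
Keyword-based classification and search can be shown with a small Python sketch. The TagIndex class and the sample items are hypothetical; the point is that each tag simply remembers which items carry it, so a search is a quick lookup.

```python
from collections import defaultdict


class TagIndex:
    """Remembers, for every tag, which items it has been attached to."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item: str, *tags: str) -> None:
        """Attach one or more tags to an item."""
        for t in tags:
            self._items_by_tag[t.lower()].add(item)

    def find(self, *tags: str) -> set:
        """Return the items carrying every one of the given tags."""
        if not tags:
            return set()
        return set.intersection(
            *(self._items_by_tag.get(t.lower(), set()) for t in tags)
        )


index = TagIndex()
index.tag("canal-photo.jpg", "Ottawa", "winter", "skating")
index.tag("tulips.jpg", "Ottawa", "spring")
index.tag("blog-entry-12", "winter", "recipes")

print(sorted(index.find("ottawa")))
print(sorted(index.find("ottawa", "winter")))
```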

Page 21: Finding It on the Web

Some Web Sites That Use Tags.

• del.icio.us - A social bookmarking site that allows users to bookmark many sites and then tag them with many descriptive words, allowing other people to search by those terms to find pages that other people found useful.

• Flickr - A picture service that allows users to tag images with many specific nouns, verbs, and adjectives that describe the picture. This is then searchable.

Page 22: Finding It on the Web

Other Tag Sites

• Gmail - A webmail site that was one of the first to allow categorization of objects using tags, known as "labels" on emails.

• Technorati - A weblog search engine.

• Last.fm - A social music website that allows users to tag artists, albums and tracks

Page 23: Finding It on the Web

Digg

• A social content website, launched December 5th 2004.

• Digg is a community-based popularity website with an emphasis on technology and science articles, recently expanding to a broader range of categories such as politics and entertainment.

• It combines social bookmarking, blogging, and syndication with a form of non-hierarchical, democratic editorial control.

• From Wikipedia, the free encyclopedia

Page 24: Finding It on the Web

If you dig it, man! Digg It!

• News stories and websites are submitted by users, and then promoted to the front page through a user-based ranking system. This differs from the hierarchical editorial system that many other news sites employ.

• Readers can view all of the stories that have been submitted by fellow users in the "digg/News/Upcoming" section of the site. Once a story has received enough "diggs", it appears on Digg's front page.

• Should the story not receive enough diggs, or if enough users report a problem with the submission, the story will remain in the "digg all" area, where it may eventually be removed.
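
The promotion rule described on this slide can be sketched in Python. Digg's real thresholds and ranking formula are not given here, so the two numbers below are hypothetical, as are the class and function names.

```python
from dataclasses import dataclass

PROMOTE_AT = 3    # hypothetical number of diggs needed to reach the front page
REPORT_LIMIT = 2  # hypothetical number of problem reports that blocks promotion


@dataclass
class Story:
    title: str
    diggs: int = 0
    reports: int = 0

    def section(self) -> str:
        """Where the story is shown under the rule sketched above."""
        if self.diggs >= PROMOTE_AT and self.reports < REPORT_LIMIT:
            return "front page"
        return "digg all"


def front_page(stories):
    """Promoted stories, most-dugg first."""
    promoted = [s for s in stories if s.section() == "front page"]
    return sorted(promoted, key=lambda s: s.diggs, reverse=True)


STORIES = [
    Story("New telescope images", diggs=5),
    Story("Free gadget giveaway", diggs=4, reports=3),
    Story("Local chess club news", diggs=1),
    Story("Linux release notes", diggs=8),
]
print([s.title for s in front_page(STORIES)])
```

No editor appears anywhere in the sketch: the readers' diggs and reports alone decide what reaches the front page.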

Page 25: Finding It on the Web

Digging deeper

• Articles are short summaries of stories on other websites with links to the stories, and provisions for readers to comment on the story.

• All content and access to the site are free, but registration is compulsory for certain elements, such as promoting ("digging") stories, submitting stories and commenting on stories.

• Digg also allows for stories to be posted to a user's blog automatically when he or she diggs a story.

Page 26: Finding It on the Web

More digging

• Originally, stories could be submitted in the following categories: deals, gaming, links, mods, music, robots, security, technology, Apple, design, hardware, Linux/Unix, movies, programming, science and software.

• With the release of Digg 3.0 on June 26, 2006, the categories were divided into six containers, each with sub-categories: Technology, Science, World & Business, Sports, Entertainment, and Gaming.