Searching the Internet - The University of Edinburgh · Searching the Internet • Looking for...

Searching the Internet: Looking for information; How are websites connected with each other; How does search work; Any problems? (10-06-09, Susen Rabold)


Page 1

Searching the Internet

• Looking for information

• How are websites connected with each other

• How does search work

• Any problems?

10-06-09

Susen Rabold

Page 2

Looking for information

How do we search the Internet?

Before: Libraries, encyclopaedias (books, journals in general), asking people, ...

Now: Everything we had before, plus the Internet.

Page 5

Search Engines

Page 6

Different kinds of search engines

Categorical Search

Directory Search

http://www.ukdirectory.co.uk/

http://www.jobs.ac.uk/

Meta-Search

http://www.surfwax.com/

Page 7

Mark Up

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://dublincore.org/documents/dcq-html/">
<title>BBC - Food - Recipes: Pumpkin lasagne</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="by Antony Worrall Thompson from Saturday Kitchen" />
<meta name="keywords" content="bbc, food, pumpkin" />
<meta name="recipe" content="vegetarian" />
<link rel="schema.dcterms" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.created" content="24-OCT-03" />
<meta name="DCTERMS.modified" content="26-MAR-09" />

<link rel="stylesheet" type="text/css" media="screen" href="/vision/productisation/includes/css/v1/s-core-models.css" />
<link rel="stylesheet" type="text/css" media="screen" href="/vision/productisation/includes/css/v1/global.css" />
<link rel="stylesheet" type="text/css" media="screen" href="/vision/productisation/includes/css/v1/colourway.css" />
<link rel="stylesheet" type="text/css" media="screen" href="/food/recipes/includes/css/product-specific.css" />
<link rel="stylesheet" type="text/css" media="print" href="/vision/productisation/includes/css/v1/print.css" />

<link rel="index" href="/a-z/" title="A to Z" />
<link rel="help" href="/help/" title="BBC Help" />
<link rel="copyright" href="/terms/" title="Terms of Use" />
<link rel="icon" href="/favicon.ico" type="image/x-icon" />
<meta name="viewport" content="width = 974" />
<!--[if IE]><![if gte IE 6]><![endif]-->
<link rel="stylesheet" href="/includes/blq/resources/gvl/r58/style/main.css" type="text/css" />
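The markup above carries machine-readable hints (title, description, keywords) that a search engine can read without rendering the page. A minimal sketch of how such metadata might be pulled out, using Python's standard `html.parser`; the class name `MetaTagParser`, the helper `read_metadata`, and the shortened `SAMPLE` string are illustrative assumptions, not part of the slides or of any real search engine:

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Shortened copy of the head section shown on the slide (assumption: trimmed for brevity).
SAMPLE = """<head>
<title>BBC - Food - Recipes: Pumpkin lasagne</title>
<meta name="description" content="by Antony Worrall Thompson from Saturday Kitchen" />
<meta name="keywords" content="bbc, food, pumpkin" />
<meta name="recipe" content="vegetarian" />
</head>"""


def read_metadata(html):
    """Return (title, meta dict) for an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.meta


title, meta = read_metadata(SAMPLE)
keywords = [k.strip() for k in meta["keywords"].split(",")]
print(title)
print(keywords)
```

Because the page author writes these tags, they are easy to fill with misleading keywords, which is one reason link-based ranking became attractive.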

Page 8

<HTML>
<HEAD><TITLE>Jon Oberlander</TITLE></HEAD>
<BODY BGCOLOR="#ffffff" style="font-family: 'gill sans', arial, sans-serif">

<H1>Jon Oberlander</H1>

<A HREF="http://www.hcrc.ed.ac.uk/~jon/jon07b-a.jpg"><IMG STYLE="position:absolute; TOP:110px; LEFT:600px" ALIGN="right" BORDER=2 SRC="http://www.hcrc.ed.ac.uk/~jon/jon07c-b1.jpg" ALT="Jon Oberlander captured by Canon Digital Ixus 40"></A>

<!-- <A HREF="http://www.hcrc.ed.ac.uk/~jon/jon2005-2b.jpg"><IMG ALIGN="right" BORDER=2 SRC="http://www.hcrc.ed.ac.uk/~jon/jon2005-2i.jpg" ALT="Jon Oberlander captured by Canon Digital Ixus 40"></A> -->

<!-- <A HREF="http://www.hcrc.ed.ac.uk/~jon/j6.jpg"><IMG ALIGN="right" BORDER=2 SRC="j3a.jpg" ALT="Jon Oberlander captured by Fuji 4700"></A> -->

<!-- <IMG ALIGN="right" BORDER=2 SRC="jon01c.jpg" ALT="Jon Oberlander captured by Fuji 4700"> -->

<!-- <A HREF="http://www.hcrc.ed.ac.uk/~jon/j5.jpg"><IMG ALIGN="right" BORDER=2 SRC="j2a.jpg" ALT="Jon Oberlander captured by his brother Eric"></A> -->

<p>
<h4>Affiliations</h4>
<ul>
<li>Professor of Epistemics in the
<A HREF="http://www.ed.ac.uk/">University of Edinburgh</A>'s
<A HREF="http://www.inf.ed.ac.uk/">School of Informatics</A>.
<li> Fellow of the <a href="http://www.bcs.org">British Computer Society.</a>
<li>Affiliated to:
<ul>
<li><A HREF="http://www.iccs.informatics.ed.ac.uk/">Institute of Communicating and Collaborative Systems</A>;
<li> The ESRC
<A HREF="http://www.hcrc.ed.ac.uk/">Human Communication Research Centre</A>;
<li>HCRC's
<A HREF="http://www.ltg.ed.ac.uk/">Language Technology Group</A>; and
<li><a href="http://www.hcrc.ed.ac.uk/language_at_edinburgh"><IMG src="http://www.hcrc.ed.ac.uk/language_at_edinburgh/images/logo_crop.jpg" border=0></a>
</ul>
<li>Member of the board of <A HREF="http://www.mediascot.org/">New Media Scotland</A>.
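The anchor tags in this page are what connect it to other websites: each `HREF` is an outgoing link that a crawler could follow. A minimal sketch of link extraction with Python's standard `html.parser` and `urllib.parse.urljoin`; the names `LinkParser` and `extract_links`, the trimmed `SAMPLE` fragment, and the base URL are assumptions made for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the targets of <a href="..."> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the outgoing links of an HTML string, in document order."""
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Trimmed fragment of the page shown on the slide (assumption: shortened for brevity).
SAMPLE = """<li>Professor of Epistemics in the
<A HREF="http://www.ed.ac.uk/">University of Edinburgh</A>'s
<A HREF="http://www.inf.ed.ac.uk/">School of Informatics</A>.
<!-- <A HREF="http://www.hcrc.ed.ac.uk/~jon/j5.jpg">commented out</A> -->
<li>Fellow of the <a href="http://www.bcs.org">British Computer Society.</a>"""

# Hypothetical base URL, used only to resolve relative links.
links = extract_links(SAMPLE, "http://www.hcrc.ed.ac.uk/~jon/")
print(links)
```

Links inside HTML comments are not reported, because the parser treats a comment as a single unit rather than as tags.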

Page 9

http://www.youtube.com/watch?v=3SnXYnzS3tw

Page 10

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.

Sergey Brin and Lawrence Page
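The formula can be evaluated by starting every page at the same value and re-applying the equation until the numbers settle. A minimal sketch in Python, assuming a tiny made-up graph of four pages (A to D) in which every page has at least one outgoing link; the function name `pagerank` and the graph are illustrative assumptions, not the production algorithm:

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages T linking to A.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    out_count = {p: len(links.get(p, [])) for p in pages}  # C(T) in the formula
    pr = {p: 1.0 for p in pages}                           # starting guess

    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            incoming = sum(
                pr[t] / out_count[t]
                for t in pages
                if a in links.get(t, [])
            )
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sketch page C, which receives the most links, ends up with the highest value, and page D, which nobody links to, stays at the minimum of 1-d. Note that with the formula exactly as quoted the values sum to the number of pages (here 4) rather than to one; the commonly used normalised variant divides the (1-d) term by the number of pages so that the values do sum to one.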

Page 11

PageRank

http://infolab.stanford.edu/~backrub/google.html

Page 12

PageRank is a link analysis algorithm.

“PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyses the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important".”

From Wikipedia, the free encyclopaedia

Page 13

http://www.youtube.com/watch?v=l8E1SLTTV58

Page 14

Problems with PageRank and in general

Spoofing: IP-Spoofing and Email-Spoofing

Manipulating: Companies whose pages have a high PageRank (PR) sell links from those pages, and with them a share of their ranking, to other webmasters.

Page 15

What do you need to consider when searching the net?

Page 16

http://wcms.inf.ed.ac.uk/iss/courses/il1/