Discover the invisible web

Discover the Invisible Web Jeffrey Franklin and David Rakowski

Transcript of Discover the invisible web

Page 1: Discover the invisible web

Discover the Invisible Web

Jeffrey Franklin and David Rakowski

Page 2: Discover the invisible web
Page 3: Discover the invisible web

What is the Invisible Web?

• a/k/a the "hidden," "deep," and "dark" web
• Google currently indexes approximately 1 trillion Web pages
• It is estimated that the invisible web is 400-550 times bigger and contains 7,500 terabytes of information (as compared to 19 terabytes of information that Google currently indexes)
  – http://aip.completeplanet.com/aip-engines/help/help_deepwebfaqs.jsp
• "The term ‘invisible web’ mainly refers to the vast repository of information that search engines and directories don't have direct access to, like databases."
  – http://websearch.about.com/od/invisibleweb/a/invisible_web.htm

Page 4: Discover the invisible web

Examples

• Resides in a database or a table
• Created dynamically
• Accessible only to registered users
• Stored in subdirectories deep within a website
• Generally no Flash, zip or executable files
• Exists in real time
• Social media can be hit or miss
• Excluded by the owner (robots.txt)
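
The robots.txt exclusion in the last bullet can be checked programmatically. The following is a minimal sketch using Python's standard urllib.robotparser module; the sample rules, the example.com URLs, and the helper name is_crawlable are made up for illustration and do not come from the slides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site serves its own file at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/index.html",
        "http://example.com/private/report.html",
    ):
        print(url, is_crawlable(SAMPLE_ROBOTS_TXT, url))
```

Pages under a disallowed path stay out of a well-behaved search engine's index even though anyone who knows the address can still open them, which is one way content ends up in the invisible web.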

Page 5: Discover the invisible web

How Does it Differ From the Visible Web?

"The ‘visible web’ is what you can find using general web search engines"

Page 6: Discover the invisible web

Learn to Find “Invisible Documents”

• Include the word “database” as part of your search

• Limit by filetype: .pdf, .doc, .xls, .ppt
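
The two tips above can be combined into a single query string. The sketch below assumes the filetype: operator as supported by Google (other search engines may use different syntax); the helper name build_query and the sample search terms are hypothetical.

```python
def build_query(terms: str, filetype: str = "", add_database: bool = False) -> str:
    """Build a search query that applies the two tips from this slide."""
    parts = [terms.strip()]
    if add_database:
        # Tip 1: include the word "database" in the search.
        parts.append("database")
    if filetype:
        # Tip 2: limit results to one file type, written without the leading dot.
        parts.append("filetype:" + filetype.lstrip(".").lower())
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query("census statistics", filetype=".PDF", add_database=True))
```

The resulting string can be pasted into the search box as-is.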

Page 7: Discover the invisible web

Where do old web pages go? Learn to locate them

• Wayback Machine (www.archive.org)
• Public.Resource.Org (http://public.resource.org)
• CyberCemetery (http://govinfo.library.unt.edu/)
• Google Instant Preview
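
Archived copies in the Wayback Machine can also be looked up from a script. The following sketch assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available) and its JSON response layout; the helper names and the sample response are illustrative, and no request is actually sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability service.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build the lookup URL for page_url, optionally near a YYYYMMDD timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived copy's URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    print(availability_url("example.com", "20060101"))
    # Illustrative response shape only; real values come from the service.
    sample = {
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20060101000000/http://example.com/",
                "timestamp": "20060101000000",
                "status": "200",
            }
        }
    }
    print(closest_snapshot(sample))
```

Fetching the built URL with any HTTP client and passing the decoded JSON to closest_snapshot gives the address of the nearest archived copy, if one exists.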