Discover the invisible web

Jeffrey Franklin and David Rakowski

What is the Invisible Web?

• a/k/a "hidden," "deep," and "dark" web
• Google currently indexes approximately 1 trillion Web pages
• It is estimated that the invisible web is 400 to 550 times bigger and contains 7,500 terabytes of information (as compared to 19 terabytes of information that Google currently indexes)
  – http://aip.completeplanet.com/aip-engines/help/help_deepwebfaqs.jsp
• "The term ‘invisible web’ mainly refers to the vast repository of information that search engines and directories don't have direct access to, like databases."
  – http://websearch.about.com/od/invisibleweb/a/invisible_web.htm

Examples

• Resides in a database or a table
• Created dynamically
• Accessible only to registered users
• Stored in subdirectories deep within a website
• Flash, zip, or executable files (generally not indexed)
• Exists in real time
• Social media (can be hit or miss)
• Excluded by the owner (robots.txt)
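The last example, exclusion by the owner, can be illustrated with a short sketch. It uses Python's standard `urllib.robotparser` to check whether a crawler may fetch a page. The robots.txt rules, the `example.com` URLs, and the `ExampleBot` crawler name are made up for illustration and do not come from the slides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_crawlable(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("https://example.com/public/index.html",
                "https://example.com/private/report.html"):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

Pages under the disallowed path still exist and can be opened directly in a browser; a search engine that honors robots.txt simply will not crawl them, which is one way content ends up in the invisible web.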

How Does it Differ From the Visible Web?

"The ‘visible web’ is what you can find using general web search engines"

Learn to Find “Invisible Documents”

• Include the word “database” as part of your search

• Limit by filetype: .pdf, .doc, .xls, .ppt
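The two tips above can be combined into a single search string. The sketch below assumes a search engine that supports the `filetype:` operator, as Google does. The function names and the sample topic are hypothetical.

```python
from urllib.parse import urlencode

def build_query(topic, filetype=None, add_database=False):
    """Combine a topic with the word "database" and an optional filetype limit."""
    terms = [topic]
    if add_database:
        terms.append("database")
    if filetype:
        # Accept "PDF", ".pdf", or "pdf" and normalize to "filetype:pdf".
        terms.append("filetype:" + filetype.lstrip(".").lower())
    return " ".join(terms)

def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a search URL (the base URL is an assumption)."""
    return base + "?" + urlencode({"q": query})

if __name__ == "__main__":
    query = build_query("court records", filetype=".xls", add_database=True)
    print(query)
    print(search_url(query))
```

Typing the same string straight into the search box works just as well; the code only makes the pattern explicit.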

Where do old web pages go? Learn to locate them

• Wayback Machine (www.archive.org)
• Public.Resource.Org (http://public.resource.org)
• CyberCemetery (http://govinfo.library.unt.edu/)
• Google Instant Preview
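For the Wayback Machine, archived copies can also be reached by building a URL directly. The sketch below assumes the Internet Archive's usual snapshot address pattern (`web.archive.org/web/<timestamp>/<url>`) and its availability lookup endpoint (`archive.org/wayback/available`). The function names and the `example.com` address are hypothetical, and the code only builds the URLs without contacting the archive.

```python
from urllib.parse import urlencode

def snapshot_url(page_url, timestamp):
    """Build a Wayback Machine address for a capture of page_url.

    timestamp is a string of digits in YYYYMMDDhhmmss order, e.g. "20050101".
    """
    return f"https://web.archive.org/web/{timestamp}/{page_url}"

def availability_url(page_url, timestamp=None):
    """Build a lookup URL that asks whether an archived copy exists."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

if __name__ == "__main__":
    print(snapshot_url("http://example.com/", "20050101"))
    print(availability_url("example.com", "20050101"))
```

Opening the first URL in a browser should lead to the capture nearest the requested date, if one exists; the second returns a short JSON description of the closest snapshot.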
