Ryerson University Library and Archives Searching the Deep Web Winter 2012.


Virtual Parking Lot If you have questions that are too time-consuming, theoretical, or technical to be addressed in this introductory session, e-mail them to Jay Wolofsky ([email protected]@ryerson.ca). The answers will be shared with the group.

The Deep Web The Deep Web is currently 400 to 500 times larger than the commonly defined Surface Web or WWW (7,500 terabytes of information compared to 19 terabytes of information in the Surface Web) and is growing exponentially.

Deep Web/Surface Web The Deep Web (a.k.a. the Invisible Web) contains high-quality information that is not accessible from conventional search engines such as Google.

Deep Web/Surface Web Structured information contained in research databases cannot be accessed from the Surface Web.

Deep Web/Surface Web The real problem is the spidering and crawling technology used by conventional search engines, which return links based on popularity, not content. Surface Web search results are ranked by the frequency with which documents link to each other (page rank). The first results are those that have had the most references from other documents, and not necessarily the most relevant or recent information or content.
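The popularity ranking described on this slide can be sketched in a few lines. This is only an illustration of counting inbound links; it is not Google's actual PageRank algorithm, and the page names and the `rank_by_inbound_links` function are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}

def rank_by_inbound_links(link_graph):
    """Order pages by how many other pages link to them (most-linked first)."""
    inbound = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):      # count each linking page only once
            if target != source:         # ignore self-links
                inbound[target] += 1
    # Pages that nobody links to still appear, with a count of zero.
    pages = set(link_graph) | set(inbound)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(pages, key=lambda page: (-inbound[page], page))

ranking = rank_by_inbound_links(links)
print(ranking)
```

Note that nothing in this ordering looks at what a page says or when it was written, which is the slide's point: the most-linked page comes first whether or not it is the most relevant or recent.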

Federated Search Engines Federated search engines execute simultaneous real-time searches of the Deep Web using sophisticated software connectors. The results are collated and presented back to the user in a unified format.
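A minimal sketch of that flow, assuming two made-up in-memory sources in place of real databases: each "connector" pairs a source's own search function with a normalizer, the searches run at the same time, and the results are merged into one record shape. All names here (`search_catalogue`, `search_repository`, `CONNECTORS`, `federated_search`) are hypothetical and do not correspond to any real product's API.

```python
from concurrent.futures import ThreadPoolExecutor

# Two hypothetical sources, each with its own record layout.
def search_catalogue(query):
    records = [
        {"title": "Deep Web indexing", "year": 2010},
        {"title": "Library outreach", "year": 2009},
    ]
    return [r for r in records if query.lower() in r["title"].lower()]

def search_repository(query):
    records = [{"name": "Crawling the deep web", "published": "2011"}]
    return [r for r in records if query.lower() in r["name"].lower()]

# A connector = (search function, function that maps a raw record
# to the unified {"title", "year"} format).
CONNECTORS = {
    "catalogue": (search_catalogue,
                  lambda r: {"title": r["title"], "year": r["year"]}),
    "repository": (search_repository,
                   lambda r: {"title": r["name"], "year": int(r["published"])}),
}

def federated_search(query):
    """Query every source concurrently and collate the normalized results."""
    def run(item):
        source, (search, normalize) = item
        return [dict(normalize(r), source=source) for r in search(query)]

    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(run, CONNECTORS.items()))

    results = [record for batch in batches for record in batch]
    # Unified presentation: newest first, then alphabetical by title.
    return sorted(results, key=lambda rec: (-rec["year"], rec["title"]))

results = federated_search("deep web")
for record in results:
    print(record)
```

Sorting by year is just one possible presentation choice; the point is that ordering happens after collation, on fields every source has been mapped into.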

Federated Search Engines One type, a web spider variant, crawls information from as many databases as possible, creating a giant uniform index, e.g. Google Scholar. A more advanced type searches across each database's own indexing AND crawls information, e.g. Biznar, Mednar, DeepDyve.

Federated Search Engines There are 3 general types. The first type searches across each database using its own indexing. The second type, a web spider, crawls information from as many databases as possible, creating a giant uniform index, e.g. Google Scholar, OpenDOAR. The third type searches across each database's own indexing AND crawls information, e.g. Biznar, Mednar, DeepDyve.
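The three types can be contrasted with a toy model. This is a sketch under stated assumptions, not a description of how Google Scholar, Biznar, or any other named service is built: the `Database` class, the sample titles, and all function names are hypothetical, and "crawling" is reduced to building a word index ahead of time.

```python
class Database:
    """A hypothetical source that can only be searched through its own index."""
    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # list of document titles

    def native_search(self, term):
        return [d for d in self.documents if term.lower() in d.lower()]

def search_native(databases, term):
    """Type 1: ask each database to search its own index at query time."""
    return [(db.name, doc) for db in databases for doc in db.native_search(term)]

def build_uniform_index(databases):
    """Type 2: crawl every database ahead of time into one word -> documents index."""
    index = {}
    for db in databases:
        for doc in db.documents:
            for word in doc.lower().split():
                index.setdefault(word, set()).add((db.name, doc))
    return index

def search_uniform(index, term):
    return sorted(index.get(term.lower(), set()))

def search_hybrid(databases, index, term):
    """Type 3: combine both approaches and drop duplicate hits."""
    return sorted(set(search_native(databases, term)) | set(search_uniform(index, term)))

dbs = [
    Database("business", ["Market analysis report"]),
    Database("medicine", ["Clinical trial analysis", "Drug market overview"]),
]
index = build_uniform_index(dbs)                       # crawl happens here
dbs[0].documents.append("Emerging market update")     # added after the crawl

native = search_native(dbs, "market")
crawled = search_uniform(index, "market")
hybrid = search_hybrid(dbs, index, "market")
print(len(native), len(crawled), len(hybrid))
```

In this toy setup the crawled index misses the document added after the crawl, while the query-time search finds it; the hybrid returns the union of both result sets.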

Accessing Deep Web Content
Biznar (Business)
DeepDyve (Multidisciplinary)
E-Print Network (Science and Technology)
Google Scholar (Multidisciplinary)
Highbeam (Multidisciplinary)
HighWire (Multidisciplinary)
Mednar (Medicine)
MetaPress (Multidisciplinary)
OpenDOAR (Multidisciplinary)
Science.gov (Science and Technology)
Scirus (Science and Technology)
Social Science Research Network (Social Sciences)
World Wide Science (Science and Technology)
