Computer Science 1000
Information Searching I
Permission to redistribute these slides is strictly prohibited without permission
World Wide Web – The Basics
our next topic examines how to find information on the web
we consider a few basic terms here (which you’re probably familiar with):
  page/web page
  link/hyperlink
  site/web site
later in the semester, we will revisit web technologies in much more detail
World Wide Web
a system of linked documents accessed via the internet
often simply referred to as the web
sometimes used interchangeably with the internet, but this isn’t exactly correct
  the internet is the global network of interconnected devices (computers, routers, etc.) that exchange data
  the web refers to the documents being stored, the software that serves and receives them, and the protocols used for transmission
Web Page
a document stored and accessed on the web
identified by a unique URL (Uniform Resource Locator)
often referred to simply as a page
today’s web pages are very rich in content
  text
  images
  hyperlinks
  videos
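a minimal sketch of how a URL breaks into parts, using Python's standard urllib.parse module; the address below is a made-up example, not one taken from these slides

```python
from urllib.parse import urlsplit

# Hypothetical address, chosen only for illustration.
url = "https://www.example.edu/courses/cs1000/notes.html#searching"

parts = urlsplit(url)
print(parts.scheme)    # protocol used to fetch the page
print(parts.netloc)    # host that serves the page
print(parts.path)      # location of the document on that host
print(parts.fragment)  # a named place within the page
```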
Web Site
a collection of related web pages on the internet
typically belonging to a common organization or event
example
  all pages served by the University of Lethbridge make up its website
Hyperlink
a part of a web page that refers to a different location
often just called a link
hyperlinks can reference:
  another place on the same page
  another web page
hypertext: text containing hyperlinks
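a small sketch of the two kinds of link target, assuming links are written as HTML <a> tags and that a target starting with "#" points within the same page; the page snippet and the names LinkCollector and classify are hypothetical

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href target of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def classify(href):
    """Label a link as pointing within the same page or to another page."""
    return "same page" if href.startswith("#") else "another page"


# Hypothetical page snippet for illustration.
page = '<p><a href="#top">Back to top</a> <a href="https://www.example.edu/">Home</a></p>'

collector = LinkCollector()
collector.feed(page)
labels = {href: classify(href) for href in collector.links}
for href, label in labels.items():
    print(href, "->", label)
```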
The Age of Information
the computer, internet, and web have changed how we interact with information
information storage
  the amount of available information is significantly greater than even a generation ago, and it is growing rapidly
information transmission
  large amounts of information are available with a single mouse click, and transfer almost immediately
Information Age – Rapid Onset
the situation has transformed tremendously in your lifetimes
consider the global information capacity:
  in 1986: 2.6 exabytes (< 1 CD per person)
  in 1993: 15.8 exabytes
  in 2000: 54.5 exabytes
  in 2007: 295 exabytes (61 CDs per person)
how does one successfully navigate such a mountain of digital content?
Hilbert and López. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science 332(6025), 2011
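a quick arithmetic sketch of how fast the capacity figures above grew; the steady yearly rate is an assumption made only to summarize the trend, not a number from the cited paper

```python
# Global information capacity in exabytes, as listed on the slide.
capacity = {1986: 2.6, 1993: 15.8, 2000: 54.5, 2007: 295.0}

first_year, last_year = min(capacity), max(capacity)
growth_factor = capacity[last_year] / capacity[first_year]

# Average yearly rate IF growth had been perfectly steady (an assumption).
years = last_year - first_year
annual_rate = growth_factor ** (1 / years) - 1

print(f"{growth_factor:.0f}x over {years} years")
print(f"about {annual_rate:.0%} per year")
```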
Information Access
even in pre-internet days, there was a wealth of information
  large-scale: library
  medium-scale: encyclopaedia set
  small-scale: newspaper
strategies developed to manage information
  categories
  hierarchies
  indices
Classification
"systematic arrangement in groups or categories according to established criteria" – Merriam-Webster
in other words, the information is categorized according to relevant features
consider our course notes:
  terminology (4 sets of slides)
  information searching (2-3 sets of slides)
  etc ...
Classification
classification is not specific to digital information
library classification:
  Dewey Decimal Classification
  Library of Congress Classification
newspaper classification
Classification
classification level of detail leads to tradeoffs
consider a coarse level of detail
  e.g. taxonomy of living organisms
  classify organisms according to Domain (Archaea, Bacteria, Eukarya)
  advantage: small number of groups
  disadvantage: each group is massive
consider a fine level of detail
  e.g. taxonomy of living organisms
  classify organisms according to Genus (Canis, Felis)
  advantage: each group reasonably small
  disadvantage: massive number of groups
solution: hierarchy
Hierarchy
a decomposition of classifications according to detail
hierarchies contain levels
  at the top (root) level, there is typically a small number of broad categories
  each category is decomposed into smaller categories
  a classification group is defined by categorization at each level
Hierarchy
organism taxonomy hierarchy:
  each Domain categorized into Kingdoms
    Domain: Eukarya
    Kingdom: Fungi, Plantae, Animalia, Protista
  each Kingdom classified into Phyla
  each Phylum classified into Classes
  and so on ...
http://ag.arizona.edu/pubs/garden/mg/entomology/intro.html
Hierarchy
an object is still categorized, but by multiple levels (instead of one)
http://schoolworkhelper.net/scientific-taxonomy/
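one way to picture a hierarchy in code is as nested dictionaries, where a classification is the chain of categories from the root down; this is only a sketch covering the Domain and Kingdom names from the slides, and find_path is a hypothetical helper

```python
# A small slice of the organism hierarchy as nested dictionaries.
taxonomy = {
    "Eukarya": {          # Domain
        "Fungi": {},      # Kingdoms of Eukarya, as named on the slide
        "Plantae": {},
        "Animalia": {},
        "Protista": {},
    },
    "Archaea": {},        # kingdoms not listed on the slides
    "Bacteria": {},
}


def find_path(tree, target, path=()):
    """Return the chain of categories leading to target, or None."""
    for name, children in tree.items():
        here = path + (name,)
        if name == target:
            return here
        found = find_path(children, target, here)
        if found:
            return found
    return None


fungi_path = find_path(taxonomy, "Fungi")
print(" > ".join(fungi_path))
```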
Hierarchy
facilitates efficient searching through exclusion
example (text):
  suppose you have a collection of a million items
  these items are organized into 10 equal-sized groups
  each top-level group is also organized into 10 equal subgroups
  choosing the first category eliminates 900,000 items
  choosing the second category eliminates 90,000 items
  and so on …
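the arithmetic of the example above can be sketched as follows, assuming every group at every level splits into 10 equal subgroups; the function name is hypothetical

```python
def remaining_after_each_choice(total_items, branching, levels):
    """Items still under consideration after each category choice,
    assuming every group splits into `branching` equal subgroups."""
    remaining = total_items
    history = []
    for _ in range(levels):
        remaining //= branching
        history.append(remaining)
    return history


total = 1_000_000
history = remaining_after_each_choice(total, 10, 6)

# Items ruled out by each successive choice.
eliminated = [before - after for before, after in zip([total] + history, history)]

print(history)
print(eliminated)
```

after six choices only one item remains, so a million items can be searched with six decisions instead of a million comparisons.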
Hierarchy
hierarchies are very popular
consider our previous examples:
  Library of Congress Classification
  Newspaper
Index
a detailed list of words, phrases, and/or topics indicating place of occurrence
in essence, it maps keywords of interest to their location
  e.g. a page number
a bottom-up approach to information organization
  as opposed to the top-down structure of a hierarchy
particularly popular in printed material
  books, magazines, volumes, etc.
Index - Example
Index
typically used on a small scale
  books and volumes vs. libraries
made efficient through an organizational scheme
  alphabetical is very common
some overlap with hierarchies
  e.g. subtopics
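a minimal sketch of an index as a mapping from keywords to page numbers, listed alphabetically; the page texts and keyword list are made up for illustration

```python
from collections import defaultdict

# Hypothetical page texts; keys are page numbers.
pages = {
    1: "the web is a system of linked documents",
    2: "a hyperlink refers to a different location",
    3: "a web site is a collection of related web pages",
}
keywords = {"web", "hyperlink", "documents", "site"}

index = defaultdict(list)
for page_number, text in sorted(pages.items()):
    # set() so a word repeated on one page is listed once for that page
    for word in set(text.split()):
        if word in keywords:
            index[word].append(page_number)

# Alphabetical order is the organizational scheme that makes lookup quick.
for word in sorted(index):
    print(word, index[word])
```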
Finding Information – The Web
as discussed, the amount of information on the web is immense
many of the discussed techniques for information finding also apply digitally
  classification/hierarchies
  indexing
Classification
many commercial websites have a classification structure
  navigation bars
Hierarchies
many websites, especially large ones, will also arrange their categories in hierarchical fashion
Partition
a hierarchy where every object occurs only once
  organism taxonomy – every species appears only once
some hierarchies are necessarily partitions
  e.g. a particular book will only occur at one point in a library classification
however, a partition is not always natural
  an object might have an inherent fit in more than one classification
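a short sketch of the partition test: a grouping is a partition when no object shows up in more than one group; the groupings and the name is_partition are hypothetical

```python
from collections import Counter


def is_partition(groups):
    """True if every object appears in exactly one group."""
    counts = Counter(item for members in groups.values() for item in members)
    return all(n == 1 for n in counts.values())


# Hypothetical groupings for illustration.
by_genus = {"Canis": ["wolf", "coyote"], "Felis": ["house cat"]}
store_categories = {"Kitchen": ["mug", "kettle"], "Gifts": ["mug"]}

print(is_partition(by_genus))          # each animal sits in one genus
print(is_partition(store_categories))  # "mug" fits two categories
```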
Partitions
digital content is often stored using overlapping hierarchies (non-partition)
  potentially more intuitive
  with hyperlinking, it’s easy to accomplish (two links to the same page)
example (text):
  Three Books for Frugal Fashionistas was stored on NPR’s website under:
    Home > Arts & Life > Books > Three Books for Frugal Fashionistas
    Home > Listen > Latest Program > Three Books for Frugal Fashionistas
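the NPR example can be sketched as a set of pages and the links between them, where an overlapping hierarchy shows up as more than one route to the same page; the dictionary layout and the all_paths helper are assumptions, only the page names come from the example

```python
# Each page maps to the pages it links to (names taken from the example).
site = {
    "Home": ["Arts & Life", "Listen"],
    "Arts & Life": ["Books"],
    "Books": ["Three Books for Frugal Fashionistas"],
    "Listen": ["Latest Program"],
    "Latest Program": ["Three Books for Frugal Fashionistas"],
    "Three Books for Frugal Fashionistas": [],
}


def all_paths(graph, start, target, path=()):
    """Return every route from start to target that never revisits a page."""
    path = path + (start,)
    if start == target:
        return [path]
    paths = []
    for nxt in graph[start]:
        if nxt not in path:  # guard against cycles
            paths.extend(all_paths(graph, nxt, target, path))
    return paths


routes = all_paths(site, "Home", "Three Books for Frugal Fashionistas")
for route in routes:
    print(" > ".join(route))
```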
Indexes for the Web
unlike hierarchies, indexes are much less common on individual websites
  site maps might be considered an index of sorts
however, there are analogous technologies to indexes that pertain to the web as a whole
  Search Engines!