LIS510 lecture 3 Thomas Krichel 2005-02-05
information storage & retrieval
• this area is now more commonly known as information retrieval
• when I dealt with it I meant storage as including the organization of the information, which is a bit of a stretch
• Ideally, one needs to know the retrieval needs before designing the organization of the information
information retrieval
• covers everything about how the user gets information out of an information system.
• it is different from data retrieval since the retrieved data has to be “relevant” to the user.
• it is very difficult to say what “relevance” is, objectively.
information retrieval performance
• how was it for you?
• the traditional measures are
– precision = number of relevant documents retrieved divided by total number of retrieved documents
– recall = number of relevant documents retrieved divided by total number of relevant documents.
• they only evaluate a search!
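The two measures above can be computed directly from sets of document identifiers. The following is a minimal sketch, assuming we already know which documents are relevant (in practice that judgment is the hard part); the function name and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers of the documents the search returned
    relevant:  identifiers of all documents judged relevant to the need
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved

    # Guard against empty sets so the division is always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 5 relevant overall, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7", "d8", "d9"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print("precision:", precision)
print("recall:", recall)
```

Here 2 of the 4 retrieved documents are relevant, and 2 of the 5 relevant documents were found. The numbers describe this one search only, not the system as a whole.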
information retrieval models
• they give formal account of the search process.
• there are three basic flavors
– Boolean information retrieval
– Vector information retrieval
– Probabilistic information retrieval
• All are mathematical models
• I would also add web information retrieval as a new type
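To make the difference between the first two models concrete, here is a small sketch on a made-up three-document collection. The Boolean function returns the documents containing every query term, with no ranking. The vector function treats query and documents as term-count vectors and ranks by cosine similarity, which is one common choice and an assumption here. All names and texts are invented for illustration; the probabilistic model is not shown.

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "library catalog search",
    "d2": "web search engine ranking",
    "d3": "library web site usability",
}


def tokens(text):
    return text.lower().split()


def boolean_and(query, docs):
    """Boolean model: a document matches if it contains all query terms."""
    terms = set(tokens(query))
    return sorted(name for name, text in docs.items()
                  if terms <= set(tokens(text)))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_rank(query, docs):
    """Vector model: rank documents by similarity to the query."""
    q = Counter(tokens(query))
    scored = [(cosine(q, Counter(tokens(text))), name)
              for name, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]


print(boolean_and("library web", docs))
print(vector_rank("library web", docs))
```

The Boolean search is all-or-nothing, while the vector search also returns partial matches, ordered by how close they are to the query.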
web information retrieval
• this has become big business now
• finding out a user’s need is a way to connect them with advertising.
• One reason Google has been such a success is that they discovered a way to make quality web sites appear at the top of the results
• Basically, a quality web site is one that has many links to it from other quality sites.
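The circular definition above (quality comes from links from other quality sites) can be resolved by iteration. The sketch below is a simplified version in the spirit of Google's published PageRank idea, not their production algorithm. The damping value of 0.85, the iteration count, the function name, and the toy link graph are all assumptions for illustration.

```python
def link_quality(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page gets a small base score regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly over its links.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for t in pages:
                    new[t] += share
        score = new
    return score


# Hypothetical web of four pages: a, b and d all link to c; c links to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_quality(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, c comes out on top because it has the most incoming links, and a comes second because its single incoming link is from the high-scoring c.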
information storage
• can mean the preparation of information before searching
– which fields are searchable?
– can there be a variety of means to rank searches?
– is there use of a controlled vocabulary?
• it is difficult to draw general conclusions, except to say that advanced search features are not much used.
human-computer interface
• tries to understand how users work with computer systems
• the idea is to build “user-friendly” systems
• but don’t leave that to a “computer designer” as suggested by Rubin
• note that information systems go way beyond computers.
• Web usability is a big topic.
natural language processing
• Rubin classifies this as a part of computer-human interface
• natural language processing is still in its infancy
• speech recognition is the best developed part
• others are working on connecting computers to the brain
artificial intelligence
• This has been around for a while.
• The field has developed a number of theoretical tools
• Some of them are being used in practice now. Things like RDF, the Resource Description Framework, are based on artificial intelligence theory. RDF is a tool to aggregate knowledge from web resources.
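RDF describes resources with statements of the form (subject, property, value), and aggregation amounts to pooling statements from several sources. The following sketch mimics that idea with plain Python tuples. It is not real RDF syntax and uses no RDF library; the URL, the short property names, and the function names are hypothetical.

```python
# Statements about the same (hypothetical) resource from two sources.
# Real RDF would identify properties by URIs, not short strings.
source_a = {
    ("http://example.org/doc1", "title", "LIS510 lecture 3"),
    ("http://example.org/doc1", "creator", "Thomas Krichel"),
}
source_b = {
    ("http://example.org/doc1", "creator", "Thomas Krichel"),
    ("http://example.org/doc1", "date", "2005-02-05"),
}


def merge(*graphs):
    """Pool statements from several sources; duplicates collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect everything the pooled statements say about one subject."""
    description = {}
    for s, prop, value in graph:
        if s == subject:
            description.setdefault(prop, set()).add(value)
    return description


combined = merge(source_a, source_b)
print(describe(combined, "http://example.org/doc1"))
```

Because each statement stands on its own, sources that know nothing about each other can still be combined without any prior agreement on record layout.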
Area 3: defining information & its value
• There is debate on the nature of
– data (Thomas: things that can be processed in the information system)
– knowledge (Thomas: stuff that is in people’s heads)
– information (something between data and knowledge). Rubin says it is meaning given to data.
• Rubin also talks about wisdom as “knowledge applied for the benefit of humanity”
scientific view of information
• usually information is modeled as something that reduces uncertainty
• people have a rough idea about something, say tomorrow’s temperature
• the information is what tells us the value this something actually takes, or at least narrows it down, so that we have less uncertainty.
• usually this uses probability theory.
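One standard way to put a number on "less uncertainty" is Shannon entropy, measured in bits. The slides do not name a particular measure, so treating entropy as the measure is an assumption here, as are the function name and the temperature probabilities.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy of a probability distribution, in bits."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical beliefs about tomorrow's temperature over four ranges.
before = [0.25, 0.25, 0.25, 0.25]   # rough idea: all four equally likely
forecast = [0.5, 0.5, 0.0, 0.0]     # a forecast rules out two ranges
observed = [1.0, 0.0, 0.0, 0.0]     # tomorrow we know the actual range

for label, dist in [("before", before), ("forecast", forecast), ("observed", observed)]:
    print(label, entropy_bits(dist))

# Information gained is the reduction in uncertainty.
print("gain from forecast:", entropy_bits(before) - entropy_bits(forecast))
```

The forecast cuts the uncertainty from 2 bits to 1 bit, and observing the actual temperature removes the rest.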
value of information
• economists can value information precisely but their definition is useless for practical purposes
• much of the work then involves some cost/benefit analysis. In such an analysis one can reach almost any result one wants.
elements of value-added in libraries
• access to resources
• accuracy (for example of bibliographic data)
• browsing (like in library stacks)
• currency (things are up-to-date)
• flexibility (through human interaction)
• formatting (laying out the collection, signs)
• interfacing (probably close to flexibility)
• ordering (buy access to things)
• access to means to get to resources
area 4: bibliometrics
• is the application of quantitative methods to the study of information resources
• Mainly concerned with the structure of the resources. The typical example is citation analysis.
• Quantitative studies of use fall more under the first area of interest.
bibliometric laws
• Zipf’s law relates to the usage of terms in text.
• Lotka’s law relates to the number of papers written by authors.
• Bradford’s law relates to the distribution of articles in a field across a number of periodicals.
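As an illustration of the first law: Zipf's law says that a term's frequency is roughly inversely proportional to its rank, so rank times frequency stays roughly constant. The sketch below builds the rank-frequency table for a text. The function name is made up, and the sample text is artificial, constructed so that the pattern holds exactly; real text only follows it approximately.

```python
from collections import Counter


def rank_frequency(text):
    """Return (rank, term, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(text.lower().split())
    # Sort by descending frequency; break ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, term, freq, rank * freq)
            for rank, (term, freq) in enumerate(ranked, start=1)]


# Artificial sample: frequencies 6, 3 and 2 for ranks 1, 2 and 3.
sample = "the " * 6 + "of " * 3 + "library " * 2
for row in rank_frequency(sample):
    print(row)
```

On a real corpus one would look at whether the last column stays in the same range as the rank grows.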
citation analysis
• is the heart of bibliometrics.
• Two important concepts
– bibliographic coupling means two documents share some reference
– co-citation means two documents are cited by the same documents
• Citation analysis is also important for scientific activity evaluation
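Both concepts can be read off a table of who cites whom. This is a minimal sketch on a made-up citation table; the document labels and function names are hypothetical.

```python
# Hypothetical citation data: each citing document maps to its reference list.
cites = {
    "A": {"X", "Y"},
    "B": {"Y", "Z"},
    "C": {"X", "Y"},
}


def bibliographic_coupling(cites, doc1, doc2):
    """References that two citing documents have in common."""
    return cites.get(doc1, set()) & cites.get(doc2, set())


def co_citation(cites, doc1, doc2):
    """Documents that cite both doc1 and doc2."""
    return {citing for citing, refs in cites.items()
            if doc1 in refs and doc2 in refs}


print(sorted(bibliographic_coupling(cites, "A", "B")))
print(sorted(co_citation(cites, "X", "Y")))
```

Coupling looks backward from the citing documents (A and B share the reference Y), while co-citation looks forward from the cited documents (X and Y are cited together by A and C). Coupling is fixed once the documents are published, whereas co-citation can grow as new citing documents appear.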
area 5: management & admin
• This is an expanding area in libraries.
• Rather than collecting physical books, libraries have to negotiate on-line access.
• Area covers all of information policy. Example problems are
– copyright
– censorship
• Measuring performance is part of user studies
area 6: information architecture
• art and science of organizing information and its interfaces so that seekers find what they want quickly
• mainly used with respect to large web sites. It looks at the contents rather than technical factors or the look-and-feel
• A related idea is usability
area 7: knowledge management
• this comes from the business environment
• it is a management fad that has overstayed its welcome.