Lutz, Valerie-Anne
INFO 511
Dr. Martha Smith
Winter 2005
Image Users and Image Retrieval: Review of the Literature
ABSTRACT
This review of the literature provides an overview of the rapid developments
in the field of image indexing and image retrieval, describes several seminal studies
of image users in the 1990s with some consideration of earlier analyses of issues
relating to the indexing of images, and discusses recent proposed solutions for
problems relating to image access and retrieval, including modifications to traditional
controlled vocabulary indexing and new methods such as browsing of digital images
and automatic retrieval based on image features.
Keywords: Image retrieval, image indexing, image users, digital images, visual materials, pictures
INTRODUCTION
The dramatic expansion of the Internet over the past ten years has seen an
equally rapid growth in its use for image searches, with several major search engines
now featuring image search capabilities that allow users to target only images in
their search for a particular word or concept. During this time, studies of image
users, image indexing, and image retrieval have increased as well. The field has
expanded from a handful of studies of the indexing of visual materials and of
digital imaging technology to include detailed examinations of image users and their
needs; explorations of automatic content-based image retrieval by color, shape,
texture, and other attributes; image searching on the Internet; and browsing of
thumbnail images as a means of enhancing users’ ability to find and retrieve images.
Image users, image indexing, and image retrieval received relatively little
attention until the publication of several seminal works: Markey (1983) and
Shatford (1984 and 1986) on the qualities of images that make them more complex to
catalogue than textual materials, and Stam (1984 and 1989), Brilliant (1988), and the
Getty study (Bakewell, Beeman, Reese, and Schmidt, 1988) on art historians and their
needs. As digital imaging technology
evolved in the late 1980s, pioneers such as Besser (1990) and Shneiderman (1987)
studied the possibilities inherent in digital imaging and information visualization,
respectively, and provided much of the foundation for later studies of access to
images, user interfaces, and human-computer interaction. Research into image user
needs and image retrieval, however, was still relatively scarce as recently as the late
1980s (Shatford, 1986; Stam, 1989).
Prior to the developments in digital imaging technology and the Internet, most
image users retrieved images manually, by using indexes in books or on cards.
Although these methods are still used in many libraries and archives, the growth of
digital imaging technology has expanded our definition of image retrieval to include
digital image retrieval from the Internet and library computer servers as well as
manual indexes. Images have generally been considered more difficult to index, find,
and retrieve than textual materials; although some of these difficulties extend to
the digital sphere, technology has also provided means by which some of them may
be resolved.
This review focuses on the rapid developments in the field after several
seminal studies of image users, including Enser and McGregor’s study of requests
from the Hulton-Deutsch collection at the British Library (1993), Hastings’ study of
query categories for intellectual access to images by art historians (1995), Armitage
and Enser’s analysis of queries for visual materials (1997), Jörgensen’s studies of
image description templates (1996) and access to visual materials (1998), and
O’Connor, O’Connor, and Abbas’s study of user needs (1999), giving some consideration
to earlier influences such as Markey’s (1983) and Shatford’s (1984, 1986, 1994) analyses of
issues relating to the indexing of images. Markey’s studies of the visual arts and
computers and of interindexer consistency, and Shatford’s discussions of image
analysis, while antedating current practices in digital imaging and relating more to
cataloguing than to user needs, provide relevant background on the complexity of
visual materials that influences users’ ability to find and retrieve them. Highly
influential on Raya Fidel’s concept of user-centered indexing (1994) and the many
user studies described here, they remain integral to any study of image users and
image retrieval.
In recent years, scholars have debated which methods provide the
best solutions to problems relating to image access and retrieval. Traditional
concept-based image indexing uses controlled vocabularies such as the Art and
Architecture Thesaurus (AAT) and the Library of Congress Thesaurus of Graphic
Materials (LCTGM). Concept-based image retrieval remains the most common means
of finding images. Users pose queries to staff members or to cataloguing systems
using verbal descriptions or keywords and receive or retrieve images. This provides
sufficient access for most image users, but such methods are often expensive in
terms of money as well as time, given the laborious process of indexing images and
the inconsistency with which indexers may assign terms (Markey, 1983; Shatford,
1984).
Content-based image retrieval (CBIR), in which images are automatically
retrieved based on features such as color and shape, has been proposed as a
solution (Gupta and Jain, 1997; Gevers and Smeulders, 1998). Although promising
results have been demonstrated, CBIR still has not proven applicable for image
retrieval based on attributes beyond those relating to image color, shape, and
texture. As most discussions of CBIR are highly technical and do not employ user
studies, they will be discussed here only in terms of general theory. Studies of
specific image user subgroups such as journalists (Ørnager, 1995 and 1997; Markkula
and Sormunen, 2004), art historians (Chen, 2001 and 2002), and historians (Choi and
Rasmussen, 2002 and 2003) are included, along with general studies of image users
(Collins, 1998; Greisdorf and O’Connor, 2002) and of image searching on the Internet
(Goodrum and Spink, 1999, 2002, and 2004).
IMAGE INDEXING AND RETRIEVAL
Though Markey’s study stemmed from a long tradition of studies of
interindexer consistency, reviewed in her work, it provided the first measurement of
consistency among indexers of visual materials, rather than textual materials.
Thirty-nine subjects were assigned 100 works of art, with three different indexers
describing each work of art. Markey found that, among indexers describing the same
image, only one of every eight terms matched across all three indexers with respect
to concept consistency, and only one of every 14 terms matched across all three
with respect to terminology consistency. Conducted before the development of the
AAT and the LCTGM, the study suggested that a controlled vocabulary for images
might improve interindexer consistency, but it emphasized that qualities inherent to
images and the varied needs of image users continued to pose challenges for image
indexing.
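Markey’s two measures can be illustrated with a small computation. The sketch below is an illustration under stated assumptions, not Markey’s published procedure: it scores consistency as the share of distinct terms (the union over all indexers) that every indexer assigned, extending the intersection-over-union form of Hooper’s two-indexer measure to three indexers, and it approximates concept consistency by first mapping variant wordings to a shared concept through a hypothetical synonym table (`CONCEPTS`). The term sets are invented.

```python
# Sketch: interindexer consistency as |terms shared by all| / |all distinct terms|.
# Generalizes the intersection-over-union form of Hooper's measure to any number
# of indexers; an illustration only, not Markey's exact procedure.

# Hypothetical synonym table mapping variant wordings to one concept label.
CONCEPTS = {"lady": "woman", "rowboat": "boat"}


def consistency(term_sets):
    """Return the fraction of distinct terms that every indexer assigned."""
    sets = [set(terms) for terms in term_sets]
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def to_concepts(terms, mapping=CONCEPTS):
    """Replace each term with its concept label (unmapped terms stay as-is)."""
    return {mapping.get(term, term) for term in terms}


# Invented terms from three indexers describing the same work of art.
indexers = [
    {"lady", "river", "boat"},
    {"woman", "river", "rowboat"},
    {"woman", "river", "boat", "sunset"},
]

terminology_score = consistency(indexers)                        # exact wording
concept_score = consistency([to_concepts(s) for s in indexers])  # shared concepts
print(round(terminology_score, 3), concept_score)
```

Even this toy example shows the pattern Markey reported: agreement on concepts is much higher than agreement on exact terms.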
Further foundation for later studies of image users was provided by Shatford’s
(1984, 1986) work on describing and analyzing images. Shatford described how
analysis of the subject of pictures differed from analysis of textual materials. While
textual materials provide clearer information for development of subject terms in the
form of abstracts and the text itself, images provide more ambiguous information
and require a translation from visual to verbal language. Shatford drew on
Panofsky’s (1955, 1962) three levels of analysis for images: pre-iconographic, which
serves as a basic identification requiring only the knowledge acquired from
everyday experience; iconographic, which requires some knowledge of a given
culture; and iconological, which requires deeper analysis of underlying principles, is
highly subjective, and is a level at which consistency is difficult to maintain (Shatford,
1984, 1986).
CATEGORIZING USER QUERIES
Initial user studies focused on categorizing user queries. The first major study
of image users was Enser and McGregor’s (1993) study of almost 3,000 requests for
images from the Hulton-Deutsch collection, which is a general commercial image
collection that includes several distinct subcollections. Enser and McGregor
determined that requests fell into two categories: (1) unique, requests for
specific persons, places, things, and events; and (2) non-unique, requests for images
represented by concepts. They found that most requests (69 percent) could be
categorized as unique. Requests in both groups were also frequently accompanied
by specifications of time, location, event, or format; Enser and McGregor
categorized these as unique/refined and non-unique/refined.
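In Enser and McGregor’s study, deciding whether a request was unique or refined was a human judgment. Once requests have been coded that way, the tallying step is simple. The sketch below assumes each request carries two hand-assigned flags (the `Request` fields are hypothetical) and reports each category’s share; the example requests are invented and are not drawn from the Hulton-Deutsch data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    text: str
    unique: bool   # coder judged it names a specific person, place, thing, or event
    refined: bool  # coder judged it adds a time, location, event, or format qualifier


def category(request):
    """Map the two hand-assigned flags to one of the four category labels."""
    base = "unique" if request.unique else "non-unique"
    return base + "/refined" if request.refined else base


def shares(requests):
    """Percentage of requests falling into each category."""
    counts = Counter(category(r) for r in requests)
    return {label: 100 * n / len(requests) for label, n in counts.items()}


# Invented, pre-coded requests for illustration only.
sample = [
    Request("Winston Churchill", unique=True, refined=False),
    Request("Churchill at Yalta, 1945", unique=True, refined=True),
    Request("children playing in snow", unique=False, refined=False),
    Request("street markets in the 1950s", unique=False, refined=True),
]

result = shares(sample)
# "non-unique..." does not start with "unique", so this sums only unique requests.
unique_total = sum(p for label, p in result.items() if label.startswith("unique"))
print(result)
print(unique_total)
```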
Although now frequently cited, this study had relatively little influence until
Armitage and Enser published an expanded version of the study that included seven
libraries (1997) which, along with the concurrent work of Jörgensen (1996, 1998),
Hastings (1995, 1999), and O'Connor, O’Connor, and Abbas (1999) and the rising
popularity of digital imaging and the Internet, appears to have been the catalyst for a
subsequent explosion of interest in the subject of image users, image indexing, and
image retrieval between 1997 and 2001.
Over the next few years, several additional works referred to Enser and
McGregor’s work, expanded on the work of Markey and Shatford (Shatford Layne,
1994; Svenonius, 1994), and called for user-centered indexing (Fidel, 1994), but
relatively few user studies were undertaken until Hastings’ (1995) examination of
query categories and intellectual access to images used by art historians, one of the
first, and certainly one of the most influential, studies to incorporate digital images.
This study used a closed system of images available only at the location of the study,
in contrast to a later study that would use images available through the Internet
(Hastings 1999).
During the first part of the study, participants viewed color photographs made
from the digital images, with image retrieval software used for the final investigation.
Four distinct levels of queries were found, with level one representing the least
complex and level four the most complex. Queries for a specific fact, such as artist,
date, medium, number, place, and title, were classified as level one and could be
answered from a single inquiry to textual information about a painting or from
viewing a surrogate image, with no differences found between photographs and
digital images.
Table 1. Hastings’ major components of intellectual access to digital art images in a closed system
Level 1 (least complex). Queries: identification queries (who, where, when). Access points: text fields and the image in general. Computer manipulations: search, sort, and display.
Level 2 (complex). Queries: queries of the type “what are?”, which require sorting of the text information in the answer set. Access points: sorted text information and images. Computer manipulations: search, select, sort, display, and enlarge.
Level 3 (more complex). Queries: queries of style, subject, how, and identification of objects or activities. Access points: style, keywords, and complex images. Computer manipulations: compare, enlarge, mark, resolution, and style.
Level 4 (most complex). Queries: queries for meaning, subject, and why. Access points: style and subject. Computer manipulations: style and subject searches plus access to full-text secondary subject resources.
Differences between use of the photographs and use of digital images began
with level two queries, which included queries regarding whether an artist was from a
particular school or whether two paintings were by the same artist and required
some sorting of data cards and photographs or sorting within the database of digital
images. Participants completed their searches more quickly (an average of ten
seconds versus ten minutes) when using the digital images, but then tended to
continue their search, using the sorting options of the database to develop additional
queries and find more information. For level three queries (those that required
comparison of two or more images or required magnification of the surrogate image),
photographs were not used, as these queries relied on functions available only on the
computer. Subject and style queries tended to increase in complexity with use of the
computer, due to the availability of additional search options, with participants
comparing up to four images at once, enlarging particular portions of images, and
sorting identified objects.
At the fourth and most complex level of queries, participants investigated
categories to be used for subject indexing of the collection. The study found that
participants “either sought to apply existing categories used in the study of art
history or their work, or they investigated and explored the images to determine
possible categories for classification.” As with level three queries, there were many
additional queries that were not asked of the photographs, with participants
enlarging portions of images and identifying things that they did not notice in the
photographs. The study also found many queries that could not be answered by
either the photographs or the digital images and would require consultation of
additional historical, biographical, or theoretical sources (Hastings 1995).
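Hastings assigned queries to levels by analysing what each query asked and what it took to answer it. As a rough illustration only, the hypothetical keyword heuristic below assigns a query to the most complex level whose cue words (taken loosely from the Queries column of Table 1) appear in it; a real classification would still need human judgment.

```python
import re

# Hypothetical cue words per level, checked from most to least complex.
# These loosely echo Table 1; they are not Hastings' coding rules.
LEVEL_CUES = [
    (4, ("why", "meaning")),
    (3, ("style", "how", "compare")),
    (2, ("what are",)),
    (1, ("who", "where", "when")),
]


def hastings_level(query):
    """Return 1-4 for the most complex level whose cues occur, else None."""
    text = query.lower()
    for level, cues in LEVEL_CUES:
        for cue in cues:
            # Word boundaries stop "how" from matching inside "show".
            if re.search(r"\b" + re.escape(cue) + r"\b", text):
                return level
    return None


for q in ["Who painted this?",
          "What are the other works from this school?",
          "How does the brushwork style change?",
          "Why is the dog in the corner?",
          "Show me landscapes"]:
    print(q, "->", hastings_level(q))
```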
A sense of Hastings’ study may be gained by viewing these images on the
Internet at http://www.unt.edu/bryantart/, where one has the option to participate in
a study of access to digitized images. After expanding the study to include
participants’ descriptions of digital images on the web as well as in a closed system,
Hastings found that “browsing, manipulation of the images, and need for user
interaction are important aspects of the search for images on the Web” (Hastings
1999). About 80% of web image queries asked for identification of the artist,
activities, or place, while the remaining 20% of the queries asked about the subject.
Results of the study and responses collected from the survey pointed to the need for
users to add their own descriptors and index terms during the search process, the
need to improve relevance feedback mechanisms, and the importance of browsing
for web image searching, with users able to apply their own categories when
searching and browsing.
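Hastings’ call for better relevance feedback can be made concrete with one classic text-retrieval technique, the Rocchio algorithm, swapped in here for illustration rather than taken from Hastings’ study: it moves a query’s term weights toward images the user marks relevant and away from those marked non-relevant. The weights `alpha`, `beta`, and `gamma` are conventional example values, and the term-weighted image descriptions are invented.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return updated term weights: alpha*q + beta*mean(rel) - gamma*mean(nonrel)."""
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Dividing by len(docs) turns the sum into a centroid (mean vector).
                updated[term] += coef * weight / len(docs)
    # Terms pushed to zero or below are dropped, a common convention.
    return {term: w for term, w in updated.items() if w > 0}


# Invented index terms for images the user judged after a first search.
query = {"harbor": 1.0}
relevant = [{"harbor": 1.0, "boats": 1.0}, {"boats": 1.0, "sunset": 1.0}]
nonrelevant = [{"sunset": 1.0, "beach": 1.0}]

print(rocchio(query, relevant, nonrelevant))
```

The revised query keeps "harbor", adds "boats" from the relevant images, and retains only a small weight for "sunset", which appeared in both a relevant and a non-relevant image.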
For evaluation of image retrieval systems, Hastings proposed a model of the
retrieval or search tools required for particular types of query or retrieval tasks and
the evaluation methods best suited to each (Table 2).
Table 2. Hastings’ framework for evaluation of image retrieval systems
Query or retrieval task: identification of known item or image. Retrieval or search tools: index text and fields; browse images. Evaluation method: user and relevance feedback (relevant? yes or no); measures of time and effort.
Query or retrieval task: identification of unknown item(s). Retrieval or search tools: select and display sets of images, sort sets, enlarge. Evaluation method: user-supplied terms and categories for browsing, survey form, online user feedback mechanisms, measures of time and effort.
Query or retrieval task: investigations of style and image content. Retrieval or search tools: content-based retrieval tools such as color, texture, and shape. Evaluation method: log analysis, screen captures, survey form.
Query or retrieval task: queries asking “why” and investigations of “aboutness.” Retrieval or search tools: random browsing and extensive answer set displays; may require secondary resources, e.g., biographical and historical information. Evaluation method: amount of user effort, observation of browsing behavior and answer set development; capture retrieved sets and compare them to the query task.
Armitage and Enser’s analysis of user need in image archives (1997)
expanded Enser and McGregor’s study. Using Panofsky’s (1955) modes of image
analysis (pre-iconographic, iconographic, and iconological) described by Markey
(1983) and refined by Shatford (1986), Armitage and Enser categorized the queries
by image content, identification, and accessibility, focusing on the image content
requests, for which they developed four main categories (who, what, when, and
where) and three levels of abstraction for each category (specific, general, and
abstract). Despite differences in the missions, collections, and users of the libraries,
Armitage and Enser found similarities in image query formulation among all libraries
and concluded that from this they could formulate a general characterization of
queries based on the established framework for analysis of queries relating to image
content. As it appeared that this schema could be applied to the characterization of
images as well as to the queries for them, they suggested that this schema could be
embedded in the user interface and thus offer a more direct and effective means of
retrieving images.
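Armitage and Enser’s schema is essentially a grid: four facets (who, what, where, when) crossed with three levels of abstraction (specific, general, abstract), giving twelve cells. The sketch below, using invented and hand-coded queries, shows how coded queries could be tallied into that grid. Because the same cells could also label images, the structure hints at how the schema might be embedded in an interface, as the authors suggested. The cell names and example codings are assumptions for illustration.

```python
from itertools import product

FACETS = ("who", "what", "where", "when")
LEVELS = ("specific", "general", "abstract")


def empty_grid():
    """One counter per (facet, level) cell: 4 x 3 = 12 cells."""
    return {cell: 0 for cell in product(FACETS, LEVELS)}


def tally(coded_queries):
    """Count how often each cell is used across hand-coded queries."""
    grid = empty_grid()
    for cells in coded_queries:
        for cell in cells:
            if cell not in grid:
                raise ValueError(f"unknown cell: {cell}")
            grid[cell] += 1
    return grid


# Invented queries, each coded by hand into one or more cells.
coded = {
    "Queen Victoria at Balmoral": [("who", "specific"), ("where", "specific")],
    "a crowded railway station": [("what", "general"), ("where", "general")],
    "images expressing grief": [("what", "abstract")],
}

grid = tally(coded.values())
for facet in FACETS:
    print(facet, [grid[(facet, level)] for level in LEVELS])
```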
CLASSIFICATION OF QUERIES AND IMAGE ATTRIBUTES
Another influential study investigated the image attributes noted by
participants in a series of image describing tasks that involved viewing images,
describing them for an image retrieval system, and describing them from memory
(Jörgensen, 1996; Jörgensen, 1998). Using content analysis and descriptive statistics
and incorporating concepts from cognitive science, statements from participants
were analyzed and classified by 47 image attributes that were grouped into 12
higher-level classes of attributes.
Table 3. Jörgensen’s 12 attribute classes (with a sample attribute for each) and distribution of attribute classes by percentage for three describing tasks
Sample attribute     Class                  Viewing  Search  Memory  Average  Median
Text                 Objects                  34.3%   27.4%   26.2%    29.3%   32.8%
People               People                    8.7%   10.3%   11.1%    10.0%    8.5%
Color value          Color                     9.2%    9.7%    9.0%     9.3%    6.8%
Activity             Content/story             7.4%   10.8%    9.4%     9.2%    8.6%
Location (general)   Location                  8.3%   10.7%    7.7%     8.9%    5.7%
Number               Description               6.0%    9.0%    8.8%     8.0%    4.3%
Texture              Visual elements           7.2%    5.4%    9.2%     7.2%    5.5%
Artist               Art historical info       3.8%    5.7%    7.6%     5.7%    1.3%
Emotion              People attributes         5.2%    3.9%    2.6%     3.9%    3.5%
Comparison           External relation         3.3%    3.8%    4.0%     3.7%    3.1%
Uncertainty          Viewer response           3.7%    1.9%    3.1%     2.9%    0.8%
Theme                Abstract                  3.0%    1.5%    1.3%     2.0%    1.4%
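The Average column of Table 3 can be checked as the unweighted mean of the three task columns. The sketch below recomputes it from the percentages as printed; the recomputed means agree with the table to within 0.1 percentage point, with the small gaps (for example, for Description) presumably due to rounding. The Median column is not the median of the three task values shown (for Objects that would be 27.4%), so it is not recomputed here.

```python
# Percentages per attribute class for the Viewing, Search, and Memory tasks,
# copied from Table 3.
TASKS = {
    "Objects": (34.3, 27.4, 26.2),
    "People": (8.7, 10.3, 11.1),
    "Color": (9.2, 9.7, 9.0),
    "Content/story": (7.4, 10.8, 9.4),
    "Location": (8.3, 10.7, 7.7),
    "Description": (6.0, 9.0, 8.8),
    "Visual elements": (7.2, 5.4, 9.2),
    "Art historical info": (3.8, 5.7, 7.6),
    "People attributes": (5.2, 3.9, 2.6),
    "External relation": (3.3, 3.8, 4.0),
    "Viewer response": (3.7, 1.9, 3.1),
    "Abstract": (3.0, 1.5, 1.3),
}

# Average column as printed in Table 3.
PRINTED_AVERAGE = {
    "Objects": 29.3, "People": 10.0, "Color": 9.3, "Content/story": 9.2,
    "Location": 8.9, "Description": 8.0, "Visual elements": 7.2,
    "Art historical info": 5.7, "People attributes": 3.9,
    "External relation": 3.7, "Viewer response": 2.9, "Abstract": 2.0,
}

for cls, values in TASKS.items():
    mean = sum(values) / len(values)
    gap = mean - PRINTED_AVERAGE[cls]
    print(f"{cls:20s} recomputed {mean:5.2f}  printed {PRINTED_AVERAGE[cls]:4.1f}  gap {gap:+.2f}")
```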
Results indicated that certain classes of attributes including objects, people or
the human form, color, and location appeared more frequently and that, because
image requests vary greatly, indexers must provide access to a wide range of
attributes to ensure representation of all facets of interest of image users (Jörgensen
1998). One unexpected finding was that many participants assigned terms that
described the “story” within the image, particularly when asked to describe images
specifically for retrieval, providing details that would not normally be included in
indexing systems. Jörgensen points out that the differences between these results and
the attributes addressed in most image indexing systems suggest the need to revise
the traditional assumptions on which image indexing and retrieval systems are based.
Building on the work of Enser and McGregor, Jörgensen, and Armitage and
Enser, O’Connor, O’Connor, and Abbas (1999) expanded the categories to a broader
population of subjects who were not professional image users and examined users’
reactions to images as potential access mechanisms, in a study based on image captions. In the first,
flawed portion of the study, 20 users (largely MLS students) were asked to describe
15 images and their reactions to them but instead of providing rich descriptions of
them, the subjects, influenced by their educational backgrounds, constructed Library
of Congress subject headings. In the revised version, 24 subjects were shown 15
images with the instructions to write captions for them, list words or phrases that
they would use to describe them, and list words or phrases that described how the
images made them feel. To allow for a wider range of images and participants, the
third and final version of the study presented 120 respondents (again, all MLS
students, but this time from four different sites in the western United States) with
300 images representing a wide variety of subjects and production values on a web
site. Respondents were asked to choose and describe any 100 images. Results
indicated that the wide variety of responses and descriptions, particularly those
relating to affective or emotional states, may aid indexers by supplying additional
index terms that lead to better retrieval for users seeking images that evoke
particular moods and for users seeking images that represent concepts that are
difficult to index.
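The practical point of this finding is that affective terms can serve as access points alongside descriptive ones. A minimal inverted index illustrates the idea. The sketch assumes each image record has a hypothetical `caption` field and a `feel` field for reaction terms (field names and data are invented) and indexes both, so that a mood such as “lonely” becomes searchable.

```python
from collections import defaultdict


def build_index(records, fields=("caption", "feel")):
    """Map each lowercased term to the set of image IDs it describes."""
    index = defaultdict(set)
    for image_id, record in records.items():
        for field in fields:
            for term in record.get(field, []):
                index[term.lower()].add(image_id)
    return index


def search(index, *terms):
    """Return IDs of images indexed under every query term (Boolean AND)."""
    if not terms:
        return set()
    return set.intersection(*(index.get(t.lower(), set()) for t in terms))


# Invented records: caption terms plus terms for how the image made viewers feel.
records = {
    "img01": {"caption": ["dog", "beach"], "feel": ["joyful", "carefree"]},
    "img02": {"caption": ["empty", "beach", "winter"], "feel": ["lonely", "calm"]},
    "img03": {"caption": ["dog", "shelter"], "feel": ["sad", "lonely"]},
}

full = build_index(records)
captions_only = build_index(records, fields=("caption",))
print(search(full, "lonely"))           # found via affective terms
print(search(captions_only, "lonely"))  # not findable from captions alone
print(search(full, "beach", "lonely"))
```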
QUERIES, RELEVANCE, AND COGNITIVE PROCESSES
Using as a foundation the work of Enser, Jörgensen, and others, several
studies examined user queries as a means of determining which subject terms and
attributes were used most frequently by users, the specificity of search terms used,
and the relevance that users assigned to the results retrieved (Collins, 1998;
Efthimiadis and Fidel, 2000; Chen, 2001; Greisdorf and O’Connor, 2002; Choi and
Rasmussen, 2002; Choi and Rasmussen, 2003).
Collins (1998) examined 100 user queries at the North Carolina Collection,
University of North Carolina at Chapel Hill, and 87 queries at the North Carolina State
Archives in Raleigh. Collins created basic categories of terms and tallied the number
of user requests that employed each category. Results indicated that subject terms were
used more frequently than any other category of terms (86 percent of queries);
of these, 57 percent used generic subject terms and 42 percent used specific subject
terms (particular names, locations, etc.). Subject terms were
followed in frequency by terms relating to time and place, with relatively few
requests for items by genre, visual terms, format, or creator/provenance.
Chen (2001) investigated user queries and image retrieval methods in the
field of art history in two similar studies. In the user query study, Chen collected
queries from 29 art history undergraduates in pre- and post-search questionnaires
and mapped them into the features previously identified by Enser and McGregor
(1993), Jörgensen (1996), and Fidel (1997). Mappings by three reviewers showed
high degrees of matching to Enser and McGregor’s categories of Unique and
Nonunique and to Jörgensen’s classes of Location, Literal Object, Art Historical
Information, People, and People-Related Attributes, although some categories needed
further refinement. From the results, Chen proposed adding more details to Enser and
McGregor’s four categories and regrouping Jörgensen’s 12 classes of image attributes
(Chen 2001).
In the image retrieval study, Chen investigated image retrieval methods
employed by 26 art history undergraduates, using Jörgensen’s three image retrieval
tasks and Enser’s four models of image retrieval (1995). Students received pre- and
post-search questionnaires and participated in post-search interviews. Chen found a
“significant difference between the mean number of search keywords or phrases
participants planned to use and the mean number of search keywords or phrases
they actually used” along with a significant relationship between search success and
the percentage of search keywords or phrases drawn from the topic title or
description that students had chosen (Chen 2001).
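Chen’s first finding compares, for the same students, the number of keywords planned with the number actually used. The summary here does not say which test Chen applied; a paired t-test is one standard way to compare such matched means, and the sketch below computes its statistic for invented counts. It uses only Python’s standard `statistics` module and stops at the t statistic rather than a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched observations."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Invented keyword counts for five participants (planned vs. actually used).
planned = [3, 4, 2, 5, 3]
used = [5, 6, 3, 6, 5]

t, df = paired_t(planned, used)
print(f"t = {t:.2f} with {df} degrees of freedom")
```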
Integrating elements of cognitive psychology, library science, art, and
computer technology, Jaimes, Benitez, Jörgensen and Chang (2001) examined the
cognitive processes of the user, incorporated aspects of content-based image
retrieval as a means of integrating some aspects of the concept-based and
content-based approaches, and developed a new conceptual framework based on
Jörgensen’s earlier work. The framework classifies visual attributes into a “Pyramid”
of four syntactic levels (type/technique, global distribution, local structure, and
global composition) and six semantic levels (generic, specific, and abstract levels of
both object and scene). Two groups of participants, naïve users with no prior training
in indexing visual information and indexers trained in indexing visual information,
produced descriptions of a random group of about 700 images automatically retrieved
from the Internet and of 12 color photographs of current news events in different
parts of the world, accompanied by brief textual descriptions from an Internet newsgroup.
Table 4. Jaimes, Benitez, Jörgensen, and Chang: mapping of image attributes to the Pyramid levels as generated by different methods: experiment I (spontaneous and retrieval-oriented), II (author and caption), and III (indexing)
Methods (columns): I Spontaneous, I Retrieval, II Author, II Caption, III Indexing
Pyramid level            Methods marked
I Type/Technique         X X X
II Global Distribution   X X X
III Local Structure      X X X X
IV Global Composition    X X X
V Generic Objects        X X X X X
VI Specific Objects      X X X X X
VII Abstract Objects     X X X X X
VIII Generic Scene       X X X X X
IX Specific Scene        X X X X X
X Abstract Scene         X X X X X
Results showed that the preciseness of even naïve users’ descriptions improved when they were instructed to describe images in a way that would aid in retrieval,
suggesting the need for greater structure in image indexing. The study supported the 10-level conceptual structure represented by the Pyramid, with all attributes classified at some level of the Pyramid. Although they found some variation in the distribution of attributes among the levels of the Pyramid depending on whether the participants were naïve users or indexers, and depending on the task (describing, indexing, retrieval), “the researchers found no instances where an attribute could not be accommodated by a level of the Pyramid.” In addition, they found that the Pyramid assisted indexers “by making explicit specific, generic, and abstract levels of description. Especially useful is the recursive nature of the Pyramid, which permits associations among objects and attributes (e.g., can be applied to a scene, object, section of an image, etc.)” (Jaimes, Benitez, Jörgensen, and Chang, 2001).
Table 5. Sample image indexing record terms from Jaimes, Benitez, Jörgensen, and Chang’s indexing template mapped to the Pyramid levels
Image term             Pyramid level
Painting               Type/technique
Oil                    Type/technique
Cracked                Global distribution
Red, white             Local structure
Background             Local structure
Rectangle              Local structure
Center                 Local structure
Eye level              Global composition
Flag                   Generic object
Historical landscape   Generic scene
Patriotism             Abstract object
Pride                  Abstract scene
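The Pyramid’s ten levels form an ordered scheme, with levels I to IV syntactic and V to X semantic, numbered in the order used in Table 4. The sketch below encodes them as a Python `IntEnum` (the encoding and member names are illustrative choices, not the authors’) and groups the sample indexing terms from Table 5 by level.

```python
from collections import defaultdict
from enum import IntEnum


class Pyramid(IntEnum):
    # Syntactic levels (I-IV): how the image looks.
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    # Semantic levels (V-X): what the image depicts or means, ordered as in Table 4.
    GENERIC_OBJECT = 5
    SPECIFIC_OBJECT = 6
    ABSTRACT_OBJECT = 7
    GENERIC_SCENE = 8
    SPECIFIC_SCENE = 9
    ABSTRACT_SCENE = 10

    @property
    def is_syntactic(self):
        return self <= Pyramid.GLOBAL_COMPOSITION


# Sample record terms and their levels, from Table 5.
RECORD = [
    ("Painting", Pyramid.TYPE_TECHNIQUE),
    ("Oil", Pyramid.TYPE_TECHNIQUE),
    ("Cracked", Pyramid.GLOBAL_DISTRIBUTION),
    ("Red, white", Pyramid.LOCAL_STRUCTURE),
    ("Background", Pyramid.LOCAL_STRUCTURE),
    ("Rectangle", Pyramid.LOCAL_STRUCTURE),
    ("Center", Pyramid.LOCAL_STRUCTURE),
    ("Eye level", Pyramid.GLOBAL_COMPOSITION),
    ("Flag", Pyramid.GENERIC_OBJECT),
    ("Historical landscape", Pyramid.GENERIC_SCENE),
    ("Patriotism", Pyramid.ABSTRACT_OBJECT),
    ("Pride", Pyramid.ABSTRACT_SCENE),
]

by_level = defaultdict(list)
for term, level in RECORD:
    by_level[level].append(term)

for level in sorted(by_level):
    kind = "syntactic" if level.is_syntactic else "semantic"
    print(f"{level.value:2d} {level.name:20s} {kind:9s} {by_level[level]}")

syntactic = sum(1 for _, level in RECORD if level.is_syntactic)
print(syntactic, len(RECORD) - syntactic)
```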
A new thesaurus for indexing of images across diverse domains has been
proposed by Jörgensen (2004) and is intended to reduce some of the remaining
difficulties and inconsistencies inherent in indexing and retrieving images.
Examining the concept of relevance, integrating elements of cognitive
science, and seeking to synthesize elements of the concept-based and content-based
approaches, Greisdorf and O’Connor (2002) studied how users assign pre-determined
query terms to retrieved images and how they cognitively assess meaningful terms
after retrieval. They found that users’ perceptions of the relevance of retrieved
images “may arise from descriptions of objects and content-based elements that are
not evident or not even present in the image” and that affective or emotion-based
query terms are a significant category for image retrieval. They proposed that image
retrieval efficiency in systems using traditional indexing methods and the more
technological content extraction algorithms could be enhanced by the development
of a system for capturing human interpretations derived from their cognitive
engagements with viewed images.
To explore users’ perceptions of relevance when searching for images in a
photographic archive on the Internet, Choi and Rasmussen used the Library of
Congress’ American Memory photographic archive as the source of images for a
study of 38 faculty and graduate students of American history, investigating the
criteria these users applied when judging the relevance of images for their
research. Building on work by Barry and Schamber (1995) and
others that examined relevance criteria for textual and non-textual documents, Choi
and Rasmussen explored the extent to which these criteria can be applied to visual
documents and the extent to which new and different criteria apply. Subjects
completed a pre-test questionnaire, searched for images in American Memory,
completed a post-test questionnaire, and participated in interviews. The researchers
analyzed 38 natural language statements, 185 search terms provided by study
participants, and 219 descriptors noted by participants in relevant records to
determine the distribution of the subject content of the queries. Over half of the
queries could be identified as “general/nameable needs,” and most described images
in terms of a person, thing, event, or condition, together with the location or time of
interest (Choi and Rasmussen, 2003).
Applying quantitative statistical methods to analyze the importance of
relevance criteria and determine how much each criterion affected users’ judgments,
they found that the user’s perception of how the images related to his or her topic
still remained the most important factor throughout the information-seeking stages,
but that users also judged retrieved items by other criteria. The most
important of these were image quality and clarity; others included title, date,
subject descriptors, and notes (Choi and Rasmussen, 2002).
INTERNET IMAGE SEARCHING
The rapid increase in the use of the Internet for image searches has led to
studies that explore web searches for images. The first major study of an Internet
image search engine examined 1,025,908 sequential queries from 211,058 users on
the Excite web search engine, which included a subset of 33,149 image queries by
9,855 users for both still and moving images (Goodrum and Spink, 1999; Goodrum
and Spink, 2001). They found an average of 3.36 image queries per user and an
average of 3.74 search terms per query. Results showed a large number of
unique terms in the queries, with the most frequently occurring terms appearing less
than 10 percent of the time and most terms occurring only once. As might be
expected, a great number of the most frequently occurring terms were for images of
a sexual nature or those relating to celebrities, though to an even greater extent
than one might anticipate, with almost all of the most frequently occurring terms
falling into these categories.
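Figures of the kind Goodrum and Spink report (terms per query, the share held by the most frequent term, how many terms occur only once) come from simple counts over a query log. The sketch below computes them for a few invented queries, assuming whitespace tokenization and lowercasing; it illustrates this type of transaction-log analysis, not the authors’ actual processing of the Excite data.

```python
from collections import Counter


def term_stats(queries):
    """Summary counts of the kind used in query-log studies."""
    terms = [t for q in queries for t in q.lower().split()]
    if not terms:
        raise ValueError("no terms to analyse")
    counts = Counter(terms)
    top_count = counts.most_common(1)[0][1]
    once = sum(1 for c in counts.values() if c == 1)
    return {
        "mean_terms_per_query": len(terms) / len(queries),
        "top_term_share": top_count / len(terms),  # share of all term occurrences
        "share_used_once": once / len(counts),     # share of distinct terms seen once
    }


# Invented image queries standing in for a transaction log.
log = ["red car", "car pictures", "eiffel tower", "red sunset", "tower bridge"]
print(term_stats(log))
```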
Contrasting this with the work of Enser (1995), who examined written queries
for non-digital images, they discussed implications for development of models for
visual information retrieval and design of web search engines that would allow for
better automatic retrieval of images. At the time of the study, fewer image search
engines existed than currently exist. Goodrum and Spink noted that since they
began their project, Excite had added tools for still and moving image searches, but
pointed out that the continuing problems associated with providing textual queries
for visual information still existed and that content-based image retrieval still
provided retrieval only by color and shape, with little ability to retrieve images using
higher-level attributes.
CONTENT-BASED IMAGE RETRIEVAL
Recent studies reflect the growing popularity of digital images and of research on
content-based image retrieval and digital image browsers. Content-based image
retrieval (CBIR) refers to the automatic retrieval of images based on characteristics
of the images themselves, such as color, shape, and texture. Because manual indexing
and retrieval of images can be time-consuming, CBIR has been hailed as a quicker
and more efficient means of image retrieval
(Gudivada and Raghavan, 1995; Gupta and Jain, 1997; Gevers and Smeulders, 1998;
Gevers and Smeulders, 2000; Smeulders, Worring, Santini, Gupta and Jain, 2000;
Zachary, Iyengar, and Barhen, 2001; Zachary and Iyengar, 2001).
Most studies of content-based image retrieval are systems-oriented. They focus
on the highly technical aspects of retrieval from pixel-level details of digital images
and include pages of algorithms and discussion of color histograms. Although CBIR
holds much promise for image retrieval, studies examining its usefulness for real user
needs and contexts have begun only in recent years. Three studies that have
specifically applied CBIR to studies of image users (Bailey and Graham, 2000;
Markkula and Sormunen, 2001 and 2004; and Yang, 2004) are discussed below.
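To make the idea of retrieval from color histograms concrete, the following minimal sketch performs query-by-example matching. It ranks a small collection against an example image by histogram intersection, a standard color-matching measure. It is an illustration only and not the method of any study reviewed here. The tiny pixel lists, the bin count, and the function names are hypothetical, and a real system would decode actual image files and usually combine color with shape and texture features.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255
Bin = Tuple[int, int, int]    # quantized (R, G, B) bin index


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> Dict[Bin, float]:
    """Quantize each channel into equal-width bins; return bin frequencies summing to 1.
    bins_per_channel should divide 256 evenly (e.g. 2, 4, 8)."""
    if not pixels:
        return {}
    width = 256 // bins_per_channel
    counts = Counter((r // width, g // width, b // width) for r, g, b in pixels)
    return {bin_key: n / len(pixels) for bin_key, n in counts.items()}


def histogram_intersection(h1: Mapping[Bin, float], h2: Mapping[Bin, float]) -> float:
    """Sum of bin-wise minima: 1.0 for identical color distributions, 0.0 for disjoint ones."""
    return sum(min(freq, h2.get(bin_key, 0.0)) for bin_key, freq in h1.items())


def query_by_example(query: Sequence[Pixel], collection: Mapping[str, Sequence[Pixel]],
                     top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank collection images by color-histogram similarity to the example image."""
    q_hist = color_histogram(query)
    scored = [(name, histogram_intersection(q_hist, color_histogram(px)))
              for name, px in collection.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical "images" represented as flat pixel lists.
    sunset = [(250, 120, 30)] * 80 + [(20, 20, 60)] * 20
    beach = [(240, 200, 150)] * 50 + [(30, 120, 200)] * 50
    forest = [(20, 100, 30)] * 90 + [(200, 180, 60)] * 10
    example = [(245, 110, 40)] * 70 + [(10, 10, 50)] * 30
    for name, score in query_by_example(example, {"sunset": sunset, "beach": beach, "forest": forest}):
        print(f"{name}: {score:.2f}")
```

A histogram discards information about where colors occur in an image. As a result, two pictures with similar palettes but entirely different subjects can score as near matches. This is one reason that such low-level matching struggles with requests for higher-level attributes.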
INDEXING REVISITED: A CALL FOR A NEW PARADIGM
Spurred by the rapid growth of digital imaging and of the use of images in
general, new examinations of image indexing by Roberts (2001) and Graham (2001)
urged the development of new indexing methods for image databases. Roberts
described ways in which historians often treat images as secondary and inferior to
text, and cited instances of inadequate image citations that made the images difficult
to retrieve for later use. Discussing the efforts of art historians and cataloguers to
develop the infrastructure needed to index works of art properly, Roberts argued that
image databases need to index not merely basic descriptions of art but also the
concepts that the images represent or have represented throughout history
(Roberts, 2001).
Graham proposed a new paradigm for the cataloguing of images that would
integrate traditional concept-based indexing with developments in content-based
image retrieval, considering and allowing for the needs of varied groups of users,
including large numbers of novice users (Graham, 2001). The need for this new
paradigm was demonstrated by a study (Bailey and Graham, 2000) in which 56 art
historians contacted through the Association of Art Historians (AAH, UK) and 65 art
historians who answered an online survey were asked about the effect of digital
images on their working methods and research interests. Most respondents (64.5
percent overall, and a majority within each group) indicated that digital images had
not changed their research interests. However, 35.4 percent of the online survey
group indicated that they had. Overall, 47.1 percent of respondents indicated that
digital images had affected their working methods. Perhaps not surprisingly, this
included 61.5 percent of the online survey group, but also a smaller yet still notable
30.4 percent of the AAH group.
Table 6. Bailey and Graham’s study of the effect of digital images on art historians and their work
Access to digital images has:      AAH survey    Online survey   Grand totals
(a) affected work methods
    Yes                            17 (30.4%)    40 (61.5%)      57 (47.1%)
    No                             31 (55.4%)    18 (27.7%)      49 (40.5%)
    Not sure                        5 (8.9%)      7 (10.8%)      12 (9.9%)
(b) affected research interests
    Yes                             7 (12.5%)    23 (35.4%)      30 (24.8%)
    No                             42 (75.0%)    36 (55.4%)      78 (64.5%)
    Not sure                        3 (5.4%)      6 (9.2%)        9 (7.4%)
No answer to the question           1 (1.8%)      0 (0.0%)        1 (0.8%)
Source: Compare and Contrast Survey, 1999/2000, University of Northumbria, UK
DIGITAL ARCHIVES: QUERIES AND BROWSING
Digital newspaper image archives were studied by Ørnager (1995, 1997) and
Markkula and Sormunen (1999, 2001, 2004). Ørnager sought to develop rules for
future image indexing. To do so, she identified the categories that users employed for
subject analysis and the kinds of requests users made of the archives, and she
classified users into categories based on the kinds of queries they posed. The findings
showed that the results could be used to formulate rules for indexing images and to
establish a typology of user groups. The results of word association tests served as
the foundation for an image user model based on word clusters. The typology
included five categories:
(a) the “specific inquirer,” who asks specific questions because (s)he already has a
particular image in mind;
(b) the “general inquirer,” who asks broad questions because (s)he wants to choose
images with little assistance from the archives staff;
(c) the “storyteller inquirer,” who tells archivists about his or her research and seeks
their suggestions;
(d) the “story giver inquirer,” who not only tells about his or her research but also
turns over responsibility for image selection to archivists, seeing them as the experts;
and
(e) the “fill in space inquirer,” who seeks merely any graphic of a particular size to fill
empty space.
Markkula and Sormunen (1999, 2001, 2004) compared and contrasted
journalists’ photo needs, queries, and information-seeking behaviors and found that
the journalists were able to retrieve images effectively and satisfactorily for less
complex and more specific requests, such as for images of particular persons, but
had greater difficulty in retrieving images for more complex and/or more general
requests, such as those for generic objects and themes. They also found that,
although journalists employed browsing as their main search strategy, the system did
not provide good support for browsing. Results indicated that browsing is preferable
to querying for the following reasons:
1. Criteria used in selecting photos are difficult to express by words but are easily applied when the photo is seen.
2. Non-professional searchers have difficulties in formulating focused queries. Browsing is a method to compensate for these difficulties.
3. Photo selection criteria depend on a particular work situation. These aspects are difficult to predict in indexing.
4. Browsing of thumbnail images is quite efficient and the journalists feel comfortable with browsing (Markkula and Sormunen, 2001).
An earlier study had raised the possibility that content-based image retrieval
could solve the problem of browsing large sets of query results in text-based image
retrieval systems (Markkula and Sormunen, 1999). In a later study, they sought to
determine whether selecting one photograph of interest from a set could, with the
help of a CBIR algorithm, lead to the identification of other visually similar images.
Ten journalists were asked to select illustrations for sample articles. They worked
from a database of 50,000 sample digital images drawn from the newspaper archives,
which offered standard text retrieval and thumbnail browsing. The results suggested
a “correlation between the photograph similarities computed by the algorithms and
the similarities assessed by the users” (Markkula and Sormunen, 2001, p. 16). As in
other CBIR studies, retrieval based on color and shape proved the most accurate.
Having determined that content-based retrieval by color and other features was
inadequate for journalists browsing newspaper archives, Markkula and Sormunen
concluded that further development of concept-based, traditional textual indexes was
needed. Because such indexes can be produced automatically from captions and
other descriptions, they suggested that archivists and librarians not spend scarce time
indexing these attributes. Instead, they should focus on developing improved user
interfaces and on bringing together images that represent broad concepts or themes,
which would allow for more effective browsing (Markkula and Sormunen, 2004).
In another study that further examined the potential of content-based
methods for real-life users, Yang (2004) investigated and compared users’
information-seeking behavior and search performance with two content-based
image retrieval methods: query by example (QBE) and the self-organizing image
browsing map (SIM). Subjects first completed a training session that familiarized them
with the QBE and SIM systems and were then given three sets of images. Participants
were asked to search for images matching pre-determined textual descriptions and
for pre-selected target images. Yang found that the image browsing map provided
more support for information seeking and led to better performance in image
searching than did the query by example method.
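As its name suggests, the browsing map in Yang’s study is based on a self-organizing map (SOM). A SOM places images on a two-dimensional grid so that images with similar features fall into the same or neighboring cells, where users can browse them. The sketch below is a generic, textbook-style SOM written only for illustration and is not Yang’s implementation. The three-number feature vectors (standing in for, say, average color), the grid size, the decay schedules, and the function names are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Cell = Tuple[int, int]  # (row, column) position on the browsing grid


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data: Sequence[Vector], rows: int, cols: int,
              epochs: int = 200, lr0: float = 0.5, seed: int = 0) -> Dict[Cell, Vector]:
    """Train a rows x cols self-organizing map on the given feature vectors."""
    rng = random.Random(seed)
    # Start each grid node at a copy of a randomly chosen training vector.
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks, floored at 0.5
        for x in rng.sample(list(data), len(data)):  # visit samples in random order
            # Best-matching unit: the node whose weights are closest to the sample.
            bmu = min(nodes, key=lambda n: _sq_dist(nodes[n], x))
            for cell, w in nodes.items():
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighborhood on the grid
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull the node toward the sample
    return nodes


def browsing_map(images: Dict[str, Vector], nodes: Dict[Cell, Vector]) -> Dict[Cell, List[str]]:
    """Assign each image to the grid cell of its best-matching node."""
    cells: Dict[Cell, List[str]] = {cell: [] for cell in nodes}
    for name, vec in images.items():
        bmu = min(nodes, key=lambda n: _sq_dist(nodes[n], vec))
        cells[bmu].append(name)
    return cells


if __name__ == "__main__":
    # Hypothetical average-color features (R, G, B scaled to 0-1) for six images.
    images = {
        "red_barn": [1.0, 0.0, 0.0], "red_car": [0.9, 0.1, 0.0], "red_rose": [0.95, 0.0, 0.05],
        "blue_sea": [0.0, 0.0, 1.0], "blue_sky": [0.1, 0.0, 0.9], "blue_lake": [0.0, 0.05, 0.95],
    }
    som = train_som(list(images.values()), rows=2, cols=2)
    for cell, names in sorted(browsing_map(images, som).items()):
        print(cell, names)
```

In a browsing interface, each cell could be displayed as a cluster of thumbnails. The user would then move between neighboring regions of similar images rather than submitting an example image as a query.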
IMAGE BROWSING AND VISUALIZATION
As indicated above, image browsers and visualization may provide a solution
to the unique problems associated with indexing and retrieving images.
Developments in digital imaging and web design technology allow image indexers
and users to sidestep some of the indexing and retrieval issues described previously.
Rather than trying to include all information associated with a given image in a
catalog record or caption, or trying to retrieve images by their physical
characteristics, indexers can include terms associated with image sets on web pages.
They can also group related images together as thumbnails that users browse,
clicking on images of interest to view larger versions.
Several recent studies, building on Bates’ (1989) discussion of browsing and
berry-picking, suggest that browsing may be the most effective search strategy for
digital image users. This appears particularly true when visually similar images are
clustered and/or combined with captioned descriptions (Ørnager, 1995 and 1997;
Markkula and Sormunen, 1998, 2000, and 2001; Rodden, Basalaj, Sinclair and Wood,
1999 and 2001; and Yang, 2004), and/or when browsers include zoomable user
interfaces (ZUIs) (Combs and Bederson, 1999).
To test the idea that letting users browse through visually similar images could
enhance retrieval, Rodden, Basalaj, Sinclair and Wood (2001) investigated whether
organizing images by similarity assists browsing. Query-based retrieval using purely
visual measures, as in CBIR, had so far met with only limited success. They therefore
expected that many image users would begin their searches by entering textual
queries or navigating hierarchical arrangements of categories, and would then
browse the resulting set of thumbnails.
Eighteen designers were given 100 thumbnail images of particular cities and
asked to choose three of them to illustrate “destination guide” articles for a new
independent travel web site. The thumbnails were arranged either by visual similarity
(color and general appearance) or by category (images of midtown Manhattan, the
Brooklyn Bridge, etc.).
The first experiment was designed to determine whether designers would find
either of the similarity-based arrangements (visual or captioned) useful for selecting
images and also whether it was helpful to have both arrangements available. It was
expected that both the caption-based and the visual arrangements would be
regarded as useful in their own right and also when used in combination. Two-thirds
of the participants found the caption-based arrangement useful, but participants
disagreed on the usefulness of the visual arrangement, with some finding it useful,
but many finding it difficult to use.
ZOOMABLE USER INTERFACES (ZUIs)
Zoomable user interfaces (ZUIs) have been proposed as a means of
enhancing users’ browsing experience and improving their ability to find and retrieve
images of interest (Combs and Bederson, 1999). Building on Shneiderman’s (1987)
work on information visualization, Combs and Bederson examined whether zooming
improves browsing. Thirty participants used four different image browsers to find
and retrieve images. The study comprised two simultaneous experiments. The first
used a between-subjects design, in which participants were randomly assigned one
of the four browsers and instructed to browse through each of three image sets. The
second used a within-subjects design. The results showed that “the zoomable image
browser as well as the traditional 2D grid of thumbnails works best for performance
time and user satisfaction.” One curious finding, however, was that roughly half of
the participants did not take advantage of the zooming capability, despite having
been trained in its use (Combs and Bederson, 1999). Suggestions from users included
requests for “the ability to group images in clusters by content” and for a means of
searching for the target image rather than browsing for it. These requests were
perhaps prompted by the large number of images (up to 225) displayed on the screen.
A STUDY OF STUDIES
Chu (2001) conducted a statistical analysis of data relating to research in
concept-based and content-based image retrieval. Using SciSearch and Social
SciSearch, Chu searched for articles on image indexing and retrieval and ranked them
by author and cited author to identify the most prolific and most highly cited
authors. Results
indicated, perhaps not surprisingly, that researchers from the field of computer
science were most closely associated with research in the content-based domain,
while researchers in the field of library and information science tended to emphasize
research in concept-based image retrieval. Despite the differences between the two
groups, Chu found evidence of collaboration with, or at least citation of, authors from
other specialties and recommended the promotion and encouragement of further
collaboration and perhaps some integration of the two approaches.
CONCLUSION
Although much research remains to be done, it appears that the most viable solutions
to image indexing problems have come, and will continue to come, from synthesizing
new methods of content-based image retrieval with improvements to traditional
concept-based indexing (Chen and Rasmussen, 1999; Enser, 2000; Jaimes, Benitez,
Jörgensen, and Chang, 2001; Markkula and Sormunen, 2001; Yang, 2004) or with
image browsing approaches (Markkula and Sormunen, 2004; Yang, 2004; Bederson,
2005).
Research relating to image users and image retrieval, which began with
studies of image indexing and interindexer consistency and a handful of user studies,
has expanded into a growing subfield of both library science and computer science
that includes studies on digital imaging, automatic image retrieval, and image
searching on the Internet. In recent years, several scholars have pointed to the need
to integrate the previously disparate camps of those who approach image indexing
from the concept-based perspective (more often library scientists) and those
who approach it from the content-based perspective (more often computer
scientists) as a means of finding a workable solution to the issues associated with
improving access to digitized images on the web.
Regardless of their focus, most studies indicate that image users have unique
needs and often have difficulty retrieving the images they want through traditional
means of research and retrieval. They also indicate that users requesting images of
people and places have the best chance of finding appropriate images, while those
requesting images of more ambiguous subjects often have difficulty. Further
complicating matters, cataloguers often struggle to assign subject terms to images,
because images are more likely than textual materials to be interpreted differently
by different people. Recent developments in content-based image retrieval promise
solutions to the time-consuming and inconsistent process of indexing images
manually. As yet, however, these automatic methods have succeeded only in
retrieving images by “low-level” attributes such as color and, to a lesser extent,
shape.
Clearly, recent years have seen an ever-greater emphasis on the role of the
user and on developing new means of indexing and image display based on users’
needs. This emphasis was perhaps influenced by Dervin’s concept of sense-making
(Dervin 1976) and by Belkin’s work on users’ anomalous states of knowledge, which
called for designing systems to fit the user rather than making the user conform to
the system (Belkin 1980).
Image browsing seems a promising solution to some problems relating to
image indexing and improving image retrieval for users. Computer operating
systems now allow librarians to arrange digital images in folders on their servers and
examine all digital images as thumbnails. Images may be displayed as thumbnails
on web pages, which allows users to find them through keyword searches, browse
through them, and enlarge images of interest.
These browsing options have provided a partial solution to the problem of
image searching and retrieval for onsite and offsite users. Even when the original
images have been catalogued using traditional means, librarians and users can
retrieve digital surrogates for viewing with minimal effort. Rather than paging items
from several different locations and making reference photocopies, librarians can
browse scanned images at the reference desk, display images for in-house users, and
e-mail low-resolution reference images to offsite users. Images not yet scanned
must, of course, be paged in the traditional way and the ever-present time and
funding constraints may prevent scanning all materials, but a slow yet steady
scanning of images (even if only for user requests) can, over a period of years, help a
library to develop a digital library of scans that saves much time and work in the long
run. For online users, the ability to search through hundreds of thumbnails on web
pages that include captioned keywords to facilitate their retrieval from image search
engines reduces the need for indexers to try to anticipate users’ query terms and the
need for users to try to guess at the terms under which images have been indexed.
A digital image browsing program developed by Bederson and presented at
the NFAIS 2005 conference allows users to browse hundreds of thumbnail images,
zoom in on them using the zoomable user interface concept described above, and
select images of interest (Bederson, 2001; Bederson, 2005). Bederson’s International
Children’s Digital Library (ICDL), also demonstrated at the conference, provides the
ability to search books not only by traditional text-based means, but also by such
attributes as color of book covers and the age range of those for whom the books are
intended (Druin, Bederson, Hourcade, Sherman, Revelle, Platner, & Weng, 2001;
Bederson 2005).
Reading today the words of an art historian who participated in the Getty
study (Bakewell, Beeman, Reese, and Schmitt, 1988), one realizes the tremendous
technological advances in image indexing and retrieval that have been made over
the past two decades:
“I would like access to visual traditions…and to me that means electronic and computerized, so that you can slice the pie there and get different permutations and combinations [making] images accessible according to iconography, time, region, artist, and owner… The issue is not just indexing; it’s accessibility. And that means articulating indices, giving you different ways of organizing material and slicing it, and also efficiency of access” (Bakewell, Beeman, Reese, and Schmitt, 1988, p. 51).
One hopes that this foresighted scholar lived to see digital imaging and
database developments such as ARTSTOR.
Recent years have seen calls for the synthesis of traditional concept-based
indexing methods with content-based image retrieval. These calls emphasize the need
for cooperation between two groups: library and information science professionals,
who have traditionally favored the concept-based approach, and information systems
professionals, who are at the forefront of developing content-based and browsing
methods. By combining their interests, efforts, and approaches, these groups can
build on the advances that have been made and continue to find better ways of
improving users’ ability to find and retrieve images.
REFERENCES
Armitage, L. and Enser, P. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287-299.
Bailey, C. and Graham, M. (2000). The corpus and the art historian. CIHA London: Thirtieth International Congress of the History of Art [Electronic resource]. http://www.unites.uqam.ca/AHWA/Meetings/2000.CIHA/Bailey.html, viewed March 3, 2005.
Bakewell, E., Beeman, W., Reese, C., and Schmitt, M. (1988). Object, image, inquiry: the art historian at work: report on a collaborative study by the Getty Art History Information Program (AHIP) and the Institute for Research in Information and Scholarship (IRIS), Brown University. Santa Monica, CA: The Getty Art History Information Program.
Bates, M. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407-424.
Bederson, B. (2001). PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps. Proceedings of the 14th annual ACM symposium on user interface software and technology, 71-80.
Bederson, B. (2005). Making sense of search results: creating effective visualizations. Presentation at NFAIS Annual Conference, March 1, 2005.
Belkin, N. (1980). Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science, 5, 133-143.
Besser, H. (1990). Visual access to visual images: the UC Berkeley image database project. Library Trends, 38(4), 787-798.
Besser, H. and Trant, J. (1995). Introduction to imaging: issues in constructing an image database. Santa Monica, CA: The Getty Art History Information Program.
Brilliant, R. How an art historian connects objects and information. Library Trends, 37, 120-129.
Chen, Hsin-liang (2001). An analysis of image retrieval tasks in the field of art history [Electronic version]. Information Processing and Management, 37, 701-720.
Chen, Hsin-liang (2001). An analysis of image queries in the field of art history [Electronic version]. Journal of the American Society for Information Science and Technology, 52(3), 260-273.
Chen, H. and Rasmussen, E. (1999). Intellectual access to images [Electronic version]. Library Trends, 48(2), 291-302.
Choi, Youngok and Rasmussen, Edie M. (2002). Users' relevance criteria in image retrieval in American history [Electronic version]. Information Processing and Management, 38, 695-726.
Choi, Youngok and Rasmussen, Edie M. (2003). Searching for images: the analysis of users' queries for image retrieval in American history [Electronic version]. Journal of the American Society for Information Science and Technology, 54(6), 498-511.
Collins, Karen (1998). Providing subject access to images: a study of user queries. The American Archivist, 61, 36-55.
Dervin, B. (1998). Sense-making theory and practice: an overview of user interests in knowledge seeking and use. Journal of Knowledge Management, 2 (2), 36-46.
Druin, A., Bederson, B. B., Hourcade, J. P., Sherman, L., Revelle, G., Platner, M., & Weng, S. (2001). Designing a digital library for young children: an intergenerational Partnership. Proceedings of Joint Conference on Digital Libraries, 398-405.
Enser, P. and McGregor, C. (1993). Analysis of visual information retrieval queries. Report on Project G16412 to the British Library Research and Development Department. London: British Library.
Enser, P. (2000). Visual image retrieval: Seeking the alliance of concept-based and content-based paradigms [Electronic version]. Journal of Information Science, 26(4), 199-210.
Fidel, R. (1994). User-centered indexing [Electronic version]. Journal of the American Society for Information Science, 45(8), 572-576.
Gevers, T. and Smeulders, A. (1998). Image retrieval by multi-scale illumination invariant indexing [Electronic version]. Lecture Notes in Computer Science, 1464, 96-108.
Gevers, T. and Smeulders, A. (2000). Pictoseek: Combining color and shape invariant features for image retrieval [Electronic version]. IEEE Transactions on Image Processing, 9(1), 102-119.
Goodrum, Abby and Spink, Amanda (2001). Image searching on the Excite Web search engine [Electronic version]. Information Processing and Management, 37(2), 295-311.
Gordon, A. (2001). Browsing image collections with representations of common-sense activities. Journal of the American Society for Information Science and Technology, 52(11), 925-929.
Graham, M. (2001). The cataloguing and indexing of images: time for a new paradigm? Art Libraries Journal, 26(1), 22-37.
Greisdorf, Howard and O’Connor, Brian (2002). Modelling what users see when they look at images: a cognitive viewpoint [Electronic version]. Journal of Documentation, 58(1), 6-29.
Gudivada, V. and Raghavan, V. (1995). Content-based image retrieval systems. Computer, 28(9), 18-22.
Gupta, A., and Jain, R. (1997). Visual information retrieval. Communications of the ACM, 40(5), 70-79.
Hastings, Samantha K. (1995). Query categories in a study of intellectual access to digitized art images. Proceedings of the 58th Annual Meeting of the American Society for Information Science, 32, 3-8.
Hastings, Samantha K. (1999). Evaluation of image retrieval systems: role of user feedback [Electronic version]. Library Trends, 48(2), 438-452.
Jaimes, A., Benitez, A., Jörgensen, C., and Chang, S. (2001). A conceptual framework and research for classifying visual descriptors. Journal of the American Society for Information Science, 52(11), 938-947.
Jörgensen, C. (1996). Testing an image description template. Proceedings of the 59th Annual Meeting of the American Society for Information Science, 209-213.
Jörgensen, Corinne (1998). Attributes of images in describing tasks [Electronic version]. Information Processing and Management, 34(2/3), 161-174.
Markey, K. (1984). Interindexer consistency tests: a literature review and report of a test of consistency in indexing visual materials. Library and Information Science Research, 6, 155-157.
Markkula, M. and Sormunen, E. (2001). A test collection for the evaluation of content-based image retrieval algorithms: a user and task-based approach [Electronic version]. Information Retrieval, 4(3/4), 275-294.
Markkula, Marjo and Sormunen, Eero (2004). End-user searching challenges indexing practices in the digital newspaper photo archive [Electronic version]. Information Retrieval, 1, 259-285.
O'Connor, Brian, O’Connor, Mary K. and Abbas, June M. (1999). User reactions as access mechanism: an exploration based on captions for images [Electronic version]. Journal of the American Society for Information Science, 50(8), 681-697.
Ørnager, Susanne (1995). The newspaper image database: empirical supported analysis of users’ typology and word association clusters [Electronic version]. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 212-218.
Ørnager, S. (1997). Image retrieval: theoretical analysis and empirical user studies on accessing information in images. Proceedings of the 60th ASIS Annual Meeting. ASIS '97, Washington, D.C., Information Today, Inc., 202-210.
Panofsky, E. (1962). Chapter I: Introductory. In Studies in iconology: humanistic themes in the art of the Renaissance.
Roberts, H. (2001). A picture is worth a thousand words: art indexing in electronic databases [Electronic version]. Journal of the American Society for Information Science and Technology, 52(11), 911-916.
Rodden, K. (2002). Evaluating similarity-based visualisations as interfaces for image browsing. Technical Report no. 543, University of Cambridge Computer Laboratory.
Rodden, K., Basalaj, W., Sinclair, D., and Wood, K. (2001). Does organisation by similarity assist image browsing? Proceedings of the SIGCHI conference on Human factors in computing systems, 3(1), 190-197.
Shatford, S. (1984). Describing a picture: a thousand words are seldom cost-effective. Cataloguing and Classification Quarterly, 4(4), 13-30.
Shatford, S. (1986). Analyzing the subject of a picture: a theoretical approach. Cataloguing and Classification Quarterly, 6(3), 39-62.
Shatford Layne, S. (1994). Some issues in the indexing of images [Electronic version]. Journal of the American Society for Information Science, 45(8), 583-588.
Shneiderman, B. (1987). Designing the user interface: strategies for effective human-computer interaction. Reading, MA: Addison-Wesley.
Smeulders, A., Worring, M., Santini, S., Gupta, A., and Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349-1380.
Stam, D. (1984). How art historians look for information. Art Documentation, 3(4), 117-119.
Stam, D. (1985). Remembrance of things past: mental processes of art reference librarians. Art Documentation, 4(4), 139-141.
Stam, D. (1989). Tracking art historians: on their information needs and information seeking behavior. Art Libraries Journal, 14(3), 13-16.
Svenonius, E. (1994). Access to nonbook materials: the limits of subject indexing for visual and aural languages [Electronic version]. Journal of the American Society for Information Science, 45(8), 600-606.
Yang, Christopher C. (2004). Content-based image retrieval: a comparison between query by example and image browsing map approaches. Journal of Information Science, 30(3), 254-267.
Zachary, J. and Iyengar, S. (2001). Information theoretic similarity measures for content based image retrieval. Journal of the American Society for Information Science and Technology, 52(10), 856-867.
Zachary, J., Iyengar, S., and Barhen, J. (2001). Content based image retrieval and information theory: a general approach. Journal of the American Society for Information Science and Technology, 52(10), 840-852.