Lutz, Valerie-Anne
INFO 511
Dr. Martha Smith
Winter 2005

Image Users and Image Retrieval: Review of the Literature

ABSTRACT

This review of the literature provides an overview of the rapid developments in the field of image indexing and image retrieval, describes several seminal studies of image users in the 1990s with some consideration of earlier analyses of issues relating to the indexing of images, and discusses recently proposed solutions to problems of image access and retrieval, including modifications to traditional controlled vocabulary indexing and new methods such as browsing of digital images and automatic retrieval based on image features.

Keywords: Image retrieval, image indexing, image users, digital images, visual materials, pictures

INTRODUCTION

The dramatic expansion of the Internet over the past ten years has been accompanied by an equally rapid growth in its use for image searches, with several major search engines now offering image search capabilities that let users restrict a search for a particular word or concept to images only. During this time, studies of image users, image indexing, and image retrieval have increased as well. The field has expanded from a handful of studies relating to the indexing of visual materials and digital imaging technology to include detailed examinations of image users and their needs, explorations of automatic content-based image retrieval by color, shape, texture, and other attributes, image searching on the Internet, and browsing of thumbnail images as a means of enhancing users’ ability to find and retrieve images.

Image users, image indexing, and image retrieval received relatively little attention until the publication of several seminal works: studies of the qualities that make images more complex to catalogue than textual materials, by Markey (1983) and Shatford (1984 and 1986), and studies of art historians and their needs, by Stam (1984 and 1989), Brilliant (1988), and the Getty study (Bakewell, Beeman, Reese, and Schmidt, 1988). As digital imaging technology evolved in the late 1980s, pioneers such as Besser (1990) and Shneiderman (1987) studied the possibilities inherent in digital imaging and information visualization, respectively, and provided much of the foundation for later studies of access to images, user interfaces, and human-computer interaction. Research into image user needs and image retrieval, however, was still relatively scarce as recently as the late 1980s (Shatford, 1986; Stam, 1989).

Prior to the developments in digital imaging technology and the Internet, most image users retrieved images manually, using indexes in books or on cards. Although these methods are still used in many libraries and archives, the growth of digital imaging technology has expanded our definition of image retrieval to include digital image retrieval from the Internet and from library computer servers as well as from manual indexes. Images have generally been considered more difficult to index, find, and retrieve than textual materials. Although some of these difficulties extend to the digital sphere, technology has also provided means of resolving some of them.

This review focuses on the rapid developments in the field following several seminal studies of image users, including Enser and McGregor’s study of requests from the Hulton-Deutsch collection at the British Library (1993), Hastings’ study of query categories for intellectual access to images by art historians (1995), Armitage and Enser’s analysis of queries for visual materials (1997), Jörgensen’s studies of image description templates (1996) and access to visual materials (1998), and O’Connor, O’Connor, and Abbas’s study of user needs (1999). It also gives some consideration to earlier influences such as Markey’s (1983) and Shatford’s (1984, 1986, 1994) analyses of issues relating to the indexing of images. Markey’s studies of the visual arts and computers and of interindexer consistency, and Shatford’s discussions of image analysis, predate current practices in digital imaging and relate more to cataloguing than to user needs, but they provide relevant background on the complexity of visual materials, which affects users’ ability to find and retrieve them. Highly influential on Raya Fidel’s concept of user-centered indexing (1994) and on the many user studies described here, these works remain integral to any study of image users and image retrieval.

In recent years, scholars have debated which methods provide the best solutions to problems of image access and retrieval. Traditional concept-based image indexing uses controlled vocabularies such as the Art and Architecture Thesaurus (AAT) and the Library of Congress Thesaurus for Graphic Materials (LCTGM). Concept-based image retrieval remains the most common means of finding images: users pose queries to staff members or to cataloguing systems using verbal descriptions or keywords, and receive or retrieve images. This provides sufficient access for most image users, but such methods are often expensive in money as well as time, given the laborious process of indexing images and the inconsistency with which indexers may assign terms (Markey, 1983; Shatford, 1984).

Content-based image retrieval (CBIR), in which images are retrieved automatically on the basis of features such as color and shape, has been proposed as a solution (Gupta and Jain, 1997; Gevers and Smeulders, 1998). Although promising results have been demonstrated, CBIR has not yet proven applicable to retrieval based on attributes other than color, shape, and texture. As most discussions of CBIR are highly technical and do not employ user studies, they are discussed here only in terms of general theory. Studies of specific subgroups of image users, such as journalists (Ørnager, 1995 and 1997; Markkula and Sormunen, 2004), art historians (Chen, 2001 and 2002), and historians (Choi and Rasmussen, 2002 and 2003), are included, along with general studies of image users (Collins, 1998; Greisdorf and O’Connor, 2002) and of image searching on the Internet (Goodrum and Spink, 1999, 2002, and 2004).

IMAGE INDEXING AND RETRIEVAL

Though Markey’s (1983) study stemmed from a long tradition of studies of interindexer consistency, reviewed in her work, it provided the first measurement of consistency among indexers of visual materials rather than textual materials. Thirty-nine subjects were assigned 100 works of art, with three different indexers describing each work. Markey found that, for indexers examining the same image, one of every eight terms elicited by all three indexers matched with respect to concept consistency, and one of every 14 matched with respect to terminology consistency. Conducted prior to the development of the Art and Architecture Thesaurus (AAT) and the Library of Congress Thesaurus for Graphic Materials (LCTGM), the study suggested that a controlled vocabulary for images might improve interindexer consistency, but emphasized that qualities inherent to images and the varied needs of image users continued to pose challenges for image indexing.
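
To make the distinction between the two kinds of consistency concrete, the Python sketch below counts a term as consistent only when all three indexers assigned it, first comparing terms as written (terminology) and then after collapsing synonyms into shared concepts (concept). It is a minimal illustration under stated assumptions, not Markey's procedure: the overlap formula, the sample terms, and the synonym table are all hypothetical.

```python
from typing import Dict, List, Set


def consistency(assignments: List[Set[str]]) -> float:
    """Share of distinct terms (across all indexers) that every indexer assigned."""
    if not assignments:
        return 0.0
    union = set().union(*assignments)
    shared = set.intersection(*assignments)
    return len(shared) / len(union) if union else 0.0


def to_concepts(terms: Set[str], concept_of: Dict[str, str]) -> Set[str]:
    """Map each term to a concept label; terms missing from the table stand for themselves."""
    return {concept_of.get(term, term) for term in terms}


# Hypothetical terms that three indexers assigned to the same painting.
indexers = [
    {"ship", "sea", "storm", "sailors"},
    {"boat", "ocean", "storm", "clouds"},
    {"vessel", "sea", "tempest", "waves"},
]
# Hypothetical synonym table collapsing different words for the same concept.
concept_of = {"boat": "ship", "vessel": "ship", "ocean": "sea", "tempest": "storm"}

terminology = consistency(indexers)
concept = consistency([to_concepts(terms, concept_of) for terms in indexers])
print(f"terminology: {terminology:.2f}, concept: {concept:.2f}")
# terminology: 0.00, concept: 0.50
```

In this toy example, collapsing synonyms such as "boat" and "vessel" raises the score from 0.00 to 0.50, which mirrors the direction of Markey's finding that concept consistency exceeded terminology consistency.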

Further foundation for later studies of image users was provided by Shatford’s (1984, 1986) work on describing and analyzing images. Shatford described how analysis of the subject of pictures differs from analysis of textual materials. While textual materials provide clearer information for developing subject terms, in the form of abstracts and the text itself, images provide more ambiguous information and require a translation from visual to verbal language. Shatford drew on Panofsky’s (1955, 1962) three levels of analysis for images: pre-iconographic, a basic identification that requires only the knowledge acquired from everyday experience; iconographic, which requires some knowledge of a given culture; and iconological, which requires deeper analysis of underlying principles, is highly subjective, and is difficult to apply consistently (Shatford, 1984, 1986).

CATEGORIZING USER QUERIES

Initial user studies focused on categorizing user queries. The first major study of image users was Enser and McGregor’s (1993) study of almost 3,000 requests for images from the Hulton-Deutsch collection, a general commercial image collection that includes several distinct subcollections. Enser and McGregor determined that requests fell into two categories: (1) unique, requests for specific persons, places, things, and events; and (2) non-unique, requests for images represented by concepts. They found that most requests (69 percent) could be categorized as unique. Requests in both groups were also frequently accompanied by specifications of time, location, event, or format; Enser and McGregor categorized these as unique/refined and non-unique/refined.

Although frequently cited, this study had relatively little widespread influence until Armitage and Enser (1997) published an expanded version that included seven libraries. That expanded study, along with the concurrent work of Jörgensen (1996, 1998), Hastings (1995, 1999), and O’Connor, O’Connor, and Abbas (1999) and the rising popularity of digital imaging and the Internet, appears to have been the catalyst for a subsequent explosion of interest in image users, image indexing, and image retrieval between 1997 and 2001.

Over the next few years, several additional works referred to Enser and McGregor’s work, expanded on the work of Markey and Shatford (Shatford Layne, 1994; Svenonius, 1994), and called for user-centered indexing (Fidel, 1994), but relatively few user studies were undertaken until Hastings’ (1995) examination of query categories and intellectual access to images used by art historians, one of the first, and certainly one of the most influential, studies to incorporate digital images. This study used a closed system of images available only at the location of the study, in contrast to a later study that used images available through the Internet (Hastings, 1999).

During the first part of the study, participants viewed color photographs made from the digital images; image retrieval software was used for the final investigation. Four distinct levels of queries were found, with level one representing the least complex and level four the most complex. Queries for a specific fact, such as artist, date, medium, number, place, and title, were classified as level one and could be answered from a single inquiry to textual information about a painting or from viewing a surrogate image, with no differences found between photographs and digital images.

Table 1: Hastings’ Major Components of Intellectual Access to Digital Art Images in a Closed System

Level of Complexity | Queries | Access Points | Computer Manipulations
Level 1: Least complex | Identification queries (who, where, when) | Text fields and image in general | Search, sort, and display
Level 2: Complex | Queries of the type “what are?”; requires sorting of the text information in the answer set | Sorted text information and images | Search, select, sort, display, and enlarge
Level 3: More complex | Queries of style, subject, how, and identification of objects or activities | Style, keywords, and complex images | Compare, enlarge, mark, resolution, and style
Level 4: Most complex | Queries for meaning, subject, and why | Style and subject | Style and subject searches plus access to full-text secondary subject resources

Differences between use of the photographs and use of the digital images began with level two queries, which included queries about whether an artist was from a particular school or whether two paintings were by the same artist, and which required some sorting of data cards and photographs or sorting within the database of digital images. Participants completed their searches more quickly when using the digital images (an average of ten seconds versus ten minutes), but then tended to continue searching, using the sorting options of the database to develop additional queries and find more information. For level three queries (those that required comparison of two or more images or magnification of the surrogate image), photographs were not used, as these queries relied on functions available only on the computer. Subject and style queries tended to increase in complexity with use of the computer because of the additional search options available, with participants comparing up to four images at once, enlarging particular portions of images, and sorting identified objects.

At the fourth and most complex level of queries, participants investigated categories to be used for subject indexing of the collection. The study found that participants “either sought to apply existing categories used in the study of art history or their work, or they investigated and explored the images to determine possible categories for classification.” As with level three queries, there were many additional queries that were not asked of the photographs, with participants enlarging portions of images and identifying things that they had not noticed in the photographs. The study also found many queries that could not be answered by either the photographs or the digital images and would require consultation of additional historical, biographical, or theoretical sources (Hastings, 1995).

A sense of Hastings’ study may be gained by viewing these images on the Internet at http://www.unt.edu/bryantart/, where one has the option to participate in a study of access to digitized images. After expanding the study to include participants’ descriptions of digital images on the web as well as in a closed system, Hastings found that “browsing, manipulation of the images, and need for user interaction are important aspects of the search for images on the Web” (Hastings, 1999). About 80 percent of web image queries asked for identification of the artist, activities, or place, while the remaining 20 percent asked about the subject. The study results and survey responses indicated a need for users to add their own descriptors and index terms during the search process, a need for improved relevance feedback mechanisms, and the importance of being able to browse images when searching the web and to apply one’s own categories for searching and browsing.


For evaluation of image retrieval systems, Hastings proposed a model of the retrieval or search tools required for particular types of query or retrieval tasks and the evaluation methods best suited to each (Table 2).

Table 2. Hastings’ Framework for Evaluation of Image Retrieval Systems

Query or Retrieval Task | Retrieval or Search Tools | Evaluation Method
Identification of known item or image | Index text and field; browse images | User and relevance feedback (relevant? yes or no); measures of time and effort
Identification of unknown item(s) | Select and display sets of images; sort sets; enlarge | User-supplied terms and categories for browsing; survey form; online user feedback mechanisms; measures of time and effort
Investigations of style and image content | Content-based retrieval tools such as color, texture, shape, and so on | Log analysis; screen captures; survey form
Queries asking “why” and investigations of “aboutness” | Random browsing and extensive answer set displays; may require secondary resources (e.g., biographical and historical information) | Amount of user effort; observation of browsing behavior and answer set development; capture of retrieved sets and comparison to query task

Armitage and Enser’s analysis of user need in image archives (1997) expanded Enser and McGregor’s study. Using Panofsky’s (1955) modes of image analysis (pre-iconographic, iconographic, and iconological), as described by Markey (1983) and refined by Shatford (1986), Armitage and Enser categorized the queries by image content, identification, and accessibility. Focusing on the image content requests, they developed four main categories (who, what, when, and where) and three levels of abstraction for each category (specific, general, and abstract). Despite differences in the missions, collections, and users of the libraries, Armitage and Enser found similarities in image query formulation among all the libraries and concluded that they could formulate a general characterization of queries based on this framework for analyzing queries about image content. Because the schema appeared applicable to the characterization of images as well as to the queries for them, they suggested that it could be embedded in the user interface, offering a more direct and effective means of retrieving images.
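
Because the who/what/when/where facets and the specific/general/abstract levels form a fixed grid, the same structure can describe both an image and a query, which is the property Armitage and Enser point to. The Python sketch below assumes one simple way of using it: terms are filed under (facet, level) cells, and an image matches a query when every query term appears in the same cell of the image's record. The class names, the matching rule, and the sample records are hypothetical illustrations, not a system described in the study.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set, Tuple


class Facet(Enum):
    WHO = "who"
    WHAT = "what"
    WHEN = "when"
    WHERE = "where"


class Level(Enum):
    SPECIFIC = "specific"
    GENERAL = "general"
    ABSTRACT = "abstract"


Cell = Tuple[Facet, Level]


@dataclass
class Description:
    """Terms filed under (facet, level) cells; used for images and queries alike."""
    cells: Dict[Cell, Set[str]] = field(default_factory=dict)

    def add(self, facet: Facet, level: Level, term: str) -> "Description":
        self.cells.setdefault((facet, level), set()).add(term.lower())
        return self  # allow chaining


def matches(query: Description, image: Description) -> bool:
    """Hypothetical rule: every query term must appear in the same cell of the image."""
    return all(terms <= image.cells.get(cell, set())
               for cell, terms in query.cells.items())


# Hypothetical catalogue records.
images = {
    "img-001": Description()
        .add(Facet.WHO, Level.SPECIFIC, "Winston Churchill")
        .add(Facet.WHERE, Level.SPECIFIC, "London")
        .add(Facet.WHAT, Level.ABSTRACT, "leadership"),
    "img-002": Description()
        .add(Facet.WHO, Level.GENERAL, "crowd")
        .add(Facet.WHERE, Level.SPECIFIC, "London"),
}

# Query: a named person plus an abstract concept.
query = (Description()
         .add(Facet.WHO, Level.SPECIFIC, "winston churchill")
         .add(Facet.WHAT, Level.ABSTRACT, "leadership"))

hits = [image_id for image_id, record in images.items() if matches(query, record)]
print(hits)  # ['img-001']
```

Keying on the cell rather than on the bare word is the design choice here: a specific "who" term cannot be satisfied by a general "who" term such as "crowd," which is one way the grid could sharpen retrieval if it were built into an interface.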

CLASSIFICATION OF QUERIES AND IMAGE ATTRIBUTES

Another influential study investigated the image attributes noted by participants in a series of image-describing tasks that involved viewing images, describing them for an image retrieval system, and describing them from memory (Jörgensen, 1996; Jörgensen, 1998). Using content analysis and descriptive statistics, and incorporating concepts from cognitive science, Jörgensen analyzed participants’ statements and classified them by 47 image attributes, which were grouped into 12 higher-level classes of attributes.

Table 3. Jörgensen’s 12 attribute classes, with a sample attribute for each, and distribution of attribute classes by percentage for three describing tasks

Sample attribute | Class | Viewing task | Search task | Memory task | Average | Median
Text | Objects | 34.3% | 27.4% | 26.2% | 29.3% | 32.8%
People | People | 8.7% | 10.3% | 11.1% | 10.0% | 8.5%
Color value | Color | 9.2% | 9.7% | 9.0% | 9.3% | 6.8%
Activity | Content/story | 7.4% | 10.8% | 9.4% | 9.2% | 8.6%
Location (general) | Location | 8.3% | 10.7% | 7.7% | 8.9% | 5.7%
Number | Description | 6.0% | 9.0% | 8.8% | 8.0% | 4.3%
Texture | Visual elements | 7.2% | 5.4% | 9.2% | 7.2% | 5.5%
Artist | Art historical info | 3.8% | 5.7% | 7.6% | 5.7% | 1.3%
Emotion | People attributes | 5.2% | 3.9% | 2.6% | 3.9% | 3.5%
Comparison | External relation | 3.3% | 3.8% | 4.0% | 3.7% | 3.1%
Uncertainty | Viewer response | 3.7% | 1.9% | 3.1% | 2.9% | 0.8%
Theme | Abstract | 3.0% | 1.5% | 1.3% | 2.0% | 1.4%

Results indicated that certain classes of attributes, including objects, people or the human form, color, and location, appeared more frequently and that, because image requests vary greatly, indexers must provide access to a wide range of attributes to ensure representation of all facets of interest to image users (Jörgensen, 1998). One unexpected finding was that many participants assigned terms that described the “story” within the image, particularly when asked to describe images specifically for retrieval, providing details that would not normally be included in indexing systems. Jörgensen points out that the differences between these results and the attributes addressed in most image indexing systems suggest the need to revise the traditional assumptions on which image indexing and retrieval systems are based.

Building on the work of Enser and McGregor, Jörgensen, and Armitage and Enser, O’Connor, O’Connor, and Abbas (1999) extended the categories to a broader population of subjects who were not professional image users and, in a study based on image captions, examined user reactions as access mechanisms. In the first, flawed portion of the study, 20 users (largely MLS students) were asked to describe 15 images and their reactions to them, but instead of providing rich descriptions, the subjects, influenced by their educational backgrounds, constructed Library of Congress subject headings. In the revised version, 24 subjects were shown 15 images and instructed to write captions for them, list words or phrases that they would use to describe them, and list words or phrases that described how the images made them feel. To allow for a wider range of images and participants, the third and final version of the study presented 120 respondents (again, all MLS students, but this time from four different sites in the western United States) with 300 images on a website, representing a wide variety of subjects and production values. Respondents were asked to choose and describe any 100 images. Results indicated that the wide variety of responses and descriptions, particularly those relating to affective or emotional states, may aid indexers by providing additional indexing terms, leading to better retrieval for users seeking images that evoke particular moods and for users seeking images that represent concepts that are difficult to index.

QUERIES, RELEVANCE, AND COGNITIVE PROCESSES

Using as a foundation the work of Enser, Jörgensen, and others, several studies examined user queries as a means of determining which subject terms and attributes were used most frequently by users, the specificity of search terms used, and the relevance that users assigned to the results retrieved (Collins, 1998; Efthimiadis and Fidel, 2000; Chen, 2001; Greisdorf and O’Connor, 2002; Choi and Rasmussen, 2002; Choi and Rasmussen, 2003).

Collins (1998) examined 100 user queries at the North Carolina Collection, University of North Carolina at Chapel Hill, and 87 queries at the North Carolina State Archives in Raleigh. Collins created basic categories of terms and tallied the number of user requests that employed each. Results indicated that subject terms were used more frequently than any other category of terms (86 percent of queries); of these, generic subject terms were used in 57 percent and specific subject terms (particular names, locations, etc.) in 42 percent. Subject terms were followed in frequency by terms relating to time and place, with relatively few requests for items by genre, visual terms, format, or creator/provenance.

Chen (2001) investigated user queries and image retrieval methods in the field of art history in two similar studies. In the user query study, Chen collected queries from 29 art history undergraduates in pre- and post-search questionnaires and mapped them to the features previously identified by Enser and McGregor (1993), Jörgensen (1996), and Fidel (1997). Three reviewers found high degrees of matching between the queries and Enser and McGregor’s Unique and Nonunique categories and Jörgensen’s classes of Location, Literal Object, Art Historical Information, People, and People-Related Attributes, though some categories needed further refinement. From the results, Chen proposed adding more detail to Enser and McGregor’s four categories and regrouping Jörgensen’s 12 classes of image attributes (Chen, 2001).

In the image retrieval study, Chen investigated image retrieval methods employed by 26 art history undergraduates, using Jörgensen’s three image retrieval tasks and Enser’s four models of image retrieval (1995). Students received pre- and post-search questionnaires and participated in post-search interviews. Chen found a “significant difference between the mean number of search keywords or phrases participants planned to use and the mean number of search keywords or phrases they actually used,” along with a significant relationship between search success and the percentage of search keywords or phrases drawn from the topic title or description that students had chosen (Chen, 2001).

Integrating elements of cognitive psychology, library science, art, and computer technology, Jaimes, Benitez, Jörgensen, and Chang (2001) examined the cognitive processes of the user, incorporated aspects of content-based image retrieval as a way of integrating some aspects of the concept-based and content-based approaches, and developed a new conceptual framework based on Jörgensen’s earlier work. The framework classifies visual attributes into a “Pyramid” of four syntactic levels (type/technique, global distribution, local structure, and global composition) and six semantic levels (generic, specific, and abstract levels of both object and scene). Two groups of participants, naïve users with no prior training in indexing visual information and indexers trained in indexing visual information, produced descriptions of a random set of about 700 images automatically retrieved from the Internet and of 12 color photographs of current news events in different parts of the world, which were accompanied by brief textual descriptions from an Internet newsgroup.

TABLE 4. Jaimes, Benitez, Jörgensen, and Chang: mapping of image attributes to the Pyramid levels as generated by different methods: experiment I (spontaneous and retrieval-oriented); II (author and caption); and III (indexing).

Methods: Experiment I (Spontaneous, Retrieval); Experiment II (Author, Caption); Experiment III (Indexing)

Pyramid level | Methods
I Type/Technique | X X X
II Global Distribution | X X X
III Local Structure | X X X X
IV Global Composition | X X X
V Generic Objects | X X X X X
VI Specific Objects | X X X X X
VII Abstract Objects | X X X X X
VIII Generic Scene | X X X X X
IX Specific Scene | X X X X X
X Abstract Scene | X X X X X

Results showed that the precision of even naïve users’ descriptions improved when they were instructed to describe images in a way that would aid retrieval, suggesting the need for greater structure in image indexing. The study supported the ten-level conceptual structure represented by the Pyramid, with every attribute classified at some level. Although they found some variation in the distribution of attributes among the levels of the Pyramid depending on whether the participants were naïve users or indexers, and depending on the task (describing, indexing, or retrieval), “the researchers found no instances where an attribute could not be accommodated by a level of the Pyramid.” In addition, they found that the Pyramid assisted indexers “by making explicit specific, generic, and abstract levels of description. Especially useful is the recursive nature of the Pyramid, which permits associations among objects and attributes (e.g., can be applied to a scene, object, section of an image, etc.)” (Jaimes, Benitez, Jörgensen, and Chang, 2001).

TABLE 5. Sample image indexing record terms from Jaimes, Benitez, Jörgensen, and Chang’s indexing template mapped to the Pyramid levels

Image term | Pyramid level
Painting | Type/technique
Oil | Type/technique
Cracked | Global distribution
Red, white | Local structure
Background | Local structure
Rectangle | Local structure
Center | Local structure
Eye level | Global composition
Flag | Generic object
Historical landscape | Generic scene
Patriotism | Abstract object
Pride | Abstract scene
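
Because each term in a record like the Table 5 sample carries a Pyramid level, an index can key terms by level as well as by word, so that "patriotism" filed as an abstract object is not confused with the same word at another level. The Python sketch below encodes the ten levels in the order used in Table 4 (levels I to IV syntactic, V to X semantic) and indexes the Table 5 record. The image identifier, the (level, term) keying, and the helper names are assumptions made for illustration; the study does not specify an implementation.

```python
from collections import defaultdict
from enum import IntEnum
from typing import Dict, List, Set, Tuple


class PyramidLevel(IntEnum):
    """The ten Pyramid levels, numbered as in Table 4 (I-IV syntactic, V-X semantic)."""
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    GENERIC_OBJECTS = 5
    SPECIFIC_OBJECTS = 6
    ABSTRACT_OBJECTS = 7
    GENERIC_SCENE = 8
    SPECIFIC_SCENE = 9
    ABSTRACT_SCENE = 10

    @property
    def is_syntactic(self) -> bool:
        return self <= PyramidLevel.GLOBAL_COMPOSITION


Term = Tuple[str, PyramidLevel]


def build_index(records: Dict[str, List[Term]]) -> Dict[Tuple[PyramidLevel, str], Set[str]]:
    """Inverted index keyed by (level, term), so a word matches only at its own level."""
    index: Dict[Tuple[PyramidLevel, str], Set[str]] = defaultdict(set)
    for image_id, terms in records.items():
        for term, level in terms:
            index[(level, term.lower())].add(image_id)
    return dict(index)


P = PyramidLevel
# The Table 5 sample record, stored under a hypothetical image id.
records = {
    "flag-painting": [
        ("Painting", P.TYPE_TECHNIQUE), ("Oil", P.TYPE_TECHNIQUE),
        ("Cracked", P.GLOBAL_DISTRIBUTION), ("Red, white", P.LOCAL_STRUCTURE),
        ("Background", P.LOCAL_STRUCTURE), ("Rectangle", P.LOCAL_STRUCTURE),
        ("Center", P.LOCAL_STRUCTURE), ("Eye level", P.GLOBAL_COMPOSITION),
        ("Flag", P.GENERIC_OBJECTS), ("Historical landscape", P.GENERIC_SCENE),
        ("Patriotism", P.ABSTRACT_OBJECTS), ("Pride", P.ABSTRACT_SCENE),
    ],
}

index = build_index(records)
terms = records["flag-painting"]
syntactic = [t for t, level in terms if level.is_syntactic]
semantic = [t for t, level in terms if not level.is_syntactic]
print(len(syntactic), len(semantic))                          # 8 4
print(index.get((P.ABSTRACT_OBJECTS, "patriotism"), set()))  # {'flag-painting'}
print(index.get((P.ABSTRACT_SCENE, "patriotism"), set()))    # set()
```

The split into eight syntactic and four semantic terms shows how much of this sample record describes form rather than meaning, which is the kind of balance the Pyramid is meant to make explicit.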

A new thesaurus for indexing images across diverse domains has been proposed by Jörgensen (2004) and is intended to reduce some of the remaining difficulties and inconsistencies inherent in indexing and retrieving images.

Examining the concept of relevance, integrating elements of cognitive science, and seeking to synthesize elements of the concept-based and content-based approaches, Greisdorf and O’Connor (2002) studied how users assign predetermined query terms to retrieved images and how they cognitively assess meaningful terms after retrieval. They found that users’ perceptions of the relevance of retrieved images “may arise from descriptions of objects and content-based elements that are not evident or not even present in the image” and that affective or emotion-based query terms are a significant category for image retrieval. They proposed that the efficiency of image retrieval, in systems using either traditional indexing methods or the more technological content extraction algorithms, could be enhanced by a system for capturing viewers’ interpretations derived from their cognitive engagement with the images they view.

To explore how users perceive relevance when searching for images in a photographic archive on the Internet, Choi and Rasmussen used the Library of Congress’ American Memory photographic archive as the source of images for a study of 38 faculty and graduate students of American history, investigating the criteria that image users apply when judging the relevance of images for their research. Building on work by Barry and Schamber (1995) and others that examined relevance criteria for textual and non-textual documents, Choi and Rasmussen explored the extent to which these criteria apply to visual documents and the extent to which new and different criteria apply. Subjects completed a pre-test questionnaire, searched for images in American Memory, completed a post-test questionnaire, and participated in interviews. To determine the distribution of the subject content of the queries, the researchers analyzed 38 natural language statements, 185 search terms provided by participants, and 219 descriptors that participants noted in relevant records. Over half of the queries could be identified as “general/nameable needs,” and most described images in terms of person, thing, event, or condition, depending on the location or time of interest (Choi and Rasmussen, 2003).

Applying quantitative statistical methods to analyze the importance of relevance criteria and determine how much each criterion affected users’ judgments, they found that the user’s perception of how the images related to his or her topic remained the most important factor throughout the information-seeking stages, but that users also judged retrieved items by other criteria. The most important of these were image quality and clarity; others included the title, date, subject descriptors, and notes provided (Choi and Rasmussen, 2002).

INTERNET IMAGE SEARCHING

The rapid increase in the use of the Internet for image searches has led to studies that explore web searches for images. The first major study of image searching on an Internet search engine examined 1,025,908 sequential queries from 211,058 users on the Excite web search engine, which included a subset of 22,149 image queries by 9,855 users for both still and moving images (Goodrum and Spink, 1999; Goodrum and Spink, 2001). They found, on average, 2.36 image queries per user, each with an average of 3.74 search terms. Results showed a large number of unique terms in the queries, with the most frequently occurring terms appearing less than 10 percent of the time and most terms occurring only once. As might be expected, many of the most frequently occurring terms sought images of a sexual nature or images of celebrities, though to an even greater extent than one might anticipate: almost all of the most frequently occurring terms fell into these categories.
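
The measures reported for the Excite log (image queries per user, search terms per query, how often the most frequent term occurs, and how many terms occur only once) are simple tallies over a query log. The Python sketch below computes them for a tiny invented log, assuming naive whitespace tokenization and treating each log entry as one query; it illustrates the arithmetic only and does not reproduce Goodrum and Spink's data or processing.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# Hypothetical toy log of (user id, query) pairs; not data from the Excite study.
LOG: List[Tuple[str, str]] = [
    ("u1", "eiffel tower picture"),
    ("u1", "eiffel tower at night"),
    ("u2", "red sports car images"),
    ("u3", "tower bridge london"),
    ("u3", "london bus photo"),
    ("u3", "big ben"),
]


def log_stats(log: List[Tuple[str, str]]) -> Dict[str, Union[float, str]]:
    """Per-user and per-term tallies of the kind reported for web image query logs."""
    users = {user for user, _ in log}
    terms = [t for _, query in log for t in query.lower().split()]  # naive whitespace tokens
    freq = Counter(terms)
    top_term, top_count = freq.most_common(1)[0]
    return {
        "queries_per_user": len(log) / len(users),
        "terms_per_query": len(terms) / len(log),
        "top_term": top_term,
        "top_term_share": top_count / len(terms),
        "distinct_terms_used_once": sum(1 for c in freq.values() if c == 1) / len(freq),
    }


stats = log_stats(LOG)
for key, value in stats.items():
    print(key, round(value, 3) if isinstance(value, float) else value)
```

Even in six queries, 12 of the 15 distinct terms occur only once, a small-scale echo of the long tail of unique terms the study describes.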

Contrasting these findings with the work of Enser (1995), who examined written queries for non-digital images, Goodrum and Spink discussed implications for the development of models for visual information retrieval and for the design of web search engines that would allow better automatic retrieval of images. At the time of the study, fewer image search engines existed than exist today. Goodrum and Spink noted that since they began their project, Excite had added tools for still and moving image searches, but pointed out that the problems of expressing needs for visual information through textual queries persisted and that content-based image retrieval still provided retrieval only by color and shape, with little ability to retrieve images by higher-level attributes.

CONTENT-BASED IMAGE RETRIEVAL

Recent studies reflect the growing popularity of digital images and of research on content-based image retrieval and digital image browsers. Content-based image retrieval emphasizes the automatic retrieval of images based on characteristics of the image such as color, shape, and texture. As manual indexing and retrieval of images can be time-consuming, content-based image retrieval has been hailed as a quicker and more efficient means of image retrieval (Gudivada and Raghavan, 1995; Gupta and Jain, 1997; Gevers and Smeulders, 1998; Gevers and Smeulders, 2000; Smeulders, Worring, Santini, Gupta and Jain, 2000; Zachary, Iyengar, and Barhen, 2001; Zachary and Iyengar, 2001).

Most studies of content-based image retrieval are systems-oriented. They focus on the highly technical aspects of retrieval from pixel-level details of digital images and include pages of algorithms and discussions of color histograms. Although CBIR holds much promise for image retrieval, studies examining its usefulness for real users’ needs and contexts have only begun in recent years. Three studies that have specifically applied CBIR to studies of image users (Bailey and Graham, 2000; Markkula and Sormunen, 2001 and 2004; and Yang, 2004) are discussed below.
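For readers unfamiliar with the color histograms that fill these systems-oriented papers, the following minimal Python sketch shows the basic idea. Each image is reduced to a coarse count of how many of its pixels fall into each of a small number of color bins. Two images are then compared by how much their histograms overlap. Several choices here are illustrative assumptions, not features of any cited system: an image is represented as a flat list of RGB tuples, each channel is split into four bins, and overlap is measured by histogram intersection, one common measure among several. The studies cited above use a variety of color spaces and similarity measures.

```python
from itertools import product


def color_histogram(pixels, bins_per_channel=4):
    """Normalized, coarsely quantized RGB histogram.

    `pixels` is assumed to be a non-empty list of (r, g, b) tuples with 0-255 values.
    """
    step = 256 // bins_per_channel
    # One bin per combination of quantized red, green, and blue levels.
    hist = {key: 0.0 for key in product(range(bins_per_channel), repeat=3)}
    for r, g, b in pixels:
        # Clamp so that 255 falls in the last bin even when 256 is not evenly divisible.
        key = tuple(min(v // step, bins_per_channel - 1) for v in (r, g, b))
        hist[key] += 1
    n = len(pixels)
    return {key: count / n for key, count in hist.items()}


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 = identical color distributions, 0.0 = no shared bins."""
    return sum(min(h1[key], h2[key]) for key in h1)


# Toy "images" as flat pixel lists (hypothetical, 100 pixels each).
red_sunset = [(220, 60, 30)] * 70 + [(30, 30, 90)] * 30
red_barn = [(200, 40, 40)] * 60 + [(240, 240, 240)] * 40
blue_sea = [(20, 60, 200)] * 90 + [(240, 240, 240)] * 10

h_sunset, h_barn, h_sea = (color_histogram(img) for img in (red_sunset, red_barn, blue_sea))
print(histogram_intersection(h_sunset, h_barn))
print(histogram_intersection(h_sunset, h_sea))
```

In this toy example the two predominantly red images score 0.6, while the red sunset and the blue sea score 0.0. Nothing in the computation knows that one picture shows a barn and another a sky. This is the gap between low-level features and higher-level content that the user-oriented studies below keep encountering.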

INDEXING REVISITED: A CALL FOR A NEW PARADIGM

Spurred by the rapid growth of digital imaging and the use of images in

general, new examinations of image indexing by Roberts (2001) and Graham (2001)

urged the development of new indexing methods for use in image databases. Roberts described ways in which historians often treat images as secondary and inferior to the text. She also provided instances of inadequate image citations that make the images difficult to retrieve for later use. Discussing the efforts of art

historians and cataloguers to develop the infrastructure needed to properly index

works of art, Roberts argued that image databases need to index not merely basic

descriptions of art but also concepts about what the images represent or have

represented throughout history (Roberts 2001).

Graham proposed the development of a new paradigm for the cataloguing of

images that would integrate traditional concept-based indexing with developments in

content-based image retrieval, considering and allowing for the needs of varied

groups of users, including large numbers of novice users (Graham 2001). The need for this new paradigm was illustrated by a study (Bailey and Graham, 2000) of two groups of art historians. The first consisted of 56 art historians contacted through the Association of Art Historians (AAH, UK), and the second of 65 art historians who answered an online survey. Both groups were asked about the effect of digital images on their working methods and research interests. A majority in both groups (64.5 percent of all respondents) indicated that access to digital images had not changed their research interests, although 35.4 percent of the online survey group indicated that it had. Overall, 47.1 percent of respondents indicated that digital images had affected their working methods. Perhaps not surprisingly, this figure was 61.5 percent in the online survey group. In the AAH group it was a smaller but still notable 30.4 percent.

Table 6. Bailey and Graham’s study of the effect of digital images on art historians and their work

Access to digital images has...     AAH survey (n = 56)   Online survey (n = 65)   Grand totals (n = 121)
(a) affected work methods:
    Yes                             17 (30.4%)            40 (61.5%)               57 (47.1%)
    No                              31 (55.4%)            18 (27.7%)               49 (40.5%)
    Not sure                         5 (8.9%)              7 (10.8%)               12 (9.9%)
(b) affected research interests:
    Yes                              7 (12.5%)            23 (35.4%)               30 (24.8%)
    No                              42 (75%)              36 (55.4%)               78 (64.5%)
    Not sure                         3 (5.4%)              6 (9.2%)                 9 (7.4%)
    No answer to the question        1 (1.8%)              0 (0.0%)                 1 (0.8%)

Source: Compare and Contrast Survey, 1999/2000, University of Northumbria, UK

DIGITAL ARCHIVES: QUERIES AND BROWSING

Digital newspaper image archives were studied by Ørnager (1995, 1997) and

Markkula and Sormunen (1999, 2001, 2004). Ørnager sought to determine future

rules for image indexing by identifying the categories that users employed for subject

analysis, identifying the kinds of requests users made from the archives, and

classifying the users into particular categories based on the kinds of queries they

posed. Ørnager found that the results could be used to formulate rules for indexing images and to establish a user typology. The results of word association tests served as the foundation for an image user model based on word clusters. The typology included five categories:

(a) the “specific inquirer,” who asks specific questions because he or she already has a particular image in mind;

(b) the “general inquirer,” who asks broad questions because he or she wants to choose images with little assistance from the archives staff;

(c) the “storyteller inquirer,” who tells archivists about his or her research and seeks their suggestions;

(d) the “story giver inquirer,” who not only tells about his or her research but also turns over responsibility for image selection to archivists, seeing them as the experts; and

(e) the “fill in space inquirer,” who seeks merely any graphic of a particular size to fill empty space.

Markkula and Sormunen (1999, 2001, 2004) compared and contrasted

journalists’ photo needs, queries, and information-seeking behaviors and found that

the journalists were able to retrieve images effectively and satisfactorily for less

complex and more specific requests, such as for images of particular persons, but

had greater difficulty in retrieving images for more complex and/or more general

requests, such as those for generic objects and themes. They also found that,

although journalists employed browsing as their main search strategy, the system did

not provide good support for browsing. Results indicated that browsing is preferable

to querying for the following reasons:

1. Criteria used in selecting photos are difficult to express by words but are easily applied when the photo is seen.

2. Non-professional searchers have difficulties in formulating focused queries. Browsing is a method to compensate for these difficulties.

3. Photo selection criteria depend on a particular work situation. These aspects are difficult to predict in indexing.

4. Browsing of thumbnail images is quite efficient and the journalists feel comfortable with browsing (Markkula and Sormunen, 2001).

An earlier study had raised the possibility that content-based image retrieval

could solve the problem of browsing large query sets in text-based image retrieval

systems (Markkula and Sormunen, 1999). In a later study, they sought to determine whether selecting one photograph of interest from a set could, using a CBIR algorithm, lead to the identification of other visually similar images. Ten journalists were asked to select illustrations for sample articles. They worked with a database of 50,000 sample digital images from the newspaper archives that supported standard text retrieval and the browsing of thumbnail images. Results suggested a “correlation between the photograph similarities computed by the algorithms and the similarities assessed by the users” (Markkula and Sormunen, 2001, p. 16). As in other CBIR studies, retrieval based on color and shape proved the most accurate.
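The “select one photograph, retrieve visually similar ones” interaction tested here is usually called query by example. The minimal Python sketch below assumes that each image has already been reduced to a numeric feature vector (here, hypothetical mean red, green, and blue values) and that similarity is plain Euclidean distance. Markkula and Sormunen’s algorithms are not described in this review, and the sketch is not a reconstruction of them. The function name and image identifiers are illustrative.

```python
import math


def query_by_example(example_id, features, k=3):
    """Return the k images whose feature vectors lie closest to the chosen example.

    `features` maps image identifiers to equal-length numeric vectors; distance is Euclidean.
    """
    query = features[example_id]
    ranked = sorted(
        (math.dist(query, vec), image_id)
        for image_id, vec in features.items()
        if image_id != example_id  # do not return the example itself
    )
    return [image_id for _, image_id in ranked[:k]]


# Hypothetical feature vectors: mean red, green, and blue values scaled to 0-1.
features = {
    "fire_01": (0.90, 0.30, 0.10),
    "fire_02": (0.85, 0.35, 0.15),
    "sunset_07": (0.80, 0.40, 0.30),
    "harbor_12": (0.20, 0.40, 0.80),
    "snow_03": (0.95, 0.95, 0.95),
}
print(query_by_example("fire_01", features))
```

Querying with `fire_01` ranks the other fire image first and the sunset second. That ranking captures visual similarity only. Whether a journalist would judge those images interchangeable for an article is the question a user study such as Markkula and Sormunen’s must answer.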

Having determined that content-based retrieval by color and other features is inadequate for journalists browsing in newspaper archives, Markkula and Sormunen concluded that further development of traditional, concept-based textual indexes is needed. Because such indexes can be produced automatically from captions and other descriptions, they suggested that archivists and librarians should not spend scarce time indexing these attributes. Instead, they should focus on developing improved user interfaces and on bringing together images that represent broad concepts or themes, which would allow for more effective browsing (Markkula and Sormunen, 2004).

In another study that further examined the potential of content-based methods for real-life users, Yang (2004) compared information-seeking behavior and search performance under two content-based image retrieval methods: query by example (QBE) and the self-organizing image browsing map (SIM). Subjects first received a training session to familiarize them with the QBE and SIM systems and were then given three sets of images. Participants were asked to search for images matching pre-determined textual descriptions and for pre-selected target images. Yang found that the image browsing map provided more support for information seeking and led to better performance in image searching than did the query by example method.
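Yang’s two approaches differ in how the user reaches an image. Query by example ranks the collection against a chosen image. A self-organizing map instead lays the whole collection out on a two-dimensional grid so that visually similar images land in the same or nearby cells, which the user can then browse. The sketch below trains a small, generic Kohonen self-organizing map in plain Python. Several details are assumptions for illustration, not details of Yang’s system: the 3 x 3 grid, the linear decay schedules, the function names, and the three-value color features.

```python
import math
import random


def best_matching_cell(x, weights):
    """Grid cell whose weight vector is nearest (Euclidean) to feature vector x."""
    return min(weights, key=lambda cell: math.dist(x, weights[cell]))


def train_som(features, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Train a small rectangular Kohonen self-organizing map on feature vectors."""
    rng = random.Random(seed)
    dim = len(features[0])
    # One randomly initialized weight vector per grid cell.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(features)
    step = 0
    for _ in range(epochs):
        order = list(range(len(features)))
        rng.shuffle(order)
        for i in order:
            x = features[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly toward 0
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks toward 0.5 cells
            bmu = best_matching_cell(x, weights)
            for cell, w in weights.items():
                # Cells near the best-matching cell on the grid move most toward x.
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def layout(image_ids, features, weights):
    """Group image identifiers by the map cell they fall into, ready for browsing."""
    grid = {}
    for image_id, x in zip(image_ids, features):
        grid.setdefault(best_matching_cell(x, weights), []).append(image_id)
    return grid


# Hypothetical color features for four images: two reddish, two bluish.
ids = ["fire_01", "fire_02", "harbor_12", "sea_04"]
feats = [(0.90, 0.30, 0.10), (0.85, 0.35, 0.15), (0.20, 0.40, 0.80), (0.15, 0.35, 0.85)]
som = train_som(feats, rows=3, cols=3)
for cell, members in sorted(layout(ids, feats, som).items()):
    print(cell, members)
```

In a browsing interface each cell would display thumbnails rather than identifiers. Whether such a layout actually helps users find images is an empirical question of the kind Yang’s study addresses.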

IMAGE BROWSING AND VISUALIZATION

As indicated above, image browsers and visualization may provide a solution

to the unique problems associated with indexing and retrieving images.

Developments in digital imaging and web design technology allow image indexers and users to circumvent some of the indexing and retrieval problems described previously. Indexers need not include all information associated with a given image in a catalog record or caption, nor rely on retrieving images by their physical characteristics. Instead, they can place terms associated with a set of images on a web page and group related images together as thumbnails. Users can then browse these thumbnails, clicking on images of interest to view larger versions.

Several recent studies, building on Bates’ (1989) discussion of browsing and

berry-picking, suggest that browsing may be the most effective search strategy for

digital image users, particularly when visually similar images are clustered and/or

combined with captioned descriptions (Ørnager, 1995 and 1997; Markkula and

Sormunen, 1998, 2000, and 2001; Rodden, Basalaj, Sinclair and Wood, 1999 and

2001; and Yang, 2004), and/or include zoomable user interfaces (ZUIs) (Combs and

Bederson, 1999).

Rodden, Basalaj, Sinclair and Wood (2001) tested the idea that allowing users to browse through visually similar images could enhance retrieval, asking whether organizing images by similarity would assist browsing. Query-based retrieval using purely visual measures, as in CBIR, had so far met with only limited success. The authors therefore expected that many image users would begin their searches by entering textual queries or navigating hierarchical arrangements of categories, and would then browse the resulting set of thumbnails.

Eighteen designers were given 100 thumbnail images of particular cities and

asked to choose from them three photos to illustrate “destination guide” articles for a

new independent travel web site. The thumbnails were arranged by visual similarity

(color and general appearance) or by category (images of midtown Manhattan,

Brooklyn Bridge, etc.).

The first experiment was designed to determine whether designers would find

either of the similarity-based arrangements (visual or captioned) useful for selecting

images and also whether it was helpful to have both arrangements available. It was

expected that both the caption-based and the visual arrangements would be

regarded as useful in their own right and also when used in combination. Two-thirds

of the participants found the caption-based arrangement useful. Participants disagreed, however, on the usefulness of the visual arrangement: some found it useful, but many found it difficult to use.

ZOOMABLE USER INTERFACES (ZUIs)

Zoomable user interfaces (ZUIs) have been proposed as a means of

enhancing users’ browsing experience and improving their ability to find and retrieve

images of interest (Combs and Bederson, 1999). Building on Shneiderman’s (1987)

work on information visualization, Combs and Bederson examined whether zooming

improves browsing. Thirty participants used four different image browsers to find and retrieve images in two simultaneous experiments. The first used between-subjects testing, in which participants were randomly assigned one of the four browsers and instructed to browse through each of three image sets. The second used within-subjects testing. Results showed that “the zoomable image browser as well as the traditional 2D grid of thumbnails works best for performance time and user satisfaction.” One curious finding was that, despite having been given training in zooming, roughly half of the participants did not take advantage of this capability (Combs and Bederson, 1999). Suggestions from users included requests for “the ability to group images in clusters by content” and for a means of searching for the target image rather than browsing for it. These requests may reflect the large number of images (up to 225) displayed on the screen.

A STUDY OF STUDIES

Chu (2001) conducted a statistical analysis of data relating to research in

concept-based and content-based image retrieval. Using SciSearch and Social

SciSearch, Chu searched for articles on image indexing and retrieval and ranked them by

author and cited author to find the most prolific and highly cited authors. Results

indicated, perhaps not surprisingly, that researchers from the field of computer

science were most closely associated with research in the content-based domain,

while researchers in the field of library and information science tended to emphasize

research in concept-based image retrieval. Despite the differences between the two

groups, Chu found evidence of collaboration with, or at least citation of, authors from

other specialties and recommended the promotion and encouragement of further

collaboration and perhaps some integration of the two approaches.

CONCLUSION

While much research remains to be done, it appears that the most viable solutions to image indexing problems have come, and will continue to come, from synthesizing new methods of content-based image retrieval with improvements to traditional concept-based indexing (Chen and Rasmussen, 1999; Enser, 2000; Jaimes, Benitez, Jörgensen, and Chang, 2001; Markkula and Sormunen, 2001; Yang, 2004) or with image browsing approaches (Markkula and Sormunen, 2004; Yang, 2004; Bederson, 2005).

Research relating to image users and image retrieval, which began with

studies of image indexing and interindexer consistency and a handful of user studies,

has expanded into a growing subfield of both library science and computer science

that includes studies on digital imaging, automatic image retrieval, and image

searching on the Internet. In recent years, several scholars have pointed to the need

to integrate the previously disparate camps of those who approach image indexing

from the concept-based perspective (more often library science researchers) and those who approach it from the content-based perspective (more often computer science researchers) as a means of finding a workable solution to the issues associated with

improving access to digitized images on the web.

Regardless of their focus, most studies indicate that image users have unique

needs and often have difficulty retrieving the images that they need through

traditional means of research and retrieval. They also indicate that users requesting

images of people and places have the best chance of finding appropriate images,

while those requesting images of more ambiguous subjects often have difficulty.

Complicating matters is the difficulty that cataloguers often have in assigning subject

terms to images, because images are more likely than textual materials to be

interpreted by different persons in different ways. Recent developments in content-based image retrieval promise relief from the time-consuming and inconsistent process of indexing images manually. So far, however, these automatic means of image retrieval have been successful only in retrieving images by “low-level” attributes such as color and, to a lesser extent, shape.

Clearly, recent years have seen an ever-greater emphasis on the role of the

user and on developing new means of indexing and image display based on the

users’ needs. This emphasis was perhaps influenced by Dervin’s concepts of sense-making (Dervin 1976) and by Belkin’s study of users’ anomalous states of knowledge, which argued for designing systems to fit the user rather than attempting to make the user conform to the system (Belkin 1980).

Image browsing seems a promising solution to some problems relating to

image indexing and improving image retrieval for users. Computer operating

systems now allow librarians to arrange digital images in folders on their servers and

examine all digital images as thumbnails. Images may be displayed as thumbnails

on web pages, which allows users to find them through keyword searches, browse

through them, and enlarge images of interest.

These browsing options have provided a partial solution to the problem of

image searching and retrieval for onsite and offsite users. Even when the original

images have been catalogued using traditional means, librarians and users can

retrieve digital surrogates for viewing with minimal effort. Rather than paging items

from several different locations and making reference photocopies, librarians can

browse scanned images at the reference desk, display images for in-house users, and

e-mail low-resolution reference images to offsite users. Images not yet scanned

must, of course, be paged in the traditional way and the ever-present time and

funding constraints may prevent scanning all materials, but a slow yet steady

scanning of images (even if only for user requests) can, over a period of years, help a

library to develop a digital library of scans that saves much time and work in the long

run. For online users, the ability to search through hundreds of thumbnails on web

pages that include captioned keywords to facilitate their retrieval from image search

engines reduces the need for indexers to try to anticipate users’ query terms and the

need for users to try to guess at the terms under which images have been indexed.
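To make the captioned-thumbnail approach concrete, the following Python sketch uses only the standard library to generate a static web page. On this page each thumbnail carries its caption both as visible text and as `alt` text, and links to a larger image. The file paths, record fields, and function name are hypothetical. The thumbnails are assumed to have been produced separately. The sketch also assumes, as the passage above does, that search engines can draw on text placed alongside images.

```python
from html import escape


def gallery_html(title, records):
    """Build a static HTML page of captioned thumbnails linking to larger images.

    Each record is a dict with hypothetical keys 'thumb', 'full', and 'caption'.
    """
    items = []
    for rec in records:
        caption = escape(rec["caption"])
        items.append(
            "<figure>"
            f'<a href="{escape(rec["full"])}">'
            # The caption doubles as alt text, so keywords sit next to the image itself.
            f'<img src="{escape(rec["thumb"])}" alt="{caption}" width="150">'
            "</a>"
            f"<figcaption>{caption}</figcaption>"
            "</figure>"
        )
    head = (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n"
    )
    return head + "\n".join(items) + "\n</body></html>\n"


# Hypothetical records for two scanned photographs.
records = [
    {"thumb": "thumbs/0001.jpg", "full": "images/0001.jpg",
     "caption": "Main Street looking north, 1910 (postcard)"},
    {"thumb": "thumbs/0002.jpg", "full": "images/0002.jpg",
     "caption": "Flood damage & cleanup, River Road, 1936"},
]
page = gallery_html("Local history photographs: streets", records)
```

Writing `page` to an `.html` file on a web server alongside the image files would complete the example. Users could then browse the thumbnails and click through to larger versions.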

A digital image browsing program developed by Bederson and presented at

the NFAIS 2005 conference allows users to browse hundreds of thumbnail images, zoom in using the zoomable user interface approach described above, and select images of interest (Bederson 2001; Bederson 2005). Bederson’s International

Children’s Digital Library (ICDL), also demonstrated at the conference, provides the

ability to search books not only by traditional text-based means, but also by such

attributes as color of book covers and the age range of those for whom the books are

intended (Druin, Bederson, Hourcade, Sherman, Revelle, Platner, & Weng, 2001;

Bederson 2005).

Reading today the words of an art historian who participated in the Getty

study (Bakewell, Beeman, Reese, and Schmitt, 1988), one realizes the tremendous

technological advances in image indexing and retrieval that have been made over

the past two decades:

“I would like access to visual traditions…and to me that means electronic and computerized, so that you can slice the pie there and get different permutations and combinations [making] images accessible according to iconography, time, region, artist, and owner… The issue is not just indexing; it’s accessibility. And that means articulating indices, giving you different ways of organizing material and slicing it, and also efficiency of access” (Bakewell, Beeman, Reese, and Schmitt, 1988, p. 51).

One hopes that this foresighted scholar lived to see digital imaging and

database developments such as ARTSTOR.

Recent years have seen calls for synthesizing traditional concept-based indexing methods with content-based image retrieval, with an emphasis on the need

for cooperation between library and information science professionals, who have

traditionally favored the concept-based approach, and information systems

professionals who are at the forefront of developing the content-based and browsing

methods. By combining their interests, efforts, and approaches, they can build on

the advances that have been made and continue to find better means of improving

users’ abilities to find and retrieve images.

REFERENCES

Armitage, L. and Enser, P. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287-299.

Bailey, C. and Graham, M. (2000). The corpus and the art historian. CIHA London: Thirtieth International Congress of the History of Art [Electronic resource]. http://www.unites.uqam.ca/AHWA/Meetings/2000.CIHA/Bailey.html, viewed March 3, 2005.

Bakewell, E., Beeman, W., Reese, C., and Schmitt, M. (1988). Object, image, inquiry: the art historian at work: report on a collaborative study by the Getty Art History Information Program (AHIP) and the Institute for Research in Information and Scholarship (IRIS), Brown University. Santa Monica, CA: The Getty Art History Information Program.

Bates, M. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407-424.

Bederson, B. (2001). PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps. Proceedings of the 14th annual ACM symposium on user interface software and technology, 71-80.

Bederson, B. (2005). Making sense of search results: creating effective visualizations. Presentation at NFAIS Annual Conference, March 1, 2005

Belkin, N. (1980). Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science, 5, 133-143.

Besser, H. (1990). Visual access to visual images: the UC Berkeley image database project. Library Trends, 38(4), 787-798.

Besser, H. and Trant, J. (1995). Introduction to imaging: issues in constructing an image database. Santa Monica, CA: The Getty Art History Information Program.

Brilliant, R. How an art historian connects objects and information. Library Trends, 37, 120-129.

Chen, Hsin-liang (2001). An analysis of image retrieval tasks in the field of art history [Electronic version]. Information Processing and Management, 37, 701-720.

Chen, Hsin-liang (2001). An analysis of image queries in the field of art history [Electronic version]. Journal of the American Society for Information Science and Technology, 52(3), 260-273.

Chen, H. and Rasmussen, E. (1999). Intellectual access to images [Electronic version]. Library Trends, 48(2), 291-302.

Choi, Youngok and Edie M. Rasmussen (2002). Users' relevance criteria in image retrieval in American history [Electronic version]. Information Processing and Management, 38, 695-726.


Choi, Youngok and Edie M. Rasmussen (2003). Searching for images: the analysis of users' queries for image retrieval in American history [Electronic version]. Journal of the American Society for Information Science and Technology, 54(6), 498-511.

Collins, Karen (1998). Providing subject access to images: a study of user queries. The American Archivist, 61, 36-55.

Dervin, B. (1998). Sense-making theory and practice: an overview of user interests in knowledge seeking and use. Journal of Knowledge Management, 2 (2), 36-46.

Druin, A., Bederson, B. B., Hourcade, J. P., Sherman, L., Revelle, G., Platner, M., & Weng, S. (2001). Designing a digital library for young children: an intergenerational Partnership. Proceedings of Joint Conference on Digital Libraries, 398-405.


Enser, P. and McGregor, C. (1993). Analysis of visual information retrieval queries. Report on Project G16412 to the British Library Research and Development Department. London: British Library.

Enser, P. (2000). Visual image retrieval: Seeking the alliance of concept-based and content-based paradigms [Electronic version]. Journal of Information Science, 26(4), 199-210.

Fidel, R. (1994). User-centered indexing [Electronic version]. Journal of the American Society for Information Science, 45(8), 572-576.

Gevers, T. and Smeulders, A. (1998). Image retrieval by multi-scale illumination invariant indexing [Electronic version]. Lecture Notes in Computer Science, 1464, 96-108.

Gevers, T. and Smeulders, A. (2000). Pictoseek: Combining color and shape invariant features for image retrieval [Electronic version]. IEEE Transactions on Image Processing, 9(1), 102-119.

Goodrum, Abby and Spink, Amanda (2001). Image searching on the Excite Web search engine [Electronic version]. Information Processing and Management, 37(2), 295-311.

Gordon, A. (2001). Browsing image collections with representations of common-sense activities. Journal of the American Society for Information Science and Technology, 52(11), 925-929.

Graham, M. (2001). The cataloguing and indexing of images: time for a new paradigm? Art Libraries Journal, 26(1), 22-37.

Greisdorf, Howard and O’Connor, Brian (2002). Modelling what users see when they look at images: a cognitive viewpoint [Electronic version]. Journal of Documentation, 58(1), 6-29.

Gudivada, V. and Raghavan, V. (1995). Content-based image retrieval systems. Computer, 28(9), 18-22.

Gupta, A., and Jain, R. (1997). Visual information retrieval. Communications of the ACM, 40(5), 70-79.

Hastings, Samantha K. (1995). Query categories in a study of intellectual access to digitized art images. Proceedings of the 58th Annual Meeting of the American Society for Information Science, 32, 3-8.

Hastings, Samantha K. (1999). Evaluation of image retrieval systems: role of user feedback [Electronic version]. Library Trends, 48(2), 438-452.

Jaimes, A., Benitez, A., Jörgensen, C., and Chang, S. (2001). A conceptual framework and research for classifying visual descriptors. Journal of the American Society for Information Science and Technology, 52(11), 938-947.

Jörgensen, C. (1996). Testing an image description template. Proceedings of the 59th Annual Meeting of the American Society for Information Science, 209-213.

Jörgensen, Corinne (1998). Attributes of images in describing tasks [Electronic version]. Information Processing and Management, 34(2/3), 161-174.

Markey, K. (1984). Interindexer consistency tests: a literature review and report of a test of consistency in indexing visual materials. Library and Information Science Research, 6, 155-157.

Markkula, M. and Sormunen, E. (2001). A test collection for the evaluation of content-based image retrieval algorithms: a user and task-based approach [Electronic version]. Information Retrieval, 4(3/4), 275-294.

Markkula, Marjo and Sormunen, Eero (2004). End-user searching challenges indexing practices in the digital newspaper photo archive [Electronic version]. Information Retrieval, 1, 259-285.

O'Connor, Brian, O’Connor, Mary K. and Abbas, June M. (1999). User reactions as access mechanism: an exploration based on captions for images [Electronic version]. Journal of the American Society for Information Science, 50(8), 681-697.

Ørnager, Susanne (1995). The newspaper image database: empirical supported analysis of users’ typology and word association clusters [Electronic version]. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 212-218.

Ørnager, S. (1997). Image retrieval: theoretical analysis and empirical user studies on accessing information in images. Proceedings of the 60th ASIS Annual Meeting. ASIS '97, Washington, D.C., Information Today, Inc., 202-210.

Panofsky, E. (1962). Chapter I: Introductory. In Studies in iconology: humanistic themes in the art of the Renaissance.

Roberts, H. (2001). A picture is worth a thousand words: art indexing in electronic databases [Electronic version]. Journal of the American Society for Information Science and Technology, 52(11), 911-916.

Rodden, K. (2002). Evaluating similarity-based visualisations as interfaces for image browsing. Technical Report no. 543, University of Cambridge Computer Laboratory.

Rodden, K., Basalaj, W., Sinclair, D., and Wood, K. (2001). Does organisation by similarity assist image browsing? Proceedings of the SIGCHI conference on Human factors in computing systems, 3(1), 190-197.

Shatford, S. (1984). Describing a picture: a thousand words are seldom cost-effective. Cataloguing and Classification Quarterly, 4(4), 13-30.

Shatford, S. (1986). Analyzing the subject of a picture: a theoretical approach. Cataloguing and Classification Quarterly, 6(3), 39-62.

Shatford Layne, S. (1994). Some issues in the indexing of images [Electronic version]. Journal of the American Society for Information Science, 45(8), 583-588.

Shneiderman, B. (1987). Designing the user interface: strategies for effective human-computer interaction. Reading, MA: Addison-Wesley.

Smeulders, A., Worring, M., Santini, S., Gupta, A., and Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349-1380.

Stam, D. (1984). How art historians look for information. Art Documentation 3(4), 117-119.

Stam, D. (1985). Remembrance of things past: mental processes of art reference librarians. Art Documentation, 4(4), 139-141.

Stam, D. (1989). Tracking art historians: on their information needs and information seeking behavior. Art Libraries Journal, 14(3), 13-16.

Svenonius, E. (1994). Access to nonbook materials: the limits of subject indexing for visual and aural languages [Electronic version]. Journal of the American Society for Information Science, 45(8), 600-606.

Yang, Christopher C. (2004). Content-based image retrieval: a comparison between query by example and image browsing map approaches. Journal of Information Science, 30(3), 254-267.

Zachary, J. and Iyengar, S. (2001). Information theoretic similarity measures for content based image retrieval. Journal of the American Society for Information Science and Technology, 52(10), 856-867.

Zachary, J., Iyengar, S., and Barhen, J. (2001). Content based image retrieval and information theory: a general approach. Journal of the American Society for Information Science and Technology, 52(10), 840-852.
