1 SIMS 247: Information Visualization and Presentation Marti Hearst Nov 2 and Nov 7, 2005.

SIMS 247: Information Visualization and Presentation
Marti Hearst

Nov 2 and Nov 7, 2005 

 

Outline

• Why Text is Tough
• Single-document Visualization
• Visualizing Concept Spaces
  – Clusters
  – Category Hierarchies
• Visualizing Query Specifications
• Visualizing Retrieval Results
• Usability Study Meta-Analysis

Why Visualize Text?

• To help with Information Retrieval
  – give an overview of a collection
  – show user what aspects of their interests are present in a collection
  – help user understand why documents retrieved as a result of a query
• Text Data Mining
  – Mainly clustering & nodes-and-links
• Software Engineering
  – not really text, but has some similar properties

Why Text is Tough

• Text is not pre-attentive
• Text consists of abstract concepts
  – which are difficult to visualize
• Text represents similar concepts in many different ways
  – space ship, flying saucer, UFO, figment of imagination
• Text has very high dimensionality
  – Tens or hundreds of thousands of features
  – Many subsets can be combined together

Why Text is Tough

The Dog.

Why Text is Tough

The Dog.

The dog cavorts.

The dog cavorted.

Why Text is Tough

The man.

The man walks.

Why Text is Tough

The man walks the cavorting dog.

So far, we can sort of show this in pictures.

Why Text is Tough

As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls.

How do we visualize this?

Why Text is Tough

• Abstract concepts are difficult to visualize
• Combinations of abstract concepts are even more difficult to visualize
  – time
  – shades of meaning
  – social and psychological concepts
  – causal relationships

Why Text is Tough

• Language only hints at meaning
• Most meaning of text lies within our minds and common understanding
  – “How much is that doggy in the window?”
    • “how much”: social system of barter and trade (not the size of the dog)
    • “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own
    • “in the window” implies behind a store window, not really inside a window, requires notion of window shopping

Why Text is Tough

• General categories have no standard ordering (nominal data)
• Categorization of documents by single topics misses important distinctions
• Consider an article about
  – NAFTA
  – The effects of NAFTA on truck manufacture
  – The effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez

Why Text is Tough

• Other issues about language
  – ambiguous (many different meanings for the same words and phrases)
  – different combinations imply different meanings

Why Text is Tough

• I saw Pathfinder on Mars with a telescope.
• Pathfinder photographed Mars.
• The Pathfinder photograph mars our perception of a lifeless planet.
• The Pathfinder photograph from Ford has arrived.
• The Pathfinder forded the river without marring its paint job.

Why Text is Easy

• Text is highly redundant
  – When you have lots of it
  – Pretty much any simple technique can pull out phrases that seem to characterize a document
• Instant summary:
  – Extract the most frequent words from a text
  – Remove the most common English words
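The "instant summary" recipe can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the slides: the stopword list is a tiny hand-picked assumption (real lists are much longer), and `instant_summary` is a hypothetical helper name.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are far longer).
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
    "was", "he", "his", "for", "with", "as", "on", "be", "at", "by",
}

def instant_summary(text, top_n=15, stopwords=STOPWORDS):
    """Return the top_n most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)

sample = "The dog cavorts. The man walks the dog. The dog cavorted."
print(instant_summary(sample, top_n=3))
```

Run over a long text, the same function yields a ranked word list of the kind shown on the "Guess the Text" slide.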

Guess the Text

478 said
233 god
201 father
187 land
181 jacob
160 son
157 joseph
134 abraham
121 earth
119 man
118 behold
113 years
104 wife
101 name
94 pharaoh

Visualizing Individual Documents

• Early approach: SuperBook
• Showing term occurrences: TextArc

Superbook (http://superbook.bellcore.com/SB)

TextArc (www.textarc.org)

SeeSoft: Showing Text Content using a linear representation and brushing and linking (Eick & Wills 95)

Virtual Shakespeare (Small ‘96)

Text Collection Overviews

• How can we show an overview of the contents of a text collection?
  – Show info external to the docs
    • e.g., date, author, source, number of inlinks
    • does not show what they are about
  – Show the meanings or topics in the docs
    • a list of titles
    • results of clustering words or documents
    • organize according to categories (next time)

The Need to Group

• Interviews with lay users often reveal a desire for better organization of retrieval results
• Useful for suggesting where to look next
  – People prefer links over generating search terms
  – But only when the links are for what they want
• Three main approaches for text and images:
  – Group items according to pre-defined categories
  – Group items into automatically-created clusters
  – Group items according to common keywords

Ojakaar and Spool, Users Continue After Category Links, UIETips Newsletter, http://world.std.com/~uieweb/Articles/, 2001

Categories

• Human-created
  – But often automatically assigned to items
• Arranged in hierarchy, network, or facets
  – Can assign multiple categories to items
  – Or place items within categories
• Usually restricted to a fixed set
  – So help reduce the space of concepts
• Intended to be readily understandable
  – To those who know the underlying domain
  – Provide a novice with a conceptual structure
• There are many already made up!
• However, until recently, their use in interfaces has been
  – Under-investigated
  – Not met their promise

Clustering

• “The art of finding groups in data” – Kaufman and Rousseeuw
• Groups are formed according to associations and commonalities among the data’s features.
  – There are dozens of algorithms, more all the time
  – Most need a way of determining similarity or difference between a pair of items
  – In text clustering, documents usually represented as a vector of weighted features which are some transformation on the words
  – Similarity between documents is a weighted measure of feature overlap
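One common way to realize "weighted feature vectors" and "weighted feature overlap" is tf-idf weighting with cosine similarity. The slide does not name a specific scheme, so the sketch below is an assumption chosen for illustration; `tfidf_vectors` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse dict of term -> tf-idf weight."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "space ship lands near city".split(),
    "flying saucer lands near farm".split(),
    "truck manufacture in el paso".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two documents overlap only on "lands" and "near", so they score above zero, while the third shares no terms with the first and scores zero. Note that "space ship" and "flying saucer" contribute nothing to the similarity, which is the synonymy problem from the "Why Text is Tough" slides.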

Clustering

• Potential benefits:
  – Find the main themes in a set of documents
    • Potentially useful if the user wants a summary of the main themes in the subcollection
    • Potentially harmful if the user is interested in less dominant themes
  – More flexible than pre-defined categories
    • There may be important themes that have not been anticipated
  – Disambiguate ambiguous terms
    • ACL
  – Clustering retrieved documents tends to group those relevant to a complex query together

Hearst, Pedersen, Revisiting the Cluster Hypothesis, SIGIR’96

Scatter/Gather Clustering

• Developed at PARC in the late 80’s/early 90’s
• Top-down approach
  – Start with k seeds (documents) to represent k clusters
  – Each document assigned to the cluster with the most similar seed
• To choose the seeds:
  – Cluster in a bottom-up manner
  – Hierarchical agglomerative clustering
    • Start with n documents, compare all by pairwise similarity, combine the two most similar documents to make a cluster
    • Now compare both clusters and individual documents to find the most similar pair to combine
    • Continue until k clusters remain
    • Use the centroid of each of these as seeds
      – Centroid: average of the weighted vectors
• Can recluster a cluster to produce a hierarchy of clusters

Pedersen, Cutting, Karger, Tukey, Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, SIGIR 1992
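The seed-selection and assignment steps described on this slide can be sketched as follows. This is a naive illustration under stated assumptions, not the paper's implementation: clusters are compared by the cosine similarity of their centroids, documents are small dense vectors, and the function names (`agglomerate`, `scatter`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise average of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def agglomerate(vectors, k):
    """Bottom-up: merge the most similar pair of clusters until k remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def scatter(vectors, k):
    """Use cluster centroids as seeds, then assign each document to its most similar seed."""
    seeds = [centroid(c) for c in agglomerate(vectors, k)]
    return [max(range(len(seeds)), key=lambda s: cosine(v, seeds[s]))
            for v in vectors]

docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.8, 0.2]]
print(scatter(docs, 2))
```

Comparing every pair of clusters on every pass is expensive, so a sketch like this is only practical when the bottom-up step runs on a small number of documents.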

Scatter/Gather

Northern Light Web Search: Started out with clustering. Then integrated with categories. Then did not do web search and used only categories.

Visualizing Clustering Results

• Use clustering to map the entire huge multidimensional document space into a huge number of small clusters.

• Use dimension reduction and then project these onto a 2D/3D graphical representation

Clustering Multi-Dimensional Document Space (image from Wise et al 95)

Kohonen Feature Maps on Text (from Chen et al., JASIS 49(7))

Is it useful?

• 4 Clustering Visualization Usability Studies

Clustering for Search Study 1

• This study compared
  – a system with 2D graphical clusters
  – a system with 3D graphical clusters
  – a system that shows textual clusters
• Novice users
• Only textual clusters were helpful (and they were difficult to use well)

Kleiboemer, Lazear, and Pedersen. Tailoring a retrieval system for naive users. SDAIR’96

Clustering Study 2: Kohonen Feature Maps

• Comparison: Kohonen Map and Yahoo
• Task:
  – “Window shop” for interesting home page
  – Repeat with other interface
• Results:
  – Starting with map could repeat in Yahoo (8/11)
  – Starting with Yahoo unable to repeat in map (2/14)

Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)

Kohonen Feature Maps (Lin 92, Chen et al. 97)

Study 2 (cont.)

• Participants liked:
  – Correspondence of region size to # documents
  – Overview (but also wanted zoom)
  – Ease of jumping from one topic to another
  – Multiple routes to topics
  – Use of category and subcategory labels

Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)

Study 2 (cont.)

• Participants wanted:
  – hierarchical organization
  – other ordering of concepts (alphabetical)
  – integration of browsing and search
  – correspondence of color to meaning
  – more meaningful labels
  – labels at same level of abstraction
  – fit more labels in the given space
  – combined keyword and category search
  – multiple category assignment (sports+entertain)
• (These can all be addressed with faceted hierarchical categories)

Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques. JASIS 49(7): 582-603 (1998)

Clustering Study 3: NIRVE

Each rectangle is a cluster. Larger clusters closer to the “pole”. Similar clusters near one another. Opening a cluster causes a projection that shows the titles.

Study 3

• This study compared:
  – 3D graphical clusters
  – 2D graphical clusters
  – textual clusters
• 15 participants, between-subject design
• Tasks
  – Locate a particular document
  – Locate and mark a particular document
  – Locate a previously marked document
  – Locate all clusters that discuss some topic
  – List more frequently represented topics

Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.

Study 3

• Results (time to locate targets)
  – Text clusters fastest
  – 2D next
  – 3D last
  – With practice (6 sessions) 2D neared text results; 3D still slower
  – Computer experts were just as fast with 3D
• Certain tasks equally fast with 2D & text
  – Find particular cluster
  – Find an already-marked document
• But anything involving text (e.g., find title) much faster with text.
  – Spatial location rotated, so users lost context
• Helpful viz features
  – Color coding (helped text too)
  – Relative vertical locations

Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.

Clustering Study 4

• Compared several factors
• Findings:
  – Topic effects dominate (this is a common finding)
  – Strong difference in results based on spatial ability
  – No difference between librarians and other people
  – No evidence of usefulness for the cluster visualization

Aspect windows, 3-D visualizations, and indirect comparisons of information retrieval systems, Swan & Allan, SIGIR 1998.

Summary: Visualizing for Search Using Clusters

• Huge 2D maps may be inappropriate focus for information retrieval
  – cannot see what the documents are about
  – space is difficult to browse for IR purposes
  – (tough to visualize abstract concepts)

• Perhaps more suited for pattern discovery and gist-like overviews

Category Combinations

Let’s show categories instead of clusters

DynaCat (Pratt, Hearst, & Fagan 99)

DynaCat (Pratt 97)

• Decide on important question types in advance
  – What are the adverse effects of drug D?
  – What is the prognosis for treatment T?
• Make use of MeSH categories
• Retain only those types of categories known to be useful for this type of query.

DynaCat Study

• Design
  – Three queries
  – 24 cancer patients
  – Compared three interfaces
    • ranked list, clusters, categories
• Results
  – Participants strongly preferred categories
  – Participants found more answers using categories
  – Participants took same amount of time with all three interfaces

MultiTrees (Furnas & Zacks ’94)

Cat-a-Cone: Multiple Simultaneous Categories

• Key Ideas:
  – Separate documents from category labels
  – Show both simultaneously
• Link the two for iterative feedback
• Distinguish between:
  – Searching for Documents vs.
  – Searching for Categories

Cat-a-Cone Interface

Cat-a-Cone

• Catacomb: (definition 2b, online Websters) “A complex set of interrelated things”
• Makes use of earlier PARC work on 3D+animation:
  – Rooms (Henderson and Card 86)
  – IV: Cone Tree (Robertson, Card, Mackinlay 93)
  – Web Book (Card, Robertson, York 96)

[Diagram labels: Collection, Retrieved Documents, Category Hierarchy; actions: search, browse, query terms]

ConeTree for Category Labels

• Browse/explore category hierarchy
  – by search on label names
  – by growing/shrinking subtrees
  – by spinning subtrees
• Affordances
  – learn meaning via ancestors, siblings
  – disambiguate meanings
  – all cats simultaneously viewable

Virtual Book for Result Sets

– Categories on Page (Retrieved Document) linked to Categories in Tree

– Flipping through Book Pages causes some Subtrees to Expand and Contract

– Most Subtrees remain unchanged

– Book can be Stored for later Re-Use

Improvements over Standard Category Interfaces

• Integrate category selection with viewing of categories
• Show all categories + context
• Show relationship of retrieved documents to the category structure
• But … do users understand and like the 3D?

The FLAMENCO Project

• Basic idea similar to Cat-a-Cone
• But use familiar HTML interaction to achieve similar goals
• Usability results are very strong for users who care about the collection.

Co-Citation Analysis

• Has been around since the 50’s. (Small, Garfield, White & McCain)
• Used to identify core sets of
  – authors, journals, articles for particular fields
  – Not for general search
• Main Idea:
  – Find pairs of papers that are cited together by third papers
  – Look for commonalities
• A nice demonstration by Eugene Garfield at:
  – http://165.123.33.33/eugene_garfield/papers/mapsciworld.html
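The counting step behind co-citation analysis can be sketched as follows, using the usual definition that two papers are co-cited whenever a third paper cites both. The citation data and the helper name `cocitation_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citations):
    """Count, for each pair of papers, how many citing papers reference both."""
    counts = Counter()
    for cited in citations.values():
        # Each unordered pair in one reference list is co-cited once.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

# Hypothetical data: citing paper -> list of papers it cites.
citations = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
counts = cocitation_counts(citations)
print(counts.most_common(2))
```

Pairs with high counts are candidates for the "core sets" mentioned above; the resulting pair counts can also serve as edge weights for a map of a field.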

Co-citation analysis (From Garfield 98)

Query Specification

Command-Based Query Specification

• command attribute value connector …
  – find pa shneiderman and tw user#
• What are the attribute names?
• What are the command names?
• What are allowable values?

Form-Based Query Specification (Altavista)

Form-Based Query Specification (Melvyl)

Form-based Query Specification (Infoseek)

Direct Manipulation Spec. VQUERY (Jones 98)

Menu-based Query Specification (Young & Shneiderman 93)

Context

Putting Results in Context

• Visualizations of Query Term Distribution
  – KWIC, TileBars, SeeSoft
• Visualizing Shared Subsets of Query Terms
  – InfoCrystal, VIBE, Lattice Views
• Table of Contents as Context
  – Superbook, Cha-Cha, DynaCat
• Organizing Results with Tables
  – Envision, SenseMaker
• Using Hyperlinks
  – WebCutter

Putting Results in Context

• Interfaces should
  – give hints about the roles terms play in the collection
  – give hints about what will happen if various terms are combined
  – show explicitly why documents are retrieved in response to the query
  – summarize compactly the subset of interest

KWIC (Keyword in Context)

• An old standard, ignored until recently by internet search engines
  – used in some intranet engines, e.g., Cha-Cha
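A keyword-in-context listing is easy to sketch: find each occurrence of the query term and show a fixed window of text on either side. The window width, the bracket markers, and the function name `kwic` are assumptions made for this illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per keyword occurrence, with `width` characters of context per side."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

text = "The man walks the cavorting dog. The dog cavorts. Pathfinder photographed Mars."
for line in kwic(text, "dog", width=20):
    print(line)
```

Aligning the keyword in a fixed column is what lets a reader scan the surrounding words quickly.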

Highlighting Keywords in Context

Superbook (Remde et al. 89)

• Hyper-media software manual
• Functions:
  – Word Lookup:
  – Table of Contents: Dynamic fisheye view of the hierarchical topics list
  – Page of Text: show selected page and highlighted search terms

• Hypertext features linking through search words rather than page links

Display of Retrieval Results

Goal: minimize time/effort for deciding which documents to examine in detail

Idea: show the roles of the query terms in the retrieved documents, making use of document structure

TileBars

Graphical Representation of Term Distribution and Overlap

Simultaneously Indicate:
  – relative document length
  – query term frequencies
  – query term distributions
  – query term overlap
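The data behind a TileBar can be sketched as a small grid: one row per query term set, one column per document segment, with each cell holding the hit count for that segment. This sketch assumes fixed-length segments for simplicity; the segmentation rule, the text "shades", and the names `tilebar` and `render` are illustrative assumptions only.

```python
def tilebar(tokens, term_sets, segment_len=5):
    """Return one row per term set; each cell counts hits within one segment."""
    segments = [tokens[i:i + segment_len]
                for i in range(0, len(tokens), segment_len)]
    return [[sum(tok in terms for tok in seg) for seg in segments]
            for terms in term_sets]

def render(grid, shades=" .:#"):
    """Draw the grid as text, using darker characters for higher counts."""
    return ["".join(shades[min(c, len(shades) - 1)] for c in row)
            for row in grid]

doc = ("osteoporosis research shows bone loss prevention works "
       "while other research on cure is ongoing").split()
grid = tilebar(doc, [{"osteoporosis", "bone"}, {"prevention", "cure"}])
print(grid)
print(render(grid))
```

The number of columns reflects relative document length, cell darkness reflects term frequency, the pattern along a row shows distribution, and columns where several rows are dark show overlap.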

Exploiting Visual Properties

• Variation in gray scale saturation imposes a universal, perceptual order (Bertin et al. ‘83)

• Varying shades of gray show varying quantities better than color (Tufte ‘83)

• Differences in shading should align with the values being presented (Kosslyn et al. ‘83)

Key Aspect: Faceted Queries

• Conjunct of disjuncts
• Each disjunct is a concept
  – osteoporosis, bone loss
  – prevention, cure
  – research, Mayo clinic, study
• User does not have to specify which are main topics, which are subtopics
• Ranking algorithm gives higher weight to overlap of topics
  – This kind of query works better at high-precision queries than similarity search (Hearst 95)
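A faceted query of this shape can be sketched as a list of term sets, where a document scores one point per facet it touches. The scoring rule below (count of facets matched) is a simplifying assumption standing in for "higher weight to overlap of topics"; it is not the ranking algorithm from Hearst 95. Multi-word terms are reduced to single tokens to keep the example short.

```python
def facet_hits(doc_tokens, facets):
    """Number of facets (disjuncts) with at least one term present in the document."""
    tokens = set(doc_tokens)
    return sum(1 for facet in facets if tokens & facet)

def rank(docs, facets):
    """Order document ids by how many facets they cover, highest first."""
    return sorted(docs, key=lambda d: facet_hits(docs[d], facets), reverse=True)

# Each set is one disjunct (concept); the query is their conjunction.
facets = [{"osteoporosis"}, {"prevention", "cure"}, {"research", "study"}]
docs = {
    "d1": "osteoporosis prevention study".split(),
    "d2": "osteoporosis cure".split(),
    "d3": "heart research".split(),
}
print(rank(docs, facets))
```

A strict Boolean reading would keep only documents where `facet_hits` equals `len(facets)`; ranking by the count instead lets partial matches appear lower in the list.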

TileBars Summary

• Preliminary User Studies
  – users understand them
  – find them helpful in some situations, but probably slower than just reading titles
  – sometimes terms need to be disambiguated

More Recent Attempts

• Analyzing retrieval results– KartOO http://www.kartoo.com/

– Grokker http://www.groxis.com/service/grok

Query Term Subsets

Show which subsets of query terms occur in which subsets of retrieved documents
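The grouping that such displays visualize can be sketched by bucketing each retrieved document under the exact subset of query terms it contains. The sample documents and the name `group_by_term_subset` are hypothetical.

```python
from collections import defaultdict

def group_by_term_subset(docs, query_terms):
    """Map each subset of query terms to the documents containing exactly that subset."""
    groups = defaultdict(list)
    terms = frozenset(query_terms)
    for doc_id, tokens in docs.items():
        present = terms & set(tokens)
        groups[present].append(doc_id)
    return dict(groups)

query_terms = ["visualization", "text", "retrieval"]
docs = {
    "d1": "text retrieval systems".split(),
    "d2": "text visualization for retrieval".split(),
    "d3": "visualization of text retrieval results".split(),
    "d4": "image retrieval".split(),
}
groups = group_by_term_subset(docs, query_terms)
for subset, ids in groups.items():
    print(sorted(subset), ids)
```

With n query terms there are up to 2^n buckets, so the number of groups to display grows quickly as terms are added.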

Term Occurrences in Results Sets

Show how often each query term occurs in retrieved documents
  – VIBE (Korfhage ‘91)
  – InfoCrystal (Spoerri ‘94)
  – Problems:
    • can’t see overlap of terms within docs
    • quantities not represented graphically
    • more than 4 terms hard to handle
    • no help in selecting terms to begin with

InfoCrystal (Spoerri 94)

VIBE (Olson et al. 93, Korfhage 93)

Term Occurrences in Results Sets

  – Problems:
    • can’t see overlap of terms within docs
    • quantities not represented graphically
    • more than 4 terms hard to handle
    • no help in selecting terms to begin with

DLITE (Cousins 97)

• Supporting the Information Seeking Process
  – UI to a digital library
• Direct manipulation interface
• Workcenter approach
  – experts create workcenters
  – lots of tools for one task
  – contents persistent

DLITE (Cousins 97)

• Drag and Drop interface
• Reify queries, sources, retrieval results
• Animation to keep track of activity

Slide by Shankar Raman

IR Infovis Meta-Analysis (Chen & Yu ’00)

• Goal
  – Find invariant underlying relations suggested collectively by empirical findings from many different studies
• Procedure
  – Examine the literature of empirical infoviz studies
    • 35 studies between 1991 and 2000
    • 27 focused on information retrieval tasks
    • But due to wide differences in the conduct of the studies and the reporting of statistics, could use only 6 studies

IR Infovis Meta-Analysis (Chen & Yu ’00)

• Conclusions:
  – IR Infoviz studies not reported in a standard format
  – Individual cognitive differences had the largest effect
    • Especially on accuracy
    • Somewhat on efficiency
  – Holding cognitive abilities constant, users did better with simpler visual-spatial interfaces
  – The combined effect of visualization is not statistically significant
  – Misc
    • Tilebars and Scatter/Gather are well-known enough to not require citations!!

Summary: Search and Doc Viz

• Visualization still has yet to prove its usefulness for search and documents

• Needs to integrate with more accurate dialogue systems