Marti Hearst
School of Information, UC Berkeley
Visualization in Text Analysis Problems
VAC Consortium Meeting
Stanford, May 24, 2006
Outline
Some Visualization Design Principles
Illustrated with a new example
Why Text is Tricky to Visualize
How to do good visualization design with text while meeting analysts’ needs?
Focus on Flexibility with Reproducibility
Examples from 4 different domains
What Makes for a Good Visualization?
Visually illuminates important aspects of the underlying data and domain.
Supports the users’ tasks (better than without the visualization).
Adheres to good design principles.
Example from Software Engineering
Marat Boshernitsan, UC Berkeley PhD Dissertation 2006
Problem: need to make complex changes throughout code.
Example: convert from one API to another.
A Typical Solution
Either requires programmers to understand and manipulate abstract syntax trees …
Or requires learning another programming language (or both)!
First Attempt
Second Attempt
A Better Solution
Build on how programmers think about programming. Operate on the textual representation of code.
Users Operate on Familiar Visual Representation of Code
Context-and-Domain Sensitive Visual Cues
Lessons from this Example
User-centered Design
This was the third attempt.
First 2 attempts did not accurately reflect how users think about the problem.
Careful design of labels and interaction cues
Very intelligent backend, but user-activated.
Visually and interactively reflects how programmers think about programming.
What Makes for a Good Visualization for Analysts?
Visually illuminates important aspects of the underlying data and domain.
Supports the users’ tasks (better than without the visualization).
Adheres to good design principles.
Goals vs. Tasks
Analysts’ Goals:
Understand current and past situations
Predict and anticipate future situations
Observations by Pirolli & Card ’05:
Different analysts starting with people, organizations, tasks, and time:
predict coup likelihood
understand bio-warfare threats
understand relations within cartel
Goals vs. Tasks
Analysts’ tasks:
Explore, Extract, Filter, Link, Arrange, Compare, Hypothesize
(A combination of Foraging and Sensemaking)
Should do the tasks only to support the goals.
Design Principles for Analysts
Experienced analysts notice what is missing or unexpected (Wright et al. ’06)
Thus consistency and reproducibility are important.
Design Principles for Analysts
Analysts must guard against confirmation bias. (Pirolli & Card ’05)
Thus it is important for analysts to
Be able to easily arrange and re-arrange,
View information flexibly from many angles,
While at the same time retaining consistency and reproducibility.
However … it’s hard to do this with text.
Working with Text
Text is especially difficult to visualize
Very high dimensionality: tens to hundreds of thousands of features
Compositional: can be combined together in innumerable ways
Abstract: and so difficult to visualize
Not pre-attentive: must foveate to read
Subtle: small differences matter
Unordered
Text Meaning is NOT pre-attentive
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
Why Text is Tough
Abstract concepts are difficult to visualize
Combinations of abstract concepts are even more difficult to visualize
time
shades of meaning
social and psychological concepts
causal relationships
Why Text is Tough
The dog.
The dog cavorts.
The dog cavorted.
Why Text is Tough
The man.
The man walks.
Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
Why Text is Tough
As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls.
How do we visualize this?
Why Text is Tough
Language only hints at meaning
Most meaning of text lies within our minds and common understanding
“How much is that doggy in the window?”
how much: social system of barter and trade (not the size of the dog)
“doggy” implies childlike, plaintive, probably cannot do the purchasing on their own
“in the window” implies behind a store window, not really inside a window, requires notion of window shopping
Why Text is Tough
General categories have no standard ordering (nominal data)
Categorization of documents by single topics misses important distinctions
Consider an article about
NAFTA
The effects of NAFTA on truck manufacture
The effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez
Why Text is Tough
Other issues about language
Ambiguous (many different meanings for the same words and phrases)
Same meaning implied by different combinations
Different combinations imply different meanings
Why Text is (Deceptively) Easy
Text is easier when you have a lot of it
Web search is now usually conjunction
Text has a lot of redundancy
A very simple algorithm can:
Pull out “important” phrases
Find “meaningfully” related words
Create a “summary” from document
Group “related” documents
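As one illustration of how little machinery "a very simple algorithm" needs, the sketch below scores words with TF-IDF (term frequency times inverse document frequency) and returns the top-scoring terms per document. The slides do not name a specific algorithm; TF-IDF is swapped in here as an assumption, and the function names and the three sample sentences are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


# Hypothetical mini-collection, invented for illustration.
docs = [
    "the dog chased the ball in the park",
    "the man walked the dog in the park",
    "the court ruled on the trade treaty",
]
for doc, terms in zip(docs, top_terms(docs, k=2)):
    print(terms, "<-", doc)
```

Words that occur in every document (here "the") score zero and drop out, which is the redundancy effect the slide points to. The scare quotes on the slide still apply: in the third sentence a function word such as "on" ties with the content words, so "important" is only a statistical notion.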
Simple Text Analysis can Mislead
Most frequent words
Biases towards concepts with unique identifiers.
From Spink, Wolfram, Jansen, Saracevic, JASIS ’01
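A small sketch of why most-frequent-word counts can mislead: a concept with a single unique name piles all its occurrences onto one word, while a concept expressed through many different words has its count split. The query log and the word-to-concept mapping below are invented for illustration and are not taken from the cited study.

```python
from collections import Counter

# Hypothetical query log, invented for illustration only.
queries = [
    "pokemon", "pokemon cards", "pokemon games",
    "cheap flights", "airline tickets", "airfare deals", "plane fares",
]

# Hypothetical hand-built mapping from surface words to concepts.
concept_of = {
    "pokemon": "pokemon",
    "flights": "air travel", "airline": "air travel",
    "airfare": "air travel", "plane": "air travel",
}

# Raw word frequency across all queries.
word_counts = Counter(word for q in queries for word in q.split())

# Concept frequency: each concept is counted at most once per query.
concept_counts = Counter(
    concept
    for q in queries
    for concept in {concept_of[w] for w in q.split() if w in concept_of}
)

print(word_counts.most_common(1))
print(concept_counts.most_common(1))
```

By raw word count the uniquely named concept comes out on top, yet more of the queries are about air travel; that concept only surfaces once its varied vocabulary is grouped.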
Major Trends vs. Minor Discoveries
With text, it’s easy to extract and show the largest, main trends
But often we want the rare but unexpected and important event:
Russian oil company example
Schwarzenegger and Enron
Cigarettes and kids
Person on the periphery who is working stealthily to influence things
This is really difficult to solve!
Design Principles for Analysts
Experienced analysts notice what is missing or unexpected.
Analysts must guard against confirmation bias.
Need to be able to easily arrange and re-arrange,
View information flexibly from many angles,
While at the same time retaining consistency and reproducibility.
Interfaces should reflect the domain and data.
How to achieve this with text collections?
Must transform text in understandable ways
Must provide multiple, consistent views that nevertheless allow for new discovery and insight
Why Emphasize Flexibility?
Can’t view representations of all the text content at once.
Instead, need ways to flexibly navigate, group, organize, explore
See important pieces over time.
The Importance of Flexibility
Russell, Slaney, Qu, Houston ’05
The ease of viewing and manipulation in the system strongly influenced the kind of analysis operations done.
Examples of Flexibility on Text Data
PaperLens (Conference proceedings)
TAKMI (Customer service requests)
Faceted Browsing (e-commerce)
Flamenco
eBay Express
FaThumb
TRIST and Sandbox (Analysts)
Flexible views
InfoVis 2004 contest
Visualize 8 years of conference proceedings
Tasks:
1. Static Overview of 10 years of InfoVis
2. Characterize the research areas and their evolution
3. The people in InfoVis
4. Which papers/authors are most often referenced?
5. How many papers conducted a user study?
PaperLens integrated solution by Lee, Czerwinski, Robertson, Bederson
Uses graphical elements and brushing and linking to flexibly elucidate a collection’s contents.
http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml
Flexibility in Foraging and Analysis
TAKMI, by Nasukawa and Nagano, ’01
The system integrates:
Analysis tasks (customer service help)
Content analysis
Information Visualization
Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001
Documents containing “windows 98”
Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001
Patent documents containing “inkjet”, organized by entity and year
Flexibility in Category Navigation
Browsing Information Collections using (Hierarchical) Faceted Metadata
What are facets?
Sets of categories, each of which describes a different aspect of the objects in the collection.
Each of these can be hierarchical.
(Not necessarily mutually exclusive nor exhaustive, but often that is a goal.)
Time/Date
Topic
GeoRegion
Facet example: Recipes
Course: Main Course
Cooking Method: Stir-fry
Cuisine: Thai
Ingredient: Red Bell Pepper, Curry, Chicken
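A minimal sketch of how faceted metadata supports flexible navigation, assuming each item simply stores a set of category values per facet. The recipe records, the `refine` and `facet_counts` helpers, and every value beyond those listed in the example above are hypothetical; this is not the Flamenco implementation.

```python
from collections import Counter

# Hypothetical recipe records; each facet maps to a set of category values.
recipes = [
    {"name": "Thai chicken curry",
     "Course": {"Main Course"}, "Cooking Method": {"Stir-fry"},
     "Cuisine": {"Thai"},
     "Ingredient": {"Red Bell Pepper", "Curry", "Chicken"}},
    {"name": "Green papaya salad",
     "Course": {"Salad"}, "Cooking Method": {"Raw"},
     "Cuisine": {"Thai"}, "Ingredient": {"Papaya", "Lime"}},
    {"name": "Chicken cacciatore",
     "Course": {"Main Course"}, "Cooking Method": {"Braise"},
     "Cuisine": {"Italian"}, "Ingredient": {"Chicken", "Tomato"}},
]


def refine(items, selections):
    """Keep items that carry every selected (facet, value) pair."""
    return [
        item for item in items
        if all(value in item[facet] for facet, value in selections)
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(value for item in items for value in item[facet])


# Select along one facet, then preview the counts along another.
selected = [("Cuisine", "Thai")]
matches = refine(recipes, selected)
print([r["name"] for r in matches])
print(facet_counts(matches, "Course"))
```

Because selections are an unordered list of (facet, value) pairs, the user can start from any facet and add or drop constraints in any order. The per-value counts are the kind of preview that lets an interface show what each next click would lead to.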
Nobel Prize Winners Collection
New Site: eBay Express
Is This Visualization?
Prior experience and other people’s attempts seem to suggest that fewer graphics and more text is better.
Details of layout, font and color contrast, label selection, and interaction make all the difference.
Earlier Variation on the Idea
Cat-a-Cone, 1997
Mobile Variation
FaThumb: Karlson, Robertson, Robbins, Czerwinski, Smith ’06
Well-received, but visualization part not looked at.
Flexibility in SenseMaking
DLITE by Cousins et al. ’97
Sandbox by Wright et al. ’06
Query History
Entities
Dimensions
TRIST (The Rapid Information Scanning Tool) is the work space for Information Retrieval and Information Triage.
Launch Queries
Annotated Document Browser
Comparative Analysis of Answers and Content
User Defined and Automatic Categorization
Rapid Scanning with Context
Linked Multi-Dimensional Views Speed Scanning
Flexibility in Sensemaking
TRIST, Jonkers et al. ’05
Flexibility for Sensemaking Support
Quick Emphasis of Items of Importance.
Sandbox, Wright et al. ’06
Direct interaction with Gestures (no dialog, no controls).
Dynamic Analytical Models.
Assertions with Proving/Disproving Gates.
Communication-Centric Text
Email, conversations, blogs
The first thought is usually nodes and links
Doesn’t have the desired flexibility
Some alternatives:
The Network
Multivariate Networks
Re-envisioning Networks
Viewing people’s shared workplaces, hometowns, schools over time.
www.theyrule.net
Re-envisioning Networks
First cut: Hastings, Snow, and King ’05
Re-envisioning Networks
Better version: Hastings, Snow, and King ’05
Re-envisioning Networks
Wattenberg ’06
OLAP on directed labeled graphs
Network Flexibility
Martin Wattenberg, “Visual Exploration of Multivariate Graphs”
M F
Location A
Location B
Location C
Location D
Location E
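The OLAP-style idea can be sketched as a roll-up: nodes that share the same attribute values are merged into one aggregate node, and the edges between them are summed. The sketch below is an assumption-laden illustration, not Wattenberg's implementation; the people, the edges, and the `roll_up` helper are hypothetical, and only the gender (M/F) and location attributes echo the figure labels.

```python
from collections import Counter

# Hypothetical people with two categorical attributes each.
people = {
    "ann": {"gender": "F", "location": "A"},
    "bob": {"gender": "M", "location": "A"},
    "cat": {"gender": "F", "location": "B"},
    "dan": {"gender": "M", "location": "B"},
}

# Hypothetical directed edges, e.g. "sender emailed recipient".
edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
    ("cat", "dan"), ("dan", "ann"),
]


def roll_up(nodes, links, dims):
    """Aggregate a graph by the attribute values named in dims."""
    def key(n):
        return tuple(nodes[n][d] for d in dims)

    # Size of each aggregate node = number of original nodes merged into it.
    node_sizes = Counter(key(n) for n in nodes)
    # Weight of each aggregate edge = number of original edges it absorbs.
    edge_weights = Counter((key(src), key(dst)) for src, dst in links)
    return node_sizes, edge_weights


sizes, weights = roll_up(people, edges, ["gender"])
print(sizes)
print(weights)

# The same data viewed from another angle, or along both dimensions at once.
print(roll_up(people, edges, ["location"])[1])
print(roll_up(people, edges, ["gender", "location"])[0])
```

Changing the `dims` list re-arranges the same network along a different attribute without touching the underlying data, which is the combination of flexibility and reproducibility the talk argues for.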
Re-envisioning Networks
Idea: vary these ideas to apply to email and other communication text.
Summary: Text Viz Design Guidelines
An emphasis on flexible views on text data
Emphasize brushing and linking using appropriate visual cues.
Interaction flow should guide the user but also be flexible.
Information structure should be consistent and reproducible.
Other guidelines:
Make text visible.
Visual components should reflect the data and tasks.
Thank you!
www.sims.berkeley.edu/~hearst