CIDR 2009: Jeff Heer Keynote

Voyagers and Voyeurs: Supporting Social Data Analysis. Jeffrey Heer, Computer Science Department, Stanford University. CIDR 2009, Monterey, CA, 5 January 2009.

description

This is a CIDR 2009 presentation. See http://infoblog.stanford.edu/ for more information and http://www-db.cs.wisc.edu/cidr/cidr2009/program.html for downloads.

Transcript of CIDR 2009: Jeff Heer Keynote

Page 1: CIDR 2009: Jeff Heer Keynote

Voyagers and Voyeurs: Supporting Social Data Analysis

Jeffrey Heer, Computer Science Department, Stanford University

CIDR 2009 – Monterey, CA, 5 January 2009

Page 2: CIDR 2009: Jeff Heer Keynote

A Tale of Two Visualizations

Page 3: CIDR 2009: Jeff Heer Keynote

vizster

Page 4: CIDR 2009: Jeff Heer Keynote

Observations

Groups spent more time in front of the visualization than individuals.

Friends encouraged each other to unearth relationships, probe community boundaries, and challenge reported information.

Social play resulted in informal analysis, often driven by story-telling of group histories.

Page 5: CIDR 2009: Jeff Heer Keynote

NameVoyager: The Baby Name Voyager

Page 6: CIDR 2009: Jeff Heer Keynote
Page 7: CIDR 2009: Jeff Heer Keynote
Page 8: CIDR 2009: Jeff Heer Keynote
Page 9: CIDR 2009: Jeff Heer Keynote
Page 10: CIDR 2009: Jeff Heer Keynote

Social Data Analysis

Visual sensemaking can be social as well as cognitive.

Analysis of data coupled with social interpretation and deliberation.

How can user interfaces catalyze and support collaborative visual analysis?

Page 11: CIDR 2009: Jeff Heer Keynote

sense.us: A Web Application for Collaborative Visualization of Demographic Data

Page 12: CIDR 2009: Jeff Heer Keynote
Page 13: CIDR 2009: Jeff Heer Keynote

Voyagers and Voyeurs

Complementary faces of analysis

Voyager – focus on visualized data

Active engagement with the data

Serendipitous comment discovery

Voyeur – focus on comment listings

Investigate others’ explorations

Find people and topics of interest

Catalyze new explorations

Page 14: CIDR 2009: Jeff Heer Keynote

Out of the Lab, Into the Wild

Page 15: CIDR 2009: Jeff Heer Keynote
Page 16: CIDR 2009: Jeff Heer Keynote
Page 17: CIDR 2009: Jeff Heer Keynote

Wikimapia.org

Page 18: CIDR 2009: Jeff Heer Keynote

DecisionSite posters

Spotfire Decision Site Posters

Page 19: CIDR 2009: Jeff Heer Keynote

Tableau Server

Page 20: CIDR 2009: Jeff Heer Keynote
Page 21: CIDR 2009: Jeff Heer Keynote

Many-Eyes

Page 22: CIDR 2009: Jeff Heer Keynote

Social Data Analysis In Action

1. Discussion and Debate

2. Text is Data, Too

3. Data Integrity and Cleaning

4. Integrating Data in Context

5. Pointing and Naming

For each, some thoughts on future directions.

I asked my colleagues: if you could give database researchers a wish list, what would it be?

Page 23: CIDR 2009: Jeff Heer Keynote

Discussion and Debate

Page 24: CIDR 2009: Jeff Heer Keynote
Page 25: CIDR 2009: Jeff Heer Keynote
Page 26: CIDR 2009: Jeff Heer Keynote
Page 27: CIDR 2009: Jeff Heer Keynote

Tableau X-Box / Quest Diag?

“Valley of Death”

Page 28: CIDR 2009: Jeff Heer Keynote
Page 29: CIDR 2009: Jeff Heer Keynote
Page 30: CIDR 2009: Jeff Heer Keynote
Page 31: CIDR 2009: Jeff Heer Keynote

Content Analysis of Comments

Feature prevalence from content analysis (min Cohen’s κ = .74)

High co-occurrence of Observations, Questions, and Hypotheses

[Figure: percentage of comments in each category, shown per service for Sense.us and Many-Eyes (x-axis: Percentage, 0 to 80). Categories: Observation, Question, Hypothesis, Data Integrity, Linking, Socializing, System Design, Testing, Tips, To-Do, Affirmation.]
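The slide reports inter-coder agreement as Cohen's kappa but does not say how it was computed. As a minimal sketch, assuming two coders each assign one label per comment, kappa compares observed agreement with the agreement expected by chance. The function name and the sample codings are hypothetical and for illustration only.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codings of six comments by two coders (illustrative data only).
coder_1 = ["observation", "question", "hypothesis",
           "observation", "question", "observation"]
coder_2 = ["observation", "question", "observation",
           "observation", "question", "hypothesis"]
kappa = cohens_kappa(coder_1, coder_2)
```

When a comment can carry several features at once, as the co-occurrence note suggests, one option is to compute kappa separately for each feature as a present/absent label.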

Page 32: CIDR 2009: Jeff Heer Keynote

Reduce the cost of synthesizing contributions

WANTED: Structured Conversation

Wikipedia: Shared Revisions

NASA ClickWorkers: Statistics

Page 33: CIDR 2009: Jeff Heer Keynote

Reduce the cost of synthesizing contributions

Can we represent data, visualizations, and social activity in a unified data model?

WANTED: Structured Conversation

Page 34: CIDR 2009: Jeff Heer Keynote

Text is Data, Too

Page 35: CIDR 2009: Jeff Heer Keynote

Visualization Popularity

Over 1/3 of Many-Eyes visualizations use free text

[Figure: popularity of visualization types, shown per service for Many-Eyes and Swivel (x-axis: Percentage, 0.0 to 0.5). Types: Tag Cloud, Bubble Graph, Word Tree, Bar Chart, Maps, Network Diagram, Treemap, Matrix Chart, Line Graph, Scatterplot, Stacked Graph, Pie Chart, Histogram.]

Page 36: CIDR 2009: Jeff Heer Keynote
Page 37: CIDR 2009: Jeff Heer Keynote

Alberto Gonzales

Page 38: CIDR 2009: Jeff Heer Keynote

WANTED: Better Tools for Text

Statistical Analysis of text (with ties to source!)

Entity Extraction

Aggregation and Comparison of texts

Get a “global” view of documents

We can do better than Tag Clouds (!?)

Use text analysis tools to enable analysis of structured conversation by the community.
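One way to read "statistical analysis of text with ties to source" is an index that keeps, for every word, the documents and positions where it occurs, so any aggregate count can be traced back to the original passages. The sketch below is an assumption about what such a tool might look like, not something described in the talk; `build_index`, `top_terms`, and the sample comments are hypothetical.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the (doc_id, character offset) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for match in re.finditer(r"[A-Za-z']+", text):
            index[match.group().lower()].append((doc_id, match.start()))
    return index

def top_terms(index, k):
    """Return the k most frequent words, ties broken alphabetically."""
    return sorted(index, key=lambda w: (-len(index[w]), w))[:k]

# Hypothetical comments, for illustration only.
comments = {
    "c1": "No cooks in 1910? There may have been cooks then.",
    "c2": "Maybe the cooks were counted under another label.",
}
index = build_index(comments)
most_common = top_terms(index, 1)
```

Because each count is backed by a list of offsets, a "global" view such as a frequency ranking can link every term back to the exact comments that produced it.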

Page 39: CIDR 2009: Jeff Heer Keynote

Data Integrity and Cleaning

Page 40: CIDR 2009: Jeff Heer Keynote

No cooks in 1910? … There may have been cooks then. But maybe not.

Page 41: CIDR 2009: Jeff Heer Keynote

The great postmaster scourge of 1910?

Or just a bug in the data?

Page 42: CIDR 2009: Jeff Heer Keynote
Page 43: CIDR 2009: Jeff Heer Keynote
Page 44: CIDR 2009: Jeff Heer Keynote

Content Analysis of Comments

16% of sense.us comments and 10% of Many-Eyes comments reference data quality or integrity.

[Figure: percentage of comments in each category, shown per service for Sense.us and Many-Eyes (x-axis: Percentage, 0 to 80). Categories: Observation, Question, Hypothesis, Data Integrity, Linking, Socializing, System Design, Testing, Tips, To-Do, Affirmation.]

Page 45: CIDR 2009: Jeff Heer Keynote

WANTED: Data Cleaning Tools

Reshape data, reformat rows & columns

Handle missing data: label, repair, interpolate

Entity resolution and de-duplication

Group related values into aggregates

Assist table lookups & data transforms

Provide tools in situ to leverage the collective

Transparency requires provenance
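As a small illustration of "label, repair, interpolate" combined with provenance, the sketch below fills interior gaps by linear interpolation and tags every value with how it was obtained, so repaired values stay distinguishable from observed ones. This is an assumed design, not a tool from the talk; the function name and the sample series are hypothetical.

```python
def interpolate_missing(series):
    """Fill interior None gaps by linear interpolation and label each value's origin."""
    known = [i for i, v in enumerate(series) if v is not None]
    cleaned = []
    for i, value in enumerate(series):
        if value is not None:
            cleaned.append((value, "observed"))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            # No known neighbor on one side: keep the gap and label it.
            cleaned.append((None, "missing"))
            continue
        fraction = (i - left) / (right - left)
        estimate = series[left] + fraction * (series[right] - series[left])
        cleaned.append((estimate, "interpolated"))
    return cleaned

# Hypothetical counts per time step; None marks a missing value.
counts = [100.0, None, 140.0, None, None, 200.0, None]
cleaned = interpolate_missing(counts)
```

Keeping the label next to each value is one simple way to make a repair transparent to later viewers of the same chart.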

Page 46: CIDR 2009: Jeff Heer Keynote

Integrating Data in Context

Page 47: CIDR 2009: Jeff Heer Keynote
Page 48: CIDR 2009: Jeff Heer Keynote
Page 49: CIDR 2009: Jeff Heer Keynote

College Drug Use

Page 50: CIDR 2009: Jeff Heer Keynote

College Drug Use

Page 51: CIDR 2009: Jeff Heer Keynote

Harry Potter is Freaking Popular

Page 52: CIDR 2009: Jeff Heer Keynote
Page 53: CIDR 2009: Jeff Heer Keynote

WANTED: In-Situ Data Integration

Search for and suggest related data or views

User input for types, schema matching, or data

Apply in context of the current task

But record mappings for future use

Record provenance: chain of data sources

Examples: Google Web Tables, Pay-As-You-Go, Stanford Vispedia, Utah VisTrails
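The wish list above can be sketched in a few lines: a user supplies a column mapping in the context of the current task, the mapping is saved for future use, and the result carries a provenance chain naming its sources. This is a hedged illustration only; `Table`, `integrate`, and the sample data are hypothetical and do not reflect the APIs of the systems named on the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    rows: list                                      # list of dicts
    provenance: list = field(default_factory=list)  # chain of source descriptions

def integrate(left, right, mapping, saved_mappings):
    """Join right onto left using one user-supplied mapping {left_col: right_col}."""
    (left_col, right_col), = mapping.items()
    lookup = {row[right_col]: row for row in right.rows}
    joined = []
    for row in left.rows:
        match = lookup.get(row[left_col])
        if match is not None:
            extra = {k: v for k, v in match.items() if k != right_col}
            joined.append({**row, **extra})
    # Record the mapping for future use, keyed by the pair of table names.
    saved_mappings[(left.name, right.name)] = dict(mapping)
    # Provenance: the sources of both inputs plus the join that combined them.
    chain = (left.provenance or [left.name]) + (right.provenance or [right.name])
    chain.append(f"join {left.name}.{left_col} = {right.name}.{right_col}")
    return Table(f"{left.name}+{right.name}", joined, chain)

# Hypothetical tables, for illustration only.
names = Table("names", [{"state": "CA", "births": 10}, {"state": "NV", "births": 3}])
regions = Table("regions", [{"code": "CA", "region": "West"}])
saved = {}
merged = integrate(names, regions, {"state": "code"}, saved)
```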

Page 54: CIDR 2009: Jeff Heer Keynote

Pointing and Naming

Page 55: CIDR 2009: Jeff Heer Keynote

“Look at that spike.”

Page 56: CIDR 2009: Jeff Heer Keynote

“Look at the spike for Turkey.”

Page 57: CIDR 2009: Jeff Heer Keynote

“Look at the spike in the middle.”

Page 58: CIDR 2009: Jeff Heer Keynote

Free-form Data-aware

Page 59: CIDR 2009: Jeff Heer Keynote

Visual Queries

Model selections as declarative queries over interface elements or underlying data

(-118.371 ≤ lon AND lon ≤ -118.164) AND (33.915 ≤ lat AND lat ≤ 34.089)
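A minimal sketch of the idea, assuming the selection is stored as a predicate over the underlying data fields rather than as a set of pixels or item identifiers. The bounds come from the query on the slide; the class name, field names, and records are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RangeQuery:
    """A rectangular selection stored as a declarative predicate over data fields."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float

    def matches(self, record):
        return (self.lon_min <= record["lon"] <= self.lon_max
                and self.lat_min <= record["lat"] <= self.lat_max)

    def select(self, records):
        return [r for r in records if self.matches(r)]

# Bounds taken from the example query on the slide.
selection = RangeQuery(-118.371, -118.164, 33.915, 34.089)

# Hypothetical records; identifiers and coordinates are illustrative only.
records = [
    {"id": "a", "lon": -118.25, "lat": 34.05},
    {"id": "b", "lon": -122.42, "lat": 37.77},
]
selected = selection.select(records)

# Data may change later; re-running the stored query keeps the selection current.
records.append({"id": "c", "lon": -118.20, "lat": 33.95})
selected_later = selection.select(records)
```

Because the selection is a query rather than a fixed list of marks, the same object can be re-evaluated against new data or against a different chart of the same fields.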

Page 60: CIDR 2009: Jeff Heer Keynote

Visual Queries

Model selections as declarative queries over interface elements or underlying data

Applicable to dynamic, time-varying data

Retarget selection across visual encodings

Support social navigation and data mining

Page 61: CIDR 2009: Jeff Heer Keynote

WANTED: Data-Aware Annotation

Meta-queries linking annotations to views

Visually specifying notification triggers

Annotating data aggregates (use lineage?)

Unified model (again!) to facilitate reference

How to make it work at scale?

How else to use machine-readable annotations?

Can annotations be used to steer data mining?

Page 62: CIDR 2009: Jeff Heer Keynote

Conclusion

Page 63: CIDR 2009: Jeff Heer Keynote

Social Data Analysis

Collective analysis of data supported by social interaction.

1. Discussion and Debate

2. Text is Data, Too

3. Data Integrity and Cleaning

4. Integrating Data in Context

5. Pointing and Naming

Page 64: CIDR 2009: Jeff Heer Keynote

Summary

As visualization becomes common on the web, opportunities for collaborative analysis abound.

Weave visualizations into the web: data access, visualization creation, view sharing and pointing.

Support discovery, discussion, and integration of contributions to leverage the collective.

Improve both processes and technologies for communication and dissemination.

Page 65: CIDR 2009: Jeff Heer Keynote

Parting Thoughts

Visualizations may have a catalytic effect on social interaction around data.

Encourage participation by minimizing or offsetting interaction costs.

Provide incentives by fostering the personal relevance of the data.

Page 66: CIDR 2009: Jeff Heer Keynote

Acknowledgements

@ Berkeley: Maneesh Agrawala, Wes Willett, danah boyd, Marti Hearst, Joe Hellerstein

@ IBM: Martin Wattenberg, Fernanda Viégas

@ PARC: Stu Card

@ Tableau: Jock Mackinlay, Chris Stolte, Christian Chabot

Page 67: CIDR 2009: Jeff Heer Keynote

Jeffrey Heer, Stanford University

http://jheer.org

Voyagers and Voyeurs: Supporting Social Data Analysis

Page 68: CIDR 2009: Jeff Heer Keynote

With a collaborative spirit, with a collaborative platform where people can upload data, explore data, compare solutions, discuss the results, build consensus, we can engage passionate people, local communities, media and this will raise - incredibly - the amount of people who can understand what is going on.

And this would have fantastic outcomes: the engagement of people, especially new generations; it would increase knowledge, unlock statistics, improve transparency and accountability of public policies, change culture, increase numeracy, and in the end, improve democracy and welfare.

Enrico Giovannini, Chief Statistician, OECD. June 2007.