Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz.

Page 1:

Metro Maps of

Dafna Shahaf, Carlos Guestrin, Eric Horvitz

Page 2:

“The abundance of books is a distraction”

Lucius Annaeus Seneca, 4 BC – 65 AD

Page 3:

… and it does not get any better

• 129,864,880 Books (Google estimate)
• Research:
– PubMed: 19 million papers (One paper added per minute!)
– Scopus: 40 million papers

Page 4:

[Figure labels: Papers; Innovative Papers]

Page 5:

So, you want to understand a research topic…

Now what?

Page 6:

Search Engines are Great

• But do not show how it all fits together

Page 7:

Timeline Systems

Page 8:

Research is not Linear

Page 9:

Metro Map

• A map is a set of lines of articles
• Each line follows a coherent narrative thread
• Temporal Dynamics + Structure

[Figure: example metro map; labels: austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

Page 10:

Map Definition

• A map M is a pair (G, P) where
– G = (V, E) is a directed graph
– P is a set of paths in G (metro lines)
– Each e ∈ E must belong to at least one metro line

[Figure: example metro map; labels: austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
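The definition can be checked mechanically. The following is a minimal sketch, assuming vertices are plain strings; the class name `MetroMap`, its fields, and the method `is_valid` are hypothetical names chosen for illustration, and the toy edges are invented (only the vertex labels are borrowed from the figure).

```python
from dataclasses import dataclass


@dataclass
class MetroMap:
    """A map M = (G, P): a directed graph plus a set of paths (metro lines)."""
    vertices: set
    edges: set    # set of (u, v) pairs
    lines: list   # each line is a list of vertices, in order

    def line_edges(self):
        """All consecutive (u, v) pairs that appear on some metro line."""
        return {(a, b) for line in self.lines for a, b in zip(line, line[1:])}

    def is_valid(self):
        """Check both conditions of the definition."""
        on_lines = self.line_edges()
        paths_in_graph = on_lines <= self.edges   # every line is a path in G
        edges_covered = self.edges <= on_lines    # every edge lies on some line
        return paths_in_graph and edges_covered


# Toy map: two lines that share the stop "austerity" (edges are illustrative).
m = MetroMap(
    vertices={"austerity", "bailout", "protests", "strike"},
    edges={("austerity", "bailout"), ("austerity", "protests"), ("protests", "strike")},
    lines=[["austerity", "bailout"], ["austerity", "protests", "strike"]],
)
print(m.is_valid())
```

Adding an edge to `edges` that no line uses makes `is_valid()` return `False`, which is the third condition of the definition.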

Page 11:

Game Plan

Objective, Algorithm, Does it work?

Page 12:

Properties of a Good Map

1. Coherence

???

Page 13:

Coherence: Main Idea

Connecting the Dots [S, Guestrin, KDD’10]

Coherence is not a property of local interactions:

[Figure: chain of articles 1 to 5; words: Greece, Europe, Italy, Republican, Protest, Debt default]

Incoherent: Each pair shares different words

Page 14:

Coherence: Main Idea

Connecting the Dots [S, Guestrin, KDD’10]

A more-coherent chain:

[Figure: chain of articles 1 to 5; words: Greece, Austerity, Italy, Republican, Protest, Debt default]

Coherent: a small number of words captures the story
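The contrast between the two chains can be sketched in code. This is a simplified illustration, not the formulation from Connecting the Dots: it assumes each article is a set of words, scores a chain by its weakest transition, and, in the global variant, only counts words from a small chosen set of size `k`. The function names and the toy articles are hypothetical.

```python
from itertools import combinations


def local_score(chain):
    """Weakest link when each adjacent pair may share any words (local view)."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def global_score(chain, k=2):
    """Weakest link when the whole chain must be explained by only k words."""
    vocab = sorted(set().union(*chain))
    best = 0
    for words in combinations(vocab, k):
        chosen = set(words)
        weakest = min(len(a & b & chosen) for a, b in zip(chain, chain[1:]))
        best = max(best, weakest)
    return best


# Each adjacent pair shares a different word: fine locally, incoherent globally.
drifting = [
    {"greece", "europe"},
    {"europe", "italy"},
    {"italy", "protest"},
    {"protest", "republican"},
]
# The same two words run through the whole chain.
focused = [
    {"greece", "debt", "default"},
    {"greece", "debt", "austerity"},
    {"greece", "debt", "protest"},
]

print(local_score(drifting), global_score(drifting))
print(local_score(focused), global_score(focused))
```

The drifting chain passes the local test but scores zero once only two words are allowed to carry the story, which is the point of the two slides.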

Page 15:

Words are too Simple

[Figure: chain of papers 1 to 3 (Sensor networks, Bayesian networks, Social networks); words: Probability, Network, Cost]

Page 16:

Using the Citation Graph

• Create a graph per word
– All papers mentioning the word
– Edge weight = strength of influence [El-Arini, Guestrin KDD’11]

[Figure: citation graph over papers 1 to 9 for the word "Network"]

Where did paper 8 get the idea?

Do papers 8 and 9 mean the same thing?
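A rough sketch of the per-word graph construction follows. It assumes papers are given as sets of words and citations as (citing, cited) pairs; the `weight` callback is a placeholder for the influence measure of El-Arini and Guestrin, which is not reproduced here. All names are hypothetical.

```python
from collections import defaultdict


def per_word_graphs(papers, citations, weight):
    """Build one weighted citation subgraph per word.

    papers:    dict paper_id -> set of words the paper mentions
    citations: iterable of (citing, cited) pairs
    weight:    function (citing, cited, word) -> influence strength (assumed given)
    """
    graphs = defaultdict(dict)  # word -> {(citing, cited): weight}
    for citing, cited in citations:
        # Keep the edge only in the graphs of words that both papers mention.
        for word in papers[citing] & papers[cited]:
            graphs[word][(citing, cited)] = weight(citing, cited, word)
    return dict(graphs)


papers = {
    1: {"network", "sensor"},
    2: {"network", "bayesian"},
    3: {"network", "sensor"},
}
citations = [(3, 1), (2, 1)]
# Placeholder weight: every edge counts equally.
graphs = per_word_graphs(papers, citations, lambda citing, cited, word: 1.0)
print(sorted(graphs))
```

In the "network" graph, papers 2 and 3 both trace back to paper 1, while the "sensor" graph contains only the edge from paper 3, so a chain can be tested for whether a shared word was actually inherited along citations.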

Page 17:

Words are too Simple

[Figure: chain of papers 1 to 3 (Sensor networks, Bayesian networks, Social networks); words: Probability, Network, Cost]

Incoherent

Page 18:

Properties of a Good Map

1. Coherence

Is it enough?

Page 19:

Max-coherence Map

Query: Reinforcement Learning

Page 20:

Properties of a Good Map

1. Coherence

2. Coverage

Should cover diverse topics important to the user

Page 21:

Coverage: What to Cover?

• Perhaps words?
• Not enough:
– SVM in Oracle Database 10g, Milenova et al., VLDB '05
– Support Vector Machines in Relational Databases, Ruping, SVM '02

[Figure: the two papers, labeled 1 and 2]

Page 22:

Similar Content

1 2

Page 23:

Different Impact

Citing Venues and Authors:

Affected more authors/venues

Very little intersection

1 2

Page 24:

What to Cover?

• Instead of words…
• Cover papers
• A paper covers papers that it had an impact on
• High-coverage map: impact on a lot of the corpus
• Why descendants?
• Soft notion: [0,1]

Page 25:

p has High Impact on q if…

Many paths (especially short)

[Figure: papers p, q, and r connected by citations; example citation contexts: "Note that our protocol is different from previous work…" and "We use the algorithm of…"; label: coherent]

Formalize with coherent random walks
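The "many paths, especially short" intuition can be illustrated with an ordinary random walk. This sketch is an assumption-laden simplification: it uses a plain walk with uniform transitions and a continuation probability, not the coherent random walks of the talk, and the names `impact`, `cites`, `continue_prob`, and `max_steps` are hypothetical.

```python
from functools import lru_cache


def impact(cites, p, q, continue_prob=0.8, max_steps=6):
    """Probability that a walk starting at q and following citations reaches p.

    cites: dict paper -> list of papers it cites.
    The result lies in [0, 1]; more paths and shorter paths both raise it.
    """

    @lru_cache(maxsize=None)
    def reach(node, steps):
        if node == p:
            return 1.0
        targets = cites.get(node, [])
        if steps == 0 or not targets:
            return 0.0
        # Continue with probability continue_prob, then pick a citation uniformly.
        return continue_prob * sum(reach(t, steps - 1) for t in targets) / len(targets)

    return reach(q, max_steps)


# q cites p directly and also through r.
cites = {"q": ["r", "p"], "r": ["p"], "p": []}
print(impact(cites, "p", "q"))
```

Here the direct citation contributes more than the two-step path through r, and a paper with no route to p gets impact 0.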

Page 26:

Map Coverage

• Documents cover pieces of the corpus:

[Figure: Corpus Coverage]
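One way to turn per-paper soft coverage into a map-level score is sketched below. The combination rule, where a corpus paper is missed only if every map paper misses it, is an assumption made for illustration; the slides only state that coverage is a soft notion in [0,1]. The names `map_coverage`, `cover`, and `table` are hypothetical.

```python
def map_coverage(map_papers, corpus, cover):
    """Soft coverage of a corpus by the papers on a map.

    cover(p, q) in [0, 1] says how much map paper p covers corpus paper q.
    A corpus paper counts as missed only if every map paper misses it.
    """
    total = 0.0
    for q in corpus:
        missed = 1.0
        for p in map_papers:
            missed *= 1.0 - cover(p, q)
        total += 1.0 - missed
    return total


# Toy coverage values (invented for illustration).
table = {("x", "a"): 0.5, ("y", "a"): 0.5, ("y", "b"): 1.0}
cover = lambda p, q: table.get((p, q), 0.0)
corpus = ["a", "b", "c"]

print(map_coverage(["x"], corpus, cover))
print(map_coverage(["y"], corpus, cover))
print(map_coverage(["x", "y"], corpus, cover))
```

Under this rule, adding paper x to an empty map gains more than adding it to a map that already contains y, so redundant papers are rewarded less. This diminishing-returns behavior is what makes a coverage function of this shape submodular.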

Page 27:

High-coverage, Coherent Map

Page 28:

Properties of a Good Map

1. Coherence

2. Coverage

3. Connectivity

Page 29:

Definition: Connectivity

• Experimented with formulations
• Users do not care about connection type
• Encourage connections between pairs of lines

Page 30: Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz.

Lines with No Intersection

Solution: Reward lines that had impact on each other

[Figure: two lines with no shared stop; labels: Perceptrons, SVM, Optimizing Kernels for SVM; Face Detection, SVM for Facial Recognition]
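A minimal sketch of such a connectivity score, assuming an `impact(p, q)` function with values in [0,1] is available. Counting connected pairs of lines and the `threshold` parameter are illustrative choices, not the formulation used in the talk; the grouping of the figure's paper titles into two lines is also an assumption.

```python
from itertools import combinations


def connectivity(lines, impact, threshold=0.5):
    """Count pairs of lines that intersect or had impact on each other."""
    connected = 0
    for a, b in combinations(lines, 2):
        shares_stop = bool(set(a) & set(b))
        influenced = any(
            max(impact(p, q), impact(q, p)) >= threshold for p in a for q in b
        )
        if shares_stop or influenced:
            connected += 1
    return connected


lines = [
    ["Perceptrons", "SVM", "Optimizing Kernels for SVM"],
    ["Face Detection", "SVM for Facial Recognition"],
]
# Toy impact values: only SVM influenced SVM for Facial Recognition.
toy_impact = lambda p, q: 0.9 if (p, q) == ("SVM", "SVM for Facial Recognition") else 0.0

print(connectivity(lines, toy_impact))
```

The two lines share no stop, yet the pair still counts as connected because one paper on the first line influenced a paper on the second.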

Page 31:

Tying it all Together: Map Objective

• Coherence
– Either coherent or not: Constraint
• Coverage
– Must have!
• Connectivity
– Nice to have

Consider all coherent maps with maximum possible coverage.

Find the most connected one.
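The priority order of the three properties can be written as a small selection routine. This is only a sketch of the objective over an explicit list of candidate maps, which would be far too large to enumerate in practice; the function and parameter names are hypothetical.

```python
def best_map(candidates, is_coherent, coverage, connectivity, tol=1e-9):
    """Pick the most connected map among coherent maps of maximal coverage."""
    coherent = [m for m in candidates if is_coherent(m)]    # hard constraint
    if not coherent:
        return None
    top = max(coverage(m) for m in coherent)                # must have
    finalists = [m for m in coherent if coverage(m) >= top - tol]
    return max(finalists, key=connectivity)                 # nice to have


# Toy candidates with precomputed scores (invented for illustration).
candidates = [
    {"name": "A", "coherent": True, "coverage": 3.0, "connectivity": 1},
    {"name": "B", "coherent": True, "coverage": 3.0, "connectivity": 2},
    {"name": "C", "coherent": False, "coverage": 9.0, "connectivity": 5},
    {"name": "D", "coherent": True, "coverage": 2.0, "connectivity": 9},
]
chosen = best_map(
    candidates,
    is_coherent=lambda m: m["coherent"],
    coverage=lambda m: m["coverage"],
    connectivity=lambda m: m["connectivity"],
)
print(chosen["name"])
```

Map C is dropped despite its coverage because it is incoherent, and D loses to A and B on coverage before connectivity is ever consulted.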

Page 32:

Game Plan

Objective, Algorithm, Does it work?

Page 33:

Approach Overview

Documents D

1. Coherence graph G

2. Coverage function f

f( ) = ?

3. Increase Connectivity

Page 34:

Coherence Graph: Main Idea

• Vertices correspond to short coherent chains
• Directed edges between chains which can be conjoined and remain coherent

[Figure: vertices for the chains 1 2 3, 4 5 6, and 5 8 9; joined chain 1 2 3 5 8 9]
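The construction can be sketched as follows, assuming articles are numbered in chronological order, chains are tuples of article numbers, and a coherence test `is_coherent` is supplied by the caller. The ordering check and all names are assumptions for illustration, not the construction used in the talk.

```python
def coherence_graph(short_chains, is_coherent):
    """Vertices are short coherent chains; edges join chains that stay coherent."""
    edges = {c: [] for c in short_chains}
    for a in short_chains:
        for b in short_chains:
            # Assumed rule: b must start after a ends, and a + b must be coherent.
            if a != b and a[-1] < b[0] and is_coherent(a + b):
                edges[a].append(b)
    return edges


chains = [(1, 2, 3), (4, 5, 6), (5, 8, 9)]
# Toy coherence test: pretend only the chain 1 2 3 4 5 6 is incoherent.
toy_coherent = lambda chain: chain != (1, 2, 3, 4, 5, 6)

graph = coherence_graph(chains, toy_coherent)
print(graph[(1, 2, 3)])
```

With this toy test the only edge leaving the vertex 1 2 3 goes to 5 8 9, so the path through those two vertices stands for the chain 1 2 3 5 8 9.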

Page 35:

Finding High-Coverage Chains

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles

[Figure: vertices for the chains 1 2 3, 4 5 6, and 5 8 9]

Cover(1 2 3 4 5 6) > Cover(1 2 3 5 8 9) ?

Page 36:

Reformulation

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles
• Submodular orienteering
– [Chekuri and Pal, 2005]
– Quasipolynomial time recursive greedy
– O(log OPT) approximation

Orienteering: the objective is a function of the nodes visited
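To make the problem statement concrete, here is a brute-force sketch for tiny graphs. It is not the recursive greedy algorithm of Chekuri and Pal: exhaustive search is swapped in purely for illustration and takes exponential time. It assumes K counts the vertices on the path and that `value` depends only on the vertices visited; all names and the toy topics are hypothetical.

```python
def best_path(edges, k, value):
    """Exhaustively find the path with k vertices that maximizes value(path)."""
    best, best_val = None, float("-inf")

    def extend(path):
        nonlocal best, best_val
        if len(path) == k:
            v = value(path)
            if v > best_val:
                best, best_val = list(path), v
            return
        for nxt in edges.get(path[-1], []):
            if nxt not in path:
                extend(path + [nxt])

    for start in edges:
        extend([start])
    return best, best_val


# Coherence graph over short chains (toy data).
edges = {(1, 2, 3): [(4, 5, 6), (5, 8, 9)], (4, 5, 6): [], (5, 8, 9): []}
# Toy coverage: number of distinct topics among the underlying articles.
topics = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c", 8: "d", 9: "e"}
value = lambda path: len({topics[art] for chain in path for art in chain})

print(best_path(edges, 2, value))
```

In this toy instance the path through 1 2 3 and 5 8 9 wins because its articles span five topics, while the alternative through 4 5 6 spans only three.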

Page 37:

Approach Overview: Recap

Documents D

1. Coherence graph G
– Encodes all coherent chains as graph paths

2. Coverage function f, f( ) = ?
– Submodular orienteering [Chekuri & Pal, 2005]
– Quasipoly time recursive greedy
– O(log OPT) approximation

3. Increase Connectivity

Page 38:

Example Map: Reinforcement Learning

[Figure legend, one row of words per metro line:]

• multi-agent cooperative joint team
• mdp states pomdp transition option
• control motor robot skills arm
• bandit regret dilemma exploration arm
• q-learning bound optimal rmax mdp

Page 39:

Example Map Detail: SVM

Page 40:

Game Plan

Objective, Algorithm, Does it work?

Page 41:

User Study

• Tricky!
– No double-blind, no within-subject
– Domain: understandable yet unfamiliar
– Reinforcement Learning (RL)

Page 42:

User Study

• 30 participants
• First-year grad student, Reinforcement Learning project
• Update a survey paper from 1996
• Identify research directions + relevant papers
– Google Scholar
– Map and Google Scholar
– Baselines: Map, Wikipedia

Page 43:

Results (in a nutshell)

[Figure: two charts, each comparing Google and Us; axis label: Better]

Map users find better papers, and cover more important areas

Page 44:

User Comments

Helpful

noticed directions I didn't know about

great starting point

… get a basic idea of what science is up to

why don't you draw words on edges?

Legend is confusing

hard to get an idea from paper title alone

Page 45:

Conclusions

• Formulated metrics characterizing good maps for the scientific domain
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization

Thank you!