
(Tacitly) Collaborative Question Answering Utilizing Web Trails

Tomek Strzalkowski & Sharon G. Small
ILS Institute, SUNY Albany
LAANCOR
LREC QA workshop, May 22, 2010

Collaboration
- Working together
  - Efficiency, sharing vs. groupthink
- Tacit collaboration
  - Professional analysts → COLLANE system
- Information sharing
  - Why and when
  - Collaborative filtering
  - Sharing insight and experience

Outline
- Introduction
- Collaborative Knowledge Layer
- Web Trails
- Exploratory Episodes
- Experiments
  - Data Collection
  - Results
- Collaborative Sharing
- Conclusions
- Future Research

Sharing on the Internet?
- Internet users leave behind trails of their work
  - What they asked
  - What links they tried
  - What worked and what didn't
- Capture this Exploratory Knowledge
- Utilize this knowledge for subsequent users
- Tacitly enables collaborative question answering
- Improved efficiency and accuracy

Collaborative Knowledge Layer
- Captures exploration paths (Web Trails)
- Supplies meaning to the underlying data
  - May clarify/alter originally intended meaning
- Hypothesis: the CKL may be utilized to
  - Improve interactive QA
  - Support tacit collaboration
- Current experiments
  - Capturing web exploration trails
  - Computing degree of trail overlap

Collaborative Space


Web Trails
- Consist of individual exploratory moves
  - Entering a search query
  - Typing text into an input box
  - Responses from the browser
  - Offers accepted or ignored
  - Files saved
  - Items viewed
  - Links clicked through, etc.
  - Returns to the search box
- Contain optimal paths leading to specific outcomes
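The moves above can be sketched as a typed event log. A minimal sketch in Python; the `Move`/`WebTrail` names and the `kind` vocabulary are illustrative assumptions, not the system's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema: the talk lists the kinds of moves captured,
# but does not define a data model, so these names are assumptions.
@dataclass
class Move:
    kind: str    # e.g. "query", "click", "save", "view", "return"
    detail: str  # query text, URL, document id, ...

@dataclass
class WebTrail:
    user: str
    moves: List[Move] = field(default_factory=list)

    def record(self, kind: str, detail: str) -> None:
        self.moves.append(Move(kind, detail))

trail = WebTrail(user="analyst-1")
trail.record("query", "artificial reefs environmental impact")
trail.record("click", "doc-D3")
trail.record("save", "doc-D3")
print(len(trail.moves))  # 3
```

A trail recorded this way is just an ordered sequence of moves, which is what the episode-overlap comparisons later in the talk operate on.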

Exploratory Episodes
- Discovered overlapping subsequences of web trails
- Common portions of exploratory web trails from multiple network users
- May begin with a single-user web trail
- Shared with new users who appear to be pursuing a compatible task

[figure: web trails drawn as paths through nodes A, B, C, D, E, F, G, K, M, Q, T]
The A-B-Q-D-G Exploratory Episode helps a new user get from M to G.
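A shared episode such as A-B-Q-D-G can be extracted as a common subsequence of two users' trails. The sketch below uses the classic longest-common-subsequence DP; the algorithm choice is ours for illustration, as the talk does not specify one:

```python
def common_episode(trail_a, trail_b):
    """Longest common subsequence of two move sequences (classic DP)."""
    m, n = len(trail_a), len(trail_b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if trail_a[i] == trail_b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack to recover the shared episode itself
    episode, i, j = [], m, n
    while i > 0 and j > 0:
        if trail_a[i - 1] == trail_b[j - 1]:
            episode.append(trail_a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return episode[::-1]

# Node labels as in the figure: two trails sharing the A-B-Q-D-G episode
print(common_episode(list("ABQDEG"), list("ABFQDG")))  # ['A', 'B', 'Q', 'D', 'G']
```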

Experiment
- Evaluate degree of web trail overlap
- 11 research problems defined
- Generated 100 short queries for each research problem description
- Used Google to retrieve the top 500 results from each query
  - ~500 MB per topic
- Filtered for duplicates, commercial, offensive topics, etc.
- 2 GB corpus of web-mined text

Experiment setup
- 4-6 analysts per research topic
- 2.5 hours per topic
- Utilized two fully functional QA systems
  - HITIQA – analytical QA system developed under the AQUAINT program at SUNY Albany
  - COLLANE – collaborative extension of the HITIQA system developed under the CASE program at SUNY Albany
- Analyst's objective: find sufficient information for a 3-page report on the assigned topic

Example topic: artificial reefs


Many countries are creating artificial reefs near their shores to foster sea life. In Florida a reef made of old tires caused a serious environmental problem. Please write a report on artificial reefs and their effects. Give some reasons as to why artificial reefs are created. Identify those built in the United States and around the world. Describe the types of artificial reefs created, the materials used and the sizes of the structures. Identify the types of man-made reefs that have been successful (success defined as an increase in sea life without negative environmental consequences). Identify those types that have been disasters. Explain the impact an artificial reef has on the environment and ecology. Discuss the EPA’s (Environmental Protection Agency) policy on artificial reefs. Include in your report any additional related information about this topic.

What is COLLANE?
- An analytic tool
- Exploits the strength of collaborative work
- Collaborative environment
  - Analysts work in teams
  - Synchronously and asynchronously
  - Information sharing on an as-needed basis

Collaborating via COLLANE
- A team of users works on a task
- Each user has their own working space
- A combined Answer Space is created
  - Made out of individual contributions
- Users interact with the system
  - Via question answering and visual interfaces
- The system observes and facilitates tacit collaboration
  - Shares relevant information found by others
- Users interact with each other in open collaboration
  - Exchange tips and data items via a chat facility

COLLANE/HITIQA user interface


Key Tracked Events
- Questions asked
- Data items copied
- Data items ignored
- System offers accepted/rejected
- Displaying text
- Words searched in the user interface
- All dialogue between user and system
- Bringing up full document source
- Passages viewed
- Time spent

Experimental Results
- Aligned episodes on common data items
  - Only considered user copies as the indicator
- Used document-level overlap
  - Ignored potential content overlap between different documents
  - Lower bound on episode overlap
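Document-level overlap on copied items reduces to a set comparison. A minimal sketch, assuming the overlap is measured against the smaller set (the experiment's exact formula is not stated in the talk):

```python
def episode_overlap(copied_a, copied_b):
    """Fraction of copied documents shared between two episodes.
    Measured against the smaller set, so a short episode fully
    contained in a longer one scores 1.0 (an assumed convention)."""
    if not copied_a or not copied_b:
        return 0.0
    return len(copied_a & copied_b) / min(len(copied_a), len(copied_b))

# Hypothetical copied-document sets for two analysts
a = {"D1", "D2", "D3", "D4"}
b = {"D2", "D3", "D4", "D5", "D6"}
print(episode_overlap(a, b))  # 0.75
```

Because only identical document ids match, content repeated across different documents is invisible to this measure, which is why it gives a lower bound on the true episode overlap.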

Artificial Reefs Example
[chart: A-G & E-H, 60-75% overlap]

Experimental Results
- 95 Exploratory Episodes
- EEs grouped by the degree of overlap
  - 60% or higher → may be shared?
  - 40% or lower → divergent?
- Find an overlap threshold
  - Maximize information sharing
  - Minimize rejection
- Some topics appear more suitable for information sharing and tacit collaboration

At a 50% episode overlap threshold, more than half of all episodes are candidates for sharing.
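The threshold choice can be explored with a simple sweep over episode overlap scores; the scores below are made-up for illustration and are not the experiment's data:

```python
def share_candidates(overlaps, threshold):
    """Count episodes whose overlap score meets the sharing threshold."""
    return sum(1 for o in overlaps if o >= threshold)

# Hypothetical overlap scores, for illustration only
overlaps = [0.15, 0.30, 0.45, 0.55, 0.60, 0.70, 0.75, 0.90]
for t in (0.4, 0.5, 0.6):
    print(f"threshold {t}: {share_candidates(overlaps, t)} of {len(overlaps)} shareable")
```

Raising the threshold trades sharing opportunities for precision, which is the maximize-sharing / minimize-rejection tension described above.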

Collaborative Sharing Objective
- Leverage Exploratory Knowledge
  - Use the experience and judgment of users who faced the same or a similar problem
  - Provide superior accuracy and responsiveness to subsequent users
- Similar to relevance feedback in IR
  - Community-based rather than single-user judgment
- Example: utilize User B's trail
  - Offer D4-D7 to User D after the D3 copy
  - Avoids 2 fruitless questions (Q2 & Q4)
  - Finds an extra potentially relevant data point (D7)
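The User B / User D scenario can be sketched as prefix-triggered sharing; the trigger rule below (offer everything that followed the copied item on the prior trail) is an assumption for illustration, not the system's documented logic:

```python
def offer_after_copy(prior_trail, copied_item):
    """When a new user copies an item found on a prior user's trail,
    offer the items that followed it on that trail.
    (Illustrative trigger rule; the actual COLLANE rule is not given.)"""
    if copied_item not in prior_trail:
        return []
    idx = prior_trail.index(copied_item)
    return prior_trail[idx + 1:]

# User B's trail contained D3 followed by D4-D7; User D just copied D3
user_b_trail = ["D1", "D3", "D4", "D5", "D6", "D7"]
print(offer_after_copy(user_b_trail, "D3"))  # ['D4', 'D5', 'D6', 'D7']
```

The offered items let User D skip the questions User B needed to reach D4-D6, and surface D7, which User D might otherwise have missed.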

Conclusion
- Users searching for information in a networked environment leave behind exploratory trails that can be captured
- Exploratory Episodes can be compared for overlap by the data items copied
- Many users searching for the same or highly related information are likely to follow similar routes through the data
- When a user's trail overlaps an EE above a threshold, they may benefit from tacit information sharing

Future Research
- Evaluate overlap utilizing semantic equivalence of the data items copied
- Distill Exploratory Episodes into shareable knowledge elements
- Expand overlap metrics
  - Question similarity
  - Items ignored, etc.
- Evaluate frequency of acceptance of offered material
  - Varying thresholds