Tomek Strzalkowski & Sharon G. Small ILS Institute, SUNY Albany LAANCOR May 22, 2010 (Tacitly)...
Tomek Strzalkowski & Sharon G. Small
ILS Institute, SUNY Albany
LAANCOR
May 22, 2010
(Tacitly) Collaborative Question Answering Utilizing Web Trails
5/22/10 LREC QA workshop
Collaboration
– Working together
– Efficiency, sharing vs. groupthink
Tacit collaboration
– Professional analysts → COLLANE system
Information sharing
– Why and when
– Collaborative filtering
– Sharing insight and experience
Outline
Introduction
Collaborative Knowledge Layer
Web Trails
Exploratory Episodes
Experiments
– Data Collection
– Results
Collaborative Sharing
Conclusions
Future Research
Sharing on the Internet?
Internet users leave behind trails of their work
– What they asked
– What links they tried
– What worked and what didn't
Capture this Exploratory Knowledge
Utilize this Knowledge for subsequent users
– Tacitly enables Collaborative Question Answering
– Improved Efficiency and Accuracy
Collaborative Knowledge Layer (CKL)
Captures exploration paths (Web Trails)
Supplies meaning to the underlying data
– May clarify or alter the originally intended meaning
Hypothesis: the CKL may be utilized to
– Improve interactive QA
– Support tacit collaboration
Current Experiments
– Capturing web exploration trails
– Computing degree of trail overlap
Web Trails
Consists of individual exploratory moves
– Entering a search query
– Typing text into an input box
– Responses from the browser
– Offers accepted or ignored
– Files saved
– Items viewed
– Links clicked through, etc.
– Returns to the search box
Contain optimal paths leading to specific outcomes
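A web trail, as listed above, is an ordered log of exploratory moves. The following is a minimal sketch of one way to represent such a trail. It assumes a fixed set of move types taken from the list. The names `MoveType`, `Move`, `WebTrail`, `record` and `targets` are hypothetical illustrations and are not part of COLLANE or HITIQA.

```python
from dataclasses import dataclass, field
from enum import Enum


class MoveType(Enum):
    # Move kinds taken from the slide; this enumeration is a hypothetical choice.
    QUERY = "query"                  # entering a search query
    TYPE_TEXT = "type_text"          # typing text into an input box
    BROWSER_RESPONSE = "browser_response"
    OFFER_ACCEPTED = "offer_accepted"
    OFFER_IGNORED = "offer_ignored"
    FILE_SAVED = "file_saved"
    ITEM_VIEWED = "item_viewed"
    LINK_CLICKED = "link_clicked"
    RETURN_TO_SEARCH = "return_to_search"


@dataclass
class Move:
    kind: MoveType
    target: str       # query text, URL, or data-item ID
    timestamp: float  # seconds since session start


@dataclass
class WebTrail:
    user: str
    moves: list[Move] = field(default_factory=list)

    def record(self, kind: MoveType, target: str, timestamp: float) -> None:
        """Append one exploratory move, rejecting out-of-order timestamps."""
        if self.moves and timestamp < self.moves[-1].timestamp:
            raise ValueError("moves must be recorded in time order")
        self.moves.append(Move(kind, target, timestamp))

    def targets(self, kind: MoveType) -> list[str]:
        """Targets of all moves of one kind, in trail order."""
        return [m.target for m in self.moves if m.kind is kind]


trail = WebTrail("analyst-1")
trail.record(MoveType.QUERY, "artificial reef materials", 0.0)
trail.record(MoveType.LINK_CLICKED, "doc-17", 12.5)
trail.record(MoveType.FILE_SAVED, "doc-17", 40.0)
print(trail.targets(MoveType.FILE_SAVED))  # ['doc-17']
```

Keeping every move, rather than only the saved files, means a trail can later be compared on queries, ignored offers, or copies alike.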
Exploratory Episodes
Discovered overlapping subsequences of web trails
– Common portions of exploratory web trails from multiple network users
– May begin with a single user web trail
Shared with new users who appear to be pursuing a compatible task
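The slides do not say how overlapping subsequences are discovered. This sketch substitutes the classic longest-common-subsequence (LCS) dynamic program as one plausible way to extract the common portion of two trails. The trails themselves are toy sequences of item labels, and `common_subsequence` is a hypothetical name.

```python
def common_subsequence(a: list[str], b: list[str]) -> list[str]:
    """Longest common subsequence of two trails (standard dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS.
    out: list[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Two toy trails: the second user took detours through T and K.
user_1 = ["A", "B", "Q", "D", "G"]
user_2 = ["A", "T", "B", "Q", "D", "K", "G"]
print(common_subsequence(user_1, user_2))  # ['A', 'B', 'Q', 'D', 'G']
```

LCS tolerates detours, which suits trails where users wander before converging. A contiguous-substring match would be a stricter alternative.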
[Figure: web trails through nodes A, B, C, D, E, F, G, K, M, Q, T]
A-B-Q-D-G Exploratory Episode helps new user from M to G
Experiment
Evaluate degree of web trail overlap
– 11 Research Problems Defined
– Generated 100 short queries for each research problem description
– Used Google to retrieve the top 500 results from each query
– ~500MB per topic
– Filtered out duplicates, commercial and offensive content, etc.
– 2GB Corpus of web-mined text
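With 100 queries and the top 500 results each, up to 100 × 500 = 50,000 result pages per topic enter filtering. This is a minimal sketch of the duplicate and content filtering step. It assumes exact duplicates are detected by hashing whitespace-normalized, lower-cased text. It also assumes commercial or offensive sources are dropped by a domain blocklist. `BLOCKED_DOMAINS`, `normalize` and `filter_results` are hypothetical, and the slides do not describe the actual filters.

```python
import hashlib
import re
from urllib.parse import urlparse

# Hypothetical blocklist standing in for the commercial/offensive filters.
BLOCKED_DOMAINS = {"shop.example.com", "ads.example.net"}


def normalize(text: str) -> str:
    """Collapse whitespace and lower-case so trivial variants hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_results(results: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep (url, text) pairs that are neither blocked nor exact duplicates."""
    seen: set[str] = set()
    kept: list[tuple[str, str]] = []
    for url, text in results:
        if urlparse(url).hostname in BLOCKED_DOMAINS:
            continue
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same normalized text already kept
        seen.add(digest)
        kept.append((url, text))
    return kept


results = [
    ("https://reefs.example.org/a", "Tire reefs  off Florida"),
    ("https://mirror.example.org/a", "tire reefs off florida"),
    ("https://shop.example.com/buy", "Buy reef blocks"),
]
print(len(filter_results(results)))  # 1
```

Exact hashing misses near-duplicates. Shingling or MinHash would catch those at extra cost.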
Experiment setup
4-6 Analysts per Research Topic
2.5 hours per topic
Utilized two fully functional QA Systems
– HITIQA: analytical QA system developed under the AQUAINT program at SUNY Albany
– COLLANE: collaborative extension of the HITIQA system developed under the CASE program at SUNY Albany
Analyst's objective: find sufficient information for a 3-page report on the assigned topic
Example topic: artificial reefs
Many countries are creating artificial reefs near their shores to foster sea life. In Florida a reef made of old tires caused a serious environmental problem. Please write a report on artificial reefs and their effects. Give some reasons as to why artificial reefs are created. Identify those built in the United States and around the world. Describe the types of artificial reefs created, the materials used and the sizes of the structures. Identify the types of man-made reefs that have been successful (success defined as an increase in sea life without negative environmental consequences). Identify those types that have been disasters. Explain the impact an artificial reef has on the environment and ecology. Discuss the EPA’s (Environmental Protection Agency) policy on artificial reefs. Include in your report any additional related information about this topic.
What is COLLANE?
An Analytic Tool
Exploits the strength of collaborative work
Collaborative environment
– Analysts work in teams
– Synchronously and asynchronously
– Information sharing on an as-needed basis
Collaborating via COLLANE
A team of users works on a task
– Each user has their own working space
– A Combined Answer Space is created, made out of individual contributions
Users interact with the system
– Via question answering and visual interfaces
The system observes and facilitates
– Shares relevant information found by others
– Tacit collaboration
Users interact with each other
– Exchange tips and data items via a chat facility
– Open collaboration
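The Combined Answer Space is built from individual contributions, and the system shares what others found. This is a minimal sketch of that idea. It assumes each workspace is simply a list of data-item IDs and that the combined space records which users contributed each item. `combine_answer_space` and `items_to_share` are hypothetical names and do not describe COLLANE's actual data model.

```python
from collections import defaultdict


def combine_answer_space(workspaces: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each data item to the set of users who contributed it."""
    combined: dict[str, set[str]] = defaultdict(set)
    for user, items in workspaces.items():
        for item in items:
            combined[item].add(user)
    return dict(combined)


def items_to_share(user: str, workspaces: dict[str, list[str]],
                   combined: dict[str, set[str]]) -> list[str]:
    """Items found by teammates that are missing from this user's workspace."""
    own = set(workspaces[user])
    return sorted(item for item in combined if item not in own)


workspaces = {"ana": ["d1", "d2"], "ben": ["d2", "d3"]}
combined = combine_answer_space(workspaces)
print(items_to_share("ana", workspaces, combined))  # ['d3']
```

Keeping contributor sets, not just a merged list, lets the system say who found an item. It also lets the system weight items that several analysts copied.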
Key Tracked Events
Questions Asked
Data Items Copied
Data Items Ignored
System offers accepted/rejected
Displaying Text
Words searched in user interface
All dialogue between user and system
Bringing up full document source
Passages viewed
Time spent
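Several tracked events, such as time spent, are derived from the raw event log rather than logged directly. This is a minimal sketch under two assumptions. First, each event is a `(seconds, event_type, item_id)` tuple. Second, time spent on a viewed passage is the gap until the user's next event. The event-type strings and `time_on_passages` are hypothetical.

```python
from collections import Counter

# Each event: (seconds since session start, event type, data-item ID or "").
# Event-type names are hypothetical labels for the tracked events listed above.
events = [
    (0.0, "question_asked", ""),
    (5.0, "passage_viewed", "p1"),
    (35.0, "data_item_copied", "p1"),
    (40.0, "passage_viewed", "p2"),
    (50.0, "data_item_ignored", "p2"),
]


def time_on_passages(events: list[tuple[float, str, str]]) -> dict[str, float]:
    """Credit each viewed passage with the time until the next event.

    The final event has no successor, so it is credited with no time.
    """
    spent: dict[str, float] = {}
    for (t, kind, item), (t_next, _, _) in zip(events, events[1:]):
        if kind == "passage_viewed":
            spent[item] = spent.get(item, 0.0) + (t_next - t)
    return spent


counts = Counter(kind for _, kind, _ in events)
print(counts["passage_viewed"])   # 2
print(time_on_passages(events))   # {'p1': 30.0, 'p2': 10.0}
```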
Experimental Results
Aligned Episodes on common data items
– Only considered user copies as indicators
Used document-level overlap
– Ignored potential content overlap between different documents
– Measured overlap is therefore a lower bound on Episode overlap
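The slides do not give the overlap formula. This sketch assumes the overlap of episode A with episode B is the share of A's copied documents that B also copied. Because matching is by document ID only, two different documents with the same content never match, which is why the score is a lower bound. `episode_overlap` is a hypothetical name.

```python
def episode_overlap(copied_a: set[str], copied_b: set[str]) -> float:
    """Fraction of episode A's copied documents that episode B also copied.

    Document-level only: distinct documents with similar content do not
    count as overlapping, so the score understates content overlap.
    """
    if not copied_a:
        return 0.0
    return len(copied_a & copied_b) / len(copied_a)


a = {"doc1", "doc2", "doc3", "doc4"}
b = {"doc2", "doc3", "doc9"}
print(episode_overlap(a, b))  # 0.5
```

This measure is asymmetric: `episode_overlap(b, a)` here is 2/3. A symmetric alternative such as Jaccard similarity would divide by the size of the union instead.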
Experimental Results
95 Exploratory Episodes
– EEs grouped by the degree of overlap
60% or higher → may be shared?
or
40% or lower → divergent?
Find an overlap threshold
– Maximize information sharing
– Minimize rejection
Some topics appear more suitable for information sharing and tacit collaboration
At 50% episode overlap threshold more than half of all episodes are candidates for sharing
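Choosing a threshold trades sharing against rejection. This is a minimal sketch that sweeps candidate thresholds and reports what fraction of episodes would qualify for sharing. It also labels episodes with the 60% / 40% bands from the slide. The overlap scores are toy values and are not the 95 episodes from the experiment. `share_fraction` and `label` are hypothetical names.

```python
def share_fraction(overlaps: list[float], threshold: float) -> float:
    """Fraction of episodes whose overlap meets or exceeds the threshold."""
    if not overlaps:
        return 0.0
    return sum(o >= threshold for o in overlaps) / len(overlaps)


def label(overlap: float) -> str:
    """Bands from the slide: >= 60% may be shared, <= 40% likely divergent."""
    if overlap >= 0.6:
        return "may be shared"
    if overlap <= 0.4:
        return "divergent"
    return "undecided"


# Toy overlap scores for illustration only (NOT the experimental data).
toy = [0.2, 0.35, 0.5, 0.55, 0.7, 0.8]
for threshold in (0.4, 0.5, 0.6):
    print(threshold, round(share_fraction(toy, threshold), 2))
print([label(o) for o in toy])
```

A threshold sweep like this shows only the sharing side. Measuring rejection would also need data on whether offered material was accepted.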
Collaborative Sharing Objective
Leverage Exploratory Knowledge
– Use the experience and judgment of users who faced the same or similar problem
– Provide superior accuracy and responsiveness to subsequent users
Similar to Relevance Feedback in IR
– Community-based rather than single-user judgment
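The slides only draw an analogy to relevance feedback. To make the analogy concrete, this sketch swaps in the textbook Rocchio update. Judgments are pooled from the community: items copied by earlier users count as relevant, and items they ignored count as non-relevant. The vectors, the weights alpha = 1.0, beta = 0.75, gamma = 0.15, and the functions `centroid` and `rocchio` are illustrative assumptions and do not describe COLLANE's method.

```python
Vector = list[float]


def centroid(vectors: list[Vector], dims: int) -> Vector:
    """Component-wise mean; a zero vector when there are no judgments."""
    if not vectors:
        return [0.0] * dims
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dims)]


def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Rocchio update: move the query toward relevant items, away from others.

    `relevant` pools items copied by all earlier users on the task and
    `nonrelevant` pools items they ignored (the community-based assumption).
    """
    rel = centroid(relevant, len(query))
    non = centroid(nonrelevant, len(query))
    # Negative term weights are clipped to zero, a common convention.
    return [max(0.0, alpha * q + beta * r - gamma * n)
            for q, r, n in zip(query, rel, non)]


# Toy 3-term vectors: copies pooled from two earlier users, one ignored item.
new_query = rocchio([1.0, 0.0, 0.0],
                    relevant=[[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]],
                    nonrelevant=[[0.0, 0.0, 1.0]])
print(new_query)
```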
Utilize User B's trail
– Offer D4-D7 to User D after the D3 copy
– Avoids 2 fruitless questions (Q2 & Q4)
– Finds an extra potentially relevant data point (D7)
Conclusion
Users searching for information in a networked environment leave behind exploratory trails that can be captured
Exploratory Episodes can be compared for overlap by data items copied
Many users searching for the same or highly related information are likely to follow similar routes through the data
When a user overlaps an EE above a threshold, they may benefit from tacit information sharing
Future Research
Evaluate overlap utilizing semantic equivalence of data items copied
Distill Exploratory Episodes into shareable knowledge elements
Expand overlap metrics
– Question similarity
– Items Ignored, etc.
Evaluate frequency of acceptance of offered material
– Varying thresholds
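Evaluating overlap by semantic equivalence means two different items can count as a match. This sketch uses word-level Jaccard similarity above a cutoff as a crude, clearly substituted stand-in for semantic equivalence. A real system would likely use a stronger similarity model. `jaccard`, `soft_overlap` and the 0.5 cutoff are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (a stand-in for semantic equivalence)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def soft_overlap(items_a: list[str], items_b: list[str], cutoff: float = 0.5) -> float:
    """Fraction of A's items that have some item in B at least `cutoff` similar."""
    if not items_a:
        return 0.0
    matched = sum(any(jaccard(x, y) >= cutoff for y in items_b) for x in items_a)
    return matched / len(items_a)


a = ["tire reef off Florida failed", "EPA artificial reef policy"]
b = ["Florida tire reef failed badly", "coral spawning season"]
print(soft_overlap(a, b))  # 0.5
```

In this toy case exact matching finds no overlap at all. Soft matching credits the two differently worded items about the Florida tire reef.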