Charles L.A. Clarke School of Computer Science, University of Waterloo, Canada Elaine G. Toms...

Charles L.A. Clarke School of Computer Science, University of Waterloo, Canada Elaine G. Toms Faculty of Management, Dalhousie University, Halifax, Canada Luanne Freund Faculty of Information Studies, University of Toronto, Canada Modeling Task-Genre Relationships for IR in the Workplace The 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. August 15-19, 2005, in Salvador, Brazil.


Charles L.A. Clarke, School of Computer Science, University of Waterloo, Canada

Elaine G. Toms, Faculty of Management, Dalhousie University, Halifax, Canada

Luanne Freund, Faculty of Information Studies, University of Toronto, Canada

Modeling Task-Genre Relationships for IR in the Workplace

The 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. August 15-19, 2005, in Salvador, Brazil.

2

Workplace IR

How do I phrase this so I don't get 2,000 responses? If I look for certain words, I know I'm going to get thousands of responses and they don't mean anything when I sift through them.

There’s lots of great information out there. We just don't know how to find it yet. We don't know how to make it easy to find yet.

I really just don't have time to read through 10 articles to find one that's good. First off I just don't have that kind of attention span and secondly I've just got lots and lots of other stuff to do.

3

Approach: Contextual IR

[Diagram. Traditional IR: searcher, info need, query (bag of words). Contextual IR: Searcher, Information, and the Interaction between them; context labels shown: work domain, work task, problem, author, purpose.]

Which Factors?

• work tasks

• information tasks

• document genre

4

Research Questions

• Do discernable relationships exist between work tasks, information tasks and document genres in a specific work domain?

• If so, what are these relationships, and are there broader factors underlying the patterns of association between these variables?

5

Methods: Work Domain

Software Engineering

• large multinational hi-tech company

• software services consulting group

• wide range of work activities: assessment, troubleshooting, implementation, system migration, project management, etc.

• heavy reliance upon digital information sources

6

Methods: Dataset

Internal Intellectual Capital Database

• Documents submitted and meta-tagged by consultants

• Tags: document type & task (purpose)

• 5,800 pairs of tags for analysis

genres (17): cookbook, discussion, lecture / lab, presentation, schedule, sales kit, reading material, source code, tools, etc.

tasks: work (20): architecture, debugging, installation, configuration, deployment, implementation, proof of concept, project management, testing, etc.

tasks: informational (16): compare, educate, document, guide, demonstrate, index, support, market, methodology, etc.
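
Pairs of tags like these are typically counted into a genre-by-task contingency table before any analysis. The following is a minimal sketch of that step, not the authors' actual pipeline; the function name `build_contingency_table` and the example pairs are hypothetical.

```python
from collections import Counter

def build_contingency_table(tag_pairs):
    """Count (genre, task) tag pairs into a genre-by-task table.

    Returns the sorted genre labels (rows), the sorted task labels
    (columns), and the table of counts as a list of lists.
    """
    counts = Counter(tag_pairs)
    genres = sorted({genre for genre, _ in tag_pairs})
    tasks = sorted({task for _, task in tag_pairs})
    table = [[counts[(genre, task)] for task in tasks] for genre in genres]
    return genres, tasks, table

# Hypothetical tag pairs for illustration only; not data from the study.
pairs = [
    ("cookbook", "installation"),
    ("cookbook", "installation"),
    ("cookbook", "configuration"),
    ("presentation", "educate"),
    ("source code", "debugging"),
]

genres, tasks, table = build_contingency_table(pairs)
for genre, row in zip(genres, table):
    print(genre, row)
```

Each row is a genre and each column a task, so a cell holds the number of documents carrying that combination of tags.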

7

Methods: Analysis

Correspondence Analysis

• exploratory method used to identify patterns of association between variables

• generalization of PCA to contingency tables with multiple categories for each variable

• maps vectors of row and column profiles in multi-dimensional space – using Chi-Square distance

• calculates inertia - measure of dispersion - for each row and column

• uses best-fitting planes to reduce dimensionality of solution
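
The steps listed above can be sketched with a singular value decomposition of the standardized residuals of the contingency table. This is a minimal sketch assuming NumPy is available; the function name `correspondence_analysis`, the labels, and the counts are illustrative assumptions, not the study's data or the authors' implementation.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Simple correspondence analysis of a two-way contingency table."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()              # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    expected = np.outer(r, c)    # independence model
    # Standardized residuals; their squared sum is the total inertia,
    # which equals the chi-square statistic divided by the grand total.
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: profiles projected onto the best-fitting axes,
    # with distances between profiles measured in the chi-square metric.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return {
        "row_coords": row_coords[:, :n_dims],
        "col_coords": col_coords[:, :n_dims],
        "principal_inertias": sv ** 2,        # inertia explained per dimension
        "row_inertia": (S ** 2).sum(axis=1),  # dispersion contributed by each row
        "col_inertia": (S ** 2).sum(axis=0),  # dispersion contributed by each column
        "total_inertia": (S ** 2).sum(),
    }

# Hypothetical counts (rows: genres, columns: tasks); not data from the study.
genres = ["cookbook", "presentation", "source code"]
tasks = ["installation", "educate", "debugging"]
counts = [[30, 5, 10],
          [4, 40, 3],
          [8, 2, 35]]

result = correspondence_analysis(counts)
for genre, xy in zip(genres, result["row_coords"]):
    print(genre, np.round(xy, 3))
```

Plotting the first two columns of `row_coords` and `col_coords` on the same axes gives a correspondence map in which genres and tasks that co-occur more often than independence predicts tend to lie in the same direction from the origin.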

8

Results: Genre Distribution

Significant Relationship: Genre & Task (χ² = 5878.968, df = 612, p < .001)

[Bar chart: Select Work Tasks: Genre Distribution. Percentage axis from 0% to 40%; series: Configuration, Development, Project Management.]
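
A chi-square statistic of the kind reported on this slide comes from comparing observed counts with the counts expected if genre and task were independent. The sketch below shows that calculation on a small hypothetical table; the function name `chi_square_independence` and the counts are assumptions for illustration and do not reproduce the study's figures.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical 2 x 2 table (rows: genres, columns: tasks).
statistic, df = chi_square_independence([[10, 20], [20, 10]])
print(round(statistic, 3), df)
```

The p-value is the upper tail of the chi-square distribution with `df` degrees of freedom; if SciPy is installed, `scipy.stats.chi2.sf(statistic, df)` returns it.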

9

Results: Genre Distribution

Significant Relationship: Genre & Task (χ² = 5878.968, df = 612, p < .001)

[Bar chart: Select Information Tasks: Genre Distribution. Percentage axis from 0% to 40%; series: compare, document, example.]
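
A genre distribution for a selected task can be read as the share of each genre among the documents tagged with that task. The sketch below assumes that reading of the charts; the function name `genre_distribution` and all counts are hypothetical.

```python
def genre_distribution(table, genres, tasks, task):
    """Percentage of each genre among documents tagged with `task`.

    `table` is a genre-by-task list of lists of counts.
    """
    j = tasks.index(task)
    column = [row[j] for row in table]
    total = sum(column)
    return {genre: 100.0 * count / total for genre, count in zip(genres, column)}

# Hypothetical counts (rows: genres, columns: tasks); not data from the study.
genres = ["cookbook", "presentation", "source code"]
tasks = ["configuration", "project management"]
table = [[6, 1],
         [1, 7],
         [3, 2]]

for task in tasks:
    print(task, genre_distribution(table, genres, tasks, task))
```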

10

Correspondence Map: Dimensions 1 & 2

[Correspondence map. Axis “Work Activities”: engineering vs. consulting. Axis “Information Goals”: doing; low level vs. learning; high level.]

11

Correspondence Map: Dimensions 1 & 4

[Correspondence map. Axis “Work Activities”. Axis “Information Goals”: demonstrating; interactive vs. fact-finding; static.]

12

Summary: Patterns of Association

Work Role: Software engineering

• Doing (“how to”; low-level): integration, installation, tool, cookbook, demo

• Learning (“why?”; high-level): architecture, capacity planning, guide, website

• Fact-Finding (“what?”; static): administrate, security, design docs, roadmap, standards

• Demonstrating (“show me”; interactive): test, debugging, performance tuning, source code, tool, demo

Work Role: Consulting

• Doing (“how to”; low-level): project management, engagement summary, schedule

• Learning (“why?”; high-level): product presentation, technical info, lecture/lab, presentation

• Fact-Finding (“what?”; static): project management, index, schedule, legal material

• Demonstrating (“show me”; interactive): discovery session, competitive evaluation, methods, discussion, technical info

13

Genre Clusters

• reusables

• low-level technical

• product maintenance

• high-level generic

• educational

Meta-genres?

14

Discussion

Info-centric perspective on work tasks

– Significant relationship: task & genre

– Micro-relationships (specific tasks & genres): general relationships exist; moderated by roles and information goals

– Macro-relationships: suggests factors for hypothesis-testing for engineering domain; enterprise search

Key relevance criteria for engineers: “task applicability”

To what extent does genre reflect this?

15

This research is supported by an IBM Centre for Advanced Studies (Toronto) fellowship to the first and second authors, and an SSHRC and Canada Research Chairs Program grant to the second author. We would like to thank Julie Waterhouse, IBM, and the many software services consultants who contributed their valuable time to the project.

thank you