Exploring problems of data mobility, sharing and reuse

Rob Procter, Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot

Overview

• The eResearch vision.

• Background to this study.

• Earlier studies of data mobility, sharing and re-use.

• Fieldwork findings and implications.

• Conclusions.

The eResearch vision

• The eResearch vision promotes collaboration, interdisciplinary work and ‘reduced time to discovery’ as the keys to future scientific advances.

• Increased data sharing and re-use is seen as fundamental to the realisation of this vision.

Background to this study

• eDiaMoND was a UK e-Science programme project to create a shared national archive of digital mammograms from the UK breast screening programme, and use it to support a range of activities, including training.

• A follow-on project (LEMI) developed a training tool in collaboration with clinicians.

• Its aim was to draw upon archive materials and use them in ‘live’ training situations.

The UK National Breast Screening Programme

• Breast cancer is the most common cancer in the UK.

• Screening by mammography (breast X-rays) is offered every three years to women between 50 and 70 years of age.

• Mammograms are examined by trained readers for signs of abnormality.

• Abnormal cases are recalled for further tests at an assessment clinic.
– 3-6% are recalled and about 0.3-0.6% are malignant.
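
To make the recall figures concrete, the short calculation below converts them into counts per 10,000 women screened. It is only a sketch: it assumes that both percentages are proportions of all women screened, which the slide suggests but does not state outright, and the cohort size of 10,000 and all names are illustrative.

```python
from fractions import Fraction

# Illustrative cohort size (an assumption, not a figure from the programme).
SCREENED = 10_000

# Ranges quoted on the slide, taken here as percentages of all women screened.
RECALL_PCT = (Fraction(3), Fraction(6))
MALIGNANT_PCT = (Fraction(3, 10), Fraction(6, 10))


def expected_count(percent: Fraction, cohort: int) -> Fraction:
    """Convert a percentage of the cohort into an expected head count."""
    return percent / 100 * cohort


recalled = tuple(expected_count(p, SCREENED) for p in RECALL_PCT)
malignant = tuple(expected_count(p, SCREENED) for p in MALIGNANT_PCT)

# Share of recalled women with a malignancy, pairing the low ends
# and the high ends of the two ranges.
malignant_share = tuple(m / r for m, r in zip(malignant, recalled))

print(f"Recalled per {SCREENED}: {recalled[0]} to {recalled[1]}")
print(f"Malignant per {SCREENED}: {malignant[0]} to {malignant[1]}")
print(f"Malignant share of recalls: {float(malignant_share[0]):.0%} "
      f"to {float(malignant_share[1]):.0%}")
```

Under these assumptions, about one recalled woman in ten has a malignancy, so most recalls turn out not to be cancer.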

eDiaMoND

[Figure: the digital mammogram archive and the uses it was intended to support. Source: eDiaMoND blueprint document, 2005, http://www.ediamond.ox.ac.uk/publications/blueprint-Final.pdf]

• Research: epidemiology, image analysis.

• Practice: training, remote reading.

• Other labels in the figure: LEMI, Training, Screening tool, Lesion Zoo.

eDiaMoND data sharing and re-use model

[Figure: diagram relating the originating context, the data archive, the use context and metadata.]

Earlier studies of eDiaMoND

• Jirotka, M. et al. (2005) Collaboration and Trust in Healthcare Innovation: The eDiaMoND Case Study. JCSCW
– Problematised the idea of remote reading.
– Understanding the circumstances of mammogram production and use is important for trust in the data.

• Coopmans, C. (2006) Making Mammograms Mobile: Suggestions for a Sociology of Data Mobility. Information, Communication and Society
– Problematised the idea of data mobility.
– “An understanding of mobility … does not only emphasize that transit is an active achievement but also draws attention to the craft like nature of that achievement: the artful connecting of time, space, material and immaterial elements into a ‘mobility effect.’”

Questions motivating this study

• How should we understand the relationship between data and its originating context?

• What happens when people actually engage with the data to do something purposeful?

How should we understand the relationship between data and context?

• Berg and Goorman (1999) describe medical data as ‘entangled’ with the context of its production.

• Words like ‘disentangled’ seem to imply that data can somehow be liberated from its context.

• Berg and Goorman argue that the more contexts data has to be usable in, the more work is needed to disentangle it.

Patient records and data structures

• Patient records: rich, heterogeneous, redundant, documenting and guiding practice, implicit relations.

• Data structures: partial, selected, explicit relations.

Encounters with eDiaMoND data

• Problems emerging when encountering the data in relation to:
– Application development.
– Set selection.
– Training.

• We will examine:
– How problems were recognised, diagnosed and fixed.
– Who was involved and what resources they needed.

Example 1: Data correction work

• Couldn’t be done automatically:
– Data not of sufficient quality.

• But enough data was embedded in the digital artefacts that a skilled person could correct it.

Example 2: Selecting cases to include in training sets

Uncovering omissions

Example 3: Training

Mentoring the trainee

Findings: 1

• Use of the data led to different sorts of data ‘problem’ emerging, requiring different sorts of resources to diagnose and repair.

• We had to go back to source and make corrections and additions, and sometimes change the data model.

• Making sense of data depends on some understanding of the context of production.

• It was difficult to predict a priori what contextual information to preserve and what to discard.

Findings: 2

• Studies of data mobility focus on the need for work to ‘disentangle’ or ‘decontextualise’ data, but making interpretation and use of data less dependent on the originating context is only a partial contributor to mobility.

• While we carve out a ‘chunk of context’, we also throw away significant detail, and no longer have easy access to the full range of resources that we would usually depend upon for making sense of its contents.

Implications

• Moving on from eDiaMoND data curation model:
– Tacit assumption that data abstracted from a working context can be treated as self-sufficient.

• Better access to originating contexts:
– Interpretative practices attendant on data re-use involve linking originating and use context by some other means than that provided by metadata.

• Ease of correcting and amending data in-situ:
– Facilities need to be available at point of use, and not separated out into different processes and activities.
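
One way to picture these implications is a record that carries a route back to its originating context and accepts corrections at the point of use. The sketch below is purely illustrative and is not the eDiaMoND or LEMI data model: every class, field and value name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Amendment:
    """One in-situ correction, kept alongside the value it replaced."""
    field_name: str
    old_value: Optional[str]
    new_value: str
    amended_by: str
    reason: str


@dataclass
class ArchiveRecord:
    """Hypothetical archive entry that keeps a link to where it came from."""
    record_id: str
    values: Dict[str, str]
    origin_site: str      # where the data was produced
    origin_contact: str   # who can be asked how it was produced
    amendments: List[Amendment] = field(default_factory=list)

    def amend(self, field_name: str, new_value: str,
              amended_by: str, reason: str) -> None:
        """Correct a value at the point of use without losing the original."""
        old_value = self.values.get(field_name)
        self.amendments.append(
            Amendment(field_name, old_value, new_value, amended_by, reason)
        )
        self.values[field_name] = new_value

    def history(self, field_name: str) -> List[Amendment]:
        """Return every amendment made to one field, oldest first."""
        return [a for a in self.amendments if a.field_name == field_name]


# Made-up example: a trainer spots a labelling error while using the case.
record = ArchiveRecord(
    record_id="case-001",
    values={"laterality": "left"},
    origin_site="Screening centre A",
    origin_contact="film reader on duty",
)
record.amend("laterality", "right",
             amended_by="trainer", reason="label disagrees with image")
```

Keeping the replaced value and the reason for the change means that a later user can still see what the originating site recorded, rather than only the corrected abstraction.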

Conclusions: 1

• Achieving data mobility is less about making it independent of the context of production, and more about appropriately maintaining and carefully managing links to that context.

• We find that users continually (re)appraise data based on their understandings of practices associated with its production and abstraction.

• This is also shown in Zimmerman’s study of data reuse by ecologists, in which the appropriateness of using third-party datasets is gauged according to what ecologists know and understand about the specific phenomena and data collection practices.

Conclusions: 2

• Zimmerman asked ecologists to report retrospectively how they selected data for reuse whereas, in our study, we examined actual occasions of data reuse.

• While agreeing that greater detail of data collection practices should be made available, we take the more radical step of recommending capture of richer representations of the originating context.

Conclusions: 3

• We need to move away from ideas of linear processes and static data sets towards thinking of data as more organic, ‘living’ artefacts in need of periodic amendment, repair, renewal and retirement.

• If we shift our focus to accommodate non-linear aspects of data collection and the dynamic character of ‘live’ data, then this opens various opportunities for a radical reconfiguration of a variety of data management practices.

• This reconfiguration of data management needs to be taken seriously if the benefits of increased data re-use and sharing envisaged by eResearch are going to be realised fully.
