Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson,...

23
Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1

Transcript of Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson,...

Page 1: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Exploring problems of data mobility, sharing and reuse

Rob ProcterMark Hartswood, Stuart Anderson, Paul

Taylor, Lilian Blot

1

Page 2: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Overview

• The eResearch vision.• Background to this study.• Earlier studies of data mobility, sharing and

re-use.• Fieldwork findings and implications.• Conclusions.

2

Page 3: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

The eResearch vision

• The eResearch vision promotes collaboration, interdisciplinary work and ‘reduced time to discovery’ as the keys to future scientific advances.

• Increased data sharing and re-use is seen as fundamental to the realisation of this vision.

3

Page 4: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Background to this study

• eDiaMoND was a UK e-Science programme project to create a shared national archive of digital mammograms from the UK breast screening programme, and use it to support a range of activities, including training.

• A follow-on project (LEMI) developed a training tool in collaboration with clinicians.

• Its aim was to draw upon archive materials and use them in ‘live’ training situations.

4

Page 5: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

The UK National Breast Screening Programme

• Breast cancer is the most common cause of cancer in the UK.

• Screening by mammography (breast X-Rays) offered every three years to women between 50 and 70 years of age.

• Mammograms examined by trained readers for signs of abnormality.

• Abnormal cases are recalled for further tests at an assessment clinic.– 3-6% are recalled and about 0.3-0.6% are malignant.

5

Page 6: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

e-DiaMoND

eDiaMoND blueprint document, 2005

http://www.ediamond.ox.ac.uk/publications/blueprint-Final.pdf

Digital mammogram archive

LEMITraining

Screening tool Lesion Zoo

Research• Epidemiology• Image analysisPractice• Training• Remote reading

6

Page 7: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

eDiaMoND data sharing and re-use model

Data archiveOriginating context

Use contextData archive

Metadata

Page 8: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Earlier studies of eDiaMoND• Jirotka, M. et al (2005) Collaboration and Trust in Healthcare Innovation:

The eDiaMoND Case Study. JCSCW– Problematised the idea of remote reading.– Understanding the circumstances of mammogram production and use

important for trust in the data.• Coopmans, C. (2006) Making Mammograms Mobile: Suggestions for a

Sociology of Data Mobility. Information, Communication and Society– Problematised the idea of data mobility.– “An understanding of mobility … does not only emphasize that transit

is an active achievement but also draws attention to the craft like nature of that achievement: the artful connecting of time, space, material and immaterial elements into a ‘mobility effect.’”

8

Page 9: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Questions motivating this study

• How should we understand the relationship between data and its originating context?

• What happens when people actually engage with the data to do something purposeful?

9

Page 10: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

How should we understand the relationship between data and context?

• Berg and Goorman (1999) describe medical data as ‘entangled’ with the context of its production.

• Words like ‘disentangled’ seem to imply that data can somehow liberated from its context.

• Berg and Goorman argue that the more contexts data has to be usable in, the more work needed to disentangle it.

10

Page 11: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Patient records and data structures

Rich

Heterogeneous

Redundant

Documenting and guiding practice

Implicit relations

Partial

Selected

Explicit relations

11

Page 12: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Encounters with eDiaMoND data

• Problems emerging when encountering the data in relation to:– Application development.– Set selection.– Training.

• We will examine:– How problems were recognised, diagnosed and fixed.– Who was involved and what resources they needed.

12

Page 13: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Example 1: Data correction work

• Couldn’t be done automatically: – Data not of sufficient

quality

• But enough data embedded in the digital artefacts that a skilled person could correct.

13

Page 14: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Example 2: Selecting cases to include in training sets

14

Page 15: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Uncovering omissions

15

Page 16: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Example 3: Training

16

Page 17: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Mentoring the trainee

17

Page 18: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Findings: 1

• Use of the data led to different sorts of data ‘problem’ emerging, requiring different sorts of resources to diagnose and repair.

• We had to go back to source and make corrections, additions, sometimes change the data model.

• Making sense of data depends on some understanding of the context of production.

• It was difficult to predict a priori what contextual information to preserve and what to discard.

18

Page 19: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Findings: 2

• Studies of data mobility focus on need for work to ‘disentangle’ or ‘decontextualise’ data, but making interpretation and use of data less dependent on the originating context is only a part contributor to mobility.

• While we carve out a ‘chunk of context’, we also throw away significant detail, and no longer have easy access to the full range of resources that we would usually depend upon for making sense of its contents.

19

Page 20: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Implications

• Moving on from eDiaMoND data curation model:– Tacit assumption that data abstracted from a working

context can be treated as self-sufficient.

• Better access to originating contexts:– Interpretative practices attendant on data re-use involve

linking originating and use context by some other means than that provided by metadata.

• Ease of correcting and amending data in-situ:– Facilities need to be available at point of use, and not

separated out into different processes and activities.

20

Page 21: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Conclusions: 1

• Achieving data mobility is less about making it independent of the context of production, and more about appropriately maintaining and carefully managing links to that context.

• We find that users continually (re)appraise data based on their understandings of practices associated with its production and abstraction.

• This is also shown in Zimmerman’s study of data reuse by ecologists, whereby the appropriateness of using third party datasets is gauged according to what ecologists know and understand about the specific phenomena and data collection practices.

21

Page 22: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Conclusions: 2

• Zimmerman asked ecologists to report retrospectively how they selected data for reuse whereas, in our study, we examined actual occasions of data reuse.

• While agreeing that greater detail of data collection practices should be made available, we take the more radical step of recommending capture of richer representations of the originating context.

22

Page 23: Exploring problems of data mobility, sharing and reuse Rob Procter Mark Hartswood, Stuart Anderson, Paul Taylor, Lilian Blot 1.

Conclusions: 3

• We need to move away from ideas of linear processes and static data sets towards thinking of data as more organic, ‘living’ artefacts in need of periodic amendment, repair, renewal and retirement.

• If we shift our focus to accommodate non-linear aspects of data collection and the dynamic character of ‘live’ data, then this opens various opportunities for a radical reconfiguration of a variety of data management practices.

• This reconfiguration of data management needs to be taken seriously if the benefits of increased data re-use and sharing envisaged by eResearch are going to be realised fully.

23