Clinical and Translational Science Institute / CTSI at the University of California, San Francisco

UCSF DataShare: Making Research Data Available to All & Catalyzing Open Science

Maninder Kahlon, PhD + Anirvan Chatterjee, Angela Rizk-Jackson, PhD, Peter St. Wecker, PhD, Michael Weiner, MD

Goals for this discussion

• Introduce project (UCSF DataShare), rationale, status, challenges

• Discuss:

- Solutions for challenges

- Any points of integration with other initiatives?

- How and when to seek campus support and involvement

Why & How this Started

Encourages “open science” and fits the CTSI mission to accelerate research by encouraging data reuse

Policies encourage it, sometimes require it. May be more stringent down the road.

Competitive advantage – evidence accruing that open data increases impact of research

Passionate PI who has already led a major data-sharing initiative, the Alzheimer’s Disease Neuroimaging Initiative (ADNI): Michael Weiner

Networks (at UCSF, and nationally, CTSA and ADNI) to promulgate and catalyze adoption

Sharing data reinforces open scientific inquiry, encourages diversity of analysis and opinion, promotes new research, makes possible the testing of new or alternative hypotheses and methods of analysis, supports studies on data collection methods and measurement, facilitates the education of new researchers, enables the exploration of topics not envisioned by the initial investigators, and permits the creation of new data sets when data from multiple sources are combined. 

—National Institutes of Health, 2002

Policy

• NIH

- Public Access Policy: Peer-reviewed manuscripts to PubMed Central to ensure public access

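- Statement on Sharing Research Data (NOT-OD-03-032): applications requesting $500K or more in direct costs in any one year must include a data-sharing plan or explain why sharing is not possible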

• NSF

- General Grant Conditions: “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.”

• Wellcome Trust (UK)

- “The Wellcome Trust expects all of its funded researchers to maximise the availability of research data with as few restrictions as possible”

• Journal requirements to make data available as a condition of publication

- Nature journals

- PLoS journals

- Royal Society journals

Competitive Advantage

Our Approach

• Pilot data sharing using best-of-breed open source tools and science & technology talent, leveraging networks (began in Spring ’11)

• Identify major challenges and work with partners to resolve them

• Develop long-term (sustainable) plan for UCSF

• Use UCSF experience to catalyze greater data sharing (and use) nationally.

Datashare.ucsf.edu

Open Source System Designed for Archiving & Reuse - Dataverse

Begun Some Adaptations

Opportunities

• Crowded field – but minimal integration of data sharing and technical advocates with investigators, grounded in research and passionate about sharing (aka “Users”)

• Opportunity to get beyond an inflexion point, demonstrate ease and utility, and catalyze larger scale sharing in biomedical research.

Challenges

1. Data Storage – currently supported by CTSI. Future & sustainable option?

• California Digital Library (UC)

• UCSF?

• Partnership with commercial (like Amazon)

• NIH/NLM options – not likely in the near future

2. Data Curation, Management, etc. – establish a sustainable model

3. Reduce barriers to sharing for investigators

• Reluctance of investigators to share, even after publication, for a variety of reasons: it’s hard/takes work; it’s “my” data.

4. Reduce barriers to using for investigators

• Hard to find data that you might need, hard to understand it (lack of standard ways of describing data)

5. Incentivize investigator use and adoption