Are Funders and Academic Institutions Approaches to Data
Science Aligned?
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
[email protected]
www.slideshare.net/pebourne
6/15/17 Dataverse 2017 1
My Bias in Addressing this Question
• Many years as a receiver of funds
• Active developer of public-private partnerships
• Chief Data Officer and final word on the BD2K project at the NIH for 3 years (funder view)
• DSI Director for 6 weeks (state and institutional view)
What Do I Mean by Data Science?
• Use of the ever increasing amount of open, complex, diverse digital data
• Finding ways to ask and then answer relevant questions by combining such diverse data sets
• Arriving at statistically significant conclusions not otherwise obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that improve the human condition
Consider Some Current High Profile NIH Examples where Data Science is Being Applied
• Moonshot – Bringing together 5 petabytes of homogenized data within the Genomic Data Commons (GDC) to explore genotype-phenotype relationships
• MODs – Multiple high value, high cost genomic resources
• Human Microbiome Project – microbe characterization and analysis
• TOPMed – Genomic, proteomic, metabolomic, image and EHR data
• Precision Medicine – Building a platform to support data on >1M individuals with extensive and constantly updated health profiles
• ECHO – Effects of Environmental Exposures on Child Health and Development – Integration of child health and environmental data
• BRAIN – Temporal and spatial analysis of neural circuits
How is Data Science Being Applied?
• Moonshot – new ways to analyze genotype-phenotype associations
• MODs – new curation and integration tools
• Human Microbiome Project – new cloud based tools
• TOPMed – large scale storage and analysis; data harmonization
• Precision Medicine – security; analysis of sensor data; EHR integration
• ECHO – metadata descriptions of health and environmental data; application of geospatial methods
• BRAIN – methods for network analysis, visualization
All: Analytics, the Commons, FAIR, sustainability, workforce
Are Funders and Academic Institutions Approaches to Data Science Aligned?
A spoiler …
Yes… But both are so far behind the times it’s scary
Lack of Alignment Can Come from Serving Different Masters/Mistresses: Academic Institutions
[Diagram labels: Top Down; Funders (Federal, Foundations, Philanthropy); State; General Public; Faculty & Staff; Students; Scholarly Communities]
Lack of Alignment Can Come from Serving Different Masters/Mistresses: Funders
[Diagram labels: Top Down; Congress; General Public; Scholarly Communities; Researchers]
Why Does Alignment Matter Now? One extreme view is the 6D’s
How Significant? One extreme is the 6D’s
[Figure: the 6D’s over time, illustrated by Kodak]
Stages: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
Timeline:
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes bona fide form of communication
How Much Data?
• Big Data
– Total data from NIH-funded research currently estimated at 650 PB*
– 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB this year
• Dark Data
– Only 12% of data described in published papers is in recognized archives – 88% is dark data^
• Cost
– 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
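The percentages on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are invented here, and the inputs are the slide's own estimates rather than independent measurements.

```python
# Figures quoted on the slide (estimates, not measurements).
TOTAL_NIH_DATA_PB = 650      # total data from NIH-funded research
NCBI_NLM_PB = 20             # portion held at NCBI/NLM
NCBI_GROWTH_PB = 10          # expected NCBI/NLM growth this year
ARCHIVED_FRACTION = 0.12     # share of published data in recognized archives
LIBRARY_OF_CONGRESS_PB = 3   # Library of Congress holdings in 2012


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of all NIH-funded data that sits in NCBI/NLM.
ncbi_share = percent(NCBI_NLM_PB, TOTAL_NIH_DATA_PB)
# Expected one-year growth relative to current NCBI/NLM holdings.
ncbi_growth = percent(NCBI_GROWTH_PB, NCBI_NLM_PB)
# Everything not in a recognized archive counts as dark data.
dark_share = 100.0 * (1 - ARCHIVED_FRACTION)
# Total NIH-funded data expressed in 2012 Library of Congress units.
loc_multiples = TOTAL_NIH_DATA_PB / LIBRARY_OF_CONGRESS_PB

print(f"NCBI/NLM share of NIH-funded data: {ncbi_share:.1f}%")
print(f"Expected NCBI/NLM growth this year: {ncbi_growth:.0f}%")
print(f"Dark data share: {dark_share:.0f}%")
print(f"Multiple of 2012 Library of Congress: ~{loc_multiples:.0f}x")
```

The NCBI/NLM share rounds to the 3% quoted on the slide, and the dark data share is simply the complement of the 12% that is archived.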
Renaissance or not, all of us, including funders and institutions, feel that some changes are afoot
Interestingly, both are organizationally similar, which points to some form of shared misalignment
• Funder institutions
– Silos – Institutes/Divisions
• Resources flow directly to the silos
• Patchwork efforts to compensate
– Common fund (NIH)
• Academic Institutions
– Silos – Schools/Departments
• Resources flow directly to the silos (RCM)
• Patchwork efforts to compensate
– Joint appointments
In both environments data science transcends the traditional
organizational structure, but it is not necessarily clear what to do with it
Approaches
• Funders
– Chief data officer
– Establish
• Programs
• Divisional data officers
• Institutions
– Chief data officer
– Establish
• Schools/Deans
• Departments/Chairs
• Divisions
• Centers/Directors
• Institutes/Directors
Motivations
• Funders
– Intramural
• Productivity/Cost-effectiveness
– Extramural
• Acceleration of research outcomes
• Reproducibility
• Governance including policy
• Workforce inc. diversity
• Ethics
• Stewardship
• Discovery
• Institutions
– Yet to significantly eat their own dog food
– Workforce development inc. diversity
– Research dollars to economic development
– Public private partnership
• More dollars
– Alumni
Example of what motivates funders …
Why a More Open Process? Use case: Diffuse Intrinsic Pontine Gliomas (DIPG)
• Occur in 1:100,000 individuals
• Peak incidence 6-8 years of age
• Median survival 9-12 months
• Surgery is not an option
• Chemotherapy ineffective and radiotherapy only transient
From Adam Resnick
Timeline of genomic studies in DIPG
• Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012
• Almost 3 years later, in largely the same datasets, but partially expanded, the same two groups and 2 others identify ACVR1 mutations as a secondary, co-occurring mutation
From Adam Resnick
What do we need to do differently to reveal ACVR1?
• ACVR1 is a targetable kinase
• Inhibition of ACVR1 inhibited tumor progression in vitro
• ~300 DIPG patients a year
• ~60 are predicted to have ACVR1 mutations
• If only large scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified
• 60 patients/year x 3 years = 180 children’s lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR
From Adam Resnick
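The estimate on this slide is simple arithmetic on its own figures. A minimal sketch follows, with illustrative variable names; the 20% fraction is not stated on the slide and is derived here from 60 of roughly 300 patients.

```python
# Figures from the slide (all approximate).
DIPG_PATIENTS_PER_YEAR = 300     # ~300 DIPG patients a year
ACVR1_PATIENTS_PER_YEAR = 60     # ~60 predicted to carry ACVR1 mutations
YEARS_OF_DELAY = 3               # ~2012 until ACVR1 mutations were reported

# Derived, not stated on the slide: implied share of patients with ACVR1 mutations.
acvr1_fraction = ACVR1_PATIENTS_PER_YEAR / DIPG_PATIENTS_PER_YEAR

# Patients who might have been affected during the delay.
patients_affected = ACVR1_PATIENTS_PER_YEAR * YEARS_OF_DELAY

print(f"Implied ACVR1 fraction: {acvr1_fraction:.0%}")
print(f"Patients over {YEARS_OF_DELAY} years: {patients_affected}")
```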
Example of what motivates institutions …
The cynical view … 50 x $50,000 = $2.5M
The Google University
Both funders and institutions see the need to move from pipes to platforms…
In this regard Dataverse is ahead of the curve
https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
Example: NSF and NIH Approaches
If platforms are the answer we could ask the question…
Will biomedical research become more like Airbnb?
Bonazzi & Bourne 2017 PLOS Biology 15(4) e2001818
I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– Average of 500,000 stays per night
– Valuation of US $25bn
Bonazzi & Bourne 2017 PLOS Biology 15(4) e2001818
Is not biomedical research the same?
Why a comparison to Airbnb is not fair
• Airbnb was born digital
• The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research
Nevertheless, there is much to be learnt
Platforms – The situation today
Supplier – Consumer:
• Paper Author – Paper Reader
• Data Provider – Data Consumer
• Employer – Employee
• Reagent Provider – Reagent Consumer
• Software Provider – Software Consumer
• Grant Writer – Grant Reviewer
• Educator – Student
Platforms: MS Project, Google Drive, Coursera, ResearchGate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
In summary there is not currently a widely adopted single platform for the exchange of services in biomedical research. Either there is a platform per service or no platform at all…
Funders and the institutions they fund need to work more closely to implement platforms
Impediments to a biomedical platform
• Current work practices by all stakeholders
• Entrenched business models
• Size of the undertaking aka resources needed
• Trust
• Incentives to use the platform
http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
Funders are pushing open data science…
Institutions are more resistant
NIH – a culture of sharing
[Timeline figure; years marked: 1999, 2003, 2004, 2007, 2008, 2012, 2014]
• Research Tools Policy
• NIH Data Sharing Policy
• Model Organism Policy
• Genome-wide Association (GWAS) Policy
• NIH Public Access Policy (Publications)
• Big Data to Knowledge (BD2K) Initiative
• Genomic Data Sharing (GDS) Policy
• Modernization of NIH Clinical Trials
• White House Initiative (2013 “Holdren Memo”)
Driving sharing and innovation: Open Science Prize
NIH, Wellcome Trust, HHMI
https://www.openscienceprize.org
• An international scientific challenge competition to encourage and support the prototyping and development of services, tools, or platforms that enable utilization of open content
• 96 submissions received
• Solvers from 45 countries, spanning 5 continents
• Timeline
• May 2016: Phase 1 winners announced at Health DataPalooza
• Dec 1, 2016: Presentations and public voting
• Feb 2017: Overall winner announced
Institutions are going their own way
• More dependency on the state
• More dependency on philanthropy
• More dependency on foundations
• More dependency on public-private partnerships
What We are Doing/Planning at One Institution
• Starting an open UVA initiative
• Focusing on practical training through Capstones
• Not owning anything; only working through collaboration
• Planning for the data village – an ecosystem in which students, faculty, staff, visitors, private sector reps, entrepreneurs live and work
So let me summarize
• Data may be the next Renaissance
• Both funders and academic institutions are slow to realize this – it’s hard to break away from the old ways
• Result: a growing gap between what both should be doing and what they are doing
• More exemplars like Dataverse are needed that integrate aspects of the research lifecycle
Acknowledgements
The BD2K Team at NIH
My New Colleagues at UVA
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0