Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case
Enabling simultaneous analysis of multiple cohort studies without accessing the full dataset: a BRISSKit use case
Dr Jonathan Tedds, @jtedds
Senior Research Fellow, Health Informatics & Interdisciplinary Research Group, Department of Health Sciences (University of Leicester)
PI, #BRISSKit http://www.brisskit.le.ac.uk
AstroGrid: http://www.astrogrid.org (first public release April 2008)
Data Reuse: asking new questions
• Hubble Space Telescope: papers based on reuse of archived observations now exceed those based on the use described in the original proposal
– http://archive.stsci.edu/hst/bibliography/pubstat.html
• See also work by Piwowar & Vision on the life sciences: "Data reuse and the open data citation advantage"
– http://peerj.com/preprints/1/
Science as an Open Enterprise report
Why open?
• As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database
• We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable. [p.7]
• Royal Society, June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science-public-enterprise/report/
• Issues linking data to the scientific record:
– Data persistence
– Data and metadata quality
– Attribution and credit for data producers
• Geoffrey Boulton (Edinburgh), lead author:
– "Science has been sleepwalking into crisis of replicability...and of the credibility of science"
– "Publishing articles without making the data available is scientific malpractice"
BRISSKit context: the I4Health goal of applying knowledge engineering to close the 'ICT gap' between research and healthcare (Beck, T. et al. 2012)
Biomedical Research Infrastructure Software Service Kit
A vision for cloud-based open source research applications #BRISSKit
http://www.brisskit.le.ac.uk
BRISSKit USPs
• Integrated support for core research processes
• Well-established, mature open source applications, prototyped in Cardiovascular, Respiratory and Cancer Theme Biobank projects and customised for the UK
• A platform for seamless management of, and integration between, applications
• An API allowing integration with existing clinical systems
• Easy set-up, use and administration through a browser (including on mobile devices)
• Can be hosted by any compliant cloud provider, including UHL (NHS information governance)
www.brisskit.le.ac.uk
BRISSKit Community & Hack Event, Oct 2012
http://www.brisskit.le.ac.uk/node/35
BRISSKit Information Governance & Security Management Work Stream
Led by Dr Andrew Burnham
1. Information Governance Toolkit: analysis of Department of Health (DoH/NHS) IGT requirements vs. BRISSKit organisation/project and services/tools
   a) Hosted Secondary Use Team/project (Hosted IGT)
   b) Acute Trust (Acute Trust IGT)
2. IG Training Tool (NHS; the University is registered)
3. Pseudonymisation requirements
4. Data Management Plan
5. IT Security & standards: Penetration Testing & Security Testing
6. Other NHS Standards/Requirements:
   - Care Records Guarantee
   - NHS Constitution
   - NHS Records Management
   - Patient Safety DSCN 14/2009, 18/2009
The semantic bridge
• OBiBa Onyx: records participant consent, questionnaire data and primary specimen IDs
• i2b2: cohort selection and data querying
• Bridging the two: a bio-ontology
BRISSKit and Biobanking
• Deploy solutions in international biobanking initiatives
• Investment through Prof Paul Burton (Health Sciences at Leicester/Bristol) & international collaborations
• Building on strong informatics expertise at the University of Leicester in partnership with the University Hospitals Leicester Trust:
– Cardiovascular, Respiratory & Lifestyle BRUs
– Cancer Theme Biobank
– Genomics etc.
Contemporary biobanking: meeting the “data” challenge
Large data sets, why bother?
• Sample size
• Depth of phenotyping
• Quality of measurement
All critical
How big is BIG?
• The direct effect of a gene
– 2,000 cases minimum, 10,000 cases better
• Environmental and lifestyle factors
– Highly context specific: from hundreds to tens of thousands of cases
• Gene-lifestyle and gene-gene "interactions"
– Absolute minimum 10,000, usually need at least 20,000; a comprehensive platform needs at least 50,000
– Scientifically fundamental
The bottom line
• Effective data access is crucial
• Effective joint analysis (integration) is essential too
• Fundamental challenges:
– Scientific harmonization
– Restriction on access to individual-level data
– Streamlined access to multiple data sets
• Central to the integrative aims of P3G, PHOEBE, BioSHaRE-eu etc.
• Also fundamental to the aims of potential BRISSKit users
Horizontally partitioned data
[Diagram: data held separately by PREVEND, 1958BC, KORAGEN and FINRISK, each feeding a joint central analysis]
How can we undertake a full joint analysis using multiple data sources if the data cannot physically be pooled?
• Ethico-legal constraints
• Physical size of the data objects
• Intellectual property issues
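To make "horizontally partitioned" concrete: each study holds the same variables for a different set of participants, so the rows of the would-be pooled data set are split across sites. The minimal sketch below, using made-up measurements and hypothetical helper names (`summarise`, `pooled_mean_and_variance`), shows the simplest one-step case: each site returns only its count, sum and sum of squares, and the analysis computer recovers the pooled mean and sample variance without seeing any individual values. It illustrates the principle only and is not DataSHIELD code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteSummary:
    """Non-identifying aggregates one study returns for a single variable."""
    n: int
    total: float
    total_sq: float


def summarise(values: List[float]) -> SiteSummary:
    # Runs at the study site: only these three numbers leave it.
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pooled_mean_and_variance(summaries: List[SiteSummary]):
    # Runs at the analysis computer: combine the per-site aggregates.
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return mean, variance


# Hypothetical measurements: same variable, different participants per study.
site_a = [5.1, 4.8, 6.0]
site_b = [5.5, 5.9]
site_c = [4.2, 6.3, 5.0, 5.4]

mean, variance = pooled_mean_and_variance(
    [summarise(values) for values in (site_a, site_b, site_c)]
)
print(mean, variance)
```

The textbook sum-of-squares formula used here can lose precision when values are large relative to their spread; a production implementation would more likely combine per-site means and centred sums of squares.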
DataSHIELD: a novel solution
• Take analysis to data, not data to analysis
• One-step analyses: simple
• Iterative analyses: parallel processes linked together by entirely non-identifying summary statistics
• Typically produces mathematically identical results to fitting a single model to all the data held in one pooled data set
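The iterative case can be sketched the same way. Below, a logistic regression with one covariate is fitted by Newton-Raphson (equivalently, iteratively reweighted least squares): at each iteration the analysis computer sends the current coefficients to every site, each site returns only its score vector and information matrix (a few sums over its own participants), and the analysis computer adds them up and takes the next step. Because the pooled score and information are exactly the sums of the per-site ones, the fitted coefficients match those from one pooled data set, up to floating-point rounding. The data and function names are hypothetical, and this is a sketch of the general idea rather than DataSHIELD's own implementation.

```python
import math
from typing import List, Tuple

Row = Tuple[float, int]  # (covariate x, binary outcome y)


def site_contribution(rows: List[Row], beta: List[float]):
    """Runs at one site: return its score vector and information matrix at beta."""
    b0, b1 = beta
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted probability
        w = p * (1.0 - p)                            # IRLS weight
        score[0] += y - p
        score[1] += (y - p) * x
        info[0][0] += w
        info[0][1] += w * x
        info[1][1] += w * x * x
    info[1][0] = info[0][1]
    return score, info


def fit_logistic(sites: List[List[Row]], max_iter: int = 25, tol: float = 1e-10):
    """Runs at the analysis computer: Newton-Raphson on summed site contributions."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for rows in sites:
            s, i = site_contribution(rows, beta)
            for r in range(2):
                score[r] += s[r]
                for c in range(2):
                    info[r][c] += i[r][c]
        # Solve info * step = score for the 2-by-2 case.
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        step0 = (info[1][1] * score[0] - info[0][1] * score[1]) / det
        step1 = (info[0][0] * score[1] - info[1][0] * score[0]) / det
        beta = [beta[0] + step0, beta[1] + step1]
        if max(abs(step0), abs(step1)) < tol:
            break
    return beta


# Hypothetical data: three studies, same variables, different participants.
site_a = [(0.5, 0), (1.0, 0), (1.5, 1), (2.0, 0), (2.5, 1)]
site_b = [(1.2, 1), (0.8, 0), (3.0, 1), (1.8, 0)]
site_c = [(2.2, 1), (0.3, 0), (2.8, 0), (1.1, 1)]

federated = fit_logistic([site_a, site_b, site_c])
pooled = fit_logistic([site_a + site_b + site_c])
print(federated, pooled)
```

Per iteration, only a 2-element score vector and a 2-by-2 matrix cross each site boundary; the per-participant `(x, y)` pairs never leave `site_contribution`.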
Horizontal DataSHIELD
[Diagram: an R analysis computer connects through web services to data computers running R and Opal at FINRISK, PREVEND and 1958BC; the BioSHaRE web site also connects through web services]
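The split between one analysis computer and several data computers can be mimicked in a few lines. In this hypothetical sketch, each `DataComputer` object stands in for one study's Opal server and exposes only an aggregate `count_where` query; the analysis computer sums the per-site answers. The minimum-count rule (`MIN_CELL_COUNT = 5`) is an assumption chosen to illustrate disclosure control, not a documented DataSHIELD or Opal setting, and the in-process method calls stand in for the web-service calls in the real architecture.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical disclosure rule: refuse any count based on fewer than 5 people.
MIN_CELL_COUNT = 5


class DisclosureError(Exception):
    """Raised when a requested summary could identify individuals."""


@dataclass
class DataComputer:
    """Stands in for one study's data server; its rows never leave the object."""
    name: str
    rows: List[Dict[str, float]] = field(repr=False)

    def count_where(self, predicate: Callable[[Dict[str, float]], bool]) -> int:
        count = sum(1 for row in self.rows if predicate(row))
        if count < MIN_CELL_COUNT:
            raise DisclosureError(f"{self.name}: count below {MIN_CELL_COUNT}")
        return count


def federated_count(servers: List[DataComputer], predicate) -> int:
    """Analysis computer: combine the per-site aggregate answers."""
    return sum(server.count_where(predicate) for server in servers)


# Synthetic example sites: ages 30-69 at one site, 45-74 at the other.
site_a = DataComputer("site_a", [{"age": a} for a in range(30, 70)])
site_b = DataComputer("site_b", [{"age": a} for a in range(45, 75)])

print(federated_count([site_a, site_b], lambda row: row["age"] >= 60))  # 10 + 15 = 25
```

Passing an arbitrary Python function to the server is a simplification; a real deployment would only let clients invoke a fixed set of vetted server-side functions.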
Opal includes
• DataSHIELD
• DataSHaPER
• Researcher ID
Work in progress:
• Embed Opal in BRISSKit
• ALSPAC
• MRC e-HIRCS
• + more…
BRISSKit gains
• DataSHIELD
• DataSHaPER
• Researcher ID
Opal gains
• Direct interface with more tools
• i2b2 functionality
• Potential for enhanced user interface
Everybody gains
• Enhanced combined functionality: better science
• Bigger user group: greater portability
• Greater potential to become a sustainable standard
Enhanced joint analysis with
• Ethico-legal constraints, e.g. US/Europe biobanks
• Intellectual property issues, e.g. H3AFRICA