Modeling a Microbial Community and Biodiversity Assay with OBI and PCO OBO Foundry Ontologies: The...
Modeling a microbial community
and biodiversity assay with OBI
and PCO: the gains of a modular
approach
ICBO 2014, Houston, Oct 6-9
Philippe Rocca-Serra, Ramona Walls, Jacob Parnell, Rachel Gallery, Jie
Zheng, Susanna Assunta Sansone and Alejandra Gonzalez-Beltran
Biodiversity in the
News
• Grim headlines
• True for many
vertebrate species
• Mankind is only now
starting to build tools
enabling true
exploration of diversity
Exploring the world's biodiversity
• Game-changing progress in sequencing technology
– Illumina
– Oxford Nanopore MinION
http://dx.doi.org/10.5524/100102
Microbial Diversity
Biodiversity studies with molecular
techniques
• Shotgun sequencing:
– Sequencing as much as possible (detection is
limited by the available sequencing depth: the
rarer the species, the deeper the sequencing
needs to be)
• Targeted sequencing:
– Reliance on a ‘marker gene’ whose variability
will be used to estimate distance between
species
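The link between rarity and sequencing depth can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration only, under a strong simplifying assumption that is not stated on the slides: every read is an independent random draw, and a species with relative abundance p contributes a fraction p of all reads. Under that assumption the chance of seeing the species at least once in N reads is 1 - (1 - p)^N. The function names are hypothetical.

```python
import math


def detection_probability(relative_abundance: float, n_reads: int) -> float:
    """Chance that at least one of n_reads originates from the species.

    Assumes independent random reads (a simplification, not a claim
    about any particular sequencing platform).
    """
    return 1.0 - (1.0 - relative_abundance) ** n_reads


def reads_needed(relative_abundance: float, confidence: float = 0.95) -> int:
    """Smallest read count giving at least `confidence` chance of one hit."""
    return math.ceil(
        math.log(1.0 - confidence) / math.log(1.0 - relative_abundance)
    )


# The rarer the species, the more reads are required.
for abundance in (1e-2, 1e-4, 1e-6):
    print(f"abundance {abundance:g}: about {reads_needed(abundance):,} reads")
```

Real communities violate the independence assumption (extraction bias, uneven genome sizes, amplification artifacts), so the figures are at best a lower bound on the depth required.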
‘Barcode’ as in Multiplexed
Libraries
Credits: http://rdp.cme.msu.edu/wiki/index.php/Pyrosequencing_Help
Genomic DNA isolated from an individual sample is:
- fragmented (shearing)
- ligated to a unique short DNA tag (i.e. the barcode)
- PCR-amplified and sequenced
- output as part of a single pooled collection of reads, which can subsequently be sorted
by their short DNA tag by computational means (the deconvolution process)
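The deconvolution step described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any specific pipeline: it assumes the tag sits at the start of each read, that all tags have the same length, and that only exact matches are accepted. The barcode table, sample names and reads are made up for the example.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table (illustrative values only).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes):
    """Sort pooled reads into per-sample bins using their leading tag.

    Reads whose tag is not in the table are kept, untrimmed, under
    the key "unassigned".
    """
    tag_length = len(next(iter(barcodes)))  # assumes equal-length tags
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        if tag in barcodes:
            bins[barcodes[tag]].append(insert)  # strip the tag
        else:
            bins["unassigned"].append(read)
    return dict(bins)


pooled_reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCATGA", "NNNNACGTAC"]
bins = demultiplex(pooled_reads, BARCODES)
for sample, inserts in bins.items():
    print(sample, inserts)
```

Production tools usually tolerate one or two mismatches in the tag; exact matching is used here only to keep the sketch short.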
‘Barcode’ as in Barcode of Life
Credits: http://www.barcodeoflife.org
Ambiguous Language
• What is a barcode or what is a barcoding experiment?
– Metaphors are impenetrable to computers.
– Need to make representation unambiguous
– Barcoding, meaning a technique for processing more samples in one go -> another word for multiplexing
– Barcoding, meaning the creation of a unique profile as a means to identify types of living things
Heaps of sequence data for
sure... but
• What is the value in
the absence of
accompanying
descriptors?
• Essential annotation
to ascertain identity
and origin, sampling
conditions and
rationale
Helping Data Management
• MIxS guidelines checklist
• SRA XML schema, GenBank records…
• Tabular templates for data collection
• Wealth of RDF conversion tools
– R2RML W3C data standards
• Even with the same XML schema and the same guidelines,
ambiguities persist
ISA templates for Microbial
Diversity Studies
• Integrating the MIxS checklist in the ISA
framework
• Mapping MIxS entities into the SRA XML
schema
– Properties of sample
– Properties of sample processing
– Properties of resulting libraries
– Properties of data processing
Ambiguities: Barcoding
• Library / Experiment / Sample uniqueness
• Use case: creation of libraries for
Bacteria, Fungi and Eukaryota with specific marker genes
(16S rRNA, ITS, COI)
• ISA conversion to ENA:
– 1 sample -> 3 libraries
• SRA/ENA submission:
– 3 libraries -> 3 samples
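The effect of the two representations on the apparent number of samples can be shown with a small sketch. The record structure, field names and identifiers below are hypothetical and are not taken from the ISA or SRA schemas; the only thing carried over from the slide is the mapping itself (one sample to three libraries in ISA, one sample record per library in the SRA/ENA submission).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    name: str
    marker_gene: str
    source_sample: str  # the physical sample the DNA was extracted from


# One physical sample, three marker-gene libraries (illustrative names).
libraries = [
    Library("lib_bacteria", "16S rRNA", "soil_core_1"),
    Library("lib_fungi", "ITS", "soil_core_1"),
    Library("lib_eukaryota", "COI", "soil_core_1"),
]


def isa_sample_count(libs):
    """Samples as modelled in ISA: libraries share their source sample."""
    return len({lib.source_sample for lib in libs})


def sra_sample_count(libs):
    """Samples as they appear when each library gets its own sample record."""
    return len({(lib.source_sample, lib.name) for lib in libs})


print("ISA view:", isa_sample_count(libraries), "sample(s)")
print("SRA/ENA view:", sra_sample_count(libraries), "sample(s)")
```

A downstream user who only sees the submission-side records would count three samples where one was collected, which is the kind of discrepancy an explicit link back to the source sample is meant to prevent.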
Working with OBI, PCO, SO, ChEBI
Drawn using CmapTools: http://cmap.ihmc.us
OBI-PCO based representation
• ‘targeted gene survey’
– has part some ‘library preparation’ (OBI_0000711)
• ‘polymerase chain reaction’ (OBI_0000415) is_part_of ‘library preparation’ (OBI_0000711)
• ‘polymerase chain reaction’ (OBI_0000415)
– has_specified_input some ‘forward pcr primer’ (OBI_0000722)
– has_specified_input some ‘reverse pcr primer’ (OBI_0001951)
– has_specified_input some ‘multiplexing sequence identifier’
– has_specified_input some ‘DNA extract’ (OBI_0001051)
• ‘library preparation’ (OBI_0000711) has_specified_output some ‘single fragment library’ (OBI_0000736)
• ‘library preparation’ (OBI_0000711) precedes ‘DNA sequencing’ (OBI_0000626)
• ‘library sequence deconvolution’ is_preceded_by ‘DNA sequencing’ (OBI_0000626)
• ‘library sequence deconvolution’ is_followed_by ‘sequence analysis data transformation’ (OBI_0200187)
• ‘sequence analysis data transformation’ (OBI_0200187) has_specified_output some ‘data item’ (IAO_0000027) and is about ‘population quality’ (PCO_0000003)
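The axioms listed above can be held as plain subject/relation/object triples and queried, for instance to recover the order of the processing steps. The sketch below uses only the Python standard library rather than an OWL toolkit, so it is a data-structure illustration and not an OWL serialization. Identifiers are copied from the slide; terms the slide shows without an identifier are mapped to None rather than guessed. Reading ‘is about’ as a property of the output data item is an interpretation of the last bullet, and the function name is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Identifiers as given on the slide; None where the slide gives none.
TERM_IDS = {
    "targeted gene survey": None,
    "library preparation": "OBI_0000711",
    "polymerase chain reaction": "OBI_0000415",
    "forward pcr primer": "OBI_0000722",
    "reverse pcr primer": "OBI_0001951",
    "multiplexing sequence identifier": None,
    "DNA extract": "OBI_0001051",
    "single fragment library": "OBI_0000736",
    "DNA sequencing": "OBI_0000626",
    "library sequence deconvolution": None,
    "sequence analysis data transformation": "OBI_0200187",
    "data item": "IAO_0000027",
    "population quality": "PCO_0000003",
}

SEQ_ANALYSIS = "sequence analysis data transformation"
AXIOMS = [
    ("targeted gene survey", "has part", "library preparation"),
    ("polymerase chain reaction", "is_part_of", "library preparation"),
    ("polymerase chain reaction", "has_specified_input", "forward pcr primer"),
    ("polymerase chain reaction", "has_specified_input", "reverse pcr primer"),
    ("polymerase chain reaction", "has_specified_input",
     "multiplexing sequence identifier"),
    ("polymerase chain reaction", "has_specified_input", "DNA extract"),
    ("library preparation", "has_specified_output", "single fragment library"),
    ("library preparation", "precedes", "DNA sequencing"),
    ("library sequence deconvolution", "is_preceded_by", "DNA sequencing"),
    ("library sequence deconvolution", "is_followed_by", SEQ_ANALYSIS),
    (SEQ_ANALYSIS, "has_specified_output", "data item"),
    # Interpretation: the output data item is about the population quality.
    ("data item", "is about", "population quality"),
]


def workflow_order(axioms):
    """Order the processes using only the temporal relations."""
    sorter = TopologicalSorter()
    for subject, relation, obj in axioms:
        if relation in ("precedes", "is_followed_by"):
            sorter.add(obj, subject)  # subject happens before obj
        elif relation == "is_preceded_by":
            sorter.add(subject, obj)  # obj happens before subject
    return list(sorter.static_order())


for step in workflow_order(AXIOMS):
    print(step, TERM_IDS[step] or "(no identifier on slide)")
```

Keeping the temporal relations separate from the input/output relations makes it easy to check that multiplexing identifiers enter at the PCR step and are consumed again at deconvolution.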
Conclusions
• We have clarified the OWL representation of
several assays commonly used in biodiversity
studies.
• We have outlined good practice for serializing
biodiversity experimental processes using the ISA,
SRA and RDF formats.
• We have shown how synergies between
OBO Foundry resources can greatly benefit the
fast development of fit-for-purpose tabular data
collection templates, which in turn help compliance
with annotation standard guidelines.
Why does it matter?
• Correct sample size assessment
• Assessing independence of samples and
sampling events.
• Is it really possible to ascertain the identity of
samples by relying solely on metadata?
• How can such uncertainties affect
downstream analysis / meta-analysis?
Future directions
• Sample Collection Protocols and
Procedures as applied in biodiversity
studies (field studies, “Marine macrofauna
grab sampling method” and so forth)
• Clarify the reporting of actual results
• Continue working with PCO and related OBO
Foundry efforts.
Acknowledgements
• Dr. Ramona Walls (iPlant, University of Arizona)
• Prof. Paula Mabee (University of South Dakota)
• RCN: Phenotype Ontology Research Coordination Network, National Science Foundation (NSF-DEB-0956049), (2010 - 2015)
• Dr. Jie Zheng and OBI companions
• PCO coworkers and RCN workshop participants
• ISA Team
• You