Glycomics project overview
Knowledge Enabled Information and Services Science
Life Science Ontologies
• ProPreO
o An ontology for capturing process and lifecycle information related to proteomic experiments
o 398 classes, 32 relationships
o 3.1 million instances
o Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO)
• Glyco
o An ontology for structure and function of glycopeptides
o 573 classes, 113 relationships
o Published through the National Center for Biomedical Ontology (NCBO)
Two aspects of glycoproteomics:
o What is it? → identification
o How much of it is there? → quantification
Heterogeneity in data generation process, instrumental parameters, formats
Need data and process provenance → ontology-mediated provenance
Hence, ProPreO models both the glycoproteomics experimental process and attendant data
ProPreO ontology
ProPreO population: transformation to rdf
[Figure: Scientific Data → Computational Methods → Ontology instances (RDF)]
Key: Protein Path, Peptide Path
Protein path:
o Input: Protein Data (amino-acid sequence)
o Computational methods: Calculate Chemical Mass, Calculate Monoisotopic Mass, Determine N-glycosylation Consensus
o Intermediate output: ChemicalMass RDF, MonoisotopicMass RDF, Amino-acidSequence RDF
o Result, "Protein RDF": chemical mass, monoisotopic mass, amino-acid sequence, n-glycosylation consensus
Peptide path:
o Input: Extract Peptide Amino-acid Sequence from Protein Amino-acid Sequence
o Result, "Peptide RDF": chemical mass, monoisotopic mass, amino-acid sequence, n-glycosylation consensus, parent protein
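The computational methods named on this slide can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes the usual N-X-S/T sequon (X not proline) as the N-glycosylation consensus, uses standard monoisotopic residue masses, and emits plain tuples in place of real RDF. The function names and property names (`monoisotopic_mass`, `n_glycosylation_consensus`, `parent_protein`, and so on) are hypothetical stand-ins for ProPreO terms, and the chemical (average) mass is left out for brevity.

```python
import re

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O for the free N- and C-termini


def monoisotopic_mass(seq):
    """Monoisotopic mass of an unmodified, uncharged peptide or protein."""
    return sum(MONO[aa] for aa in seq) + WATER


def n_glyc_sites(seq):
    """0-based positions of Asn in an N-X-S/T sequon where X is not Pro."""
    return [m.start() for m in re.finditer(r"N(?=[^P][ST])", seq)]


def to_triples(subject, seq, parent=None):
    """Describe a sequence as (subject, property, value) tuples.

    Property names are hypothetical, not actual ProPreO identifiers.
    """
    triples = [
        (subject, "amino_acid_sequence", seq),
        (subject, "monoisotopic_mass", round(monoisotopic_mass(seq), 4)),
        (subject, "n_glycosylation_consensus", n_glyc_sites(seq)),
    ]
    if parent is not None:  # peptide path: keep the link to the parent protein
        triples.append((subject, "parent_protein", parent))
    return triples
```

Calling `to_triples("peptide_1", "GNGS", parent="protein_1")` follows the peptide path: the same properties as for a protein, plus the parent-protein link.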
Semantic annotation of scientific/experimental data
ms/ms peaklist data
830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043
First line: parent ion m/z, parent ion abundance, parent ion charge
Following lines: fragment ion m/z, fragment ion abundance
ProPreO: Ontology-mediated provenance
Mass Spectrometry (MS) Data
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
Ontological Concepts
ProPreO: Ontology-mediated provenance
Semantically Annotated MS Data
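The step from the flat ms/ms peak list to the annotated XML can be sketched in Python with the standard library. This is an illustrative sketch, not the project's annotation tool. The element and attribute names are copied from the slide, the function name `annotate_peaklist` is hypothetical, and the input is assumed to be whitespace-separated with the parent ion's three values first.

```python
import xml.etree.ElementTree as ET


def annotate_peaklist(raw, instrument, mode="ms-ms"):
    """Wrap a flat peak list in the element names used on the slide.

    Assumed layout: parent m/z, parent abundance, parent charge,
    then (fragment m/z, fragment abundance) pairs.
    """
    values = raw.split()
    if len(values) < 3:
        raise ValueError("expected at least parent m/z, abundance and charge")
    root = ET.Element("ms-ms_peak_list")
    ET.SubElement(root, "parameter", {"instrument": instrument, "mode": mode})
    mz, abundance, z = values[:3]
    ET.SubElement(root, "parent_ion", {"m-z": mz, "abundance": abundance, "z": z})
    rest = values[3:]
    if len(rest) % 2:
        raise ValueError("fragment values must come in m/z, abundance pairs")
    for frag_mz, frag_ab in zip(rest[0::2], rest[1::2]):
        ET.SubElement(root, "fragment_ion", {"m-z": frag_mz, "abundance": frag_ab})
    return root


root = annotate_peaklist(
    "830.9570 194.9604 2 580.2985 0.3592 688.3214 0.2526",
    "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer",
)
xml_text = ET.tostring(root, encoding="unicode")
```

Values are kept as strings so that the figures in the annotated file match the instrument output digit for digit.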
Semantic annotation of Scientific Data
Annotated ms/ms peaklist data
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion_mass>830.9570</parent_ion_mass>
  <total_abundance>194.9604</total_abundance>
  <z>2</z>
  <mass_spec_peak m-z="580.2985" abundance="0.3592"/>
  <mass_spec_peak m-z="688.3214" abundance="0.2526"/>
  <mass_spec_peak m-z="779.4759" abundance="38.4939"/>
  <mass_spec_peak m-z="784.3607" abundance="21.7736"/>
  <mass_spec_peak m-z="1543.7476" abundance="1.3822"/>
  <mass_spec_peak m-z="1544.7595" abundance="2.9977"/>
  <mass_spec_peak m-z="1562.8113" abundance="37.4790"/>
  <mass_spec_peak m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
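N-Glycosylation Process (NGP)
[Figure: workflow from cell culture to glycopeptide identification and quantification; fraction counts in parentheses]
o Cell Culture → extract → Glycoprotein Fraction
o Glycoprotein Fraction → proteolysis → Glycopeptides Fraction (1)
o Glycopeptides Fraction (1) → Separation technique I → Glycopeptides Fraction (n)
o Glycopeptides Fraction (n) → PNGase → Peptide Fraction (n)
o Peptide Fraction (n) → Separation technique II → Peptide Fraction (n*m)
o Peptide Fraction (n*m) → Mass spectrometry → ms data, ms/ms data
o ms data, ms/ms data → Data reduction → ms peaklist, ms/ms peaklist
o ms peaklist → binning → N-dimensional array
o ms/ms peaklist → Peptide identification → Peptide list
o N-dimensional array, Peptide list → Signal integration, Data correlation → Glycopeptide identification and quantification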
[Figure: a chain of stages, each run by an Agent; the output (O) of one stage is the input (I) of the next]
Stage → data product:
o Biological Sample Analysis by MS/MS → Raw Data
o Raw Data to Standard Format → Standard Format Data
o Data Pre-process → Filtered Data
o DB Search (Mascot/Sequest) → Search Results
o Results Post-process (ProValt) → Final Output
Other labels in the figure: Storage, Semantic Annotation Applications, Biological Information
Semantic Web Process to incorporate provenance
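One way to picture provenance capture along this process is to log, for every stage, which agent turned which input into which output, and then walk that log backwards from any data product. The Python sketch below does only that bookkeeping. The stage and product names come from the slide; `ProvenanceRecord`, `run_with_provenance`, `lineage`, and the `agent_N` identifiers are hypothetical, and no actual conversion or database search is performed.

```python
from dataclasses import dataclass


@dataclass
class ProvenanceRecord:
    stage: str        # process step, as named on the slide
    agent: str        # who or what ran the step (placeholder identifier)
    input_name: str   # data product consumed
    output_name: str  # data product produced


# (stage, data product) pairs following the sample analysis that yields Raw Data.
STAGES = [
    ("Raw Data to Standard Format", "Standard Format Data"),
    ("Data Pre-process", "Filtered Data"),
    ("DB Search (Mascot/Sequest)", "Search Results"),
    ("Results Post-process (ProValt)", "Final Output"),
]


def run_with_provenance(first_input="Raw Data"):
    """Chain the stages, recording one provenance entry per stage."""
    records = []
    current = first_input
    for i, (stage, product) in enumerate(STAGES, start=1):
        records.append(ProvenanceRecord(stage, f"agent_{i}", current, product))
        current = product  # this stage's output is the next stage's input
    return records


def lineage(records, product):
    """Return the stages that led to `product`, earliest first."""
    by_output = {r.output_name: r for r in records}
    chain = []
    while product in by_output:
        rec = by_output[product]
        chain.append(rec.stage)
        product = rec.input_name
    return list(reversed(chain))
```

For example, `lineage(run_with_provenance(), "Search Results")` lists the three stages between the raw data and the search results.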
Integrated Semantic Information and knowledge System (Isis)
PROTEOMICS WORKFLOW
o Services: Raw2mzXML → mzXML2Pkl → Pkl2pSplit → MASCOT Search → ProValt
o Data products: Raw → mzXML → Pkl → pSplit → MASCOT result → ProValt result
EXPERIMENTAL DATA
o Experimental Data → Semantic Annotation → Metadata File
Other components in the figure: ProPreO ontology, Semantic Metadata Registry, PROTEOMECOMMONS, SPARQL query-based User Interface
o Have I made an error? Give me all result files from a similar organism, cell, preparation, and mass spectrometric conditions, and compare results.
o Is the result erroneous? Give me all result files from a similar organism, cell, preparation, and mass spectrometric conditions, and compare results.
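A question of this kind could be turned into a SPARQL query against the semantic metadata. The Python sketch below only assembles the query text and is an assumption-laden illustration: the `ex:` prefix, its `http://example.org/` namespace, the `ex:hasResultFile` predicate, and the criterion names are all hypothetical and are not ProPreO terms. Exact-match literals stand in for "similar", which a real system would have to define.

```python
def build_similar_results_query(criteria):
    """Build a SPARQL SELECT for result files whose run matches every criterion.

    `criteria` maps a hypothetical property name to a literal value.
    """
    lines = [
        "PREFIX ex: <http://example.org/propreo-sketch#>",  # placeholder namespace
        "SELECT ?resultFile WHERE {",
        "  ?run ex:hasResultFile ?resultFile .",
    ]
    for prop, value in sorted(criteria.items()):
        if not prop.isidentifier():
            raise ValueError(f"unsafe property name: {prop!r}")
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?run ex:{prop} "{escaped}" .')
    lines.append("}")
    return "\n".join(lines)


query = build_similar_results_query({
    "organism": "organism_A",        # placeholder values
    "cell": "cell_line_B",
    "preparation": "preparation_C",
})
```

The resulting string could be submitted to any SPARQL endpoint; comparing the returned result files is a separate step not shown here.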
Semantic Biological Web Service Registry
Semantic Web Service
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE GlydeCT SYSTEM "http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11.DTD">
<GlydeCT xmlns:GlydeCT="http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11">
  <structure type="molecule" id="molecule_1" name="GP1">
    <part type="moiety" id="moiety_1" ref="some_file#GNGS" name="GNGS"/>
    <part type="moiety" id="moiety_2" ref="some_file#Man3" name="Man3GlcNAc2"/>
    <link from="moiety_2" to="moiety_1">
      <link from="residue_1" to="residue_2">
        <link from="C1" to="N4"/>
      </link>
    </link>
  </structure>
</GlydeCT>
[Figure: moiety_1 (peptide Gly|Asn|Gly|Ser) linked to moiety_2 (glycan), with numbered residues]
GLYDE-CT: GLYcan Data Exchange Based on a Connection Table Format
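The nested `link` elements in the GLYDE-CT example read naturally as a connection described at three levels: moiety to moiety, residue to residue, and atom to atom. That reading is an interpretation of the slide, not a statement of the GLYDE-CT specification. The Python sketch below parses a copy of the example with the standard library and flattens each nested chain into one path. The XML declaration, DOCTYPE, and namespace declaration are left out to keep the snippet self-contained, and `link_paths` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GLYDE = """<GlydeCT>
  <structure type="molecule" id="molecule_1" name="GP1">
    <part type="moiety" id="moiety_1" ref="some_file#GNGS" name="GNGS"/>
    <part type="moiety" id="moiety_2" ref="some_file#Man3" name="Man3GlcNAc2"/>
    <link from="moiety_2" to="moiety_1">
      <link from="residue_1" to="residue_2">
        <link from="C1" to="N4"/>
      </link>
    </link>
  </structure>
</GlydeCT>"""


def link_paths(parent, prefix=()):
    """Yield each chain of nested <link> elements as (from, to) pairs, outermost first."""
    for link in parent.findall("link"):
        path = prefix + ((link.get("from"), link.get("to")),)
        if link.find("link") is None:
            yield path  # innermost link reached
        else:
            yield from link_paths(link, path)


structure = ET.fromstring(GLYDE).find("structure")
parts = {p.get("id"): p.get("name") for p in structure.findall("part")}
paths = list(link_paths(structure))
```

Here `parts` maps each moiety id to its name, and `paths` holds a single chain running from the moiety-level link down to the C1 to N4 link.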
Data, ontologies, more publications at Biomedical Glycomics project web site:
http://knoesis.wright.edu/research/bioinformatics/index.html
Thank You