Analysis of genomes and transcriptomes, Prof. Dr. Francisco Prosdocimi.
Francisco Prosdocimi
Brandon Chisham
Enrico Pontelli
Arlin Stoltzfus
Julie Thompson
Framework for a Comparative Data Analysis Ontology
IGBMC Department Seminar, February 2009, Strasbourg
Linking Evolution and Integrative Biology
Background
Introduction/ Motivation
Development
Features
Evaluation
Application
Concluding remarks
An explosion of the number and quality of data to be analyzed
Nature, 4th September 2008
The Petabyte era (10^15 bytes): a new generation of DNA sequencers is up and running. Applications: genome annotation, protein function and structure prediction, homolog searches, prediction of SNPs, etc.
New tools are needed for the emerging individual-based genomic sciences and medicine: population genomics, pharmacogenomics, evolutionary genomics
Large volumes of new data require large-scale automated analysis: interactome, gene expression, microRNA evolution, etc.
Integrative biology: data mining, analysis and integration
Other Challenges
Powerful tools for evolutionary analysis remain under-utilized and difficult to apply
Today, tools are mainly used in an expert-supervised approach, which is time-consuming, difficult to document, error-prone, and not scalable
Better documentation is needed for the whole pipeline used in evolutionary analysis
Pipeline stages and example tools or choices:
  DNA extraction: extraction kits, conditions
  Sequencing and base-calling: PCR conditions, sequencer, PHRED
  Ortholog searches: BLAST BBH, COGnitor, PSI-BLAST, phylogeny
  Multiple alignment: Clustal, T-Coffee, MAFFT, MultAlign
  Alignment refinement: manual, LEON, REFINER, HMM
  Phylogenetic reconstruction: parsimony, maximum likelihood, PAUP, PHYLIP
  Statistical analysis: bootstrap, jackknife, Bayesian, MCMC
What to do?
New tools are needed for the automated processing of high-throughput data
Evo-Info@NESCent: a dozen scientific experts in phylogenetic software development got together to discuss these problems
The technology barrier must be lowered so that the full force of evolutionary analysis can be applied to emerging problem areas (systems biology)
An integrated solution would make use of a combination of technologies, including:
  Clear workflow schemas
  User-friendly software and web services
  Promotion of new databases and data standards
  Development of a standard vocabulary to represent evolutionary data: C-DAO
http://evoinfo.nescent.org/
Developing Standards
Standards for standards: formally approved standards are defined by a number of international bodies, such as the W3C
The modern way to standardize knowledge is to create ontologies, which have been successfully applied in a number of other biomedical applications
Standardization of knowledge is a crucial step towards easy communication and data interoperability
Standardization does not remove diversity but does improve connection, documentation, annotation and scalability
Connecting data, connecting people, connecting algorithms
What is an ontology?
Ontology in philosophy: the study of the nature of being, existence and reality
Ontology and language: a description of concepts (nouns) that denote events and entities in the real world, and of relations (actions or verbs) that relate these entities
Biomedical ontologies: positive heuristics, fertile research program
“The positive heuristic of the programme saves
the scientist from becoming confused by
the ocean of anomalies.”
Imre Lakatos (1922-1974)
“the mathematician is said to speak not about numbers, functions and
infinite classes but merely about meaningless
symbols and formulas manipulated according to
given formal rules”
Rudolf Carnap (1891-1970)
Huh? What is it again?
A set of terms, and of relations between terms, that should be used to describe some natural phenomenon
The pizza ontology: definition of terms
  Relations (verbs) between terms: temMassa (has crust), temBorda (has edge), temIngrediente (has ingredient), temTopo (has topping), éMassaDe (is crust of), éTopoDe (is topping of)
  Terms: Pan, Italiana, recheioCatupiry (Catupiry filling), recheioQueijo (cheese filling), molhoDeTomate (tomato sauce), Calabresa, Presunto (ham), QuatroQueijos (four cheeses), Pimentão (bell pepper), Cebola (onion), Ovo (egg), Frango (chicken)...
Instantiating the ontology
  MinhaPizza temMassa Pan
  MinhaPizza temBorda recheioQueijo
  MinhaPizza temIngrediente molhoDeTomate
  MinhaPizza temIngrediente Frango
  MinhaPizza temTopo Catupiry
Generating new information
  Nutritional value, price
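The pizza example can be sketched in a few lines of Python, with each instance statement stored as a (subject, relation, object) triple. This is only an illustration: the names `FACTS`, `PRICE_CENTS`, `objects_of` and `total_price` are hypothetical, and the prices are invented, since the slides give no values.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# The instance statements from the pizza example, one triple per statement.
FACTS: List[Triple] = [
    ("MinhaPizza", "temMassa", "Pan"),
    ("MinhaPizza", "temBorda", "recheioQueijo"),
    ("MinhaPizza", "temIngrediente", "molhoDeTomate"),
    ("MinhaPizza", "temIngrediente", "Frango"),
    ("MinhaPizza", "temTopo", "Catupiry"),
]

# Hypothetical prices in cents per term; not taken from the slides.
PRICE_CENTS: Dict[str, int] = {
    "Pan": 1200,
    "recheioQueijo": 300,
    "molhoDeTomate": 100,
    "Frango": 400,
    "Catupiry": 250,
}


def objects_of(subject: str, relation: str, facts: List[Triple]) -> Set[str]:
    """Return every object linked to `subject` through `relation`."""
    return {o for s, r, o in facts if s == subject and r == relation}


def total_price(subject: str, facts: List[Triple]) -> int:
    """Derive new information: sum the prices of everything the subject has."""
    return sum(PRICE_CENTS[o] for s, _, o in facts if s == subject)


if __name__ == "__main__":
    print(sorted(objects_of("MinhaPizza", "temIngrediente", FACTS)))
    print(total_price("MinhaPizza", FACTS))
```

Because the statements use a fixed vocabulary, the same two functions work for any other pizza described with the same relations.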
An ontology is a formal language of terms and relations between terms that can be instantiated to formally describe events in the real/natural world.
Gene Ontology
The first ontology created in molecular biology, 2000
A consortium for the standardization of gene annotation
A standard vocabulary for describing genes in three categories:
  Biological process
  Molecular function
  Cellular localization
The GO sub-ontologies
Genome annotation using the same terms
Effective comparison
Beyond the Gene Ontology
OBO Foundry: The Open Biomedical Ontologies
Anatomy ontologies
GO vs. CDAO
Pre-CDAO ontologies (GO, anatomy, etc.)
  Simple semantic relations (is_a, part_of) between the concepts created; descriptive ontology
  Relation Ontology: limits the number of relations (verbs) that may be used in a description
CDAO
  Complex semantic relations
  An attempt to create a true logical-formal language for describing events
  Makes new inferences possible
Knowledge discovery
  Once the data have been annotated according to fixed terms and relations, programs known as reasoners can read the vocabulary and draw inferences automatically → Petabyte era
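As a rough illustration of what a reasoner does with a fixed relation such as is_a, the sketch below computes the statements that follow from a set of asserted ones by transitivity. It is a toy, not how any particular reasoner is implemented; the function name `is_a_closure` and the terms `term_A`, `term_B`, `term_C` are placeholders, not real ontology entries.

```python
from typing import Set, Tuple

IsA = Tuple[str, str]  # (child term, parent term)


def is_a_closure(asserted: Set[IsA]) -> Set[IsA]:
    """Return the asserted is_a statements plus all those implied by transitivity."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        for child, parent in list(closure):
            for other_child, grandparent in list(closure):
                # child is_a parent and parent is_a grandparent
                # together imply child is_a grandparent.
                if parent == other_child and (child, grandparent) not in closure:
                    closure.add((child, grandparent))
                    changed = True
    return closure


# Placeholder terms: term_C is_a term_B, term_B is_a term_A.
ASSERTED: Set[IsA] = {("term_C", "term_B"), ("term_B", "term_A")}

if __name__ == "__main__":
    for statement in sorted(is_a_closure(ASSERTED)):
        print(statement)
```

The statement that term_C is_a term_A was never written down by an annotator; it is produced automatically, which is the kind of inference that scales to large annotated datasets.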
MIAPA integration
MIAME - Minimum Information About a Microarray Experiment (Nat Genet. 2001)
  Formal documentation of the minimum information needed to reproduce the experiment
MIAPA - Minimum Information About a Phylogenetic Analysis (OMICS, 2006)
Algorithm for CDAO
IF Petabyte era, big data
AND
Non-scalability of modern evolutionary analysis
AND
Science as language creation
AND
We know the standards to create standards
AND
The biomedical community knows how to use ontologies (GO)
THEN
We are going to create this evolutionary ontology and help people to use and talk about evolution! However...
“Nothing in biology makes sense except
in the light of evolution”
T. Dobzhansky
(1900-1975)
Evolution as the core
The central role of evolutionary biology
Every single data collection made in biology can be viewed from an evolutionary perspective
CDAO must be able to represent virtually any data collection in the whole field of biology under an evolutionary perspective! From biochemistry to zoology, genetics to botany, genomics to ecology, microbiology to development, physiology and medicine and so on…
And... there are controversies among scholars...
  What is a species?
  What is an OTU?
  Should evolutionary characters be homologous?
  Darwin’s selectionism or Kimura’s neutralism?
  Gradualism or punctuated equilibrium?
  Phenetics or cladistics?
  Parsimony or likelihood?
Phenetic and cladistic data are both supported in C-DAO
C-DAO aims to formalize the structure of knowledge in evolutionary analysis:
1. To represent both the data and the objective classification (tree) of the compared entities, the methods used in the analysis and other relevant information
2. To map the stepwise history of evolution, including a chronicle of character-modification events
3. To make biological inferences about the present (propagating knowledge)
4. To cope with the different views and paradigms applied in modern evolutionary biology
1 Specification – Use cases
  Protein family alignment, modelling character evolution, functional inference, human variation, Bayesian supertrees, determining concordance between two or more phylogenies, estimating divergence times, determining the genome-wide distribution of Ks (silent site substitutions), tree reconciliation (orthology analysis), etc.
2 Representation
3 Conceptualization
  Define the concepts
  Define the relations between concepts (semantics)
  Define numeric restrictions
4 Implementation
5 Evaluation
Back to step 3: Reconceptualization
Data integration
Data representation
Evaluation
Translation of real test-cases represented in NEXUS files into C-DAO instances
C-DAO internal format
<cdao:Node rdf:ID="inode15">
  <cdao:part_of rdf:resource="#Tree_con_50_majrule"/>
  <cdao:belongs_to_Edge rdf:resource="#edge_inode15_inode14" />
  <cdao:belongs_to_Edge rdf:resource="#edge_Athaliana_CAB79970_inode15" />
  <cdao:belongs_to_Edge rdf:resource="#edge_Athaliana_AAD31363_inode15" />
  <cdao:belongs_to_Edge_as_Child rdf:resource="#edge_inode15_inode14" />
  <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Athaliana_CAB79970_inode15" />
  <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Athaliana_AAD31363_inode15" />
  <cdao:nca_node_of rdf:resource="#set_nca_44"/>
</cdao:Node>

<cdao:Directed_Edge rdf:ID="edge_Athaliana_CAB79970_1_inode15">
  <cdao:part_of rdf:resource="#Tree"/>
  <cdao:has_Parent_Node rdf:resource="#node_inode15"/>
  <cdao:has_Child_Node rdf:resource="#node_Athaliana_CAB79970_1"/>
  <cdao:has_Annotation rdf:resource="#edge_Athaliana_CAB79970_1_inode15_length"/>
</cdao:Directed_Edge>
<cdao:Edge_Length rdf:ID="edge_Athaliana_CAB79970_1_inode15_length">
  <cdao:has_Value rdf:datatype="&xsd;float"> 0.009539 </cdao:has_Value>
</cdao:Edge_Length>
http://www.cs.nmsu.edu/~bchisham/ontology/test_results/
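To show how such an instance could be consumed by a program, here is a minimal sketch that reads a directed edge and its length with Python's standard `xml.etree.ElementTree`. Several things are assumptions made for the example: the fragment is wrapped in an `rdf:RDF` root so that it parses on its own, the `&xsd;` entity is written out as the full XML Schema URI, and the `cdao` namespace URI is a placeholder because the slides do not state it. The helper names `q` and `read_edges` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CDAO = "http://example.org/cdao#"  # placeholder namespace, an assumption

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:cdao="{CDAO}">
  <cdao:Directed_Edge rdf:ID="edge_Athaliana_CAB79970_1_inode15">
    <cdao:has_Parent_Node rdf:resource="#node_inode15"/>
    <cdao:has_Child_Node rdf:resource="#node_Athaliana_CAB79970_1"/>
    <cdao:has_Annotation rdf:resource="#edge_Athaliana_CAB79970_1_inode15_length"/>
  </cdao:Directed_Edge>
  <cdao:Edge_Length rdf:ID="edge_Athaliana_CAB79970_1_inode15_length">
    <cdao:has_Value rdf:datatype="http://www.w3.org/2001/XMLSchema#float"> 0.009539 </cdao:has_Value>
  </cdao:Edge_Length>
</rdf:RDF>"""


def q(namespace: str, name: str) -> str:
    """Build the '{namespace}name' form that ElementTree uses for qualified names."""
    return "{%s}%s" % (namespace, name)


def read_edges(xml_text: str) -> List[Tuple[str, str, float]]:
    """Return (parent node, child node, edge length) for each directed edge."""
    root = ET.fromstring(xml_text)

    # First collect every edge-length annotation by its rdf:ID.
    lengths: Dict[str, float] = {}
    for node in root.findall(q(CDAO, "Edge_Length")):
        value = node.find(q(CDAO, "has_Value"))
        lengths[node.get(q(RDF, "ID"))] = float(value.text.strip())

    # Then resolve each edge's parent, child and annotation references.
    edges = []
    for node in root.findall(q(CDAO, "Directed_Edge")):
        parent = node.find(q(CDAO, "has_Parent_Node")).get(q(RDF, "resource"))
        child = node.find(q(CDAO, "has_Child_Node")).get(q(RDF, "resource"))
        note = node.find(q(CDAO, "has_Annotation")).get(q(RDF, "resource"))
        edges.append((parent.lstrip("#"), child.lstrip("#"), lengths[note.lstrip("#")]))
    return edges


if __name__ == "__main__":
    print(read_edges(SAMPLE))
```

A dedicated RDF library would be the more robust choice for real files; plain XML parsing is used here only to keep the sketch dependency-free.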
And so far, CDAO...
Allows the representation of large datasets (syntax, data representation)
Allows different heterogeneous datasets to be combined (data integration)
Provides strict concepts so that researchers speak a standard vocabulary (avoids a Tower of Babel problem)
Allows logical inferences and knowledge propagation to be made automatically (semantics)
  1. If TU1 has_annotation == GO:0006260
  2. If TU2 has_annotation == ""
  3. If TU3 has_annotation == GO:0006260
  4. If TU1, TU2 and TU3 form a monophyletic clade
  THEN TU2 has_annotation = GO:0006260
[Figure: tree with nodes labeled TU1, TU3, TU2, AN1 and AN2]
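The inference rule above can be sketched as a small Python function. This is an illustration under stated assumptions rather than CDAO's actual reasoning machinery: the tree is a nested list, the topology ((TU1, TU3), TU2) is guessed from the figure labels, every subtree is treated as a monophyletic clade, and the names `leaves` and `propagate` are hypothetical.

```python
from typing import Dict, List, Optional, Union

# A tree is either a leaf name or a list of subtrees.
Tree = Union[str, List["Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf names below a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    names: List[str] = []
    for child in tree:
        names.extend(leaves(child))
    return names


def propagate(tree: Tree, annotations: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Fill in missing annotations inside clades whose annotated leaves all agree."""
    result = dict(annotations)
    clade = leaves(tree)
    known = {result[name] for name in clade if result.get(name)}
    if len(known) == 1:
        # Every annotated member of the clade carries the same term:
        # copy it to the unannotated members.
        value = next(iter(known))
        for name in clade:
            if not result.get(name):
                result[name] = value
    elif not isinstance(tree, str):
        # No annotation or conflicting annotations: try each subclade instead.
        for child in tree:
            result = propagate(child, result)
    return result


if __name__ == "__main__":
    tree: Tree = [["TU1", "TU3"], "TU2"]  # assumed topology
    before = {"TU1": "GO:0006260", "TU2": "", "TU3": "GO:0006260"}
    print(propagate(tree, before))
```

When the annotated leaves of a clade disagree, the function leaves the gap unfilled at that level and only looks for agreement within smaller clades, which is one conservative design choice among several possible ones.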
Future Challenges
Verify the usability of the ontology by evolutionary biologists
Development of new tools for data format conversion
Integrate C-DAO into a generic workflow of evolutionary biology software (Arlin Stoltzfus)
Integrate CDAO with other ontologies (MAO, SO, AA, anatomy) for specific applications
Expand terms and concepts to allow a broader representation of evolutionary and comparative data
Conclusions
C-DAO is a prototype for a well-annotated ontology providing representation of key concepts in evolutionary analysis, such as:
  Phylogenetic trees of entities-to-be-compared
  Character-state data representing the attributes of entities
  Methodological annotation of procedures used in the analysis (integration with MIAPA)
  Evolutionary changes in characters over time
It aims to facilitate communication, annotation, program interoperability, data integration and automated analysis of large-scale evolutionary datasets
http://sourceforge.net/projects/cdao
Publications
Acknowledgements
Evo-info working group
https://www.nescent.org/wg_evoinfo/
Members: Jonathan Eisen, Joe Felsenstein, Mark Holder, John Huelsenbeck, Sergei L. Kosakovsky Pond, Sudhir Kumar, Paul O. Lewis, Aaron Mackey, David Maddison, Wayne Maddison, Weigang Qiu, Andrew Rambaut, Arlin Stoltzfus, David L. Swofford, Rutger Vos, Xuhua Xia, Christian Zmasek
Affiliations:
  UC Davis Genome Center, UC Davis, CA
  Department of Genome Sciences/ Biology, Seattle, WA
  School of Computational Science, FSU, Tallahassee, FL
  University of California, San Diego, CA
  Antiviral Research Center, UC, San Diego, CA
  Center for Evolutionary Functional Genomics, Tempe, AZ
  University of Connecticut, Storrs, CT
  GlaxoSmithKline, King of Prussia, PA
  Department of Entomology, UA, Tucson, AZ
  Departments of Zoology and Botany, UBC, Vancouver, BC
  Department of Biological Sciences, HCCUNY, New York, NY
  Zoology Department, University of Oxford, Oxford, UK
  Institute of Evolutionary Biology, UE, Edinburgh, UK
  School of Computational Science, FSU, Tallahassee, FL
  University of British Columbia, Vancouver, BC (Canada)
  Biology Department, University of Ottawa, Ottawa, ON
  Burnham Institute for Medical Research, La Jolla, CA
EvolHHuPro/LBGI working group
  Pierre Pontarotti, Elodie Darbo, Philippe Gouret
  Olivier Poch and LBGI members
Graduate Program in Genomic Sciences and Biotechnology - UCB
Julie Thompson
Enrico Pontelli
Brandon Chisham
Arlin Stoltzfus
Visit our web-page at
http://evolutionaryontology.org
Dr. Francisco Prosdocimi – [email protected]
Francisco Prosdocimi
CDAO meeting
August 2009
Las Cruces, New Mexico