Data Integration Team of the research unit DAP - Home -...

Data Integration Team of the research unit DAP. South Green Bioinformatics activities at CIRAD. Manuel Ruiz, CIP, Lima, 23rd January


Page 1: Data Integration Team of the research unit DAP - Home - …cipotato.org/.../uploads/Seminars-archives/ruiz_ppt.pdf ·  · 2017-05-03Data Integration Team of the research unit DAP.

Data Integration Team of the research unit DAP

South Green Bioinformatics activities

at CIRAD

Manuel Ruiz, CIP, Lima, 23rd January

Page 2:

The Joint Research Unit DAP (Développement et Amélioration des Plantes, i.e. Plant Development and Genetic Improvement):

The two main research themes are genetics and plant improvement, and plant development and adaptation.

Page 3:

Studied species:

rice, wheat, sorghum, sugarcane, banana, coconut, oil palm, yam, coffee, rubber tree, cocoa, cotton, apple and olive

Page 4:

IS in development

Haplophyle

MS-DMind

Web portal information systems (IS)

http://southgreen.cirad.fr/ (very soon)

Page 5:

Information systems

• http://tropgenedb.cirad.fr/: a database that manages genetic and genomic information about tropical crops (banana, cocoa, coconut, coffee, cotton, oil palm, rice, rubber, sugarcane, sorghum)
• http://orygenesdb.cirad.fr/: an interactive information system for rice reverse genetics (rice)
• http://urgi.versailles.inra.fr/OryzaTagLine/: a database for the phenotypic characterization of the Génoplante rice insertion line library (rice)
• http://cocoagendb.cirad.fr/: a Web portal for crossing cocoa phenotypic, genetic and genomic data (cocoa)

Page 6:

a database that manages genetic and genomic information about tropical crops

Version 1.0
• genetic map
• QTL data
• markers: RFLP, RAPD, SSR, etc.
• genotype data
• phenotype data
• germplasm data

http://tropgenedb.cirad.fr/

Page 7:


http://tropgenedb.cirad.fr/

Chantal Hamelin, Xavier Argout

Page 8:
Page 9:
Page 10:
Page 11:
Page 12:

Gaétan Droc

Page 13:
Page 14:

Pierre Larmande

Page 15:

Analysis pipeline for cDNA

Application for SSR marker development

Analysis

A methodology for genome-wide searches for orthologs in plants: http://greenphyl.cirad.fr

Christophe Périn, Matthieu Conte, Mathieu Rouard (Bioversity)

http://sat.cirad.fr/sat

http://esttik.cirad.fr (secure access)

Page 16:

Xavier Argout

Page 17:

Xavier Argout, Jean-François Rami, Claire Billot

Page 18:

Plant genome sequencing

Species                     Genome size (Mb)   Chromosome number   Status
Arabidopsis thaliana        119.2              5                   Complete
Oryza sativa                389                12                  Complete
Populus trichocarpa         480                19                  Complete
Vitis vinifera              475                19                  Complete
Chlamydomonas reinhardtii   100                17                  Complete
Sorghum bicolor             760                10                  Complete
Medicago truncatula         500                8                   Complete
Physcomitrella patens       511                27                  Complete
Solanum lycopersicum        950                12                  In progress
Triticum aestivum           16500              7                   In progress
Zea mays                    2365               10                  In progress

C. Périn

Page 19:

Analysis of plant genomes

Columns: Species; Clone; Feature; Ploidy, Heterozygosity; Data; Organization; Manager; Project; Status; Organism; Family

Monocotyledon, Musaceae
• Musa acuminata; Calcutta 4; wild banana; heterozygous diploid AA; 40 BAC
• Cavendish Grande Naine; cultivated bananas are sterile, parthenocarpic, vegetatively propagated plants; heterozygous triploid AAA; 6 BAC
• Musa balbisiana; Pisang Klutuk Wulung; wild banana; heterozygous diploid BB; 18 BAC
• Global Musa Genomics consortium; Rouard & Baurens; BAC; NIAS; In progress
• Musa acuminata; Pahang doubled haploid; homozygous AA; 2*600 Mb (2*11X); Roux & D'Hont; complete genome; JGI, Genoscope; Submitted

Monocotyledon, Poaceae
• Saccharum hybrid; R570; sugarcane; heterozygous dodecaploid aneuploid (spontaneum and officinarum parents); 12 BAC; International Consortium for Sugarcane Biotechnology (ICSB); D'Hont; BAC; Genoscope; In progress
• Oryza sativa; Japonica; rice is a model organism for monocotyledons; diploid; 2*430 Mb (2*12X); International Rice Genome Sequencing Project (IRGSP); Droc; complete genome; Done
• Sorghum bicolor; sorghum; diploid; 2*800 Mb (2*10X); Rami; complete genome; JGI; In progress

Monocotyledon, Arecaceae
• Elaeis guineensis Jacq.; African oil palm; diploid; 2*1000 Mb (2*16X); Billotte; complete genome; To submit

Dicotyledon, Brassicaceae
• Arabidopsis thaliana; Col-0; thale cress is a model organism for dicotyledons; diploid; 2*115 Mb (2*5X); International Collaboration; complete genome; Done

Dicotyledon, Malvaceae
• Theobroma cacao; Criollo; cacao tree; homozygous diploid; 2*380 Mb (2*10X); international consortium of cocoa genomics; Lanaud; complete genome; To submit

Page 20:

Manual curation is not sufficient

Figure: function comment fields for all proteins in Swiss-Prot over time (Baumgartner, Bioinformatics, 2007).

Page 21:

project 2008-2010

• Develop a platform of structural and functional annotation supported by comparative genomics

• Dedicated to plant and bio-aggressor genomes

• Allowing both automatic predictions and manual curation of genes and transposable elements

• User-friendly, generic, modular, portable, sustainable, upgradable and compatible

BIVI Spo

GDEC

Page 22:

Instances

Page 23:

Instances

Stéphanie Sidibe-Bocs, Valentin Guignon, Gaétan Droc

Page 24:

Information system model

Page 25:
Page 26:

Apollo Editors


Page 27:

http://orygenesdb.cirad.fr/cgi-bin/gbrowse/gnpannot [email protected]

Page 28:

“I suggest that functional predictions can be greatly improved by focusing on how the genes became similar in sequence (i.e., evolution) rather than on the sequence similarity itself.”

Jonathan A. Eisen, 1998

Find homologs using phylogenomic analysis

GreenPhylDB uses a phylogenomic method to identify homologous genes

Page 29:

• Orthologous genes are homologous genes that are descended from the last common ancestor through speciation and most probably encode proteins with a similar function in different species

• Paralogous genes are homologous genes that evolved through duplications and may encode proteins with more divergent functions

Arabidopsis gene

Rice gene A

Rice gene B

Orthologs

Speciation event

Paralogs

Gene duplication event

Homologous genes: orthologs and paralogs
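The two definitions on this slide can be illustrated with a small sketch: a pair of genes is labelled by the event found at its last common ancestor in a rooted gene tree (speciation gives orthologs, duplication gives paralogs). The `Node` class, the `classify_pairs` function and the tree topology below are assumptions made for illustration; the topology is only one arrangement consistent with the slide labels (Arabidopsis gene, Rice gene A, Rice gene B), and this is not the GreenPhyl implementation.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional


@dataclass
class Node:
    """A node of a rooted gene tree (hypothetical helper class)."""
    name: Optional[str] = None    # gene name, set on leaves only
    event: Optional[str] = None   # "speciation" or "duplication", internal nodes only
    children: list = field(default_factory=list)


def leaves(node):
    """Return the gene names found under a node."""
    if not node.children:
        return [node.name]
    return [gene for child in node.children for gene in leaves(child)]


def classify_pairs(root):
    """Label each gene pair by the event at its last common ancestor."""
    relations = {}

    def visit(node):
        if not node.children:
            return
        # Genes sitting under two different children diverged at this node.
        for left, right in combinations(node.children, 2):
            for a in leaves(left):
                for b in leaves(right):
                    kind = "orthologs" if node.event == "speciation" else "paralogs"
                    relations[frozenset((a, b))] = kind
        for child in node.children:
            visit(child)

    visit(root)
    return relations


# Assumed topology: a speciation separates Arabidopsis from rice,
# then a duplication in the rice lineage produces genes A and B.
rice_duplication = Node(
    event="duplication",
    children=[Node(name="Rice gene A"), Node(name="Rice gene B")],
)
tree = Node(
    event="speciation",
    children=[Node(name="Arabidopsis gene"), rice_duplication],
)
relations = classify_pairs(tree)
```

Under this assumed topology, both rice genes come out as orthologs of the Arabidopsis gene, and the two rice genes come out as paralogs of each other.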

Page 30:

GreenPhylDB V1.0 http://greenphyl.cirad.fr/

• Oryza sativa and Arabidopsis thaliana model plants

• Full genome available

• Gene annotation quality
– TAIR gene database release 8: gene ID like ‘At1g12345’
– TIGR gene database release 5: gene ID like ‘Os01g12345’

• Most of the functional evidence available
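As a small illustration of the two gene ID styles quoted above, the sketch below recognises an ID and extracts the species and chromosome number. The regular expressions are inferred only from the two example IDs on the slide (‘At1g12345’ and ‘Os01g12345’); they are an assumption, not an official specification, and other forms used in the actual releases are not handled. The function name `parse_gene_id` is hypothetical.

```python
import re

# Patterns inferred from the two example IDs on the slide (assumption);
# real releases may contain other forms that these do not cover.
ARABIDOPSIS_ID = re.compile(r"^At(\d)g(\d{5})$", re.IGNORECASE)
RICE_ID = re.compile(r"^Os(\d{2})g(\d{5})$", re.IGNORECASE)


def parse_gene_id(gene_id):
    """Return (species, chromosome, locus number), or None if not recognised."""
    candidates = (
        ("Arabidopsis thaliana", ARABIDOPSIS_ID),
        ("Oryza sativa", RICE_ID),
    )
    for species, pattern in candidates:
        match = pattern.match(gene_id.strip())
        if match:
            return species, int(match.group(1)), int(match.group(2))
    return None


arabidopsis_example = parse_gene_id("At1g12345")
rice_example = parse_gene_id("Os01g12345")
unknown_example = parse_gene_id("not_a_gene_id")
```

A lookup of this kind is enough to route a sequence to the right source database before any further processing.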

Page 31:

GreenPhyl phylogenomic pipeline

Stages: FILTERING, MULTIPLE ALIGNMENT, TREE CONSTRUCTION, ORTHOLOG INFERENCE

Steps and tools:
• Filtering procedure: LEON*
• Low-complexity masking: CAST
• Alignment: MAFFT
• Alignment refinement: Rascal
• Alignment masking: AL2CO
• Splices selection: SS*
• Gene id indexing: GI*
• Genetic distance (x100): ProtDist
• Rooting tree (x100): SDI
• Tree construction (x100): PHYML
• Bootstrapping alignment (x100): SeqBoot
• Set bootstrap values on PHYML tree: SB*
• Orthologs inference: DoRIO
• Gene id indexing: GI*

Output: ortholog predictions (.txt and NHX files)
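The "Bootstrapping alignment (x100)" and "Genetic distance (x100)" steps can be pictured with the sketch below. It resamples alignment columns with replacement, which is the general idea behind bootstrap replicates, and computes a simple p-distance on each replicate. This is only an illustration under stated assumptions: it does not call SeqBoot or ProtDist, the p-distance is a deliberately simpler stand-in for a model-based protein distance, and the function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, n_replicates=100, seed=0):
    """Resample alignment columns with replacement, once per replicate."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Draw as many column indices as the alignment has columns.
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in columns) for name in names}
        )
    return replicates


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    return sum(a != b for a, b in compared) / len(compared)


# Toy alignment invented for illustration only.
toy_alignment = {
    "At_gene": "MKV-LSAQ",
    "Os_geneA": "MKVALSGQ",
    "Os_geneB": "MRVALTGQ",
}
replicates = bootstrap_alignment(toy_alignment, n_replicates=100, seed=42)
distances = [p_distance(r["Os_geneA"], r["Os_geneB"]) for r in replicates]
```

Each replicate keeps the original alignment length, so a tree or distance matrix can be computed per replicate and the agreement between replicates used as a support value.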

Page 32:

30,500 Arabidopsis genes (TAIR)

50,200 rice genes (TIGR)

4,400 phylogenetically analyzed gene families

GreenPhyl phylogenomic pipeline

24,000 ortholog relationships between rice and Arabidopsis (probable same function)

6,420 manually validated gene families

Automatic clustering procedure

Page 33:

i-GOST (Iterative GreenPhyl Ortholog Search Tool)

2 objectives

1. Add information to rice or Arabidopsis with a gene from a new, particularly studied species. Query: a gene with KNOWN biological information.

2. Get information on a newly sequenced gene using information available from rice or Arabidopsis. Query: a gene with UNKNOWN function, to which biological information is added.

Page 34:

GreenPhylDB V2.0 … in progress

Objectives

• 10 new fully sequenced genomes are now available (Populus trichocarpa, Glycine max, Sorghum bicolor, Medicago truncatula, Vitis vinifera, Selaginella moellendorffii, Physcomitrella patens, Ostreococcus tauri, Chlamydomonas reinhardtii, Cyanidioschyzon merolae)

• Why integrate these species?

1. Complete sequencing and gene prediction
2. Will provide the complete list of plant gene families!
3. Use functional information available on these species
4. Reinforce the phylogenomic signal and hence ortholog predictions
5. Have a good taxonomic sampling

Page 35:

GreenPhylDB V2.0: a huge database…

GreenPhyl Database V1.0: 2 species, 81,000 genes, 21,400 clusters, 6,400 gene families

Added: 10 new species, ~300,000 sequences

GreenPhyl Database V2.0: ~390,000 sequences, ~25,000 clusters

Page 36:

Functional Annotation

Page 37:

Functional annotation

Page 38:

Knowledge modeling of the structure-function relationships

Page 39:

The insulin receptor pathway

Page 40:

Knowledge modeling of the sequence-structure relationships

Page 41:

project 2009 - 2011

Page 42:

GCP Generation Challenge Programme

A global consortium of crop research institutes established in 2003 with an approximately 10-year mandate to integrate comparative genomics and the molecular characterisation of genetic resources into plant breeding for stress tolerance, in particular in drought-prone environments.

http://www.generationcp.org

Page 43:

Generation Challenge Programme

Gautier Sarah

HaploPhyle: methodology development for the reconstruction of genealogies based on haplotypes related to geographic patterns (a graphical haplotype network in the light of external data).

GenDiversity is a query and analysis application combining genotyping data from diverse data sources, developed in support of diversity studies.

Page 44:

Data Integration

Page 45:

CIMMYT, CIRAD, IRRI, CIP, ICRISAT, etc.: raw data

Scientists

non GCP DB

GCP DB

Software analysis

Platform

Data Integration

Page 46:

Platform architecture

Page 47:

Partnership

Levels: Agropolis, France, International

• ID, SRG, SEG, DAR, GS, DGB
• DIA-PC, GDP
• CINES
• LIRMM: O. Gascuel, I. Mougenot
• INRA, Genoscope, CNG, …
• Swissprot, GMOD consortium, Biotec (Thailand)
• GCP program, Bioversity
• CIP, IRRI, CIMMYT, EMBRAPA, …
• Cirad biometry teams
• X. Perrier, L. Baudouin

Page 48:

ATGC: LIRMM Bioinformatics platform

Agropolis Plants Bioinformatics (genetics and genomics)

Biological resources, Genomics, Genetic resources, Evolution, Sequence analysis, Analysis of gene expression, Genetic diversity

New algorithms

UMRs DAP, DIAPC, BGPI, LSTM, SPO, RPB, BIVI, LGDP

Page 49:

High Power Computing

CINES, Montpellier