Cassava Genome Hub
Updates on cassava genomics big data management and analysis.
Anestis Gkanogiannis
CIRAD
June 24, 2016
Table of Contents
1 Introduction
2 Big Data
3 The Cassava Genome Hub
  Architecture
  Technologies
  Data
  Tools: JBrowse, SNiPlay, GIGWA, DiffExDB, Genetic Map, Querying Tools, Galaxy
4 Use case
Who am I?
Born and raised on the Greek island of Evia.
Physicist: 1998 - 2003, BSc, UOC, Crete, Greece
Informatician:
  2003 - 2005, MSc in Information Retrieval, AUEB, Athens
  2005 - 2011, PhD in Machine Learning, AUEB, Athens
  2011 - 2013, Text Analysis, UNB, Fredericton, Canada
  2013 - 2015, Bacterial Genomics, Genoscope, Paris, France
  2015 - 2016, Plant Genomics, CIRAD, Montpellier, France
  2016 - ??
Who are we?
Definition
Everyone is talking about it.
Any combination of Volume, Velocity and Variety.
A very hot subject in Omics.
http://www.cassavagenome.org
Technologies
Data
Volume
  Tens of TB of raw sequence data.
  Hundreds of GB of processed and analyzed data.
Velocity
  New and improved assemblies and annotation.
  New sequencing technologies at lower cost.
Variety
  Genomic sequences, RNASeq, RADSeq, etc.
  Annotation
  Variants
  Metabolomic data
Data
Public/Private | Type | Technology | Description | Publication | Samples
Public | Genomic | WGS | Assembly and annotation V6 | Prochnik et al., 2012 |
Public | Genomic | WGS | Genetic variants | Bredeson et al., 2016 | 61
Private | Genomic | RADSeq | Genetic variants | in progress | 1100
Private | Genomic | WGS | Genetic variants | in progress | 34
Public | Transcriptomic | RNASeq | Response to Xanthomonas | Munoz-Bodnar et al., 2014 | 12 (2 x 6)
Public | Transcriptomic | RNASeq | Response to Xanthomonas | Cohn et al., 2014 | 18 (3 x 6)
Private | Transcriptomic | RNASeq | Response to whitefly | in progress | 16 (2 x 8)
Table: Available data resources
JBrowse
A fast, embeddable genome browser built completely with JavaScript and HTML5.
SNiPlay
SNiPlay3: a web-based application for exploration and large-scale analyses of genomic variations.
GIGWA
A web-based tool that provides an easy and intuitive way to explore large amounts of genotyping data by filtering it.
Data storage relies on MongoDB, which offers good scalability properties.
It can handle multiple databases, may be deployed in either single- or multi-user mode, and provides a wide range of popular export formats.
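GIGWA's own MongoDB schema and query interface are not described on this slide, so the following is not GIGWA code. It is a minimal sketch, in plain Python, of the kind of per-variant filtering such a tool offers: keep only variants whose minor allele frequency (MAF) and missing-data rate pass user-chosen thresholds. The 0/1/2 genotype encoding, the function names and the default thresholds are all illustrative assumptions.

```python
from typing import Dict, List, Optional

# Assumed encoding: each genotype is the number of alternate alleles (0, 1 or 2)
# carried by a diploid sample, or None when the call is missing.
Genotypes = List[Optional[int]]


def minor_allele_frequency(calls: Genotypes) -> float:
    """MAF computed over non-missing diploid calls (0.0 if nothing was called)."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def missing_rate(calls: Genotypes) -> float:
    """Fraction of samples without a genotype call."""
    return sum(g is None for g in calls) / len(calls) if calls else 1.0


def filter_variants(
    variants: Dict[str, Genotypes],
    min_maf: float = 0.05,      # hypothetical default threshold
    max_missing: float = 0.2,   # hypothetical default threshold
) -> List[str]:
    """Return the IDs of variants that pass both the MAF and missing-data filters."""
    return [
        vid
        for vid, calls in variants.items()
        if minor_allele_frequency(calls) >= min_maf
        and missing_rate(calls) <= max_missing
    ]


if __name__ == "__main__":
    variants = {
        "snp1": [0, 1, 2, 1, 0],        # polymorphic, fully called
        "snp2": [0, 0, 0, 0, 0],        # monomorphic: MAF = 0
        "snp3": [1, None, None, 0, 2],  # 40% missing calls
    }
    print(filter_variants(variants))  # ['snp1']
```

A database-backed tool would push such conditions into the query itself rather than loading every genotype into memory; the in-memory version only shows the filtering logic.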
DiffExDB
Explore differential expression analyses.
Visualize heatmaps of RPKM expression values.
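RPKM (Reads Per Kilobase of transcript per Million mapped reads) normalizes a gene's read count by both gene length and sequencing depth: RPKM = 10^9 x C / (N x L), where C is the number of reads mapped to the gene, N the total number of mapped reads and L the gene length in base pairs. The sketch below is an illustration of that formula, not DiffExDB's implementation; the helper names are hypothetical, and `rpkm_table` simplifies by taking the library size as the sum of the listed counts. The log2(x + 1) transform is one common choice before colouring a heatmap, assumed here rather than taken from the tool.

```python
import math
from typing import Dict


def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """RPKM = 1e9 * reads / (gene length in bp * total mapped reads)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return 1e9 * gene_reads / (gene_length_bp * total_mapped_reads)


def rpkm_table(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """RPKM for every gene of one sample.

    Simplification: the library size is the sum of the given counts, whereas in
    practice it is the total number of mapped reads in the sample.
    """
    total = sum(counts.values())
    return {gene: rpkm(c, lengths[gene], total) for gene, c in counts.items()}


def log2_for_heatmap(value: float, pseudocount: float = 1.0) -> float:
    """log2(RPKM + pseudocount), compressing the dynamic range for display."""
    return math.log2(value + pseudocount)


if __name__ == "__main__":
    # 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))  # 25.0
```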
CMap
A browser-based tool for the visual comparison of various maps (sequence, genetic, etc.).
QuigMap
A fast, cross-platform genetic map viewer.
BLAST
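blastn: Search nucleotide databases using a nucleotide query.
blastp: Search protein databases using a protein query.
blastx: Search protein databases using a translated nucleotide query.
tblastn: Search translated nucleotide databases using a protein query.
tblastx: Search translated nucleotide databases using a translated nucleotide query.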
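The slides do not say how the hub returns BLAST results. Assuming a search run with NCBI BLAST+ in tabular mode (for example `blastn -query query.fa -db cassava_genome -outfmt 6`, where the database name is hypothetical), the hits can be post-processed as sketched below. The twelve column names are the default `-outfmt 6` fields; the e-value cutoff and the "best hit per query by bitscore" rule are illustrative choices, not part of the original tool.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Default column order of BLAST+ "-outfmt 6" when no custom fields are requested.
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST+ hits, skipping blank and '#' comment lines."""
    rows = (line for line in lines if line.strip() and not line.startswith("#"))
    hits = []
    for row in csv.reader(rows, delimiter="\t"):
        rec = dict(zip(OUTFMT6_FIELDS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        int(rec["length"]), float(rec["evalue"]),
                        float(rec["bitscore"])))
    return hits


def best_hits(hits: List[Hit], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query among hits under the e-value cutoff."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best
```

Usage: `best_hits(parse_outfmt6(open("hits.tsv")))` returns one `Hit` per query that has at least one significant match (the file name is a placeholder).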
Advanced Search
Search for genomic features, genomic locations, enzyme (EC) codes, Gene Ontology terms, etc.
Output as nucleotide or translated amino acid sequences.
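Producing "translated amino acid sequences" means applying the genetic code codon by codon. The sketch below uses the standard genetic code (NCBI translation table 1) and is an illustration, not the hub's own code; the function names are hypothetical, mapping ambiguous codons to `X` is an assumed convention, and `reverse_complement` is included because features on the minus strand must be reverse-complemented before translation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1): amino acids for all 64 codons, with the
# first codon position varying slowest and the third fastest, in TCAG order.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate a coding sequence in frame 1.

    A trailing partial codon is ignored; codons with ambiguous bases become 'X'.
    With to_stop=True, translation ends at the first stop codon.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # MA*
```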
Pathway Tools
Creates a new Pathway/Genome Database (PGDB) containing the predicted metabolic pathways.
Supports query, visualization, and analysis of PGDBs.
Galaxy
A scientific workflow, data integration and data analysis platform that aims to make computational biology accessible to research scientists who do not have computer programming experience.
It provides the means to build multi-step computational analyses, with a graphical user interface for specifying what data to operate on, what steps to take, and in what order to run them.
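Deciding "in what order" to run the steps of a multi-step analysis amounts to ordering a directed acyclic graph in which each step depends on the outputs it consumes. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9+). It does not use Galaxy's API, and the step names form a hypothetical variant-calling workflow.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical workflow: each step maps to the set of inputs/steps it consumes.
WORKFLOW: Dict[str, Set[str]] = {
    "trim_reads": {"raw_reads"},
    "map_reads": {"trim_reads", "reference"},
    "call_variants": {"map_reads", "reference"},
    "filter_variants": {"call_variants"},
}


def execution_order(steps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every step comes after all of its inputs.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    # Independent items (e.g. raw_reads and reference) may appear in either order.
    print(execution_order(WORKFLOW))
```

A workflow engine additionally tracks datasets and parameters for each step and can run independent steps in parallel; the topological order is only the scheduling constraint underneath.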
Use case
Thank you!