Bioinformatics at IITA

Bioinformatics @ IITA
Andreas Gisel
Bioinformatics - definition
Bioinformatics is an interdisciplinary science that develops and improves methods for analyzing, storing, retrieving, organizing, and visualizing biological data.
Its aim is to help solve biological problems and to uncover the wealth of information hidden in biological data.
Bio – Biology, Life Sciences
Descriptions
Pictures
Sequences
Protein
RNA
DNA
First fully sequenced nucleic acid: tRNA (75nt), 1965
Biological Data
Descriptions
Pictures
Sequences
Protein
RNA
DNA
Structures
Protein
RNA
Interactions
Expressions
Data Explosion
NGS: up to 600’000’000’000 bases (600 Gb) per experiment
Data Analysis – DNA/RNA sequences
What to do?
Functional analysis of genes
Data Analysis
Hardware – Computing infrastructure (CPU, RAM, Storage)
Tools – Programs that process your data
Reference data – Databases for existing data
Internet – connection to external databases
www.iita.org
Toyin Abdulsalam – research fellow (bioinformatics and transcriptome analysis)
Andreas Gisel
HP Blade system with:
3 blades, each with two 16-core processors (AMD Opteron 6272)
384 GB RAM
2 TB direct-attached storage (DAS)
8 TB network-attached storage (NAS)
The operating system is Ubuntu 14.04.1 LTS, installed via Bio-Linux 8.
Basic bioinformatics services mainly based on sequence analysis
Next Generation Sequencing data analysis pipelines including:
GBS (genotyping by sequencing) data analysis and SNP calling
Transcriptomics (RNA-seq) mapping, assembly and expression profiling
smallRNA data analysis: discovery and expression profiling
DNA methylation (BS-seq) data analysis
DNA (shotgun) assembly and variation calling
Genome annotation using different data pipelines and visualization
Customized approaches using Perl and shell scripting
GBS (genotyping by sequencing) data analysis and SNP calling
Cassava
Raw sequencing data: 200 GB compressed sequence data (~800 accessions)
800 x ~25’000 SNPs
5500 x ~160’000 SNPs
SNP matrix
Ismail Rabbi
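A SNP matrix of this kind is usually laid out with one row per accession and one column per SNP. The sketch below is only an illustration of that layout, not the pipeline used at IITA: it encodes each genotype call as the number of alternate alleles (0, 1 or 2) and uses -1 for a missing call. The accession names, SNP names and genotype strings are invented.

```python
# Hypothetical genotype calls: accession -> {SNP id -> genotype string}
calls = {
    "acc_A": {"snp1": "0/0", "snp2": "0/1", "snp3": "1/1"},
    "acc_B": {"snp1": "0/1", "snp2": "./.", "snp3": "0/0"},
}


def encode(genotype):
    """Return the number of alternate alleles, or -1 for a missing call."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return -1
    return sum(1 for allele in alleles if allele != "0")


def build_snp_matrix(calls):
    """Build an accessions x SNPs matrix as a list of rows."""
    accessions = sorted(calls)
    snps = sorted({snp for per_acc in calls.values() for snp in per_acc})
    # A SNP that was never called for an accession is treated as missing.
    matrix = [
        [encode(calls[acc].get(snp, "./.")) for snp in snps]
        for acc in accessions
    ]
    return accessions, snps, matrix


accessions, snps, matrix = build_snp_matrix(calls)
```

With real data the same shape scales to the matrix sizes quoted above, although a dense list of lists would normally be replaced by a more compact array type.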
External data
[Figure: one panel per chromosome, Chr1 to Chr18]
Gene Distribution
SNP Distribution
GBS Coverage
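Distribution tracks like these are commonly produced by counting features in fixed-size windows along each chromosome. The following is a minimal sketch of that idea for SNPs. The 1 Mb window size and the example positions are assumptions made for illustration, not values taken from the slides.

```python
from collections import Counter


def snp_density(positions, window=1_000_000):
    """Count SNPs per fixed-size window for each chromosome.

    positions: iterable of (chromosome, 1-based position) tuples.
    Returns {(chromosome, 0-based window start): SNP count}.
    """
    counts = Counter()
    for chrom, pos in positions:
        start = ((pos - 1) // window) * window
        counts[(chrom, start)] += 1
    return dict(counts)


# Invented SNP positions, for illustration only
example = [("Chr1", 150), ("Chr1", 900_000), ("Chr1", 1_200_000), ("Chr2", 42)]
density = snp_density(example)
```

The same function can be applied to gene start positions to obtain a gene distribution on the same windows, which makes the tracks directly comparable.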
Transcriptomics (RNA-seq) mapping, assembly and expression profiling
What is RNA-seq?
Automated pipeline for reference-supported and de novo transcriptome assembly and expression profiling
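Expression profiling usually ends with a normalized value per gene. One widely used measure is transcripts per million (TPM), which corrects read counts for transcript length and sequencing depth. The slides do not say which measure the IITA pipeline reports, so TPM is an assumption here, and the gene names, counts and lengths are invented.

```python
def tpm(counts, lengths):
    """Convert raw read counts to transcripts per million (TPM).

    counts:  {gene: number of mapped reads}
    lengths: {gene: transcript length in bases}
    """
    # Reads per kilobase of transcript
    rpk = {gene: counts[gene] / (lengths[gene] / 1000) for gene in counts}
    scale = sum(rpk.values()) / 1_000_000
    if scale == 0:
        return {gene: 0.0 for gene in counts}  # nothing was expressed
    return {gene: value / scale for gene, value in rpk.items()}


# Invented example: geneB has three times the reads but is three times longer
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 3000}
values = tpm(counts, lengths)
```

In this example both genes end up with the same TPM, because the higher read count of geneB is fully explained by its length.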
smallRNA data analysis: discovery and expression profiling
Small RNAs are short (21-200 nt) non-coding RNAs with gene regulatory effects.
Automated pipeline for non-coding RNA classification and expression profiling.
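A common first step before classifying small RNA reads is to keep only reads in the expected size range and to count how often each unique sequence occurs. The sketch below illustrates that step only. It is not the pipeline described here, it assumes the 21 to 200 nt range quoted above, and the reads are synthetic.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 21, 200  # size range quoted for small RNA above


def length_profile(reads):
    """Filter reads by length, then tally unique sequences and read lengths."""
    kept = [read for read in reads if MIN_LEN <= len(read) <= MAX_LEN]
    abundance = Counter(kept)  # copies per unique sequence (expression proxy)
    by_length = Counter(len(read) for read in kept)
    return abundance, by_length


# Synthetic reads of 21, 24 and 12 nt, for illustration only
read_21 = "ACGT" * 5 + "A"
read_24 = "ACGT" * 6
read_12 = "ACGT" * 3
abundance, by_length = length_profile([read_21, read_21, read_24, read_12])
```

The per-sequence counts can then be passed on to whatever classification and normalization steps the real pipeline applies.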
DNA methylation (BS-seq) data analysis
What is BS-seq?
DNA methylation is another gene regulation mechanism which can be inherited.
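In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines are still read as C. The methylation level of a site can therefore be estimated as the fraction of covering reads that show C. The sketch below illustrates that calculation. The site coordinates and read counts are invented, and the function name is hypothetical.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting a methylated cytosine at one site.

    c_reads: reads showing C (protected from conversion, i.e. methylated)
    t_reads: reads showing T (converted, i.e. unmethylated)
    """
    total = c_reads + t_reads
    if total == 0:
        return None  # no coverage, level undefined
    return c_reads / total


# Invented per-site counts: (chromosome, position) -> (C reads, T reads)
sites = {("Chr1", 1050): (8, 2), ("Chr1", 2310): (0, 12), ("Chr2", 77): (0, 0)}
levels = {site: methylation_level(c, t) for site, (c, t) in sites.items()}
```

Sites without coverage are reported as None rather than 0, so that missing data is not mistaken for an unmethylated site.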
Genome annotation using different data pipelines and visualization
Genomic Reference Data
D. alata (waiting for sequence and annotation)
Maize (sequence and annotation ready)
Banana (sequence and annotation ready)
Archive
Automated pipelines and strategies for big data downloads
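One common safeguard for large downloads is to verify each file against a published checksum before analysis starts. The slides do not describe the strategy actually used, so the following is only a sketch of that single step. It reads the file in chunks so that very large files do not have to fit in memory. The function names are hypothetical, and a small temporary file stands in for a real download.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Return True if the file matches the expected checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demonstration on a small temporary file standing in for a download
payload = b"ACGT" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

ok = is_intact(demo_path, hashlib.md5(payload).hexdigest())
corrupted = is_intact(demo_path, "0" * 32)
os.remove(demo_path)
```

A file that fails the check would normally be downloaded again rather than passed on to the analysis pipelines.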
Bioinformatics & IITA
IITA Projects
Involvement in planning data production and analysis; financing of data storage and analysis
Bioinformatics
Bioscience
We need to identify the bioinformatics part
We need to take over at least part of the bioinformatics activities
Bioscience should be involved in planning the data production, to optimize data analysis and knowledge building
Capacity building to strengthen the bioinformatics facility
Thank you!
Data from:
Ranjana Bhattacharjee
Livia Stavolone
Morag Ferguson
Ismail Rabbi