Page 1: Bioinformatics at IITA

www.iita.org

A member of CGIAR consortium

Bioinformatics @ IITA

Andreas Gisel

IITA – Bioscience & Bioinformatics

Page 2: Bioinformatics at IITA

Bioinformatics @ IITA

Bioinformatics – definition and introduction

Bioinformatics @ IITA

Bioinformatics & IITA

Page 3: Bioinformatics at IITA

Bioinformatics - definition

Bio – Biology, Life Sciences

Informatics – computational sciences

DATA INTERPRETATIONS

RESULTS

Bioinformatics

Page 4: Bioinformatics at IITA

Bioinformatics - definition

Bio – Biology, Life Sciences

Informatics – computational sciences

DATA INTERPRETATIONS

RESULTS

Data Repositories

Knowledge

Page 5: Bioinformatics at IITA

Bioinformatics - definition

Bio – Biology, Life Sciences

Informatics – computational sciences

DATA INTERPRETATIONS

Bioinformatics is an interdisciplinary science that develops and improves methods for storing, retrieving, organizing, analyzing, and visualizing biological data.

Its aim is to help solve biological problems and to uncover the wealth of information hidden in biological data.

Page 6: Bioinformatics at IITA

?

Biological Data

Descriptions, Pictures

Page 8: Bioinformatics at IITA

Descriptions, Pictures

Sequences: protein, RNA, DNA

First fully sequenced biological sequence: the amino acid sequence of insulin (51 aa), 1955

First fully sequenced nucleic acid: a tRNA (75 nt), 1965

First DNA genome: bacteriophage φX174 (5375 nt), 1977

DNA sequencing technologies: Sanger sequencing (1975); pyrosequencing (next-generation sequencing, 2004)

Biological Data

Page 12: Bioinformatics at IITA

Descriptions, Pictures

Sequences: protein, RNA, DNA

Structures: protein, RNA

Interactions, Expressions

Biological Data

Page 13: Bioinformatics at IITA

Up to 600’000’000’000 bases (600 Gb) per experiment

Data Explosion

Descriptions, Pictures

Sequences: protein, RNA, DNA

Structures: protein, RNA

Interactions, Expressions

Microarray: up to 1 million data points per experiment

High-throughput sequencing: NGS (Next Generation Sequencing)

Page 15: Bioinformatics at IITA

Data Analysis – DNA/RNA sequences

A sequence without knowledge connected to it is meaningless! What to do?

Sequence similarity

Finding genes and regulatory elements

Functional analysis of genes

Homology

Polymorphism

BIOINFORMATICS
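Sequence similarity, the first task listed above, is usually quantified by aligning two sequences. As a minimal sketch, the function below computes a global (Needleman-Wunsch) alignment score with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not parameters of any IITA pipeline.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev holds the previous DP row: scores for aligning a[:i-1]
    # against every prefix b[:j]; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

For example, `global_alignment_score("GATTACA", "GATTACA")` returns 7 (seven matches). Real tools such as BLAST use substitution matrices, affine gaps, and heuristics for speed; this sketch only shows the core dynamic-programming idea.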

Page 16: Bioinformatics at IITA

Data Analysis

So we need bioinformatics tools and reference data

Hardware – Computing infrastructure (CPU, RAM, Storage)

Tools – Programs that process your data

Reference data – Databases for existing data

INTERNET – connection to external databases

Page 17: Bioinformatics at IITA

Bioinformatics @ IITA

Personnel

Livia Stavolone – molecular biologist

Deborah Adeyele – student (training in bioinformatics and non-coding RNA)

Toyin Abdulsalam – research fellow (bioinformatics and transcriptome analysis)

Andreas Gisel

Whole Bioscience Team

Page 18: Bioinformatics at IITA

Bioinformatics @ IITA

Hardware – Computing infrastructure (CPU, RAM, Storage)

HP Blade system with 3 blades, each with two 16-core processors (AMD Opteron 6272); 384 GB RAM; 2 TB direct-attached storage (DAS); 8 TB network-attached storage (NAS)

The operating system is Ubuntu 14.04.1 LTS, installed via Bio-Linux 8.
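The total compute capacity follows directly from the blade configuration listed above. The RAM-per-core figure below assumes that the 384 GB is the total across all three blades, which the slide does not state explicitly.

```latex
\text{cores} = 3~\text{blades} \times 2~\tfrac{\text{CPUs}}{\text{blade}} \times 16~\tfrac{\text{cores}}{\text{CPU}} = 96,
\qquad
\frac{384~\text{GB}}{96~\text{cores}} = 4~\text{GB per core}
```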

Page 19: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

Basic bioinformatics services mainly based on sequence analysis

Next Generation Sequencing data analysis pipelines including:

GBS (genotyping by sequencing) data analysis and SNP calling

Transcriptomics (RNA-seq) mapping, assembly and expression profiling

smallRNA data analysis: discovery and expression profiling

DNA methylation (BS-seq) data analysis

DNA (shotgun) assembly and variation calling

Genome annotation using different data pipelines and visualization

Customized approaches using Perl and shell scripting

Page 20: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

GBS (genotyping by sequencing) data analysis and SNP calling

Cassava – 1200 GB compressed sequence data (~5500 accessions); SNP matrix of 5500 × ~160’000 SNPs

Yam – 200 GB compressed sequence data (~800 accessions); SNP matrix of 800 × ~25’000 SNPs

From raw sequencing data to SNP matrix:

Cornell SNP calling (TASSEL)

Broad SNP calling (GATK)
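The SNP matrices above have one row per accession and one column per SNP (for cassava, about 5500 × 160’000). A minimal sketch of how such a matrix can be encoded is shown below, assuming VCF-style diploid genotype strings (`0/0`, `0/1`, `1/1`, `./.` for missing) converted to alternate-allele dosage 0/1/2. The function names and the nested-dictionary input layout are hypothetical and are not taken from TASSEL or GATK.

```python
from typing import Dict, List, Optional

def genotype_to_dosage(gt: str) -> Optional[int]:
    """Convert a diploid VCF-style genotype ('0/1', '1|1', ...) to alt-allele dosage."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None  # missing or non-diploid call
    # Count non-reference alleles (any allele index other than 0)
    return sum(1 for allele in alleles if allele != "0")

def build_snp_matrix(calls: Dict[str, Dict[str, str]],
                     snp_ids: List[str]) -> Dict[str, List[Optional[int]]]:
    """Rows = accessions, columns = SNPs in the order given by snp_ids."""
    return {
        # SNPs absent for an accession are treated as missing ('./.')
        accession: [genotype_to_dosage(gts.get(snp, "./.")) for snp in snp_ids]
        for accession, gts in calls.items()
    }
```

At the scale on this slide (hundreds of millions of cells), a dense numeric array or a dedicated format would be used instead of Python lists; the dosage coding itself stays the same.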

Page 21: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

GBS (genotyping by sequencing) data analysis and SNP calling

Page 22: Bioinformatics at IITA

GBS (genotyping by sequencing) data analysis and SNP calling

Ismail Rabbi

Bioinformatics @ IITA

Tools – Programs that process your data

SNP matrix

Cornell

Page 23: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

GBS (genotyping by sequencing) data analysis and SNP calling

SNP matrix

In-house

Page 24: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

GBS (genotyping by sequencing) data analysis and SNP calling

SNP matrix

Page 27: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

GBS (genotyping by sequencing) data analysis and SNP calling

SNP matrix

External data

In-house developed scripts

Page 28: Bioinformatics at IITA

GBS (genotyping by sequencing) data analysis and SNP calling

Bioinformatics @ IITA

Tools – Programs that process your data

[Figure: the 18 cassava chromosomes, Chr1 to Chr18]

Cassava Assembly & Annotation Version 6.1

Page 29: Bioinformatics at IITA

Cassava Assembly & Annotation Version 6.1

GBS (genotyping by sequencing) data analysis and SNP calling

Bioinformatics @ IITA

Tools – Programs that process your data

Gene Distribution

SNP Distribution

GBS Coverage

Heterozygosity
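Two of the tracks listed above can be computed from a SNP matrix with very little code. As a sketch, assuming the 0/1/2 dosage coding with `None` for missing calls (an assumption, not the IITA file format), observed heterozygosity is the fraction of non-missing calls that are heterozygous, and SNP distribution can be summarized as SNP counts per fixed-size window along a chromosome (the 1 Mb default window is an arbitrary choice).

```python
from collections import Counter
from typing import Dict, List, Optional

def observed_heterozygosity(dosages: List[Optional[int]]) -> float:
    """Fraction of non-missing calls that are heterozygous (dosage 1)."""
    called = [d for d in dosages if d is not None]
    if not called:
        return float("nan")  # no data for this accession
    return sum(1 for d in called if d == 1) / len(called)

def snp_density(positions: List[int], window: int = 1_000_000) -> Dict[int, int]:
    """Number of SNPs per window, keyed by window start position (bp)."""
    counts = Counter((pos // window) * window for pos in positions)
    return dict(sorted(counts.items()))
```

Applied per accession (one row of the SNP matrix) the first function gives an accession-level heterozygosity; applied per chromosome to SNP positions, the second gives the data behind a density track.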

Page 30: Bioinformatics at IITA

GBS (genotyping by sequencing) data analysis and SNP calling

Bioinformatics @ IITA

Tools – Programs that process your data

Page 31: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

Transcriptomics (RNA-seq) mapping, assembly and expression profiling

What is RNA-seq?

Page 32: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

Transcriptomics (RNA-seq) mapping, assembly and expression profiling

Automated pipeline for reference-supported and de novo transcriptome assembly and expression profiling
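Expression profiling from RNA-seq typically starts from read counts per gene, which must be normalized for gene length and sequencing depth before samples can be compared. As an illustration only (the slide does not say which normalization the IITA pipeline applies), the sketch below computes TPM (transcripts per million) from raw counts and gene lengths.

```python
from typing import Dict

def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize counts, then scale to sum to 1e6."""
    # Reads per kilobase of gene length
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rpk.values()) / 1_000_000
    if scale == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return {gene: value / scale for gene, value in rpk.items()}
```

Because TPM values in every sample sum to one million, relative expression can be compared across samples; differential-expression tools usually work on raw counts with their own normalization instead.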

Page 33: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

smallRNA data analysis: discovery and expression profiling

Small RNAs are short (21 to 200 nt), non-protein-coding RNAs with gene-regulatory effects.
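A common first step in small RNA analysis is to inspect the read-length distribution and keep only reads in the size range of interest. The sketch below does this for adapter-trimmed reads given as plain strings; the 21 to 200 nt bounds come from the slide, while the function name and the input format are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def filter_by_length(reads: Iterable[str], min_len: int = 21,
                     max_len: int = 200) -> Tuple[List[str], Dict[int, int]]:
    """Keep reads with min_len <= length <= max_len and return their length histogram."""
    kept = [r for r in reads if min_len <= len(r) <= max_len]
    histogram = Counter(len(r) for r in kept)
    return kept, dict(sorted(histogram.items()))
```

Peaks in the resulting histogram are a quick quality check before classification and expression profiling of the retained reads.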

Page 34: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

smallRNA data analysis: discovery and expression profiling

Automated pipeline for non-coding RNA classification and expression profiling.

Page 35: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

DNA methylation (BS-seq) data analysis

What is BS-seq?

DNA methylation is another gene regulation mechanism, one that can be inherited.
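In bisulfite sequencing (BS-seq), sodium bisulfite converts unmethylated cytosines to uracil, which is read as thymine after PCR, while methylated cytosines remain C. At each reference cytosine, the methylation level can therefore be estimated as C / (C + T) among the aligned reads. A minimal sketch, assuming per-position base counts have already been extracted from forward-strand alignments (the input layout is hypothetical; reverse-strand reads would need the complementary G/A logic):

```python
from typing import Dict, List, Optional

def methylation_levels(reference: str,
                       base_counts: List[Dict[str, int]],
                       min_depth: int = 1) -> Dict[int, Optional[float]]:
    """Estimate C/(C+T) at each reference cytosine (forward strand only)."""
    levels: Dict[int, Optional[float]] = {}
    for pos, ref_base in enumerate(reference.upper()):
        if ref_base != "C":
            continue
        c = base_counts[pos].get("C", 0)  # not converted: methylated
        t = base_counts[pos].get("T", 0)  # converted: unmethylated
        depth = c + t
        # Report None where coverage is too low to estimate a level
        levels[pos] = c / depth if depth >= min_depth else None
    return levels
```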

Page 37: Bioinformatics at IITA

Bioinformatics @ IITA

Tools – Programs that process your data

DNA (shotgun) assembly and variation calling

Genome annotation using different data pipelines and visualization
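A standard summary of a shotgun assembly's contiguity is N50: sorting contigs from longest to shortest, it is the length of the contig at which the cumulative length first reaches half of the total assembly length. The sketch below is a generic illustration of that metric, not part of the IITA pipelines described above.

```python
from typing import List

def n50(contig_lengths: List[int]) -> int:
    """Length of the contig at which the cumulative length (longest first) reaches 50%."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test for running >= total / 2
            return length
    raise RuntimeError("unreachable: the cumulative length always reaches the total")
```

For contigs of lengths 2 to 10 (total 54), the cumulative sum 10 + 9 + 8 = 27 reaches half the total, so N50 is 8.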

Page 38: Bioinformatics at IITA

Bioinformatics @ IITA

Reference data – Databases for existing data

Genomic Reference Data

Cassava (sequence, annotation, function)

D. rotundata (sequence; annotation and function in progress)

D. alata (waiting for sequence and annotation)

Maize (sequence and annotation ready)

Banana (sequence and annotation ready)

Archive

Cassava (GBS, WGS, RNA-seq)

D. rotundata (GBS, smallRNA)

Maize (GBS)

Page 41: Bioinformatics at IITA

Bioinformatics @ IITA

INTERNET – connection to external databases

Automated pipelines and strategies for big data downloads
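Large downloads from external databases are normally verified against published checksums before they enter an analysis. The sketch below streams a file in chunks, so multi-gigabyte files never have to fit in memory, and compares its MD5 digest with an expected value. MD5 is used here only because many sequence archives publish MD5 checksums alongside their files; the slide does not specify the method, and the file path and expected value are placeholders.

```python
import hashlib
from pathlib import Path

def md5_of_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path: Path, expected_md5: str) -> bool:
    """Return True if the file's MD5 matches the expected value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

In an automated pipeline, a failed check would typically trigger a re-download of that file only, rather than of the whole dataset.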

Page 42: Bioinformatics at IITA

Bioinformatics & IITA

Development of Bioinformatics Capacity

IITA Projects

Involvement in planning of data production and analysis; financing of data storage and analysis

Bioinformatics

Bioscience

Data analysis, Data repositories, Visualization

Page 43: Bioinformatics at IITA

Bioinformatics & IITA

Development of Bioinformatics Capacity

In projects with sequencing activities:

We need to identify the bioinformatics component

We need to take over at least part of the bioinformatics activities

Bioscience is involved in planning the data production, to optimize data analysis and knowledge building

Capacity building to strengthen the bioinformatics facility

Page 44: Bioinformatics at IITA

Thank you!

Data from:

Ranjana Bhattacharjee

Livia Stavolone

Morag Ferguson

Ismail Rabbi