How we revealed genomes secrets?

Transcript of How we revealed genomes secrets?

Page 1: How we revealed genomes secrets?

1

In the name of Allah

How we revealed genomes secrets? Peanut genome as an example

Page 2: How we revealed genomes secrets?

2

The Past

Page 3: How we revealed genomes secrets?

3

19th century

Page 4: How we revealed genomes secrets?

4

20th century

Page 5: How we revealed genomes secrets?

5

20th century

Page 6: How we revealed genomes secrets?

6

20th century

Homo sapiens genome = 3,000,000,000 base pairs = 3,000,000,000 dollars

Page 7: How we revealed genomes secrets?

7

21st century

Page 8: How we revealed genomes secrets?

8

21st century

>3000000 base pairs

Page 9: How we revealed genomes secrets?

9

21st century

Page 10: How we revealed genomes secrets?

10

NGS: Next Generation Sequencing (2nd generation of sequencers)

Speed, Cost, Sample size, Accuracy

Benefits of FGS over NGS

Third generation of sequencers

Page 11: How we revealed genomes secrets?

11

Genome sequencing

STEP 1

Sample preparation

STEP 2

Sequencing

STEP 3

Assembly

STEP 4

Annotation

Page 12: How we revealed genomes secrets?

12

Sample preparation

Solid-phase amplification

Emulsion PCR

Primer immobilized

Template immobilized

Polymerase immobilized

Page 13: How we revealed genomes secrets?

13

Genome sequencing

STEP 1

Sample preparation

STEP 2

Sequencing

STEP 3

Assembly

STEP 4

Annotation

Page 14: How we revealed genomes secrets?

14

Sequencing

Sequencing by synthesis

Sequencing by ligation

Ion Semiconductor

Real-time sequencing

Pyrosequencing

Page 15: How we revealed genomes secrets?

15

Genome sequencing

STEP 1

Sample preparation

STEP 2

Sequencing

STEP 3

Assembly

STEP 4

Annotation

Page 16: How we revealed genomes secrets?

16

Assembly

Benefits of FGS over NGS

Longer reads

Sensitivity

Coverage bias

Variation

Page 17: How we revealed genomes secrets?

17

Assembly

Under ideal conditions, assembly is:

Simply merging reads with maximal overlap

But

Genome organization is complex

Coverage

Repetitive sequences: length, copy number, sequence
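The "merge reads with maximal overlap" idea can be sketched as a greedy procedure. This is only an illustration under simplifying assumptions (error-free reads, one strand, no repeats, no read contained inside another); the function names and the `min_len` threshold are hypothetical and do not come from the slides or from any real assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

With repetitive sequence, two unrelated reads can share the same overlap, so a greedy merge of this kind may join the wrong pair. That is one reason real genomes need the graph-based algorithms.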

Page 18: How we revealed genomes secrets?

18

Assembly

99.99% accuracy in euchromatic portion of genome

Complete assembly vs. draft assembly

Page 19: How we revealed genomes secrets?

19

Assembly

Assembly algorithms

Early strategies

OLC

De Bruijn

String graph

Page 20: How we revealed genomes secrets?

20

OLC assembly

Overlap - Layout - Consensus

Page 21: How we revealed genomes secrets?

21

OLC assembly

•Genome resolution increases with read length

•Benefits from the whole read length

•Conservative in nature

•Better response to self error correctors

•Human genome was constructed primarily using OLC algorithms

•Notable OLC based algorithms: Newbler, PCAP, Arachne, Celera

Page 22: How we revealed genomes secrets?

22

De Bruijn assembly

Replace each read with its set of k-mers
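A minimal sketch of the k-mer replacement step, assuming error-free reads on a single strand. Each read is cut into overlapping k-mers, and each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function names are illustrative and are not taken from any of the assemblers named in this talk.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list[str]:
    """All overlapping substrings of length k, in read order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Build a de Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; every k-mer adds one edge prefix -> suffix.
    """
    graph: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    print(kmers("ATGGC", 3))
    print(de_bruijn(["ATGGC"], 3))
```

Because the graph only stores k-mers, no pairwise read overlaps have to be computed or kept, but any repeat longer than k collapses onto the same nodes.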

Page 23: How we revealed genomes secrets?

23

De Bruijn assembly

•Requires highly accurate reads

•Discards some of the reads' ability to resolve repeats longer than the k-mer size

•Doesn’t require the storage of pairwise overlaps

•Very useful in mammalian MPS projects

•Aggressive in nature

•Error correctors affect the results

•Notable de Bruijn based algorithms: ALLPATHS, SOAPdenovo, ABySS, Velvet

Page 24: How we revealed genomes secrets?

24

String graph assembly

Related to A-Bruijn graph

Page 25: How we revealed genomes secrets?

25

String graph assembly

•Doesn’t decompose reads into k-mers

•Uses the full length of reads

•Benefits from both the graph structure and the full read length

•FALCON is an open source implementation of SG

Page 26: How we revealed genomes secrets?

26

Some Algorithms

Newbler: Implements OLC twice; Constructs unitigs; Constructs contigs

ABySS: Addresses memory limitations; Simplification; Doesn’t build scaffolds

Velvet: Algorithms include simplification and error removal

Celera: Revised to CABOG; Reduces homopolymer runs; Builds unitigs from maximal simple paths

SOAPdenovo: Uses pre-set thresholds; Bubble removal based on coverage; Uses OLC and DBG techniques

Page 27: How we revealed genomes secrets?

27

Algorithms overview

Page 28: How we revealed genomes secrets?

28

Third Generation of Sequencing

SMS: Single Molecule Sequencing

Quiver Nanocorrect

Page 29: How we revealed genomes secrets?

29

Advances in Bioinformatics and technology

•Reducing computational space requirements

•More sensitive variant detections

•Advances in read length

•Single cell sequencing

•Optical mapping

•Metagenomics

•TruSeq

Page 30: How we revealed genomes secrets?

30

Genome sequencing

STEP 1

Sample preparation

STEP 2

Sequencing

STEP 3

Assembly

STEP 4

Annotation

Page 31: How we revealed genomes secrets?

31

Annotation

ORFs

GO

Variation

RNA-Seq

Retrotransposons

Structural

Functional
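As a small illustration of the ORF part of structural annotation, the sketch below scans the three forward reading frames for an ATG followed by an in-frame stop codon. It is a toy under stated assumptions (forward strand only, standard stop codons, no introns) and is not how a gene predictor such as MAKER-P works; `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of forward-strand ORFs.

    An ORF starts at the first ATG in a frame and ends at the next
    in-frame stop codon; `end` is exclusive and includes the stop.
    `min_codons` counts the codons before the stop, ATG included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for begin, end in find_orfs(dna):
        print(begin, end, dna[begin:end])
```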

Page 32: How we revealed genomes secrets?

32

Genome Browsers

A graphical interface for display of genomic data

Page 33: How we revealed genomes secrets?

33

Genome browsers

Apollo

IGV

NCBI Genome Workbench

UCSC Genome Browser

Artemis

Page 34: How we revealed genomes secrets?

34

Peanut Genome

STEP 1

Sample preparation

STEP 2

Sequencing

STEP 3

Assembly

STEP 4

Annotation

Page 35: How we revealed genomes secrets?

Introduction

• Arachis hypogaea
• 46 million tons
• Endemic to South America
• Staple food

Page 36: How we revealed genomes secrets?

Genome characteristics

• Genome size: 2.7 Gb
• 40 chromosomes in tetraploid
• Repetitive content: 64%
• No large change in genome size since polyploidy

Page 37: How we revealed genomes secrets?

Goal of article

• Sequence and annotate 2 candidate ancestors of peanut
• Sequence peanut transcripts
• Find real ancestors of peanut
• Propose site of peanut domestication

Page 38: How we revealed genomes secrets?

Methods

• Plant samples
– Seeds from Brazilian Arachis germplasm collection

• Genome sequencing and assembly
– Illumina HiSeq 2000
– Paired-end libraries with 250 bp, 500 bp, 2 kb, 5 kb, 10 kb & 20 kb insert sizes
– 40 kb fosmid-based libraries
– 160X coverage in 90-150 bp reads
– Quality filtering
– COPE
– SOAPdenovo 2.05
– GapCloser
– TruSeq
– Linkage maps

• Genome annotation
– Transposons: RepeatMasker
– Gene prediction: MAKER-P
– Gene duplications
– DR genes: BLASTP
– Gene evolution: DAGchainer
– Synteny: MUMmer, CViT

• Transcript sequencing and assembly
– FastQC
– Trinity package
– Transcript accuracy estimation: GSNAP
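The "160X coverage" figure follows the usual depth relation: coverage = number of reads × read length / genome size. The sketch below shows the arithmetic only. The 100 bp read length and 1 Gb genome size in the usage example are round, hypothetical numbers chosen for illustration, not values reported for the peanut project.

```python
import math


def coverage(num_reads: int, read_len: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return num_reads * read_len / genome_size


def reads_needed(target_coverage: float, read_len: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size / read_len)


if __name__ == "__main__":
    # Hypothetical inputs: 160X target, 100 bp reads, 1 Gb genome.
    n = reads_needed(160, 100, 1_000_000_000)
    print(n, coverage(n, 100, 1_000_000_000))
```

Average depth says nothing about evenness, so regions affected by coverage bias can still end up under-sampled.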

Page 39: How we revealed genomes secrets?

Results

• Sequencing and assembly
– 1211 and 1512 Mb for A and B genome
– 10 kb scaffolds
– Genetic maps to resolve chimeric scaffolds
– Molecular markers to resolve scaffold order
– 96 and 99.2% of the sequence in 10 chromosome pseudomolecules
– Use of 14 BACs

Page 40: How we revealed genomes secrets?

Results

• Transposons
– 61.7 and 68.5%
– Mostly shared
– Macroscale: similar
– Microscale: abundant differences
– LTR retrotransposons: half of each genome
– LINEs: highest in plants

Page 41: How we revealed genomes secrets?

Results

Page 42: How we revealed genomes secrets?

Results

• Gene annotation and duplications
– 36734 genes in A. duranensis
– 41840 genes in A. ipaensis
– More local duplications in B genome

Page 43: How we revealed genomes secrets?

Results

• DNA methylation
– Whole genome bisulfite sequencing
– MethylC-Seq
– 8X and 10X coverage
– Similar genic methylation patterns

Page 44: How we revealed genomes secrets?

Results

• DR genes
– 345 and 397 respectively
– Largest clusters in distal regions
– Root-knot nematode resistance genes
– Rust resistance genes

Page 45: How we revealed genomes secrets?

Results

• Gene evolution
– Ks parameter
– Ks: 0.95 and 0.90
– Divergence: 2.16 million years ago
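Ks (synonymous substitutions per synonymous site) is commonly converted to a divergence time with a simple molecular clock, T = Ks / (2r), where r is the synonymous substitution rate per site per year. The factor 2 is there because both lineages accumulate changes. The sketch below shows that relation only. The inputs in the usage example are hypothetical round numbers, and the slides do not state which Ks value or rate produced the 2.16-million-year estimate.

```python
def divergence_time_years(ks: float, rate_per_site_per_year: float) -> float:
    """Molecular-clock estimate T = Ks / (2 * r), in years."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only: Ks = 0.1, r = 1e-8.
    print(divergence_time_years(0.1, 1e-8))
```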

Page 46: How we revealed genomes secrets?

Results

• Chromosomal structure and synteny
– Mostly symmetrical chromosomes
– 2, 3, 4, 10: collinear
– 5, 6, 9: large inversion in one arm
– 1: large inversion in both arms
– 7 & 8: complete rearrangement
– Distal, proximal switch

Page 47: How we revealed genomes secrets?

Results

Page 48: How we revealed genomes secrets?

Results

Page 49: How we revealed genomes secrets?

Results

Page 50: How we revealed genomes secrets?

Results

• Comparison with tetraploid peanut
– One-to-one correspondence
– 98.3 and 99.9% identity
– Genetic recombination in collinear homeologous chromosomes
– Tendency toward A. ipaensis genome
– Divergence: 247,000 and 9,400 years

Page 51: How we revealed genomes secrets?

Results

Page 52: How we revealed genomes secrets?

Results

• Tetraploid transcript assembly with diploid guide
– De novo
– Parsing into A and B followed by separate assembly
– Parsing into A and B followed by genome-guided assembly
– The last is the most accurate (68.5% mismatch-free mapping)

Page 53: How we revealed genomes secrets?

Discussion

• B subgenome nearly identical to A. ipaensis
• The site of occurrence of A. ipaensis
• Cultivated peanut story

Page 54: How we revealed genomes secrets?

Thank You