How did we reveal the genome's secrets?
Uploaded by: ehsan-sepahi
Category: Science
Transcript of How did we reveal the genome's secrets?
1
In the name of Allah
How did we reveal the genome's secrets? The peanut genome as an example
2
The Past
3
19th century
4
20th century
5
20th century
6
20th century
Homo sapiens genome = 3,000,000,000 base pairs = 3,000,000,000 dollars
7
21st century
8
21st century
>3000000 base pairs
9
21st century
10
NGS: Next Generation Sequencing
2nd generation of sequencers
Speed, Cost, Sample size, Accuracy
Benefits of FGS over NGS
Third generation of sequencers
11
Genome sequencing
STEP 1: Sample preparation
STEP 2: Sequencing
STEP 3: Assembly
STEP 4: Annotation
12
Sample preparation
Solid-phase amplification
Emulsion PCR
Primer immobilized
Template immobilized
Polymerase immobilized
13
Genome sequencing
STEP 1: Sample preparation
STEP 2: Sequencing
STEP 3: Assembly
STEP 4: Annotation
14
Sequencing
Sequencing by synthesis
Sequencing by ligation
Ion Semiconductor
Real-time sequencing
Pyrosequencing
15
Genome sequencing
STEP 1: Sample preparation
STEP 2: Sequencing
STEP 3: Assembly
STEP 4: Annotation
16
Assembly
Benefits of FGS over NGS
Longer reads
Sensitivity
Coverage bias
Variation
17
Assembly
Under ideal conditions, assembly is simply merging reads with maximal overlap.
But genome organization is complex:
Coverage, Repetitive sequences
Length, Copy number, Sequence
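The "merge reads with maximal overlap" idea can be sketched as a greedy loop. This is a minimal illustration, not the method of any assembler named in these slides: the function names `overlap` and `greedy_assemble`, the `min_len` cutoff, and the toy reads are all assumptions made for the example, and the reads are assumed to be error-free.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest suffix-prefix overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the result stays fragmented
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical error-free reads sampled from one short sequence
toy_reads = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
print(greedy_assemble(toy_reads))
```

On repeat-free toy data the greedy merge recovers a single sequence. When a repeat is longer than the reads, several overlaps look equally good and the greedy choice can join the wrong copies, which is the complication the slide points to.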
18
Assembly
99.99% accuracy in the euchromatic portion of the genome
Complete assembly vs. draft assembly
19
Assembly
Assembly algorithms
EARLY STRATEGIES
OLC
DE BRUIJN
STRING GRAPH
20
OLC assembly
Overlap - Layout - Consensus
21
OLC assembly
•Genome resolution increases with read length
•Benefits from the whole read length
•Conservative in nature
•Better response to self error correctors
•The human genome was constructed primarily using OLC algorithms
•Notable OLC-based algorithms: Newbler, PCAP, Arachne, Celera
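The three OLC phases can be sketched on toy data as follows. This is a simplified illustration under stated assumptions, not the implementation used by Newbler, PCAP, Arachne, or Celera: overlaps are exact suffix-prefix matches (real assemblers use error-tolerant alignment), the layout simply follows each read's best outgoing overlap, and all function names are hypothetical.

```python
from collections import Counter


def suffix_prefix(a, b, min_len):
    """Longest exact overlap between the end of a and the start of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlap_step(reads, min_len=3):
    """Overlap: compare all read pairs, keep each read's best outgoing overlap."""
    best = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = suffix_prefix(a, b, min_len)
                if n > best.get(i, (None, 0))[1]:
                    best[i] = (j, n)
    return best


def layout_step(reads, best):
    """Layout: start at a read nothing overlaps into, follow best edges,
    and record each read's offset in the growing contig."""
    targets = {j for j, _ in best.values()}
    starts = [i for i in range(len(reads)) if i not in targets]
    if not starts:
        return []
    cur, offset, placed, seen = starts[0], 0, [], set()
    while cur is not None and cur not in seen:
        placed.append((cur, offset))
        seen.add(cur)
        edge = best.get(cur)
        if edge is None:
            break
        offset += len(reads[cur]) - edge[1]
        cur = edge[0]
    return placed


def consensus_step(reads, placed):
    """Consensus: majority vote over the bases stacked in each column."""
    columns = {}
    for idx, off in placed:
        for k, base in enumerate(reads[idx]):
            columns.setdefault(off + k, Counter())[base] += 1
    return "".join(columns[p].most_common(1)[0][0] for p in sorted(columns))


# Hypothetical 8 bp reads tiled every 2 bp over a made-up 16 bp sequence
genome = "ATGGCGTGCAATCCGA"
reads = [genome[i:i + 8] for i in (4, 0, 8, 2, 6)]
best = overlap_step(reads)
print(consensus_step(reads, layout_step(reads, best)))
```

Because whole reads are compared pairwise, longer reads give longer and more specific overlaps, which matches the slide's point that resolution increases with read length. The all-pairs comparison is also the cost that de Bruijn methods avoid.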
22
De Bruijn assembly
Replace each read with a set of k-mers
23
De Bruijn assembly
•Requires highly accurate reads
•Discards some of the ability of reads to resolve repeats longer than the k-mer size
•Doesn’t require the storage of pairwise overlaps
•Very useful in mammalian MPS projects
•Aggressive in nature
•Error correctors affect the results
•Notable de Bruijn-based algorithms: ALLPATHS, SOAPdenovo, ABySS, Velvet
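Replacing reads with k-mers can be sketched as below. This is a minimal illustration assuming error-free reads and a toy value of k; it is not the code of ALLPATHS, SOAPdenovo, ABySS, or Velvet, and the names `build_graph` and `walk_unambiguous` are hypothetical.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Nodes are (k-1)-mers; every distinct k-mer adds one edge prefix -> suffix."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph):
    """Spell a contig from a source node, stopping at the first branch or cycle."""
    has_incoming = {v for succ in graph.values() for v in succ}
    sources = sorted(n for n in graph if n not in has_incoming)
    if not sources:
        return ""  # no clear starting point, e.g. a repeat closed a cycle
    node = sources[0]
    contig, seen = node, {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical error-free reads
toy_reads = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
print(walk_unambiguous(build_graph(toy_reads, k=4)))  # one unbranched path
print(walk_unambiguous(build_graph(toy_reads, k=3)))  # short k: repeats tangle the graph
```

No pairwise overlaps are stored, only the set of k-mers. The price is visible with k=3: sequence shared between positions collapses into the same node, so repeats longer than the node size can no longer be resolved. A single sequencing error would also add spurious k-mers, which is why accurate reads matter here.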
24
String graph assembly
Related to A-Bruijn graph
25
String graph assembly
•Doesn’t decompose reads into k-mers
•Takes the full length of reads
•Benefits from both the graph and the read length
•FALCON is an open source implementation of SG
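One common way to describe a string graph is as an overlap graph between whole reads with the redundant (transitively implied) edges removed. The sketch below shows only that reduction step on a hand-written graph. It is a simplification made for illustration, not FALCON's implementation, and the function name and read labels are hypothetical.

```python
def transitive_reduction(edges):
    """Drop edge a -> c whenever some b has both a -> b and b -> c.

    edges maps each read to the set of reads it overlaps into.
    """
    reduced = {a: set(succ) for a, succ in edges.items()}
    for a, succ in edges.items():
        for b in succ:
            for c in edges.get(b, ()):
                if c != b and c in succ:
                    reduced[a].discard(c)
    return reduced


# Hypothetical overlap graph: each read overlaps the next two reads along the genome
overlaps = {
    "r0": {"r1", "r2"},
    "r1": {"r2", "r3"},
    "r2": {"r3"},
    "r3": set(),
}
print(transitive_reduction(overlaps))
```

After the reduction only the chain r0, r1, r2, r3 remains, so the graph is simple while every node still represents a full-length read.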
26
Some Algorithms
Newbler
ABySS
Velvet
Celera
SOAPdenovo
Revised to CABOG; Reduce homopolymer run; Builds unitigs from maximal simple paths
Implements OLC twice; Construct unitigs; Construct contigs
Algorithms; Simplification; Error removal
Address memory limitations; Simplification; Doesn’t build scaffolds
Uses pre-set thresholds; Bubble removal based on coverage; Uses OLC and DBG techniques
27
Algorithms overview
28
Third Generation of Sequencing
SMS: Single Molecule Sequencing
Quiver
Nanocorrect
29
Advances in Bioinformatics and technology
•Reducing computational space requirements
•More sensitive variant detection
•Advances in read length
•Single cell sequencing
•Optical mapping
•Metagenomics
•TruSeq
30
Genome sequencing
STEP 1: Sample preparation
STEP 2: Sequencing
STEP 3: Assembly
STEP 4: Annotation
Annotation
ORFs
GO
Variation
RNA-Seq
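As an illustration of the ORF part of annotation, the sketch below scans the three forward reading frames for stretches that run from ATG to a stop codon. It is a deliberately naive example: the reverse strand, introns, and alternative start codons are ignored, and `find_orfs`, `min_codons`, and the test sequence are hypothetical choices, not part of any tool listed in these slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    end is exclusive and includes the stop codon; min_codons counts the
    codons before the stop, start codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Made-up sequence with one short ORF in the third frame
example = "CCATGAAACCCGGGTAGTT"
for start, end, frame in find_orfs(example):
    print(frame, example[start:end])
```

Gene predictors used in practice combine signals like these with RNA-Seq evidence and homology, so this should be read as the simplest possible starting point.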
31
Retrotransposons
Structural Functional
32
Genome Browsers
A graphical interface for display of genomic data
33
Genome browsers
Apollo
IGV
NCBI Genome Workbench
UCSC Genome Browser
Artemis
34
Peanut Genome
STEP 1: Sample preparation
STEP 2: Sequencing
STEP 3: Assembly
STEP 4: Annotation
Introduction
• Arachis hypogaea
• 46 million tons
• Endemic to South America
• Staple food
Genome characteristics
• Genome size: 2.7 Gb
• 40 chromosomes in tetraploid
• Repetitive content: 64%
• No large change in genome size since polyploidy
Goal of article
• Sequence and annotate 2 candidate ancestors of peanut
• Sequence peanut transcripts
• Find the real ancestors of peanut
• Propose the site of peanut domestication
Methods
• Plant samples
– Seeds from Brazilian Arachis germplasm collection
• Genome sequencing and assembly
– Illumina HiSeq 2000
– Paired-end libraries with 250 bp, 500 bp, 2 kb, 5 kb, 10 kb & 20 kb insert sizes
– 40 kb fosmid-based libraries
– 160X coverage in 90-150 bp reads
– Quality filtering
– COPE
– SOAPdenovo 2.05
– GapCloser
– TruSeq
– Linkage maps
• Genome annotation
– Transposons: RepeatMasker
– Gene prediction: MAKER-P
– Gene duplications
– DR genes: BLASTP
– Gene evolution: DAGchainer
– Synteny: MUMmer, CViT
• Transcript sequencing and assembly
– FastQC
– Trinity package
– Transcript accuracy estimation: GSNAP
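A coverage figure such as the 160X quoted in the sequencing methods relates read count, read length, and genome size through the usual relation coverage = (number of reads × read length) / genome size. The sketch below works through that arithmetic. The genome size of 1.2 Gb and the 100 bp read length are round, hypothetical inputs chosen for the example and are not values taken from the slides.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical inputs: 1.2 Gb genome, 100 bp reads, 160X target depth
genome_size = 1_200_000_000
n = reads_needed(160, 100, genome_size)
print(n, coverage(n, 100, genome_size))
```

This is an average only; actual depth varies along the genome, which is one reason repetitive regions remain hard to assemble even at high nominal coverage.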
Results
• Sequencing and assembly
– 1211 and 1512 Mb for A and B genome
– 10 kb scaffolds
– Genetic maps to resolve scaffold chimeras
– Molecular markers to resolve scaffold order
– 96 and 99.2% of the sequence in 10 chromosomal pseudomolecules
– 14 BACs used
Results
• Transposons
– 61.7 and 68.5%
– Mostly shared
– Macroscale: similar
– Microscale: abundant differences
– LTR retrotransposons: half of each genome
– LINEs: highest in plants
Results
• Gene annotation and duplications
– 36734 genes in A. duranensis
– 41840 genes in A. ipaensis
– More local duplications in B genome
Results
• DNA methylation
– Whole genome bisulfite sequencing
– MethylC-Seq
– 8X and 10X coverage
– Similar genic methylation patterns
Results
• DR genes
– 345 and 397 respectively
– Largest clusters in distal regions
– Root-knot nematode resistance genes
– Rust resistance genes
Results
• Gene evolution
– Ks parameter
– Ks: 0.95 and 0.90
– Divergence: 2.16 million years ago
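A Ks value is usually turned into a divergence time with T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor 2 accounts for substitutions accumulating on both lineages. The sketch below shows only that arithmetic. The Ks and rate passed in are illustrative assumptions that do not appear on the slide, and the slide does not say which values produced its date.

```python
def divergence_time(ks, rate_per_site_per_year):
    """Years since divergence, assuming a constant molecular clock."""
    return ks / (2 * rate_per_site_per_year)


# Illustrative inputs only (not taken from the slide)
years = divergence_time(ks=0.035, rate_per_site_per_year=8.12e-9)
print(f"{years / 1e6:.2f} million years")
```

The estimate scales directly with the assumed rate, so halving r doubles the inferred age.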
Results
• Chromosomal structure and synteny
– Mostly symmetrical chromosomes
– 2, 3, 4, 10: collinear
– 5, 6, 9: large inversion in one arm
– 1: large inversion in both arms
– 7 & 8: complete rearrangement
– Distal, proximal switch
Results
• Comparison with tetraploid peanut
– One-to-one correspondence
– 98.3 and 99.9% identity
– Genetic recombination in collinear homeologs
– Tendency toward A. ipaensis genome
– 247,000 and 9,400 years of divergence
Results
• Tetraploid transcript assembly with diploid guide
– De novo
– Parsing into A and B followed by separate assembly
– Parsing into A and B followed by genome-guided assembly
– The last is the most accurate (68.5% mismatch-free mapping)
Discussion
• B subgenome nearly identical to A. ipaensis
• The site of occurrence of A. ipaensis
• Cultivated peanut story
Thank You