next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant...
![Page 1: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/1.jpg)
Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, [email protected] // Twitter:@SahaSurya
BTI Plant Bioinformatics Course 2015
http://www.acgt.me/blog/2015/3/7/next-generation-sequencing-must-die
![Page 2: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/2.jpg)
Sequencing over the Ages
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005-2007: 454, Solexa, SOLiD
2011: Ion Torrent, PacBio
2012-2014: Illumina, Illumina HiSeq X, Nanopore MinION
2014: Pinus taeda (24 Gb)
Slide credit: Aureliano Bombarely
3/31/2015 BTI Plant Bioinformatics Course 2015 2
![Page 3: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/3.jpg)
First generation sequencing
Sanger. Annu Rev Biochem. 1988;57:1-28.
Thanks to Nick Loman for the mention
![Page 4: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/4.jpg)
Maxam-Gilbert method
![Page 5: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/5.jpg)
Maxam-Gilbert method
http://en.wikipedia.org/wiki/File:Maxam-Gilbert_sequencing_en.svg
https://www.nationaldiagnostics.com/electrophoresis/article/maxam-gilbert-sequencing
![Page 6: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/6.jpg)
Sanger method
Frederick Sanger, 13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize for Chemistry in 1958 and 1980. Published the dideoxy chain termination method or “Sanger method” in 1977
http://dailym.ai/1f1XeTB
![Page 7: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/7.jpg)
Sanger method
http://en.wikipedia.org/wiki/File:Sanger-sequencing.svg
http://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg
![Page 8: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/8.jpg)
First generation sequencing
• Very high quality sequences (99.999% or Q50)
• Very low throughput
| Platform | Run Time | Read Length | Reads / Run | Total nucleotides sequenced | Cost / Mb |
|---|---|---|---|---|---|
| Capillary Sequencing (ABI3730xl) | 20m-3h | 400-900 bp | 96 or 384 | 1.9-84 Kb | $2400 |
http://www.hindawi.com/journals/bmri/2012/251364/tab1/
![Page 9: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/9.jpg)
Next generation sequencing
![Page 10: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/10.jpg)
https://twitter.com/kbradnam/status/443153578429923328
• Second generation
• Third generation
• Fourth generation
• Next-next-generation
• Next-next-next generation
http://www.acgt.me/blog/2015/3/10/next-generation-sequencing-must-diepart-2
![Page 11: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/11.jpg)
Name the specific technology used to generate the data
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS1/RSII
– Ion Torrent Proton/PGM
– SOLiD
– Oxford Nanopore
http://www.acgt.me/blog/2015/3/10/next-generation-sequencing-must-diepart-2
![Page 12: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/12.jpg)
454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://www.genengnews.com/
GS FLX Titanium
https://mariamuir.com/wp-content/uploads/2013/04/rip.gif
![Page 13: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/13.jpg)
Illumina
| | MiSeq | NextSeq 500 | HiSeq 2500/3000/4000 | HiSeq X Ten |
|---|---|---|---|---|
| Output | 0.3-15 Gb | 20-120 Gb | 10-1500 Gb | 900-1800 Gb |
| Number of reads / flow cell | 25 million | 130-400 million | 300 million - 2.5 billion | 3 billion |
| Read length | 2x300 bp | 2x150 bp | 2x250 - 2x125 bp | 2x150 bp |
| Cost | $99K | $250K | $740K | $10M (10 units) |
Source: Illumina
![Page 14: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/14.jpg)
Illumina
$1000 human genome??
![Page 15: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/15.jpg)
Illumina
Mardis 2008. Annu. Rev. Genomics Hum. Genet. 2008. 9:387–402
![Page 16: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/16.jpg)
Illumina
Mardis 2008. Annu. Rev. Genomics Hum. Genet. 2008. 9:387–402
![Page 17: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/17.jpg)
Illumina: TruSeq Long Read
Voskoboynik eLife 2013;2:e00569
![Page 18: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/18.jpg)
Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://smrt.med.cornell.edu/images/pacbio_library_prep-1.gif
![Page 19: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/19.jpg)
Pacific Biosciences SMRT sequencing: Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly (English et al., PLOS One. 2012)
![Page 20: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/20.jpg)
Pacific Biosciences SMRT sequencing: Error correction methods
PBcR Pipeline
![Page 21: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/21.jpg)
Pacific Biosciences SMRT sequencing: Read Lengths
http://www.igs.umaryland.edu/labs/grc/
Mean Read Length: 8391 bp
Maximum Subread Length: 24585 bp
![Page 22: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/22.jpg)
Pacific Biosciences SMRT sequencing: Read Lengths
![Page 23: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/23.jpg)
Oxford Nanopore
https://www.nanoporetech.com/
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
http://halegrafx.com/vector-art/free-vector-despicable-me-minions/
![Page 24: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/24.jpg)
![Page 25: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/25.jpg)
Sequencing Trends
https://www.google.com/trends/
![Page 26: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/26.jpg)
[Chart: Number of Publications, 2008-2014. Series: Illumina, Pacific Biosciences, Roche 454, Ion Torrent]
[Chart: Increase in Number of Publications, 2009-2014. Series: Illumina, Pacific Biosciences, Roche 454, Ion Torrent]
[Chart: % Increase in Number of Publications, 2009-2014. Series: Pacific Biosciences, Roche 454, Ion Torrent]
![Page 27: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/27.jpg)
Hi-C Crosslinking
![Page 28: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/28.jpg)
Others
• Ion Torrent Proton/PGM
• SOLiD
• Helicos
• Supporting technologies
– BioNano
– Nabsys
– OpGen
– 10X Genomics
– Fluidigm
![Page 29: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/29.jpg)
Comparison
![Page 30: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/30.jpg)
Next generation sequencing
| Platform | Run Time | Read Length | Quality | Total nucleotides sequenced | Cost / Mb |
|---|---|---|---|---|---|
| 454 Pyrosequencing | 24h | 700 bp | Q20-Q30 | 1 Gb | $10 |
| Illumina MiSeq | 27h | 2x300 bp | >Q30 | 15 Gb | $0.15 |
| Illumina HiSeq 2500 | 1-10 days | 2x250 bp | >Q30 | 3000 Gb | $0.05 |
| Ion Torrent | 2h | 400 bp | >Q20 | 50 Mb-1 Gb | $1 |
| Pacific Biosciences | 30m-4h | 10 kb to >40 kb | >Q50 consensus, >Q10 single | 500-1000 Mb / SMRT cell | $0.13-$0.60 |
http://www.hindawi.com/journals/bmri/2012/251364/
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431227
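As a rough worked example of what the cost-per-megabase column implies, the sketch below estimates the sequencing cost of a 3.0 Gb genome (the Homo sapiens size quoted earlier in the deck). The 30x coverage depth is an assumption chosen for illustration, not a figure from the slides, and `sequencing_cost` is a hypothetical helper. Pacific Biosciences is left out because the table gives a price range rather than a single value.

```python
# Cost per megabase (USD), copied from the comparison table.
COST_PER_MB = {
    "454 Pyrosequencing": 10.0,
    "Illumina MiSeq": 0.15,
    "Illumina HiSeq 2500": 0.05,
    "Ion Torrent": 1.0,
}

GENOME_MB = 3000  # Homo sapiens, 3.0 Gb expressed in megabases
COVERAGE = 30     # assumption: 30x depth, not stated on the slides


def sequencing_cost(genome_mb, coverage, cost_per_mb):
    """Cost of sequencing genome_mb megabases at the given depth."""
    return genome_mb * coverage * cost_per_mb


estimates = {
    platform: sequencing_cost(GENOME_MB, COVERAGE, cost)
    for platform, cost in COST_PER_MB.items()
}

for platform, cost in estimates.items():
    print(f"{platform}: ${cost:,.0f}")
```

These figures only multiply the per-megabase price by the amount of sequence required; library preparation, storage, and analysis are not included.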
![Page 31: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/31.jpg)
http://omicsmaps.com/
Next Generation Genomics: World Map of High-throughput Sequencers
![Page 32: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/32.jpg)
https://flxlexblog.wordpress.com/2014/06/11/developments-in-next-generation-sequencing-june-2014-edition/
![Page 33: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/33.jpg)
https://flxlexblog.wordpress.com/2014/06/11/developments-in-next-generation-sequencing-june-2014-edition/
![Page 34: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/34.jpg)
Real cost of Sequencing!!
Sboner, Genome Biology, 2011
![Page 35: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/35.jpg)
Library Types
• Single end (F)
• Paired end (PE, 150-800 bp, Fwd: /1, Rev: /2)
• Mate pair (MP, 2 Kb to 20 Kb)
[Diagram: forward (F) and reverse (R) read orientation for each library type, with 454/Roche and Illumina variants labeled]
Slide credit: Aureliano Bombarely
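The /1 and /2 suffixes for forward and reverse reads can be used to match the two mates of a fragment. The sketch below is one minimal way to do that; `pair_reads` and the read IDs are hypothetical, and it assumes every ID ends in exactly /1 or /2.

```python
def pair_reads(read_ids):
    """Group read IDs that share a fragment name into (forward, reverse) pairs."""
    mates = {}
    for read_id in read_ids:
        # Split "frag_001/1" into the fragment name and the mate suffix.
        name, sep, suffix = read_id.rpartition("/")
        if sep and suffix in ("1", "2"):
            mates.setdefault(name, {})[suffix] = read_id
    # Keep only fragments where both the /1 and the /2 read are present.
    return {name: (m["1"], m["2"]) for name, m in mates.items() if len(m) == 2}


ids = ["frag_001/1", "frag_002/1", "frag_001/2", "frag_003/2"]
pairs = pair_reads(ids)
print(pairs)
```

Reads whose mate is missing (here frag_002 and frag_003) are dropped from the result rather than reported as pairs.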
![Page 36: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/36.jpg)
Implications of Choice of Library
• Reads
• Consensus sequence (Contig)
• Scaffold (or Supercontig): contigs joined using pair read information, with gaps shown as NNNNN
• Pseudomolecule (or ultracontig): scaffolds joined using genetic information (markers) or optical maps
Slide credit: Aureliano Bombarely
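To make the contig-to-scaffold step concrete, the sketch below joins ordered contigs into one scaffold and writes each gap as a run of N, as in the NNNNN on the slide. The function name `build_scaffold`, the contig sequences, and the gap sizes are all made up for illustration; in practice the order and gap sizes would be estimated from paired-read information.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs, writing each estimated gap as a run of N."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for contig, gap in zip(contigs[1:], gap_sizes):
        parts.append("N" * gap)  # unknown bases between two contigs
        parts.append(contig)
    return "".join(parts)


# Three hypothetical contigs separated by gaps of 5 and 2 bases.
scaffold = build_scaffold(["ACGTAC", "GGTTA", "CCAT"], [5, 2])
print(scaffold)
```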
![Page 37: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/37.jpg)
Multiplexing Libraries
Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.
[Diagram: fragments tagged AGTCGT or TGAGCA are pooled and sequenced together]
Slide credit: Aureliano Bombarely
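After sequencing, the pooled reads have to be separated again by tag. The sketch below shows one minimal way to do that with the two example tags from the slide. It assumes the tag sits at the start of each read and must match exactly; the sample names, the reads, and the `demultiplex` function are hypothetical.

```python
from collections import defaultdict

# Tag -> sample name. The tags are from the slide; the sample names are made up.
TAGS = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}


def demultiplex(reads, tags=TAGS):
    """Group reads by the tag at the start of each read and strip the tag.

    Reads that start with no known tag are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


reads = ["AGTCGTACGTTGCA", "TGAGCAGGGTTTAA", "AGTCGTTTTT", "CCCCCCACGT"]
result = demultiplex(reads)
print(result)
```

Exact matching keeps the example short; a sequencing error inside a tag would send that read to "unassigned".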
![Page 38: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/38.jpg)
File Formats
Fasta files:
FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
-Wikipedia
Slide credit: Aureliano Bombarely
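A small parser makes the FASTA layout concrete. The sketch below relies on the usual convention, not spelled out on the slide, that each record starts with a ">" header line followed by one or more sequence lines. `parse_fasta` and the two example records are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>seq1 hypothetical nucleotide record
ACGTACGT
TTGA
>pep1 hypothetical peptide record
MKVL
"""
records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```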
![Page 39: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/39.jpg)
File Formats
Fastq files:
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
-Wikipedia
• Single ID line with an at symbol (“@”) in the first column.
• The sequence can span multiple lines after the ID line.
• Single line with a plus symbol (“+”) in the first column, marking the start of the quality section.
• The “+” line may repeat the ID.
• Quality values can span multiple lines after the “+” line, but their total length must be identical to that of the sequence.
Slide credit: Aureliano Bombarely
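The rules above can be sketched as a small parser. Because a quality line may itself begin with "@", the sketch reads quality lines until their total length matches the sequence length instead of looking for the next "@". `parse_fastq` and the example reads are hypothetical, and the code is a teaching sketch rather than a replacement for an established FASTQ library.

```python
def parse_fastq(text):
    """Return a list of (read_id, sequence, quality) tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records, i = [], 0
    while i < len(lines):
        if not lines[i].startswith("@"):
            raise ValueError(f"expected '@' ID line, got: {lines[i]!r}")
        read_id = lines[i][1:]
        i += 1
        # Sequence lines run until the '+' separator line.
        seq_parts = []
        while i < len(lines) and not lines[i].startswith("+"):
            seq_parts.append(lines[i])
            i += 1
        sequence = "".join(seq_parts)
        i += 1  # skip the '+' line, which may or may not repeat the ID
        # Quality lines run until they cover the whole sequence.
        quality = ""
        while i < len(lines) and len(quality) < len(sequence):
            quality += lines[i]
            i += 1
        if len(quality) != len(sequence):
            raise ValueError(f"quality length does not match sequence for {read_id}")
        records.append((read_id, sequence, quality))
    return records


example = """@read1
ACGT
TTGA
+read1
IIII
@III
@read2
GGCC
+
!!##
"""
records = parse_fastq(example)
for read_id, sequence, quality in records:
    print(read_id, sequence, quality)
```

In the example, the second quality line of read1 starts with "@", which is why the parser counts characters rather than trusting the first column.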
![Page 40: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/40.jpg)
Quality control: Encoding
Fastq files:
!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)
KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
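Decoding a quality string amounts to subtracting the offset from the ASCII code of each character. The sketch below shows this for both encodings; `decode_quality` is a hypothetical helper, and the caller is assumed to know which offset the file uses.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("character below the offset; wrong encoding?")
    return scores


phred33 = decode_quality("!5I")             # Phred+33: '!' is ASCII 33
phred64 = decode_quality("@Th", offset=64)  # Phred+64: '@' is ASCII 64
print(phred33, phred64)
```

The two example strings look different but decode to the same scores, which is why the offset has to be known before any quality filtering.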
![Page 41: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/41.jpg)
Quality control: Encoding
![Page 42: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/42.jpg)
Quality control: Encoding
Phred score of a base is: Qphred = -10 log10(e)
where e is the estimated probability of a base being wrong
http://en.wikipedia.org/wiki/Phred_quality_score
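The formula can be applied in both directions: from an error probability to a Phred score, and from a score back to a probability. The sketch below does both; the function names are hypothetical, and the scores in the loop are the ones quoted in the deck's comparison tables.

```python
import math


def phred_from_error(e):
    """Phred score for an estimated error probability e (0 < e <= 1)."""
    return -10 * math.log10(e)


def error_from_phred(q):
    """Estimated error probability for a Phred score q."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 50):
    e = error_from_phred(q)
    print(f"Q{q}: error probability {e:g}, accuracy {100 * (1 - e):g}%")
```

For example, Q50 corresponds to one expected error in 100,000 bases, which matches the 99.999% quoted for capillary sequencing.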
![Page 43: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/43.jpg)
Pre-processing: Tools
Quality control and trimming
• FastQC
• FASTX toolkit
• Trimmomatic
• Scythe
Joining paired-end reads
• fastq-join
• FLASH
• PANDAseq
![Page 44: next-generation-sequencing-must-die Surya Saha · Next generation sequencing 3/31/2015 BTI Plant Bioinformatics Course 2015 30 Run Time Read Length Quality Total nucleotides sequenced](https://reader034.fdocuments.net/reader034/viewer/2022042304/5ecfbc6385fee802e977939f/html5/thumbnails/44.jpg)
Thank you!!