Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson...
![Page 1: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/1.jpg)
Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, [email protected] // Twitter:@SahaSurya
BTI Plant Bioinformatics Course 2018
http://www.acgt.me/blog/2015/3/7/next-generation-sequencing-must-die
![Page 2: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/2.jpg)
Sequencing over the Ages
Slide concept: Aureliano Bombarely
Timeline (years and labels as recovered from the slide graphic):
• 1953: DNA structure discovery
• 1977: Sanger DNA sequencing by chain-terminating inhibitors
• 1984: Epstein-Barr virus (170 Kb)
• 1987: ABI 370 sequencer
• 1995: Haemophilus influenzae (1.83 Mb)
• 2001: Homo sapiens (3.0 Gb)
• 2005 to 2007: 454, Solexa, SOLiD
• 2011: Ion Torrent, PacBio
• 2014: Pinus taeda (24 Gb), Nanopore MinION
• 2015: 10X Genomics
Other labels on the graphic: Illumina, Illumina HiSeq X; year ticks for 2012 and 2013.
4/2/2018 BTI Plant Bioinformatics Course 2018 2
![Page 3: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/3.jpg)
First generation sequencing
Sanger. Annu Rev Biochem. 1988;57:1-28.
Thanks to Nick Loman for the mention
![Page 4: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/4.jpg)
Sanger method
Frederick Sanger (13 Aug 1918 – 19 Nov 2013)
Won the Nobel Prize for Chemistry in 1958 and 1980. Published the dideoxy chain termination method or “Sanger method” in 1977
http://dailym.ai/1f1XeTB
![Page 5: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/5.jpg)
Sanger method
http://en.wikipedia.org/wiki/File:Sanger-sequencing.svg
http://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg
![Page 6: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/6.jpg)
First generation sequencing
• Very high quality sequences (99.999% or Q50)
• Very very low throughput
| Platform | Run time | Read length | Reads / run | Total nucleotides sequenced | Cost / Mb |
|---|---|---|---|---|---|
| Capillary sequencing (ABI3730xl) | 20 min - 3 h | 400-900 bp | 96 or 384 | 1.9-84 Kb | $2400 |
http://www.hindawi.com/journals/bmri/2012/251364/tab1/
![Page 7: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/7.jpg)
Next generation sequencing
![Page 8: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/8.jpg)
Name the specific technology used to generate the data
– Illumina HiSeq/MiSeq/NextSeq/NovaSeq
– Pacific Biosciences RS I/RS II/Sequel
– Ion Torrent Proton/PGM
– Oxford Nanopore
http://www.acgt.me/blog/2015/3/10/next-generation-sequencing-must-diepart-2
![Page 9: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/9.jpg)
454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://www.genengnews.com/
GS FLX Titanium
https://mariamuir.com/wp-content/uploads/2013/04/rip.gif
![Page 10: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/10.jpg)
Illumina
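| | Instrument 1 | Instrument 2 | Instrument 3 | Instrument 4 |
|---|---|---|---|---|
| Output | 15 Gb | 120 Gb | 1500 Gb | 1800 Gb |
| Max number of reads / run | 25 million | 400 million | 5 billion | 6 billion |
| Max read length | 2x300 bp | 2x150 bp | 2x125 to 2x250 bp (RR mode) | 2x150 bp |
| Cost | $99K | $250K | $740K | $10M (10 units) |

Instrument names were part of the slide graphic; the recovered model labels are 2500 / 3000 / 4000 and 500 / 550.
Source: Illumina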
![Page 11: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/11.jpg)
![Page 12: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/12.jpg)
Illumina
Mardis 2008. Annu. Rev. Genomics Hum. Genet. 2008. 9:387–402
![Page 13: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/13.jpg)
![Page 14: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/14.jpg)
Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://smrt.med.cornell.edu/images/pacbio_library_prep-1.gif
RS II
Sequel
![Page 15: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/15.jpg)
Pacific Biosciences SMRT sequencing
Error correction methods
PBcR Pipeline
![Page 16: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/16.jpg)
Pacific Biosciences SMRT sequencing
Read Lengths
![Page 17: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/17.jpg)
Oxford Nanopore
https://www.nanoporetech.com/
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
http://halegrafx.com/vector-art/free-vector-despicable-me-minions/
![Page 18: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/18.jpg)
![Page 19: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/19.jpg)
http://lab.loman.net/2017/03/09/ultrareads-for-nanopore/
E. coli K-12 MG1655 on a standard FLO-MIN106 (R9.4) flowcell
![Page 20: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/20.jpg)
Long range scaffolding
![Page 21: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/21.jpg)
Hi-C Crosslinking
![Page 22: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/22.jpg)
http://mms.businesswire.com/media/20150225005296/en/454639/5/GemCodePlatform.jpg
• Long-read information from short reads using 14 bp barcodes
• Very low input DNA (as low as 0.625 ng)
• Short library preparation time
• 1 ng of DNA is split across 100,000 Gel bead-in-EMulsions (GEMs)
• Chromium instrument for single-cell RNAseq
GemCode
![Page 23: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/23.jpg)
http://www.bionanogenomics.com/technology/why-genome-mapping/
![Page 24: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/24.jpg)
Many Others..
• Ion Torrent Proton/PGM
• Dovetail
• Supporting technologies
– Nabsys
– OpGen
– Fluidigm
http://nextgenseek.com/2012/11/did-you-know-there-are-at-least-14-next-gen-sequence-technology-companies/
![Page 25: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/25.jpg)
Real cost of Sequencing!!
Sboner, Genome Biology, 2011
![Page 26: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/26.jpg)
So What Sequencer Do I Use??
Microbial genome
• Draft genome
– Illumina MiSeq (100-130X)
– Illumina HiSeq (<200X)
• Complete genome
– Pacific Biosciences (80-100X)
• Amplicons (16S, ITS)
– Illumina MiSeq
Eukaryotic genome
• De novo assembly
– Pacific Biosciences (70-80X)
– Illumina HiSeq (100X+)
– 10X Genomics
– Hi-C
• Genotyping (GBS)
– Illumina HiSeq
• BACs
– Pacific Biosciences
$$$$ ????
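The coverage targets above (for example 100X) translate into a read count through coverage = reads x read length / genome size. A minimal sketch of that arithmetic follows; the function name and the 5 Mb example genome are illustrative assumptions, not values from the slides.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp, paired=False):
    """Reads (or read pairs, if paired=True) needed to reach a target coverage.

    Uses coverage = total sequenced bases / genome size and rounds up.
    """
    bases_per_unit = read_length_bp * (2 if paired else 1)
    return math.ceil(genome_size_bp * coverage / bases_per_unit)


# Hypothetical example: a 5 Mb microbial genome at 100X with 2x300 bp pairs.
pairs = reads_needed(5_000_000, 100, 300, paired=True)
print(pairs)
```

This ignores reads lost to quality trimming and uneven coverage, so real projects usually budget extra.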
![Page 27: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/27.jpg)
Genome Assembly
http://biobeans.blogspot.com/2012/11/bioinformatics-genome-assembly.html
![Page 28: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/28.jpg)
Slide credit: Torsten Seemann
![Page 29: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/29.jpg)
Whole Genome Shotgun Sequencing
Slide credit: cbcb.umd.edu
![Page 30: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/30.jpg)
Genome Sequencing Strategies
4/3/2018 Centre for Agricultural Bioinformatics, Pusa 30
International Human Genome Sequencing Consortium 2001
Overlap Layout Consensus
http://contig.wordpress.com/
cbcb.umd.edu
Long read sequencing
![Page 31: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/31.jpg)
Overlap-Layout-Consensus
Slide source: Commins 2009
![Page 32: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/32.jpg)
De Bruijn Graph
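The slide names de Bruijn graphs without spelling out the construction. As a rough sketch (not the method of any particular assembler): every k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name and the toy read are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph as {prefix (k-1)-mer: [suffix (k-1)-mers]}."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy example with a single read and k = 3.
print(de_bruijn_edges(["ACGT"], 3))
```

Real assemblers also handle reverse complements, sequencing errors, and repeats, none of which this sketch attempts.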
![Page 33: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/33.jpg)
Ingredients for a Good Assembly
Slide credit: Mike Schatz
![Page 34: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/34.jpg)
The diploid reference genome
![Page 35: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/35.jpg)
Gene to Genome – The BIG picture
Figure labels: CHROMOSOMES, SCAFFOLDS, CONTIGS, CONTIG GAPS, SCAFFOLD GAPS, GENES
Example genes: MAP (chr 1), Ovate (chr 1), TM (chr 9), L2 (chr 10)
![Page 36: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/36.jpg)
State of the SL2.50 Build
[Chart: x-axis 0 to 12, y-axis 0% to 100%; series: Sequence, Scaffold gap length, Component gap length]
Length 823 Mb
Sequence 737 Mb
Contig gaps 43 Mb (5.30%)
Scaffold gaps 42 Mb (5.17%)
Total gaps 86 Mb (10.47%)
Reference assembly but plenty of gaps!!
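Gap figures like these can be approximated from an assembly by counting N characters. The sketch below assumes gaps are written as runs of N, which is the usual convention; it cannot tell contig gaps from scaffold gaps, since that distinction comes from the assembly's AGP or similar metadata. The function name and the toy sequence are illustrative.

```python
def gap_summary(sequence):
    """Summarise how much of a sequence is gap (N) versus called bases."""
    seq = sequence.upper()
    total = len(seq)
    gaps = seq.count("N")
    percent = round(100 * gaps / total, 2) if total else 0.0
    return {
        "length": total,
        "sequence": total - gaps,
        "gaps": gaps,
        "gap_percent": percent,
    }


# Toy example: 10 bases, 4 of them gap.
print(gap_summary("ACGTNNNNAC"))
```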
![Page 37: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/37.jpg)
Summary
Any genome assembly:
• Is a hypothesis that needs to be refined
• Is a work in progress
• Can sometimes be misleading
So is genome annotation…..
![Page 38: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/38.jpg)
Gene structure improvement example
Panels: Fusion of split genes; UTR extension (required for 3’ RNAseq)
Tracks in each panel: ITAG3.2, ITAG2.40, RNAseq XY plot
![Page 39: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/39.jpg)
Quality check - Annotation Edit Distance (AED)
Based on RNAseq data support
AED = 0: complete support
AED = 1: lack of support
AED provides a means to evaluate quality of annotations given RNAseq and ortholog evidence
[Plot: cumulative fraction of transcripts (y-axis) vs. Annotation Edit Distance (x-axis)]
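The slides give the endpoints of AED but not the formula. A sketch under a stated assumption: AED is taken here as 1 - (SN + SP) / 2 at the nucleotide level, the definition commonly used with MAKER, where SN is the fraction of evidence bases covered by the annotation and SP the fraction of annotation bases covered by evidence. Function names and coordinates are illustrative.

```python
def covered_bases(intervals):
    """Set of positions covered by 1-based, inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation, evidence):
    """Annotation Edit Distance: 0 = complete support, 1 = no support."""
    a = covered_bases(annotation)
    e = covered_bases(evidence)
    if not a or not e:
        return 1.0
    overlap = len(a & e)
    sensitivity = overlap / len(e)   # evidence bases explained by the annotation
    specificity = overlap / len(a)   # annotation bases backed by evidence
    return 1 - (sensitivity + specificity) / 2


# Hypothetical exon coordinates: half of each interval overlaps the other.
print(aed([(1, 100)], [(51, 150)]))
```

Using position sets keeps the sketch short; for whole genomes an interval-based overlap would be far more memory efficient.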
![Page 40: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/40.jpg)
Solanaceae Apollo annotation editor
Genomes available in Apollo
• Request access to Apollo by contacting SGN
• More organisms will be added as they become available.
For creating account: https://solgenomics.net/contact/form
Apollo: collaborative genome annotation editor
https://github.com/gmod/apollo
http://genomearchitect.org
![Page 41: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/41.jpg)
Editing an existing gene model
![Page 42: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/42.jpg)
Correction of predicted gene model
![Page 43: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/43.jpg)
Information Editor
• DBXRefs (InterPro, Pfam)
• PubMed IDs
• Gene Ontology IDs (GO)
• Comments
![Page 44: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/44.jpg)
Cornell Sequencing Core
• Illumina HiSeq 2500 (Rapid run and High output)
• Illumina MiSeq
• Illumina NextSeq 500
• 10X Genomics GemCode
http://www.biotech.cornell.edu/brc/genomics/services/price-list#overlay-context=brc/genomics-facility/next-generation-sequencing
$$$
![Page 45: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/45.jpg)
Library Types
Single end
Paired end (PE, 150-300 bp, Fwd:/1, Rev:/2)
Mate pair (MP, 2 Kb to 20 Kb)
[Diagram: read orientations. Single end: F. Paired end: F, R (Illumina). Mate pair: F R (454/Roche), FR (Illumina).]
Slide credit: Aureliano Bombarely
![Page 46: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/46.jpg)
Implications of Choice of Library
Slide credit: Aureliano Bombarely
• Reads → Consensus sequence (Contig)
• Contigs + Pair Read information → Scaffold (or Supercontig), with gaps shown as NNNNN
• Scaffolds + Genetic information (markers) or Optical maps → Pseudomolecule (or ultracontig)
![Page 47: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/47.jpg)
Multiplexing Libraries
Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.
Slide credit: Aureliano Bombarely
[Diagram: reads carrying the tags AGTCGT and TGAGCA, shown before and after sequencing]
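Sorting pooled reads back into samples by tag can be sketched as below. This assumes the tag sits at the very start of each read (an inline barcode) and requires an exact match; on Illumina instruments the index is often a separate read, and real demultiplexers tolerate mismatches. The sample names are hypothetical; the two tags are the ones on the slide.

```python
from collections import defaultdict


def demultiplex(reads, tags):
    """Assign reads to samples by exact match of a leading tag.

    reads: iterable of read sequences
    tags:  dict mapping tag sequence -> sample name
    The tag is trimmed from each assigned read.
    """
    bins = defaultdict(list)
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            # No tag matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


tags = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}
reads = ["AGTCGTACGT", "TGAGCATTTT", "CCCCCCAAAA"]
print(demultiplex(reads, tags))
```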
![Page 48: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/48.jpg)
Data!!
![Page 49: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/49.jpg)
Fasta files:
It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
-Wikipedia
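A minimal reader for this format, assuming the usual convention that each record starts with a ">" header line followed by one or more sequence lines. The function name and example records are illustrative; for real work a maintained parser such as Biopython's is the safer choice.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


example = [">seq1 demo", "ACGT", "TTGA", ">seq2", "MKV"]
for header, sequence in parse_fasta(example):
    print(header, sequence)
```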
File Formats
Slide credit: Aureliano Bombarely
![Page 50: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/50.jpg)
Fastq files:
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
-Wikipedia
• Single ID line with an at symbol (“@”) in the first column.
• The sequence can span multiple lines after the ID line.
• Single line with a plus symbol (“+”) in the first column to mark the start of the quality values.
• The plus line may repeat the ID.
• Quality values can span multiple lines after the + line, but their total length is identical to that of the sequence.
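The rules above can be turned into a small reader. Because a quality line may itself begin with "@", this sketch decides where a record ends by reading quality characters until their count equals the sequence length, rather than by looking for the next "@". Names and example data are illustrative.

```python
def parse_fastq(lines):
    """Yield (id, sequence, quality) from FASTQ lines, allowing wrapped records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of record: " + header)
        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # Quality lines run until as many characters as the sequence are read.
        qual_parts, qual_len = [], 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("truncated quality for record: " + header)
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError("quality length differs from sequence length")
        yield header[1:], seq, qual


example = ["@r1", "ACGT", "TTGA", "+r1", "IIII", "@@@@", "@r2", "AC", "+", "!!"]
for record in parse_fastq(example):
    print(record)
```

In the example, the second quality line of r1 is "@@@@", which a naive "split on @" parser would mistake for a new record.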
Slide credit: Aureliano Bombarely
File Formats
![Page 51: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/51.jpg)
Quality control: Encoding
Fastq files:
!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)
KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
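Decoding either encoding is a matter of subtracting the offset from each character's ASCII code. A short sketch, with an illustrative function name:

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    offset is 33 for Phred+33 and 64 for Phred+64.
    """
    return [ord(char) - offset for char in quality_string]


print(decode_qualities("!5I"))        # Phred+33
print(decode_qualities("Kh", 64))     # Phred+64
```

Choosing the wrong offset shifts every score by 31, so it is worth checking which encoding a file uses before filtering on quality.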
![Page 52: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/52.jpg)
![Page 53: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/53.jpg)
Quality control: Encoding
http://en.wikipedia.org/wiki/Phred_quality_score
Phred score of a base is: $Q_{\text{phred}} = -10 \log_{10}(e)$
where $e$ is the estimated error probability of a base
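The relation can be applied in both directions. The sketch below (function names are illustrative) shows, for instance, that Q50 corresponds to an error probability of 1 in 100,000, that is 99.999% accuracy.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred score: e = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(e):
    """Phred score for an error probability: Q = -10 * log10(e)."""
    return -10 * math.log10(e)


for q in (10, 20, 30, 50):
    print(q, phred_to_error(q))
```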
![Page 54: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/54.jpg)
Pre-processing: Tools
Trimming
• FastQC
• FASTX toolkit
• Trimmomatic
• Scythe
Joining paired-end reads
• fastq-join
• FLASH
• PANDAseq
![Page 55: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018](https://reader036.fdocuments.net/reader036/viewer/2022062920/5f020d4a7e708231d4025635/html5/thumbnails/55.jpg)
Thank you!!