MES7594-01 Genome Informatics I - Lecture IV. NGS basics Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine Genome Informatics I (2015 Spring)


MES7594-01 Genome Informatics I

- Lecture IV. NGS basics

Sangwoo Kim, Ph.D., Assistant Professor,

Severance Biomedical Research Institute, Yonsei University College of Medicine


Overview

• Goal of this lecture
  – You will learn the basic technologies and properties of Next Generation Sequencing

• Sequencing technologies
  – Sanger sequencing
  – Next generation sequencing
    • Illumina sequencing
    • 454/Ion Torrent sequencing
    • Other sequencing
  – Raw data (fastq)
    • Format/Phred Quality
  – Practice
    • meet the raw data


SEQUENCING TECHNOLOGIES


Traditional Sequencing

1. Genomic DNA is fragmented, then cloned into a plasmid vector and used to transform E. coli

2. For each sequencing reaction, a single bacterial colony is picked and its plasmid DNA is isolated

3. Each cycle sequencing reaction takes place within a microliter-scale volume


Sanger Sequencing


Next Generation Sequencing

• No cloning
  – DNA to be sequenced is used to construct a library of fragments that have synthetic DNAs (adapters) added covalently to each fragment end by use of DNA ligase

• Amplification can be done in parallel
  – Library fragments are amplified in situ on a solid surface

• Sequencing can be done in parallel (in 3 iterative steps)
  – a nucleotide addition step
  – a detection step
  – a wash step


Illumina Sequencing


https://www.youtube.com/watch?v=HMyCqWhwB8E


Ion Torrent Sequencing

1. DNA capture on beads
2. Single bead in a well
3. Add one nucleotide type (A/T/G/C) at a time
4. Detect pH change
   1. Measure the level of change for homopolymer detection
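The flow-and-detect steps above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the instrument's actual signal processing: the function name `flowgram`, the flow order "TACG", and the idealized noise-free signal (exactly equal to the homopolymer length) are all hypothetical choices made for illustration.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Toy model of one-nucleotide-at-a-time sequencing.

    `synthesized` is the strand being built, written 5'->3'.
    Each flow offers a single nucleotide type; every consecutive
    matching base is incorporated in that one flow, so the signal
    (here an idealized integer) grows with homopolymer length.
    The flow order is an assumption, not a documented instrument setting.
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by flow order: {sorted(unknown)}")

    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Incorporate as many copies of this base as the strand needs next.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))  # run == 0 means no pH change in this flow
        flow += 1
    return signals


for base, signal in flowgram("TTAGGGC"):
    print(base, signal)
```

A signal of 3 for the G flow corresponds to the homopolymer GGG. In a real run the measured level is noisy, which is why long homopolymers are harder to call exactly.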


PacBio SMRT sequencing

zero-mode waveguide (ZMW)

http://www.pacificbiosciences.com/products/smrt-technology/


Nanopore sequencing

https://www.youtube.com/watch?v=3UHw22hBpAk


Comparison


NGS DATA

raw data (FASTQ)


FASTA format

A format for DNA (or protein) sequences


FASTQ format (NGS raw data)

A format for NGS reads (FASTA + quality)

[Figure: one read, with its sequence line and quality line labeled]
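To make the record layout concrete, here is a minimal sketch of a reader for the common four-lines-per-read FASTQ layout (header, sequence, "+" line, quality). The function name `read_fastq` and the tiny example record are hypothetical, and the sketch assumes no line wrapping inside a record.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


# Made-up record for illustration only (not taken from sample1.fastq).
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, qual)
```

Every base in the sequence line has exactly one character in the quality line, which is why the two lengths are checked against each other.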


Practice

• First look at NGS data

cd /scratch/2015_GenomeInformatics/public/fastq
ls
less sample1.fastq


@SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC
+SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
#1:BDDDDF?FF@B>:ACFIBCGB3BF@C<?F9?DFBFCFEBFEFIFEIFFFDC>@ABBBB?BBBBBBBB?@:?AA@B@?(:4:>?<AB@:B@@B>>ABBB


Quality

• Each base call (a call for a nucleotide: 'A', 'T', 'C', 'G') has its own quality
  – quality is the machine's confidence in that call


Phred scale

@SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC
+SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
#1:BDDDDF?FF@B>:ACFIBCGB3BF@C<?F9?DFBFCFEBFEFIFEIFFFDC>@ABBBB?BBBBBBBB?@:?AA@B@?(:4:>?<AB@:B@@B>>ABBB

Q = -10 log10(e)

e: probability of the base call being wrong (10%, 1%, 0.1%, 0.01%...)

Q: quality score (10, 20, 30, 40…)

Q + 33: ASCII code of the quality character, looked up in the ASCII code table (+, 5, ?, I…)
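As a worked example of the mapping above, take the quality character 'I', the last one in the slide's list. Its ASCII code is 73, so subtracting the offset of 33 and inverting Q = -10 log10(e) gives:

```latex
\begin{align*}
Q &= \operatorname{ord}(\texttt{I}) - 33 = 73 - 33 = 40 \\
e &= 10^{-Q/10} = 10^{-4} = 0.01\%
\end{align*}
```

This agrees with the slide's pairing of quality score 40 with an error probability of 0.01%.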


Practice

• Pick any sequence and find out where it is from
• Calculate the probability that a base call with quality 'D' is wrong
• (Advanced) Write Python code that transforms Q to e (or vice versa)
  – hint: function chr(i) converts the integer i to its matching ASCII character, e.g. chr(65) = 'A'
  – function ord(c) converts the character c to its matching ASCII code integer, e.g. ord('A') = 65
  – math.log(x, 10) calculates the log10 value of x (math.log10(x) does the same)
    • You must import the math library first (import math)
  – the answer is in public/script/qtoe.py