2: Large-Scale 1 / 42 1 Large!. 2: Large-Scale 2 / 42 High throughput technologies: Sequencing Gene...

  • date post

    21-Dec-2015

Page 1: 2: Large-Scale 1 / 42 1 Large!. 2: Large-Scale 2 / 42 High throughput technologies: Sequencing Gene expression profiling Chip-CHIP and tiling arrays Whole.

1

2: Large-Scale1 / 42

Large!

2: Large-Scale2 / 42

High throughput technologies:

• Sequencing
• Gene expression profiling
• ChIP-chip and tiling arrays
• Whole-genome yeast two-hybrid screens
• Genomic knockout of all single genes
• SNP/CGH
• Methylation profiling …
• Proteome profiling

2: Large-Scale3 / 42

Genomic Sequencing – shotgun sequencing

A single sequencing run usually reads only ~700 bp.

How can we sequence a genome?

2: Large-Scale4 / 42

Genomic Sequencing – Walking.

1. Design a primer
2. Sequence
3. Design a new primer
4. Sequence
5. …

A new primer has to be designed for every step, and each one can only be designed after the previous sequencing result is in.

2: Large-Scale5 / 42

GAGGAGACGAACACCCGTATACAGTCGACG

ACCCCGAGGAGACGAACACCCGTATACAGTCGACGTTTATATATA

GTATACAGTCGACGTTTATATATA

ACCCCGAGGAGACGA

Genomic Sequencing – shotgun sequencing

1. Break DNA into small pieces
2. Sequence each piece
3. Assemble
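As a rough illustration of the assembly step, the three reads shown on this slide can be merged by repeatedly joining the pair with the longest suffix/prefix overlap. This is only a sketch of a greedy strategy: the function names and the minimum-overlap threshold are assumptions made for the example, and real assemblers also have to cope with sequencing errors, repeats and reads from the opposite strand.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = [
    "GAGGAGACGAACACCCGTATACAGTCGACG",
    "GTATACAGTCGACGTTTATATATA",
    "ACCCCGAGGAGACGA",
]
print(greedy_assemble(reads))
```

For these three reads the result is a single contig equal to the full sequence shown on the slide.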

2: Large-Scale6 / 42

After the DNA is isolated (from the tissue/cell/virus), it is fragmented either by restriction enzymes or by mechanical force.

ACGTAACGTATACCCGACTATATGCATTGCATATG “Frayed ends”

1. Break DNA into small pieces

2: Large-Scale7 / 42

←ATACGTAACGTATACCCGAC

TATATGCATTGCATATGGG→3’

5’

5’

3’

To blunt-end (“fix”) frayed ends, one needs a DNA polymerase. In the example above, adding a polymerase is enough: it extends the recessed 3’ ends until both ends are blunt.

Polymerases always extend the chain from the 5’ end towards the 3’ end (5’ → 3’).

2: Large-Scale8 / 42

ACGTAACGTATACCCGAC ATTGCATATGGGCTGAACAT

3’

5’

5’

3’

Polymerases always extend the chain from the 5’ end towards the 3’ end (5’ → 3’).

But what about this case?

←ATACGTAACGTATACCCGAC

TATATGCATTGCATATGGG→

5’3’

3’5’

2: Large-Scale9 / 42

E. coli DNA polymerase I has 3 domains:

One synthesizes DNA (5’ → 3’ polymerase).

One digests DNA 3’ → 5’ (exonuclease).

One digests DNA 5’ → 3’ (exonuclease).

Klenow fragment = a truncated form of this polymerase without the 5’ → 3’ exonuclease activity. It keeps the polymerase and the 3’ → 5’ exonuclease.
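The two activities that matter for blunting can be captured in a small coordinate model. The sketch below is an illustration only, with a hypothetical function name and representation: each strand is an interval on a shared left-to-right axis, the top strand is drawn 5’ → 3’ and the bottom strand 3’ → 5’. A recessed 3’ end is filled in by the polymerase, and a protruding 3’ end is chewed back by the 3’ → 5’ exonuclease, so the blunt duplex always spans from one 5’ end to the other.

```python
def blunt_ends(top, bottom):
    """Blunt a frayed duplex.

    top, bottom: (start, end) intervals on a shared axis, end exclusive.
    Top strand runs 5'->3' left to right, bottom strand 3'->5'.
    The two intervals are assumed to overlap.
    Returns the blunt duplex interval and the list of actions taken.
    """
    top_start, top_end = top
    bottom_start, bottom_end = bottom
    actions = []

    # Left end: the top strand ends in 5', the bottom strand in 3'.
    if top_start < bottom_start:
        # 5' overhang: polymerase extends the recessed bottom 3' end.
        actions.append(("left", "fill-in", bottom_start - top_start))
    elif bottom_start < top_start:
        # 3' overhang: the 3'->5' exonuclease trims the bottom strand.
        actions.append(("left", "chew-back", top_start - bottom_start))

    # Right end: the top strand ends in 3', the bottom strand in 5'.
    if bottom_end > top_end:
        # 5' overhang: polymerase extends the recessed top 3' end.
        actions.append(("right", "fill-in", bottom_end - top_end))
    elif top_end > bottom_end:
        # 3' overhang: the 3'->5' exonuclease trims the top strand.
        actions.append(("right", "chew-back", top_end - bottom_end))

    return (top_start, bottom_end), actions


# Both ends have 5' overhangs: the polymerase activity alone is enough.
print(blunt_ends((0, 20), (5, 25)))
# Both ends have 3' overhangs: the exonuclease activity is needed.
print(blunt_ends((5, 25), (0, 20)))
```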

2: Large-Scale10 / 42

ACGTAACGTATACCCGAC ATTGCATATGGGCTGAACAT

3’

5’

5’

3’

Polymerases always extend the chain from the 5’ end towards the 3’ end (5’ → 3’).

But what about this case? Klenow has 3’ → 5’ exonuclease activity, so it can trim back the protruding 3’ ends.

←ATACGTAACGTATACCCGAC

TATATGCATTGCATATGGG→

5’3’

3’5’

2: Large-Scale11 / 42

GAGGAGACGAACACCCGTATACAGTCGACG

GTATACAGTCGACGTTTATATATA

ACCCCGAGGAGACGA

The pieces are inserted into a vector – e.g., a plasmid. Sequencing is done from both sides.

2. Sequence each piece:

Because every insert is flanked by the same vector sequence, the same primers can be used for all the reads, so the sequencing can be done in parallel.

2: Large-Scale12 / 42

GAGGAGACGAACACCCGTATACAGTCGACG

ACCCCGAGGAGACGA ? GTATACAGTCGACGTTTATATATA

GTATACAGTCGACGTTTATATATA

ACCCCGAGGAGACGA

Shotgun sequencing – why isn’t it a trivial task?

1. By chance, some parts of the genome are not sequenced even once!

2: Large-Scale13 / 42

Shotgun sequencing – Definition of coverage.

5X coverage: each base in the final sequence is present, on average, in 5 reads.

Although the human genome was sequenced at 12X coverage, about 1% of it is still either unassembled or unreliable.
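Coverage follows directly from the number of reads, the read length and the genome size. The sketch below also estimates the fraction of bases that no read covers, assuming reads land uniformly at random so that the number of reads over a base is roughly Poisson distributed; that assumption and the function names are illustrative and not part of the slides.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Average number of reads covering each base (c = N * L / G)."""
    return num_reads * read_length / genome_length


def fraction_uncovered(c):
    """Poisson approximation: probability that a base is in zero reads."""
    return math.exp(-c)


# Hypothetical example: 10,000 reads of 700 bp on a 1.4 Mb genome.
c = coverage(10_000, 700, 1_400_000)
print(f"coverage = {c:.1f}X, uncovered fraction = {fraction_uncovered(c):.4f}")
```

Under this idealized model only about 0.7% of bases are missed at 5X and almost none at 12X, so the unassembled 1% of the human genome is better explained by the other difficulties listed in these slides (repeats, duplications, unclonable fragments) than by chance alone.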

2: Large-Scale14 / 42

Shotgun sequencing – why isn’t it a trivial task?

2. Some pieces do not align because of sequencing errors

GAGGTGAGGAACACCCGTATACAGTCGACG

ACCCCGAGG?GA?GAACACCCGTATACAGTCGACGTTTATATATA

ACCCCGAGGAGACGA

2: Large-Scale15 / 42

Shotgun sequencing – why isn’t it a trivial task?

3. Repetitive sequences – satellite DNA.

GGGGGGGGGGGGGGGGGGGGGGGGGGGG

ACCCCGGGGGGGGGGGGG????GGGGGGGGGGGGGA

GGGGGGGGGGGGGGGGGGGGGGA

ACCCCGGGGG

2: Large-Scale16 / 42

Shotgun sequencing – why isn’t it a trivial task?

4. Repetitive sequences (duplicated regions). The genome contains duplicated regions with almost identical sequences.

2: Large-Scale17 / 42

Shotgun sequencing – why isn’t it a trivial task?

5. Some fragments are never sequenced because, once inserted into a bacterium, they are toxic to it.

2: Large-Scale18 / 42

A contig = a section of the genome that could be reliably assembled.

2: Large-Scale19 / 42

A contig

Lander-Waterman estimate of the number of contigs as a function of genome coverage

2: Large-Scale20 / 42

At 8X-10X coverage, ~5 contigs are expected, so some of the genome is expected to remain unsequenced.
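The Lander-Waterman model gives the expected number of contigs as N·e^(−cσ), where N is the number of reads, c = N·L/G is the coverage and σ = 1 − T/L accounts for the minimum overlap T needed to join two reads of length L. The sketch below evaluates this formula. The genome size and read length are hypothetical and are not the parameters behind the slide's figure, so the printed values are not expected to reproduce the ~5 contigs quoted above.

```python
import math


def expected_contigs(num_reads, read_length, genome_length, min_overlap=0):
    """Lander-Waterman expected number of contigs: N * exp(-c * sigma)."""
    c = num_reads * read_length / genome_length  # coverage
    sigma = 1 - min_overlap / read_length        # usable fraction of a read
    return num_reads * math.exp(-c * sigma)


# Hypothetical setup: 1 Mb genome, 700 bp reads, 40 bp minimum overlap.
genome_length, read_length = 1_000_000, 700
for c in (1, 2, 4, 6, 8, 10):
    num_reads = round(c * genome_length / read_length)
    n = expected_contigs(num_reads, read_length, genome_length, min_overlap=40)
    print(f"{c:>2}X: about {n:.1f} contigs expected")
```

The contig count falls off roughly exponentially with coverage but never reaches exactly one, which is why gaps remain even at high coverage.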

2: Large-Scale21 / 42

Scaffolding

2: Large-Scale22 / 42

Vector (e.g., a plasmid in E. coli)

Cloned fragment of the genome (e.g., 10 kb)

When sequencing a large genome, the inserts are often very large (10 kb). In such cases, it is impossible to sequence the entire insert in a single run, and only the ends are sequenced.

2: Large-Scale23 / 42

Short fragments from both ends are sequenced

Mate pairs

A read

2: Large-Scale24 / 42

The size of the insert is also recorded.

Mate pairs

A read

10 kb

2: Large-Scale25 / 42

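Information from mate pairs is used to build a scaffold of the genome: when the two reads of a pair fall in different contigs, the known insert size tells us the order of the contigs and the approximate distance between them.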

A contig

A contig
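Because the insert size is recorded, a mate pair whose two reads fall in different contigs gives an estimate of the gap between those contigs. The sketch below shows the arithmetic for the simplest layout, assuming contig A lies to the left of contig B and both are in the same orientation. The function and parameter names are hypothetical.

```python
def estimate_gap(insert_size, contig_a_length, read_a_start, read_b_end):
    """Estimate the gap between two contigs linked by one mate pair.

    insert_size:      recorded length of the cloned insert (bp)
    contig_a_length:  length of the left contig (bp)
    read_a_start:     position in contig A where the first read starts
    read_b_end:       position in contig B where the second read ends
    """
    part_in_a = contig_a_length - read_a_start  # insert bases inside contig A
    part_in_b = read_b_end                      # insert bases inside contig B
    return insert_size - part_in_a - part_in_b


# A 10 kb insert: 1 kb of it lies in contig A and 3 kb in contig B.
print(estimate_gap(10_000, contig_a_length=5_000, read_a_start=4_000, read_b_end=3_000))
```

A clearly negative estimate would suggest that the contigs overlap or that the link is wrong. In practice many mate pairs are combined, since the insert size is only known approximately.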

2: Large-Scale26 / 42

The human genome is the chimp genome with 99% accuracy (the two are ~99% identical).

Comparative assembly

If one sequences the chimp genome – the information from the human genome can aid in the assembly.

2: Large-Scale27 / 42

If someone offers to sequence your genome at 99.9% accuracy – don’t take it, even for $5. Two humans differ by only ~0.1%, so the errors would be as numerous as the real differences.

2: Large-Scale28 / 42

Often, phages are used as cloning vectors in standard cloning experiments. For genomic sequencing, Bacterial Artificial Chromosomes (BACs) are often used.

These are based on the F plasmid – a large plasmid that replicates stably in E. coli.

Over 300 kb can be inserted into the plasmid.

2: Large-Scale29 / 42

The idea is to first divide a big genome into overlapping regions, put each one in a BAC, and then use the shotgun method to sequence each BAC.

BAC

BAC-by-BAC Assembly of the Genome

Into BAC

Shotgun

Sequencing the edges

Assemble each BAC

2: Large-Scale30 / 42

Pyrosequencing: sequencing at the speed of light

2: Large-Scale31 / 42

Pyrosequencing: a relatively new technique (invented 1986) in which the sequence of a DNA molecule is determined by synthesizing its complementary strand (the "sequencing by synthesis" principle).

2: Large-Scale32 / 42 Pyrosequencing:

•Gel free

•Nucleotides are label free

•Parallelism

2: Large-Scale33 / 42

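dGTP + DNA(n) -> DNA(n+1) + PPi

Enzyme = polymerase

PPi -> ATP

Enzyme = ATP sulfurylase

ATP -> light

Enzyme = luciferase

ATP -> AMP + 2Pi

Enzyme = apyrase (it degrades the leftover ATP and the unincorporated nucleotides before the next nucleotide is added)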

2: Large-Scale34 / 42 Pyrosequencing

ACGTAACGTATACCCG

TGCATT?

Only if one adds G – there will be light!

1. Add A -> no light
2. Add C -> no light
3. Add G -> light
4. Add T -> no light
5. Add A -> no light
6. Add C -> light
7. Add G -> no light
8. Add T -> no light
9. Add A -> light

Sequence = GCA
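The dispensing cycle above can be simulated in a few lines. In this sketch (function name and data layout are assumptions made for the example) the template is given in the order in which the polymerase reads it, nucleotides are added in a fixed A, C, G, T cycle, and the light signal is taken to be the number of bases incorporated in that step, so a run of identical bases gives a proportionally stronger flash.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrosequence(template, dispense_order="ACGT", max_flows=100):
    """Simulate pyrosequencing of the not-yet-copied part of a template.

    Returns the read sequence and the flowgram, a list of
    (nucleotide, number_incorporated) pairs, one per dispensing step.
    """
    pos = 0
    read = []
    flowgram = []
    for flow in range(max_flows):
        if pos >= len(template):
            break  # template fully copied
        nucleotide = dispense_order[flow % len(dispense_order)]
        incorporated = 0
        # The polymerase keeps adding this nucleotide while it pairs.
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            incorporated += 1
            pos += 1
        flowgram.append((nucleotide, incorporated))
        read.append(nucleotide * incorporated)
    return "".join(read), flowgram


# The next three template bases on the slide are C, G, T.
sequence, flowgram = pyrosequence("CGT")
print(sequence)
for step, (nucleotide, signal) in enumerate(flowgram, start=1):
    print(step, nucleotide, "light" if signal else "no light")
```

For the three template bases C, G, T this gives the same nine steps as the list above and the read GCA.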

2: Large-Scale35 / 42 Pyrosequencing

Each DNA fragment is amplified and attached to its own bead (one fragment per bead). Each bead is then placed in its own fibre-optic well.

2: Large-Scale36 / 42 Pyrosequencing

A computer can read the light pattern from billions of wells simultaneously.

(Sequencing of a bacterial genome in 7h).

2: Large-Scale37 / 42 Bioinformatics and medicine

Your chip analysis suggests stress

2: Large-Scale38 / 42 Bioinformatics and medicine

1. Today, medicine is based on episodic treatment.

2. The first step, currently under way, is the use of digital imaging and image analysis (e.g., optic fibers).

3. Next step: “Digital health” – a person's medical data will be shared by all doctors, no matter where the person is.

2: Large-Scale39 / 42 Bioinformatics and medicine

4. Clinical genomics: fast and accurate identification of pathogens

5. Clinical genomics: sequence (part of) the genome to gain insights into which drugs are effective.

6. Predisposition analysis for diseases.

7. Towards “lifetime treatment”…

8. Less doctor intuition – more quantitative parameters and statistical analysis.

2: Large-Scale40 / 42

Differences between humans:

• SNP – single nucleotide polymorphism

• CGH – comparative genomic hybridization (detects copy number variation)

• Chromatin

• Epigenetics

We want to link these differences to diseases.

Bioinformatics and medicine

2: Large-Scale41 / 42 Some more important buzz words

Genomics

Proteomics

Metabolomics

Systems biology

In-silico (in vitro, in vivo)

Protein Engineering

Synthetic biology

Post genomic era

2: Large-Scale42 / 42 Some important NUMBERS

Human DNA = ~2 meters

300 x 10^9 cells

3.2 x 10^9 nucleotides
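The length and the nucleotide count are consistent with each other. The check below is a sketch under two assumptions that are not stated on the slide: the standard B-DNA spacing of about 0.34 nm per base pair, and that the ~2 meters refers to the two genome copies in one diploid cell.

```python
RISE_PER_BP_NM = 0.34  # assumed spacing between base pairs in B-DNA
HAPLOID_BP = 3.2e9     # nucleotides in one copy of the genome


def dna_length_m(num_bp, rise_nm=RISE_PER_BP_NM):
    """Stretched-out length of a DNA molecule in meters."""
    return num_bp * rise_nm * 1e-9


print(f"one genome copy: {dna_length_m(HAPLOID_BP):.2f} m")
print(f"diploid cell:    {dna_length_m(2 * HAPLOID_BP):.2f} m")
```

One genome copy comes to about 1.1 m, so the two copies in a cell give roughly 2 m.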