2: Large-Scale (42 slides)
Date posted: 21-Dec-2015
2: Large-Scale1 / 42
Large!
2: Large-Scale2 / 42
High throughput technologies:
• Sequencing
• Gene expression profiling
• ChIP-chip and tiling arrays
• Whole genome yeast two hybrid scan
• Genomic knockout of all single genes
• SNP/CGH
• Methylation profiling
• …
• Proteome profiling
2: Large-Scale3 / 42
Genomic Sequencing – shotgun sequencing
A single sequencing run usually reads only ~700 bp.
How can we sequence a genome?
2: Large-Scale4 / 42
Genomic Sequencing – Walking.
1. Design a primer
2. Sequence
3. Design a new primer
4. Sequence
5. …
One has to design a new primer for every step. To do so, one has to wait for the sequencing results of the previous step.
2: Large-Scale5 / 42
GAGGAGACGAACACCCGTATACAGTCGACG
ACCCCGAGGAGACGAACACCCGTATACAGTCGACGTTTATATATA
GTATACAGTCGACGTTTATATATA
ACCCCGAGGAGACGA
Genomic Sequencing – shotgun sequencing
1. Break DNA into small pieces
2. Sequence each piece
3. Assemble
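The "assemble" step can be sketched as a greedy overlap merge: repeatedly join the two reads whose suffix and prefix overlap the most. This is only a minimal illustration, assuming error-free reads on the same strand; the names `overlap` and `greedy_assemble` and the `min_len` cutoff are hypothetical, and real assemblers are far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(reads):
    """Remove duplicate reads and reads fully contained in another read."""
    reads = list(dict.fromkeys(reads))
    return [r for r in reads if not any(r != s and r in s for s in reads)]


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; returns the contigs."""
    reads = drop_contained(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: several contigs remain
        merged = reads[best_i] + reads[best_j][best_k:]
        rest = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads = drop_contained(rest + [merged])
    return reads


# The three reads shown on the slide
reads = [
    "GAGGAGACGAACACCCGTATACAGTCGACG",
    "GTATACAGTCGACGTTTATATATA",
    "ACCCCGAGGAGACGA",
]
print(greedy_assemble(reads))
# → ['ACCCCGAGGAGACGAACACCCGTATACAGTCGACGTTTATATATA']
```

With the slide's three reads the sketch recovers the full sequence; with gaps, errors, or repeats it would not, which is the subject of the following slides.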
2: Large-Scale6 / 42
After the DNA is isolated (from the tissue/cell/virus), it is fragmented either by restriction enzymes or by mechanical force.
ACGTAACGTATACCCGACTATATGCATTGCATATG “Frayed ends”
1. Break DNA into small pieces
2: Large-Scale7 / 42
←ATACGTAACGTATACCCGAC
TATATGCATTGCATATGGG→3’
5’
5’
3’
To blunt-end (“fix”) frayed ends, one needs a DNA polymerase. In the example above, just adding a polymerase will make the edges blunt.
Polymerases always make the chain grow from the 5’ towards the 3’ (5’ → 3’)
2: Large-Scale8 / 42
ACGTAACGTATACCCGAC ATTGCATATGGGCTGAACAT
3’
5’
5’
3’
Polymerases always make the chain grow from the 5’ towards the 3’ (5’ → 3’)
But what about this case?
←ATACGTAACGTATACCCGAC
TATATGCATTGCATATGGG→
5’3’
3’5’
2: Large-Scale9 / 42
E. coli DNA polymerase I has 3 domains:
One does the replication
One digests DNA 3’ → 5’ (exonuclease).
One digests DNA 5’ → 3’ (exonuclease).
Klenow fragment = engineered polymerase without the 5’ → 3’ exonuclease activity.
2: Large-Scale10 / 42
ACGTAACGTATACCCGAC ATTGCATATGGGCTGAACAT
3’
5’
5’
3’
Polymerases always make the chain grow from the 5’ towards the 3’ (5’ → 3’)
But what about this case? Here the overhangs are 3’ ends, which a polymerase cannot fill in. Klenow has 3’ → 5’ exonuclease activity, which digests them instead.
←ATACGTAACGTATACCCGAC
TATATGCATTGCATATGGG→
5’3’
3’5’
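The two blunting cases can be sketched in code. This is a simplified illustration, not a model of the enzyme: the function name `blunt` and the representation (two equal-length strings, top strand written 5’ → 3’, bottom strand written 3’ → 5’, with `-` marking missing bases) are assumptions made for the example. A 5’ overhang is filled in by extending the recessed 3’ end; a 3’ overhang is digested.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def blunt(top, bottom):
    """Blunt a duplex. top is written 5'->3', bottom 3'->5'; '-' marks a missing base.

    Assumes every column has a base on at least one strand.
    """
    assert len(top) == len(bottom)
    top, bottom = list(top), list(bottom)
    n = len(top)

    # Left end: the top strand's 5' end and the bottom strand's 3' end.
    i = 0
    while i < n and (top[i] == "-" or bottom[i] == "-"):
        if bottom[i] == "-":
            # 5' overhang of the top strand: the polymerase extends the bottom 3' end.
            bottom[i] = COMPLEMENT[top[i]]
        else:
            # 3' overhang of the bottom strand: the 3'->5' exonuclease digests it.
            bottom[i] = "-"
        i += 1

    # Right end: the top strand's 3' end and the bottom strand's 5' end.
    j = n - 1
    while j >= i and (top[j] == "-" or bottom[j] == "-"):
        if top[j] == "-":
            # 5' overhang of the bottom strand: the polymerase extends the top 3' end.
            top[j] = COMPLEMENT[bottom[j]]
        else:
            # 3' overhang of the top strand: the 3'->5' exonuclease digests it.
            top[j] = "-"
        j -= 1

    return "".join(top).strip("-"), "".join(bottom).strip("-")


# 5' overhangs on both ends: filled in
print(blunt("ACGTAC--", "--CATGGA"))  # → ('ACGTACCT', 'TGCATGGA')
# 3' overhangs on both ends: digested
print(blunt("--GTACGT", "TGCATG--"))  # → ('GTAC', 'CATG')
```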
2: Large-Scale11 / 42
GAGGAGACGAACACCCGTATACAGTCGACG
GTATACAGTCGACGTTTATATATAACCCCGAGGAGACGA
The pieces are inserted into a vector – e.g., a plasmid. Sequencing is done from both sides
2. Sequence each piece:
Because the primers bind to the vector rather than to the insert, the same primers can be used for all the sequencing reactions, so the reactions can run in parallel.
2: Large-Scale12 / 42
GAGGAGACGAACACCCGTATACAGTCGACG
ACCCCGAGGAGACGA ? GTATACAGTCGACGTTTATATATA
GTATACAGTCGACGTTTATATATA
ACCCCGAGGAGACGA
Shotgun sequencing – why isn’t it a trivial task?
1. By chance, some parts are not sequenced even once!!!
2: Large-Scale13 / 42
Shotgun sequencing – Definition of coverage.
5X coverage: each base in the final sequence was present, on average, in 5 reads.
Although the human genome was sequenced at 12X coverage, still 1% of the genome is either not assembled or not reliable.
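The definition of coverage, and the reason some bases are missed by chance, can be put into a few lines. This is a sketch under an idealized assumption that is not stated on the slides: reads start at uniformly random positions, so the number of reads covering a given base is roughly Poisson distributed with mean equal to the coverage. The function names are hypothetical.

```python
import math


def coverage(num_reads, read_len, genome_len):
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_len / genome_len


def fraction_uncovered(c):
    """Poisson estimate of the fraction of bases covered by zero reads: exp(-c)."""
    return math.exp(-c)


# Toy numbers: 50 reads of 700 bp over a 7,000 bp genome
c = coverage(num_reads=50, read_len=700, genome_len=7_000)
print(c)  # → 5.0
print(f"{fraction_uncovered(c):.2%} of bases expected to be missed")
```

Under this model even 5X coverage leaves a small fraction of bases unsequenced, and real genomes do worse because of repeats, errors, and cloning bias.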
2: Large-Scale14 / 42
Shotgun sequencing – why isn’t it a trivial task?
2. Some pieces do not align because of sequencing errors
GAGGTGAGGAACACCCGTATACAGTCGACG
ACCCCGAGG?GA?GAACACCCGTATACAGTCGACGTTTATATATA
ACCCCGAGGAGACGA
2: Large-Scale15 / 42
Shotgun sequencing – why isn’t it a trivial task?
3. Repetitive sequences – satellite DNA.
GGGGGGGGGGGGGGGGGGGGGGGGGGGG
ACCCCGGGGGGGGGGGGG????GGGGGGGGGGGGGA
GGGGGGGGGGGGGGGGGGGGGGA
ACCCCGGGGG
2: Large-Scale16 / 42
Shotgun sequencing – why isn’t it a trivial task?
4. Repetitive sequences (duplicated regions). The genome contains duplicated regions that have almost identical sequences.
2: Large-Scale17 / 42
Shotgun sequencing – why isn’t it a trivial task?
5. Some fragments are not sequenced because, once inserted into a bacterium, they are toxic to it.
2: Large-Scale18 / 42
A section of the genome that could be reliably assembled.
A contig
2: Large-Scale19 / 42
A contig
Lander-Waterman estimation of number of contigs w.r.t. genome coverage
2: Large-Scale20 / 42
At 8X-10X coverage, ~5 contigs are expected, so some of the genome is expected to remain unsequenced.
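The Lander-Waterman estimate can be sketched as follows. In its usual form the expected number of contigs is N·exp(-c·σ), where N is the number of reads, c = N·L/G is the coverage, and σ = 1 - T/L with T the minimum overlap needed to join two reads. The parameter values below are purely illustrative and are not taken from the slides; the function name is hypothetical.

```python
import math


def expected_contigs(num_reads, read_len, genome_len, min_overlap=0):
    """Lander-Waterman expected number of contigs: N * exp(-c * sigma)."""
    c = num_reads * read_len / genome_len   # coverage
    sigma = 1 - min_overlap / read_len      # usable fraction of each read
    return num_reads * math.exp(-c * sigma)


# Illustrative values only: a 1 Mb genome, 700 bp reads, 40 bp minimum overlap
genome_len, read_len = 1_000_000, 700
for cov in (1, 2, 5, 8, 10):
    n = round(cov * genome_len / read_len)
    print(f"{cov:>2}X: ~{expected_contigs(n, read_len, genome_len, 40):.1f} contigs")
```

The count first rises (few reads, few overlaps) and then falls steeply with coverage, but it never reaches exactly one contig, which is why gaps remain even at high coverage.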
2: Large-Scale21 / 42
Scaffolding
2: Large-Scale22 / 42
Vector (e.g., in E. coli)
Cloned fragment of the genome (e.g., 10 kb)
When sequencing a large genome, the inserts are often very large (10 kb). In such a case, it is impossible to sequence the entire insert in one run, and only the edges are sequenced.
2: Large-Scale23 / 42
Short fragments from both ends are sequenced
Mate pairs
A read
2: Large-Scale24 / 42
The size of the insert is also recorded.
Mate pairs
A read
10 kb
2: Large-Scale25 / 42
Information from mate pairs is used to build a scaffold of the genome
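How a mate pair links two contigs can be sketched with simple arithmetic: if one read of the pair lies on contig A and its mate on contig B, the known insert size tells us roughly how far apart the contigs are. The function and parameter names are hypothetical, and the sketch assumes 0-based coordinates, contig A lying to the left of contig B, and an exactly known insert size (in practice it is only approximate).

```python
def estimate_gap(contig_a_len, read1_start, read2_end, insert_size):
    """Estimate the gap between contig A and contig B bridged by one mate pair.

    read1_start: position on contig A where the insert begins.
    read2_end:   position on contig B where the insert ends.
    """
    covered_on_a = contig_a_len - read1_start  # part of the insert lying on contig A
    covered_on_b = read2_end                   # part of the insert lying on contig B
    return insert_size - covered_on_a - covered_on_b


# A 10 kb insert starts 1,000 bp before the end of contig A
# and ends 3,000 bp into contig B.
print(estimate_gap(contig_a_len=5_000, read1_start=4_000,
                   read2_end=3_000, insert_size=10_000))
# → 6000
```

A negative result would suggest that the two contigs actually overlap and could be merged.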
A contig
A contig
2: Large-Scale26 / 42
The human genome is ~99% identical to the chimp genome.
Comparative assembly
If one sequences the chimp genome – the information from the human genome can aid in the assembly.
2: Large-Scale27 / 42
If someone offers to sequence your genome at 99.9% accuracy, don’t take it even for $5: with 3.2 x 10^9 nucleotides, a 0.1% error rate still means about 3.2 million errors.
2: Large-Scale28 / 42
Phages are often used as cloning vectors in standard cloning experiments. For genomic sequencing, Bacterial Artificial Chromosomes (BACs) are often used.
These are based on the F plasmid – a large plasmid that replicates stably in E. coli.
Over 300 kb can be inserted in the plasmid.
2: Large-Scale29 / 42
The idea is to first divide a big genome into overlapping regions, put each in a BAC, and then use the shotgun method to sequence each BAC.
BAC
BAC-by-BAC Assembly of the Genome
Into BAC
Shotgun
Sequencing the edges
Assemble each BAC
2: Large-Scale30 / 42
Pyrosequencing: sequencing at the speed of light
2: Large-Scale31 / 42
Pyrosequencing: a relatively new technique (invented 1986) in which the sequence of a DNA molecule is determined by synthesizing its complementary strand (the "sequencing by synthesis" principle).
2: Large-Scale32 / 42 Pyrosequencing:
•Gel free
•Nucleotides are label free
•Parallelism
2: Large-Scale33 / 42
dGTP + DNA(n) -> DNA(n+1) + PPi
Enzyme = polymerase
PPi -> ATP
Enzyme = ATP sulfurylase
ATP -> light
Enzyme = luciferase
ATP -> AMP + 2Pi
Enzyme = apyrase (removes the leftover ATP before the next nucleotide is added)
2: Large-Scale34 / 42 Pyrosequencing
ACGTAACGTATACCCG
TGCATT?
Only if one adds G – there will be light!
1. Add dATP -> no light
2. Add dCTP -> no light
3. Add dGTP -> light
4. Add dTTP -> no light
5. Add dATP -> no light
6. Add dCTP -> light
7. Add dGTP -> no light
8. Add dTTP -> no light
9. Add dATP -> light
Sequence = GCA
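The add-one-nucleotide-at-a-time logic above can be simulated in a few lines. This is a toy sketch, assuming a fixed cyclic dispensing order A, C, G, T and a light signal proportional to the number of bases incorporated; the function names are hypothetical and the chemistry is reduced to a lookup.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pyrosequence(template, order="ACGT"):
    """Simulate the flows. template lists the template bases in the order they are copied.

    Returns (nucleotide added, number of bases incorporated) for each flow;
    a count of 0 means "no light".
    """
    flows = []
    pos = 0
    flow_number = 0
    while pos < len(template):
        nucleotide = order[flow_number % len(order)]
        flow_number += 1
        incorporated = 0
        # The polymerase keeps extending while the next template base pairs
        # with the nucleotide that was just added (homopolymer runs).
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            incorporated += 1
            pos += 1
        flows.append((nucleotide, incorporated))
    return flows


def call_sequence(flows):
    """Read the synthesized sequence back from the light signals."""
    return "".join(nucleotide * count for nucleotide, count in flows)


# The template bases following the primer in the slide example are C, G, T.
flows = pyrosequence("CGT")
for step, (nucleotide, count) in enumerate(flows, start=1):
    print(step, nucleotide, "light" if count else "no light")
print(call_sequence(flows))  # → GCA
```

The simulation reproduces the nine steps listed above: light at steps 3, 6, and 9, giving the sequence GCA.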
2: Large-Scale35 / 42 Pyrosequencing
Each DNA fragment was amplified and attached to a bead separately (one bead to each fragment). Each bead was added to a fibre-optic well.
2: Large-Scale36 / 42 Pyrosequencing
A computer can read the light pattern from more than a million wells simultaneously.
(Sequencing of a bacterial genome in 7h).
2: Large-Scale37 / 42 Bioinformatics and medicine
Your chip analysis suggests
stress
2: Large-Scale38 / 42 Bioinformatics and medicine
1. Today, medicine is based on episodic treatment.
2. The first step, which is currently taking place, is the use of digital imaging and its analysis (e.g., optic fibers).
3. Next step: “Digital health” – a person’s medical data will be shared by all doctors, no matter where the person is.
2: Large-Scale39 / 42 Bioinformatics and medicine
4. Clinical genomics: fast and accurate identification of pathogens
5. Clinical genomics: sequence (part of) the genome to gain insights into which drugs are effective.
6. Predisposition analysis for diseases.
7. Towards “lifetime treatment”…
8. Less doctor intuition – more quantitative parameters and statistical analysis.
2: Large-Scale40 / 42
Differences between humans:
• SNP – single nucleotide polymorphism
• CGH – comparative genomic hybridization (detects copy number variation)
• Chromatin
• Epigenetics
We want to link these differences to diseases.
Bioinformatics and medicine
2: Large-Scale41 / 42 Some more important buzz words
Genomics
Proteomics
Metabolomics
Systems biology
In-silico (in vitro, in vivo)
Protein Engineering
Synthetic biology
Post genomic era
2: Large-Scale42 / 42 Some important NUMBERS
Human DNA = ~2 meters
300 x 10^9 cells
3.2 x 10^9 nucleotides