Date posted: 22-Dec-2015
Introduction to Algorithms in Computational Biology
Lecture 1
This class has been edited from Nir Friedman’s lecture which is available at www.cs.huji.ac.il/~nir. Changes made by Dan Geiger.
Background Readings: The first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001.
Course Information
Meetings:
Lecture, by Dan Geiger: Mondays 16:30-18:30, Taub 4.
Tutorial, by Ydo Wexler: Tuesdays 10:30-11:30, Taub 2.
Grade:
20% in five question sets. These question sets are obligatory. Each contains 4-6 theoretical problems. Submit in pairs within two weeks.
80% test. You must score above 55 on the test for the homework grade to count.
Information and handouts:
www.cs.technion.ac.il/~cs236522
A brochure with xeroxed material at the Taub library
Course Prerequisites
Computer Science and Probability Background:
Data structure 1 (cs234218)
Algorithms 1 (cs234247)
Probability (any course)
Some Biology Background:
Formally: None, to allow CS students to take this course.
Recommended: Biology 1 (especially for those in the Bioinformatics track), or a similar Biology course, and/or a serious desire to complement your knowledge in Biology by reading the appropriate material (see the course web site).
Studying the algorithms in this course while acquiring enough biology background is far more rewarding than ignoring the biological context.
Relations to Some Other Courses
Intro to Bioinformatics (cs236523). This course covers practical aspects and hands-on experience with web-based bioinformatics software. Although it is not a formal requirement, it is recommended that you look at the web site http://webcourse.technion.ac.il/234523/ and examine the relevant software.
Algorithms in Computational Biology (cs236522). This is the current course which focuses on modeling some bioinformatics problems and presents algorithms for their solution.
Bioinformatics project (cs236524). Developing bioinformatics tools under close guidance.
First Homework Assignment
Solve two of the questions for Chapter 2 and two of the questions for Chapter 3.
Due: During the third tutorial class, or earlier in the teaching assistant’s mail slot. Remember to submit in pairs.
Read carefully the first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001.
Computational Biology
Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of study in life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary area of study that combines Biology, Computer Science, and Statistics.
Computational biology is also called Bioinformatics, although many practitioners define Bioinformatics somewhat more narrowly, restricting the field to molecular Biology only.
Examples of Areas of Interest
• Building evolutionary trees from molecular (and other) data
• Efficiently assembling genomes of various organisms
• Understanding the structure of genomes (SNP, SSR, Genes)
• Understanding function of genes in the cell cycle and disease
• Deciphering structure and function of proteins
Exponential growth of biological information: growth of sequences, structures, and literature.
Four Aspects
Biological: What is the task?
Algorithmic: How to perform the task at hand efficiently?
Learning: How to adapt/estimate/learn parameters and models describing the task from examples?
Statistics: How to differentiate true phenomena from artifacts?
Example: Sequence Comparison
Biological: Evolution preserves sequences, thus similar genes might have similar function.
Algorithmic: Consider all ways to “align” one sequence against another.
Learning: How do we define “similar” sequences? Use examples to define similarity.
Statistics: When we compare to ~10^6 sequences, what is a random match and what is a true one?
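As a rough illustration of the algorithmic aspect, the sketch below scores the best global alignment of two sequences with dynamic programming (the Needleman-Wunsch recurrence) rather than enumerating every alignment. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; they are not taken from the lecture.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s and t (Needleman-Wunsch).

    The scoring parameters are illustrative assumptions.
    """
    # prev[j] is the best score for aligning the processed prefix of s with t[:j].
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # aligning s[:i] against the empty prefix costs i gaps
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap        # s[i-1] aligned to a gap
            left = curr[j - 1] + gap  # t[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("AAG", "AAA"))
print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(s)+1) x (len(t)+1) cells, so the running time is proportional to the product of the two lengths, even though the number of possible alignments grows exponentially.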
Course Goals
Learning about computational tools for (primarily) molecular biology.
We will cover computational tasks that are posed by modern molecular biology
We will discuss the biological motivation and setup for these tasks
We will understand the kinds of solutions that exist and what principles justify them
Topics I
Dealing with DNA/Protein sequences:
Finding similar sequences
Models of sequences: Hidden Markov Models
Gene finding
Genome projects and how sequences are found
Topics II
Models of genetic change:
Long term: evolutionary changes among species
Reconstructing evolutionary trees from sequences
Short term: genetic variations in a population
Finding genes by linkage and association
Topics III (One class, if time allows)
Protein World:
How proteins fold: secondary & tertiary structure
How to predict protein folds from sequence data
How to analyze protein changes from raw experimental measurements (MassSpec)
Human Genome
Most human cells contain 46 chromosomes:
2 sex chromosomes (X, Y): XY in males, XX in females.
22 pairs of chromosomes named autosomes.
DNA Organization
Source: Alberts et al.
The Double Helix
Source: Alberts et al.
DNA Components
Four nucleotide types:
Adenine
Guanine
Cytosine
Thymine
Hydrogen bonds (electrostatic connection):
A-T
C-G
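Because A pairs with T and C pairs with G, one strand determines the other. A minimal sketch of that pairing rule follows; the function names are illustrative, and the reversal in `reverse_complement` relies on the two strands running in opposite directions, a standard fact that is not stated on the slide.

```python
# Base pairing from the slide: A-T and C-G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand):
    """Return the partner strand read in its own direction (strands are antiparallel)."""
    return complement(strand)[::-1]


print(complement("AAG"))
print(reverse_complement("AAG"))
```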
Genome Sizes
E. coli (bacteria): 4.6 x 10^6 bases
Yeast (simple fungi): 15 x 10^6 bases
Smallest human chromosome: 50 x 10^6 bases
Entire human genome: 3 x 10^9 bases
Genetic Information
Gene – the basic unit of genetic information. Genes determine the inherited characteristics.
Genome – the collection of genetic information.
Chromosomes – storage units of genes.
Genes
The DNA strings include:
Coding regions (“genes”)
E. coli has ~4,000 genes
Yeast has ~6,000 genes
C. elegans has ~13,000 genes
Humans have ~32,000 genes
Control regions
These are typically adjacent to the genes
They determine when a gene should be expressed
“Junk” DNA (unknown function)
The Cell
All cells of an organism contain the same DNA content (and the same genes) yet there is a variety of cell types.
Example: Tissues in Stomach
How is this variety encoded and expressed?
Central Dogma
Gene -> (Transcription) -> mRNA -> (Translation) -> Protein
Cells express different subsets of the genes in different tissues and under different conditions.
Transcription
Coding sequences can be transcribed to RNA
RNA nucleotides:
Similar to DNA, slightly different backbone
Uracil (U) instead of Thymine (T)
Source: Mathews & van Holde
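At the level of letters, the only change from DNA to RNA is U in place of T. The sketch below assumes the input is the coding (sense) strand, so that the RNA has the same sequence with T replaced by U; the function name is illustrative.

```python
def transcribe(coding_strand):
    """Return the RNA sequence for a DNA coding strand (assumed input): T becomes U."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTTTTAA"))
```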
Transcription: RNA Editing
1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons
* Alternative splicing exists
Exons hold information; they are more stable during evolution.
This process takes place in the nucleus. The mRNA molecules then diffuse through the nuclear membrane to the cytoplasm.
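The intron-removal and exon-joining steps can be sketched as string slicing. Everything concrete here is hypothetical: the sequence, the exon coordinates, and the function name are made up for illustration, and real splicing is guided by sequence signals rather than by known coordinates.

```python
def splice(pre_mrna, exons):
    """Join exon segments, given as 0-based half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Made-up transcript: exon, intron, exon, intron, exon (6 letters each).
pre_mrna = "AUGGCC" + "GUAAAG" + "GAGAAA" + "GUCCAG" + "UUUUAA"
exons = [(0, 6), (12, 18), (24, 30)]

full = splice(pre_mrna, exons)                      # all three exons
skipped = splice(pre_mrna, [exons[0], exons[2]])    # alternative splicing: middle exon skipped
print(full)
print(skipped)
```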
RNA Roles
Messenger RNA (mRNA): Encodes protein sequences. Each triplet of nucleotides translates to one amino acid (the protein building block).
Transfer RNA (tRNA): Decodes the mRNA molecules to amino acids. It connects to the mRNA on one side and holds the appropriate amino acid on its other side.
Ribosomal RNA (rRNA): Part of the ribosome, a machine for translating mRNA to proteins. It catalyzes (like enzymes) the reaction that attaches the hanging amino acid from the tRNA to the amino acid chain being created.
...
Translation (Outside the nucleus)
Translation is mediated by the ribosome.
The ribosome is a complex of protein and rRNA molecules.
The ribosome attaches to the mRNA at a translation initiation site.
The ribosome then moves along the mRNA sequence and in the process constructs a sequence of amino acids (a polypeptide), which is released and folds into a protein.
Genetic Code
There are 20 amino acids from which proteins are built.
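The genetic code maps each of the 64 nucleotide triplets to one of the 20 amino acids or to a stop signal. The sketch below builds the standard code table from the conventional compact ordering (bases in the order U, C, A, G) and translates an mRNA string. It assumes the reading frame starts at the first letter and it stops at the first stop codon; the function name and the example sequence are illustrative.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for codons UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate mRNA from position 0 (assumed reading frame) up to the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUUUUAA"))
```

Since 64 codons encode only 20 amino acids plus stop, several codons map to the same amino acid.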
Protein Structure
Proteins are polypeptides of 70-3000 amino acids.
A protein's structure is (mostly) determined by the sequence of amino acids that make up the protein.
Protein Structure
Evolution
Related organisms have similar DNA
Similarity in sequences of proteins
Similarity in organization of genes along the chromosomes
Evolution plays a major role in biology
Many mechanisms are shared across a wide range of organisms
During the course of evolution existing components are adapted for new functions
Evolution
Evolution of new organisms is driven by:
Diversity: Different individuals carry different variants of the same basic blueprint.
Mutations: The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.
Selection bias
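The kinds of mutation listed above can be modeled as simple edits on a DNA string. This is only a toy sketch: the function names and the example positions are assumptions for illustration, and nothing here models how often each kind of change occurs.

```python
def substitute(dna, pos, base):
    """Single base change at 0-based position pos."""
    return dna[:pos] + base + dna[pos + 1:]


def insert_segment(dna, pos, segment):
    """Insert a DNA segment before 0-based position pos."""
    return dna[:pos] + segment + dna[pos:]


def delete_segment(dna, pos, length):
    """Delete `length` bases starting at 0-based position pos."""
    return dna[:pos] + dna[pos + length:]


print(substitute("AAA", 1, "G"))
print(insert_segment("AAA", 1, "GG"))
print(delete_segment("AGGAA", 1, 2))
```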
The Tree of Life
Source: Alberts et al.
Example for Phylogenetic Analysis
Input: four nucleotide sequences: AAG, AAA, GGA, AGA taken from four species.
Question: Which evolutionary tree best explains these sequences?
[Figure: a candidate tree with leaves AGA, AAA, GGA and AAG, in which the root and both internal nodes are labeled AAA. Substitutions marked on the edges: 2, 1, 1. Total #substitutions = 4.]
One Answer (the parsimony principle): Pick a tree that has a minimum total number of substitutions of symbols between species and their originator in the evolutionary tree (also called a phylogenetic tree).
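The count of 4 for the tree in this example can be checked directly. The sketch below counts position-by-position differences between each species and its ancestor, assuming (as the labels in the figure suggest) that every ancestral node carries the sequence AAA, so that only the edges into the four leaves contribute. The function name is illustrative.

```python
def substitutions(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


species = ["AAG", "AAA", "GGA", "AGA"]
ancestor = "AAA"  # assumed label of every internal node in this tree

per_edge = {s: substitutions(s, ancestor) for s in species}
total = sum(per_edge.values())
print(per_edge, total)
```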
Example Continued
There are many trees possible. For example:
[Figure, left tree: leaves AGA, GGA, AAA and AAG; internal nodes labeled AAA and AGA; root labeled AAA. Substitutions marked on the edges: 1, 1, 1. Total #substitutions = 3.]
[Figure, right tree: leaves GGA, AAA, AGA and AAG; the root and both internal nodes labeled AAA. Substitutions marked on the edges: 1, 1, 2. Total #substitutions = 4.]
The left tree is “better” than the right tree.
Questions:
Does this principle yield realistic phylogenetic trees? (Evolution)
How can we compute the best tree efficiently? (Computer Science)
What is the probability of substitutions given the data? (Learning)
Is the best tree found significantly better than others? (Statistics)
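One way to approach the computational question, for a fixed tree shape, is Fitch's algorithm, which finds the minimum number of substitutions over all labelings of the internal nodes. The lecture does not name this algorithm; it is used here as an illustration. The sketch represents a tree as nested pairs of leaf sequences (an assumed encoding) and scores the three ways of pairing up the four species from the example.

```python
def fitch(tree, site):
    """Fitch pass for one site: return (candidate states, substitutions needed).

    tree is either a leaf (a sequence string) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed on an edge below this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, length):
    """Minimum total substitutions for this tree shape, summed over all sites."""
    return sum(fitch(tree, site)[1] for site in range(length))


seqs = ["AAG", "AAA", "GGA", "AGA"]
scores = {}
for partner in seqs[1:]:
    rest = tuple(s for s in seqs[1:] if s != partner)
    tree = ((seqs[0], partner), rest)
    scores[tree] = parsimony_score(tree, 3)

best = min(scores, key=scores.get)
print(scores)
print(best, scores[best])
```

With only four species there are three pairings to try, but the number of tree shapes grows very quickly with the number of species, which is why efficient search matters.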
Werner’s Syndrome
A successful application of genetic linkage analysis
The Disease
First references in 1960s
Causes premature ageing
Linkage studies from 1992
WRN gene cloned in 1996
Subsequent discovery of mechanisms involved in wild-type and mutant proteins
A sample Input
The study used 13 Markers; here we see only one.
The study used 14 families; here we see only one.
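[Pedigree figure for one family and one marker: five individuals, numbered 1-5, each shown with a status label (H or D) and a marker genotype. Genotypes shown: H A1/A1, D A2/A2, H A1/A2, D A1/A2, H A2/A2. Phased haplotype pairs shown: D-A1 with D-A2; H-A1 with D-A2; H-A2 with D-A2 (marked “Recombinant”); D-A2 with D-A2. Annotation: phase inferred.]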
Genehunter Output
position  LOD_score   information
0.00      -1.254417   0.224384
1.52      2.836135    0.226379
...[data skipped]...
18.58     13.688599   0.384088
19.92     14.238474   0.401992
21.26     14.718037   0.426818
22.60     15.159389   0.462284
22.92     15.056713   0.462510
23.24     14.928614   0.463208
23.56     14.754848   0.464387
...[data skipped]...
81.84     1.939215    0.059748
90.60     -11.930449  0.087869
Annotations on the output:
position: distance between markers, in centimorgans.
LOD_score: log likelihood of placing the disease gene at that distance, relative to it being unlinked.
Marker names shown: D8S339, D8S131, D8S259.
Most ‘likely’ position: the row with the maximum log likelihood score (position 22.60, LOD score 15.159389).
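Reading off the most likely position amounts to taking the row with the largest LOD score. The sketch below does this for the rows visible on the slide only (the skipped rows are not reconstructed). It also converts the score to an odds ratio, assuming the conventional base-10 logarithm for LOD scores; the variable names are illustrative.

```python
# Rows visible on the slide: position (cM), LOD score, information.
ROWS = """\
0.00 -1.254417 0.224384
1.52 2.836135 0.226379
18.58 13.688599 0.384088
19.92 14.238474 0.401992
21.26 14.718037 0.426818
22.60 15.159389 0.462284
22.92 15.056713 0.462510
23.24 14.928614 0.463208
23.56 14.754848 0.464387
81.84 1.939215 0.059748
90.60 -11.930449 0.087869
"""

table = [tuple(float(x) for x in line.split()) for line in ROWS.splitlines()]

# The most likely position is the one with the maximum LOD score.
best_position, best_lod, _ = max(table, key=lambda row: row[1])

# Assuming LOD = log10(likelihood if linked at this position / likelihood if unlinked).
odds = 10 ** best_lod
print(best_position, best_lod, f"{odds:.3g}")
```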
Final Location
[Figure: map showing the locations of markers D8S131, D8S259 and D8S339, and the final location of the WRN gene.]
The location found by genetic linkage was off by about 1.25M base pairs.