Bioinformatics For MNW 2 nd Year
description
Transcript of Bioinformatics For MNW 2 nd Year
![Page 1: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/1.jpg)
Bioinformatics For MNW 2nd Year
Jaap Heringa
FEW/FALW
Integrative Bioinformatics Institute VU (IBIVU)
[email protected], www.cs.vu.nl/~ibivu, Tel. 47649, Rm R4.41
![Page 2: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/2.jpg)
Other teachers in the course
• Jens Kleinjung (1/11/02)
• Victor Simosis – PhD (1/12/02)
• Radek Szklarczyk - PhD (1/01/03)
![Page 3: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/3.jpg)
Bioinformatics course 2nd year MNW spring 2003
• Pattern recognition– Supervised/unsupervised learning– Types of data, data normalisation, lacking data– Search image– Similarity/distance measures– Clustering– Principal component analysis– Discriminant analysis
![Page 4: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/4.jpg)
Bioinformatics course 2nd year MNW spring 2003
• Protein– Folding– Structure and function– Protein structure prediction– Secondary structure– Tertiary structure– Function– Post-translational modification – Prot.-Prot. Interaction -- Docking algorithm– Molecular dynamics/Monte Carlo
![Page 5: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/5.jpg)
Bioinformatics course 2nd year MNW spring 2003
• Sequence analysis– Pairwise alignment– Dynamic programming (NW, SW, shortcuts)– Multiple alignment– Combining information– Database/homology searching (Fasta, Blast,
Statistical issues-E/P values)
![Page 6: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/6.jpg)
Bioinformatics course 2nd year MNW spring 2003
• Gene structure and gene finding algorithms• Genomics
– Expression data, Nucleus to ribosome, translation, etc.– Proteomics, Metabolomics, Physiomics– Databases
• DNA, EST • Protein sequence (SwissProt)• Protein structure (PDB)• Microarray data • Proteomics• Mass spectrometry/NMR/X-ray
![Page 7: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/7.jpg)
Bioinformatics course 2nd year MNW spring 2005
• Bioinformatics method development• Programming and scripting languages• Web solutions• Computational issues
– NP-complete problems– CPU, memory, storage problems– Parallel computing
• Bioinformatics method usage/application• Molecular viewers (RasMol, MolMol, etc.)
![Page 8: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/8.jpg)
Gathering knowledge
• Anatomy, architecture
• Dynamics, mechanics
• Informatics(Cybernetics – Wiener, 1948) (Cybernetics has been defined as the science of control in machines and animals, and hence it applies to technological, animal and environmental systems)
• Genomics, bioinformatics
Rembrandt, 1632
Newton, 1726
![Page 9: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/9.jpg)
MathematicsStatistics
Computer ScienceInformatics
BiologyMolecular biology
Medicine
Chemistry
Physics
Bioinformatics
Bioinformatics
![Page 10: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/10.jpg)
Bioinformatics
“Studying informational processes in biological systems” (Hogeweg, early
1970s)• No computers necessary• Back of envelope OK
Applying algorithms with mathematical formalisms in biology (genomics) Not good: biology and biological knowledge is crucial for making meaningful analysis methods!
“Information technology applied to the management and analysis of biological data” (Attwood and Parry-Smith)
![Page 11: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/11.jpg)
Bioinformatics in the olden days• Close to Molecular Biology:
– (Statistical) analysis of protein and nucleotide structure
– Protein folding problem– Protein-protein and protein-nucleotide
interaction
• Many essential methods were created early on (BG era)– Protein sequence analysis (pairwise and
multiple alignment)– Protein structure prediction (secondary, tertiary
structure)
![Page 12: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/12.jpg)
Bioinformatics in the olden days (Cont.)
• Evolution was studied and methods created– Phylogenetic reconstruction (clustering – e.g.,
Neighbour Joining (NJ) method)
![Page 13: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/13.jpg)
But then the big bang….
![Page 14: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/14.jpg)
The Human Genome -- 26 June 2000
![Page 15: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/15.jpg)
The Human Genome -- 26 June 2000
Dr. Craig Venter
Celera Genomics
-- Shotgun method
Sir John Sulston
Human Genome Project
![Page 16: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/16.jpg)
Human DNA
• There are about 3bn (3 109) nucleotides in the nucleus of almost all of the trillions (3.5 1012 ) of cells of a human body (an exception is, for example, red blood cells which have no nucleus and therefore no DNA) – a total of ~1022 nucleotides!
• Many DNA regions code for proteins, and are called genes (1 gene codes for 1 protein as a base rule, but the reality is a lot more complicated)
• Human DNA contains ~27,000 expressed genes • Deoxyribonucleic acid (DNA) comprises 4 different
types of nucleotides: adenine (A), thiamine (T), cytosine (C) and guanine (G). These nucleotides are sometimes also called bases
![Page 17: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/17.jpg)
Human DNA (Cont.)
• All people are different, but the DNA of different people only varies for 0.2% or less. So, only up to 2 letters in 1000 are expected to be different. Evidence in current genomics studies (Single Nucleotide Polymorphisms or SNPs) imply that on average only 1 letter out of 1400 is different between individuals. Over the whole genome, this means that 2 to 3 million letters would differ between individuals.
• The structure of DNA is the so-called double helix, discovered by Watson and Crick in 1953, where the two helices are cross-linked by A-T and C-G base-pairs (nucleotide pairs – so-called Watson-Crick base pairing).
![Page 18: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/18.jpg)
Modern bioinformatics is closely associated with genomics
• The aim is to solve the genomics information problem
• Ultimately, this should lead to biological understanding how all the parts fit (DNA, RNA, proteins, metabolites) and how they interact (gene regulation, gene expression, protein interaction, metabolic pathways, protein signalling, etc.)
• More in the next lecture…
![Page 19: Bioinformatics For MNW 2 nd Year](https://reader034.fdocuments.net/reader034/viewer/2022050908/56814796550346895db4cb91/html5/thumbnails/19.jpg)
TERTIARY STRUCTURE (fold)TERTIARY STRUCTURE (fold)
Genome
Expressome
Proteome
Metabolome
Functional GenomicsFunctional GenomicsFrom gene to functionFrom gene to function