What is bioinformatics?
What are bioinformaticians up to, actually?
• Manage molecular biological data
– Store in databases, organise, formalise, describe...
• Compare molecular biological data
• Find patterns in molecular biological data
– phylogenies
– correlations (sequence / structure / expression / function / disease)
Goals:
• characterise biological patterns & processes
• predict biological properties
– low level data ⇒ high level properties (e.g., sequence ⇒ function)
Bioinformatics: neighbour disciplines
• Computational biology
– Broader concept: includes computational ecology, physiology, neurology etc...
• -omics:
– Genomics
– Transcriptomics
– Proteomics
• Systems biology
– Putting it all together...
– Building models, identifying control & regulation
Bioinformatics: prerequisites
• Bio- side:
– Molecular biology
– Cell biology
– Genetics
– Evolutionary theory
• -informatics side:
– Computer science
– Statistics
– Theoretical physics
Molecular biology data...
>alpha-D
ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCACCCAGACTGTGGAGCCGAGGCCCTGGAGAGGTGCGGGCTGAGCTTGGGGAAACCATGGGCAAGGGGGGCGACTGGGTGGGAGCCCTACAGGGCTGCTGGGGGTTGTTCGGCTGGGGGTCAGCACTGACCATCCCGCTCCCGCAGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACTTGCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGGCCGCCTTGGGCAACGCTGTCAAGAGCCTGGGCAACCTCAGCCAAGCCCTGTCTGACCTCAGCGACCTGCATGCCTACAACCTGCGTGTCGACCCTGTCAACTTCAAGGCAGGCGGGGGACGGGGGTCAGGGGCCGGGGAGTTGGGGGCCAGGGACCTGGTTGGGGATCCGGGGCCATGCCGGCGGTACTGAGCCCTGTTTTGCCTTGCAGCTGCTGGCGCAGTGCTTCCACGTGGTGCTGGCCACACACCTGGGCAACGACTACACCCCGGAGGCACATGCTGCCTTCGACAAGTTCCTGTCGGCTGTGTGCACCGTGCTGGCCGAGAAGTACAGATAA
>alpha-A
ATGGTGCTGTCTGCCAACGACAAGAGCAACGTGAAGGCCGTCTTCGGCAAAATCGGCGGCCAGGCCGGTGACTTGGGTGGTGAAGCCCTGGAGAGGTATGTGGTCATCCGTCATTACCCCATCTCTTGTCTGTCTGTGACTCCATCCCATCTGCCCCCATACTCTCCCCATCCATAACTGTCCCTGTTCTATGTGGCCCTGGCTCTGTCTCATCTGTCCCCAACTGTCCCTGATTGCCTCTGTCCCCCAGGTTGTTCATCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACCTGTCACATGGCTCCGCTCAGATCAAGGGGCACGGCAAGAAGGTGGCGGAGGCACTGGTTGAGGCTGCCAACCACATCGATGACATCGCTGGTGCCCTCTCCAAGCTGAGCGACCTCCACGCCCAAAAGCTCCGTGTGGACCCCGTCAACTTCAAAGTGAGCATCTGGGAAGGGGTGACCAGTCTGGCTCCCCTCCTGCACACACCTCTGGCTACCCCCTCACCTCACCCCCTTGCTCACCATCTCCTTTTGCCTTTCAGCTGCTGGGTCACTGCTTCCTGGTGGTCGTGGCCGTCCACTTCCCCTCTCTCCTGACCCCGGAGGTCCATGCTTCCCTGGACAAGTTCGTGTGTGCCGTGGGCACCGTCCTTACTGCCAAGTACCGTTAA
• DNA sequences
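Sequence data in this layout (a `>` header line followed by the sequence, commonly called FASTA format) is easy to read programmatically. The following is a minimal sketch, not a full parser; the function name `parse_fasta` is a hypothetical helper, and the example records are only the first few bases of the two sequences on the slide.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Shortened example records (prefixes of the slide's sequences).
example = """>alpha-D
ATGCTGACCGACTCTGACAAG
>alpha-A
ATGGTGCTGTCTGCC
AACGACAAG
"""

seqs = parse_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq), seq[:9])
```

Storing records by header makes later steps (comparison, pattern search) simple dictionary lookups.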
Molecular biology data...
• Amino acid sequences
• Protein structure:
– X-ray crystallography
– NMR
Cell biology & proteomics data...
• Subcellular localization
Cell biology & proteomics data...
• Protein-protein interactions
Transcriptomics: DNA microarray technology
Proteomics & transcriptomics data
Proteins encoded by periodically expressed genes: a functionally diverse protein category
Cell cycle regulation
Multiprotein complexes in yeast (Saccharomyces cerevisiae).
Coloured components are periodically expressed (dynamic). White components are always expressed (static).
Complexes contain both kinds of components (just-in-time assembly).
de Lichtenberg U, Jensen LJ, Brunak S, Bork P. Dynamic complex formation during the yeast cell cycle. Science. 2005 Feb 4;307(5710):724-7.
Cell cycle regulation 2
Transcriptional regulation is poorly conserved!
Lars Juhl Jensen, Thomas Skøt Jensen, Ulrik de Lichtenberg, Søren Brunak and Peer Bork, Nature, Oct 5, 2006
Dynamic components overlap:
Phenotype data: human diseases
Prediction methods
• Homology / Alignment
• Simple pattern (“word”) recognition
• Statistical methods
– Weight matrices: calculate amino acid probabilities
– Other examples: Regression, variance analysis, clustering
• Machine learning
– Like statistical methods, but parameters are estimated by iterative training rather than direct calculation
– Examples: Neural Networks (NN), Hidden Markov Models (HMM), Support Vector Machines (SVM)
• Combinations
Simple pattern (“word”) recognition
Example: PROSITE entry PS00014, ER_TARGET:
Endoplasmic reticulum targeting sequence (“KDEL-signal”).
Pattern: [KRHQSA]-[DENQ]-E-L>
NB: only yes/no answers!
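A PROSITE pattern like the one above maps almost directly onto a regular expression. The sketch below translates a subset of the PROSITE syntax (`-` separators, `[...]` allowed residues, `{...}` forbidden residues, `x` wildcards, `(n)` / `(n,m)` repeats, `<` and `>` terminal anchors) into a Python regex. The function name `prosite_to_regex` and the test sequences are illustrative assumptions, not part of PROSITE or of the slides.

```python
import re


def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a Python regex."""
    pattern = pattern.rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):      # '<' anchors the match at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # '>' anchors the match at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        # [ABC] groups and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return prefix + "".join(parts) + suffix


er_target = re.compile(prosite_to_regex("[KRHQSA]-[DENQ]-E-L>"))

# Made-up sequences: the pattern only matches at the very C-terminus.
print(bool(er_target.search("MSTAVLLGAKDEL")))   # ends in KDEL
print(bool(er_target.search("MKDELSTAVLLGA")))   # KDEL is not at the end
```

The result is a plain match or no match, which is exactly the "yes/no" limitation noted on the slide: a near miss scores the same as a completely unrelated sequence.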
Statistical methods
ALAKAAAAM
ALAKAAAAN
ALAKAAAAR
ALAKAAAAT
ALAKAAAAV
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV
• Estimate probabilities for nucleotides / amino acids
• Information content in sequences; logos; Position-Weight Matrices.
• Quantitative answers.
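To make the idea of a position-weight matrix concrete, here is a minimal sketch built from the ten peptides listed above. It assumes a uniform background frequency (1/20 per amino acid) and a pseudocount of 1 per residue; both are simplifying assumptions chosen for illustration, not values taken from the slides.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

PEPTIDES = [
    "ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
    "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV",
]


def build_pwm(peptides, pseudocount=1.0):
    """Return one {amino acid: log2-odds score} dict per position."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pwm = []
    for pos in range(length):
        # Start every count at the pseudocount so no probability is zero.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2(counts[aa] / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score(pwm, peptide):
    """Sum the per-position log-odds scores of a peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


pwm = build_pwm(PEPTIDES)
print(round(score(pwm, "ALAKAAAAV"), 2))  # resembles the training peptides
print(round(score(pwm, "WWWWWWWWW"), 2))  # resembles none of them
```

Unlike a pattern match, the score is a number, so candidates can be ranked rather than merely accepted or rejected.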
Sequence profiles
ADDGSLAFVPSEF--SISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKISMSEEDLLN
TVNGAI--PGPLIAERLKEGQNVRVTNTLDEDTSIHWHGLLVPFGMDGVPGVSFPG---I
-TSMAPAFGVQEFYRTVKQGDEVTVTIT-----NIDQIED-VSHGFVVVNHGVSME---I
IE--KMKYLTPEVFYTIKAGETVYWVNGEVMPHNVAFKKGIV--GEDAFRGEMMTKD---
-TSVAPSFSQPSF-LTVKEGDEVTVIVTNLDE------IDDLTHGFTMGNHGVAME---V
ASAETMVFEPDFLVLEIGPGDRVRFVPTHK-SHNAATIDGMVPEGVEGFKSRINDE----
TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI
Conserved
Non-conserved
Matching anything but G => large negative score
Anything can match
TKAVVLTFNTSVEICLVMQGTSIV----AAESHPLHLHGFNFPSNFNLVDPMERNTAGVP
TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI
Unlike Position-Weight Matrices, sequence profiles are able to handle gaps
HMM (Hidden Markov Models)
Which parts of the sequence are
– in the cytoplasm?
– embedded in the membrane (TM-helices)?
– outside?
Rule-of-thumb says amino acids in the TM helices are hydrophobic – but there are exceptions
TMHMM predicts topology from sequence using a statistical model
TMHMM: predicting the topology of α-helix transmembrane proteins
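The core computation behind HMM-based prediction is finding the most probable sequence of hidden states for an observed sequence (Viterbi decoding). The sketch below is a toy two-state model, loop (`L`) versus membrane (`M`), with made-up probabilities and a made-up hydrophobic residue set. It is not the TMHMM model, which uses many more states to represent inside loops, outside loops and helix regions; it only illustrates how a statistical model can tolerate exceptions to the hydrophobicity rule-of-thumb.

```python
import math

STATES = ("L", "M")                      # L = loop, M = membrane helix
START = {"L": 0.5, "M": 0.5}             # toy values, not trained
TRANS = {"L": {"L": 0.9, "M": 0.1},      # states tend to persist
         "M": {"L": 0.1, "M": 0.9}}
EMIT = {"L": {"h": 0.2, "p": 0.8},       # h = hydrophobic, p = polar
        "M": {"h": 0.9, "p": 0.1}}
HYDROPHOBIC = set("ACFILMVW")            # assumption: a crude residue split


def viterbi(sequence):
    """Return the most probable state path as a string of 'L'/'M'."""
    obs = ["h" if aa in HYDROPHOBIC else "p" for aa in sequence]
    # best[t][s]: log-probability of the best path ending in state s at t.
    best = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]])
             for s in STATES}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda r: best[-1][r] + math.log(TRANS[r][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(EMIT[s][symbol]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Made-up sequence: polar flanks around a 20-residue hydrophobic stretch.
toy = "KRDEQ" + "LLIVAFLLVAILLFVALLIV" + "KDERQ"
print(toy)
print(viterbi(toy))
```

Because switching state is penalised, a single polar residue inside a long hydrophobic stretch would not break the predicted helix, which a simple per-residue rule cannot achieve.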
Neural Networks: “Architecture”
[Figure: network diagram. Labels: Window (over the sequence IKEEHVIIQAEFYLNPDQSGEF...), Input Layer, Hidden Layer, Output Layer, Weights; classes H, E, C.]
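A network of this kind needs numeric input. One common way to feed it a sequence window is sparse ("one-hot") encoding, sketched below. The window width of 7, the padding symbol and the function names are assumptions for illustration; the slide does not specify how the input is encoded.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20 numbers per residue; padding symbols become all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_windows(sequence, window=7, pad="-"):
    """Return one input vector per residue, centred on that residue."""
    half = window // 2
    padded = pad * half + sequence + pad * half
    inputs = []
    for center in range(len(sequence)):
        vector = []
        for residue in padded[center:center + window]:
            vector.extend(one_hot(residue))
        inputs.append(vector)
    return inputs


inputs = encode_windows("IKEEHVIIQAEFYLNPDQSGEF")
print(len(inputs), len(inputs[0]))  # number of windows, inputs per window
```

Each vector would be one training example, with the class of the central residue as the target output.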
Neural networks
• Neural networks can learn higher order correlations!
– What does this mean?
A A => 0
A C => 1
C A => 1
C C => 0
No linear function can learn this pattern
Neural Networks: advantages
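The pattern above is the classic XOR problem. Encoding A as 0 and C as 1, a single linear threshold unit that outputs 1 when w1·x1 + w2·x2 + b > 0 would need b ≤ 0 (for A A), w2 + b > 0 (for A C), w1 + b > 0 (for C A) and w1 + w2 + b ≤ 0 (for C C). Adding the two middle conditions gives w1 + w2 + 2b > 0, so w1 + w2 + b > -b ≥ 0, which contradicts the last condition. A network with one hidden layer has no such problem. The sketch below uses hand-picked weights rather than trained ones, and the 0/1 encoding is an assumption for illustration.

```python
def step(x):
    """Threshold activation: the unit fires if its input is positive."""
    return 1 if x > 0 else 0


def encode(residue):
    # Assumption: encode A as 0 and C as 1.
    return 0 if residue == "A" else 1


def network(first, second):
    x1, x2 = encode(first), encode(second)
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is C
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are C
    return step(h_or - h_and - 0.5)   # "OR but not AND"


TARGETS = {("A", "A"): 0, ("A", "C"): 1, ("C", "A"): 1, ("C", "C"): 0}
for pair, target in TARGETS.items():
    print(pair, network(*pair), target)
```

The hidden units each compute a linear decision, and the output unit combines them; this combination is what lets the network capture the correlation between the two positions.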
How to estimate parameters for prediction?
Model selection
[Figure panels: Linear Regression, Quadratic Regression, Join-the-dots]
The test set method
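A minimal sketch of the test set method: hold out part of the data, fit each candidate model on the rest, and compare the errors on the held-out part. The data points are made up, the split is a fixed alternating one so the example is reproducible (in practice the test set would be chosen at random), and quadratic regression is left out for brevity.

```python
def fit_line(points):
    """Least-squares straight line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def join_the_dots(points):
    """Piecewise-linear curve passing through every training point."""
    pts = sorted(points)

    def predict(x):
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]  # beyond the last point: hold the last value

    return predict


def mse(model, points):
    """Mean squared error of a model on a set of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


# Made-up noisy data, roughly following y = x.
data = [(0, 0.5), (1, 0.6), (2, 1.4), (3, 3.5), (4, 4.7),
        (5, 4.5), (6, 5.5), (7, 7.4), (8, 8.6), (9, 8.4)]
train, test = data[::2], data[1::2]

line = fit_line(train)
dots = join_the_dots(train)
for name, model in (("linear", line), ("join-the-dots", dots)):
    print(name, "train:", round(mse(model, train), 3),
          "test:", round(mse(model, test), 3))
```

Join-the-dots always reaches (near) zero training error because it passes through every training point, so training error alone says nothing about how well it generalises; only the held-out error gives a fair comparison between models.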
Problem: sequences are related
• If the sequences in the test set are closely related to those in the training set, we cannot measure true generalization performance
ALAKAAAAM
ALAKAAAAN
ALAKAAAAR
ALAKAAAAT
ALAKAAAAV
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV
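One way to address this is to drop test sequences that are too similar to anything in the training set. The sketch below uses the peptides above with a simple position-by-position identity and an 80% cut-off; the cut-off, the split into training and test peptides, and the function names are assumptions for illustration. For sequences of unequal length, similarity would normally be measured from an alignment instead.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length peptides."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def remove_related(test, train, max_identity=0.8):
    """Keep only test peptides not too similar to any training peptide."""
    return [t for t in test
            if all(identity(t, s) <= max_identity for s in train)]


# Hypothetical split of the ten peptides shown above.
train = ["ALAKAAAAM", "ALAKAAAAN", "GMNERPILT", "GILGFVFTM", "TLNAWVKVV"]
test = ["ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV", "KLNEPVLLL", "AVVPFIVSV"]

print(remove_related(test, train))
```

The three ALAKAAAA* test peptides differ from a training peptide at only one position, so predicting them correctly would say little about performance on genuinely new sequences.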
Protein sorting in eukaryotes
• Proteins belong in different organelles of the cell – and some even have their function outside the cell
• In 1999, Günter Blobel was awarded the Nobel Prize in Physiology or Medicine for the discovery that "proteins have intrinsic signals that govern their transport and localization in the cell"
Secretory proteins have a signal peptide
Initially, they are transported across the ER membrane
Protein sorting: secretory pathway / ER
Signal peptides
A signal peptide is an N-terminal part of the amino acid chain, containing a hydrophobic region.
Signal peptides differ between proteins, and can be hard to recognize.
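As a rough illustration of what "containing a hydrophobic region" means computationally, the sketch below scans the N-terminal part of a sequence for its most hydrophobic window using the Kyte-Doolittle hydropathy scale. The window width (8), the N-terminal length (30), the function name and the example sequence are all assumptions chosen for illustration. A high score only shows that a hydrophobic stretch exists; it is not a signal peptide predictor, which is one reason recognition is hard.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(sequence, n_terminal=30, window=8):
    """Return (mean hydropathy, start, segment) of the best N-terminal window.

    Returns None if the N-terminal region is shorter than the window.
    """
    region = sequence[:n_terminal]
    best = None
    for start in range(len(region) - window + 1):
        segment = region[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if best is None or mean > best[0]:
            best = (mean, start, segment)
    return best


# Made-up sequence with a hydrophobic stretch near the N-terminus.
best = most_hydrophobic_window("MKLLSLVALLLGASAQAEDKTPQRS")
print(best)
```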
SignalP
http://www.cbs.dtu.dk/services/SignalP/
Method for prediction of signal peptides from the amino acid sequence
Based on Neural Networks
The first SignalP paper (Nielsen H, et al., Protein Eng. 10[1]: 1-6, January 1997) was cited 3144 times in a 10-year period