Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail
![Page 2: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/2.jpg)
Not only small molecules and QM, MM techniques rule the world.
![Page 3: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/3.jpg)
Central dogma of molecular biology
• Term is due to Francis Crick
• The conversion DNA → protein is not direct; RNA is involved
• DNA is the information store, RNA is messenger (mRNA), transporter (tRNA), biomolecular nanomachine (rRNA)
source: wikipedia.org
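The two steps of the flow DNA → RNA → protein can be sketched in a few lines. This is a toy illustration, not a full implementation: the codon table below is a deliberately partial subset of the 64-codon genetic code, and the input sequence and the function names `transcribe` and `translate` are made up for the example.

```python
# Partial codon table (mRNA codon -> one-letter amino acid code).
# Only the codons used in this example are listed; "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "GAU": "D", "GAA": "E", "UGG": "W", "UAA": "*"}


def transcribe(coding_dna: str) -> str:
    """Turn the DNA coding strand (5'->3') into mRNA by replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGATGAATGGTAA"  # hypothetical toy gene
mrna = transcribe(dna)
print(mrna, "->", translate(mrna))
```

The sketch ignores everything a real cell adds on top (the template strand, splicing, tRNA and the ribosome); it only shows that the information passes through RNA.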
![Page 4: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/4.jpg)
Nucleic acids
• four letters (DNA, RNA)
• sequence - AACTAACG (5’ → 3’)
• DNA – double helix
• RNA – “single stranded” helix, folding (double helical regions, C2’-OH → secondary and tertiary motifs)
![Page 5: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/5.jpg)
nucleoside
nucleotide
![Page 6: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/6.jpg)
B-DNA, A-DNA, Z-DNA
![Page 7: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/7.jpg)
RNA secondary motifs
Nowakowski and Tinoco, Seminars in Virology 8, 153, 1997.
![Page 8: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/8.jpg)
RNA
source: http://complex.upf.es/~josep/RNA.jpg, http://www.biosci.ki.se/groups/ljo/images/phe_trna_large.jpg, http://rna.ucsc.edu/rnacenter/images/70s_atrna.jpg
![Page 9: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/9.jpg)
Proteins
• 20 letters
• primary structure - sequence AMNTSSTVG (N-end → C-end)
Alberts, Molecular Biology of the Cell, 5th Ed.
![Page 10: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/10.jpg)
• secondary structure (random coil, α-helix, β-sheet, loops)
• several secondary structure elements form motifs
• e.g. Greek key, β-α-β, HTH
![Page 11: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/11.jpg)
• tertiary structure (the arrangement of motifs into one or more domains)
• quaternary structure (multimeric complexes)
![Page 12: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/12.jpg)
Proteins
source: http://calstate.fullerton.edu/news/arts/2003/photos/protein-art.jpg
![Page 13: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/13.jpg)
Proteins
source: Petsko, Ringe – Protein structure and function
![Page 14: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/14.jpg)
http://www.cellsignal.com/reference/pathway/NF_kappaB.html
![Page 15: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/15.jpg)
Systems biology
• focuses on the systematic study of complex interactions in biological systems using a new perspective - holism instead of reductionism
• holism – the properties of a system cannot be determined or explained by its component parts alone
• one of the goals of systems biology is to discover new emergent properties
• new field, boom since 2000, very little covered in CZ
![Page 16: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/16.jpg)
Systems biology
source: wikipedia.org
![Page 17: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/17.jpg)
Systems biology
• based on mathematical modelling of systems, control theory, cybernetics
• engineering view of complex biological systems
• e.g. answers questions about the robustness of a given system when one of its parts fails
• or about the response of a system to a change in environmental conditions
![Page 18: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/18.jpg)
quantum chemistry
molecular dynamics
bioinformatics
systems biology
![Page 19: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/19.jpg)
Bioinformatics
• application of information technology to the field of molecular biology, genomics and related biological disciplines
• tremendous amount of data
• the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve problems arising from the management and analysis of biological data
![Page 20: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/20.jpg)
According to the classification used by Russian scientists, we distinguish two fields of paranormal phenomena: bioinformatics and bioenergetics. Bioinformatics (i.e. extrasensory perception, ESP) covers the acquisition and exchange of information by extrasensory means (not through the normal sense organs). In essence, we distinguish the following forms of bioinformation: hypnosis (control of consciousness), telepathy, remote perception, precognition, retrocognition, out-of-body experience, "seeing" with the hands or other parts of the body, inspiration and revelation.
source: http://www.esoterika.cz/clanek/2992-mimosmyslova_spionaz_dalkove_pozorovani_i_.htm
![Page 21: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/21.jpg)
Bioinformatics
• sequence analysis (sequence bioinformatics)
• structural analysis (structural bioinformatics)
• functional analysis (systems biology)
![Page 22: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/22.jpg)
• genetic code
• gene
• genome, genomics
• large data sets
• high throughput
• human genome
  • DNA localized mainly in the nucleus; each nucleus carries the whole genetic information
  • 3.2 billion bp
  • 25 000 – 30 000 genes
  • ca. 1.5% codes for proteins, the rest - junk DNA
• what is the proteome?
• proteomics
• Is it more difficult to study the genome or the proteome?
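A back-of-the-envelope check puts the human genome figures on this slide into proportion. The numbers are the slide's own; the variable names are made up, and dividing the coding portion evenly among genes is a crude simplification made only for illustration.

```python
GENOME_BP = 3_200_000_000         # 3.2 billion base pairs (from the slide)
CODING_PERMILLE = 15              # ca. 1.5 %, written per thousand to stay in integers
GENE_COUNT_RANGE = (25_000, 30_000)

# Number of base pairs that code for proteins.
coding_bp = GENOME_BP * CODING_PERMILLE // 1000
print(f"protein-coding portion: {coding_bp:,} bp")

# Rough average coding length per gene for both ends of the gene-count range.
for genes in GENE_COUNT_RANGE:
    print(f"{genes:,} genes -> about {coding_bp // genes:,} coding bp per gene")
```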
![Page 23: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/23.jpg)
Sequence bioinformatics
• reconstruction of sequence fragments
• searching for genes and other interesting regions in the genome
• junk DNA – 95% of the human genome consists of non-coding sequences, either with no function or with a function not yet understood
• querying huge genomes for a given sequence
• comparison of genes within a species – similarities between protein functions
• comparison of genes between species – organisms' evolutionary relationships (phylogenetic analysis)
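The simplest form of querying a genome for a given sequence is an exact substring search. The sketch below is only a naive illustration with a made-up helper `find_all` and a toy sequence; searches over real genomes rely on indexing and on approximate matching, which this example does not attempt.

```python
def find_all(genome: str, query: str) -> list[int]:
    """Return the 0-based start positions of every (possibly overlapping) exact hit."""
    if not query:
        raise ValueError("query must not be empty")
    positions = []
    start = genome.find(query)
    while start != -1:
        positions.append(start)
        # Continue one position further so that overlapping hits are found too.
        start = genome.find(query, start + 1)
    return positions


toy_genome = "ACGTCTGATACGCCGTATAGTCTATCT"  # toy sequence, not a real genome
print(find_all(toy_genome, "TCT"))
```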
![Page 24: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/24.jpg)
Sequence alignment
• Procedure of comparing sequences
• Point mutations – easy (gapless alignment)
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
• More difficult example
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
• However, gaps can be inserted to get something like this (gapped alignment)
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
• insertion × deletion = indel
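The difference between the gapless and the gapped example can be made concrete by counting what happens in each column of an alignment. The sketch below uses the sequences from this slide; the helper name `compare_columns` is made up for the illustration.

```python
def compare_columns(top: str, bottom: str, gap: str = "-") -> dict:
    """Count matching, mismatching and gap columns of two already aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


reference = "ACGTCTGATACGCCGTATAGTCTATCT"
point_mutant = "ACGTCTGATTCGCCCTATCGTCTATCT"  # gapless case: substitutions only
gapped = "----CTGATTCGC---ATCGTCTATCT"        # second sequence after inserting gaps

print(compare_columns(reference, point_mutant))
print(compare_columns(reference, gapped))
```

The function only evaluates an alignment that is already given; finding the best placement of gaps is the actual alignment problem.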
![Page 25: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/25.jpg)
Flavors of sequence alignment
pair-wise alignment × multiple sequence alignment
![Page 26: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/26.jpg)
Flavors of sequence alignment
global alignment × local alignment
• global – align entire sequence
• local – stretches of sequence with the highest density of matches are aligned, generating islands of matches or subalignments in the aligned sequences
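The global and the local flavor can be contrasted with the classic dynamic-programming formulations, usually associated with the names Needleman-Wunsch (global) and Smith-Waterman (local). The slides do not name an algorithm, so this is a sketch under assumptions: the scoring values (match +1, mismatch -1, gap -1) and the function names are chosen for illustration only, and only the score is computed, not the alignment itself.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score when both sequences must be aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]      # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]                             # a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all pairs of substrings; a cell never drops below zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(global_score("ACGT", "AGT"))            # whole sequences, one gap needed
print(local_score("TTTACGTTT", "GGACGGG"))    # only the shared island "ACG" counts
```

The only structural difference is the `max(0, ...)` in the local variant: it lets an alignment restart anywhere, which is what produces the islands of matches.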
![Page 27: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/27.jpg)
Scoring systems I
• DNA and protein sequences can be aligned so that the number of identically matching pairs is maximized.
A T T G - - - T
A - - G A C A T
• Counting the number of matches gives us a score (3 in this case). A higher score means a better alignment.
• This procedure can be formalized using a substitution matrix.
Identity matrix
    A  T  C  G
A   1
T   0  1
C   0  0  1
G   0  0  0  1
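Scoring the alignment from this slide with the identity matrix can be sketched as follows. The helper name `alignment_score` is made up, and giving gap columns a score of 0 is an assumption that mirrors the slide's "count the matches" rule; practical scoring schemes usually penalize gaps.

```python
BASES = "ATCG"

# Identity matrix: 1 for identical bases, 0 otherwise.
IDENTITY = {(x, y): int(x == y) for x in BASES for y in BASES}


def alignment_score(top: str, bottom: str, matrix: dict, gap_score: int = 0) -> int:
    """Sum the substitution-matrix entries over all columns of an alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap_score          # assumption: gaps are neutral here
        else:
            score += matrix[(a, b)]
    return score


top = "ATTG---T"
bottom = "A--GACAT"
print(alignment_score(top, bottom, IDENTITY))
```

Swapping `IDENTITY` for another dictionary of pair scores changes the scoring system without touching the function.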
![Page 28: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/28.jpg)
Scoring systems II
• For nucleotide sequences, the identity matrix is usually good enough.
• For protein sequences, the identity matrix is not sufficient to describe biological and evolutionary processes.
• This is because amino acids are not exchanged with equal probability, as one might assume in theory.
• For example, substitution of aspartic acid (D) by glutamic acid (E) is frequently observed, whereas a change from aspartic acid to tryptophan (W) is very rare.
• Why is that?
  1. Triplet-based genetic code: GAT (D) → GAA (E) needs one nucleotide change, GAT (D) → TGG (W) needs three.
  2. D and E have similar properties, but D and W differ considerably. D is hydrophilic, W is hydrophobic; a D → W mutation can greatly alter the 3D structure and consequently the function.
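The genetic-code argument can be checked by counting how many nucleotides differ between the codons. The sketch below uses the DNA codons of aspartic acid (GAT, GAC), glutamic acid (GAA, GAG) and tryptophan (TGG); the function names are made up for the illustration.

```python
from itertools import product

# DNA codons for the three amino acids discussed on the slide.
CODONS = {"D": ["GAT", "GAC"], "E": ["GAA", "GAG"], "W": ["TGG"]}


def nucleotide_changes(codon_from: str, codon_to: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_from, codon_to))


def min_changes(aa_from: str, aa_to: str) -> int:
    """Fewest point mutations needed to turn any codon of one amino acid into the other."""
    return min(nucleotide_changes(x, y) for x, y in product(CODONS[aa_from], CODONS[aa_to]))


print("GAT -> GAA:", nucleotide_changes("GAT", "GAA"))
print("GAT -> TGG:", nucleotide_changes("GAT", "TGG"))
print("D -> E at least:", min_changes("D", "E"))
print("D -> W at least:", min_changes("D", "W"))
```

A single point mutation is enough for D → E, while D → W needs all three positions of the codon to change, which makes it much less likely.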
![Page 29: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/29.jpg)
Substitution matrices
small, polar
small, nonpolar
polar or acidic
basic
large, hydrophobic
aromatic
Zvelebil, Baum, Understanding bioinformatics.
Positive score – frequency of substitutions is greater than would have occurred by random chance.
Zero score – frequency is equal to that expected by chance.
Negative score – frequency is less than would have occurred by random chance.
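The sign rule above is what a log-odds score produces. The slide does not say how the pictured matrix was built, so the sketch below rests on assumptions: the score is taken as the base-2 logarithm of observed over expected frequency, without the scaling and rounding that published matrices apply, and all frequencies are invented numbers.

```python
import math


def log_odds(observed: float, expected: float) -> float:
    """Base-2 log of how much more (or less) often a substitution occurs than by chance."""
    return math.log2(observed / expected)


# Hypothetical frequencies, chosen only to show the three cases.
examples = {
    "more often than chance": (0.02, 0.01),
    "as often as chance": (0.01, 0.01),
    "less often than chance": (0.005, 0.01),
}
for label, (observed, expected) in examples.items():
    print(f"{label}: {log_odds(observed, expected):+.2f}")
```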
![Page 30: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/30.jpg)
Sequence database search
BLAST
the Google of the sequence world
![Page 31: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/31.jpg)
Phylogenetic analysis
![Page 32: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/32.jpg)
Structural bioinformatics
• the function of a chemical moiety is given by its structure
• while DNA structure is “given” (double-helix), RNA and proteins can adopt very different conformations (i.e. specific arrangements of atoms in 3D space)
• structural bioinformatics covers
  • analysis of nucleic acid and protein structure
  • prediction of structure from the sequence
![Page 33: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/33.jpg)
Protein structure prediction
• secondary structure prediction
  • the conformational state of each residue is predicted as H (helix), E (extended, β-sheet), C (coil)
  • accuracy: 80%
• tertiary structure prediction
  • why?
    • many sequences are known, but not that many 3D structures have been solved
    • some proteins (e.g. transmembrane) are difficult to characterize experimentally
    • many proteins have a known function but an unknown structure (which is, however, needed to understand their behavior in detail)
  • ab initio, threading, homology modelling
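An accuracy figure for three-state secondary structure prediction is commonly reported as the fraction of residues whose state (H, E or C) is predicted correctly, often called Q3. The slide does not say which measure its 80% refers to, so treating it as a per-residue accuracy is an assumption, and the two state strings below are invented.

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Fraction of residues whose H/E/C state matches the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("state strings must not be empty")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


observed = "CHHHHCEEEC"    # hypothetical experimentally derived states
predicted = "CHHHCCEEEE"   # hypothetical prediction with two wrong residues
print(f"Q3 = {q3_accuracy(observed, predicted):.0%}")
```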
![Page 34: Daniel Svozil Laboratoř Chemie a informatiky daniel.svozil @gmail](https://reader035.fdocuments.net/reader035/viewer/2022062520/56815f79550346895dce8324/html5/thumbnails/34.jpg)
CASP
• Critical Assessment of Structure Prediction
• http://predictioncenter.org/
• since 1994, every 2 years, CASP10 in preparation
• predict solved, but not publicly released structures
• competition of individual groups in 3D prediction:
  • human groups – answer in 14 days
  • software (automated prediction) – answer in 48 hours