Protein Structure Searches
Doug Raiford, Lesson 18
04/21/23 1Protein Structure Searches
Given a protein conformation, can we find other structurally similar proteins?
Might search a database of known structures (like the PDB)
Can do a simple RMSD to compare the two conformations
But only if we know precisely which aa’s in one correspond to which aa’s in the other
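A minimal sketch of the RMSD calculation, assuming the correspondence is already known (item i in one list pairs with item i in the other) and that the two conformations are already superimposed. The coordinates are made-up illustrative values, not real structure data.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Assumes coords_a[i] corresponds to coords_b[i] (e.g. C-alpha atoms
    of equivalent residues) and that the structures are superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates: the second set is the first shifted 1 unit in z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(a, b))  # 1.0
```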
Must map aa’s from one to aa’s in the other
How might you do this?
Sequence similarity
MSA’s
3D PSSM
Sequence alignment integrated with 3D alignment
Stored in a profile (position-specific similarity profile)
Generates 1D profiles first (MSAs)
Then uses a structural alignment program (SAP) to augment the profiles with structural similarity
Aligning secondary structures
What do you think of when you hear that you will need to align two things?
Dynamic programming
[Figure: dynamic programming matrix aligning two secondary-structure strings; columns α α α T β β β, rows α α β β β β]
Three components
AA similarity (substitution matrix)
Local structure
▪ E.g. both aa’s members of alpha helix
Solvent exposure
Are the associated AA’s similar, sequence wise (i.e. both glycines)?
Are they both in a similar local structure?
Are they both buried or both exposed to solvent?
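The three questions above can be folded into one per-position score and fed to ordinary dynamic programming. A minimal sketch follows. The residue tuples, the equal weights, the gap penalty, and the +1/-1 identity scoring (a toy stand-in for a real substitution matrix) are all illustrative assumptions, not the scoring used by any particular program.

```python
def position_score(res_a, res_b, w_seq=1.0, w_ss=1.0, w_exp=1.0):
    """Score one aa pair on sequence, local structure and solvent exposure.

    Each residue is a hypothetical tuple (amino_acid, sec_struct, buried).
    """
    aa_a, ss_a, buried_a = res_a
    aa_b, ss_b, buried_b = res_b
    seq = 1.0 if aa_a == aa_b else -1.0          # toy substitution score
    ss = 1.0 if ss_a == ss_b else -1.0           # same local structure?
    exp = 1.0 if buried_a == buried_b else -1.0  # both buried or both exposed?
    return w_seq * seq + w_ss * ss + w_exp * exp

def align(a, b, gap=-2.0):
    """Global (Needleman-Wunsch style) alignment; returns (score, pairs)."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + position_score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the corner to recover which aa's were matched.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + position_score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs

# Hypothetical residues: (amino acid, H = helix / E = strand, buried?)
prot_a = [("G", "H", True), ("A", "H", True), ("L", "H", False), ("V", "E", True)]
prot_b = [("G", "H", True), ("A", "H", True), ("V", "E", True)]
print(align(prot_a, prot_b))  # (7.0, [(0, 0), (1, 1), (3, 2)])
```

In this toy case the exposed helical L has no good partner, so the alignment places a gap there rather than pairing it with the buried strand V.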
SAP (structure alignment) allows a profile to be influenced by secondary structure
Useful to 3D PSSM in that
▪ Threading decisions (which aa’s match to a profile) are improved
▪ Homology-based protein conformation is enhanced by making better decisions on where to insert gaps/varying-length loops
PFAM
Has hidden Markov models for protein families
Sequences that match a model have a high probability of matching the family’s conformation
Even though not comparing structures (query to target), are matching a sequence to its most probable structure
HMMER
Can’t really align
How else might it work?
How might two distance matrices look?
All pairwise distances from each aa to all other aa’s
If identical proteins, the matrices would be almost identical
Low distance region in matrix (band parallel to the diagonal) if strands are parallel
Low distance region (band perpendicular to the diagonal) if hairpin (anti-parallel)
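A minimal sketch of building such a distance matrix from one coordinate per aa (for example the C-alpha atom). The coordinates are made up for illustration. Because only internal distances are stored, moving the whole protein does not change the matrix, which is why two matrices can be compared without superimposing the structures first.

```python
import math

def distance_matrix(coords):
    """All pairwise distances: entry [i][j] is the distance from aa i to aa j."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def max_difference(matrix_a, matrix_b):
    """Largest element-wise difference between two equal-sized matrices."""
    return max(abs(x - y)
               for row_a, row_b in zip(matrix_a, matrix_b)
               for x, y in zip(row_a, row_b))

# Hypothetical coordinates, and the same "protein" shifted by (1, 2, 3).
original = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
moved = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in original]

d_original = distance_matrix(original)
d_moved = distance_matrix(moved)
print(d_original[0])                        # [0.0, 5.0, 13.0]
print(max_difference(d_original, d_moved))  # 0.0
```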
Find optimum set of similar sub-structures
Even if in different 1D locations
Find amino acid equivalence
Once have equivalence can easily compare structure similarity
E.g. with RMSD
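Break each distance matrix into a bunch of overlapping sub-matrices
Do an all-pairs comparison (every sub-matrix of one protein against every sub-matrix of the other)
Similar sub-matrix pairs that naturally extend one another are merged
Must find pairings of sub-matrices that yield best overall score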
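A minimal sketch of the first two steps (cutting overlapping sub-matrices and comparing all pairs). The block size, the mean-absolute-difference measure, and the cutoff are illustrative assumptions; the merging and best-pairing search are not shown here.

```python
def sub_matrices(dist, size):
    """Yield (row, col, block) for every size x size block of a distance matrix.

    Blocks start at every position, so neighbouring blocks overlap.
    """
    n = len(dist)
    for i in range(n - size + 1):
        for j in range(n - size + 1):
            yield i, j, [row[j:j + size] for row in dist[i:i + size]]

def block_difference(block_a, block_b):
    """Mean absolute difference between two equal-sized blocks (0 = identical)."""
    size = len(block_a)
    total = sum(abs(x - y)
                for row_a, row_b in zip(block_a, block_b)
                for x, y in zip(row_a, row_b))
    return total / (size * size)

def similar_block_pairs(dist_a, dist_b, size, cutoff):
    """All-pairs comparison: keep block pairs whose difference is <= cutoff."""
    pairs = []
    for ia, ja, block_a in sub_matrices(dist_a, size):
        for ib, jb, block_b in sub_matrices(dist_b, size):
            if block_difference(block_a, block_b) <= cutoff:
                pairs.append(((ia, ja), (ib, jb)))
    return pairs

# Toy distance matrix: four aa's evenly spaced along a line.
toy = [[float(abs(i - j)) for j in range(4)] for i in range(4)]
matches = similar_block_pairs(toy, toy, size=2, cutoff=0.0)
print(((0, 0), (1, 1)) in matches)  # True: same pattern at a different 1D location
```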
[Figure: two 7×7 distance matrices, rows and columns numbered 1 to 7, one per protein]
Monte Carlo approach
Randomly generate pairings
Calculate overall similarity
Multiple solutions in parallel
Slowly improve each by randomly altering pairings (like a random search)
Have some probability of keeping a solution that is worse than the previous one (to escape local optima)
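A minimal sketch of the Monte Carlo idea, simplified so that a "pairing" is a one-to-one assignment of each aa in protein A to an aa in protein B (equal lengths assumed) rather than a set of sub-matrix pairs. The cost function, the swap move, the Metropolis-style acceptance rule, and the cooling schedule are illustrative assumptions. The "parallel" solutions are simply independent chains run one after another.

```python
import math
import random

def mismatch(pairing, dist_a, dist_b):
    """Total disagreement between A's distances and the paired distances in B."""
    n = len(pairing)
    return sum(abs(dist_a[i][k] - dist_b[pairing[i]][pairing[k]])
               for i in range(n) for k in range(i + 1, n))

def monte_carlo_pairing(dist_a, dist_b, chains=4, steps=2000,
                        temperature=1.0, cooling=0.995, seed=0):
    """Search for a low-mismatch pairing; needs at least two aa's."""
    rng = random.Random(seed)
    n = len(dist_a)
    best, best_cost = None, math.inf
    for _chain in range(chains):               # several independent solutions
        current = list(range(n))
        rng.shuffle(current)                   # random starting pairing
        cost = mismatch(current, dist_a, dist_b)
        t = temperature
        for _step in range(steps):
            if cost < best_cost:
                best, best_cost = current[:], cost
            i, k = rng.sample(range(n), 2)     # randomly alter the pairing
            current[i], current[k] = current[k], current[i]
            new_cost = mismatch(current, dist_a, dist_b)
            # Always keep improvements; sometimes keep a worse solution.
            if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
                cost = new_cost
            else:
                current[i], current[k] = current[k], current[i]  # undo the swap
            t *= cooling                       # accept fewer bad moves over time
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost

# Toy distance matrix: five aa's evenly spaced along a line.
toy = [[float(abs(i - j)) for j in range(5)] for i in range(5)]
pairing, cost = monte_carlo_pairing(toy, toy)
print(pairing, cost)
```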
[Figure: two 7×7 distance matrices, rows and columns numbered 1 to 7, one per protein]
Can determine similarity
How?
Must perturb XYZ (translation) and roll, pitch, and yaw (rotation) of one of the proteins, minimizing RMSD
Like linear regression (a least-squares fit)
Can’t do until know which aa’s are associated
Some numeric methods start by fixing between 2 and 4 amino acids
Some short cuts
Center of gravity is the average of all vectors
Translate
▪ ave(p1) - ave(p2)
Singular value decomposition to rotate (like eigenvectors)
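A minimal sketch of these short cuts using NumPy, following the Kabsch algorithm (one standard SVD-based superposition method; the slides do not name which method they have in mind). It assumes the aa’s are already paired row by row. The function name and the test coordinates are illustrative.

```python
import numpy as np

def superpose(mobile, target):
    """Rotate and translate `mobile` onto `target`; return (fitted, rmsd).

    Row i of `mobile` is assumed to correspond to row i of `target`.
    """
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    p_center = p.mean(axis=0)            # center of gravity = average of all vectors
    q_center = q.mean(axis=0)
    p0 = p - p_center                    # translate both to the origin
    q0 = q - q_center
    h = p0.T @ q0                        # 3x3 covariance matrix
    u, _s, vt = np.linalg.svd(h)         # singular value decomposition
    # Guard against an improper rotation (a mirror image).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p0 @ rotation.T + q_center  # rotate, then move onto the target
    rmsd = float(np.sqrt(((fitted - q) ** 2).sum(axis=1).mean()))
    return fitted, rmsd

# Hypothetical target, and a copy rotated 90 degrees about z and shifted.
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
mobile = np.array([[-y + 5.0, x - 1.0, z + 2.0] for x, y, z in target])
fitted, value = superpose(mobile, target)
print(round(value, 6))  # 0.0
```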
Requires double dynamic programming
If n×m matrix, then n times m different matrices are generated, each pinning the return (traceback) path to one aa pair
Used to generate a position-specific score, which is then used in aa similarity scoring
Reduces the constraint that two particular aa’s are equivalent
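A simplified sketch of double dynamic programming, working from two distance matrices. For every aa pair (i, j), a lower-level alignment is forced through that pair and its best score becomes the position-specific score for (i, j); a final upper-level alignment then runs over those scores. The low-level score 1 / (1 + |difference|), the gap penalty, and the use of a single pinned score per pair (rather than accumulating scores along each path) are assumptions made to keep the example short, not the scoring of any particular program.

```python
def best_global_score(score, rows, cols, gap):
    """Needleman-Wunsch score for aligning index list `rows` against `cols`."""
    prev = [j * gap for j in range(len(cols) + 1)]
    for a in range(1, len(rows) + 1):
        cur = [a * gap] + [0.0] * len(cols)
        for b in range(1, len(cols) + 1):
            cur[b] = max(prev[b - 1] + score(rows[a - 1], cols[b - 1]),
                         prev[b] + gap,
                         cur[b - 1] + gap)
        prev = cur
    return prev[len(cols)]

def pinned_score(dist_a, dist_b, i, j, gap):
    """Best lower-level alignment score when aa i is forced to pair with aa j."""
    def low(k, l):
        # Does k sit as far from i (in A) as l sits from j (in B)?
        return 1.0 / (1.0 + abs(dist_a[i][k] - dist_b[j][l]))
    n, m = len(dist_a), len(dist_b)
    before = best_global_score(low, list(range(i)), list(range(j)), gap)
    after = best_global_score(low, list(range(i + 1, n)), list(range(j + 1, m)), gap)
    return before + low(i, j) + after

def double_dp(dist_a, dist_b, gap=-0.5):
    """Return (position-specific score matrix, upper-level alignment score)."""
    n, m = len(dist_a), len(dist_b)
    high = [[pinned_score(dist_a, dist_b, i, j, gap) for j in range(m)]
            for i in range(n)]                     # n times m lower-level runs
    total = best_global_score(lambda i, j: high[i][j],
                              list(range(n)), list(range(m)), gap)
    return high, total

# Toy distance matrix: four aa's evenly spaced along a line, compared to itself.
toy = [[float(abs(i - j)) for j in range(4)] for i in range(4)]
high, total = double_dp(toy, toy)
print(high[0][0], high[0][1], total)
```

For two identical toy proteins the diagonal pairs (aa i with aa i) receive the highest position-specific scores, so the upper-level alignment follows the diagonal.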
[Figure: five copies of a small dynamic programming matrix; columns α α α T, rows α α β]