Protein Structure IST 444. Protein Chemistry Basics Proteins are polymers consisting of amino acids...


Protein Structure

IST 444


Protein Chemistry Basics

• Proteins are polymers consisting of amino acids linked by peptide bonds

• Each amino acid consists of:
– a central carbon atom
– an amino group (NH2)
– a carboxyl group (COOH)
– a side chain (R group)

• Differences in side chains distinguish different amino acids.


Asp Arg Val Tyr Ile His Pro Phe
D R V Y I H P F

Protein sequence: DRVYIHPF

[Figure: the peptide drawn from the H3N+ terminus to the COO- terminus, with the repeating backbone structure labeled and the eight side chains attached]
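
The step from three-letter residue names to the one-letter protein sequence can be sketched in Python. The dictionary holds the standard IUPAC codes; the function name `to_one_letter` is only an illustrative choice.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(residues):
    """Convert space-separated three-letter codes to a one-letter sequence."""
    return "".join(THREE_TO_ONE[name] for name in residues.split())

print(to_one_letter("Asp Arg Val Tyr Ile His Pro Phe"))  # DRVYIHPF
```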


Side Chains Determine Structure

Hydrophobic residues stay inside, while hydrophilic residues stay close to water.

Oppositely charged amino acids can form salt bridges.

Polar amino acids can participate in hydrogen bonding.


Steps in Obtaining Protein Structure

Target selection

Obtain, characterize protein

Determine, refine, model the structure

Deposit in repository


Domain, Fold, Motif

• A protein chain can have several domains
– A domain is a discrete portion of a protein that can fold independently and possess its own function

• The overall shape of a domain is called a fold. There are only a few thousand possible folds.

• Sequence motif: highly conserved protein subsequence

• Structure motif: highly conserved substructure


Protein Data Bank

Protein structures, solved using experimental techniques

Unique structural folds

Different structural folds

Same structural folds


Protein Structure Determination

• High-resolution structure determination
– X-ray crystallography (~1Å)
– Nuclear magnetic resonance (NMR) (~1-2.5Å)

• Low-resolution structure determination
– Cryo-EM (electron microscopy) (~10-15Å)


X-ray crystallography

• Most accurate

• An extremely pure protein sample is needed.

• The protein sample must form crystals that are relatively large without flaws. Generally the biggest problem.

• Many proteins aren’t amenable to crystallization at all (e.g., proteins that do their work inside of a cell membrane).

• ~$100K per structure


Nuclear Magnetic Resonance

• Fairly accurate

• No need for crystals

• Limited to small, soluble proteins


Protein Structure Visualization

• http://www.umass.edu/microbio/chime/top5.htm

• http://molvis.sdsc.edu/visres/
• RasMol
• Chime
• Protein Explorer
• DeepView
• Jmol (Java)


Secondary Structure Prediction

• Rules developed from PDB data
• Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in α-helices, β-sheets, and turns.
• Proline: occurs at turns, but not in α-helices.
• http://prowl.rockefeller.edu/aainfo/chou.htm
• Modern algorithms: use multiple sequence alignments and achieve a higher success rate (about 70-75%)
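
A success rate like this is commonly reported per residue: the fraction of positions whose predicted state (helix H, strand E, coil C) matches the observed one, often called Q3. A minimal sketch, assuming that definition and using made-up example strings:

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical 10-residue example: 8 of the 10 states agree.
predicted = "HHHHCCEEEC"
observed = "HHHCCCEEEE"
print(q3_accuracy(predicted, observed))  # 0.8
```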


Ramachandran Plot

A way to visualize the backbone dihedral angles of the amino acid residues in a protein structure by plotting ψ (psi) against φ (phi).
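
Each point on the plot comes from two dihedral angles. φ is defined by the atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below computes a signed dihedral angle from any four points with the usual atan2 formula; the coordinates in the example are hypothetical and are not taken from a real structure.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)   # normals of the two planes
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical points: the last atom is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Feeding the four backbone atoms of each residue through `dihedral` twice, once for φ and once for ψ, would give the coordinates of that residue on the plot.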


Chou Fasman 1974

• Measured the frequencies at which each amino acid appeared in particular types of secondary structure in a set of proteins of known structure

• Assigns each amino acid three conformational parameters based on the frequency at which it was observed in alpha helices, beta sheets and beta turns
– P(a) = propensity to form alpha helices
– P(b) = propensity to form beta sheets
– P(turn) = propensity to form beta turns

• Also assigns 4 turn parameters based on the frequency at which the amino acid was observed in the first, second, third or fourth position of a beta turn
– f(i) = probability of being in position 1
– f(i+1) = probability of being in position 2
– f(i+2) = probability of being in position 3
– f(i+3) = probability of being in position 4


A.A.           P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine         142    83       66  0.060  0.076   0.035   0.058
Arginine         98    93       95  0.070  0.106   0.099   0.085
Asparagine       67    89      156  0.161  0.083   0.191   0.091
Aspartic acid   101    54      146  0.147  0.110   0.179   0.081
Cysteine         70   119      119  0.149  0.050   0.117   0.128
Glutamic acid   151    37       74  0.056  0.060   0.077   0.064
Glutamine       111   110       98  0.074  0.098   0.037   0.098
Glycine          57    75      156  0.102  0.085   0.190   0.152
Histidine       100    87       95  0.140  0.047   0.093   0.054
Isoleucine      108   160       47  0.043  0.034   0.013   0.056
Leucine         121   130       59  0.061  0.025   0.036   0.070
Lysine          114    74      101  0.055  0.115   0.072   0.095
Methionine      145   105       60  0.068  0.082   0.014   0.055
Phenylalanine   113   138       60  0.059  0.041   0.065   0.065
Proline          57    55      152  0.102  0.301   0.034   0.068
Serine           77    75      143  0.120  0.139   0.125   0.106
Threonine        83   119       96  0.086  0.108   0.065   0.079
Tryptophan      108   137       96  0.077  0.013   0.064   0.167
Tyrosine         69   147      114  0.082  0.065   0.114   0.125
Valine          106   170       50  0.062  0.048   0.028   0.053
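
As a rough illustration of how the table is used, the sketch below averages P(a) and P(b) over a short segment and compares them. This is a deliberate simplification: the full Chou-Fasman procedure also has nucleation and extension rules that are not reproduced here, and the function names and the cutoff logic in `crude_call` are assumptions made for the example.

```python
# (P(a), P(b)) copied from the table above, keyed by one-letter code.
PROPENSITY = {
    "A": (142, 83), "R": (98, 93), "N": (67, 89), "D": (101, 54),
    "C": (70, 119), "E": (151, 37), "Q": (111, 110), "G": (57, 75),
    "H": (100, 87), "I": (108, 160), "L": (121, 130), "K": (114, 74),
    "M": (145, 105), "F": (113, 138), "P": (57, 55), "S": (77, 75),
    "T": (83, 119), "W": (108, 137), "Y": (69, 147), "V": (106, 170),
}

def mean_propensities(segment):
    """Return (mean P(a), mean P(b)) for a peptide segment."""
    pa = sum(PROPENSITY[aa][0] for aa in segment) / len(segment)
    pb = sum(PROPENSITY[aa][1] for aa in segment) / len(segment)
    return pa, pb

def crude_call(segment):
    """Simplified call: a state needs mean propensity > 100 and must beat the other."""
    pa, pb = mean_propensities(segment)
    if pa > 100 and pa > pb:
        return "helix"
    if pb > 100 and pb > pa:
        return "sheet"
    return "other"

print(crude_call("EAMLEA"))  # helix
print(crude_call("VIYVTV"))  # sheet
```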


Chou Fasman isn’t Perfect

• Accuracy = 50-85%, depending on the protein

• http://npsa-pbil.ibcp.fr/NPSA/npsa_references.html

• Software and sites for protein predictions


GOR (Garnier, Osguthorpe and Robson)

• Another commonly used algorithm; it uses a window of 17 amino acids to predict secondary structure

• Rationale: experiments show each amino acid has a significant effect on the conformation of amino acids up to 8 positions in front of or behind it.

• A collection of 25 proteins of known structure was analyzed, and the frequency at which each amino acid was found in helix, sheet, turn or coil within the 17-position window was determined
– this creates a 17 × 20 scoring matrix that is used to calculate the most likely conformation of each amino acid within the 17 a.a. window

• This window slides down the primary sequence, scoring the most likely conformation for each amino acid based on the neighboring amino acids.

• Accuracy is about 65%
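
The sliding-window idea can be sketched as follows. The scoring function here is a toy stand-in invented for the example, not the published GOR parameters; only the window size of 17 (8 residues on each side) comes from the slide.

```python
HALF_WINDOW = 8  # 8 residues on each side gives the 17-residue window

def predict_states(sequence, score, states=("H", "E", "C")):
    """For each residue, sum window contributions per state and keep the best state.

    score(state, offset, aa) is a caller-supplied lookup playing the role of
    one 17 x 20 scoring matrix per state.
    """
    prediction = []
    for i in range(len(sequence)):
        totals = {}
        for state in states:
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                j = i + offset
                if 0 <= j < len(sequence):  # the window is truncated at the ends
                    total += score(state, offset, sequence[j])
            totals[state] = total
        prediction.append(max(states, key=lambda s: totals[s]))
    return "".join(prediction)

def toy_score(state, offset, aa):
    """Hypothetical scores: nearer neighbours count more (NOT real GOR values)."""
    weight = 1.0 / (1 + abs(offset))
    if state == "H" and aa in "AELM":
        return weight
    if state == "E" and aa in "VIYF":
        return weight
    if state == "C" and aa in "GPSN":
        return weight
    return 0.0

print(predict_states("AEELLMAAGPGSVIVYVIF", toy_score))
```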


Signal for a Coiled Region

• Gapped in multiple alignments
• Small polar residues
– Ala
– Gly (very small, so flexible)
– Ser
– Thr
• Prolines are rarer in other kinds of secondary structure


How to Find Patterns Mathematically


Hidden Markov Models

• Hidden Markov Models (HMMs) are a more sophisticated form of profile analysis.

• Rather than build a table of amino acid frequencies at each position, they model the transition from one amino acid to the next.

• Pfam is built with HMMs.
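
To make the idea of hidden states and transitions concrete, here is a toy two-state HMM decoded with the Viterbi algorithm. It is only a sketch: the states, the two-letter alphabet (h = hydrophobic, p = polar) and every probability are invented for the example, and the profile HMMs used by Pfam have a richer architecture with match, insert and delete states.

```python
import math

STATES = ("helix", "coil")
START = {"helix": 0.5, "coil": 0.5}
TRANS = {"helix": {"helix": 0.9, "coil": 0.1},
         "coil": {"helix": 0.1, "coil": 0.9}}
EMIT = {"helix": {"h": 0.7, "p": 0.3},
        "coil": {"h": 0.2, "p": 0.8}}

def viterbi(observations):
    """Most probable hidden-state path for a string of observed symbols."""
    # Log-probabilities are summed to avoid numerical underflow.
    best = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: best[-1][r] + math.log(TRANS[r][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(EMIT[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]

print(viterbi("hhhhpppp"))
```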


Hidden Markov Models


Sample ProDom Output


Discovery of new Motifs

• All of the tools discussed so far rely on a database of existing domains/motifs

• How to discover new motifs
– Start with a set of related proteins
– Make a multiple alignment
– Build a pattern or profile


Depicting Structure

Beta Sheet

Helix

Loop

PDB ID: 12as


PDB New Fold Growth

• Only a few thousand unique folds in nature

• 90% of new structures deposited to PDB in the past three years have similar structural folds

New fold

Old fold


• Secondary structure is context-dependent

• Elements may be predicted to ID topology

• Generally only 50% of a structure is alpha-helix or beta-sheet.

• Beta-strands have necessarily longer range associations.


Secondary Structure

• Protein secondary structure takes one of three forms:
– Alpha helix
– Beta pleated sheet
– Turn

• Secondary structure is predicted within a small window

• Many different algorithms, not highly accurate
• Better predictions from a multiple alignment


Signals for Alpha Helices

• Amphipathic helices interact with core and solvent
– Characteristic hydrophobicity profile

• Prolines disrupt the middles of helices
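
A hydrophobicity profile can be computed as a sliding-window average over the sequence. The sketch below uses the Kyte-Doolittle hydropathy scale, which is one common choice and an assumption here since the slides do not name a scale; the window length of 7 is likewise arbitrary.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along the sequence."""
    return [sum(KD[aa] for aa in sequence[i:i + window]) / window
            for i in range(len(sequence) - window + 1)]

# The sequence from the PHD example later in these slides, first 20 residues.
for value in hydropathy_profile("MMSGAPSATQPATAETQHIA"):
    print(round(value, 2))
```

In an amphipathic helix the hydrophobic residues recur roughly every three to four positions, so a short window tends to show an oscillating profile rather than one long hydrophobic stretch.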


Signals for beta strands

• Edge strands alternate hydrophobic/hydrophilic

• Center strands all hydrophobic

• Strands are extended so few residues per core span


Antiparallel Beta Sheet Parallel Beta Sheet

Peptide chains have a directionality conferred by their N-terminus and C-terminus. β strands can be said to be directional, indicated by an arrow pointing toward the C-terminus.

Adjacent β strands can form hydrogen bonds in antiparallel, parallel, or mixed arrangements.

Antiparallel β strands alternate directions so that the N-terminus of one strand is adjacent to the C-terminus of the next. This produces the strongest inter-strand stability because it allows the inter-strand hydrogen bonds between carbonyls and amines to be planar, which is their preferred orientation.


Beta Sheet (Antiparallel)


R groups don’t form these secondary structures, but they can block their formation. The bonds forming the structures come from the amino and carboxyl groups of the amino acid residues.


Signal for a Beta Strand


Creating Beta Sheets

• Large aromatic residues (Tyr, Phe and Trp) and β-branched amino acids (Thr, Val, Ile) are favored to be found in β strands in the middle of β sheets. Interestingly, different types of residues (such as Pro) are likely to be found in the edge strands in β sheets


Protein Classification

• Family: homologous, same ancestor, high sequence identity, similar structures

• Superfamily: distantly homologous, same ancestor, sequence identity around 25%-30%, similar structures.

• Fold: only shapes are similar, no homologous relationship, low sequence identity.

• Protein classification databases: Pfam, SCOP, CATH, FSSP


Pfam

• http://www.sanger.ac.uk/Software/Pfam/

• Protein sequence classification database

• Pfam 24.0 (October 2009): 11912 families

• A multiple sequence alignment for each family, then modeled by an HMM


SCOP: Structural Classification of Proteins

http://scop.mrc-lmb.cam.ac.uk/scop/

Protein structure classification database, manually curated

110800 Domains, 38221 PDB entries

Class                          # folds  # superfamilies  # families
All alpha proteins                 284              507         871
All beta proteins                  174              354         742
Alpha and beta proteins (a/b)      147              244         803
Alpha and beta proteins (a+b)      376              552        1055
Multi-domain proteins               66               66          89
Membrane and cell surface           58              110         123
Small proteins                      90              129         219
Total                             1195             1962        3902


SCOP

• Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin.

• The SCOP database, created by manual inspection and automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

• SCOP provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.


The Problem

• Protein functions are determined by 3D structures

• ~ 30,000 protein structures in PDB (Protein Data Bank)

• Experimental determination of protein structures time-consuming and expensive

• Many protein sequences available

sequence

protein structure

function

medicine


Protein Structure Prediction

• In theory, a protein structure can be solved computationally

• A protein folds into a 3D structure that minimizes its free energy

• The problem can be formulated as a search problem for minimum energy
– the search space is enormous
– the number of local minima increases exponentially

Computationally it is an exceedingly difficult problem
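
A back-of-the-envelope count shows why the search space is described as enormous. The sketch assumes, purely for illustration, that each residue can take 3 distinct backbone conformations; the real number is not given in the slides.

```python
def conformations(n_residues, states_per_residue=3):
    """Number of chain conformations if every residue chooses independently."""
    return states_per_residue ** n_residues

for n in (10, 36, 100):
    print(n, "residues:", conformations(n), "conformations")
```

Even under this modest assumption the count grows exponentially with chain length, so exhaustive enumeration is out of reach for proteins of ordinary size.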


Who Cares?

• Long history: more than 30 years
• Listed as a “grand challenge” problem
• IBM’s Blue Gene
• Competitions: CASP (1994-2006)

• Useful for
– Drug design
– Function annotation
– Rational protein engineering
– Target selection


Observations

• Sequences determine structures

• Proteins fold into a minimum energy state.

• Structures are more conserved than sequences. Two proteins with 30% identity likely share the same fold.


What determines structures?

• Hydrogen bonds: essential in stabilizing the basic secondary structures

• Hydrophobic effects: strongest determinants of protein structures

• Van der Waals forces: stabilizing the hydrophobic cores

• Electrostatic forces: oppositely charged side chains form salt bridges


Protein Structure Prediction

• Stage 1: Backbone Prediction
– Ab initio folding
– Homology modeling
– Protein threading

• Stage 2: Loop Modeling

• Stage 3: Side-Chain Packing

• Stage 4: Structure Refinement

The picture is adapted from http://www.cs.ucdavis.edu/~koehl/ProModel/fillgap.html


State of The Art

• Ab initio folding (simulation-based method)
– 1998, Duan and Kollman: 36 residues, 1000 ns, 256 processors, 2 months
– Did not find the native structure

• Template-based (or knowledge-based) methods
– Homology modeling: sequence-sequence alignment, works if sequence identity > 25%
– Protein threading: sequence-structure alignment, can go beyond the 25% limit
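
The 25% rule of thumb can be checked with a small helper that computes percent identity from a pairwise alignment. This is a sketch under stated assumptions: identity is counted over columns where neither sequence has a gap (other definitions exist), and the aligned strings and function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)

def suggested_method(identity, threshold=25.0):
    """Apply the rule of thumb from the slide above."""
    return "homology modeling" if identity > threshold else "protein threading"

identity = percent_identity("MKV-LAAG", "MRVALA-G")  # hypothetical alignment
print(round(identity, 1), suggested_method(identity))
```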


Sample Structure Prediction

                 ....,....1....,....2....,....3....,....4....,....5....,....6
AA              |MMSGAPSATQPATAETQHIADQVRSQLEEKYNKKFPVFKAVSFKSQVVAGTNYFIKVHVG|
PHD sec         | HHHHHHHHHHHHHHHH EEEEEEEEEEEEE EEEEEEEE |
Rel sec         |999997899667599999999989997655877843368889999999233399999658|
detail:
prH sec         |000000000221289999999989998762011111000000000000000000000000|
prE sec         |000000000000000000000000000010000023578889989888536699999720|
prL sec         |999898889777600000000010001126888865311110000000363300000278|
subset: SUB sec |LLLLLLLLLLLLLHHHHHHHHHHHHHHHHLLLLL...EEEEEEEEEEE....EEEEEELL|
ACCESSIBILITY
3st: P_3 acc    |bbebbeeeeeebbeebbebbeebeeebeeeeeee eebebbebebbbbbb bbbbeb bb|
10st: PHD acc   |007006778670077007007706760777777737707007060000005000060500|
Rel acc         |103021343252044604644672424555547615444425212186671016926120|
subset: SUB acc |.......e..e..eeb.ebbeeb.e.beeeeeee.eebeb.e....bbbb...bb.b...|


“Super-secondary” Structure

• Common structural motifs
– Membrane spanning (GCG = TransMem)
– Signal peptide (GCG = SPScan)
– Coiled coil (GCG = CoilScan)
– Helix-turn-helix (GCG = HTHScan)


Transmembrane Structures


Signal Peptide


Coiled Coil


Helix Turn Helix


Fig. 9.23


Finding Information in Protein Sequences


There Are Many Meaningful Protein Signals

• Predicting protein cleavage sites

• Predicting signal peptides

• Predicting transmembrane domains


Signal Peptides

• Proteins have intrinsic signals that govern their transport and localization in the cell.

• Nobel Prize to Günter Blobel in 1999 for describing protein signaling.

• Proteins have to be transported either out of the cell, or to the different compartments - the organelles - within the cell.


Signal Peptides

• Newly synthesized proteins have an intrinsic signal that is essential for governing them to and across the membrane of the endoplasmic reticulum, one of the cell’s organelles.

• How do large proteins traverse the tightly sealed, lipid-containing, membranes surrounding the organelles?


Signal Peptides

• The signal consists of a peptide: a sequence of amino acids in a particular order that form an integral part of the protein.

• Specific amino acid sequences (topogenic signals) determine whether a protein will pass through a membrane into a particular organelle, become integrated into the membrane, or be exported out of the cell.


Signal Peptides

• Software exists that can predict the signal peptide sequences.

• The SignalP World Wide Web server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms:
– Gram-positive prokaryotes
– Gram-negative prokaryotes
– Eukaryotes


Signal Peptides

• The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

• Artificial neural networks are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning.


Patterns in Unaligned Sequences

• Sometimes sequences may share just a small common region
– common signal peptide
– new transcription factors

• MEME: San Diego Supercomputer Center

– http://www.sdsc.edu/MEME/meme/website/meme.html

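• MEME fits motif models by expectation maximization (EM); its companion tool Meta-MEME builds Hidden Markov Models from MEME motifs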


Protein Secondary Structure

• CATH (Class, Architecture, Topology, Homology) http://www.biochem.ucl.ac.uk/dbbrowser/cath/

• SCOP (Structural Classification of Proteins): hierarchical database of protein folds http://scop.mrc-lmb.cam.ac.uk/scop

• FSSP: fold classification using structure-structure alignment of proteins http://www2.ebi.ac.uk/fssp/fssp.html

• TOPS: cartoon representation of topology showing helices and strands http://tops.ebi.ac.uk/tops/


Protein Sequence Hierarchy

SUPERFAMILY

FAMILY

DOMAIN

FOLD or MOTIF

Active SITE

RESIDUE


Protein families

• Proteins can be divided into families by:
– Sequence
– Structure
– Function

• Secondary databases divide proteins into families.


Protein families

• Types of secondary databases:

• “Curated” databases: Expert judgment of each family (Prosite, PRINTS, Pfam).

• “Automated” databases: Constructed automatically (Blocks, ProDom).


Prosite

• Characterization of protein families by conserved motifs observed in multiple sequence alignments of known homologues.

• Each family is defined by a single pattern.

• Motifs:


Prosite

• Each entry includes: Pattern and sometimes also a profile.

• Pattern is a method for describing a conserved sequence (consensus, profile).

• Sample entry


Prosite Structure

• Entries are divided into two files

– Pattern file: the pattern and all Swiss-Prot matches.

– Documentation file: Details of the characterized family, a description of the biological role of the chosen motif, references.


Prosite

• Patterns are described using regular expressions.

• Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

• Regular expressions retain only conserved or significant residue information


Prosite

multiple alignment:
A A C T T G
A A G T C G
C A C T T C

consensus:
A A C T T G

pattern:
[AC]-A-[GC]-T-[TC]-[GC]

profile:
     1     2     3     4     5
A    0.66  1     0     0     .
T    0     0     0     1     .
C    0.33  0     0.66  0     .
G    0     0     0.33  0     .

• Sensitivity: consensus < pattern < profile
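
The three representations can be derived from the same multiple alignment. The following sketch does so for the three example sequences on this slide; the function names are illustrative, and the pattern lists residues in alphabetical order inside each bracket.

```python
from collections import Counter

ALIGNMENT = ["AACTTG", "AAGTCG", "CACTTC"]

def column_profile(alignment):
    """Per-column residue frequencies (one dict per alignment column)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile

def consensus(alignment):
    """Most frequent residue in each column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))

def pattern(alignment):
    """PROSITE-style pattern listing every residue seen in each column."""
    parts = []
    for col in zip(*alignment):
        seen = sorted(set(col))
        parts.append(seen[0] if len(seen) == 1 else "[" + "".join(seen) + "]")
    return "-".join(parts)

print(consensus(ALIGNMENT))   # AACTTG
print(pattern(ALIGNMENT))     # [AC]-A-[CG]-T-[CT]-[CG]
print(column_profile(ALIGNMENT)[0])
```

The consensus keeps one residue per column, the pattern keeps the set of observed residues, and the profile also keeps how often each was seen, which is why sensitivity increases in that order.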


Prosite Syntax

The standard IUPAC one-letter codes are used for the amino acids.

`x' : any amino acid.

`[]' : residues allowed at the position.

`{}' : residues forbidden at the position.

`()' : repetition of a pattern element is indicated in parentheses: x(n) or x(n,m) gives the number or range of repetitions.

`-' : separates each pattern element.

`<' : indicates an N-terminal restriction of the pattern.

`>' : indicates a C-terminal restriction of the pattern.

`.' : the period ends the pattern.


Prosite Syntax - Examples

• [AC]-x-V-x(4)-{ED}.
– [Ala or Cys]-any-Val-any-any-any-any-(any but Glu or Asp)

• <A-x-[ST](2)-x(0,1)-V
– N-terminus-Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
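
Because the syntax maps closely onto ordinary regular expressions, a pattern can be translated mechanically. The converter below is a minimal sketch covering only the elements listed on the syntax slide; `prosite_to_regex` is a hypothetical helper, not part of any Prosite tool, and it expects residues in upper case.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):          # N-terminal anchor
            regex.append("^")
            element = element[1:]
        anchor_end = element.endswith(">")   # C-terminal anchor
        if anchor_end:
            element = element[:-1]
        # Split off an optional repetition suffix "(n)" or "(n,m)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core in ("x", "X"):
            core = "."                       # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # "[...]" and single letters are already valid regex syntax.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        regex.append(core)
        if anchor_end:
            regex.append("$")
    return "".join(regex)

print(prosite_to_regex("[AC]-x-V-x(4)-{ED}."))      # [AC].V.{4}[^ED]
print(prosite_to_regex("<A-x-[ST](2)-x(0,1)-V"))    # ^A.[ST]{2}.{0,1}V
print(bool(re.search(prosite_to_regex("[AC]-x-V-x(4)-{ED}."), "GGAKVLLLLKGG")))  # True
```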


Searching with Regular Expressions

• Ideally the pattern should only detect true positives.

• Creating a regular expression that performs well in database searches is a compromise between sensitivity (few false negatives) and selectivity (few false positives).

• The fuzzier the pattern, the noisier its results, but the greater the chances of finding distant relatives.
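
The trade-off can be made concrete with two numbers per search: sensitivity (the share of true family members found) and precision (the share of hits that are true members). The counts below are hypothetical and serve only to contrast a strict pattern with a fuzzy one.

```python
def search_quality(true_pos, false_pos, false_neg):
    """Return (sensitivity, precision) for a database search."""
    sensitivity = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    return sensitivity, precision

# Hypothetical family of 100 true members in the database.
strict = search_quality(true_pos=40, false_pos=0, false_neg=60)
fuzzy = search_quality(true_pos=90, false_pos=30, false_neg=10)
print("strict pattern:", strict)  # (0.4, 1.0)
print("fuzzy pattern:", fuzzy)    # (0.9, 0.75)
```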


Prosite

Searching Prosite

Input: Protein sequence

Output: list of patterns

Input: A pattern

Output: list of sequences


BLOCKS

• Blocks are multiply aligned un-gapped segments corresponding to the most highly conserved regions of proteins


Blocks

• Blocks are ungapped alignments 5-200 aa long.

• A family is characterized by a group of blocks.


BLOCKS Construction

• Creation of BLOCKS by automatically detecting the most highly conserved regions of each protein family

• Blocks incorporates all known families from the “curated” databases.


Blocks

Searching Blocks

Input: Protein sequence

Output: list of blocks

Input: A Block

Output: list of sequences


InterPro

• Integrated resource of Protein Families

• Unifies a set of secondary databases using same terminology.

• InterPro provides text and sequence based searches.


Conclusions

• Secondary databases are useful for characterizing protein sequences.

• Numerous databases describe protein families.

• “Curated” databases do not include all known families.

• Secondary databases are useful for testing new user-defined motifs.