GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
-
Upload
biosynthesis -
Category
Documents
-
view
228 -
download
0
Transcript of GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 1/14
540 SECONDARY STRUCTURE CONSIDERATIONS [321
[ 3 2 ] G O R M e t h o d f o r P r e d i c t i n g P r o t e i n S e c o n d a r y
S t r u c t u r e f r o m A m i n o A c i d S e q u e n c eBy JEAN GARNIER, JEAN-FRANqOIS GIBRAT, a n d BARRY ROBSON
I n t r o d u c t i o n
H o w t o e x t r a c t p r o p e r t i e s f r o m t h e a m i n o a c i d s e q u e n c e o f a p r o t e i n
f o r u n d e r s t a n d i n g i ts f u n c ti o n i s o n e o f t h e m o s t t i m e l y a n d c o m p e t i ti v e
a r e a s o f t h e b i o l o g i c a l s c ie n c e s . I t is r e l a t e d t o a f u n d a m e n t a l a s p e c t o f
b i o l o g y . A l i n e ar a n d o r d e r e d s e q u e n c e o f a m i n o a c id s , c o d e d a n d c o n s e r v e d
b y a l in e a r a n d o r d e r e d s e q u e n c e o f n u c l e o t i d e b a s e s , c o d e s f o r a ll t h a t i s
chara c te r i s t i c o f l iv ing o rgan i sm s : spec i f ic and o rgan ized in te rac t ion s in
s p a c e a n d t i m e b e t w e e n p r o t e i n s , l ip i d s, n u c l e i c a ci d s, a n d t h e c e l l m e t a b o -
l i t e s . T h e s e c h a r a c t e r i s t i c s d e p e n d o n h o w a p r o t e i n c a n f o l d i n a u n i q u e
a c t i v e t h r e e - d i m e n s i o n a l s t r u c t u r e . T h i s p r o c e s s is s p o n t a n e o u s u n d e r g i v e n
e n v i r o n m e n t a l c o n d i t i o n s , e v e n t h o u g h t h e l i v i n g c e l l c a n a d d e f f i c i e n c y
a n d c o n t r o l b y u s e o f s o m e p r o t e i n c o m p l e x e s , c a l le d c h a p e r o n e s , t o c a t a ly z e
th i s p roces s .
M u c h e f f o r t h a s b e e n d e v o t e d t o t h e c a l c u l a t io n o f t h e s p a t ia l s t ru c t u r e
o f a p o l y p e p t i d e c h a i n f r o m i ts a m i n o a c id s e q u e n c e a l o n e , w it h o n l y li m i t ed
b u t n e v e r t h e l e s s e n c o u r a g i n g s u c c e s s I w h e n a p o l y p e p t i d e i s l o n g e r t h a n
1 0 - 2 0 a m i n o a c i d s . H o w e v e r , a t t e m p t s t o r e d u c e t h e p r o b l e m t o s i m p l e r
f e a t u r e s o f th e p r o t e i n f o l d s u c h a s a h e l ix , /3 s t r a n d s , a n d a p e r i o d i c o r c o il
s t r u c t u r e h a v e y i e l d e d i n t e r e s ti n g r e s u l ts ( s e e R e f s . 2 a n d 3 ). T h e s e r e s u l ts
h a v e b e e n a n a i d f o r d e s i gn i n g n e w p r o t e in s , p r e d i c t in g t h e e f f e c t o f p o i n t
m uta t ion s , ide n t i fy ing the p ro te in c las s , fo r in s tance , a l l - a o r a ll -/3 p ro te in s ,
p r e d i c t i n g e p i t o p e s , e t c . I t is h o p e d t h a t t h is i n f o r m a t i o n w i ll b e i n c r e a s i n g ly
u s e f u l t o m o l e c u l a r b i o lo g i s ts a n d p r o t e i n m o d e l e r s . U s u a l l y t h e c o m p u t i n g
t im e is s h o rt , a n d m a n y p r o g r a m s o f s e c o n d a r y s t r u c t u r e p r e d i c t io n s a r e
ava i lab le on - l ine to the b io log i s t .
T h e G O R m e t h o d is o n e o f t h e m o s t p o p u l a r o f th e s e c o n d a r y s tr u c tu r e
p r e d i c t i o n s c h e m e s . T h i s m e t h o d i s t h e o r e t i c a l l y w e l l f o u n d e d i n a s e r i e s
o f e a r l i e r p a p e r s , a n d i t h a s b e e n t h e r e a l f ir st p r e d i c t i o n o f s e c o n d a r y
s t r u c t u r e i m p l e m e n t e d a s a c o m p u t e r p r o g r a m . T h e t h r e e l e t t e r s s t a n d f o r
1 Protein Structure Prediction Issue. Proteins 23, 3 (1995).
2 j. Gar nie r and J. M. Levin, CABIOS 7, 133 (1991).
3 B. Ros t and C. Sander, Trends Biochem. Sci. 18, 120 (1993).
Copyright © 1996 by Academic Press, Inc.METHODS IN ENZYMOLOGY, VOL. 266 All rights of reproduction in any form reserved.
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 2/14
[ 3 2 ] T H E G O R M E TH O D 5 4 1
t h e f ir st l e t te r o f t h e n a m e s o f t h e a u t h o r s o f t h e o r i g i n a l p u b l i c a t i o n . 4 T h i s
m e t h o d r e m a i n s r e m a r k a b l y p o p u l a r , 5 b u t u s e r s h a v e o v e r l o o k e d t h e f a ct
t h a t i m p r o v e d v e r s i o n s o f t h e m e t h o d h a v e s i nc e b e e n p u b l is h e d , a n d a
f u l l d e s c r i p t i o n o f t h e m c a n b e f o u n d i n t h e b o o k e d i t e d b y F a s m a n . 6 T h ea d d i t i o n o f h o m o l o g o u s s e q u e n c e i n f o r m a t i o n t h r o u g h m u l t i p le a l ig n m e n t s
h a s g i v e n a s i g n if ic a n t b o o s t t o t h e a c c u r a c y o f s e c o n d a r y s t r u c t u r e p r e d i c -
t io n s . 7-9 I n t h is c h a p t e r , a f t e r p r e s e n t i n g t h e m a j o r p r i n c i p l e s u s e d b y t h e
G O R m e t h o d , w e g i ve s o m e r es u lt s o b t a i n e d w i th a n u p d a t e d v e r s i o n o f
t h i s m e t h o d .
P r i n c i p l e s o f M e t h o d
I n a se r i e s o f a r t ic l e s , R o b s o n et aL 10,11 u s e d t h e f o r m a l i s m o f i n f o r m a t i o nt h e o r y a n d B a y s i a n s t a ti st ic s t o e s t ab l is h t h e c o d e r e l a t i n g t h e a m i n o a c i d
s e q u e n c e a n d t h e s e c o n d a r y s t r u c t u r e s o f a p r o t e i n . T h i s le d l a te r t o t h e
d e v e l o p m e n t o f t h e G O R m e t h o d . 4
I n f o r m a t i o n t h e o r y w a s d e v e l o p e d i n t h e 1 9 5 0 - 1 9 6 0 s x2,13 a n d t h e G O R
m e t h o d m a d e u s e o f a n i n f o r m a t i o n f u n c t io n d e s c r i b e d b y F a n o , TM I ( S ; R ) ,
w h i c h i s d e f i n e d a s
I (S ; R ) = I o g [ P ( S ] R ) / P ( S ) ] ( 1 )
O r i g i n a l ly t hi s f o r m u l a t i o n w a s c o n c e r n e d m a i n l y w i t h e l e c t ro n i c t ra n s m i s -s i on o f in f o r m a t i o n . I n t h e p r e s e n t a p p l i c a t i o n , S is o n e o f t h e t h r e e c o n f o r -
m a t i o n s , R is o n e o f t h e 2 0 a m i n o a c i d r e s i d u e s , P ( S ] R ) is t h e c o n d i t i o n a l
p r o b a b i l i t y f o r o b s e r v i n g a c o n f o r m a t i o n S w h e n a r e s i d u e R i s p r e s e n t ,
a n d P ( S ) is t h e p r o b a b i l i t y o f o b s e r v i n g S . A c c o r d i n g t o t h e d e f in i t i o n o f
c o n d i t i o n a l p r o b a b i l i t i e s , P ( S I R ) = P (S , R ) / P ( R ) w h e r e P ( S , R ) is t h e j o i n t
p r o b a b i l i ty o f o b s e r v i n g t h e e v e n t s S a n d R a n d P ( R ) is t h e p r o b a b i l i t y o f
o b s e r v i n g a r e s i d u e R . I t is e a s y to h a v e a n e s t i m a t i o n o f I ( S ; R ) f r o m
4 j. G arnier, D . Osguthorpe, and B. Robson, .L Mo l. Biol. 120, 97 (1978).5 L. B. M . Ellis and R. P. M ilius, CABIOS 10, 341 (1994).
6 j . G am ier and B. Robson, in "Prediction of Protein Structure and the Principles of Protein
Conformation" (G. D. Fasman, ed.), Chap. 10, p. 417. Plenum Press, New York, 1989.
7 j. M. L evin, S. Pascarella, P. Argos, and J. G arnier, Protein E ng. 6, 849 (1993).
8 B. R ost and C. Sander, J. MoL Biol. 232 , 584 (1993).
9 V. di Francesco, P. J. M unson, and J. Ga rnier, 28th Annual HawaiiInternation al Conferenceon System Sciences (L. Hunte r, ed.), p. 285. IE E E Com puter Society Press, Los Alamos, 1995.
10 B. R obs on a nd R. H. Pain, J. Mo l. Biol. 58, 237 (1971).11 B. Ro bso n, Biochem. J. 141, 853 (1974).
12 C. E. Shannon and W. Weaver, " Th e M athematical T heo ry of Comm unication." Univ. ofIllinois Press, Urbana, Illinois, 1949.
13 L. Brillouin, "Science and Information Th eo ry." Aca demic Press, New York, 1956.14 R. Fano, "Transmission of Infor m ation ." Wiley, New Y ork, 1961.
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 3/14
54 2 SECONDARY STRUCTURE CONSIDERATIONS [321
a database of known sequences and corresponding observed secondary
structures since P ( S , R ) = f s , R /N , P ( R ) = f R / N and P ( S ) = f s / N with N
being the total number of amino acids in the database, fS, R the number of
residues R observed in the conformation S in the same database, fR thetotal number of residues R, and f s the total number of residues observed
in the conformation S in the same database. Then
I (S ; R ) = l o g [ ( f S , R / f R ) / ( f s / N ) ] (2)
This quantity can be obtained easily from the database provided it is
large enough.
A more general treatment requires corrections for levels of data (see
Robson aa and below). Robson ~1 introduced the information difference,
I ( A S ; R ) = I ( S ; R ) - I ( n - S ; R ) = I o g ( fs , R /f ,. S ,R ) + l o g ( f , - s / f s ) (3)
where n - S stands for the conformations other than S (non-S); for instance,
if S is ot helix (H), n - S will be /3 strand (E) and coil (C) for a three-
state prediction. It gives the extra information for S on the two others. It
represents a kind of normalization where the total number of amino acids,
N, and residues, R, in the database have disappeared from the equation.
In effect the positive hypothesis (S; R) and the complementary negative
hypothesis ( n - S ; R ) are treated in concert. This quantity also corresponds toone-residue information or single-residue information or self-information.
Calculated for the three conformations, the highest value of Eq. (3) for
one of the conformations S will be the predicted conformation and will be
the propensity for that residue to be in that conformation, usually expressed
in centinat units when natural logarithms are used. This underlines one
of the differences with the Chou-Fasman propensities which correspond
approximately to the mantissa of the log of Eq. (2).
Equations (1) to (3) can be extended to a local sequence along the
polypeptide chain of n consecutive residues R:
I ( A S ] ; R 1 . . . . , R n ) = log[P(Sj, R1, .. ., R , , ) / P ( n - S ] , R1 .. .. , R,)]
+ l o g [ P ( n - S ) / P ( S ) ](4)
where P ( S j , R 1 . . . . , R , ) is the joint probability of the conformation S at
position j in the sequence and the local sequence R1, .. ., R,. One may
remark that
P(Sj, R1 . . . . . R . ) + P ( n - S j , R , . . . . . R.) = 1 (5 )
and that
P ( S j , R 1 . . . . . R , , ) / P ( n - S j , R 1 . . . . . R , ,) = P ( S ) / P ( n - S ) e ' ( A s ? g l ' ' g . ) (6)
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 4/14
[321 THE G O R METHOD 543
I n p r e d i c ti n g a r e s id u e t o b e i n o n e c o n f o r m a t i o n , o n e c a n p r e d i c t e i t h e r
t h e o n e h a v i n g t h e h i g h e s t v a lu e o f t h e i n f o r m a t i o n w i t h E q . ( 4) o r th e
h i g h e s t p r o b a b i l i t y v a l u e t a k e n f r o m E q s . ( 5) a n d ( 6) . P r o b a b i l i t y v a l u e s
h a v e b e e n u s e d f o r t h e p r e d i c t i o n o f R a m a c h a n d r a n z o n e s. 15 T h e y a r em o r e p r e c i s e t h a n c o n f i d e n c e s c al es d e v e l o p e d f o r o t h e r m e t h o d s 8,16 a n d
u n d e r l i n e t h e f a c t t h a t t h e d e c i s i o n t o p r e d i c t t h e c o n f o r m a t i o n o f th e
h i g h e s t p r o b a b i l i t y le a v e s t h e p o s s ib i li ty t h a t t h e o t h e r c o n f o r m a t i o n s h a v e
a d e f i n i t e p r o b a b i l i t y t o o c c u r w h i c h c a n b e c l o s e t o t h e h i g h e s t , a n d t h u s
s h o u l d n o t n e c e s s a ri l y b e r u l e d o u t .
O n e f a ce s a f u n d a m e n t a l p r o b l e m w h e n c a lc u la t i n g i n f o r m a t i o n v al ue s.
O n e n e e d s t o e s t im a t e t e r m s s u c h a s P ( S j , R 1 . . . . . R n ) i n v o l v i n g N r e s i d u e s .
I t is i m p o s s i b l e t o e v a l u a t e s u c h t e r m s d i r e c tl y f r o m t h e d a t a b a s e , s o o n e
m u s t r e s o r t t o v a ri o u s a p p r o x i m a t i o n s . T h e d i f f e r e n t v e r s i o n s o f t h e G O Rm e t h o d c o r r e s p o n d t o v a ri o u s t y p e s o f a p p r o x i m a t i o n s w e h a v e t r ie d i n a n
e f f o r t t o i m p r o v e t h e a c c u r a c y o f t h e m e t h o d .
A p p r o x i m a t i o n s I n vo l ve d i n G O R M e t h o d
T h e f irs t G O R v e r si o n , 4 n a m e d G O R I , a d d e d t o t h e s i ng l e- re s id u e
i n f o r m a t i o n t h e s o - c a ll e d d i r e c ti o n a l i n f o r m a t i o n o f e ig h t r es i d u e s o n e a c h
s id e o f t h e r e s i d u e t o b e p r e d i c t e d i n th e s e q u e n c e . T h i s l i m i t o f e ig h t w a s
n o t a r b i t r a r y b u t w a s b a s e d o n s t u d i es o f i n f o r m a t i o n c o n t e n t a t i n c re a s in gs e p a r a t i o n s . T o o b t a i n t h e i n f o r m a t i o n m e a s u r e , o n e s t a r t s b y c a l c u l a t i n g
f r o m t h e d a t a b a s e t h e f r e q u e n c y o f e a c h o f th e 2 0 a m i n o a ci ds r e si d u e s a t
d i f f e r e n t p o s i ti o n s , u p t o e i g h t r e s i d u e s o n t h e N - t e r m i n a l a n d C - t e r m i n a l
s id e , w h e n t h e c e n t r a l r e s i d u e i s o b s e r v e d i n a g i v e n c o n f o r m a t i o n b u t
i n d e p e n d e n t l y o f th e n a t u r e o f t h a t r e s i d u e . I n f a ct , i n th i s a p p r o x i m a t i o n
o n e a s s u m e s t h a t t h e r e i s n o c o r r e l a t i o n b e t w e e n r e s id u e s o c c u r r i n g a t
d i f f e r e n t p o s it i o n s i n t h e w i n d o w o f 1 7 r e s i d u e s s o d e f in e d . T h e n
I ( AS j ; R , . . . . . R n) ~ I ( AS j ; R j ) + Xm.m ~0 I ( AS j ; R j+ m ) ( 7 )
w h e r e j s t a n d s f o r a n y p o s i t i o n j i n t h e a m i n o a c id s e q u e n c e o f w h i c h t h e
c o n f o r m a t i o n o u g h t t o b e p r e d i c te d a n d m i s b e t w e e n - 8 ( N - t e rm i n a l s id e )
a n d + 8 ( C - t e r m i n a l s id e ). T h e s a m e v e rs i o n , G O R I I, w a s u p d a t e d w i t h a
n e w d a t a b a s e i n 1 9 89 . 6 B o t h v e r s i o n s p r e d i c t e d f o u r c o n f o r m a t i o n s , H ,
E , C , a n d T , w i th T f o r t u rn s . S u b s e q u e n t v e r s i o n s p r e d i c t e d o n l y t h r e e
c o n f o r m a t i o n s H , E , a n d C , a l t h o u g h t h e m e t h o d h a s n o i n t ri n si c l i m i t a ti o n
in t h e n u m b e r a n d n a t u r e o f c o n f o r m a t i o n s . T h e d i f f e re n t t u r n t y p e s a n d
t h e r e l a t i v e d i ff ic u l ty o f d i s ti n g u is h i n g b e t w e e n t h e m u s i ng t h e D S S P p r o -
g r a m 17 l e d u s t o l im i t t h e p r e d i c t i o n f o r t h e t i m e b e i n g t o t h r e e c o n f o r m a -
15 j. F. G ibrat, B. R obso n, and J. Ga rnier, Biochemistry30, 1578 (1991).16 V. B iou , J. F. Gib rat, J. M . Lev in, B. Rob son, and J. Ga rnier, Protein Eng. 2, 185 (1988).17 W. Ka bsch and C. Sand er, Biopolymers 22, 2577 (1983).
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 5/14
5 4 4 S E C ONDAR Y S T R UC T UR E C ONS IDE R AT IONS I32]
t io n s s o t h a t h e l i c e s a n d s t r an d s r e p r e s e n t t h e m a j o r a r c h i t e c t u r a l s tr u c t u r e s
o f th e c o n s e r v e d c o r e o f h o m o l o g o u s p r ot ei n s•
T h e n e x t l e v el o f a p p r o x i m a t i o n , i n t r o d u c e d in t h e G O R I I I v e rs i o n , 18
c o n s i d e r e d t h e c o r r e l a t i o n b e t w e e n t h e t y p e o f r e si d u es i n t h e w i n d o w a n dt h e t y p e o f t h e r e s i d u e t o b e p r e d i c te d • T h i s v e r s i o n u s e s t h e s o - c a l le d
p a i r i n f o r m a t i o n :
I ( A S j ; R 1 . . . . , R n ) ~ I ( A S j ; R j ) + ~-~m,rn#O ( A S j ; n j + m l R j ) (8 )
T h e s e c o n d t e r m o n t h e f i g h t - h a n d s i d e o f E q . ( 8) is a c o n d i t i o n a l in f o r m a -
t i o n . 15 I t i n v o l v e s t h e c a l c u l a t i o n f r o m t h e d a t a b a s e o f p a i r f r e q u e n c i e s o f
r e s i d u e s R i a n d R j + m w i t h R j h a v i n g t h e o b s e r v e d c o n f o r m a t i o n s S j a n d n - S j
a t p o s i t i o n j , w i t h a f r e q u e n c y o f fs j R +m R a n d fn-s R +m R , r e s p e c t i v e l y , b u t• . . ' ] , ] . J' J ' .1 .
t h e c o n f o r m a t i o n o f r e si d u e R j+,~ is n o t t a k e n i n t o c o n s i d e r a t i o n . W e h a v e
I ( A S j ; R j + m I R j ) = Iog( fs ,,g ,+m ,R,/ f, .s ,,R,+= ,R ) + Io g ( f , - s j , R / fS , ,R ) (9 )
W h e n t h i s a p p r o x i m a t i o n w a s u s e d , t h e d a t a b a s e ( a t t h a t ti m e c o n t a i n i n g
r o u g h l y 1 2 , 0 0 0 r e s i d u e s ) w a s b a r e l y l a r g e e n o u g h t o a l l o w a n e a s y c a l c u l a -
t i o n o f t e r m s f o r E q . ( 9) . E a c h o f t h e s e t e r m s i n v o l v e s t w o a m i n o a c id s
a n d a s e c o n d a r y s t r u c t u r e c o n f o r m a t i o n , s o t h e r e a r e 1 2 0 0 e n t r i e s i n t h e
t ab l e. T h e a v e r a g e n u m b e r o f o b s e rv a t i o n s p e r e n t r y w a s t h e r e f o r e 1 0.
H o w e v e r , t h e a m i n o a c i ds a r e n o t a ll e q u i p r o b a b l e ; s o m e l i k e T r p o r M e t
a r e r a r e r , a n d t h e n u m b e r o f o b s e r v a t i o n s f o r e n t r i e s in v o l v i n g s u c h a m i n oa c id s w e r e l e s s t h a n 10. A s a c o n s e q u e n c e , t h e p r o b a b i l i ti e s e s t i m a t e d u s in g
t h e r a t i o o f s u c h s p a rs e f r e q u e n c i e s w e r e u n r e l i a b l e a n d w e r e r e s p o n s i b l e
f o r a d e c r e a s e o f t h e p r e d i c t i o n a c c u ra c y . T o c i r c u m v e n t t hi s p r o b l e m , w e
i n t r o d u c e d s o -c a ll ed d u m m y f r eq u e n c i e s . R e a d e r s i n t e r e s te d i n t h e p r e c is e
d e f i n i ti o n o f t h e s e d u m m y f r e q u e n c i e s a r e r e f e r r e d t o R e f s . 1 5 a n d 17.
D u m m y f r e q u e n c i e s a m o u n t t o th e f o l lo w i n g c o n s id e r a ti o n s . L e t u s
a s s u m e t h a t w e o b s e r v e in th e d a t a b a s e t w o o c c u r r e n c e s o f a M e t a t p o s i ti o n
j - 1 w h e n t h e r e s i d u e a t j i s a T r p i n h e l ic a l c o n f o r m a t i o n . W e c a n c a lc u l a te
e as il y t h e e x p e c t e d n u m b e r w e w o u l d o b s e r v e i f t h e t w o e v e n ts w e r eu n c o r r e l a t e d , n a m e l y , th i s is t h e f r e q u e n c y o f M e t a t p o s i t io n j - 1
m u l t i p l i e d b y t h e f r e q u e n c y o f T r p a t p o s i ti o n j h a v i n g a h e li c a l c o n f o r m a -
t i o n d i v i d e d b y t h e t o t a l n u m b e r o f r e si d u e s. T h i s n u m b e r i s r e l a ti v e l y
r e l i a b l e s i n ce i t is c a l c u l a t e d u s i n g f r e q u e n c i e s t h a t i n v o l v e o n l y o n e r e s i d u e
( w h i c h t h u s a r e g r e a t e r t h a n f r e q u e n c i e s f o r p a i r s b y a f a c t o r o f 2 0, o n
a v e r ag e ) . N o w w e c a n a sk t h e q u e s t io n , H o w m u c h d o w e t r u st t h e f r e q u e n -
c ie s f o r p a ir s w e o b s e r v e d i n t h e d a t a b a s e ? I f w e t r u s t t h e m 1 0 0% , w e j u s t
u s e t h e s e f r e q u e n c i e s i n t h e c a l c u l a ti o n s o f i n f o r m a t i o n v a l u e s. I f w e m i s-
t r u s t t h e m 1 0 0 % , w e c a n a l w a y s u s e t h e n u m b e r s c a l c u l a t e d a s s u m i n g t h a t
18 . F. G ibrat, J. Garn ier, and B. Rob son, J. M o l . B i o l. 198 , 425 (1987).
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 6/14
[32] THE GOR METHOD 545
the events are independent, but then we are back to the approximation of
Eq. (7). In fact, empirically, to improve the accuracy prediction we need
to consider an intermediary stage when we bias the observed frequencies
toward the calculated (uncorrelated) ones by a given amount.The database available (see Table I) now contains about 63,000 residues,
so the average number of observations for pairs per entry is 50. This is
large enough for us to compute terms involving pairs of residues without
the need for introducing dummy frequencies. Note, however, that this new
database does not allow the calculation of terms involving triplets of amino
acids. We thus decided to include more pairs in our description of the
window of 17 residues. Instead of considering only the 16 pairs Rj+m, Rj ,with m varying from - 8 to +8 and m # 0, that is, the pair formed by each
residue in the window and the central one, we consider all the possiblepairs in the window [there are (17 × 16)/2 such pairs]. We thus have
used for the results presented below the following approximation (GOR
IV version):
P(Sj , LocS eq) 2 +8 P(S j , Rj+m,Rj+~)log p - ~ ~ ) = ]-~ m~8, log P(n -Sj , Rj+m,Rj+.)
( l O )
15 ~-~ P (Sj, Rj+m)log
17 m=_8' P ( n - S j , Rj+m)
where LocSeq stands for the local sequence R1 . . . . . Rn of 17 residues
around the residue to be predicted. The values of P(Sj , LocSeq) from Eq.
(10) for the three conformations are then used directly for the predictions
instead of using probabilities calculated from Eqs. (5) and (6) with the
information value calculated from Eq. (9).
Database and Results
We used a database of 267 protein structures having a resolution betterthan 2.5 .A with an R factor less than 25% and whose length is greater than
50 residues (see Table I). There is no pair of proteins with an identity
above 30%. The prediction is carried out using a jackknife: the protein to
be predicted is removed from the database, the parameters are estimated
using the 266 remaining proteins, and the prediction is done using these
parameters. As mentioned above, this database is large enough that we do
not need to use dummy frequencies anymore. Moreover, we do not use
decision constants to adjust the predicted number of secondary structures
to the observed number in the database. In fact, there is no optimizationof any sort; we just estimate the probabilities according to Eq. (10) from
the frequencies observed in the database.
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 7/14
T A B L E IDATABA SE PROTEINS a
l a a j .x l a a k . x l a a p . a l a b a . x l a b k . x l a b m . a l a d d . x
l a d s . x l a l k . a l a o z . a l a p a . x l a p m . e l a r b . x l a t r . x
l a v h . a l a y h . x l b a b . a lb b h . a l b b p . a l b e t. x l b g e . a
l b l l. e l b m d . a l b o v . a l b p b . x l b r s . d l b t c . x l c 2 r .al c a j . x l c a u . a l c a u . b l c d e . x l c d t . a l c e w . i lc g t . x
l c h m . a l c m b . a l c o b . a lc o l . a l c p c . a l c p c . b l c p t .x
l c r l . x l c s e . i l c t f . x l c t m . x l c u s . x l d d t . x l d h r . x
l d o g . x ld s b . a l e a f . x l e c o . x l e d e . x l e n d . x l e p a . a
l f b a . a l f d d . x lf h a . x l f i a .a l f k b . x lf n a . x l f n r . x
l f x i . a l g a l . x l g d l . o l g d h . a l g k y . x l g l t .x l g m f . a
l g o f .x l g o x .x l g p l . a l g p b . x l g p r . x l g s r . a l h b q . x
l h d x . a l h i v . a l h l b . x l h l e . a l h m y . x l h o e . x l h p l . a
l h r h . a l h s l . a l h u w . x l i f c .x l i p d . x l i s u . a l i t h . a
1129.x l l e4 .x l l en . a l l ga . a l l i s . x l l l a . x l im b .3
l l ts . a l l ts . d l m d c . x l m g n . x l m i n . a l m i n . b l m j c . xl m p p . x l m u p . x l n a r .x l n b a . a l n d k . x l n o a . x l n s b . a
l n x b . x l o f v . x l o l b . a l o m f . x lo m p . x l o n c . x l o s a . x
l p d a . x l p f k . a l p g b . x l p g d . x l p h h . x l p h p . x l p ii . x
l p l f. a l p o c . x l p o h . x l p o x . a l p p a . x l p p f . e l p p f . i
l p p n . x l p r c . c l p r c . h lp r c. 1 l p r c . m l p t s . a l p y a . a
l p y a . b l p y d . a l r c b . x l r e c . x l r i b . a l r n d . x l r o p . a
l r v e . a l s 0 1 .x l s a c . a l s b p . x l s e s . a l s g t . x l s h a . a
l s h f . a l s i m . x l s l t . b l s n c . x l s p a . x l s t f .i l t b e . a
l t c a . x l t i e .x l t m l . x l t n d . a l t p l . a l t r b . x l t r k . a
l t r o . a l t t b . a l u t g .x l v a a . a l v a a . b l v m o . a l w h t . a
l w h t . b l w s y . a l w s y . b l y h b . x l z a a . c 2 5 6 b . a 2 a a i .b
2aza . a 2bo p . a 2ccy . a 2cdv .x 2chs . a 2cm d .x 2cp4 .x
2cpl .x 2 cro .x 2ctc .x 2cts .x 2cyp .x 2dn j .a 2er7 .e
2hbg .x 2h hm .a 2h ip . a 2hp d . a 2 ih l . x 21h2 .x 21iv .x
2 m h r . x 2 m n r . x 2 m s b . a 2 m t a . c 2 m t a . h 2 ra ta .1 2 p f l . x
2p i a .x 2po l . a 2 po r .x 2 reb .x 2 rn2 .x 2 rs l . a 2 sa r . a
2 sas .x 2 scp . a 2 sga .x 2 sn3 .x 2 spc . a 2 tg i. x 2 tm d .a
2 t p r . a 2 t s c .a 3 a a h . a 3 a a h . b 3 a d k .x 3 b5 c .x 3 c d 4 .x
3chy .x 3c l a .x 3cox .x 3d f r . x 3eca . a 3g ap . a 3gbp .x
3 ink . c 3 rub.1 3 rub . s 3 sdh . a 3 tg l .x 451c .x 4b lm .a
4enl .x 4fgf .x 4gcr .x 4 t s l . a 4x is .x 5fb p .a 5p21.x5 t im.a 6 fab .h 6 fab.1 6 t aa .x 8abp .x 8acn .x 8a t c . a
8a t c .b 8ca t . a 8 i l b .x 8 rxn . a 8 t l n . e 91dt .a 9 rn t . x
9w ga .a
a T h e d a t a b a s e w a s p r e p a r e d b y J . M . L e v i n a n d c h e c k e d f o r h o m o l o g o u s s e q u e n c e s w i t h
t h e h e l p o f V . D i F r a n c e s c o . T h i s d a t a b a s e h a s b e e n m o d i fi e d t o r e s t o r e t h e t o t a l l e n g t h
o f t h e s e q u e n c e s a s d e fi n e d in t h e S E Q R E S f ie ld o f t h e P r o t e i n D a t a B a n k ( P D B ) f il e
( t h e D S S P p r o g r a m o m i t s r e s id u e s w h o s e c o o r d i n a t e s a r e m i s si n g i n t h e P D B file , a n d
t h u s i f t h i s o c c u r s i n t h e m i d d l e o f t h e p o l y p e p t i d e c h a i n i t i s s p l i t i n t o t w o o r m o r e
c h a i ns ) . R e s id u e s h a v i n g n o c o o r d i n a t e s w e r e a s s i g n e d t h e c o n f o r m a t i o n X a n d w e r e
n o t t a k e n i n t o a c c o u n t f o r t h e p r e d i c t i o n a c c u r a c y a l t h o u g h t h e p r e d i c t i o n w a s d o n ew i t h t h e w h o l e s e q u e n c e l e n g th . T h e P D B c o d e i s f o ll o w e d b y t h e c h a i n n a m e a , b , c,
d , h ( h e a v y ) , 1 ( l ig h t ) , x ( o n e c h a i n o n l y ) , e ( e n z y m e ) , o r i ( i n h i b i t o r ) .
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 8/14
132] THE G O R MEXHOO 547
TABLE II
GLOBAL RESULTS FOR DATABASE PREDICTION
O b s e r v e d
H E C T o t a l
Pred ic t ed
H
E
C
T o t a l
Qprd a
Gobs b
Q3 c = 64.4%
14,460 3094 4790 22,344
1124 4965 2089 8178
6002 5546 21,496 33,044
21,586 13,605 28,375 63,566
64.7 60.7 65.1
67.0 36.5 75.8
a N u m b e r o f c o r r e ct ly p r e d i c te d r e s i d u e s / n u m b e r o f p r e d i c te d r e s -
idues .
b N u m b e r o f c o r re c tl y p r e d i c t e d r e s i d u e s / n u m b e r o f o b s e r v e d r es -
i d u e s
c T o t a l num ber o f co rrec t ly pred ic t ed re s idue s / t o t a l num ber o f re s-
idues .
H o w e v e r , t h is so m e t i m e s l e a d s to p r e d i c t io n s t h a t a r e n o t p h y s i c a ll y
m e a n i n g f u l , f o r e x a m p l e , h e l ic e s h a v i n g o n l y tw o r e s id u e s , o r m i x tu r e s o fs t ra n d ( E ) a n d h e l i x ( H ) r e s id u e s . S e v e r a l a t te m p t s h a v e b e e n m a d e t o
s o l v e t h a t p r o b l e m ( s ee R o s t a n d S a n d e r 8 a n d Z i m m e r m a n n 1 9 ). H e r e w e
a d d e d a si m p l e f il te r a ft e r th e p r e d i c t i o n w h i c h r e q u i r e s h e l i c e s t o b e a t
l e a s t f o u r r e s i d u e s a n d s t r a n d s t o b e a t l ea s t t w o r e s i d u e s . F o r i n s t a n c e , l e t
u s a s s u m e t h a t w e p r e d i c t t w o i s o la t e d H r e s i d u e s. W e t h e n l o o k f o r a ll
t h e p o s s i b i l it i e s o f e x t e n s i o n o f t h e t w o H r e s i d u e s ( i n t h i s c a s e , t h e r e a r e
t h r e e p o ss ib i li ti e s: a d d i n g t w o H ' s b e f o r e , a d d i n g o n e H b e f o r e a n d o n e H
a f te r , a n d a d d i n g t w o H ' s a f t er t h e p r e d i c t e d H ' s ) . W e t h e n c a l c u l a t e t h e
p r o d u c t o f t h e p r o b a b i l i t ie s o f t h e d i f fe r e n t s e c o n d a r y s t r u c t u r e s f o r t h et h r e e s e g m e n t s s o d e f in e d . T h e s e g m e n t t h a t i s t h e m o s t p r o b a b l e i s s e l e c te d
l e a d i n g e i t h e r t o a n e x t e n s i o n o f t h e h e l i x t o fo u r r e s i d u e s o r t o th e s u p p r e s -
s i o n o f th e t w o i s o l a t e d r e s i d u e s . A l t h o u g h t h i s f il te r a f fe c t s t h e p r e d i c t i o n
o f p a r ti c u la r p r o t e i n s , o n a v e r a g e f o r t h e w h o l e d a t a b a s e i t h a s n o e f f e c t
o n t h e p r e d i c t io n a c cu r a cy ; it n e i t h e r i m p r o v e s n o r d e c r e a s e s t h e p e r c e n t a g e
o f c o r r e c t l y p r e d i c t e d r e s i d u e s .
T h e g l o b a l r e su l ts f o r t h e d a t a b a s e a r e s h o w n i n T a b l e I I. T h e p e r c e n t a g e
O f c o r r e c t l y p r e d i c t e d r e s i d u e s i s 6 4 .4 % , a n d a n i n d i v i d u a l p r e d i c t i o n o u t p u t
o f t h e p r o g r a m i s g i v e n i n T a b l e I I I w i t h a n e x tr a c o l u m n t o c o m p a r e w i t h
a9 K. Z i m m e r m a n n , Pr o t e i n Eng. 7, 1197 (1994).
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 9/14
548 SECONDARY STRUCTURE CONSIDERATIONS [321
TABLE I l l
PREDICTION OF EGLINa
Seq Obs Prd pH pE pC
T X C 0.00 0.00 1.00
E X C 0.00 0.02 0.98
F X C 0.00 0.08 0.92
G X C 0.01 0.13 0.87
S X C 0.02 0.14 0.84
E X C 0.04 0.24 0.72
L X C 0.09 0.33 0.59
K C C 0.15 0.24 0.61
S C C 0.19 0.16 0.65
F C C 0.12 0.12 0.77
P C C 0.29 0.12 0.59E C C 0.35 0.26 0.39
V C C 0.37 0.30 0.34
V C C 0.35 0.24 0.41
G C C 0.25 0.20 0.55
K C C 0.26 0.27 0.47
T C C 0.24 0.34 0.42
V H C 0.37 0.19 0.44
D H H 0.54 0.08 0.38
Q H H 0.58 0.11 0.30
A H H 0.61 0.11 0.28
R H H 0.56 0.19 0.25
E H H 0.50 0.27 0.24
Y H H 0.46 0.35 0.19
F H H 0.34 0.44 0.22
T H H 0.29 0.38 0.32
L H C 0.20 0.35 0.44
H H C 0.09 0.22 0.69
Y C C 0.03 0.11 0.86
P C C 0.05 0.06 0.89
Q C C 0.09 0.15 0.76
Y C C 0.08 0.29 0.63N E C 0.07 0.31 0.61
V E E 0.06 0.65 0.30
Y E E 0.04 0.75 0.21
F E E 0.02 0.76 0.22
L E C 0.01 0.30 0.69
P E C 0.02 0.09 0.88
E C C 0.02 0.03 0.95
G C C 0.01 0.01 0.98
S C C 0.01 0.02 0.97
P C C 0.09 0.12 0.79
V E C 0.18 0.33 0.49T E H 0.23 0.51 0.26
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 10/14
[32 ] THE G O R METHOD 549
T A B L E I I I (continued)
S e q O b s P r d p H p E p C
L C H 0 .36 0 .46 0 .18D C H 0 . 3 8 0 . 3 3 0 . 2 9
L C H 0 .51 0 .17 0 .32
R C C 0 .39 0 .16 0 .46
Y C C 0 .33 0 .22 0 .46
N C C 0 .24 0 .18 0 .57
R E C 0 .20 0 .31 0 .49
V E E 0 .17 0 .57 0 .26
R E E 0 .12 0 .71 0 .17
V E E 0 .07 0 .80 0 .13
F E E 0.05 0.71 0.25
Y E E 0 .03 0 .49 0 .48N E C 0 .01 0 .12 0 .87
P C C 0 .01 0 .03 0 .96
G C C 0 .01 0 .03 0 .96
T C C 0 .02 0 .10 0 .88
N C C 0 .03 0 .29 0 .68
V E E 0 .04 0 .54 0 .41
V E E 0 .05 0 .68 0 .27
N C E 0 .02 0 .71 0 .27
H C E 0 .01 0 .58 0 .41
V C C 0 .01 0 .22 0 .78
P C C 0.01 0.09 0.91
H E C 0 .00 0 .02 0 .98
V E C 0 .00 0 .00 1 .00
G C C 0 .00 0 .00 1 .00
a A m i n o a c i d s e q u e n c e ( S e q ) o f e g l in , a s u b t il i s in i n h i b i t o r ( l c s e ) , w i t h
o b s e r v e d c o n f o r m a t i o n s ( O b s ) , p r e d i c t e d c o n f o r m a t i o n s w i t h f i l t e r ( P r d ) ,
a n d t h e p r o b a b i l i ty v a lu e s p H , p E , a n d p C f o r t h e p r e d i c t e d ~ h e l ix ( H ) ,
/3 s t r a n d s ( E ) , a n d c o i l ( C ) , re s p e c t iv e l y . T h e c o n f o r m a t i o n X c o r r e s p o n d s
t o r e s i d u e s f o r w h i c h t h e c r y s t a l l o g r a p h e r g a v e n o c o o r d i n a t e s . F o r s o m e
r e s i d u e s , f o r i n s t a n c e , V - 1 3 , a l t h o u g h t h e p r o b a b i l i t y f o r H i s h i g h e r , t h e
f i lt e r a s s i g n e d a c o i l ( s e e t e x t ) . T h e a c c u r a c y o f p r e d i c t i o n f o r t h e t h r e e
c o n f o r m a t i o n s ( Q 3 ) i s 7 3 % .
the obse r ve d c onfor m at ions . The r e su l t whe n c ons ide r ing the pr e d ic t ion
of individual prote ins i s 64 .7% with a s tandard deviat ion of 9 .3%. Figure
la show s the num be r o f pr ote ins as a func t ion o f the pe r c e ntage o f c orr e c t ly
pred ic ted res idu es . Figure l b i s just a check that th is d is tr ibution does not
depart s igni f icant ly from a Gauss ian d is tr ibution .
A s sugge s te d by L e v in , 2° F ig . 2 show s a s c a tte r p lo t o f th e p e r c e ntage
of c or r e c t ly pr e d ic te d r e s idue s as a func t ion o f the s i z e o f the pr ote in
20 j . M . L e v i n , t o b e p u b l i s h e d .
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 11/14
"5
=EZ
o
40 50 60 70 80 90 100
Percentage of correctly predicted residues
b
j J
o~ j J "
• . ' J
_ - y
r,
j °~--
j . : : . .
-3 -2 -1 0 1 2 3
QuantUes of Standard Normar
FIG. 1. (a) Histogram of secondary structure prediction accuracies (Q3) for all the proteins
of the database. (b) Normal quant i le-quant i le plot of the results showing the agreement of
the distribution with a normal distribution. The individual values of Q3 for each protein of
the database are sorted according to the quantiles of a normal distribution with a mean of
64.4% and a standard deviation of 9.3%. The dashed line is the least-squares f it of the points.
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 12/14
[321 Tr~z GOR METHOD 551
. { :
oE L
00(3 0
t o
o
° °
• " . . . " i f . ." I , • ~ °
~ '• ~ I° •# oo"t l • • # l • • .
" ~ j I * . • , ,
o • j *
• . " . - . . . • ,
• . q , • • °° . . . ' ~ . . ~ $ ,
° " * * . d ' B • . ~ •
" ," " • • s , ' , . . , , . , t ,o • ° ° ~ w . . • •
• . • . . # • • ° •• . , . * . • °
40 50 60 70 80 90
Percen tage of cor rec t l y pred ic ted res idues
F IG . 2 . D i s t r i b u t i o n o f t h e s e q u e n c e l e n g t h ( n u m b e r o f a m i n o a c i d ) o f t h e d a t a b a s e p r o t e i n s
a s a f u n c t i o n o f t h e n u m b e r o f co r r e c t ly p r e d i c t e d r e s i d u e s .
(number of residues). There seems to be no apparent effect of the length
of the protein on the accuracy of prediction, except that the longer the pro-
tein, the closer the accuracy comes to the average value. For proteins of
less than 200 residues the accuracy can lie anywhere between 40 and 90%.
A consequence of this observation is that no protein is easier to predict
than any other, but rather there are segments of the sequence easier to pre-
dict; thus the shorter is the protein, the more likely its accuracy of predictionwill be different from the mean, and inversely for the longest ones.
Table IV shows results for the prediction of secondary structure seg-
ments, namely, helices and strands. The percentage of correctly predicted
segments is given according to the minimum percentage of overlap that is
allowed between the predicted and observed segments. For instance, in
Table IV, for the row 75% a segment is considered as being correctly
predicted if the predicted segment overlaps with at least 75% of the observed
segment (counted as the number of residues).
Conclusion
Through the successive incorporation of observed frequencies of single,
then pairs of residues on a local sequence of 17 residues, the accuracy of
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 13/14
552 SECONDARY STRUCTURE CONSIDERATIONS [3 9, ]
TABLE IV
RESULTS FOR SEOMENTS: HELICES AND STRANDS
Average
Number of segments length
H E Total H E
Observed 1989 2587 4576 10.9 5.9
Predic ted 2148 2043 4191 10.6 4.1
Overlap H segmen ts E segments
75% 51.1 23.7
50% 70.0 42.0
25% 75.7 50.2
t h e G O R m e t h o d h a s b e e n i m p r o v e d f ro m a b o u t 55 % ( G O R I u s in g a
j a c k k n i f e 21) u p t o 6 4 . 4% . T h e i n c r e a s e o f t h e d a t a b a s e s iz e f r o m 6 7 p r o t e i n s
t o t h e p r e s e n t d a t a b a s e o f 2 6 7 p r o t e i n s a n d t h e u s e o f a m o r e d e t a i l e d
d e s c r ip t i o n o f t h e l o ca l s e q u e n c e r e s u l t e d in a n i m p r o v e m e n t o f a b o u t 1 %
( G O R I II , Q 3 = 6 3 .3 % ; G O R I V , Q3 = 6 4 .4 % ; t h e c o r r e s p o n d i n g s t a n d a r d
d e v i a t i o n s o f Q 3 a r e 0 .8 % a n d 0 .6 % , re s p e c t i v e l y ). H o w e v e r , t h e r e s u l t o f
6 3.3 % f o r G O R I II w a s r e a c h e d u s i ng d u m m y f r e q u e n c i e s a n d a d d i n g
d e c i s io n c o n s t a n t s t h a t w e n o w b e l i e v e r e s u l t e d i n a s li g ht b ia s t o w a r d t h ed a t a b a s e w e w e r e t h e n u s i ng . T h i s c a u s e s t h e o v e r a l l a c c u r a c y o f t h e m e t h o d
t o b e o v e r e s t i m a t e d b y a p e r c e n t o r s o, as b e c a m e a p p a r e n t w h e n p a r a m e -
t e r s d e r i v e d f r o m t h e o r i g in a l d a t a b a s e w e r e u s e d t o p r e d i c t n e w s e ts o f p r o -
t e in s . T h i s is t h e r e a s o n w h y , h e r e , w e a v o i d e d t h e u s e o f d e c i si o n c o n s t a n t s
( e.g ., t o a d j u s t t h e n u m b e r o f p r e d i c t e d s e c o n d a r y s t r u c t u r e s t o w h a t i s o b -
s e r v e d in th e d a t a b a s e ) i n o r d e r t o o b t a i n a m o r e r o b u s t e s t i m a t i o n o f t h e
a c c u r a c y o f t h e m e t h o d . T h i s s m a l l in c r e a s e i n th e p r e d i c t i o n a c c u r a c y is
c o n s i s t e n t w i t h a p r e v i o u s a s s u m p t i o n w e m a d e . I5 W e e s t i m a t e d t h a t w e
w e r e a b l e t o e x t r a c t m o r e o r l es s al l t h e i n f o r m a t i o n a v a i la b l e in t h e l o c a ls e q u e n c e . O t h e r p u b l i s h e d m e t h o d s i n cl u d in g n e u r a l n e t m e t h o d s , u s in g o n l y
t h e p r o t e i n s e q u e n c e , a r e o f s i m il a r o r l o w e r a c c u r a c y ( s e e R e f s. 2 a n d 3 ).
W e a t t r i b u t e d t h e l i m i t at i o n i n t h e a c c u r a c y o f t h e p r e d i c t i o n 15 t o t h e
l a ck o f l o n g - d i st a n c e e f fe c ts . T h i s a p p e a r s t o b e c o n f i r m e d h e r e b y t h e p o o r
q u a l i t y o f t h e / 3 - s t r a n d p r e d i c t i o n . B e c a u s e / 3 s h e e t s r e q u i r e t h e p a i r in g o f
r e s i d u e s t h a t m a y b e d i s t a n t a l o n g t h e s e q u e n c e , t h i s s e c o n d a r y s t r u c t u r e
is p r e s u m a b l y m o r e d e p e n d e n t o n l o n g - r a n g e i n t e r a c t i o n s t h a n a r e o~ h e l ic e s
o r c o i l s . C l e a r l y t h e m e t h o d d o e s n o t f a r e w e l l w i t h t h e p r e d i c t i o n o f / 3
s t r a n d s . A l t h o u g h Q p rd f o r / 3 s t r a n d s i s o n l y s l i g h tl y l o w e r t h a n Qpra o r t h e
21 W. Kabsch and C. Sander, FEBS Lett. 155, 179 (1983).
8/6/2019 GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence
http://slidepdf.com/reader/full/gor-method-for-predicting-protein-secondary-structure-from-amino-acid-sequence 14/14
[32] THE G O R METI-IO0 553
t w o o t h e r s e c o n d a r y s t r u c tu r e s , t h e r e i s a c h r o n i c u n d e r p r e d i c t i o n o f t h is
s t r u c t u r e , Q o b s f o r / 3 s t r a n d s i s th u s s ig n i fi c an t ly l o w e r c o m p a r e d t o Q o bs
f o r h e li c e s a n d c o il s ( e v e n t a k i n g i n t o a c c o u n t t h e o v e r e s t i m a t i o n o f th e
c o r r e s p o n d i n g Q o b s v a l u e s , w h i c h i s a c o n s e q u e n c e o f th e o v e r p r e d i c t i o nof he lic e s and co i ls ) . Th is i s a l so no t ic eab le in the f ac t tha t , w he r ea s the
a v e r a g e l e n g t h o f p r e d i c t e d a n d o b s e r v e d h e l i c es c o r r e s p o n d s c l o se l y , t h e
a v e r a g e l e n g th o f p r e d i c t e d s t r a n d s i s s h o r t e r b y a b o u t o n e - t h i r d c o m p a r e d
t o t h e a v e r a g e l e n g t h o f t h e o b s e r v e d o n e s .
I t i s thus ve ry impor tan t to cons ide r the poss ib i l i t i e s o f inc lud ing long-
r a n g e i n t e r ac t io n s i n o u r m e t h o d ( o r o t h e r m e t h o d s u s in g o n l y s h o r t- r an g e
i n f o r m a t i o n , t h a t i s, l o c a l s e q u e n c e o n l y , f o r t h a t m a t t e r ) . O n e w a y t o
i n t r o d u c e l o n g - d i s t an c e e f f e c ts is t o u s e s p e c if ic n o n l o c a l p a i rs t o i m p r o v e
/ 3 -s tr an d p r e d i c t io n . 22 I n o t h e r w o r d s , t h e p r e d i c t i o n o f / 3 s t r an d s c o u l d b ed o n e u s i n g t h e l o c a l w i n d o w a s u s u a l t o f i r s t s e l e c t p u t a t i v e / 3 - s t r a n d s e g -
m e n t s a n d t h e n o n e c o u l d s l i d e a w i n d o w a l o n g t h e s e q u e n c e t o l o o k
w h e t h e r c o m p l e m e n t a r y s e g m e n t s t o th e s e p u t a t iv e / 3 s t r an d s e g m e n t s ca n
b e f o u n d . A n o t h e r p o s s i b i li t y is t o u s e m u l t ip l e a l i g n m e n t s . T h i s i s b a s e d
o n t h e a s s u m p t i o n t h a t c o r r e s p o n d i n g r e s i d u e s i n t h e a l i g n m e n t , p r o v i d e d
t h e a l i g n m e n t i s c o r r e c t , w i l l h a v e t h e s a m e s e c o n d a r y s t r u c t u r e , b e i n g a t
t h e s a m e l o c a t i o n i n t h e f o ld . T h e u s e o f m u l t i p le a l i g n m e n t s h a s r e c e n t l y
b e e n t h e s o u r c e o f a n i m p r o v e m e n t o f t h e a c c u r a c y r an g in g f r o m 5 , f o r
t h e G O R 7 a n d t h e q u a d r a t ic l o gi st ic9 m e t h o d s u p t o 1 0 p e r c e n t a g e p o i n t sf o r a n e u r a l n e t w o r k m e t h o d . 8
T h e G O R m e t h o d h a s th e a d v a n ta g e o v e r n e u ra l n e t w o r k - b a s ed m e t h -
o d s o r n e a r e s t - n e i g h b o r m e t h o d s i n t h a t i t c l e a r l y i d e n t i f i e s w h a t i s t a k e n
i n to a c c o u n t f o r t h e p r e d i c t io n a n d w h a t i s n e g l e ct e d . M o r e o v e r , t h e m e t h o d
p r o v i d e s e s t i m a t e s o f p r o b a b i l i t i e s f o r t h e t h r e e s e c o n d a r y s t r u c tu r e s a t
e a c h r e s i d u e p o s i t i o n , w h i c h c a n b e u s e f u l f o r fu r t h e r a p p l i c a t i o n o f t h e
m e t h o d . 15 I t r e l ie s o n l y o n o b s e r v e d f r e q u e n c i e s i n t h e d a t a b a s e ; t h u s , t h e
c a l c u l a t io n o f t h e p a r a m e t e r s i s s t r a i g h t f o r w a r d a n d e a s y t o u p d a t e .
A v a i l a b i l i t y
T h e c o r r e s p o n d i n g p r o g r a m h a s b e e n w r i t t e n i n C l a n g u a g e a n d c u r -
r e n t l y r u n s o n a p l a t f o r m w i t h a U N I X o p e r a t i n g s y s t e m ( b u t i t w i l l r u n
e q u a l l y w e ll o n o t h e r o p e r a t i n g s y s t e m s ) . I t c an b e o b t a i n e d b y a n o n y m o u s
f t p a t N C B I ( N a t i o n a l C e n t e r f o r B i o t e c h n o l o g y In f o r m a t i o n ) u s in g th e
f o l l o w i n g p r o c e d u r e : f t p n c b i . n l m . n i h . g o v , m o v e t o t h e d i r e c t o r y g i b r a t /
G O R . I t i s a l s o a v a i l a b l e a t I N R A - J o u y - e n - J o s a s : f t p l o c u s . j o u y . i n r a . f r ,
m o v e t o d i r e c t o r y /p u b / p r o t e i n / G O R .
22 T. J . P . H ub bar d , 2 7 t h A n n u a l H a w a i i I n t e r n a t i o n a l C o n f e r e n c e o n S y s t e m S c i e n c e s , (L .
H u n t e r , e d . ) I E E E C o m p u t e r S o c i e t y P r es s , Lo s A l a m o s , 33 6 (1 99 4) .