Approximate Scaling Properties of RNA Free Energy Landscapes
Subbiah Baskaran, Peter F. Stadler, Peter Schuster
SFI WORKING PAPER: 1995-10-083
SANTA FE INSTITUTE
Approximate Scaling Properties of
RNA Free Energy Landscapes
By
Subbiah Baskaran (a,b,c), Peter F. Stadler (b,d), and Peter Schuster (b,c,d,*)
(a) Marshall Space Flight Center, Huntsville, AL, USA
(b) Institut für Theoretische Chemie, Universität Wien, Vienna, Austria
(c) Institut für Molekulare Biotechnologie e.V., Jena, Germany
(d) Santa Fe Institute, Santa Fe, NM, USA
(*) Correspondence to: Institut für Theoretische Chemie, Universität Wien, Währingerstraße, Vienna, Austria
Email: pks@tbi.univie.ac.at
Baskaran et al.: Scaling in RNA Landscapes
Abstract
RNA free energy landscapes are analyzed by means of "time series" that are obtained from random walks restricted to excursion sets. The power spectra, the scaling of the jump size distribution, and the scaling of the curve length measured with different yardstick lengths are used to describe the structure of these "time series". Although they are stationary by construction, we find that their local behavior is consistent with both AR(1) and self-affine processes. Random walks confined to excursion sets (i.e., with the restriction that the fitness value exceeds a certain threshold at each step) exhibit essentially the same statistics as free random walks. We find that an AR(1) time series is in general approximately self-affine on time scales up to approximately the correlation length. We present an empirical relation between the correlation parameter $\rho$ of the AR(1) model and the exponents characterizing self-affinity.
Key Words
RNA Folding, Excursion Sets, Fractal Landscape, AR(1) Process, 1/f noise
1. Introduction
Evolutionary optimization as well as combinatorial optimization take place on landscapes resulting from mapping (micro-)configurations to scalar quantities like fitness values, energies, or costs (Schuster & Stadler). In most cases one lacks a detailed understanding of the structure of the fitness landscape that underlies a particular instance of biological evolution. One thus resorts to using model landscapes. Well-studied examples include combinatorial optimization problems such as the Traveling Salesman Problem, the Graph Matching Problem, or the Graph Bipartitioning Problem; various spin glass models, among them the Sherrington-Kirkpatrick model; and Kauffman's Nk models (Kauffman). The ruggedness of a landscape, often measured by means of a correlation function, is of crucial importance for the dynamics of the evolution process (Eigen et al.; Bonhoeffer & Stadler). Detailed studies of the correlation structure of model landscapes can be found, for instance, in (Stadler & Schnabl; Stadler & Happel; Stadler; Weinberger; Weinberger & Stadler).
Only in the case of RNA landscapes do we have a sound biophysical model for the fitness function. Models based on RNA secondary structure prediction algorithms have been analyzed in great detail in a series of papers (Fontana et al.; Bonhoeffer et al.; Tacker et al.; Schuster et al.). Evolutionary dynamics on such landscapes has been the topic of extensive research as well (Fontana & Schuster; Fontana et al.; Huynen et al.). A detailed understanding of these landscapes is a necessary prerequisite for building simpler models, based on spin glass or Nk model landscapes, that are significantly less costly in computer simulations and that lend themselves much more easily to analytical treatment.
Weinberger suggested characterizing a landscape by means of a "time series" obtained by sampling the fitness values along a random walk in sequence space. While this method is rather indirect, it yields a data set that can be analyzed by the standard methods of time series analysis (Hordijk). In this contribution we shall investigate the "fractal-like" features of landscapes in terms of the approximate self-affinity of these "time series".
A great variety of systems, physical and biological, exhibit $1/\omega$ power spectra, commonly called 1/f noise or "flicker" noise. Some examples are resistivity fluctuations in conducting materials (Weissman), luminosity fluctuations of stars and galaxies (Nolan et al.), flow fluctuations of highway traffic (Musha & Higuchi) and of deep ocean waters (Taft et al.), frequency variations of quartz oscillators (Attkinson et al.), and the loudness fluctuations in music and speech (Voss & Clarke). In biological systems 1/f noise has been reported for nerve membranes (Verveen & Derkson), for the DNA sequences of non-coding introns (Voss; Li & Kaneko), as well as of coding regions (Buldyrev et al.). In this paper we will show that the "time series" sampled along a random walk on an RNA free energy landscape also leads to 1/f noise.
This contribution is organized as follows. In section 2 we review some notions that are basic to the theory of fitness landscapes. In particular, we introduce a variety of correlation measures and highlight their relations with each other, and we consider the class of landscapes that lead to exponential correlation functions of the "time series" obtained from simple random walks. In section 3 we briefly consider self-affine time series and show that AR(1) processes mimic self-affinity on time scales up to their correlation length. These findings are applied to free energy landscapes of RNA in section 4. In particular, we shall see that the mountainous parts of the landscapes do not differ significantly from the average fitness regime, at least as long as the excursion sets do not fragment into tiny pieces. Section 5 concludes our discussion. The relaxation time of a simple random walk on a sequence space is computed in the appendix.
2. Landscapes
2.1. Rugged Landscapes
Definition. A landscape is a map $f: C \to \mathbb{R}$, where $C = (X, d)$ is a finite metric space with metric $d: X \times X \to \mathbb{R}$.
In most applications of landscapes in biology, physics, or combinatorial optimization the configuration space $(X, d)$ can be represented as a graph $\Gamma$. Two configurations $x$ and $y$ are then neighbors in $\Gamma$ if $d(x, y) = 1$. The metric $d$ is often obtained from an editing procedure that interconverts two configurations $x, y \in X$ by means of a finite sequence of operations: $d(x, y)$ is commonly defined as the number of operations in the shortest sequence that changes $x$ into $y$ or vice versa. In a biological context the "elementary operations" are in general mutations. We will restrict ourselves here to the case where $X$ is a set of sequences of common length $n$ that are constructed from some alphabet with $\kappa$ letters. In this case $d$ is the so-called Hamming distance (Hamming), and the graph $\Gamma$ is known as the sequence space $Q^n_\kappa$, or Boolean hypercube in the special case $\kappa = 2$. For a recent review see (Schuster & Stadler; Stadler).
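As a small illustration of these definitions, the following Python sketch implements the Hamming distance and the neighborhood relation of a sequence space. The function names (`hamming`, `neighbors`, `sequence_space`) are illustrative choices and do not come from the original work.

```python
from itertools import product


def hamming(x, y):
    """Number of positions in which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def neighbors(x, alphabet):
    """All sequences at Hamming distance 1 from x (single point mutations)."""
    return [x[:i] + c + x[i + 1:]
            for i in range(len(x))
            for c in alphabet if c != x[i]]


def sequence_space(n, alphabet):
    """All kappa**n sequences of length n; feasible only for small n."""
    return ["".join(p) for p in product(alphabet, repeat=n)]
```

Each sequence of length n over an alphabet of size kappa has n(kappa - 1) neighbors, so the graph is regular, which is the property used for the transition matrix of the random walk later on.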
2.2. Correlation Functions
A very important characteristic of a landscape is its ruggedness. Rugged landscapes are characterized by a large number of local optima (Palmer), by the fact that uphill walks are short and easily trapped in local optima, and by short correlation lengths (Kauffman). There is ample evidence that heuristic optimization procedures work less efficiently the more rugged a landscape is (Stadler & Schnabl; Schuster & Stadler). It will be convenient to define for a given landscape $f$
$$\bar f = \frac{1}{|X|}\sum_{x\in X} f(x), \qquad \sigma_f^2 = \frac{1}{|X|}\sum_{x\in X}\bigl(f(x)-\bar f\bigr)^2. \tag{1}$$
It has been suggested by various authors (Eigen et al.; Fontana et al.; Sorkin; Weinberger) to measure "ruggedness" by some sort of correlation measure. We shall use the following definition, which was first proposed in (Eigen et al.):
$$\rho(d) = \frac{1}{\sigma_f^2}\,\frac{1}{|D_d|}\sum_{(x,y)\in D_d}\bigl(f(x)-\bar f\bigr)\bigl(f(y)-\bar f\bigr). \tag{2}$$
Here $D_d$ denotes the set of all pairs of vertices that have mutual distance $d$ in the graph $\Gamma$. For a sequence space we have for instance
$$|D_d| = \kappa^n(\kappa-1)^d\binom{n}{d}. \tag{3}$$
This definition is useful if $\Gamma$ is a distance regular graph (Brouwer et al.). A more general mathematical framework is developed in (Stadler; Happel & Stadler; Stadler & Happel).
Weinberger suggested investigating the properties of landscapes by sampling the values along a simple random walk in the configuration space $C$:
$$\begin{array}{ccccccccc} x_0 & \to & x_1 & \to & x_2 & \to & \cdots & \to & x_k & \to & \cdots \\ | & & | & & | & & & & | & & \\ f_0 & \to & f_1 & \to & f_2 & \to & \cdots & \to & f_k & \to & \cdots \end{array} \tag{4}$$
where $x_i$ and $x_{i+1}$ are neighbors in $C$. At each step one of the neighbors of $x_i$ in $\Gamma$ is chosen with uniform probability, i.e., the series $\{x_i\}$ is a simple random walk on $C$ (Spitzer). By evaluating the configurations along the walk $\{x_i\}$ we obtain a random walk on the landscape, i.e., the "time series" $\{f_i = f(x_i)\}$. This series is stationary by construction.
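The sampling procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `toy_fitness` is a hypothetical stand-in landscape (it counts "GC" dinucleotides) and is not the RNA folding map used in this paper; the walk starts from a uniformly random sequence so that the series is stationary.

```python
import random


def toy_fitness(x):
    """Hypothetical stand-in landscape: number of 'GC' dinucleotides in x."""
    return sum(1 for i in range(len(x) - 1) if x[i:i + 2] == "GC")


def random_walk_series(f, x0, alphabet, steps, rng):
    """Sample f along a simple random walk on sequence space starting at x0."""
    x = x0
    series = [f(x)]
    for _ in range(steps):
        i = rng.randrange(len(x))                            # position to mutate
        c = rng.choice([a for a in alphabet if a != x[i]])   # replacement letter
        x = x[:i] + c + x[i + 1:]                            # uniformly chosen neighbor
        series.append(f(x))
    return series


rng = random.Random(1)
x0 = "".join(rng.choice("GC") for _ in range(20))   # uniformly random start
series = random_walk_series(toy_fitness, x0, "GC", 1000, rng)
```

Choosing the position uniformly and then a different letter uniformly selects each of the n(kappa - 1) neighbors with equal probability, which is what the definition of a simple random walk requires.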
The autocorrelation function of a stationary time series is defined by
$$r(s) = \frac{\langle f_t f_{t+s}\rangle - \langle f_t\rangle^2}{\sigma^2}, \tag{5}$$
where $\sigma^2$ is the variance, which coincides with $\sigma_f^2$ defined above, and the angular brackets indicate the expectation value taken over all random walks $\{x_i\}$ and all times $t$. Provided the graph $\Gamma$ is $D$-regular, i.e., each configuration $x$ has exactly $D$ neighbors, we may write the transition matrix of the random walk as $T = (1/D)A$. The entry $A_{xy}$ of the adjacency matrix is 1 or 0, depending on whether the configurations $x$ and $y$ are neighbors or not. It is shown in (Stadler) that the correlation function $r(s)$ has the following algebraic representation:
$$r(s) = \frac{1}{\sigma^2}\Bigl[\langle f, T^s f\rangle - \bar f^{\,2}\Bigr]. \tag{6}$$
Note that $\langle\cdot,\cdot\rangle$ denotes here a scalar product, not an expectation value! Another useful representation (Fontana et al.) is
$$r(s) = 1 - \frac{\langle (f_{t+s} - f_t)^2\rangle}{2\sigma^2}. \tag{7}$$
The average squared difference $\langle (f_{t+s} - f_t)^2\rangle$ was used as a correlation measure in Sorkin's pioneering paper (Sorkin).
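The two representations of r(s), the product form and the squared-difference form, can be compared numerically on any sampled series. The sketch below uses NumPy; the estimator names are illustrative and the code is not taken from the original work.

```python
import numpy as np


def autocorr(f, s):
    """Estimate r(s) from the lagged products of the centered series."""
    f = np.asarray(f, dtype=float)
    if s == 0:
        return 1.0
    m, v = f.mean(), f.var()
    return float(np.mean((f[:-s] - m) * (f[s:] - m)) / v)


def autocorr_from_differences(f, s):
    """Estimate r(s) from the mean squared difference at lag s."""
    f = np.asarray(f, dtype=float)
    if s == 0:
        return 1.0
    return float(1.0 - np.mean((f[s:] - f[:-s]) ** 2) / (2.0 * f.var()))
```

For a long stationary series the two estimates agree up to small end effects, so a noticeable discrepancy between them is a quick indication that the series is too short or not stationary.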
The autocorrelation function $\rho(d)$ of the landscape itself and the autocorrelation $r(s)$ of the "time series" of the landscape are related via
$$r(s) = \sum_d \phi_{sd}\,\rho(d), \tag{8}$$
where $\phi_{sd}$ is the probability that a simple random walk of length $s$ ends at distance $d(x_0, x_s) = d$. Explicit expressions for $\phi_{sd}$ can be found in (Fontana et al.; Stadler); we shall not make use of them in this contribution.
2.3. Elementary Landscapes
For a quite large number of model landscapes it has been found that the correlation function $r(s)$ is exactly a decaying exponential (Stadler & Happel; Weinberger & Stadler), numerically indistinguishable from a decaying exponential (Stadler & Schnabl; Stadler), or at least very close to a decaying exponential (Weinberger). It has been argued that a nearly exponential autocorrelation function $r(s)$ would be generic for landscapes with a Gaussian distribution of fitness values (Weinberger). This argument is wrong, however.
It is not hard to check that $r(s)$ is exponential whenever $f$ is of the form $f(x) = \bar f + \varphi(x)$, where $\varphi$ is an eigenvector of the adjacency matrix $A$ with eigenvalue $\Lambda$. Indeed, under these conditions one finds $r(s) = (\Lambda/D)^s$. In a more general context it is useful to assume that $\varphi$ is an eigenvector of the so-called graph Laplacian (Mohar); for regular graphs we have $\Delta = A - DE$, where $E$ is the identity matrix, i.e., the eigenvectors of $A$ are the same as the eigenvectors of the Laplacian $\Delta$. Landscapes of this type have been termed elementary. Lov Grover (Grover) found that a number of well-known model landscapes are elementary, for instance the landscape of the Traveling Salesman Problem. In (Stadler) it is also shown that $r(s)$ is exponential if and only if the landscape is elementary.
Note that the possible eigenvalues $\Lambda$ are uniquely determined by the adjacency matrix $A$, i.e., by the geometry of the configuration space. As a consequence there is only a small finite number of possible values for the parameter $\rho \overset{\mathrm{def}}{=} \Lambda/D$ of the exponential decay. It is therefore not possible to construct a landscape $f$ with autocorrelation function $r(s) = \rho^s$ for an arbitrarily prescribed parameter $\rho$, in contrast to the case of merely constructing a time series. In other words, only a very special set of time series is generated by random walks on landscapes.
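The statement about eigenvectors can be checked directly on a small Boolean hypercube. The following sketch is an illustration only: it assumes that the scalar product in the algebraic representation of r(s) is normalized by |X|, and it uses a Walsh-type function of order p (a product of p spins) as the eigenvector, for which the adjacency eigenvalue is n - 2p.

```python
import numpy as np
from itertools import product


def hypercube_adjacency(n):
    """Vertices and adjacency matrix of the Boolean hypercube of dimension n."""
    verts = list(product((0, 1), repeat=n))
    index = {v: i for i, v in enumerate(verts)}
    A = np.zeros((len(verts), len(verts)))
    for v, i in index.items():
        for k in range(n):
            w = v[:k] + (1 - v[k],) + v[k + 1:]   # flip bit k
            A[i, index[w]] = 1.0
    return verts, A


def walk_autocorrelation(f, A, s):
    """r(s) of a simple random walk started from the uniform distribution."""
    D = A[0].sum()                      # degree of the regular graph
    T = A / D                           # transition matrix
    fbar, var = f.mean(), f.var()
    return float((f @ np.linalg.matrix_power(T, s) @ f / len(f) - fbar ** 2) / var)


n, p = 6, 2
verts, A = hypercube_adjacency(n)
# eigenvector: product of +/-1 spins on the first p positions
phi = np.array([np.prod([1 - 2 * v[k] for k in range(p)]) for v in verts], dtype=float)
f = 3.0 + phi                           # elementary landscape with mean 3
```

With n = 6 and p = 2 the eigenvalue is 2 and the degree is 6, so the walk autocorrelation is (1/3)^s; only the n + 1 values 1 - 2p/n are available on this graph.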
2.4. Power Spectra
Instead of a correlation function one can use the power spectrum
$$S(\omega) \overset{\mathrm{def}}{=} \lim_{N\to\infty}\frac{1}{N}\left[\Bigl(\sum_{t=1}^{N} f_t\cos(\omega t)\Bigr)^2 + \Bigl(\sum_{t=1}^{N} f_t\sin(\omega t)\Bigr)^2\right] \tag{9}$$
of the time series $\{f_t\}$ as a means of characterizing the landscape. Here $N$ is the number of points sampled from the time series $\{f_t\}$. Power spectrum and autocorrelation function of a stationary process are related by the Wiener-Khinchin theorem, see, e.g., (Yaglom):
$$r(s) = \frac{1}{\pi\sigma^2}\int_0^{\pi} S(\omega)\cos(\omega s)\,d\omega, \qquad S(\omega) = \sigma^2\Bigl[1 + 2\sum_{s=1}^{\infty} r(s)\cos(\omega s)\Bigr]. \tag{10}$$
A negative slope of $S(\omega)$ implies some degree of correlation in $f_t$. A steeper slope implies a higher degree of correlation. A signal $\{f_t\}$ is called 1/f noise if a log-log plot of the power spectrum versus frequency can be approximated by a straight line with slope close to $-1$ in the frequency range of interest. More generally, one speaks of $1/f^a$ noise if the slope is $-a$. We shall return to this type of time series in section 3.
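A spectral exponent of this kind can be estimated from a sampled series with a periodogram and a straight-line fit in the log-log plot. The sketch below uses NumPy's real FFT; the function names and the `omega_max` cutoff parameter are assumptions made for this illustration, not part of the original analysis.

```python
import numpy as np


def periodogram(f):
    """Periodogram |sum_t f_t exp(-i omega t)|^2 / N at the Fourier frequencies."""
    f = np.asarray(f, dtype=float)
    f = f - f.mean()                          # remove the mean (omega = 0 component)
    N = len(f)
    S = np.abs(np.fft.rfft(f)) ** 2 / N
    omega = 2.0 * np.pi * np.fft.rfftfreq(N)  # angular frequencies in [0, pi]
    return omega[1:], S[1:]                   # drop omega = 0


def spectral_exponent(omega, S, omega_max=np.pi):
    """Exponent a of S ~ omega**(-a), from a least squares fit of log S on log omega."""
    mask = omega <= omega_max
    slope, _ = np.polyfit(np.log(omega[mask]), np.log(S[mask]), 1)
    return float(-slope)
```

White noise gives an exponent close to 0, and the running sum of white noise (a discrete Brownian motion) gives an exponent close to 2 at low frequencies; 1/f noise lies between these two reference cases.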
The most common definition of a correlation length in physics is simply the integral of the autocorrelation function. In the discrete case it is convenient to use
$$\ell' \overset{\mathrm{def}}{=} \frac{1}{2} + \sum_{s=1}^{\infty} r(s). \tag{11}$$
Comparing this definition with the Wiener-Khinchin theorem, equ.(10), yields the simple relation
$$S(0) = 2\sigma^2\,\ell', \tag{12}$$
which can be used as an alternative way of estimating the correlation length of a time series.
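The relation between the zero-frequency value of the spectrum and the correlation length follows in one line from the Wiener-Khinchin theorem. The short derivation below writes $\ell'$ for the discrete correlation length defined above (the symbol is an assumed notation) and adds, as a worked case, the value obtained for an exponential autocorrelation $r(s) = \rho^s$:

```latex
\begin{align*}
S(0) &= \sigma^2\Bigl[1 + 2\sum_{s=1}^{\infty} r(s)\Bigr]
      = 2\sigma^2\Bigl[\tfrac{1}{2} + \sum_{s=1}^{\infty} r(s)\Bigr]
      = 2\sigma^2\,\ell', \\
\ell' &= \tfrac{1}{2} + \sum_{s=1}^{\infty}\rho^{s}
       = \tfrac{1}{2} + \frac{\rho}{1-\rho}
       = \frac{1+\rho}{2(1-\rho)}
       \qquad \text{for } r(s) = \rho^{s}.
\end{align*}
```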
2.5. Excursion Sets
The parts of the landscape in which the values are close to the global maximum or minimum are of particular interest. One might ask, for instance, how the "good" solutions are distributed in sequence space: Are they clustered around a globally optimal solution, or are configurations with close-to-optimal values scattered all over the configuration space? A suitable mathematical framework for this type of question is set by the notion of excursion sets (Adler). In this subsection we collect a few definitions and their immediate corollaries, which will be useful for the discussion of the RNA free energy landscapes in section 4.
Definition. Let $f: X \to \mathbb{R}$ be an arbitrary landscape.
(i) A configuration $x$ is a local optimum if $f(x) \ge f(y)$ holds for all neighbors $y$ of $x$. Two configurations $x$ and $y$ are called neutral if $f(x) = f(y)$.
(ii) The set $A_E = \{x \in X \mid f(x) \ge E\}$ is called the excursion set of $f$ at level $E$. A connected component of $A_E$ is called a cycle (Freidlin & Wentzell).
(iii) A connected subgraph $B \subseteq A$ is called a neutral network in $X$ if all elements are neutral, and if all neutral neighbors of any $x \in B$ are elements of $B$ as well.
For sufficiently small $E$ we have of course $A_E = X$, the entire configuration space. On the other hand, if $E$ is larger than the global optimum of $f$, then $A_E$ is empty. Clearly, $E < E'$ implies $A_E \supseteq A_{E'}$; hence excursion sets introduce a hierarchical structure on the landscape. In general, $A_E$ will not be connected, i.e., it will decompose into more than one cycle (connected component). Bounds on the number of cycles can be obtained for elementary landscapes and the special value $E = \bar f$; for details see (Stadler). Cycles play a prominent role in the analysis of simulated annealing techniques on combinatory landscapes; see (Azencott) for a recent review.
Excursion sets, local optima, and neutral networks are closely related. We list here only a few simple geometric relationships: (i) Suppose $C_E$ is a cycle and $B$ is a neutral network; then $C_E$ and $B$ are either disjoint or $B$ is a subset of $C_E$. (ii) Each cycle $C_E$ contains at least one local optimum. (iii) A neutral network $B$ is a cycle if and only if it consists entirely of local optima. Each cycle $C_E$ contains a cycle of this type. (iv) A neutral network which is a cycle contains no other cycles except for itself. (v) If a cycle consists of only one configuration then this configuration is a local optimum.
The notion of excursion sets suggests two percolation problems: (i) At which level $E$ does $A_E$ cease to be a single cycle? (ii) At which level $E$ does $A_E$ decompose into many small cycles, as opposed to consisting of a single giant component containing almost all vertices of $A_E$ and a number of very small islands? Neither problem has been treated so far, although both seem to be of utmost importance for the understanding of adaptation on combinatory landscapes. In this contribution we shall be content with investigating the structure of the landscape at fitness levels for which the cycles are still large in general.
2.6. Random Walks on Excursion Sets
Instead of performing the random walk on the entire configuration space $C$ one may confine it to an excursion set $A_E \subseteq C$. The random walk is then automatically constrained to a connected component of $A_E$, i.e., to a cycle. We used the following procedure to generate a walk within a cycle $C_E$. The process starts in a vertex known to be in the desired excursion set. These initial points are generated by screening a large number of random configurations. (Alternatively one might use configurations obtained from some simple optimization heuristics as starting points for higher excursion levels. This would, however, bias the sampling, since configurations in large "mountains" would be favored.) Then an attempt is made to move to a neighboring vertex. If it is contained in the same cycle, i.e., if its fitness is above the threshold level $E$, then it is accepted; otherwise the attempt is rejected. The "time series" is formed by the accepted moves only. This procedure generates a time series provided $C_E$ contains more than one configuration. In fact, we are only interested in large cycles.
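The accept/reject procedure described above can be sketched as follows. This is a minimal Python illustration, not the authors' program: `count_g` is a hypothetical toy fitness, the acceptance test uses f(y) >= E in line with the definition of the excursion set, and `max_attempts` is an added safeguard for isolated starting points.

```python
import random


def count_g(x):
    """Hypothetical toy fitness: number of G's in the sequence."""
    return x.count("G")


def excursion_walk(f, x0, alphabet, level, accepted_steps, rng, max_attempts=10 ** 6):
    """Random walk confined to the excursion set {x : f(x) >= level}.

    Only accepted moves contribute to the returned series of fitness values.
    """
    if f(x0) < level:
        raise ValueError("starting point is not in the excursion set")
    x = x0
    series = [f(x)]
    attempts = 0
    while len(series) <= accepted_steps and attempts < max_attempts:
        attempts += 1
        i = rng.randrange(len(x))
        c = rng.choice([a for a in alphabet if a != x[i]])
        y = x[:i] + c + x[i + 1:]          # proposed neighbor
        if f(y) >= level:                  # stays inside the excursion set: accept
            x = y
            series.append(f(x))
    return series
```

Because every accepted move is a single point mutation between two members of the excursion set, the walk can never leave the connected component (cycle) in which it started.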
It is clear that confining random walks to cycles means that they sample predominantly in the vicinity of local optima. One can hope, therefore, that the resulting time series provide information about the most interesting regions of the fitness landscape, namely the regions of high fitness. The major drawback is that, by equ.(8), the time series contains a superposition of two effects: the correlation of fitness values on the landscape and the geometrical relaxation of the walk in $C_E$. The correlation of a walk in $C_E$ is
$$r_E(s) = \sum_d \phi^E_{sd}\,\rho_E(d) \approx \rho_E\bigl(\langle d(s)\rangle_E\bigr) \quad\text{with}\quad \langle d(s)\rangle_E = \sum_d \phi^E_{sd}\,d. \tag{13}$$
Here $\rho_E(d)$ is the correlation of the restriction of the landscape to the excursion set $A_E$, and $\langle d(s)\rangle_E$ describes the geometric relaxation of a random walk in a cycle $C_E$. Since the topology of $C_E$ is not known, it is very difficult to retrieve more than qualitative information on the structure of the mountainous parts of the landscape.
3. Self-Affine Time Series and Fractal Landscapes
3.1. Self-Affine Time Series
Definition. A time series $\{F_t\}$ is self-affine (or fractal) if
$$s^{-H}\,(F_{t+s} - F_t) \overset{d}{=} (F_{t+1} - F_t), \tag{14}$$
where $s$ is the number of steps between the two measurements. The notation $\overset{d}{=}$ indicates equality in the sense of distributions. The parameter $H$ fulfils $0 < H < 1$. An example is fractional Brownian motion, see, e.g., (Mandelbrot). The power spectrum of a time series with a distribution fulfilling (14) follows a power law (Mandelbrot & van Ness) of the form
$$S(\omega) = \omega^{-a} \quad\text{with}\quad a = 1 + 2H. \tag{15}$$
In the case of fractional Brownian motion in continuous time, the parameter $H$ and the Hausdorff dimension $D_H$ of the resulting curve are related by $D_H = 2 - H$. We remark that a time series fulfilling (14) strictly for all $s$ cannot be stationary.
Instead of using the power spectrum one can use more direct methods for characterizing a self-affine time series. Probably the most immediate approach is to consider the jump size
$$J(s) = \langle |F_{t+s} - F_t| \rangle. \tag{16}$$
As an immediate consequence of (14) we have $J(s) \sim s^H$. $H$ can be obtained by means of a least squares fit from a log-log plot, see, e.g., (Osborne & Provenzale). A closely related technique has been proposed by Sorkin (Sorkin). Multiplying (14) by itself and taking the expectation yields
$$s^{-2H}\,\langle (F_{t+s} - F_t)^2\rangle = \langle (F_{t+1} - F_t)^2\rangle, \tag{17}$$
and one obtains the slope $2H$ from a log-log plot of the mean square differences versus the lag $s$.
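The jump-size estimate of H amounts to a linear regression in the log-log plot. The following NumPy sketch illustrates it; the function names and the choice of lags are assumptions of this example.

```python
import numpy as np


def jump_size(F, s):
    """Mean absolute increment J(s) = <|F_{t+s} - F_t|> of the series F."""
    F = np.asarray(F, dtype=float)
    return float(np.mean(np.abs(F[s:] - F[:-s])))


def hurst_from_jumps(F, lags):
    """Exponent H of J(s) ~ s**H from a least squares fit in the log-log plot."""
    J = [jump_size(F, s) for s in lags]
    H, _ = np.polyfit(np.log(lags), np.log(J), 1)
    return float(H)
```

Two reference cases are useful when reading the result: a straight line F_t = t has J(s) = s and therefore H = 1, while ordinary Brownian motion (the running sum of white noise) has H = 1/2.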
Another approach to self-similarity focuses on the curve length as a function of the yardstick length used for the measurement. The method outlined below was proposed in (Higuchi) as an improvement of the procedure given by Burlaga and Klein (Burlaga & Klein). It provides numerically stable scaling exponents even for a small number of data points. We divide the time series $\{F_t\}$ into $s$ partial series
$$F_m(s) = \{F_m, F_{m+s}, F_{m+2s}, \ldots, F_{m+\lfloor (N-m)/s\rfloor s}\}, \qquad m = 1, \ldots, s,$$
and define the length of $F_m(s)$ as
$$L_m(s) = \frac{N-1}{s\,\lfloor (N-m)/s\rfloor}\sum_{i=1}^{\lfloor (N-m)/s\rfloor} |F_{m+is} - F_{m+(i-1)s}|.$$
The curve length $L(s)$ measured with step size $s$ is then the average value taken over all the partial series:
$$L(s) = \frac{1}{s}\sum_{m=1}^{s} L_m(s). \tag{18}$$
If the time series is self-affine, then the curve length follows a power law of the form $L(s) \sim s^{-D}$. The correction factor in the definition of $L_m(s)$ approaches 1 for large data sets, and hence we find as an immediate consequence of equ.(14) that
$$L(s) \approx \frac{N}{s}\,s^{H}\,\langle |F_{t+1} - F_t| \rangle \sim s^{H-1}, \tag{19}$$
and therefore $D = 1 - H$. The parameters $a$, $H$, and $D$ of a self-affine time series are related by means of the equations
$$a = 1 + 2H = 3 - 2D. \tag{20}$$
Independent estimates of $a$, $H$, and $D$ can thus be used to determine to what extent a given time series is consistent with the assumption of self-affinity.
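The curve-length measurement can be written compactly with array slicing. The sketch below follows the definitions of the partial series and of L_m(s) given above, with the 1-based offsets m of the text mapped to 0-based array positions; it assumes len(F) >= 2s so that every partial series has at least one increment, and the function names are illustrative.

```python
import numpy as np


def curve_length(F, s):
    """Curve length L(s): average of the lengths L_m(s) over offsets m = 1..s."""
    F = np.asarray(F, dtype=float)
    N = len(F)
    lengths = []
    for m in range(1, s + 1):                      # 1-based offsets as in the text
        k = (N - m) // s                           # increments in the m-th partial series
        idx = (m - 1) + s * np.arange(k + 1)       # 0-based positions of F_m, F_{m+s}, ...
        total = np.abs(np.diff(F[idx])).sum()
        lengths.append((N - 1) / (s * k) * total)  # normalization factor of L_m(s)
    return float(np.mean(lengths))


def curve_exponent(F, lags):
    """Exponent D of L(s) ~ s**(-D) from a least squares fit in the log-log plot."""
    L = [curve_length(F, s) for s in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(L), 1)
    return float(-slope)
```

For the straight line F_t = t the measured length is N - 1 for every yardstick, so D = 0 in this normalization; for ordinary Brownian motion one expects D close to 1/2, consistent with D = 1 - H.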
3.2. Fractal Landscapes
It is obvious that a time series obtained from a random walk on a landscape cannot be strictly self-affine since it must be (at least approximately) stationary. Hence (14) is an approximation that holds only for $s \ll n$, where $n$ is the maximal distance in configuration space.
Dividing Sorkin's equ.(17) by twice the (finite) variance $\sigma^2$ of the landscape and substituting equ.(7) we find
$$s^{-2H}\bigl(1 - r(s)\bigr) = 1 - r(1). \tag{21}$$
Solving for the autocorrelation function yields $r(s) = 1 - c\,s^{2H}$. The parameter $c$ can be obtained as follows. Since a single step along the random walk always leads to distance 1, we have $r(1) = \rho(1) \overset{\mathrm{def}}{=} \rho$, the nearest neighbor correlation of the landscape. Thus $\rho = 1 - c$, and we finally obtain an autocorrelation function of the form
$$r(s) = 1 - (1 - \rho)\,s^{2H}. \tag{22}$$
Equ.(22) holds of course only for $s$ small compared to the maximum distance in the landscape. It has been used for a classification of rugged landscapes (Weinberger & Stadler; Stadler) in terms of the parameter
$$H = \frac{1}{2}\,\frac{\ln\bigl(1 - r(s)\bigr)}{\ln s} \tag{23}$$
for small $s$. By equ.(22), $\ln(1 - r(s)) = \ln(1 - \rho) + 2H\ln s$, so in practice $H$ is half the slope of $\ln(1 - r(s))$ plotted against $\ln s$.
3.3. "AR(1) Landscapes" are Locally Fractal
An AR(1) (or Ornstein-Uhlenbeck) process is defined by the following recurrence relation, see, e.g., (Papoulis; Feller):
$$F_t = \rho F_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1, \tag{24}$$
where $\varepsilon_t$ denotes Gaussian white noise with variance $\sigma_\varepsilon^2$. The resulting time series is stationary and has the Markov property. Its autocorrelation function is
$$r(s) = \rho^s = \exp(-s/\ell), \qquad \ell \overset{\mathrm{def}}{=} -\frac{1}{\ln\rho}, \tag{25}$$
where $\ell$ is the correlation length as defined in (Weinberger; Fontana et al.). Conversely, any Gaussian stationary Markov process has an autocorrelation function of the form (25). The parameter $\rho$ measures the correlation of the time series. If $\rho \approx 0$ then the time series is almost uncorrelated, i.e., $\{F_t\}$ is almost white noise. On the other hand, for $\rho \approx 1$ the time series approximates Brownian motion.
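An AR(1) series is easy to generate for numerical experiments. The sketch below is an illustration with assumed names (`ar1_series`, `correlation_length`, `sigma_eps`); the first value is drawn from the stationary distribution, whose variance is sigma_eps^2 / (1 - rho^2), so that the whole series is stationary from the start.

```python
import numpy as np


def ar1_series(rho, n_steps, rng, sigma_eps=1.0):
    """Stationary AR(1) series F_t = rho * F_{t-1} + eps_t with Gaussian noise."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    eps = rng.normal(0.0, sigma_eps, n_steps)
    F = np.empty(n_steps)
    # start in the stationary distribution instead of at a fixed value
    F[0] = rng.normal(0.0, sigma_eps / np.sqrt(1.0 - rho ** 2))
    for t in range(1, n_steps):
        F[t] = rho * F[t - 1] + eps[t]
    return F


def correlation_length(rho):
    """Correlation length of r(s) = rho**s = exp(-s / length), for 0 < rho < 1."""
    return -1.0 / np.log(rho)
```

Feeding such a series into estimators of the jump-size exponent or of the curve-length exponent is one way to reproduce the kind of numerical experiment discussed in this section.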
Before we proceed let us briefly discuss the relation between $\ell$ and $\ell'$ defined in section 2. The RNA free energy landscapes and almost all of the model landscapes that have been investigated so far have correlation lengths that scale linearly with $n$; see, e.g., (Schuster & Stadler) for a recent overview. In other words, we have $\rho \overset{\mathrm{def}}{=} 1 - x$ where $x$ scales as $1/n$ for large systems. If $r(s)$ is of the form (25), then we find
$$\ell = \frac{1}{x} - \frac{1}{2} - \frac{x}{12} + O(x^2), \qquad \ell' = \frac{1}{x} - \frac{1}{2}. \tag{26}$$
Thus $\ell$ and $\ell'$ differ only by a contribution of order $x \sim 1/n$ for the landscapes of interest.
The power spectrum of an AR(1) time series is
$$S(\omega) = \frac{\sigma^2\,(1 - \rho^2)}{1 + \rho^2 - 2\rho\cos\omega}, \tag{27}$$
see, e.g., (Yaglom). In fact, the Wiener-Khinchin theorem, equ.(10), shows that equ.(27) holds for all elementary landscapes, irrespective of the distribution function of the fitness values.
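The step from an exponential autocorrelation to this spectrum can be written out explicitly. The following short derivation is a sketch in the notation assumed here ($\sigma^2$ the variance of the series, $\rho$ the nearest neighbor correlation): it inserts $r(s) = \rho^s$ into the Wiener-Khinchin relation and sums the geometric series, which uses only the form of $r(s)$ and not the distribution of the values.

```latex
\begin{align*}
S(\omega) &= \sigma^2\Bigl[1 + 2\sum_{s=1}^{\infty}\rho^{s}\cos(\omega s)\Bigr]
           = \sigma^2\Bigl[1 + 2\,\mathrm{Re}\,\frac{\rho e^{i\omega}}{1-\rho e^{i\omega}}\Bigr] \\
          &= \sigma^2\,\mathrm{Re}\,\frac{1+\rho e^{i\omega}}{1-\rho e^{i\omega}}
           = \frac{\sigma^2\,(1-\rho^2)}{1+\rho^2-2\rho\cos\omega}.
\end{align*}
```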
Weinberger called a landscape $f: C \to \mathbb{R}$ an AR(1) landscape if the time series obtained by a random walk on the landscape is Gaussian and has an autocorrelation function of the form (25). The parameter $\rho$ describes the ruggedness of the landscape. The landscapes with exponential autocorrelation functions are exactly the elementary landscapes discussed above. An AR(1) landscape in the sense of Weinberger is thus an elementary landscape with a Gaussian fitness distribution. A number of model landscapes have been shown to be elementary (Grover; Weinberger & Stadler; Stadler). Most of them have in fact a Gaussian distribution of fitness values, at least asymptotically, as a consequence of the central limit theorem. The best known examples are the p-spin models, the graph bipartitioning problem, graph matching, graph coloring, and symmetric traveling salesman problems. Kauffman's Nk models are approximately AR(1); their decomposition into elementary components is discussed in detail in (Stadler & Happel). The class of landscapes that are approximately AR(1) includes a variety of landscapes based on RNA secondary structures (Fontana et al.; Bonhoeffer et al.; Tacker et al.).
Figure 1: Approximated scaling exponents $H$ (solid line) and $1 - D$ (dotted line) as a function of the correlation $\rho$ of an AR(1) process (horizontal axis: rho; vertical axis: H, 1-D). The deviations are of the order of ��.
The following considerations, like equ.(27), depend only on the form of the correlation function $r(s)$, not on the distribution function of the fitness values. They are therefore valid for any elementary landscape. The linear approximation
$$r(s) \approx 1 - s/\ell, \qquad s \ll n \tag{28}$$
of equ.(25) is a good approximation for highly correlated landscapes, i.e., for landscapes with correlation lengths $\ell = O(n)$. By comparing equ.(28) with equ.(22) we observe that elementary landscapes with large correlation length are locally self-affine, with scaling parameter $H = 1/2$; i.e., time series obtained from such landscapes behave locally like ordinary Brownian motion.
Surprisingly, however, we find that even AR(1) time series with small correlation length show approximate power laws for the jump size $J(s)$ and for the curve lengths $L(s)$. Numerical simulations show that we have $H \to 0$ for $\rho \to 0$, while $\rho \to 1$ yields $H \to 1/2$. Data obtained from direct measurement of $H$ and estimates of the scaling exponent $D$ obtained from the curve length are consistent with each other. Best fits of the characteristic exponents $H$ and $1 - D$ as functions of $\rho$ are shown in Figure 1. It is interesting to note in this context that certain log-normal distributions can also mimic 1/f spectra in a limited frequency domain (Montroll & Shlesinger).
4. RNA Free Energy Landscapes
Folding biopolymer sequences into structures is a central problem in molecular biology research. Both robustness and accessibility of structures, as functions of mutational change in the underlying sequence, are crucial to natural evolution as well as to molecular evolution applied to biotechnology. RNA molecules are an excellent model system. In fact, they are the only class of biopolymers for which the folding problem has been solved, at least at the level of secondary structures.
An RNA sequence is a string of length $n$ composed of an alphabet of size $\kappa$. In nature the alphabet consists of the $\kappa = 4$ bases Guanine, Cytosine, Adenine, and Uracil. In this paper we shall also consider the restricted alphabet {G, C} with $\kappa = 2$. A natural distance between sequences is the Hamming distance, which measures the number of positions in which two sequences differ (Hamming). The configuration space is hence a generalization of the Boolean hypercube known as the sequence space.
A secondary structure is tantamount to a list of Watson-Crick type and GU base pairs. Such a structure can be uniquely decomposed into structural elements that are (i) base pair stacks, (ii) loops differing in size (number of unpaired bases) and branching degree: hairpin loops (degree one) and internal loops (degree two or more), and (iii) bases which are not part of a stack or a loop, termed external (freely rotating joints and unpaired ends). Each stack or loop element contributes additively to the overall free energy of the structure. These energy terms are empirically determined parameters that depend on the nucleotide sequence (Freier et al.). The folding process considered here maps an RNA sequence into a secondary structure minimizing free energy. This structure can be computed using a dynamic programming algorithm (Zuker & Stiegler; Zuker & Sankoff). The implementation used in this contribution is described in detail in (Hofacker et al.); it is available as a public domain package (Hofacker et al.).
In this contribution we focus not on the secondary structures themselves but rather on the free energies, $\Delta G$, of structure formation. The bulk properties of these minimum free energy landscapes have been studied extensively in the past (Bonhoeffer et al.; Fontana & Schuster; Fontana et al.; Schuster et al.; Schuster & Stadler). They are typical representatives of rugged landscapes.
Figure 2: Raw data of the power spectrum, log(S(omega)) versus log(omega), obtained for a GC landscape with chain length n = �� at excursion level $\Delta G$ � ��. Walk length is N = ����. The solid line is the best fit to $S(\omega) = \omega^{-a}$, with a = ����. The dotted line is 1/f noise.
Figure 2 shows a sample power spectrum obtained along a random walk as described in section 2. The data are rather noisy. In order to smooth them we break the walk into pieces of �� steps, calculate the power spectrum for each of them, and then average the power spectra. �� steps is about twice the diameter of the sequence space in this case; significantly longer walks are not meaningful because the range of local self-affinity is necessarily restricted to a small multiple of the geometrical relaxation time $\tau$ of the random walk $\{x_t\}$:
$$\langle d(s)\rangle = \langle d(\infty)\rangle\,\bigl(1 - e^{-s/\tau}\bigr). \tag{29}$$
For a free random walk on a sequence space we find
$$\tau = \frac{\kappa - 1}{\kappa}\,n + O(1). \tag{30}$$
Table �: Power spectrum index $a$ for time series obtained from RNA landscapes.
The values of $a$ as obtained directly from the power spectrum are compared to the
values calculated from the jump exponent $H$ and the scaling exponent $D$: $a_H = 1 + 2H$
and $a_D = 5 - 2D$.
n �� �� ��
&G� a aH aD a aH aD a aH aDAUGC
� ��� ���� ��� ���� ���� ���� ���� ��� ����� ���� ��� ���� ���� ���� ��� ���� ��� ������ ���� ���� ��� ��� ��� �� ���� ���� ����� ���� ���� ���� ���� ���� ���� ��� ��� ����� � ���� ���� ���� ���� ���� ���� ���� ����� � ���� ���� � ���� ���� ���� ���� ����
GC
� ���� ���� ��� ��� ��� ���� ��� ��� ���� ���� ���� ��� ��� ��� ���� ��� ��� ����� ���� ���� ��� ��� ��� ���� ��� ��� ����� ���� ���� ��� ��� ��� ���� ��� ��� ���� ���� ���� ��� ��� ��� ���� ��� ��� ���� ���� ���� ���� ��� ��� ���� ��� ��� ���
� ΔG in kcal/mol. � indicates insufficient data. Systematic errors are estimated to be of the order of ���; compare Figure �.
This expression will be derived in the Appendix.
The data are consistent with an $\omega^{-a}$ spectrum with $a$ not much larger than �.
Numerical values are shown in Table �. It turns out, however, that the data are
also consistent with the power spectrum of an AR(1) process, see Figure �.
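The smoothing procedure (cut the walk into pieces, compute a power spectrum per piece, average) and the power-law fit can be sketched as follows. This is an illustration on a synthetic AR(1) series, not a reanalysis of the RNA data: the correlation parameter `rho = 0.9`, the piece length of 64 steps, the number of pieces, and all function names are assumptions chosen for the example.

```python
import numpy as np


def ar1_series(rho, length, rng):
    """Stationary AR(1) series with unit variance and autocorrelation rho**s."""
    noise = rng.standard_normal(length) * np.sqrt(1.0 - rho ** 2)
    x = np.empty(length)
    x[0] = rng.standard_normal()
    for t in range(1, length):
        x[t] = rho * x[t - 1] + noise[t]
    return x


def averaged_spectrum(x, piece_length):
    """Average the periodograms of consecutive, non-overlapping pieces."""
    n_pieces = len(x) // piece_length
    pieces = x[: n_pieces * piece_length].reshape(n_pieces, piece_length)
    pieces = pieces - pieces.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(pieces, axis=1)) ** 2 / piece_length
    omega = 2.0 * np.pi * np.fft.rfftfreq(piece_length)
    # Drop the zero-frequency bin, which carries no information after
    # the mean of each piece has been removed.
    return omega[1:], power.mean(axis=0)[1:]


def fit_power_law(omega, spectrum):
    """Least-squares fit of log S = const - a * log(omega); returns a."""
    slope, _intercept = np.polyfit(np.log(omega), np.log(spectrum), 1)
    return -slope


rng = np.random.default_rng(0)
series = ar1_series(rho=0.9, length=64 * 200, rng=rng)
omega, spectrum = averaged_spectrum(series, piece_length=64)
a = fit_power_law(omega, spectrum)
print(f"fitted spectral index a = {a:.2f}")
```

Because the AR(1) spectrum is flat at very low frequencies and falls off roughly like $\omega^{-2}$ above the inverse correlation length, a single power law fitted over a limited frequency window gives an effective index between these limits, which is why a power-law fit and an AR(1) fit can both look acceptable on the same data.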
The additivity of the energy contributions implies a certain degree of neutrality in
the landscape; for details see (Fontana et al., ����). Several structures which consist
of identical sets of substructures map onto the same selective values, although
their phenotypic appearances are different. In fact, there are very large neutral
networks on the level of secondary structures themselves (Schuster et al., ����;
Reidys et al., ����; Grüner, ����). This implies that even at fairly high excursion
Figure �: Power spectrum (axes: log(S(omega)) versus log(omega)). The dots are spectral data for walks on a GC landscape with n � ��, excursion level �, averaged over � walks. The solid curve is the best fit to the AR(1) power spectrum, equ. �����, with an estimate for the parameter � � ����, corresponding to a correlation length � � ����. The correlation length estimated directly from the autocorrelation function is about ��, as shown in ref. Fontana et al., ����. The corresponding power spectrum is shown as dotted line. The solid straight line is a least squares fit with a power law $\omega^{-a}$; we find a � ����.
levels (a couple of standard deviations above the mean) the excursion sets are still
large.
We find that the scaling properties do not depend strongly on the excursion level.
There is, however, a systematic trend towards smaller values of � for higher excursion
levels. Our data indicate that mountainous regions of the landscape are not
drastically different from the average. Our data are biased by the geometry of
the cycles CE, however, and hence a detailed quantitative analysis is not possible
at present.
�. Conclusions
The structure of the mountainous parts of RNA free energy landscapes was studied by
random walks confined to excursion sets at given energy levels.
Spectral data and local scaling analysis of the series generated by simple random
walks show self-affinity consistent with the low-frequency behavior of an AR(1)
time series. We find that in general an AR(1) process appears to be approximately
self-affine on length scales smaller than a few correlation lengths. The
data obtained from RNA free energy landscapes indicate that a fractal-like structure
is present at length scales up to the diameter of the sequence space. This
is a consequence of the fact that the correlation length of the RNA free energy
landscapes is comparable to the sequence length.
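One way to see why a series with exponentially decaying correlations can look self-affine on short scales is to compute the mean square increment of a stationary AR(1) series. The following is a sketch under stated assumptions: the symbols $\sigma^2$ (variance), $\rho$ (correlation parameter, so that the autocorrelation is $\rho^{s}$) and $\ell = -1/\ln\rho$ (correlation length) are chosen for this illustration and may differ from the notation used elsewhere in the paper.

```latex
\begin{align*}
\langle (x_{t+s} - x_t)^2 \rangle
  &= 2\sigma^2 \left(1 - \rho^{s}\right)
   = 2\sigma^2 \left(1 - e^{-s/\ell}\right) \\
  &\approx \frac{2\sigma^2}{\ell}\, s
  \qquad \text{for } s \ll \ell .
\end{align*}
```

For lags well below the correlation length the mean square increment therefore grows like $s^{2H}$ with $H = 1/2$, the scaling of an ordinary random walk, while for $s \gg \ell$ it saturates at $2\sigma^2$ and the scaling breaks down. When the correlation length is comparable to the diameter of the sequence space, the scaling regime covers most of the accessible range of lags.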
Our computer experiments exhibit no significant dependence of the statistical
properties of the excursion-set confined walk on the energy level. Hence, at least
qualitatively, the statistical properties of the mountains do not differ from the
lowlands. This is true at least as long as the excursion set does not break up into
very small cycles.
The present study suggests that a detailed investigation of the percolation of
excursion sets, of the geometry of excursion sets, and of the geometrical relaxation of
random walks confined to cycles will be necessary before a complete understanding
of the structure of the mountain ranges of fitness landscapes is possible.
Acknowledgments
The work was funded by ÖAD proj. no. Z� ����EH��� EH-Project ������. SB
gratefully thanks Prof. Murali Sheshadri and Prof. A. Nadarajan for their interest
and support. PFS thanks the Inst. Ciencias Nucleares and the Inst. de Fisica of
the Universidad Nacional Autonoma de Mexico for their hospitality in September
����, when this paper was finished.
References
Adler, D. (����). The Geometry of Random Fields. New York: John Wiley &
Sons.
Attkinson, W., Fey, L., & Newman, J. (����). Spectrum analysis of extremely
low frequency variation of quartz oscillators. Proc. IEEE ��� ���
Azencott, R. (���). Simulated Annealing: Parallelization Techniques. New
York: John Wiley & Sons.
Bonhoeffer, S., McCaskill, J., Stadler, P., & Schuster, P. (����). Temperature
dependent RNA landscapes, a study based on partition functions. European
Biophysics Journal ��
Bonhoeffer, S. & Stadler, P. F. (����). Errorthreshold on complex fitness
landscapes. J. Theor. Biol. ��� ������
Brouwer, A., Cohen, A., & Neumaier, A. (����). Distance-regular Graphs. Berlin,
New York: Springer Verlag.
Buldyrev, S., Goldberger, A., & Stanley, H. (����). Long-range correlation
properties of coding and noncoding DNA sequences: Genbank analysis. Phys. Rev. E
��� ���������
Burlaga, L. & Klein, L. (����). Fractal structure of the interplanetary magnetic
field. J. Geophys. Res. ��� ����)))
Eigen, M., McCaskill, J., & Schuster, P. (����). The molecular quasispecies.
Adv. Chem. Phys. ��� ��� � ��
Feller, W. (���). An Introduction to Probability Theory and its Applications.
New York: Wiley.
Fontana, W., Griesmacher, T., Schnabl, W., Stadler, P., & Schuster, P. (����).
Statistics of landscapes based on free energies, replication and degradation rate
constants of RNA secondary structures. Monatshefte der Chemie ���� �������
Fontana, W., Konings, D. A. M., Stadler, P. F., & Schuster, P. (����). Statistics
of RNA secondary structures. Biochemistry ��� ���������
Fontana, W., Schnabl, W., & Schuster, P. (����). Physical aspects of evolutionary
optimization and adaption. Physical Review A �� ���� ��������
Fontana, W. & Schuster, P. (����). A computer model of evolutionary
optimization. Biophysical Chemistry �� ������
Fontana, W., Stadler, P. F., Bornberg-Bauer, E. G., Griesmacher, T., Hofacker,
I. L., Tacker, M., Tarazona, P., Weinberger, E. D., & Schuster, P. (����). RNA
folding and combinatory landscapes. Phys. Rev. E �� ���� ��� � ���
Freidlin, M. & Wentzell, A. (����). Random Perturbations of Dynamical Systems.
New York: Springer-Verlag.
Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N., Caruthers, M. H., Neilson,
T., & Turner, D. H. (����). Improved free-energy parameters for predictions of
RNA duplex stability. Proc. Natl. Acad. Sci. USA ��� ���������
Grover, L. (���). Local search and the local structure of NP-complete problems.
Oper. Res. Lett. ��� �����
Grüner, W. (����). Evolutionary Optimization on RNA Folding Landscapes. PhD
thesis, Inst. of Theoretical Chemistry, Uni. Vienna, Austria.
Hamming, R. W. (����). Error detecting and error correcting codes. Bell
Syst. Tech. J. ��� �������
Happel, R. & Stadler, P. F. (����). Canonical approximation of fitness landscapes.
Santa Fe Institute Preprint ���������
Higuchi, T. (����). Approach to an irregular time series on the basis of fractal
theory. Physica D ��� ����
Hofacker, I. L., Fontana, W., Stadler, P. F., Bonhoeffer, L. S., Tacker, M.,
& Schuster, P. (����). Vienna RNA Package. pub/RNA/ViennaRNA�����,
ftp.itc.univie.ac.at (Public Domain Software).
Hofacker, I. L., Fontana, W., Stadler, P. F., Bonhoeffer, S., Tacker, M., &
Schuster, P. (����). Fast folding and comparison of RNA secondary structures.
Monatsh. Chemie ��� ��� �������
Hordijk, W. (����). A measure of landscapes. Santa Fe Institute Preprint
���������
Huynen, M. A., Stadler, P. F., & Fontana, W. (����). Evolution of RNA and
the Neutral Theory. Proc. Natl. Acad. Sci. in press, Santa Fe Institute Preprint
���������
Kauffman, S. (����). The Origin of Order. New York, Oxford: Oxford University
Press.
Li, W. & Kaneko, K. (���). Long-range correlation and partial $1/f^{\alpha}$ spectrum
in a noncoding DNA sequence. Europhys. Lett. ��� �������
Mandelbrot, B. B. (���). The Fractal Geometry of Nature. New York: Freeman.
Mandelbrot, B. B. & van Ness, J. W. (����). Fractional Brownian motion,
fractional noise, and applications. SIAM Rev. ��� �����
Mohar, B. (����). The Laplacian spectrum of graphs. In: Graph Theory,
Combinatorics, and Applications, (Alavi, Y., Chartrand, G., Ollermann, O., & Schwenk,
A., eds) pp. ��������. New York: John Wiley & Sons.
Montroll, E. & Shlesinger, M. (����). Maximum entropy formalism, fractals,
scaling phenomena, and 1/f noise: A tale of tails. J. Stat. Phys. ��� �����
Musha, T. & Higuchi, H. (����). The 1/f fluctuation of a traffic current on an
expressway. Jap. J. Appl. Phys. ��� �������
Nolan, P. L., Gruber, D. E., Matteson, J. L., Peterson, L. E., Rothschild, R. E.,
Doty, J. P., Levine, A. M., Lewin, W. H. G., & Primini, F. A. (����). Rapid
variability of ������ keV X-rays from Cygnus X-�. Astrophys. J. ��� �������
Osborne, A. & Provenzale, A. (����). Finite correlation dimension for stochastic
systems with power law spectra. Physica D ��� �������
Palmer, R. (����). Optimization on rugged landscapes. In: Molecular Evolution
on Rugged Landscapes: Proteins, RNA, and the Immune System, (Perelson, A. S.
& Kauffman, S. A., eds) pp. ���. Addison Wesley, Redwood City, CA.
Papoulis, A. (����). Probability, Random Variables, and Stochastic Processes.
New York: McGraw Hill.
Reidys, C., Schuster, P., & Stadler, P. F. (����). Generic properties of
combinatory maps: Neutral networks of RNA secondary structures. Santa Fe Institute
Preprint ���������
Schuster, P., Fontana, W., Stadler, P. F., & Hofacker, I. L. (����). From
sequences to shapes and back: A case study in RNA secondary structures.
Proc. Roy. Soc. Lond. B ���� �����
Schuster, P. & Stadler, P. F. (����). Landscapes: Complex optimization problems
and biopolymer structures. Computers Chem. ��� ������
Sorkin, G. B. (����). Combinatorial optimization, simulated annealing, and
fractals. Technical Report RC����� (No. ����), IBM Research Report.
Spitzer, F. (����). Markov random fields and Gibbs ensembles. Amer. Math.
Monthly ��� ������
Stadler, P. F. (���). Correlation in landscapes of combinatorial optimization
problems. Europhys. Lett. ��� ������
Stadler, P. F. (����a). Random walks and orthogonal functions associated with
highly symmetric graphs. Disc. Math. in press, Santa Fe Institute Preprint
��������
Stadler, P. F. (����b). Towards a theory of landscapes. In: Complex Systems
and Binary Networks, (López Peña, R., ed.). Springer-Verlag, New York, in press,
Santa Fe Institute Preprint ��������
Stadler, P. F. (����c). Landscapes and their correlation functions. Santa Fe
Institute Preprint ���������
Stadler, P. F. & Happel, R. (���). Correlation structure of the landscape of the
graph-bipartitioning-problem. J. Phys. A: Math. Gen. ��� ���������
Stadler, P. F. & Happel, R. (����). Random field models for fitness landscapes.
Santa Fe Institute Preprint ���������
Stadler, P. F. & Schnabl, W. (���). The landscape of the traveling salesman
problem. Phys. Letters A ��� �������
Tacker, M., Fontana, W., Stadler, P., & Schuster, P. (����). Statistics of RNA
melting kinetics. Eur. J. Biophys. ��� ����
Taft, B., Hickey, B., Wunsch, C., & Baker, D. (����). Equatorial undercurrent
and deeper flows in the central Pacific. Deep Sea Res. ��� �������
Verveen, A. A. & Derkson, H. E. (����). Fluctuation phenomena in nerve
membranes. Proc. IEEE �� �������
Voss, R. F. (���). Evolution of long-range fractal correlations and 1/f noise in
DNA base sequences. Phys. Rev. Lett. �� ���������
Voss, R. F. & Clarke, J. (����). 1/f noise in music and speech. Nature ����
�������
Weinberger, E. D. (����). Correlated and uncorrelated fitness landscapes and
how to tell the difference. Biol. Cybern. �� ������
Weinberger, E. D. (����a). Local properties of Kauffman's N-k model: A tunably
rugged energy landscape. Phys. Rev. A �� ���� ���������
Weinberger, E. D. (����b). Fourier and Taylor series on fitness landscapes.
Biological Cybernetics �� ������
Weinberger, E. D. & Stadler, P. F. (����). Why some fitness landscapes are
fractal. J. Theor. Biol. ��� �����
Weissman, M. (����). 1/f noise and other slow, non-exponential kinetics in
condensed matter. Rev. Mod. Phys. �� �������
Yaglom, A. (����). Correlation Theory of Stationary and Related Random
Functions, volume �. New York: Springer-Verlag.
Zuker, M. & Sankoff, D. (����). RNA secondary structures and their prediction.
Bull. Math. Biol. � ���� ������
Zuker, M. & Stiegler, P. (����). Optimal computer folding of large RNA sequences
using thermodynamic and auxiliary information. Nucl. Acid Res. �� �������
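Appendix: Relaxation of Random Walks in Sequence Spaces
For a sequence space a detailed analysis of the geometric relaxation of a simple
random walk is possible. The probabilities $\varphi_{sd}$ of finding the walk at distance
$d$ from its origin after $s$ steps, as defined in the main text, can be obtained
recursively from
$$
\begin{aligned}
\varphi_{00} &= 1, \\
\varphi_{sd} &= 0 \quad \text{for } s < d, \\
\varphi_{sd} &= w_{+}(d-1)\,\varphi_{s-1,d-1} + w_{0}(d)\,\varphi_{s-1,d} + w_{-}(d+1)\,\varphi_{s-1,d+1},
\end{aligned}
\tag{A.1}
$$
where the coefficients $w_{+}(d)$, $w_{-}(d)$, and $w_{0}(d)$ are the probabilities for making a step
forwards, backwards, or sidewards, given that one is at distance $d$ from the origin of the
walk. For sequence spaces with chain length $n$ and $\kappa$ letters in the alphabet we have
(Fontana et al., ����)
$$
w_{+}(d) = \frac{n-d}{n}, \qquad
w_{0}(d) = \frac{d}{n}\,\frac{\kappa-2}{\kappa-1}, \qquad
w_{-}(d) = \frac{d}{n}\,\frac{1}{\kappa-1}.
\tag{A.2}
$$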
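The recursion for the distance distribution can be iterated numerically. The sketch below assumes the reading of the step probabilities given above (forward $(n-d)/n$, sideways $(d/n)(\kappa-2)/(\kappa-1)$, backward $(d/n)/(\kappa-1)$); the function names and the example values $n = 10$, $\kappa = 4$ are chosen for illustration only.

```python
import numpy as np


def step_probabilities(d, n, kappa):
    """Probabilities (forward, sideways, backward) at distance d from the origin."""
    forward = (n - d) / n
    sideways = d * (kappa - 2) / (n * (kappa - 1))
    backward = d / (n * (kappa - 1))
    return forward, sideways, backward


def distance_distribution(steps, n, kappa):
    """phi[s, d]: probability of being at distance d after s steps."""
    phi = np.zeros((steps + 1, n + 1))
    phi[0, 0] = 1.0
    for s in range(1, steps + 1):
        for d in range(n + 1):
            total = 0.0
            if d >= 1:
                # Arrive from distance d - 1 by a forward step.
                total += step_probabilities(d - 1, n, kappa)[0] * phi[s - 1, d - 1]
            # Stay at distance d by a sideways step.
            total += step_probabilities(d, n, kappa)[1] * phi[s - 1, d]
            if d + 1 <= n:
                # Arrive from distance d + 1 by a backward step.
                total += step_probabilities(d + 1, n, kappa)[2] * phi[s - 1, d + 1]
            phi[s, d] = total
    return phi


phi = distance_distribution(steps=200, n=10, kappa=4)
mean_distance = phi @ np.arange(11)
print(mean_distance[[1, 5, 200]])
```

For these example values the mean distance approaches $n(\kappa-1)/\kappa = 7.5$, the expected Hamming distance between two independent random sequences. For a two-letter alphabet the sideways probability vanishes and the distribution alternates between even and odd distances, although its mean still relaxes.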
Define the moments of the distribution $\varphi_{sd}$ by
$$\Delta_m(s) = \langle d(s)^m \rangle = \sum_d \varphi_{sd}\, d^m. \tag{A.3}$$
$\Delta_1(s)$ is then the average distance after $s$ steps. Inserting the recursion (A.1) into
the definition of $\Delta_m(s)$ yields after considerable algebra the following recursion
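for the $m$-th moment:
$$
\Delta_m(s) = 1
 + \sum_{\nu=0}^{m-1} \left[ \binom{m}{m-\nu} - \binom{m}{m-\nu-1}\,\frac{\kappa}{\kappa-1}\,\frac{1}{n} \right] \Delta_{m-\nu}(s-1)
 + \frac{2}{(\kappa-1)\,n} \sum_{\nu=0}^{\lfloor (m-2)/2 \rfloor} \binom{m}{m-2\nu-2}\, \Delta_{m-2\nu-1}(s-1).
\tag{A.4}
$$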
This recursion is of the form $\vec{\Delta}(s) = \vec{1} + A \cdot \vec{\Delta}(s-1)$, where $A$ is lower triangular.
Hence the eigenvalues $\lambda_m$ of $A$ are given by the diagonal elements of $A$:
$$\lambda_m = 1 - \frac{m}{n}\,\frac{\kappa}{\kappa-1}. \tag{A.5}$$
The $m$-th moment is therefore of the form
$$\Delta_m(s) = \Delta_m(\infty) \left[ 1 - \sum_{k=1}^{m} a_k \lambda_k^{s} \right]. \tag{A.6}$$
The slowest mode corresponds to the eigenvalue $\lambda_1$. The corresponding relaxation
time is
$$\tau_1 = -\frac{1}{\ln \lambda_1} = \frac{\kappa-1}{\kappa}\, n + O(1) \tag{A.7}$$
for large $n$. Explicit expressions for the long time limits $\Delta_m(\infty)$ of the moments
are obtained as non-zero fixed points of the recursions (A.4).
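For the first moment the recursion reduces to $\Delta_1(s) = 1 + \lambda_1 \Delta_1(s-1)$, which can be checked against the exponential relaxation form in a few lines. The sketch assumes $\lambda_1 = 1 - \kappa/((\kappa-1)n)$ and a walk starting at distance zero; the function names and the example values $n = 100$, $\kappa = 4$ are illustrative choices, not values from the paper.

```python
import math


def eigenvalue(m, n, kappa):
    """Diagonal element lambda_m of the moment recursion."""
    return 1.0 - m * kappa / (n * (kappa - 1))


def mean_distance(steps, n, kappa):
    """Iterate Delta_1(s) = 1 + lambda_1 * Delta_1(s - 1) from Delta_1(0) = 0."""
    lam = eigenvalue(1, n, kappa)
    means = [0.0]
    for _ in range(steps):
        means.append(1.0 + lam * means[-1])
    return means


def relaxation_time(n, kappa):
    """tau_1 = -1 / ln(lambda_1)."""
    return -1.0 / math.log(eigenvalue(1, n, kappa))


n, kappa = 100, 4
means = mean_distance(300, n, kappa)
limit = n * (kappa - 1) / kappa          # fixed point of the recursion
tau = relaxation_time(n, kappa)
# Closed form: Delta_1(s) = Delta_1(inf) * (1 - exp(-s / tau_1)).
closed = [limit * (1.0 - math.exp(-s / tau)) for s in range(301)]

print(f"tau_1 = {tau:.3f}, (kappa - 1) n / kappa = {limit:.1f}")
```

The iterated values and the closed form agree to rounding error, and the relaxation time differs from $(\kappa-1)n/\kappa$ by about one half, consistent with the $O(1)$ correction.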
Table of Contents
Introduction
Landscapes
  Rugged Landscapes
  Correlation Functions
  Elementary Landscapes
  Power Spectra
  Excursion Sets
  Random Walks on Excursion Sets
Self-Affine Time Series and Fractal Landscapes
  Self-Affine Time Series
  Fractal Landscapes
  "AR(1)-Landscapes" are Locally Fractal
RNA Free Energy Landscapes
Conclusions
Acknowledgments
References
Appendix: Relaxation of Random Walks in Sequence Spaces