Evolving L-systems to Capture Protein Structure Native Conformations

22
Evolving L- systems to Capture Protein Structure Native Conformations Gabi Escuela 1 , Gabriela Ochoa 2 and Natalio Krasnogor 3 1,2 Department of Computer Science, Simon Bolivar University, Caracas, Venezuela 1 [email protected], 2 [email protected] 3 School of Computer Science and I.T., University of Nottingham [email protected]

description

Evolving L-systems to Capture Protein Structure Native Conformations. Gabi Escuela 1 , Gabriela Ochoa 2 and Natalio Krasnogor 3 1,2 Department of Computer Science, Simon Bolivar University, Caracas, Venezuela 1 [email protected], 2 [email protected] - PowerPoint PPT Presentation

Transcript of Evolving L-systems to Capture Protein Structure Native Conformations

Page 1: Evolving L-systems to Capture Protein Structure Native Conformations

Evolving L-systems to Capture Protein Structure

Native Conformations

Gabi Escuela1, Gabriela Ochoa2 and Natalio Krasnogor3

1,2 Department of Computer Science, Simon Bolivar University, Caracas, [email protected], [email protected]

3 School of Computer Science and I.T., University of Nottingham

[email protected]

Page 2: Evolving L-systems to Capture Protein Structure Native Conformations

Proteins

• Building blocks and functional units of all biological systems

• Linear chain of ~30-400 units from 20 different amino acids

• Under certain physical conditions, folds into a unique functional structure, called its native state or tertiary structure.

• Show repeated substructures like alpha helices and beta sheets.

Page 3: Evolving L-systems to Capture Protein Structure Native Conformations

1A8M 3-D 1A8M 3-D StructureStructure

1AMB 3-D 1AMB 3-D StructureStructure

Page 4: Evolving L-systems to Capture Protein Structure Native Conformations

Beta Turn

Alpha Helix

Beta Sheet

Page 5: Evolving L-systems to Capture Protein Structure Native Conformations

Protein Folding and PSP

• The Protein Structure Prediction Problem (PSP) is among the most outstanding open problem in Biochemistry

• Finding the correspondence between amino acids and positions in a protein is NP-hard even for very simple models.

• It is not clear that the encoding currently used in evolutionary algorithms for HP models are suitable for crossover and building block transfer between individuals.

Page 6: Evolving L-systems to Capture Protein Structure Native Conformations

HP Model (Dill, 1985)

• Amino acids types: hydrophobic (H) and polar or hydrophilic (P)

• Hydrophobic units that are adjacent in the lattice but non-adjacent in the sequence add a constant negative factor. The native state is thought to be the global energy minimum.

• Lattices: 2D Square, 3D Cubic,...• Coordinates: Relatives, absolutes

HPPHHPPHPPHP->LRRLRLFLLL

Page 7: Evolving L-systems to Capture Protein Structure Native Conformations

L-Systems (1) (Lindenmayer, 1968)

• Axiomatic foundation for biological development

• Define complex objects by successively replacing parts of a simple object using a set of rewriting rules or productions.

• The simplest class of L-systems is the D0L-systems (deterministic and context free).

Page 8: Evolving L-systems to Capture Protein Structure Native Conformations

L-Systems (2)

b|a_|

a b_| |

a b a__| | |__

a b a a b_| / __| |__ \

a b a a b a b a

Fig. 2. Example of a derivation in a D0L-system.

Axiom: bRules: a → ab

b → a

Page 9: Evolving L-systems to Capture Protein Structure Native Conformations

Evolving L-systems

Page 10: Evolving L-systems to Capture Protein Structure Native Conformations

Method

• A generational EA with linear ranking selection and elitism was used to evolve sets of rewriting rules or L-systems that capture a target structure.

• As the variation operators, a recombination and three mutation operators were implemented.

• Stopping criteria: (i) if an individual arises the maximum fitness; and (ii) a predefined maximum number of generations is reached.

Page 11: Evolving L-systems to Capture Protein Structure Native Conformations

Genotype Encoding

• D0L-system:

Alphabet: =t nt

where t={F,L,R} terminal characters and

nt={0,1,2,...,m-1} non-terminal

characters representing rewriting rules

Axiom: α = S S *

Rewriting rules: W0,1,2,...,m-1: w, where w *

Page 12: Evolving L-systems to Capture Protein Structure Native Conformations

Initial Population

• L-systems parameters: max_r, max_la, max_lr

• For each individual: Number of rules is randomly selected in the

range 1 to max_r The axiom is a randomly generated string of

symbols of maximum length max_la. Each rule is generated in a similar way as the

axiom, with a maximum length of max_lr.

Page 13: Evolving L-systems to Capture Protein Structure Native Conformations

Genetic Operators (1)• Selection

Linear Rank Dissasortative Mating

• Recombination

Fig. 3. Genotype, phenotype and fitness from parents and offspring in a recombination where o inherits the rules 0,1 from p1 and 2,3 from p2

RFLRLLRLRRFRLLRRFL RFRRLLLRLLRLRRFRL RFRRLLRLRRFRLLRRFR

p1

axiom R2rules 0:R03F; 1:R01L; 2:F310; 3:LRL3

fitness 16 fitness 7

+ p2

axiom R2rules 0:R023; 1:01L3; 2:F310; 3:R3L1

= o

axiom R2rules 0:R03F; 1:R01L; 2:F310; 3:R3L1

fitness 18

Page 14: Evolving L-systems to Capture Protein Structure Native Conformations

Genetic Operators (2)• Mutation:

Per symbol Axioms and Rules Types

(i) addition (30%)

(ii) deletion (10%)

(iii) modification (60%)

Page 15: Evolving L-systems to Capture Protein Structure Native Conformations

Fitness• Derivation process: from genotype (axiom

and rules) to phenotype (folded structure)

• Post-processing: non-terminal symbols prunning and string cutting

• Fitness calculation: number of matches between the target string and the solution

• Minimum fitness is 0 and the maximum is the length of the desired folding.

Page 16: Evolving L-systems to Capture Protein Structure Native Conformations

Experiments

Parameter Value

Max. Number of Generations 2000

Population Size 50

Mating Strategy Dissasortative (scan size 5)

Mutation rate (per symbol) Axiom 0.05

Mutation rate (per symbol) Rules 0.05

Recombination rate 1.00

Max. Number of Rules 4-5

Max. Length for Axiom 3

Max. Length for Rules 5

Page 17: Evolving L-systems to Capture Protein Structure Native Conformations

Results (1)Instance Lengt

hSuccesses One Solution

HPHPPHHPHPPHPHHPPHPHRFRRLLRLRRFRLLRRFR

18 5/50 (4 rules) See Table 3

HHHPPHPHPHPPHPHPHPPH RRFRFRLFRRFLRLRFRR

18 3/50 (4 rules) axiom = R24 rules = {0:RLR;1:3F32L; 2:1FR33;3:R102}

HHPPHPPHPPHPPHPPHPPHPPHH RLLFLFFRRFLLFRRLRFFRRF

22 0/50 (4 rules)1/50 (5 rules)

axiom = 1R5 rules =

{ 0:4LF3;1:RL243; 2:00F3;3:RRFL; 4:0R14F}

PPHPPHHPPPPHHPPPPHHPPPPHH FFRRFFFLLFFFFRRFFFFLLFF

23 1/50 (5 rules) axiom= 324 rules =

{0:20R2;1:132F; 2:FF012;3:0FLL}

Page 18: Evolving L-systems to Capture Protein Structure Native Conformations

Results (2)

Solution Axiom Rewriting rules

1 31 0:3LL2 1:R0RL 2:RRF 3:RFR1

2 31 0:3L23 1:R0L1 2:1LR 3:RFR1

3 31 0:3LLR 1:R02L 2:23 3:RFR1

4 021 0:1R2LR 1:R1F1R 2:1LLR1  

5 11 0:2210L 1:RF30R 2:LR2 3:RRL

6 (*) 01F 0:RFR1 1:2L2 2:R0L  

7 RF3 0:3RFR 1:312L (**) 2:RRLLR 3:20L0R

8 RF3 0:R3L0 1:0L2R1 2:231RF 3:0R20L

9 RF0 0:R1LL0 1:0R2FR 2:LRR  

10 RF2 0:12RR0 1:RLL3R 2:R1F0 3:RL12R

11 12 0:RL10 1:RF2R 2:30L3L 3:12R1

12 30 0:RFR10 1:LL3R 2:3F13 (**) 3:0R1LR

13 30 0:R32 1:01L2 2:030R 3:RFR1L

(*: best solution, since it has fewer and shorter rules)(**: rule not used)

Table 3. Some results obtained for the folding RFRRLLRLRRFRLLRRFR

Page 19: Evolving L-systems to Capture Protein Structure Native Conformations

Results (3)31

R0RLRFR1

RFR R0RL R 3LL2 RL

RFRR 3LL2 RL R RFR1 LL RRF RL

RFRRLLRLRRFRLLRRFR

axiom

1st derivation

2nd derivation

3th derivation

post-processing

Fig. 4. Derivation process for Solution 1

Page 20: Evolving L-systems to Capture Protein Structure Native Conformations

Results (4)

Fig. 5. Evolutionary progression towards the target structure (1st instance in table 2).

Page 21: Evolving L-systems to Capture Protein Structure Native Conformations

Discussion

• An evolutionary algorithm discovered L-systems that capture a target folding under the HP model in 2D lattices.

• Novel generative encoding for evolutionary approaches to both the protein structure prediction problem and inverse protein folding problem.

• Further work will target longer chains and 3D lattices.

Page 22: Evolving L-systems to Capture Protein Structure Native Conformations

Evolving L-systems to Capture Protein Structure

Native Conformations

Gabi Escuela1, Gabriela Ochoa2 and Natalio Krasnogor3

1,2 Department of Computer Science, Simon Bolivar University, Caracas, [email protected], [email protected]

3 School of Computer Science and I.T., University of Nottingham

[email protected]