THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU


Page 1: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Putting Engineering back into Protein Engineering

Jun Liao, UC Santa Cruz

Manfred K. Warmuth, UC Santa Cruz

Jeremy Minshull, DNA 2.0

Page 2: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Protein Engineering Current Paradigms

1. Mechanism-based (rational): detailed structural analysis

2. Empiricism-based (non-rational): library-based

Page 3: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Mechanism-Based Protein Engineering

• Based on thermodynamic principles

• Calculations are approximate
– calculation cost
– structures are really not rigid (MDS)

• Calculations are primarily able to predict binding
– catalysis is a special case of binding to a transition state

• Changes in amino acids are designed based on these principles
– very small numbers (<5) of new proteins are synthesized and tested

Page 4: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Empiricism-Based Protein Engineering

• Uses similar principles to evolution
– make many variants
– screen to find those with the best properties

• No mechanistic understanding needed

• Produces large numbers of variants (>1,000), which are very difficult / expensive to screen for practically relevant properties

[Figure: proteins related to wild type → simulated crossover → new variants]

Page 5: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

The Key Challenge in Protein Engineering

=Reality

What we need is not what we assay for…

• Molecular mechanistic models (do not model activity)

• High-throughput screens (surrogate assays)

Page 6: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

What we want in Protein Engineering

Wish List

• No need to develop a surrogate assay
• Variants are tested directly under application conditions
• Rapid process

Requirements

• Identification of appropriate amino acid substitutions
• Design and synthesis of information-rich variants
• Interpretation of quantitative functional data using machine learning techniques

Page 7: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Protein Engineering using Machine Learning

1. Starting point: Select a protein with some correct initial properties.

2. Initial design: a) Choose substitutions. b) Design an initial variant set (<50) containing those substitutions.

3. Reality check: Synthesize and test the variant set for function(s) of interest.

4. Machine learning: Model the effect of sequence changes on function(s) of interest.

5. New design: Propose a new variant set (<50) based on the model.

6. Iterate steps 3 to 5.

7. End: Select the best variant(s).
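The design, test, learn, redesign cycle on this slide can be sketched as a short loop. This is only an illustration under stated assumptions: the function names, the toy "marginal effect" model and the sampling rule for the new design are all hypothetical and are not the method used in the talk. The `assay` callable stands in for synthesizing and measuring a variant.

```python
import random

def marginal_effects(data, substitutions):
    """Toy model (assumption): mean activity with a substitution minus mean without it."""
    effects = {}
    for s in substitutions:
        with_s = [a for v, a in data.items() if s in v]
        without = [a for v, a in data.items() if s not in v]
        if with_s and without:
            effects[s] = sum(with_s) / len(with_s) - sum(without) / len(without)
        else:
            effects[s] = 0.0  # no evidence either way yet
    return effects

def run_cycle(substitutions, assay, rounds=3, set_size=10, per_variant=3, seed=0):
    """Run a few design/test/learn rounds; return the best variant and all measurements."""
    rng = random.Random(seed)
    data = {}
    # Initial design: a small set of variants, each carrying a few substitutions.
    batch = [frozenset(rng.sample(substitutions, per_variant)) for _ in range(set_size)]
    for _ in range(rounds):
        # Reality check: measure every variant in the current set.
        for variant in batch:
            data[variant] = assay(variant)
        # Machine learning: estimate the effect of each substitution.
        effects = marginal_effects(data, substitutions)
        # New design: sample the next set from substitutions that look beneficial.
        good = [s for s in substitutions if effects[s] > 0] or list(substitutions)
        k = min(per_variant, len(good))
        batch = [frozenset(rng.sample(good, k)) for _ in range(set_size)]
    # End: select the best variant measured so far.
    best = max(data, key=data.get)
    return best, data
```

In practice the model and the proposal rule would be replaced by the regression and design strategies described on the later slides; the loop structure is the part this sketch is meant to show.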

Page 8: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Engineering of Proteinase K

• Long-term goal of engineering proteinase K to degrade polylactic acid

• Member of the serine protease family
– Large amounts of phylogenetic and sequence information available

• Several different measurable activities available for optimization

Page 9: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Protein Engineering using Machine Learning

Page 10: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Expert System for Substitution Selection

Expert system:
– Calculation of 9 independent scores that measure changes that have succeeded in other places in Nature
– Weight and combine scores to pick best changes

[Figure: proteins related to proteinase K]

19 switches = search space of 2^19 ≈ 500,000
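As a small illustration of the two ideas on this slide, the sketch below computes the size of the search space for 19 on/off substitutions and ranks candidate substitutions by a weighted sum of scores. The slides do not say what the nine scores are or how they are weighted, so the score values, weights and function names here are placeholders.

```python
# Each of the 19 substitutions is either present or absent in a variant.
n_switches = 19
search_space = 2 ** n_switches  # 524288, i.e. roughly 500,000

def combined_score(scores, weights):
    """Weighted sum of the individual scores for one candidate substitution."""
    return sum(w * s for w, s in zip(weights, scores))

def pick_substitutions(candidates, weights, top_n):
    """Rank candidates (name -> list of scores) by combined score, best first."""
    ranked = sorted(candidates,
                    key=lambda name: combined_score(candidates[name], weights),
                    reverse=True)
    return ranked[:top_n]

# Hypothetical example with three scores per candidate instead of nine.
example_candidates = {
    "sub_a": [0.9, 0.1, 0.5],
    "sub_b": [0.2, 0.8, 0.4],
    "sub_c": [0.1, 0.1, 0.1],
}
example_weights = [1.0, 0.5, 2.0]
top_two = pick_substitutions(example_candidates, example_weights, top_n=2)
```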

Page 11: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Finding Optima in Complex Landscapes: Design of Experiment

• Changing 1 amino acid at a time

• Making multiple changes simultaneously

…Now try to envision doing this not with 2, but 200 amino acids / dimensions

[Figure: two plots on axes Aa 1 and Aa 2, with sampled points marked x, comparing one-at-a-time changes with simultaneous changes]
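A toy example may help show why the two strategies can differ. Assume two sites whose substitutions are each slightly harmful alone but beneficial together; the activity values below are invented purely for illustration. A search that changes one amino acid at a time and only accepts improvements never leaves the wild type, while testing combined changes finds the better variant.

```python
from itertools import product

# Hypothetical activities: 0 = wild-type residue, 1 = substitution at that site.
activity = {(0, 0): 1.0, (1, 0): 0.8, (0, 1): 0.7, (1, 1): 2.5}

def one_at_a_time(start):
    """Hill-climb by flipping a single site per step, accepting only improvements."""
    current = start
    while True:
        neighbours = [
            tuple(1 - c if i == j else c for j, c in enumerate(current))
            for i in range(len(current))
        ]
        best = max(neighbours, key=activity.get)
        if activity[best] <= activity[current]:
            return current  # no single change helps: stuck here
        current = best

def simultaneous():
    """Evaluate every combination of changes and return the best one."""
    return max(product((0, 1), repeat=2), key=activity.get)
```

With only two sites, testing every combination is trivial; the slide's point is that this stops being possible as the number of dimensions grows, which is what motivates a designed, model-guided sample.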

Page 12: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Design of Initial Proteinase K Variants

pos   95  97 107 123 132 138 145 151 167 180 194 199 208 236 237 265 267 273 293 299 310 332 337 355
var    C   S   D   A   V   A   F   A   I   I   S   S   H   V   N   S   I   T   A   C   K   R   N   S
wt     N   P   S   S   I   E   M   Y   V   L   Y   A   K   A   R   P   V   S   G   L   I   K   S   P

Substitutions carried by each variant (residues listed left to right as on the slide; column positions are not preserved in this transcript):

1: S N T A R S
2: A A A K R S
3: C F I S N T
4: S A I S V I
5: D V H S C N
6: S D I V N K
7: A A S H S S
8: C S I A C R
9: C V A F I H
10: V N T A R S
11: S A S C K N
12: C D A I S N
13: A A I S H C
14: S F N T A K
15: V V S I R S
16: S A S V C S
17: C D I I A K
18: F N S I R N
19: A V A S H T
20: A H V I A C
21: D V A F N S
22: S I S S S K
23: C A I N T R
24: C
25: S C
26: S S C
27: V
28: D A F
29: F I I K S
30: F A S K R

Page 13: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Back to Proteinase K

Protein Engineering using Machine Learning

Page 14: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

First proteinase K dataset

[Figure: Activity (factor increase relative to wt); axis range 0 to 3]

Page 15: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Protein Engineering using Machine Learning

Page 16: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Sequence-Activity Modeling: How Does it Work?

1. Represent the sequence as a matrix

Seq1 AGRWGIGAYHKLIMA
Seq2 AGRTGVGVYHKLIMA
Seq3 AGRWGIGVYHRLIMA
Seq4 AGRTGVGAYHRLIMA

becomes

       T   W   V   I   V   A   R   K
x     x1  x2  x3  x4  x5  x6  x7  x8
Seq1   0   1   0   1   0   1   0   1
Seq2   1   0   1   0   1   0   0   1
Seq3   0   1   0   1   1   0   1   0
Seq4   1   0   1   0   0   1   1   0

2. Measure the activity or activities of interest under the final application conditions

3. y = c1x1 + c2x2 + c3x3 + c4x4 + … + cixi
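The encoding in step 1 and the linear model in step 3 can be reproduced in a few lines. The four sequences and the feature order (T, W, V, I, V, A, R, K) are taken from the slide; the coefficient values are invented placeholders, since the slides do not list fitted weights.

```python
seqs = {
    "Seq1": "AGRWGIGAYHKLIMA",
    "Seq2": "AGRTGVGVYHKLIMA",
    "Seq3": "AGRWGIGVYHRLIMA",
    "Seq4": "AGRTGVGAYHRLIMA",
}

# (0-based position, residue) for features x1..x8, in the slide's column order.
features = [(3, "T"), (3, "W"), (5, "V"), (5, "I"),
            (7, "V"), (7, "A"), (10, "R"), (10, "K")]

def encode(seq):
    """Return the 0/1 feature vector: 1 if the residue is present at that position."""
    return [1 if seq[pos] == aa else 0 for pos, aa in features]

X = {name: encode(seq) for name, seq in seqs.items()}

def predict(x, c):
    """Linear sequence-activity model: y = c1*x1 + c2*x2 + ... + ci*xi."""
    return sum(ci * xi for ci, xi in zip(c, x))

# Hypothetical coefficients, for illustration only.
c = [0.2, 0.0, -0.1, 0.3, 0.0, 0.5, 0.1, 0.0]
y_seq1 = predict(X["Seq1"], c)
```

In the real workflow the coefficients would be fitted to the measured activities from step 2 rather than chosen by hand.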

Page 17: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Assessing the Proteinase K Sequence-Activity Relationship

[Figure: scatter plot comparing predicted activity with measured activity; wt marked]

y = c1x1 + c2x2 + c3x3 + c4x4 + … + cixi

Page 18: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Learning Methods

• Variety of regression methods
– Ridge Regression & Lasso
– SVM Regression & LPSVM Regression
– Matching Loss Regression & One-norm Matching Loss Regression
– Partial Least Squares Regression
– LPBoost Regression

• Use bagging to improve the prediction stability
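To make the bagging idea concrete, here is a minimal sketch that combines it with ridge regression, one of the listed methods. It uses only the standard library; the regularization strength, number of bags and function names are assumptions, and the talk's actual implementation is not described in the slides. Each bag is a bootstrap resample of the tested variants, and the spread of the fitted weights across bags gives a rough measure of how stable each weight is.

```python
import random

def ridge_fit(X, y, lam=1.0):
    """Solve (X^T X + lam*I) c = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def bagged_ridge(X, y, lam=1.0, n_bags=50, seed=0):
    """Fit ridge regression on bootstrap resamples; return mean and std of weights."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        fits.append(ridge_fit([X[i] for i in idx], [y[i] for i in idx], lam))
    n = len(X[0])
    mean = [sum(f[j] for f in fits) / n_bags for j in range(n)]
    std = [(sum((f[j] - mean[j]) ** 2 for f in fits) / n_bags) ** 0.5
           for j in range(n)]
    return mean, std
```

Averaging over bags smooths out the influence of any single variant, which matters when only a few dozen measurements are available.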

Page 19: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Variants Design I

• Main issue: Exploitation vs. Exploration

• Optimum design (Exploitation)
– Take the combination of substitutions predicted to have maximal activity
– Also consider
  • Substitution frequency in the dataset
  • Variation of weight estimation
– Used in 2nd & 3rd iterations
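Under a purely additive model, the combination with maximal predicted activity is simply every substitution with a positive weight. The sketch below adds the two extra considerations from the slide in one plausible way: a substitution is kept only if its weight stays positive after subtracting a multiple of its estimated spread, and only if it appeared often enough in the dataset. The thresholds `k` and `min_freq`, and the example values, are assumptions made for illustration.

```python
def optimum_design(mean_w, std_w, frequency, k=1.0, min_freq=3):
    """Pick substitutions for the 'exploitation' variant.

    mean_w, std_w: estimated weight and its variation per substitution
    frequency:     how many tested variants carried each substitution
    """
    chosen = []
    for sub in mean_w:
        confident = mean_w[sub] - k * std_w[sub] > 0   # robustly beneficial
        well_sampled = frequency[sub] >= min_freq      # seen often enough
        if confident and well_sampled:
            chosen.append(sub)
    return sorted(chosen)

# Hypothetical inputs.
mean_w = {"sub_a": 0.8, "sub_b": 0.3, "sub_c": -0.4, "sub_d": 0.5}
std_w = {"sub_a": 0.2, "sub_b": 0.5, "sub_c": 0.1, "sub_d": 0.1}
frequency = {"sub_a": 6, "sub_b": 6, "sub_c": 5, "sub_d": 1}
picked = optimum_design(mean_w, std_w, frequency)
```

Here `sub_b` is dropped because its weight is too uncertain and `sub_d` because it was seen in only one variant, even though both have positive mean weights.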

Page 20: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Variants Design II

• Diversity design (Exploration)
– Calculate the combination of substitutions predicted to have maximal activity that is also
  • No more than 5 changes from a sequence that has already been tested
  • No closer than 3 changes from a sequence that has already been tested or selected for synthesis
– Used in 2nd iteration
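The two distance constraints can be expressed with Hamming distance between 0/1 substitution vectors. The following is a minimal greedy sketch, assuming a linear model with known weights and exhaustive enumeration of candidates; the function names and defaults are hypothetical, and the actual search procedure used in the talk is not given.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two variants differ."""
    return sum(x != y for x, y in zip(a, b))

def diversity_design(weights, tested, n_new, max_from_tested=5, min_apart=3):
    """Greedily pick high-scoring variants that stay near tested data
    but are not too close to anything already tested or selected."""
    def predicted(v):
        return sum(w * x for w, x in zip(weights, v))

    # All on/off combinations, best predicted activity first.
    candidates = sorted(product((0, 1), repeat=len(weights)),
                        key=predicted, reverse=True)
    selected = []
    for v in candidates:
        if len(selected) == n_new:
            break
        near_data = min(hamming(v, t) for t in tested) <= max_from_tested
        novel = all(hamming(v, u) >= min_apart for u in list(tested) + selected)
        if near_data and novel:
            selected.append(v)
    return selected
```

Enumerating every combination is only practical for a small number of positions; for 19 switches it means scoring 524,288 candidates, so a larger problem would need a smarter search.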

Page 21: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Three Iterations of Activity Engineering

[Figure: activity relative to wild type (axis 0 to 50) for variants in order synthesized (axis 0 to 120); 1st set: 34 variants; 2nd set: 24 variants; 3rd set: 38 variants; wild-type marked]

ONLY 58 variants were tested to allow design of the fourth set, which contained
• 3 variants 20-30x improved over wild type
• 50% of variants more active than the best of previous sets
• 70% of variants more active than wild type
• 3-11 changes found in variants better than WT

Page 22: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Improving Activity

Activity Improvement

Page 23: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Variants are Improved in Multiple Properties

[Figure: activity (pmol/s/ml) and half life at 68°C (s) for variants v501, v502, v503, v505, v513, v515, v518, v526, v544, v545, v551, v556, v557, v558, v560 and NS9]

pos  107 123 132 145 151 167 180 194 199 208 237 265 267 273 293 310 332 337 355
WT     S   S   I   M   Y   V   L   Y   A   K   R   P   V   S   G   I   K   S   P

Substitutions carried by each variant (residues listed left to right as on the slide; column positions are not preserved in this transcript):

501: A A
502: A H A
503: A H A R
505: A H T A R N
513: A I H T A R N
515: V A A
518: V A I I T A
526: A V A I T A
544: V A H T A N
545: V A H T A R N
551: A T A
556: V A I T A
557: A H A R N
558: V A H T A R
560: A V A I H T A N

Page 24: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Conclusions

• Machine learning
– Making a very small number of variants (58) allows a productive search of a total space with 500,000 possible combinations

• Synthetic Biology
– Recent advances in gene synthesis methods were essential for this type of exploration

Page 25: THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

The Future

• Proteins are the building blocks of life with a wide array of applications (therapeutics, diagnostics, industrial catalysts)

• Finding a reliable mechanism for optimizing proteins for human applications would be an amazing feat

• We steal ideas about how proteins evolve from nature, but optimize proteins outside their in vivo constraints (the proteins don’t have to be compatible with life)