Intracellular systems for characterization and …417165/FULLTEXT01.pdf · i Intracellular systems...

Intracellular systems for characterization and engineering of proteases and their substrates George Kostallas Royal Institute of Technology School of Biotechnology Stockholm 2011



Intracellular systems for characterization and engineering of proteases and their substrates

George Kostallas

Royal Institute of Technology

School of Biotechnology

Stockholm 2011


© George Kostallas
Stockholm 2011

Royal Institute of Technology
School of Biotechnology
AlbaNova University Center
SE-106 91 Stockholm
Sweden

Printed by AJ E-print AB
Oxtorgsgatan 9
SE-111 57 Stockholm
Sweden

ISBN 978-91-7415-992-9
TRITA BIO-Report 2011:11
ISSN 1654-2312

Cover illustration: Mesh structure of TEV protease created in MacPyMol based on PDB file 1LVM


Abstract

Over the years, the view of proteases as relatively non-specific protein-degrading enzymes, mainly involved in food digestion and intracellular protein turnover, has shifted, and they are now recognized as key regulators of many biological processes that determine the fate of a cell. Besides their biological role, proteases have emerged as important tools in various biotechnical, industrial and medical applications. At present, worldwide efforts aim at deciphering the biological role of proteases and understanding their mechanism of action in greater detail. In addition, with the growing demand for novel protease variants adapted to specific applications, protease engineering is attracting considerable attention. With the vision of contributing to the field of protein science, we have developed a platform for the identification of site-specific proteolysis, consisting of two intracellular genetic assays: one fluorescence-based (Paper I) and one antibiotic resistance-based (Paper IV). More specifically, the assays take advantage of genetically encoded short-lived reporter substrates that, upon cleavage by a coexpressed protease, confer either increased whole-cell fluorescence or antibiotic resistance to the cells in proportion to the efficiency with which the substrates are processed. Thus, the fluorescence-based assay is highly suitable for high-throughput analysis of substrate processing efficiency by flow cytometry and cell sorting, while the antibiotic resistance assay can be used to monitor and identify proteolysis through (competitive) growth in selective media. By using the highly sequence-specific tobacco etch virus protease (TEVp) as a model in our systems, we could show that both assays allowed for (i) discrimination among closely related substrate peptides (Paper I & IV) and (ii) enrichment and identification of the best performing substrate-protease combination from a background of suboptimal variants (Paper I & IV). In addition, the fluorescence-based assay was used successfully to determine the substrate specificity of TEVp by flow cytometric screening of large combinatorial substrate libraries (Paper II), and in a separate study it was also used as one of several methods for the characterization of different TEVp mutants engineered for improved solubility (Paper III). We believe that our assays present a new and promising path forward for high-throughput substrate profiling of proteases, directed evolution of proteases and identification of protease inhibitors, all of which are areas of great biological, biotechnical and medical interest.

Keywords: proteases, flow cytometry, GFP, CAT, protein engineering, ssrA, ClpXP, tobacco etch virus, TEV, site-specific, high-throughput, selection, proteolysis, libraries


Σα βγεις στον πηγαιμό για την Ιθάκη,

να εύχεσαι νά’ναι μακρύς ο δρόμος,

γεμάτος περιπέτειες, γεμάτος γνώσεις.

When you set sail for Ithaca, wish for the road to be long, full of adventures, full of knowledge.

From the poem Ithaca by Konstantinos Kavafis


List of publications

This thesis is based on the following publications, which will be referred to by their roman numerals (I-IV).

I. Kostallas G and Samuelson P (2010) Novel fluorescence-assisted whole-cell assay for engineering and characterization of proteases and their substrates. Appl Environ Microbiol 76: 7500-7508

II. Kostallas G, Löfdahl P-Å and Samuelson P (2011) Substrate profiling of tobacco etch virus protease using a novel fluorescence-assisted whole-cell assay. PLoS ONE 6: e16136

III. Kostallas G, Sandersjöö L, Al-Askeri M and Samuelson P (2011) Construction, expression and characterization of TEV protease mutants engineered for improved solubility. Manuscript

IV. Sandersjöö L*, Kostallas G* and Samuelson P (2011) Towards an antibiotic resistance-based assay for protease characterization and engineering. Manuscript

* Authors contributed equally

All publications are reproduced with permission of the copyright holders


Table of contents

A constant quest

Proteins

Proteases
    Targeting proteases in therapy
    Proteases as therapeutics and biomarkers
    Industrial proteases

Protease engineering
    Rational approaches for protease engineering
    Library-based approaches for protease engineering
    Discovering novel protease variants

Discovering substrate sequences
    Synthetic substrate libraries
    Biological substrate libraries
    MS-based substrate identification

Investigations
    From an extraordinary RNA to a novel protease assay (Paper I)
    Substrate profiling of tobacco etch virus protease using a novel protease assay (Paper II)
    Characterization of TEV protease mutants engineered for improved solubility (Paper III)
    Evolving a current protease assay (Paper IV)

Reflections

Abbreviations

Acknowledgements

References


A constant quest

For ages, man has strived to find an answer to what constitutes the essential unit of life. Numerous theories have been presented over time, but the impact of most of them faded long ago. For instance, although Empedocles’ (430 BC) belief that every living thing is a combination of four eternal elements (earth, wind, fire, water) was widespread for centuries, it is far removed from the current conception of a living organism's fundamental components [1].

More than two thousand years passed after Empedocles before anyone could take a closer look at living organisms. Not until 1665 did Robert Hooke describe how his homemade microscope enabled him to distinguish tiny structures resembling monastery cells in a sliver of cork. Less than a decade after Hooke’s discovery, Antony van Leeuwenhoek reported the first observations of single-cell organisms in a series of letters to The Royal Society of London for the Improvement of Natural Knowledge. Subsequent advances in microscopy allowed scientists to affirm the significant role of cells in living organisms and to prove that cells are capable of growing, dividing, maturing and reacting to environmental stimuli. Eventually, based on the earlier work of Theodor Schwann and Matthias Jakob Schleiden, Rudolf Virchow established in 1858 that cells are the smallest building blocks of life. In addition, Virchow claimed that every cell originates from a pre-existing cell, hence laying the ground for the modern cell theory [1].

Most plants and animals are composed of a variety of cell types, each specialized in particular tasks. All the same, every cell is involved in a complex interplay with other cells in order to sustain a functional organism. What controls the processes in a cell, and thereby its behavior, was long a mystery. In 1869, Friedrich Miescher extracted and described new molecules from the nuclei of white blood cells, which he collectively named nuclein. Among these molecules was what we today know as DNA [1]. Despite the extensive research of Miescher and succeeding colleagues, the function of DNA remained unknown until the early 1940s. The first step towards an answer came when George Beadle and Edward Tatum exposed the bread mold Neurospora crassa to X-rays in a series of experiments aimed at learning more about gene action. Their observations, suggesting a direct connection between genes and proteins, were published in 1941 and paved a new path forward in understanding how genes regulate different cellular processes [2]. In February 1944, at a time when genetic information was still believed to reside in proteins, a study published in the Journal of Experimental Medicine revolutionized the view of DNA. Oswald Avery, together with his co-workers Colin MacLeod and Maclyn McCarty, was the first to publish data showing that DNA is the carrier of genetic material [3]. Subsequent work by Alfred Hershey and Martha Chase confirmed this function of DNA [4].

Biological science entered a new era when James D. Watson and Francis Crick came across an extraordinary image taken by Rosalind Franklin and Raymond Gosling. Photo 51, an X-ray diffraction image of DNA, provided the final clues needed for Watson and Crick to make one of the greatest discoveries in history. Inspired by Photo 51 and earlier correspondence with Erwin Chargaff on DNA base pairing, Watson and Crick could describe the double helical form of DNA. In 1953, Nature published Watson and Crick’s suggested DNA structure along with a series of articles supporting their hypothesis [5-7]. As soon as the DNA structure became public, researchers rushed to interpret the genetic code. This scientific race not only resulted in the deciphering of the code of life in 1966, but also brought important insights into DNA and RNA function [8-10]. During the same period, Crick also emphasized the fundamental link between DNA, RNA and proteins by formulating the central dogma of molecular biology in 1957, which he revised in 1970 [11,12] (Figure 1).


As a result of the immense scientific focus on DNA structure and function, the development of methods for DNA manipulation has attracted growing interest since the 1950s. But it was not until the early 1970s that earlier efforts on the isolation and characterization of enzymes involved in DNA synthesis and modification finally led to major breakthroughs in the field of genetic engineering [13-15]. In 1974, in contrast to other methods proposed for DNA recombination [16-18], Herbert Boyer and Stanley Cohen demonstrated for the first time that it was possible to cleave DNA molecules using restriction endonucleases and thereafter recombine the desired fragments with DNA ligase. Boyer and Cohen then went a step further and used their recombinant DNA technique to introduce genes from different organisms into plasmid pSC101, which they successfully used to express the corresponding recombinant proteins in Escherichia coli (E. coli) [19,20]. Only a few years after the birth of recombinant DNA technology, the first human protein, insulin, was recombinantly produced in E. coli in the laboratories of Genentech, a company founded by Boyer [21].

Since the groundbreaking initial attempts to produce proteins by recombinant means in heterologous host systems, we have come a long way and can now engineer proteins, cellular pathways and even whole organisms to meet the needs of particular applications. In fact, with the knowledge gained and recent technical improvements, we have reached a stage where we can see the first steps towards organisms designed entirely de novo [22].

Figure 1. The central dogma of molecular biology describing the simple and ingenious connection between DNA, RNA and proteins.


Proteins

Protein history goes all the way back to the beginning of the 19th century. The tendency of proteins to precipitate upon heat or acid treatment facilitated their isolation and analysis in the early days of protein science. Pioneering studies on fibrin, albumin, ovalbumin and hemoglobin performed by Gerardus Johannes Mulder suggested that all these molecules shared the same chemical composition. Mulder’s collaborator, Jöns Jakob Berzelius, named this group of biological substances proteins in 1838. The term protein chosen by Berzelius, which means substances of primary importance in Greek, stressed the vital mission of these molecules [1,23]. Still, the significance of proteins was not acknowledged until 1926, when James Sumner crystallized the enzyme urease and determined by subsequent chemical analysis that it actually was a protein [24].

Today, proteins are accepted as essential participants in every process in a living organism. Some are responsible for transportation of substances in the blood or through cell membranes, while others are enzymes, catalyzing chemical reactions, for example in food digestion and DNA synthesis. Skin, hair and connective tissues also contain proteins such as keratin and collagen providing structural support, whereas actin and myosin are responsible for muscle contraction. One might wonder how proteins display such an extreme variety of functions, considering that a set of only 20 different amino acids is used as building blocks in naturally occurring proteins. But proteins follow a simple and ingenious design. Every protein consists of at least one chain of amino acids coupled through peptide bonds in a specific order, predefined by the genetic code. After a brief calculation, it becomes obvious that an enormous number of sequence combinations, more than $10^{13}$ ($20^{10} = 1.024 \times 10^{13}$), is possible even for a small random peptide chain of only 10 residues. Keeping in mind that polypeptide chains also vary in length and exhibit different structural conformations, the number of possible protein variants becomes almost limitless (Figure 2).
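The calculation mentioned above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `sequence_space` is made up for this example, and it counts linear chains built from the 20 standard amino acids, ignoring chain-length variation and conformation.

```python
def sequence_space(length, alphabet_size=20):
    """Number of distinct linear sequences of a given length.

    Assumes each position can hold any of `alphabet_size` residues
    independently (20 standard amino acids by default).
    """
    return alphabet_size ** length


# Print the size of the sequence space for a few chain lengths.
for n in (5, 10, 20):
    print(f"{n:>2} residues: {sequence_space(n):.3e} sequences")
```

For a chain of 10 residues the function returns 20^10 = 10,240,000,000,000, which is the 1.024 x 10^13 quoted in the text.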

Figure 2. Different structural conformations of proteins. Upper section from the left: Green fluorescent protein, hemoglobin. Lower section from the left: IgG2a antibody, chaperone complex GroEL/GroES. Images created based on PDB files 1AON, 1EMA, 1IGT and 2W6V.


For the successful determination of a protein’s function it is advantageous to have at hand the protein’s structure, cellular localization, amino acid composition and sequence. Therefore, the development of approaches providing such information has contributed enormously to our deeper knowledge of proteins. Collectively, methods such as Edman degradation, X-ray crystallography, NMR spectroscopy, immunohistochemistry and microscopy, in combination with modern recombinant protein production and purification techniques have revolutionized protein science.

With more than 800 genomes sequenced to date, including the human genome, a lot of work is currently invested in elucidating the role of the proteins encoded by the corresponding genes. These efforts include small studies on single proteins in laboratories all over the world but also more comprehensive approaches such as the Swedish Human Protein Atlas project (HPA) [25] and the Structural Genomics Consortium (SGC) [26]. While HPA’s main focus is to analyze the expression and localization of human proteins in a large variety of normal human tissues and cancer cell lines, SGC aims to determine the three-dimensional structures of medically relevant proteins. No matter the magnitude of the individual studies, modern protein science as a whole supplies valuable information on protein functionality at an accelerating pace.


Proteases

A specific set of proteins, commonly referred to as proteases or proteolytic enzymes, have the ability to break up proteins through hydrolysis of peptide bonds. Depending on the catalytic mechanism they employ, proteases are divided into six classes: serine, glutamic, cysteine, threonine, aspartic and metalloproteases. Besides differences in the mechanism of catalysis, proteases also exhibit great variation in shape and size, ranging from single catalytic units of 15 kDa [27] to large proteolytic complexes like multimeric meprin metalloproteases with molecular weights up to 6 MDa [28]. Proteases often do not consist solely of proteolytic subunits, but may also include other functional domains that determine a protease’s cellular localization, activation, inhibitor sensitivity and substrate recognition.

For years, proteases were believed to be only involved in non-specific protein degradation related to food digestion and intracellular protein turnover. However, a lot has changed since the first proteases, pepsin and trypsin, were described in the middle of the nineteenth century [1,29]. At present, proteases are not exclusively regarded as protein demolishers; they are also governors of intricate proteolytic pathways, networks and cascades determining the destiny and behavior of countless bioactive molecules. Recent genomic studies have revealed that proteases are in fact present in every living organism and account for nearly 3% of every mammalian genome. For instance, the human and mouse genomes encode 689 and 714 known and putative proteases, respectively [30].

Specific proteolysis is recognized to play an important role in a variety of fundamental biological processes such as DNA transcription and replication, apoptosis, immune response, wound repair, cell migration and proliferation, neuronal outgrowth, haemostasis, tissue remodelling and morphogenesis [31]. Proteolytic processing of viral polypeptides has also proven essential for the maturation of infectious particles, and proteolysis is likewise essential for the virulence of pathogenic organisms like Bacillus anthracis and Vibrio cholerae [32,33]. In agreement with the versatility of proteases and the multitude of processes they are involved in, proteolytic malfunction is responsible for several pathological conditions such as cancer, thrombosis, type II diabetes, haemophilia, rheumatoid arthritis, atherosclerosis, progeria, hypertension, congestive heart failure, Parkinson’s and Alzheimer’s disease. According to the Mammalian Degradome Database, as many as 86 hereditary disorders are caused by mutations in protease-encoding genes [31,34-42].

Targeting proteases in therapy

Due to their biological importance, proteases have been popular therapeutic targets for over 60 years; approximately 5-10% of all current pharmaceutical targets are estimated to be proteases. Since excessive proteolysis is believed to be the major cause in a number of diseases, related drug discovery has mainly been focused on the development of inhibitors against relevant proteases. For over two decades, inhibitors for the human protease angiotensin-converting enzyme (ACE) have been extensively used to treat cardiovascular disorders, and the 13 ACE inhibitor drugs currently available on the market account for sales exceeding 6 billion US dollars annually [43].

Other effective protease inhibitor drugs include Tipranavir, Sitagliptin and Desirudin, which are used in modern therapeutic strategies against AIDS, diabetes and thrombosis, respectively. However, cancer treatment based on inhibitors of matrix metalloproteases has been a great challenge. Although preclinical studies on cancer models have shown promising results, subsequent clinical trials were hampered by severe side effects and/or lack of medical effect. In spite of some less successful drug development projects involving protease inhibitors, there are presently several promising therapeutic candidate molecules in preclinical and clinical trials for inhibition of proteolytic activity involved in diabetes, osteoporosis, cancer, hepatitis C, malaria, multiple sclerosis and chronic obstructive pulmonary disease [43-45].


Proteases as therapeutics and biomarkers

In addition to being targets for inhibitors, proteases themselves are used in numerous medical applications where increased proteolysis is desired. For example, different naturally occurring or engineered protease variants are successfully employed as anti-inflammatory and anti-blood-coagulating agents (APC) or for treating cervical dystonia (Botulinum toxin B) and cardiovascular disorders (TNK-tPA) [46]. Moreover, as our understanding of the involvement of proteases in different disorders improves, the interest in proteases and their inhibitors as prognostic or diagnostic markers is increasing. Indeed, the measurement of kallikrein 3 levels in serum has assisted in the accurate diagnosis of prostate cancer since the early 1990s [44]. Other potential biomarkers currently tested for diagnosis of various cancer types are the proteases cathepsin B, matrix metalloproteinase 8, and a disintegrin and metalloproteinase with thrombospondin motifs 15. Likewise, the Trypanosoma cruzi protease cruzipain has been suggested for monitoring the parasitic infection causing Chagas disease [44,47,48].

Table 1. Examples of protease inhibitors in the clinic

Disease        Protease target   Drug name
HIV/AIDS       HIV protease      Tipranavir, Fosamprenavir, Saquinavir
Diabetes       DPP4              Sitagliptin
Thrombosis     Thrombin          Desirudin, Lepirudin
Hypertension   ACE               Captopril, Ramipril
Periodontitis  MMP1, MMP2        Periostat
Pancreatitis   Broad-spectrum    Nafamostat
Cancer         Proteasome        Bortezomib


Industrial proteases

Many industrial processes involve the catalysis of different chemical reactions, and chemical catalysts have traditionally been the preferred industrial alternatives. Because most enzymes can be produced by recombinant means on a large scale and at the same time are more environmentally friendly and effective than their chemical counterparts, enzymes have been gradually incorporated as catalysts in many industrial applications over the last decades. Based on their use, industrial enzymes are usually divided into three different categories: technical, food and feed enzymes.

In 2010, the global market for industrial enzymes was estimated at 3.6 billion dollars, with an annual growth rate of 5% [49]. Proteases constitute over 60% of the industrial enzymes available on the market, most of them being alkaline proteases of bacterial origin used in the detergent industry. A variety of other proteases are used for waste treatment or production of food, textiles, pharmaceuticals and cosmetics [50-52].

Table 2. Applications of industrial proteases

Industry        Application                    Trade name
Food            Meat tenderizing               NovoCarne Tender
Animal feed     Nutritional value improvement  Ronozyme ProAct
Alcohol         Saccharification               San Super 360L
Cosmetics       Wrinkle removal                Botox
Detergents      Protein stain removal          Savinase, Esperase
Leather         Bating                         Pyrase
Leather         Unhairing                      NUE
Pulp and paper  Biofilm removal                Everlase


Protease engineering

Although proteases have evolved to efficiently catalyze specific reactions in their natural environments, their function in most cases is not directly applicable to a desired medical or industrial application. The design and cost effectiveness of each particular application requires specific qualities and attributes of the proteases. For instance, in order to meet all industrial demands, proteases have to tolerate diverse pH ranges, temperatures, surfactants and solvents, but still be able to process specific substrates. On the other hand, proteases intended as therapeutics must possess specialized pharmacokinetic properties and flawless substrate specificity. Therefore, engineering of proteases towards altered catalytic efficiency and specificity or enhanced stability in artificial environments is invaluable for both medical and industrial applications [46,53].

Among the first successful attempts at protease engineering was the specificity reprogramming of subtilisin BPN' from Bacillus amyloliquefaciens. Amazingly, amino acid substitutions at almost every position in the 275-residue polypeptide chain of subtilisin BPN' are claimed in various patents [53]. Subtilisins are indeed the proteases that have been most extensively subjected to modifications. Numerous engineered variants of subtilisins with enhanced specificity and activity at lower temperatures, as well as increased pH and surfactant endurance, are at present widely used in the detergent industry.


Rational approaches for protease engineering

Even though rational design has been used for the successful engineering of some proteases like subtilisin BPN', trypsin and neurolysin [54-56], it has proven difficult to apply as a generic approach for protease engineering. Despite the increasing number of available protein sequences and structures, together with powerful algorithms for modeling molecular dynamics, it is still very complicated to predict what effect a specific amino acid substitution has on protease function. Modern computational methods combined with contemporary strategies for site-directed mutagenesis have been applied for enhancing the solubility and stability of certain proteases [57]. However, engineering a protease’s specificity and activity solely by rational design requires a deeper understanding of the intricate connection between a protease’s sequence, structure and proteolytic mechanism than is currently available.

Library-based approaches for protease engineering

Fortunately, there are other approaches utilizing random mutagenesis for the engineering of proteases and proteins in general that do not have the same prerequisites as rational design. These methods seek to emulate nature’s own mechanisms for protein optimization, and therefore are usually classified as directed evolution approaches. Directed evolution focuses on the creation of genetic libraries containing a large number of mutated protease variants and their subsequent screening for the identification of novel variants with the desired attribute (Figure 3).

Figure 3. Steps for directed evolution of proteases. Genetic libraries containing a large number of mutated protease variants are created. Protease mutants with desired attributes are isolated through screening or selection prior to characterization or additional rounds of mutagenesis and screening/selection.
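The iterative cycle outlined in Figure 3 can be illustrated with a toy simulation. The sketch below is not a model of any real protease: the `score` function is a hypothetical stand-in for a screen or selection (it simply counts matches to an arbitrary "optimal" sequence), and the mutation rate, library size and all function names are assumptions chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rng, rate=0.1):
    """Return a copy of seq in which each residue is randomized with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def score(seq, target):
    """Toy screen: number of positions identical to a hypothetical optimum."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, rounds=5, library_size=200, seed=0):
    """Repeat library creation and screening, keeping the best variant each round."""
    rng = random.Random(seed)
    best = parent
    history = [score(best, target)]
    for _ in range(rounds):
        # Create a library of mutated variants of the current best sequence.
        library = [mutate(best, rng) for _ in range(library_size)]
        library.append(best)  # keep the parent so the best score never decreases
        # Screen the library and isolate the top-scoring variant.
        best = max(library, key=lambda s: score(s, target))
        history.append(score(best, target))
    return best, history


best, history = directed_evolution("AAAAAAAAAA", "MKTAYIAKQR")
print(best, history)
```

In a real experiment the scoring step corresponds to a physical screen or selection, and its outcome is of course not known in advance as it is in this toy target-matching function.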


It is important to keep in mind that successful approaches used for the engineering of a particular protease are not necessarily applicable to all other proteases. Regardless of the type of approach, rational or evolutionary, careful planning is required on where and how to create genetic diversity in the mutated protease variants that are to be analyzed for the preferred properties. Depending on how much structural and functional information is accessible for the protease of interest, different strategies have to be chosen. In many cases today, there is adequate knowledge on the proteases subjected to engineering. Hence, with the intention of minimizing the individual shortcomings of rational design and directed evolution, iterative combinatorial approaches are usually preferred to obtain the desired results faster. However, the kind and order of the methods employed in combinatorial approaches differ from case to case (Figure 4) [58-61].

Figure 4. Different strategies for creating protease diversity.

Error-prone PCR has been one of the most popular methods in protein engineering [62]. Although a powerful method for generating genetic diversity across whole genes, error-prone PCR has its limitations. Due to the mutational bias of the DNA polymerases utilized in error-prone PCR, the method can only produce restricted genetic diversity [63]. Furthermore, during error-prone PCR generally only one nucleotide is misincorporated per amino acid codon, thus limiting the maximal range of amino acids derived from each mutated codon to between 4 and 6 [64]. Thus, beneficial amino acid substitutions requiring more than one nucleotide change will probably be missed when employing error-prone PCR. An attractive alternative is transversion-enriched sequence saturation mutagenesis (SeSaM-Tv+), which allows a less biased mutational spectrum and adjacent nucleotide substitutions, but unfortunately lacks the convenient experimental setup of error-prone PCR [65].
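The codon-level restriction of single-nucleotide mutagenesis can be illustrated with a short script. The following is a minimal sketch, assuming the standard genetic code; the function name `single_step_substitutions` is introduced here for illustration only and is not taken from the cited work. For a given codon it lists the amino acids reachable by exactly one nucleotide change, so that substitutions requiring two or three changes are visibly absent.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_step_substitutions(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide change.

    Synonymous changes and stop codons are excluded.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            aa = CODON_TABLE[mutant]
            if aa != original and aa != "*":
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    for codon in ("ATG", "GGG", "TGG"):
        subs = single_step_substitutions(codon)
        print(codon, CODON_TABLE[codon], len(subs), "".join(sorted(subs)))
```

For the glycine codon GGG, for example, the function returns Ala, Arg, Glu, Trp and Val, whereas Phe cannot be reached by a single nucleotide change. The exact count depends on the starting codon.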

Whole-gene randomization is not always necessary in protease engineering. Mutations enhancing protease stability might be scattered throughout the whole protease gene, whereas targeted amino acid substitutions in or near the active site of a protease can not only alter substrate specificity but also lead to the acquisition of novel catalytic activities. Furthermore, it is not only amino acid substitutions that create genetic diversity. Insertions or deletions of amino acids and even mutations promoting circular permutation (giving rise to new N- and C-termini) are of equal importance [66,67].

Accordingly, in addition to error-prone PCR and SeSaM-Tv+, there are plenty of other methods for the generation of diversity in mutant protease libraries. For example, synthetic oligonucleotide-based methods such as theoretical maximum efficiency randomization (MAX) [68], codon shuffling and incorporation of synthetic oligonucleotides via gene reassembly (ISOR) are used when site-directed randomization is wanted [69]. When more radical changes in protease structure or function are demanded, the choice falls instead on recombinatorial methods, which can be either homology dependent (synthetic DNA shuffling, staggered extension process (StEP), mutagenic and unidirectional reassembly (MURA)) or homology independent (sequence-independent site-directed chimeragenesis (SISDC), structure-based combinatorial protein engineering (SCOPE), USER friendly DNA recombination (USERec)), as recently reviewed by Lutz et al. [70].

Discovering novel protease variants

However, to take advantage of the full potential of these combinatorial approaches and make huge leaps in protease evolution, it is imperative to have powerful screening or selection methods at hand that can easily handle large mutant libraries. The choice of screening or selection method depends on which properties are desired from the engineered protease variants. Several assays have been developed for obtaining engineered variants of target proteins with improved stability or solubility through screening of combinatorial protein libraries [71,72]. One of the most recent assays that holds great potential is a variant of an earlier protein-folding assay, which was based on the propensity of GFP to misfold when C-terminally fused to a “difficult-to-fold” protein [73]. In the refined method, combinatorial libraries of target proteins are first cloned into a reporter plasmid upstream of a 15 amino acid long fragment of GFP (GFP11) and subsequently transformed into E. coli cells harboring another expression vector encoding the complementary remainder of GFP (GFP1-10). Because successful GFP complementation depends on the folding state of the mutant-GFP11 fusion protein, only cells coexpressing correctly folded mutants and GFP1-10 will exhibit gain-of-fluorescence. After induced expression of both genetic constructs on agar plates, the brighter colonies, which likely contain the best folding mutants, can be picked and subjected to multiple rounds of DNA shuffling and screening. Finally, the desired mutants are expressed and characterized in an analogous in vitro experimental setup for quantitative analysis of solubility [74] (Figure 5).

Although methods such as the one described above are also applicable for the engineering of proteases towards enhanced solubility or stability, one has to keep in mind that alterations to a protease’s scaffold promoting such properties may actually impair the activity or substrate specificity of the protease.

Figure 5. Two assays applied for protease engineering. (I) GFP-based folding assay. Coexpression of folded protease-GFP11 fusion proteins and the complementary GFP fragment (GFP1-10) results in correct GFP formation. Misfolding or aggregation of protease-GFP11 hinders GFP complementation. (II) OmpT specificity engineering. Different fluorophore-labeled synthetic substrate peptides are electrostatically bound to the E. coli cell surface, where they can be cleaved by displayed OmpT mutants. Depending on the kind of substrate peptide cleaved, different types of fluorescence are exhibited. Quencher (Q), selection substrate (SUBs), counterselection substrate (SUBcs), bodipy fluorophore (BD), tetramethylrhodamine (TMR).

For that reason, methods capable of interrogating recombinant protease libraries on the basis of proteolytic activity, rather than protease stability or solubility, have also been developed. One such method has been designed and used successfully by Varadarajan and coworkers to create new engineered variants of the endopeptidase OmpT with novel substrate specificities [64,75,76]. E. coli cells displaying genetically encoded libraries of mutated OmpT variants on the surface are incubated with a predefined mixture of two different substrate peptides: one containing the substrate sequence to select for (SubS) and the other a counterselection substrate sequence (SubCS), labeled with boron-dipyrromethene (BODIPY) and tetramethylrhodamine (TMR) fluorophores, respectively. Due to its positive net charge, SubS is electrostatically bound to the E. coli cell surface, where cleavage by the displayed OmpT mutants gives rise to BODIPY fluorescence. In contrast, the TMR-labeled SubCS is not captured on the E. coli cell surface unless it is cleaved. Thus, cells expressing OmpT mutants that process the desired substrate peptide exhibit high BODIPY and no or low TMR fluorescence, making their enrichment and characterization possible after a number of consecutive flow cytometry sorting rounds (Figure 5).
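The two-colour sort criterion can be sketched as a simple gate on the two fluorescence channels. This is only an illustration under stated assumptions: the `Event` record, the function names and the threshold values are hypothetical, and in practice gates are drawn from control populations rather than from fixed numbers.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow cytometry event; intensities are in arbitrary units."""
    clone: str
    bodipy: float  # signal associated with the selection substrate (SubS)
    tmr: float     # signal associated with the counterselection substrate (SubCS)


def in_sort_gate(event, bodipy_min, tmr_max):
    """True if the event shows high BODIPY and low TMR fluorescence."""
    return event.bodipy >= bodipy_min and event.tmr <= tmr_max


def sort_cells(events, bodipy_min=1000.0, tmr_max=100.0):
    """Return the events that fall inside the two-colour gate.

    The default thresholds are hypothetical placeholders.
    """
    return [e for e in events if in_sort_gate(e, bodipy_min, tmr_max)]


if __name__ == "__main__":
    events = [
        Event("selective", bodipy=5200.0, tmr=40.0),
        Event("promiscuous", bodipy=4800.0, tmr=3100.0),
        Event("inactive", bodipy=60.0, tmr=35.0),
    ]
    print([e.clone for e in sort_cells(events)])
```

Only the clone that cleaves the selection substrate without cleaving the counterselection substrate passes the gate; a promiscuous variant is rejected on the TMR channel and an inactive one on the BODIPY channel.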

Although the method established by Varadarajan and colleagues has proven very effective for reprogramming the substrate specificity of OmpT, its application is unfortunately limited to proteases that can effectively be displayed on the cell surface of E. coli. A method that does not share this limitation is the genetic assay for site-specific proteolysis (GASP), which has been applied to modify the substrate specificity of hepatitis A virus (HAV) 3C protease (3CP) [77]. In GASP, mutant protease libraries are expressed in the yeast reporter strain EGY48 together with fusion proteins that contain the selection substrate peptide in a linker region between the transcription factor LexA-b42 and the truncated cytoplasmic domain of the yeast integral membrane protein STE2. Protease mutants capable of cleaving the substrate peptide release LexA-b42 from the membrane-bound fusion protein. LexA-b42 is thereby free to enter the nucleus and activate the reporter genes Leu2 and LacZ. Consequently, when screening recombinant protease libraries using GASP, protease mutants with the desired substrate specificity can be isolated and further characterized since the cells harboring them (i) gain the ability to grow on media lacking leucine, and (ii) stain blue if supplemented with 5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside (X-gal) (Figure 6).

Another assay for screening libraries of protease mutants has been developed by O'Loughlin and coworkers. Their method is based on an earlier study showing that inserting a recognition site for HIV protease in a specific structural element of β-galactosidase leaves its activity unaffected in vivo and in vitro, as long as the HIV protease is not present [78]. After cloning a desired protease substrate sequence into the permissive loop of β-galactosidase, the reporter is coexpressed with a recombinant protease library in E. coli. The cells are then spread on agar plates containing X-gal for monitoring β-galactosidase activity. Because bacterial colonies expressing functional protease mutants that cleave the desired substrate peptide attain a light blue phenotype, due to reduced internal β-galactosidase activity, they are easily picked for further characterization (Figure 6). Although this assay has proved functional, it has relatively low sensitivity and limited throughput capacity.

Figure 6. Genetic assay for site-specific proteolysis (GASP) and β-galactosidase-based system for protease engineering. (I) In GASP, fusion proteins containing the selection substrate peptide (sub) in a linker region between the transcription factor LexA-b42 (TF) and the truncated cytoplasmic domain of the integral membrane protein STE2 are expressed in yeast cells. LexA-b42 is released from the plasma membrane upon site-specific proteolysis and activates reporter gene expression in the nucleus (Leu, LacZ). (II) Specific cleavage of a substrate peptide located in a permissive loop of β-galactosidase diminishes the enzyme’s activity and the cells thereby lose the ability to convert X-gal into a blue product. The β-galactosidase activity in cells can be monitored on agar plates containing X-gal.

For an overview of assays applied for protease engineering please see Table 3.

Table 3. Assays applied for protease engineering

Assay                  Sensitivity  Applicability  Execution  Comments
GFP-folding            +++          +++            ++         Detects only correct folding
E. coli display        ++           +              ++         Only for cell surface displayed proteases
GASP                   +            ++             ++         Applicable to eukaryotic proteases
Split β-galactosidase  +            ++             +          Low throughput

Discovering substrate sequences

Given the important role of proteases in industrial applications as well as in medical research and therapy, innovative methods providing a deeper and wider understanding of protease function would be invaluable. In fact, the biological role of many newly discovered or already known proteases is still undetermined. As a result, a novel area of research has emerged, called degradomics, which encompasses all genetic and proteomic approaches developed for the identification and characterization of proteases, including their associated substrates and inhibitors.

Since the ability to discriminate between potential substrates is central to protease function, it is an absolute necessity to identify a protease’s substrate repertoire. This will not only result in a better understanding of the biological processes that proteases are involved in, but also provide important clues to how proteases recognize their targets and for development of potent protease inhibitors. Especially in combination with the steadily growing numbers of structurally determined proteins and proteases, a more detailed picture of the structure-function relationship is likely to emerge from this work. Because early methods for substrate sequence determination were based on cleavage of synthetic peptides and subsequent biochemical analysis, they were both tedious and inconvenient. Since then, more effective techniques have been developed utilizing chemically or biologically generated combinatorial substrate peptide libraries [79].

Synthetic substrate libraries

One of the first methods using synthetic combinatorial substrate libraries is positional scanning (PS-SCL) (Figure 7). In PS-SCL, several sublibraries are synthesized, each comprised of fluorogenic peptides containing only one constant position within otherwise fully randomized sequences. After protease addition, the fluorescence intensity detected from each sublibrary consequently reflects the average contribution of the fixed amino acid to the substrate’s proteolytic efficiency. Since the fluorogenic substrates often used require the fluorophore to be directly attached to the scissile bond, PS-SCL has only been applied to the systematic analysis of subsite specificities. Although variations of PS-SCL, involving Edman peptide sequencing or FRET-based peptide libraries, have been developed for sequential prime and non-prime site specificity determination, they have limited throughput since they require synthesis of multiple peptide libraries [80-82].
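The bookkeeping behind positional scanning can be made explicit with a few lines of arithmetic. The sketch below assumes that all 20 proteinogenic amino acids are used at every position, which need not hold for a given library, and all function names are illustrative.

```python
N_AMINO_ACIDS = 20  # assumption: all 20 proteinogenic amino acids are used


def n_sublibraries(n_positions, n_residues=N_AMINO_ACIDS):
    """One sublibrary per (fixed position, fixed residue) pair."""
    return n_positions * n_residues


def peptides_per_sublibrary(n_positions, n_residues=N_AMINO_ACIDS):
    """Sequences in one sublibrary: all other positions are fully randomized."""
    return n_residues ** (n_positions - 1)


def relative_preference(signals):
    """Scale the sublibrary signals of one position to the strongest one.

    `signals` maps the fixed residue to the fluorescence of its sublibrary.
    """
    top = max(signals.values())
    return {residue: value / top for residue, value in signals.items()}


if __name__ == "__main__":
    print(n_sublibraries(4), peptides_per_sublibrary(4))
    # Hypothetical readings for two sublibraries at one position.
    print(relative_preference({"Q": 200.0, "A": 50.0}))
```

For a tetrapeptide library this gives 80 sublibraries of 8,000 sequences each, which illustrates why every fluorescence reading is an average over many sequence contexts and cannot reveal cooperativity between positions.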

Protein microarrays have also been used successfully for screening of substrate peptide libraries. In contrast to PS-SCL, each fluorogenic substrate peptide is individually immobilized on a solid matrix or suspended in nanodroplets, thereby making it possible to detect synergies between different residues of the substrate sequence [79,83,84]. Nevertheless, proteolysis is typically detected by fluorescence in microarrays as well. Díaz-Mochón et al. recently demonstrated an interesting DNA microarray method employing hybrid peptide-nucleic acid molecules (PNA) for determining the P4-P1 specificity of chymopapain and subtilisin [85]. The PNA fluorogenic substrate library they created was first incubated with a protease in solution. Subsequent specific hybridization of the cleaved PNA-peptide substrates on DNA microarrays, through oligonucleotide (“probe”)-PNA interactions, allowed fluorometric readout and consequently determination of the non-prime protease specificity.

Although fluorogenic peptide libraries have proven useful for substrate sequence determination of proteases through PS-SCL and microarrays, their application and throughput are restricted by their design and by difficulties inherent to peptide synthesis. The commonly preferred way of circumventing some of these limitations is to synthesize and use smaller peptide libraries randomizing only one subsite of the protease substrate. However, such libraries leave out possible subsite cooperativity upon substrate recognition. FRET-based substrates spanning both prime and non-prime subsites have indeed been used to analyze the substrate specificity of caspases and Factor Xa, but only one residue was randomized in each substrate peptide [86,87].

Figure 7. Fluorogenic peptide libraries for protease substrate sequence determination. (I) Numbering nomenclature for the residues comprising a protease substrate sequence and the position of cleavage (scissile bond). (II) Different designs of fluorogenic synthetic substrate peptides. F is a fluorophore, activated upon release from the substrate peptides. Q is a quencher. R is a fluorophore (rhodamine), gradually activated upon substrate processing. (III) In positional scanning (PS-SCL), sublibraries comprised of fluorogenic peptides containing only one constant position (P) within fully randomized sequences (X) are synthesized. F is a fluorogenic leaving group.

Biological substrate libraries

To avoid the obstacles related to synthetic peptide libraries, alternative methods based on biologically created peptide substrates have been developed. Among them, the most popular approach has been substrate phage display, which has been successfully applied to determine the specificity of several proteases such as subtilisin, furin, HIV protease, Factor Xa and caspases [88-91]. Substrate phage display relies on the presentation of genetically encoded combinatorial substrate peptides on filamentous phage particles, most often M13, f1 or fd phages, as N-terminal fusions to the minor coat protein pIII. In addition to the substrate peptide, the pIII fusions also include an affinity tag that enables immobilization of the phages on solid support. Upon incubation with the protease of interest, phage particles displaying functional substrates are released from the matrix and thereafter amplified by infection of E. coli. After multiple rounds of selection and enrichment, the DNA of selected phage clones is sequenced to reveal the potential substrate sequences (Figure 8).

Another method for protease specificity determination is CLiPS, which is based on cellular libraries of peptide substrates [92,93]. Unlike substrate phage display, CLiPS not only determines specificity but also provides time-dependent substrate processing rates of individual substrate sequences. CLiPS utilizes circularly permuted variants of outer membrane protein X (CPX) for displaying combinatorial substrate peptides on the surface of E. coli cells. Thanks to the design of the reporter proteins, the cells can be labeled through binding of streptavidin-conjugated phycoerythrin to the substrate peptides anchored on the E. coli outer membrane. The fluorescently labeled cells are thereafter incubated with a protease, which detaches the fluorescent probes from cells displaying functional substrate peptides. Consequently, cells expressing optimal substrate peptides can be isolated from the total cell population by flow cytometric sorting based on their decreased fluorescence intensity. Enriched cells are regrown and subjected to additional cycles of flow cytometry screening and ultimately sequenced for substrate sequence determination (Figure 8).

Figure 8. Overview of biological systems for protease substrate sequence identification. (I) Affinity-tagged combinatorial substrate peptides are displayed on filamentous phage particles. After immobilization on solid support, the phage particles displaying functional substrates are released from the matrix upon site-specific proteolysis. After multiple rounds of phage particle recovery, selection and enrichment, desired phage clones are sequenced to reveal the potential substrate sequences. (II) In CLiPS, fluorescently labeled libraries of substrate peptides are displayed on E. coli. A reduction of cellular fluorescence can be detected by flow cytometry analysis upon substrate cleavage. Typically, after multiple rounds of flow cytometric screening, the enriched low-fluorescent clones are subjected to DNA sequencing for identification of substrate peptides.

MS-based substrate identification

Unless coupled to downstream procedures, substrate phage display, CLiPS and other equivalent methods only provide the consensus substrate sequences of proteases. Therefore, mass spectrometry-based techniques have been developed that not only discover the substrate sequence but simultaneously also pinpoint the actual scissile bond in a specific protease substrate. One common feature shared by most of these methods is that they take advantage of the new N-termini generated upon cleavage (referred to as neo N-termini). Cell proteomes or proteome-derived peptide libraries are first incubated with a protease of interest, thereby creating new N-terminal peptides, which are thereafter selectively labeled to facilitate their isolation and subsequent identification by LC-MS/MS analysis (Figure 9). Efficient enrichment of the neo N-terminal peptides can be accomplished using positive selection strategies involving chemical or enzymatic biotinylation of the target peptides followed by different affinity chromatography techniques. However, approaches such as N-terminal combined fractional diagonal chromatography (COFRADIC) and terminal amine isotopic labeling of substrates (TAILS), which rely on negative selection, have also been developed for neo N-terminal peptide enrichment [94,95]. Substrate sequence determination using mass spectrometry methods is to a large extent based on single peptides, so-called “one hit wonders”. Therefore, even small limitations in the enrichment of neo N-terminal peptides or in the bioinformatic peptide identification can potentially result in false positive or missed peptide matches. Consequently, every substrate sequence derived from such methods has to be individually validated by other assays.

Figure 9. MS-based methods for protease substrate sequence determination. Proteomic libraries are digested with the protease of interest. The resulting neo N- or C-terminal peptides are enriched prior to tandem mass spectrometry analysis and subsequent bioinformatic identification of protease substrates.
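How an identified neo N-terminal peptide pinpoints the scissile bond can be sketched as a simple sequence lookup. The following is an illustration only, not the procedure of any of the cited methods: the function name, the example protein and the window size are hypothetical, and real pipelines must additionally handle ambiguous matches and natural protein N-termini.

```python
def locate_cleavage(protein, neo_n_peptide, flank=4):
    """Map a neo N-terminal peptide back onto its parent protein.

    Returns (index, non_prime, prime), where `index` is the 0-based position
    of the P1' residue, or None if the peptide is absent or starts at the
    protein's own N-terminus. Only the first occurrence is considered.
    """
    start = protein.find(neo_n_peptide)
    if start <= 0:
        return None
    non_prime = protein[max(0, start - flank):start]  # ..., P2, P1
    prime = protein[start:start + flank]              # P1', P2', ...
    return start, non_prime, prime


if __name__ == "__main__":
    # Hypothetical protein carrying a TEVp-like site (ENLYFQ|G).
    protein = "MKTAENLYFQGSHMLEK"
    print(locate_cleavage(protein, "GSHMLEK"))
```

The first residue of the neo N-terminal peptide corresponds to P1’, so the residues immediately upstream in the parent protein give the non-prime side of the cleavage site.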

In addition to the methods presented in this chapter for protease substrate sequence determination, others have also been developed [96-99]. For a brief overview please see Table 4.

Table 4. Assays applied for protease substrate sequence determination

Assay                 Sensitivity  Applicability  Execution  Comments
Synthetic substrates  +            ++             +          Subsite investigation only
Phage display         ++           ++             ++         Need purified protease
CLiPS                 ++           ++             ++         Direct quantitative analysis possible
MS-based              ++           ++             ++         One hit wonders; scissile bond
Cyclic AMP-based      +            ++             +          Low throughput
Ribosome display      +            ++             +          20 aa long substrate consensus sequence

Although substrate sequence specificity provides important clues for unraveling the specific roles of proteases in biological processes, several other questions also have to be answered in order to fully understand the complicated and tightly regulated cascades and networks that proteases are involved in. Although outside the scope of this thesis, some of these important aspects will be mentioned to illustrate the complexity of these questions. For example, it will be important to find answers to: (i) what proteases are present at a particular point in time? (ii) in which compartments/locations are they found? (iii) at what concentrations? (iv) under what conditions are they active/inactive? Obviously, there is no single technique that can answer all these questions. For example, microarrays analyzing the expression levels of “all” proteases, protease homologues and endogenous inhibitors in murine and human samples are available today, as well as methods utilizing mass spectrometry for measuring protease levels [100-102]. However, as indicated, proteases are not always active; some of them are under constant pressure from endogenous inhibitors or are produced as zymogens and await activation by other proteases. Therefore, assays targeting protease activity directly have recently been designed, involving activity-based probes (ABP) or protease signature peptides (PSP) [101]. Unquestionably, all current efforts made in the field of protease science are not only necessary but also critical to understanding protease biology and function in greater detail.

Investigations

As discussed in the last two chapters, various biological and chemically based approaches have been developed to study protease substrate specificity and activity. Although their significant contribution to protease science should definitely not be neglected, all of them suffer more or less from limitations concerning sensitivity, applicability or labor intensity. Thus, novel high-throughput protease assays allowing accurate, qualitative and quantitative analysis are welcomed.

In this part of the thesis, our efforts towards the development and application of two novel whole-cell protease assays will be presented briefly, based on Papers I-IV. Paper I describes the rationale behind the design and optimization of a novel fluorescence-based protease screening method, along with its first successful use for isolating an optimal substrate peptide or protease variant from a large excess of cells expressing suboptimal variants. After construction, the fluorescence-based assay was first used to profile the substrate specificity of the tobacco etch virus protease (Paper II), and thereafter applied together with other methods to characterize four different tobacco etch virus protease variants (Paper III). Finally, the fluorescence-based method was evolved into an antibiotic resistance-based assay, thus forming a promising novel platform for engineering and characterization of proteases (Paper IV).

From an extraordinary RNA to a novel protease assay (Paper I)

Upon halted protein biosynthesis, due to truncated mRNA molecules, bacteria bring into play a versatile RNA, known as ssrA RNA. This peculiar RNA is actively involved in the recycling of stalled ribosomes by serving both as tRNA and mRNA. First, operating as a tRNA, ssrA RNA supplies an alanine to the A site of the stalled ribosomes. Thereafter, taking the role of an mRNA, it not only assists in the elongation and release of the trapped polypeptide chains but also decides their fate by encoding an eleven-residue degradation peptide, called ssrA (AANDENYALAA), which is attached to the C-terminal end. Since several proteolytic complexes such as HflB, ClpAP, ClpXP and Tsp recognize the ssrA peptide, the subsequent degradation of ssrA-tagged polypeptides is inevitable [103,104].

Taking advantage of the above-mentioned feature of the ssrA peptide to target proteins for degradation, and earlier publications showing that the efficient degradation of ssrA-tagged green fluorescent protein (GFP) in the E. coli cytoplasm could be monitored by flow cytometry analysis [103,105], we sought to develop a convenient novel high-throughput system for characterization and engineering of proteases; preferably one that took advantage of the benefits of modern flow cytometry analysis and sorting. As described in Paper I, we created a system that utilizes genetically encoded fluorescent reporter peptides comprised of ssrA-tagged GFP with different substrate sequences (PS) positioned between the GFP and ssrA-moiety. If expressed alone in E. coli, the GFP-PS-ssrA reporter peptides are destined for rapid degradation by ClpXP. However, if a substrate specific protease, capable of removing the ssrA degradation tag through substrate processing, is coexpressed in the bacterial cytoplasm, the GFP evades degradation and gives rise to increased whole-cell fluorescence intensity that is easily detectable with a flow cytometer (Figure 10).
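The logic of the reporter can be summarized in a toy model that treats proteins as plain strings. This is a deliberately simplified sketch, not a description of the actual constructs: `<GFP>` is a placeholder rather than a real sequence, the function names are hypothetical, only an exact match to the TEVp site ENLYFQG counts as cleavable, and degradation is reduced to a yes/no rule.

```python
GFP = "<GFP>"  # placeholder for the GFP moiety, not a real sequence
SSRA_TAG = "AANDENYALAA"
TEV_SITE = "ENLYFQG"  # TEVp cleaves between Q and G


def build_reporter(substrate):
    """Assemble a GFP-PS-ssrA reporter as a plain string."""
    return GFP + substrate + SSRA_TAG


def process(reporter, site=TEV_SITE, cut_offset=6):
    """Return the GFP-containing product after protease processing.

    `cut_offset` is the number of site residues left on the N-terminal
    fragment (6 for ENLYFQ|G). A reporter without the site is returned
    unchanged.
    """
    index = reporter.find(site)
    if index == -1:
        return reporter
    return reporter[:index + cut_offset]


def is_degraded(protein):
    """Toy rule: proteins ending in the ssrA tag are degraded by ClpXP."""
    return protein.endswith(SSRA_TAG)


if __name__ == "__main__":
    for substrate in ("ENLYFQG", ""):
        reporter = build_reporter(substrate)
        print(repr(substrate), "rescued:", not is_degraded(process(reporter)))
```

In this model the reporter is rescued only when it carries a processable substrate and the protease is present; every other combination retains the C-terminal ssrA tag and is degraded, which mirrors the fluorescent and non-fluorescent cell populations described above.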

In initial experiments, reporter constructs either lacking any substrate peptide or containing the tobacco etch virus protease (TEVp) recognition sequence ENLYFQ↓G were expressed in E. coli cells alongside TEVp. Flow cytometry analysis revealed that cells harboring TEVp and the reporter construct with a processable substrate exhibited higher whole-cell fluorescence intensity than cells expressing the substrate-less reporter peptide (Figure 11).

Figure 10. Schematic overview of the fluorescence-based system. A reporter substrate, consisting of GFP fused to a protease recognition site with an ssrA degradation tag at the C-terminus (I), and a target protease (II) are coexpressed in E. coli cells. Upper section: A functional protease cleaves off the ssrA degradation tag and rescues GFP from ClpXP-mediated degradation, thus conferring increased whole-cell fluorescence. Lower section: On the contrary, in the absence of a functional protease, the whole-cell fluorescence intensity is low due to the inevitable degradation of the reporter construct by ClpXP.

With the intention of refining our method to better distinguish cells expressing functional substrate-protease combinations, the expression levels of the reporter peptides and TEVp were varied. For the same purpose, an engineered ssrA peptide containing an extra pair of asparagine and tyrosine residues (AANDENYNYALAA) with enhanced degradation efficiency [106] was also evaluated for use in our system, instead of the normal degradation tag. Both strategies proved to have a positive effect on the resolution between the positive and negative cells. After implementing the improved ssrA-tag together with the most productive conditions for TEVp and reporter peptide expression, our methodology made it possible not only to detect subtle differences in the processing efficiency of closely related TEVp substrate peptides (Paper I & Paper II) but also to discriminate TEVp variants according to their in vivo solubility (Figure 11). Additionally, flow cytometric analysis and sorting allowed for enrichment and subsequent identification of an optimal substrate peptide and protease variant from a large excess of cells expressing suboptimal variants (1:100,000). Two rounds of cell sorting resulted in a 69,000-fold and a 22,000-fold enrichment of the superior substrate peptide and protease variant, respectively.
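To put the enrichment figures into perspective, the relation between fold enrichment and the composition of the sorted population can be written out. The sketch below assumes that fold enrichment is defined as the ratio of the positive-cell fraction after sorting to that before sorting; the text does not state which definition was used, so the numbers it yields are illustrative rather than reported results. The function names are hypothetical.

```python
def enrichment_factor(fraction_before, fraction_after):
    """Fold enrichment, defined here as the ratio of positive-cell fractions."""
    if fraction_before <= 0:
        raise ValueError("fraction_before must be positive")
    return fraction_after / fraction_before


def fraction_after_sorting(fraction_before, fold_enrichment):
    """Positive-cell fraction implied by a given fold enrichment (capped at 1)."""
    return min(1.0, fraction_before * fold_enrichment)


if __name__ == "__main__":
    start = 1 / 100_000  # the 1:100,000 spiking ratio, treated as a fraction
    for label, fold in (("substrate", 69_000), ("protease", 22_000)):
        print(label, round(fraction_after_sorting(start, fold), 2))
```

Under this assumed definition, a 69,000-fold and a 22,000-fold enrichment from a 1:100,000 starting mixture would correspond to sorted populations in which roughly 0.69 and 0.22 of the cells, respectively, carry the superior variant.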

Figure 11. (I) Flow cytometry analysis of E. coli cells that coexpress TEVp and different reporter constructs, 2.5 h after induction (0.1 mM IPTG, 0.2% arabinose). Three reporter constructs contained closely related TEVp substrate sequences that only differed in the P1’ position (in bold). They are represented by the histograms shown in green (subG = ENLYFQG), purple (subV = ENLYFQV) and blue (subP = ENLYFQP). Cells coexpressing TEVp with the negative control GFP-ssrA (red) or the positive control GFP-subG (dark green) are also shown. (II) Flow cytometry analysis of E. coli cells that express different TEVp variants, one soluble (green) or one aggregation-prone (blue), together with the reporter GFP-subG-ssrA. Cells coexpressing TEVp and GFP-ssrA (red) served as negative control.

Substrate profiling of tobacco etch virus protease using a novel protease assay (Paper II)

Encouraged by the positive results presented in Paper I, we decided to apply our system to identify the substrate sequence specificity of a protease through screening of large combinatorial substrate libraries. The choice of protease fell on TEVp, not only because it was already incorporated in our system but also because its sequence specificity had until recently been examined merely by either comparing natural TEVp substrate sequences or analyzing a limited number of substrate peptide variants through ordinary cleavage reactions [107]. Thus, in Paper II, we first created and incorporated three different combinatorial substrate peptide libraries in our novel fluorescence-based protease assay. This was done through PCR-mediated randomization of the reporter construct GFP-ENLYFQG-ssrA, by utilizing oligonucleotides containing degenerate codons (NNK) at relevant positions of the TEVp recognition sequence. Since positions P6, P3, P1 and P1’ have been considered the most important specificity determinants [107], they were targeted both in library 1 (XNLXFX↓G) and library 2 (XNLXFX↓X), whereas library 3 encoded substrate peptide variants randomized in position P1’ along with the less critical positions P5, P4 and P2 (EXXYXQ↓X). Subsequent comparison of the theoretical and actual sizes of the constructed libraries using the statistical analysis tool GLUE [108] showed that they should contain all possible substrate variants with more than 99% likelihood.
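To make the library arithmetic concrete, the following sketch enumerates the NNK codons, counts the DNA-level variants for three or four randomized positions, and estimates how many clones would be needed for all variants to be present with a given probability. It assumes that every codon is equally likely and uses a simple Poisson approximation; it is not the GLUE program, whose implementation is not reproduced here, and the function names are illustrative.

```python
import math

# Standard genetic code, indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# NNK: N = any base, K = G or T.
NNK_CODONS = [b1 + b2 + b3 for b1 in BASES for b2 in BASES for b3 in "GT"]
ENCODED = {CODON_TABLE[codon] for codon in NNK_CODONS}


def dna_variants(randomized_positions: int) -> int:
    """Number of distinct NNK codon combinations."""
    return len(NNK_CODONS) ** randomized_positions


def clones_for_full_coverage(variants: int, probability: float) -> float:
    """Clones L such that (1 - exp(-L/V))**V equals the requested probability."""
    per_variant_miss = -math.expm1(math.log(probability) / variants)
    return -variants * math.log(per_variant_miss)


print(f"NNK codons: {len(NNK_CODONS)}, amino acids: {len(ENCODED - {'*'})}")
for name, positions in [("library 1", 3), ("library 2", 4), ("library 3", 4)]:
    v = dna_variants(positions)
    needed = clones_for_full_coverage(v, 0.99)
    print(f"{name}: {v:,} DNA variants, ~{needed:,.0f} clones for 99% full coverage")
```

The number of randomized positions per library is read off the patterns XNLXFXG, XNLXFXX and EXXYXQX given in the text.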

Unfortunately, the NNK codon used for the substrate library construction, besides encoding all 20 natural amino acids, also codes for the amber stop codon (UAG), which upon incorporation into our reporter peptides hinders full translation of the degradation tag and thereby gives rise to false-positive clones. In order to circumvent this problem, we initially tried E. coli strains that express amber stop codon suppressor tRNAs, but this strategy did not work to full satisfaction, due to insufficient suppression efficiency. Instead, the strategy that eventually proved viable was to deplete the created libraries of contaminating false-positive clones by collecting non-fluorescent clones through two initial rounds of flow cytometry analysis and sorting, without initiating TEVp expression. After the pre-screening procedure, cells now co-expressing the substrate libraries and TEVp were subjected to two additional sorting rounds. However, during this secondary screening phase, it was cells exhibiting high fluorescence intensity that were isolated in hopes of identifying optimal TEVp substrate sequences. Regardless of the library used, subsequent DNA sequencing of randomly picked clones from the enriched libraries revealed a strong consensus substrate sequence, matching the protease’s natural substrate peptide ENLYFQG.
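The scale of the stop-codon problem that motivated the pre-screening step can be gauged with a short calculation. The sketch below assumes equal nucleotide frequencies at every randomized position, so that one in 32 NNK codons is the amber codon TAG; the actual proportion of false-positive clones in the libraries is not stated in the text, and the function name is illustrative.

```python
from fractions import Fraction

NNK_CODONS = 32       # 4 x 4 x 2 possible NNK codons
NNK_STOP_CODONS = 1   # only TAG (amber) matches the NNK pattern


def fraction_with_stop(randomized_positions: int) -> Fraction:
    """Expected fraction of clones carrying at least one amber codon."""
    p_no_stop = Fraction(NNK_CODONS - NNK_STOP_CODONS, NNK_CODONS) ** randomized_positions
    return 1 - p_no_stop


# Randomized positions read off the library patterns given in the text.
LIBRARIES = {"library 1 (XNLXFXG)": 3, "library 2 (XNLXFXX)": 4, "library 3 (EXXYXQX)": 4}

for name, positions in LIBRARIES.items():
    share = fraction_with_stop(positions)
    print(f"{name}: {float(share):.1%} of clones expected to contain a stop codon")
```

Under these assumptions, roughly one clone in ten would lack the degradation tag, which is large compared with the frequency of genuinely well-processed substrates in a naive library.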

Table 5. Incidence and in vivo processing efficiency of peptides that emerged from substrate library screenings.

Sequence (1)           Freq. (2)   Perc. (3)   MFI (4)

TEVwt: ENLYFQG

Library 1 (XNLXFXG)
  ENLYFQG                 73          41         501
  DNLYFQG                 42          23         403
  GNLYFQG                 13           7         160
  YNLYFQG                  9           5         152
  MNLYFQG                  9           5         150
  WNLYFQG                  7           4         120
  CNLYFQG                  7           4         165
  RNLYFQG                  4           2         103
  LNLYFQG                  4           2          61

Library 2 (XNLXFXX)
  ENLYFQG                143          80         500
  RNLYFQS                 14           8         271
  ANLYFQG                  4           2         313
  QNLIFQG                  2           1         230
  RNLYFQC                  2           1         200

Library 3 (EXXYXQX)
  ENLYFQG                121          67         512
  ECLYHQG                  4           2         311
  ERLYVQM                  3           2         227
  ESEYCQE                  2           1         209
  EVMYSQA                  2           1         178
  EFLYIQD                  1          <1         139
  ERGYGQV                  1          <1         109
  EVWYCQC                  1          <1         107
  EVAYGQK                  1          <1          79
  ESRYVQS                  1          <1          68
  EGEYWQR                  1          <1          57
  ESNYGQM                  1          <1          52

Neg. ctrl                                         10
Pos. ctrl                                       1075

1. Amino acid sequences
2. Incidence of a particular clone after screening of library 1, 2 and 3
3. Frequency percentage of clonal occurrence
4. Mean fluorescence intensity (au)
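One way to compare the mean fluorescence intensities in Table 5 is to place them on a scale between the two controls. The sketch below does this for the library 1 clones; the normalization (0 for the negative control, 1 for the positive control) is an illustrative assumption and is not a quantity reported in the thesis.

```python
# MFI values copied from Table 5 (arbitrary units).
NEG_CTRL_MFI = 10
POS_CTRL_MFI = 1075

LIBRARY_1_MFI = {
    "ENLYFQG": 501, "DNLYFQG": 403, "GNLYFQG": 160,
    "YNLYFQG": 152, "MNLYFQG": 150, "WNLYFQG": 120,
    "CNLYFQG": 165, "RNLYFQG": 103, "LNLYFQG": 61,
}


def relative_signal(mfi: float, neg: float = NEG_CTRL_MFI, pos: float = POS_CTRL_MFI) -> float:
    """Rescale an MFI so that the negative control is 0 and the positive control is 1."""
    return (mfi - neg) / (pos - neg)


# Rank the clones by fluorescence, highest first.
ranked = sorted(LIBRARY_1_MFI.items(), key=lambda item: item[1], reverse=True)
for sequence, mfi in ranked:
    print(f"{sequence}  MFI {mfi:>4}  relative signal {relative_signal(mfi):.2f}")
```

On this assumed scale the wild-type sequence ENLYFQG ranks first among the library 1 clones, consistent with the statement that no variant outperformed the native substrate.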

Moreover, we did not identify any other substrate peptide that was processed more efficiently than the native substrate peptide, which suggests co-evolution of this particular protease-substrate combination. Our results in general agreed well with earlier studies regarding the amino acid preference in the critical positions P3, P1 and P1’ of the TEVp recognition sequence. In position P6, though, it was surprising to observe a relatively large variety of residues in functional substrates, especially as the general view in previous publications by other groups seems to be that this position requires a glutamate (E) for efficient substrate-protease interaction. The fact that TEVp appears to have a larger potential substrate repertoire than previously known may have an impact on its use in biomedical research and industry, where its popularity is based on its high sequence specificity and activity (Table 5).

Characterization of TEV protease mutants engineered for improved solubility (Paper III)

Among reagents presently used for removal of fusion tags from recombinantly produced proteins, TEVp has been one of the most preferred alternatives due to its stringent specificity. TEVp unfortunately suffers from reduced solubility and its expression in E. coli has been rather problematic. Although different strategies have been adopted to bypass these limitations, novel enhanced TEVp variants are still needed.

Two efforts for improving the function of TEVp were reported recently [109,110]. According to these studies, introducing the mutations L56V/S135G in TEVp confers improved in vitro activity and solubility, while the mutations T17S/N68D/I77V increase the amount of soluble protease upon overexpression in E. coli. In Paper III, all five above-mentioned mutations were combined into one protease molecule with the intention of creating a novel TEVp mutant with superior solubility and activity compared to both parental variants as well as the wild-type TEVp. In order to analyze the different protease variants, all were subjected to the GFP-based method presented in Paper I, as well as other established methods for assaying expression levels, solubility and activity. Collectively, our results indicated that the most beneficial substitutions are L56V/S135G, promoting enhanced activity and solubility both in vitro and in vivo. On the contrary, in our hands the substitutions T17S/N68D/I77V, quite surprisingly, had a negative effect on both the solubility and activity of every protease variant they were introduced into. Hence, our vision of creating an exceptional TEVp variant with extraordinary activity and solubility by introducing the mutations T17S/L56V/N68D/S135G/I77V could not be realized.

Evolving a current protease assay (Paper IV)

Although the potential of the fluorescence-assisted whole-cell assay has been demonstrated in Papers I, II and III, the fact that it relies on flow cytometry entails certain limitations. For instance, when the sizes of combinatorial libraries of proteases or substrate peptides to be screened approach the magnitude of 10^9 library members, flow cytometric analysis and sorting become time consuming and therefore impractical. Furthermore, the availability of high-throughput cell sorters is restricted to laboratories with good financial support, thereby limiting the widespread application of the fluorescence-based method.
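The throughput limitation can be illustrated with a rough estimate of instrument time per sorting round. The event rate and the oversampling factor in the sketch below are hypothetical values chosen only for illustration; neither figure is given in the text, and real instruments and protocols may differ considerably.

```python
def sorting_hours(library_size: float, events_per_second: float, oversampling: float = 1.0) -> float:
    """Hours needed to pass library_size * oversampling cells through the sorter."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return library_size * oversampling / events_per_second / 3600


ASSUMED_EVENTS_PER_SECOND = 20_000  # hypothetical sorter speed
ASSUMED_OVERSAMPLING = 10           # hypothetical library coverage factor

for exponent in (6, 7, 8, 9):
    hours = sorting_hours(10 ** exponent, ASSUMED_EVENTS_PER_SECOND, ASSUMED_OVERSAMPLING)
    print(f"10^{exponent} members: {hours:,.1f} h per sorting round")
```

With these assumed numbers, each tenfold increase in library size adds a tenfold increase in sorting time, which is why selection by growth becomes attractive for very large libraries.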

For the above-mentioned reasons, we designed a modified variant of the original fluorescence-based system, which should enable detection of site-specific proteolysis by monitoring growth on selective media rather than measuring fluorescence with a flow cytometer. This was achieved simply by replacing the GFP moiety of the previously created reporter constructs with the antibiotic resistance enzyme, chloramphenicol acetyltransferase (CAT) (Figure 12). Using this new approach, cells expressing TEVp and reporter peptides with or without a cleavage site could easily be discriminated, based on their propensity to survive on agar plates supplemented with chloramphenicol (Figure 12).

Figure 12. Antibiotic resistance-based assay for site-specific proteolysis. (I) E. coli cells coexpressing the target protease and a proteolysis reporter containing a substrate peptide. Upper part: If the protease cleaves the substrate peptide, the enzyme chloramphenicol acetyltransferase (CAT) will be released from the ssrA degradation tag and escape degradation by intracellular proteases such as ClpXP. The cell will therefore gain resistance to chloramphenicol (Cml) and consequently be able to grow in media supplemented with Cml. Lower part: In cells where no or inefficient substrate cleavage occurs, the whole reporter complex is destroyed and cannot support cell growth in Cml-supplemented media. (II) E. coli cells expressing TEVp together with the negative control CAT-ssrA (left), CAT-subG-ssrA (middle) or the positive control CAT-subG (right) were plated on LB agar plates supplemented with 20 µg/ml Cml, 0.1 mM IPTG and 0.8% L(+)-arabinose.

In fact, additional experiments also showed that the processing efficiency of different substrate sequences by TEVp was reflected in the growth rates of the cells harboring them, i.e., the growth rate correlated with the substrate processing efficiency (Figure 13). In a model experiment, we also showed that the system enabled the enrichment of the best substrate-protease combination from a background of less-efficient combinations, through selection based on competitive growth in liquid culture (Figure 14).
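Growth curves of the kind described above are commonly summarized by a specific growth rate obtained from the slope of ln(OD600) against time. The sketch below shows such a fit by ordinary least squares; the OD600 readings are hypothetical and are not data from Figure 13, and the reporter names are placeholders.

```python
import math


def specific_growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) versus time, in 1/h."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired measurements")
    log_od = [math.log(value) for value in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_od) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_od))
    return sxy / sxx


def doubling_time_h(rate_per_h):
    """Doubling time corresponding to an exponential growth rate."""
    return math.log(2) / rate_per_h


# Hypothetical hourly OD600 readings for two placeholder reporters.
TIMES_H = [0, 1, 2, 3, 4]
HYPOTHETICAL_CURVES = {
    "reporter A": [0.05, 0.10, 0.20, 0.40, 0.80],
    "reporter B": [0.05, 0.07, 0.10, 0.14, 0.20],
}

for name, readings in HYPOTHETICAL_CURVES.items():
    mu = specific_growth_rate(TIMES_H, readings)
    print(f"{name}: mu = {mu:.2f} 1/h, doubling time = {doubling_time_h(mu):.1f} h")
```

The fit is only meaningful over the exponential part of a curve, so in practice the time window would have to be chosen accordingly.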

Figure 13. Cells harboring the TEVp expression vector as well as one of the reporter vectors pCAT-subG-ssrA (substrate: ENLYFQG; blue diamonds), pCAT-subV-ssrA (substrate: ENLYFQV; red squares) or pCAT-subP-ssrA (substrate: ENLYFQP; green triangles), or the negative control pCAT-ssrA (blue squares), were grown to early log phase before TEVp and the reporters were induced sequentially. After 2 hours of phenotyping, the cells were inoculated into fresh LB broth supplemented with the inducers (0.1 mM IPTG and 0.2% L(+)-arabinose) and the selective agent Cml (20 µg/ml). Growth was followed over time by measuring OD600.

Although the CAT-based protease assay proved to be a promising selection strategy for characterization and engineering of proteases and their substrates, it does not possess the same quality of providing simple quantification of the processing efficiency as the GFP-based method. Nevertheless, thanks to the design of the reporter constructs used in both systems (i.e., universal restriction sites) it is straightforward to switch between them, by simple cloning procedures, to take full advantage of the specific benefits of each system.

Figure 14. Enrichment of cells expressing an efficiently processed substrate from a mixture of cells expressing less-efficient substrates. Cells coexpressing TEVp and either CAT-subG-ssrA (green) or CAT-subP-ssrA (blue) or CAT-ssrA (red) were mixed at an approximate ratio of 1:2:2 before being inoculated into selective media supplemented with 20 µg/ml Cml, 0.1 mM IPTG and 0.2% L(+)-arabinose. Shown are the distributions between cells expressing the different substrate peptides during the selection (as determined by DNA sequencing).
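The competitive-growth selection summarized in Figure 14 can be illustrated with a minimal exponential-growth model. The starting ratio of 1:2:2 is taken from the caption, whereas the growth rates below are hypothetical values chosen only to reflect the qualitative ordering described in the text; they are not measurements.

```python
import math


def clone_fractions(initial_counts, growth_rates, hours):
    """Population shares after exponential growth with clone-specific rates."""
    grown = {
        name: initial_counts[name] * math.exp(growth_rates[name] * hours)
        for name in initial_counts
    }
    total = sum(grown.values())
    return {name: amount / total for name, amount in grown.items()}


# Starting ratio from the Figure 14 caption.
INITIAL_RATIO = {"CAT-subG-ssrA": 1, "CAT-subP-ssrA": 2, "CAT-ssrA": 2}
# Hypothetical growth rates in 1/h under chloramphenicol selection.
ASSUMED_RATES = {"CAT-subG-ssrA": 0.8, "CAT-subP-ssrA": 0.4, "CAT-ssrA": 0.1}

for hours in (0, 5, 10):
    shares = clone_fractions(INITIAL_RATIO, ASSUMED_RATES, hours)
    summary = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
    print(f"t = {hours:>2} h: {summary}")
```

Even a modest growth advantage compounds over time, so the clone with the best-processed substrate comes to dominate the culture although it starts as the minority.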

Reflections

The undisputed significance of proteolysis in medical as well as industrial applications has made proteases justified targets of vast scientific focus over the past decades. Enormous efforts have been and still are being made all over the world, in small and larger scale projects, not only to elucidate the biological role of proteases but also to gain information on their mechanism of action. In addition, due to the huge demand for novel protease variants with desired characteristics for various applications, protease engineering has rapidly emerged as a scientific area of great importance.

With the vision of contributing to the field of protease science, we developed a platform of novel protease assays, one fluorescence-based and one antibiotic resistance-based. In contrast to many other equivalent methodologies, our platform does not require the production and subsequent purification of the target protease and/or synthetic peptides, which can be rather elaborate and tedious. Additionally, its high sensitivity and adequate dynamic range provide direct quantitative analysis of site-specific proteolysis, which along with the platform’s distinct genotype-phenotype link makes it attractive for characterization and engineering of proteases and their substrates.

Unfortunately, when applied for substrate profiling, our assays do not determine the scissile bond within the substrate sequences. Although this can be an issue when investigating less characterized proteases, it can be solved through in vitro cleavage of synthetic peptide substrates and subsequent identification of the cleavage products (e.g., by mass spectrometry). However, with some optimization, it should be feasible to integrate a more “direct” MS-based identification of the exact cleavage site into our system. One possible approach would be to extract the intracellular protein content of isolated clones expressing functional substrate-protease combinations and thereafter specifically capture the corresponding cleaved reporter peptides before subjecting them to MS (or MS-MS) analysis to reveal the scissile bond.

Depending on the nature of the protease investigation that our platform is applied to, different randomly mutated protease or substrate peptide libraries will most likely often be analyzed. Even though the presence of stop codons in protease mutant libraries does not hamper the isolation of desired protease variants, it is detrimental in substrate sequence libraries, since some of the reporter peptides will lack the degradation tag and thereby give rise to false-positive library members, which complicates the analysis. One way to avoid this problem is to use trinucleotide phosphoramidite chemistry (trimer codons) for the construction of mutated substrate libraries [111]. Another plausible strategy would be to use an N-terminal degradation tag in the reporter constructs instead of the C-terminal ssrA tag that is presently used. In such a case, a stop codon within the substrate peptide region will of course again result in a premature stop, but now without any harmful effect. This is because the reporter protein in this instance will not be expressed at all and, consequently, the clone will score as negative.
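The argument about tag placement can be expressed as a small decision model. The sketch below treats a construct as an ordered list of domains and assumes that a premature stop in one domain prevents translation of that domain and everything downstream of it; the domain names, layouts and function names are illustrative simplifications, and the N-terminal tag layout is the hypothetical design discussed above, not a tested construct.

```python
def translated_domains(layout, stop_in=None):
    """Domains fully translated before a premature stop in the domain `stop_in`."""
    if stop_in is None:
        return list(layout)
    return list(layout[: layout.index(stop_in)])


def phenotype_without_protease(layout, stop_in=None):
    """Expected readout of a clone when no protease cleavage takes place."""
    parts = translated_domains(layout, stop_in)
    if "reporter" not in parts:
        return "negative (no reporter made)"
    if "degradation_tag" in parts:
        return "negative (reporter degraded)"
    return "false positive (stable untagged reporter)"


# Domain order from N-terminus to C-terminus.
C_TERMINAL_TAG = ["reporter", "substrate", "degradation_tag"]  # present design
N_TERMINAL_TAG = ["degradation_tag", "substrate", "reporter"]  # hypothetical design

for name, layout in [("C-terminal tag", C_TERMINAL_TAG), ("N-terminal tag", N_TERMINAL_TAG)]:
    outcome = phenotype_without_protease(layout, stop_in="substrate")
    print(f"{name}, stop codon in substrate: {outcome}")
```

In this simplified model a stop codon in the substrate region yields a false positive only when the tag sits downstream of the reporter, which is the reasoning given in the paragraph above.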

A prerequisite for the successful utilization of our platform is that the target protease and substrate are functional within the bacterial cytoplasm. For that reason, proteases that require posttranslational modifications not offered in E. coli for conversion into their active form would be problematic to employ in our system. However, recent studies have shown that the ClpXP degradation machinery can be stably integrated into the genome of Saccharomyces cerevisiae, where it confers on yeast the ability to degrade ssrA-tagged GFP with high efficiency [112]. Thus, adapting our assays to ClpXP-engineered yeast host strains, or perhaps even human cell lines (e.g., HEK293), would be a significant improvement and expand their application to a wider range of proteases. Nevertheless, proteases attacking proteins critical for the viability of the expression hosts will still be challenging to investigate.

Considering what has been presented in this thesis, our platform presents a new path forward for high-throughput substrate profiling of proteases, engineering of novel protease variants and identification of protease inhibitors, which all are areas of great biological, biotechnical and medical interest.

Abbreviations

aa          amino acid
ACE         Angiotensin-Converting Enzyme
APC         Activated Protein C
BODIPY      boron-dipyromethene
CAT         Chloramphenicol AcetylTransferase
CLiPS       Cellular Libraries of Peptide Substrates
Cml         Chloramphenicol
COFRADIC    Combined Fractional Diagonal Chromatography
CPX         Circular Permutated protein X
DDP         DipeptiDyl Peptidase
DNA         Deoxyribonucleic Acid
FRET        Förster Resonance Energy Transfer
GFP         Green Fluorescent Protein
IPTG        IsoPropyl β-D-1-ThioGalactopyranoside
LB          Luria Bertani
MMP         Matrix MetalloProteinase
MS          Mass Spectrometry
OD          Optical Density
Omp         Outer Membrane Protein
PCR         Polymerase Chain Reaction
PDB         Protein Data Bank
PNA         Protein-Nucleic Acid
PS-SCL      Positional Scanning Synthetic Combinatorial Libraries
RNA         Ribonucleic Acid
TAILS       Terminal Amine Isotopic Labeling of Substrates
TEV         Tobacco Etch Virus
TMR         TetraMethylRhodamine
TNK-tPA     TNK-tissue Plasminogen Activator
X-gal       bromo-chloro-indolyl-galactopyranoside

Acknowledgements

It has been a long journey… A journey full of surprises, joy, E. coli, tears, riddles, GFP, research, ssrA, knowledge…

That almost sounded like something Mikael Persbrandt might say in his acceptance speech, should he ever win an Oscar for a role in a film titled “Inevitable Degradation”. I will thank everyone in my own way instead!

Patrik, my supervisor and good friend, you have always stood by my side through both setbacks and successes. Time has passed quickly, but now we have arrived. Thank you for everything! Some day I will add up all the hours we have spent together. Perhaps it will soon also be time for us to smoke the cigar that is in your cabinet.

Stefan, my main supervisor, your cheerful mood and good advice have kept me going to the very end. We must find the postman on Gotland so that he does not steal our research results.

Per-Åke Jr, thank you for making me think in entirely different dimensions when it comes to research and presentation technique. You have always been there when I had questions and taken the time to answer them.

Thanks to all the PIs for creating DNA corner!

Thanks to all my office mates over the years, John, Emilie, Sara, Erik F, Torun, Björn, Per-Åke, Patrik, Calle, Nima, Ronald, Hanna, Tove A, Lisa and Francis, for all the Greek parties, “Kycklingarna”, coffee breaks, stock advice, horse-racing tips, good company and, of course, the scientific discussions.

Without my diploma workers and project students, David, Ramki, Sandra, Thomas, Lisa, Andreas and Mohammed, there would have been even more to do, to say the least. Thank you for the incredible work you have put into all the projects you have been involved in.

The Old School Mellanstadiet (Olof, Björn, Erik F) featuring Toris and Maximus. Thank you for making me feel at home in the lab from the very beginning. With good music come fantastic research results!

The New School Mellanstadiet (Lisa, Feifan, Sebastian). What a team we have been these past years! Thank you for all the Westerns, cleavages, colonies… Everything!

Carlito, my lab brother, who thankfully has turned me into an Apple and Xbox maniac. Thank you so much for everything! The lab course, the thesis, cloning, achievements… It will soon be Friday! But perhaps Tuesday works too?

John, how many miles have you had to run back and forth to the FACS room to help me? Thank you so much for all the help with the research and the pep talks ahead of the thesis writing.

What are Wild Dagga and PCA, really? Per-Åke Sr, you have been an incredible source of wisdom in everything to do with plants, construction and biotechnology. I also want to thank you for a good collaboration in the lab and for introducing me to the magical e-cigar.

Mos & Steel. Thank you for all the wonderful moments with beer, food and plenty of s%#” talk.

For our grand analyses by the autoclave of, among other things, love, Tiger Woods, Australia, skiing, tennis… Thank you, Emma Ö!

Thank you, Bahram, for all the fantastic sequences over all these years. I did notice, however, that they got better after I had sponsored Anahita's class trip...

For giving me the chance to be part of a fantastic Christmas film, and for other great moments, I want to thank EP & Tomten (Rockis).

Jojje’s Angels (Basia, Lisa, Julia, Feifan, Hanna, Elzbieta), who did an incredible job with the lab course in Molecular Biotechnology last winter. Thank you, Angels!

Thanks to Proteinfabriken and Molbio in HPR for all the tips on how best to purify proteins, make competent cells, clone, cook dumplings…

I also want to thank the coolest IT support, Eric B and Kaj! Cheers!

To everyone on floor 3, a big thank you for all the help and for letting me be part of the world's best lab!

I also want to thank FC Björnligan, who have kept me in incredible physical shape during my time as a PhD student. FORZA LIGAN!

Last but not least, I want to thank my whole big family for all the support I have received over all these years in the form of good food and encouragement. Now it is finally time for the thesis defense and a bit of a party!

Yeees… How could I forget to thank you, Martina. You have been there for me throughout this whole PhD roller coaster. Living with a zombie, which is what I have been for about half a year, cannot be easy. I love you!

References

1. Encyclopedia Britannica Online www.britannica.com

2. Beadle GW, Tatum EL (1941) Genetic Control of Biochemical Reactions in Neurospora. Proc Natl Acad Sci USA 27: 499-506.

3. Avery OT, Macleod CM, McCarty M (1944) Studies on the chemical nature of the substance inducing transformation of pneumococcal types: induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J Exp Med 79: 137-158.

4. Hershey AD, Chase M (1952) Independent functions of viral protein and nucleic acid in growth of bacteriophage. J Gen Physiol 36: 39-56.

5. Watson JD, Crick FH (1953) Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171: 737-738.

6. Wilkins MHF, Stokes AR, Wilson HR (1953) Molecular structure of deoxypentose nucleic acids. Nature 171: 738-740.

7. Franklin RE, Gosling RG (1953) Molecular configuration in sodium thymonucleate. Nature 171: 740-741.

8. Brenner S, Jacob F, Meselson M (1961) An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature 190: 576-581.

9. Hoagland MB, Zamecnik PC, Stephenson ML (1957) Intermediate reactions in protein biosynthesis. Biochim Biophys Acta 24: 215-216.

10. Morgan AR, Wells RD, Khorana HG (1966) Studies on polynucleotides, LIX. Further codon assignments from amino acid incorporations directed by ribopolynucleotides containing repeating trinucleotide sequences. Proc Natl Acad Sci USA 56: 1899-1906.

11. Crick FH (1958) On protein synthesis. Symp Soc Exp Biol 12: 138-163.

12. Crick F (1970) Central dogma of molecular biology. Nature 227: 561-563.

13. Bessman MJ, Lehman IR, Simms ES, Kornberg A (1958) Enzymatic synthesis of deoxyribonucleic acid. II. General properties of the reaction. J Biol Chem 233: 171-177.

14. Lehman IR, Bessman MJ, Simms ES, Kornberg A (1958) Enzymatic synthesis of deoxyribonucleic acid. I. Preparation of substrates and partial purification of an enzyme from Escherichia coli. J Biol Chem 233: 163-170.

15. Weiss B, Jacquemin-Sablon A, Live TR, Fareed GC, Richardson CC (1968) Enzymatic breakage and joining of deoxyribonucleic acid. VI. Further purification and properties of polynucleotide ligase from Escherichia coli infected with bacteriophage T4. J Biol Chem 243: 4543-4555.

16. Jackson DA, Symons RH, Berg P (1972) Biochemical method for inserting new genetic information into DNA of Simian Virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc Natl Acad Sci USA 69: 2904-2909.

17. Lobban PE, Kaiser AD (1973) Enzymatic end-to-end joining of DNA molecules. J Mol Biol 78: 453-471.

18. Smith HO, Wilcox KW (1970) A restriction enzyme from Hemophilus influenzae. I. Purification and general properties. J Mol Biol 51: 379-391.

19. Cohen SN, Chang AC, Boyer HW, Helling RB (1973) Construction of biologically functional bacterial plasmids in vitro. Proc Natl Acad Sci USA 70: 3240-3244.

20. Morrow JF, Cohen SN, Chang AC, Boyer HW, Goodman HM, et al. (1974) Replication and transcription of eukaryotic DNA in Escherichia coli. Proc Natl Acad Sci USA 71: 1743-1747.

21. Genentech press release (1978) http://www.gene.com/gene/news/press-releases/display.do?method=detail&id=4160

22. Gibson DG, Glass JI, Lartigue C, Noskov VN, Chuang R-Y, et al. (2010) Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329: 52-56.

23. Whitford D (2005) Proteins: structure and function. ISBN-13: 978-0-471-49894-0 Wiley 528 p.

24. Sumner J (1926) The isolation and crystallization of the enzyme urease. J Biol Chem 69: 435-441.

25. Human Protein Atlas www.proteinatlas.org

26. Structural Genomics Consortium www.sgc.ki.se

27. Kurisu G, Kinoshita T, Sugimoto A, Nagara A, Kai Y, et al. (1997) Structure of the zinc endoprotease from Streptomyces caespitosus. J Biochem 121: 304-308.

28. Bertenshaw GP, Norcum MT, Bond JS (2003) Structure of homo- and hetero-oligomeric meprin metalloproteases. Dimers, tetramers, and high molecular mass multimers. J Biol Chem 278: 2522-2532.

29. Aehle W (2007) Enzymes in industry: production and applications. ISBN-13: 978-3-527-31689-2 Wiley 489 p.

30. Rawlings ND, Barrett AJ, Bateman A (2010) MEROPS: the peptidase database. Nucleic Acids Res 38: D227-233.

31. López-Otín C, Bond JS (2008) Proteases: multifunctional enzymes in life and disease. J Biol Chem 283: 30433-30437.

32. Miyoshi S, Shinoda S (2000) Microbial metalloproteases and pathogenesis. Microbes Infect 2: 91-98.

33. Sheahan K-L, Cordero CL, Satchell KJF (2007) Autoprocessing of the Vibrio cholerae RTX toxin by the cysteine protease domain. EMBO J 26: 2552-2561.

34. Freije JMP, Balbín M, Pendás AM, Sánchez LM, Puente XS, et al. (2003) Matrix metalloproteinases and tumor progression. Adv Exp Med Biol 532: 91-107.

35. Murphy G, Nagase H (2008) Reappraising metalloproteinases in rheumatoid arthritis and osteoarthritis: destruction or repair? Nat Clin Pract Rheumatol 4: 128-135.

36. Espada J, Varela I, Flores I, Ugalde AP, Cadiñanos J, et al. (2008) Nuclear envelope defects cause stem cell dysfunction in premature-aging mice. J Cell Biol 181: 27-35.

37. Nalivaeva NN, Fisk LR, Belyaev ND, Turner AJ (2008) Amyloid-degrading enzymes as therapeutic targets in Alzheimer's disease. Curr Alzheimer Res 5: 212-224.

38. Dollery CM, Libby P (2006) Atherosclerosis and proteinase activation. Cardiovasc Res 69: 625-635.

39. Mohamed MM, Sloane BF (2006) Cysteine cathepsins: multifunctional enzymes in cancer. Nat Rev Cancer 6: 764-775.

40. Quesada V, Ordóñez GR, Sánchez LM, Puente XS, López-Otín C (2009) The Degradome database: mammalian proteases and diseases of proteolysis. Nucleic Acids Res 37: D239-243.

41. Kohl NE, Emini EA, Schleif WA, Davis LJ, Heimbach JC, et al. (1988) Active

human immunodeficiency virus protease is required for viral infectivity. Proc Natl Acad Sci USA 85: 4686-4690.

42. Barrett AJ, Rawlings ND, Woessner JF (2004) Handbook of Proteolytic Enzymes. ISBN-13: 978-0-120-79610-6 Academic Press 2368 p.

43. Drag M, Salvesen G (2010) Emerging principles in protease-based drug discovery. Nature reviews Drug discovery 9: 690-701.

44. Turk B (2006) Targeting proteases: successes, failures and future prospects. Nature reviews Drug discovery 5: 785-799.

45. Abbenante G, Fairlie DP (2005) Protease inhibitors in the clinic. Medicinal chemistry (Shariqah (United Arab Emirates)) 1: 71-104.

46. Craik CS, Page MJ, Madison EL (2011) Proteases as therapeutics. Biochem J 435: 1-16.

47. Roy R, Yang J, Moses MA (2009) Matrix metalloproteinases as novel biomarkers and potential therapeutic targets in human cancer. J Clin Oncol 27: 5287-5297.

48. Edwards D (2008) The cancer degradome: proteases and cancer biology. ISBN-13: 978-0-387-69056-8 Springer 926 p.

49. Novozymes annual report http://www.novozymes.com/en/investor/financial-reports/Documents/The%20Novozymes%20Report%202010.pdf

50. Kirk O, Borchert TV, Fuglsang CC (2002) Industrial enzyme applications. Curr Opin Biotechnol 13: 345-351.

51. Gupta R, Beg QK, Lorenz P (2002) Bacterial alkaline proteases: molecular approaches and industrial applications. Appl Microbiol Biotechnol 59: 15-32.

52. Rao MB, Tanksale AM, Ghatge MS, Deshpande VV (1998) Molecular and biotechnological aspects of microbial proteases. Microbiol Mol Biol Rev 62: 597-635.

53. Cherry JR, Fidantsef AL (2003) Directed evolution of industrial enzymes: an update. Curr Opin Biotechnol 14: 438-443.

54. Hedstrom L, Szilagyi L, Rutter WJ (1992) Converting trypsin to chymotrypsin: the role of surface loops. Science 255: 1249-1253.

55. Lim EJ, Sampath S, Coll-Rodriguez J, Schmidt J, Ray K, et al. (2007) Swapping the substrate specificities of the neuropeptidases neurolysin and thimet oligopeptidase. J Biol Chem 282: 9722-9732.

56. Ballinger MD, Tom J, Wells JA (1996) Furilisin: a variant of subtilisin BPN' engineered for cleaving tribasic substrates. Biochemistry 35: 13579-13585.

57. Cabrita LD, Gilis D, Robertson AL, Dehouck Y, Rooman M, et al. (2007) Enhancing the stability and solubility of TEV protease using in silico design. Protein Sci 16: 2360-2367.

58. Chica RA, Doucet N, Pelletier JN (2005) Semi-rational approaches to engineering enzyme activity: combining the benefits of directed evolution and rational design. Curr Opin Biotechnol 16: 378-384.

59. Antikainen NM, Martin SF (2005) Altering protein specificity: techniques and applications. Bioorg Med Chem 13: 2701-2716.

60. Petrounia IP, Arnold FH (2000) Designed evolution of enzymatic properties. Curr Opin Biotechnol 11: 325-330.

61. Chen R (2001) Enzyme engineering: rational redesign versus directed evolution. Trends Biotechnol 19: 13-14.

62. Fromant M, Blanquet S, Plateau P (1995) Direct random mutagenesis of gene-sized DNA fragments using polymerase chain reaction. Analytical Biochemistry 224: 347-353.

63. Wong TS, Roccatano D, Zacharias M, Schwaneberg U (2006) A statistical analysis of random mutagenesis methods used for directed protein evolution. J Mol Biol 355: 858-871.

64. Varadarajan N, Cantor JR, Georgiou G, Iverson BL (2009) Construction and flow cytometric screening of targeted enzyme libraries. Nat Protoc 4: 893-901.

65. Wong TS, Roccatano D, Loakes D, Tee KL, Schenk A, et al. (2008) Transversion-enriched sequence saturation mutagenesis (SeSaM-Tv+): a random mutagenesis method with consecutive nucleotide exchanges that complements the bias of error-prone PCR. Biotechnol J 3: 74-82.

66. Morley KL, Kazlauskas RJ (2005) Improving enzyme properties: when are closer mutations better? Trends Biotechnol 23: 231-237.

67. Kazlauskas RJ, Bornscheuer UT (2009) Finding better protein engineering strategies. Nat Chem Biol 5: 526-529.

68. Hughes MD, Nagel DA, Santos AF, Sutherland AJ, Hine AV (2003) Removing the redundancy from randomised gene libraries. J Mol Biol 331: 973-979.

69. Herman A, Tawfik DS (2007) Incorporating Synthetic Oligonucleotides via Gene Reassembly (ISOR): a versatile tool for generating targeted libraries. Protein Eng Des Sel 20: 219-226.

70. Lutz S, Patrick WM (2004) Novel methods for directed evolution of enzymes: quality, not quantity. Curr Opin Biotechnol 15: 291-297.

71. Cabantous S, Rogers Y, Terwilliger TC, Waldo GS (2008) New molecular reporters for rapid protein folding assays. PLoS ONE 3: e2387.

72. Maxwell KL, Mittermaier AK, Forman-Kay JD, Davidson AR (1999) A simple in vivo assay for increased protein solubility. Protein Sci 8: 1908-1911.

73. Waldo GS, Standish BM, Berendzen J, Terwilliger TC (1999) Rapid protein-folding assay using green fluorescent protein. Nat Biotechnol 17: 691-695.

74. Cabantous S, Waldo GS (2006) In vivo and in vitro protein solubility assays using split GFP. Nat Methods 3: 845-854.

75. Varadarajan N, Gam J, Olsen MJ, Georgiou G, Iverson BL (2005) Engineering of protease variants exhibiting high catalytic activity and exquisite substrate selectivity. Proc Natl Acad Sci USA 102: 6855-6860.

76. Varadarajan N, Rodriguez S, Hwang B-Y, Georgiou G, Iverson BL (2008) Highly active and selective endopeptidases with programmed substrate specificities. Nat Chem Biol 4: 290-294.

77. Sellamuthu S, Shin BH, Lee ES, Rho S-H, Hwang W, et al. (2008) Engineering of protease variants exhibiting altered substrate specificity. Biochem Biophys Res Commun 371: 122-126.

78. O'Loughlin TL, Greene DN, Matsumura I (2006) Diversification and specialization of HIV protease function during in vitro evolution. Mol Biol Evol 23: 764-772.

79. Diamond SL (2007) Methods for mapping protease specificity. Current Opinion in Chemical Biology 11: 46-51.

80. Petrassi HM, Williams JA, Li J, Tumanut C, Ek J, et al. (2005) A strategy to profile prime and non-prime proteolytic substrate specificity. Bioorg Med Chem Lett 15: 3162-3166.

81. Turk BE, Huang LL, Piro ET, Cantley LC (2001) Determination of protease cleavage site motifs using mixture-based oriented peptide libraries. Nat Biotechnol 19: 661-667.

82. Cuerrier D, Moldoveanu T, Davies PL (2005) Determination of peptide substrate specificity for mu-calpain by a peptide library-based approach: the importance of primed side interactions. J Biol Chem 280: 40632-40641.

83. Salisbury CM, Maly DJ, Ellman JA (2002) Peptide microarrays for the determination of protease substrate specificity. J Am Chem Soc 124: 14868-14870.

84. Díaz-Mochón JJ, Bialy L, Bradley M (2006) Dual colour, microarray-based, analysis of 10,000 protease substrates. Chem Commun (Camb): 3984-3986.

85. Winssinger N, Damoiseaux R, Tully DC, Geierstanger BH, Burdick K, et al. (2004) PNA-encoded protease substrate microarrays. Chem Biol 11: 1351-1360.

86. Stennicke HR, Renatus M, Meldal M, Salvesen GS (2000) Internally quenched fluorescent peptide substrates disclose the subsite preferences of human caspases 1, 3, 6, 7 and 8. Biochem J 350 Pt 2: 563-568.

87. Bianchini EP, Louvain VB, Marque P-E, Juliano MA, Juliano L, et al. (2002) Mapping of the catalytic groove preferences of factor Xa reveals an inadequate selectivity for its macromolecule substrates. J Biol Chem 277: 20527-20534.

88. Beck ZQ, Hervio L, Dawson PE, Elder JH, Madison EL (2000) Identification of efficiently cleaved substrates for HIV-1 protease using a phage display library and use in inhibitor development. Virology 274: 391-401.

89. Matthews DJ, Goodman LJ, Gorman CM, Wells JA (1994) A survey of furin substrate specificity using substrate phage display. Protein Sci 3: 1197-1205.

90. Matthews DJ, Wells JA (1993) Substrate phage: selection of protease substrates by monovalent phage display. Science 260: 1113-1117.

91. Lien S, Pastor R, Sutherlin D, Lowman HB (2004) A substrate-phage approach for investigating caspase specificity. Protein J 23: 413-425.

92. Boulware KT, Daugherty PS (2006) Protease specificity determination by using cellular libraries of peptide substrates (CLiPS). Proc Natl Acad Sci USA 103: 7583-7588.

93. Boulware KT, Jabaiah A, Daugherty PS (2010) Evolutionary optimization of peptide substrates for proteases that exhibit rapid hydrolysis kinetics. Biotechnol Bioeng 106: 339-346.

94. Auf dem Keller U, Schilling O (2010) Proteomic techniques and activity-based probes for the system-wide study of proteolysis. Biochimie 92: 1705-1714.

95. Impens F, Colaert N, Helsens K, Plasman K (2010) MS-driven protease substrate degradomics. Proteomics 10: 1284-1296.

96. Dautin N, Karimova G, Ullmann A, Ladant D (2000) Sensitive genetic screen for protease activity based on a cyclic AMP signaling cascade in Escherichia coli. J Bacteriol 182: 7060-7066.

97. Wigdal S, Anderson J, Vidugiris G (2008) A novel bioluminescent protease assay using engineered firefly luciferase. Current Chemical Genomics 2: 16-28.

98. Sao K, Murata M, Fujisaki Y, Umezaki K, Mori T, et al. (2009) A novel protease activity assay using a protease-responsive chaperone protein. Biochem Biophys Res Commun 383: 293-297.

99. Ju W, Valencia CA, Pang H, Ke Y, Gao W, et al. (2007) Proteome-wide identification of family member-specific natural substrate repertoire of caspases. Proc Natl Acad Sci USA 104: 14294-14299.

100. Overall CM, Tam EM, Kappelhoff R, Connor A, Ewart T, et al. (2004) Protease degradomics: mass spectrometry discovery of protease substrates and the CLIP-CHIP, a dedicated DNA microarray of all human proteases and inhibitors. Biological Chemistry 385: 493-504.

101. Doucet A, Overall CM (2009) Protease proteomics: Revealing protease in vivo functions using systems biology approaches. Mol Aspects Med 29: 339-358.

102. Keller UAD, Doucet A, Overall CM (2007) Protease research in the era of systems biology. Biological Chemistry 388: 1159-1162.

103. Karzai AW, Roche ED, Sauer RT (2000) The SsrA-SmpB system for protein tagging, directed degradation and ribosome rescue. Nat Struct Biol 7: 449-455.

104. Gottesman S, Roche E, Zhou Y, Sauer RT (1998) The ClpXP and ClpAP proteases degrade proteins with carboxy-terminal peptide tails added by the SsrA-tagging system. Genes Dev 12: 1338-1347.

105. DeLisa MP, Samuelson P, Palmer T, Georgiou G (2002) Genetic analysis of the twin arginine translocator secretion pathway in bacteria. J Biol Chem 277: 29825-29831.

106. Hersch GL, Baker TA, Sauer RT (2004) SspB delivery of substrates for ClpXP proteolysis probed by the design of improved degradation tags. Proc Natl Acad Sci USA 101: 12136-12141.

107. Dougherty WG, Cary SM, Parks TD (1989) Molecular genetic analysis of a plant virus polyprotein cleavage site: a model. Virology 171: 356-364.

108. Patrick WM, Firth AE, Blackburn JM (2003) User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries. Protein engineering 16: 451-457.

109. van den Berg S, Löfdahl P-A, Härd T, Berglund H (2006) Improved solubility of TEV protease by directed evolution. J Biotechnol 121: 291-298.

110. Cabrita LD, Gilis D, Robertson AL, Dehouck Y, Rooman M, et al. (2007) Enhancing the stability and solubility of TEV protease using in silico design. Protein Science 16: 2360-2367.

111. Virnekas B, Ge L, Pluckthun A, Schneider KC, Wellnhofer G, et al. (1994) Trinucleotide phosphoramidites: ideal reagents for the synthesis of mixed oligonucleotides for random mutagenesis. Nucleic acids research 22: 5600-5607.

112. Grilly C, Stricker J, Pang WL, Bennett MR, Hasty J (2007) A synthetic gene network for tuning protein degradation in Saccharomyces cerevisiae. Molecular systems biology 3: 127.