Iozzo; L. Schaefer -- Proteoglycan Form and Function: A Comprehensive Nomenclature of Proteoglycans

Proteoglycan form and function: A comprehensive nomenclature of proteoglycans

Renato V. Iozzo 1 and Liliana Schaefer 2

1 - Department of Pathology, Anatomy and Cell Biology and the Cancer Cell Biology and Signaling Program, Kimmel Cancer Center, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA 19107, USA
2 - Pharmazentrum Frankfurt/ZAFES, Institut für Allgemeine Pharmakologie und Toxikologie, Klinikum der Goethe-Universität Frankfurt am Main, Frankfurt am Main, Germany

Correspondence to Renato V. Iozzo and Liliana Schaefer.
http://dx.doi.org/10.1016/j.matbio.2015.02.003
Edited by R. Sanderson

Abstract

We provide a comprehensive classification of the proteoglycan gene families and respective protein cores. This updated nomenclature is based on three criteria: cellular and subcellular location, overall gene/protein homology, and the utilization of specific protein modules within their respective protein cores. These three signatures were utilized to design four major classes of proteoglycans with distinct forms and functions: the intracellular, cell-surface, pericellular and extracellular proteoglycans. The proposed nomenclature encompasses forty-three distinct proteoglycan-encoding genes and many alternatively-spliced variants. The biological functions of these four proteoglycan families are critically assessed in development, cancer and angiogenesis, and in various acquired and genetic diseases where their expression is aberrant.

© 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Introduction

It has been nearly 20 years since the original publication of a comprehensive classification of proteoglycan gene families [1]. For the most part, these classes have been widely accepted. However, a broad and current taxonomy of the various proteoglycan gene families and their products is not available.

In contrast to the classification of glycosaminoglycans (GAGs), primarily based on the chemical structure of their repeating disaccharide units, classifying proteoglycans is a much more complex task [2]. We propose a comprehensive and simplified nomenclature of proteoglycans based on three criteria: cellular and subcellular location, overall gene/protein homology, and the presence of specific protein modules within their respective protein cores. Whereas the first two attributes have been utilized in the past for various nomenclatures, the third attribute is of more recent development and represents a sort of intrinsic signature for various protein cores. Indeed, modular design is based on the simple concept that protein cores are made up of finite units, like pieces of Lego. The units represent a minimum level of organization and a module can be thought of as a functional domain that affects cell-matrix dynamics. Another key feature is that each module/functional unit can be stable and can fold on its own, without being part of the large precursor protein. Thus, a module is a self-contained component. An example of this is the LG3 domain of endorepellin, the C-terminal globular-like domain of perlecan, which has recently been crystallized [3].

Below, we will critically assess the field of proteoglycans which now encompass forty-three distinct genes and a much higher number of proteoglycans due to alternative splicing, thereby providing a very rich and biologically-active group of molecules. As hyaluronan and the enzymes involved in the synthesis and degradation of various GAGs are not covered in this review, readers are referred to recent reviews covering these closely-related subjects [4–18].
Matrix Biol (2015) xx, xxxxxx Review Please cite this article as: Iozzo Renato V., Schaefer Liliana, Proteoglycan form and function: A comprehensive nomenclature of proteoglycans, Matrix Biol (2015), http://dx.doi.org/10.1016/j.matbio.2015.02.003


Transcript of zzo; L. Schaefer -- Proteoglycan Form and Function- A Comprehensive Nomenclature of Proteoglycans


  • General features

    Four major proteoglycan classes encompass nearly all the known proteoglycans of the mammalian genome (Fig. 1). Observing the types of proteoglycans based on cellular and subcellular localization, we can see that there is only one intracellular proteoglycan, serglycin. This unique proteoglycan forms a class on its own as it is the only proteoglycan that carries heparin side chains. Serglycin is packaged in the granules of mast cells and serves as biological glue for most of the intracellular proteases stored within the granules [19]. Another general observation is that heparan sulfate proteoglycans (HSPGs) are prevalently associated with the cell surface or the pericellular matrix. The HSPGs are intimately associated with the plasma membranes of cells, either directly via an intercalated protein core or via a glycosyl-phosphatidyl-inositol (GPI) anchor, and function as major biological modifiers of growth factors such as FGF, VEGF and PDGF among others. Similar functions are also performed by the HSPGs located in the basement membrane zone, in addition to their ability to interact with each other and with key constituents of the basement membrane, including various laminins, collagen type IV, and nidogen. Presentation of growth factors to their cognate receptors in a biologically-favorable form is a major function of cell surface and pericellular HSPGs. Another key role is participating in the generation and long range maintenance of gradients for morphogens during embryogenesis and regenerative processes.

    Fig. 1. A comprehensive classification of proteoglycans. The four families are based on their cellular and subcellular location, homology at the protein and genomic levels and the presence of unique protein modules which are often shared by members of a given class. The key for the various modules is provided in the bottom panel. For additional details about structure and function, please consult the text.

    As we move away from the cells in a centrifugal manner, chondroitin- and dermatan sulfate-containing proteoglycans (CSPGs and DSPGs, respectively) predominate. These proteoglycans function as structural constituents of complex matrices such as cartilage, brain, intervertebral discs, tendons and corneas. Thus, among other functions, they provide viscoelastic properties, retain water and keep osmotic pressure, dictate proper collagen organization and are the main molecules responsible for corneal transparency. The extracellular matrix also contains the largest class of proteoglycans, the so-called small leucine-rich proteoglycans (SLRPs) which are the most abundant products in terms of gene number. These SLRPs can function both as structural constituents and as signaling molecules, especially when tissues are remodeled during cancer, diabetes, inflammation and atherosclerosis. SLRPs interact with several receptor tyrosine kinases (RTKs) and Toll-like receptors, thereby regulating fundamental processes including migration, proliferation, innate immunity, apoptosis, autophagy and angiogenesis. Below we will discuss the rationale for grouping certain proteoglycans in the same class and their overall biological function.

    Intracellular proteoglycans

    It is quite amazing that since the original cloning of serglycin, the first proteoglycan-encoding gene to be sequenced, no other true intracellular proteoglycan has been discovered. Serglycin occupies a class of its own insofar as it is the only proteoglycan that is covalently substituted with heparin due to its consecutive (and quite unique) Ser-Gly repeats, essentially a silk-like sequence. Serglycin has been utilized primarily by mast cells for the proper assembly and packaging of the numerous proteases that are released upon inflammation [19]. The defects in the formation of mast cell granules observed in Srgn-/- mice are remarkably similar to those observed in mast cells derived from mice lacking N-deacetylase/N-sulfotransferase 2, a key enzyme involved in the sulfation of heparin [19]. Thus, serglycin promotes granular storage via electrostatic interaction between its highly-anionic heparin chains and basic residues within the various proteases of the secretory granules. It is becoming evident, however, that all inflammatory cells express serglycin and store it within intracytoplasmic granules where, in addition to proteases, serglycin binds and modulates the bioactivity of several inflammatory mediators, chemokines, cytokines and growth factors [20].

    More recently, serglycin has been found in a wide variety of non-immune cells such as endothelial cells, chondrocytes and smooth muscle cells [21]. Cell-surface serglycin promotes adhesion of myeloma cells to collagen I and affects the expression of MMPs [22]. These findings have been corroborated by in vivo studies where serglycin knockdown attenuates multiple myeloma growth in immunocompromised mice [23]. It has been proposed that some of these effects are mediated by a specific interaction between serglycin and cell-surface CD44 [23], a known receptor for hyaluronan [24,25]. It has been recently shown that serglycin is a key component of the cell inflammatory response in activated primary human endothelial cells as both LPS and IL-1 increase its synthesis and secretion [26]. Notably, serglycin can be substituted with chondroitin sulfate (CS), and in several circulating cells serglycin contains lower sulfated CS-4 chains [21]. In contrast, several hematopoietic cells (mucosal mast cells, macrophages etc.) express serglycin with highly sulfated CS-E. Although the significance of this phenomenon is not fully appreciated, it is likely that these isoforms of serglycin might have different functions in a cell-context specific manner. Serglycin is a marker of immature myeloid cells and interacts with many bioactive components including histamine, TNF-α and proteases [27]. In general, serglycin expression correlates with a more aggressive malignant phenotype and it has been recently proposed that serglycin protects breast cancer cells from complement attack, thereby supporting cancer cell survival and progression [28].

    Cell surface proteoglycans

    In this class, there are thirteen genes, seven encoding transmembrane proteoglycans and six encoding GPI-anchored proteoglycans. With the exception of two gene products, NG2 and phosphacan, all contain heparan sulfate side chains.

    Fig. 2. Schematic representation of the cell surface proteoglycans, which comprise transmembrane type I (the N-terminus is outside of the plasma membrane) proteoglycans (four syndecans, CSPG4/NG2, betaglycan and phosphacan) and six GPI-anchored proteoglycans, glypicans 1–6. The type of GAG chain and the major protease sensitive sites are indicated. The key for the various modules is provided in the bottom panel.

    Syndecans

    The eponym syndecan was coined by the late Merton Bernfield [29] to define a class of transmembrane proteoglycans that would connect (from the Greek syndein, bind together) the surface of the cells to the underlying extracellular matrix. The syndecan family now comprises four distinct genes encoding single-pass transmembrane protein cores which include an ectodomain, a transmembrane region and an intracellular domain [4,30] (Fig. 2). The ectodomains exhibit the lowest amount of amino acid sequence conservation, no more than 10–20%, in contrast to the transmembrane and cytoplasmic domains which are 60–70% identical. A recent study has shown that the ectodomain of syndecans is natively disordered and this characteristic allows syndecans to interact with a variety of proteins and ligands, thereby providing enrichment in their biological function [31]. The ectodomain contains the GAG attachment sites, which are often covalently-linked to HS and sometimes to CS, making syndecans hybrid proteoglycans. Several cell types shed syndecan into the pericellular environment through the action of MMPs. For example, it has recently been shown that shed syndecan-2 retards angiogenesis by inhibiting endothelial cell migration [32], a key step in neovascularization [33]. The transmembrane domain contains a dimerization motif (GxxxG) that mediates both homo-dimerization and hetero-dimerization [30]. The intracellular domain is composed of two regions of conserved amino acid sequence (C1 and C2), separated by a central variable sequence of amino acids that is distinct for each family member (V) [34]. Notably, the C-terminus of all the four syndecans harbors a unique signature (EFYA) that binds PDZ-containing proteins. Generally, PDZ-containing proteins contribute to a proper anchor of transmembrane proteins to the cytoskeleton, thereby holding together large signaling complexes.

    Syndecans are involved in a wide variety of biological functions, too vast to be reviewed here, but reviewed recently [5,30,34]. Briefly, syndecans bind numerous growth factors, especially through their HS chains, and dictate morphogen gradients during development. In concert with other cell-surface HSPGs, syndecans can act as endocytosis receptors and are also involved in the uptake of exosomes [35]. Syndecans play key roles as co-receptors for many RTKs and can also function as receptors for atherogenic lipoproteins [36]. Indeed, there is strong genetic evidence that syndecan-1 is the main HSPG mediating clearance of triglyceride-rich lipoproteins derived from either the liver or from intestinal absorption [37].

    Many, if not all the syndecans, can also act as soluble HSPGs via partial proteolysis of their juxtamembrane region releasing their whole ectodomains. This shedding is considered a powerful post-translational modification that can regulate the amount of HSPG linked to the cell surface and that present in the pericellular microenvironment [30]. Several inflammatory cytokines can induce syndecan shedding by triggering outside-in signaling and by activating several metalloproteinases. In the case of hepatocytes, shedding of syndecan-1 occurs via PKC-dependent activation of ADAM17, and this impairs VLDL catabolism and promotes hypertriglyceridemia [38]. Importantly, soluble syndecan-1 promotes the growth of myeloma tumors in vivo [39], and this process, i.e. the shedding of syndecan-1, is enhanced by heparanase [40], thereby offering a novel mechanism for promoting cancer growth and metastasis [41,42]. Notably, chemotherapy stimulates syndecan-1 shedding, a potential drawback of the treatment that could potentially favor tumor progression [43]. The biological interplay between heparanase-evoked shedding of syndecan-1 and myeloma cells leads to enhanced angiogenesis [44], further supporting cancer growth. As mentioned above, however, shed syndecan-2 inhibits angiogenesis via a paracrine interaction with the protein tyrosine phosphatase receptor CD148, which in turn deactivates β1-containing integrins [32], presumably α1β1 and α2β1, two main angiogenesis receptors. In contrast, the ortholog syndecan-2 is required for angiogenic sprouting during zebrafish development [45].

    An emerging new role for syndecan-1 is linked to its ability to reach the nuclei in a variety of cells. Initial observations showed that myeloma and mesothelioma cells contain syndecan-1 in their nuclei [46,47] and this nuclear translocation is also regulated by heparanase [46], indicating that there must be a cellular receptor for shed syndecan-1 that could mediate its nuclear targeting and transport. In support of these studies are previous observations that exogenous HS can translocate to the nuclei and modulate the activity of DNA Topoisomerase I [48] and histone acetyl transferase (HAT) [49]. N-terminal acetylation of histones by HAT is linked to transcriptional activation, and this process is finely tuned by its counteracting enzyme, histone deacetylase (HDAC). Heparanase-evoked loss of nuclear syndecan-1 causes an increase in HAT enzymatic activity and enhances transcription of pro-tumorigenic genes [50]. Syndecan-1 that is shed from myeloma tumor cells is taken up by bone marrow stromal cells and is transported to the nuclei by a mechanism that requires its HS chains, as this process is inhibited by heparin and chlorate [51]. Once nuclear, soluble syndecan-1 binds to HAT p300 and inhibits its activity, thereby providing a new mechanism for tumor-host cell interaction and cross-talk [52].
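    The two short sequence signatures named in the syndecan description (the GxxxG dimerization motif of the transmembrane domain and the C-terminal EFYA signature that binds PDZ-containing proteins) lend themselves to a simple pattern check. The following is only a minimal sketch: the function names and the toy sequences are hypothetical illustrations, not real syndecan sequences, and a plain string match says nothing about whether a motif is functional in a given protein.

```python
import re

# GxxxG: a glycine, any three residues, then a glycine
# (the dimerization motif named in the text, in one-letter code).
GXXXG = re.compile(r"G.{3}G")


def has_gxxxg(transmembrane_seq: str) -> bool:
    """Return True if the sequence contains at least one GxxxG pattern."""
    return GXXXG.search(transmembrane_seq.upper()) is not None


def ends_with_pdz_signature(protein_seq: str, signature: str = "EFYA") -> bool:
    """Return True if the sequence ends with the given C-terminal signature."""
    return protein_seq.upper().endswith(signature)


# Hypothetical toy sequences, for illustration only.
toy_transmembrane = "LAVIAGGVIGFL"
toy_cytoplasmic = "AAAKKKEFYA"

print(has_gxxxg(toy_transmembrane))
print(ends_with_pdz_signature(toy_cytoplasmic))
```

    Keeping the PDZ check as a suffix test rather than a search reflects the statement that the signature sits at the C-terminus; an internal EFYA would not count.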

    CSPG4/NG2

    The melanoma-associated chondroitin sulfate proteoglycan (MCSP) was discovered over 30 years ago as a transmembrane proteoglycan and a highly immunogenic tumor antigen of melanoma tumor cells. This proteoglycan has been subsequently detected in various species, with many names designating the same gene product. The rat ortholog of MCSP is called nerve/glial antigen 2 (NG2) [53], while the term CSPG4 designates the human gene. We will use CSPG4/NG2 terminology with the idea that some of the functional properties have not been fully described in the human and rat species [54]. CSPG4/NG2 is a single-pass, type I transmembrane proteoglycan carrying one chondroitin sulfate chain, and harboring a large ectodomain composed of three subdomains (Fig. 2). The N-terminal domain (D1 subdomain) contains two laminin-like globular (LG) repeats. It is likely that the LG domains, as in other proteoglycans (i.e. perlecan and agrin, see below), mediate ligand binding, cell-matrix and cell-cell interactions, as well as interaction with integrins and receptor tyrosine kinases (RTKs). The central subdomain D2 contains 15 tandem repeats of a new module called CSPG [54]. The CSPG repeat is a cadherin-like and tumor-relevant module which is predicted to be involved in cell-matrix interaction, further modulated by the CS chain covalently attached to this module. Indeed, CSPG modules bind to collagens V and VI, FGF and PDGF. The juxtamembrane subdomain D3 contains a carbohydrate modification able to bind integrins and galectin, as well as numerous protease cleavage sites. Accordingly, the intact ectodomain and fragments thereof can be detected in sera from normal and melanoma-carrying patients [54]. The transmembrane domain of CSPG4/NG2 is quite interesting insofar as it has a unique Cys residue, generally not found in transmembrane regions. The intracellular domain harbors a proximal region with numerous Thr phospho-acceptor sites for PKC and ERK1/2, and a distal region encompassing a PDZ-binding module similar to the syndecan family. The latter can bind to the PDZ domain of several scaffold proteins involved in intracellular signaling, including syntenin, MUPP1 and GRIP1.

    Functionally, CSPG4/NG2 proteoglycan promotes tumor vascularization [55] and because of its predominant perivascular localization, CSPG4/NG2 may modulate the availability of FGF at the cell surface as well as the bioactivity and signal transduction of FGF receptors [56]. This CSPG binds to collagen VI in the tumor microenvironment and promotes cell survival and adhesion via the PI3K pathway [57]. Indeed, targeting CSPG4/NG2 in two animal models of highly-malignant brain tumors reduces tumor growth and angiogenesis [58]. Moreover, a combinatorial treatment using activated natural killer cells and a monoclonal antibody toward CSPG4/NG2 is capable of eradicating glioblastoma xenografts more efficiently than single therapies [59].

    It has recently been discovered that NG2 controls the directional migration of oligodendrocyte precursor cells by constitutively stimulating RhoA GTPases [60]. Based on the ability of NG2 to regulate adhesion, RhoA GTPase and growth factor activities, it is likely that this transmembrane proteoglycan might play a key role in regulating cell polarity in response to extracellular cues [61].

    Perdido/Kon-tiki, the Drosophila ortholog of mammalian CSPG4, genetically interacts with integrins during Drosophila embryogenesis, and its loss is embryonic lethal [62]. RNAi-mediated suppression of Perdido/Kon-tiki in the muscles, just before adult myogenesis starts, induces misorientation and detachment of Drosophila adult abdominal muscle, generating a phenotype similar to the embryonic lethal ones [63]. Thus, it is possible that, based on its high conservation through species, mammalian CSPG4 could also play a role in myogenesis and function as well.

    A recent study has added another function to CSPG4 by involving this cell surface proteoglycan in the pathogenesis of severe pseudomembranous colitis. CSPG4 acts as a receptor for the Clostridium difficile toxin B, one of the key toxins secreted by this gram-positive and spore-forming anaerobic bacillus [64]. The interaction occurs between the N-terminus of CSPG4 and the C-terminus of toxin B. This discovery, if confirmed in future studies, opens new therapeutic targets for the treatment of this severe and often lethal form of enterocolitis.

    Betaglycan/TGFβ type III receptor

    In 1991, two back-to-back papers reported on the isolation and cloning of a membrane-anchored proteoglycan with high affinity for TGFβ, and thus named betaglycan [65,66]. Betaglycan, also known as TGFβ type III receptor (TGFBR3), is a single-pass transmembrane proteoglycan that belongs to the TGFβ superfamily of co-receptors (Fig. 2). The extracellular domain contains several potential GAG attachment sites and protease-sensitive sequences near the plasma membrane. The short intracellular domain is highly enriched in Ser/Thr (>40%) and some of these residues are candidate sites for PKC-mediated phosphorylation [65]. The amino acid sequence of betaglycan is highly similar to that of endoglin, a close member of the same superfamily.

    The membrane-proximal ectodomain of betaglycan contains a unique module called zona pellucida (ZP)-C [67]. The ZP module is a structural element typically found in the ectodomain of eukaryotic proteins composed of a Cys-rich bipartite structure joined by a linker. Generally, proteins harboring ZP modules tend to polymerize and assemble into long fibrils of specialized extracellular matrices [67]. In the case of betaglycan and endoglin these ZP modules are not utilized for polymerization, rather they function as membrane co-receptors for the TGFβ superfamily members [68]. The intracellular domain contains a PDZ-binding element similar to that observed in the syndecan family of proteoglycans (Fig. 1).

    Betaglycan is a ubiquitously-expressed cell surface proteoglycan that acts as a co-receptor for members of the TGFβ superfamily of Cys knot growth factors which also include activins, inhibins, GDFs and BMPs [69,70]. For example, betaglycan enhances the binding of all the TGFβ isoforms to the signaling TGFβ complex [71] and is needed for TGFβ2 high-affinity interaction with the receptor complex. Betaglycan also blocks the aggressiveness of ovarian granulosa cell tumors by suppressing NF-κB-evoked MMP2 expression [72]. Betaglycan, together with other TGFβ-binding SLRPs, i.e. decorin and biglycan (see below), can be cleaved by granzyme B, thereby releasing an active form of TGFβ [73]. Ectodomain shedding of betaglycan is indeed necessary for betaglycan-mediated suppression of TGFβ signaling and breast cancer migration and invasion [74]. The ability of betaglycan to affect epithelial mesenchymal transformation [70], together with genetic evidence of embryonic lethality in Tgfbr3-/- mice, suggests that betaglycan may play a unique and non-redundant function during development.

    Another important feature of betaglycan is its ability to modulate the subcellular topology of the signaling receptor complex via its PDZ-binding domain, which interacts with PDZ-containing proteins such as β-arrestin [75]. This interaction, as well as that between betaglycan intracellular domain and GIPC, would stabilize betaglycan at the cell surface and potentiate its bioactivity. Finally, betaglycan is involved in regulating many functions including reproduction and fetal growth [75], and is a putative tumor suppressor in many forms of cancer [76]. Several additional betaglycan-evoked activities have been recently reviewed elsewhere [75].

    Phosphacan/receptor-type protein tyrosine phosphatase β

    Phosphacan, originally isolated from rat brain, is a CSPG that interacts with neurons and neural cell-adhesion molecules (N-CAM) and corresponds to the soluble ectodomain of a receptor-type protein tyrosine phosphatase β (RPTPβ) [77]. The phosphacan gene (PTPRZ1) encodes a single-pass type I membrane protein with a relatively large ectodomain harboring an N-terminal module homologous to the alpha-carbonic anhydrase (Fig. 2). Distal to this, there is a fibronectin type III domain. The ectodomain contains six Ser-Gly repeats, at least four of which are flanked by acidic residues suggesting potential glycanation sites. Sporadically, phosphacan can also be substituted with keratan sulfate chains. Notably, alternative splice variants encoding different protein isoforms have been described but their full-length nature has not yet been established.

    Functionally, the ectodomain of phosphacan mediates cell–cell adhesion by homophilic binding. In addition, phosphacan's ability to bind N-CAM and tenascin in a calcium-dependent manner suggests that RPTPs may also modulate cellular interactions via heterophilic mechanisms [77]. Indeed, phosphacan blocks the growth-promoting ability of N-CAM, axonin-1/TAG-1 and tenascin, and is crucial in the oriented movement of post-mitotic cells during cortical development of the brain [78]. Moreover, phosphacan binds contactin, another member of the Ig superfamily like N-CAM, and the extracellular portion of the voltage-gated sodium channel [79]. The latter interaction appears to be mediated by the carbonic anhydrase-like module of phosphacan's ectodomain. It has been proposed that phosphacan, as an integral extracellular matrix constituent of the neural stem cell compartment, would contribute to the privileged microenvironment that supports self-renewal and maintenance of the neural stem cell niche [80].

    Glypicans/GPI-anchored proteoglycans

    Glypicans (GPC) are HSPGs that are bound to the plasma membrane via a C-terminal lipid moiety known as a GPI (glycosylphosphatidylinositol) linkage or anchor (Fig. 2). There are six independent genes in the mammalian genome, which can be subdivided into two broad classes, GPC1/2/4/6 and GPC3/5, with orthologs present across Metazoa, including Dally and Dlp in Drosophila melanogaster [81]. Although most of the protein core is unique to this family, there is a stretch of amino acids in the ectodomain of the protein core with similarity to the Cys-rich domain of Frizzled proteins. There are two unique features in the structural organization of all glypicans, with potentially important functional implications.

    First, and in contrast to syndecans, the attachment of the GAG chains (mostly HS chains) is located near the juxtamembrane region. This allows the three linear HS chains to span a great deal of plasma membrane surface, thereby presenting various morphogens and growth factors in an active configuration to their cognate receptors. Indeed, glypicans bind to and modulate the activity of Hedgehog (Hh), Wnt, and FGFs [82–84]. More recently, it has been shown that glypican-3 binds to Frizzled, thereby acting directly in the modulation of canonical Wnt signaling [85].

    Second, glypicans are dually and partially processed by proteases and lipases. In the former case, the ectodomain of glypicans is processed via endoproteolytic cleavage by a furin-like convertase. This processing generates two subunits that are then bound via disulfide bonds, in a way similar to the Met receptor. In the latter case, the entire glypican proteoglycan is released from the plasma membrane via an extracellular lipase (Notum) that cleaves the GPI anchor. Drosophila studies have shown that the Notum-mediated release of glypican can regulate morphogen gradients including Wnt, BMP and Hh gradients [84].

    Notably, the anchorless GPC-1, devoid of the GPI anchor, is a stable α-helical protein that resists high concentrations of urea and guanidine HCl [86]. Unfolding data are consistent with a two-state model, suggesting that the GPC-1 protein core is a densely-packed globular protein. In agreement with these data, the crystal structure of the Drosophila glypican Dally-like protein has revealed an extended α-helical fold [87]. The crystal structure of human GPC-1 is very similar to that of Drosophila Dally-like, and consists of a stable α-helical domain with 14 conserved Cys residues, followed by a GAG attachment site that is exclusively substituted with HS chains [88]. Of interest, removal of the α-helical domain leads to substitution with CS chains instead of HS chains, indicating that there is a message embedded in the α-helical domain that drives a different post-translational modification [88].

    Functionally, glypicans have been involved in the control of tumor growth and angiogenesis. For example, glypican-3 has been implicated in cancer and growth control. Human mutations of GPC3 cause the rare X-linked Simpson–Golabi–Behmel (SGB) syndrome, characterized by both pre- and post-natal overgrowth, abnormal craniofacial features, cardiovascular anomalies, renal dysplasia and urinary tract malformations [84]. Originally, it was hypothesized that GPC3 was an inhibitor of IGF-II, given the prominent function of IGF-II in developmental growth. However, it was later found that the levels of IGF-II do not change in Gpc3−/− mice, nor does GPC3 interact with IGF-II. It appears that GPC3 is an inhibitor of Hh signaling, insofar as the Hh-dependent signaling activity is elevated in Gpc3−/− mice. Moreover, purified glypican-3 binds with high affinity to Indian and Sonic Hh and competes with Patched for Hh binding [83,89]. A recent study has shown that processing by convertases is required for GPC3-evoked suppression of Hh signaling, and this process is dependent on the HS chains and their degree of sulfation [90]. Thus, the glypican family is not only complex in nature, but is also under the control of various modifying enzymes (proteases and lipases) that modulate its biological activity. We are positive that many surprises will emerge in the future regarding unsuspected biological functions of various glypicans.

    Pericellular and basement membrane zone proteoglycans

    This group of four proteoglycans is closely associated with the surface of many cell types, anchored via integrins or other receptors, but they can also be a part of most basement membranes. Pericellular proteoglycans are mostly HSPGs and include perlecan and agrin, which share homology especially at their C-termini, and collagens XVIII and XV, which share homology at their N- and C-terminal noncollagenous domains (Fig. 1).

    Perlecan

    Perlecan is a modular HSPG encoded by a large gene [91,92] with a complex promoter [93–95]. The ~500-kDa protein core is composed of 5 domains with homology to SEA, N-CAM, IgG, LDL receptor and laminin [96,97] (Fig. 3). The terminal LG3 domain has been crystallized and reveals a jellyroll fold characteristic of other LG modules [3]. Perlecan is expressed by both vascular and avascular tissues [97–101], and is ubiquitously located at the apical cell surface [102,103] and basement membranes [98,104–106]. Perlecan regulates various biological processes primarily because of its widespread distribution [101,105], its ability to interact with various ligands and RTKs [107] and, as reported more recently, the potential utilization of perlecan splice variants in mast cells [108]. Perlecan is an early responsive gene and is induced by TGFβ [109] and repressed by interferon γ [95]. The heparan sulfate chains and the protein core of perlecan can be cleaved by heparanase and various proteases [110–112], respectively, releasing various pro-angiogenic factors [113].

    Fig. 3. Schematic representation of the pericellular proteoglycans, which comprise perlecan, agrin, and collagens XVIII and XV. The collagenous (COL) and non-collagenous (NC) domains of collagen XVIII are numbered on the top and bottom of the lower schematics. For brevity only the structure of collagen XVIII is shown. The key for the various modules is provided in the bottom panel.

    Perlecan is involved in modulating cell adhesion [114,115], lipid metabolism [116], thrombosis and cell death [117,118], biomechanics of blood vessels and cartilage [119–121], skin and endochondral bone formation [122,123], and osteophyte formation [124]. Perlecan binds and modulates the activity of several growth factors and morphogens [106,125–129] and its expression is often deregulated in several types of cancer [130–134]. In Drosophila, perlecan, known as Trol (for terribly reduced optic lobes), regulates Fgf and Hh signaling to activate neural stem cell signaling [135,136]. In addition, Trol is essential for the architecture and maintenance of the lymph gland and for the proliferation of blood progenitor cells [137]. Loss of Trol is associated with premature differentiation of hemocytes and this phenotype can be rescued by ectopic expression of Hh [137]. In mice, Hspg2 controls neurogenesis in the developing telencephalon [138]. Moreover, perlecan can act as a lipoprotein receptor and mediate lipoprotein endocytosis and catabolism [116]. Specifically, domain II of perlecan has been shown to bind low density lipoproteins and this interaction is mediated by the O-linked oligosaccharides [139], suggesting an important role for perlecan in atherogenesis and lipid retention.

    Perlecan is a complex regulator of vascular biology and tumor angiogenesis [33,140,141] by performing a dual function: via the N-terminal HS chains, perlecan is pro-angiogenic [96] by binding and presenting VEGFA and various FGFs to their cognate receptors [33,141–152]. Moreover, heparanase-mediated cleavage of basement membrane perlecan releases FGF10 and enhances salivary gland branching morphogenesis [153]. Indeed, ablating Hspg2 or preventing Hspg2 expression in early embryogenesis causes severe cardiovascular defects [154–157]. The critical role for the N-terminal HS chains of perlecan has been elegantly demonstrated by the generation of mice, designated Hspg2Δ3/Δ3, harboring a genomic deletion of exon 3, which encodes the SGDs responsible for the covalent attachment of HS chains [158]. These mutant mice have impaired angiogenesis, delayed healing after experimental wounding and suppression of tumor growth [159]. When challenged with flow cessation of the carotid artery, the Hspg2Δ3/Δ3 mice show an enhanced intimal hyperplasia and smooth muscle cell proliferation [160,161]. Moreover, during mouse hind-limb ischemia, the HS chains of perlecan are key regulators of the angiogenic response [162]. Collectively, these studies reaffirm the role of HS perlecan in modulating pro-angiogenic factors such as FGF2, VEGFA and PDGF.

    More recently other functions of perlecan have been discovered. Using lethality-rescued Hspg2−/− mice, in which perlecan was reintroduced into the cartilage, it was found that perlecan deficiency leads to significant depression of endothelial nitric oxide synthase [163]. This leads to endothelial cell dysfunction, as shown by attenuated endothelial relaxation, likely as a consequence of reduced endothelial nitric oxide synthase expression. This is another example of how a secreted HSPG affects the biology of vascular endothelial cells, likely through a receptor-mediated signaling pathway. Another recently unveiled function of perlecan is its ability to bind the clustering molecule gliomedin [164]. In this case, perlecan binds dystroglycan at nodes of Ranvier, which are required for fast conduction and accumulation of Na+ channels. Perlecan seems to enhance clustering of nodes of Ranvier components via a specific interaction with gliomedin. Thus, perlecan may have specific roles in the biology and pathophysiology of peripheral nodes [164].

    In contrast to the pro-angiogenic N-terminal domain I, the C-terminal processed form of perlecan domain V, named endorepellin [165], has a nearly opposite function: it inhibits endothelial cell migration, capillary morphogenesis, and in vivo angiogenesis [166–169]. A global proteomic analysis of human serum has identified endorepellin as a major circulating protein [170]. Moreover, endorepellin has been detected in extracts of fetal cartilage, exclusively in the hypertrophic zone, and it was speculated that processing of perlecan protein core in the growth plate could play a role in inhibiting blood vessel invasion or formation in cartilage [171]. Elevated endorepellin/LG3 peptides were found in the plasma proteome of patients with refractory cytopenia with multilineage dysplasia [172], and in the urine of end-stage renal failure patients [173]. These LG3 fragments had N-terminal residues (i.e., cleaved by BMP-1) identical to those reported by us [174]. Similar LG3 fragments are elevated in the urine of patients with chronic allograft nephropathy [175,176], and in the amniotic fluid of pregnant women [177], with a marked increase in women with premature rupture of fetal membranes [178,179] and those carrying trisomy 21 fetuses [180]. Recently, LG3 peptides have been proposed to represent a potential marker of physical activity [181]. Endorepellin fragments have also been detected in the urine of children with sleep apnea [182], in the media conditioned by apoptotic endothelial cells [118,183,184], and in the secretome of pancreatic and colon carcinoma cells [174,185–188]. Endorepellin can be pro-angiogenic in brain infarcts due to the lack of anti-angiogenic α2β1 integrin and the presence of the pro-angiogenic α5β1 integrin receptor for endorepellin in brain microvascular endothelial cells [189]. In this context, LG3 can be released by oxygen-glucose deprivation and can be neuroprotective [190,191]. Finally, circulating LG3 levels are reduced in patients with breast cancer, suggesting that reduced LG3 titers might be a useful biomarker for cancer progression and invasion [192].

    Mast cells produce shorter forms of perlecan including functional endorepellin, suggesting a potential role of endorepellin in inflammation and tissue repair [193]. Moreover, MMP-7 processing of perlecan in the prostate cancer stroma acts as a molecular switch to favor cancer invasion [112]. Thus, processed forms of perlecan protein core harboring domains III and IV can function as protumorigenic factors.

    Endorepellin binds to the α2β1 integrin receptor [140,166,194], and tumor xenografts generated in α2β1−/− mice are insensitive to systemic delivery of endorepellin [168]. Endorepellin triggers the activation of the tyrosine phosphatase SHP-1 which, in turn, dephosphorylates and inactivates various RTKs including VEGFR2 [195]. Soluble endorepellin alters the proteomic profile of human endothelial cells [196], and exerts a dual receptor antagonism by concurrently targeting VEGFR2 and the α2β1 integrin [197]. Notably, the proximal LG1/2 domains bind the Ig3–5 domain of VEGFR2 while the terminal LG3 domain, released by BMP-1/Tolloid-like metalloproteinases [174], binds the α2β1 integrin [198]. This dual signaling causes: (a) disassembly of actin filaments and focal adhesions, via the α2β1 integrin, leading to suppression of endothelial cell migration [198,199], and (b) activation of SHP-1, which dephosphorylates Tyr1175, a key residue in the cytoplasmic tail of VEGFR2, with consequent transcriptional inhibition of VEGFA [200].

    More recently, we have discovered that endorepellin induces autophagy in endothelial cells via VEGFR2 signaling [201], similar to decorin (see below). This novel function could contribute to the angiostatic properties of this interesting fragment of perlecan protein core.

    Agrin

    The second pericellular/basement membrane HSPG is agrin. A C-terminal portion of agrin lacking HS chains was first isolated from the Torpedo electric organ as an agent responsible for acetylcholine receptor (AChR) clustering, hence the eponym agrin, from the Greek ageirein, meaning to assemble [202]. The majority of the research on agrin in mammals has focused on agrin's contribution to the control of the postsynaptic apparatus in the neuromuscular junction. However, after many years of research, it was serendipitously discovered that agrin was indeed an HSPG interacting with N-CAM in the avian brain [203]. Subsequently, orthologs of agrin have been cloned from multiple species and are all highly homologous.

    Agrin has a multimodular structural organization that is homologous to that of perlecan, with potential generation of several splice isoforms. The N-terminal region can be spliced to generate either a Type II transmembrane form (TM) of agrin, highly expressed in nervous tissue, or an isoform associated with most basement membranes that contains the N-terminal agrin (NtA) domain (Fig. 3). In the central nervous system, TM agrin is highly expressed by axons and dendrites; thus, neurite-associated TM agrin could potentially function as receptor or co-receptor for neurite function. The NtA domain has high affinity for the laminin γ1 chain's coiled-coil domain, thereby functioning as a link between the cell surface and the basement membrane. Following the N-terminal domain is a stretch of nine follistatin-like (FS) repeats, also known as Kazal-type protease inhibitor domains [204]. The last two repeats are separated by an insertion of two laminin EGF-like (LE) domains. Notably, overexpression of TM agrin in non-neuronal cells induces filopodia-like processes similar to those induced in CNS neurites, and this bioactivity was localized to FS repeat seven [205]. Thus FS modules can modulate an important biological activity of neurons by affecting the reorganization of the actin cytoskeleton during active neurite growth.

    Following the FS repeats, there are two Ser/Thr (S/T)-rich domains which can be alternatively spliced (especially the second S/T module) to generate an X+/− form [204]. The two S/T modules are separated by a SEA module, similar to that of perlecan (see above), known to be involved in regulating O-glycosylation of mucins and glycoproteins. The N-terminal and central regions of agrin protein core contain the attachment sites for HS chains, and rotary shadowing electron microscopy has revealed three attachment sites for HS chains [206]. However, agrin can be a hybrid HS/CSPG with two clusters of Ser-Gly sequences, one primarily carrying HS chains located between FS repeats 7 and 8, and one carrying mostly CS chains, located in the first S/T module [207]. An agrin fragment harboring all protein modules described so far inhibits neuronal outgrowth independently of HS or CS [208]. The HS chains of agrin, however, bind FGF2, thrombospondin, β-amyloid peptide, N-CAM, and the protein tyrosine phosphatase σ [209].

    The C-terminus of agrin is structurally organized as perlecan domain V/endorepellin, with three LG domains separated by EGF-like modules (Fig. 3). The only difference is the position of the EGF repeats vis-à-vis the LG domains. The LG domains of agrin bind α-dystroglycan in skeletal muscle and low-density lipoprotein receptor-related protein 4 (LRP4) [210]. The latter interaction activates the RTK MuSK, which initiates a signaling cascade that leads to the formation of pre- and post-synaptic specializations. The terminal LG3 domain of agrin can be alternatively spliced with inserts of 8, 11 and 19 residues and their bioactivity is influenced by Ca2+ binding [211]. Moreover, the overall function of agrin is regulated by site-specific processing via MMPs [212]. Agrin is a good example, together with perlecan, of the evolved mechanisms in molecular recognition and function achieved through utilization of common protein folds, such as LG modules [211].

    Thus, both agrin and perlecan bind, via their LG-rich C-termini, multiple cell surface receptors including RTKs, and can potently modulate cardiovascular and musculoskeletal systems. Importantly, conjugation of LG modules of agrin and perlecan to polymerizing laminin-2 evokes clustering of acetylcholine receptors [213]. These data provide strong support for a cooperative function of basement membrane HSPGs in AChR assembly and function.

    Of interest, recessive missense mutations in the AGRN gene cause congenital myasthenic syndromes characterized by defective neuromuscular transmission [214]. More recently, AGRN recessive missense mutations have been identified as a causative factor for a congenital myasthenic syndrome with distal muscle weakness and atrophy, resembling distal myopathy [215]. Given the large number and heterogeneous groups of neuromuscular disorders, it is likely that in the future new syndromes will be identified that are linked to genetic abnormalities of the AGRN gene.

    Collagens XVIII and XV

    Collagens XVIII and XV, two members of the multiplexin gene family [216–220], harbor structural features of collagens and proteoglycans, being substituted with HS and CS, respectively [221]. Like agrin, collagen XVIII was serendipitously discovered to be an HSPG when monoclonal antibodies were used against an unidentified avian HSPG [222]. Subsequent cloning and sequencing of the cDNA showed that this avian HSPG protein core shows high homology to the mammalian collagen XVIII. Collagen XVIII is a homotrimer comprised of three identical α1 chains and consists of ten interrupted collagenous domains, flanked by eleven noncollagenous domains at their respective N- and C-termini. Collagen XVIII also harbors three Ser-Gly consensus binding sites for the attachment of HS chains [223] (Fig. 3). The human COL18A1 gene can generate three protein variants derived from alternative promoter usage and splicing events [221]. Specifically, COL18A1 can produce a short variant, a middle variant containing a TSP-1 module, and a long variant containing an additional Frizzled repeat. The latter is missing in collagen XV. Both collagens XVIII and XV contain a C-terminal noncollagenous domain harboring the antiangiogenic endostatin and endostatin-like modules. Specifically, the NC1 domain consists of an N-terminal trimerization region, a central hinge region sensitive to proteolytic activity and the C-terminal endostatin domain (Fig. 3). Endostatin interacts with numerous receptors including integrins α5β1, αvβ3 and αvβ5 [224,225] and VEGFR2 [226]. Interestingly, endostatin, in analogy to endorepellin, is capable of inducing autophagy in endothelial cells by modulating Beclin 1 and β-catenin levels [227]. These findings suggest that C-terminal anti-angiogenic fragments of pericellular HSPGs may evoke endothelial cell autophagy, which could contribute to their angiostatic properties.

    The signaling network evoked by soluble endostatin leads to a downregulation of several key components of the VEGF signaling cascade and, concurrently, to a stimulation of the synthesis of thrombospondin [228], a powerful angiostatic protein [229,230].

    Both collagens XVIII and XV are ubiquitously expressed in all vascular and epithelial basement membranes of human and mouse tissues, with an overall topography reminiscent of that of perlecan and agrin. Notably, Col18a1−/− mice show multiple ocular abnormalities, especially affecting the anterior portion of the eyes [231,232]. In humans, mutations in the COL18A1 gene cause Knobloch syndrome, a rare autosomal recessive disease characterized by high myopia, vitreoretinal degeneration and retinal detachment [233,234].

    Col18a1−/− mice show enhanced neovascularization and vascular permeability during atherosclerotic disease progression [235], and loss of this gene in both mice and humans leads to hypertriglyceridemia [236]. Moreover, Col18a1−/− mice display enhanced angiogenesis during wound healing [237]. In contrast to Col18a1−/− mice, Col15a1−/− mice show normal vascular formation but primarily develop a skeletal myopathy [238]. However, microscopic changes in the small arterioles with collapsed capillaries and endothelial cell degeneration in heart and skeletal muscles are also noted [238]. Collectively, these findings implicate collagen XVIII as a negative regulator of angiogenesis and as an anti-atherosclerotic factor. Collagen XV may function as a key structural constituent required for the stabilization of skeletal muscle cells and microvessels [238], and recently both collagens XV and XVIII have been involved in mediating the influx of leukocytes in renal ischemia/reperfusion [239]. Of interest, mice lacking the long form of collagen XVIII (i.e. the N-terminal frizzled-like sequence) but producing the short form exhibit a decreased number of pre-adipocytes, hepatic steatosis and elevated VLDL and triglyceride levels [240]. Thus collagen XVIII is directly implicated in the generation of adipose tissue and in hyperlipidemia associated with visceral obesity and fatty liver.

    Extracellular proteoglycans

    This is the largest class, encompassing 25 distinct genes. Four genes encode the hyalectans, key structural components of cartilage, blood vessels and the central nervous system. They all bind hyaluronan and form supramolecular complexes of high viscosity. The second class encompasses 18 SLRPs, which have a multitude of functions and often signal through various receptors, as many members are now found in the circulation and in various body fluids. The third class, the SPOCK family, encompasses 3 testicans, which are calcium-binding HSPGs.

    Hyaluronan- and lectin-binding proteoglycans (hyalectans)

    Hyalectans comprise a distinct family of proteoglycans with structural similarities at both the genomic and protein levels. This family contains four distinct genes, namely aggrecan, versican, neurocan, and brevican (Figs. 1 and 4). A shared feature of these proteoglycans is their tridomain structure: an N-terminal domain that binds hyaluronan, a central domain harboring the GAG side chains, and a C-terminal region that binds lectins [2]. Based on this dual activity at the N- and C-termini, the term hyalectans, an acronym for hyaluronan- and lectin-binding proteoglycans, has been proposed [1]. Alternate exon usage and variability in the degree of glycanation and glycosylation provide diverse functional attributes for these proteoglycans, which often act as molecular bridges between cell surfaces and extracellular matrices.

    Fig. 4. Schematic representation of the hyaluronan- and lectin-binding proteoglycans (hyalectans), which comprise aggrecan, versican, neurocan and brevican. The full-length versican (V0) and the three splice variants lacking GAGα (V1), GAGβ (V2) or both GAGα and GAGβ (V3) are shown. A new variant, V4, containing a portion of GAGβ is not shown. A GPI-anchored form of brevican is also not shown in the graphic. The dotted circles specify the globular domains (G1–G3) shared by the other hyalectans. These modules are composed of ~100 amino acids and have a characteristic consensus sequence with four disulfide-bonded Cys residues. The key for the various modules is provided in top right panel.

    Aggrecan

    Aggrecan, as its eponym indicates, has the propensity to aggregate into large supramolecular complexes (>200 MDa) together with hyaluronan and link protein, and is the principal load-bearing proteoglycan of cartilage [241]. These large aggregates generate a densely-packed, hydrated gel enmeshed in a network of reinforcing collagen fibrils and other proteoglycans and glycoproteins [242]. The N-terminal domain contains four link protein-like modules or proteoglycan tandem repeats in addition to the Ig-like repeat (Fig. 4). The entire link module is ~100 amino acids in length and has a characteristic consensus sequence with four disulfide-bonded Cys residues. These modules form two globular domains known as G1 and G2 [243]. The G1 domain is related to link protein and to the other G1 domains of the hyalectans, both in terms of structural domains and subdomains [243]. The G1/hyaluronan/link protein ternary complex is very stable, thereby immobilizing the aggrecan into enormous complexes that maintain a stable network and provide mechanical properties to cartilage. An interglobular region, between G1 and G2, has a rod-like structure and harbors several protease-sensitive sites involved in the partial degradation of aggrecan in arthritis and other inflammatory diseases.

    Following the G2 domain is a relatively small region containing numerous KS chains. This domain is not well conserved and its size significantly varies among species. Next is the largest domain of aggrecan, which contains the GAG-binding region. This protein domain is encoded by a single, very large (~4 kb) exon with ~120 Ser-Gly dipeptide repeats, which can generate >100 covalently-linked CS chains. The concentration of negative charges within aggrecan accounts for its ability to hold large amounts of water, not only in cartilage, but also in the intervertebral disc and brain. Moreover, electrostatic repulsion forces generated by the numerous negatively-charged CS and KS chains of aggrecan provide the equilibrium compressive modulus (a measure of stiffness) of cartilage. In humans, a variable number of tandem repeats can generate different alleles in the general population, ranging between 13 and 33 repeats, causing a great variability in the aggrecan degree of glycanation and negative charge (due to sulfation) within cartilage.

    The G3 module of aggrecan contains 2 EGF-like repeats, a C-type lectin domain and a complement regulatory protein (CRP) domain. Notably, the EGF repeats can be alternatively spliced, in part because in rodents exon 13 is a pseudoexon. Moreover, in rodent brain, the most common aggrecan species lacks both EGF repeats [244]. As in the case of other hyalectans, the C-type lectin domain of aggrecan binds simple sugars, such as fucose and galactose, in a Ca2+-dependent manner. Thus, aggrecan G3 may serve as a binding domain for the galactose present on collagen type II or other extracellular matrix or cell surface constituents. Moreover, the G3 domain of aggrecan interacts with tenascins, fibulins and sulfated glycolipids [245]. Thus, aggrecan could bridge and interconnect various constituents of the cell surface and extracellular matrix via its C-terminal G3 domain, thereby providing a mechanosensitive feedback to the chondrocytes. Indeed, epiphyseal chondrocytes grown on hydrogel substrata can maintain their phenotype for up to six months with proper secretion of cartilage-specific constituents, such as aggrecan and collagens type II and IX, but without expressing collagen type I [246].

    The essential role of aggrecan in cartilage is underscored by several genetic defects including two autosomal recessive chondrodystrophies, nanomelia in chickens and cartilage matrix deficiency (cmd) in mice [247]. In nanomelia, the defect leads to the formation of a C-terminal truncated aggrecan, while in cmd mice there is an even larger C-terminal truncation. In both mutant animals, there is little or no aggrecan in cartilage, leading to shortened long bones and lethality, most likely due to respiratory failure arising from tracheal collapse [247]. Aggrecan is also involved in the morphogenesis of limb synovial joints and articular cartilage [248], and fragments of aggrecan represent biomarkers for osteoarthritis [249].

    Aggrecan is also expressed in the brain and, unlike other hyalectans, is expressed primarily in the perineuronal nets [79]. A relatively small number of cortical neurons express aggrecan, especially the cortical interneurons [244]. One of the hypothesized functions of brain aggrecan is its potential regulation of neural maturation, in addition to its physical ability to adduct cations and regulate osmotic imbalances. Thus, aggrecan could affect high-rate synaptic transmission, mechanical stabilization of synaptic contacts and neuroprotection by counteracting oxidative stress via scavenging redox-active cations [244].

    Versican

    Versican, an eponym that signifies its highly versatile function [250], is the largest member of the hyalectan family when expressed as a whole molecule, designated V0 (Fig. 4). Versican is the mammalian counterpart of the so-called PG-M, a large chondroitin sulfate proteoglycan expressed during chondrogenesis in chick limb buds [251,252]. The VCAN gene, originally called CSPG2 [253–255], encompasses 15 exons encoding a full-length (V0 variant) protein core of ~400 kDa, with 3396 amino acid residues. The overall structural organization of versican is similar to that of aggrecan, with a few exceptions. At the N-terminus there is only one globular domain instead of two. Specifically, the N-terminal domain of versican contains one IgG fold followed by two consecutive link protein modules similar to G1, which are involved in mediating the binding of proteins to hyaluronan. Recombinant versican and a truncated form of versican containing the N-terminal domain bind to hyaluronan with high affinity, KD ~ 4 nM, in the same range as the other major aggregating CSPG, aggrecan [256]. The central domain of versican comprises two relatively large subdomains, designated GAGα (encoded by exon 7) and GAGβ (encoded by exon 8), which can be alternatively spliced to generate the three main variants V1, V2 and V3 [255], with significant CS polymorphism in the different versican isoforms. These large regions lack Cys residues and contain ~30 potential consensus sequences for GAG attachment as well as several binding sites for N- and O-linked oligosaccharides. There is also variability in tissue expression of the isoforms, with V0 and V1 representing the most ubiquitous isoforms, expressed in the developing heart and limbs, vascular smooth muscle cells and several non-neuronal tissues, whereas the V2 isoform is mainly present in the brain [79]. Expression of the V3 isoform in arterial smooth muscle cells regulates multiple signaling pathways, including TGFβ, EGF and NF-κB pathways, thereby creating a microenvironment resistant to monocyte adhesion [257]. Recently, a new splice variant of versican, V4, which contains up to five CS chains, has been identified in human breast cancer [258]. This isoform comprises only the first 1194 bp of exon 8 (encoding the GAGβ) sandwiched between exons 6 and 9, and is highly expressed in breast cancer in contrast to normal breast tissue, where it is undetectable [258]. Notably, the avian versican ortholog harbors an additional exon, known as PLUS, in the N-terminal region that is developmentally regulated [259]. This exon can be alternatively spliced, giving rise to two additional isoforms. Although no similar region is present in the mammalian genome, sequence homology suggests that the PLUS domain of avian versican may correspond to the KS attachment region in aggrecan.

    The C-terminal domain of versican is also very similar to that of aggrecan and other hyalectans in that it harbors similar structural motifs, including two EGF-like repeats, a C-type lectin domain, and a complement regulatory protein-like module (Fig. 4). These motifs are generally found in the selectin family of glycoproteins, which include several adhesion receptors regulating leukocyte homing and extravasation during inflammation. Given the fact that the various C-type lectin modules may have different saccharide-binding specificity, the presence of these domains at the C-terminal ends of hyalectans could provide specialized and refined functions for these CSPGs. Moreover, these findings suggest that versican may form a molecular link between lectin-containing glycoproteins at the cell surface and extracellular hyaluronan. Because hyaluronan is bound to the cell surface via its CD44 receptor [241,260], versican may also stabilize a large supramolecular complex at the plasma membrane zone [2].

    The functional roles of versican are multiple and complex. Versican is involved in the regulation of cell adhesion, migration and inflammation [260–262]. During an inflammatory response, leukocytes need to emigrate from the inner blood vessels into the damaged surrounding tissues. During this process, leukocytes encounter a provisional matrix highly enriched in versican, which in turn is capable of interacting with many receptors on the surface of immune cells including CD44, P-selectin glycoprotein ligand-1, and Toll-like receptors [261]. Another important role of versican derives from the multiple processing of its protein core. Versican is degraded and partially processed by several MMPs, plasmin and members of the ADAMTS family [263,264]. Versican is also involved in the biology of leiomyosarcomas insofar as its levels are markedly increased vis-à-vis benign leiomyomas, and suppression of versican expression attenuates malignant growth and tumor progression [265].

    Two autosomal dominant eye disorders, Wagner

    syndrome and erosive vitreo-retinopathy, which bothshow optically empty vitreous cavities, are causedby mutations in the VCAN gene [266]. Interestingly,the mutant alleles contain mutations around thesplice sites flanking exon 8, which encodes theGAG domain, likely producing exon skipping. Theultimate consequence of exon skipping is that mosttissues, and especially the eye, would have a lack ofthe GAG domain with much fewer CS chains, andthus a less charged environment.
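    As a rough numerical illustration of the affinity quoted earlier in this section for versican binding to hyaluronan (KD ~ 4 nM), the minimal sketch below evaluates a simple one-site binding isotherm. The 1:1 stoichiometry, the use of the free versican concentration as the independent variable, the example concentrations, and the function name `fraction_bound` are all assumptions made for illustration; hyaluronan is a polyvalent polymer, so binding in tissues is likely to be more complex than this model.

```python
def fraction_bound(free_ligand_nM: float, kd_nM: float) -> float:
    """Fractional occupancy for a one-site (1:1) binding model: c / (c + KD).

    Assumption: a single class of independent binding sites at equilibrium.
    """
    if free_ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return free_ligand_nM / (free_ligand_nM + kd_nM)


# Approximate dissociation constant quoted in the text (versican/hyaluronan).
KD_VERSICAN_HA_NM = 4.0

if __name__ == "__main__":
    # Hypothetical free versican concentrations, chosen only for illustration.
    for conc_nM in (0.4, 4.0, 40.0):
        occupancy = fraction_bound(conc_nM, KD_VERSICAN_HA_NM)
        print(f"{conc_nM:5.1f} nM free versican -> occupancy {occupancy:.2f}")
```

    Under these assumptions, half of the binding sites are occupied when the free concentration equals KD, and a tenfold change in concentration on either side of KD moves occupancy most of the way toward empty or saturated.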

    Neurocan and brevican

    The third member of the hyalectans is neurocan, a developmentally regulated CSPG originally cloned from rat brain, hence its name, which signifies its neuronal origin [267]. Rotary shadowing electron microscopy of neurocan has revealed two globular domains interconnected by a 60–90 nm rod [268], similar to the predicted organization of other hyalectans derived from biochemical and genomic analyses. Like other hyalectans, neurocan has an N-terminal domain with structural homology to the typical arrangements found in link protein, harboring a G1 domain and an Ig repeat (Fig. 4). Functionally, the recombinant N-terminal module of neurocan interacts with hyaluronan in solution and in gel permeation assays, and the isolated complexes comprise hyaluronan and globular profiles [268]. Therefore, it is highly likely that all the N-terminal domains of the hyalectans bind and interact with hyaluronan and link protein in vivo, forming gigantic supramolecular aggregates. The next interglobular region of neurocan, with little homology to other proteins, contains about seven potential CS binding sites. The C-terminal module of neurocan shares significant homology with the G3 domain of aggrecan and versican, with ~60% identity between rat neurocan and human versican/aggrecan. By analogy to the other hyalectan members, this domain could bind several brain glycoproteins including Ng-CAM, N-CAM, and tenascin. Neurocan is known to inhibit neurite outgrowth in vitro and, in keeping with this function, the expression of neurocan is increased at the site of mechanical and ischemic injury in the adult central nervous system [78,269]. Neurocan has been implicated in path finding during development. However, Ncan−/− mice develop normally with only a mild deficiency in long-term potentiation, suggesting that neurocan might only have a redundant role during development.

    Brevican is one of the most important hyalectans of the central nervous system. It takes its name from the Latin word brevis (short), as it harbors a typical hyalectan configuration with N- and C-terminal homologous domains but with a shorter GAG-binding domain (Fig. 4) [270,271]. Brevican was simultaneously discovered by three laboratories searching for hyaluronan-binding proteoglycans in the brain [271,272] and for synapse-associated proteins [273]. The name BEHAB, which is sometimes used for brevican as they are the same gene product, refers to brain-enriched hyaluronan binding protein [272]. Although sequence homology with the other hyalectan members is quite uniform (~60% overall), the GAG-binding domain is poorly conserved and contains a high content of acidic amino acid residues (mainly glutamic acid). This structural feature, shared with the link protein-like module of versican, could mediate binding to cationic proteins and minerals. Similar to neurocan, brevican exists in vivo either as a full-length CSPG or as a proteolytically processed form lacking the GAG-binding region and the N-terminal domain. The C-terminal G3-like domain is structurally organized like that of the other hyalectans, although it harbors only one EGF-like repeat instead of two as in all the other members (Fig. 4).

    In addition to secreted full-length brevican, an isoform of brevican encoded by a shorter 3.3 kb mRNA and highly expressed during post-natal development is linked to the plasma membrane via a GPI anchor [273]. Notably, the GPI-anchored brevican lacks EGF, C-type lectin and CRP modules but contains a stretch of hydrophobic amino acids resembling the GPI-anchor. Brevican is located at the outer surface of neurons and is enriched at perisynaptic sites. Brevican interacts with tenascin-R and fibulin-2 via its G3-like domain [274].

    Functionally, brevican has been implicated in glioma tumorigenesis, nervous tissue injury and repair, and in Alzheimer's disease [274]. However, many more studies need to be performed before a clear picture of brevican's biology can be drawn.

    Small leucine-rich proteoglycans/SLRPs

    General considerations

    This is the largest family of proteoglycans, encompassing 18 distinct gene products and numerous splice variants and processed forms. The acronym SLRP, for small leucine-rich proteoglycans [1], is now widely used. SLRPs designate a class of proteoglycans characterized by a relatively small protein core (as compared to the larger aggregating proteoglycans) of 36–42 kDa and encompassing a central region constituted by leucine-rich repeats (LRRs) (Fig. 5) [275]. The SLRPs are ubiquitously expressed in most extracellular matrices and are highly expressed during development in the thin membranes enveloping all the major organs such as meninges, pericardium, pleura, periosteum, perichondrium, perimysium and endomysium [276–278]. This strategic topology suggests that SLRPs would be directly involved in regulating organ size and shape during embryonic development and homeostasis [279,280].

    Fig. 5. Phylogenetic tree of the small leucine-rich proteoglycans (SLRPs) and crystal structure of bovine decorin. (A) Dendrogram of the five human SLRP classes, numbered and color-coded. Protein sequences were first aligned with CLUSTALW before an unrooted dendrogram was generated by a neighbor joining method using GenomeNet. (B) Cartoon ribbon diagram of the crystal structure of monomeric bovine decorin rendered with PyMOL v1.7 (PDB accession number 1XKU). Vertical arrows indicate β-strands, while coiled ribbons indicate α-helices. The leucine-rich repeats (LRRs) are numbered above the diagram. The sequence (SYIRIADTNIT) involved in binding to collagen type I [306,307] is highlighted in yellow. The terminal LRR Cys capping motif, known as the ear repeat, is also indicated [299].

    The 18 SLRP members are grouped into five classes: Classes I–III are canonical, whereas Classes IV and V are non-canonical (Fig. 1). Although eight non-canonical members do not carry glycosaminoglycan side chains, they have been included because they share close structural homology and several functional properties with the full-time proteoglycans. This classification is based on several considerations, including evolutionary conservation, homology at both the protein and genomic level, and chromosomal organization (Fig. 5A) [281]. It is important to note that SLRPs share many biological functions in terms of binding to various collagens [282–286], RTKs [287–290] and innate immune receptors [291,292], and in terms of modulating the bioactivity of various signaling pathways when in soluble form [293–295]. Moreover, several SLRPs bind TGFβ and bone morphogenetic protein (BMP), and several members of this family inhibit cell growth [296,297].

    The crystal structure of bovine decorin [298] shows a solenoid fold structure typical of LRRs (Fig. 5B). Each LRR unit is composed of ~24 amino acids, characterized by a conserved pattern of hydrophobic residues, with short parallel β-sheets on the concave face interwoven with loops containing short β-strands, 3₁₀ helices and polyproline II helices on the convex (outer) side of the protein core (Fig. 5B). The LRRs form a curved, solenoid structure where protein/protein interactions occur primarily via the side chains of variable residues protruding from the short parallel β-strands that form the inner (concave) face of the solenoid. The LRRs are flanked at the N- and C-termini by disulfide-bonded caps which define the various classes [277]. At the N-terminus, there are four Cys residues with a variable number of intervening amino acids, whereas the C-terminal capping motif encompasses two LRRs and includes the so-called ear repeat (Fig. 5B). This Cys-capping motif, designated LRRCE, is present in the canonical SLRPs (Classes I–III) but absent in the other two non-canonical classes [299]. Likely, both capping motifs at either end of Class I–III SLRPs would function to stabilize the LRR central domain, as in the case of other LRR proteins and receptors.

    Another characteristic feature of Class I–III SLRPs is the presence of a long penultimate LRR (LRR XI in decorin) that has been called the ear repeat [300]. Typically, the ear repeats contain 30 or more amino acid residues including an atypical sequence harboring a Cys located about 10 residues after the asparagine residue in the consensus LRR [300]. Genetic mutations in the decorin gene leading to a terminal truncation of the decorin protein core, lacking the ear repeat, cause congenital stromal corneal dystrophy [301]. This syndrome has been faithfully reproduced in mice where this truncated decorin was specifically expressed in the cornea [302,303].

    Although bovine decorin has been crystallized as an anti-parallel dimer [298] and reported to be a dimer in solution [304], there is strong evidence that decorin acts as a monomer in solution [293], especially when interacting with the small binding site on the EGFR ectodomain in vivo, where a dimer could not fit the cavity [305]. Also supportive of concave face binding is the identification of the sequence (SYIRIADTNIT) in LRR VII (highlighted in yellow in Fig. 5B) of the decorin protein core that is directly involved in binding to collagen type I [306,307]. A recent study utilizing mutant forms of mouse decorin, where engineered glycosylation sites in the concave face prevent dimerization, has shown that the monomeric mutants are as stable as the wild-type in solution [308]. The concave face mutants fail to bind collagen, regardless of the dimerization state, thus providing robust biological evidence for a concave face-mediated binding (i.e., monomeric decorin) to collagen [308].

    A hallmark shared by nearly all SLRPs, and by most LRR-containing proteins, is their propensity to interact with other proteins and to regulate collagen fibrillogenesis [282,283,309,310]. For example, several SLRPs interact with fibrils of collagen types I, II, III, V, VI and XI. Indeed, the name decorin derives from its ability to decorate fibrillar (banded) collagen in a periodic fashion: the decorin protein core binds non-covalently to an intraperiod site on the surface of collagen fibrils about every 67 nm, that is, every D period [311,312]. In highly purified α1(I) procollagen molecules, decorin protein core binds close to an intermolecular cross-linking site near the C terminus [313]. SLRP coating of various types of collagen serves a dual function: it regulates the lateral association of collagen molecules into proper fibrils, and protects collagen fibrils from proteolysis by sterically limiting the access of collagenases to their cleavage sites. It is notable that, through evolution, these dual functional properties of SLRPs are shared by both their sulfated GAGs and their protein cores. Notably, a few SLRP members contain stretches of amino acids that can be sulfated, such as the poly-Tyr sulfate in fibromodulin or the poly-Asp region in asporin. Often, the GAGs are located in the N-terminus, in a location that is similar to that of these poly-sulfated amino acid stretches, and can be directly involved in collagen interaction [314,315]. An additional degree of complexity is provided by the heterogeneous structure of the GAG chains. For instance, Class I SLRPs contain CS or DS chains, with the exception of asporin, ECM2, and ECMX. In contrast, Class II members contain poly-lactosamine or KS chains in their LRRs and sulfated Tyr residues at their N-termini. Class III members contain CS/DS (epiphycan), KS (osteoglycin), or no GAG (opticin). Finally, the non-canonical Class IV and V members lack GAG chains with the exception of chondroadherin, which is substituted with KS.

    The biological functions of SLRPs are vast, and there are over 3000 published papers on decorin alone, the archetypal and most studied SLRP. Thus, we refer the readers to recent comprehensive and specialized reviews on SLRPs [275,281–283,294,307,316–325]. Moreover, it has been proposed that SLRPs can be transcriptionally co-regulated through utilization of HOX-Runx modules in their promoters and genomic regions, including proximal exons and intergenic regions [326]. Below is a brief overview of each family with emphasis on recent discoveries of their multiple functional roles in physiological and perturbed states.

    Class I SLRP

    Decorin, also known as PG40 and DSPG1, was originally cloned from a fibroblast cDNA library [327], and subsequently named decorin because of its ability to decorate collagen fibrils [328]. The decorin protein core is a Zn²⁺ metalloprotein [329,330] that is biologically active in solution as a monomer [293]. As mentioned above, decorin protein core binds non-covalently to an intraperiod site on the surface of collagen fibrils about every 67 nm, at the D period [312]. Using purified collagen and procollagen molecules, which can be visualized by their C-terminal globular regions, it has been shown that decorin protein core binds near the C terminus of collagen α1(I), near an intermolecular cross-linking site [313]. Not only the protein core but also the N-terminal GAG chain of decorin plays a role in collagen fibrillogenesis and structure [285,314,315,331–334]. The strategic location of the GAG binding domain in the N-terminus of decorin allows a higher degree of mobility for the DS chain, which presumably could align orthogonally or parallel to the axis of the collagen fibrils. This dual function of decorin could help in maintaining corneal transparency and biomechanical properties of various connective tissues [282,284,335].

    The decorin gene exhibits a complex genomic organization and transcriptional control [276,336–338], and its transcription can be induced by quiescence and suppressed by TNFα [339,340]. It was known for many years that the small DSPG of tendon, mostly decorin, is capable of inhibiting lateral growth of collagen fibrils [309]. Thus, when the decorin-null mice were generated, the first targeted deletion of a proteoglycan-encoding gene, the abnormal collagen structure in the dermis and the skin fragility phenotype [310] provided the first genetic evidence for a regulatory role for the prototype member of the SLRP gene family in collagen fibrillogenesis. The phenotype of the decorin-deficient mice includes abnormal collagen fibril morphology in the skin and tail tendon, presumably because the fibrils are less stable during development owing to abnormal cross-linking or enhanced susceptibility to collagenase. The prevalent phenotype of the decorin-null mice is skin fragility caused by a thinning of the dermis with concurrent reduced tensile strength, a biomechanical impairment directly linked to the abnormal collagen network. Overall, the cutaneous defects of Dcn−/− mice resemble those observed in Ehlers–Danlos syndrome, characterized by skin hyperextensibility and tissue fragility [341], in a way opposite to fibrosis [342].
Owing to their mild phenotype, Dcn−/− mice have been utilized by a large number of investigators using many experimental challenges and have provided strong genetic evidence for decorin roles in Lyme disease [343,344], lung mechanics and asthma [345,346], diabetic nephropathy and tubulointerstitial fibrosis [347–350], myocardial infarction [351], corneal transparency and tendon biomechanical properties [352–356], dentin mineralization and periodontal homeostasis [357–359], hepatic fibrosis and hepatocellular carcinoma [318,360–362], collagen fibrillogenesis [314,363,364], fetal membrane biology [365–367], wound healing and angiogenesis [368–373], innate immunity and inflammation [291,374,375], adhesion and migration [376], and mesenchymal stem cell biology [377]. Decorin plays an important role during zebrafish development insofar as zDcn knockdown causes a severe phenotype characterized by abnormal convergent extension, craniofacial abnormalities, and cyclopia [278].

    As these genetic defects are reminiscent of several zebrafish mutants affecting the non-canonical Wnt signaling pathway, it is possible that decorin might also play a role in this pathway in mammals. Indeed, a recent study has shown that decorin is directly involved in modulating the signaling pathway of Wnt3a, shaping niches supportive of hematopoiesis [378].

    Mutations in the decorin gene have been linked to congenital stromal corneal dystrophy (CSCD) syndrome [301,379], where a truncated form of decorin lacking the ear repeat, the C-terminal 33 amino acids, acts in a dominant negative fashion. A corneal knock-in transgenic mouse lacking the C-terminal 33 amino acid residues (952delT Dcn) faithfully recapitulates the human phenotype of corneal opacities [302]. Mechanistically, the C-terminal truncated form of decorin is retained in the cytoplasm of keratinocytes, triggering ER stress and an unfolded protein response [380]. These data provide a cell-based, rather than ECM-based, interpretation of the CSCD phenotype whereby a truncated SLRP protein core, by inducing ER stress, causes abnormal processing and secretion of decorin and other SLRPs, eventually generating an abnormal matrix assembly and corneal opacities.

    Decorin was the first proteoglycan to be directly implicated in the control of cell growth. Two seminal papers identified decorin as a growth suppressor, via a mechanism involving decorin's binding to and inhibiting TGFβ in Chinese hamster ovary cells [381,382]. Concurrently, decorin was identified as a proteoglycan highly expressed in the tumor stroma of colon carcinomas [383], primarily via hypomethylation of its promoter regions [384]. It was soon recognized, however, that the growth of most malignant cells does not depend on the availability of TGFβ. Thus, there had to be other signaling receptors for the growth suppressive function of decorin. The existence of such receptor(s) was supported by an emerging body of literature describing that ectopic expression of decorin or its protein core suppresses the malignant phenotype in a variety of histogenetic malignant backgrounds [385,386]. Utilizing A431 cells, a squamous carcinoma cell line that overexpresses EGFR, it was discovered that exogenous decorin proteoglycan or protein core transiently activated the EGFR to induce growth inhibition via expression of the cyclin-dependent kinase inhibitor p21WAF1 [287,387,388]. Indeed, decorin binds to a narrow region of the EGFR, partially overlapping with but distinct from the EGF-binding epitope [305]. Mechanistically, decorin transiently activates the EGFR and elevates cytosolic Ca²⁺ in A431 cells [389], but it causes a sustained down-regulation of this RTK, thereby providing a plausible mechanism for controlling tumor growth in vivo in various forms of cancer [390–392]. Specifically, soluble decorin evokes protracted internalization and degradation of the EGFR via caveolar endocytosis [393]. An anti-oncogenic role for decorin has also been demonstrated in its ability to inhibit another member of the ErbB family, namely ErbB2/Neu, in this case by inhibiting heterodimerization of ErbB4 with ErbB2, thereby leading to growth suppression and cytodifferentiation of mammary carcinoma cells [394]. It was subsequently found that decorin binds specifically and with higher affinity (KD ~ 2 nM) to the hepatocyte growth factor receptor known as Met [288] and causes proteasomal degradation of Myc and β-catenin, two critical downstream effectors of Met [395]. An important downstream effect of the decorin/Met interaction is induction of two anti-angiogenic proteins, Thrombospondin 1 and TIMP3, with concurrent inhibition of two powerful pro-angiogenic factors, HIF-1α and VEGFA [371,372]. Moreover, decorin binds and suppresses both the IGF-IR [289,396,397] and VEGFR2 [371,398].

    Loss of decorin in the tumor stroma correlates with poor survival of patients with invasive breast carcinomas [275,399,400] and in mice with spontaneous breast cancer [401]. Moreover, decorin is markedly reduced in the stroma of many solid tumors [402–404], as well as low- and high-grade bladder carcinomas, but is highly expressed in the normal bladder stroma [397]. Decorin levels are also decreased in multiple myeloma [405,406], soft tissue sarcomas [407], prostatic [408], urothelial [409–411] and hepatic [362,412] carcinomas, together with a complete loss of decorin expression by several tumor cells [413,414]. Additional proof for an oncostatic role of decorin as a soluble tumor repressor stems from genetic models wherein ablation of decorin under conditions of a high-fat, western-type diet is linked to the spontaneous appearance of intestinal tumors [415,416]. Moreover, compound Dcn−/−;Tp53−/− mice die of aggressive T-cell lymphomas much sooner than mice lacking only the tumor suppressor Tp53 [417]. Notably, systemic delivery of decorin, either as a soluble factor or via adenoviral gene delivery, significantly retards tumorigenic and angiogenic growth in a wide variety of malignant solid tumors [413,418–424]. Collectively, these findings provide strong support to the concept that decorin could act as a guardian from the matrix in analogy to p53, a guardian of the genome [414]. Thus, decorin could become a potent therapeutic factor, either alone or in combination with traditional chemotherapy, in preventing tumor progression and metastasis [297].

    Recently, it was discovered that soluble decorin evokes excessive autophagy in endothelial cells, independently of nutrient deprivation, through partial agonistic activity on VEGFR2 [425]. This signaling cascade emanating from the decorin/VEGFR2 interaction leads to two effects. First, it activates AMPK and Vps34, which in turn stimulate the synthesis of Peg3 [426], a recently identified master regulator of autophagy [422]. Peg3 recruits LC3 and Beclin 1, which evoke autophagy, and concurrently induces transcription of both genes, while inhibiting VEGFA production [425]. These multiple biological roles of decorin would converge on oncostasis by suppressing RTK signaling in the growing cancer cells and inhibiting the supply of oxygen and nutrients via hindering angiogenesis and inducing a protracted, and in this case deleterious, stromal cell autophagy [427]. In view of the fact that decorin has been found in the circulation in nanomolar amounts [428–430], at concentrations similar to those used in the experimental studies mentioned above, and as plasma decorin is significantly increased in cancer patients [291], it is plausible that this endogenous tumor repressor might have a physiological role in vivo.

    Biglycan, decorin's closest relative, was originally isolated from bovine bone and then, following its cloning and sequencing, was found to contain two Ser-Gly attachment sites in the N-terminal region, thus its name, meaning two GAG chains [431]. Both the human and mouse genes have an overall similar exonic arrangement [432,433]. Biglycan is highly homologous to decorin, with >65% overall homology. Similar to decorin, biglycan binds TGFβ [434] and modulates its bioactivity [435]. Ablation of the biglycan gene, Bgn−/0 (this genetic symbol designates the presence of the Bgn gene on the X chromosome), a gene with a ubiquitous tissue distribution and a pronounced expression in bone [433,436], reveals a key function for this SLRP in regulating postnatal skeletal growth [437]. In general, the long bones in Bgn−/0 mice grow more slowly than those of wild-type littermates and eventually are shorter and exhibit reduced bone mass. The latter is secondary to the marked decline in the number of osteoblasts with concurrent progressive depletion of the bone marrow stromal cells [437]. These mutant mice also display delayed osteogenesis after marrow ablation [438], broader metadentin, and altered dentin mineralization, causing significant enamel structural defects. Thus, biglycan-deficient mice could be a promising animal model to study skeletal diseases and osteoporosis [439]. Although Dcn−/− mice also show abnormalities in bone collagen fibril size and organization, they show neither overt bone mass defects nor abnormal osteoblast growth as in the case of biglycan deficiency. These findings underline non-overlapping functions that have evolved for these two homologous Class I SLRPs.

    Biglycan modulates BMP-4-induced osteoblast differentiation [440], and it also binds Chordin and BMP-4 in Xenopus embryos, thereby blocking BMP-4 activity [441]. Moreover, biglycan affects the Wnt signaling pathway [442], in analogy to decorin (see above). However, a recent study has shown that biglycan acts as a pro-angiogenic stimulus in contrast to dec