
THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 264, No. 8, Issue of March 15, pp. 4571-4576, 1989 Printed in U.S.A.

Deduced Protein Sequence of Bone Small Proteoglycan I (Biglycan) Shows Homology with Proteoglycan II (Decorin) and Several Nonconnective Tissue Proteins in a Variety of Species*

(Received for publication, September 9, 1988)

Larry W. Fisher‡, John D. Termine, and Marian F. Young
From the Bone Research Branch, National Institute of Dental Research, National Institutes of Health, Bethesda, Maryland 20892

The small proteoglycans (PG) of bone consist of two different molecular species: one containing one chondroitin sulfate chain (PG II) and the other, two chains (PG I). These two proteoglycans are found in many connective tissues and have M_r = 45,000 core proteins with clear differences in their NH2-terminal sequences. Using antisera produced against synthetic peptides derived from the human PG I and PG II NH2 termini, we have isolated several cDNA clones from a λgt11 expression library made from mRNA isolated from human bone-derived cells. The clones that reacted with antisera to the PG II peptide were sequenced and found to be identical with the PG II class of proteoglycan from human fibroblasts known as PG-40 or decorin. The clones reacting to the PG I antisera, however, had a unique sequence. The derived protein sequence of PG I showed sufficient homology with the PG II sequence (55% of the amino acids are identical, with most others involving chemically similar amino acid substitutions) to strongly suggest that the two proteins were the result of a gene duplication. PG II (decorin) contains one attached glycosaminoglycan chain, while PG I probably contains two chains. For this reason, we suggest that PG I be called biglycan. The biglycan protein sequence contains 368 residues (M_r = 42,510 for the complete sequence and M_r = 37,983 for the secreted form) and appears to consist predominantly of a series of 12 tandem repeats of 24 residues. The repeats are recognized by their conserved leucines (and leucine-like amino acids) in positions previously reported for a diverse collection of proteins (none of which is thought to be a proteoglycan) including: two morphogenic proteins (toll and chaoptin) in the fruit fly; a yeast adenylate cyclase; and two human proteins, the von Willebrand Factor-binding platelet membrane protein, GPIb, and a rare serum protein, leucine-rich glycoprotein.

In 1968, Herring (1) first presented evidence that bone chondroitin sulfate chains are covalently associated with peptides or proteins. Using slightly modified Laemmli (2) gradient polyacrylamide gels, we showed in 1983 (3) that fetal bovine bone contains two electrophoretically distinct small proteoglycan (PG)¹ species. The apparent molecular weights of the core proteins (M_r = 45,000 each on SDS gels) and chondroitin sulfate chains (M_r = 40,000 each) were consistent with the two PG species consisting of either one or two chondroitin sulfate chains on small core protein(s). Using the same gel electrophoresis system, Rosenberg et al. (4) later showed a similar pair of small proteoglycans in bovine cartilage and named them PG I and PG II (larger and smaller electrophoretic species, respectively). In 1985, we reported that, on separation, the two small bone proteoglycans had slightly different amino acid compositions and different immunoreactivity (5). Using peptide mapping, Heinegård and co-workers (6) presented data suggesting that the predominant proteoglycan in cow bone was different from the small proteoglycan(s) found in certain cartilages and other connective tissues. Work with young human bone proteoglycans (7) then showed that the NH2 termini of PG I and PG II are different from each other. In this paper, we present the cDNA-derived protein sequence of bone PG I and show that it is a different gene product from PG II. Furthermore, Northern analysis using PG I cDNA as a probe shows that a message of similar size is present in a number of different connective tissues.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J04599.

‡ To whom correspondence should be addressed: Rm. 106, Bldg. 30, Bone Research Branch, National Institute of Dental Research, NIH, Bethesda, MD 20892.

MATERIALS AND METHODS

Construction and Screening of a Bone-derived Cell Culture cDNA Library. RNA from primary cultures of adult human bone cells (28) was extracted using a guanidine HCl procedure (8), and poly(A)+ mRNA was isolated by affinity chromatography on oligo(dT)-cellulose (Pharmacia LKB Biotechnology Inc.). Approximately 20 µg of poly(A)+ mRNA was used to construct a λgt11 ZAP library (custom library section of Stratagene Cloning Systems). The amplified cDNA expression library was first screened in Escherichia coli BB4 cells as described by Young and Davis (9) using a polyclonal antiserum from a rabbit injected with both human bone PG I and PG II (rabbit LF-5) and a peroxidase-conjugated second antibody made in goat (Kirkegaard and Perry Laboratories). Positive plaques were identified by reaction with H2O2 and 4-chloro-1-naphthol. Positive clones were rescreened to purity with the same antiserum and then screened again in a second round with antisera made against synthetic peptides corresponding to residues 11-24 of the secreted form of human bone PG I (7) or 5-17 of human fibroblast PG II (10). The synthetic peptides were made on an Applied Biosystems Model 430A peptide synthesizer using t-butoxycarbonyl-protected amino acids and standard reaction conditions suggested by the manufacturer. The peptides were deprotected using anhydrous HF and conjugated either to bovine serum albumin for PG I (LF-15) or to keyhole limpet hemocyanin for PG II (LF-30), as described (11). Clones containing cDNA for PG I were found to be missing the start codon of the open reading frame, so the library was rescreened using a 200-bp 5' EcoRI-BglII fragment labeled with 32P (Amersham nick translation kit and 3000 Ci/mmol of [α-32P]dCTP, Du Pont-New England Nuclear). Prehybridization, high stringency hybridization, washing, and detection on Kodak X-Omat AR film were as described (12).

¹ The abbreviations used are: PG, proteoglycan; SDS, sodium dodecyl sulfate; bp, base pairs; kbp, kilobase pairs.

DNA Sequence of cDNAs. Purified insert was isolated from plasmid DNA according to the instructions provided by Stratagene Cloning Systems. Briefly, the purified λ phage vectors containing the cDNAs were used to co-infect a BB4 E. coli strain with R408 helper phage (Stratagene Cloning Systems). The pBluescript plasmid DNA (packaged in M13 or f1 phage particles at this stage) was rescued by infecting and plating fresh bacteria on ampicillin-containing plates. (λ phage in the helper phage preparation were destroyed by heating to 70 °C for 20 min.) Colonies (containing the pBluescript plasmid and the cDNA insert) were picked and grown in a larger scale preparation. After purification of the plasmids, the cDNA inserts were liberated either with a combination of BamHI and KpnI or XbaI and NaeI (the latter removing only a ~1.0-kbp 5' piece). These large fragments were directionally subcloned in M13mp18 or -19 (13) previously restricted with the above enzymes (using SmaI for the NaeI site). Smaller restriction fragments were also made from the XbaI/NaeI fragment using BglII and RsaI and subcloned into appropriately cut M13 vectors. After transformation into JM101 cells, single-stranded DNA from recombinant phage was annealed to either the 17-bp universal primer or the appropriate synthetic oligonucleotide of an internal site (13) and sequenced using the dideoxy chain termination method (14). The Sequenase kit (United States Biochemicals), with both GTP and ITP nucleotide mixes, and [α-35S]dATP (1110 Ci/mmol, Du Pont-New England Nuclear) were used for the sequencing reactions. The nucleotide sequences were determined by electrophoresis on 6% and 8% polyacrylamide urea gels followed by exposure on X-AR film (Kodak).

Northern Analysis. 3.5-µg portions of total RNA from cells in culture (human bone, skin, periodontal ligament, gingiva, bovine bone, chicken embryo fibroblasts, rat osteosarcoma (ROS 17/2)) or tissues (bovine: skin, articular cartilage, and cornea; and rat Swarm chondrosarcoma) were electrophoresed in 1.2% formaldehyde-agarose gels and transferred to nitrocellulose as described (15). An agarose gel-purified XbaI-NaeI fragment of PG I (clone P6; it included >90% of the coding region of the cDNA but no noncoding regions and corresponded to nucleotides ~180-1130) or PG II (clone P2, entire BamHI/KpnI 1.6-kbp insert) was labeled by nick translation (see above) and hybridized to the RNA bound to the nitrocellulose. Hybridization was carried out at 37 °C in 40% formaldehyde for 16 h, and blots were washed and exposed to x-ray film for 3-6 h as described previously (16). A blot was probed for PG I mRNA, autoradiographed, stripped of PG I probe by boiling for 5 min in 2× SSC and 0.1% SDS followed by an ice-cold wash in 2× SSC, and reprobed with the PG II cDNA probe.

Polyacrylamide Gel Electrophoresis and Electroblotting. Polyacrylamide gradient (4-20%) SDS slab gels (160 × 140 × 1.5 mm) topped with 3% stacking gel were prepared and electrophoresed as described (3). Electrotransfer of core proteins (in 150 µg dry weight of adolescent monkey bone mineral compartment extract; 200 µg each of 4 M guanidine HCl-extracted monkey skin, tendon, cartilage, cornea, or muscle (crude extracts); or 10 µg of purified human bone PG I (7), each digested for 1 h at 37 °C with 10 milliunits of chondroitinase ABC (Miles) (3)) from the SDS gels onto nitrocellulose was according to the method of Towbin et al. (17). Indirect immunodetection using antisera and conjugated second antibodies was as above.

RESULTS

The strategy for cloning the cDNA encoding human bone proteoglycan I (PG I) was to produce antisera distinguishing PG I from PG II. For this purpose, we produced antisera against synthetic peptides corresponding to amino acid sequences near the NH2 termini. The PG I synthetic peptide corresponded to residues 11-24 and is directly COOH-terminal to the two likely chondroitin sulfate attachment sites at positions 5 and 10 (7). The PG II synthetic peptide (positions 5-17) is directly COOH-terminal to the single glycosaminoglycan attachment site of the PG II molecule (also known as DSPG II, PG-40, and decorin (18)). Both peptides were conjugated to carrier proteins, and 1-mg portions were injected into rabbits at 10 intradermal and 2 intramuscular sites (one injection series with Freund's complete adjuvant and two booster series in incomplete adjuvant). Fig. 1 shows the results of immunodetection of PG I and PG II core proteins in extracts from a variety of adolescent monkey tissues. PG II was present in many connective tissues, while PG I had a more limited distribution at this stage of development in monkey.

Primary screening of the λgt11 ZAP bone cell expression library was with an antiserum made against a combination of purified human bone PG I and PG II. Twelve positive plaques were seen by indirect immunodetection using peroxidase-labeled second antisera and were purified to homogeneity. Each clone was rescreened with peptide antiserum to PG I and separately to PG II. Three clones were positive for PG II and four for PG I. These seven clones were subsequently converted from λ phage to pBluescript plasmid as described. The cDNA inserts from the longest clone in each class were subcloned into M13 vectors and sequenced as described under "Materials and Methods." A 1.6-kbp PG II clone (P2) was partially sequenced and was found to be identical with the PG-40 clone 3C previously published by Krusius and Ruoslahti (10) except that at the 5' end our clone started at nucleotide -70 (and was identical, as far as sequenced from this end, to base +130). Furthermore, our sequencing reactions from the 3' end of clone P2 were in exact agreement from nucleotide 1357 to 1428 of clone 3C, then disagreed for the last 7 nucleotides of clone 3C (GGAATTC, which is probably the sequence of the synthetic EcoRI linker used by Krusius and Ruoslahti (10) during the synthesis of their cDNA library and not an example of any real differences in the original two mRNAs). Our cDNA clone, P2, was then found to be 93 bases longer than the 3C clone at the 3' end (data not shown).

Sequencing of the 5' end of one putative PG I clone (P6) confirmed its identity as PG I when an open reading frame was found that included the entire 29-residue sequence previously found by NH2-terminal protein microsequencing (7). Neither this clone (P6) nor the other three PG I clones (data not shown) contained the translation start codon ATG (Fig. 2). The most 5' 200-bp fragment of the insert was used to rescreen the original library. Nineteen positive clones were purified. Analysis showed that the clones were about equally divided into two size classes, a 1.6- and a 2.8-kbp class (the 1.6-kbp class appears to have been an artifact of the construction of the library as no 1.6-kbp message is seen on the Northern analysis). Restriction mapping showed both classes

FIG. 1. Indirect immunodetection of PG I and PG II core proteins from 4 M guanidine HCl extracts (crude) of monkey tissues and from purified human bone PG I. Chondroitinase ABC-digested extracts were electrophoresed on 4-20% gradient polyacrylamide SDS gels, electroeluted onto nitrocellulose, and detected using the appropriate peptide antiserum, peroxidase-conjugated second antiserum, and 4-chloro-1-naphthol. The molecular weight standard shown is ovalbumin (marked 43K). Panels: anti-PG I (N-peptide 11-24) and anti-PG II (N-peptide 5-17).


FIG. 2. Restriction map and DNA sequencing strategy of PG I cDNA. Scale at top is in base pairs (0 to 2800). Clones P6 and P16 are shown. ATG and TAG show positions of start and stop codons, respectively. Arrows show sequencing reactions in M13 with the open circles representing positions of synthetic oligonucleotide sequencing. Other sequencing reactions used subcloned restriction fragments and universal primers.

GAGTAGCTGCTTTCGGTCCGCCGGACACACCGGACAGATAGACGTGCGGACGGCCCACCACCCCAGCCCGCCAACTAGTCAGCCTGCGCC 90

TGGCGCCTCCCCTCTCCAGGTCCATCCGCCATGTGGCCCCTGTGGCGCCTCGTGTCTCTGCTGGCCCTGAGCCAGGCCCTGCCCTTTGAG 180
MetTrpProLeuTrpArgLeuValSerLeuLeuAlaLeuSerGlnAlaLeuProPheGlu -18

CAGAGAGGCTTCTGGGACTTCACCCTGGACGATGGGCCATTCATGATGAACGATGAGGAAGCTTCGGGCGCTGACACCTCAGGCGTCCTG 270
GlnArgGlyPheTrpAspPheThrLeuAspAspGlyProPheMetMetAsnAspGluGluAlaSerGlyAlaAspThrSerGlyValLeu 13

[nucleotides 271-360 are not legible in the source scan] 360
AspProAspSerValThrProThrTyrSerAlaMetCysProPheGlyCysHisCysHisLeuArgValValGlnCysSerAspLeuGly 43

CTGAAGTCTGTGCCCAAAGAGATCTCCCCTGACACCACGCTGCTGGACCTGCAGAACAACGACATCTCCGAGCTCCGCAAGGATGACTTC 450
LeuLysSerValProLysGluIleSerProAspThrThrLeuLeuAspLeuGlnAsnAsnAspIleSerGluLeuArgLysAspAspPhe 73

AAGGGTCTCCAGCACCTCTACGCCCTCGTCCTGGTGAACAACAAGATCTCCAAGATCCATGAGAAGGCCTTCAGCCCACTGCGGAACGTG 540
LysGlyLeuGlnHisLeuTyrAlaLeuValLeuValAsnAsnLysIleSerLysIleHisGluLysAlaPheSerProLeuArgAsnVal 103

CAGAAGCTCTACATCTCCAAGAACCACCTGGTGGAGATCCCGCCCAACCTACCCAGCTCCCTGGTGGACGTCCGCATCCACGACAACCGC 630
GlnLysLeuTyrIleSerLysAsnHisLeuValGluIleProProAsnLeuProSerSerLeuValAspValArgIleHisAspAsnArg 133

ATCCGCAAGGTGCCCAAGGGAGTGTTCAGCGGGCTCCGGAACATGAACTGCATCGAGATGGGCGGGAACCCACTGGAGAACAGTGGCTTT 720
IleArgLysValProLysGlyValPheSerGlyLeuArgAsnMetAsnCysIleGluMetGlyGlyAsnProLeuGluAsnSerGlyPhe 163

GAACCTGGAGCCTTCGATGGCCTGAAGCTCAACTACCTGCGCATCTCAGAGGCCAAGCTGACTGGCATCCCCAAAGACCTCCCTGAGACC 810
GluProGlyAlaPheAspGlyLeuLysLeuAsnTyrLeuArgIleSerGluAlaLysLeuThrGlyIleProLysAspLeuProGluThr 193

CTGAATGAACTCCACCTAGACCACAACAAAATCCAGGCCATCGAACTGGAGGACCTGCTTCGCTACTCCAAGCTGTACAGGCTGGGCCTA 900
LeuAsnGluLeuHisLeuAspHisAsnLysIleGlnAlaIleGluLeuGluAspLeuLeuArgTyrSerLysLeuTyrArgLeuGlyLeu 223

GGCCACAACCAGATCAGGATGATCGAGAACGGGAGCCTGAGCTTCCTGCCCACCCTCCGGGAGCTCCACTTGGACAACAACAAGTTGGCC 990
GlyHisAsnGlnIleArgMetIleGluAsnGlySerLeuSerPheLeuProThrLeuArgGluLeuHisLeuAspAsnAsnLysLeuAla 253

AGGGTGCCCTCAGGGCTCCCAGACCTCAAGCTCCTCCAGGTGGTCTATCTGCACTCCAACAACATCACCAAAGTGGGTGTCAACGACTTC 1080
ArgValProSerGlyLeuProAspLeuLysLeuLeuGlnValValTyrLeuHisSerAsnAsnIleThrLysValGlyValAsnAspPhe 283

TGTCCCATGGGCTTCGGGGTGAAGCGGGCCTACTACAACGGCATCAGCCTCTTCAACAACCCCGTGCCCTACTGGGAGGTGCAGCCGGCC 1170
CysProMetGlyPheGlyValLysArgAlaTyrTyrAsnGlyIleSerLeuPheAsnAsnProValProTyrTrpGluValGlnProAla 313

ACTTTCCGCTGCGTCACTGACCGCCTGGCCATCCAGTTTGGCAACTACAAAAAGTAGAGGCAGCTGCAGCCACCGCGGGGCCTCAGTGGG 1260
ThrPheArgCysValThrAspArgLeuAlaIleGlnPheGlyAsnTyrLysLys 331

GGTCTCTGGGGAACACAGCCAGACATCCTGATGGGGAGGCAGAGCCAGGAAGCTAAGCCAGGGCCCAGCTGCGTCCAACCCAGCCCCCCA 1350
CCTCAGGTCCCTGACCCCAGCTCGATGCCCCATCACCGCCTCTCCCTGGCTCCCAAGGGTGCAGGTGGGCGCAAGGCCCGGCCCCCATCA 1440
CATGTTCCCTTGGCCTCAGAGCTGCCCCTGCTCTCCCACCACAGCCACCCAGAGGCACCCCATGAAGCTTTTTTCTCGTTCACTCCCAAA 1530
CCCAAGTGTCCAAAGCTCCAGTCCTAGGAGAACAGTCCCTGGGTCAGCAGCCAGGAGGCGGTCCATAAGAATGGGGACAGTGGGCTCTGC 1620
CAGGGCTGCCGCACCTGTCCAGAACATNTTCTNTTCTGTTCCTCCTCCTCATGCATTTCCAGCCTTG

3' end of the longer (P6) clone:
GGACAGCGGTCTCCCCAGCCTGCCCTGCTCAGCCCTGCCCCCAAACCTGTACTGTCCCGGAGGAGGTTGGGAGGTGGAGGCCCAGCATCCC
GCGCAGATGACACCATCAACCGCCAGAGTCCCAGACACCGGTTTTCCTAGAAGCCCCTCACCCCCACTGGCCCACTGGTGGCTAGGTCTCC
CCTTACTCTTCTGGTCCAGCGCAACCAGGGGCTGCTTCTGAGGTCGGTGGCTGTCTTTCCATTAAAGAAACACCGTGCAAAAAAA

FIG. 3. cDNA sequence of the PG I clone (P16) and the deduced amino acid sequence. Numbers at the end of each row give the position of the last nucleotide or amino acid in that row; N marks a base that is not legible in the source scan. The bottom portion of this panel shows the 3' end of the longer (P6) clone. (This longer clone had an identical sequence in the coding region.) Arrow shows start of sequence of the secreted form of the proteoglycan (Asp +1). Underlined portion is identical with the protein microsequence determined previously (7). Closed circles, possible glycosaminoglycan attachment sites; closed triangles, possible N-linked oligosaccharide attachment sites.
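As a rough check on the reading frame shown in Fig. 3, the translation step can be sketched in a few lines of Python. This is only an illustration: the codon table is the standard genetic code, the nucleotide string is the first 19 codons of the open reading frame as printed in the second row of Fig. 3, and the names `translate` and `SIGNAL_CODING` are ours rather than part of any tool used in the paper.

```python
# Standard genetic code, indexed with base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 19 codons of the open reading frame, transcribed from Fig. 3 (from the ATG).
SIGNAL_CODING = "ATGTGGCCCCTGTGGCGCCTCGTGTCTCTGCTGGCCCTGAGCCAGGCCCTGCCCTTT"

signal_peptide = translate(SIGNAL_CODING)
print(signal_peptide, len(signal_peptide))
```

The 19 translated residues are predominantly hydrophobic, which is what one would expect of the secretion signal described in the text.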

contained approximately 200 bp more at the 5' end of the original mRNA and identical open reading frames (data not shown). We subcloned and sequenced one 1.6-kbp clone (P16, Fig. 2), and the results are shown in Fig. 3.

The 1685-bp cDNA has an open reading frame of 378 amino acids (corresponding to a molecular weight of 42,510 for the complete protein and 37,983 for the secreted product) (Fig. 3) starting with a 19-residue hydrophobic sequence corresponding to a secretion signal as found in many proteins. The next 18 residues (-18 to -1) contain several charged amino acids uncharacteristic of a leader sequence. This may correspond to a propeptide immediately NH2-terminal to the Asp-Glu-Glu-Ala-Ser... sequence (underlined portion of Fig. 3) previously shown to be the NH2 terminus of the PG I molecule in bone matrix (7). The open reading frame of the secreted portion of the PG I molecule shows four possible Ser-Gly attachment sites of the chondroitin or dermatan sulfate chains (closed circles in Fig. 3). The first two of these are the more likely attachment sites of the two glycosaminoglycan chains, based on the partial consensus sequence Asp/Glu-XXX-Ser-Gly described by Ruoslahti et al. (18) and because microsequencing of the protein showed evidence of a substituted serine at both positions 5 and 10 (7). Shortly carboxyl-terminal to these probable glycosaminoglycan chain attachment sites is a short cysteine-rich region (positions 26-39). Starting within this latter region is the beginning of a series of consensus repeats of a nominal 24-residue sequence (positions 34 to 305, also shown at the top of Fig. 4) which constitutes greater than 80% of the secreted form of the molecule. The remainder of the protein is composed predominantly of hydrophobic amino acids, perhaps partially enclosed within a single cysteine disulfide loop (between positions 284 and 317). The sequence also contains two possible N-linked oligosaccharide attachment sites at positions 233 and 274.
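Several of the positions quoted above can be checked with a short scan of the deduced sequence. The sketch below is illustrative only. `SECRETED` is our one-letter transcription of the secreted form (residues 1-331) from Fig. 3, so any transcription error would carry through; the N-linked rule used (Asn-X-Ser/Thr with X not Pro) is the general sequon convention, not a rule stated in this paper; and the helper name `positions` is hypothetical.

```python
import re

# Secreted PG I (residues 1-331), transcribed by hand from Fig. 3 into one-letter code.
SECRETED = (
    "DEEASGADTSGVL"
    "DPDSVTPTYSAMCPFGCHCHLRVVQCSDLG"
    "LKSVPKEISPDTTLLDLQNNDISELRKDDF"
    "KGLQHLYALVLVNNKISKIHEKAFSPLRNV"
    "QKLYISKNHLVEIPPNLPSSLVDVRIHDNR"
    "IRKVPKGVFSGLRNMNCIEMGGNPLENSGF"
    "EPGAFDGLKLNYLRISEAKLTGIPKDLPET"
    "LNELHLDHNKIQAIELEDLLRYSKLYRLGL"
    "GHNQIRMIENGSLSFLPTLRELHLDNNKLA"
    "RVPSGLPDLKLLQVVYLHSNNITKVGVNDF"
    "CPMGFGVKRAYYNGISLFNNPVPYWEVQPA"
    "TFRCVTDRLAIQFGNYKK"
)


def positions(pattern, seq):
    """Return 1-based start positions of every match, including overlapping ones."""
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


n_linked = positions(r"N[^P][ST]", SECRETED)   # Asn-X-Ser/Thr sequons, X not Pro
cysteines = positions(r"C", SECRETED)
repeat_fraction = (305 - 34 + 1) / len(SECRETED)  # repeat region 34-305

print(len(SECRETED))
print(n_linked)
print(cysteines)
print(f"{repeat_fraction:.1%}")
```

With this transcription the scan reports sequons at 233 and 274, cysteines at 26, 30, 32, 39, 150, 284, and 317, and a repeat region covering about 82% of the secreted form, in line with the positions and the "greater than 80%" figure given in the text.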

Total RNA was isolated from a variety of animal tissues and cells and probed with radiolabeled cDNA encoding first PG I and then PG II (Fig. 5). The essentially ubiquitous distribution of the 1.6-kbp PG II message was similar to that of the core protein seen in Fig. 1. The level of the ~2.6-kbp message of PG I (often seen as a closely spaced doublet, not shown) appeared to be substantially lower (relative to the PG II mRNA levels) in some tissues and cells in culture. This

FIG. 4. Sequence similarities of nominal 24-residue repeats found in the deduced protein sequence of human PG I, residues 34-305 (upper). Numbers on the right are amino acid positions in the known secreted protein (34-57, 58-78, 79-102, 103-126, 127-147, 148-172, 173-193, 194-217, 218-238, 239-264, 265-283, 298-305). Boxed regions are positions of identical amino acids used to determine the consensus sequence shown for human PG I. Notice the number of chemically similar amino acids (not boxed) found in the same column, an observation which increases the probability that the regions are a result of tandem repeats and subsequent divergence. Lower case letters represent amino acids that are found in a significant number of repeats at the same position, but are not found as often as the upper case coded amino acids. Also shown are similar consensus repeat sequences reported for regions of: human PG II, hPG II (18); Drosophila Toll gene, dToll (25); Drosophila chaoptin, dChaoptin (26); yeast adenylate cyclase, yAdCyclase (24); human serum leucine-rich glycoprotein, hLRG (21); and human von Willebrand factor-binding platelet membrane protein, hGPIb (22).
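The repeat boundaries listed for Fig. 4 can be applied to the deduced sequence to see how far each repeat departs from the nominal 24 residues. The following Python sketch is illustrative only: `SECRETED` is our hand transcription of the secreted form from Fig. 3, the ranges are copied as printed (including the short final entry), and counting Leu, Ile, and Val as "leucine-like" is our assumption, since the paper does not define the term here.

```python
# Secreted PG I (residues 1-331), transcribed by hand from Fig. 3 into one-letter code.
SECRETED = (
    "DEEASGADTSGVL"
    "DPDSVTPTYSAMCPFGCHCHLRVVQCSDLG"
    "LKSVPKEISPDTTLLDLQNNDISELRKDDF"
    "KGLQHLYALVLVNNKISKIHEKAFSPLRNV"
    "QKLYISKNHLVEIPPNLPSSLVDVRIHDNR"
    "IRKVPKGVFSGLRNMNCIEMGGNPLENSGF"
    "EPGAFDGLKLNYLRISEAKLTGIPKDLPET"
    "LNELHLDHNKIQAIELEDLLRYSKLYRLGL"
    "GHNQIRMIENGSLSFLPTLRELHLDNNKLA"
    "RVPSGLPDLKLLQVVYLHSNNITKVGVNDF"
    "CPMGFGVKRAYYNGISLFNNPVPYWEVQPA"
    "TFRCVTDRLAIQFGNYKK"
)

# Repeat boundaries (1-based, inclusive) as listed in the Fig. 4 legend.
REPEAT_RANGES = [
    (34, 57), (58, 78), (79, 102), (103, 126), (127, 147), (148, 172),
    (173, 193), (194, 217), (218, 238), (239, 264), (265, 283), (298, 305),
]


def slice_repeats(seq, ranges):
    """Cut a sequence into segments given 1-based inclusive (start, end) pairs."""
    return [seq[start - 1:end] for start, end in ranges]


repeats = slice_repeats(SECRETED, REPEAT_RANGES)
for (start, end), rep in zip(REPEAT_RANGES, repeats):
    leu_like = sum(rep.count(aa) for aa in "LIV")  # assumed "leucine-like" set
    print(f"{start:>3}-{end:<3} len={len(rep):>2} L/I/V={leu_like:>2} {rep}")
```

Listing the segments one per line makes the recurring leucine spacing easier to see by eye, which is the observation the consensus in Fig. 4 formalizes.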

FIG. 5. Detection of PG I (upper) and PG II (lower) mRNA in various tissue and cell cultures. Lane groups labeled in the figure include human, bovine, and rat samples; the PG I message is marked at 2.6 kb. 32P-Labeled PG I cDNA probe hybridized to a 2.6-kbp PG I message in human bone, skin, and periodontal ligament (PDL) cells in culture, but not in this sample of gingival (ging.) cells in culture. A similar sized PG I message was seen in bovine skin and cartilage tissue, but not cornea tissue. Bovine bone, rat osteosarcoma cells (ROS), and rat chondrosarcoma (RCS) tissue all showed hybridization to a message of similar size. Chicken embryo fibroblasts (CEF) either did not contain detectable levels of PG I mRNA or the probe failed to cross-hybridize under stringent conditions. Human PG II probes bound to a 1.6-kbp message in all of the human and bovine samples tested. Chicken mRNA showed a weak but reproducible band of slightly larger size. Both rat sarcoma cell systems either contained no mRNA for PG II or the probe shows no cross-hybridization.

may also have been reflected in lower amounts of its core protein in similar tissues (Fig. 1). While the human PG I cDNA did appear to hybridize to rat mRNA from the ROS osteosarcoma cell line and chondrosarcoma, either the human PG II probe did not hybridize to rat PG II mRNA or there was no PG II message present in these cells (Fig. 5). The lack of hybridization of human PG II probe to rat PG II message may be the more likely explanation because bovine PG II probes do not hybridize well with rat mRNA (29). The human PG I and PG II cDNA did not hybridize with each other under stringent conditions.

DISCUSSION

A cDNA encoding the small proteoglycan I of bone has been cloned and sequenced. The longest cDNA clone was approximately 2.8 kbp, containing a 1134-bp open reading frame. The deduced PG I protein sequence is shown diagrammatically in Fig. 6. The protein can be seen to contain six regions including (from NH2 to COOH terminus): 1) leader sequence; 2) a putative propeptide; 3) a likely glycosaminoglycan attachment segment; 4) a short cysteine-rich region; 5) a large region comprised of 12 repeats of a nominal 24-residue consensus sequence; and 6) a hydrophobic COOH terminus possibly including a single disulfide loop. The PG I cDNA (Fig. 7a) and the deduced protein sequence (Fig. 7b) show that PG I sequences have sufficient homology with the corresponding human fibroblast PG II sequences to propose that PG I and PG II are a direct consequence of gene duplication. For comparison, human and bovine PG II core proteins have diverged less than 10% since the two species have taken independent paths of evolution (19). Our observation that bone of many different species contains two proteoglycans of a size similar to human bone PG I and PG II on SDS gels (5) would suggest that the presence of the two proteoglycans is

500 1000 1500 J " ~ ' " " ' : " " " " " ' ' " ' " " ' . . ' . ""I

hPG II

ATFRCVTDRLAIOFGNYRR 368

STFRCVWRSAIOLGNYK. 359 I I I I I I I l l I I I I b

FIG. 7. Comparison of human PG I and PG I1 nucleotide (a) and amino acid (b ) sequences. The nucleic acid sequences of PG I and PG I1 were analyzed using the Compare program of the University of Wisconsin Genetics Computer Group using the stand- ard comparison methods of Maize1 and Lenk (27). The broken l ine near the diagonal represents regions of homology. Segments off of the diagonal are regions of homology between sequences further off register. The Bpanel shows that PG I and PG I1 proteins have regions of amino acid identity throughout their entire sequence where identity is highlighted by vertical lines.

not just a case of an evolutionarily neutral gene duplication (within a single species) and subsequent random mutation, but represents a sustained divergent evolution of two gene products with possibly related but different functions. An interesting but presently unanswerable question is whether one of the two proteoglycans represents the original primor- dial gene (and function) or whether both have evolved to better perform two slightly different functions of the original gene.
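The dot-matrix comparison described in the Fig. 7 legend can be illustrated with a minimal sketch. This is a generic windowed dot matrix, not the Compare program itself; the function names, the `window` and `min_matches` parameters, and the two toy sequences are assumptions made for illustration and are not PG I or PG II data. A run of hits on the main diagonal corresponds to sequences that are in register, a gap in that run corresponds to a mismatch, and hits on another diagonal correspond to homology that is off register (for example, a repeated segment).

```python
def dot_matrix(seq_a, seq_b, window=4, min_matches=4):
    """Return the set of (i, j) window start positions at which at least
    min_matches of the window positions are identical in seq_a and seq_b."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.add((i, j))
    return hits


def diagonal_offsets(hits):
    """Count hits per diagonal, keyed by the offset j - i.
    Offset 0 is the main diagonal; other offsets are off-register matches."""
    counts = {}
    for i, j in hits:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Hypothetical toy sequences: seq_a is a 7-base motif repeated twice;
# seq_b carries one substitution in the second copy of the motif.
seq_a = "GATTACAGATTACA"
seq_b = "GATTACAGATCACA"

counts = diagonal_offsets(dot_matrix(seq_a, seq_b))
print(sorted(counts.items()))  # [(-7, 4), (0, 7)]
```

In this toy case the main diagonal is broken where the substitution falls, and the second copy of the motif in `seq_a` produces a short segment seven positions off the diagonal. Lowering `min_matches` below `window` tolerates mismatches within a window, at the cost of more background hits.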

As originally reported by Patthy (20) and later expanded upon by Ruoslahti (18), the human PG II protein sequence contains 10 tandem repeats of a 24-residue consensus sequence that has homology to several non-proteoglycan gene products seen in a wide variety of species, including the fruit fly and yeast (Fig. 4). Our analysis of the closely related human PG I protein sequence has extended the number of reasonable tandem repeats to 12. In effect, PG I (and possibly PG II by analogy) may be considered to be a collection of tandem repeats (>80% of the protein core) with a glycosaminoglycan attachment region. Interestingly, all but one of the non-proteoglycan proteins reported to contain these consensus sequences have been postulated to use the repeats to bind other molecules; the exception is the leucine-rich glycoprotein of human serum, a protein of unknown function (21). In fact, the repeats in the platelet membrane glycoprotein GPIb have been directly shown to be the binding region for von Willebrand factor (22, 23). The other proteins containing tandem repeats of the consensus sequence include: 1) yeast adenylate cyclase (with the repeats being implicated in the binding of the enzyme to the cytosolic side of the cell membrane) (24); 2) the Drosophila toll gene product, encoded by a maternal effect gene that plays a role in embryonic dorsal-ventral patterning (hypothetically using the extracellular repeats in cell adhesion) (25); and 3) Drosophila chaoptin, a cell surface glycoprotein required for photoreceptor cell morphogenesis (with the repeats being used to bind the protein to the extracellular side of the cell membrane) (26).

In summary, we have isolated and sequenced a cDNA clone corresponding to the human bone proteoglycan I. Because this proteoglycan core protein contains two glycosaminoglycan chains, we suggest the name biglycan. The secreted biglycan protein is predominantly a series of 12 tandem repeats of a nominal 24-residue consensus sequence with its two probable glycosaminoglycan attachment sites associated with the NH2-terminal region of the molecule. Both the cDNA and deduced protein sequences show that biglycan (PG I) and decorin (PG II) result from a gene duplication. The original primordial PG I/II gene appears to have been a result of a series of tandem gene duplications of a short gene that apparently has been used many times in diverse living organisms for generating protein domains with the capacity to bind (or perhaps more accurately adsorb) to other protein surfaces and possibly cell membrane structures.

Acknowledgments: We would like to thank Drs. Wolfgang Lindner of the Institute of Pharmaceutical Chemistry, University of Graz, Austria, and W. Lee Maloy, National Institute of Allergy and Infectious Diseases, for production of the synthetic peptides used for antisera production, and Michael Lowe for expert help in the Northern analysis.

REFERENCES

1. Herring, G. M. (1968) Biochem. J. 107, 41-49
2. Laemmli, U. K. (1970) Nature 227, 680-685
3. Fisher, L. W., Termine, J. D., Dejter, S. W., Jr., Whitson, S. W., Yanagishita, M., Kimura, J. H., Hascall, V. C., Kleinman, H. K., Hassell, J. R., and Nilsson, B. (1983) J. Biol. Chem. 258, 6588-6594
4. Rosenberg, L. C., Choi, H. U., Tang, L.-H., Johnson, T. L., Pal, S., Webber, C., Reiner, A., and Poole, A. R. (1985) J. Biol. Chem. 260, 6304-6313
5. Fisher, L. W. (1985) in The Chemistry and Biology of Mineralized Tissues (Butler, W. T., ed) pp. 188-196, EBSCO Media Inc., El Toro, CA
6. Heinegård, D., Björne-Persson, A., Cöster, L., Franzén, A., Gardell, S., Malmström, A., Paulsson, M., Sandfalk, R., and Vogel, K. (1985) Biochem. J. 230, 181-194
7. Fisher, L. W., Hawkins, G. R., Tuross, N., and Termine, J. D. (1987) J. Biol. Chem. 262, 9702-9708
8. Adams, S. L., Sobel, M. E., Howard, B. H., Olden, K., Yamada, K. M., de Crombrugghe, B., and Pastan, I. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 3399-3403
9. Young, R. A., and Davis, R. W. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 1194-1198
10. Krusius, T., and Ruoslahti, E. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 7683-7687
11. Lindner, W., and Robey, F. A. (1987) Int. J. Peptide Protein Res. 30, 794-800
12. Young, M. F., Vogeli, G., Nunez, A. M., Fernandez, M. P., Sullivan, M., and Sobel, M. E. (1984) Nucleic Acids Res. 12, 4207-4228
13. Messing, J., Crea, R., and Seeburg, P. H. (1981) Nucleic Acids Res. 9, 309-321
14. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
15. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
16. Shimokawa, H., Sobel, M. E., Sasaki, M., Termine, J. D., and Young, M. F. (1987) J. Biol. Chem. 262, 4042-4047
17. Towbin, H., Staehelin, T., and Gordon, J. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 4350-4354
18. Ruoslahti, E. (1988) Annu. Rev. Cell Biol. 4, 229-255
19. Day, A. A., McQuillan, C. I., Termine, J. D., and Young, M. R. (1987) Biochem. J. 248, 801-805
20. Patthy, L. (1987) J. Mol. Biol. 198, 567-577
21. Takahashi, N., Takahashi, Y., and Putnam, F. W. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 1906-1910
22. Titani, K., Takio, K., Handa, M., and Ruggeri, Z. M. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 5610-5614
23. Handa, M., Titani, K., Holland, L. Z., Roberts, J. R., and Ruggeri, Z. M. (1986) J. Biol. Chem. 261, 12579-12585
24. Kataoka, T., Broek, D., and Wigler, M. (1985) Cell 43, 493-505
25. Hashimoto, C., Hudson, K. L., and Anderson, K. V. (1988) Cell 52, 269-279
26. Reinke, R., Krantz, D. E., Yen, D., and Zipursky, S. L. (1988) Cell 52, 291-301
27. Maizel, J. V., and Lenk, R. P. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7665-7669
28. Gehron Robey, P., and Termine, J. D. (1985) Calcif. Tissue Int. 37, 453-460
29. Day, A. A., Ramis, C. I., Fisher, L. W., Gehron Robey, P., Termine, J. D., and Young, M. F. (1986) Nucleic Acids Res. 14, 9861-9876