Sequence and organization of the human vitamin D-binding protein gene



Biochimica et Biophysica Acta, 1216 (1993) 385-394 © 1993 Elsevier Science Publishers B.V. All rights reserved 0167-4781/93/$06.00

BBAEXP 92556


Andreas Braun, Andrea Kofler, Susanne Morawietz and Hartwig Cleve *

Institute of Anthropology and Human Genetics, The University of Munich, Richard-Wagner-Str. 10/I, 8000 Munich 2 (Germany)

(Received 30 March 1993)

Key words: Genomic structure; Genetic variation; Evolution; Vitamin D-binding protein (DBP); (Human)

The structure and organization of the human vitamin D-binding protein (DBP) gene have been determined. The gene is composed of 13 exons and 12 intervening sequences. Introns were amplified by the polymerase chain reaction (PCR) using exon-specific oligonucleotide primers and were sequenced after subcloning; the exon/intron borders were determined. Introns 2, 5, 7, 9 and 10 were sequenced completely; introns 1, 3, 4, 6, 8, 11 and 12 were sequenced in part. We designed intron-specific primers for the amplification of each exon by PCR. This permits the analysis of mutational and function-related sites. By comparison with the genes for human albumin and α-fetoprotein, the gene for DBP/GC is confirmed as a member of this multigene family. The location of the introns in the coding region of the human DBP gene is identical with the position of the introns in the rat DBP gene.

Introduction

The vitamin D-binding protein (DBP), a 51 200 Da [1] protein of the α2-globulin fraction of human serum, was first discovered by Hirschfeld, who called it group-specific component (GC) [2]. There are three common alleles (GC*2, GC*1S and GC*1F) and more than 120 variant alleles of the GC/DBP system in the human population [3].

Several independent biological functions of this protein have been identified: GC is the major plasma carrier for vitamin D-3 and its natural derivatives [4]; it stabilizes extracellular monomeric G-actin, preventing the spontaneous polymerization to F-actin [5]; and binding of DBP to the complement components C5a and C5a des-Arg increases neutrophil chemotaxis [6,7]. An additional indication for a role of DBP in the immune response is the observed association with membrane Ig [8]. Furthermore, Yamamoto and Homma described GC as a precursor of a macrophage activating factor [9].

An extensive worldwide screening of human populations failed to discover a GC*0 homozygote in spite of evidence for several GC*0 alleles [10]. This finding was regarded as an indication that complete deficiency of DBP might be lethal in early ontogenesis.

* Corresponding author. Fax: +49 89 5203286.

The amino acid sequence of the DBP and the nucleotide sequence of its cDNA have been determined [1,11,12].

The amino acid sequence and one of the cDNA sequences were obtained from the genetic GC2 type and were found to be in agreement [1,13]. Another cDNA sequence was apparently from a genetic GC1 type (GC1F or GC1S) and was different at seven sites [11]. The molecular differences of the three common GC alleles reside in exon eleven and concern the amino acid positions 416 and 420 [14,15]. At position 416, the codon for aspartic acid was identified in the alleles GC*1F and GC*2 and the codon for glutamic acid was found in GC*1S. The latter change from GAT to GAG results in the generation of a HaeIII restriction site. At position 420, the codon for threonine is present in GC*1F and GC*1S and the codon for lysine in GC*2. The nucleotide change from ACG for threonine to AAG for lysine results in the generation of a StyI restriction site. Thus, the common genetic DBP/GC types can also be classified by restriction fragment analysis at the genomic level [14]. Part of the 5' flanking region of the DBP/GC gene and the border of exon 1/intron 1 was examined previously for regulatory elements by Yang et al. [16].
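The restriction-site logic described above can be illustrated with a short Python sketch. It starts from codons 415-421 of exon 11 as printed for the GC*1S sample in Fig. 2, derives the GC*1F and GC*2 variants from the codon changes given in the text, and checks which enzyme site is present. The recognition sequences GGCC (HaeIII) and CCWWGG (StyI, W = A or T) are the standard specificities of these enzymes and are not stated in the paper; the helper names `with_codon` and `classify` are hypothetical.

```python
import re

# Codons 415-421 of exon 11 in the GC*1S allele, as printed in Fig. 2:
# Pro Glu(416) Ala Thr Pro Thr(420) Glu
GC1S = "CCTGAGGCCACACCCACGGAA"

# Assumed recognition sequences (standard enzyme specificities).
HAEIII = re.compile("GGCC")
STYI = re.compile("CC[AT][AT]GG")  # CCWWGG, W = A or T


def with_codon(seq, codon_index, codon):
    """Return seq with the codon at 0-based codon_index replaced."""
    start = 3 * codon_index
    return seq[:start] + codon + seq[start + 3:]


# Codon 416 is the second codon of the segment, codon 420 the sixth.
GC1F = with_codon(GC1S, 1, "GAT")   # Asp416, Thr420
GC2 = with_codon(GC1F, 5, "AAG")    # Asp416, Lys420


def classify(segment):
    """Assign a common GC type from the HaeIII and StyI sites in the segment."""
    hae = bool(HAEIII.search(segment))
    sty = bool(STYI.search(segment))
    if hae and not sty:
        return "GC*1S"
    if sty and not hae:
        return "GC*2"
    if not hae and not sty:
        return "GC*1F"
    return "unexpected"


for name, seq in [("GC*1S", GC1S), ("GC*1F", GC1F), ("GC*2", GC2)]:
    print(name, seq, "->", classify(seq))
```

In this short segment each allele gives a distinct site pattern: HaeIII only for GC*1S, StyI only for GC*2, and neither site for GC*1F. A real assay would of course have to account for further sites elsewhere in the amplified fragment.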

Homology analysis based on cDNA and protein sequence data demonstrates a strong evolutionary relationship of GC, albumin and α-fetoprotein, classifying them in the same multigene family [11,12,17]. The three proteins share a three-domain structure which is thought to have arisen via triplication of a single primordial domain. However, the third domain of DBP is truncated by 124 amino acids [11]. The close association (1.5 centimorgan) of the three genes on human chromosome 4 supports their common origin [18,19,20].

In this study we present the structure and organization of the human DBP gene. Based on these results the hDBP gene can be compared with the genes for albumin and α-fetoprotein [21,22] and also with the rat DBP gene [23].

Furthermore, each exon can now be amplified using primers designed after the identified flanking intron sequences. Amplifying each exon by PCR is the prerequisite for the identification of further DBP/GC mutants characterized by changes in exons other than exon eleven. It is also necessary for the analysis of DBP/GC inherited polymorphism in species other than man, as reported for instance in apes and monkeys [24,25].

Materials and Methods

Preparation of genomic DNA

Genomic DNA was prepared from peripheral blood leukocytes as described by Braun et al. [14].

In this study we examined a homozygous 1S-1S GC type, determined by isoelectric focusing.

Polymerase chain reaction (PCR)

For amplifying the introns by PCR we used exon-specific oligonucleotides which were designed after the cDNA sequence published by Yang et al. [12] (Table I). For amplifying the non-coding exon 13 we used one intron-specific primer called I12, which hybridizes near the 3' site of intron 12. The Taq DNA polymerase was purchased from Promega, Germany. The total reaction volume of 80 μl included 0.5-1 μg of genomic DNA of GC*1S homozygous blood donors, 50 ng of each primer, 200 μmol of each dNTP, 2.5 mM magnesium chloride and 1.25 U Taq DNA polymerase. The PCR was carried out in a Hybaid Thermal Reactor. After an initial denaturation step (94°C, 7 min) the conditions shown in Table II were used.

Genomic anchor PCR

This method is based on the cDNA anchor PCR [26] and was developed in our laboratory by one of us (A.B.) for amplifying an unknown region of DNA adjacent to a segment of known DNA. Typical PCR amplification utilizes oligonucleotide primers that hybridize to opposite strands. The primers are oriented in such a way that extension proceeds inwards across the region between the two primers. In the genomic anchor PCR the specific primers are oriented in the same direction and hybridize to the same strand. However, they do not work together in the same PCR, but rather in two separate PCRs, one after the other. The procedure of the genomic anchor PCR is shown schematically in Fig. 1. The first step is a linear PCR with only the first specific oligonucleotide. The second PCR is carried out after tailing poly(G) to the 3' end of the product of the first PCR. Here the second specific oligo is used together with an anchor oligo and an anchor-poly(C) oligo, so that this reaction is exponential. The exact procedure will be described by Braun (unpublished data).

TABLE I

Oligonucleotides used for amplifying GC-specific fragments

All primers named by an 'A' and a number are exon-specific primers designed after the cDNA sequences published by Yang et al. [12,13] and by Yang et al. [16]. The primers named by an 'I' and a number are deduced from intron sequences which are the result of this study. Site: (i) Exon-specific primers: indicated is the binding site at the 5' nucleotide position of the primers to cDNA. Here the first nucleotide of the start codon (ATG) is numbered +1. (ii) Intron-specific primers: the binding site (-) is in the intron (number) near the 5' end or the 3' end, respectively. (→) forward (5' to 3' direction); (←) reverse (3' to 5' direction). TM indicates the melting temperature of the oligonucleotides (4°C for every G or C, 2°C for every A or T).

| PCR product | Primer | Sequence (5' to 3') | Site | Dir. | TM |
|---|---|---|---|---|---|
| E2-I2-E3 | A38 | GTCTGCAAGGAATTCTCCCA | 101 | → | 60°C |
| | A40 | TCATAGCAGTCAGGGTCAGC | 254 | ← | 62°C |
| E3-I3-E4 | A15 | CTCTGTCACTAGTCCTGTAC | 125 | → | 60°C |
| | A16 | AGCAGTGCCTGGGTGAACGG | 327 | ← | 66°C |
| E4-E4 | A17 | CCTGTGAAAGTAATTCTCCA | 283 | → | 56°C |
| | A18 | ATTCCTTTGGATCTTTCCTG | 463 | ← | 56°C |
| E4-I4-E5 | A3 | CCCACAAATGATGAAATCTG | 415 | → | 56°C |
| | A4 | CATACAGTTGGGCTTGCAGA | 593 | ← | 60°C |
| E5-I5-E6 | A19 | CTATGGTAGGGTCCTGCTGT | 551 | → | 62°C |
| | A36 | TTCTTCTCCCCATAAGCAGCA | 689 | ← | 62°C |
| E6-I6-E7 | A35 | CTCTGTCAAATAGAGTCTGCT | 641 | → | 60°C |
| | A85 | GCCATGCAATCTTCAGAGGC | 824 | ← | 62°C |
| E7-I7-E8 | A21 | CAATCTCATAAAGTTAGCCC | 702 | → | 56°C |
| | A22 | GCAGCTGGCATGAAGTAAGT | 953 | ← | 60°C |
| E8-I8-E9 | A5 | CAGCCATGGACGTTTTTGTG | 910 | → | 60°C |
| | A6 | TTACTGAGGAATACTTCCGG | 1089 | ← | 58°C |
| E9-I9-E10 | A23 | CTAAGCAGAAGGACTCATCT | 1049 | → | 58°C |
| | A24 | TTCTTGTACTCAGTAAATGT | 1260 | ← | 52°C |
| E10-I10-E11 | A7 | GACAAGGGACAAGAACTATG | 1202 | → | 58°C |
| | A8 | AATCACAGTAAAGAGGAGGT | 1391 | ← | 56°C |
| I12-E13 | I12 | GGTTGTTGCCCATTAGCTC | - | → | 56°C |
| | A26 | GCATTATATTTCAATTTATTTTGATAGC | 1625 | ← | 58°C |

TABLE II

Program of polymerase chain reaction

D, denaturation; A, annealing; E, extension. The numbers in parentheses represent time in min. All programs were run for 30 cycles.

| PCR product | D | A | E |
|---|---|---|---|
| E2-I2-E3 | 94°C (1) | 53°C (1) | 72°C (2) |
| E3-I3-E4 | 94°C (1) | 53°C (1) | 72°C (2) |
| E4-E4 | 94°C (1) | 53°C (1) | 72°C (2) |
| E4-I4-E5 | 94°C (1) | 53°C (1) | 72°C (2) |
| E5-I5-E6 | 94°C (1) | 53°C (1) | 72°C (2) |
| E6-I6-E7 | 94°C (1) | 56°C (1) | 72°C (3) |
| E7-I7-E8 | 94°C (1) | 53°C (1) | 72°C (2) |
| E8-I8-E9 | 94°C (1) | 53°C (1) | 72°C (2) |
| E10-I10-E11 | 94°C (1) | 53°C (1) | 72°C (2) |
| I12-E13 | 94°C (1) | 50°C (1) | 72°C (0.5) |
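The melting temperatures listed in Table I follow the rule given in its legend (4°C for every G or C, 2°C for every A or T). The following is a minimal Python sketch of that rule, applied to four primer sequences taken from Table I; the function name `rule_of_thumb_tm` is hypothetical.

```python
def rule_of_thumb_tm(primer):
    """Melting temperature in °C: 4 per G or C, 2 per A or T."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    if gc + at != len(primer):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 4 * gc + 2 * at


# Primer sequences as listed in Table I.
primers = {
    "A38": "GTCTGCAAGGAATTCTCCCA",
    "A40": "TCATAGCAGTCAGGGTCAGC",
    "A15": "CTCTGTCACTAGTCCTGTAC",
    "A16": "AGCAGTGCCTGGGTGAACGG",
}

for name, seq in primers.items():
    print(name, len(seq), "nt", rule_of_thumb_tm(seq), "°C")
```

For these four primers the rule gives 60, 62, 60 and 66°C, matching the TM column of Table I.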


In Table III the primers used are listed. For the linear PCR the volumes were as described above for PCR, with the exception that more genomic DNA (2-4 μg) was used. After an initial denaturation step (94°C, 7 min) the samples were subjected to 40 cycles with the conditions listed in Table IV.

After purification and tailing (A. Braun, unpublished data), the exponential anchor PCR was started with a denaturation step of 94°C for 5 min followed by 5 cycles with 1 min 94°C, 1.5 min 42°C, and 2 min 72°C for all samples. Subsequently, 25 amplification cycles were carried out under the conditions also listed in Table IV. The program finished with 5 min at 72°C for each sample. If a booster reaction was necessary, a second PCR round was performed according to the conditions listed in Table IV. The reaction was carried out with 1 μl of the 1:10 diluted anchor-PCR product of the first round.
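The principle of the genomic anchor PCR (linear PCR, poly(G) tailing, exponential PCR with a nested primer and the anchor-poly(C) primer) can be mimicked with plain string operations. The sketch below is only an idealized illustration, not the authors' protocol: it assumes perfect primer matches, a fixed extension length in the linear step and a tail of exactly 14 G residues. The template is exon 1 plus the first 50 bases of intron 1 as transcribed in Fig. 2, the primers A13 and A54 and the anchor sequence are taken from Table III, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Exon 1 and the first 50 bases of intron 1 (top strand), as transcribed in Fig. 2.
TEMPLATE = (
    "TACCACTTTTACATGGTCACCTACAGGAGAGAGGAGGTGCTGCAAGACTCTCTGGTAGAAAA"
    "ATGAAGAGGGTCCTGGTACTACTGCTTGCTGTGGCATTTGGACATGCTTTAGAGAGAG"
    "GTAAGATTTCTTTTGTTGTGACCATTTACAGGAATTCTTACTAGTTTAAT"
)
A13 = "TACCACTTTTACATGGTCAC"      # first specific primer (linear PCR)
A54 = "CTGCAAGACTCTCTGGTAG"       # nested specific primer (anchor PCR)
ANCHOR = "GCATGCGCGCGGCCGCGGAGG"  # An-poly(C) without its poly(C) stretch
TAIL = 14                         # assumed tail length, one G per C of the primer


def linear_pcr(template, primer, max_extension):
    """Single-stranded product: the primer plus up to max_extension bases."""
    start = template.index(primer)
    return template[start:start + len(primer) + max_extension]


def poly_g_tail(strand):
    """Add a poly(G) tail to the 3' end of the single strand."""
    return strand + "G" * TAIL


def anchor_pcr(tailed, nested_primer):
    """Top strand of the product of the exponential anchor PCR."""
    # The anchor-poly(C) primer pairs with the poly(G) tail and is extended
    # to the 5' end of the tailed strand, which gives the bottom strand.
    bottom = ANCHOR + revcomp(tailed)
    # The nested primer pairs with the bottom strand and is extended to its end.
    top = revcomp(bottom)
    return top[top.index(nested_primer):]


def unknown_flank(product, nested_primer):
    """Sequence between the nested primer and the poly(G) tail."""
    return product[len(nested_primer):len(product) - TAIL - len(ANCHOR)]


single = linear_pcr(TEMPLATE, A13, max_extension=200)
product = anchor_pcr(poly_g_tail(single), A54)
flank = unknown_flank(product, A54)
print(flank)
```

The recovered `flank` is the sequence downstream of the nested primer A54, including the start of intron 1; in the real experiment this is the part read after subcloning and sequencing.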

Ligation, subcloning and sequencing of the PCR products in the pUC19 vector

Ligation and subcloning were performed as described by Braun et al. [14]. DNA sequencing was done using M13 forward and reverse primers and, in part, internal DBP-specific primers synthesized by MWG (Ebersberg, Germany).

Results

Characterization of the PCR products

With the help of PCR, fragments of the human DBP gene from a GC*1S homozygote were amplified using exon-specific and in part intron-specific primers. These fragments were sequenced and characterized as shown in Table V.

Fig. 1. Flow diagram of the genomic anchor PCR.

Linear PCR for producing specific single-stranded DNA fragments

Poly-G tailing of the 3' end of the single-stranded DNA fragments

Exponential PCR with a DNA fragment-specific nested primer and a poly-G-specific anchor primer

Subcloning and DNA sequencing of the unknown DNA fragment between the anchor primer and the known sequence following the specific primer

TABLE III

Oligonucleotides used for genomic anchor PCR (ga-PCR)

Table III is labelled like Table I with some additions: An indicates the anchor oligonucleotide, An-poly(C) the anchor oligonucleotide with poly(C) tail, both designed after Loh et al. [26]. ° marks the primers A86 and A87. At the 5' end, each has two nucleotides (bold-face letters in the original) which bind in the bordering intron. For each sample the first primer described is used in the linear PCR, the second primer in the exponential anchor PCR. For the sample E12-I11 a third PCR (booster) was carried out with the oligonucleotide A91 as described by Braun (unpublished data).

| PCR product | Primer | Sequence (5' to 3') | Site | Dir. | TM |
|---|---|---|---|---|---|
| E1-I1 | A13 | TACCACTTTTACATGGTCAC | 64 | → | 56°C |
| | A54 | CTGCAAGACTCTCTGGTAG | -23 | → | 58°C |
| E2-I1 | I2 | CAAAGTTACCTCACACTCAG | | ← | 56°C |
| | A37 | TGGGAGAATTCCTTGCAGAC | 101 | ← | 60°C |
| E6-I6 | A19 | CTATGGTAGGGTCCTGCTGT | 551 | → | 60°C |
| | A35 | CTCTGTCAAATAGAGTCTGCT | 641 | → | 60°C |
| E7-I6 | I7 | GTACTTGGCACTCAATACC | | ← | 56°C |
| | A85 | GCCATGCAATCTTCAGAGGC | 824 | ← | 62°C |
| E11-I11 | A25 | CCTCCAACTGCTGTTCCA | 1346 | → | 56°C |
| | A51 | ACCTCCTCTTTACTGTGATTC | 1391 | → | 60°C |
| E12-I11 | A87 ° | ACCAAACTTAATAAACA | 1451 | ← | 42°C |
| | A71 | GTTAATAAACATGCTTCAGG | 1447 | ← | 54°C |
| | A91 | CTACAGGATATTCTTCAATTC | 1426 | ← | 56°C |
| E12-I12 | A86 ° | AGATTGATGCTGAATTG | 1397 | → | 46°C |
| | A72 | TGAAGAATATCCTGTAGTCC | 1410 | → | 53°C |
| E13-I12 | A55 | GCAAATCATCATACTTGTC | 1593 | ← | 52°C |
| | A34 | TCTTCCCAGAAGCTCAGTGG | 1526 | ← | 62°C |
| | An | GCATGCGCGCGGCCGCGGAGGCC | | | 66°C |
| | An-poly(C) | GCATGCGCGCGGCCGCGGAGGCCCCCCCCCCCCCC | | | max. 56°C |

The PCR product called E6-I6-E7, which showed a 5.7 kb band on agarose gel, could not be subcloned. However, by hybridization with a GC-specific oligo it could be identified as GC-specific (data not shown). The exact position of intron 6 in this fragment was determined by genomic anchor PCR.

Genomic anchor PCR was also used to identify the exon/intron borders of those introns not amplified by PCR. This method proved to be useful for amplifying a region of unknown DNA flanking a part of known DNA. The subcloned and sequenced PCR products are characterized in Table VI.

TABLE IV

Program of genomic anchor PCR (ga-PCR)

D, denaturation; A, annealing; E, extension. The numbers in parentheses represent time in min. The linear PCR was run for 40 cycles, the anchor PCR for 25 cycles.

| ga-PCR product | Linear PCR D | Linear PCR A | Linear PCR E | Anchor PCR D | Anchor PCR A | Anchor PCR E |
|---|---|---|---|---|---|---|
| E1-I1 | 94°C (1) | 57°C (1) | 72°C (2) | 94°C (1) | 57°C (1) | 72°C (2) |
| E2-I1 | 94°C (1) | 53°C (1) | 72°C (0.5) | 94°C (1) | 57°C (1) | 72°C (2) |
| E6-I6 | 94°C (1) | 55°C (1) | 72°C (0.5) | 94°C (1) | 53°C (1) | 72°C (2) |
| E7-I6 | 94°C (1) | 53°C (1) | 72°C (0.5) | 94°C (1) | 53°C (1) | 72°C (2) |
| E11-I11 | 94°C (1) | 53°C (1) | 72°C (0.5) | 94°C (1) | 57°C (1) | 72°C (2) |
| E12-I11 | 94°C (1) | 40°C (1) | 72°C (0.5) | 94°C (1) | 50°C (1) | 72°C (2) |
| E12-I12 | 94°C (1) | 42°C (1) | 72°C (0.5) | 94°C (1) | 50°C (1) | 72°C (2) |
| E13-I12 | 94°C (1) | 50°C (1) | 72°C (2) | 94°C (1) | 57°C (1) | 72°C (2) |

Structure and sequence of the hDBP gene

The DBP gene is composed of 13 exons and 12 introns (Fig. 2).

TABLE V

Characterization of the PCR products

| PCR product | Length | 5' Portion of exon | 3' Portion of exon | Intron | Length of intron |
|---|---|---|---|---|---|
| E2-I2-E3 | 1066 bp | 47 bp | 126 bp | 2 | 893 bp |
| E3-I3-E4 | ≈ 2800 bp | 155 bp | 44 bp | 3 | ≈ 2600 bp |
| E4-E4 | 181 bp | - | - | - | - |
| E4-I4-E5 | ≈ 1600 bp | 59 bp | 120 bp | 4 | ≈ 1450 bp |
| E5-I5-E6 | 438 bp | 56 bp | 81 bp | 5 | 301 bp |
| E6-I6-E7 | ≈ 5700 bp | 59 bp | 122 bp | 6 | ≈ 5500 bp |
| E7-I7-E8 | 1393 bp | 140 bp | 122 bp | 7 | 1131 bp |
| E8-I8-E9 | ≈ 1800 bp | 124 bp | 54 bp | 8 | ≈ 1600 bp |
| E9-I9-E10 | 683 bp | 117 bp | 98 bp | 9 | 468 bp |
| E10-I10-E11 | 1947 bp | 62 bp | 128 bp | 10 | 1757 bp |
| I12-E13 | 218 bp | - | - | - | - |

(i) Column 2: for PCR products containing introns that were not completely sequenced, the length (marked ≈) was estimated from an agarose gel. (ii) Column 6: for introns not completely sequenced, the estimated length (marked ≈) is indicated. The PCR product E4-E4 only contains exon sequence. The PCR product I12-E13 contains 58 nucleotides of the 3' end of intron 12 and the non-coding nucleotides of exon 13.
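For the completely sequenced introns, the columns of Table V are related by simple arithmetic: the length of each PCR product equals the 5' exon portion plus the intron plus the 3' exon portion. The following Python sketch checks this for the five fully sequenced products, using the values from Table V; the function name `inconsistent_rows` is hypothetical.

```python
# (PCR product, product length, 5' exon portion, 3' exon portion, intron length), in bp.
# Only products with completely sequenced introns are listed (Table V).
SEQUENCED = [
    ("E2-I2-E3", 1066, 47, 126, 893),
    ("E5-I5-E6", 438, 56, 81, 301),
    ("E7-I7-E8", 1393, 140, 122, 1131),
    ("E9-I9-E10", 683, 117, 98, 468),
    ("E10-I10-E11", 1947, 62, 128, 1757),
]


def inconsistent_rows(rows):
    """Return the names of rows whose parts do not add up to the product length."""
    return [
        name
        for name, total, exon5, exon3, intron in rows
        if exon5 + intron + exon3 != total
    ]


print(inconsistent_rows(SEQUENCED))
```

All five rows are consistent, so the function returns an empty list. The same check is useful when transcribing such a table from a scanned original.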


Exon 1> 119 (58)bp

TACCACTTTT ACATGGTCAC CTACAGGAGA GAGGAGGTGC TGCAAGACTC TCTGGTAGAA AA

ATG AAG AGG GTC CTG GTA CTA CTG CTT GCT GTG GCA TTT GGA CAT GCT TTA

Met Lys Arg Val Leu Val Leu Leu Leu Ala Val Ala Phe Gly His Ala Leu

Intron 1> (no length data) -1 +1

GAG AGA G GTAAGATTTC TTTTGTTGTG ACCATTTACA GGAATTCTTA CTAGTTTAAT

Glu Arg G

TTTATATTAT CACTTAAAAA ATGAAATAAA AAGTAAGAAA CAGAAGAACT GGTGAATATG

TTGGGGAGAG GGATAAGGAC CAGAGAGGCA TTAATCAGAG CTCCACGATG CCATGTAATG

TTGAATTGGA ATTGACAAAA AAAATCAACA CAAGAATCTT GTTGGAGAAA GTAAATGGAA

TGATATAATT TGTCAGTTAT CTCTTCCTCC TTTGTTGATA AGCAAGACCT AGAGAGAATA

ATTTAAAATT AGACATTTGC TAATTGTTAC AAGATAATGC AAATAAAGAC T .........

.......... ACATGTAACA TAATATTCAT CAAAAGGGGA CACATACAAA TTTTTCTATT

Exon 2> 70bp

TCCGCCTCCT AG GC CGG GAT TAT GAA AAG AAT AAA GTC TGC AAG GAA TTC

ly Arg Asp Tyr Glu Lys Asn Lys Val Cys Lys Glu Phe

10 Intron 2> 893bp

TCC CAT CTG GGA AAG GAG GAC TTC ACA TCT CT GTAAGTGTGC AGCAGCCACT

Ser His Leu Gly Lys Glu Asp Phe Thr Ser Le

20

CACTCTGTTG GTGATTTGAT CTTGAACAAA ACTGAGTGTG AGGTAACTTT GATTCAGCAT

GGAGGTTAAT CCTTAATCCT TGCCAAGAGT ~%GAGGACAG AGCCCCAGGC CCAAGGTAGG

CACAGGAGTC AAGCAAATTA AGGACATCCA GAGATTCCAG ATGACCTCTG ATTACCACAG

TCTAAGAATT TGTCACCAGT AAGTTTATGA ATGAGTTCAA AGATAATACG TAAAGCATAG

CTTTCTAAGG TCATTAAAAT GCAATTGTAC ATTCTAGGAA TCTGGGAAAC CAAGGCAATG

AGGTACAGAG CAAGTACATT TGGGGAGGCT TAAGAAGGAC AACACCAGAT TGGTTTGATG

GAAGCCCTAA GTTCAATTTC TACTGCTTTC CAAGATAAAT TTGCCTAGAA CAAAAACTGC

TTTTTAAATT AAACCTCATT TTTTTCATTA GCCAAATAAA GAAGTAAATG TGTACAATAT

TTCATTTAAA CATTGAACAG GTTTCATATT TTAATGTGCC AAAATGTAAA AAGAAAGTTT

CAGTTCTGGT CGGGCACGCG GCTCACGCCT GTAATCCCAG CCCTTTGGGA GGACAAGGAG

GGTGGATCGC AAAGTCAAGA GATCTAGACC ATCCTGGCCA ACAGGTGAAA CCCTGTCTCT

ACTAAAAATA CAAAAATTAG CCGGGCGTAT TGGTGCACAC CTGTAGTCCC AGCTACTCGG

GAGGCTGAGG CAGAATTGCT TGAACCTGGG AGGCAGAGAG TGCAGTGAGC TGAGATCACA

CCACTGCACT CCAGCCTGGG TAACAGAGCG ACTCCGTCTC AAAAAAAAAA ACCTTTCAGT

Exon 3> 133bp

TCTAATTGAT ATATTGTCTT ATTTTCCCCT CAG G TCA CTA GTC CTG TAC AGT AGA

u Ser Leu Val Leu Tyr Ser Arg

30

AAA TTT CCC AGT GGC ACG TTT GAA CAG GTC AGC CAA CTT GTG AAG GAA GTT

Lys Phe Pro Ser Gly Thr Phe Glu Gln Val Ser Gln Leu Val Lys Glu Val

40 50

GTC TCC TTG ACC GAA GCC TGC TGT GCG GAA GGG GCT GAC CCT GAC TGC TAT

Val Ser Leu Thr Glu Ala Cys Cys Ala Glu Gly Ala Asp Pro Asp Cys Tyr

60

Intron 3> ≈ 2.6kb

GAC ACC AGG GTAGGTTTCT GTGGCTGGCC GTCTCTGTGG CAGCCCAGAG AAGGAAGCCA

Asp Thr Arg

70

AAATAGGCCT TTATACCACG TGTTGAAAAA ATTTT ...........................

AAAAAATCAC TTCATGGAGC CCTTGGCATG AATACATAGT TATAAATGAG AATTTTCTAA

TTATAGGCTA AGATATTATA TTAAATGGAA GTCTAAAAGC CTGGAAATTT ACATATATAT

Exon 4> 212bp

ATGAGTTTCC CTTTTTCCTT CTCCTCCAG ACC TCA GCA CTG TCT GCC AAG TCC TGT

Thr Ser Ala Leu Ser Ala Lys Ser Cys

80

GAA AGT AAT TCT CCA TTC CCC GTT CAC CCA GGC ACT GCT GAG TGC TGC ACC

Glu Ser Asn Ser Pro Phe Pro Val His Pro Gly Thr Ala Glu Cys Cys Thr

90

AAA GAG GGC CTG GAA CGA AAG CTC TGC ATG GCT GCT CTG AAA CAC CAG CCA

Lys Glu Gly Leu Glu Arg Lys Leu Cys Met Ala Ala Leu Lys His Gln Pro

100 110

CAG GAA TTC CCT ACC TAC GTG GAA CCC ACA AAT GAT GAA ATC TGT GAG GCG

Gln Glu Phe Pro Thr Tyr Val Glu Pro Thr Asn Asp Glu Ile Cys Glu Ala

120 130

Intron 4> ≈ 1.45kb

TTC AGG AAA GAT CCA AAG GAA TAT GCT AAT CA GTGAGTGCCT TCATCATAAA

Phe Arg Lys Asp Pro Lys Glu Tyr Ala Asn Gl

140

TAGAACTTTA GGACCTAAAG TATCAGAAAT GACTCTAATC TAACCCCCTA CTCTGTGGAA

ACACCTCTTC TACAGCAACC TGAACAAATT CTGATAACTT GACTGTCATC TGAGACAGCC

CATCTTCCCT TTGGTGCTGA GTTTTGTGAA TGCCTTTCTC AGAGATAAAA TCAATTTTCC

CAAACTCTTA CTAACCCTAA TCCTATTATC TGAACCAGAA TCACTTGAGT ATCCTGTCCA

CATGATATAT TTTCAAATAT TTGACAGCCC CTGTGGTGAT TTCCCTGATG CTAAGGATCA

TTGCGTATGT CATATGATAC AAGCAATTTT ATTGCTGTAC ACATTTTAAA TACTTAATTT

CCAGTAAGCA TAAAAAATAA AC~GCTTCTC TCTTCACTCC TATACATCCC AGAAACATGT

GACGTGTACG TGTAAATATG ACACATCTAC TACATCAAGT AAACACGTAA CCTTAAAATT

TTTATGGTTT AACATGCACA AGGGG .... CATGGTGGCA TGCACCTGTA GTCCCAGCTA

CTCAGGAAGC TGAAGCAGGA GGATCGCTTG AGGCCAGGAG GTTGAGACTG CAGTGAGCTA

TGATTATACC ATGTACTCCA GCCTAGATGA CAGGCAAGAC ACTGTCTCAA ATAAAAAAGT

Exon 5> 133bp

AGATTTTGTT TATTCACAAA TTTTTTGACC AATTCTGTTT TTTCCTCCAC AG A TTT ATG

n Phe Met

TGG GAA TAT TCC ACT AAT TAC GGA CAA GCT CCT CTG TCA CTT TTA GTC AGT

Trp Glu Tyr Ser Thr Asn Tyr Gly Gln Ala Pro Leu Ser Leu Leu Val Ser

150 160

TAC ACC AAG AGT TAT CTT TCT ATG GTA GGG TCC TGC TGT ACC TCT GCA AGC

Tyr Thr Lys Ser Tyr Leu Ser Met Val Gly Ser Cys Cys Thr Ser Ala Ser

170

Intron 5> 301bp

CCA ACT GTA TGC TTT TTG AAA GAG GTATGTCCCA TTTTACTTAT TATATGTTTA

Pro Thr Val Cys Phe Leu Lys Glu

180

C~TTTTTTTC TTCTAATAGC ATTCCTTTTA GATAATGGTA GAAATTGAGA TTTTGCAATC

AGTAAAAGTT TCTAAAATGC CACCCTTGCA TCTTCTTAAT TGGAAAATAA ATGTAATACC

TCTGGAAAAG TAATATCAAT AAGTTCCTTT CACCAGTGTT CAGTTTCCTA TTCATTAAAA

AATCATGAAA TTAATAGTCC CTATTTTTGC CATTTATACA GTATTCAATG AGTCTTGACC

Exon 6> 95bp

ATATAATGAG ATTCTTTCAC TTGTTTTCTA G AGA CTC CAG CTT AAA CAT TTA TCA

Arg Leu Gln Leu Lys His Leu Ser

190

CTT CTC ACC ACT CTG TCA AAT AGA GTC TGC TCA CAA TAT GCT GCT TAT GGG

Leu Leu Thr Thr Leu Ser Asn Arg Val Cys Ser Gln Tyr Ala Ala Tyr Gly

200 210

Intron 6> ≈ 5.5kb

GAG AAG AAA TCA AGG CTC AG GTAAAGATTA AGTGCTATTG TCATATTTAG ATGTTTGGTT

Glu Lys Lys Ser Arg Leu Se

CTGCACTGTC TATCCATAAT ATT ................... CTCCAAATAG TGCGTTCATC

ACGAAACCAA GAAAACTGAG AATGTTGATA AGCTAACATA TTTATTTTTA ACTGTTTAG


Exon 7> 130bp

C AAT CTC ATA AAG TTA GCC CAA AAA GTG CCT ACT GCT GAT CTG GAG GAT GTT

r Asn Leu Ile Lys Leu Ala Gln Lys Val Pro Thr Ala Asp Leu Glu Asp Val

220 230

TTG CCA CTA GCT GAA GAT ATT ACT AAC ATC CTC TCC AAA TGC TGT GAG TCT

Leu Pro Leu Ala Glu Asp Ile Thr Asn Ile Leu Ser Lys Cys Cys Glu Ser

240 250

Intron 7> 1131bp

GCC TCT GAA GAT TGC ATG GCC AAA GAG GTAAGACAAG CTTTATTATC ATTATCACCA

Ala Ser Glu Asp Cys Met Ala Lys Glu

260

TGTGCCAGGT ATTGAGTGCC AAGTACTGAG TGGCAGGTAC TCTTCTAAGT TCTATTTCAT

ATGCATTATC CTAACGGATG CTTACATGTA TTCTATGAAA TATAAATAAT TATTAACATC

ACTGTACAGG TGAGAAAACA GAAGATTAGT AAGGGTCAAT AATTTGTTAC CACACAACTA

GTAGGTGGCA AAGTCCATAT GCAAAGATGA GATTATCTGA TTTCCAAGAC CTTATGCTAC

TTGCACTGTA TGGTATTCAT TACCACATAT GGTCATTTAA ATTTAATATA GTTAAAGTTA

AACATATATG TTTAAGTTGA ATATGATTTA AAATTCAGTT CTTCAGTTAC ACTAACCACA

TTTCAAATGC ACAATGGTAT TGGATAGTGC AAATACAGAT CATAATTTTA ATTATTGCGT

AAAGTTCCAT TAAGCGGCAC TTCAACCATT ATACTAGTAG ACGCTACTAC TGTTTTTATT

ATACTTTTTA TAATTTGATG CCCATCTCAA TTTTTTTTAC CATTTTTCAT TTTCATAATC

CTTTAAAACT ATTTAAATTT GAATAATTGC TTTAGCTGAA GGTCAGTCTC CAAGATCTTG

TCAATCTCTG TAGTAATAGA TACATTGTGT TGCCTTGTCT TTTTCCATTA AAAATGCATG

TCTGTGTTAA TAGATGCTTT TCTATAAGGA TTTATAATGG CTAAATAGGA CTCTGTTGCC

TATATCCATG TCATGTATTA CATCTATCTA TCTAGACACA CCATATTTTT TATTTTTACT

AATCCTCTAT TGTATTTAGG TTGTGTTGGT TTTTTTAAAA TATTTTAGGC AATCGTTGCA

CTTAATAGCC TTATAGTTGT TGTCGTTTTA TCTACAATTA TTTTCTGAGT ATAAATTCTT

AAAAATAGAA TTGTACATTT TTAAGACTTT TAGTACATAT TGACAAACTG CCTTTCCAAA

TATTGTACCA ATTTTCTTTC TGAGAGTCTT CATACTGACT TTCATAAATG TGATGCCAAA

TGTATGCATA ATGATGGCAT ACAAATATTT CACAAATAGT TTGAAGCAGC AGGTTGAAAT

ATCAATTGCA ATATCAGATG GCAGAAGTTA CATAGGACTG AAACTTTACA TTTTTCTAAA

TGTGTATTGC TTTTATAAGT GCATAGACAG CAGTGGGGTA AGACATAATG AAGGAATTTA

CACAAACCAA TTTTCACAGA AAAGCCTTTT GAAGTATCCA CTTGATGCTT TATACAGAGT

CAATTATGCT TGCACAAAGA CAGTTCCATA TGCATTTGAA GTAGATATGT TTACATATAA

TATATTTAAA TGTATATTAC ATGCAAAGCT TTAATTCAGT ATTGACTAGT CACAATAGCA

AGTATTAATA GCTTGTATCT CAATGAAAGT ATTTTATAAT CCAAAGCCTG TTGATTTAGA

GTCAGTCTGA GCTAAATCTC TTCTTTTTAA TTAAGATGCA GAAACCAGGT GCTCTGCCTT

TATGATAGAT ATTTCACATT TTGAAATGTA GGCCAGGCAC AGTGGCTCAT CCATCTAATC

CCAGCACTTT GGCAGGCCGA GTCAGGCGAA TCACTTGAGG TCAGAAGTTC CAGACCACCC

TAGCCAACAT GGTGAAACCC TGTCTCCACT AAAAATACAA AACTTAGCCG GGCATGGTGG

TGGGCGCCTA TAATCCCAGG TACTCGGGAG GCTGAGGCAT GAGAATCGCT TGAACCCAGG

AGGCAGAGGT TACAGCGAGC CAAGATGGCA CCACTTCACT CCAACCTGGG TGACAGAGGG

AGACTCTGTC ACAAAAAATA AATAAATAAA TAAATAAATA AATAGGAAAT ATATAACACA

Exon 9> 130bp

TATCTCCTTT TCTCCCTCAT GCTAG G TAT ACA TTT GAA CTA AGC AGA AGG ACT

s Tyr Thr Phe Glu Leu Ser Arg Arg Thr

330

CAT CTT CCG GAA GTA TTC CTC AGT AAG GTA CTT GAG CCA ACC CTA AAA AGC

His Leu Pro Glu Val Phe Leu Ser Lys Val Leu Glu Pro Thr Leu Lys Ser

340 350

CTT GGT GAA TGC TGT GAT GTT GAA GAC TCA ACT ACC TGT TTT AAT GCT AAG

Leu Gly Glu Cys Cys Asp Val Glu Asp Ser Thr Thr Cys Phe Asn Ala Lys

360 370

Intron 9> 468bp

GTATATTTGT TGGATTTTCT TTATCAAGCA CATAGATCAA GGGTTGGAAA ATTATAGTTC

ATGGGCCAAA ACTTGCTCAC TGCCTACTTT TTTATGGCCT ACAAACTAAG AAQAGAAGAA

TGGTTTTACA TTTTTACATT GTTGGGAAAA AGCAAAAGAA CATCAGTATC TCATGACCCA

GCTTGTAAGC CTATGTAATT CTTATTTTTA ATTAGACTAT CAGGAAACTA ACAATGCACA

Exon 8> 203bp

AACTAACCAT TCGTTTTGTA G CTG CCT GAA CAC ACA GTA AAA CTC TGT GAC AAT

Leu Pro Glu His Thr Val Lys Leu Cys Asp Asn

270

TTA TCC ACA AAG AAT TCT AAG TTT GAA GAC TGT TGT CAA GAA AAA ACA GCC

Leu Ser Thr Lys Asn Ser Lys Phe Glu Asp Cys Cys Gln Glu Lys Thr Ala

280

ATG GAC GTT TTT GTG TGC ACT TAC TTC ATG CCA GCT GCC CAA CTC CCC GAG

Met Asp Val Phe Val Cys Thr Tyr Phe Met Pro Ala Ala Gln Leu Pro Glu

290 300

CTT CCA GAT GTA GAG TTG CCC ACA AAC AAA GAT GTG TGT GAT CCA GGA AAC

Leu Pro Asp Val Glu Leu Pro Thr Asn Lys Asp Val Cys Asp Pro Gly Asn

310 320

Intron 8> ≈ 1.6kb

ACC AAA GTC ATG GAT AA GTAAGTAGAG GTGATGTGAA AACGTGTTCC

Thr Lys Val Met Asp Ly

CATTTTAAGT TCATCTGGAC AAAAAGTGGG GGCCAGCCAG ATGGAGGGAA GGTATGCTAT

CACCTCATTT CA~LACATAGG CAAAGTACTA CCTTTATGCT TTTCATGGAC TTTAAGTATT

TGTTAATTAA TTGTAAATGA CTAAAATTTT TAAAACACCA TAATCGAGTG TGATAGAAAG

ATTTTGGGTA GGGAGAAAGG AGCTTAGAGT TCCTAATTGT GCCTTCTCTG TTTCCTGTGC

ATGGCCAAAA GAAGTGACTT GATAATTCTG AACTTCAGCC TATTGTTAAT TAATTTGGAA

ATGTTAACTC TTGCCTTACT TAGCACACAG ACTGCTAATG AGAAATGCAT GTAGAAATGq

AAGTGAAAGT AAATGTGA/~ ACACATATTT GCATACAATG CTGAACACAT GTAAAATATA

TCATGCAACC AGAAATGTTT ATAATATTAT TAACTAGTTT TATCATATTA CTTCTGGTTC

AAAGTCACAA ATCAAAAGCC TAAAGTATAC TAAACTTTTG GACAATAAGC ATTTGATACA

AATA~GAG ATA .................. AAAGAGATTT AATGCAAATG GTAAACTTTT

ATCCTGTTTC ATCCAACTAA AAAAAAGTAA GTTAAAAATA EACCATGGTT TGAATAGA'IG

TGAAAATTAT TCAAAATTCA AATTTCAGTG TTCATACATA CATTAACTTC CACTGGGACA

CAATCATCCT CATTGTTTCT GCAAAGCTTA CAACTGCTTC TGTACAGAGT TGAGTAGTTG

CAACAAAGGC CACATATGGC CTGCAAACCC TAAAATTATT ATCCGGCCTT CTGTAGAAAA

TAGTTGCCAA TGCTTAATGA TACAATATTT GGAATTTCGA TTTAATGGCT TGATGGGACA

Exon 10> 98bp

AGGTTTCAAG ACAAATAATT TTGTTATGCA ATTATTGTTT TCTTACAG GGC CCT CTA CTA

Gly Pro Leu Leu

AAG AAG GAA CTA TCT TCT TTC ATT GAC AAG GGA CAA GAA CTA TGT GCA GAT

Lys Lys Glu Leu Ser Ser Phe Ile Asp Lys Gly Gln Glu Leu Cys Ala Asp

380 390

Intron 10> 1757bp

TAT TCA GAA AAT ACA TTT ACT GAG TAC AAG AAA AA GTAAGAAACT TGTTCTGGCT

Tyr Ser Glu Asn Thr Phe Thr Glu Tyr Lys Lys Ly

400

GTATCCTCCA AATTTATCAA TAATATTTTC ATAGTACTAT GAATTGAAAG CATAGTTGAA

CACTTAAGCT TGTCTTCAGT GAACAACAAC AAAAGGAGGT TCAATTAGGA GTTTTTTAAA

GTAGTCAGAA TCTCATAGAG TGGAATAATA GACACTGGAG ACCCCAAAAG GTGGGAGGGT

AGGAGAGGGG TGAAGGATAA GGAATTCCCT ATTGGATACA ATGTACACTA TTTGGGTGAT

GATTATACTA AAAACCTAGA CTTCACCACT ATGCCATACA TCCATGTAAC A2~TGCAC

TTGTACCCCC TAAATCATTT TATTTTTTAA AAGTAGTTGG AATCTAACAT CTGTATAAGA

AGAAGCTAAG TTTTTCTGGA GGAAACATGT CATGTGTAGC AAGTGGGTGG CCAATTAAAA

CAGCCATAAA AATATAGATG ACGAAATCTA TCTGCAGGAA CTTGAGTCAA AATGGCAAAA

TGAGCACATA GCTTAGCAAC ATTCTTACCC CAGAATTTTC TGTAGATATT GCTGTACAAA

ATTGAGGACC TATAAAACTG CTAAAAATGT TATCTAACTT CATGGCATAT TGCCTGGCAA

GAAGACCCAA ACTTTCAATC AAGCATACTG TCAGCTATCA CCAGCATATA TTCCCTATAA

ATTAGAAATA GA/kATGTTAA CTACCAATAA TGTTATATAA TAGTCATTTT TCTCAATTAA

GAGAAGGAAA AAGCATTTCT CTATAAATAG ACATTAACAG TAAAATAATA GGCCACAATA

AGGTAAGAAG TGGTACTGGG TTCTTGGGGA AAGTCTCCAT TGAGAAGACC ATGTCATTTT


TCCTAACGTT AGTAAATTTC TCAGATCCTA TTCCCTCCTC CTCCCCCCAA CTTTCTTTTT

TTTTGCCTCC TTTGAGTCTC TTCTTGCCTC ATTTGCACCC CCCGGCAGCT CAAACCTACT

TGTTTCAACA ATGCTTTTAT GAACTAACTG GTCCCCCTTA CCCTTTTGTC CCTTCCCAGC

ATCCACTCTT GCCAAGCCCA ACTCTAGATT AATTTGAGTC TCCTTACCTC CCATTGCCCC

CGCTGGGCTG GCAAGCATTT CCTGAAGAAA GTCATAAAAC CAGGCTGAGT TGTTCATGGG

AGTGGACACA CGCCGAGATC ATCGATTGTA CAAATTGAAA GAATCAGGAG ATCTCAAGTC

TTATCACCAT CCTGAAAGAG AAATGGAGAG TTCGTTTTTA GTAGAGCTAA AAATAAGAGC

GAAAATTTCC TTACTCCTGT ATTGGTGTTT TCCAACATAG TGAGTTGAGA AGTAGATAAA

TTGAGGGTGA AATTATATAA ATTATGAGAG GAAAAAAAGG CATTAAGCTG GTATGAGGTC

CTGTAAAGGA AATATTCTTT AAGGAATTTG AAATTTAAAA CTGAAGAGAA GGTGAAAGGT

TAGGATAAAA TAGAAGAGGT ACTCTTCCAT TTTGAAATAA TGAGCAAATG AAAGAAGACT

GGACTTCCAA TTCAGCAGCG ATTTGTATGT TTATTTTTAT GATCTCGAAG AGGCATGTTT

CACTTTCTGA TCTCAAATTG ACTATTCTAT ACCACAGGTA TAGAATTTTC TTGAGACAGG

CAAGTATTTC TATTTTCATT TTTATTGTAA AAGATCTGAA ATGGCTATTA TTTTGCATTA

GAAATTTGTA TAAAATAAAT ACATGTAGTA AGACCTTACA TTTAAATGGT TTTTCAG

Exon 11> 133bp

A CTG GCA GAG CGA CTA AAA GCA AAA TTG CCT GAG GCC ACA CCC ACG GAA

s Leu Ala Glu Arg Leu Lys Ala Lys Leu Pro Glu Ala Thr Pro Thr Glu

410 420

CTG GCA AAG CTG GTT AAC AAG CGC TCA GAC TTT GCC TCC AAC TGC TGT TCC

Leu Ala Lys Leu Val Asn Lys Arg Ser Asp Phe Ala Ser Asn Cys Cys Ser

430

Intron 11> (no length data)

ATA AAC TCA CCT CCT CTT TAC TGT GAT TCA GAG GTAGGAAAAT GTAACCCTCC

Ile Asn Ser Pro Pro Leu Tyr Cys Asp Ser Glu

440

ACTTAACATG GCAGAATCTT TTAAGAACGT ATGCACTCCA ATCTACTCAT TTCTTTCCTG

TTATTGAGAT GCCATTATGT GACAGGCTTT TCCTGGTGTT ATTGTAACTT GGCTGTCTTT

GCAATGAAAG TAAGAAACAT AACTGATTTC ATGCTATGCT CATTTAAAAG CAA .......

TTTAAATGAA AAAGAGAAGA ATAAGAATAC AGTGAGAACA TCAGAGTTCA TTACCTTCAC

ATAGTGAAGG TAAATTGTTT TCATATATCT GTATGTGTGT TTATGTAGAT ACATACATAT

GCATATATAG GCAGGTAGAT TGATAAAAAT AAATACTTTA GAAAAATGGG GTCATATCAT

ACA ....... ACTTTTCCCA GCAGACTCTA AGCCCTTCAA GGGAGAGATG ATGTAAAGCT

AAATAAATGC AAAGGGTGGC TAATAAAAAA CTGCAATGTA ACATCGCATC TTGGCAAGAA

TCCGCTCCCC TAACCAAGGT TTCAAGATTT TGTATACCTT CTGCATGATC TAGTAGCCTA

Exon 12> 55(30)bp

CAAATGAAAT ACTTCATAAG TTATATAACA TTCTTTTTTC ATTATCAG ATT GAT GCT

Ile Asp Ala

450

Intron 12> (no length data)

GAA TTG AAG AAT ATC CTG TAG TCCTGAAGCA TGTTTATTAA CTTTG GTAAGTATAT

Glu Leu Lys Asn Ile Leu ***

TTTATAAATG CAGGACCAAC GGATCCAGCA TTGTAGGATG TATTTTAAAT ATCAAA ....

........... TTTTTTCTTA TTCTGGACCT ATGAGATGGG CCATAGTGGA TATATGGGTC

ATCCTAGACA GAGTCATTTT TdGCATAAAT ATTAAAATAG CATGAGTTTA TATGCGTCTG

TTGAGCAATA TTATGGTTGT TGCCCATTAG CTCTGTTTTT CTAACATAAA ATTCATTTCA

Exon 13> 173bp (not translated)

TTATTCTCAT AG ACCAGAGTTG GAGCCACCCA GGGGAATGAT CTCTGATGAC CTAACCTAAG

CAAAACCACT GAGCTTCTGG GAAGACAACT AGGATACTTT CTACTTTTTC TAGCTACAAT

ATCTTCATAC AATGACAAGT ATGATGATTT GCTATCAAAA TAAATTGAAA TATAATGCAA

Fig. 2. Genomic sequence of the human DBP allele GC*1S. Exon sequences are shown in triplets, intron sequences in blocks of 10 nucleotides. The introns 2, 5, 7, 9 and 10 were completely sequenced. Nonsequenced regions of introns are indicated (- - - - - -). The number of coding nucleotide residues of exon 1 and exon 12 is shown in parentheses. The 16-amino-acid leader peptide is underlined. The polyadenylation signal (AATAAA) in exon 13 is typed in bold. The numbers under the triplet codons represent amino acid numbers.
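As an illustrative check on the triplet layout of Fig. 2, the following Python sketch translates the exon 10 triplets exactly as printed in the figure, using the standard genetic code. The names `CODON_TABLE`, `EXON_10` and `translate` are introduced here for illustration only and are not part of the original study. The two trailing nucleotides (AA) belong to a lysine codon that is completed in exon 11 and are therefore left untranslated.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Exon 10 as printed in Fig. 2 (triplets), with the spaces removed.
EXON_10 = (
    "GGC CCT CTA CTA "
    "AAG AAG GAA CTA TCT TCT TTC ATT GAC AAG GGA CAA GAA CTA TGT GCA GAT "
    "TAT TCA GAA AAT ACA TTT ACT GAG TAC AAG AAA AA"
).replace(" ", "")


def translate(dna: str) -> str:
    """Translate complete codons only; bases of a split codon are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(EXON_10)
print(len(EXON_10), "nt ->", protein)
```

The one-letter string can be compared against the three-letter residues printed under exon 10 in the figure (Gly Pro Leu Leu ... Tyr Lys Lys).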

TABLE VI

Characterization of the products of genomic anchor PCR (ga-PCR)

ga-PCR      Length of the subcloned   Portion of   Intron,     Portion of
product     ga-PCR product            the exon     5' or 3'    the intron
E1-I1       ≈ 450 bp                  81 bp        1, 5'       341 bp
E2-I1       ≈ 300 bp                  43 bp        1, 3'       62 bp
E6-I6       ≈ 900 bp                  61 bp        6, 5'       63 bp
E7-I6       ≈ 500 bp                  125 bp       6, 3'       375 bp
E11-I11     ≈ 300 bp                  25 bp        11, 5'      193 bp
E12-I11     ≈ 300 bp                  30 bp        11, 3'      218 bp
E12-I12     ≈ 400 bp                  42 bp        12, 5'      66 bp
E13-I12     ≈ 500 bp                  75 bp        12, 3'      182 bp

The length of the subcloned PCR products shown in column 2 was estimated from the agarose gel after plasmid preparation and digestion with the restriction enzymes EcoRI and PstI. Column 5 shows the number of sequenced nucleotides.

Apart from the genetic polymorphism, the published cDNA sequences differ in five nucleotides [11,12]. In our DNA preparation from a GC1S homozygous donor the nucleotides found in these positions were identical to those in the GC2 cDNA preparation [12]. The cDNA sequence revealed by Yang et al. [12] may thus be regarded as established. The introns 2, 5, 7, 9 and 10 were completely sequenced, the remaining introns only in part. The sequences at the 5' and 3' ends of the introns agree with the GT/AG rule for exon/intron boundaries of eukaryotic genes (Table VII). Our results for the boundary of exon 1 to intron 1 are identical with the sequence published by Yang et al. [16].

Three Alu elements were identified, one each in introns 2, 4 and 8. The Alu repeat in intron 8 showed a length polymorphism at its 3' end, and this polymorphism was in linkage disequilibrium with the common DBP alleles [27].

Amplification of the exons

After sequencing the introns we designed intron-specific primers for amplifying each exon with the PCR method (data not shown). Fig. 3 shows the resulting PCR products on an ethidium bromide-stained agarose gel.


TABLE VII

Exon/intron boundaries of the human DBP gene

The invariant GT and AG residues at the respective 5' and 3' ends of the introns are in bold. Triplet codons are separated by a space. The noncoding sequences are shown in blocks of nine nucleotides each. * Indicates products obtained by genomic anchor PCR (ga-PCR).

Primers        PCR product    Exon   Exon / 5' end of intron   Intron   3' end of intron / exon   Exon
A13/A54        E1-I1 *        1      AGA G GTAAGATTT           1        -                         -
I2/A37         E2-I1 *        -      -                         1        GCCTCCTAG GC CGG          2
A38/A41)       E2-I2-E3       2      TCT CT GTAAGTGTG          2        TCCCCTCAG G TCA           3
A15/A16        E3-I3-E4       3      CC AGG GTAGGTTTC          3        CTCCTCCAG ACC TC          4
A3/A4          E4-I4-E5       4      AAT CA GTGAGTGCC          4        CCTCCACAG A TTT           5
A19/A36        E5-I5-E6       5      AA GAG GTATGTCCC          5        GTTTTCTAG AGA C           6
A19/A35        E6-I6 *        6      CTC AG GTAAAGATT          6        -                         -
I7/A85         E7-I6 *        -      -                         6        ACTGTTTAG C AAT           7
A21/A22        E7-I7-E8       7      AA GAG GTAAGACAA          7        GTTTTGTAG CTG CC          8
A5/A6          E8-I8-E9       8      GAT AA GTAAGTAGA          8        TCATGCTAG G TAT           9
A23/A24        E9-I9-E10      9      CT AAG GTATATTTG          9        TTCTTACAG GGC CC          10
A7/A8          E10-I10-E11    10     AAA AA GTAAGAAAC          10       GTTTTTCAG A CTG           11
A25/A51        E11-I11 *      11     CA GAG GTAGGAAAA          11       -                         -
A87/A71/A91    E12-I11 *      -      -                         11       CATTATCAG ATT GA          12
A86/A72        E12-I12 *      12     CTTTG GTAAGTATA           12       -                         -
A55/A34        E13-I12 *      -      -                         12       ATAGACCAG AGTTGG          13
I12/A26        I12-E13        13     end                       -        -                         -

Fig. 3. Exons of the hDBP gene amplified by PCR with intron-specific primers. Lane 1, marker (Gibco BRL); lane 2, exon 1; lane 3, exon 2; lane 4, exon 3; lane 5, exon 4; lane 6, exons 5 and 6, amplified together; lane 7, exon 7; lane 8, exon 8; lane 9, exons 9 and 10, amplified together; lane 10, exon 11; lane 11, exon 12; lane 12, exon 13; lane 13, marker. Size scale: 1.6, 1.0, 0.5 and 0.3 kb.

Discussion

Analysis of the hDBP gene structure and comparison with the genes for hALB and hAFP substantiates the concept of their common origin. Remarkably, exons 3, 4, 5, 7 and 10 are identical in size in all three genes. The overall similarity between the genes for ALB and AFP is, however, higher than that between either of them and the GC gene. The genes for albumin and α-fetoprotein each consist of 15 exons, in contrast to the DBP gene with only 13 exons, as illustrated in Fig. 4. This is reflected in the shorter amino acid sequence of GC, with only 458 amino acid residues [1], compared to ALB with 585 amino acid residues [21] and AFP with 590 [22]. As first suggested by Gibbs and Dugaiczyk [28], it appears that the truncation is caused by the loss of two internal exons of the DBP progenitor gene after divergence from the ALB and AFP precursor gene rather than by a nonsense mutation. Our results support this concept, since exons 12 and 13 of the DBP gene are more similar in size to exons 14 and 15 of ALB and AFP than to their corresponding exons 12 and 13 (Fig. 4). In addition, like exons 14 and 15 of albumin and α-fetoprotein, exon 12 of the GC gene contains both translated and untranslated sequences, while exon 13 consists only of untranslated bases. These observations support the interpretation that the exons of the GC gene corresponding to exons 12 and 13 of ALB/AFP have been lost, at least in their function as exons. Analysis of the rat DBP gene led Ray et al. [23] to suggest that intron 11 of the DBP gene could contain remnants of the progenitor exons 12 and 13. However, this supposition has not yet been confirmed.
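The size argument for the terminal exons can be made concrete with a small calculation. The following Python sketch is only an illustration: the exon lengths are transcribed from Fig. 4 as printed (total exon length in bp), and the distance measure, a sum of absolute size differences, is an assumption chosen here for simplicity rather than a method used in the study. The names `EXON_SIZES`, `size_distance` and `compare_terminal_exons` are hypothetical.

```python
# Total exon lengths in bp, transcribed from Fig. 4.
EXON_SIZES = {
    "hALB": {12: 224, 13: 133, 14: 60, 15: 163},
    "hAFP": {12: 224, 13: 133, 14: 55, 15: 145},
    "hDBP": {12: 55, 13: 173},
}


def size_distance(pair_a, pair_b):
    """Sum of absolute size differences (bp) between two exon pairs."""
    return sum(abs(a - b) for a, b in zip(pair_a, pair_b))


def compare_terminal_exons(sizes):
    """Compare hDBP exons 12/13 with exons 12/13 and 14/15 of hALB and hAFP."""
    dbp = (sizes["hDBP"][12], sizes["hDBP"][13])
    result = {}
    for gene in ("hALB", "hAFP"):
        g = sizes[gene]
        result[gene] = {
            "vs_exons_12_13": size_distance(dbp, (g[12], g[13])),
            "vs_exons_14_15": size_distance(dbp, (g[14], g[15])),
        }
    return result


for gene, distances in compare_terminal_exons(EXON_SIZES).items():
    print(gene, distances)
```

A smaller distance to exons 14 and 15 than to exons 12 and 13 is what the text describes as the DBP terminal exons being "more similar in size" to the last two exons of ALB and AFP.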

Fig. 4. Comparison of the gene structures of the human albumin (hALB; Minghetti et al., 1986), human α-fetoprotein (hAFP; Gibbs et al., 1987), human vitamin D-binding protein (hDBP; this study) and rat vitamin D-binding protein (rDBP; Ray et al., 1991) genes. Exons and introns are numbered; sizes are given in nucleotides. The numbers in parentheses represent the coding nucleotides of these exons, including the termination codons. The orientation of the Alu elements is indicated by arrows.

No.   hALB exon   hALB intron   hAFP exon   hAFP intron   hDBP exon   hDBP intron   rDBP exon   rDBP intron
1     118(79)     709           129(85)     012           119(50)     ?             120(50)     10450
2     58          1454          52          962           70          893           70          993
3     133         1832          133         2207          133         ≈2600         133         ≈2000
4     212         549           212         1486          212         ≈1450         212         ≈2250
5     133         824           133         918           133         301           133         426
6     98          1587          90          1548          95          ≈5500         95          ≈3000
7     130         1293          130         2275          130         1131          130         1120
8     215         1399          215         1657          203         ≈1600         203         2160
9     133         1088          133         568           130         468           130         441
10    98          1177          98          482           98          1757          98          1620
11    139         418           139         1647          133         ?             133         1310
12    224         1192          224         1140          55(30)      ?             60(36)      3730
13    133         614           133         1338          173(0)      -             159(0)      -
14    60(45)      770           55(45)      340           -           -             -           -
15    163(0)      -             145(0)      -             -           -             -           -

Alu elements (orientation, in 5' to 3' order along the gene): hALB <- -> <- <- <-; hAFP -> <- <-; hDBP -> -> -> (introns 2, 4 and 8).

Exons 1 and 2 of the DBP gene are quite different in size from the corresponding exons in ALB and AFP. At positions 13 and 59, in exon 2 and exon 3 respectively, cysteine residues were found which are not present in ALB or AFP. Thus, a disulfide bond unique to DBP can be formed in the first domain of the protein. It may be that this structural divergence permits the unique function of 25(OH)D3 binding, as first suggested by Cooke [29]. In exon 10 of the rat gene a DBP-unique sequence coding for 19 amino acids is present [29]. This amino acid sequence is thought to contain a consensus sequence for an actin binding domain [30]. Exon 10 might, therefore, encode the actin binding region of GC [29]. However, in the human DBP this region contains three amino acid residues also found in human albumin. In addition, rDBP and hDBP show a total of 13 nucleotide differences and seven amino acid changes in this segment, indicating that it is not a particularly conserved region. The assignment of actin binding activity to this region therefore remains to be established.

The Alu elements identified in the GC gene were not found at corresponding positions of the ALB and AFP genes. We conclude that the Alu sequences were inserted after the first duplication of the common ancestor gene.

Comparison of the genomic structure of the human and rat DBP genes [23] illustrates their high conservation during evolution (Fig. 4). All coding exons are identical in size with the exception of exon 12, which has the stop codon two triplets later in the rat gene. As a result, the amino acid sequence of the rat DBP is longer by two residues and comprises 460 amino acids [29], in contrast to the 458 amino acids of human DBP. Exon 13, which contains only noncoding nucleotides [11,12], is 14 nucleotides longer in the human gene. The introns are more variable in size, but they are of comparable length.
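The arithmetic behind the human/rat comparison can be reproduced from the exon sizes in Fig. 4. The sketch below is a minimal illustration using only those exons whose values are clearly legible in Fig. 4 for both species; each entry is a (total length, coding length) pair in nucleotides. The dictionary and function names are hypothetical and introduced here for illustration.

```python
# (total length, coding length) in nucleotides, transcribed from Fig. 4.
HUMAN_EXONS = {1: (119, 50), 10: (98, 98), 11: (133, 133),
               12: (55, 30), 13: (173, 0)}
RAT_EXONS = {1: (120, 50), 10: (98, 98), 11: (133, 133),
             12: (60, 36), 13: (159, 0)}

HUMAN_MATURE_RESIDUES = 458  # length of human DBP given in the text


def coding_differences(human, rat):
    """Exons whose coding length differs, as rat minus human (nt)."""
    return {n: rat[n][1] - human[n][1]
            for n in sorted(human) if rat[n][1] != human[n][1]}


diffs = coding_differences(HUMAN_EXONS, RAT_EXONS)
extra_codons = diffs[12] // 3                      # stop codon shift in exon 12
rat_residues = HUMAN_MATURE_RESIDUES + extra_codons
exon13_excess = HUMAN_EXONS[13][0] - RAT_EXONS[13][0]

print("Coding differences (nt):", diffs)
print("Extra codons in rat exon 12:", extra_codons)
print("Predicted rat DBP length:", rat_residues)
print("Human exon 13 longer by (nt):", exon13_excess)
```

With these values the only coding difference among the listed exons is in exon 12, which accounts for the two additional residues of rat DBP, and the noncoding exon 13 is 14 nucleotides longer in the human gene, in line with the figures quoted in the text.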

In our study, we obtained the DBP/GC gene-specific sequences by sequencing subclones originating from single PCR molecules. Therefore, we cannot exclude errors introduced by the infidelity of the Taq polymerase. In the sequenced exons we did not detect errors when compared with the published cDNA sequence [12], but in the introns the error rate remains uncertain. However, all oligonucleotides designed from these intron sequences were able to amplify DBP/GC-specific products, and this includes the exons which we plan to examine in our program for mutant analysis.

Acknowledgements

This study was supported by a grant from the Deutsche Forschungsgemeinschaft, Bad Godesberg, Germany (C1 27/14-2), for which we are grateful. We acknowledge the expert technical assistance of Ms. A. Brandhofer. We would like to thank Judith Johnson, Ph.D., Department of Immunology, University of Munich, for her help in the revision of this manuscript. Data comprised in this paper will be presented in an M.D. thesis by Andreas Braun, Ph.D. at the Faculty of Medicine, University of Munich.

References

1 Schoentgen, F., Metz-Boutigue, M.-H., Jollès, J., Constans, J. and Jollès, P. (1986) Biochim. Biophys. Acta 871, 189-198.
2 Hirschfeld, J. (1959) Acta Pathol. Microbiol. Scand. 47, 160-168.
3 Cleve, H. and Constans, J. (1988) Vox Sang. 54, 215-225.
4 Daiger, S.P., Schanfield, M.S. and Cavalli-Sforza, L.L. (1975) Proc. Natl. Acad. Sci. USA 72, 2076-2080.
5 Van Baelen, H., Bouillon, R. and De Moor, P. (1980) J. Biol. Chem. 255, 2270-2272.
6 Kew, R.R. and Webster, R.O. (1988) J. Clin. Invest. 82, 364-369.
7 Perez, H.D., Kelly, E., Chenoweth, D. and Elfman, F. (1988) J. Clin. Invest. 82, 360-363.
8 Petrini, M., Emerson, D.L. and Galbraith, R.M. (1983) Nature 306, 73-74.
9 Yamamoto, N. and Homma, S. (1991) Proc. Natl. Acad. Sci. USA 88, 8539-8543.
10 Vavrusa, B., Cleve, H. and Constans, J. (1983) Hum. Genet. 65, 102-107.
11 Cooke, N.E. and David, E.V. (1985) J. Clin. Invest. 76, 2420-2424.
12 Yang, F., Brune, J.L., Naylor, S.L., Cupples, R.L., Naberhaus, K.H. and Bowman, B.H. (1985) Proc. Natl. Acad. Sci. USA 82, 7994-7998.
13 Yang, F., Luna, V.J., McAnelly, R.D., Naberhaus, K.H., Cupples, R.L. and Bowman, B.H. (1985) Nucleic Acids Res. 13, 8007-8017.
14 Braun, A., Bichlmaier, R. and Cleve, H. (1992) Hum. Genet. 89, 401-406.
15 Reynolds, R.L. and Sensabaugh, G.F. (1990) in Advances in Forensic Haemogenetics (Polesky, H.F. and Mayr, W.R., eds.), Vol. 3, pp. 158-161, Springer, Berlin.
16 Yang, F., Naberhaus, K.H., Adrian, G.S., Gardelle, J.M., Brissenden, J.E. and Bowman, B.H. (1987) Gene 54, 285-290.
17 Schoentgen, F., Metz-Boutigue, M.-H., Jollès, J., Constans, J. and Jollès, P. (1985) FEBS Lett. 185, 47-50.
18 Weitkamp, L.R., Rucknagel, D.L. and Gershowitz, H. (1966) Am. J. Hum. Genet. 18, 559-565.
19 Mikkelsen, M., Jacobsen, P. and Henningsen, K. (1977) Hum. Hered. 27, 105-107.
20 McCombs, J.L., Yang, F., Bowman, B.H., McGill, J.R. and Moore, C.M. (1986) Cytogenet. Cell Genet. 42, 62-64.
21 Minghetti, P.P., Ruffner, D.E., Kuang, W., Dennison, O.E., Hawkins, J.W., Beattie, W.G. and Dugaiczyk, A. (1986) J. Biol. Chem. 261, 6747-6757.
22 Gibbs, P.E.M., Zielinski, R., Boyd, C. and Dugaiczyk, A. (1987) Biochemistry 1332-1343.
23 Ray, K., Wang, X., Zhao, M. and Cooke, N.E. (1991) J. Biol. Chem. 266, 6221-6229.
24 Cleve, H., Constans, J. and Scheffrahn, W. (1991) Folia Primatol. 57, 232-236.
25 Constans, J., Gouaillard, C., Bouissou, D. and Dugoujon, J.M. (1987) Am. J. Phys. Anthropol. 73, 365-377.
26 Loh, E.Y., Elliott, J.F., Cwirla, S., Lanier, L.L. and Davis, M.M. (1989) Science 243, 217-220.
27 Braun, A., Bichlmaier, R., Müller, B. and Cleve, H. (1993) Hum. Genet. 90, 526-532.
28 Gibbs, P.E.M. and Dugaiczyk, A. (1987) Mol. Biol. Evol. 4, 364-379.
29 Cooke, N.E. (1986) J. Biol. Chem. 261, 3441-3450.
30 Tellam, R.L., Morton, D.J. and Clarke, F.M. (1989) Trends Biochem. Sci. 14, 130-133.