Molecular evaluation of an Alu repeat including a polymorphic variable poly(dA) (AluVpA) in the...


Page 1: Molecular evaluation of an Alu repeat including a polymorphic variable poly(dA) (AluVpA) in the vitamin D binding protein (DBP) gene

Hum Genet (1993) 90:526-532

© Springer-Verlag 1993

Molecular evaluation of an Alu repeat including a polymorphic variable poly(dA) (AluVpA) in the vitamin D binding protein (DBP) gene

Andreas Braun 1, Regina Bichlmaier 1, Bertram Müller 2, Hartwig Cleve 1

1 Institut für Anthropologie und Humangenetik der Universität, Richard-Wagner-Strasse 10/I, W-8000 München 2, Federal Republic of Germany
2 Abteilung für Pädiatrische Genetik, Kinderpoliklinik, Universität München, W-8000 München, Federal Republic of Germany

Received: 15 April 1992

Abstract. We investigated an Alu element at the end of intron 8 of the human vitamin D-binding protein (hDBP, group-specific component, GC) gene that shows a polymorphic poly(A) tail due to a variable number of tandem repeats (AluVpA) forming the 3′ end of this member of the most abundant class of short interspersed repeated DNA elements (SINEs). The Alu element sequence in intron 8 of the GC gene was identical in all three common GC alleles (GC*1F, GC*1S, and GC*2) and could be classified as an Alu-Sa or Alu class-II sequence. The polymerase chain reaction was used to amplify selectively a fragment of about 200 bp containing the identified (TAAA)n repeat from genomic DNA of 188 unrelated human subjects. The size of the amplified products was determined by polyacrylamide gel electrophoresis. Four alleles (named GC-I8*6, GC-I8*8, GC-I8*10, and GC-I8*11) were found that differed in size by multiples of four nucleotides. The allele frequencies ranged from 0.0053 to 0.8511 and the observed heterozygosity was 26%. The stable inheritance of this polymorphic patterned poly(A) sequence was confirmed by a segregation study of a highly informative family with 19 members. Statistically significant linkage disequilibrium between the AluVpA and the GC isoelectric focusing (IEF) phenotypes was found in a sample of 188 unrelated individuals, and delta values were calculated from the observed haplotype distribution.

Introduction

The Alu repeat family is a prominent member of the short interspersed repeated DNA elements (SINEs) in primate and other mammalian and non-mammalian genomes. There are about 500,000 copies in the haploid human genome. Each single Alu repeat sequence is composed of two halves with an A-rich region in the middle and a poly(dA) tail, which can be variable in length, on the 3′ side. The 3′ half contains an additional 31 bp compared with the 5′ half (Deininger et al. 1981). Inverted Alu repeated sequences have also been described in eukaryotic genomes (for reviews see Schmid and Jelinek 1982; Schmid and Shen 1986; Deininger 1989). It is thought that the Alu sequences are dispersed via an RNA intermediate, because of their structural homology with 7S RNA (Ullu et al. 1982; Ullu and Tschudi 1984). This process of transposition has been termed retroposition (Rogers 1983).

Correspondence to: A. Braun

Alu sequences can be divided into subfamilies: they have been classified by Jurka and Smith (1988) into Alu-J and Alu-S subfamilies; by Britten et al. (1988) into classes I-IV; by Quentin (1988) into classes A-F; and by Labuda and Striker (1989) into classes J, J*, and S. The different degrees of divergence of the Alu repeats suggest that insertion (retroposition) occurred at different evolutionary times. A recent retroposition, which is found only in the human genome, was discovered by Economou-Pachnis and Tsichlis (1985) and Friezner-Degen et al. (1986). These human specific Alu sequences represent 0.1% of the total number of Alu members in the human genome (Batzer and Deininger 1991).

Economou et al. (1990) have described the poly(dA) tract of the Alu repetitive elements in the human genome as polymorphic with respect to either the total number of (dA) or patterned (dA) rich repeats or both.

The vitamin D-binding protein (DBP), also known as group-specific component (GC), is a polymorphic protein of the α2-fraction of human serum with a molecular mass of 51,200 Da, originally discovered by Hirschfeld et al. (1959). The gene is located at chromosome 4q11-q13 (Cooke et al. 1986). The cDNA nucleotide sequence has been determined (Cooke and David 1985; Yang et al. 1985). The gene structure of rat DBP has recently been established (Kunal et al. 1991).

There are three common alleles (GC*1F, GC*1S, and GC*2) and more than 120 rare variant alleles of the GC/DBP system in the human population, which have been classified by isoelectric focusing (IEF; Cleve and Constans 1988). The three common alleles can also be discriminated by StyI restriction fragment length polymorphism (RFLP) analysis with a direct genomic DNA probe (Braun et al. 1991).

Four obviously independent biological functions have been ascribed to GC. First, GC binds vitamin D3 and its natural derivatives, with highest affinity for 25-(OH)-vitamin D3, the hormonally inactive precursor of 1,25-(OH)2-vitamin D3. In this function GC is, thus, the transport protein for vitamin D3 and its derivatives (Daiger et al. 1975; Haddad and Walgate 1976). Secondly, GC stabilizes monomeric G-actin in plasma and prevents the spontaneous polymerization of G-actin to F-actin and possible damage to the blood capillary system (Van Baelen et al. 1980, 1988). Thirdly, GC binds to immunoglobulin G and, therefore, could play a role in immunological responsiveness (Constans et al. 1981; Petrini et al. 1985). And lastly, GC can serve as a cochemotaxin for C5a and C5a des Arg and thereby enhances the chemotactic activity and restores the activity of these complement components (Perez et al. 1988; Kew and Webster 1988).

In this study we investigated an Alu repeat in intron 8 of the human DBP (hDBP) gene. We analysed the Alu sequences of the three common DBP alleles (see below) in our population and compared the results with the general consensus Alu sequence published by Jurka and Smith (1988) and the human specific consensus Alu sequence from Batzer and Deininger (1991). On the 3′ side of this Alu repeat we discovered a polymorphic patterned (dA) rich repeat, of which we examined the Mendelian inheritance, the allele frequencies in the population of southern Germany, and the association with the genetic DBP variants as disclosed by IEF.

Materials and methods

Preparation of genomic DNA

Genomic DNA was prepared from 10 ml EDTA blood samples that were collected from 188 unrelated healthy blood donors from the Blood Bank of the Bavarian Red Cross at Munich, Bavaria, and from family members. After centrifugation of the whole blood sample the plasma supernatant was preserved for GC typing by IEF. The erythrocytes were lysed twice in ice-cooled isotonic ammonium buffer (155 mM NH4Cl, 10 mM KHCO3, 0.1 mM EDTA, pH 7.4). The white blood cells were resuspended in 5 ml 75 mM NaCl, 25 mM EDTA, pH 8.0, lysed with 250 µl 20% SDS solution, and incubated with 150 µg Pronase E (Sigma, Munich, FRG) at 37°C for 14 h. The next day the sample was vortexed with 1.5 ml 5 M NaCl for 10 s and centrifuged for 10 min at 3,000 g. The DNA in the supernatant was precipitated with 20 ml 100% ethanol and washed in 10 ml 70% ethanol. The precipitated DNA was resuspended in 0.5 ml 10 mM TRIS-HCl, 1 mM EDTA, pH 8.0.

Polymerase chain reaction (PCR)

For the PCR we used two exon-specific oligonucleotides: A5 for the 5′ end, spanning coding region nucleotides 911 to 930 (5′-CAGCCATGGACGTTTTTGTG-3′), and A6 for the 3′ end, spanning nucleotides 1069 to 1088 (5′-TTACTGAGGAATACTTCCGG-3′). Furthermore, we used an oligonucleotide (5′-CAGCGAGCCAAGATGGCAC-3′), named A56, which lies in the 3′ part of the Alu repeat. The Taq DNA polymerase was purchased from Boehringer (Mannheim, FRG). The total reaction volume of 40 µl included about 1 µg genomic DNA for primers A5/A6 and 100 ng for primers A56/A6, 50 ng of each primer, 1.25 U Taq DNA polymerase, 200 µmol of each dNTP, and 1.5 mM MgCl2. Each sample was subjected to 30 amplification cycles: 1 min at 94°C for denaturation, 1 min at 53°C for annealing, and 2 min (primers A5/A6) or 1 min (primers A56/A6) at 72°C for extension.

Ligation, subcloning and sequencing of the allelic PCR products

The allelic PCR products were blunt-ended with T4 DNA polymerase (Boehringer, Mannheim, FRG) by elevating the magnesium concentration after PCR to 10 mM and adding 0.5 U of the enzyme. After incubation at 37°C the PCR fragments were ready for ligation into the HindII site of pUC19 (BRL, Eggenstein, FRG). Ligation was done with 1 U T4 DNA ligase (Boehringer, Mannheim, FRG) at 15°C for 14 h. Recombinant plasmids were subcloned in Escherichia coli MC1061 and purified with the Qiagen midi-kit (Diagen, Düsseldorf, FRG) according to the manufacturer's instructions. DNA sequencing was performed by the dideoxy chain termination method (Sanger et al. 1977) using the T7 polymerase sequencing kit supplied by Pharmacia (Freiburg, FRG). Sequencing products were labelled using [α-32P]dATP (Amersham, Braunschweig, FRG). The sequencing electrophoresis was done with a 6% acrylamide, 6 M urea gel in a DNA sequencing electrophoresis unit (21 × 50 cm) from Biorad (Munich, FRG). The running conditions were a constant 2,000 V at 50°C for 2 h.

Detection of polymorphic alleles by polyacrylamide gel electrophoresis (PAGE) followed by silver staining

In order to detect the different polymorphic alleles, 20 µl of the PCR product from primer combination A56/A6 was mixed with 0.1 vol. of loading buffer (15% Ficoll 400, 0.25% bromphenol-blue, 0.25% xylene-cyanole) and loaded on an 8% polyacrylamide gel [12 ml of a 30% acrylamide stock solution (29 g acrylamide and 1 g bisacrylamide) were mixed with 33 ml TBE buffer (85 mM TRIS, 90 mM boric acid, 2 mM EDTA), 315 µl 10% ammonium peroxodisulfate solution, and 16 µl TEMED]. The gel solution was immediately transferred to a 20 × 14 cm casting chamber that was 1.5 mm thick. Polymerisation was allowed to proceed for at least 30 min at room temperature. Running conditions were either about 16 h at a constant 50 V or about 5 h at a constant 150 V. After this time the xylene-cyanole band was about 1 cm from the bottom of the gel. The allelic DNA bands were revealed using a silver staining kit from Biorad (Munich, FRG) according to the manufacturer's instructions. The developing reaction was stopped with 5% acetic acid.

IEF of the GC phenotypes

IEF of the plasma GC phenotypes corresponding to the individual genomic DNA preparations was performed as described by Braun et al. (1990).

Testing for allelic association

The frequencies of the GC-I8 haplotypes and the corresponding delta values were obtained from the GC-I8 genotype distribution data (Table 1) using a modified version of the program ASSOC (Ott 1985). This program uses the gene counting method (Ceppellini et al. 1955), which allows the phase to be derived in persons heterozygous at both loci considered (e.g. GC and GC-I8). The delta values obtained were then tested for significance using an exact test of goodness of fit (Müller and Clerget-Darpoux 1991). Care was taken to correct the significance level for multiple testing.

Results

Alu repeated sequence in the DBP gene

PCR was used to amplify intron 8 of the DBP gene, in which the Alu repeat resides. With the exon-specific primers A5 (exon 8) and A6 (exon 9) we obtained a single PCR product of about 1.8 kb in length from three individual genomic DNA preparations, which represented the three common homozygous DBP genotypes. The exon sequences on the left- and right-hand sides were 178 bp long in total and, therefore, the intron 8 sequence was about 1.6 kb. Figure 1 shows parts of the intron sequence, the Alu repeat, the polymorphic poly(dA) tail (allele 6), and parts of the exon 9 sequence. Except for the variable poly(A) sequence at the 3′ end, the Alu repeated sequences in intron 8 of all three common DBP alleles (GC*1F, GC*1S, and GC*2) were identical. They showed 86.5% homology with the Alu consensus sequence published by Jurka and Smith (1988) and 83.7% homology with the consensus sequence of Britten et al. (1988). The position data refer to the consensus sequence of Jurka and Smith (1988). The homology to the human specific Alu consensus sequence (Batzer and Deininger 1991) was only about 82%. For classification of this Alu repeat see the Discussion.

intron 8     TTAAGATGCA GAAACCAGGT GCTCTGCCTT TATGATAGAT ATTTCACATT TTGAAATGTA

             Alu sequence in intron 8 >
GC-I8        GGCCAGGCAC AGTGGCTCAT CCATCTAATC CCAGCACTTT GGCAGGCCGA GTCAGGCGAA
consensus    GGCCGGGCGC GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCGA GGCGGGCGGA

GC-I8        TCACTTGAGG TCAGAAGTTC CAGACCACCC TAGCCAACAT GGTGAAACCC TGTCTCCACT
consensus    TCACCTGAGG TCAGGAGTTC GAGACCAGCC TGGCCAACAT GGTGAAACCC CGTCTCTACT

GC-I8        AAAAATACAA AACTTAGCCG GGCATGGTGG TGGGCGCCTA TAATCCCAGG TACTCC4]GAG
consensus    AAAAATACAA AAATTAGCCG GGCGTGGTGG CGCGCGCCTG TAATCCCAGC TACTCGGGAG

                                                           A56 >
GC-I8        GCTGAGGCAT GAGAATCGCT TGAACCCAGG AGGCAGAGGT TACAGCGAGC CAAGATGGCA
consensus    GCTGAGGCAG GAGAATCGCT TGAACCCGGG AGGCGGAGGT TGCAGTGAGC CGAGATCGCG

             < Alu sequence in intron 8                    | polymorphic AluVpA
GC-I8        CCACTTCACT CCAACCTGGG TGACAGAGGG AGACTCTGTC ACAAAAAATA AATAAATAAA
                                                                 1    2   3
consensus    CCACTGCACT CCAGCGTGGG CGACAGAGCG AGACTCCGTC TC

             allele 6                                           intron 8 | exon 9
GC-I8        TAAATAAATA AATAGGAAAT ATATAACACA TATCTCCTTT TCTCCCTCAT GCTAG GTATA
             4   5   6                                                    sTyrT

                                                         < A6
GC-I8        CATTTGAACT AAGCAGAAGG ACTCATCTTC CGGAAGTATT CCTCAGTAA
             hrPheGluLe uSerArgArg ThrHisLeuP roGluValPh eLeuSerLy

Fig. 1. Partial nucleotide sequence of intron 8 and exon 9 of the human vitamin D-binding protein (hDBP) gene. The Alu sequence of 282 bp is indicated. The direct repeats flanking the Alu sequence are doubly underlined. The RNA polymerase III promoter sequence is also underlined. The primer sequences A56 and A6, which were used for the amplification of the fragment comprising the TAAA repeat, are indicated. The allele shown is that with six repeats (GC-I8*6). The consensus sequence of Jurka and Smith (1988) is compared with the Alu repeat of intron 8. The 38 differences are indicated by vertical lines

The Alu insertion unit that we investigated had almost all of the previously described characteristics of this most successful class of mobile elements, which is distributed throughout primate genomes: the consensus sequence-related part is 282 bp in length and consists of two directly repeated monomer units, of which the first is 120 bp and the second 150 bp long. A typical A-linker (position 121-133, AAAAATACAAAA) connects the two monomers. The end of the second monomer consists of an oligo(dA) tail, which is a patterned dA rich repeat. The 13 bp long internal promoter sequence for RNA polymerase III (position 74-86, underlined in Fig. 1) shows the required Pu/Py pattern with the exception of one G→C transversion. There is also an imperfect flanking direct repeat (doubly underlined in Fig. 1) of the sequence TNNGAAATPuTA. Only the AluI restriction site (AG/CT) at position 168-171 is not present, because of a C→G transversion.

Altogether there are 38 nucleotide exchanges, of which 19 occur at CpG positions that have a tenfold greater mutation rate (Bird 1980; Bains 1986), so that only 6 of the 24 consensus CpG doublets are found in the described Alu element. Ten transversions and nine transitions can be observed at non-CpG positions.
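The homology value quoted for the Jurka and Smith (1988) consensus can be cross-checked against the exchange counts given above. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes that "homology" means the percentage of the 282 consensus-related positions that are unchanged.

```python
def percent_identity(aligned_length: int, mismatches: int) -> float:
    """Percentage of aligned positions that match the consensus."""
    if aligned_length <= 0 or not 0 <= mismatches <= aligned_length:
        raise ValueError("invalid alignment counts")
    return 100.0 * (aligned_length - mismatches) / aligned_length


# Counts taken from the text: a 282 bp Alu element with exchanges at
# 19 CpG positions plus 10 transversions and 9 transitions elsewhere.
ALU_LENGTH = 282
CPG_CHANGES, TRANSVERSIONS, TRANSITIONS = 19, 10, 9
TOTAL_CHANGES = CPG_CHANGES + TRANSVERSIONS + TRANSITIONS

identity = percent_identity(ALU_LENGTH, TOTAL_CHANGES)
print(f"{TOTAL_CHANGES} exchanges -> {identity:.1f}% identity")
```

Under this assumption the three categories add up to the 38 exchanges reported, and the resulting identity agrees with the 86.5% given in the text.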

Polymorphic variable poly(dA) tail (AluVpA)

On the 3′ side of this Alu repeat we discovered a polymorphic poly(dA) tail with a patterned (dA) rich tetranucleotide repeat (TAAA)n. In the population of southern Germany we detected three common alleles of this patterned repeat, termed allele GC-I8*6, GC-I8*8, and GC-I8*10 depending on the number of TAAA unit repeats, and one rare variant, termed allele GC-I8*11 (I8 is the abbreviation for intron 8). Figure 2 shows the sequences of the common alleles GC-I8*8 and GC-I8*10, the variant allele GC-I8*11, and allele GC-I8*9, seen only after subcloning of the total PCR products of a genomic DNA preparation heterozygous for the alleles GC-I8*8 and GC-I8*10. For the estimation of the allele frequency in the examined population we established a PCR procedure specific for this polymorphic marker. The PCR primers were the exon-specific A6 on the 3′ side of the repeat and, on the other side, the primer A56, which binds near the repeat in the Alu sequence (both primers are marked in Fig. 1). The allele-specific products were 187 bp for GC-I8*6, 195 bp for GC-I8*8, 203 bp for GC-I8*10, and 207 bp for GC-I8*11. The products of the PCR reaction were analysed by PAGE and the specific DNA bands detected by silver staining. The method was suitable for screening DNA preparations for this polymorphic locus.
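The reported product sizes follow directly from the number of TAAA units, since each additional unit adds four nucleotides to the A56/A6 fragment. As a minimal sketch (the function and constant names are illustrative, not taken from the paper), the relationship can be written as:

```python
REPEAT_UNIT = "TAAA"
REFERENCE_REPEATS = 6    # allele GC-I8*6
REFERENCE_SIZE_BP = 187  # A56/A6 product size reported for GC-I8*6


def expected_product_size(n_repeats: int) -> int:
    """Expected A56/A6 product size (bp) for an allele with n_repeats TAAA units."""
    extra_units = n_repeats - REFERENCE_REPEATS
    return REFERENCE_SIZE_BP + len(REPEAT_UNIT) * extra_units


for n in (6, 8, 10, 11):
    print(f"GC-I8*{n}: {expected_product_size(n)} bp")
```

The same function gives 199 bp for the GC-I8*9 subcloning product, a value that is implied by the repeat structure but not stated in the text.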

Figure 3 shows all observed homozygous and heterozygous phenotypes in the sample of 188 individuals. The distribution of the four alleles in the tested population sample of southern Germany is listed in Table 1. The most frequent allele was that with an intermediate number of repeats (GC-I8*8), and this accounted for 85.11% of all alleles. On the other hand, the alleles GC-I8*6 and especially GC-I8*11 were very rare and, therefore, only found in the heterozygous state (Fig. 3, lanes 4, 5 for allele GC-I8*6 and lanes 7, 8 for allele GC-I8*11). The observed heterozygosity was 26.0%, which is almost exactly the calculated frequency of 26.1% that would be expected if all four alleles were in Hardy-Weinberg equilibrium.
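The allele frequencies and heterozygosity values can be recomputed from the observed GC-I8 genotype counts listed in Table 1. The sketch below uses simple allele counting and the standard Hardy-Weinberg expectation (one minus the sum of squared allele frequencies); the function names are illustrative, and this is an assumption about how the published values were obtained, not a description of the authors' software.

```python
from collections import Counter

# Observed GC-I8 genotype counts from Table 1 (genotypes with zero count omitted).
GENOTYPE_COUNTS = {
    (6, 8): 10, (6, 10): 2, (8, 8): 137, (8, 10): 35,
    (8, 11): 1, (10, 10): 2, (10, 11): 1,
}


def allele_frequencies(genotypes):
    """Count both alleles of every individual and normalise to frequencies."""
    alleles = Counter()
    for (a, b), n in genotypes.items():
        alleles[a] += n
        alleles[b] += n
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    n_total = sum(genotypes.values())
    n_het = sum(n for (a, b), n in genotypes.items() if a != b)
    return n_het / n_total


def expected_heterozygosity(freqs):
    """Hardy-Weinberg expectation: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


freqs = allele_frequencies(GENOTYPE_COUNTS)
h_obs = observed_heterozygosity(GENOTYPE_COUNTS)
h_exp = expected_heterozygosity(freqs)
for allele in sorted(freqs):
    print(f"GC-I8*{allele}: {freqs[allele]:.4f}")
print(f"observed H = {h_obs:.3f}, expected H = {h_exp:.3f}")
```

With these counts the allele frequencies agree with Table 1 to four decimal places, and both heterozygosity values come out at about 26%; the last digit of the expected value may differ slightly from the published figure because of rounding.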

Fig. 2. Sequences of the different GC-I8 alleles (AluVpA). The alleles GC-I8*8, GC-I8*10, and GC-I8*11 are shown. (The allele GC-I8*6 is not shown.) ¹ GC-I8*9 was only seen after subcloning the polymerase chain reaction (PCR) products of a heterozygous GC-I8 8/10 phenotype in the pUC19 system. We consider it to be the result of an unequal sister chromatid exchange in the white blood cells that were used for preparation of genomic DNA

Table 1. Distribution of the GC-I8 (AluVpA) and GC phenotypes and alleles in a sample of 188 unrelated individuals in southern Germany

GC-I8 (AluVpA) allele

Repeat units   n observed   n expected
6/6                  0          0.2
6/8                 10         10.2
6/10                 2          1.3
6/11                 0          0
8/8                137        136.2
8/10                35         35.7
8/11                 1          1.7
10/10                2          2.4
10/11                1          0.2
11/11                0          0
Total              188        188.2

Allele frequencies
GC-I8*6     0.0319
GC-I8*8     0.8511
GC-I8*10    0.1117
GC-I8*11    0.0053
Total       1.0000

GC isoelectric focusing (IEF) phenotypes

Type    n observed   n expected
1S-1S       65          65.5
1F-1F        5           4.6
2-2         12          12.0
2-1S        57          56.1
2-1F        14          14.9
1F-1S       35          34.8
Total      188         187.9

Allele frequencies
GC*1S    0.5904
GC*1F    0.1569
GC*2     0.2527
Total    1.0000

Fig. 3. The seven different observed phenotypes of the polymorphic patterned AluVpA in intron 8 of the hDBP gene are shown. The repeats are located next to exon 9. The polymorphism is genetically determined; the locus is termed GC-I8 (intron 8) and comprises three common alleles and one rare allele. The GC-I8*6 and GC-I8*11 alleles were only seen in the heterozygous state. GC-I8 was amplified by PCR using the specific primers A56 and A6, and the products were separated by 8% non-denaturing polyacrylamide gel electrophoresis (PAGE) and revealed by silver staining

Mendelian inheritance of the polymorphic AluVpA

For proof of Mendelian inheritance we tested several families. In two families with four children all members were homozygous for the allele GC-I8*8. These families were, therefore, not very informative. One family with 19 members comprising all three common alleles was informative concerning Mendelian inheritance and probable linkage of the polymorphic AluVpA to the common DBP phenotypes (Fig. 4). In this pedigree the allele GC-I8*6 was coinherited with the GC1S phenotype, the allele GC-I8*8 with the GC2 phenotype, and allele GC-I8*10 with the GC1F phenotype. In this family, therefore, the polymorphic repeat in intron 8 of the DBP gene showed a Mendelian inheritance pattern associated with the GC/DBP polymorphism.

Allelic association of the common DBP alleles (GC*1F, GC*1S, GC*2)

Considering the combinations of GC-I8 and GC IEF types obtained (Table 2), an association can be seen between the alleles GC*1F and GC-I8*10: 38 of 42 GC-I8*10 alleles co-occurred with a GC IEF type heterozygous or homozygous for the allele GC*1F. In addition, it was striking that the only two homozygous GC-I8*10 phenotypes were observed with the homozygous GC*1F type. On the other hand, 11 of the 12 GC-I8*6 alleles found appeared in combination with a GC type heterozygous or homozygous (in 6 cases) for the allele GC*1S, which seemed to suggest an association of the alleles GC*1S and GC-I8*6.
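The co-occurrence counts quoted above (38 of 42 and 11 of 12) can be reproduced by tallying the genotype combinations of Table 2. The sketch below is illustrative only; the data structure and function name are assumptions. It also assumes that the GC 2-2 row of Table 2 reads 11 individuals with GC-I8 8/8 and 1 with 8/10, which is the reading consistent with the row total of 12 in Table 1.

```python
# Table 2 counts: GC IEF type -> {GC-I8 genotype: number of individuals}.
# Genotype combinations with zero individuals are omitted.
TABLE2 = {
    ("1S", "1S"): {(6, 8): 6, (8, 8): 58, (8, 10): 1},
    ("1F", "1F"): {(8, 10): 2, (10, 10): 2, (10, 11): 1},
    ("2", "2"): {(8, 8): 11, (8, 10): 1},
    ("2", "1S"): {(6, 8): 3, (8, 8): 52, (8, 10): 2},
    ("2", "1F"): {(6, 8): 1, (8, 8): 4, (8, 10): 9},
    ("1S", "1F"): {(6, 10): 2, (8, 8): 12, (8, 10): 20, (8, 11): 1},
}


def alleles_cooccurring(i8_allele, gc_allele):
    """Return (copies of i8_allele found in carriers of gc_allele, all copies)."""
    carried = total = 0
    for gc_type, genotypes in TABLE2.items():
        for i8_genotype, n in genotypes.items():
            copies = n * i8_genotype.count(i8_allele)
            total += copies
            if gc_allele in gc_type:
                carried += copies
    return carried, total


print(alleles_cooccurring(10, "1F"))
print(alleles_cooccurring(6, "1S"))
```

Note that this tally only counts co-occurrence within individuals; it does not resolve haplotype phase, which is why the formal test in Table 3 relies on gene counting instead.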

Table 3 shows the results of the test for allelic association. The P values give the significance level obtained using an exact test of goodness of fit (Müller and Clerget-Darpoux 1991). As 12 delta values were tested, a P value of 0.004 in 1 out of 12 tests corresponds to a P value of 0.05 if only a single test were performed. This correction is conservative in our case, because it implicitly assumes independent tests, whereas the 12 tests here are not independent. The delta values of the genotype combinations GC*1F/GC-I8*10 and GC*1F/GC-I8*8 express the significant excess of the first and the deficit of the last. The deficit of the combination GC*1S/GC-I8*10 is also highly significant. The association between GC*1S and GC-I8*6 that was assumed by inspection was not confirmed statistically. The allele GC*2 did not appear preferentially with one of the detected GC-I8 alleles. The association of the alleles GC*1F/GC-I8*10, which was found in the population study (Table 1), was also apparent in the segregation pattern in a family with 19 members (Fig. 4).

Fig. 4. Segregation of the AluVpA (GC-I8) polymorphism in a family with 19 members. The GC-I8 phenotypes were determined as described in Fig. 3. The GC phenotypes were classified by isoelectric focusing (IEF)

Table 2. Distribution of genotype combinations in a sample of 188 unrelated individuals in southern Germany

GC IEF    GC-I8 allele
types     6/6   6/8   6/10  6/11  8/8   8/10  8/11  10/10 10/11 11/11
1S-1S      0     6     0     0    58     1     0     0     0     0
1F-1F      0     0     0     0     0     2     0     2     1     0
2-2        0     0     0     0    11     1     0     0     0     0
2-1S       0     3     0     0    52     2     0     0     0     0
2-1F       0     1     0     0     4     9     0     0     0     0
1S-1F      0     0     2     0    12    20     1     0     0     0

Table 3. Frequencies of the GC-I8 (AluVpA) allele and the GC phenotype combinations

GC-I8/GC           No allelic    Allelic       Delta      P values
combinations       association   association   values
GC*1S/GC-I8*6        0.0188        0.0283       0.0095     0.1777
GC*1S/GC-I8*8        0.5025        0.5582       0.0557     0.0303
GC*1S/GC-I8*10       0.0660        0.0039      -0.0620     0.0000
GC*1S/GC-I8*11       0.0031        0.0000      -0.0031     0.4240
GC*1F/GC-I8*6        0.0050        0.0023      -0.0027     0.7296
GC*1F/GC-I8*8        0.1335        0.0484      -0.0848     0.0000
GC*1F/GC-I8*10       0.0175        0.1005       0.0830     0.0000
GC*1F/GC-I8*11       0.0008        0.0053       0.0045     0.0370
GC*2/GC-I8*6         0.0081        0.0013      -0.0068     0.0822
GC*2/GC-I8*8         0.2150        0.2441       0.0291     0.1672
GC*2/GC-I8*10        0.0282        0.0073      -0.0210     0.0176
GC*2/GC-I8*11        0.0013        0.0000      -0.0013     1.0000
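The delta values in Table 3 and the corrected significance level can be illustrated with a short calculation. The Python sketch below assumes that delta is the estimated haplotype frequency minus the product of the two allele frequencies, and that the multiple-testing correction divides the significance level by the number of tests (a Bonferroni-type correction); both are assumptions inferred from the numbers in the text, not taken from the ASSOC program itself. The haplotype frequencies under association are simply copied from Table 3, since the gene counting estimation is not reproduced here.

```python
# Allele frequencies from Table 1.
GC_FREQ = {"1S": 0.5904, "1F": 0.1569, "2": 0.2527}
I8_FREQ = {6: 0.0319, 8: 0.8511, 10: 0.1117, 11: 0.0053}


def expected_without_association(gc_allele, i8_allele):
    """Haplotype frequency expected if the two loci were independent."""
    return GC_FREQ[gc_allele] * I8_FREQ[i8_allele]


def delta(haplotype_freq, gc_allele, i8_allele):
    """Estimated haplotype frequency minus the independence expectation."""
    return haplotype_freq - expected_without_association(gc_allele, i8_allele)


def per_test_threshold(alpha, n_tests):
    """Significance level per test after dividing alpha by the number of tests."""
    return alpha / n_tests


# Haplotype frequencies under association, copied from Table 3.
print(f"GC*1F/GC-I8*10: delta = {delta(0.1005, '1F', 10):+.4f}")
print(f"GC*1S/GC-I8*10: delta = {delta(0.0039, '1S', 10):+.4f}")
print(f"per-test threshold = {per_test_threshold(0.05, 12):.4f}")
```

A positive delta indicates an excess of the haplotype and a negative delta a deficit, which matches the excess reported for GC*1F/GC-I8*10 and the deficit reported for GC*1S/GC-I8*10.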

Discussion

The Alu sequence detected at the 3′ end of intron 8 in the hDBP gene shows the main distinguishing characteristics of these most prominent members of the SINEs. Because of specific differences from the common Alu consensus sequence at specific, so-called "diagnostic" sites, several authors have defined a number of subfamilies that probably originated at different times in the history of primates (Jurka and Smith 1988; Britten et al. 1988; Labuda and Striker 1989). To classify our Alu repeat we used the categories of Jurka and Smith (1988), whose consensus sequence has already provided the basis for the homology analysis. In this classification the Alu elements are divided into the Alu-J subfamily, which consists of the oldest representatives of the entire Alu family, and the three times larger Alu-S subfamily, which is split into three branches, a, b, and c. The average overall similarity between each analysed Alu-S sequence and the consensus sequence given by Jurka and Smith (1988) is 86.59%, which is almost exactly the value for our Alu sequence. Looking at the diagnostic positions (Table 4), an absolute conformity with the nucleotides suggested for the Alu-Sa branch, the "oldest" of the three "sub"-families, can be seen.

Table 4. Comparison of the bases at diagnostic positions of the Alu repeat GC-I8 with the Alu-Sa consensus sequence of Jurka and Smith (1988)

Diagnostic position   Alu-Sa   Alu GC-I8
 65                   T/C      T
 66                   T        T
 78                   T        T
 88                   G        G
 95                   C        C
100                   T        T
153                   C/G      G
163                   A/G      A
197                   C        C
200                   T        T
219                   G        G

The average CpG content of the Alu-Sa sequences is 7.75 + 2.95 and thus the 6 CpG doublets in our Alu element correspond with that value. Two additional positions, 244 and 272, may lead to a further subdivision of the Alu-Sa branch into Alu-Sd (244:T/272:A) and Alu-Se (244:C/272:G). Accordingly, our Alu element was identified as a member of the Alu-Se class. The equivalent of the Alu-Sa branch in the classification of Britten et al. (1988) is called class II with its specific changes compared with their consensus sequence. The analysis of 26 diagnostic positions in our Alu sequence showed identity with the class I! consensus at 22 positions, so that the above described results were confirmed. The number of mutations (differing from the consensus) in the positions that are not diagnostic or CpGs makes it possible to date the insertion of the investigated Alu sequence into the DBP gene (Britten et al. 1988). Seven observed nucleotide exchanges at 254 non-CpG, non-diagnostic positions and the supposed drift rate of about 0.15% (Britten et al. 1989; Miyamoto et al. 1987) result in an age of about 45 Million years for this Alu repeat, which would suggest its absence from the prosimian genome but possible presence in the genome of new world apes. Its presence in the genome of old world monkeys and apes has already been demonstrated (Bichlmaier et al., in preparation). Looking at the polymorphic structured A-rich 3" end of the examined Alu repeat one can see that this Alu tail begins with a 5" run of pure As and ends with a 3" region consisting of a variable number of repeated TAAA elements. One can speculate that the structured part of the 3" end results from an originally pure poly(A) tract that was added via a post-transcriptional mechanism (Matera et al. 1990), and then, after retroposition into the genome, stabilized by ac- quiring other nucleotides. 
The suggestion that the structured 3' ends of the repeats were formed after their post-transcriptional polyadenylation is supported by the observation that all detected members of the youngest Alu subfamily, the PV (predicted variant) or HS (human-specific) subfamily (Matera et al. 1990; Batzer and Deininger 1991; Batzer et al. 1991), end in a stretch of uninterrupted A residues. In addition, the tandem A-rich element of our Alu repeat resembles the short flanking direct repeat, which also shows an A-T-rich composition like the flanking regions of most known Alu sequences (Bains 1986; Kariya et al. 1987). Thus, this tandem repeat probably evolved by slippage events during replication or DNA repair (Fresco and Alberts 1960; Efstratiadis et al. 1980) after an adenosine to thymidine transversion.
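
As an illustration of the tail structure described above (a 5' run of pure As followed by a variable number of TAAA units), the following minimal Python sketch splits such a tail into its two parts and counts the repeat units. The function name `parse_alu_tail` and the example sequences are hypothetical and are not taken from the sequenced alleles. The sketch also assumes an idealized tail without any other interruptions.

```python
import re

# Idealized tail: a run of As followed by zero or more TAAA units.
# This pattern is an assumption for illustration, not the published sequence.
_TAIL_PATTERN = re.compile(r"(A*)((?:TAAA)*)")


def parse_alu_tail(tail):
    """Return (length of the 5' A run, number of TAAA units), or None
    if the sequence does not fit the idealized pattern."""
    match = _TAIL_PATTERN.fullmatch(tail.upper())
    if match is None:
        return None
    a_run, repeats = match.groups()
    return len(a_run), len(repeats) // 4


# Hypothetical tails that differ by two repeat units (8 nucleotides).
short_tail = "A" * 10 + "TAAA" * 6
long_tail = "A" * 10 + "TAAA" * 8
print(parse_alu_tail(short_tail))
print(parse_alu_tail(long_tail))
print(len(long_tail) - len(short_tail))
```

Because each unit is four nucleotides long, tails that differ in unit count differ in length by multiples of four, which is the size pattern seen among the amplified fragments.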

Because of the high abundance of allele GC-I8*8, with its intermediate number of repeat units, and the resulting low observed heterozygosity, we consider this allele to be the ancestral patterned A-rich tail of the Alu element, which produced the alleles GC-I8*10 and GC-I8*6 by a relatively recent unequal homologous crossing over or sister chromatid exchange (SCE) in the germline with mispairing at the TAAA repeated tracts. Another explanation would again be slippage mutations, which must also have taken place recently, because in vitro studies have shown that this mechanism would accumulate the allele with the fewest repeats (Levinson and Gutman 1987), which occurred at a frequency of only about 3.2% (GC-I8*6) in our intron 8 Alu sequences. Thus, we prefer the first mechanism of unequal crossover, either between chromosome homologues or sister chromatids, which, starting with allele GC-I8*8, could have symmetrically produced the alleles GC-I8*10 and GC-I8*6. We think it very likely that this AluVpA will reach a higher degree of polymorphism in the course of time. The rare allele GC-I8*11 is considered an indication of this development. The occurrence of the subcloning product GC-I8*9 (Fig. 2), which may be the result of an unequal SCE in the white blood cells of the blood donor or the consequence of a slippage event during PCR or bacterial amplification, shows the mutational tendency towards a variable number of repeat units. The 3' end of an Alu repeat within the flanking sequence of the ADA gene, for example, also shows a variable number of TAAA repeated units, but with a higher degree of polymorphism: seven alleles, differing from each other by one unit, have been identified, of which the shortest were the most frequent (Economou et al. 1990). In spite of its stable inheritance, our described AluVpA, with an observed heterozygosity of 26.0%, is not very informative for relationship and linkage studies.
Nevertheless, 2,527 bp upstream from the polymorphic coding regions for amino acids 416 and 420, which result in the different DBP plasma protein phenotypes GC 1F, GC 1S, and GC 2, another genetic marker exists in an intron of the same gene, and it must have formed prior to the exon polymorphism: while all apes show the plasma protein phenotype GC 1, the presence of this AluVpA with a variable number of repeat units in the genome of the great apes and Old World monkeys could be demonstrated by PCR amplification and sequencing (Bichlmaier et al., in preparation).
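
Two quantitative points in the discussion above can be sketched in a few lines of Python: the symmetric production of alleles *10 and *6 from *8 by an unequal exchange, and the heterozygosity expected from the allele frequencies. This is a simplified sketch, not the analysis used in the study. Several inputs are assumptions: the allele numbers are treated as repeat-unit counts; the frequencies 0.8511 and 0.0053 (the extremes of the reported range) are assigned to GC-I8*8 and GC-I8*11; 0.032 is used for GC-I8*6; GC-I8*10 is given the remainder. The function names are hypothetical.

```python
def unequal_exchange(units_a, units_b, offset):
    """Reciprocal products of an exchange between two TAAA tracts
    misaligned by `offset` repeat units (simplified model)."""
    if not 0 <= offset <= min(units_a, units_b):
        raise ValueError("offset must lie between 0 and the shorter tract")
    return units_a + offset, units_b - offset


def expected_heterozygosity(frequencies):
    """Expected heterozygosity under random mating: 1 - sum(p_i^2)."""
    values = list(frequencies.values())
    if abs(sum(values) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in values)


# Assumed assignment of frequencies to alleles (see the lead-in).
freqs = {"GC-I8*8": 0.8511, "GC-I8*6": 0.032, "GC-I8*11": 0.0053}
freqs["GC-I8*10"] = 1.0 - sum(freqs.values())  # remainder, an assumption

print(unequal_exchange(8, 8, 2))  # two *8 tracts misaligned by two units
print(round(expected_heterozygosity(freqs), 3))
```

Under these assumptions, the exchange yields tracts of 10 and 6 units, and the expected heterozygosity comes out at roughly 0.26, close to the observed value of 26.0%.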

Acknowledgements. This study was supported by a grant from the Deutsche Forschungsgemeinschaft, Bad Godesberg, FRG (C1 27/14-2), for which we are grateful. We thank G. Honold for the excellent preparation of the various oligonucleotides, Ms. A. Brandhofer for her technical assistance, and Mrs. Berndt from the Bavarian Red Cross Blood Bank at Munich for her help with the collection of the blood samples.

References

Bains W (1986) The multiple origins of human Alu sequences. J Mol Evol 23:189-199

Batzer MA, Deininger PL (1991) A human-specific subfamily of Alu sequences. Genomics 9:481-487

Batzer MA, Vandana AG, Mena JC, Foltz DW, Herrera RJ, Deininger PL (1991) Amplification dynamics of human-specific (HS) Alu family members. Nucleic Acids Res 19:3619-3623

Bird AP (1980) DNA methylation and frequency of CpG in animal DNA. Nucleic Acids Res 8:1499-1504

Braun A, Brandhofer A, Cleve H (1990) Interaction of the vitamin D-binding protein (group-specific component) and its ligand 25-hydroxy-vitamin D3: binding differences of the various genetic types disclosed by isoelectric focusing. Electrophoresis 11:478-483

Braun A, Bichlmaier R, Cleve H (1991) Molecular analysis of the gene for the human vitamin D binding protein (group-specific component): allelic differences of the common genetic GC types. Hum Genet 89:401-406

Britten RJ, Baron WF, Stout DB, Davidson EH (1988) Sources and evolution of human Alu repeated sequences. Proc Natl Acad Sci USA 85:4770-4774

Britten RJ, Stout DB, Davidson EH (1989) The current source of human Alu retroposons is a conserved gene shared with the old world monkey. Proc Natl Acad Sci USA 86:3718-3722

Ceppellini R, Siniscalco M, Smith CAB (1955) The estimation of gene frequencies in a random mating population. Ann Hum Genet 20:97-115

Cleve H, Constans J (1988) The mutants of the vitamin D binding protein: more than 120 variants of the Gc/DBP system. Vox Sang 54:215-225

Constans J, Oksman F, Viau M (1981) Binding of the apo and holo forms of the serum vitamin D-binding protein to human lymphocyte cytoplasm and membrane by indirect immunofluorescence. Immunol Lett 3:159-162

Cooke NE, David EV (1985) Serum D-binding protein is a third member of the albumin and alpha fetoprotein gene family. J Clin Invest 76:2420-2424

Cooke NE, Willard HF, David EV, George DL (1986) Direct regional assignment of the gene for vitamin D binding protein (Gc-globulin) to human chromosome 4q11-q13 and identification of an associated DNA polymorphism. Hum Genet 73:225-229

Daiger SP, Schanfield MS, Cavalli-Sforza LL (1975) Human group-specific component (Gc) proteins bind vitamin D and 25-hydroxy vitamin D. Proc Natl Acad Sci USA 72:2076-2080

Deininger PL (1989) SINEs, short interspersed DNA elements in higher eucaryotes. In: Howe M, Berg D (eds) Mobile DNA. ASM Press, Washington, DC, pp 619-636

Deininger PL, Jolly DJ, Rubin CM, Friedmann T, Schmid CW (1981) Base sequence studies of 300 nucleotide renatured repeated human DNA clones. J Mol Biol 151:17-33

Economou EP, Bergen AW, Warren AC, Antonarakis SE (1990) The polydeoxyadenylate tract of Alu repetitive elements is polymorphic in the human genome. Proc Natl Acad Sci USA 87:2951-2954

Economou-Pachnis A, Tsichlis PN (1985) Insertion of an Alu SINE in the human homologue of the Mlvi-2 locus. Nucleic Acids Res 13:8379-8387

Efstratiadis A, Posakony JW, Maniatis T, Lawn RM, O'Connell C, Spritz RA, DeRiel JK, Forget BG, Weissman SM, Slightom JL, Blechl AE, Smithies O, Baralle FE, Shoulders CC, Proudfoot NJ (1980) The structure and evolution of the human β-globin gene family. Cell 21:653-668

Fresco JR, Alberts BM (1960) The association of noncomplementary bases in helical polyribonucleotides and deoxyribonucleic acids. Proc Natl Acad Sci USA 46:311-321

Friezner-Degen SJ, Rajput B, Reich E (1986) The human tissue plasminogen activator gene. J Biol Chem 261:6972-6985

Haddad JG, Walgate J (1976) 25-hydroxy vitamin D transport in human plasma: isolation and partial characterization of calcifidiol binding protein. J Biol Chem 251:4803-4809

Hirschfeld J (1959) Immune-electrophoretic demonstration of qualitative differences in human sera and their relation to the haptoglobins. Acta Pathol Microbiol Scand 47:160-168

Jurka J, Smith T (1988) A fundamental division in the Alu family of repeated sequences. Proc Natl Acad Sci USA 85:4775-4778

Kariya Y, Kato K, Hayashizaki Y, Himeno Y, Tarui S, Matsubara K (1987) Revision of consensus sequence of human Alu repeats - a review. Gene 53:1-10

Kew RR, Webster RO (1988) Gc-globulin (vitamin D-binding protein) enhances the neutrophil chemotactic activity of C5a and C5a des Arg. J Clin Invest 82:364-369

Labuda D, Striker G (1989) Sequence conservation in Alu evolution. Nucleic Acids Res 17:2477-2491

Levinson G, Gutman GA (1987) High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in E. coli K-12. Nucleic Acids Res 15:5323-5338

Matera GA, Hellmann U, Hintz MF, Schmid CW (1990) Alu repeats result from multiple source genes. Nucleic Acids Res 18:6019-6023

Miyamoto MM, Slightom JL, Goodman M (1987) Phylogenetic relations of humans and African apes from DNA sequences in the pseudo-η-globin region. Science 238:369-373

Müller B, Clerget-Darpoux F (1991) A test based on the exact probability distribution of the chi-square statistic - incorporation into the MASC method. Ann Hum Genet 55:69-75

Ott J (1985) A chi-square test to distinguish allelic association from other causes of phenotypic association between two loci. Genet Epidemiol 2:79-84

Perez HD, Kelly E, Chenoweth D, Elfman F (1988) Identification of the C5a des Arg cochemotaxin. Homology with vitamin D-binding protein (group-specific component globulin). J Clin Invest 82:360-363

Petrini M, Galbraith RM, Emerson DL, Nel AE, Arnaud P (1985) Structural studies of T lymphocyte Fc receptors. Association of Gc protein with IgG binding to Fcγ. J Biol Chem 260:1804-1810

Quentin Y (1988) The Alu family developed through successive waves of fixation closely connected with primate lineage history. J Mol Evol 27:194-202

Ray K, Wang X, Zhao M, Cooke NE (1991) The rat vitamin D binding protein (Gc-globulin) gene. Structural analysis, functional and evolutionary correlations. J Biol Chem 266:6221-6229

Rogers J (1983) Retroposons defined. Nature 301:460

Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-termination inhibitors. Proc Natl Acad Sci USA 74:5463-5467

Schmid CW, Jelinek WR (1982) The Alu family of dispersed repetitive sequences. Science 216:1065-1070

Schmid CW, Shen CKJ (1986) The evolution of interspersed repetitive DNA sequences in mammals and other vertebrates. In: MacIntyre RJ (ed) Molecular evolutionary genetics. Plenum, New York, pp 323-358

Ullu E, Tschudi C (1984) Alu sequences are processed 7SL RNA genes. Nature 312:171-172

Ullu E, Murphy S, Melli M (1982) Human 7S RNA consists of a 140 nucleotide middle repetitive sequence inserted in an Alu sequence. Cell 29:195-202

Van Baelen H, Bouillon R, De Moor P (1980) Vitamin D-binding protein (Gc-globulin) binds actin. J Biol Chem 255:2270-2272

Van Baelen H, Allewaert K, Bouillon R (1988) New aspects of the plasma carrier protein for 25-hydroxy-cholecalciferol in vertebrates. Ann NY Acad Sci 538:60-68

Yang F, Brune JL, Naylor SL, Cupples RL, Naberhaus KH, Bowman BH (1985) Human group-specific component (Gc) is a member of the albumin family. Proc Natl Acad Sci USA 82:7994-7998