THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 258. No. 24, Issue of December 25, pp. 15245-15254, 1983 Printed in U.S.A.

Boundaries of Gene Conversion within the Duplicated Human α-Globin Genes CONCERTED EVOLUTION BY SEGMENTAL RECOMBINATION*

(Received for publication, February 7, 1983)

Alan M. Michelson‡ and Stuart H. Orkin§ From the Division of Hematology-Oncology, Children’s Hospital Medical Center and the Dana-Farber Cancer Institute, Department of Pediatrics and Committee on Cell and Developmental Biology, Harvard Medical School, Boston, Massachusetts 02115

The human adult α-globin genes, α1 and α2, are embedded in homologous duplication units, each of which spans approximately 4 kilobase pairs of chromosomal DNA. Previous studies established that the 3'-ends of the duplication units are located adjacent to the polyadenylation sites of the two genes. We have now determined the 5'-boundary of the homology which includes both the structural genes and their upstream sequences. The 5'-flanking regions of α1 and α2 are perfectly homologous for 868 base pairs, with the exception of two single nucleotide differences. This is in contrast to the considerable divergence of the 3'-ends of these loci. Since the α-genes undergo concerted evolution by homologous unequal crossing over and/or gene conversion, the presence of adjacent regions with different degrees of homology indicates that this process is segmental. Furthermore, we have determined that an α-thal-2 gene, a variant α-globin allele resulting from unequal crossing over between normal α1 and α2 genes, has a mosaic arrangement of parental sequences. This patchwork structure may have arisen from a single recombination event which was limited in both the 5' and 3' directions by flanking nonhomologies and in which mismatch repair occurred in a heteroduplex intermediate. Unequal crossing over and gene conversion of this type may effect the segmental concerted evolution of the human α-globin locus. Restriction mapping of additional α-thal-2 genes and of the reciprocal triplicated α-gene complex was consistent with this hypothesis.

The genes encoding the α-like chains of human hemoglobins form a small multigene family on the short arm of chromosome 16 (1-6). The α-gene cluster spans approximately 30 kb¹ and includes a single functional embryonic locus (ζ), two functional adult genes (α1 and α2), and two pseudogenes (ψζ and ψα1), arranged in the same transcriptional orientation in the order 5'-ζ-ψζ-ψα1-α2-α1-3' (3, 7, 8). Both adult α-loci are expressed (9) and direct the synthesis of identical polypeptides (10). The DNA sequences of the α1 and α2 genes confirm these conclusions (11-13) and have permitted quantitation of α1- and α2-specific RNA transcripts in human erythroid cells (14, 15).

* This work was supported by grants from the National Institutes of Health and from the National Foundation-March of Dimes. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ Supported by fellowships from the Insurance Medical Scientist Scholarship Fund (through the generosity of the North American Reassurance Co.) and from the O’Brien Foundation.

§ Recipient of a Research Career Development Award from the National Heart, Lung, and Blood Institute. To whom correspondence should be addressed at: Division of Hematology-Oncology, Children’s Hospital, 300 Longwood Ave., Boston, MA 02115.

¹ The abbreviations used are: kb, kilobase pairs; bp, base pairs.

In contrast to the amino acid sequence identity of the α-chains encoded by the nonallelic α-loci within a species, significant divergence occurs in α-globin peptide sequences between species, including closely related primates (16). The maintenance of such sequence homology among nonallelic members of a multigene family within a single species has been termed coincidental evolution (17) and, more recently, concerted evolution (18). Several mechanisms have been proposed to account for this phenomenon, including gene conversion and unequal crossing over (17-24). It is believed that both of these processes involve the exchange of DNA strands between homologous parental molecules (25, 26), underscoring the importance of sequence homology in mediating concerted evolution. In this context, it is of interest that electron microscopic heteroduplex analysis of cloned fragments from the human α-globin cluster revealed that the adult genes, each of which spans only 850 bp, are embedded in homologous duplication units of approximately 4 kb (3). This large stretch of sequence homology may mediate unequal crossing over and gene conversion, the repeated occurrence of which would maintain the evolutionary homogeneity of DNA sequences which otherwise would diverge in the absence of selective pressures (17-24).

The identification of individuals possessing one (27-29) or three (30-33) adult α-genes on a single chromosome instead of the normal two genes provides strong genetic evidence for the occurrence of unequal crossing over in the human α-globin complex. The alignment of the restriction maps of the α-loci residing on the one-, two-, and three-gene chromosomes reinforces this hypothesis. Additional evidence that sequence homology in the α-cluster promotes unequal recombination is the production of deletions, which are indistinguishable from those found in the human population, upon propagation of the cloned α-gene region in Escherichia coli (3).

Although the duplicated human α-globin genes encode identical polypeptides, we (11) and others (12, 13) have recently demonstrated that the α1 and α2 genes are not identical at the DNA sequence level. Whereas the 5'-untranslated regions, the three coding blocks, all of the first and the 5' four-fifths of the second intervening sequences (IVS1 and IVS2) are highly homologous, the 3'-ends have markedly diverged. This finding must be accounted for by any mechanism proposed to effect the concerted evolution of the α-globin genes.

The available DNA sequences only define the 3'-extent of the α-gene homology. To precisely identify the 5'-homology boundary of the α-genes, we have sequenced approximately 900 bp upstream of both α1 and α2. In addition, we have cloned and sequenced a naturally occurring product of recombination between the normal α-genes in an attempt to define the molecular mechanisms which underlie this process. Consideration of these results allows us to propose an evolutionary model to account for the differing degrees of sequence conservation within adjacent segments of the α-globin genes.

EXPERIMENTAL PROCEDURES

Normal α-Globin Gene Clones-The normal α1 and α2 genes whose 5'-flanking sequences we determined were originally cloned by J. Lauer and T. Maniatis (Harvard University) and were supplied to us as plasmid subclones (3). The regions included in these clones have been described (3, 11). These plasmid subclones were constructed from the same bacteriophage recombinant and therefore the α-genes which were examined derive from the same chromosome of a single individual.

Cloning of a Rightward Deletion α-Thal-2 Gene-Lymphocyte DNA from a Chinese patient with classical hemoglobin H disease α-thalassemia (genotype α^3.7-/--) was digested to completion with EcoRI and BamHI. DNA fragments of about 10 kb were purified by sucrose gradient centrifugation and were ligated to EcoRI-BamHI phage arms of the bacteriophage cloning vector, Charon 30 (34). The subsequent cloning and screening procedures were as previously described (35). The 5'-portion of an α-specific phage clone was then subcloned as a 1.6-kb PvuII-HindIII fragment in pBR322.

DNA Sequence Analysis-DNA restriction site end-labeling with [γ-³²P]ATP and polynucleotide kinase, fragment isolation, and chemical sequencing reactions were carried out as described by Maxam and Gilbert (36). Blunt SmaI termini were labeled with [α-³²P]dCTP and T4 DNA polymerase (37), while [α-³²P]cordycepin triphosphate and terminal transferase were used to label PstI ends, as described (38). The strategy used to sequence the 5'-flanking regions of the normal α1 and α2 genes is shown in Fig. 1. The -733 and -634 nucleotides were sequenced in the α-thal-2 plasmid subclone from the SmaI site at position -700. The IVS2 and 3'-untranslated sequences of the α-thal-2 gene were determined from intragenic HindIII and DdeI sites, respectively, using appropriate restriction fragments isolated directly from the recombinant phage DNA.

Southern Blot Hybridization-Human genomic and α-globin phage DNAs were digested to completion with ApaI (Boehringer Mannheim), subjected to electrophoresis through horizontal agarose slab gels, and transferred to nitrocellulose filters by the method of Southern (39). Filter-bound α-globin restriction fragments were subsequently detected by hybridization with either the 1.5-kb PstI-PstI or 1.0-kb PstI-HindIII probes illustrated in Figs. 4 and 6, respectively. These probes were isolated from polyacrylamide or low melting point agarose gels following digestion of the normal α1 and α2 plasmid subclones (see above) with the appropriate restriction enzymes and were labeled by nick translation in the presence of [α-³²P]dCTP (40).

[Fig. 1 diagram: restriction map of the 5'-flanking region showing BstNI, HinfI, AvaII, SmaI/XmaI, and PstI sites, with sequencing runs from the labeled ends XmaI (5'), SmaI (3'), PstI (3'), HinfI (5'), AvaII (5'), and BstNI (5').]

FIG. 1. Strategy for sequencing of the 5'-flanking regions of normal α1- and α2-globin genes. Appropriate restriction fragments end-labeled at the indicated sites were sequenced by the method of Maxam and Gilbert (36). Separate symbols distinguish restriction sites labeled at their 5'-ends using polynucleotide kinase and [γ-³²P]ATP from restriction sites labeled at their 3'-ends using terminal transferase and [α-³²P]cordycepin triphosphate (PstI) or [α-³²P]dCTP and T4 DNA polymerase (SmaI). All of the indicated sites are present in the 5'-flanking regions of both α1 and α2 genes. The structural α-gene sequences are indicated by the box in the upper right of the diagram.

RESULTS

Sequence Comparison of Normal α1- and α2-Globin Genes-Previous electron microscopic heteroduplex analysis of cloned human α1- and α2-globin genes revealed the presence of extensive homology both within and 5' to the structural gene sequences (3). This was referred to as the Z-homology. To precisely determine the upstream boundary of Z, we sequenced approximately 900 bp of the 5'-flanking regions of both α1 and α2 according to the strategy depicted in Fig. 1. Comparison of these flanking sequences reveals that they are almost identical (Fig. 2). No gaps are required to align the proximal 868 bp of the α1 and α2 genes and only two single nucleotide differences are found in these sequences: position -634 relative to the cap sites is A in α1 and G in α2, and position -733 is C in α1 and T in α2. Further 5', a 2-bp gap must be introduced in α1 (at position -869 of α2) to maintain the sequence alignment. In contrast to the strong conservation in the proximal 5'-flanking sequences, the portions distal to the short gap abruptly diverge due to a 224-bp insertion/deletion difference between α1 and α2.² These results are in agreement with the previous heteroduplex analysis of these regions (3).
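The comparison described above (substitutions counted separately from alignment gaps) can be illustrated with a minimal sketch. The function name `compare_aligned` and the two short toy sequences are assumptions made for illustration only; they are not the α-globin sequences of Fig. 2. Gaps are written as asterisks, following the convention of the figure legends.

```python
def compare_aligned(seq_a: str, seq_b: str, gap: str = "*"):
    """Return (substitutions, gap_columns) for two pre-aligned sequences.

    substitutions: list of (index, base_in_a, base_in_b) for mismatched columns.
    gap_columns:   list of indices where either sequence carries a gap symbol.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions, gap_columns = [], []
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == gap or b == gap:
            gap_columns.append(i)          # insertion/deletion column
        elif a != b:
            substitutions.append((i, a, b))  # single nucleotide difference
    return substitutions, gap_columns


# Hypothetical toy alignment: a 2-bp gap in the first sequence and one substitution.
toy_a = "GG**CTCAGTACGT"
toy_b = "GGACCTCGGTACGT"

subs, gaps = compare_aligned(toy_a, toy_b)
print(subs)  # [(7, 'A', 'G')]
print(gaps)  # [2, 3]
```

Keeping gap columns apart from substitutions mirrors the distinction drawn in the text between point differences inside the homology block and the short gap that marks its upstream limit.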

Consideration of the α1 and α2 structural gene sequences (11-13; Fig. 2) permits a complete description of the Z-homology at the nucleotide level. The two genes are identical in their 5'-untranslated regions, IVS1, and all three coding blocks. The 5' four-fifths of IVS2 are also the same in the two genes except for a single base difference: position 55 is G in α1 but T in α2. In contrast, the 3'-ends of the genes have markedly diverged. The 3'-untranslated regions differ by 19 of 113 nucleotides, a total of 17% divergence (11, 13). The 3'-portions of IVS2 also have several differences, the most notable being the absence from α2 of 7 contiguous base pairs corresponding to nucleotides 115 to 121 of the α1 IVS2. In addition, a C is found in α1 at IVS2 position 126 while this nucleotide is G in α2. Finally, a short region of homology adjacent to the polyadenylation sites terminates in the sequence CCTG(TG):,CCTG, which is also located at the ψα1/α2 boundary (7). This finding led to the suggestion that this short repeated sequence represents the ends of the α-duplication units (7).

In summary, there is a 1436-bp sequence, extending from nucleotide 868 upstream of the α1 and α2 cap sites to nucleotide 114 of their large introns, in which the nonallelic α-genes have continuous and uninterrupted homology with the exception of three single base differences (Fig. 3). That is, the limited (0.2%) divergence of these coding and noncoding regions is solely due to point mutations; the insertions and deletions that are characteristic of the noncoding portions of other duplicated genes (41) are absent. Each end of this homology block is flanked by a short gap in one gene relative to the other, and beyond these points the two sequences are considerably more divergent. In particular, the segments between the 7-bp gap in IVS2 and the repeated sequence marking the ends of the duplication units contain 20 variant nucleotides representing 7.2% divergence.
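The divergence figures quoted above follow from simple ratios, and a short sketch makes the arithmetic explicit. The helper names `percent_divergence` and `implied_length` are illustrative assumptions. The segment length implied by "20 variant nucleotides representing 7.2% divergence" is back-calculated here and is not stated in the text.

```python
def percent_divergence(differences: int, length_bp: int) -> float:
    """Percentage of positions that differ between two aligned segments."""
    return 100.0 * differences / length_bp


def implied_length(differences: int, percent: float) -> float:
    """Segment length (bp) implied by a difference count and a percent divergence."""
    return 100.0 * differences / percent


# Long homology block: 3 single base differences in 1436 bp.
print(round(percent_divergence(3, 1436), 1))   # 0.2

# 3'-untranslated regions: 19 differences in 113 nucleotides.
print(round(percent_divergence(19, 113), 1))   # 16.8, reported as 17%

# 3' segment: 20 variant nucleotides at 7.2% divergence (length is inferred).
print(round(implied_length(20, 7.2)))          # 278
```

On these numbers the 3' segment diverges roughly 35-fold more than the long block (7.2% against 0.2%), which is the contrast the segmental model is meant to explain.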

A Recombinant α-Globin Gene Is a Mosaic of Normal α1 and α2 Sequences-The extensive sequence homology in and around the human α-globin genes represents a large target for homologous recombination. Crossing over in the Z-homologies of unequally paired α1 and α2 genes generates one chromosome with three α-genes and a second chromosome with one α-gene (27-31). The latter recombinant allele is referred to as the rightward deletion α-thalassemia-2 (α-thal-2) gene. Given the α1- and α2-specific sequence markers described above, we reasoned that the fine structure of such an α-thal-2 allele might reveal some of the molecular mechanisms underlying the recombination event which led to its creation. Furthermore, since homologous but unequal recombination is believed to have occurred during the evolution of the normal α-loci (3, 18), the structure of the α-thal-2 gene might provide insight into the origin of the contemporary α-complex.

² Hess, J. F., Fox, M., Schmid, C., and Shen, C. J. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 5970-5974.

[Fig. 2 alignment: nucleotide sequences of α1 (upper line) and α2 (lower line); the aligned sequence is not legible in this transcript.]

FIG. 2. Sequence comparison of the Z-homology of the α1- and α2-globin genes. The complete DNA sequence of a normal α1-globin gene, from nucleotide 907 upstream of the cap site (cap) to nucleotide 55 downstream of the polyadenylation site (poly A), is shown in the upper line. The lower line represents the corresponding sequence of an α2 gene with only the variant nucleotides indicated; a dash signifies identity between α1 and α2. Asterisks designate gaps.

We therefore cloned a rightward deletion α-thal-2 gene and sequenced the recombinant in the vicinity of all the α1 and α2 markers. The results of this analysis are summarized in Fig. 3. Rather than having a discretely polar arrangement of α1 and α2 markers, the α-thal-2 gene is an unexpected mosaic of normal α-sequences. The α-thal-2 gene contains the AC dinucleotide found only in the α2 sequence at position -869/-870, as well as an adjacent α2-specific Y block,² indicating that the most distal 5'-flanking DNA is derived from α2. The more proximal upstream sequences are alternately α1- and α2-specific since position -733 of the α-thal-2 gene corresponds to the α1 nucleotide and position -634 is equivalent to that of α2 (Fig. 3). In contrast to this patchwork organization of the 5'-flanking region, the 3'-end of the α-thal-2 allele is identical with a normal α1 gene. That is, both the α-thal-2 IVS2 and 3'-untranslated region share homology with α1 only. No additional sequence variations from normal α-sequences were present in the portions of the α-thal-2 gene that were examined.
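The mosaic arrangement can be read off the marker assignments by collapsing adjacent markers of the same parental origin into blocks. The sketch below is illustrative only: the marker list restates the assignments given in the paragraph above, while the names `MARKERS`, `parental_blocks`, and `arrangement` and the labels `alpha1`/`alpha2` are assumptions introduced here.

```python
from itertools import groupby

# Parental origin of each marker in the alpha-thal-2 gene, in 5'->3' order,
# as reported in the text.
MARKERS = [
    ("AC dinucleotide at -869/-870", "alpha2"),
    ("position -733", "alpha1"),
    ("position -634", "alpha2"),
    ("IVS2", "alpha1"),
    ("3'-untranslated region", "alpha1"),
]


def parental_blocks(markers):
    """Collapse consecutive markers of the same origin into a single block."""
    origins = [origin for _, origin in markers]
    return [origin for origin, _ in groupby(origins)]


def arrangement(markers):
    """Format the block order as a 5'->3' string."""
    return "5'-" + "-".join(parental_blocks(markers)) + "-3'"


blocks = parental_blocks(MARKERS)
print(arrangement(MARKERS))  # 5'-alpha2-alpha1-alpha2-alpha1-3'
print(len(blocks) - 1)       # 3 junctions between parental blocks
```

A simple crossover with no further processing would leave a single junction; three junctions are what make the structure a mosaic rather than a polar arrangement.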

Independent α-Thal-2 Genes Have α1-specific IVS2 and 3'-Untranslated Sequences-Sequence analysis of a single cloned α-thal-2 gene revealed that its 3'-end is identical with a normal α1 gene. Since it is of interest to determine whether additional independent α-thal-2 genes have a common IVS2 and 3'-untranslated region structure, we developed a rapid screening procedure for examining this possibility. The strategy employed is based on the placement of ApaI restriction sites both within and surrounding the normal α-globin genes. As indicated in Fig. 4A, ApaI cleaves a short distance upstream of α2 and downstream of α1, once between α2 and α1, once within the 3'-untranslated region of α2 only, and once within IVS2 of α1 only. These sites were identified by inspection of the available DNA sequences of these regions² (Fig. 2) and were confirmed by Southern blotting (Fig. 4B). Thus, ApaI recognizes multiple sites of sequence divergence between α1 and α2.

FIG. 2 (continued). The gaps lie in one sequence relative to the other and have been introduced to maximize sequence homology. The alignment of the most distal 5'-flanking sequences is confirmed by the additional data of Shen and co-workers.² The sequences shown here encompass the entire Z-homology which was defined by heteroduplex analysis (3) and extend into the Y-homology of α2 and the nonhomologous segment between Y and Z of α1 (Fig. 7). The translational initiation and termination codons are underlined and the intervening sequences are in lower case letters. The 5'-flanking sequences were determined in the present study, while the remaining sequences were compiled from previous work (7, 11-13) and are included here for comparative purposes (see text for details). We note one correction of our previously published α1 sequence (11): nucleotide 110 of IVS2 should be G, not A. In addition, our α2 sequence differs from that of Liebhaber et al. (12, 13) at three IVS2 positions in the 45 bp closest to the acceptor splice junction. Our assignments for α2 in this region are based on analysis of the α2 gene which resides on the same chromosome as our sequenced α1 gene (3) and on an α2 gene from a patient with α-thalassemia (42).

[Fig. 3 diagram: α-globin gene map (5'-UT, E1, I1, E2, I2, E3, 3'-UT) with the variant nucleotides of α1, α2, and α-thal-2 listed below it; the marker table is not legible in this transcript.]

FIG. 3. Mosaic sequence of a rightward deletion α-thal-2 gene. At the top of the diagram is a typical α-globin gene with its various structural features indicated: E, exon; I, intron; 5'-UT, 5'-untranslated region; 3'-UT, 3'-untranslated region. The boxed arrows below the gene represent the long (ZLC) and short (ZSC) conversion units of α1 and α2, defined in the text as adjacent regions of uninterrupted sequence homology flanked by nonhomologous segments. Each of the variant nucleotides between α1 and α2 is shown in the lower part of the diagram. The negative and positive numbers indicate the distance in base pairs upstream and downstream from the cap site, respectively. Asterisks designate sequence gaps, as in Fig. 2. The 3'-untranslated region divergence is not shown here in detail. IVS2 positions 569-579 are included to emphasize the 7-bp gap in α2 relative to α1 and the presence of a short direct repeat which may have mediated the deletion of these 7 bp from α2. The nucleotide assignment of each α1- and α2-specific marker is given in the lower line for the α-thal-2 gene. The arrangement of parental sequences in the α-thal-2 gene is 5'-α2-α1-α2-α1-3'. The α-thal-2 sequence upstream of position -869 is entirely α2-specific.

FIG. 4. Mapping of ApaI sites in normal and α-thal-2 DNAs. A, the locations of ApaI restriction sites in and around the normal α-globin genes are illustrated. All of these sites were identified in the available DNA sequences of these regions (Fig. 2),² except the site 3' to α1. The latter site, as well as those in the vicinity of the α-thal-2 variant, were deduced from the Southern blot illustrated in B. The sizes of the restriction fragments are given in kilobases. B, blot hybridization of ApaI-digested phage clones containing duplicated α-globin genes and an α-thal-2 allele.

Given this ApaI restriction map of the normal α-globin genes and the sequence of the cloned α-thal-2 gene, we anticipated that the latter would contain the normal 0.89-kb fragment diagnostic of the 3'-end of α1, plus a new 2.5-kb fragment resulting from fusion of the upstream α2 sequences to the Z-homology of α1 (see "Discussion"). At the same time, the intergenic ApaI fragments should be deleted from the α-thal-2 gene. These predictions are substantiated by the blot shown in Fig. 4B. Alternative α-thal-2 3'-ends can also be identified by this approach. For example, 2.7- and 0.69-kb fragments would replace the 2.5- and 0.89-kb bands if the α-thal-2 IVS2 and 3'-untranslated region were derived from an α2 rather than from an α1 gene.
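The screening logic in the preceding paragraph amounts to matching observed ApaI bands against two expected fragment pairs. The following sketch encodes only the fragment sizes stated in the text. The function name `classify_three_prime_end`, the `alpha1`/`alpha2` labels, and the 0.05-kb matching tolerance are assumptions made for illustration and do not describe the authors' procedure.

```python
# Expected ApaI fragment pairs (kb) for an alpha-thal-2 gene, keyed by the
# parental origin of its IVS2 and 3'-untranslated region (sizes from the text).
EXPECTED_BANDS_KB = {
    "alpha1": (2.5, 0.89),
    "alpha2": (2.7, 0.69),
}


def classify_three_prime_end(bands_kb, tolerance_kb=0.05):
    """Return 'alpha1' or 'alpha2' if exactly one expected pair is fully present.

    bands_kb:     observed fragment sizes in kilobases.
    tolerance_kb: hypothetical allowance for gel sizing error.
    Returns None when neither pair, or both pairs, are found.
    """
    def present(size):
        return any(abs(band - size) <= tolerance_kb for band in bands_kb)

    matches = [
        origin
        for origin, pair in EXPECTED_BANDS_KB.items()
        if all(present(size) for size in pair)
    ]
    return matches[0] if len(matches) == 1 else None


print(classify_three_prime_end([2.5, 0.89]))  # alpha1
print(classify_three_prime_end([2.7, 0.69]))  # alpha2
print(classify_three_prime_end([1.7]))        # None
```

Requiring both fragments of a pair guards against calling a 3'-end from a single band, such as the 2.7-kb fragment, that is also characteristic of the normal genes.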

Having established a simple method for determining the 3'-structure of an α-thal-2 gene, we next screened a panel of human DNAs isolated from individuals of different ethnic groups, each of whom carries such a variant allele. These subjects all have the genotype α^3.7-/--, that is, they each have a single structural α-locus on one chromosome (the rightward deletion α-thal-2 gene), with both α-loci deleted from its homologue. As shown in Fig. 5, 16 unrelated individuals representing 8 different ethnic groups all have the 2.5- and 0.89-kb ApaI α-globin fragments. Thus, they share the α-thal-2 gene structure that was previously identified by sequence analysis of an α-thal-2 allele cloned from a Chinese patient.

The Middle tu-Gene of the Triplicated n-Complex Has n2- spccific IVS2 and 9'-CJntranslated Sequences-The middle n- globin gene of the triplicated tu-complex and the tu-thal-2 allele should have precisely reciprocal structures, a conse- quence of unequal crossing over between parental t u 1 and tu2 genes. Rased on the above data for a series of n-thal-2 var- iants, the middle tr-gene should have an tul-specific distal 5 ' - flanking region and tu2-specific IVS2 and :3'-untranslated

duplicated ( 3 5 ) or (1-thal-2 glohin genes were digested with ApaI and subjected t o Southern hlot analysis using the wl- ( i i ) and cr2- (iii) specific P s t I fragments shown in A as hyhridization prohes. The c r l prohe hybridizes weakly with the 1.9-kh ApnI fragment derived from the :<'-end o f c 1 2 due to the sequence divergence hetween the two t r -

genes in this region. Similarly, the cr2 prohe gives a weak signal with the O.89-kh fragment from the :{'-end of t r l . ( i ) , ethidium hromide- stained gel of the Apol-digested tr-glohin phage clones. The k f f Ianc, is a Hind111 digest o f Xc1857 Sam7 DNA with the sizes of the marker fragments shown in kilohases.

by guest on January 25, 2019http://w

ww

.jbc.org/D

ownloaded from

Human 0-Globin Gene Conversion Units 15249

A 1 2 3 4 5 B 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3

~- ~- " ~~

2.7 - 2 5 -

1.7 -

089-

FIG. 5. Blot hybridization analysis of a series of a-thal-2 alleles using ApaI. DSAs from an individual with normal tv-globin genes as well as from a panel of patients of different ethnic origins each 01 whom carries an ti-thal-2 allele on one chromosome and no tu-genes on the other chromosome (genotype, tr"'-/--) were digested with Apal and suhjected to Southern hlot analysis using an trl-specific 1.5-kb P s t l restriction fragment as the hybridization probe. A : lane 1. normal; lanes 2 and .5, American Black; lane 3, Filipino; lane 3 . Czechoslovakian. H: Lanc 1. normal; lanes 2, 3 , 3. 5, Southeast Asian; lanes 6. 7, 8, C-ypriot; lane 9, Saudi Arabian; lanes IO. 11, 12. Jamaican; lane I.'], Sardinian. The sizes of the tu-specific restriction fragments are indicated at the left in kilobases.

[Figure 6, panel B: lanes 1-5; fragment sizes marked at 2.7, 2.5, 1.9, and 1.7 kb.]

FIG. 6. Mapping of ApaI sites in the triplicated α-globin complex. A, the placement of ApaI restriction sites in and around the three α-globin genes from a triplicated α-complex is illustrated, assuming that the middle α-gene has an α1-specific distal 5'-flanking region and α2-specific IVS2 and 3'-untranslated sequences. The middle gene is designated α2 to indicate the origin of its distinctive structural gene sequences. The sizes of the restriction fragments are given in kilobases. B, blot hybridization analysis of DNAs containing triplicated α-globin genes using ApaI. DNAs isolated from individuals with normal, α-thal-2, and triplicated α-globin genes were digested with ApaI and subjected to Southern blot analysis using the 1.0-kb α1-specific PstI-HindIII fragment shown in A as the hybridization probe. Lane 1, normal; lane 2, Southeast Asian α-thal-2; lanes 3, 4, Jamaican (genotype ααα/αα); lane 5, Jamaican (genotype ααα/α-). The presence of the 1.9-kb band in each of the triplicated α-complexes substantiates the restriction map shown in A.

sequences. As a result, the ApaI fragment encompassing the middle α-gene should be 1.9 kb long (Fig. 6A). To test this possibility, we used a variation of the previously described ApaI blotting strategy in which the 1.5-kb PstI-PstI α-gene probe (Fig. 4A) was replaced with a 1.0-kb probe that extends from a PstI site in the 5'-flanking region of α1 to an intragenic HindIII site (Fig. 6A). When a Southern blot of ApaI-digested DNA containing the triplicated α-gene chromosome was hybridized with this probe, a 1.9-kb band was detected, in addition to the 2.7- and 1.7-kb fragments characteristic of the normal α-genes (Fig. 6B). Since the probe lacks 3' α-sequences, the new 1.9-kb ApaI fragment must extend from an α1-specific 5'-end of the middle α-gene to its α2-specific IVS2 and 3'-untranslated region; it cannot correspond to the similarly sized fragments from the 3'-ends of both the 5' and middle α-genes (Fig. 6A). Of the three individuals with triplicated α-complexes who were examined, two have the ααα/αα genotype (Fig. 6B, lanes 3 and 4), while the third must be ααα/α- since an additional 2.5-kb band diagnostic of an α-thal-2 gene is present (Fig. 6B, lane 5). In summary, the middle α-gene on a chromosome bearing three α-loci has an IVS2 and 3'-noncoding region that are exactly reciprocal in sequence to those of the α-thal-2 allele.
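The band patterns reported for the 1.0-kb PstI-HindIII probe can be summarized as a small lookup, sketched here in Python. The band sizes are those given in the text (2.7 and 1.7 kb for the normal genes, 1.9 kb for the middle gene of the triplicated complex, 2.5 kb for α-thal-2). The names `BANDS_BY_CHROMOSOME` and `expected_bands`, the ASCII genotype strings, and the treatment of a lane as the simple union of the bands from its two chromosomes are illustrative assumptions, not part of the original analysis.

```python
# Expected ApaI bands (kb) seen with the 1.0-kb probe, per chromosome type.
# Sizes come from the text; all names here are hypothetical, and "a" stands
# in for the alpha-globin gene symbol.
BANDS_BY_CHROMOSOME = {
    "aa": {2.7, 1.7},        # normal duplicated genes
    "aaa": {2.7, 1.9, 1.7},  # triplicated complex; middle gene gives 1.9 kb
    "a-": {2.5},             # alpha-thal-2 chromosome (single gene)
    "--": set(),             # no alpha-genes
}


def expected_bands(genotype):
    """Return the union of bands for a genotype written like 'aaa/aa'."""
    left, right = genotype.split("/")
    return BANDS_BY_CHROMOSOME[left] | BANDS_BY_CHROMOSOME[right]


# Lanes 3 and 4 of Fig. 6B versus lane 5.
lanes_3_4 = expected_bands("aaa/aa")
lane_5 = expected_bands("aaa/a-")
print(sorted(lanes_3_4, reverse=True))
print(sorted(lane_5, reverse=True))
```

Under these assumptions the two genotypes differ only by the 2.5-kb band, which is the feature the text uses to assign the third individual. Band presence alone does not distinguish every genotype (for example, ααα/αα and ααα/ααα give the same set), so this sketch is not a genotyping procedure.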

DISCUSSION

Boundaries of Sequence Homology between α1 and α2 Genes-We have demonstrated that almost 900 bp of the proximal 5'-flanking regions of the human α-globin genes are highly homologous, whereas the sequences immediately upstream of nucleotide -869 diverge significantly (Fig. 2). This information, combined with the sequences of the structural genes themselves (11-13), permits us to define regions of segmental homology in and around the α1- and α2-globin genes. Since these homologies encompass sequences which are under no apparent constraint from divergence, they are presumed to result from specific rectification mechanisms which effect concerted evolution (17-24). These include gene conversion, unequal crossing over, or a combination of these two processes.

Three segments, each having a different degree of uninterrupted sequence homology, can be distinguished in the homologous region previously identified by electron microscopic heteroduplex analysis and referred to as Z. By uninterrupted, we mean maximal homology alignment without the introduction of gaps. The first segment extends for 1436 bp from a dinucleotide gap at position -869 to a 7-bp gap at nucleotide 115 of IVS2 and contains 3 single nucleotide mismatches (0.2% divergence). The second spans the 229 bp between the IVS2 gap and a single base length difference at nucleotide 76 of the 3'-untranslated regions, while the third continues for an additional 53 bp from the latter position to the presumptive ends of the α-duplication units (7). For simplicity of further discussion, we will consider the second and third blocks as one continuous stretch of 283 bp containing 20 single nucleotide differences (7.1% divergence). However, the interpretation which follows can be simply modified to accommodate the additional complexity.
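The two divergence figures quoted above follow directly from the mismatch counts and segment lengths. A minimal Python check is given below; the function name `percent_divergence` is illustrative, and divergence is taken here simply as single nucleotide differences per 100 aligned positions, which is an assumption about how the percentages were derived.

```python
def percent_divergence(mismatches, length_bp):
    """Single nucleotide differences per 100 aligned positions."""
    return 100.0 * mismatches / length_bp


# Segment 5' to the IVS2 gap: 3 mismatches in 1436 bp.
z_long = percent_divergence(3, 1436)
# Combined segments 3' to the IVS2 gap: 20 differences in 283 bp.
z_short = percent_divergence(20, 283)

print(f"{z_long:.1f}% vs {z_short:.1f}%")  # prints "0.2% vs 7.1%"
```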

The most striking features of the two homologous segments defined above are their differing extents of sequence homology (>99% as opposed to 93% for the regions 5' and 3' to IVS2, respectively) and their sharp demarcation by flanking non-homologies. If gene conversion and unequal crossing over serve to homogenize these sequences during the course of evolution, then these processes must be acting in a differential manner on the two adjacent regions. We therefore propose that there are two contiguous but independent conversion units encompassing the human α-globin loci, the boundaries of which are the dinucleotide gap at position -869 of the 5'-flanking regions, the 7-bp gap in IVS2, and the direct repeats at the 3'-ends of the α-duplication units. In keeping with the established nomenclature for the homology blocks of the α-genes, we refer to these as ZLC and ZSC for the long and short conversion units on the 5' and 3' sides of the IVS2 gap, respectively (Figs. 3 and 7). A specific model for the independent concerted evolution of ZLC and ZSC is outlined below, the details of which derive in part from the structure of a variant α-globin allele that is a product of unequal recombination between normal α1 and α2 genes.

FIG. 7. Boundaries of the α-thal-2 deletion. The duplicated normal α-globin genes, including their X, Y, ZLC, and ZSC homologies, are shown in the top part of the diagram. The lower portion indicates the structure of the α-thal-2 gene that is generated by unequal crossing over within the ZLC units of α2 and α1 (segments A and C, respectively). The resulting deletion, which is 3.7 kb long, includes either A + B or B + C. The maximum 3'-extent of the deletion is the 7-bp gap in IVS2 of α2 (designated by the "∨" in α2 and the "∧" in α1), while the maximum 5'-extent of the crossover is the dinucleotide gap at position -869 of α1. Thus, the α-thal-2 gene is α1-specific 3' to IVS2 and is α2-specific 5' to ZLC.

[Figure 8: diagram panels I-IV; labels: displacement, uptake, loop cleavage, assimilation, isomerization; branch migration.]

FIG. 8. Inhibition of branch migration by a sequence non-homology. Recombination between unequally paired α1 (QR) and α2 (Q'R') genes (I) is presumed to occur initially by asymmetric strand exchange followed by symmetric heteroduplex formation (II; Ref. 30). The resulting branch or joint between the interacting DNA molecules is free to migrate through regions of sequence homology. The arrow in II indicates the direction of net movement. However, upon reaching a nonhomologous segment such as the 7-bp gap in IVS2 of α2 (indicated by the loop in α1), further movement of the branch is inhibited (III) and resolution of the recombination intermediate produces the structures depicted in IV. Q'R corresponds to the α-thal-2 gene and QR' corresponds to the middle α-gene of the triplicated chromosome. The non-homology at the 5'-ends of ZLC would serve a similar function but has been omitted here for clarity. Although reciprocal exchange of the flanking arms is depicted, a similar model can be constructed in which only patches are exchanged with maintenance of the parental configuration of the flanking sequences. The latter model would account for segmental gene conversion events which are limited in extent by sequence non-homologies and which do not involve expansion and contraction of gene number. Such a mechanism is applicable to the human γ-globin genes (23) and may also occur in the human α-globin gene complex.

Molecular Basis for the Mosaic Structure of an α-Thal-2 Gene-The cloned α-thal-2 allele is composed of a mosaic of parental α1 and α2 sequences (Fig. 3). The 5'-end of this recombinant gene is alternately α2-, α1-, and α2-specific, while its 3'-end (including IVS2 and the 3'-untranslated region) is derived entirely from α1. The α1-specific identity of the 3'-noncoding sequence of the α-thal-2 gene was anticipated from earlier studies of α-globin mRNA obtained from individuals carrying this allele (14, 15). However, the present results establish that the maximum 3'-extent of the crossover which led to the creation of the α-thal-2 gene is the 7-bp gap in IVS2 of α2, as depicted by the hatched box in Fig. 7. It should be noted that this site corresponds to our assignment of the 3'-border of the long conversion unit, ZLC, and that multiple α-thal-2 alleles from individuals of different ethnic origins have the same sequence in this region. Based on the size of the α-thal-2 deletion (27-29), its 5'-boundary must lie 3.7 kb (that is, the intergenic distance) upstream in the homologous position of α2. The deletion breakpoints cannot be defined more precisely due to the absence of additional α1- and α2-specific sequence markers within IVS1 and the first two exons. Thus, the deletion spans regions A + B or B + C of Fig. 7, where A and C are the ZLC homologies of α2 and α1, respectively. In either case, the α-thal-2 gene represents a fusion of the ZLC portions of α1 and α2, although not necessarily a structural gene fusion (14).

Although three separate exchanges within ZLC could account for the observed mosaic sequence arrangement of the cloned α-thal-2 gene, we propose that this novel structure was generated by a single recombination event between unequally paired α1 and α2 genes, a process which leads to the deletion of the intergenic region that was excluded from the original pairing (27-29). The molecular details of this proposal can be considered separately for the 5'- and 3'-ends of the α-thal-2 gene.

Fig. 8 is an adaptation of the Meselson-Radding model of genetic recombination (25) which incorporates the concept of a barrier to branch migration as a mechanism to limit the extent of hybrid DNA formation; this in turn restricts marker exchange. This barrier or obligatory recombinational boundary is the non-homology at the 3'-end of IVS2 and consists of a 7-bp insertion/deletion difference between α1 and α2 (Figs. 2 and 3). As noted earlier, this region defines the maximum 3'-extent of the α-thal-2 deletion, as well as the 3'-boundary of ZLC. In Fig. 8, QR and Q'R' represent homologously but unequally paired α1 and α2 genes, respectively, with the IVS2 non-homology denoted by the loop in α1. Recombination is presumed to occur by homologous strand displacement and uptake, loop cleavage, strand assimilation and isomerization with reciprocal exchange of the flanking arms (Fig. 8II; Ref. 25). The resulting heteroduplex joint is then free to migrate bidirectionally through regions of sequence homology (42, 43). However, this symmetric exchange cannot penetrate the IVS2 non-homology, resulting in the termination of branch migration at or 5' to the point marked T (Fig. 8III), and the resolution of the recombination intermediate into the reciprocal products, QR' and Q'R (Fig. 8IV). The α-thal-2 gene corresponds to Q'R and thus must include the additional 7 bp characteristic of α1's IVS2, as well as the α1-specific sequences 3' to this point. This prediction is satisfied by the structures of the α-thal-2 genes that we have examined. Furthermore, this model requires the reciprocal crossover product to have an IVS2 and 3'-untranslated sequence characteristic of a parental α2 gene. This is indeed the case, as demonstrated by the ApaI mapping analysis of the triplicated α-globin genes (Fig. 6).
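The reciprocal products of this model can be followed with a little bookkeeping, sketched below in Python. Each gene is reduced to a pair recording the parental origin of its 5' portion and of its 3' portion (IVS2 and the 3'-untranslated region). This two-part representation, the function name `unequal_crossover`, and the ASCII labels "a1" and "a2" are simplifying assumptions made for illustration; the sketch ignores the mosaic detail within the 5'-flanking region.

```python
from typing import List, Tuple

# A gene is (origin of 5' portion, origin of 3' portion).
Gene = Tuple[str, str]


def unequal_crossover(chrom_a: List[Gene], i: int,
                      chrom_b: List[Gene], j: int
                      ) -> Tuple[List[Gene], List[Gene]]:
    """Reciprocal exchange within gene i of chrom_a paired with gene j of chrom_b.

    Each product carries one fusion gene whose 5' portion comes from one
    parent and whose 3' portion comes from the other.
    """
    fusion_ab: Gene = (chrom_a[i][0], chrom_b[j][1])
    fusion_ba: Gene = (chrom_b[j][0], chrom_a[i][1])
    product_1 = chrom_a[:i] + [fusion_ab] + chrom_b[j + 1:]
    product_2 = chrom_b[:j] + [fusion_ba] + chrom_a[i + 1:]
    return product_1, product_2


# Normal chromosome, 5'-to-3' gene order: a2 then a1.
normal: List[Gene] = [("a2", "a2"), ("a1", "a1")]

# Pair the a2 gene of one chromosome with the a1 gene of the other.
single, triple = unequal_crossover(normal, 0, normal, 1)
print(single)  # [('a2', 'a1')]
print(triple)  # [('a2', 'a2'), ('a1', 'a2'), ('a1', 'a1')]
```

In this sketch the single-gene product has an α2-derived 5' portion and an α1-derived 3' portion (Q'R, the α-thal-2 gene), while the middle gene of the three-gene product has the reciprocal arrangement (QR'), in line with the ApaI mapping result.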

The dinucleotide gap at the 5'-end of ZLC could serve a similar function as that elaborated for the IVS2 non-homology. As predicted by our model, the cloned α-thal-2 gene contains these two nucleotides and the more distal α2-specific sequences. This finding reinforces our earlier definition of the 5'-boundary of ZLC.

Although the above model is presented as a reciprocal exchange of the flanking DNA sequences with the resulting expansion and contraction of gene number, the process could also take place as a patch exchange in which the parental configuration of the flanking arms is maintained. Gene conversion could occur by this type of interaction between α1 and α2 sequences (see below). Alternatively, the recombination may take place intrachromosomally with concomitant intergenic deletion but without the production of the reciprocal duplication (29). The actual occurrence of interchromosomal or interchromatid unequal crossing over is inferred from the existence of chromosomes bearing either one or three α-genes in the human population (27, 31). However, the more frequent occurrence of intrachromosomal recombination could combine with selective pressures to produce the observed predominance of the α-thal-2 chromosome over the triplicated state (44).

The unique feature and major proposal of our molecular model is that a non-homology is capable of blocking branch migration. Evidence has been obtained that this may indeed occur in E. coli (45-48) and in certain fungi (49). Whether such a process also occurs in mammalian cells remains to be determined.

[Figure 9: diagram of paired 5'-flanking strands; labels: strand exchange, mismatch repair; α1, α2.]

FIG. 9. Mismatch repair in a heteroduplex between the 5'-flanking sequences of α1 and α2. α1- and α2-globin genes are shown unequally paired over their homologous 5'-flanking regions. Only the variant nucleotides at positions -733 and -634 are included. Homologous strand exchange results in a heteroduplex containing two single nucleotide mismatches. Mismatch repair creates an α1-specific base pair at -733 and an α2-specific base pair at -634 of the α-thal-2 gene product.

If the heteroduplex which formed as a recombination intermediate between paired α1 and α2 genes (Fig. 8II) extended into the 5'-flanking regions, then it would include two single nucleotide mismatches, one at position -634 and the other at position -733 (Fig. 9). Tracts of hybrid DNA can indeed be sufficiently long to encompass the entire ZLC homology (50-52). Repair enzymes could then act at these sites to correct the mispairing using one of the two parental strands as template (51-55). Because one of the two nucleotides is α1-specific and the other is α2-specific in the α-thal-2 product, both of the heteroduplex strands must have been selected as the repair template, one in each of the adjacent segments. These two sites are separated by only 100 bp and therefore could be included in a single excision tract produced during mismatch repair (53, 54). Thus, either excision tracts on each of the complementary strands must have terminated within the 100 bp separating the two mismatches or a more site-directed mechanism was operative. Whatever the details of its origin, the 5'-flanking region of the α-thal-2 gene reflects the two genetic consequences of gene conversion: sequence homogenization and the generation of a new nucleotide sequence (56).
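The argument that both heteroduplex strands must have served as repair template can be illustrated by enumerating the possible outcomes at the two mismatch sites. The Python sketch below assumes, purely for illustration, that each site is corrected independently to either the α1 or the α2 sequence; the names `SITES`, `TEMPLATES` and `is_mosaic` are hypothetical, and the actual bases at the two positions are not modeled.

```python
from itertools import product

SITES = (-733, -634)      # mismatch positions in the 5'-flanking heteroduplex
TEMPLATES = ("a1", "a2")  # parental strand used as the repair template

# Enumerate every combination of template choice at the two sites.
outcomes = [dict(zip(SITES, choice))
            for choice in product(TEMPLATES, repeat=len(SITES))]


def is_mosaic(outcome):
    """True when the two sites were corrected to different parental sequences."""
    return len(set(outcome.values())) > 1


# Arrangement reported for the cloned alpha-thal-2 gene (Fig. 9).
observed = {-733: "a1", -634: "a2"}

mosaic = [o for o in outcomes if is_mosaic(o)]
print(len(outcomes), len(mosaic), observed in mosaic)
```

Of the four combinations, only the two mosaic ones require a switch of template strand between the sites, and the observed arrangement is one of them. A single excision tract spanning both sites would give one of the two non-mosaic outcomes.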

A History of the Segmental Concerted Evolution of the α-Globin Genes-The nonallelic human α-globin genes have >99% sequence homology in their proximal 5'-flanking regions and in most of their coding and noncoding segments. This is the hallmark of concerted evolution (18). In contrast, the sequences contained within the duplication units 3' to the previously described non-homology boundary in IVS2 are significantly less conserved. These findings led us to propose that the IVS2 non-homology delimits two independent conversion units, ZLC and ZSC (Figs. 3 and 7). Expanding on the proposal that non-homologies inhibit branch migration during genetic recombination (Fig. 8), we now describe an evolutionary history of the α-globin genes in which segmental strand exchanges account for the observed polarity of sequence conservation within these loci.

A single ancestral α-globin gene (αP, Fig. 10) is presumed to have inserted into staggered chromosomal breaks so as to produce a short direct repeat on either side (7, 57, 58). Subsequent duplication could occur by homologous recombination between the flanking repetitive sequences (59), giving rise to two initially identical, closely linked repeat units (αP.5' and αP.3', Fig. 10). At some time following the duplication of the original α-gene, 7 bp were deleted from IVS2 of the 5' member of the pair, perhaps by slipped mispairing mediated by a short direct repeat (GGCC, Fig. 3) during replication (60, 61); insertion of the 7-bp sequence in the 3' gene cannot be excluded, however. These duplicated genes are the immediate ancestors of the contemporary α1 and α2 sequences.

[Figure 10: diagram of chromosomes A and B through successive crossovers; labels: deletion, crossover, selection and fixation.]

FIG. 10. Concerted evolution of the human α-globin genes by segmental recombination. A single ancestral α-globin gene (αP) undergoes duplication to produce two initially identical precursors to the contemporary α-globin loci (αP.5' and αP.3', which evolve to α2 and α1, respectively). A deletion of 7 bp (indicated by the "∨" in α2 and the "∧" in α1) then occurs in IVS2 of αP.5', which corresponds to the IVS2 gap in the current α2 gene. This sequence non-homology now demarcates two adjacent homologous segments within α2 and α1: the long 5' ZLC unit and the shorter 3' ZSC unit (each drawn with distinct box symbols in α2 and α1). Unequal recombination between the ZLC segments of α2 and α1 matches these sequences during the course of evolution by the combined processes of random sequence shuffling (shown here) and gene conversion (Fig. 9). The IVS2 non-homology prevents recombination events which initiate in ZLC from propagating into ZSC. Similarly, independent events that occur in ZSC cannot include ZLC (not shown). Thus, although both ZLC and ZSC undergo concerted evolution, they do so independently by obligatory segmental recombination mediated by flanking regions of non-homology.

Due to the length of the sequence homology both within and flanking the α-genes, a product of the original amplification, unequal alignments between α1 and α2 genes on the same or different DNA duplexes (that is, intrachromosomal, interchromosomal, or interchromatid pairings) are possible. This is shown in the middle of Fig. 10 as an unequal pairing of an α2 gene on chromosome A and an α1 gene on chromosome B. Homologous recombination can then occur, the probability of which is much greater on the 5'-side of the IVS2 gap than on the 3'-side due to the larger target size of the former region. The reciprocal products of such an unequal crossover are chromosomes A' and B' containing one and three α-genes, respectively (Fig. 10). If a second unequal crossover takes place between the single α-gene on chromosome A' and the middle α-gene on chromosome B', the products are chromosomes A" and B" each containing two α-genes once again. The net result is that through expansion and contraction of gene number mediated by homologous but unequal crossing over, one of the two original α-sequences has become predominant on a single chromosome. This can be seen clearly in Fig. 10 by the juxtaposition of two open or two solid boxes representing α-gene ZLC units on chromosomes A" and B", respectively, having started with a combination of one solid and one open box on each of the ancestral chromosomes, A and B. Although this basic mechanism has been elaborated by others (17-22, 24), the unique feature that we have introduced is the recombinational barrier in IVS2 (Fig. 8). Thus, the IVS2 non-homology isolates the 3'-ends of the genes from the processes that homogenize their 5' sequences. In a similar manner, strand exchanges that initiate within ZSC cannot propagate into ZLC. Repeated rounds of such segmental recombination lead to the independent concerted evolution of the adjacent homology blocks. Since crossovers should occur less frequently in the shorter ZSC segment, these sequences should be less homologous than those of ZLC. This is indeed the case: the two ZLC units diverge by only 0.2% while the two ZSC units diverge by 7.1%.
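The target-size argument can be made concrete with a rough calculation, sketched below in Python. It assumes that a crossover is equally likely to initiate at any base pair within the two conversion units, which is an illustrative simplification and not a claim made in the text; the constant and function names are likewise hypothetical. The lengths are the 1436 bp and 283 bp given earlier for the segments 5' and 3' to the IVS2 gap.

```python
Z_LC_BP = 1436  # long conversion unit, 5' to the IVS2 gap
Z_SC_BP = 283   # short conversion unit, 3' to the IVS2 gap


def crossover_share(target_bp, other_bp):
    """Fraction of crossovers expected in a target segment, assuming a
    uniform per-base-pair initiation probability (illustrative only)."""
    return target_bp / (target_bp + other_bp)


share_lc = crossover_share(Z_LC_BP, Z_SC_BP)
share_sc = crossover_share(Z_SC_BP, Z_LC_BP)
print(f"Z_LC: {share_lc:.2f}, Z_SC: {share_sc:.2f}")  # Z_LC: 0.84, Z_SC: 0.16
```

Under this assumption roughly five of every six exchanges would fall in the long unit, which is qualitatively consistent with the lower divergence observed there.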

Significantly, all of the α1 genes that have been analyzed have the same IVS2 sequence including the 7 bp of interest, while all of the examined α2 genes lack these same 7 bp (11-13, 35, 62). Similarly, the 3'-untranslated regions are unique to α1 and α2 with no interchanges apparent (14, 15). If branch migration proceeded past the IVS2 non-homology, then we would expect to find α1 and α2 genes resembling each other with respect to their 3'-ends. The common 3'-end structure of the multiple α-thal-2 genes that was revealed by the ApaI blot analysis (Fig. 5) also supports the concept of segmental recombination between α1 and α2.

In addition to the homogenizing effects of unequal crossing over, the physical interaction between parental DNA strands in the recombination intermediate can lead to sequence correction by mismatch repair (Fig. 9). This represents a true gene conversion mechanism (24, 25, 63), as manifest in the structure of the cloned α-thal-2 gene. It should be emphasized that although gene conversion can occur independently of unequal crossing over, these two processes may function simultaneously in matching related sequences. Since the relative contribution of each cannot be assessed in the case of the human α-globin genes, we have referred to ZLC and ZSC as conversion units.

It is striking that although the ZSC segments of α1 and α2 are significantly divergent, no nucleotide differences are found in exon 3. If this region belongs to ZSC, then silent substitutions would be expected to accumulate at a rate similar to that of noncoding regions (64). However, sequence comparisons of pseudogenes and their functional counterparts have revealed that codon usage by active genes is non-random (65, 66). Functional constraints against synonymous codon changes may therefore explain the lack of divergence in this short coding block. An alternative possibility that is consistent with our model is that the exon 3 homology is due to a recent conversion which was limited to this coding block.

The noncoding regions 5' to the Z blocks of α1 and α2 contain several sequence discontinuities interspersed with the X and Y homologies (3, 7), the evolution of which is consistent with a segmental correction mechanism. In agreement with the expected correlation between conversion unit length and degree of homology (see above), the relatively short Y blocks are more divergent than either ZSC or ZLC. Also relevant to this point is the finding from statistical analysis of the human γ-globin gene sequences that regions lacking sequence gaps have a reduced number of single nucleotide substitutions (67).

The evolution of the human γ-globin (23, 68) and goat α-globin (69) genes can be traced in a similar manner as that of the human α-loci. In each case, a gap in an intervening sequence forms a boundary between independent conversion units having homologies proportional to their respective lengths. A 20-bp gap is found in IVS2 of the human Aγ-gene relative to its Gγ counterpart, while a 7-bp gap is present in IVS1 of the goat IIα-gene compared to the linked Iα-locus. Additional non-homologies are located at the opposite ends of these presumptive conversion units. Length-dependent segmental recombination during the course of evolution would then produce the patterns of sequence homology evident in these contemporary genes. The closely linked human ζ- and ψζ-globin genes also possess regions of homology flanked by nonhomologous sequences (8).

Rapid drift in the intervening sequences and flanking regions of other gene pairs may similarly accelerate the divergence of their duplication units if the types of mutations and their rates of accumulation have been sufficient to overcome homogenizing influences (41, 61, 70). The frequent occurrence of short insertions and deletions (41, 71), as well as the transposition of repeated elements to intervening or flanking sequences (72, 73), would effectively disrupt gene correction mechanisms according to our proposal. Thus, conversion units, which are initially co-linear with the original duplications, are gradually reduced in size by sequence alterations that interfere with homogenizing recombination events. The human α-globin genes represent one stage of such a process in which considerable flanking homology is preserved along with that of the structural genes. However, the extreme 5'-portions of the α-duplication units have begun to diverge (3, 7). The duplicated goat α-globin genes may have reached a later stage of divergence in that their only conserved regions are those contained within the transcription units (69). An even more advanced stage in which structural gene sequences diverge is typified by the mouse βmaj- and βmin-globin genes (41), the chicken αA- and αD-globin genes (74), and the human δ- and β-globin genes (61). Finally, the formation of pseudogenes in some multigene families may be an extreme consequence of the isolation from sequence rectification mechanisms.

Acknowledgments-We thank Joyce Lauer and Tom Maniatis for providing the normal α-globin gene clones used in our sequence analysis and Douglas Higgs and John Phillips for the genomic DNAs used in the mapping studies. We are grateful to James Shen and co-workers for communicating their work prior to publication and to James Shen, Jack Szostak, Matthew Meselson, and Richard Kolodner for stimulating discussions on the mechanisms of genetic recombination and the evolution of multigene families. The excellent technical assistance of Sabra Goff is greatly appreciated.

1

2 3

4.

5.

6.

- 8.

9.

10.

11. 12.

13.

14. 15.

16.

17.

18.

19.

20. 21. 22. 23.

REFERENCES Deisseroth, A,, Nienhuis, A., Turner, P., Velez, R., Anderson, W.

F., Ruddle, F., Lawrence, J., Creagan, R., and Kucherlapati, R. (1977) Cell 12,205-218

Orkin, S. H. (1978) Proc. Natl. Acad. Sci. U. S. A . 75,5950-5954 Lauer, J., Shen, C.-K. J., and Maniatis, T. (1980) Cell 20, 119-

Gerhard, D. S., Kawasaki, E. S., Bancroft, F. C., and Szabo, P.

Koeffler, H. P., Sparkes, R. S., Stang, H., and Mohandas, T .

Barton, P., Malcolm, S., Murphy, C., and Ferguson-Smith, M. A.

Proudfoot, N. J . , and Maniatis, T. (1980) Cell 21,537-544 Proudfoot, N. J., Gil, A., and Maniatis, T. (1982) Cell 31, 553-

563 Hollan, S . R., Szelenyi, J. G., Brimhall, B., Duerst, M., Jones, R.

T., Koler, R. D.. and Stocklen, Z. (1972) Nature 235, 47-50 Foldi, J., Cohen-Solal, M., Valentin, C., Blouquit, Y., Hollan, S.

R., and Rosa, J. (1980) Eur. J . Biochem. 109, 463-470 Michelson, A. M., and Orkin, S. H. (1980) Cell 22, 371-377 Liebhaber, S. A., Goossens, M. J., and Kan, Y. W. (1980) Proc.

Liebhaher, S. A., Goossens, M. J., and Kan, Y. W. (1981) Nature

Orkin, S. H., and Goff, S. C. (1981) Cell 24, 345-351 Liehhaber, S. A., and Kan, Y. W. (1981) J. Clin. Inuest. 68,439-

446 Dayhoff, M. 0 . (1975) Atlas of Protein Sequence and Structure,

Vol. 5, National Biomedical Research Foundation, Washington, D. C.

Hood, L., Campbell, tJ. H., and Elgin, S. C. R. (1975) Annu. Reu. Genet. 9, 305-353

Zimmer, E. A., Martin, S. L., Beverley, S. M., Kan, Y. W., and Wilson, A. C. (1980) Proc. Natl. Acad. Sci. ti. S. A. 77, 2158- 2162

Smith, G . P. (1973) Cold Spring Harbor Symp. Quant. Biol. 38, 507-514

Smith, G. P. (1976) Science 191, 528-535 Black, d. A., and Gibson, D. (1974) Nature 250,327-328 Tartoff, K. D. (1975) Annu. Reu. Genet. 9, 355-385 Slightom, J . L., Blechl, A. E., and Smithies, 0. (1980) Cell 21.

130

(1981) Proc. Natl. Acad. Sci. U. S. A. 78, 3755-3759

(1981) Proc. Natl. Acad. Sci. U. S. A . 78, 7015-7018

(1982) J . Mol. Biol. 156, 269-278

Natl. Acad. Sci. U. S. A. 77, 7054-7058

290, 26-29

627-638

24. Baltimore, D. (1981) Cell 24,592-594 25. Meselson, M. S., and Radding, C. M. (1975) Proc. Natl. Acad. Sci.

26. Dressler, D., and Potter, H. (1982) Annu. Reu. Biochem. 51,727- 761

27. Orkin, S. H., Old, J., Lazarus, H., Altay, C., Gurgey, A., Weath- erall, D. J., and Nathan, D. G. (1979) Cell 17,33-42

28. Embury, S., Lebo, R., Dozy, A., and Kan, Y. W . (1979) J. Clin. Inuest. 63,1307-1310

29. Embury, S. H., Miller, J. A,, Dozy, A. M., Kan, Y. W., Chan, V., and Todd, D. (1980) J . Clin. Inuest. 66, 1319-1325

30. Goossens, M., Dozy, A. M., Embury, S. H., Zachariades, Z., Hadjiminas, M. G., Stamatoyannopoulos, G., and Kan, Y. W. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 518-521

31. Higgs, D. R., Old, J. M., Pressley, L., Clegg, J. B., and Weatherall, D. J. (1980) Nature 284, 632-635

32. Lie-Injo, L. E., Herrera, A. R., and Kan, Y. W. (1981) Nucleic Acids Res. 9, 3707-3717

33. Trent, R. J., Higgs, D. R., Clegg, J . B., and Weatherall, D. J. (1981) Br. J . Haematol. 49, 149-152

34. Rimm, D. L., Horness, D., Kucera, J., and Blattner, F. R. (1980) Gene 12,301-309

35. Orkin, S. H., Goff, S. C., and Hechtman, R. (1981) Proc. Natl. Acad. Sci. U. S . A. 78, 5041-5045

36. Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560

U. S. A. 72, 358-361

37. Challberg, M. D., and Englund, P. T. (1980) Methods Enzymol. 65, 39-43

38. Michelson, A. M., and Orkin, S. H. (1982) J. Biol. Chem. 257, 14773-14782

39. Southern, E. M. (1975) J. Mol. Biol. 97, 503-517

40. Rigby, P. J., Dieckman, M., Rhodes, C., and Berg, P. (1977) J. Mol. Biol. 113, 237-251

41. Konkel, D. A., Maizel, J. V., Jr., and Leder, P. (1979) Cell 18, 865-873

42. Meselson, M. S. (1972) J. Mol. Biol. 71, 795-798

43. Sigal, N., and Alberts, B. (1972) J. Mol. Biol. 71, 789-793

44. Weatherall, D. J., and Clegg, J. B. (1982) Cell 29, 7-9

45. Fox, M. S., Dudney, C. S., and Sodergren, E. J. (1979) Cold Spring Harbor Symp. Quant. Biol. 43, 999-1007

46. Sodergren, E. J., and Fox, M. A. (1979) J. Mol. Biol. 130, 357-377

47. Cox, M. M., and Lehman, I. R. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 6018-6022

48. DasGupta, C., and Radding, C. M. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 762-766

49. Hamza, H., Haedens, V., Mekki-Berrada, A., and Rossignol, J.-L. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7648-7651

50. Broker, T. R. (1973) J. Mol. Biol. 81, 1-16

51. White, R. L., and Fox, M. S. (1974) Proc. Natl. Acad. Sci. U. S. A. 71, 1544-1548

52. Wildenberg, J., and Meselson, M. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 2202-2206

53. Wagner, R., and Meselson, M. (1976) Proc. Natl. Acad. Sci. U. S. A. 73, 4135-4139

54. Miller, L. K., Cooke, B. E., and Fried, M. (1976) Proc. Natl. Acad. Sci. U. S. A. 73, 3073-3077

55. Roberts, J. M., and Axel, R. (1982) Cell 29, 109-119

56. Petes, T., and Fink, G. R. (1982) Nature 300, 216-217

57. Jagadeeswaran, P., Forget, B. G., and Weissman, S. M. (1981) Cell 26, 141-142

58. Van Arsdell, S. W., Denison, R. A., Bernstein, L. B., Weiner, A. M., Manser, T., and Gesteland, R. F. (1981) Cell 26, 11-17

59. Jeffreys, A. J., and Harris, S. (1982) Nature 296, 9-10

60. Farabaugh, P. J., Schmeissner, V., Hofer, M., and Miller, J. H. (1978) J. Mol. Biol. 126, 817-863

61. Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C., and Proudfoot, N. J. (1980) Cell 21, 653-668

62. Goossens, M., Lee, K. Y., Liebhaber, S. A., and Kan, Y. W. (1982) Nature 296, 864-865

63. Egel, R. (1981) Nature 290, 191-192

64. Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., and Dodgson, J. (1980) Cell 20, 555-566

65. Miyata, T., and Yasunaga, T. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 450-453



66. Miyata, T., and Hayashida, H. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 5739-5743

67. Smithies, O., Engels, W. R., Devereaux, J. R., Slightom, J. L., and Shen, S. (1981) Cell 26, 345-353

68. Shen, S., Slightom, J. L., and Smithies, O. (1981) Cell 26, 191-203

69. Schon, E. A., Wernke, S. M., and Lingrel, J. B. (1982) J. Biol. Chem. 257, 6825-6835

70. Weaver, S., Comer, M. B., Jahn, C. L., Hutchison, C. A., III, and Edgell, M. H. (1981) Cell 24, 403-411

71. van Ooyen, A., van den Berg, J., Mantei, N., and Weissmann, C. (1979) Science 206, 337-344

72. Page, G. S., Smith, S., and Goodman, H. M. (1981) Nucleic Acids Res. 9, 2087-2104

73. Schon, E. A., Cleary, M. L., Haynes, J. R., and Lingrel, J. B. (1981) Cell 27, 359-369

74. Dodgson, J. B., McCune, K. C., Rusling, D. J., Krust, A., and Engel, J. D. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 5998-6002


Boundaries of gene conversion within the duplicated human alpha-globin genes. Concerted evolution by segmental recombination.

A M Michelson and S H Orkin

J. Biol. Chem. 1983, 258:15245-15254.

Access the most updated version of this article at http://www.jbc.org/content/258/24/15245
