
Proc. Natl. Acad. Sci. USA
Vol. 81, pp. 2650-2654, May 1984
Biochemistry

Structure of the 5' ends of immunoglobulin genes: A novel conserved sequence

(B lymphocyte-specific gene regulation/promoters)

TRISTRAM G. PARSLOW*‡, DEBRA L. BLAIR†, WILLIAM J. MURPHY†, AND DARYL K. GRANNER*

Departments of *Internal Medicine and Biochemistry and the †Diabetes and Endocrinology Research Center, Veterans' Hospital, University of Iowa College of Medicine, Iowa City, IA 52240

Communicated by Leonard A. Herzenberg, December 30, 1983

ABSTRACT Recent investigations have suggested that tissue-specific regulatory factors are required for immunoglobulin gene transcription. Cells of the mouse lymphocytoid pre-B-cell line 70Z/3 contain a constitutively rearranged immunoglobulin κ light chain gene; the nucleotide sequence of this gene exhibits all the known properties of a functionally competent transcription unit. Nevertheless, transcripts derived from this gene are detectable only after exposure of the cells to bacterial lipopolysaccharide, implying that accurate DNA rearrangement is not sufficient to activate expression of the gene. Comparison of the sequence of the 70Z/3 κ light chain gene with those encoding other immunoglobulin heavy and light chains has revealed that a distinctive promoter region structure is characteristic of this multigene family. The sequence A-T-T-T-G-C-A-T lies approximately 70 base pairs upstream from the site of transcriptional initiation in every light chain gene examined; in heavy chain genes, the corresponding location is occupied by the precise inverse (A-T-G-C-A-A-A-T) of this sequence. Although adjacent regions of DNA have diverged extensively in evolution, these octanucleotide sequences are stringently conserved at this location among diverse immunoglobulin genes from at least two mammalian species. The proximity of this conserved octanucleotide block to the site of transcriptional initiation suggests that it may serve as a recognition locus for factors regulating immunoglobulin gene expression in a tissue-specific fashion.

The extraordinary diversity of immunoglobulin molecules is due, in part, to the existence of a large and heterogeneous family of genes encoding variable (V) region domains of the heavy chain and light chain polypeptides. The formation of a complete immunoglobulin transcription unit requires specific DNA rearrangements that fuse a single V gene, along with its 5' flanking sequences, to separate genetic elements coding for the remainder of the polypeptide chain. In the case of the murine κ light chain genes, these rearrangements link the selected V gene to one of four junctional (J) elements located upstream of the κ light chain constant (C) region gene, Cκ (1, 2). Analogous DNA rearrangements are required for assembly of an intact heavy chain gene (3).

Although their nucleotide sequences differ widely, all functional V genes share certain common structural features and are believed to have evolved from a single ancestral sequence. Each comprises two coding regions (exons) separated by a short intervening sequence. The first exon encodes a hydrophobic signal peptide; the second specifies ≈95 amino-terminal residues of the V domain. Each exon contains splice junctions for RNA processing and specifies a few invariant amino acid residues thought to be essential for proper function of the protein (4). Two short-sequence elements found near the 3' ends of unrearranged V genes have been implicated in the mechanism of DNA rearrangement (2). In addition, each V gene harbors at its 5' end a functional promoter (5), which can serve as the site of transcriptional initiation in a fully assembled heavy or light chain gene. The precise sequences required for promoter function in these genes have not yet been elucidated.

We investigated the structure and expression of an immunoglobulin light chain gene in the mouse leukemia cell line 70Z/3. Under ordinary growth conditions, cells of this line constitutively express cytoplasmic μ heavy chains without associated light chain synthesis, a phenotype characteristic of the early stages of B-lymphocyte ontogeny (6-8). Although these cells harbor a single rearranged κ light chain gene (7), they ordinarily contain neither κ light chain protein nor its corresponding mRNA. When grown in the presence of bacterial lipopolysaccharide (LPS), however, 70Z/3 cells accumulate cytoplasmic κ light chain mRNA (8) and begin to synthesize κ light chain protein; after 12 hr of optimal LPS treatment, 70-100% express both heavy and light chain determinants on their surfaces (6). Therefore, the 70Z/3 cell line can serve as a model system for the study of several critical events in early lymphocyte differentiation, including κ light chain gene activation and the induction of surface immunoglobulin expression.

In this report, we present the complete V region and 5' flanking DNA sequences of the rearranged κ light chain gene of 70Z/3 and further characterize the effects of LPS treatment on the expression of RNA transcripts derived from this gene. In addition, we identify a highly conserved octanucleotide sequence at the 5' ends of diverse immunoglobulin V genes and present evidence that a distinctive promoter region structure is characteristic of this multigene family.

MATERIALS AND METHODS

Cell Culture. 70Z/3 cells, a gift from R. P. Perry, were grown in suspension culture as described (8) in the presence or absence of LPS from Salmonella typhosa (10 µg/ml; Difco). LPS treatment had no effect on either the doubling time (12 hr) or the total polyadenylylated RNA content of these cells.

κ Light Chain Gene Isolation and Sequence Determination. DNA was prepared from the nuclei of untreated 70Z/3 cells as described (9) and digested to completion with EcoRI. Digested DNA (70 µg) was layered over a linear 10-40% sucrose gradient and centrifuged for 24 hr at 20°C in an SW 27 rotor. Fractions containing restriction fragments greater than 14 kb in length were pooled and concentrated by ethanol precipitation. The DNA was then ligated into the EcoRI arms of the bacteriophage Charon 4A, and a clone library was prepared by the method of Maniatis et al. (10). This library was screened for chimeric phage harboring sequences homologous to the Cκ region coding sequence by using the plaque hybridization technique of Benton and Davis (11); a total of 14 κ light chain-bearing plaques were identified among the 68,000 plaques tested. Phage containing the rearranged Cκ allele were identified by restriction mapping of purified phage DNA. Portions of the cloned rearranged gene were subcloned into the plasmid pBR322 and subjected to nucleotide sequence analysis by the method of Maxam and Gilbert (12).

RNA Isolation. For isolation of cytoplasmic RNA, 10⁹ washed cells were lysed by gentle stirring for 5 min in 15 ml of ice-cold 10 mM Tris, pH 7.5/4 mM MgCl2/100 mM KCl/0.05% Triton X-100. After centrifugation for 5 min at 3,000 rpm in an HB-4 rotor at 4°C, the supernatant was adjusted to 20 mM EDTA and 2% sodium dodecyl sulfate. The nuclear pellet was washed in 15 ml of ice-cold 10 mM sodium acetate, pH 5.0/50 mM NaCl and centrifuged as before. Supernatants were combined with those from the previous step and then extracted twice against equal volumes of phenol/chloroform, 1:1 (vol/vol), and twice against absolute ether. The aqueous phases were adjusted to 0.5 M ammonium acetate/10 mM magnesium acetate/1 mM EDTA, and RNA was precipitated by addition of 2.5 vol of ethanol. The RNA samples were then incubated at 37°C for 1 hr with proteinase K (10 µg/ml) in 10 mM Tris, pH 7.5/0.5% sodium dodecyl sulfate, extracted twice against chloroform, and precipitated from ethanol as before.

Nuclear RNA was isolated from the nuclear pellets described above by cesium density centrifugation in the presence of guanidinium thiocyanate (13). The purified RNA was enriched for polyadenylylated sequences by two successive passes over an oligodeoxythymidylate affinity column.

Abbreviations: LPS, bacterial lipopolysaccharide; bp, base pairs; kb, kilobases or kilobase pairs; V, variable; J, joining; C, constant.
‡Present address: Department of Pathology, University of California at San Francisco, San Francisco, CA 94143.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Hybridization Analysis of Cellular RNA. To detect κ light chain-specific transcripts, RNA samples were subjected to electrophoresis on 1% agarose gels containing 6% (wt/vol) formaldehyde, then transferred to nitrocellulose membranes, and probed for Cκ region sequences by the method of Thomas (14). The probe was a radiolabeled 300-base-pair (bp) restriction fragment produced by combined HincII/HinfI digestion of cloned κ light chain cDNA (15).

RESULTS

Only one of the two Cκ alleles of 70Z/3 is rearranged; the second allele retains the germ-line configuration (7, 15). LPS-stimulated induction of κ light chain gene expression occurs without detectable change in the sequence organization of either allele (9). Nelson et al. (16) have observed that substantially larger quantities of κ light chain-specific RNA are synthesized in vitro by nuclei isolated from LPS-treated 70Z/3 cells than by nuclei from untreated cells, implying that LPS acts, at least in part, by inducing transcription of the constitutively rearranged κ light chain gene. To determine the organization of this inducible transcription unit, we first examined the sizes of the RNA transcripts produced. 70Z/3 cells were grown for 20 hr in the presence or absence of LPS. Polyadenylylated RNA isolated from nuclear and cytoplasmic fractions of the cells was subjected to agarose gel electrophoresis under denaturing conditions, transferred to nitrocellulose membranes, and probed for Cκ sequences.

The cytoplasm of untreated 70Z/3 cells contained no detectable κ light chain-specific sequences (Fig. 1). As previously reported by Perry and Kelley (8), however, exposure to LPS resulted in the accumulation of mature-sized [1.2 kilobase (kb)] cytoplasmic κ light chain mRNA. Similar analysis of nuclear RNA revealed that LPS treatment induced the expression of three larger κ light chain transcripts in addition to the mature mRNA. The largest of these is comparable in length (5.1 kb) to a complete κ light chain transcription unit and, presumably, represents the primary transcript of the gene. The two remaining homologous transcripts (4.4 and 2.8 kb) are probably intermediate forms or by-products in processing of the primary transcript (17). No such precursor forms were detectable in the nuclei of cells grown in the absence of LPS; the trace of mature κ light chain mRNA found in nuclear RNA from untreated cells probably reflects the presence of a small minority (<5%) of spontaneously activated cells in the untreated population.

[Fig. 1 blot. Panels: Cytoplasm, Nucleus; lanes labeled by LPS treatment; bands marked at 5.1, 4.4, 2.8, and 1.2 kb.]

FIG. 1. Induction of nuclear and cytoplasmic κ light chain-specific RNA in 70Z/3 cells exposed to bacterial LPS. RNA isolated from nuclear and cytoplasmic fractions of LPS-treated and untreated cells was subjected to denaturing agarose gel electrophoresis, transferred to nitrocellulose membranes, and probed for Cκ sequences. Cytoplasmic RNA samples contained 3 µg of polyadenylylated RNA per lane; polyadenylylated nuclear RNA samples from control and LPS-treated cells contained 64 µg and 32 µg of RNA per lane, respectively. Sizes of the various transcripts (in kb) are indicated.

The entire rearranged κ light chain gene is contained within a 17-kb EcoRI restriction fragment of 70Z/3 DNA (7, 15). To determine the structure of this gene, DNA isolated from untreated 70Z/3 cells was cleaved to completion with EcoRI and cloned in the coliphage vector Charon 4A. Chimeric phage harboring the rearranged κ light chain locus were identified by plaque hybridization and partial restriction mapping, and selected regions of the cloned gene were subjected to nucleotide sequence analysis.

The sequence of a continuous 1369-bp region at the 5' end of the rearranged gene is depicted in Fig. 2. The site of joining of the V and J elements can be identified between residues 1301 and 1302 (solid arrow); downstream from this site, the nucleotide sequence is identical to that of the unrearranged κ light chain locus of murine embryo DNA (19). In 70Z/3, DNA rearrangement has resulted in fusion of the V gene to the J element lying farthest upstream from the Cκ locus; by convention, this element is designated J1 (residues 1302-1339). Because the translational reading frame of the J1 element is known (2), the amino acid sequence of the V region (residues 1016-1339) could readily be deduced by using as landmarks the relatively invariant amino acids found at particular positions in other κ light chain proteins (4) and the recipient RNA splice site AGG at the 5' end of the second exon (open arrow).

[Fig. 2 sequence. The rows for residues 1-700 are largely illegible in this copy; the legible rows are reproduced below, with the predicted amino acid sequence beneath the coding rows. Arrows, brackets, underlining, asterisks, and the box described in the legend are not reproduced.]

Residues 1-700 (one legible row; its position within this region is not recoverable):
TATAAGTAAAATGTTTGATCAAGCTTATGACTTAAATTTGCAGACTCGGGGGCTGTCGATCTTTATAAATAAATGTAATTTATTTGAAAAGTGCTCTCAG

Residues 701-800:
ATCTCACAGTTGGTTTAAAGCMAGTACTTATGAGAATAGCAGTAATTAGCTTGGGACCAAAATTCAAAGACAAAATGGATTTTCAAGTGCAGATTTTCA
MetAspPheGlnValGlnIlePheS

Residues 801-900:
GCTTCCTGCTAATCAGTGCT CAGGTAACAGAGGGCAGGGAATTTGAGATCAGAATCCAACCAAAATTATTTTCCCTGGGGMTTTGAATCTAAAATACA
erPheLeuLeuIleSerAlaSer

Residues 901-1000:
TTTTTTTTTCTTTTTCGTTCATCTGAATGTTGGGTG(;TATAAAATTATTTTTGTATCTCTATTTTTACTAATCCICTCTGICTITTITICTITTI

Residues 1001-1100:
AGTCATAAGTCCAGAGGACAAATTGTTCTCTCCCAGTCTCCAGCAATCCTGTCTGCATCTCCAGGGGAGAAGGTCACAATGACTTGCAGGGCCAGCTCAA
GlyGlnIleValLeuSerGlnSerProAlaIleLeuSerAlaSerProGlyGluLysValThrMetThrCysArgAlaSerSerS

Residues 1101-1200:
GTGTAAGTTACATGCACTGGTACCAGCAGAAGCTTGGATCCTCCCCCAMeCCATGGATTTATGCCACATCCMACCTGGCTTCTGGAGTCCCTGCTCGCTT
erValSerTyrMetHisTrpTyrGlnGlnLysLeuGlySerSerProLysProTrpIleTyrAlaThrSerAsnLeuAlaSerGlyValProAlaArgPh

Residues 1201-1300:
CAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCAGAGTGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTAACCCA
eSerGlySerGlySerGlyThrSerTyrSerLeuThrIleSerArgValGluAlaGluAspAlaAlaThrTyrTyrCysGlnGlnTrpSerSerAsnPro

Residues 1301-1369:
CGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGTAAGTAGAATCCAAAGTCTCTTTCTTCCGTT
ArgThrPheGlyGlyGlyThrLysLeuGluIleLysArg

FIG. 2. V region and 5' flanking DNA sequence of the rearranged κ light chain gene of 70Z/3 cells. The nucleotide sequences of both DNA strands were determined for the cloned rearranged gene by the method of Maxam and Gilbert (12). The predicted amino acid sequences of the signal peptide and V region domain are indicated, along with the donor and recipient RNA splice junctions (open arrows) and the site of DNA recombination (solid arrow). Nested brackets indicate a zone of partial inverted repeat symmetry within the signal coding region. Sequences of the J1 junctional region and of the "TATA" box homologue are underlined. Asterisks denote the putative site of transcriptional initiation, assuming a first exon 78 ± 2 bp long (18). The conserved sequence A-T-T-T-G-C-A-T (box) lies 154 bp upstream from the 3' end of the first exon.

The first exon of a κ light chain gene, comprising the signal peptide coding element and the 5' untranslated sequences, is typically separated from the second exon by a 100- to 400-bp intervening sequence (18, 20-24). Although the sequence of the signal coding region varies considerably among κ light chain genes, it generally consists of a single uninterrupted translational reading frame encoding 16-26 amino acids, beginning with the start codon ATG and ending with the donor RNA splice site GGT. We have identified such a region at residues 776-826 of the sequence shown in Fig. 2. This region also exhibits other features characteristic of light chain signal-coding sequences: it contains a zone of partial inverted repeat symmetry upstream from the terminal splice junction and encodes a relatively hydrophobic peptide containing a central pair of leucine residues. No other region within the sequence depicted in Fig. 2 fulfills these criteria.
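The criteria listed above for a signal coding region can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it translates residues 776-826 as read from the Fig. 2 rows and tests each criterion in turn. The names `SIGNAL_REGION`, `translate`, and `check_signal_region` are illustrative, and the single base lost in this copy just before the GGT splice site is assumed to be T, the only choice that yields the serine printed beneath the sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Residues 776-826 of Fig. 2 (51 nt). ASSUMPTION: the base missing from this
# copy between ...AGTGCT and CAGGT... is T (giving the printed Ser codon TCA).
SIGNAL_REGION = "ATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCTTCAGGT"


def translate(dna):
    """Translate complete codons of dna in frame 1 (one-letter code)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def check_signal_region(dna):
    """Test the signal-coding criteria stated in the text."""
    coding, donor = dna[:-3], dna[-3:]   # last 3 nt are the donor splice site
    peptide = translate(coding)
    return {
        "peptide": peptide,
        "starts_with_atg": dna.startswith("ATG"),
        "ends_with_ggt": donor == "GGT",
        "length_16_to_26": 16 <= len(peptide) <= 26,
        "uninterrupted": "*" not in peptide,
        "leucine_pair": "LL" in peptide,
    }


result = check_signal_region(SIGNAL_REGION)
```

Under these assumptions the region encodes a 16-residue peptide with an adjacent Leu-Leu pair, in agreement with the translation shown beneath the sequence in Fig. 2.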

Kelley et al. (18) recently compared the 5' terminal structures of several different V genes and observed that a uniform distance of 78 ± 2 bp separates the initiation site from the donor RNA splice junction at the 3' end of the signal coding region. Application of this principle suggests that transcription of the rearranged κ light chain gene of 70Z/3 is initiated at residues 745-749 (asterisks in Fig. 2). This putative initiation region is located ≈555 bp upstream from the 5' end of J1, which in turn lies 4406 bp upstream from the polyadenylylation site at the 3' end of the Cκ region (19). Assuming a tail of 50-250 adenylate residues, the primary transcript of this rearranged gene would be expected to be 5.0-5.2 kb long, a value in excellent agreement with our analysis of nuclear κ light chain transcripts in these cells (Fig. 1). Subsequent RNA processing would generate a 1.1- to 1.3-kb mature κ light chain message.
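The length estimate above is simple addition, and a short sketch makes the steps explicit. All numbers are taken from the text; the constant and function names are illustrative only.

```python
# Distances quoted in the text (bp).
INIT_SITE = (745, 749)        # putative initiation residues in Fig. 2
J1_START = 1302               # first residue of J1 in Fig. 2
INIT_TO_J1_BP = 555           # approximate distance, initiation site to J1
J1_TO_POLYA_BP = 4406         # J1 5' end to polyadenylylation site (ref. 19)
POLYA_TAIL_RANGE = (50, 250)  # assumed poly(A) tail length


def init_to_j1_range():
    """Distance from each end of the initiation window to the start of J1."""
    return tuple(J1_START - pos for pos in reversed(INIT_SITE))


def primary_transcript_range_kb():
    """Expected primary transcript length (kb) for the shortest and longest tail."""
    body = INIT_TO_J1_BP + J1_TO_POLYA_BP          # 4961 bp without the tail
    return tuple(round((body + tail) / 1000, 1) for tail in POLYA_TAIL_RANGE)


print(init_to_j1_range())             # (553, 557), i.e. about 555 bp
print(primary_transcript_range_kb())  # (5.0, 5.2)
```

The untailed transcript is 4961 bp, so tails of 50 and 250 residues give 5011 and 5211 nucleotides, which round to the 5.0-5.2 kb quoted in the text and bracket the 5.1-kb nuclear species of Fig. 1.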

In examining the published sequences of a variety of immunoglobulin light chain genes, we observed that the octanucleotide A-T-T-T-G-C-A-T is nearly always present ≈100 bp upstream from the 5' end of the signal peptide coding region. Fig. 3 depicts portions of the 5' flanking sequences of several murine and human light chain V genes, aligned with respect to this conserved octanucleotide. The sequence A-T-T-T-G-C-A-T occurs without variation in eight of the nine murine κ light chain genes shown (20-25), as well as in the murine λI and λII light chain V regions (26, 27) and the human h101 κ light chain gene locus (5). Two other human κ light chain genes (h122 and the unexpressed h100 pseudogene) and the κ light chain gene expressed by MPC11 mouse myeloma cells (M11) each contain a single base substitution within this octanucleotide (5, 18). In contrast, comparison among the various genes reveals extensive sequence divergence throughout the DNA regions flanking this octanucleotide. The conserved sequence is located 90-110 bp upstream from the ATG start codon in nearly all of the light chain genes shown. For comparisons among immunoglobulin genes, however, the 3' end of the first exon is a preferable landmark, as it occupies a fixed position with respect to the site of transcriptional initiation. In those genes for which the necessary sequence data are available, the conserved octanucleotide lies 150 ± 10 bp upstream from the GGT splice junction at the 3' end of the first light chain exon.

A search for this conserved octanucleotide in the 5' flanking regions of various immunoglobulin heavy chain genes led to an unexpected finding. The sequence A-T-T-T-G-C-A-T rarely occurred in the published heavy chain gene sequences and was never observed at the location typical for light chain genes. Instead, the corresponding position in heavy chain genes was occupied by the precise inverse of the conserved sequence. As illustrated in Fig. 3, the inverse sequence (A-T-G-C-A-A-A-T), with only occasional alterations, is present 150 ± 10 bp upstream from the first RNA splice junction in all of the heavy chain genes for which adequate data are available (28-30).
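The "precise inverse" relationship between the two octanucleotides is that of a sequence and its reverse complement, that is, the same duplex read from the opposite strand. The sketch below, which is illustrative and not part of the original analysis, verifies this and locates either orientation in a stretch of flanking DNA. The two test sequences are the K2 and VH105 rows of Fig. 3 joined into single strings; the function names are hypothetical.

```python
# Watson-Crick complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LIGHT_OCTAMER = "ATTTGCAT"   # light chain genes
HEAVY_OCTAMER = "ATGCAAAT"   # heavy chain genes


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_octamer(seq):
    """List (0-based offset, orientation) for every octamer match in seq."""
    hits = []
    for i in range(len(seq) - 7):
        window = seq[i:i + 8]
        if window == LIGHT_OCTAMER:
            hits.append((i, "light"))
        elif window == HEAVY_OCTAMER:
            hits.append((i, "heavy"))
    return hits


# Fig. 3 rows: 20 nt of 5' flank + octamer + 22 nt of 3' flank.
K2 = "GCTGTGCCTACCCTTTGCTG" + "ATTTGCAT" + "GTACCCAAAGCATAGCTTACTG"
VH105 = "GTAATGCACTGCTCATGAAT" + "ATGCAAAT" + "CACCTGGGTCTATGGCAGTAAA"

print(reverse_complement(LIGHT_OCTAMER))  # ATGCAAAT
print(find_octamer(K2))                   # [(20, 'light')]
print(find_octamer(VH105))                # [(20, 'heavy')]
```

Note that the heavy chain form is not the simple reversal of the light chain form (which would read TACGTTTA); only the reverse complement reproduces A-T-G-C-A-A-A-T.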

Positions relative to the octanucleotide: 5' flank, -20 to -1; octanucleotide, 1 to 8; 3' flank, 9 to 30. ATG and GGT columns give distances in bp.

Gene    5' flank              Octamer   3' flank                ATG  GGT

Murine Kappa Genes
70Z/3   TGCCTAGACTGTATCTTGCG  ATTTGCAT  ATTACA M TCAGTAACCACAA  107  154
K2      GCTGTGCCTACCCTTTGCTG  ATTTGCAT  GTACCCAAAGCATAGCTTACTG  100  147
M173B   ATCCTAACTGCTTCTTAATG  ATTTGCAT  ATCCTCACTACATCGCCTTGGG   91  146
M41     ATCCTAACTGCTTCTTAATA  ATTTGCAT  ACCCTCACTGCATCGCCTTGGG   92  144
M167    ----CAGCACTGACCAATGG  ATTTGCAT  AATGCTCCCTAGGGTCCACTTC  106  153
T1      GCAATAACTGGTTCCCAATG  ATTTGCAT  GCTCTCACTTCACTGCCTTGGG   97  146
T2      AGCAACATGAAGACAGTATG  ATTTGCAT  AAGTTTTTCTTTCTTCTAATGT  109  156
K21C    AAACAGTACATACTCCGCTG  ATTTGCAT  ATGAAATAATTMTATAACAGCC   93  143
M11     ACTTCCTTATTTGATGACTC  CTTTGCAT  AGATCCCTAGAGGCCAGCACAG   73  148

Murine Lambda Genes
λI      TAAACCTGTAAATGAAAGTA  ATTTGCAT  TACTAGCCCAGCCCAGCCCATA  106  151
λII     TAAACCTGTAAATGAAAGTA  ATTTGCAT  TACTAGCCCAGCCCAGCCCATA  105  151

Human Kappa Genes
h101    GCCTGCCCCATCCCCTGCTC  ATTTGCAT  GTTCCCAGAGCACAACCTCCTG   96  NA
h122    GCCTGCCCCATCCCCTGCTG  ATTTGCCT  GTTCCTAGAGCACAGCCCCCTG  102  NA
h100    TCATTCTTGCATCTGTTGAA  ATTTTCAT  TTTCAAAAAAACACAGCCAACT   96  NA

Heavy Chain Genes
VH167   TAATGATATAGCAGAAAGAC  ATGCAAAT  TAGGCCACCCTCATCACATGAA  118  NA
VH105   GTAATGCACTGCTCATGAAT  ATGCAAAT  CACCTGGGTCTATGGCAGTAAA  114  159
VH111   GTAATGCACTGCTCATGAAT  ATGCAAAT  CACGCAAGTCTTTGGCAGTAAA  108  153
VH104   GAAGTACCCTGCTCATGAAT  ATGAAAAT  TACCCAAGTCTATGUTAGTMA   108  153
VH108   AAAGTCCCCTGCTCATGAAT  ATGCAAAT  TACCGTTCTCTATGTTGGTTAA  109  154
VH101   TCTCTCAGGAACCTCCCCCA  ATGCAAAG  CAGUCCTCAGGCAGAGGATAAA   85  143

FIG. 3. Conserved octanucleotide sequences in the 5' flanking regions of immunoglobulin genes. Portions of the published nucleotide sequences of several V genes are aligned to demonstrate the selective conservation of the octanucleotides A-T-T-T-G-C-A-T and A-T-G-C-A-A-A-T in light chain and heavy chain genes, respectively. Point mutations within these conserved sequences are underlined. The relative locations of the ATG start codon and of the GGT splice junction at the 3' end of the first exon are indicated for each gene. Murine light chain gene sequences are from this paper (70Z/3) and from refs. 20 (K2), 21 (M173B), 22 (M41), 23 (M167), 24 (T1 and T2), 25 (K41C), 26 (λI), and 27 (λII). The M11 data are from a corrected version (R. P. Perry, personal communication) of the sequence in ref. 18. Human κ light chain sequences are from ref. 5. The VH167 and VH101 sequences are from refs. 28 and 29, respectively; the remaining heavy chain data are from ref. 30. NA, required sequence data not available.

DISCUSSION

Specific DNA rearrangements, which assemble the V, J, and Cκ coding elements to form a single transcription unit, are essential for the synthesis of a functional immunoglobulin protein. These rearrangements are not, however, sufficient to activate transcription of the κ light chain gene. We have demonstrated that untreated cells of the 70Z/3 line contain a fully rearranged κ light chain gene; the sequence of this gene reveals no obvious anomalies that might interfere with its transcriptional or translational function. Nevertheless, we can detect virtually no transcripts derived from this gene in untreated 70Z/3 cells. Expression of this gene can be induced by exposure to LPS, resulting in the accumulation of κ light chain mRNA in the cytoplasm (8) and of its precursor transcripts in the nucleus.

This observation implies that factors other than accurate joining of the V and J loci are required to activate κ light chain gene transcription. Recently, attention has focused upon events occurring within a small (<250 bp) region of DNA closely linked to the Cκ locus (9, 15, 31-33). This region, which lies ≈3.5 kb downstream from the site of transcriptional initiation, undergoes localized changes in chromatin structure that correlate with transcriptional activity of the gene; in 70Z/3 cells, these chromatin changes occur after exposure to LPS (9). Chung et al. (33) have observed that the nucleotide sequence of this region is homologous to that of the enhancer elements of certain eukaryotic viruses, elements that can act in cis over a distance of several kilobases to increase the rate of transcription from cellular promoters (34-36). Changes in chromatin structure of the region near the Cκ locus, occurring as a result of LPS treatment, may serve to activate an enhancer-like element at this site, which could, in turn, activate the promoter of the rearranged V gene.

Our analysis of the κ light chain gene of 70Z/3 revealed a previously unrecognized feature of the 5' flanking sequences of immunoglobulin V genes. In every known instance, a distinctive octanucleotide sequence occurs 150 ± 10 bp upstream from the 3' end of the first exon: the sequence A-T-T-T-G-C-A-T is found at this location in light chain V genes, while the inverse (complementary) sequence A-T-G-C-A-A-A-T is characteristic of heavy chain V genes. Although adjacent sequences on either side have diverged extensively in evolution, these octanucleotides have been selectively conserved among diverse V genes in at least two mammalian species. Of particular note, the octanucleotides are more stringently conserved than the A+T-rich region (TATA box) found ≈30 bp upstream from the initiation site in these (18) and other genes. Of the five mutations identified in Fig. 3, three occur in unrearranged genes cloned from embryonic tissues; curiously, all five mutations represent purine-pyrimidine transversions.
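The statement that all five mutations are transversions can be confirmed directly from the octanucleotides listed in Fig. 3. The sketch below is illustrative: it compares each variant with the consensus for its chain type and classifies every substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or the reverse). The dictionary and function names are hypothetical.

```python
PURINES = {"A", "G"}

LIGHT_CONSENSUS = "ATTTGCAT"
HEAVY_CONSENSUS = "ATGCAAAT"

# Variant octanucleotides from Fig. 3, paired with the consensus they deviate from.
VARIANTS = {
    "M11":   ("CTTTGCAT", LIGHT_CONSENSUS),
    "h122":  ("ATTTGCCT", LIGHT_CONSENSUS),
    "h100":  ("ATTTTCAT", LIGHT_CONSENSUS),
    "VH104": ("ATGAAAAT", HEAVY_CONSENSUS),
    "VH101": ("ATGCAAAG", HEAVY_CONSENSUS),
}


def substitutions(consensus, observed):
    """Return (1-based position, consensus base, observed base) for each mismatch."""
    return [(i + 1, c, o)
            for i, (c, o) in enumerate(zip(consensus, observed)) if c != o]


def is_transversion(a, b):
    """True when one base is a purine and the other a pyrimidine."""
    return (a in PURINES) != (b in PURINES)


for gene, (observed, consensus) in VARIANTS.items():
    for pos, ref, alt in substitutions(consensus, observed):
        kind = "transversion" if is_transversion(ref, alt) else "transition"
        print(f"{gene}: position {pos} {ref}->{alt} ({kind})")
```

Each variant differs from its consensus at exactly one position, and each change exchanges a purine for a pyrimidine or the reverse, as stated in the text.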

In conjunction with the findings of Kelley et al. (18), our observations suggest a consensus structure for the 5' end of an immunoglobulin gene (Fig. 4). This structure consists of a first exon (coding region and 5' untranslated sequence) measuring 78 ± 2 bp in length, along with only two short regions of sequence conservation in the 5' flanking DNA: a TATA box homologue and the octanucleotide block, situated ~30 and 70 bp, respectively, upstream from the initiation site. Despite considerable variability in such parameters as the location of the ATG start codon and the length of the first intervening sequence, essentially all published V gene sequences conform to this archetypal pattern. We have uncovered no evidence that the octanucleotide sequences are systematically associated with the 5' ends of genes encoding nonimmunoglobulin proteins (37), although the sequence A-T-T-T-G-C-A-T occurs at an intriguingly similar location in the heavy chain gene of HLA-DR (38), a protein evolutionarily related to the immunoglobulins. The organization of the pseudogene promoter of the unrearranged Cκ locus differs in several respects from the consensus V gene structure and includes no detectable homology to the octanucleotide block (39).

The selective evolutionary conservation of the octanucleotide block implies that it may serve a significant biologic

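[Figure 4 schematic. Labels: octanucleotide block (Heavy: ATGCAAAT; Light: ATTTGCAT), TATA box, initiation site (Init), ATG start codon, first donor splice junction (Splice). Marked distances: 78 ± 2 bp and 150 ± 10 bp.]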

FIG. 4. Consensus structure of the 5' ends of immunoglobulin genes. The first exon, comprising the signal coding sequence (cross-hatched) and 5' untranslated regions, extends from the transcriptional initiation site (Init) to the first donor RNA splice junction, a distance of 78 ± 2 bp (18). An A+T-rich region (TATA box) is located ~30 bp upstream from the initiation site. The octanucleotide block, situated 150 ± 10 bp upstream from the 3' end of the first exon, exhibits one of two complementary sequences: A-T-T-T-G-C-A-T in light chain genes or A-T-G-C-A-A-A-T in heavy chain genes. The remainder of the 5' flanking sequence varies widely among different immunoglobulin genes.


function. The nature of this function, however, remains obscure. In every V gene studied, the octanucleotide lies upstream from the site of transcriptional initiation; consequently, it is not transcribed and cannot be directly involved in events that occur subsequent to transcription. Recently, several laboratories have presented evidence that factors unique to B lymphoid cells are essential for transcriptional activation of V gene promoters (32, 40-43). Some of these B-cell factors may be required to activate the enhancer-like activity of sequences downstream from the V gene. Our findings raise the additional possibility that certain tissue-specific factors may recognize distinctive sequences at the 5' termini of these genes and, thereby, select the appropriate site for transcriptional initiation. Alternatively, factors binding to the octanucleotide sequences may serve to modulate the rate of immunoglobulin gene transcription during various stages of B-lymphocyte ontogeny. Clearly the most remarkable property of the octanucleotide block is the fact that, at this unique site, the sequences of light chain genes are precisely complementary to those of heavy chain genes. We propose that this distinctive sequence element may have a role in molecular events that require discrimination between these two diverse classes of genes, perhaps serving as a binding site for transcription factors mediating the coordinate regulation of heavy chain and light chain gene expression.
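As an illustration of how the two classes of genes could be told apart from sequence alone, the sketch below scans a single strand, read 5' to 3' in the direction of transcription, for either octanucleotide. It is a hypothetical example: the function names, the toy sequence, and the convention of measuring distance from the first base of the octanucleotide to the initiation site are all assumptions and are not taken from the data presented here.

```python
# Hypothetical sketch; names, toy sequence, and distance convention are assumptions.

OCTAMERS = {"ATTTGCAT": "light", "ATGCAAAT": "heavy"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_octamers(strand: str) -> list[tuple[int, str]]:
    """Return (0-based start, class) for each octanucleotide on this strand."""
    seq = strand.upper()
    hits = []
    for start in range(len(seq) - 7):
        label = OCTAMERS.get(seq[start:start + 8])
        if label is not None:
            hits.append((start, label))
    return hits


# Toy sequence (not a real gene): 4 bp of filler, the light chain
# octanucleotide, 62 bp of filler, then an assumed initiation site.
TOY = "GGCC" + "ATTTGCAT" + "C" * 62 + "AGT"
TOY_INIT = 74  # 0-based index of the assumed initiation site

if __name__ == "__main__":
    for start, label in find_octamers(TOY):
        print(label, "octanucleotide,", TOY_INIT - start, "bp upstream of Init")
    # The same site read on the opposite strand matches the other class.
    print(find_octamers(reverse_complement(TOY)))
```

Because the two octanucleotides are reverse complements of one another, a light chain match on one strand is necessarily a heavy chain match on the opposite strand. The class can therefore be assigned only once the strand is fixed by the direction of transcription.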

We thank C. Katzen and C. Caldwell for technical assistance, M. Granner for software, and K. R. Yamamoto for reviewing the manuscript. This work was supported in part by National Institutes of Health Grant AM25295 and by funds from the Veterans Administration. T.G.P. was supported by Medical Scientist Training Grant GM07337. D.K.G. was a Veterans Administration Medical Investigator.

1. Seidman, J. G. & Leder, P. (1978) Nature (London) 276, 790-795.

2. Max, E. E., Seidman, J. G. & Leder, P. (1979) Proc. Natl. Acad. Sci. USA 76, 3450-3454.

3. Early, P., Huang, H., Davis, M., Calame, K. & Hood, L. (1980) Cell 19, 981-992.

4. Wu, T. T. & Kabat, E. A. (1970) J. Exp. Med. 132, 211-240.

5. Bentley, D. L., Farrell, P. J. & Rabbitts, T. H. (1982) Nucleic Acids Res. 10, 1841-1856.

6. Paige, C. J., Kincade, P. W. & Ralph, P. (1978) J. Immunol. 121, 641-647.

7. Maki, R., Kearney, J., Paige, C. J. & Tonegawa, S. (1980) Science 209, 1366-1369.

8. Perry, R. P. & Kelley, D. E. (1979) Cell 18, 1333-1339.

9. Parslow, T. G. & Granner, D. K. (1982) Nature (London) 299, 449-451.

10. Maniatis, T., Hardison, R. C., Lacy, E., Lauer, J., O'Connell, C., Quon, D., Sim, G. K. & Efstratiadis, A. (1978) Cell 15, 687-701.

11. Benton, W. D. & Davis, R. W. (1977) Science 196, 180-182.

12. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.

13. Parslow, T. G., Milburn, G. L., Lynch, R. G. & Granner, D. K. (1983) Science 220, 1389-1391.

14. Thomas, P. S. (1980) Proc. Natl. Acad. Sci. USA 77, 5201-5205.

15. Parslow, T. G. & Granner, D. K. (1983) Nucleic Acids Res. 11, 4775-4792.

16. Nelson, K., Mather, E. & Perry, R. P. (1984) Nucleic Acids Res. 12, 1911-1923.

17. Perry, R. P., Kelley, D. E., Coleclough, C., Seidman, J. G., Leder, P., Tonegawa, S., Matthyssens, G. & Weigert, M. (1980) Proc. Natl. Acad. Sci. USA 77, 1937-1941.

18. Kelley, D. E., Coleclough, C. & Perry, R. P. (1982) Cell 29, 681-689.

19. Max, E. E., Maizel, J. V. & Leder, P. (1981) J. Biol. Chem. 256, 5116-5120.

20. Nishioka, Y. & Leder, P. (1980) J. Biol. Chem. 255, 3691-3694.

21. Max, E. E., Seidman, J. G., Miller, H. & Leder, P. (1980) Cell 21, 793-799.

22. Seidman, J. G., Max, E. E. & Leder, P. (1979) Nature (London) 280, 370-375.

23. Selsing, E. & Storb, U. (1981) Cell 25, 47-58.

24. Altenburger, W., Steinmetz, M. & Zachau, H. G. (1980) Nature (London) 287, 603-607.

25. Heinrich, G., Traunecker, A. & Tonegawa, S. (1984) J. Exp. Med. 159, 417-435.

26. Hozumi, N., Wu, G. E., Murialdo, H., Roberts, L., Vetter, D., Fife, W. L., Whiteley, M. & Sadowski, P. (1981) Proc. Natl. Acad. Sci. USA 78, 7019-7023.

27. Wu, G. E., Govindji, N., Hozumi, N. & Murialdo, H. (1982) Nucleic Acids Res. 10, 3831-3843.

28. Clarke, C., Berenson, J., Goverman, J., Boyer, P. D., Crews, S., Siu, G. & Calame, K. (1982) Nucleic Acids Res. 10, 7731-7749.

29. Kataoka, T., Nikaido, T., Miyata, T., Moriwaki, K. & Honjo, T. (1982) J. Biol. Chem. 257, 277-285.

30. Cohen, J. B., Effron, K., Rechavi, G., Ben-Neriah, Y., Zakut, R. & Givol, D. (1982) Nucleic Acids Res. 10, 3353-3370.

31. Weischet, W. O., Glotov, B. O., Schnell, H. & Zachau, H. G. (1982) Nucleic Acids Res. 10, 3627-3645.

32. Queen, C. & Baltimore, D. (1983) Cell 33, 741-748.

33. Chung, S.-Y., Folsom, V. & Wooley, J. (1983) Proc. Natl. Acad. Sci. USA 80, 2427-2431.

34. Gruss, P., Dhar, R. & Khoury, G. (1981) Proc. Natl. Acad. Sci. USA 78, 943-947.

35. Banerji, J., Rusconi, S. & Schaffner, W. (1981) Cell 27, 299-308.

36. Wasylyk, B., Wasylyk, C., Augereau, P. & Chambon, P. (1983) Cell 32, 503-514.

37. Dayhoff, M. O., Chen, H., Hunt, L. T., Barker, W. C., Yeh, L.-S., George, D. G. & Orcutt, B. C., eds. (1983) Nucleic Acid Sequence Database (National Biomedical Research Foundation, Washington, DC).

38. Das, H. K., Lawrance, S. K. & Weissman, S. M. (1983) Proc. Natl. Acad. Sci. USA 80, 3543-3547.

39. Van Ness, B. G., Weigert, M., Coleclough, C., Mather, E. L., Kelley, D. E. & Perry, R. P. (1981) Cell 27, 593-602.

40. Gillies, S. D., Morrison, S. L., Oi, V. T. & Tonegawa, S. (1983) Cell 33, 717-728.

41. Banerji, J., Olson, L. & Schaffner, W. (1983) Cell 33, 729-740.

42. Falkner, F. G. & Zachau, H. G. (1982) Nature (London) 298, 286-288.

43. Stafford, J. & Queen, C. (1983) Nature (London) 306, 77-79.
