
The EMBO Journal vol.9 no.9 pp.2977 - 2987, 1990

Molecular analysis of the bicoid gene from Drosophila pseudoobscura: identification of conserved domains within coding and noncoding regions of the bicoid mRNA

Mark A.Seeger¹ and Thomas C.Kaufman

Howard Hughes Medical Institute and Program in Genetics and Cellular, Molecular and Developmental Biology, Department of Biology, Indiana University, Bloomington, IN 47405, USA

¹Present address: Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA

Communicated by F.C.Kafatos

The specification of anterior positional information during Drosophila embryogenesis is largely dependent upon the function of the maternal-effect gene bicoid (bcd). Two aspects of bcd function are particularly striking. First, the bcd protein product forms a gradient during early embryogenesis, which regulates the transcription of at least one zygotic segmentation gene, hunchback, in a concentration-dependent manner. Secondly, formation of the bcd protein gradient is dependent upon the specific localization of bcd mRNAs at the anterior end of the oocyte/embryo during oogenesis, a process which requires a cis-acting 625 nucleotide sequence within the 3' untranslated region of the bcd mRNA. We have cloned and sequenced the bcd gene from Drosophila pseudoobscura as a tool in identifying important functional domains within this transcription unit. DNA sequence comparisons reveal: (i) varying degrees of amino acid sequence conservation among the proposed functional domains of the bcd protein, (ii) the conservation of potential RNA secondary structures within the bcd mRNA localization element, and (iii) the maintenance of a short open reading frame within the 5' untranslated leader that may play a role in translational regulation. Finally, the D.pseudoobscura bcd gene partially rescues the phenotype of a bcd⁻ mutation when placed into the D.melanogaster genome by germline transformation. The lack of full phenotypic rescue can be explained in part by the observed improper localization of the D.pseudoobscura bcd mRNA when expressed in D.melanogaster.

Key words: bicoid/Drosophila/homeobox gene/maternal-effect gene/RNA localization

Oxford University Press

Introduction

The presence of two organizing centers that exert long-range influences on positional information along the antero-posterior axis of insect embryos has long been postulated (see Sander, 1976 for a review). The experimental basis for this proposal originates from mechanical manipulations of insect embryos that disrupt antero-posterior axis specification. These experimental results are best rationalized in terms of morphogenetic gradients originating from the organizing centers at the termini of the early embryo (Sander, 1959, 1976; Yajima, 1960; Kalthoff, 1979; Frohnhofer et al., 1986). More recently, the genetic analysis of several maternal-effect mutations in Drosophila that exhibit global effects on antero-posterior positional values has lent further support to these gradient models (see Nüsslein-Volhard et al., 1987 for a review). In particular, the elegant genetic and molecular analysis of the maternal-effect gene bicoid has provided the first direct evidence for a diffusible morphogen that directs embryonic pattern formation.

The bicoid (bcd) gene product is required for the specification of anterior positional values in the Drosophila melanogaster embryo. Females mutant for the bcd gene produce embryos that lack all head, thoracic and anterior abdominal segments, with a concomitant duplication of telson derivatives at the anterior end of the embryo (Frohnhofer and Nüsslein-Volhard, 1986). Cytoplasmic transplantation experiments suggest that bcd+ activity is located at the anterior pole of the early embryo in a graded fashion, with the highest concentrations of bcd+ activity located at the extreme anterior end (ibid). These results, as well as others, led Frohnhofer and Nüsslein-Volhard (1986) to propose that bcd+ activity is the primary organizer of anterior pattern in the early embryo and that this activity is present in the form of a gradient. The molecular analysis of bcd has confirmed this hypothesis.

The bcd protein product is first observed in cleavage stage embryos, where it forms a gradient with the highest concentrations located at the anterior end of the embryo (Driever and Nüsslein-Volhard, 1988a). Alterations of the blastoderm fate map induced by changes in bcd gene dosage or by the use of different mutant backgrounds (e.g. swallow) are precisely correlated with changes in the shape of the bcd protein gradient (Driever and Nüsslein-Volhard, 1988b). These results indicate that anterior positional values are specified by the bcd protein product in a concentration-dependent manner. A direct role for bcd protein in the specification of anterior identity is exemplified by its interaction with the zygotic gap gene hunchback: bcd protein binds specific regions of the hunchback promoter and activates transcription in a concentration-dependent manner (Driever and Nüsslein-Volhard, 1989; Driever et al., 1989a; Struhl et al., 1989). These observations thus demonstrate a direct link between maternally supplied positional information and spatially restricted zygotic gene transcription.

The bcd protein gradient appears to form by diffusion of the newly translated protein product from a highly localized source of bicoid mRNA. The bcd mRNA is localized specifically at the anterior pole of the embryo throughout early embryogenesis (Frigerio et al., 1986; Berleth et al., 1988). This asymmetric distribution is established during oogenesis, when bcd mRNA is transcribed in nurse cell nuclei and transported to the oocyte, where it appears to become 'trapped' at the anterior end of the developing oocyte (ibid). Macdonald and Struhl (1988) have identified a cis-acting 625 nucleotide fragment within the 3' untranslated region of the bcd mRNA that is both necessary and sufficient for the anterior localization of the bcd mRNA product. In addition, the products of two maternal-effect genes, swallow and exuperantia, have been shown to be required in trans for proper localization of bcd mRNA (Frohnhofer and Nüsslein-Volhard, 1987; Stephenson et al., 1988).
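The Introduction describes the bcd protein gradient as arising by diffusion of newly translated protein from a localized anterior mRNA source, with hunchback responding to it in a concentration-dependent manner. A minimal sketch of that idea follows, assuming a steady-state exponential profile (the usual result for synthesis at one pole plus uniform diffusion and first-order decay; the paper itself does not propose a quantitative model). It shows how a fixed activation threshold converts a concentration gradient into a positional boundary. The function names, length constant, and threshold are hypothetical.

```python
import math

def bcd_concentration(x, c0, length_constant):
    """Steady-state level at distance x from the anterior source, assuming an
    exponential profile C(x) = c0 * exp(-x / length_constant)."""
    return c0 * math.exp(-x / length_constant)

def boundary_position(c0, threshold, length_constant):
    """Hypothetical target-gene boundary: the x where C(x) falls to threshold,
    i.e. x* = length_constant * ln(c0 / threshold)."""
    if threshold >= c0:
        # The gradient never exceeds the threshold, so no anterior domain forms.
        return 0.0
    return length_constant * math.log(c0 / threshold)

# Illustrative numbers only: x as a fraction of egg length, arbitrary units.
LENGTH_CONSTANT = 0.2
THRESHOLD = 1.0

one_dose = boundary_position(10.0, THRESHOLD, LENGTH_CONSTANT)
two_doses = boundary_position(20.0, THRESHOLD, LENGTH_CONSTANT)
print(f"boundary with 1x source: {one_dose:.3f}")
print(f"boundary with 2x source: {two_doses:.3f}")
print(f"posterior shift: {two_doses - one_dose:.3f}")
```

In this toy model, doubling the source shifts the boundary posteriorly by length_constant × ln 2 whatever the threshold, which is one way to picture why changes in bcd gene dosage move fate-map positions; the actual shifts were measured experimentally (Driever and Nüsslein-Volhard, 1988b), not derived from a model like this.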

Interspecific comparisons are a powerful approach for the identification of functional domains within proteins, mRNAs, and their corresponding cis-regulatory regions (Blackman and Meselson, 1986; Kassis et al., 1986; Henikoff and Eghtedarzadeh, 1987; Wilde and Akam, 1987; Colot et al., 1988; Treier et al., 1989). To learn more about the bcd protein product and its regulation, as well as the nature of the cis-acting sequences involved in bcd mRNA localization, we have cloned and sequenced the bcd gene from D.pseudoobscura. The approximately 46 million years of divergence between D.melanogaster and D.pseudoobscura has allowed ample time for unconstrained sequences to diverge completely (Beverley and Wilson, 1984). This evolutionary comparison has identified a number of highly conserved regions within the bcd gene, providing insights into the regulation and function of the bcd product.
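The argument that ~46 million years is ample time for unconstrained sequence to diverge can be made concrete with a substitution model. A minimal sketch, assuming the Jukes-Cantor (JC69) model, which the authors do not use: two sequences separated by d substitutions per site (d = 2μt, summed over both lineages) are expected to share 1/4 + 3/4·exp(−4d/3) of their sites. The text gives no neutral rate μ, so the example tabulates d directly rather than converting 46 million years into substitutions; `jc69_expected_identity` is a hypothetical name.

```python
import math

def jc69_expected_identity(d):
    """Expected fraction of identical sites between two aligned DNA sequences
    separated by d substitutions per site (summed over both lineages),
    under the Jukes-Cantor (JC69) model of neutral change."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

# d = 2 * mu * t; mu is not given in the text, so a range of d is shown instead.
for d in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:>3}: expected identity = {jc69_expected_identity(d):.2f}")
```

As d grows, the expected identity of unconstrained sequence falls toward the 25% expected for unrelated DNA, far below the 14-of-21 (66%) window stringency used for the dot matrix comparison in Figure 2.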

Results

Identification of the bcd gene from D.pseudoobscura

The bcd gene was isolated from D.pseudoobscura during the cloning of the Antennapedia gene complex (ANT-C) from this species (F.Randazzo, M.Seeger, C.Huss and T.Kaufman, unpublished data). Hybridization of the D.pseudoobscura ANT-C chromosomal walk with DNA restriction enzyme fragments encompassing the bcd locus from D.melanogaster identified a 4.2 kb EcoRI fragment that encodes the majority of the D.pseudoobscura bcd gene (Figure 1). Additional studies suggest that the region immediately adjacent to the bcd locus in these two species has been remarkably conserved with regard to the relative organization of the adjacent genes, amalgam and zerknüllt, and the intergenic distances between these genes (see Figure 1 and Seeger et al., 1988 for a comparison).

The genomic region encompassing the D.pseudoobscura bcd transcription unit was sequenced in its entirety, and the structure of its transcript was inferred from homologies with the D.melanogaster gene. A dot matrix DNA sequence comparison illustrates the relative organization of the bcd gene in these two species and leads to several general conclusions (Figure 2). First, the general exon/intron organization of the bcd transcription unit has been maintained; however, the D.pseudoobscura gene is considerably shorter, 2.9 kb versus 3.65 kb for D.melanogaster. This difference in length arises principally from a decrease in the sizes of introns one and three, from ~500 bp each in D.melanogaster to <100 bp apiece in D.pseudoobscura. Finally, protein coding regions of the bcd transcription unit are generally more conserved than sequences within the 5' or 3' untranslated regions, and these untranslated regions are more conserved than the introns or adjacent nontranscribed sequences (Figure 2).

We have examined the introns and putative 5' regulatory regions for DNA sequence conservation using several different computer alignment programs. Several well-conserved regions were identified within each of the three introns (Figure 3). However, the specific sequences of these conserved intronic regions are not particularly striking, and their functional significance is unclear. Within the 400 bp of 5' upstream sequences that we have sequenced, three regions exhibit greater homology than the surrounding DNA (Figure 3). Two of these conserved regions, at +30 and +210, are located at similar positions in both species relative to the transcription start site and may represent cis-acting regulatory elements.

[Figure 1 image not reproduced; labels include the λ clones Dpso 310 and Dpso 401 and the amalgam (ama) gene.]

Fig. 1. Molecular organization of the bcd region from D.pseudoobscura. A restriction enzyme map along with the extent of two λ phage clones from the D.pseudoobscura ANT-C chromosomal walk are shown in the top portion of the figure. Restriction enzyme fragments that hybridize to appropriate cDNA clones from D.melanogaster are indicated. The bottom portion presents a detailed restriction map of the 4.2 kb EcoRI fragment which encodes the transcribed regions of the D.pseudoobscura bcd gene. The open box below the restriction map denotes regions that were sequenced on both DNA strands, while the single line represents single strand sequence. Bgl, BglI; BgII, BglII; E, EcoRI; H, HindIII; Nr, NruI; P, PstI; Pv, PvuII; Sm, SmaI; X, XbaI.

[Figure 2 dot matrix plot not reproduced.]

Fig. 2. Dot matrix homology comparison of the bcd genomic region from D.melanogaster and D.pseudoobscura. The dot matrix homology comparison was generated using the COMPARE and DOTPLOT programs of the UWGCG DNA sequence analysis package. Fourteen matches or better over a 21 bp window (66%) are required to make a dot. The organization of the D.melanogaster and D.pseudoobscura bcd genes is indicated, with the structure of the D.pseudoobscura bcd gene inferred from homology to D.melanogaster. Protein coding regions are shaded.

[Figure 3 sequence alignment not reproduced.]

Fig. 3. Sequence of the bcd genomic region from D.pseudoobscura and its homology with D.melanogaster. The D.pseudoobscura sequence is numbered consecutively since the transcription initiation site has not been precisely defined. The D.melanogaster sequence is numbered as in Berleth et al. (1988). The D.melanogaster sequence is in italics, with dashes representing identities and dots representing gaps. Stars indicate sequences that are homologous to the three major transcription initiation sites which have been defined for D.melanogaster (Seeger, 1989). The underlined region corresponds to a conserved polyadenylation consensus sequence.
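The Figure 2 caption specifies the stringency used to draw the dot matrix: a dot is plotted when at least 14 of 21 positions match. A minimal sketch of that windowed comparison follows. It is not the UWGCG COMPARE/DOTPLOT code, only a plain ungapped, same-strand window scan under the same 14-of-21 rule, and `dot_matrix` and `random_window_hit_probability` are hypothetical names. The second function estimates, assuming independent sites with a 1/4 chance of a match at each, how often a window from unrelated sequence would reach the threshold by chance.

```python
from math import comb

def dot_matrix(seq_a, seq_b, window=21, min_matches=14):
    """Return (i, j) start positions where seq_a[i:i+window] and
    seq_b[j:j+window] agree at >= min_matches positions (ungapped, same strand).
    Defaults mirror the 14-of-21 stringency described for Fig. 2."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    dots = []
    for i in range(len(seq_a) - window + 1):
        seg_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            seg_b = seq_b[j:j + window]
            matches = sum(1 for x, y in zip(seg_a, seg_b) if x == y)
            if matches >= min_matches:
                dots.append((i, j))
    return dots

def random_window_hit_probability(window=21, min_matches=14, p_match=0.25):
    """Binomial tail: chance that an unrelated window scores >= min_matches,
    treating each position as an independent match with probability p_match."""
    return sum(comb(window, k) * p_match**k * (1 - p_match)**(window - k)
               for k in range(min_matches, window + 1))

demo = "ACGTTGCAAGCTTAGCGATCCGATAGGCTA"  # hypothetical 30 bp sequence
print(dot_matrix(demo, demo))
print(f"chance hit per window pair: {random_window_hit_probability():.1e}")
```

Comparing a sequence with itself places a dot at every (i, i) on the main diagonal; conserved blocks shared by two species appear as diagonal runs, which is how the coding regions stand out in a plot like Figure 2. The direct scan costs O(n·m·window), which is manageable for a few kilobases of genomic sequence.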

A short open reading frame within the 5' leader region is conserved

Comparison of the 5' leader sequences between the two species reveals the conservation of a short open reading frame (+443 of the D.pseudoobscura sequence in Figure 3). This open reading frame encodes four amino acids in both D.melanogaster and D.pseudoobscura and is preceded by a favorable translation initiation consensus sequence [CAAAAUG in both species versus the consensus (C/A)AA(A/C)AUG; Cavener, 1987]. While the presence of this open reading frame and translation initiation sequence is highly conserved, the primary amino acid sequence has diverged completely (Figure 3). The conservation of the presence of this small open reading frame, but not of its primary sequence, suggests that the bcd protein is translationally regulated, perhaps in a manner similar to the yeast GCN4 protein (Mueller and Hinnebusch, 1986).

Two aspects of the normal bcd expression pattern are suggestive of translational regulation. First, bcd mRNA is present from stage 8 of oogenesis onward; however, bcd protein is not detectable until the cleavage stages of embryogenesis (Frigerio et al., 1986; Berleth et al., 1988; Driever and Nüsslein-Volhard, 1988a). Thus, there must be some mechanism by which translation of this message is blocked throughout oogenesis and early embryogenesis. Secondly, it is possible that formation of the bcd protein gradient requires more than simple diffusion from a localized mRNA source and that translational regulation is a component of this process. The role, if any, that this small conserved open reading frame plays in either of these processes will require direct experimental testing.
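The comparison above rests on two sequence checks: whether a leader contains Cavener's (1987) initiation consensus, (C/A)AA(A/C)AUG, and whether an ATG there opens a short ORF closed by an in-frame stop codon. A minimal sketch of both checks follows. It works on DNA-sense sequence (U is converted to T), the 10-codon cut-off for a "short" ORF is an arbitrary assumption, the function names are hypothetical, and the toy leader is invented for illustration; it is not the bcd sequence from Figure 3.

```python
import re

# Consensus quoted in the text, (C/A)AA(A/C)AUG, written with T for U.
# The lookahead lets overlapping matches be reported.
INITIATION_CONSENSUS = re.compile(r"(?=([CA]AA[AC]ATG))")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def consensus_sites(seq):
    """Return 0-based positions of the ATG in each consensus match."""
    seq = seq.upper().replace("U", "T")
    return [m.start() + 4 for m in INITIATION_CONSENSUS.finditer(seq)]

def short_orfs(seq, max_codons=10):
    """Find ATG-initiated ORFs closed by an in-frame stop that encode at most
    max_codons amino acids. Returns (start, end_after_stop, n_amino_acids).
    ORFs that run off the end of the sequence without a stop are ignored."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_aa = (pos - start) // 3
                if n_aa <= max_codons:
                    orfs.append((start, pos + 3, n_aa))
                break
    return orfs

toy_leader = "TTGCAAAATGCCGCAACTGTAGCC"  # hypothetical sequence
print(consensus_sites(toy_leader))
print(short_orfs(toy_leader))
```

On the toy leader, the CAAAATG match places an ATG at position 7, and the ORF it opens encodes four amino acids before a TAG stop, the same length as the conserved upstream ORF described in the text.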

Three additional regions within the bcd 5' leader sequence exhibit greater conservation than the surrounding sequences (Figure 3). These include the regions surrounding the major transcription initiation site (+392-406) and the translation initiation site (+601), and a block of extensive homology following the short open reading frame discussed previously (+458-478). While conservation of sequences involved in transcriptional and translational initiation is expected, the function and significance of the third conserved region are unclear. Given the proximity of this third region to the conserved small open reading frame, these sequences might also be involved in some aspect of translational regulation.

Conservation of the bcd protein

Comparison of the amino acid sequences of the bcd protein from these two species indicates that the proteins share 81% amino acid identity overall, and this homology increases to 86% if conservative amino acid substitutions are included (Figure 3). The D.pseudoobscura protein is 49 amino acids larger than the D.melanogaster bcd protein, with most of this difference in size found within the central region of the bcd protein as an expansion of opa-like repeats or other 'simple' sequences (see Figures 3 and 4). The extent of sequence divergence is not constant across the length of the open reading frame. The bcd protein, as deduced from the D.melanogaster amino acid sequence, can be divided into six discrete domains, and each of these domains exhibits a different degree of sequence conservation between the two species (summarized in Figure 4).

The homeobox domain confers the DNA binding activity of the bcd protein. As noted previously, the bcd protein binds to regulatory regions of the hunchback gene in a sequence-specific manner, an activity which is mediated by the homeobox (Driever and Nüsslein-Volhard, 1989; Driever et al., 1989a; Hanes and Brent, 1989; Struhl et al., 1989). The amino acid sequence of the homeodomain, including ~50 amino acids amino-terminal to it, is identical between the two species (Figure 4). A similar degree of conservation has been observed for the engrailed homeobox between D.melanogaster and D.virilis (Kassis et al., 1986) and for the zinc finger domains of the hunchback protein in D.melanogaster and D.virilis (Treier et al., 1989). This absolute conservation is particularly striking given the changes seen in other regions of the bcd protein, and presumably reflects strong constraints upon this primary amino acid sequence for maintaining the specificity and function of the bcd homeobox.

[Figure 4 schematic not reproduced.]

Fig. 4. Conservation of the bcd protein relative to proposed functional domains. (Top) Conservation and divergence between the D.melanogaster and D.pseudoobscura coding regions. Each nonconservative amino acid substitution is indicated by a line, deletions by a triangle facing upwards, and insertions by a triangle facing downwards. The number of amino acids included in insertions and deletions is indicated. Conservative changes were as in Treier et al. (1989). (Bottom) Various proposed functional domains of the bcd protein are shown (see text).

An acidic domain is found near the carboxyl-terminal end of the bcd protein (amino acids 345-414), and this region has been well conserved in D.pseudoobscura, especially with respect to its acidic nature (16 acidic and five basic residues in D.melanogaster versus 16 acidic and four basic residues in D.pseudoobscura; Figures 3 and 4). Driever and Nüsslein-Volhard (1989) originally proposed that this acidic domain may be required for function of bcd protein as a transcriptional activator; however, more recent studies suggest that the situation is much more complex (Driever et al., 1989a,b; Struhl et al., 1989). Truncations of the bcd protein that do not include this acidic domain are capable of activating transcription in a variety of assays. These deletion studies demonstrate that the region immediately carboxyl-terminal to the homeobox is capable of activating transcription, although the extent of this region that is required depends upon the particular assay used (see Driever et al., 1989b; Struhl et al., 1989). While the acidic domain is not absolutely required for transcriptional activation, the inclusion of this domain within the bcd protein generally results in more efficient transcriptional activation. Thus, the conservation of this domain in D.pseudoobscura presumably reflects the importance of this region and supports the hypothesis that this acidic region is required for maximal bcd activity.

Two domains within the bcd protein are enriched for a particular amino acid or pair of amino acid residues. The first such domain is the PRD repeat, characterized by the reiterated combination of the amino acids histidine and proline (see Figure 4). Although the function of such a repeat is unclear, it has been found in a number of Drosophila proteins (Frigerio et al., 1986). Six amino acid changes and one deletion have occurred between D.melanogaster and D.pseudoobscura within the 30 amino acids comprising the PRD repeat. However, none of these changes decrease the

2980

;- q pPF'PFAT

Analysis of the D.pseudoobscura bicoid gene

proline and histidine content of this region and, in fact, twoof these changes result in the replacement of a serine inD.melanogaster by a proline in D.pseudoobscura (Figure3). In general, the PRD domain has been well conservedboth in amino acid composition and size. The second domainis comprised of stretches of polyglutamine also known as'opa' repeats (Wharton et al., 1985). The opa repeats havenot only been maintained in D.pseudoobscura, but there hasbeen an expansion of these glutamine-rich regions by - 13glutamine residues (Figure 3). Variation in the size of opa-like elements appears to be a common phenomenon as it hasalso been observed for both engrailed and hunchback (Kassiset al., 1986; Treier et al., 1989).PEST domains are associated with proteins that exhibit

short half-lives and are thought to act as signals for rapidprotein degradation (Rogers et al., 1986; Rechsteiner, 1987).The bcd protein encodes one striking PEST sequencespanning amino acids 170-203 (Driever and Niisslein-Volhard, 1988a). Consistent with the presence of this PESTdomain, the bcd protein has an estimated half-life of< 30 min (Driever and Nuisslein-Volhard, 1988a). Since therate of proteolytic degradation is an important variable inestablishing a stable protein gradient, it is not surprisingthat this region, with only four nonconservative substi-tutions occurring within this domain (Figures 3 and 4),has been well conserved between D.melanogaster andD.pseudoobscura.
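The identity and similarity figures above, and the acidic/basic residue counts for the carboxyl-terminal domain, are column-by-column tallies over an alignment. The Python sketch below shows one way to compute them, assuming the two sequences are already aligned with '-' marking gaps. The conservative-substitution groups, the decision to count only K and R as basic, and all function names are illustrative assumptions; they are not the Treier et al. (1989) criteria used for Figure 4.

```python
# Sketch only: residue-level comparison of two pre-aligned protein sequences.
# ASSUMPTION: these conservative-substitution groups are illustrative and are
# not the groupings of Treier et al. (1989) used in the paper.
CONSERVATIVE_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"),
                       set("NQ"), set("ST"), set("AG")]

def conservative(a, b):
    """True if residues a and b fall in the same (assumed) group."""
    return any(a in g and b in g for g in CONSERVATIVE_GROUPS)

def identity_and_similarity(aln1, aln2):
    """Return (percent identity, percent similarity) over aligned columns.

    Gap columns count against both measures because the full alignment
    length is the denominator (one of several possible conventions)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue  # gapped column: neither identical nor similar
        if a == b:
            identical += 1
            similar += 1
        elif conservative(a, b):
            similar += 1
    n = len(aln1)
    return 100.0 * identical / n, 100.0 * similar / n

# ASSUMPTION: histidine is not counted as basic here.
ACIDIC, BASIC = set("DE"), set("KR")

def charge_counts(segment):
    """Count acidic (D, E) and basic (K, R) residues, ignoring gaps."""
    seq = segment.replace("-", "")
    return sum(c in ACIDIC for c in seq), sum(c in BASIC for c in seq)

if __name__ == "__main__":
    # Hypothetical toy alignment, not bcd sequence.
    print(identity_and_similarity("MKDE-LVS", "MRDEQIVT"))  # (50.0, 87.5)
    print(charge_counts("DEEKRAD-E"))                       # (5, 2)
```

In the toy alignment, 4 of 8 columns are identical and 3 more pair residues from the same assumed group, giving 50% identity and 87.5% similarity. Dividing by ungapped columns instead of the full alignment length would raise both values, so the convention should be stated alongside any reported percentage.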

Fig. 5. Comparison of the alternate splicing pathways between exons two and three. Alignment of exon 2, intron 2 and the beginning of exon 3 from D.melanogaster (top) and D.pseudoobscura (bottom), with the potential splicing pathways in both species indicated. While the D.melanogaster bcd gene produces two protein products which differ by five amino acids due to alternate splicing at this intron/exon junction, the D.pseudoobscura bcd gene can produce only a single protein, which corresponds to the smaller of the two D.melanogaster protein products.

Rebagliati (1989) has noted that the bcd protein contains a potential RNA recognition motif (RRM). RRMs are found in some, but not all, RNA and single-stranded nucleic acid binding proteins. The motif is typified by a highly conserved octamer flanked by additional conserved regions, which together encompass ~80 amino acids (Adam et al., 1986; Query et al., 1989). The presence of an RRM within the bcd protein would not be totally unexpected. The bcd protein product is necessary for formation of the caudal protein gradient during early embryogenesis, which, unlike the bcd protein gradient, forms from a homogeneously distributed mRNA source (Macdonald and Struhl, 1986; Mlodzik and Gehring, 1987). It has been suggested that the bcd protein may interact directly with the caudal mRNA and regulate its translation (Macdonald and Struhl, 1986). While the D.melanogaster sequence (amino acids 382-469) conforms with the RRM consensus at 30 different residues (Rebagliati, 1989), the homologous region of the D.pseudoobscura bcd protein (amino acids 428-518; Figure 3) fits the consensus at only 20 positions (Figures 3 and 4). More importantly, there has been a deletion of one amino acid within the highly conserved octamer in D.pseudoobscura (see D.pseudoobscura amino acids 480-486; Figure 3).

These data raise doubts about the functional significance of the RRM homology in D.melanogaster, although alternative explanations are possible. We would expect functional relationships between early pattern formation genes such as bcd and caudal to have been maintained during the evolutionary divergence of D.melanogaster and D.pseudoobscura. However, in the absence of direct evidence that the bcd protein is required for proper caudal regulation in D.pseudoobscura, it remains possible that functional constraints on an RRM domain within the D.melanogaster protein do not apply to the D.pseudoobscura bcd protein. Alternatively, primary amino acid sequence constraints on a bcd RRM may be different from those described for other RRMs.
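The RRM comparison counts how many residues of a segment fit a consensus and notes that a single-residue deletion inside the octamer undermines the match. The Python sketch below illustrates both ideas with an ungapped sliding match. The consensus is written as one set of allowed residues per position; the RNP-1-like octamer used here is an illustrative assumption (a commonly cited form), not the consensus applied by Rebagliati (1989), and the test sequences are hypothetical.

```python
# Sketch only: ungapped scoring of a protein against a position-specific
# consensus. ASSUMPTION: this RNP-1-like octamer is illustrative and is not
# the consensus used by Rebagliati (1989). None means "any residue".
OCTAMER = [set("KR"), set("G"), set("FY"), set("GA"),
           set("F"), set("V"), None, set("FY")]

def consensus_matches(segment, consensus):
    """Count positions where the segment fits the consensus."""
    return sum(1 for aa, allowed in zip(segment, consensus)
               if allowed is None or aa in allowed)

def best_ungapped_match(protein, consensus):
    """Slide the consensus along the protein without gaps.

    Returns (0-based start, score) of the highest-scoring window;
    ties go to the leftmost window."""
    k = len(consensus)
    if len(protein) < k:
        raise ValueError("protein is shorter than the consensus")
    starts = range(len(protein) - k + 1)
    best = max(starts,
               key=lambda i: consensus_matches(protein[i:i + k], consensus))
    return best, consensus_matches(protein[best:best + k], consensus)

if __name__ == "__main__":
    intact = "AAKGFGFVSFAA"   # hypothetical: contains a perfect octamer
    deleted = "AAKGFGVSFAA"   # same sequence, one residue removed inside it
    print(best_ungapped_match(intact, OCTAMER))   # (2, 8)
    print(best_ungapped_match(deleted, OCTAMER))  # (2, 5)
```

Because the scan allows no gaps, removing one residue shifts every downstream position out of register, so the best score drops from 8 to 5 even though most residues are still present. This is one reason a single deletion within a short, highly constrained motif is more damaging to the homology than scattered substitutions in its flanks.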

Alternate splicing adjacent to the homeobox does not occur in D.pseudoobscura

Two different protein products of 489 and 494 amino acids are produced from the D.melanogaster bcd transcription unit as a result of an alternate splicing event at the beginning of the third exon [see Figure 5 (Berleth et al., 1988; Seeger, 1989)]. Interestingly, this five amino acid difference occurs just amino-terminal to the bcd homeobox. Similar alternate splicing events just upstream of homeoboxes have also been noted for Ultrabithorax (O'Connor et al., 1988), Antennapedia (Bermingham and Scott, 1988; Stroeher et al., 1988) and labial (Mlodzik et al., 1988). Although the specific function of these different protein forms is not known, it would seem reasonable that heterogeneity near the homeobox may influence their DNA binding specificity or potential interactions with other transcriptional regulatory proteins.

Given the prevalence of alternate splicing upstream of the homeobox and its potential functional implications, it is quite striking that the generation of these two bcd protein products by use of the alternate acceptor splice site does not occur in D.pseudoobscura (Figure 5). Examination of this region indicates that the five amino acids that are unique to the larger protein product of D.melanogaster are not conserved in D.pseudoobscura, although the amino acids specified upstream of the splice donor in exon two and downstream of the second acceptor site for exon three are identical between the two species (Figure 5). Moreover, only one suitable acceptor splice site sequence is found within this intron in D.pseudoobscura, in contrast to the situation in D.melanogaster (Figure 5). While it remains uncertain whether the two D.melanogaster bcd protein products have distinct functions, it is clear that the short form homolog is sufficient for bcd function in D.pseudoobscura.
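A five amino acid difference between the two D.melanogaster products means that the two acceptor sites lie 15 nucleotides apart, so choosing one or the other adds or removes whole codons without shifting the reading frame. The Python sketch below illustrates that arithmetic together with a naive scan for AG dinucleotides as candidate acceptors. It is a simplification under stated assumptions: the sequence is hypothetical (not the bcd intron of Figure 5), and real acceptor choice also depends on the branch point and polypyrimidine tract, which the sketch ignores.

```python
def acceptor_candidates(region):
    """Return the 0-based index just after each 'AG' in a DNA string.

    That index is where the downstream exon would begin if splicing used
    this AG as the acceptor. Branch point and polypyrimidine tract
    requirements are ignored in this sketch."""
    return [i + 2 for i in range(len(region) - 1) if region[i:i + 2] == "AG"]

def extra_codons(upstream_cut, downstream_cut):
    """Codons added by using the upstream acceptor instead of the downstream one.

    Returns None when the length difference is not a multiple of three,
    i.e. the alternative acceptor would shift the reading frame."""
    extra_nt = downstream_cut - upstream_cut
    if extra_nt < 0:
        raise ValueError("upstream_cut must not lie downstream of downstream_cut")
    return extra_nt // 3 if extra_nt % 3 == 0 else None

if __name__ == "__main__":
    # Hypothetical intron 3' end followed by the start of an exon.
    region = "CTTTTCAG" + "GCATCGTCATTGCAG" + "GCGGC"
    cuts = acceptor_candidates(region)
    print(cuts)                             # [8, 23]
    print(extra_codons(cuts[0], cuts[1]))   # 5
```

In this toy region the two AGs are 15 nucleotides apart, so use of the upstream site would add five in-frame codons, the same arithmetic as the 489 versus 494 amino acid products described above. A spacing of 14 nucleotides would instead return None, signalling a frameshift.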

Comparison of the bcd mRNA localization element

A 625 nucleotide segment of the 3' untranslated region of the bcd mRNA has been defined that is both necessary and sufficient in cis for anterior localization of the bcd mRNA during oogenesis (Macdonald and Struhl, 1988). These authors noted the potential for this region to form extensive RNA secondary structures and proposed that such secondary structures might represent the cis-acting elements recognized by the trans-acting bcd mRNA localization factors (ibid.). Determining whether a predicted RNA secondary structure forms in vivo is problematic; one approach involves the use of interspecific comparisons. Compensatory changes in stem-loop structures over evolutionary time are considered to indicate that a particular RNA structure forms in vivo and is functional (see Pace et al., 1989). With these considerations in mind, we have analyzed the bcd mRNA localization element for conservation of RNA secondary structures.

To identify putative RNA secondary structures within the localization element, we used an RNA folding program that permits the examination of the thermodynamically optimal folding as well as a number of suboptimal RNA secondary structures (Zuker, 1989). Typically there are numerous foldings, which can be structurally quite distinct from each other, within 5-10% of the computed free energy of the optimal folding, and these are thus just as likely to form in vivo (ibid.). The top ten RNA secondary structures were examined for both the D.melanogaster and D.pseudoobscura localization elements, with the suboptimal foldings in both species varying by <2% relative to the minimum free energy calculated for the 'most optimal' folding in each species. When the predicted RNA secondary structures for both species were examined, one pair exhibited a striking amount of overall similarity (Figure 6). Six different elements of a predicted RNA secondary structure, spread over the entirety of the 625 nucleotide region, are highly conserved between these two species (Figure 6A and B). This conservation of secondary structure is not simply the result of sequence homology, since the 3' untranslated regions of the bcd gene from these two species exhibit only 66% identity to each other (Figure 3). In addition, when percent DNA sequence homology between the two species is plotted across the mRNA localization element, no correlation between sequence homology and the participation of these sequences in RNA secondary structures is observed (Figure 6C).

The conservation of RNA secondary structure becomes more striking when individual stem-loop structures are examined in greater detail. For instance, stem-loop structure V (see Figure 6) has maintained precisely 30 paired nucleotides in the stem and a loop length of five nucleotides despite extensive sequence divergence (Figure 7). Similar 'Y' structures are found at the end of structure I, with six paired nucleotides in the left portion and four in the right (Figure 6). This precise structure has been maintained despite major sequence changes among the paired nucleotides. Finally, stem-loop structure II also exhibits remarkable conservation of secondary structure. This region has sustained at least 43 nucleotide substitutions, of which 34 maintain the stability of the stem-loop structure via compensatory changes or G-U pairing. Note also the analogous positioning of a 'bulge' located approximately half-way up the stem on the right side in both species (Figure 6). A similar motif is also found in RNA secondary structure IV (Figure 6). Equivalent examples of secondary structure conservation despite extensive divergence of the primary sequence can be found for the other stem-loop structures as well (data not shown).

Fig. 6. Conserved RNA secondary structures are found within the bcd RNA localization element. Computer-generated RNA secondary structures for the D.melanogaster bcd RNA localization element (A) and the homologous region in D.pseudoobscura (B). Conserved stem-loop structures are labeled I-VI. Open circles are placed every 50 nucleotides, and the arrows in panel A represent the extent of deletions which disrupt the function of the RNA localization element (see text). Panel C plots the percent homology between these two species across the RNA localization element (using a window of nine nucleotides and a computer-generated sequence alignment). The extent of the various stem-loop structures is indicated linearly below the graph, with 'a' and 'b' representing the two halves of a stem region. Loop regions are shown as shaded areas.

The elements of this conserved RNA secondary structure that are recognized by the trans-acting localization factors are not clear. It is possible that several, all, or none of these stem-loop structures are involved. Although in vitro mutagenesis of individual stem-loop elements, which is now possible, will be required to define their function precisely, the deletion analyses of the bcd mRNA localization element by Macdonald and Struhl (1988) do provide some insight. A terminal deletion of 120 nucleotides at the 5' end of the localization element, or of 150 nucleotides at the 3' end, abolishes bcd mRNA localization activity. These deletions would minimally disrupt stem-loop structure I and structures V and VI, respectively (the arrows in Figure 6A indicate the extent of these deletions). Thus, it seems likely that at least structure I and structure V and/or VI are required for bcd mRNA localization.
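Two of the quantitative steps behind Figure 6, the percent identity plotted in a nine-nucleotide window and the retention of suboptimal foldings within a small percentage of the minimum free energy, reduce to short calculations. The Python sketch below illustrates both under assumptions: the alignment is supplied precomputed, gap columns count as mismatches, each value is reported at its window's start position, and the free energies are hypothetical numbers rather than output of Zuker's program. The function names are illustrative.

```python
def window_identity(aln1, aln2, window=9):
    """Percent identity for every window of `window` aligned columns.

    Gap columns ('-') count as mismatches. Returns one value per window
    start, so the list has len(aln1) - window + 1 entries."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    if not 0 < window <= len(aln1):
        raise ValueError("window must be between 1 and the alignment length")
    same = [1 if a == b and a != "-" else 0 for a, b in zip(aln1, aln2)]
    return [100.0 * sum(same[s:s + window]) / window
            for s in range(len(same) - window + 1)]

def within_fraction_of_mfe(energies, fraction=0.02):
    """Keep folding free energies (kcal/mol; more negative = more stable)
    that lie within `fraction` of the minimum free energy."""
    mfe = min(energies)
    return [e for e in energies if e - mfe <= abs(mfe) * fraction]

if __name__ == "__main__":
    # Hypothetical aligned RNA fragments: first window 100.0, second about 88.9.
    print(window_identity("ACGUACGUAC", "ACGUACGUAA"))
    # Hypothetical free energies for four foldings.
    print(within_fraction_of_mfe([-200.0, -197.0, -195.0, -180.0]))  # [-200.0, -197.0]
```

With fraction=0.02 and a minimum of -200 kcal/mol, a folding up to 4 kcal/mol less stable is retained, which mirrors the <2% criterion described above. A narrow window such as nine nucleotides makes local identity swing sharply, which is why plotting it alongside stem positions can reveal the lack of correlation reported in Figure 6C.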


[Figure 7: stem-loop structure V in D.melanogaster (Dmel, nucleotides 3688-3775) and D.pseudoobscura (Dpso, nucleotides 2972-3045)]

Fig. 7. Conservation of RNA stem-loop structure V. A detailed comparison of the sequence and predicted RNA secondary structure for this stem-loop structure is illustrated (see also Figure 6). Conserved nucleotides, as predicted by a computer-generated DNA sequence alignment, are shown in capital letters, while diverged sequences are shown in lower case. Potential base pairings are indicated. Despite extensive sequence divergence, the number of paired nucleotides included in the computer-generated RNA secondary structure is identical in both species (30), as is the number of nucleotides in the predicted loop region (five).
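The counts quoted for the localization element, 30 paired nucleotides and a five-nucleotide loop in structure V, and 34 pairing-preserving changes among 43 substitutions in structure II, are tallies over a predicted structure. The Python sketch below shows such tallies, assuming the structure is written in dot-bracket notation (a common convention, not the output format of the program used in the paper) and, as a simplification, that the two species' sequences have been mapped onto one shared structure without gaps. The toy hairpin and all function names are hypothetical.

```python
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}

def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) base pairs (0-based, i < j) from dot-bracket text."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)

def paired_and_hairpin_loop(structure):
    """Return (number of paired nucleotides, length of the first hairpin loop).

    A hairpin loop is closed by a pair (i, j) with only unpaired bases between."""
    pairs = pairs_from_dot_bracket(structure)
    loops = [j - i - 1 for i, j in pairs if set(structure[i + 1:j]) <= {"."}]
    return 2 * len(pairs), (loops[0] if loops else 0)

def can_pair(a, b):
    """Watson-Crick or G-U wobble pair."""
    return (a, b) in CANONICAL or (a, b) in WOBBLE

def stem_substitutions(seq1, seq2, structure):
    """Count substitutions at paired positions, and how many of them leave
    the pair able to form (compensatory double changes or single changes
    that still give a canonical or G-U pair)."""
    if not len(seq1) == len(seq2) == len(structure):
        raise ValueError("sequences and structure must have equal length")
    total = preserving = 0
    for i, j in pairs_from_dot_bracket(structure):
        changed = (seq1[i] != seq2[i]) + (seq1[j] != seq2[j])
        total += changed
        if changed and can_pair(seq2[i], seq2[j]):
            preserving += changed
    return total, preserving

if __name__ == "__main__":
    st = "((((....))))"   # hypothetical four-pair hairpin
    s1 = "GGCAGAAAUGCC"   # species 1
    s2 = "GGUGGCAACGAC"   # species 2: one compensatory pair, one G-U, one disrupted
    print(paired_and_hairpin_loop(st))      # (8, 4)
    print(stem_substitutions(s1, s2, st))   # (4, 3)
```

In the toy hairpin, the A-U to G-C change counts as two pairing-preserving substitutions, the C-G to U-G change as one, and the C to A change that breaks a pair as one non-preserving substitution. Real stems in the two species differ by insertions and deletions, so a practical analysis would first map each species' own predicted pairs rather than assume a shared structure.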

The D.pseudoobscura bcd gene provides partial function in D.melanogaster

To determine whether the D.pseudoobscura bcd gene is functional in D.melanogaster, we cloned the D.pseudoobscura bcd gene into a P-element transformation vector and then inserted this construct into the D.melanogaster genome by germline transformation. The D.pseudoobscura rescue construct includes ~2 kb of DNA sequence both 5' and 3' of the transcription unit (see Materials and methods). This construct should include all of the necessary cis-acting regulatory sequences, since a similar D.melanogaster bcd construct, which includes less 5' and 3' flanking sequence (see Materials and methods), completely rescues the female sterility associated with a strong hypomorphic bcd allele (data not shown).

Multiple transformant lines of the D.pseudoobscura bcd construct were recovered and tested for their ability to rescue the female sterility associated with the bcdRD1 mutation, a strong hypomorphic allele (Seeger, 1989). While none of the transformant lines was able to restore female fertility when crossed into the bcdRD1 background, all of the examined lines did exhibit partial rescue of the bcd- phenotype. Embryos from homozygous bcdRD1 females typically display a complete deletion of head and thoracic segments, a disruption of anterior abdominal segments, and a duplication of telson derivatives at the anterior end (Figure 8B). The D.pseudoobscura bcd transformants exhibit restoration of normal thoracic and abdominal segmentation along with a suppression of the mirror-image telson duplication (Figure 8C). Head segmentation continues to be disrupted in these embryos, although some partial rescue is evident from the production of an incomplete and disorganized cephalopharyngeal apparatus (Figure 8C). In addition, fusions or disruptions of the fourth and fifth abdominal segments are often observed in these partially rescued embryos.

This partial phenotypic rescue can be detected at the blastoderm stage by monitoring the distribution of hunchback protein. The hunchback protein is normally expressed in two domains at the late blastoderm stage, an anterior domain from 55% to 100% egg length and a posterior domain from 10% to 20% egg length (see Figure 8D; Tautz, 1988). These domains correlate to a large extent with the regions that are affected in hunchback- embryos, which exhibit a deletion of the labial and thoracic segments along with a partial deletion of the seventh and eighth abdominal segments (Bender et al., 1987; Lehmann and Nüsslein-Volhard, 1987). Proper expression of the anterior hunchback domain is dependent upon bcd protein, such that embryos from bcd- females do not form an anterior hunchback domain but instead form a duplicated posterior domain at the anterior end (Figure 8E; Tautz, 1988). Consistent with the restoration of normal thoracic segmentation in the D.pseudoobscura bcd transformants, the anterior domain of hunchback expression is generated in embryos from P[D.pseudoobscura bcd+]; bcdRD1 females (Figure 8F).

We have examined four different D.pseudoobscura bcd transformant lines in detail, and all four lines exhibit the same expressivity, or degree of phenotypic rescue. However, these four lines do differ in the penetrance of the rescued phenotype, with the frequency of rescued embryos varying from 4 to 24%. For example, transformant line 1 produces 24% partially rescued embryos when crossed into a bcdRD1 background, while line 2 generates only 4% partially rescued embryos. This variation among transformant lines presumably reflects position effects of the surrounding insertion site sequences on the bcd promoter, leading to differences in the amount of D.pseudoobscura bcd product produced. Thus, a certain threshold amount of D.pseudoobscura bcd product is required to initiate the phenotypic rescue that is observed. We propose that proper activation of anterior hunchback expression represents this threshold event, since the phenotypic rescue observed in the D.pseudoobscura bcd transformants can be explained almost entirely by the restoration of anterior hunchback expression. Previous studies indicate that on/off regulation of hunchback transcription occurs over a very limited range of bcd protein concentration or activity (Driever and Nüsslein-Volhard, 1989; Driever et al., 1989a; Struhl et al., 1989). This on/off regulation of the hunchback promoter by slightly different bcd protein concentrations could explain the observed variability in penetrance among the transformant lines.

Fig. 8. Partial phenotypic rescue of embryos from bcd- females by the D.pseudoobscura bcd protein. (A-C) Cuticle preparations of embryos from wild-type females (A), bcdRD1 females (B), and P[D.pseudoobscura bcd+]; bcdRD1 females (C). Note that the D.pseudoobscura bcd protein restores normal thoracic segmentation but does not rescue the more anterior defects seen in embryos from bcd- females. The arrow in panel C points to a fusion of the fourth and fifth abdominal segments, which is common in this genotype. a, abdominal segments; cp, cephalopharyngeal skeleton; t, thoracic segments; te, telson. (D-F) Distribution of hunchback protein in embryos from wild-type females (D), bcdRD1 females (E), and P[D.pseudoobscura bcd+]; bcdRD1 females (F). a, anterior hunchback domain; p, posterior hunchback domain. (G-I) Distribution of D.pseudoobscura bcd mRNA in D.pseudoobscura embryos (G), in a cleavage-stage D.melanogaster transformant embryo (H), and in a late blastoderm-stage D.melanogaster transformant embryo (I). While the D.pseudoobscura bcd mRNA is localized to the anterior end in D.pseudoobscura embryos, it is not localized when expressed in D.melanogaster. The embryo in panel H shows even distribution of the D.pseudoobscura mRNA except for some slight exclusion from the anterior tip (this anterior exclusion is not a consistent characteristic, but this embryo was selected to best illustrate the even distribution). This even distribution of signal was significantly greater than in controls that did not contain the D.pseudoobscura bcd transformant gene. Late blastoderm embryos (I) show ectopic expression of the D.pseudoobscura bcd mRNA over the dorsal surface in a zerknüllt-like pattern (see text). Embryos are not necessarily to scale and, in appropriate cases, anterior is left and dorsal up.

The phenotype of the partially rescued embryos is very reminiscent of the swallow and exuperantia mutant phenotypes, mutations which disrupt bcd mRNA localization (Frohnhöfer and Nüsslein-Volhard, 1987; Berleth et al., 1988; Stephenson et al., 1988). This suggested that the lack of complete rescue by the D.pseudoobscura bcd gene could arise by two very different mechanisms: (i) failure to localize the D.pseudoobscura bcd mRNA properly when expressed in D.melanogaster, or (ii) partial function of the D.pseudoobscura bcd protein in D.melanogaster. To distinguish between these possibilities, we examined the spatial distribution of the D.pseudoobscura bcd mRNA in the D.melanogaster transformants. This was done in whole-mount embryos according to the procedure of Tautz and Pfeifle (1989), using digoxigenin-labeled single-strand DNA probes that hybridized specifically to either the D.melanogaster or the D.pseudoobscura bcd mRNA (see Materials and methods). While both the D.melanogaster and D.pseudoobscura bcd mRNAs were localized to the anterior end of early embryos in their respective species (Figure 8G), we observed that the D.pseudoobscura bcd mRNA was not properly localized when expressed in D.melanogaster (Figure 8H). Thus, the lack of complete rescue by the D.pseudoobscura bcd gene is due at least in part to the absence of RNA localization. Given the lack of proper RNA localization, it remains possible that the D.pseudoobscura bcd protein cannot specify identities anterior to the hunchback domain. Experiments that allow for proper localization of the D.pseudoobscura bcd mRNA in D.melanogaster will be required to assess fully the functional activity of the D.pseudoobscura bcd protein within the D.melanogaster embryo.

Fig. 9. Sequences included in the D.melanogaster and D.pseudoobscura bcd transformation constructs. Note that the D.pseudoobscura construct includes more material both 5' and 3' of the transcribed sequences than was incorporated into the D.melanogaster construct. B, BamHI; E, EcoRI; S, SalI; X, XbaI.

In addition, we have observed ectopic expression of the D.pseudoobscura bcd mRNA in a zerknüllt-like pattern during the late blastoderm stage in our different transformant lines (Figure 8I). Since this expression pattern does not occur in D.pseudoobscura and has been observed in all of the transformant lines examined (data not shown), it presumably reflects some novel activity of a zerknüllt enhancer element, which was included within the transformation construct, on the D.pseudoobscura bcd promoter (see Figures 9 and 1). Similar ectopic influences of a zerknüllt enhancer element have been observed previously (Doyle et al., 1989). Surprisingly, this ectopic expression does not have any apparent phenotypic consequences, since no phenotypic rescue of the bcd- phenotype is observed when the P[D.pseudoobscura bcd+] gene is inherited paternally (data not shown).

The failure to localize the D.pseudoobscura bcd mRNA properly in D.melanogaster is consistent with several other observations that we have made of these transformant lines. First, the fusions and disruptions of abdominal segments four and five that are occasionally observed in embryos from P[D.pseudoobscura bcd+]; bcdRD1 females are also observed in embryos from P[D.pseudoobscura bcd+]; bcd+ females. This result is consistent with ectopic activity of the D.pseudoobscura bcd protein. Secondly, we have noted that the penetrance of rescue from our transformant lines has decreased dramatically over a period of 8 months. Interestingly, this decrease in penetrance appears to be correlated with a decrease in the accumulation of the maternally contributed D.pseudoobscura bcd mRNA, although the ectopic zerknüllt-like expression continues to remain strong. We believe this reflects the accumulation of modifiers in the background of these transformant lines that suppress the slightly deleterious effects of ectopic expression of the D.pseudoobscura bcd protein. Since these transformant lines were initially maintained without difficulty, these deleterious effects must be minor.

Conclusions

In an attempt to learn more about the bcd gene, we have cloned and sequenced the homologous gene from D.pseudoobscura. The 46 million years of divergence between D.melanogaster and D.pseudoobscura have allowed ample time for unconstrained DNA sequences to diverge completely (Beverley and Wilson, 1984; see also Blackman and Meselson, 1986; Henikoff and Eghtedarzadeh, 1987; Wilde and Akam, 1987; Aguadé, 1988; Colot et al., 1988). Thus, by identifying conserved regions within the bcd protein and mRNA, we can identify and learn about functional domains within these molecules. It must be stressed, however, that evolutionary comparisons are only suggestive, and future experimental analysis will be required to corroborate these findings. Despite this limitation, interspecific comparisons represent a powerful tool.

It is apparent that different regions of the bcd protein have differentially accumulated changes and that these regions coincide with various identified domains (summarized in Figure 4). The homeodomain is identical at the amino acid level in both species, indicating that the specific amino acid sequence of this domain is critical to its function. Additionally, both the PRD repeat and the acidic domain at the carboxyl terminus of the bcd protein are substantially conserved, again suggesting an important role for these regions. In contrast, the central portion of the bcd protein has undergone substantial divergence in both sequence and length. These observations are consistent with the hypothesis that the central region of the bcd protein serves simply as a hinge between other functional domains (i.e. the homeobox and acidic domains) and thus has few constraints on its sequence and length.

At the protein level, the most striking difference between these two species is the absence of one bcd protein variant in D.pseudoobscura. While D.melanogaster produces two bcd proteins that differ by five amino acids just amino-terminal to the homeodomain, the D.pseudoobscura gene produces only the smaller bcd protein. This difference is particularly striking since the amino acid sequence surrounding this region has otherwise been absolutely conserved (Figure 4). Given the proximity of this variation to the homeobox, these differences might be expected to have profound influences on the function or specificity of the homeodomain. However, no functional differences have yet been described for the two D.melanogaster bcd protein forms. The larger protein form, which is not produced by D.pseudoobscura, is apparently not necessary for the transcriptional activation of the hunchback gene, since the D.pseudoobscura bcd gene is capable of proper hunchback regulation in D.melanogaster. It remains possible that the larger bcd protein is required in D.melanogaster for activating genes that specify more anterior identities than hunchback.

We have identified a short conserved open reading frame within the 5' leader sequences of the bcd gene that is most likely involved in translational regulation of the bcd protein, perhaps in a manner analogous to the yeast GCN4 mRNA (Mueller and Hinnebusch, 1986). Translational regulation, rather than simple diffusion, may play some role in the formation of the bcd protein gradient during early embryogenesis. In addition, it is notable that bcd mRNA is present during much of oogenesis, while the protein product is not detectable until early embryogenesis. Therefore, some mechanism must be masking this message from the translational machinery. Additional experiments directed at this small open reading frame will be required to define its function precisely.
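A short ORF in a 5' leader, like the conserved one described above, is the kind of feature a simple scan can flag. The Python sketch below lists every ATG-initiated reading frame that begins in a leader and reports where its first in-frame stop codon falls. It assumes a DNA-alphabet leader written 5' to 3', ATG-only initiation and the standard stop codons; the example leader is hypothetical, and whether any such ORF is translated or regulatory cannot be inferred from sequence alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def upstream_orfs(leader):
    """Find ATG-initiated ORFs that begin within a 5' leader (DNA, 5' to 3').

    Returns (start, end) pairs, 0-based and half-open, where `end` includes
    the stop codon. If no in-frame stop occurs before the end of the leader,
    `end` is None (such an ORF would read into the downstream sequence)."""
    leader = leader.upper()
    orfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        end = None
        # Step through the codons that follow the ATG, in frame.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        orfs.append((start, end))
    return orfs

if __name__ == "__main__":
    leader = "CCATGAAACCCTGAGGATGCC"  # hypothetical leader, not the bcd sequence
    print(upstream_orfs(leader))      # [(2, 14), (16, None)]
```

The first hit is a four-codon ORF that terminates inside the leader, the general arrangement of the short upstream ORFs in the yeast GCN4 leader cited above; the second has no stop within the leader and would continue into whatever sequence follows. Applying the same scan to aligned leaders from both species is one way to check that an upstream ORF, rather than just its surrounding sequence, has been conserved.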


The potential for at least six RNA secondary structures within the bcd mRNA localization element has been conserved between D.melanogaster and D.pseudoobscura. The conservation of these secondary structures, along with the multitude of compensatory substitutions, suggests quite strongly that these secondary structures form and function in vivo. The specific role of these secondary structures in bcd mRNA localization is less clear. While the deletion experiments of Macdonald and Struhl (1988) would implicate stem-loop structures I and V and/or VI in the bcd localization process, it is not possible to assess the role of the other RNA secondary structures (II-IV) from existing data. It is reasonable to propose that all these structures function in bcd mRNA localization, although the possibility that they also play a role in some other, as yet undefined, process cannot be excluded.

Given the extensive conservation of RNA secondary structure within the bcd mRNA localization element, it was surprising to find that D.pseudoobscura bcd mRNA is not localized when expressed in D.melanogaster. What this means for the function of these RNA secondary structures is unclear. One interpretation, as mentioned previously, is that these RNA secondary structures are not required for RNA localization and that a different structure or sequence is required for this process. We favor the alternative explanation that these general RNA secondary structures are important for the localization process, but that subtle changes have occurred between D.melanogaster and D.pseudoobscura such that the D.pseudoobscura structure is not efficiently recognized and localized by the D.melanogaster trans-acting factors. Experiments in which chimeric RNA localization elements are created and tested in vivo will be required to distinguish between these possibilities. What remains certain is that these RNA secondary structures function in a process that natural selection has maintained.

Materials and methods

Isolation of the bcd region from D.pseudoobscura

The D.pseudoobscura ANT-C was cloned from a D.pseudoobscura genomic DNA library (in the EMBL4 λ phage vector, kindly provided by C.Langley) by entering at multiple points using standard low-stringency hybridization conditions and then extending these entry points by chromosomal walking techniques. The relative location of ANT-C member genes was determined by hybridization of cDNA clones from the D.melanogaster gene products to Southern blots of phage from the D.pseudoobscura walk that were digested with various restriction enzymes. Southern blotting, subcloning and other molecular biological techniques were performed using standard procedures (Maniatis et al., 1982).

DNA sequencing and analysis

DNA sequencing was performed by the Sanger dideoxy method (Sanger et al., 1977) using 35S-labeled dATP and sequencing-grade Klenow (Boehringer Mannheim). Subclones of the D.pseudoobscura bcd genomic region were generated using existing restriction enzyme sites and the single-strand M13 phage vectors (mp18/mp19). Additional sequence was generated using custom-made 17mer oligonucleotide primers (Molecular Biology Institute, Indiana University). DNA sequence was analyzed using software from International Biotechnologies, Inc., the FASTP program of Lipman and Pearson (1985), and the University of Wisconsin Genetics Computer Group (UWGCG) package. RNA secondary structures were predicted using the programs of the UWGCG package and Zuker (1989).
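The RNA secondary structures here were predicted with the UWGCG programs and the free-energy minimization method of Zuker (1989). The Python sketch below is not Zuker's algorithm. It implements the simpler Nussinov base-pair maximization and is included only to illustrate the dynamic-programming idea that such folding programs build on. Two settings are assumptions: it counts Watson-Crick and G-U pairs, and it requires a minimum hairpin loop of three bases. It ignores stacking and loop energies, so its output should not be read as a free-energy prediction.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximize nested base pairs in an RNA sequence (Nussinov-style DP).

    Returns (number_of_pairs, dot_bracket). A pair (k, j) must enclose at
    least `min_loop` unpaired bases. No energy terms are used, so this is
    only a toy stand-in for free-energy minimization.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input as RNA
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):  # shorter intervals are filled first
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + N[k + 1][j - 1])
            N[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain a pair
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = N[i][k - 1] if k > i else 0
                if N[i][j] == left + 1 + N[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return N[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Energy-based methods such as Zuker's replace the simple pair count with nearest-neighbor free energies. The recursion over sub-intervals is the shared idea.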

Construction of rescue constructs and germline transformation

Genomic fragments for germline transformation were cloned into the cosPer P-element transformation vector (kindly provided by V.Pirrotta), which uses the white gene for detection of transformants. The w; Δ2-3 strain was used as recipient for injections and as the source of P-element transposase (Robertson et al., 1988). Injections were performed according to Spradling and Rubin (1982) with the modifications of Robertson et al. (1988). Isolated germline transformants were mobilized to new chromosomal locations using the Δ2-3 chromosome according to Robertson et al. (1988). The D.melanogaster bcd transformation construct included a 5.8 kb EcoRI-BamHI fragment (see Figure 9) from the bcd genomic region (Berleth et al., 1988; Seeger, 1989). This construct completely rescues the bcd- phenotype. It contains ~3 kb less 5' sequence than the P-element construct used by Berleth et al. (1988), and so further defines the sequences required for D.melanogaster bcd expression. The D.pseudoobscura bcd transformation construct contains the 7.2 kb BamHI fragment (see Figures 9 and 1). In both transformation constructs the bcd gene was oriented such that promoter sequences were adjacent to the P-element ends.

Antibody staining and whole-mount in situ hybridization

Antibody staining of embryos was as described in Seeger et al. (1988). The affinity-purified rabbit anti-hunchback antiserum was kindly provided by M.Bender. In situ hybridizations to whole embryos used digoxigenin-labeled single-strand DNA probes (Genius kit, Boehringer Mannheim) according to the procedure of Tautz and Pfeifle (1989), with modifications suggested by N.Patel. The D.melanogaster bcd single-strand antisense probe was generated from 565 bp of the 3' untranslated region [sequences 4295-4860, see Berleth et al. (1988)]. The corresponding D.pseudoobscura probe comprised 1036 bp of 3' untranslated sequence (sequences 2440-3476, see Figure 3). Neither probe detectably cross-hybridizes to the bcd mRNA of the other species in this whole-mount technique.
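The probe boundaries in the in situ hybridization protocol are given as sequence coordinates, and each stated length equals end minus start (4860 - 4295 = 565; 3476 - 2440 = 1036). The Python sketch below shows how a single-strand antisense probe relates to such coordinates: the probe is the reverse complement of the sense-strand region. It assumes 0-based, half-open coordinates on a sense-strand DNA string. That convention matches the end-minus-start arithmetic but is not stated in the paper. `antisense_probe` and the toy sequence are hypothetical.

```python
# Map each DNA base to its Watson-Crick complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_probe(sense, start, end):
    """Return the reverse complement of sense[start:end].

    Hypothetical helper: coordinates are assumed 0-based and half-open,
    so the probe length is end - start.
    """
    if not 0 <= start < end <= len(sense):
        raise ValueError("coordinates out of range")
    return sense[start:end].translate(COMPLEMENT)[::-1]


print(antisense_probe("AAGGTTCCA", 2, 7))  # GAACC
print(len(antisense_probe("A" * 5000, 4295, 4860)))  # 565
```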

Acknowledgements

We are grateful to L.Haffley for assistance with the DNA sequencing, S.Chouinard for isolating the germline transformants, Dr C.Langley and Dr D.Cavener for the D.pseudoobscura genomic library, and Dr M.Bender for supplying the hunchback antiserum. We thank Dr K.Matthews for critical comments on the manuscript. M.A.S. would also like to thank Dr T.Blumenthal, Dr J.Bonner, Dr M.Muskavitch and Dr S.Strome for comments on earlier versions of this manuscript. This work was supported by a National Institutes of Health Predoctoral Fellowship (GM07757) to M.A.S. and an NIH grant (GM24299) to T.C.K.

References

Adam,S., Nakagawa,T., Swanson,M., Woodruff,T. and Dreyfuss,G. (1986) Mol. Cell. Biol., 6, 2932-2943.
Aguade,M. (1988) Mol. Biol. Evol., 5, 433-441.
Bender,M., Turner,F.R. and Kaufman,T.C. (1987) Dev. Biol., 119, 418-432.
Berleth,T., Burri,M., Thoma,G., Bopp,D., Richstein,S., Frigerio,G., Noll,M. and Nusslein-Volhard,C. (1988) EMBO J., 7, 1749-1756.
Bermingham,J.R.,Jr and Scott,M.P. (1988) EMBO J., 7, 3211-3222.
Beverley,S.M. and Wilson,A.C. (1984) J. Mol. Evol., 21, 1-13.
Blackman,R.K. and Meselson,M. (1986) J. Mol. Biol., 188, 499-515.
Cavener,D. (1987) Nucleic Acids Res., 15, 1353-1361.
Colot,H.V., Hall,J.C. and Rosbash,M. (1988) EMBO J., 7, 3929-3937.
Doyle,H.J., Kraut,R. and Levine,M. (1989) Genes Dev., 3, 1518-1533.
Driever,W. and Nusslein-Volhard,C. (1988a) Cell, 54, 83-93.
Driever,W. and Nusslein-Volhard,C. (1988b) Cell, 54, 95-104.
Driever,W. and Nusslein-Volhard,C. (1989) Nature, 337, 138-143.
Driever,W., Thoma,G. and Nusslein-Volhard,C. (1989a) Nature, 340, 363-367.
Driever,W., Ma,J., Nusslein-Volhard,C. and Ptashne,M. (1989b) Nature, 342, 149-154.
Frigerio,G., Burri,M., Bopp,D., Baumgartner,S. and Noll,M. (1986) Cell, 47, 735-746.
Frohnhofer,H.G. and Nusslein-Volhard,C. (1986) Nature, 324, 120-125.
Frohnhofer,H.G. and Nusslein-Volhard,C. (1987) Genes Dev., 1, 880-890.
Frohnhofer,H.G., Lehmann,R. and Nusslein-Volhard,C. (1986) J. Embryol. Exp. Morphol., 97, 169-179.
Hanes,S.D. and Brent,R. (1989) Cell, 57, 1275-1283.
Henikoff,S. and Eghtedarzadeh,M.K. (1987) Genetics, 117, 711-725.
Kalthoff,K. (1979) Symp. Soc. Dev. Biol., 37, 97-126.
Kassis,J., Poole,S., Wright,D. and O'Farrell,P. (1986) EMBO J., 5, 3583-3589.
Lehmann,R. and Nusslein-Volhard,C. (1987) Dev. Biol., 119, 402-417.
Lipman,D.J. and Pearson,W.R. (1985) Science, 227, 1435-1441.
Macdonald,P.M. and Struhl,G. (1986) Nature, 324, 537-545.
Macdonald,P.M. and Struhl,G. (1988) Nature, 336, 595-598.
Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Mlodzik,M. and Gehring,W. (1987) Development, 101, 421-435.
Mlodzik,M., Fjose,A. and Gehring,W. (1988) EMBO J., 7, 2569-2578.
Mueller,P.P. and Hinnebusch,A.G. (1986) Cell, 45, 201-207.
Nusslein-Volhard,C., Frohnhofer,H.G. and Lehmann,R. (1987) Science, 238, 1675-1681.
O'Connor,M.B., Binari,R., Perkins,L.A. and Bender,W. (1988) EMBO J., 7, 435-445.
Pace,N.R., Smith,D.K., Olsen,G.J. and James,B.D. (1989) Gene, 82, 65-75.
Query,C., Bentley,R. and Keene,J. (1989) Cell, 57, 89-101.
Rebagliati,M. (1989) Cell, 58, 231-232.
Rechsteiner,M. (1987) Biochem. Biophys. Res. Commun., 143, 194-198.
Robertson,H.M., Preston,C.R., Phillis,R.W., Johnson-Schlitz,D.M., Benz,W.K. and Engels,W.R. (1988) Genetics, 118, 461-470.
Rogers,S., Wells,R. and Rechsteiner,M. (1986) Science, 234, 364-368.
Sander,K. (1959) Wilhelm Roux Arch. EntwMech. Org., 151, 430-497.
Sander,K. (1976) Adv. Insect Physiol., 12, 125-238.
Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
Seeger,M.A. (1989) Molecular genetic analysis of the zerknullt-bicoid interval of the Antennapedia Complex in Drosophila melanogaster. Ph.D. dissertation, Indiana University, Bloomington, Indiana.
Seeger,M.A., Haffley,L. and Kaufman,T.C. (1988) Cell, 55, 589-600.
Spradling,A.C. and Rubin,G.M. (1982) Science, 218, 341-347.
Stephenson,E.C., Chao,Y.C. and Fackenthal,J.D. (1988) Genes Dev., 2, 1655-1665.
Stroeher,V.L., Gaiser,C. and Garber,R.L. (1988) Mol. Cell. Biol., 8, 4143-4154.
Struhl,G., Struhl,K. and Macdonald,P.M. (1989) Cell, 57, 1259-1273.
Tautz,D. (1988) Nature, 332, 281-284.
Tautz,D. and Pfeifle,C. (1989) Chromosoma, 98, 81-85.
Treier,M., Pfeifle,C. and Tautz,D. (1989) EMBO J., 8, 1517-1525.
Wharton,K.A., Johansen,K.M., Xu,T. and Artavanis-Tsakonas,S. (1985) Cell, 43, 567-581.
Wilde,C.D. and Akam,M. (1987) EMBO J., 6, 1393-1401.
Yajima,H. (1960) J. Embryol. Exp. Morphol., 8, 198-215.
Zuker,M. (1989) Science, 244, 48-52.

Received on May 17, 1990
