
Chapter 37: RNA Synthesis, Processing, & Modification

Daryl K. Granner, MD, & P. Anthony Weil, PhD

BIOMEDICAL IMPORTANCE

The synthesis of an RNA molecule from DNA is a complex process involving one of the group of RNA polymerase enzymes and a number of associated proteins. The general steps required to synthesize the primary transcript are initiation, elongation, and termination. Most is known about initiation. A number of DNA regions (generally located upstream from the initiation site) and protein factors that bind to these sequences to regulate the initiation of transcription have been identified. Certain RNAs, mRNAs in particular, have very different life spans in a cell. It is important to understand the basic principles of messenger RNA synthesis and metabolism, for modulation of this process results in altered rates of protein synthesis and thus a variety of metabolic changes. This is how all organisms adapt to changes of environment. It is also how differentiated cell structures and functions are established and maintained. The RNA molecules synthesized in mammalian cells are made as precursor molecules that have to be processed into mature, active RNA. Errors or changes in synthesis, processing, and splicing of mRNA transcripts are a cause of disease.

RNA EXISTS IN FOUR MAJOR CLASSES

All eukaryotic cells have four major classes of RNA: ribosomal RNA (rRNA), messenger RNA (mRNA), transfer RNA (tRNA), and small nuclear RNA (snRNA). The first three are involved in protein synthesis, and snRNA is involved in mRNA splicing. As shown in Table 37–1, these various classes of RNA are different in their diversity, stability, and abundance in cells.

RNA IS SYNTHESIZED FROM A DNA TEMPLATE BY AN RNA POLYMERASE

The processes of DNA and RNA synthesis are similar in that they involve (1) the general steps of initiation, elongation, and termination with 5′ to 3′ polarity; (2) large, multicomponent initiation complexes; and (3) adherence to Watson-Crick base-pairing rules. These processes differ in several important ways, including the following: (1) ribonucleotides are used in RNA synthesis rather than deoxyribonucleotides; (2) U replaces T as the complementary base pair for A in RNA; (3) a primer is not involved in RNA synthesis; (4) only a very small portion of the genome is transcribed or copied into RNA, whereas the entire genome must be copied during DNA replication; and (5) there is no proofreading function during RNA transcription.

The process of synthesizing RNA from a DNA template has been characterized best in prokaryotes. Although in mammalian cells the regulation of RNA synthesis and the processing of the RNA transcripts are different from those in prokaryotes, the process of RNA synthesis per se is quite similar in these two classes of organisms. Therefore, the description of RNA synthesis in prokaryotes, where it is better understood, is applicable to eukaryotes even though the enzymes involved and the regulatory signals are different.

The Template Strand of DNA Is Transcribed

The sequence of ribonucleotides in an RNA molecule is complementary to the sequence of deoxyribonucleotides in one strand of the double-stranded DNA molecule (Figure 35–8). The strand that is transcribed or copied into an RNA molecule is referred to as the template strand of the DNA. The other DNA strand is frequently referred to as the coding strand of that gene. It is called this because, with the exception of T for U changes, it corresponds exactly to the sequence of the primary transcript, which encodes the protein product of the gene. In the case of a double-stranded DNA molecule containing many genes, the template strand for each gene will not necessarily be the same strand of the DNA double helix (Figure 37–1). Thus, a given strand of a double-stranded DNA molecule will serve as the template strand for some genes and the coding strand of other genes. Note that the nucleotide sequence of an RNA transcript will be the same (except for U replacing T) as that of the coding strand. The information in the template strand is read out in the 3′ to 5′ direction.
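The relationship among template strand, coding strand, and transcript can be illustrated with a minimal Python sketch. The function names and the nine-base example sequence are hypothetical, chosen only for illustration; the pairing rules are the Watson-Crick rules stated in the text.

```python
# Base that RNA polymerase adds opposite each DNA template base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base found opposite each DNA base in the partner DNA strand.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5: str) -> str:
    """Return the coding strand (written 5'->3') paired with the template."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3to5.upper())


template = "TACGGATTC"            # hypothetical template strand, 3'->5'
rna = transcribe(template)        # transcript, 5'->3'
coding = coding_strand(template)  # coding strand, 5'->3'

# The transcript matches the coding strand except that U replaces T.
same_except_u = rna == coding.replace("T", "U")
```

Because the template is written 3′ to 5′ and read in that direction, no reversal step is needed: the transcript comes out 5′ to 3′, antiparallel to the template.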


Copyright © 2003 by The McGraw-Hill Companies Retrieved from: www.knovel.com


Table 37–1. Classes of eukaryotic RNA.

RNA                     Types                    Abundance       Stability
Ribosomal (rRNA)        28S, 18S, 5.8S, 5S       80% of total    Very stable
Messenger (mRNA)        ~10⁵ different species   2–5% of total   Unstable to very stable
Transfer (tRNA)         ~60 different species    ~15% of total   Very stable
Small nuclear (snRNA)   ~30 different species    ≤ 1% of total   Very stable

[Figure 37–2 labels: DNA duplex (5′ to 3′ and 3′ to 5′); RNAP complex with subunits β′, β, α, α, and σ; direction of transcription; RNA transcript from 5′ PPP to 3′ OH.]

Figure 37–2. RNA polymerase (RNAP) catalyzes the polymerization of ribonucleotides into an RNA sequence that is complementary to the template strand of the gene. The RNA transcript has the same polarity (5′ to 3′) as the coding strand but contains U rather than T. E coli RNAP consists of a core complex of two α subunits and two β subunits (β and β′). The holoenzyme contains the σ subunit bound to the α₂ββ′ core assembly. The ω subunit is not shown. The transcription “bubble” is an approximately 20-bp area of melted DNA, and the entire complex covers 30–75 bp, depending on the conformation of RNAP.

DNA-Dependent RNA Polymerase Initiates Transcription at a Distinct Site, the Promoter

DNA-dependent RNA polymerase is the enzyme responsible for the polymerization of ribonucleotides into a sequence complementary to the template strand of the gene (see Figures 37–2 and 37–3). The enzyme attaches at a specific site, the promoter, on the template strand. This is followed by initiation of RNA synthesis at the starting point, and the process continues until a termination sequence is reached (Figure 37–3). A transcription unit is defined as that region of DNA that includes the signals for transcription initiation, elongation, and termination. The RNA product, which is synthesized in the 5′ to 3′ direction, is the primary transcript. In prokaryotes, this can represent the product of several contiguous genes; in mammalian cells, it usually represents the product of a single gene. The 5′ termini of the primary RNA transcript and the mature cytoplasmic RNA are identical. Thus, the starting point of transcription corresponds to the 5′ nucleotide of the mRNA. This is designated position +1, as is the corresponding nucleotide in the DNA. The numbers increase as the sequence proceeds downstream. This convention makes it easy to locate particular regions, such as intron and exon boundaries. The nucleotide in the promoter adjacent to the transcription initiation site is designated −1, and these negative numbers increase as the sequence proceeds upstream, away from the initiation site. This provides a conventional way of defining the location of regulatory elements in the promoter.

Figure 37–1. This figure illustrates that genes can be transcribed off both strands of DNA. The arrowheads indicate the direction of transcription (polarity). Note that the template strand is always read in the 3′ to 5′ direction. The opposite strand is called the coding strand because it is identical (except for T for U changes) to the mRNA transcript (the primary transcript in eukaryotic cells) that encodes the protein product of the gene.

[Figure 37–3 panel labels: (1) template binding; (2) chain initiation; (3) promoter clearance; (4) chain elongation; (5) chain termination and RNAP release.]

Figure 37–3. The transcription cycle in bacteria. Bacterial RNA transcription is described in four steps: (1) Template binding: RNA polymerase (RNAP) binds to DNA, locates a promoter (P), and melts the two DNA strands to form a preinitiation complex (PIC). (2) Chain initiation: RNAP holoenzyme (core + one of multiple sigma factors) catalyzes the coupling of the first base (usually ATP or GTP) to a second ribonucleoside triphosphate to form a dinucleotide. (3) Chain elongation: Successive residues are added to the 3′-OH terminus of the nascent RNA molecule. (4) Chain termination and release: The completed RNA chain and RNAP are released from the template. The RNAP holoenzyme re-forms, finds a promoter, and the cycle is repeated.

Table 37–2. Nomenclature and properties of mammalian nuclear DNA-dependent RNA polymerases.

Form of RNA Polymerase   Sensitivity to α-Amanitin   Major Products
I (A)                    Insensitive                 rRNA
II (B)                   High sensitivity            mRNA
III (C)                  Intermediate sensitivity    tRNA/5S rRNA
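The +1/−1 numbering convention for positions around the transcription start site can be written out as a short Python sketch. The function name and the 0-based indexing of the coding strand are assumptions made for illustration; the only rule taken from the text is that the start site is +1, its upstream neighbor is −1, and there is therefore no position 0.

```python
def position_label(index: int, start_index: int) -> str:
    """Label a 0-based coding-strand index relative to the start site.

    start_index is the 0-based index of the first transcribed nucleotide.
    """
    offset = index - start_index
    if offset >= 0:
        # The start site and everything downstream: +1, +2, +3, ...
        return f"+{offset + 1}"
    # Upstream positions: -1 is adjacent to the start site.
    return str(offset)


# Hypothetical example: the start site sits at index 35 of a sequence.
labels = [position_label(i, 35) for i in (0, 25, 34, 35, 36)]
```

In this example the nucleotides at indices 0 and 25 receive the labels "-35" and "-10" (written with an ASCII hyphen), which is how upstream regulatory elements are conventionally located.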

The primary transcripts generated by RNA polymerase II, one of three distinct nuclear DNA-dependent RNA polymerases in eukaryotes, are promptly capped by 7-methylguanosine triphosphate caps (Figure 35–10) that persist and eventually appear on the 5′ end of mature cytoplasmic mRNA. These caps are necessary for the subsequent processing of the primary transcript to mRNA, for the translation of the mRNA, and for protection of the mRNA against exonucleolytic attack.

Bacterial DNA-Dependent RNA Polymerase Is a Multisubunit Enzyme

The DNA-dependent RNA polymerase (RNAP) of the bacterium Escherichia coli exists as an approximately 400 kDa core complex consisting of two identical α subunits, similar but not identical β and β′ subunits, and an ω subunit. Beta is thought to be the catalytic subunit (Figure 37–2). RNAP, a metalloenzyme, also contains two zinc molecules. The core RNA polymerase associates with a specific protein factor (the sigma [σ] factor) that helps the core enzyme recognize and bind to the specific deoxynucleotide sequence of the promoter region (Figure 37–5) to form the preinitiation complex (PIC). Sigma factors have a dual role in the process of promoter recognition; σ association with core RNA polymerase decreases its affinity for nonpromoter DNA while simultaneously increasing holoenzyme affinity for promoter DNA. Bacteria contain multiple σ factors, each of which acts as a regulatory protein that modifies the promoter recognition specificity of the RNA polymerase. The appearance of different σ factors can be correlated temporally with various programs of gene expression in prokaryotic systems such as bacteriophage development, sporulation, and the response to heat shock.

Mammalian Cells Possess Three Distinct Nuclear DNA-Dependent RNA Polymerases

The properties of mammalian polymerases are described in Table 37–2. Each of these DNA-dependent RNA polymerases is responsible for transcription of different sets of genes. The sizes of the RNA polymerases range from MW 500,000 to MW 600,000. These enzymes are much more complex than prokaryotic RNA polymerases. They all have two large subunits and a number of smaller subunits, as many as 14 in the case of RNA pol III. The eukaryotic RNA polymerases have extensive amino acid homologies with prokaryotic RNA polymerases. This homology has been shown recently to extend to the level of three-dimensional structures. The functions of each of the subunits are not yet fully understood. Many could have regulatory functions, such as serving to assist the polymerase in the recognition of specific sequences like promoters and termination signals.

One peptide toxin from the mushroom Amanita phalloides, α-amanitin, is a specific differential inhibitor of the eukaryotic nuclear DNA-dependent RNA polymerases and as such has proved to be a powerful research tool (Table 37–2). α-Amanitin blocks the translocation of RNA polymerase during transcription.

RNA SYNTHESIS IS A CYCLICAL PROCESS & INVOLVES INITIATION, ELONGATION, & TERMINATION

The process of RNA synthesis in bacteria, depicted in Figure 37–3, involves first the binding of the RNA holopolymerase molecule to the template at the promoter site to form a PIC. Binding is followed by a conformational change of the RNAP, and the first nucleotide (almost always a purine) then associates with the initiation site on the β subunit of the enzyme. In the presence of the appropriate nucleotide, the RNAP catalyzes the formation of a phosphodiester bond, and the nascent chain is now attached to the polymerization site on the β subunit of RNAP. (The analogy to the A and P sites on the ribosome should be noted; see Figure 38–9.)

Initiation of formation of the RNA molecule at its 5′ end then follows, while elongation of the RNA molecule from the 5′ to its 3′ end continues cyclically, antiparallel to its template. The enzyme polymerizes the ribonucleotides in a specific sequence dictated by the template strand and interpreted by Watson-Crick base-pairing rules. Pyrophosphate is released in the polymerization reaction. This pyrophosphate (PPi) is rapidly degraded to 2 mol of inorganic phosphate (Pi) by ubiquitous pyrophosphatases, thereby conferring irreversibility on the overall synthetic reaction. In both prokaryotes and eukaryotes, a purine ribonucleotide is usually the first to be polymerized into the RNA molecule. As with eukaryotes, the 5′ triphosphate of this first nucleotide is maintained in prokaryotic mRNA.

Figure 37–4. Electron photomicrograph of multiple copies of amphibian ribosomal RNA genes in the process of being transcribed. The magnification is about 6000 ×. Note that the length of the transcripts increases as the RNA polymerase molecules progress along the individual ribosomal RNA genes, from the transcription start sites (filled circles) to the transcription termination sites (open circles). RNA polymerase I (not visualized here) is at the base of the nascent rRNA transcripts. Thus, the proximal end of the transcribed gene has short transcripts attached to it, while much longer transcripts are attached to the distal end of the gene. The arrows indicate the direction (5′ to 3′) of transcription. (Reproduced with permission, from Miller OL Jr, Beatty BR: Portrait of a gene. J Cell Physiol 1969;74[Suppl 1]:225.)

As the elongation complex containing the core RNA polymerase progresses along the DNA molecule, DNA unwinding must occur in order to provide access for the appropriate base pairing to the nucleotides of the template strand. The extent of this transcription bubble (ie, DNA unwinding) is constant throughout transcription and has been estimated to be about 20 base pairs per polymerase molecule. Thus, it appears that the size of the unwound DNA region is dictated by the polymerase and is independent of the DNA sequence in the complex. This suggests that RNA polymerase has associated with it an “unwindase” activity that opens the DNA helix. The fact that the DNA double helix must unwind and the strands part at least transiently for transcription implies some disruption of the nucleosome structure of eukaryotic cells. Topoisomerase both precedes and follows the progressing RNAP to prevent the formation of superhelical complexes.

Termination of the synthesis of the RNA molecule in bacteria is signaled by a sequence in the template strand of the DNA molecule, a signal that is recognized by a termination protein, the rho (ρ) factor. Rho is an ATP-dependent RNA-stimulated helicase that disrupts the nascent RNA-DNA complex. After termination of synthesis of the RNA molecule, the enzyme separates from the DNA template and probably dissociates to free core enzyme and free σ factor. With the assistance of another σ factor, the core enzyme then recognizes a promoter at which the synthesis of a new RNA molecule commences. In eukaryotic cells, termination is less well defined. It appears to be somehow linked both to initiation and to addition of the 3′ polyA tail of mRNA and could involve destabilization of the RNA-DNA complex at a region of A–U base pairs. More than one RNA polymerase molecule may transcribe the same template strand of a gene simultaneously, but the process is phased and spaced in such a way that at any one moment each is transcribing a different portion of the DNA sequence. An electron micrograph of extremely active RNA synthesis is shown in Figure 37–4.

THE FIDELITY & FREQUENCY OF TRANSCRIPTION ARE CONTROLLED BY PROTEINS BOUND TO CERTAIN DNA SEQUENCES

The DNA sequence analysis of specific genes has allowed the recognition of a number of sequences important in gene transcription. From the large number of bacterial genes studied it is possible to construct consensus models of transcription initiation and termination signals.


The question, “How does RNAP find the correct site to initiate transcription?” is not trivial when the complexity of the genome is considered. E coli has 4 × 10³ transcription initiation sites in 4 × 10⁶ base pairs (bp) of DNA. The situation is even more complex in humans, where perhaps 10⁵ transcription initiation sites are distributed throughout 3 × 10⁹ bp of DNA. RNAP can bind to many regions of DNA, but it scans the DNA sequence, at a rate of ≥ 10³ bp/s, until it recognizes a specific region of DNA to which it binds with higher affinity. This region is called the promoter, and it is the association of RNAP with the promoter that ensures accurate initiation of transcription. The promoter recognition-utilization process is the target for regulation in both bacteria and humans.
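The scale of this search problem can be made concrete with a little arithmetic, sketched here in Python. The calculation assumes that initiation sites are spread evenly through the genome, which is a simplification introduced only for illustration; the genome sizes, site counts, and scan rate are the figures quoted in the text.

```python
def mean_spacing_bp(genome_bp: float, n_sites: float) -> float:
    """Average distance between initiation sites if they were evenly spaced."""
    return genome_bp / n_sites


e_coli_spacing = mean_spacing_bp(4e6, 4e3)  # bp per initiation site in E coli
human_spacing = mean_spacing_bp(3e9, 1e5)   # bp per initiation site in humans

SCAN_RATE_BP_PER_S = 1e3  # lower bound on the scan rate quoted in the text

# Because the rate is a lower bound, these times are upper bounds.
e_coli_scan_s = e_coli_spacing / SCAN_RATE_BP_PER_S
human_scan_s = human_spacing / SCAN_RATE_BP_PER_S
```

Under these assumptions, one initiation site occurs on average every 1000 bp in E coli and every 30,000 bp in humans, so a promoter is a far sparser target in the human genome.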

Bacterial Promoters Are Relatively Simple

Bacterial promoters are approximately 40 nucleotides (40 bp or four turns of the DNA double helix) in length, a region small enough to be covered by an E coli RNA holopolymerase molecule. In this consensus promoter region are two short, conserved sequence elements. Approximately 35 bp upstream of the transcription start site there is a consensus sequence of eight nucleotide pairs (5′-TGTTGACA-3′) to which the RNAP binds to form the so-called closed complex. More proximal to the transcription start site, about ten nucleotides upstream, is a six-nucleotide-pair A+T-rich sequence (5′-TATAAT-3′). These conserved sequence elements comprising the promoter are shown schematically in Figure 37–5. The latter sequence has a low melting temperature because of its deficiency of GC nucleotide pairs. Thus, the TATA box is thought to ease the dissociation between the two DNA strands so that RNA polymerase bound to the promoter region can have access to the nucleotide sequence of its immediately downstream template strand. Once this process occurs, the combination of RNA polymerase plus promoter is called the open complex. Other bacteria have slightly different consensus sequences in their promoters, but all generally have two components to the promoter; these tend to be in the same position relative to the transcription start site, and in all cases the sequences between the boxes have no similarity but still provide critical spacing functions facilitating recognition of the −35 and −10 sequences by RNA polymerase holoenzyme. Within a bacterial cell, different sets of genes are often coordinately regulated. One important way that this is accomplished is through the fact that these co-regulated genes share unique −35 and −10 promoter sequences. These unique promoters are recognized by different σ factors bound to core RNA polymerase.

[Figure 37–5 labels: transcription unit comprising the promoter (−35 region, TGTTGACA; −10 region, TATAAT) and the transcribed region; transcription start site (+1); termination signals; coding strand (5′) and template strand (3′) of the DNA; 5′ and 3′ flanking sequences; RNA transcript from 5′ PPP to 3′ OH.]

Figure 37–5. Bacterial promoters, such as that from E coli shown here, share two regions of highly conserved nucleotide sequence. These regions are located 35 and 10 bp upstream (in the 5′ direction of the coding strand) from the start site of transcription, which is indicated as +1. By convention, all nucleotides upstream of the transcription initiation site (at +1) are numbered in a negative sense and are referred to as 5′-flanking sequences. Also by convention, the DNA regulatory sequence elements (TATA box, etc) are described in the 5′ to 3′ direction and as being on the coding strand. These elements function only in double-stranded DNA, however. Note that the transcript produced from this transcription unit has the same polarity or “sense” (ie, 5′ to 3′ orientation) as the coding strand. Termination cis-elements reside at the end of the transcription unit (see Figure 37–6 for more detail). By convention the sequences downstream of the site at which transcription termination occurs are termed 3′-flanking sequences.
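Recognition of a bacterial promoter by its two conserved elements and the spacing between them can be sketched in Python as a simple sequence scan. This is an illustrative sketch only: the function names, the mismatch tolerance, and the allowed spacer lengths are hypothetical parameters not given in the text. Only the two consensus sequences come from the chapter.

```python
MINUS_35 = "TGTTGACA"  # eight-nucleotide consensus near -35 (from the text)
MINUS_10 = "TATAAT"    # six-nucleotide A+T-rich consensus near -10 (from the text)


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=2, spacer_range=(15, 19)):
    """Return (start of -35 element, start of -10 element, total mismatches).

    max_mismatch and spacer_range are hypothetical tolerances chosen for
    illustration; they are not values taken from the chapter.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS_35) + 1):
        m35 = mismatches(seq[i:i + len(MINUS_35)], MINUS_35)
        if m35 > max_mismatch:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + len(MINUS_35) + spacer
            box = seq[j:j + len(MINUS_10)]
            if len(box) < len(MINUS_10):
                break  # ran off the end of the sequence
            m10 = mismatches(box, MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, j, m35 + m10))
    return hits


# Hypothetical test sequence: perfect elements separated by a 17-base spacer.
demo = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GGGGGGG"
demo_hits = find_promoters(demo)
```

Allowing a few mismatches reflects the point that these are consensus sequences, and requiring a bounded spacer reflects the spacing function of the sequence between the two boxes.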

Rho-dependent transcription termination signals in E coli also appear to have a distinct consensus sequence, as shown in Figure 37–6. The conserved consensus sequence, which is about 40 nucleotide pairs in length, can be seen to contain a hyphenated or interrupted inverted repeat followed by a series of AT base pairs. As transcription proceeds through the hyphenated, inverted repeat, the generated transcript can form the intramolecular hairpin structure, also depicted in Figure 37–6.

Transcription continues into the AT region, and with the aid of the ρ termination protein the RNA polymerase stops, dissociates from the DNA template, and releases the nascent transcript.
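The structural signature described above, an interrupted inverted repeat followed by a run of AT base pairs, can be searched for with a short Python sketch. The stem length, loop lengths, and run length are hypothetical parameters, and the example sequence is invented for illustration. The sketch finds only the sequence pattern and does not model the ρ factor.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_terminators(coding, stem=6, loop_range=(3, 8), min_t_run=6):
    """Find inverted repeats followed by a T run on the coding strand.

    Returns (start index of the left arm, loop length) for each hit.
    All three parameters are hypothetical values chosen for illustration.
    """
    coding = coding.upper()
    n = len(coding)
    hits = []
    for i in range(n - stem + 1):
        # The right arm must pair with the left arm in the transcript.
        right_arm = reverse_complement(coding[i:i + stem])
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop  # start of the right arm
            k = j + stem         # first base after the right arm
            if k + min_t_run > n:
                break
            if coding[j:k] == right_arm and coding[k:k + min_t_run] == "T" * min_t_run:
                hits.append((i, loop))
    return hits


# Hypothetical sequence: 6-bp stem, 4-base loop, then eight T residues.
demo = "AA" + "GCCCGC" + "TTCG" + reverse_complement("GCCCGC") + "TTTTTTTT" + "AA"
demo_hits = find_terminators(demo)
```

The two arms of the repeat are reverse complements of each other, which is what allows the transcribed RNA to fold back on itself into the hairpin; the T run on the coding strand corresponds to the U run at the 3′ end of the transcript.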

Eukaryotic Promoters Are More Complex

It is clear that the signals in DNA which control transcription in eukaryotic cells are of several types. Two types of sequence elements are promoter-proximal. One of these defines where transcription is to commence along the DNA, and the other contributes to the mechanisms that control how frequently this event is to occur. For example, in the thymidine kinase gene of the herpes simplex virus, which utilizes transcription factors of its mammalian host for gene expression, there is a single unique transcription start site, and accurate transcription from this start site depends upon a nucleotide sequence located 32 nucleotides upstream from the start site (ie, at −32) (Figure 37–7). This region has the sequence of TATAAAAG and bears remarkable similarity to the functionally related TATA box that is located about 10 bp upstream from the prokaryotic mRNA start site (Figure 37–5). Mutation or inactivation of the TATA box markedly reduces transcription of this and many other genes that contain this consensus cis element (see Figures 37–7, 37–8). Most mammalian genes have a TATA box that is usually located 25–30 bp upstream from the transcription start site. The consensus sequence for a TATA box is TATAAA, though numerous variations have been characterized. The TATA box is bound by the 34 kDa TATA binding protein (TBP), which in turn binds several other proteins called TBP-associated factors (TAFs). This complex of TBP and TAFs is referred to as TFIID. Binding of TFIID to the TATA box sequence is thought to represent the first step in the formation of the transcription complex on the promoter.

A small number of genes lack a TATA box. In such instances, two additional cis elements, an initiator sequence (Inr) and the so-called downstream promoter element (DPE), direct RNA polymerase II to the promoter and in so doing provide basal transcription starting from the correct site. The Inr element spans the start site (from −3 to +5) and consists of the general consensus sequence TCA₊₁ G/T T T/C, which is similar to the initiation site sequence per se. (A₊₁ indicates the first nucleotide transcribed.) The proteins that bind to Inr in order to direct pol II binding include TFIID. Promoters that have both a TATA box and an Inr may be stronger than those that have just one of these elements. The DPE has the consensus sequence A/G G A/T C G T G and is localized about 25 bp downstream of the +1 start site. Like the Inr, DPE sequences are also bound by the TAF subunits of TFIID. In a survey of over 200 eukaryotic genes, roughly 30% contained a TATA box and Inr, 25% contained Inr and DPE, 15% contained all three elements, while ~30% contained just the Inr.

[Figure 37–6 panels: top, coding strand (5′) and template strand (3′) of the DNA showing the inverted repeat followed by a run of AT base pairs, with the direction of transcription marked; bottom, the RNA transcript (5′) folded into a hairpin and ending in a run of U residues (3′).]

Figure 37–6. The predominant bacterial transcription termination signal contains an inverted, hyphenated repeat (the two boxed areas) followed by a stretch of AT base pairs (top figure). The inverted repeat, when transcribed into RNA, can generate the secondary structure in the RNA transcript shown at the bottom of the figure. Formation of this RNA hairpin causes RNA polymerase to pause and subsequently the ρ termination factor interacts with the paused polymerase and somehow induces chain termination.

[Figure 37–7 labels: promoter-proximal upstream elements (GC, CAAT, GC) bound by Sp1, CTF, and Sp1; promoter with the TATA box at −25 bound by TFIID; tk coding region beginning at +1.]

Figure 37–7. Transcription elements and binding factors in the herpes simplex virus thymidine kinase (tk) gene. DNA-dependent RNA polymerase II binds to the region of the TATA box (which is bound by transcription factor TFIID) to form a multicomponent preinitiation complex capable of initiating transcription at a single nucleotide (+1). The frequency of this event is increased by the presence of upstream cis-acting elements (the GC and CAAT boxes). These elements bind trans-acting transcription factors, in this example Sp1 and CTF (also called C/EBP, NF1, NFY). These cis elements can function independently of orientation (arrows).

[Figure 37–8 labels: regulated expression, governed by distal regulatory elements (enhancer (+) and repressor (−) elements, other regulatory elements); “basal” expression, governed by promoter-proximal elements (GC/CAAT, etc) and the promoter (TATA, Inr, DPE); coding region beginning at +1.]

Figure 37–8. Schematic diagram showing the transcription control regions in a hypothetical class II (mRNA-producing) eukaryotic gene. Such a gene can be divided into its coding and regulatory regions, as defined by the transcription start site (arrow; +1). The coding region contains the DNA sequence that is transcribed into mRNA, which is ultimately translated into protein. The regulatory region consists of two classes of elements. One class is responsible for ensuring basal expression. These elements generally have two components. The proximal component, generally the TATA box or the Inr or DPE elements, directs RNA polymerase II to the correct site (fidelity). In TATA-less promoters, an initiator (Inr) element that spans the initiation site (+1) may direct the polymerase to this site. Another component, the upstream elements, specifies the frequency of initiation. Among the best studied of these is the CAAT box, but several other elements (Sp1, NF1, AP1, etc) may be used in various genes. A second class of regulatory cis-acting elements is responsible for regulated expression. This class consists of elements that enhance or repress expression and of others that mediate the response to various signals, including hormones, heat shock, heavy metals, and chemicals. Tissue-specific expression also involves specific sequences of this sort. The orientation dependence of all the elements is indicated by the arrows within the boxes. For example, the proximal element (the TATA box) must be in the 5′ to 3′ orientation. The upstream elements work best in the 5′ to 3′ orientation, but some of them can be reversed. The locations of some elements are not fixed with respect to the transcription start site. Indeed, some elements responsible for regulated expression can be located either interspersed with the upstream elements, or they can be located downstream from the start site.
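The Inr and DPE consensus sequences contain alternative bases at some positions, which maps naturally onto regular-expression character classes. The Python sketch below checks which core promoter elements occur at plausible positions around a known start site. The function name, the search windows, and the example sequence are hypothetical; the windows in particular are loose readings of "25–30 bp upstream" and "about 25 bp downstream" rather than values stated precisely in the text.

```python
import re

TATA = re.compile(r"TATAAA")         # TATA box consensus
INR = re.compile(r"TCA[GT]T[TC]")    # Inr: T C A(+1) G/T T T/C
DPE = re.compile(r"[AG]G[AT]CGTG")   # DPE: A/G G A/T C G T G


def core_elements(seq, start_index, tata_window=(25, 30), dpe_window=(23, 32)):
    """List the core promoter elements found around a transcription start site.

    start_index is the 0-based index of the +1 nucleotide. Both windows are
    hypothetical tolerances, given as distances in bp from that index.
    """
    seq = seq.upper()
    found = []
    # TATA box beginning 25 to 30 bp upstream of +1.
    if any(start_index - k >= 0 and TATA.match(seq, start_index - k)
           for k in range(tata_window[0], tata_window[1] + 1)):
        found.append("TATA")
    # The A of the Inr is +1, so the match begins two bases upstream.
    if start_index >= 2 and INR.match(seq, start_index - 2):
        found.append("Inr")
    # DPE beginning roughly 25 bp downstream of +1.
    if any(DPE.match(seq, start_index + k)
           for k in range(dpe_window[0], dpe_window[1] + 1)):
        found.append("DPE")
    return found


# Hypothetical promoter carrying all three elements, with +1 at index 40.
demo = "G" * 12 + "TATAAA" + "G" * 20 + "TCAGTC" + "G" * 21 + "AGACGTG" + "GGG"
demo_elements = core_elements(demo, 40)
```

Classifying promoters this way would reproduce the kind of tally quoted in the survey above, where genes are grouped by which combination of TATA, Inr, and DPE they contain.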

Sequences farther upstream from the start site determine how frequently the transcription event occurs. Mutations in these regions reduce the frequency of transcriptional starts tenfold to twentyfold. Typical of these DNA elements are the GC and CAAT boxes, so named because of the DNA sequences involved. As illustrated in Figure 37–7, each of these boxes binds a protein: Sp1 in the case of the GC box, and CTF (or C/EBP, NF1, NFY) in the case of the CAAT box; both bind through their distinct DNA binding domains (DBDs). The frequency of transcription initiation is a consequence of these protein-DNA interactions and of complex interactions between particular domains of the transcription factors (the so-called activation domains, or ADs, which are distinct from the DBDs) and the rest of the transcription machinery (RNA polymerase II and the basal factors TFIIA, B, D, E, F). (See below and Figures 37–9 and 37–10.) The protein-DNA interaction at the TATA box involving RNA polymerase II and other components of the basal transcription machinery ensures the fidelity of initiation.

Together, then, the promoter and promoter-proximal cis-active upstream elements confer fidelity and frequency of initiation upon a gene. The TATA box has a particularly rigid requirement for both position and orientation. Single-base changes in any of these cis elements have dramatic effects on function by reducing the binding affinity of the cognate trans factors (either TFIID/TBP or Sp1, CTF, and similar factors). The spacing of these elements with respect to the transcription start site can also be critical. This is particularly true for the TATA box, Inr, and DPE.

[Figure 37–9 diagram labels: pol II with TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH assembled at the TATA box; DNA scale from −50 to +50.]

Figure 37–9. The eukaryotic basal transcription complex. Formation of the basal transcription complex begins when TFIID binds to the TATA box. It directs the assembly of several other components by protein-DNA and protein-protein interactions. The entire complex spans DNA from position −30 to +30 relative to the initiation site (+1, marked by bent arrow). The atomic level, x-ray-derived structures of RNA polymerase II alone and of TBP bound to TATA promoter DNA in the presence of either TFIIB or TFIIA have all been solved at 3 Å resolution. The structures of TFIID complexes have been determined by electron microscopy at 30 Å resolution. Thus, the molecular structures of the transcription machinery are beginning to be elucidated. Much of this structural information is consistent with the models presented here.

[Figure 37–10 diagram labels: panels A and B; TBP, TAF, CTF, basal complex, CAAT and TATA boxes; rate of transcription shown as nil or +.]

Figure 37–10. Two models for assembly of the active transcription complex and for how activators and coactivators might enhance transcription. TBP, a subunit of TFIID, is shown as a small oval; a large oval represents the components of the basal transcription complex illustrated in Figure 37–9 (ie, RNAP II and TFIIA, TFIIB, TFIIE, TFIIF, and TFIIH). Panel A: The basal transcription complex is assembled on the promoter after the TBP subunit of TFIID is bound to the TATA box. Several TAFs (coactivators) are associated with TBP. In this example, a transcription activator, CTF, is shown bound to the CAAT box, forming a loop complex by interacting with a TAF bound to TBP. Panel B: The recruitment model. The transcription activator CTF binds to the CAAT box and interacts with a coactivator (TAF in this case). This allows for an interaction with the preformed TBP-basal transcription complex. TBP can now bind to the TATA box, and the assembled complex is fully active.

A third class of sequence elements can either increase or decrease the rate of transcription initiation of eukaryotic genes. These elements are called either enhancers or repressors (or silencers), depending on which effect they have. They have been found in a variety of locations both upstream and downstream of the transcription start site and even within the transcribed portions of some genes. In contrast to proximal and upstream promoter elements, enhancers and silencers can exert their effects when located hundreds or even thousands of bases away from transcription units located on the same chromosome. Surprisingly, enhancers and silencers can function in an orientation-independent fashion. Literally hundreds of these elements have been described. In some cases, the sequence requirements for binding are rigidly constrained; in others, considerable sequence variation is allowed. Some sequences bind only a single protein, but the majority bind several different proteins. Similarly, a single protein can bind to more than one element.

Hormone response elements (for steroids, T3, retinoic acid, peptides, etc) act as, or in conjunction with, enhancers or silencers (Chapter 43). Other processes that enhance or silence gene expression, such as the response to heat shock, heavy metals (Cd2+ and Zn2+), and some toxic chemicals (eg, dioxin), are mediated through specific regulatory elements. Tissue-specific expression of genes (eg, the albumin gene in liver, the hemoglobin gene in reticulocytes) is also mediated by specific DNA sequences.
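The contrast between orientation-dependent elements (such as the TATA box) and orientation-independent ones (enhancers and silencers) has a practical consequence for sequence searches: an element that works in either orientation must be looked for on both strands. A minimal sketch follows, assuming exact matching only; the function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_element(seq, element, either_orientation):
    """List (position, strand) hits of element in seq.

    Positions are 0-based on the given (top) strand. When
    either_orientation is True, the reverse complement is searched too,
    and those hits are reported with strand "-".
    """
    seq, element = seq.upper(), element.upper()
    queries = [(element, "+")]
    rc = reverse_complement(element)
    if either_orientation and rc != element:
        queries.append((rc, "-"))
    hits = []
    for query, strand in queries:
        start = seq.find(query)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(query, start + 1)
    return sorted(hits)


toy = "AACCGCCCTTTATAAAGG"                    # invented sequence
print(find_element(toy, "TATAAA", False))     # [(10, '+')]
print(find_element(toy, "GGGCGG", False))     # []
print(find_element(toy, "GGGCGG", True))      # [(2, '-')]
```

The GC box sequence is used here only as a convenient test string; whether a given element tolerates inversion is an experimental question, as the text notes for the upstream elements.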

Specific Signals Regulate Transcription Termination

The signals for the termination of transcription by eukaryotic RNA polymerase II are very poorly understood. However, it appears that the termination signals exist far downstream of the coding sequence of eukaryotic genes. For example, the transcription termination signal for mouse β-globin occurs at several positions 1000–2000 bases beyond the site at which the poly(A) tail will eventually be added. Little is known about the termination process or whether specific termination factors similar to the bacterial ρ factor are involved. However, it is known that the mRNA 3′ terminal is generated posttranscriptionally, is somehow coupled to events or structures formed at the time and site of initiation, depends on a special structure in one of the subunits of RNA polymerase II (the CTD; see below), and appears to involve at least two steps. After RNA polymerase II has traversed the region of the transcription unit encoding the 3′ end of the transcript, an RNA endonuclease cleaves the primary transcript at a position about 15 bases 3′ of the consensus sequence AAUAAA that serves in eukaryotic transcripts as a cleavage signal. Finally, this newly formed 3′ terminal is polyadenylated in the nucleoplasm, as described below.
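The two-step 3′ end formation described above (cleavage downstream of AAUAAA, then polyadenylation) can be sketched as string operations. This is an illustration only: the fixed 15-base offset is taken from the text's "about 15 bases", the default tail length of 200 is an assumed upper bound, and the function names are hypothetical.

```python
def predict_cleavage_sites(transcript, offset=15):
    """Return 0-based indices of the last base kept after cleavage.

    For each AAUAAA in the transcript, the predicted cut lies `offset`
    bases 3' of the final A of the signal. Signals too close to the end
    of the transcript to allow this are skipped.
    """
    signal = "AAUAAA"
    rna = transcript.upper().replace("T", "U")
    sites = []
    start = rna.find(signal)
    while start != -1:
        cut = start + len(signal) - 1 + offset
        if cut < len(rna):
            sites.append(cut)
        start = rna.find(signal, start + 1)
    return sites


def cleave_and_polyadenylate(transcript, tail_length=200, offset=15):
    """Cut at the first predicted site and append a poly(A) tail."""
    sites = predict_cleavage_sites(transcript, offset)
    if not sites:
        return None
    rna = transcript.upper().replace("T", "U")
    return rna[: sites[0] + 1] + "A" * tail_length


toy = "GGG" + "AAUAAA" + "C" * 20 + "GGGG"    # invented transcript
print(predict_cleavage_sites(toy))            # [23]
print(cleave_and_polyadenylate(toy, tail_length=5))
```

In a cell the cut position varies around the quoted distance, so a fixed offset should be read as a teaching simplification.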

THE EUKARYOTIC TRANSCRIPTION COMPLEX

A complex apparatus consisting of as many as 50 unique proteins provides accurate and regulatable transcription of eukaryotic genes. The RNA polymerase enzymes (pol I, pol II, and pol III for class I, II, and III genes, respectively) transcribe information contained in the template strand of DNA into RNA. These polymerases must recognize a specific site in the promoter in order to initiate transcription at the proper nucleotide. In contrast to the situation in prokaryotes, eukaryotic RNA polymerases alone are not able to discriminate between promoter sequences and other regions of DNA; thus, other proteins known as general transcription factors, or GTFs, facilitate promoter-specific binding of these enzymes and formation of the preinitiation complex (PIC). This combination of components can catalyze basal (unregulated) transcription in vitro. Another set of proteins, the coactivators, helps regulate the rate of transcription initiation by interacting with transcription activators that bind to upstream DNA elements (see below).

Formation of the Basal Transcription Complex

In bacteria, a σ factor–polymerase complex selectively binds to DNA in the promoter, forming the PIC. The situation is more complex in eukaryotic genes. Class II genes, those transcribed by pol II to make mRNA, are described as an example. In class II genes, the function of σ factors is assumed by a number of proteins. Basal transcription requires, in addition to pol II, a number of GTFs called TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. These GTFs serve to promote RNA polymerase II transcription on essentially all genes. Some of these GTFs are composed of multiple subunits. TFIID, which binds to the TATA box promoter element, is the only one of these factors capable of binding to specific sequences of DNA. As described above, TFIID consists of TATA binding protein (TBP) and 14 TBP-associated factors (TAFs).

TBP binds to the TATA box in the minor groove of DNA (most transcription factors bind in the major groove) and causes an approximately 100-degree bend or kink of the DNA helix. This bending is thought to facilitate the interaction of TBP-associated factors with other components of the transcription initiation complex and possibly with factors bound to upstream elements. Although defined as a component of class II gene promoters, TBP, by virtue of its association with distinct, polymerase-specific sets of TAFs, is also an important component of class I and class III initiation complexes even if they do not contain TATA boxes.

The binding of TBP marks a specific promoter for transcription and is the only step in the assembly process that is entirely dependent on specific, high-affinity protein-DNA interaction. Of several subsequent in vitro steps, the first is the binding of TFIIB to the TFIID-promoter complex. This results in a stable ternary complex which is then more precisely located and more tightly bound at the transcription initiation site. This complex then attracts and tethers the pol II-TFIIF complex to the promoter. TFIIF is structurally and functionally similar to the bacterial σ factor and is required for the delivery of pol II to the promoter. TFIIA binds to this assembly and may allow the complex to respond to activators, perhaps by the displacement of repressors. Addition of TFIIE and TFIIH is the final step in the assembly of the PIC. TFIIE appears to join the complex with pol II-TFIIF, and TFIIH is then recruited. Each of these binding events extends the size of the complex so that finally about 60 bp (from −30 to +30 relative to +1, the nucleotide from which transcription commences) are covered (Figure 37–9). The PIC is now complete and capable of basal transcription initiated from the correct nucleotide. In genes that lack a TATA box, the same factors, including TBP, are required. In such cases, an Inr or the DPEs (see Figure 37–8) position the complex for accurate initiation of transcription.

Phosphorylation Activates Pol II

Eukaryotic pol II consists of 12 subunits. The two largest subunits, both about 200 kDa, are homologous to the bacterial β and β′ subunits. In addition to the increased number of subunits, eukaryotic pol II differs from its prokaryotic counterpart in that it has a series of heptad repeats with consensus sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser at the carboxyl terminal of the largest pol II subunit. This carboxyl terminal repeat domain (CTD) has 26 repeated units in brewers’ yeast and 52 units in mammalian cells. The CTD is both a substrate for several kinases, including the kinase component of TFIIH, and a binding site for a wide array of proteins. The CTD has been shown to interact with RNA processing enzymes; such binding may be involved with RNA polyadenylation. The association of the factors with the CTD of RNA polymerase II (and other components of the basal machinery) somehow serves to couple initiation with mRNA 3′ end formation. Pol II is activated when phosphorylated on the Ser and Thr residues and displays reduced activity when the CTD is dephosphorylated. Pol II lacking the CTD tail is incapable of activating transcription, which underscores the importance of this domain.
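The repeat counts quoted above give a sense of how many Ser and Thr residues the CTD offers to kinases. The arithmetic can be sketched as follows, assuming every repeat matches the consensus heptad exactly (an idealization) and treating every Ser and Thr as a candidate site; the function name is hypothetical.

```python
# Consensus heptad from the text.
HEPTAD = ["Tyr", "Ser", "Pro", "Thr", "Ser", "Pro", "Ser"]


def ctd_summary(n_repeats):
    """Count residues and Ser/Thr positions in an idealized CTD."""
    per_repeat = sum(1 for aa in HEPTAD if aa in ("Ser", "Thr"))
    return {
        "residues": n_repeats * len(HEPTAD),
        "ser_thr_sites": n_repeats * per_repeat,
    }


print(ctd_summary(26))  # yeast:  {'residues': 182, 'ser_thr_sites': 104}
print(ctd_summary(52))  # mammal: {'residues': 364, 'ser_thr_sites': 208}
```

These totals are upper bounds on candidate positions under the stated idealization, not measurements of how many residues are phosphorylated at any one time.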


Pol II associates with other proteins to form a holoenzyme complex. In yeast, at least nine gene products, called Srb (for suppressor of RNA polymerase B), bind to the CTD. The Srb proteins, or mediators, as they are also called, are essential for pol II transcription, though their exact role in this process has not been defined. Related proteins comprising even more complex forms of RNA polymerase II have been described in human cells.

The Role of Transcription Activators & Coactivators

TFIID was originally considered to be a single protein. However, several pieces of evidence led to the important discovery that TFIID is actually a complex consisting of TBP and the 14 TAFs. The first evidence that TFIID was more complex than just the TBP molecule came from the observation that TBP binds to a 10-bp segment of DNA, immediately over the TATA box of the gene, whereas native holo-TFIID covers a 35-bp or larger region (Figure 37–9). Second, TBP has a molecular mass of 20–40 kDa (depending on the species), whereas the TFIID complex has a mass of about 1000 kDa. Finally, and perhaps most importantly, TBP supports basal transcription but not the augmented transcription provided by certain activators, eg, Sp1 bound to the GC box. TFIID, on the other hand, supports both basal and enhanced transcription by Sp1, Oct1, AP1, CTF, ATF, etc (Table 37–3). The TAFs are essential for this activator-enhanced transcription. It is not yet clear whether there are one or several forms of TFIID that might differ slightly in their complement of TAFs. It is conceivable that different combinations of TAFs with TBP, or with one of several recently discovered TBP-like factors (TLFs), may bind to different promoters, and recent reports suggest that this may account for selective activation noted in various promoters and for the different strengths of certain promoters. TAFs, since they are required for the action of activators, are often called coactivators. There are thus three classes of transcription factors involved in the regulation of class II genes: basal factors, coactivators, and activator-repressors (Table 37–4). How these classes of proteins interact to govern both the site and frequency of transcription is a question of central importance.

Two Models Explain the Assembly of the Preinitiation Complex

The formation of the PIC described above is based on the sequential addition of purified components in in vitro experiments. An essential feature of this model is that the assembly takes place on the DNA template. Accordingly, transcription activators, which have autonomous DNA binding and activation domains (see Chapter 39), are thought to function by stimulating either PIC formation or PIC function. The TAF coactivators are viewed as bridging factors that communicate between the upstream activators, the proteins associated with pol II, or the many other components of TFIID. This view, which assumes that there is stepwise assembly of the PIC promoted by various interactions between activators, coactivators, and PIC components, is illustrated in panel A of Figure 37–10. This model was supported by observations that many of these proteins could indeed bind to one another in vitro.

Recent evidence suggests that there is another possible mechanism of PIC formation and transcription regulation. First, large preassembled complexes of GTFs and pol II are found in cell extracts, and such a complex can associate with a promoter in a single step. Second, the rate of transcription achieved when activators are added to limiting concentrations of pol II holoenzyme can be matched by increasing the concentration of the pol II holoenzyme in the absence of activators. Thus, activators are not in themselves absolutely essential for PIC formation. These observations led to the “recruitment” hypothesis, which has now been tested experimentally. Simply stated, the role of activators and coactivators may be solely to recruit a preformed holoenzyme-GTF complex to the promoter. The requirement for an activation domain is circumvented when either a component of TFIID or the pol II holoenzyme is artificially tethered, using recombinant DNA techniques, to the DNA binding domain (DBD) of an activator. This anchoring, through the DBD component of the activator molecule, leads to a transcriptionally competent structure, and there is no further requirement for the activation domain of the activator. In this view, the role of activation domains and TAFs is to form an assembly that directs the preformed holoenzyme-GTF complex to the promoter; they do not assist in PIC assembly (see panel B, Figure 37–10). The efficiency of this recruitment process determines the rate of transcription at a given promoter.

Table 37–3. Some of the transcription control elements, their consensus sequences, and the factors that bind to them, which are found in mammalian genes transcribed by RNA polymerase II. A complete list would include dozens of examples. The asterisks mean that there are several members of this family.

Element           Consensus Sequence    Factor
TATA box          TATAAA                TBP
CAAT box          CCAATC                C/EBP*, NF-Y*
GC box            GGGCGG                Sp1*
(not named)       CAACTGAC              Myo D
(not named)       T/CGGA/CN5GCCAA       NF1*
Ig octamer        ATGCAAAT              Oct1, 2, 4, 6*
AP1               TGAG/CTC/AA           Jun, Fos, ATF*
Serum response    GATGCCCATA            SRF
Heat shock        (NGAAN)3              HSF

Table 37–4. Three classes of transcription factors in class II genes.

General Mechanisms    Specific Components
Basal components      TBP, TFIIA, B, E, F, and H
Coactivators          TAFs (TBP + TAFs) = TFIID; Srbs
Activators            SP1, ATF, CTF, AP1, etc
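The slash notation used for the consensus sequences in Table 37–3 (for example, TGAG/CTC/AA for the AP1 element) can be expanded mechanically into a search pattern. The sketch below assumes that "X/Y" means X or Y at a single position, that N means any base, and that a digit after a base is a repeat count; it does not handle the parenthesized repeat used for the heat shock element. Both function names are hypothetical.

```python
import re


def consensus_to_regex(consensus):
    """Convert slash notation (e.g. 'TGAG/CTC/AA') to a regex string."""
    parts = []
    i = 0
    while i < len(consensus):
        options = consensus[i]
        i += 1
        # Gather alternatives joined by '/', such as G/C.
        while i + 1 < len(consensus) and consensus[i] == "/":
            options += consensus[i + 1]
            i += 2
        # A trailing number is read as a repeat count, such as N5.
        digits = ""
        while i < len(consensus) and consensus[i].isdigit():
            digits += consensus[i]
            i += 1
        if options == "N":
            token = "[ACGT]"
        elif len(options) == 1:
            token = options
        else:
            token = "[" + options + "]"
        if digits:
            token += "{" + digits + "}"
        parts.append(token)
    return "".join(parts)


def find_sites(seq, consensus):
    """Return 0-based start positions of consensus matches in seq."""
    pattern = re.compile(consensus_to_regex(consensus))
    return [m.start() for m in pattern.finditer(seq.upper())]


print(consensus_to_regex("TGAG/CTC/AA"))        # TGA[GC]T[CA]A
print(consensus_to_regex("T/CGGA/CN5GCCAA"))    # [TC]GG[AC][ACGT]{5}GCCAA
print(find_sites("CCTGACTCAGG", "TGAG/CTC/AA")) # [2]
```

An exact-match pattern is stricter than real factor binding, which, as noted in the text, sometimes allows considerable sequence variation.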

Hormones, and other effectors that serve to transmit information related to the extracellular environment, modulate gene expression by influencing the assembly and activity of the activator and coactivator complexes and the subsequent formation of the PIC at the promoter of target genes (see Chapter 43). The numerous components involved provide for an abundance of possible combinations and therefore a range of transcriptional activity of a given gene. It is important to note that the two models (stepwise versus holoenzyme-mediated PIC formation) are not mutually exclusive. Indeed, one can envision various more complex models invoking elements of both models operating on a gene.

RNA MOLECULES ARE USUALLY PROCESSED BEFORE THEY BECOME FUNCTIONAL

In prokaryotic organisms, the primary transcripts of mRNA-encoding genes begin to serve as translation templates even before their transcription has been completed. This is because the site of transcription is not compartmentalized into a nucleus as it is in eukaryotic organisms. Thus, transcription and translation are coupled in prokaryotic cells. Consequently, prokaryotic mRNAs are subjected to little processing prior to carrying out their intended function in protein synthesis. Indeed, appropriate regulation of some genes (eg, the Trp operon) relies upon this coupling of transcription and translation. Prokaryotic rRNA and tRNA molecules are transcribed in units considerably longer than the ultimate molecule. In fact, many of the tRNA transcription units contain more than one molecule. Thus, in prokaryotes the processing of these rRNA and tRNA precursor molecules is required for the generation of the mature functional molecules.

Nearly all eukaryotic RNA primary transcripts undergo extensive processing between the time they are synthesized and the time at which they serve their ultimate function, whether it be as mRNA, as a component of the translation machinery (such as rRNA, 5S RNA, or tRNA), or as a component of the RNA processing machinery (snRNAs). Processing occurs primarily within the nucleus and includes nucleolytic cleavage to smaller molecules and coupled nucleolytic and ligation reactions (splicing of exons). In mammalian cells, 50–75% of the nuclear RNA does not contribute to the cytoplasmic mRNA. This nuclear RNA loss is significantly greater than can be reasonably accounted for by the loss of intervening sequences alone (see below). Thus, the exact function of the seemingly excessive transcripts in the nucleus of a mammalian cell is not known.

The Coding Portions (Exons) of Most Eukaryotic Genes Are Interrupted by Introns

Interspersed within the amino acid-coding portions (exons) of many genes are long sequences of DNA that do not contribute to the genetic information ultimately translated into the amino acid sequence of a protein molecule (see Chapter 36). In fact, these sequences actually interrupt the coding region of structural genes. These intervening sequences (introns) exist within most but not all mRNA-encoding genes of higher eukaryotes. The primary transcripts of the structural genes contain RNA complementary to the interspersed sequences. However, the intron RNA sequences are cleaved out of the transcript, and the exons of the transcript are appropriately spliced together in the nucleus before the resulting mRNA molecule appears in the cytoplasm for translation (Figures 37–11 and 37–12). One speculation is that exons, which often encode an activity domain of a protein, represent a convenient means of shuffling genetic information, permitting organisms to quickly test the results of combining novel protein functional domains.

Introns Are Removed & Exons Are Spliced Together

The mechanisms whereby introns are removed from the primary transcript in the nucleus, exons are ligated to form the mRNA molecule, and the mRNA molecule is transported to the cytoplasm are being elucidated. Four different splicing reaction mechanisms have been described. The one most frequently used in eukaryotic cells is described below. Although the sequences of nucleotides in the introns of the various eukaryotic transcripts, and even those within a single transcript, are quite heterogeneous, there are reasonably conserved sequences at each of the two exon-intron (splice) junctions and at the branch site, which is located 20–40 nucleotides upstream from the 3′ splice site (see consensus sequences in Figure 37–12). A special structure, the spliceosome, is involved in converting the primary transcript into mRNA. Spliceosomes consist of the primary transcript, five small nuclear RNAs (U1, U2, U4, U5, and U6), and more than 60 proteins. Collectively, these form a small nuclear ribonucleoprotein (snRNP) complex, sometimes called a “snurp.” It is likely that this penta-snRNP spliceosome forms prior to interaction with mRNA precursors. Snurps are thought to position the RNA segments for the necessary splicing reactions.

The splicing reaction starts with a cut at the junction of the 5′ exon (donor or left) and intron (Figure 37–11). This is accomplished by a nucleophilic attack by an adenylyl residue in the branch point sequence located just upstream from the 3′ end of this intron. The free 5′ terminal then forms a loop or lariat structure that is linked by an unusual 5′–2′ phosphodiester bond to the reactive A in the PyNPyPyPuAPy branch site sequence (Figure 37–12). This adenylyl residue is typically located 28–37 nucleotides upstream from the 3′ end of the intron being removed. The branch site identifies the 3′ splice site. A second cut is made at the junction of the intron with the 3′ exon (acceptor or right). In this second transesterification reaction, the 3′ hydroxyl of the upstream exon attacks the 5′ phosphate at the downstream intron-exon boundary, and the lariat structure containing the intron is released and hydrolyzed. The 5′ and 3′ exons are ligated to form a continuous sequence.

[Figure 37–11 diagram labels: primary transcript with 5′ cap, exon 1, intron, exon 2, and poly(A) tail (An); nucleophilic attack at 5′ end of intron; lariat formation; cut at 3′ end of intron; ligation of 3′ end of exon 1 to 5′ end of exon 2; intron is digested.]

Figure 37–11. The processing of the primary transcript to mRNA. In this hypothetical transcript, the 5′ (left) end of the intron is cut (↓) and a lariat forms between the G at the 5′ end of the intron and an A near the 3′ end, in the consensus sequence UACUAAC. This sequence is called the branch site, and it is the 3′-most A that forms the 5′–2′ bond with the G. The 3′ (right) end of the intron is then cut (⇓). This releases the lariat, which is digested, and exon 1 is joined to exon 2 at G residues.

[Figure 37–12 diagram labels: 5′ exon, intron, 3′ exon; consensus sequences at the 5′ splice site, the branch site (UACUAAC), and the 3′ splice site; 28–37 nucleotides between the branch site and the 3′ splice site.]

Figure 37–12. Consensus sequences at splice junctions. The 5′ (donor or left) and 3′ (acceptor or right) sequences are shown. Also shown is the yeast consensus sequence (UACUAAC) for the branch site. In mammalian cells, this consensus sequence is PyNPyPyPuAPy, where Py is a pyrimidine, Pu is a purine, and N is any nucleotide. The branch site is located 20–40 nucleotides upstream from the 3′ site.

The snRNAs and associated proteins are required for formation of the various structures and intermediates. U1 within the snRNP complex binds first by base pairing to the 5′ exon-intron boundary. U2 within the snRNP complex then binds by base pairing to the branch site, and this exposes the nucleophilic A residue. U5/U4/U6 within the snRNP complex mediates an ATP-dependent protein-mediated unwinding that results in disruption of the base-paired U4-U6 complex with the release of U4. U6 is then able to interact first with U2, then with U1. These interactions serve to approximate the 5′ splice site, the branch point with its reactive A, and the 3′ splice site. This alignment is enhanced by U5. This process also results in the formation of the loop or lariat structure. The two ends are cleaved, probably by the U2-U6 within the snRNP complex. U6 is certainly essential, since yeasts deficient in this snRNA are not viable. It is important to note that RNA serves as the catalytic agent. This sequence is then repeated in genes containing multiple introns. In such cases, a definite pattern is followed for each gene, and the introns are not necessarily removed in sequence (1, then 2, then 3, etc).
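The sequence landmarks of the splicing reaction (a GU at the 5′ end of the intron, a branch site, and an AG at the 3′ end) can be illustrated with a string-level sketch. This is a caricature under stated assumptions: it uses the yeast branch site UACUAAC, requires 20–40 nucleotides between the branch site and the terminal AG, and ignores the snRNPs and the two transesterification steps entirely. The function name and the toy transcript are hypothetical.

```python
import re

BRANCH = "UACUAAC"  # yeast branch site consensus quoted in the text

# Simplified intron: GU ... branch site ... 20-40 nt ... AG (lazy matching).
INTRON = re.compile(r"GU[ACGU]*?" + BRANCH + r"[ACGU]{20,40}?AG")


def splice_one_intron(pre_mrna):
    """Remove the first stretch that fits the simplified intron pattern.

    Returns (mrna, intron), or None if no such stretch is found.
    """
    rna = pre_mrna.upper().replace("T", "U")
    match = INTRON.search(rna)
    if match is None:
        return None
    mrna = rna[: match.start()] + rna[match.end():]
    return mrna, match.group(0)


exon1, exon2 = "AUGGCC", "GAAUAA"                        # invented exons
intron = "GU" + "CCCC" + BRANCH + "C" * 20 + "AG"        # invented intron
mrna, removed = splice_one_intron(exon1 + intron + exon2)
print(mrna)               # AUGGCCGAAUAA
print(removed == intron)  # True
```

For a transcript with several introns the function could be applied repeatedly, although, as the text points out, introns are not necessarily removed in 5′ to 3′ order in the cell.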

The relationship between hnRNA and the corresponding mature mRNA in eukaryotic cells is now apparent. The hnRNA molecules are the primary transcripts plus their early processed products, which, after the addition of caps and poly(A) tails and removal of the portion corresponding to the introns, are transported to the cytoplasm as mature mRNA molecules.

Alternative Splicing Provides for Different mRNAs

The processing of hnRNA molecules is a site for regulation of gene expression. Alternative patterns of RNA splicing result from tissue-specific adaptive and developmental control mechanisms. As mentioned above, the sequence of exon-intron splicing events generally follows a hierarchical order for a given gene. The fact that very complex RNA structures are formed during splicing, and that a number of snRNAs and proteins are involved, affords numerous possibilities for a change of this order and for the generation of different mRNAs. Similarly, the use of alternative termination-cleavage-polyadenylation sites also results in mRNA heterogeneity. Some schematic examples of these processes, all of which occur in nature, are shown in Figure 37–13.
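The selective inclusion or exclusion of exons multiplies the number of mRNAs a single gene can yield. A minimal sketch of that combinatorial point follows, assuming that each optional exon is included or skipped independently and that exon order is always preserved; the function name and exon labels are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, optional):
    """Enumerate mRNAs made by including or skipping optional exons.

    exons is the ordered list of exon names in the precursor; optional is
    the set of names that may be skipped. Retained exons keep their order.
    """
    optional_in_order = [e for e in exons if e in optional]
    isoforms = []
    for k in range(len(optional_in_order) + 1):
        for skipped in combinations(optional_in_order, k):
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms


print(exon_skipping_isoforms(["1", "2", "3"], {"2"}))
# [['1', '2', '3'], ['1', '3']]
print(len(exon_skipping_isoforms(["1", "2", "3", "4"], {"2", "3"})))  # 4
```

With n independently optional exons the count is 2 to the power n, before alternative donor, acceptor, or polyadenylation sites are even considered.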

Faulty splicing can cause disease. At least one form of β-thalassemia, a disease in which the β-globin gene of hemoglobin is severely underexpressed, appears to result from a nucleotide change at an exon-intron junction, precluding removal of the intron and therefore leading to diminished or absent synthesis of the β-chain protein. This is a consequence of the fact that the normal translation reading frame of the mRNA is disrupted. A defect of this kind in a fundamental process (splicing) underscores the accuracy that RNA-RNA splicing must achieve.
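Why a retained intron disrupts the reading frame can be shown with a toy example. The sketch below assumes translation starts at the first base of the sequence and checks two things: whether the intron length shifts the frame, and where the first in-frame stop codon falls. The sequences are invented for illustration and are not β-globin sequences; the function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop(rna):
    """Return the 0-based codon index of the first in-frame stop, or None.

    Reading starts at the first base of rna (frame 0).
    """
    for codon_index in range(len(rna) // 3):
        codon = rna[3 * codon_index: 3 * codon_index + 3]
        if codon in STOP_CODONS:
            return codon_index
    return None


def intron_retention_effect(exon1, intron, exon2):
    """Compare the spliced mRNA with one that retains the intron."""
    return {
        "frameshift": len(intron) % 3 != 0,
        "normal_stop_codon": first_stop(exon1 + exon2),
        "retained_stop_codon": first_stop(exon1 + intron + exon2),
    }


print(intron_retention_effect("AUGGCU", "GUAUGAAG", "GAAGGUUAA"))
# {'frameshift': True, 'normal_stop_codon': 4, 'retained_stop_codon': 3}
```

In this toy case the retained intron both shifts the frame and introduces an earlier stop codon, so the downstream exon is never read correctly.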

Alternative Promoter Utilization Provides a Form of Regulation

[Figure 37–13 diagram labels: mRNA precursor with exons 1, 2, and 3, two polyadenylation signals, and a poly(A) tail, (A)n; selective splicing; alternative 5′ donor site; alternative 3′ acceptor site; alternative polyadenylation site.]

Figure 37–13. Mechanisms of alternative processing of mRNA precursors. This form of RNA processing involves the selective inclusion or exclusion of exons, the use of alternative 5′ donor or 3′ acceptor sites, and the use of different polyadenylation sites.

Tissue-specific regulation of gene expression can be provided by control elements in the promoter or by the use of alternative promoters. The glucokinase (GK) gene consists of ten exons interrupted by nine introns. The sequence of exons 2–10 is identical in liver and pancreatic B cells, the primary tissues in which GK protein is expressed. Expression of the GK gene is regulated very differently, by two different promoters, in these two tissues. The liver promoter and exon 1L are located near exons 2–10; exon 1L is ligated directly to exon 2. In contrast, the pancreatic B cell promoter is located about 30 kbp upstream. In this case, the 3′ boundary of exon 1B is ligated to the 5′ boundary of exon 2. The liver promoter and exon 1L are excluded and removed during the splicing reaction (see Figure 37–14). The existence of multiple distinct promoters allows for cell- and tissue-specific expression patterns of a particular gene (mRNA).
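The glucokinase example reduces to a simple rule: the tissue selects the first exon, and exons 2–10 are shared. A minimal sketch of that rule follows; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
SHARED_EXONS = [str(n) for n in range(2, 11)]  # exons 2-10, common to both

# First exon used in each tissue, as described in the text.
FIRST_EXON = {"liver": "1L", "pancreatic B cell": "1B"}


def gk_mrna_exons(tissue):
    """Return the ordered exon list of the GK mRNA made in a tissue."""
    return [FIRST_EXON[tissue]] + SHARED_EXONS


liver = gk_mrna_exons("liver")
b_cell = gk_mrna_exons("pancreatic B cell")
print(liver[0], b_cell[0])        # 1L 1B
print(liver[1:] == b_cell[1:])    # True
```

Both mRNAs contain ten exons and differ only in the first, which is consistent with the statement that each tissue uses its own promoter while exons 2–10 are identical.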

Both Ribosomal RNAs & Most Transfer RNAs Are Processed From Larger Precursors

In mammalian cells, the three rRNA molecules are transcribed as part of a single large precursor molecule. The precursor is subsequently processed in the nucleolus to provide the RNA component for the ribosome subunits found in the cytoplasm. The rRNA genes are located in the nucleoli of mammalian cells. Hundreds of copies of these genes are present in every cell. This large number of genes is required to synthesize sufficient copies of each type of rRNA to form the 10⁷ ribosomes required for each cell replication. Whereas a single mRNA molecule may be copied into 10⁵ protein molecules, providing a large amplification, the rRNAs are end products. This lack of amplification requires a large number of genes. Similarly, transfer RNAs are often synthesized as precursors, with extra sequences both 5′ and 3′ of the sequences comprising the mature tRNA. A small fraction of tRNAs even contain introns.
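The amplification argument can be made concrete with a short calculation. The ribosome and protein figures come from the text; the gene copy number of 200 is an assumption chosen only to stand in for "hundreds", and the function names are hypothetical.

```python
RIBOSOMES_PER_DIVISION = 10**7    # from the text
PROTEINS_PER_MRNA = 10**5         # from the text
ASSUMED_RRNA_GENE_COPIES = 200    # assumption: the text says only "hundreds"


def rrna_transcripts_per_gene(ribosomes, gene_copies):
    """Precursor transcripts each rRNA gene copy must supply."""
    return ribosomes / gene_copies


def mrnas_needed(protein_molecules, proteins_per_mrna):
    """mRNA molecules needed to make a given number of protein molecules."""
    return protein_molecules / proteins_per_mrna


print(rrna_transcripts_per_gene(RIBOSOMES_PER_DIVISION,
                                ASSUMED_RRNA_GENE_COPIES))        # 50000.0
print(mrnas_needed(RIBOSOMES_PER_DIVISION, PROTEINS_PER_MRNA))    # 100.0
```

Under these assumptions, each rRNA gene copy must be transcribed tens of thousands of times per cell division, whereas about a hundred mRNA molecules would suffice for the same number of protein molecules.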

RNAS CAN BE EXTENSIVELY MODIFIED

Essentially all RNAs are covalently modified after transcription. It is clear that at least some of these modifications are regulatory.

Messenger RNA (mRNA) Is Modified at the 5′ & 3′ Ends

As mentioned above, mammalian mRNA molecules contain a 7-methylguanosine cap structure at their 5′ terminal, and most have a poly(A) tail at the 3′ terminal. The cap structure is added to the 5′ end of the newly transcribed mRNA precursor in the nucleus prior to transport of the mRNA molecule to the cytoplasm. The 5′ cap of the RNA transcript is required both for efficient translation initiation and protection of the 5′ end of mRNA from attack by 5′ → 3′ exonucleases. The secondary methylations of mRNA molecules, those on the 2′-hydroxy and the N6 of adenylyl residues, occur after the mRNA molecule has appeared in the cytoplasm.

[Figure 37–14 schematic: two panels, Liver and B cell/pituitary, each showing exons 1B, 1L, 2A, and 2–10 of the glucokinase gene, with about 30 kb separating exons 1B and 1L.]

Figure 37–14. Alternative promoter use in the liver and pancreatic B cell glucokinase genes. Differential regulation of the glucokinase (GK) gene is accomplished by the use of tissue-specific promoters. The B cell GK gene promoter and exon 1B are located about 30 kbp upstream from the liver promoter and exon 1L. Each promoter has a unique structure and is regulated differently. Exons 2–10 are identical in the two genes, and the GK proteins encoded by the liver and B cell mRNAs have identical kinetic properties.

Poly(A) tails are added to the 3′ end of mRNA molecules in a posttranscriptional processing step. The mRNA is first cleaved about 20 nucleotides downstream from an AAUAAA recognition sequence. Another enzyme, poly(A) polymerase, adds a poly(A) tail which is subsequently extended to as many as 200 A residues. The poly(A) tail appears to protect the 3′ end of mRNA from 3′ → 5′ exonuclease attack. The presence or absence of the poly(A) tail does not determine whether a precursor molecule in the nucleus appears in the cytoplasm, because not all poly(A)-tailed hnRNA molecules contribute to cytoplasmic mRNA, nor do all cytoplasmic mRNA molecules contain poly(A) tails (the histone mRNAs are most notable in this regard). Cytoplasmic enzymes in mammalian cells can both add and remove adenylyl residues from the poly(A) tails; this process has been associated with an alteration of mRNA stability and translatability.
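The two-step 3′ end processing of mRNA (cleavage downstream of the recognition sequence, then addition of the poly(A) tail) can be sketched in Python as a toy string model. The function name, the fixed 20-nucleotide offset, the choice of the first AAUAAA found, and the example transcript are all simplifying assumptions; real cleavage sites vary and depend on additional sequence elements and protein factors.

```python
POLY_A_SIGNAL = "AAUAAA"
CLEAVAGE_OFFSET = 20    # "about 20 nucleotides downstream" of the signal
MAX_TAIL_LENGTH = 200   # "as many as 200 A residues"

def polyadenylate(pre_mrna: str, tail_length: int = MAX_TAIL_LENGTH) -> str:
    """Cleave downstream of the first AAUAAA signal and append a poly(A) tail."""
    if not 0 <= tail_length <= MAX_TAIL_LENGTH:
        raise ValueError("tail_length must be between 0 and 200")
    signal_start = pre_mrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        # No signal: this toy model leaves the transcript uncleaved and untailed.
        return pre_mrna
    cleavage_site = signal_start + len(POLY_A_SIGNAL) + CLEAVAGE_OFFSET
    cleaved = pre_mrna[:cleavage_site]   # sequence past the cleavage site is discarded
    return cleaved + "A" * tail_length

# Hypothetical precursor: body, signal, 20-nt spacer, then 3' sequence that is removed.
precursor = "GGCUAC" + POLY_A_SIGNAL + "C" * 20 + "GUGUGUUU"
processed = polyadenylate(precursor, tail_length=10)
print(processed)
```

In this example the trailing GUGUGUUU is removed by cleavage and ten A residues are appended; calling the function with the default tail length gives the 200-residue upper limit mentioned in the text.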

The size of some cytoplasmic mRNA molecules, even after the poly(A) tail is removed, is still considerably greater than the size required to code for the specific proteins for which they are templates, often by a factor of 2 or 3. The extra nucleotides occur in untranslated (non-protein coding) regions both 5′ and 3′ of the coding region; the longest untranslated sequences are usually at the 3′ end. The exact function of these sequences is unknown, but they have been implicated in RNA processing, transport, degradation, and translation; each of these reactions potentially contributes additional levels of control of gene expression.
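The relationship between total mRNA length and coding length can be illustrated with a short Python sketch that partitions a transcript into its 5′ untranslated region, coding region, and 3′ untranslated region. It is a simplification under stated assumptions: the first AUG is taken as the start codon, the function name is hypothetical, and the sequence is invented for the example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def region_lengths(mrna: str) -> dict:
    """Lengths of the 5' UTR, coding region (start through stop codon), and 3' UTR."""
    start = mrna.find("AUG")   # assumption: the first AUG is the start codon
    if start == -1:
        raise ValueError("no AUG start codon found")
    # Walk codon by codon in the reading frame set by the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return {"utr5": start, "coding": end - start, "utr3": len(mrna) - end}
    raise ValueError("no in-frame stop codon found")

# Invented transcript: 10-nt 5' UTR, 33-nt coding region, 60-nt 3' UTR.
toy_mrna = "GC" * 5 + "AUG" + "GCU" * 9 + "UAA" + "CU" * 30
lengths = region_lengths(toy_mrna)
print(lengths)
print(len(toy_mrna) / lengths["coding"])  # total length relative to coding length
```

Here the transcript is roughly three times longer than its coding region, and most of the excess lies in the 3′ untranslated region, in line with the proportions described above.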

RNA Editing Changes mRNA After Transcription

The central dogma states that for a given gene and gene product there is a linear relationship between the coding sequence in DNA, the mRNA sequence, and the protein sequence (Figure 36–7). Changes in the DNA sequence should be reflected in a change in the mRNA sequence and, depending on codon usage, in protein sequence. However, exceptions to this dogma have been recently documented. Coding information can be changed at the mRNA level by RNA editing. In such cases, the coding sequence of the mRNA differs from that in the cognate DNA. An example is the apolipoprotein B (apoB) gene and mRNA. In liver, the single apoB gene is transcribed into an mRNA that directs the synthesis of the full-length protein, apoB100 (roughly 500 kDa). In the intestine, the same gene is transcribed into the same primary transcript; however, a cytidine deaminase converts a CAA codon in the mRNA to UAA at a single specific site. Rather than encoding glutamine, this codon becomes a termination signal, and the result is a truncated protein, apoB48, comprising the amino-terminal 48% of apoB100. ApoB100 and apoB48 have different functions in the two organs. A growing number of other examples include a glutamine to arginine change in the glutamate receptor and several changes in trypanosome mitochondrial mRNAs, generally involving the addition or deletion of uridine. The exact extent of RNA editing is unknown, but current estimates suggest that < 0.01% of mRNAs are edited in this fashion.
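The effect of C-to-U editing on the protein product can be shown with a toy translation in Python. This is a sketch, not the real apoB sequence: the seven-codon mRNA, the editing position, the partial codon table, and the function names are all invented for illustration, although the codon assignments themselves follow the standard genetic code.

```python
# Partial codon table: only the codons used in this example.
CODON_TABLE = {
    "AUG": "M", "GAU": "D", "ACA": "T", "CAA": "Q", "UUU": "F", "GGC": "G",
    "UAA": "*", "UAG": "*", "UGA": "*",
}

def translate(mrna: str) -> str:
    """Translate from the first nucleotide until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

def edit_c_to_u(mrna: str, position: int) -> str:
    """Model cytidine deamination: replace the C at a 0-based position with U."""
    if mrna[position] != "C":
        raise ValueError("editing site must be a cytidine")
    return mrna[:position] + "U" + mrna[position + 1:]

toy_mrna = "AUG" "GAU" "ACA" "CAA" "UUU" "GGC" "UAA"
unedited_protein = translate(toy_mrna)                 # "liver-like": no editing
edited_protein = translate(edit_c_to_u(toy_mrna, 9))   # CAA at index 9 becomes UAA

print(unedited_protein)  # MDTQFG
print(edited_protein)    # MDT
```

A single nucleotide change in the mRNA, with no change in the DNA, converts the glutamine codon to a stop codon and yields a shorter protein from the same gene, which is the essence of the apoB100/apoB48 example.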

Transfer RNA (tRNA) Is Extensively Processed & Modified

As described in Chapters 35 and 38, the tRNA molecules serve as adapter molecules for the translation of mRNA into protein sequences. The tRNAs contain many modifications of the standard bases A, U, G, and C, including methylation, reduction, deamination, and rearranged glycosidic bonds. Further modification of the tRNA molecules includes nucleotide alkylations and the attachment of the characteristic CpCpAOH terminal at the 3′ end of the molecule by the enzyme nucleotidyl transferase. The 3′ OH of the A ribose is the point of attachment for the specific amino acid that is to enter into the polymerization reaction of protein synthesis. The methylation of mammalian tRNA precursors probably occurs in the nucleus, whereas the cleavage and attachment of CpCpAOH are cytoplasmic functions, since the terminals turn over more rapidly than do the tRNA molecules themselves. Enzymes within the cytoplasm of mammalian cells are required for the attachment of amino acids to the CpCpAOH residues. (See Chapter 38.)
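The order of events at the tRNA 3′ end (addition of the CCA terminal, then attachment of the amino acid to the terminal A) can be sketched in Python. This is a minimal string model under stated assumptions: the function names and the short sequence are hypothetical, and the model ignores partially degraded termini that the real enzyme can also repair.

```python
CCA = "CCA"

def add_cca(trna: str) -> str:
    """Append the 3' CCA terminal if it is absent (toy nucleotidyl transferase)."""
    return trna if trna.endswith(CCA) else trna + CCA

def charge(trna: str, amino_acid: str) -> str:
    """Attach an amino acid to the 3' terminal A; requires an intact CCA end."""
    if not trna.endswith(CCA):
        raise ValueError("tRNA lacks the 3' CCA terminal and cannot be charged")
    return f"{trna}-{amino_acid}"

trimmed_precursor = "GCGGAUUUAGCUC"     # invented sequence without a CCA end
mature = add_cca(trimmed_precursor)
charged = charge(mature, "Phe")

print(mature)   # GCGGAUUUAGCUCCCA
print(charged)  # GCGGAUUUAGCUCCCA-Phe
```

The check inside `charge` reflects the dependency described in the text: the amino acid is attached to the 3′ OH of the terminal A, so a tRNA without the CCA terminal cannot be aminoacylated.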

RNA CAN ACT AS A CATALYST

In addition to the catalytic action served by the snRNAs in the formation of mRNA, several other enzymatic functions have been attributed to RNA. Ribozymes are RNA molecules with catalytic activity. These generally involve transesterification reactions, and most are concerned with RNA metabolism (splicing and endoribonuclease activity). Recently, a ribosomal RNA component was noted to hydrolyze an aminoacyl ester and thus to play a central role in peptide bond formation (peptidyl transferases; see Chapter 38). These observations, made in organelles from plants, yeast, viruses, and higher eukaryotic cells, show that RNA can act as an enzyme. This has revolutionized thinking about enzyme action and the origin of life itself.

SUMMARY

• RNA is synthesized from a DNA template by the enzyme RNA polymerase.

• There are three distinct nuclear DNA-dependent RNA polymerases in mammals: RNA polymerases I, II, and III. These enzymes transcribe the rRNA, mRNA, and small RNA (tRNA/5S rRNA, snRNA) genes, respectively.

• RNA polymerases interact with unique cis-active regions of genes, termed promoters, in order to form preinitiation complexes (PICs) capable of initiation. In eukaryotes the process of PIC formation is facilitated by multiple general transcription factors (GTFs), TFIIA, B, D, E, F, and H.

• Eukaryotic PIC formation can occur either stepwise, by the sequential, ordered interactions of GTFs and RNA polymerase with promoters, or in one step, by the recognition of the promoter by a preformed GTF-RNA polymerase holoenzyme complex.

• Transcription exhibits three phases: initiation, elongation, and termination. All are dependent upon distinct DNA cis-elements and can be modulated by distinct trans-acting protein factors.

• Most eukaryotic RNAs are synthesized as precursors that contain excess sequences which are removed prior to the generation of mature, functional RNA.

• Eukaryotic mRNA synthesis results in a pre-mRNA precursor that contains extensive amounts of excess RNA (introns) that must be precisely removed by RNA splicing to generate functional, translatable mRNA composed of exonic coding and noncoding sequences.

• All steps, from changes in DNA template, sequence, and accessibility in chromatin to RNA stability, are subject to modulation and hence are potential control sites for eukaryotic gene regulation.



Copyright © 2003 by The McGraw-Hill Companies Retrieved from: www.knovel.com