Dna Replication

37
MICROBIAL GENETICS DNA Replication Prof. Rupinder Tewari Professor Department of Biotechnology Panjab University Sector 14, Chandigarh - 160 014 E-mail: [email protected] 26- Oct-2006 (Revised 12-Dec-2007) CONTENTS DNA Replication Models for DNA Replication Bidirectional Nature of DNA Replication 5-3Semi-discontinuous Synthesis of DNA DNA Replication Requires Many Enzymes and Proteins Mechanism of DNA Replication Eukaryotic DNA Replication DNA Replication – Enzymes, Accessary Proteins and Inhibitors DNA Helicases Topoisomerases Primases DNA Polymerases Ligases Accessory Proteins Inhibitors of DNA Replication Concepts of Gene Expression Introduction Transcription Transcription in Prokaryotes Transcription in Eukaryotic Cells Processing of Primary mRNA Transcript Translation Ribosomes tRNAs Aminoacyl RNA Synthetases Protein Factors Mechanism of Translation Translation in Eukaryotes Regulation of Gene Expression Prokaryotic and Eukaryotic Gene Expression: a Comparison Eukaryotic Gene Regulation Key words DNA Replication; Transcription; Translation; Gene regulation: Gene expression.

Transcript of Dna Replication

Page 1: Dna Replication

MICROBIAL GENETICS

DNA Replication

Prof. Rupinder Tewari Professor

Department of Biotechnology Panjab University

Sector 14, Chandigarh - 160 014 E-mail: [email protected]

26- Oct-2006 (Revised 12-Dec-2007)

CONTENTS

DNA Replication Models for DNA Replication Bidirectional Nature of DNA Replication 5′-3′ Semi-discontinuous Synthesis of DNA DNA Replication Requires Many Enzymes and Proteins Mechanism of DNA Replication Eukaryotic DNA Replication DNA Replication – Enzymes, Accessary Proteins and Inhibitors DNA Helicases Topoisomerases Primases DNA Polymerases Ligases Accessory Proteins Inhibitors of DNA Replication Concepts of Gene Expression Introduction Transcription Transcription in Prokaryotes Transcription in Eukaryotic Cells Processing of Primary mRNA Transcript Translation Ribosomes tRNAs Aminoacyl RNA Synthetases Protein Factors Mechanism of Translation Translation in Eukaryotes Regulation of Gene Expression Prokaryotic and Eukaryotic Gene Expression: a Comparison Eukaryotic Gene Regulation Key words DNA Replication; Transcription; Translation; Gene regulation: Gene expression.

Page 2: Dna Replication

2

DNA Replication

Deoxyribonucleic acid (DNA) contains the genetic instructions for the biological development of a cellular form of life including some viruses. DNA is an antiparallel double helix molecule with sugar-phosphate backbone on the outer side and nitrogen bases in the inner side. The bases are paired specifically, also known as complementary pairing, Adenine (A) with Thymine (T), and Guanine (G) with Cytosine (C) by two and three hydrogen bonds, respectively. DNA is a long polymer of nucleotides (a polynucleotide) that transfers the information to RNA (mRNA) that in turn controls the sequence of amino acid residues in proteins, using the genetic code. James Watson and Francis Crick proposed the DNA double helix structure in 1953 and shared the the Noble Prize in 1962 with Maurice Wilkins. Models for DNA Replication

While proposing their double helix model, Watson and Crick were particularly excited because the complementary nature of the DNA molecule suggested a self-replicating mechanism, a property which would be crucial for a genetic material. Three hypothesis were in contention regarding the mode of self-replication of DNA and these were (Fig. 1):

i) Conservative replication according to which the original strand is unchanged and a completely new DNA molecule is synthesized.

ii) Semi-conservative replication describes a way in which each DNA strand of a double helical molecule serves as a template for the synthesis of two new DNA molecules each with one new and one old strand.

iii) Dispersive replication suggests more-or-less random interspersion of parental and new segments in daughter DNA molecules.

Page 3: Dna Replication

3

In 1957, Matthew Meselson and Franklin Stahl, proved that DNA is replicated by semi- conservative mode of replication. Their experiment involved growing a culture of Escherichia coli in a medium containing 15NH4Cl (ammonium chloride labeled with the heavy isotope of nitrogen) for several generations. A small aliquot of culture was taken out, it’s DNA extracted and was run on CsCl density gradient centrifugation. All the DNA was seen to settle at a point as a single band. From this culture, some cells were transferred to a normal growth medium (containing non- labeled 14NH4Cl). Samples were taken after 20 minutes (i.e. after one cycle of cell division) and 40 minutes (i.e. after two cell divisions) of incubation. DNA was extracted from each sample and subjected to density gradient centrifugation. The results are depicted in Fig.2. After 20 minutes all the DNA contained similar amounts of 14N and 15N, but after 40 minutes two bands were seen, one corresponding to hybrid 14N-15N-DNA, and the other to DNA molecules made entirely of 14N. This was inferred from the different patterns of DNA band(s) after density gradient centrifugation.

The banding pattern seen after 20 minutes enables conservative replication to be discounted because this scheme predicts that after one round of replication there will be two different types of double helix, one containing just 15N and the other containing just 14N. The single 14N-15N-DNA band that was actually seen after 20 minutes is compatible with both dispersive and semi-conservative replication, but the two bands seen after 40 minutes are consistent only, with semi-conservative replication. Dispersive replication should continue to give rise to hybrid 14N-15N molecules after two rounds of replication, whereas the granddaughter molecules produced at this stage by semiconservative replication include two that are made entirely of 14N-DNA and 14N /15N-DNA.

Page 4: Dna Replication

4

Bidirectional Nature of DNA Replication

Once the semi-conservative mode of replication of DNA was confirmed, scientists started asking new questions, e.g. a) is there a discreet region on DNA for initiating DNA replication?, b) is replication unidirectional or bidirectional? and c) Does both DNA strands unwind partially or completely during replication? Answer to these problems were provided by J. Cairns in 1963. He was conducting experiments with the bacterium, Escherichia coli to prove semi-conservative mode of replication using autoradiography technique. This experiment was based on the fact that radioactive atoms exposed to photographic film will form visible grains on the film which provide an estimate of quantity of radioactive material present. Cairns grew E.coli cells in media containing radioactive thymidine (with 3H), a component of DNA. He then extracted the DNA, examined it under electron microscope and coupled it with autoradiography by placing a photographic film on the same. His interpretation of autoradiograph revealed several points. First, that E.coli DNA was circular. Second, The radioactive impression on film formed a structure which was similar to Greek letter theta (θ) thereby suggesting that DNA replicated while maintaining its integrity that is it didn’t appear to break during the process of DNA replication (Fig.3). Third, replication appeared to be occurring at one or two Y-junctions or replication forks in the loop so formed. The DNA is unwound and replication proceeded in one or both directions. In some samples, radioactivity was not applied till the replication fork (dynamic ends of loops where parental strands are unwound and separated strands are replicated) had begun. So the radioactivity label appeared after the theta structure had already begun forming. By comparing the distribution of silver grains on autoradiograph, Cairns found the replication to be bidirectional. Also the replication loops always originate at a particular point called as origin of replication which in E.coli is known as OriC. The length of DNA that is replicated following one initiation event at a single origin is a unit referred as a replicon. Thus, bacterial chromosome, such as in E.coli can be considered as a replicon.

Page 5: Dna Replication

5

5′-3′ Semi-discontinuous Synthesis of DNA

The work of several scientists in the subsequent years provided the critical inputs on the mechanism of DNA replication. A new strand of DNA is synthesized, by DNA polymerase, always in 5′-3′ direction and thus the incoming nucleotide is added on to the 3′-OH group of the last nucleotide in the growing daughter DNA strand. Till date, no DNA polymerase has been discovered that can synthesize DNA in 3′-5′ direction. As the two DNA strands are antiparallel the strand acting as template is read from its 3′ end towards 5′ end. In a replication fork, as observed by Cairns, if both of the strands are to be synthesized simultaneously then one of the strands would have to grow in 3′-5′ direction. But as mentioned earlier, DNA polymerase with 3′-5′ biosynthetic activity does not occur. The other possibility is that the two strands completely unwind which was also negated by the work of Cairns. So the question is, how both DNA daughter strands are synthesized at the same time? This problem was solved by Reiji Okazaki and his colleagues in 1960s. They showed that while one of the strands is synthesized in short segments called Okazaki fragments (ranging in length from few hundred to few thousand nucleotides), the other follows a continuous replication. They thus proposed that one strand is synthesized continuously, called the leading strand, in which the 5′-3′ synthesis of new strand moves in the same direction as that of the replication fork. The other undergoes discontinuous synthesis in the form of Okazaki fragments called lagging strand. While in the latter, the 5′-3′ synthesis is maintained it proceeds in direction opposite to the direction of fork movement. DNA Replication Requires Many Enzymes and Proteins

Arthur Kornberg and his colleagues (1955) purified and crystallized DNA polymerase from E.coli cells and called it as DNA polymerase I. This enzyme had the ability to polymerize deoxynucleotides to synthesize new daughter DNA strands. However, subsequent studies revealed more DNA polymerases, and it turned out that DNA polymerase III, and not DNA polymerase I, was the key enzyme involved in DNA replication. The process of DNA replication is very complex and requires the involvement of nearly 30 proteins even in a prokaryotic organism like E.coli. The number is much more in case of eukaryotic DNA replication. Briefly, enzymes/proteins are needed for unwinding of DNA, removal of topological tension in DNA strands due to unwinding, synthesis of DNA, proteins/enzymes for correcting errors in newly synthesized DNA strands, prevention of ssDNA from digestion by nucleases etc. These enzymes/proteins have been more or less characterized in this organism and are described later in this chapter. Mechanism of DNA Replication

The mechanism of DNA replication in all forms of life i.e. prokaryotes and eukaryotes is essentially the same though it is more complex in eukaryotes. The basic steps involved in DNA synthesis are: Initiation, Elongation and Termination. Initiation

In E.coli, DNA replication gets started at a discreet region of the chromosome called origin of replication (oriC). This region consists of 245bp and has many unique features which are highly conserved in bacterial kingdom. This locus contains two series of short repeats: a) a

Page 6: Dna Replication

6

tandem array of three 13bp sequence that are rich in A=T base pairs and b) four repeats of a 9bp sequence (Fig. 4).

The initiation phase of DNA replication requires at least nine enzymes/proteins as listed in Table 1. Table 1: Proteins/enzymes involved in initiation step of DNA replication in Escherichia

coli Protein/enzyme Size

(KDa) No. of subunits

Function

DnaA protein 52.00 1 Recognition of oriC region in DNA; denaturation of oriC region

Helicase (DnaB protein) 300.00 6 Unwinding of dsDNA DnaC protein 29.00 1 Assists helicase to bind to DNA HU 19.00 2 Helps DnaB protein in denaturation of

DNA Primase (DnaG protein) 60.00 1 Synthesis of RNA primers Single strand DNA binding proteins (SSB)

75.60 4 Binding to ssDNA

RNA polymerase 454.00 5 Promotes DnaA protein activity DNA gyrase (DNA topoisomerase II)

400.00 4 Relieves topological strain generated by DNA unwinding

Dam methylase 32.00 1 Methylates (5′) GATC palindromic sequences present in oriC locus

Page 7: Dna Replication

7

To begin with, nearly 20 DnaA protein molecules bind to four 9 bp repeats of OriC. These Dna molecules in conjunction with histone like protein molecules, called HU proteins, and ATP molecules denatures the DNA in the A=T rich region of conserved sequence. This is followed by the binding of DnaB proteins to the A=T rich regions of 13 bp repeats, which in turn is assisted by another protein, DnaC and ATP molecules. DnaB protein in fact is an enzyme called helicase which unwinds the double stranded DNA to create a localized region of single stranded form. Another set of protein called single stranded DNA binding proteins (SSB) attach to the unwound DNA strands. Their function is to prevent binding of ssDNA forms into their native state i.e. dsDNA form and to prevent its digestion by nucleases. The unwound part of dsDNA bound to SSB, called prepriming complex, is the region where DNA polymerase will initiate synthesis of new DNA. The unwinding of short stretch of DNA results in putting tension, also called topological stress, on the dsDNA molecule. This tension is relieved by the involvement of another enzyme, DNA gyrase or topoisomerase II. Initiation of DNA replication is regulated, such that replication occurs only once during a cell cycle. This control is mediated by DNA methylation. An enzyme Dam (DNA Adenine Methylation) methylase, methylates the N-6 position of adenine located within the palindromic sequence (5′) GATC in the oriC. The 245 bp oriC has 11 such sequences as compared to one per 256bp in rest of the E.coli chromosome. Newly synthesized DNA strand is not methylated. Therefore, immediately after replication of DNA, the hemimethylated DNA (one strand methylated and other not methylated) gets bound to cell membrane of bacteria. In this situation, DNA cannot be replicated. After a while, oriC region of hemi-methylated DNA is released from cell membrane, gets methylated at specific places (N-6 of adenines in GATC region) and is ready for next round of DNA replication. As per Cairns model of DNA replication, both parental strands act as templates simultaneouly for synthesis of two daughter complementary DNA strands. This type of synthesis generates what is called a replication fork as it resembles a two-pronged fork. With the local unwinding of dsDNA, two template strands now become available for the initiation of replication. All known DNA polymerases have a common property i.e. they cannot initiate DNA replication or in other words all of them require the presence of a short stretch of nucleotides, called as primer, on to which incoming deoxynucleotides are added. During DNA synthesis, an RNA polymerase primase or primase adds a short stretch (10-60 nucleotides long) of RNA nucleotides called as primer, complementary to one of the parental DNA strands. Elongation

Once the primer has been formed, DNA polymerase starts adding new nucleotides and thus the DNA daughter strands start to grow. The addition of nucleotides is very fast . Nearly one thousand nucleotides per second are incorporated in both leading and lagging strands. As described earlier, the new DNA strand is always synthesized, by DNA polymerase, in 5′-3′ direction and incoming nucleotide attaches to the 3′-OH group of the last nucleotide in the growing daughter DNA strand. This process continues as the longer template stretch is made availably by progressive unwinding of the dsDNA. In order to maintain the 5′- 3′ direction of DNA synthesis on two antiparallel template strands, Reiji Okazaki and his colleagues in 1960s proposed that one of the DNA strands is synthesized in small stretches called Okazaki fragments, ranging in length from few hundred to few thousand nucleotides (Fig.5). This strand, whose DNA synthesis is discontinuous, is termed as lagging strand The other DNA daughter strand is synthesized continuously, called the leading strand, in which the 5′-3′ synthesis of new strand moves in the same direction as that of the replication fork. The

Page 8: Dna Replication

8

direction of synthesis of lagging strand is opposite to the direction of fork movement. The discontinuous synthesis of lagging strand in 5′-3′ DNA direction gives rise to overall growth in 3′-5′ direction (Fig.5). It is interesting to note that the DNA Pol III molecule , which synthesizes leading strand also synthesizes lagging strand. This can be understood by examining the structure of this enzyme. DNA Pol III is a dimeric enzyme consisting of 10 different subunits. Each subunit is responsible for the synthesis of both leading and lagging strands. A small region of the template strand, responsible for the synthesis of lagging strand, makes a loop. The orientation of the loop is such that the direction of the DNA fragment i.e. Okazaki fragment to be synthesized is the same as that of leading strand. After synthesis of nearly 1000 nucleotides in the lagging strand, DNA Pol III enzyme let go the lagging strand. A new loop is then formed and a new cycle of DNA synthesis starts.

Page 9: Dna Replication

9

Later on, the primer sequence from leading as well as individual Okazaki fragments is excised and replaced by deoxynucleotides by DNA polymerase I. The nicks are sealed by DNA ligase. The ligation is carried out in the presence of NAD+ in bacteria and ATP in case of viruses and eukaryotes. Table 2 lists the role of individual proteins in the elongation of DNA strands during DNA replication.

Table 2: Role of individual proteins in the elongation of DNA strands during DNA replication

Protein Size

(KDa) No. of subunits

Function

SSB 75.6 4 SSB proteins coat ssDNA and prevents its degradation by nucleases

DnaB (helicase) 300.00 6 Unwinding of dsDNA DnaG (primase) 60.00 1 Synthesis of RNA primer DNA polymerase III 791.50 17 Synthesis of DNA strands DNA polymerase I 103.00 1 Filling of gaps in DNA strands;

excision of RNA primers DNA ligase 74.00 1 Ligation DNA gyrase (Topoisomerase II)

400.00 4 Supercoiling of DNA

Termination

Near the completion of the replication, the two replication forks meet at the terminus (Ter) region containing multiple copies of Ter sequence. In this process, the Tus (terminus utilization substance) protein binds to the Ter sequence and forms a Tus-Ter complex. This complex halts the replication fork which ever enters first at the termination site. Ter sequence prevents the over replication in case when one of the replication fork gets delayed due to DNA damage or some other reason by stopping the first replication fork. There are six ter sequences in a chromosome, three stop the fork coming from one side and the other three stop the one coming from other side. The fork reaching later stops by colliding with the first one. At the end of theta mode of replication, it generates a topologically intertwined two circular molecules of chromosomal DNA, which are released by topoisomerase IV later on and the ligase closes them up. Eukaryotic DNA Replication

Although the mechanism of DNA synthesis in eukaryotes and prokaryotes is similar, DNA replication in eukaryotes is much more complicated. Eukaryotic cells have more than a dozen DNA polymerases. Two of these (sigma and delta) are important for the replication of eukaryotic chromosomes. The rate of synthesis of DNA in eukaryotic cells is about one tenth the rate of bacterial DNA synthesis. E.coli replication fork moves at a speed of 25000 base pairs per minute as compared to only 2000 base pairs per minute in case of eukaryotes. At this speed, the cells would take as much as 500 hours to replicate it’s DNA. To overcome this hurdle, DNA of eukaryotes have many replicons. In yeast like Saccharomyces cerevisae, which can be considered to be the eukaryotic equivalent of E. coli, these replicons are known

Page 10: Dna Replication

10

as Autonomously Replicating Sequences (ARS). There are approximately 400 ARS elements in 12 chromosomes of Saccharomyces cerevisae. The mode of DNA replication in eukaryotes is similar to prokaryote’s i.e. semi-conservative, in which one strand serves as the template for the second strand. Furthermore, DNA replication occurs only at a specific step in the cell cycle. The first step in the eukaryotic DNA replication is the formation of the pre-initiation replication complex (the pre-RC) which occurs in G1 phase. The formation of the pre-RC is known as licensing, but a licensed pre-RC cannot initiate replication. Initiation of replication occurs during the S-phase. Thus, the separation of licensing and activation ensures that the origin can only fire once per cell cycle. DNA replication in eukaryotes is not very well characterized. However, researchers believe that it begins with the binding of the origin recognition complex (ORC) to the origin of replication sites. This complex is a hexamer of related proteins and remains bound to the origin, even after DNA replication occurs. Once the complex is properly positioned around origin of replication, DNA replication sets in motion. At each of the multiple origin sites, the replication is bidirectional. DNA helicase unwinds parental DNA strand and SSB called replication protein A molecules attach to ssDNA template strands. Formation of RNA primers and daughter DNA strand, up to 20 nucleotides is carried out by DNA polymerase alpha. Another protein called protein replication factor C (RFC) displaces DNA polymerase alpha. RFC sequentially binds a protein called proliferative cell nuclear antigen (PCNA), DNA polymerase delta. This complex then synthesizes long stretches of DNA strands. DNA polymerase-δ has a 3′-5′ proofreading activity and has structural similarity to E.coli DNA polymerase III. DNA polymerase € is used in DNA repair where it replaces the DNA polymerase-δ or may be in the removal of primers of okazaki fragments. The short Okazaki fragments are joined to one another, end to end, by DNA ligase. The replication continues in both the directions until adjacent replicons meet and fuse. In general, prokaryotic genome is single and circular, where as eukaryotic chromosomes are more than one and linear. DNA polymerase finds it difficult to replicate DNA ends because of topological constraints. DNA sequence analysis of the chromosome ends revealed that they are made up of tandem repeats of GC rich hexanucleotide sequence called telomeres. In human each telomere comprises of over hundred repeat sequences (GGGTTC). Telomeres are synthesized by a special class of enzymes called telomerase. Finally, telomeres must be capped by a protein to prevent chromosomal instability. Each replicon is replicated only once per single cell division. This fact is controlled by a group of proteins, which are involved in formation of ORC required for initiation of DNA replication. Once replication has started, these factors are broken down by cellular proteases. DNA Replication – Enzymes, Accessary Proteins and Inhibitors

At the very basic, we know that DNA is replicated by DNA-directed DNA polymerases, which utilize the single stranded DNA as template to synthesize the daughter strand. However, one cannot imagine the process of replication to be single handedly carried out by a polymerase alone. Firstly, as mentioned above, the two strands have to be separated so as to provide the needed single stranded template – a process which involves the action of a class of enzymes known as the helicases. Moreover, none of the already discovered polymerases can initiate replication, that is they can only elongate pre-existing strand. Hence, there arises the need for another class of enzymes, namely, the primases. Thirdly, the lagging strand synthesis is semi-discontinuous, thereby facilitating the requirement of ligases.

Page 11: Dna Replication

11

The above paragraph strives to justify the fact that the process of DNA replication involves the interplay of a plethora of enzymes, as well as accessory proteins, many of which have been understood in good amount of detail. What follows is a description of each of the components of the replicative machinery. DNA Helicases

Helicases constitute a class of enzymes that can move along a DNA duplex utilizing the energy of ATP hydrolysis to separate the strands. In most prokaryotes, the separated strands are inhibited from subsequently reannealing by a single-strand-binding protein (SSB protein), which binds to both separated strands. Since a single stranded template is imperative for the action of polymerases, almost all DNA polymerases acting on duplex DNA are dependent on helicase activity. In E.coli, over ten different helicases are present. These have varying polarities as well as processivities. Helicases may traverse along the DNA strand in either the 5′-3′ or the 3′-5′ direction, while one helicase is known to move in either direction. Processivity is measured by the number of nucleotides separated during each association event. Among the different known helicases, the one that is best characterized in E.coli and is in fact, central to its DNA replication is the DnaB protein. It is a hexamer of 50 kDa subunits. The subunits are arranged so that there is a hole down the center into which the ssDNA fits. The three-dimensional structure of the amino-terminal 141-residues of DnaB helicase has been determined by both high-resolution NMR and x-ray crystallography. This is the domain that is essential for the interaction with proteins such as primase and DnaC. It is a six-helix bundle with a folding pattern that is quite unique to it. It processively unwinds the duplex with a 5′-3′polarity. This enzyme possesses some rather distinctive features. Firstly, it is the only enzyme present which can utilize not only ATP but also GTP and CTP, although ATP is preferred. Moreover, melting can be independent of most of the replicative machinery. It does not require the action of primase or DNA pol III though these when present do stabilize the replication fork. Apart from its role in strand separation, the DnaB protein also interacts with the primase, and is imperative for its activity. In keeping with its multiple roles, the DnaB helicase has sites for, activation of primase, DnaC protein (which holds it in an inactive state), single stranded DNA, double stranded DNA and bringing about the hydrolysis of ATP. Topoisomerases

Enzymes belonging to this class solve the topological problems associated with DNA replication by introducing temporary single- or double-strand breaks in the DNA.What essentially happens is that for every 10.5 base pairs melted by the helicase, one link between the two strands must be removed by introducing a strand breakage. In fact, in prokaryotes, a negatively supercoiled template is a prerequisite for initiation of replication. However this is a high energy state and hence topoisomerases are imperative to maintaining this energy state. Topoisomerases are classified based on the type of strand break that they introduce. Accordingly, the Type I enzymes, introduce a break into one strand of DNA while Type II enzymes generate a break in both strands. Type I enzymes can act on single stranded circular DNA or duplexes only if they contain a nick or a gap, while Type II enzymes can act on even two covalently closed duplex circles.

Page 12: Dna Replication

12

In E.coli, the Type I enzymes include the Topoisomerase I as well as Topoisomerase III. Topo I is a zinc dependent enzyme, which has a molecular weight of approx. 100 kDa. It is essentially involved in relaxing of the negatively supercoiled DNA, in the course of replication. In eukaryotes, Topo I, is a monomeric protein of approx 95 kD. In contrast to its prokaryotic counterpart, the eukaryotic enzyme relaxes positive supercoils as well and favors duplex regions over ssDNA. Topo III is encoded by the topB gene in E.coli and is a monomer of 74 kDa. It possesses a significantly higher activity than Topo I as well as DNA gyrase, when it comes to separation of nicked or interlocked circles. It is present in only one to ten copies per cell. The E.coli DNA gyrase, also know as Topoisomerase II, is a type II enzyme. Although gyrase is evolutionarily related to other topoisomerases, it is the only enzyme of this group that can introduce supercoils into DNA. ATP is hydrolysed in this process. It consists of two subunits that form an A2B2 complex, with a molecular weight close to 400 kDa. The GyrA subunit is encoded by the gyrA (nalA) gene and has a molecular weight of 96,756 Da. The GyrB subunit, is encoded by the gyrB (cou) gene and has a molecular weight of 89762 Da. The A protein consists of an amino-terminal domain (59-64 kDa), which is involved in DNA breakage and reunion, and a carboxy-terminal domain (33 kDa), which is involved in DNA-protein interactions. The B protein consists of an amino-terminal domain (43 kDa), which contains the ATPase activity, and a carboxyterminal domain (47 kDa), which is involved in interactions with the A protein and DNA. The eukaryotic Topo II enzyme is a homodimer, and is involved in relaxation of negative as well as positive supercoils, but in contrast to the bacterial gyrase does not introduce negative supercoils. Primases

The fact that a free 3′–OH is imperative for the DNA polymerase to initiate replication, calls for the need of an enzyme to catalyze the synthesis of primers. Moreover, while the leading strand synthesis is continuous and hence needs to be primed just once, the discontinuous synthesis of the lagging strand in the form of Okazaki fragments, involves synthesis of primer for each one of these fragments. Most prokaryotic systems have two enzymes. One is the RNA polymerase which also happens to be the enzyme which mediates transcription. The other is the Primase which bears great similarity to the former. However subtle differences do exist. Primase, the product of the dnaG gene is the smaller of the two and is a single-stranded DNA dependent polymerase as against RNA polymerase which is double stranded DNA-dependent polymerase. Moreover inhibition studies using the RNA polymerase inhibitor rifampicin have shown that primase, which is insensitive to rifampicin, initiates lagging strand synthesis, since the presence of this inhibitor only stopped leading strand synthesis. Thus, it may be inferred that it is the lagging strand synthesis, where the primase assumes prime importance. Moreover, by regulating the synthesis of the primers for the lagging strand, primase plays a regulatory role in the DNA replication process. E. coli primase protein has 582 amino acids and is present in around 50 -100 copies per cell. It has several characteristic motifs, among which motif1 is a zinc finger motif which apart from being involved in the chain initiation step, may also play a role in conferring stability on the primase. No particular function has been ascribed to motifs 2 and 3 while motifs 4, 5 and 6 have been postulated to be magnesium binding sites. One of the magnesium binding motifs

Page 13: Dna Replication

13

resembles the “active magnesium” motif found in all DNA and RNA polymerases. This motif can be considered to be involved in phosphor-diester bond formation. However, primase function is not independent and is a function of its interactions with multiple accessory proteins as well as other components of the replicative machinery. Based on its function at the replication fork, primase must minimally bind to 2 magnesium ions, 2 dNTPs, DnaB helicase and one or more of the ten subunits of DNA polymerase III holoenzyme among other cofactors. In eukaryotes, primase activity is generally a part of DNA polymerase α, which is composed of four subunits including a large DNA polymerase subunit (approx 180 kDa), two small subunits ( approx. 60 ad 50 kDA respectively) associated with primase activity and a 70 kDa protein. Pol α is localized in the nucleus and has been implicated in chromosomal replication. The primase activity of the two small subunits remains intact even when separated from the larger subunits. DNA Polymerases

Discovered by Arthur Kornberg in 1957, DNA Pol I was the first polymerase identified. The principal functions carried out by this enzyme may be listed as polymerization, pyrophosphorolysis, pyrophosphate exchange and two independent exonucleolytic hydrolyses, one mediated from the 3′-5′ end and the other from the 5′-3′ end. Except for the 5′–3′ exonucleolytic activity which is present in the relatively small N- terminal fragment, all the other activities are contained in the large C- terminal fragment (Klenow fragment). The 5′-3′ exonuclease activity is unique to DNA Pol I. The small N-terminal fragment can be separated from rest of the structure by mild protease treatment. The enzyme without 5′-3′ activity domain is called large or Klenow fragment, and has the polymerization and proof reading (3′-5′) activity. Klenow fragment comprises two domains: smaller domain (residues 324- 517) containing the 3′-5′ exonuclease activity, and the larger domain (residues 521-928) containing the polymerase active site at the bottom of a cleft which is the binding site for the B-DNA molecule. The observation that E.coli mutants with a limited Pol I activity, could grow quite normally stimulated the hunt for other enzymes, and this led to the subsequent identification of other DNA polymerases which are listed in Table 3.

Table 3: Important DNA polymerases present in cells

DNA Pol I DNA Pol II DNA Pol III

Gene polA polB polC No. of subunits 1 Equal/more than 4 Equal/more than 10 3′-5′ exonuclease activity (proof reading)

Yes Yes Yes

5′-3′ exonuclease activity Yes No No Rate of polymerization 16-20 40 250-1000 Processivityx 3-200 1,500 Equal/more than 500,000

Page 14: Dna Replication

14

The major enzyme involved in DNA synthesis is Pol III also known as replicase. Its core subunit has the composition αεθ and its catalytic properties resemble those of Pol I. It is in fact, a part of the multisubunit enzyme, the Pol III holoenzyme, which contains atleast 10 types of subunits, as listed in the table 4. However processive DNA synthesis, a property of DNA polymerase III holoenzyme of Escherichia coli, is not achieved by combining the pol III core components. This is because processivity requires the association of the core with the β subunit in the presence of the γ complex – γδδ’χψ. The β subunit confers almost unlimited processivity to the enzyme, owing to its role as a clamp. Its 366 residue monomeric subunits associate to form a dimer, which is doughnut shaped, having a diameter of 80 A°, through which the DNA strand can thread out.

Table 4: Components of DNA Pol III holoenzyme

Subunit Mass (KDa) Structural gene

α 130 polC ε 27.5 dnaQ θ 10 holE

τ 71 dnaX

γ 45.5 dnaX

δ 35 holA

δ' 33 holB

χ 15 holC

ψ 12 holD β 40.6 dnaN

Eukaryotes contain at least 5 different types of DNA polymerases, named according to the order in which they were discovered. Pol α (> 250 kDa) , a multi subunit protein, occurs only in the cell nucleus, and plays a role in the replication of chromosomal DNA. It has an associated primase activity. Pol δ(170 kDa), on the other hand exhibits a 3′- 5′ proofreading activity. Moreover, considering that in association with a protein known as the proliferating cell nuclear antigen (PCNA), it has almost unlimited processivity. It is assumed that the Pol δ-PCNA complex is the leading strand replicase while, Pol α with its limited processivity is the lagging strand replicase. Pol ε (256 kDa) is highly processive even in absence of PCNA, thus indicating that it may play arole in DNA replication. However there is compelling evidence to prove that it plays a major role in DNA repair. Pol β (36-38 kDa) has a relatively small size and its biological function has not yet been ascertained. All these polymerases are localized in the nucleus. Pol γ (160-300 kDa) occurs in mitochondria, wherein it probably replicates the mitochondrial genome. Mechanism of DNA Polymerase

The DNA polymerase catalyze the DNA synthesis at the 3′-OH of the existing primer (a short RNA molecule, synthesized by primases). Unlike other enzyme’s active site dedicated to only

Page 15: Dna Replication

15

one substrate, DNA polymerase can incorporate any of the four types of deoxynucleotides, due to the nearly identical geometry of these all. The polymerase incorporates a correct dNTP in the growing strand by their in-built capacity to form hydrogen bonds with complementary bases in the template strand and promote the catalysis to form phosphodiester bond in between the 3′-OH and α-PO4 of the incoming nucleotide, rather than detecting each new dNTP added. Incorrect base pairing dramatically lowers rates of nucleotide addition due to catalytically unfavorable alignment of the substrates. This is called kinetic selectivity. The DNA polymerases structure resembles a right hand shape with three domains being thumb, fingers and palm and the DNA strand sits in the cleft formed between the fingers and thumb. The palm domain is composed of β-sheets and form the primary elements of catalytic sites. At this region, the two divalent ions (Mg++ and Mn++) bind, and alters the environment around the 3′-OH of the primer and the incoming nucleotide. The affinity of 3′-OH for its H is decreased by one of the ions and forms O3- which begins the nucleophillic attack to α-PO4 of the incoming nucleotide. The β-PO4- and γ-PO4- are stabilized by the other divalent ion for the release of pyrophosphate for the formation of phosphodiester bond in between the primer and the incoming nucleotide. Palm domain also monitors the accuracy of the dNTP addition. This palm region makes hydrogen bonds with the nucleotides recently added in minor groove of the DNA which is specific and any mismatch between the base pairs results in the slowed catalysis of the polymerase and eventual release of DNA template from the cleft. Finger domain also has some residues which bind to the incoming nucleotides and when correct a base pair is formed between incoming nucleotide and template. The thumb domain helps in strong association between DNA polymerase and its template. DNA replication is very accurate. In E.coli the possibility of a mistake being made is only once every 109 -10 10

nucleotides added. For the E.coli chromosome of 4.6 X 106 it means that an error occur every 1000-10,000 replications. Discrimination between the correct and incorrect nucleotides during polymerization doesn’t only rely on the hydrogen bonds between the nucleotides that specify correct base pairing but it also depends on the correct geometry of the base pair which is only able to fit into the active site of the enzyme. If there is incorrect base pair then it may not fit into the active site and the nucleotide is excluded before the formation of the phosphor-diester bond. The 3′-5′ exonuclease activity present in all DNA polymerases also double checks every nucleotide after it has been incorporated. If a wrong nucleotide is added the translocation of enzyme to the site where a new nucleotide is to be added is inhibited so it removes any mispaired nucleotide and polymerase begins again. This activity is the Proof reading activity of the polymerase. Ligases

The requirement of an enzyme for sealing broken ends of DNA stems from the discontinuous mode of synthesis of the lagging strand. Essentially, ligases catalyze the formation of a phosphodiester bond between adjacent 3′-OH and 5′ –P termini of polynucleotides hydrogen- bonded to a complementary strand. The E.coli ligase is a 75 kDa polypeptide and is present in around 300 copies per cell. Eukaryotic systems possess two forms of DNA ligase. Ligase I is a 125 kDa monomer, predominantly present in proliferating cells. Its C- terminal appears to have the major portion of the enzyme activity. Ligase II, a 68 kDa heat-labile protein is more involved in repair rather than replication. The reaction catalyzed by ligases involves three basic steps:

1. Transfer of the adenyl group of NAD or ATP to the ε – NH2 of a lysine residue in the enzyme resulting in formation of an enzyme – nucleotide intermediate.

2. Adenylyl activation of the 5′ P terminus of DNA. 3. Phosphodiester bond formation with release of AMP.

Page 16: Dna Replication

16

All prokaryotic ligases utilize NAD+ as the cofactor , while all eukaryotic and phage ligases prefer ATP as the energy source for synthesis of the diester bond. Accessory Proteins

While the major enzymes have been already described, what remains is a brief look at some of the accessory proteins which have been implicated at various steps of DNA replication. Single Strand Binding (SSB) Proteins

These may be defined as binding proteins with a strong preference for DNA over RNA and for ssDNA over duplex DNA. E.coli SSB is present in around 270 copies per replication fork and has a mass of 75.6 kDa. It is a tetaramer and contributes to the opening and unwinding of a duplex at an origin of replication. Moreover it also sustains the unwinding action carried out by helicases at replication forks and improves the fidelity of strand elongation by the DNA polymerases. DNA A Protein

Initiation of E.coli chromosomal replication requires the binding of approximately 20 molecules of DnaA protein to the DnaA boxes in oriC, promoting strand opening of the AT-rich 13-mers, and facilitating the loading of DnaB helicase. DnaA protein is not cytosolic, as might have been expected, but instead is a membrane-associated protein. DnaA protein is structurally subdivided to four domains. The N-terminal domains I–II (amino acids 1–129) contain regions for associating with DnaB and DnaA itself. Domain III (amino acids 130–350) that contains an ATP-binding/ hydrolysis module also plays a role in associating with DnaB and DnaA. The C-terminal domain IV (amino acids 351–467) is the DNA-binding region that contains a helix-turn-helix motif. Eukaryotic Replication Protein A (RP-A)

This is a single strand binding protein, which acts as an auxiliary factor for Pol playing a dual role: (i) It renders the pol /primer complex more stable, thus acting as a Pol clamp; and (ii) it significantly increases the fidelity of Pol . Inhibitors of DNA Replication

Molecules which can thwart the duplication of the genome of an organism, are of utmost significance, primarily bcause of the following reasons:

1. These serve as effective anti-bacterial agents 2. Inhibition studies have contributed in a major way to our understanding of the

replicative mechanisms, especially in eukaryotic systems where mutant generation as well as selection is cumbersome.

With systematic screening experiments, a lot of natural inhibitors have been detected. Moreover, chemical synthesis of carefully designed inhibitors has also added to the extensive list of known inhibitors of DNA replication. While some directly target replication enzymes, others exert an indirect effect by modifying the DNA or causing its denaturation or

Page 17: Dna Replication

17

preventing synthesis of the nucleotides. What follows is a brief look at the various classes of inhibitors. These have been classified based on the specific component of the replicative machinery targeted by them as well as their mode of action. Inhibitors of Nucleotide Biosynthesis

A reduction in the quantity of precursors can eventually limit the rate of synthesis though such inhibitors may not be very effective since cells may have salvage pathways in addition to the de novo pathways. However these assist as anti-metabolites in blocking the enzymatic use of a substrate. These include inhibitors of synthesis of purines, pyrimidines, folate, deoxunucleotides as well as nucleoside triphosphate synthesis (Table 5).

Table 5: Inhibitors of nucleotide biosynthesis

Action Inhibitors

Purine Biosynthesis Azaserine, 6-Mercaptopurine, Ribavirin, Hadacidin, Thiadiazole, 8-Azaguanine

Pyrimidine biosynthesis 6-Azauridine, 5-Azacytidine, 6-Azacytidine, 5-Azaorotate

Folate synthesis Sulfonamides, Trimethoprim, Methotrexate, Aminopterin

Deoxynucleotide synthesis Hydroxyurea, 5- Fluorouracil

Nucleoside triphosphate synthesis Cyclamidomycin (desdanine)

Inhibitors That Modify DNA

Compounds belonging to this class associate either covalently or non-covalently with the DNA, thus modifying it and hence preventing its replication or decreasing the fidelity of replication. There are varying mechanisms by means of which this is achieved. These include crosslinking, chain breakage, intercalation, or alkalyation. Non-covalent binders such as netropsin and distamycin A form stable associations with duplex DNA, especially with the base pairs of poly d(AT) and poly d(IC) by hydrogen bonding and electrostatic interactions and hence inhibit chain growth. Compounds such as the acridine dyes, ethidium bromide as well as propidium bromide may affect the fidelity of replication by intercalation. Moreover since the acridines inhibit plasmid replication, they result in bacterial cells losing their plasmids (curing). Echinomycin, an antibiotic produced by Streptomyces is highly effective against Gram positive bacteria as well as viruses. It binds to GC rich regions. Covalent binders such as the alkylating agents - nitrogen mustards, alkyl sulfonates and nitrosoureas, form reactive intermediates that link them to a variety of nucleophilic targets on DNA, prime among them being the N7 of guanine. Some of these have been effectively used as chemotherapeutic agents in neoplasms as well as in immunosuppression. Bleomycins, isolated from Streptomyces, can preferentially break DNA sequences in chromatin through

Page 18: Dna Replication

18

their nuclease like action. Psoralens, are heterocyclic compounds that can intercalate in duplex DNA and subsequently, can induce cross links within pyrimidines located on opposite strands. Inhibitors of Topoisomerases

In the course of replication, the topological state of the DNA molecule holds a lot of significance. Hence, topoisomerases have been developed as important drug targets in prokaryotes. Morever, eukaryotic topoisomerases are targets for antitumour agents, exploiting the fact that cancer cells divide more rapidly than other cell types. Camptothecin and etoposide are examples of antitumour drugs targeted to eukaryotic topoisomerases. Coumarin antibiotics such as novobiocin, have been found to inhibit DNA gyrase, by competing with ATP for binding to the enzyme and hence inhibiting the ATPase reaction catalysed by this enzyme. In contrast to coumarins, Quinolone drugs such as nalidixic acid and oxalinic acid are entirely synthetic and exert their effect on gyrase by disrupting the DNA breakage-reunion reaction, thus inhibiting DNA supercoiling. Another compound, Microcin B17 is a 3.1-kDa bactericidal peptide. It has no detectable effect on gyrase-catalysed DNA supercoiling or relaxation activities in vitro and is unable to stabilise DNA cleavage in the absence of nucleotides. However, in the presence of ATP, or the non-hydrolysable analogue 5′-adenylyl beta, gamma-imidodiphosphate, microcin B17 stabilises a gyrase-dependent DNA cleavage complex in a manner similar to quinolones (Table 6 ).

Table 6: Inhibitors of the enzyme topoisomerase

Class of drug Topoisomerases inhibited

Coumarins Bacterial gyrase (B subunit), eukaryotic topo II

Quinolones Bacterial gyrase( A subunit) Acridines Eukaryotic Topo II Anthracyclines Eukaryotic Topo II Alkaloids Eukaryotic Topo I Peptides Bacterial gyrase (B subunit) Amilorides Eukaryotic Topo I

Inhibitors of DNA Polymerases

There are a host of compounds which have been seen to be potent inhibitors of the DNA polymerases. These include the arylhydrazinopyrimidines, which are purine dNTP analogs and hence produce a significantly strong inhibition of Bacillus subtilis DNA polymerase III. Effective anti-viral compounds have also been identified which selectively inhibit the polymerases of the Herpes simplex and vaccinia viruses. Other inhibitors are listed in the Table 7.

Page 19: Dna Replication

19

Table 7: Inhibitors of DNA polymerase activity

Inhibitor Protein bound

Arylhydrazinopyrimidines B.subtilis pol III

Phosphonoacetic acid Phosphonoformic acid

Herpes and vaccinia DNA polymerase

Aphidicolin Eukaryotic pol α, pol δ

Concepts of Gene Expression

Introduction

DNA stores genetic information of an organism in a stable form that can be replicated readily. However the expression of genetic information generally proceeds from DNA to RNA to protein. One of the strands of DNA acts as the template for the synthesis of an RNA molecule. This RNA, called mRNA, then in turn directs the synthesis of a particular protein by regulating the sequence of amino acids in a protein. This principle of direction flow of genetic information is more commonly know as the Central Dogma of Biology. Thus, this flow of information involves replication of DNA, as a autocatalytic function and also the flow of information, comprising transcription of information carried by DNA to the RNA, and consequently the translation of this information from RNA to protein. The RNA synthesis or transcription is the process of transcribing DNA nucleotide sequence into RNA sequence information. This term is used when referring to RNA synthesis using DNA as a template to emphasize that this phase of gene expression is simply a transfer of information from one nucleic acid to another. On the other hand, protein synthesis is known as translation because the four letter alphabet of nucleic acids is translated completely into an entirely different twenty letter alphabet of proteins, i.e. it involves the change from a nucleotide sequence of an RNA molecule to the amino acid sequence of a polypeptide chain. Befitting its position, linking the nucleic acid and proteins, the process of protein synthesis depends critically on both nucleic acid and protein factors. Protein synthesis takes place on ribosomes—enormous complexes containing three large RNA molecules and more than 50 proteins. RNA that is translated into protein is called messenger RNA (mRNA) because it carries a genetic message from DNA to the ribosome. In addition to the mRNA, two other types of RNA involved in protein synthesis are ribosomal RNA (rRNA), which are integral part of the ribosomes and the transfer RNA (tRNA) molecules which act as the intermediaries in translating the coded base sequence of mRNA by bringing the appropriate amino acids to the ribosome. Transcription

Template Strand of DNA is Transcribed

The sequence of nucleotides in mRNA is complementary to one of the two DNA strands. DNA strand involved in the process of transcription is called template (-) strand and the other strand is called coding (+) strand because its nucleotide sequence matches 100% with that of RNA transcript except that thymine (in DNA) has been replaced with uracil (in mRNA).

Page 20: Dna Replication

20

Coding and template strands are also called sense or antisense strands respectively. The genetic information in the template strand is read out in 3′-5′ direction. Both strands of DNA contain genes and hence act as sense strand or coding strand. Key enzyme-DNA dependant RNA Polymerase: In case of prokaryotes, RNA synthesis is mediated by a single type of enzyme, called DNA dependent RNA polymerase, whereas in eukaryotes, three different types of DNA dependent- RNA polymerase are involved in the synthesis of four types of RNA molecules (Table 8).

Table 8: Classes of eukaryotic DNA dependent RNA polymerases

Class of enzyme Location RNA product

I (A) Nucleolus rRNA (5.8S, 18S, 28s) II (B) Nucleolus mRNA, snRNAs III (C) Nucleoplasm tRNA, 5S rRNA, U6

snRNA, 7S RNA

The initial RNA transcript formed is called primary transcript. In prokaryotes, as genes may be organized in an operon, it may comprise information for that many genes. But in eukaryotes, it usually contains a single gene. RNA synthesis takes place in three stages, namely initiation, elongation and termination. Like DNA replication, transcription proceeds in the 5′ → 3′ direction. The fundamental reaction of RNA synthesis is the formation of a phosphodiester bond. With the concomitant release of the pyrophosphate to orthophosphate, the α-phosphate group of the incoming nucleoside triphosphate is attacked by the 3′- hydroxyl group of the last nucleotide in the chain. This reaction is thermodynamically favorable, and the subsequent degradation of the pyrophosphate to orthophosphate locks the reaction in the direction of RNA synthesis. Transcription in Prokaryotes

The fundamentals of RNA synthesis were first elucidated in bacteria, where we come across relatively simpler mechanisms and molecules, so it seems fair enough to start with the transcription in prokaryotes. The first stage in this process is the transcription of a nucleotide sequence in DNA into a sequence of nucleotides in RNA. RNA is quite similar to DNA differing only in that it contains ribose instead of deoxyribose as its sugar, and the thymine of DNA is replaced by the uracil of RNA, and it is usually single stranded. Transcription is Catalyzed by RNA Polymerase

Transcription of DNA is carried out by the enzyme RNA polymerase, which catalyzes the synthesis of RNA using DNA as template. In case of bacterial cells, there is just one kind of RNA polymerase which can synthesize all the three major classes of RNA, namely mRNA, rRNA, tRNA. Let us take into consideration, the RNA polymerase from Escherichia coli. It is fairly large in size, approximately 400 KDa and has been well characterized. It consists of four kinds of subunits and the subunit composition of the entire enzyme describes the holoenzyme. The holoenzyme may be denoted as α2ββ’σ. It comprises two α subunits, two β

Page 21: Dna Replication

21

subunits that differ enough to be identified as β and β’ and a dissociable subunit called the sigma (σ) factor. The sigma factor helps in recognizing a promoter site where the transcription gets started, it then plays a role in the initiation of RNA synthesis but later dissociates from the rest of the enzyme. RNA polymerase without this subunit (α2ββ’) is known as the core enzyme which contains the catalytic site. Although the core enzyme is competent enough to carry out RNA synthesis it does require the complete holoenzyme to ensure initiation at the proper sites within a DNA molecule. Bacteria contain a variety of different sigma factors that selectively initiate the transcription of specific categories of genes. Most common sigma factor is called σ70 factor. Transcription Involves Four Stages

Binding, Initiation, Elongation, and Termination: Binding of RNA polymerase to a Promoter Sequence: The first step in RNA synthesis is the binding of RNA polymerase to a particular site on DNA called promoter site. This site consists of a specific sequence of several dozen base pairs and direct the RNA polymerase to the proper initiation site for transcription. Bacterial promoter sequences, also termed as promoter elements, are 40 nucleotides (approx.) long and have two conserved regions called -35 and -10 regions (Fig.6). The numbers denote the position of these regions downstream (left) of the first nucleotide (i.e. +1) of the primary transcript. The -35 region has a consensus sequence of 8 nucleotide base pairs (5′- TGTTGACA-3′) and -10 region has a consensus sequence of 6 base pairs (5′- TATAAT-3′) often called TATA box, or Pribnow box. The sequence between -10 and -35 regions is not conserved. But the distance between the two sites is very important for the efficient functioning of promoters. This site is important recognition site that interacts with a σ factor of RNA polymerase. In eukaryotes, somewhat similar sequences are also found. However, there is involvement of additional proteins and factors as well as DNA regions called enhancers or repressors (of transcription). These regions are located far away from the promoter regions, maybe upstream or down stream of promoter sites.

Page 22: Dna Replication

22

By convention, promoter sequences are transcribed in the 5′ to 3′ direction on the coding strand, which is the strand that lies opposite to the template strand. The first nucleotide to be transcribed is denoted as +1 and the second as +2, the nucleotide preceding the start site is denoted as -1. These designations refer to the coding strand of DNA. The coding strand is also known as the sense (+) strand while the template strand is known as the (-) strand. Initiation of RNA Synthesis: Binding of RNA polymerase leads to unwinding of the DNA double helix in the promoter region and hence initiation of RNA synthesis can take place. A region of duplex DNA must be unpaired so that nucleotides on one of its strands become accessible for base pairing with incoming ribonucleoside triphosphates (Fig.7). The promoter carrying DNA strand determines which way the RNA polymerase faces and the orientation of the enzyme in turn determines which DNA strand it transcribes. After the hydrogen bonding of the first two incoming NTP’s to the complementary bases of the DNA strand, RNA polymerase catalyzes the formation of a phosphodiester bond between 3′ hydroxyl group of the first ribonucleotide and 5′ phosphate group of the second, accompanied by release of pyrophosphate. This continues till the chain becomes nine nucleotides long. At this point , the sigma factor detaches from the RNA polymerase molecule and the initiation stage is complete.

Page 23: Dna Replication

23

Elongation of RNA Chain: The elongation phase of RNA synthesis begins after the formation of the first phosphodiester bond. Here detachment of the σ takes place and the core enzyme without this σ factor binds more strongly to the DNA template. Chain elongation hence continues as RNA polymerase moves along the DNA molecule, untwisting the helix and adding one complementary nucleotide to the growing RNA chain. The region containing the RNA polymerase, DNA, and nascent RNA is called a transcription bubble because it contains a locally melted bubble of DNA. This RNA-DNA hybrid helix is about 8 bp long, which corresponds to nearly one turn of a double helix. This bubble moves a distance of 17 nm in a second, which corresponds to a rate of elongation of about 50 nucleotides per second. The polymerase continues to move on, rewinding the DNA behind and unwinding the DNA ahead as it goes. Hence the lengths of the RNA-DNA hybrid and of the unwound region of DNA stay rather constant as RNA polymerase moves on. This hybrid must also rotate every time a nucleotide is added so that the 3′-OH end of the RNA stays at the catalytic site. The RNA transcript forms a transient RNA-DNA hybrid complex with DNA template strand. However, as the transcription bubble moves on, the RNA transcript moves away from DNA. The RNA polymerase lacks nuclease activity that would allow it to correct mistakes by removing mismatched base pairs as RNA elongation proceeds. Consequently the fidelity of transcription is much lower than that of replication. However this does not pose much of a problem because multiple RNA molecules are usually transcribed from each gene, and so a small number of inaccurate copies can be tolerated. Termination of RNA Synthesis: The termination of transcription is as precisely controlled as its initiation. In the termination phase of transcription, the formation of phosphodiester bond ceases, the RNA-DNA hybrid dissociates, the melted region of DNA rewinds, and RNA polymerase releases the DNA. Elongation of the growing RNA chain continues until RNA polymerase reaches a special sequence, called a termination signal, which triggers the end of transcription. In prokaryotes two classes of termination signals can be distinguished that differ in whether or not they require the participation of a protein called the Rho (ρ) factor. The most common termination signal is ρ independent and comprises a palindrome rich in GC, followed by an AT-rich sequence. This palindromic region of RNA transcript forms a hairpin structure followed by a sequence of four or more uracil residues (Fig.8).

Page 24: Dna Replication

24

The second type of transcription termination signal is ρ-dependent. It is a protein which binds to a termination site on template strand and stops transcription. It does so by hydrolyzing ATP in the presence of single stranded RNA but not in the presence of DNA or duplex RNA. The ρ is brought into action by sequences located in the nascent RNA that are rich in cystein and poor in guanine. The ATPase activity of ρ enables the protein to pull the nascent RNA. When ρ catches RNA polymerase at the transcription bubble, it breaks the RNA-DNA hybrid helix by functioning as RNA-DNA helicase. RNA processing: Prokaryotic mRNA transcripts are hardly processed i.e. modified. In fact, protein synthesis starts even before the complete mRNA transcript has been synthesized. However, other RNAs i.e. tRNA and rRNA need post-transcriptional modifications. Transcription in Eukaryotic Cells

We now turn to transcription in eukaryotes, which happens to be a much more complicated process as compared to that in prokaryotes. In eukaryotes, the transcription takes place in the membrane bounded nucleus, whereas translation takes place outside the nucleus, in the cytoplasm. This separation of transcription and translation enables eukaryotes to regulate gene expression in much more intricate ways. The main differences between the prokaryotic and eukaryotic transcription may be described as follows:

1. Three different RNA polymerases transcribe the nuclear DNA of eukaryotes. 2. Eukaryotic promoters are more varied than prokaryotic promoters. 3. Binding of eukaryotic RNA polymerases to DNA requires the participation of

additional proteins, called transcription factors, which in the eukaryotic system are not an inherent part of the RNA polymerase molecule unlike the bacterial sigma factor.

4. At the first stage of eukaryotic transcription, protein-protein interactions are of great importance.

5. Nascent eukaryotic RNA undergo extensive RNA processing, i.e various modifications.

Three Types of RNA Polymerases Regulate the Transcription in Eukaryotes

In the nucleus of a eukaryotic cell, lie three types of RNA polymerases differing in their template specificity, location in the nucleus, and susceptibility to inhibitors. These polymerases are large proteins, containing from 8 to 14 subunits and having a total molecular mass greater than 500 kDa. RNA polymerase I is present in the nucleolus and it carries out the synthesis of a RNA molecule that serves as a precursor for three or four types of rRNA found in eukaryotic ribosomes. This enzyme is not sensitive to α-amanitin. RNA polymerase II is found in the nucleoplasm and synthesizes mRNA precursors. It also synthesizes most of the snRNAs involved in the post transcriptional RNA processing. This enzyme is sensitive to low levels of α-amanitin. RNA polymerase III is also a nucleoplasmic enzyme, but it synthesizes a variety of small RNAs, including tRNA precursors and the smallest type of ribosomal RNA. Mammalian RNA polymerase III is sensitive to higher levels of α-amanitin. The promoters to which eukaryotic RNA polymerases bind are even more varied than prokaryotic promoters. These are grouped into three categories. RNA polymerase I promoter or the promoter for the transcription unit that produces precursor for the largest rRNAs comprises the core promoter that extends into the nucleotide sequence to be transcribed and

Page 25: Dna Replication

25

an upstream control element which makes the transcription more efficient. A typical promoter used by RNA polymerase II has a short initiator sequence surrounding the start point and usually, at around -25, a short sequence known as the TATA box, which has a consensus sequence of TATA followed by two or three more AA’s. In contrast to RNA polymerase I and II, the promoters used by RNA polymerase III are entirely downstream of the transcription unit’s start point when transcribing genes for tRNAs and 5S RNA. The promoters used by all the eukaryotic RNA polymerases must be recognized and bound by transcription factors before the RNA polymerase can bind to the DNA. Gene Organization

Unlike prokaryotic genes, eukaryotic protein coding DNA sequences are discontinuous. Protein coding regions (called exons) are interspersed with non-coding regions (called introns). Number as well as the size of introns in a gene varies greatly. So far no introns have been observed in histone mRNAs. Initiation of Transcription: As mentioned earlier, mRNA transcription is mediated by RNA Pol II. This enzyme binds to a specific region located 25-35 bp upstream the start site (of mRNA transcript) and has a sequence similar to TATA box i.e. TATA(AT)A(A/T). For proper binding of RNA Pol II to the promoter region, there is need of many proteins called as General (or basal) transcription factors (TFII) e.g. TFIIA, B, D, E, F, J, H. In the first instance, TFIID binds to TATA region via one of its subunit called TATA box binding protein (TBP). The combination of RNA pol II and large number of TFII factors is called transcription initiation complex. Once this complex has bound to promoter region, synthesis of mRNA transcript starts. In some cases, where TATA boxes are not preset upstream protein coding regions, there are initiator elements. These elements bind the pol II enzyme and other transcription factors and initiate the process of transcription. Elongation and Termination: The elongation of mRNA is quite similar in prokaryotes and eukaryotes. However, it is believed that transcription signals exist far away from 3′ end of mRNA transcript. Though involvement of transcription factors is expected, but experimental data has yet to be generated. It is believed that RNA endonucleases cleave the primary mRNA transcript 15 bases (approx.) downstream a consensus sequence AAUAAA. Finally the 3′ end of mRNA transcript is polyadenylated in the nucleoplasm. Processing of Primary mRNA Transcript

The primary mRNA transcript (pre-mRNA) undergoes various modifications (capping, cleavage, polyadenylation, splicing and editing) to yield a functional mRNA molecule 5′ Processing (capping)

Immediately after pre-mRNA synthesis, a single phosphate group of the first ribonucleotide located at 5′ end is removed by a phosphatase. With the help of another enzyme, guanosyl transferase, a guanosine residue is added to the ribonucleotide via unusual 5′5′ triphosphate link. A methyl group is added to N-7 position of guanosine. The addition of methylated

Page 26: Dna Replication

26

guanosine to pre-mRNA transcript is called capping. The cap protects mRNA from degradation by ribonucleases and also assists in initiation of translation process. 3′ Processing (cleavage and polyadenylation)

The 3′ region of the pre-mRNA transcript carries a polyadenylation signal sequence (5′-AAUAAA-3′). A short distance from this sequence is a 5′-YA-3′ sequence in which Y stands for any pyrimidine. Normally, a short distance away from this region is a GC-rich sequence. Several specific proteins, called cleavage proteins, bind to these 3′ end sequences and ultimately cleave the pre-mRNA between polyadenyaltion signal and GU-rich sequence. A ploy (A) tail comprising around 250 A residues bind to the 3′ end of processed mRNA with the help of an enzyme, ploy(A) polymerase (PAP). The poly (A) tail is immediately get coated with poly (A) binding proteins so as to ward off attack by nucleases. RNA Splicing

It stands for removal of intron sequences and joining of exon sequences to form mRNA. In majority of pre-mRNAs, introns start with GU and end with AG. In between these two ends (of introns) lie a polypyrimidine tract and a conserved branch point sequence. RNA splicing reaction involves two transesterification steps. First step is excision of introns and second step is ligation of exons. The reactions are caused by the complex (spliceosome) formed of snRNPs with accessory proteins. RNA Editing

The process by which the original mRNA sequence gets changed by mechanisms other than RNA splicing is called RNA editing. RNA editing involves either changes in the nucleotide/s or addition/deletion of nucleotide/s which ultimately results in the synthesis of new proteins having structural and functional properties different from the original protein. The processing of pre-mRNA transcript, involving capping, cleavage, polyadenylation, splicing and editing results in the formation of a functional mRNA ready to be translated by protein synthesis machinery. Translation

The process by which genetic information residing in mRNA is utilized for the synthesis of protein is called translation. The genetic code of mRNA is read in groups of three nucleotides called codons. Each codon is specific for a particular amino acid (Table). As there are four types of nucleotides in mRNA, maximum number of triplet codons can be 43=64. We also know that there are 20 types of amino acids present in proteins of prokaryotes and eukaryotes. It implies that many codons can code for a single amino acid (Table 9). In scientific term, we say that code is degenerate. 61 codons code for 20 different amino acids. Three codons (UAA,UAG,UGA) do not code for any amino acid. They have important function to perform in protein synthesis i.e termination of the synthesis of a protein. Hence these codons are called STOP codons. The first amino acid incorporated in the protein is by and large formyl methionine in prokaryotes and methionine in eukaryotes coded by the same codon, AUG. This codon is thus termed as initiation or start codon. Earlier, it was thought that in all forms of life each codon specifies a particular amino acids i.e. genetic code is universal. But now a few exceptions have been observed. A few codons

Page 27: Dna Replication

27

of mitochondrial DNA, code for different amino acid as compared to chromosomal DNA (Table 10 ).

Table 9: The genetic code

Codon sequence

1st base 2nd base 3rd base

U C A G U

Phe Phe Leu Leu

Ser Ser Ser Ser

Tyr Tyr Stop Stop

Cys Cys Stop Trp

U C A G

C

Leu Lue Leu Leu

Pro Pro Pro Pro

His His Gln Gln

Arg Arg Arg Arg

U C A G

A Ile Ile Ile Met

Thr Thr Thr Thr

Asn Asn Lys Lys

Ser Ser Arg Arg

U A C G

G Val Val Val Val

Ala Ala Ala Ala

Asp Asp Glu Glu

Gly Gly Gly Gly

U A C G

Table 10: Exceptions to universality of genetic code

Codon Amino acid coded

AUA By chromosomal DNA---Met By mitochondrial DNA--Ile

UGA By chromosomal DNA---Stop By mitochondrial DNA--Trp

AGA, AGG (in some animal mitochondria)

By chromosomal DNA---Arg By mitochondrial DNA--Stop

CGG (in plant mitochondria)

By chromosomal DNA---Arg By mitochondrial DNA--Trp

The sequence of codons starting with initiation codon and terminating with stop codon is called open reading frame (ORF) and codes for a single polypeptide. The major components involved in protein synthesis are ribosomes, mRNA, tRNA, enzymes and protein factors.

Page 28: Dna Replication

28

Ribosomes

Ribosomes are cytoplasmic organelles and play a vital role in protein synthesis. They are referred to as work benches of protein synthesis. Their role is to orient the mRNA and amino acid-carrying tRNAs in the proper relation with respect to each other so as to facilitate proper reading of the genetic code, hence catalyzing formation of the peptide bonds that link the amino acids in a specific sequence into a polypeptide. Prokaryotic and eukaryotic ribosomes resemble each other structurally but are not identical, due to the fact that prokaryotic ribosomes are smaller in size, carry lesser protein, have smaller RNA molecules and are sensitive to different inhibitors of protein synthesis. A prokaryotic ribosome is 70S and comprises two subunits (small and large; 30S & 50S, respectively) Each subunit is made up of rRNA and many proteins. An eukaryotic ribosome is 80S and also comprises two subunits (small and large; 40S & 60S, respectively). These subunits are also made up of rRNA and many proteins. Three main sites located on ribosomes are highly important in the process of protein synthesis: an mRNA binding site and two sites where tRNA can bind: an A (aminoacyl) site that binds each new tRNA with its attached amino acid, and a P (peptidyl) site, where the tRNA carrying the growing polypeptide chain resides. An additional E or exit site has also been proposed from where the tRNAs leave the ribosome after they have discharged their amino acids. tRNAs

The fidelity of protein synthesis requires the accurate recognition of three base codons on mRNA. An amino acid cannot itself recognize a codon, therefore, it is attached to a specific transfer RNA (tRNA) which can do so. Thus a tRNA serves as the adapter molecule that binds to a specific codon and brings with it an amino acid for the incorporation into the polypeptide chain. This adapter must therefore mediate the interaction between amino acids and mRNA. A tRNA is a clover leaf shaped structure (Fig.9) made up of a single chain of 76 ribonucleotides. The 5′ terminus is phosphorylated (pG), whereas the 3′ terminus has a free hydroxyl group. The amino acid attachment site is the 3′-hydroxyl group of adenosine residue at the 3′ terminus of the molecule. Once the amino acid is attached it is known as the aminoacyl tRNA. This tRNA is said to be charged, as it carries an activated amino acid. The tRNA molecules can recognize codons in mRNA because each tRNA possesses an anticodon, a special trinucleotide sequence located within one of the loops of the tRNA molecule; these anticodons permit tRNA molecules to recognize codons in mRNA by complementary base pairing. However, these mRNA and tRNA line up on the ribosome in a way which allows considerable flexibility between the third base of the codon and corresponding base in the anticodon. Aminoacyl RNA Synthetases

The enzymes, aminoacyl-tRNA synthetase carry out two important functions: firstly they are responsible for linking the tRNA molecules to the corresponding amino acids at the 3′ acceptor end and also activate an amino acid by reacting with ATP and forming aminoacyl adenylic acid. These enzymes catalyze the joining of an amino acid to its proper tRNAs via an ester bond, accompanied by the hydrolysis of ATP to AMP and pyrophosphate. Aminoacyl-tRNA Synthetases recognize nucleotides located in at least two different regions of tRNA molecules when they pick out the tRNA that is to become linked to a particular

Page 29: Dna Replication

29

amino acid. Once the correct amino acid has been joined to its tRNA, it is the tRNA itself that recognizes the appropriate codon in mRNA. These results prove that codons in mRNA recognize tRNA molecules rather than their bound amino acids.

Protein Factors

In addition to aminoacyl-tRNA synthetases and the protein components of the ribosome, translation also requires the participation of several other kinds of protein molecules. Though rRNA is of paramount importance, protein factors play important roles in efficient synthesis of proteins.

Page 30: Dna Replication

30

Mechanism of Translation

This mechanism of translation of mRNAs into polypeptides is an ordered, step-wise process that proceeds in the 5′to 3′ direction leading to the synthesis of polypeptides in the N-terminal to C-terminal direction. This process can be divided into three stages (Fig.10). These are:

(a) Initiation: in which mRNA is bound to the ribosome and positioned for proper translation.

(b) Elongation: in which amino acids are sequentially joined together via peptide bonds in as order specified by the arrangement of codons in mRNA

(c) Termination: in which the mRNA and the newly formed polypeptide chain are released from the ribosome.

Page 31: Dna Replication

31

Initiation of Translation

As mentioned earlier, first amino acid to be translated is formyl methionine or methionine coded by initiation codon, AUG. Since a protein has methinone amino acids present internally also, question arises as to how the protein synthesis machinery finds out which AUG codon is responsible for initiating protein synthesis. A cell has two types of aminoacyl-tRNAmethionine-tRNA (tRNA Met) and N-formylmethionine-tRNA (fMet-tRNAf

Met). The latter aminoacyl tRNA is used for initiating protein synthesis. In mRNA,upstream of the 5′ end of initiating AUG codon lies a purine rich sequence of nucleotides (5′-AGGAGGU-3′) called Shine-Dalgarno sequence, which is complementary to 16S rRNA sequence of 30S ribosomal sub unit. After the pairing of 16S rRNA of 30S subunit to Shine-Dalgarno sequence of mRNA, thus stabilizing the association, and subunit (30S) now migrates in 3′ direction till it encounters AUG initiation codon. In short, one can say that Shine-Delgarno sequence determines which AUG codon (on mRNA) will serve as initiating codon. The initiation of protein synthesis also requires additional factors called initiation factors (IF). In prokaryotes, Initiation factors IF1 and IF3 bind to the small (30S) ribosomal subunit. 30S sub unit then migrates along mRNA till it finds Shine-Dalgarno sequence and then initiating codon, AUG. IF2 binds GTP and the concomitant conformational change enables IF2 to associate with formylmethionyl-tRNA. The IF2-GTP- formylmethionyl-tRNA complex binds with mRNA at AUG triplet and 30S subunit to form the 30S initiation complex. IF3 is released. 50S sub unit then binds to 30S initiation complex with concomitant hydrolysis of GTP and release of IF1 and IF2. Resulting complex is called 70S initiation complex.. The hydrolysis of GTP bound to IF2 on entry of the 50S subunit leads to the release of the initiation factors. Once both subunits of the ribosomes are assembled with mRNA binding sites for two charged tRNA molecules are created: the P or peptidyl site, and the A, or aminoacyl site. At this stage, all three initiation factors have been released and the ribosome is ready for elongation phase of protein synthesis. It is to be noted that fMet-tRNAf

Met binds P site of ribosomes with the corresponding AUG triplet of mRNA. In eukaryotes, the initiation process in eukaryotes involves a different set of initiation factors, a somewhat different pathway for assembling the initiation complex and a special initiator tRNAMet that carries methionine and does not get formylated. There are mainly two types of initiation processes: cap-dependent and cap-independent and they maybe explained as follows. Initiation of translation involves an interaction of some proteins with a special tag (cap) bound to 5′-end of the mRNA molecules as described earlier. The protein factors bind the small ribosomal 40S subunit. The subunit accompanied by some of those protein factors moves along the mRNA chain towards its 3′-end and scans for the 'start' codon (mostly AUG) on the mRNA, which indicates where the mRNA starts coding for the protein. The sequence downstream between the 'start' and 'stop' codons is then translated by the ribosome into the amino acid sequence -- thus a protein is synthesized. In eukaryotes and archaea, the amino acid encoded by the start codon is methionine. The initiator tRNA charged with Met forms part of the ribosomal complex and thus all proteins start with this amino acid (unless it is cleaved away by a protease in some subsequent steps). What differentiates cap-independent translation from cap-dependent translation is that cap-independent translation does not require the ribosome to start scanning from the 5′ end of the mRNA cap until the start codon. The ribosome can be trafficked to the start site by ITAFs (IRES trans-acting factors) bypassing the need to scan from the 5′ end of the untranslated region of the mRNA. This method of translation has been recently discovered, and has been found to be important in

Page 32: Dna Replication

32

conditions that require the translation of specific mRNAs, despite cellular stress or the inability to translate most mRNAs. Chain Elongation

The second phase of protein synthesis is the chain elongation consisting of the growth of polypeptide chain by one amino acid. This involves sequential cycles of aminoacyl tRNA binding, peptide bond formation and translocation. Aminoacyl-tRNA binding: This phase begins with the insertion of an aminoacyl-tRNA, corresponding to second codon of mRNA, into the empty A site on the ribosome, thus bringing a new amino acid into position to be joined to the polypeptide chain. The aminoacyl tRNA is delivered to the A site in association with a protein known as the elongation factor. There are two elongation factors FT-Tu and EF-Ts, each of which plays a specific role. The function of EF-Tu is to convey the new aminoacyl tRNA to the A site of the ribosome and to bind to GTP. This solves two functions, i.e. protecting the delicate ester linkage in aminoacyl-tRNA from hydrolysis and to promote the binding of all aminoacyl tRNAs to the ribosome except the initiator tRNA making sure that AUG codons located downstream from the stop codon do not mistakenly recruit an initiator tRNA to the ribosome. The role of EF-Ts is to regenerate EF-Tu from EF-Tu-2GTP for the next round of the elongation cycle. This is followed by the formation of a peptide bond between the amino group of the amino acid bound at the A site and the carboxyl group of the initiating amino acid attached to the RNA of the P site. The peptide bond formation is catalyzed by peptidyl tranferase, a function of rRNA of large subunit. However, protein synthesis cannot continue without the translocation of the mRNA and the tRNAs within the ribosome. The formation of the peptide bond causes the growing polypeptide chain to be transferred from the tRNA located at the P site to the tRNA at the A site. At this stage, the covalent bond between the amino acid and tRNA at P site is hydrolyzed. This reaction does not require any outside source of energy and the necessary energy is supplied by cleavage of the high-energy bond that joins the amino acid or peptide chain to the tRNA located at the P site. The P site now has an empty or uncharged tRNA and the A site contains a peptidyl tRNA. The entire mRNA-tRNA-aa2-aa1 now shifts to P site and the mRNA advances a distance of three nucleotides relative to the small subunit bringing the next codon into proper position for translocation. Translocation is stimulated by EF-G and energy required for translocation is supplied by the hydrolysis of GTP. During this process of translocation, the uncharged tRNA moves from the P site to the E site where it is released from the ribosome. With the availability of a vacant A site and the corresponding mRNA codon, the process will be repeated by another charged tRNA bringing in a new amino acid. Termination of Polypeptide Synthesis

Termination occurs when one of the three termination codons moves into the A site. These codons are not recognized by any tRNAs. Instead, they are recognized by proteins called release factors namely RF1 (recognizing the UAA and UAG stop codons) or RF2 (recognizing the UAA and UGA stop codons). A third release factor RF-3 catalyzes the release of RF-1 and RF-2 at the end of the termination process. These factors trigger the hydrolysis of the ester bond in peptidyl-tRNA and the release of the newly synthesized protein from the ribosome. As the polypeptide is released, the other components of the ribosomal complex (mRNA, tRNA and ribosomes) come apart and ready to translate afresh. The ribosome also dissociates into its subunits aided by IF3.

Page 33: Dna Replication

33

Translation in Eukaryotes

By and large, protein synthesis is more or less similar in prokaryotes and eukaryotes and involves three stages i.e. Initiation, Elongation and Termination. However, there are a few differences which are mentioned below.

• Prokaryotic ribosomes are 70S (50S+30S) and eukaryotic ribosomes are 80S (60S + 40S).

• Prokaryotic mRNA is polycistronic and eukaryotic mRNA is monocistronic. • Shine-Dalgarno sequences are not present in eukaryotes. 40S subunit of eukaryotic

ribosome binds to 5′ end of mRNA and starts moving in 3′ direction till it finds AUG, the initiating codon. Usually, initiation codon is recognizable as it is present in a sequence called Kozak consensus (5′-ACCAUGG-3′).

• The initiation codon, in eukaryotes, codes for methionine in contrast to N-formylmethionine used by prokaryotes.

• For initiation of protein synthesis, prokaryotes require three initiating factors (IFs) whereas eukaryotes require nine initiating factors (eIFs).

• Elongation step involves the use of three elongation factors (EF for prokaryotes and eEF for eukaryotes). These elongation factors are distinct from one another.

• Although the genetic code is universal, but in a few exceptions, eukaryotic mitochondrial codons code for amino acids different from the usual. Note that mitochondria are absent in prokaryotes.

• In prokaryotes, protein termination is carried out by three release factors (RF). In eukaryotes, there is single eukaryotic release factor (eRF).

Regulation of Gene Expression

Gene expression is the process by which a gene's DNA sequence is converted into the structures and functions of a cell. Non-protein coding genes (e.g. rRNA genes, tRNA genes) are not translated into protein. The amount of protein that a cell expresses depends on the tissue, the developmental stage of the organism and the metabolic or physiologic state of the cell as well as the environmental signal. Regulation of gene expression is the cellular control of the amount and timing of appearance of the functional product of a gene. As described earlier gene expression is a multi-step process and any of the step could be regulated. For example, this could happen at transcriptional or post transcriptional stage as well as during translation followed by folding, post-translational modification and targeting. Gene regulation gives the cell control over structure and function, and is the basis for cellular differentiation, morphogenesis and the versatility and adaptability of any organism. Gene activity is controlled first and foremost at the level of transcription. Much of this control is achieved through the interactions between proteins that bind to specific DNA sequences and their DNA binding sites. This aspect of regulation has been covered in the chapter on Gene Regulations. Prokaryotic and Eukaryotic Gene Expression: a Comparison

In spite of all the fundamental differences between prokaryotes and eukaryotes, a few basic similarities between their gene expressions have been observed. The most important among these is the transcription initiation as a key control point and the factors affecting this control point. Also in both these systems, many genes are controlled by regulatory proteins that interact with specific DNA sequences to turn transcription on or off. However, on the other hand, a few contrasting differences have also been observed. The differences may arise in the

Page 34: Dna Replication

34

prokaryotes and eukaryotes on the basis of genome organization and expression. The greater diversity of regulatory mechanisms in eukaryotic cells reflects basic differences between prokaryotes and eukaryotes in the structure, organization, and compartmentalization of the genome, as well some fundamental features of the multicellular way of life. Beginning with the genome size and complexity, the eukaryotic genomes are almost always larger than those of prokaryotes. Incase of most eukaryotes, however, genes constitute only a small portion of the genome. In addition to the non- coding DNA located between the genes, the non-coding DNA introns located within most eukaryotic genes are another important genomic feature that may be involved in regulation. Another basic difference between the two is that most of the genetic information of eukaryotes is located in the nucleus, separated from the cytoplasmic sites of protein synthesis by nuclear envelope. An additional difference lies in the structural organization of the genome. The chromosomes of eukaryotic cells differ from bacterial chromosomes in both chemical composition and structure. Bacterial DNA is complexed with small basic proteins that may interact with DNA in much the same way as the histones do in eukaryotic chromatin. However, unlike in eukaryotes, prokaryotic DNA folding is not based on nucleosomes and the successive levels of DNA compaction seen in eukaryotes are not observed in prokaryotes. Now considering post translational modification of proteins, the proteins may be extensively modified after protein synthesis, especially in eukaryotes. Also signal sequences that target proteins to specific cellular compartments must often be removed. Other important protein modifications include the addition of prosthetic groups, the control of protein activity by phosphorylation and dephosphorylation, and the glycosylation of proteins destined for secretion or integration into the plasma membrane. Another difference that arises from the contrasting lifestyles of prokaryotes and eukaryotes is the relative importance of protein degradation as means of eliminating proteins no longer needed by the cell. Prokaryotes have enzymes that can degrade defective or unneeded proteins. As a result, prokaryotic cells are not generally dependent on protein degradation as a means of eliminating proteins. Eukaryotic cells, on the other hand, are more likely to stop dividing and to persist as non-dividing differentiated cells for long periods of time thereafter. They possess specific mechanisms for degrading proteins selectively. The regulated degradation and replacement of proteins, called protein turnover, is a more prominent feature of eukaryotic cells than of prokaryotes. Eukaryotic Gene Regulation

Gene expression in eukaryotic cells, ultimately measured in terms of activities of gene products, is the culmination controls acting at several different levels. Hence, there are five main levels at which control might be practiced:

(i) The genome (ii) Transcription (iii) RNA processing and export from nucleus to cytoplasm (iv) Translation (v) Post translational events

Gene regulation in eukaryotes is significantly more complex as compared to that in prokaryotes due to a variety of reasons. Firstly, the genome being regulated in the former is considerably larger in size. Hence it would be extremely difficult for a DNA-binding protein to recognize a unique site in this vast array of DNA sequences. Another source of complexity in eukaryotic gene regulation is the many different cell types present in most eukaryotes. Also, eukaryotic genes are not generally organized into operons. On the other hand, genes that encode proteins for steps within a given pathway often are spread widely across the

Page 35: Dna Replication

35

genome. Having covered the complexities, we now take up each aspect of eukaryotic gene regulation control. Eukaryotic gene regulation almost always involves the selective expression of a specific subset of genes from the same complete genome present in every cell, rather than any cell specific changes in the genome itself. Genomic Control

In spite of the fact that previous studies show that the genome tends to be identical in all cells of an adult eukaryotic organism, a few types of gene regulation create slight deviations to this rule. One example is gene amplification or the selective replication of certain genes. Gene amplification is one of the ways of exerting genomic control that is a regulatory change in the makeup or structural organization of the genome. In addition to amplifying gene sequences whose products are required in greater numbers, some cells also delete genes whose products are not required. This is known as gene deletion. Another aspect is that DNA rearrangements can alter the genome. A few cases are known in which gene regulation is based on the movement of DNA segments from one location to another within the genome known as DNA rearrangement. Mainly two types of DNA arrangements have been observed. The first is the yeast mating type rearrangements: all haploid cells carry one allele for mating type; and a cell’s actual mating phenotype depends on which of the two alleles, α or a, is present at a special site in the genome known as the MAT locus. Cells frequently switch mating type, presumably as a means of maximizing opportunities for mating. They do so by transposing the information for the alternative allele at the MAT locus. This process of rearrangement is called the cassette mechanism as it consists of one active and two silent loci. The second type is the antibody gene rearrangements: antibodies are proteins composed of two kinds to polypeptide subunits, called heavy chains and light chains. Any one of the thousands of different kinds of heavy chains can be assembled with any one of the thousand different kinds of light chains, creating the possibility of millions of different types of antibodies. The DNA rearrangement process that creates antibody genes also activates transcription via a mechanism involving special DNA sequences called enhancers, which are located near DNA sequences coding for specific segments, but a promoter is not present in this area and hence transcription does not occur normally. Another, means of regulating the availability of various regions of the genome is DNA methylation, the addition of methyl groups to select cytosine groups in DNA. The DNA of most vertebrates contains small amounts of methylated cytosine, which tends to cluster in non coding regions at the 5′ends of genes. Moreover, methylation patterns are inherited after DNA replication. In other words, if the old strand of a newly formed double helix has a methylated 5′-CG-3′ sequence, then the complementary 3′-GC-5′ sequence in the newly formed strand will become a target for the DNA-methylating enzyme. Changes in histone acetylation, HMG proteins, and connections to the nuclear matrix are associated with active regions of the genome. Hence they also play a vital role in genomic control of eukaryotic gene regulation. Transcriptional Control

The different proteins produced by the two cell types are a reflection of differential gene transcription. i.e. transcription of different genes produces different sets of RNA. Certain statements can be made regarding these differences between one cell type and another that many protein processes are common to all cells and any two cells in a single organism. Therefore, they have many proteins in common. Also some proteins are abundant in specialized cells in which they function and cannot be detected elsewhere. Studies of the number of different mRNAs suggest that at any one time, a typical human cell expresses

Page 36: Dna Replication

36

approximately 10,000-20,000 of its approximately 30,000 genes. When the patterns of mRNAs in a series of different human cells lines are compared, it is found that the level of expression of almost every active gene varies from one cell type to another. And lastly although the differences in mRNAs among specialized cell types are striking, they nonetheless underestimate the full range of differences in the pattern of protein production. Another point to remember is that a cell can change its expression of genes in response to external signals. Most of the specialized cells in a multicellular organism are capable of altering their patterns of gene expression in response to extracellular cues. Different cell types often respond in different ways to the same extracellular signals. Also the realization that multiple DNA control elements and their regulatory transcription factors has led to a combinatorial model for gene regulation. This proposes that a relatively small number of different DNA control elements and transcriptional factors, acting in different combinations, can establish highly specific and precisely controlled patterns of gene expression in different cell types. Hence, a gene is expressed at a maximum level only when the set of transcription factors produced by a given cell type includes all the regulatory transcription factors that bind to that gene’s positive DNA control elements. Another aspect for consideration under the transcriptional control is the common structural motifs that allow regulatory transcription factors to bind to DNA and activate transcription. Regulatory transcription factors possess two distinct activities, the ability to bind to a specific DNA sequence and the ability to regulate transcription. Both the activities reside in separate domains. The domain that recognizes and binds to a specific DNA sequence, is called the transcription factor’s DNA- binding domain, whereas the protein region required regulating the transcription is known as the transcription regulation domain. A gene regulatory protein recognizes specific DNA sequences because the surface of the protein is extensively complementary to the special surface features of the double helix of that region. In most cases the protein makes a large no. of contacts with the DNA, involving hydrogen bonds, ionic bonds, and hydrophobic interactions. DNA-protein interactions include some of the highest and most specific molecular interactions. These form motifs and they in turn use either α helices or β sheets to bind to the major groove of the DNA. A few of these structural motifs are:

The helix-turn-helix motif which is one of the simplest and most common DNA-binding motif. This is constructed from two α helices connected by a short extended chain of amino acids, which constitute the turn. The two helices are held at a fixed angle, primarily through interactions between the two helices. The C-terminal helix is called the recognition helix because it fits into the major groove of DNA; its amino acid side chains play an important part in recognizing the specific DNA sequence to which the protein binds. Each protein presents its helix-turn-helix motif in a unique way which enhances the versatility of the helix-turn-helix motif by increasing the number of sequences that the motif can be used to recognize.

Another type of motif is the zinc-finger motif. This consists of an α helix and two segments of β sheets, held in place by the interaction of precisely positioned cystein or histidine residues with a zinc atom. Zinc fingers protrude from the protein surface and serve as the points of contact with specific base sequences in the major groove of the DNA.

Another motif, the leucine zipper motif mediates both DNA binding and protein dimerization. The portion of the protein that is responsible for the dimerization is distinct from the portion that is responsible for DNA binding. This motif combines these two functions and is named leucine zipper motif because of the way the two α helices, one from each monomer are joined together to form a short coiled coil. The helices are held together by interactions between hydrophobic amino acids side chains that extend from one side of each helix.

Page 37: Dna Replication

37

The helix-loop-helix motif is composed of a short α helix connected by a loop to another, longer α helix. This motif contains hydrophobic regions that usually connect two polypeptides, which maybe either similar or different. The formation of the four helix bundle results in juxtaposition of a recognition helix derived from one polypeptide with a recognition helix derived from the other polypeptide, creating a two part DNA-binding domain.

Post-transcriptional Control

After the first two levels of regulation, namely the genomic controls and the transcriptional controls, we now move on to the next level. After transcription has taken place, the flow of genetic information involves a complex series of posttranscriptional events, any or all of which can turn out to be regulatory points. In eukaryotes, posttranscriptional events begin with the processing of primary RNA transcripts, which provides many opportunities for the regulation of gene expression. Of all the RNA transcripts in eukaryotic nuclei, RNA splicing is an especially important way of regulation because the phenomenon of alternative RNA splicing allows cells to create a variety of mRNA molecules from the same primary transcript, thereby permitting a given gene to generate more than one protein product. After RNA splicing the next important post-transcriptional process subject to potential control is the export of mRNA through nuclear pore complexes and into the cytoplasm. This transport maybe closely associated with RNA splicing. Once mRNA has been exported from the nucleus to the cytoplasm, several translational control mechanisms are available to regulate the rate at which each mRNA is translated into its polypeptide product. Some translational control mechanisms work by altering ribosomes or protein synthesis factors; others regulate the activity and/or stability of mRNA itself, in other words, the more rapidly an mRNA molecule is degraded, the less time available for it to be translated. After translation of an mRNA molecule has produced a polypeptide chain, there are still many ways of regulating the activity of the polypeptide product. With this we come to the final level of controlling gene expression, namely, the posttranslational control mechanism, that modify the protein structure and function. The regulatory mechanisms that are included in this category are reversible structural alterations that influence protein function, such as protein phosphorylation and dephosphorylation, as well as permanent alterations, such as proteolytic cleavage. Other posttranslational events subject to regulation include the guiding of protein folding by chaperone proteins, the targeting of proteins to intracellular or extracellular locations, and the interaction of proteins with regulatory molecules or ions, such as cAMP or Ca2+. Thus, a particular alteration in cell function or behavior is based on a change in gene expression and the underlying explanation might involve mechanisms operating at any one or more of these levels of control.

Suggested Reading 1. Lehninger’s Principles of Biochemistry, 4th edn. D. L. Nelson & N. M. Cox. W. H. Freeman & Coy.,

New York, USA. 2. Biochemistry. 3rd edn. D. Voet and J. G. Voet. J. Wiley & Sons, Inc. New Jersey, USA.